I want to know "On average, how often the user change his devices by month?"
This is the table_1 in SQL, AWS Athenas
id date month device carrier
-----------------------------------------------
12-ac8147 2019 4 Sgx1 alca
12-ac8147 2019 5 Sgx1 alca
12-ac8147 2019 5 Sgx1 alca
12-ac8147 2019 6 Sgx1 alca
12-ac8147 2019 6 Sgx1 alca
12-ac8147 2019 6 iPhone8,1 telc
12-ac8147 2019 9 iPhone8,1 telc
12-ac8147 2019 9 iPhone8,1 telc
12-ac8147 2020 1 iPhone8,1 telc
12-ac8147 2020 10 iPhone11,2 telc
12-ac8147 2020 12 iPhone11,2 telc
12-ac8147 2020 12 iPhone11,2 service_m
12-ac8147 2020 12 iPhone11,2 service_m
12-ac8147 2021 1 iPhone11,2 service_m
12-ac8147 2021 12 iPhone11,5 service_m
12-ac8147 2021 12 iPhone11,5 movil_tel
So my idea was (in SQL athenas AWS amazon)
with table as
(
id, device, max(year * 100 month) mas_n, min(year * 100 month) min_n)
from table_1
where device is not null and device <> ''
group by id, device
)
SELECT id, AVG(max_n - min_n) prom_dif
FROM table
GROUP BY id
But this doesn't work, because it counts all the months:
id prom_dif
12-ac8147 206
CodePudding user response:
First detect changes. You can do this with LAG t compare with the previous row. Then count changes per user and month. At last get the average per user.
select id, avg(cnt * 1.0) avg_changes
from
(
select id, month, device, count(*) as cnt
from
(
select
id, year, month,
device,
lag(device) over (partiton by id order by year, month) as prev_device
from table_1
) with_previous_month
where device <> prev_device
group by id, year, month
) monthly_changes
group by id
order by id;
This takes into consideration that a user can switch from device A to B, then back to A, then back to B for instance, so we must count three changes, although it's only two different devices.
If months are missing from the table, the query is missing them, too. This means, if you only have January and December and one change, then the query will show you 0.5 (1 change in two months) instead of 0.083 (one change in twelve months) or 0,091 (one change in eleven months, in case you don't want to include both January and December, but only one of them). If you want to amend for this, you must generate all months first and outer join your data.
(ID is a bad name here, by the way, because despite its name it does not uniquely identify a row in the table, but a user it seems. So better call this user_num, or - if a user table exists - user_id.)
CodePudding user response:
select id,
count(distinct device) * 1.0 / count(distinct year * 100 month) as prom_dif
from table_1
where device is not null and device <> ''
group by id
