I would like to calculate the Click-Through Ratio (CTR) of several articles of a website using SQL.
The formula of the CTR is CTR = number clicks / number impressions, i.e. a ratio of how many times an article has been clicked and how many times it has been shown.
I have two tables:
- ´article_click´: A table with several columns, namely ´article_id´ (denoting the id of the article), ´description´ (a brief description of the article), ´timestamp´ (when it has been clicked), among others. Every time a user clicks an article, a new row is created in the table.
- ´article_impression´: Similarly, a table with several columns, namely ´article_id´ (denoting the id of the article), ´description´ (a brief description of the article), ´timestamp´ (when it has been shown), among others. Every time an article is shown to a user, a new row is created in the table.
Both tables 1 and 2 look like this:
------------ ------------- ------------------ -----
| article_id | description | timestamp | ... |
------------ ------------- ------------------ -----
| 102 | Potatoe | 2021-01-01 13:45 | ... |
| 11 | Lettuce | 2020-02-11 11:00 | ... |
| 34 | Train | 2019-12-12 09:31 | ... |
| 21 | Car | 2011-11-11 08:32 | ... |
| 201 | Train | 2014-02-10 02:12 | ... |
| ... | ... | ... | ... |
------------ ------------- ------------------ -----
And I would like to create a table such that:
------------ -----
| article_id | CTR |
------------ -----
| 11 | 0.4 |
| 23 | 0.6 |
| 34 | 0.2 |
| 44 | 0.8 |
| 45 | 0.3 |
| ... | ... |
------------ -----
In order to do so, I have tried:
SELECT article_click.article_id, COUNT(article_click.article_id) / COUNT(article_impression.article_id) AS CTR
FROM article_click
INNER JOIN article_impression ON article_click.article_id = article_impression.article_id
GROUP BY article_click.article_id DESC;
But I obtain something like:
------------ -----
| article_id | CTR |
------------ -----
| 11 | 1.0 |
| 23 | 1.0 |
| 34 | 1.0 |
| 44 | 1.0 |
| 45 | 1.0 |
| ... | ... |
------------ -----
Can anyone spot the mistake here? I'm using MySQL as RDBMS.
CodePudding user response:
If the click-through-rate (CTR) is number clicks / number impressions then you'll need to calculate the number of clicks on an article and the number of impressions on an article before joining them to perform the calculation.
You could do this with subqueries or CTEs, but I've opted for the former here.
SELECT c.article_id, c.click_count / i.impression_count AS CTR
FROM (
SELECT article_id, COUNT(*) AS click_count
FROM article_click
GROUP BY article_id) AS c
INNER JOIN (
SELECT article_id, COUNT(*) AS impression_count
FROM article_impression
GROUP BY article_id) AS i
ON c.article_id = i.article_id;
Note that using an INNER JOIN will exclude articles that have impressions but no clicks, so you won't get results where the CTR is 0. If you want those, you can use a LEFT JOIN from impressions to clicks. Since an article cannot be clicked if it has not been shown, we know that a LEFT JOIN from impressions to clicks is sufficient to show all data.
SELECT i.article_id, COALESCE(c.click_count, 0) / i.impression_count AS CTR
FROM (
SELECT article_id, COUNT(*) AS impression_count
FROM article_impression
GROUP BY article_id) AS i
LEFT JOIN (
SELECT article_id, COUNT(*) AS click_count
FROM article_click
GROUP BY article_id) AS c
ON i.article_id = c.article_id;
Note that we have to use the article_id from article_impression since article_click might be NULL. For the same reason, we have to COALESCE the click_count so that we don't end up with an error trying to divide NULL.
CodePudding user response:
Before using joins duplicate data must be avoided. Get individual counts of each table and join both the queries.
select a.article_id, article_click/article_impression_click as ctr
from ( select a.article_id, count(a.article_id) article_click from
article_click a group by article_id) a inner join (select
a.article_id, count(a.article_id) article_impression_click from
article_impression a group by article_id) b on
a.article_id=b.article_id
CodePudding user response:
WITH
v_article AS
( SELECT 'S' type, article_impression.id FROM article_impression
UNION ALL
SELECT 'C' type, article_click.id FROM article_click
)
SELECT
v_article.id,
COUNT(CASE WHEN v_article.type = 'C' THEN 1 END) nb_show,
COUNT(CASE WHEN v_article.type = 'S' THEN 1 END) nb_click,
CASE
WHEN COUNT(CASE WHEN v_article.type = 'S' THEN 1 END) > 0 THEN
ROUND(COUNT(CASE WHEN v_article.type = 'C' THEN 1 END) / COUNT(CASE WHEN v_article.type = 'S' THEN 1 END), 2)
END ratio_click_show
FROM v_article
GROUP BY
v_article.id
;
If you're sure an article can be click only if it has been previously shown (nb_show > 0 and nb_show > nb_click), you can remove the CASE around the ratio calculation.
