How to iteratively SELECT by hour for averages by user id?

I'm trying to average some metrics by user id and by hour in a single SELECT statement.

I'd love some help understanding the best way to do this without having to pass parameters from a Python script :P

This is the current query:

SELECT 
            user_id,
            AVG(sentiment) as sentiment,
            AVG(magnitude) as magnitude,
            SUM(sentiment) as total_sentiment,
            SUM(magnitude) as total_magnitude,
            MAX(sentiment) as max_sentiment,
            MIN(sentiment) as min_sentiment,
            COUNT(user_id) as count
        FROM
            sentiments
        WHERE
            created
            BETWEEN %s and %s
        GROUP BY
            user_id;

This is Postgres, by the way, and I'm sending the query with pandas from Python as a test :)

Update, trying an answer:

sql = """
        SELECT 
            user_id,
            AVG(sentiment) as sentiment,
            AVG(magnitude) as magnitude,
            SUM(sentiment) as total_sentiment,
            SUM(magnitude) as total_magnitude,
            MAX(sentiment) as max_sentiment,
            MIN(sentiment) as min_sentiment,
            COUNT(user_id) as count
        FROM
            sentiments
        GROUP BY
            user_id,
            date_part('hour', created);

    """

conn = db_conn.main()
cur = conn.cursor()
df = pd.read_sql(sql, con=conn)
cur.close()
df

This returns the following error:

function date_part(unknown, integer) does not exist
LINE 15: date_part('hour', created);
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.

Thanks in advance,

CodePudding user response：

The easiest way to truncate a timestamp is the DATE_TRUNC function.

SELECT DATE_TRUNC('hour', created), user_id,
... 
GROUP BY DATE_TRUNC('hour', created), user_id;

But if created holds an integer Unix timestamp (which the date_part(unknown, integer) error suggests), you first need to convert it to a timestamp with TO_TIMESTAMP.

SELECT DATE_TRUNC('hour', TO_TIMESTAMP(created)), user_id,
... 
GROUP BY DATE_TRUNC('hour', TO_TIMESTAMP(created)), user_id;
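
To make the grouping concrete, here is a minimal Python sketch of what that last query computes, assuming created holds integer Unix seconds. The sample rows and the helper names truncate_to_hour and average_by_hour_and_user are made up for illustration. The sketch also works in UTC, whereas Postgres truncates a timestamptz in the session time zone, so results can differ if your session is not set to UTC.

```python
from collections import defaultdict
from datetime import datetime, timezone


def truncate_to_hour(unix_seconds):
    """Rough equivalent of DATE_TRUNC('hour', TO_TIMESTAMP(created)), in UTC."""
    moment = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    return moment.replace(minute=0, second=0, microsecond=0)


def average_by_hour_and_user(rows):
    """Average sentiment per (hour, user_id); rows are (user_id, created, sentiment)."""
    groups = defaultdict(list)
    for user_id, created, sentiment in rows:
        groups[(truncate_to_hour(created), user_id)].append(sentiment)
    return {key: sum(values) / len(values) for key, values in groups.items()}


# Hypothetical sample data, not taken from the real sentiments table.
sample_rows = [
    (1, 1609459200, 0.5),   # 2021-01-01 00:00:00 UTC
    (1, 1609461000, 0.1),   # 2021-01-01 00:30:00 UTC
    (1, 1609462800, 0.9),   # 2021-01-01 01:00:00 UTC
    (2, 1609459260, -0.5),  # 2021-01-01 00:01:00 UTC
]

hourly_averages = average_by_hour_and_user(sample_rows)
for (hour, user_id), avg in sorted(hourly_averages.items()):
    print(hour.isoformat(), user_id, round(avg, 3))
```

Each (hour, user_id) key corresponds to one row of the SQL result, so user 1 ends up with two rows here and user 2 with one.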

CodePudding user response：

In PostgreSQL you can group by year, month, day and hour using:

group by
 date_part('year', created),
 date_part('month', created),
 date_part('day', created),
 date_part('hour', created)

Or do some math with the epoch:

group by floor(date_part('epoch', created)/3600)

Alternatively, you could use extract, e.g. extract(epoch from created).
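
For intuition about the epoch arithmetic, here is a rough Python sketch of the same bucketing. hour_bucket and bucket_start are hypothetical helper names and the timestamps are invented sample values. Dividing the epoch seconds by 3600 and flooring gives the number of whole hours since 1970-01-01, so every timestamp inside the same clock hour lands in the same bucket.

```python
from datetime import datetime, timezone

SECONDS_PER_HOUR = 3600


def hour_bucket(moment):
    """Rough equivalent of floor(date_part('epoch', created) / 3600)."""
    return int(moment.timestamp() // SECONDS_PER_HOUR)


def bucket_start(bucket):
    """Turn a bucket number back into the UTC timestamp at the start of that hour."""
    return datetime.fromtimestamp(bucket * SECONDS_PER_HOUR, tz=timezone.utc)


# Invented sample timestamps: two in the same hour, one in the next hour.
a = datetime(2021, 1, 1, 0, 5, tzinfo=timezone.utc)
b = datetime(2021, 1, 1, 0, 55, tzinfo=timezone.utc)
c = datetime(2021, 1, 1, 1, 0, tzinfo=timezone.utc)

print(hour_bucket(a), hour_bucket(b), hour_bucket(c))
print(bucket_start(hour_bucket(b)).isoformat())
```

The bucket is just a number, so if you want a readable hour in the output you would convert it back, as bucket_start does here.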

CodePudding user response：

There are a couple of ways to interpret grouping by hour that might be useful. The SQL below truncates each time to the start of its hour, which gives you one timestamp for each hour in which there is data:

SELECT 
user_id,
AVG(sentiment) as sentiment,
AVG(magnitude) as magnitude,
SUM(sentiment) as total_sentiment,
SUM(magnitude) as total_magnitude,
MAX(sentiment) as max_sentiment,
MIN(sentiment) as min_sentiment,
COUNT(user_id) as count,
date_trunc('hour',created) hourcreated
FROM
sentiments
WHERE
created BETWEEN %s and %s
GROUP BY date_trunc('hour',created), user_id
ORDER BY date_trunc('hour',created), user_id;

You might also want to see what your data looks like over an average day, for example to look for spikes in the afternoon.

In that case, extracting the hour of day gives you at most 24 rows per user. Replace date_trunc('hour',created) in the query above with:

extract(hour from created)
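
The difference between the two interpretations can be sketched in a few lines of Python. The timestamps are invented sample values, and the two set comprehensions only imitate what date_trunc('hour', created) and extract(hour from created) would group on.

```python
from datetime import datetime

# Invented sample timestamps spread over two days.
timestamps = [
    datetime(2021, 1, 1, 14, 10),
    datetime(2021, 1, 1, 14, 50),
    datetime(2021, 1, 2, 14, 5),
    datetime(2021, 1, 2, 9, 30),
]

# Like date_trunc('hour', created): one group per calendar hour that has data.
truncated_hours = {t.replace(minute=0, second=0, microsecond=0) for t in timestamps}

# Like extract(hour from created): one group per hour of day, across all days.
hours_of_day = {t.hour for t in timestamps}

print(len(truncated_hours), "groups when truncating to the hour")
print(len(hours_of_day), "groups when extracting the hour of day")
```

Truncating keeps 14:00 on 1 January and 14:00 on 2 January apart, while extracting the hour merges them into a single "14" group, which is what you want for an average-day profile.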