Currently, I have a little problem where I'm expected to build a table that shows the energy generated for the respected days.
I have solved this problem using python with SQL data polling combined with a for loop to look at the energy generated at the beginning of the day to the end of the day and the difference between the two will result in the total energy generated for the particular day. But unfortunately due to the amount of data that's coming out of the SQL database the python function is too slow.
I was wondering if this can be integrated within an SQL query to just spit out a table after it has done the aggregation. I have shown an example below for a better understanding of the table.
SQL TABLE
| date/time | value |
|---|---|
| 24/01/2022 2:00 | 2001 |
| 24/01/2022 4:00 | 2094 |
| 24/01/2022 14:00 | 3024 |
| 24/01/2022 17:00 | 4056 |
| 25/01/2022 2:00 | 4056 |
| 25/01/2022 4:00 | 4392 |
| 25/01/2022 17:00 | 5219 |
Final Table From the above table, we can work that the energy generated for 24/01/2022 is 4056(max)-2001(min)= 2055
| date | value |
|---|---|
| 24/01/2022 | 2055 |
| 25/01/2022 | 1163 |
CodePudding user response:
Yes, it will be faster to aggregate stuff before sending it over the network, group by is very fast:
https://stackoverflow.com/questions/4889669/whats-the-asymptotic-complexity-of-groupby-operation#:~:text=Ignoring the base SQL that,the size of the dataset).
CodePudding user response:
Usually, the time spent sending more stuff across the network makes the app-solution slower.
The GROUP BY may cost an extra sort, or it may be "free" if the data is sorted that way. (OK, you say unindexed.)
Show us the query and SHOW CREATE TABLE; we can help with indexing.
Generally, there is much less coding for the user if the work is done in SQL.
MySQL, in particular, picks between
Case 1: Sort the data O(N*log N), then make a linear pass through the data; this may or may not involve I/O which would add overhead
Case 2: Build a lookup table in RAM for collecting the grouped info, then making a linear pass over the data (no index needed); but then you need something like O(N*log n) for counting/summing/whatever the grouped value.
Notes:
- I used
Nfor the number or rows in the table andnfor the number of rows in the output. - I do not know the conditions that would cause the Optimizer to pick one method versus the other.
If you drag all the data into the client, you would probably pick one of those algorithms. If you happen to know that you are grouping on a simple integer, the lookup (for the second algorithm) could be a simply array lookup -- O(N). But, as I say, the network cost is likely to kill the performance.
It is simple enough to write is SQL:
SELECT DATE(`date`) AS "day",
MAX(value) - MIN(value) AS range
FROM tbl
GROUP BY DATE(`date`);
