optimizaton SQL query by using already calculated variables-CodePudding

In project we use MySQL table for analytics. It is big table 40 columns, more than 10kk rows. A big part of time in query take result calculation (50 cols in result). Idea is to reuse calculated values as variables and make it faster. Query example:

SELECT col1, SUM(col2) as s_col2, SUM(col3) as s_col3, AVG(col2) as a_col2, ...,
SUM(col2)/SUM(col3) as aaa,
ROUND(AVG(col4), 2) as a_col4,
ROUND(SUM(col5), 2) as s_col5,
ROUND(SUM(col5)/AVG(col4), 2) as zzz,
...
JOIN ...
GROUP_BY ...
ORDER BY ...

Idea is to use @variable, for example:

SELECT col1, @s_col2 := SUM(col2) as s_col2, @s_col3 := SUM(col3) as s_col3, ...,
@s_col2/@s_col3 as aaa,

It works only for a few variables which are outside function, but I don't need additional columns for every variable.

@a_col4 := AVG(col4), #I don`t need this column
@s_col5 := SUM(col5), #I don`t need this column
ROUND(@a_col4, 2) as a_col4,
ROUND(@s_col5, 2) as s_col5,
ROUND(@s_col5/@a_col4, 2) as zzz,

How I can assign variables inside functions?

ROUND(@a_col4 := AVG(col4), 2) as a_col4,   #not works
ROUND((@s_col5 := SUM(col5)), 2) as s_col5, #not works
ROUND(@s_col5/@a_col4, 2) as zzz,

UPDATED: Thanks guys for your help.

The MySQL engine is probably smart enough to compute the value of SUM(col5) only once

I am not sure because for a big quantity of columns

SUM(col1) as a1,
SUM(col1) as a2,
SUM(col1) as a3,
SUM(col1) as a4,
SUM(col1) as a5,

Is slower than

@a1 := SUM(col1) as a1,
@a1 as a2,
@a1 as a3,
@a1 as a4,
@a1 as a5,

you can also use CTE for reusing table results

I tried, but some values use too many functions one inside another, sometimes 7 levels deep and not all variables can be reused in this way (COALESCE(ROUND(COALESCE(ROUND(SUM(AVG(IF..AND...OR...AND)...

All my changes, (15 variables) have very small effect, for the small period it takes 139 sec (was 151 sec), but some of our reports take a few hours and we need stronger optimisation.

We will try to analyse server bottlenecks, maybe use partitioning, sharding...

As a general rule, the number of rows touched is much more important to how long a query will take than the functions being evaluated.

The number of rows is always big, a lot of indexes and it works really fast. If I comment columns where we need calculations and only select existing it will take 40 sec (instead of 150)

CodePudding user response：

The qwy to do it in sql is to use the Select withe with the sum and average as basis for an outer select

SELECT *,
    s_col2/s_col3 as aaa,
    ROUND(a_col4, 2) as a_col4,
    ROUND(acol5, 2) as s_col5,
    ROUND(s_col5/a_col4, 2) as zzz.
    ...
FROM
    (SELECT col1, SUM(col2) as s_col2, SUM(col3) as s_col3, AVG(col2) as a_col2, ...,

    ...
    JOIN ...
    GROUP_BY ...) t1
ORDER BY ...

in MySQL 8 you can also use CTE for reusing tbale rewsults see manual

CodePudding user response：

Don't worry.

As a general rule, the number of rows touched is much more important to how long a query will take than the functions being evaluated.

A similar question comes from the choice between these:

SELECT foo, COUNT(*) FROM x GROUP BY foo ORDER BY COUNT(*) DESC LIMIT 5;
SELECT foo, COUNT(*) FROM x GROUP BY foo ORDER BY    1     DESC LIMIT 5;

I often do the latter because it is fewer keystrokes. I have not been able to determine whether it is faster.

I suggest you write your question in the simplest or clearest way. A subquery (or CTE) may actually be clearer in spite of taking more keystrokes.

Clarity and correctness are more important than speed.

And beware -- With both JOIN and GROUP BY in the query, you may have incorrect results. The JOIN is done before the aggregation; the GROUP BY comes after. Check to see if COUNT or SUM is bigger than it should be. If so, you will need a subquery or CTE.