Please note that I'm an absolute n00b in MySQL but somehow I managed to build some (for me) complex queries that work as they should. My main problem now is that for a many of the queries we're working on:
- The querie is becoming too big and very hard to see through.
- The same subqueries get repeated many times and that is adding to the complexity (and probably to the time needed to process the query).
We want to further expand this query but we are reaching a point where we can no longer oversee what we are doing. I've added one of these subqueries at the end of this post, just as an example.
!! You can fast foward to the Problem section if you want to skip the details below. I think the question can be answered also without the additional info.
What we want to do
Create a MySQL query that calculates purchase orders and forecasts for a given supplier based on:
- Sales history in a given period (past [x] months = interval)
- Current stock
- Items already in backorder (from supplier)
- Reserved items (for customers)
- Supplier ID
I've added an example of a subquery at the bottom of this message. We're showing just this part to keep things simple for now. The output of the subquery is:
- Part number
- Units sold
- Units sold (outliers removed)
- Units sold per month (outliers removed)
- Number of invoices with the part number in the period (interval)
It works quite OK for us, although I'm sure it can be optimised. It removes outliers from the sales history (e.g. one customer that orders 50 pcs of one product in one order). Unfortunately it can only remove outliers with substantial data, so if the first order happens to be 50 pcs then it is not considered an outlier. For that reason we take the amount of invoices into account in the main query. The amount of invoices has to exceed a certain number otherwise the system wil revert to a fixed value of "maximum stock" for that product.
As mentioned this is only a small part of the complete query and we want to expand it even further (so that it takes into account the "sales history" of parts that where used in assembled products).
For example if we were to build and sell cars, and we want to place an order with our tyre supplier, the query calculates the amount of tyres we need to order based on the sales history of the various car models (while also taking into account the stock of the cars, reserved cars and stock of the tyres).
Problem
The query is becomming massive and incomprehensible. We are repeating the same subqueries many times which to us seems highly inefficient and it is the main cause why the query is becomming so bulky.
What we have tried
(Please note that we are on MySQL 5.5.33. We will update our server soon but for now we are limited to this version.)
Create a VIEW from the subqueries. The main issue here is that we can't execute the view with parameters like supplier_id and interval period. Our subquery calculates the sum of the sold items for a given supplier within the given period. So even if we would build the VIEW so that it calculates this for ALL products from ALL suppliers we would still have the issue that we can't define the interval period after the VIEW has been executed.
A stored procedure. Correct me if I'm wrong but as far as I know, MySQL only allows us to perform a Call on a stored procedure so we still can't run it against the parameters (period, supplier id...)
Even this workaround won't help us because we still can't run the SP against the parameters.
- Using WITH at the beginning of the query
A common table expression in MySQL is a temporary result whose scope is confined to a single statement. You can refer this expression multiple times with in the statement. The WITH clause in MySQL is used to specify a Common Table Expression, a with clause can have one or more comms-separated subclauses.
Not sure if this would be the solution because we can't test it. WITH is not supported untill MySQL version 8.0.
What now?
My last resort would be to put the mentioned subqueries in a temp table before starting the main query. This might not completely eliminate our problems but at least the main query will be more comprehensible and with less repetition of fetching the same data. Would this be our best option or have I overlooked a more efficient way to tackle this?
Many thanks for your kind replies.
SELECT
GREATEST((verkocht_sd/6*((100 0)/100)),0) as 'units sold p/month ',
GREATEST(ROUND((((verkocht_sd/6)*3)-voorraad reserved-backorder),0),0) as 'Order based on units sold',
SUM(b.aantal) as 'Units sold in period',
t4.verkocht_sd as 'Units sold in period, outliers removed',
COUNT(*) as 'Number of invoices in period',
b.art_code as 'Part number'
FROM bongegs b -- Table that has all the sales records for all products
RIGHT JOIN totvrd ON (totvrd.art_code = b.art_code) -- Right Join stock data to also include items that are not in table bongegs (no sales history).
LEFT JOIN artcred ON (artcred.art_code = b.art_code) -- add supplier ID to the part numbers.
LEFT JOIN
(
SELECT
SUM(b.aantal) as verkocht_sd,
b.art_code
FROM bongegs b
RIGHT JOIN totvrd ON (totvrd.art_code = b.art_code)
LEFT JOIN artcred ON (artcred.art_code = b.art_code)
WHERE
b.bon_datum > DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
and b.bon_soort = "f" -- Selects only invoices
and artcred.vln = 1 -- 1 = Prefered supplier
and artcred.cred_nr = 9117 -- Supplier ID
and b.aantal < (select * from (SELECT AVG(b.aantal) 3*STDDEV(aantal)
FROM bongegs b
WHERE
b.bon_soort = 'f' and
b.bon_datum > DATE_SUB(CURDATE(), INTERVAL 6 MONTH)) x)
GROUP BY b.art_code
) AS t4
ON (b.art_code = t4.art_code)
WHERE
b.bon_datum > DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
and b.bon_soort = "f"
and artcred.vln = 1
and artcred.cred_nr = 9117
GROUP BY b.art_code
Bongegs | all rows from sales forms (invoices F, offers O, delivery notes V)
| art_code | bon_datum | bon_soort | aantal |
|:---------|:---------: |:---------:|:------:|
| item_1 | 2021-08-21 | f | 6 |
| item_2 | 2021-08-29 | v | 3 |
| item_6 | 2021-09-03 | o | 2 |
| item_4 | 2021-10-21 | f | 6 |
| item_1 | 2021-11-21 | o | 6 |
| item_3 | 2022-01-17 | v | 6 |
| item_1 | 2022-01-21 | o | 6 |
| item_4 | 2022-01-26 | f | 6 |
Artcred | supplier ID's
| art_code | vln | cred_nr |
|:---------|:----:|:-------:|
| item_1 | 1 | 1001 |
| item_2 | 1 | 1002 |
| item_3 | 1 | 1001 |
| item_4 | 1 | 1007 |
| item_5 | 1 | 1004 |
| item_5 | 2 | 1008 |
| item_6 | 1 | 1016 |
| item_7 | 1 | 1567 |
Voorraad | stock
| art_code | voorraad | reserved | backorder |
|:---------|:---------: |:--------:|:---------:|
| item_1 | 1 | 0 | 5 |
| item_2 | 0 | 0 | 0 |
| item_3 | 88 | 0 | 0 |
| item_4 | 9 | 0 | 0 |
| item_5 | 67 | 2 | 20 |
| item_6 | 112 | 9 | 0 |
| item_7 | 65 | 0 | 0 |
| item_8 | 7 | 1 | 0 |
Now, on to the query. You have LEFT JOINs to the artcred table, but then include artcred in the WHERE clause making it an INNER JOIN (required both left and right tables) in the result. Was this intended, or are you expecting more records in the bongegs table that do NOT exist in the artcred.
Well to be honest I was not fully aware that this would essentially form an INNER JOIN but in this case it doesn't really matter. A record that exists in bongegs always exists in artcred as well (every sold product must have a supplier). That doesn't work both ways since a product can be in artcred without ever being sold.
You also have RIGHT JOIN on totvrd which implies you want every record in the TotVRD table regardless of a record in the bongegs table. Is this correct?
Yes it is intended. Otherwise only products with actual sales in the period would end up in the result and we also wanted to include products with zero sales.
CodePudding user response:
I would definitely consider TEMPORARAY tables. Unique per session so if multiple users called some stored procedure, they wont be in conflict of results.
https://www.mysqltutorial.org/mysql-temporary-table/
Now, on to the query. You have LEFT JOINs to the artcred table, but then include artcred in the WHERE clause making it an INNER JOIN (required both left and right tables) in the result. Was this intended, or are you expecting more records in the bongegs table that do NOT exist in the artcred.
By using a temp table, I do think that will significantly help minimize the output as the baseline data will already be filtered by your bon_soort = 'f' and date range thus removing that whole additional context out of the remainder.
Not having the full structure of the tables to know what columns are in which tables makes things harder. You should always use table.column or alias.column in your queries to help others like out here at S/O, or others who may come into your company and need to follow-along. It prevents ambiguity where things reside. Edit your existing post and please provide the supplemental table structures for clarification.
create temporary table if not exists TempBongegs as
(SELECT
b.*
from
bongegs b
where
b.bon_soort = 'f'
AND b.bon_datum > DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
)
Since you are looking for data within the last 6 months, you can just do the bon_datum GREATER than 6 months ago. This implies up to the current date. I would make sure you have an index on (bon_soort, bon_datum) to help optimize this data pull.
You also have RIGHT JOIN on totvrd which implies you want every record in the TotVRD table regardless of a record in the bongegs table. Is this correct? But also, since not all columns are qualified as table.column in the query, we dont even know if any columns from this table are being used.
So, please review this, edit your existing post (instead of adding comments) and post the additional information. Edit your query to show the proper table.column, show table structures, etc.
SELECT
GREATEST((verkocht_sd/6*((100 0)/100)),0) as '1mv',
GREATEST(ROUND((((verkocht_sd/6)*3)-voorraad reserved-backorder),0),0) as 'OrdVerk',
SUM(aantal) as Verkocht,
verkocht_sd,
COUNT(*) as 'aant_fact',
b.art_code
FROM
TempBongegs b
RIGHT JOIN totvrd
ON (b.art_code = totvrd.art_code)
LEFT JOIN artcred
ON (b.art_code = artcred.art_code)
LEFT JOIN
( SELECT
SUM(aantal) as verkocht_sd,
b2.art_code
FROM
TempBongegs b2
RIGHT JOIN totvrd
ON (b2.art_code = totvrd.art_code)
LEFT JOIN artcred
ON (b2.art_code = artcred.art_code)
WHERE
and artcred.vln = 1
and artcred.cred_nr = 9117
and aantal < (select *
from ( SELECT AVG(aantal) 3 * STDDEV(aantal)
FROM TempBongegs ) x
)
GROUP BY
art_code ) AS t4
ON b.art_code = t4.art_code
WHERE
artcred.vln = 1
and artcred.cred_nr = 9117
GROUP BY
b.art_code
CodePudding user response:
One simplification:
and b.aantal < ( SELECT * from ( SELECT AVG ...
-->
and b.aantal < ( SELECT AVG ...
A personal problem: my brain hurts when I see RIGHT JOIN; please rewrite as LEFT JOIN.
Check you RIGHTs and LEFTs -- that keeps the other table's rows even if there is no match; are you expecting such NULLs? That is, it looks like they can all be plain JOINs (aka INNER JOINs).
These might help performance:
b: INDEX(bon_soort, bon_datum, aantal, art_code)
totvrd: INDEX(art_code)
artcred: INDEX(vln, cred_nr, art_code)
Is b the what you keep needing? Build a temp table:
CREATE TEMPORARY TABLE tmp_b
SELECT ...
FROM b
WHERE ...;
But if you need to use tmp_b multiple times in the same query, (and since you are not yet on MySQL 8.0), you may need to make it a non-TEMPORARY table for long enough to run the query. (If you have multiple connections building the same permanent table, there will be trouble.)
Yes, 5.5.33 is rather antique; upgrade soon.
(pre
