I have 2 tables, customers (3000 rows) and phone_call_log (350 000 rows).
I need to materialize the time of last call to each customer, using the call log (faster for frontend searches)
Indexes are on:
- start_time (timestamp)
- callee(bigint(32) unsigned)
- caller(bigint(32) unsigned)
- phonenumber(bigint(32) unsigned)
- last_call(timestamp)
Running this query without the OR statement completes in < 2 seconds for either caller / callee columns, but with the OR in place, it will not complete (I've not allowed it to run longer than 30 minutes in testing).
UPDATE customers
SET customers.last_call =
(select max(phone_call_log.start_time)
from phone_call_log
WHERE (
(
phone_call_log.callee = customers.phonenumber
)
or (
phone_call_log.caller = customers.phonenumber
)
)
)
where customers.phonenumber is not null
AND LENGTH(customers.phonenumber) > 6
AND customers.phonenumber > 1000000;
CodePudding user response:
Queries using OR cannot use index (as efficiently). I suggest you try the following:
UPDATE customers
SET last_call = GREATEST(
(SELECT MAX(start_time) FROM phone_call_log WHERE callee = customers.phonenumber),
(SELECT MAX(start_time) FROM phone_call_log WHERE caller = customers.phonenumber)
)
Be advised that GREATEST has issues with NULL values.
CodePudding user response:
Fastest
Change the data flow to update customers.last_call when a call comes in.
UPDATE JOIN
UPDATE works better with a JOIN than IN ( SELECT ... )
OR
OR is a performance killer. The query is likely to scan the entire phone_call_log each for each customer.
One fix is to do two UPDATEs and have suitable indexes:
UPDATE
SET customers.last_call = GREATEST( customers.last_call,
( select max(phone_call_log.start_time)
FROM phone_call_log
WHERE phone_call_log.callee = customers.phonenumber
)
WHERE ...
UPDATE
SET customers.last_call = GREATEST( customers.last_call,
( ... caller ... )
)
WHERE ...
That needs these indexes on phone_call_log:
INDEX(callee, start_time)
INDEX(caller, start_time)
and remove your current 1-column indexes on caller and callee.
Datatype
It is probably wrong and counterproductive to use BIGINT for a phone number, especially in light of LENGTH(customers.phonenumber) > 6.
In fact, all of this boils down to a single test:
where customers.phonenumber is not null
AND LENGTH(customers.phonenumber) > 6
AND customers.phonenumber > 1000000;
Each of the > check for NOT NULL anyway; use only one of the others, depending on datatype. And do have it indexed.
(Please provide SHOW CREATE TABLE; the 'English' is less precise.)
