Best indexes to optimize a large MySQL table?

So I have a MySQL table with over 10 million rows. A new row is created every time a new download is made, and this row is updated when the file is downloaded again (the download count is updated). To check if this row already exists, I execute the following MySQL query:

SELECT `id`, `download_count` 
FROM `product_files_downloads` 
WHERE (`user_id` = ? AND `variant_id` = ? AND `product_files_id` = ? AND `order_id` = ?) 
ORDER BY `id` DESC LIMIT 1;

I have a simple index set on user_id and the query is pretty fast if the user (defined by the user_id) does not have a lot of rows in this table. However, I have some users that have more than 100k rows attached to their user_id and in this case, the query takes a few seconds to execute.

Should I add a new index on user_id and product_files_id like this:

ALTER TABLE `product_files_downloads` ADD INDEX `user_id_product_files_id_idx` (`user_id`, `product_files_id`);

... or is there a better index to create to speed up the query?

CodePudding user response：

An index on the two columns (user_id, product_files_id) would narrow down the search partially, but why not narrow it down exactly to the row you're searching for?

ALTER TABLE `product_files_downloads` ADD INDEX `myindex` (
  `user_id`, `variant_id`, `product_files_id`, `order_id`
);

The id column, assuming that's your primary key and the table uses InnoDB, is implicitly appended as a fifth column, which is helpful because it will make the ORDER BY a no-op. MySQL can read the rows in index order and skip the sort when you search by the first four columns of the index and order by the fifth.
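
To confirm that the index is used and the sort is skipped, you could run the query through EXPLAIN. This is a minimal sketch, assuming the index above has been created; the literal values are made-up placeholders for the ? parameters:

```sql
-- Placeholder values (42, 7, 1001, 5005) stand in for the ? parameters.
EXPLAIN
SELECT `id`, `download_count`
FROM `product_files_downloads`
WHERE `user_id` = 42
  AND `variant_id` = 7
  AND `product_files_id` = 1001
  AND `order_id` = 5005
ORDER BY `id` DESC
LIMIT 1;
```

If things work as described, the key column of the output should name the four-column index, and the Extra column should not contain "Using filesort".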

You might like to view my presentation How to Design Indexes, Really, and the video.

CodePudding user response：

First, you'll want as many as possible of the columns used in the WHERE clause in your index. Ideally they should be ordered from most selective to least selective.

Let's say there are 100 users, two variants, 10000 product files and 1000 orders. Then the product file ID will probably be the most selective. If you look for one product file, you get about 1/10000 of the rows in the table. So, by the first column in the index you'll have already narrowed down the rows to look up by a high degree.

This would lead to

create index idx1 on product_files_downloads
  (product_files_id, order_id, user_id, variant_id);
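
The numbers above are only an example. To estimate selectivity on the real table rather than guess, one option is to count the distinct values per column. This is a rough sketch: a distinct count is only a proxy, because it says nothing about skew (for instance, a few users owning a large share of the rows).

```sql
-- More distinct values relative to total_rows suggests a more selective column.
-- On a table with millions of rows this scans a lot of data, so it may be slow.
SELECT COUNT(*)                           AS total_rows,
       COUNT(DISTINCT `user_id`)          AS distinct_users,
       COUNT(DISTINCT `variant_id`)       AS distinct_variants,
       COUNT(DISTINCT `product_files_id`) AS distinct_product_files,
       COUNT(DISTINCT `order_id`)         AS distinct_orders
FROM `product_files_downloads`;
```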

If you are not sure about selectivity, you can create one index for each combination you want to try:

create index idx2 on product_files_downloads
  (product_files_id, user_id, order_id, variant_id);
create index idx3 on product_files_downloads
  (order_id, product_files_id,  user_id, variant_id);

You can always drop indexes again once you see that the DBMS does not use them (EXPLAIN will tell you).
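
Besides EXPLAIN, MySQL's sys schema offers a view of indexes that have seen no use. The following is a sketch, assuming MySQL 5.7 or later with the Performance Schema enabled and the table living in the current default database; idx2 is only an example of a candidate that turned out to be unused.

```sql
-- Lists indexes on the table that have not been used since the server started.
-- Let the normal workload run for a while before trusting the result.
SELECT `index_name`
FROM sys.schema_unused_indexes
WHERE `object_schema` = DATABASE()
  AND `object_name` = 'product_files_downloads';

-- Remove a candidate index that is not being used.
DROP INDEX idx2 ON product_files_downloads;
```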

Then, you may want to include the sort key in the index, so that the entries come back in the correct order without the DBMS having to perform an additional sort. (This reuses the name idx1, so drop the existing index first or choose a new name.)

create index idx1 on product_files_downloads
  (product_files_id, order_id, user_id, variant_id, id);

Finally, if you like, you can even provide a covering index that also contains the columns to select. Then the index lookup alone suffices and the table itself doesn't have to be accessed. (This reuses the name idx1, so drop the existing index first or choose a new name.)

create index idx1 on product_files_downloads
  (product_files_id, order_id, user_id, variant_id, id, download_count);
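
Whether the covering index is really used that way can be checked with EXPLAIN as well. This is a sketch with made-up literal values in place of the ? parameters:

```sql
-- Placeholder values (42, 7, 1001, 5005) stand in for the ? parameters.
-- "Using index" in the Extra column means the query is answered
-- from the index alone, without reading the table rows.
EXPLAIN
SELECT id, download_count
FROM product_files_downloads
WHERE product_files_id = 1001
  AND order_id = 5005
  AND user_id = 42
  AND variant_id = 7
ORDER BY id DESC
LIMIT 1;
```

One trade-off to keep in mind: since download_count is updated on every repeated download, an index that contains it has to be updated on each of those writes, too.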