I have this query that runs very slowly when DISTINCT
is used;
SELECT COUNT(*) FROM (SELECT DISTINCT query_text FROM search_query WHERE (store_id IN (1)) AND (num_results > 0) ORDER BY popularity desc LIMIT 100) AS result WHERE (result.query_text = '<search_term>');
If I run this query as is, it takes between three and four seconds. However, if I remove DISTINCT
, results are functionally instant. However, I can't simply remove that DISTINCT
from the equation since this query needs to gather a distinct list of 100 search terms with no repeats. Since this is part of a Magento install, store_id
could be a list of multiple store IDs. While the current setup is single store, this could change so I'd rather leave the DISTINCT
as it sits.
This is the create table statement;
CREATE TABLE `search_query` (
`query_id` int unsigned NOT NULL AUTO_INCREMENT COMMENT 'Query ID',
`query_text` varchar(255) DEFAULT NULL COMMENT 'Query text',
`num_results` int unsigned NOT NULL DEFAULT '0' COMMENT 'Num results',
`popularity` int unsigned NOT NULL DEFAULT '0' COMMENT 'Popularity',
`redirect` varchar(255) DEFAULT NULL COMMENT 'Redirect',
`store_id` smallint unsigned NOT NULL DEFAULT '0' COMMENT 'Store ID',
`display_in_terms` smallint NOT NULL DEFAULT '1' COMMENT 'Display in terms',
`is_active` smallint DEFAULT '1' COMMENT 'Active status',
`is_processed` smallint DEFAULT '0' COMMENT 'Processed status',
`updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Updated at',
PRIMARY KEY (`query_id`),
UNIQUE KEY `SEARCH_QUERY_QUERY_TEXT_STORE_ID` (`query_text`,`store_id`),
KEY `SEARCH_QUERY_QUERY_TEXT_STORE_ID_POPULARITY` (`query_text`,`store_id`,`popularity`),
KEY `SEARCH_QUERY_STORE_ID` (`store_id`),
KEY `SEARCH_QUERY_IS_PROCESSED` (`is_processed`),
KEY `SEARCH_QUERY_STORE_ID_POPULARITY` (`store_id`,`popularity`),
CONSTRAINT `SEARCH_QUERY_STORE_ID_STORE_STORE_ID` FOREIGN KEY (`store_id`) REFERENCES `store` (`store_id`) ON DELETE CASCADE
And this is what EXPLAIN
returns;
mysql> EXPLAIN SELECT COUNT(*) FROM (SELECT DISTINCT query_text FROM search_query WHERE (store_id IN (1)) AND (num_results > 0) ORDER BY popularity desc LIMIT 100) AS result WHERE (result.query_text = '<search_term>');
+----+-------------+--------------+------------+------+-------------------------------------------------------------------------------------------------------------------------------------+-----------------------+---------+-------+--------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------+------------+------+-------------------------------------------------------------------------------------------------------------------------------------+-----------------------+---------+-------+--------+----------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 768 | const | 10 | 100.00 | Using index |
| 2 | DERIVED | search_query | NULL | ref | SEARCH_QUERY_QUERY_TEXT_STORE_ID,SEARCH_QUERY_QUERY_TEXT_STORE_ID_POPULARITY,SEARCH_QUERY_STORE_ID,SEARCH_QUERY_STORE_ID_POPULARITY | SEARCH_QUERY_STORE_ID | 2 | const | 460386 | 33.33 | Using where; Using temporary; Using filesort |
+----+-------------+--------------+------------+------+-------------------------------------------------------------------------------------------------------------------------------------+-----------------------+---------+-------+--------+----------+----------------------------------------------+
2 rows in set, 1 warning (0.00 sec)
I tried creating a different covering index, but it didn't help. This is the covering index I made (and subsequently forced the query to use);
ALTER TABLE `search_query` ADD INDEX `QUERY_PERFORMANCE_FIX_TEST_INDEX` (`store_id`, `num_results`, `popularity` desc, `query_text`);
That helped a little but not by much. A difference of a couple hundred ms at best so that won't be the solution. Without this index and without DISTINCT
, the results are nearly instant so I don't think it's necessarily the indexes being wrong, but, perhaps, a missing one?
Am I missing an index (such as a FULLTEXT
index on query_text
)? Is there a better way of determining this information?
query_text
, and they have differentpopularity
values, which value should be used for sorting?query_text = '<search_term>'
into the subquery?