1

I have this query that runs very slowly when DISTINCT is used;

SELECT COUNT(*) FROM (SELECT DISTINCT query_text FROM search_query WHERE (store_id IN (1)) AND (num_results > 0) ORDER BY popularity desc LIMIT 100) AS result WHERE (result.query_text = '<search_term>');

If I run this query as is, it takes between three and four seconds. However, if I remove DISTINCT, results are functionally instant. However, I can't simply remove that DISTINCT from the equation since this query needs to gather a distinct list of 100 search terms with no repeats. Since this is part of a Magento install, store_id could be a list of multiple store IDs. While the current setup is single store, this could change so I'd rather leave the DISTINCT as it sits.

This is the create table statement;

CREATE TABLE `search_query` (
  `query_id` int unsigned NOT NULL AUTO_INCREMENT COMMENT 'Query ID',
  `query_text` varchar(255) DEFAULT NULL COMMENT 'Query text',
  `num_results` int unsigned NOT NULL DEFAULT '0' COMMENT 'Num results',
  `popularity` int unsigned NOT NULL DEFAULT '0' COMMENT 'Popularity',
  `redirect` varchar(255) DEFAULT NULL COMMENT 'Redirect',
  `store_id` smallint unsigned NOT NULL DEFAULT '0' COMMENT 'Store ID',
  `display_in_terms` smallint NOT NULL DEFAULT '1' COMMENT 'Display in terms',
  `is_active` smallint DEFAULT '1' COMMENT 'Active status',
  `is_processed` smallint DEFAULT '0' COMMENT 'Processed status',
  `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Updated at',
  PRIMARY KEY (`query_id`),
  UNIQUE KEY `SEARCH_QUERY_QUERY_TEXT_STORE_ID` (`query_text`,`store_id`),
  KEY `SEARCH_QUERY_QUERY_TEXT_STORE_ID_POPULARITY` (`query_text`,`store_id`,`popularity`),
  KEY `SEARCH_QUERY_STORE_ID` (`store_id`),
  KEY `SEARCH_QUERY_IS_PROCESSED` (`is_processed`),
  KEY `SEARCH_QUERY_STORE_ID_POPULARITY` (`store_id`,`popularity`),
  CONSTRAINT `SEARCH_QUERY_STORE_ID_STORE_STORE_ID` FOREIGN KEY (`store_id`) REFERENCES `store` (`store_id`) ON DELETE CASCADE

And this is what EXPLAIN returns;

mysql> EXPLAIN SELECT COUNT(*) FROM (SELECT DISTINCT query_text FROM search_query WHERE (store_id IN (1)) AND (num_results > 0) ORDER BY popularity desc LIMIT 100) AS result WHERE (result.query_text = '<search_term>');
+----+-------------+--------------+------------+------+-------------------------------------------------------------------------------------------------------------------------------------+-----------------------+---------+-------+--------+----------+----------------------------------------------+
| id | select_type | table        | partitions | type | possible_keys                                                                                                                       | key                   | key_len | ref   | rows   | filtered | Extra                                        |
+----+-------------+--------------+------------+------+-------------------------------------------------------------------------------------------------------------------------------------+-----------------------+---------+-------+--------+----------+----------------------------------------------+
|  1 | PRIMARY     | <derived2>   | NULL       | ref  | <auto_key0>                                                                                                                         | <auto_key0>           | 768     | const |     10 |   100.00 | Using index                                  |
|  2 | DERIVED     | search_query | NULL       | ref  | SEARCH_QUERY_QUERY_TEXT_STORE_ID,SEARCH_QUERY_QUERY_TEXT_STORE_ID_POPULARITY,SEARCH_QUERY_STORE_ID,SEARCH_QUERY_STORE_ID_POPULARITY | SEARCH_QUERY_STORE_ID | 2       | const | 460386 |    33.33 | Using where; Using temporary; Using filesort |
+----+-------------+--------------+------------+------+-------------------------------------------------------------------------------------------------------------------------------------+-----------------------+---------+-------+--------+----------+----------------------------------------------+
2 rows in set, 1 warning (0.00 sec)

I tried creating a different covering index, but it didn't help. This is the covering index I made (and subsequently forced the query to use);

ALTER TABLE `search_query` ADD INDEX `QUERY_PERFORMANCE_FIX_TEST_INDEX` (`store_id`, `num_results`, `popularity` desc, `query_text`);

That helped a little but not by much. A difference of a couple hundred ms at best so that won't be the solution. Without this index and without DISTINCT, the results are nearly instant so I don't think it's necessarily the indexes being wrong, but, perhaps, a missing one?

Am I missing an index (such as a FULLTEXT index on query_text)? Is there a better way of determining this information?

2
  • That query is somewhat meaningless -- If there are two rows the the same query_text, and they have different popularity values, which value should be used for sorting?
    – Rick James
    Commented Nov 10, 2023 at 19:32
  • Would it work to move query_text = '<search_term>' into the subquery?
    – Rick James
    Commented Nov 10, 2023 at 19:34

1 Answer 1

0

Marisa, Try this, please

SELECT COUNT(*) FROM 
    (SELECT DISTINCT query_text FROM search_query WHERE (store_id IN (1)) AND 
        (num_results > 0) AND (query_text = '<search_term>') 
        ORDER BY popularity desc LIMIT 100) AS result; 

NOT TESTED. Let us know your results, please.

3
  • That won't work for this situation. I have to have the search query outside of the subquery, otherwise it won't fit the logic. It's meant to determine if a particular search term is in the most popular 100 terms or not. It's part of a search suggestion functionality.
    – Marisa
    Commented Nov 8, 2023 at 18:41
  • Marisa, Please run this query to determine Maximum and Average Length of some of your columns used in search_query. Commented Nov 8, 2023 at 21:18
  • # sel-mmaMLALsearch_query.sql min max avg last upd 2023-11-08 WLH # original 2023-11-08 WLH SELECT 'Begin Time',NOW(),DATABASE(),' ',COUNT(*),' ', MIN(query_id),MAX(query_id),AVG(query_id),MAX(LENGTH(query_id)),AVG(LENGTH(query_id)),'query_id',' ', MIN(query_text),MAX(query_text),AVG(query_text),MAX(LENGTH(query_text)),AVG(LENGTH(query_text)),'query_text',' ', FROM search_query \G; SELECT 'End Time',NOW(); Commented Nov 8, 2023 at 21:20

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Not the answer you're looking for? Browse other questions tagged or ask your own question.