I have two jsonb columns in my table, etl_value and user_value. Each column has a GIN index using the jsonb_path_ops operator class:

    "ix_variable_instance_versioned_etl_value" gin (etl_value jsonb_path_ops)
    "ix_variable_instance_versioned_user_value" gin (user_value jsonb_path_ops)

The following query on user_value uses the index:

=> explain analyze select count(*) from clinical.variable_instances_versioned where user_value @? '$.coding[*].code ? (@ like_regex "C" flag "i")';
                                                                               QUERY PLAN                                                                                
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1650284.49..1650284.50 rows=1 width=8) (actual time=691.730..691.731 rows=1 loops=1)
   ->  Bitmap Heap Scan on variable_instances_versioned  (cost=1649812.91..1650284.19 rows=118 width=0) (actual time=118.388..691.377 rows=4268 loops=1)
         Recheck Cond: (user_value @? '$."coding"[*]."code"?(@ like_regex "C" flag "i")'::jsonpath)
         Rows Removed by Index Recheck: 1248999
         Heap Blocks: exact=58317 lossy=35532
         ->  Bitmap Index Scan on ix_variable_instance_versioned_user_value  (cost=0.00..1649812.88 rows=118 width=0) (actual time=107.017..107.017 rows=184334 loops=1)
               Index Cond: (user_value @? '$."coding"[*]."code"?(@ like_regex "C" flag "i")'::jsonpath)
 Planning Time: 0.098 ms
 Execution Time: 693.030 ms
(9 rows)

But the exact same query on etl_value does not use the index. Why would that be?

=> explain analyze select count(*) from clinical.variable_instances_versioned where etl_value @? '$.coding[*].code ? (@ like_regex "C" flag "i")';
                                                                             QUERY PLAN                                                                              
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=3558858.31..3558858.32 rows=1 width=8) (actual time=12911.026..12912.472 rows=1 loops=1)
   ->  Gather  (cost=3558858.10..3558858.31 rows=2 width=8) (actual time=12910.955..12912.467 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=3557858.10..3557858.11 rows=1 width=8) (actual time=12908.707..12908.707 rows=1 loops=3)
               ->  Parallel Seq Scan on variable_instances_versioned  (cost=0.00..3556075.62 rows=712989 width=0) (actual time=0.053..12871.096 rows=509619 loops=3)
                     Filter: (etl_value @? '$."coding"[*]."code"?(@ like_regex "C" flag "i")'::jsonpath)
                     Rows Removed by Filter: 30108635
 Planning Time: 0.188 ms
 Execution Time: 12912.500 ms
(10 rows)
  • Do these plans change when you run a vacuum analyze clinical.variable_instances_versioned; and reindex table clinical.variable_instances_versioned;?
    – Zegarek
    Commented Feb 10, 2023 at 20:37
  • Those queries count vastly different numbers of rows. It is not surprising that they choose different plans.
    – jjanes
    Commented Feb 10, 2023 at 21:13
  • The like_regex is not really indexable. As far as I can tell, the only rows excluded by the index are ones where the entire field is NULL.
    – jjanes
    Commented Feb 10, 2023 at 21:16
  • The plans don't change after vacuum analyze, after reindexing, or after a vacuum analyze following the reindex.
    – na_ka_na
    Commented Feb 10, 2023 at 21:16
  • Yeah, == uses the index for both columns.
    – na_ka_na
    Commented Feb 10, 2023 at 21:35
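
For reference, the equality form mentioned in the last comment would look roughly like the sketch below. The literal "C" is only an illustrative value carried over from the regex query; the exact query that was run is not shown in the comments. A jsonb_path_ops GIN index can serve @? only through equality clauses of the form accessor chain == constant that it extracts from the jsonpath, which is consistent with like_regex giving the index nothing selective to work with.

```sql
-- Sketch only: the comparison value "C" is an assumption, not the commenter's actual query.
-- The filter is an equality on an accessor chain ($.coding[*].code == constant),
-- which is the clause shape a jsonb_path_ops GIN index can extract from a jsonpath.
EXPLAIN ANALYZE
SELECT count(*)
FROM clinical.variable_instances_versioned
WHERE etl_value @? '$.coding[*].code ? (@ == "C")';
```

Running the same statement against user_value allows a side-by-side comparison of the two plans.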
