I have two jsonb columns in my table: etl_value and user_value. I have a jsonb_path_ops GIN index on both columns.
"ix_variable_instance_versioned_etl_value" gin (etl_value jsonb_path_ops)
"ix_variable_instance_versioned_user_value" gin (user_value jsonb_path_ops)
The following query on user_value uses the index:
=> explain analyze select count(*) from clinical.variable_instances_versioned where user_value @? '$.coding[*].code ? (@ like_regex "C" flag "i")';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1650284.49..1650284.50 rows=1 width=8) (actual time=691.730..691.731 rows=1 loops=1)
   ->  Bitmap Heap Scan on variable_instances_versioned  (cost=1649812.91..1650284.19 rows=118 width=0) (actual time=118.388..691.377 rows=4268 loops=1)
         Recheck Cond: (user_value @? '$."coding"[*]."code"?(@ like_regex "C" flag "i")'::jsonpath)
         Rows Removed by Index Recheck: 1248999
         Heap Blocks: exact=58317 lossy=35532
         ->  Bitmap Index Scan on ix_variable_instance_versioned_user_value  (cost=0.00..1649812.88 rows=118 width=0) (actual time=107.017..107.017 rows=184334 loops=1)
               Index Cond: (user_value @? '$."coding"[*]."code"?(@ like_regex "C" flag "i")'::jsonpath)
 Planning Time: 0.098 ms
 Execution Time: 693.030 ms
(9 rows)
But the exact same query on etl_value does not use the index. Why would that be?
=> explain analyze select count(*) from clinical.variable_instances_versioned where etl_value @? '$.coding[*].code ? (@ like_regex "C" flag "i")';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=3558858.31..3558858.32 rows=1 width=8) (actual time=12911.026..12912.472 rows=1 loops=1)
   ->  Gather  (cost=3558858.10..3558858.31 rows=2 width=8) (actual time=12910.955..12912.467 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=3557858.10..3557858.11 rows=1 width=8) (actual time=12908.707..12908.707 rows=1 loops=3)
               ->  Parallel Seq Scan on variable_instances_versioned  (cost=0.00..3556075.62 rows=712989 width=0) (actual time=0.053..12871.096 rows=509619 loops=3)
                     Filter: (etl_value @? '$."coding"[*]."code"?(@ like_regex "C" flag "i")'::jsonpath)
                     Rows Removed by Filter: 30108635
 Planning Time: 0.188 ms
 Execution Time: 12912.500 ms
(10 rows)
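One way to dig into the difference is to check how the planner costs the index path for etl_value and what statistics it holds for each column. The two plans above already show very different row estimates (rows=118 for user_value versus rows=712989 per worker for etl_value). The following is only a diagnostic sketch, not an explanation of the cause. It assumes a scratch session where changing planner settings is acceptable, and it assumes that comparing pg_stats for the two columns is informative here.

```sql
-- Diagnostic sketch: SET LOCAL only lasts until the end of this transaction.
BEGIN;
SET LOCAL enable_seqscan = off;                 -- discourage the sequential scan
SET LOCAL max_parallel_workers_per_gather = 0;  -- take the parallel plan out of the comparison

-- If the index is usable, this should now show a bitmap scan on
-- ix_variable_instance_versioned_etl_value together with its estimated cost,
-- which can be compared with the seq scan cost in the plan above.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM clinical.variable_instances_versioned
WHERE etl_value @? '$.coding[*].code ? (@ like_regex "C" flag "i")';
ROLLBACK;

-- Compare the per-column statistics the planner works from
-- (for example, a large difference in null_frac between the two columns).
SELECT attname, null_frac, avg_width, n_distinct
FROM pg_stats
WHERE schemaname = 'clinical'
  AND tablename  = 'variable_instances_versioned'
  AND attname IN ('etl_value', 'user_value');
```

If the forced plan for etl_value turns out to have a higher estimated cost than the parallel sequential scan, the planner is making a cost-based choice rather than failing to recognize the index.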
vacuum analyze clinical.variable_instances_versioned; and reindex table clinical.variable_instances_versioned;?
vacuum analyze. Doesn't change after reindexing. Doesn't change after vacuum analyze post reindexing.
== uses the index for both columns