GiST vs btree podczas wyszukiwania wartości w przedziale
Programując bazę danych PostgreSQL często korzystam z wyszukiwania wartości w przedziale – to znaczy czy wartość X jest pomiędzy A i B. Aby zoptymalizować tego typu zapytanie można założyć indeks dwukolumnowy na btree( A, B ). Dziś spróbowałem jednak czegoś innego – indeksu GiST. W tym celu wykorzystałem bibliotego SEG z katalogu contrib w PostgreSQL.
Nie opłaciło się to. W tabelce A mam dwa miliony wylosowanych rekordów. Dla każdego wiersza A<B i S=’a..b’. Przeprowadziłem badania dla początku, środka i końca przedziału wyznaczonego przez minimalne i maksymalne wartości A i B. Czas wyszukiwania dla zapytań z użyciem indeksu btree i GiST są podobne.
htest=# \d a
Table "public.a"
Column | Type | Modifiers
--------+------------------+-----------
a | double precision |
b | double precision |
dif | double precision |
s | seg |
Indexes:
"a_ab" btree (a, b)
"a_s" gist (s)
htest=# SELECT count(*), min(a) as mina, max(a) as maxa, min(b) as minb, max(b) as maxb from a;
count | mina | maxa | minb | maxb
---------+----------------------+------------------+----------------------+------------------
2000001 | 1.68569386083898e-07 | 0.99999854248017 | 0.000242068432384202 | 1.99908247962551
htest=# SELECT count(*) from a where s @ '0.001'::seg;
count
-------
1974
(1 row)
htest=# SELECT count(*) from a where 0.001 between a and b;
count
-------
1974
(1 row)
htest=# EXPLAIN ANALYZE SELECT count(*) from a where s @ '0.001'::seg;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=5694.69..5694.70 rows=1 width=0) (actual time=42.009..42.011 rows=1 loops=1)
-> Bitmap Heap Scan on a (cost=22.00..5689.68 rows=2000 width=0) (actual time=4.066..37.296 rows=1974 loops=1)
Recheck Cond: (s @ '0.001'::seg)
-> Bitmap Index Scan on a_s (cost=0.00..22.00 rows=2000 width=0) (actual time=3.159..3.159 rows=1974 loops=1)
Index Cond: (s @ '0.001'::seg)
Total runtime: 42.097 ms
(6 rows)
htest=# EXPLAIN ANALYZE SELECT count(*) from a where 0.001 between a and b;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=736.79..736.80 rows=1 width=0) (actual time=37.937..37.939 rows=1 loops=1)
-> Bitmap Heap Scan on a (cost=3.20..736.29 rows=200 width=0) (actual time=1.832..34.346 rows=1974 loops=1)
Recheck Cond: ((0.001::double precision >= a) AND (0.001::double precision <= b))
-> Bitmap Index Scan on a_ab (cost=0.00..3.20 rows=200 width=0) (actual time=0.923..0.923 rows=1974 loops=1)
Index Cond: ((0.001::double precision >= a) AND (0.001::double precision <= b))
Total runtime: 38.020 ms
(6 rows)
htest=# EXPLAIN ANALYZE SELECT count(*) from a where s @ '0.01'::seg;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=5694.69..5694.70 rows=1 width=0) (actual time=330.687..330.689 rows=1 loops=1)
-> Bitmap Heap Scan on a (cost=22.00..5689.68 rows=2000 width=0) (actual time=40.345..294.110 rows=19677 loops=1)
Recheck Cond: (s @ '0.01'::seg)
-> Bitmap Index Scan on a_s (cost=0.00..22.00 rows=2000 width=0) (actual time=30.849..30.849 rows=19677 loops=1)
Index Cond: (s @ '0.01'::seg)
Total runtime: 330.906 ms
(6 rows)
htest=# EXPLAIN ANALYZE SELECT count(*) from a where 0.01 between a and b;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=17913.90..17913.91 rows=1 width=0) (actual time=337.855..337.856 rows=1 loops=1)
-> Bitmap Heap Scan on a (cost=181.55..17868.26 rows=18256 width=0) (actual time=31.973..301.142 rows=19677 loops=1)
Recheck Cond: ((0.01::double precision >= a) AND (0.01::double precision <= b))
-> Bitmap Index Scan on a_ab (cost=0.00..181.55 rows=18256 width=0) (actual time=15.159..15.159 rows=19677 loops=1)
Index Cond: ((0.01::double precision >= a) AND (0.01::double precision <= b))
Total runtime: 338.070 ms
(6 rows)
htest=# EXPLAIN ANALYZE SELECT count(*) from a where s @ '0.1'::seg;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=5694.69..5694.70 rows=1 width=0) (actual time=1239.315..1239.316 rows=1 loops=1)
-> Bitmap Heap Scan on a (cost=22.00..5689.68 rows=2000 width=0) (actual time=189.930..898.891 rows=189570 loops=1)
Recheck Cond: (s @ '0.1'::seg)
-> Bitmap Index Scan on a_s (cost=0.00..22.00 rows=2000 width=0) (actual time=164.654..164.654 rows=189570 loops=1)
Index Cond: (s @ '0.1'::seg)
Total runtime: 1239.599 ms
(6 rows)
htest=# EXPLAIN ANALYZE SELECT count(*) from a where 0.1 between a and b;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=22163.00..22163.01 rows=1 width=0) (actual time=1198.103..1198.105 rows=1 loops=1)
-> Bitmap Heap Scan on a (cost=2002.74..21663.96 rows=199615 width=0) (actual time=154.321..854.816 rows=189570 loops=1)
Recheck Cond: ((0.1::double precision >= a) AND (0.1::double precision <= b))
-> Bitmap Index Scan on a_ab (cost=0.00..2002.74 rows=199615 width=0) (actual time=130.457..130.457 rows=189570 loops=1)
Index Cond: ((0.1::double precision >= a) AND (0.1::double precision <= b))
Total runtime: 1198.389 ms
(6 rows)
htest=# EXPLAIN ANALYZE SELECT count(*) from a where s @ '1'::seg;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=5694.69..5694.70 rows=1 width=0) (actual time=5003.059..5003.060 rows=1 loops=1)
-> Bitmap Heap Scan on a (cost=22.00..5689.68 rows=2000 width=0) (actual time=801.516..3216.648 rows=999981 loops=1)
Recheck Cond: (s @ '1'::seg)
-> Bitmap Index Scan on a_s (cost=0.00..22.00 rows=2000 width=0) (actual time=780.752..780.752 rows=999981 loops=1)
Index Cond: (s @ '1'::seg)
Total runtime: 5003.345 ms
(6 rows)
htest=# EXPLAIN ANALYZE SELECT count(*) from a where 1 between a and b;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Aggregate (cost=49189.14..49189.15 rows=1 width=0) (actual time=5087.386..5087.387 rows=1 loops=1)
-> Seq Scan on a (cost=0.00..46667.01 rows=1008848 width=0) (actual time=10.998..3317.487 rows=999981 loops=1)
Filter: ((1::double precision >= a) AND (1::double precision <= b))
Total runtime: 5087.446 ms
(4 rows)
htest=# EXPLAIN ANALYZE SELECT count(*) from a where s @ '1.999'::seg;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Aggregate (cost=5694.69..5694.70 rows=1 width=0) (actual time=0.347..0.349 rows=1 loops=1)
-> Bitmap Heap Scan on a (cost=22.00..5689.68 rows=2000 width=0) (actual time=0.331..0.334 rows=1 loops=1)
Recheck Cond: (s @ '1.999'::seg)
-> Bitmap Index Scan on a_s (cost=0.00..22.00 rows=2000 width=0) (actual time=0.302..0.302 rows=1 loops=1)
Index Cond: (s @ '1.999'::seg)
Total runtime: 0.431 ms
(6 rows)
htest=# EXPLAIN ANALYZE SELECT count(*) from a where 1.999 between a and b;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=20499.44..20499.45 rows=1 width=0) (actual time=422.610..422.612 rows=1 loops=1)
-> Index Scan using a_ab on a (cost=0.00..20498.93 rows=200 width=0) (actual time=422.263..422.590 rows=1 loops=1)
Index Cond: ((1.999::double precision >= a) AND (1.999::double precision <= b))
Total runtime: 422.681 ms
(4 rows)
Zwracam jednak uwagę, na wyniki ostatniego zapytania, o wartości z końca przedziału. W tym wypadku indeks GiST był 1000 razy szybszy. Założyłem więc dodatkowy indeks btree( B ) i potestowałem wartości z końca przedziału. Tadnem indeksów btree był tak samo lub nawet minimalnie szybszy niż indeks GiST.
htest=# SELECT count(*) from a where s @ '1.995'::seg;
count
-------
16
(1 row)
htest=# EXPLAIN ANALYZE SELECT count(*) from a where s @ '1.995'::seg;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Aggregate (cost=5694.69..5694.70 rows=1 width=0) (actual time=0.278..0.280 rows=1 loops=1)
-> Bitmap Heap Scan on a (cost=22.00..5689.68 rows=2000 width=0) (actual time=0.192..0.245 rows=16 loops=1)
Recheck Cond: (s @ '1.995'::seg)
-> Bitmap Index Scan on a_s (cost=0.00..22.00 rows=2000 width=0) (actual time=0.179..0.179 rows=16 loops=1)
Index Cond: (s @ '1.995'::seg)
Total runtime: 0.363 ms
(6 rows)
htest=# EXPLAIN ANALYZE select count(*) from a where 1.995 between a and b;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Aggregate (cost=736.29..736.30 rows=1 width=0) (actual time=0.134..0.136 rows=1 loops=1)
-> Bitmap Heap Scan on a (cost=2.70..735.79 rows=200 width=0) (actual time=0.047..0.102 rows=16 loops=1)
Recheck Cond: (1.995::double precision <= b)
Filter: (1.995::double precision >= a)
-> Bitmap Index Scan on a_b (cost=0.00..2.70 rows=200 width=0) (actual time=0.028..0.028 rows=16 loops=1)
Index Cond: (1.995::double precision <= b)
Total runtime: 0.209 ms
(7 rows)
Po przeprowadzeniu testów mam następujące wnioski:
- Sprawdź, jaki typ indeksu najbardziej odpowiada danym, które masz w tabeli
- Dla moich losowych wygenerowanych danych najlepszy był tandem indeksów btree( A, B ) i btree( B ).
Categories: PostgreSQL
Puk, puk ! Czy cos tu sie jeszcze w ogole dzieje ?? Co z nowymi postami ??
puk puk1Ja juz wszystko powiedzialam