Archive

Archive for September, 2006

GiST vs btree podczas wyszukiwania wartości w przedziale

September 21st, 2006 2 comments
Programując bazę danych PostgreSQL często korzystam z wyszukiwania wartości w przedziale – to znaczy czy wartość X jest pomiędzy A i B. Aby zoptymalizować tego typu zapytanie można założyć indeks dwukolumnowy na btree( A, B ). Dziś spróbowałem jednak czegoś innego – indeksu GiST. W tym celu wykorzystałem bibliotego SEG z katalogu contrib w PostgreSQL. Nie opłaciło się to. W tabelce A mam dwa miliony wylosowanych rekordów. Dla każdego wiersza A<B i S=’a..b’. Przeprowadziłem badania dla początku, środka i końca przedziału wyznaczonego przez minimalne i maksymalne wartości A i B. Czas wyszukiwania dla zapytań z użyciem indeksu btree i GiST są podobne.
htest=# \d a
          Table "public.a"
Column |       Type       | Modifiers
--------+------------------+-----------
a      | double precision |
b      | double precision |
dif    | double precision |
s      | seg              |
Indexes:
   "a_ab" btree (a, b)
   "a_s" gist (s)


htest=# SELECT count(*), min(a) as mina, max(a) as maxa, min(b) as minb, max(b) as maxb from a;
 count  |         mina         |       maxa       |         minb         |       maxb      
---------+----------------------+------------------+----------------------+------------------
2000001 | 1.68569386083898e-07 | 0.99999854248017 | 0.000242068432384202 | 1.99908247962551



htest=# SELECT count(*) from a where s @ '0.001'::seg;
count
-------
 1974
(1 row)

htest=# SELECT count(*) from a where 0.001 between a and b;
count
-------
 1974
(1 row)

htest=# EXPLAIN ANALYZE SELECT count(*) from a where s @ '0.001'::seg;
                                                       QUERY PLAN                                                        
-------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=5694.69..5694.70 rows=1 width=0) (actual time=42.009..42.011 rows=1 loops=1)
   ->  Bitmap Heap Scan on a  (cost=22.00..5689.68 rows=2000 width=0) (actual time=4.066..37.296 rows=1974 loops=1)
         Recheck Cond: (s @ '0.001'::seg)
         ->  Bitmap Index Scan on a_s  (cost=0.00..22.00 rows=2000 width=0) (actual time=3.159..3.159 rows=1974 loops=1)
               Index Cond: (s @ '0.001'::seg)
 Total runtime: 42.097 ms
(6 rows)

htest=# EXPLAIN ANALYZE SELECT count(*) from a where 0.001 between a and b;
                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=736.79..736.80 rows=1 width=0) (actual time=37.937..37.939 rows=1 loops=1)
   ->  Bitmap Heap Scan on a  (cost=3.20..736.29 rows=200 width=0) (actual time=1.832..34.346 rows=1974 loops=1)
         Recheck Cond: ((0.001::double precision >= a) AND (0.001::double precision <= b))
         ->  Bitmap Index Scan on a_ab  (cost=0.00..3.20 rows=200 width=0) (actual time=0.923..0.923 rows=1974 loops=1)
               Index Cond: ((0.001::double precision >= a) AND (0.001::double precision <= b))
 Total runtime: 38.020 ms
(6 rows)

htest=# EXPLAIN ANALYZE SELECT count(*) from a where s @ '0.01'::seg;
                                                         QUERY PLAN                                                         
----------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=5694.69..5694.70 rows=1 width=0) (actual time=330.687..330.689 rows=1 loops=1)
   ->  Bitmap Heap Scan on a  (cost=22.00..5689.68 rows=2000 width=0) (actual time=40.345..294.110 rows=19677 loops=1)
         Recheck Cond: (s @ '0.01'::seg)
         ->  Bitmap Index Scan on a_s  (cost=0.00..22.00 rows=2000 width=0) (actual time=30.849..30.849 rows=19677 loops=1)
               Index Cond: (s @ '0.01'::seg)
 Total runtime: 330.906 ms
(6 rows)

htest=# EXPLAIN ANALYZE SELECT count(*) from a where 0.01 between a and b;
                                                          QUERY PLAN                                                           
-------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=17913.90..17913.91 rows=1 width=0) (actual time=337.855..337.856 rows=1 loops=1)
   ->  Bitmap Heap Scan on a  (cost=181.55..17868.26 rows=18256 width=0) (actual time=31.973..301.142 rows=19677 loops=1)
         Recheck Cond: ((0.01::double precision >= a) AND (0.01::double precision <= b))
         ->  Bitmap Index Scan on a_ab  (cost=0.00..181.55 rows=18256 width=0) (actual time=15.159..15.159 rows=19677 loops=1)
               Index Cond: ((0.01::double precision >= a) AND (0.01::double precision <= b))
 Total runtime: 338.070 ms
(6 rows)
htest=# EXPLAIN ANALYZE SELECT count(*) from a where s @ '0.1'::seg;
                                                          QUERY PLAN                                                           
-------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=5694.69..5694.70 rows=1 width=0) (actual time=1239.315..1239.316 rows=1 loops=1)
   ->  Bitmap Heap Scan on a  (cost=22.00..5689.68 rows=2000 width=0) (actual time=189.930..898.891 rows=189570 loops=1)
         Recheck Cond: (s @ '0.1'::seg)
         ->  Bitmap Index Scan on a_s  (cost=0.00..22.00 rows=2000 width=0) (actual time=164.654..164.654 rows=189570 loops=1)
               Index Cond: (s @ '0.1'::seg)
 Total runtime: 1239.599 ms
(6 rows)

htest=# EXPLAIN ANALYZE SELECT count(*) from a where 0.1 between a and b;
                                                             QUERY PLAN                                                             
------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=22163.00..22163.01 rows=1 width=0) (actual time=1198.103..1198.105 rows=1 loops=1)
   ->  Bitmap Heap Scan on a  (cost=2002.74..21663.96 rows=199615 width=0) (actual time=154.321..854.816 rows=189570 loops=1)
         Recheck Cond: ((0.1::double precision >= a) AND (0.1::double precision <= b))
         ->  Bitmap Index Scan on a_ab  (cost=0.00..2002.74 rows=199615 width=0) (actual time=130.457..130.457 rows=189570 loops=1)
               Index Cond: ((0.1::double precision >= a) AND (0.1::double precision <= b))
 Total runtime: 1198.389 ms
(6 rows)

htest=# EXPLAIN ANALYZE SELECT count(*) from a where s @ '1'::seg;
                                                          QUERY PLAN                                                           
-------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=5694.69..5694.70 rows=1 width=0) (actual time=5003.059..5003.060 rows=1 loops=1)
   ->  Bitmap Heap Scan on a  (cost=22.00..5689.68 rows=2000 width=0) (actual time=801.516..3216.648 rows=999981 loops=1)
         Recheck Cond: (s @ '1'::seg)
         ->  Bitmap Index Scan on a_s  (cost=0.00..22.00 rows=2000 width=0) (actual time=780.752..780.752 rows=999981 loops=1)
               Index Cond: (s @ '1'::seg)
 Total runtime: 5003.345 ms
(6 rows)

htest=# EXPLAIN ANALYZE SELECT count(*) from a where 1 between a and b;
                                                     QUERY PLAN                                                     
--------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=49189.14..49189.15 rows=1 width=0) (actual time=5087.386..5087.387 rows=1 loops=1)
   ->  Seq Scan on a  (cost=0.00..46667.01 rows=1008848 width=0) (actual time=10.998..3317.487 rows=999981 loops=1)
         Filter: ((1::double precision >= a) AND (1::double precision <= b))
 Total runtime: 5087.446 ms
(4 rows)

htest=# EXPLAIN ANALYZE SELECT count(*) from a where s @ '1.999'::seg;
                                                      QUERY PLAN                                                      
----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=5694.69..5694.70 rows=1 width=0) (actual time=0.347..0.349 rows=1 loops=1)
   ->  Bitmap Heap Scan on a  (cost=22.00..5689.68 rows=2000 width=0) (actual time=0.331..0.334 rows=1 loops=1)
         Recheck Cond: (s @ '1.999'::seg)
         ->  Bitmap Index Scan on a_s  (cost=0.00..22.00 rows=2000 width=0) (actual time=0.302..0.302 rows=1 loops=1)
               Index Cond: (s @ '1.999'::seg)
 Total runtime: 0.431 ms
(6 rows)

htest=# EXPLAIN ANALYZE SELECT count(*) from a where 1.999 between a and b;
                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=20499.44..20499.45 rows=1 width=0) (actual time=422.610..422.612 rows=1 loops=1)
   ->  Index Scan using a_ab on a  (cost=0.00..20498.93 rows=200 width=0) (actual time=422.263..422.590 rows=1 loops=1)
         Index Cond: ((1.999::double precision >= a) AND (1.999::double precision <= b))
 Total runtime: 422.681 ms
(4 rows)

Zwracam jednak uwagę, na wyniki ostatniego zapytania, o wartości z końca przedziału. W tym wypadku indeks GiST był 1000 razy szybszy. Założyłem więc dodatkowy indeks btree( B ) i potestowałem wartości z końca przedziału. Tadnem indeksów btree był tak samo lub nawet minimalnie szybszy niż indeks GiST.
htest=# SELECT count(*) from a where s @ '1.995'::seg;
 count 
-------
    16
(1 row)

htest=# EXPLAIN ANALYZE SELECT count(*) from a where s @ '1.995'::seg;
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=5694.69..5694.70 rows=1 width=0) (actual time=0.278..0.280 rows=1 loops=1)
   ->  Bitmap Heap Scan on a  (cost=22.00..5689.68 rows=2000 width=0) (actual time=0.192..0.245 rows=16 loops=1)
         Recheck Cond: (s @ '1.995'::seg)
         ->  Bitmap Index Scan on a_s  (cost=0.00..22.00 rows=2000 width=0) (actual time=0.179..0.179 rows=16 loops=1)
               Index Cond: (s @ '1.995'::seg)
 Total runtime: 0.363 ms
(6 rows)

htest=# EXPLAIN ANALYZE select count(*) from a where 1.995 between a and b;
                                                     QUERY PLAN                                                      
---------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=736.29..736.30 rows=1 width=0) (actual time=0.134..0.136 rows=1 loops=1)
   ->  Bitmap Heap Scan on a  (cost=2.70..735.79 rows=200 width=0) (actual time=0.047..0.102 rows=16 loops=1)
         Recheck Cond: (1.995::double precision <= b)
         Filter: (1.995::double precision >= a)
         ->  Bitmap Index Scan on a_b  (cost=0.00..2.70 rows=200 width=0) (actual time=0.028..0.028 rows=16 loops=1)
               Index Cond: (1.995::double precision <= b)
 Total runtime: 0.209 ms
(7 rows)
Po przeprowadzeniu testów mam następujące wnioski:
  1. Sprawdź, jaki typ indeksu najbardziej odpowiada danym, które masz w tabeli
  2. Dla moich losowych wygenerowanych danych najlepszy był tandem indeksów btree( A, B ) i btree( B ).
Categories: PostgreSQL Tags:

VIM – zmiana tekstu w wielu plikach jednocześnie

September 14th, 2006 No comments
To takie proste i tak bardzo cieszy. Chcę zmodyfikować wiele plików jednocześnie i w każdym z nich zmienić tekst. Używając VIMa to tylko:
vim `egrep -ril aaa .`
:argdo %s/aaa/bbb/g | update
Categories: Vim Tags:

FreeBSD i Eclipse 3.2

September 13th, 2006 No comments
Bardzo zasmuciłem się tym, że Eclipse Callisto nie przygotował pakietów binarnych dla systemu FreeBSD. W oficjalnych portach FreeBSD jeszcze nie ma portu dla Eclipse 3.2.< Ja lubie nowe programy, więc się uparłem, że Eclipsa chce mieć. Przeszukałem Google. Zaintsalowałem, mniej więcej tak:
cd tmp
wget “http://www.freshdevonrails.com/eclipse.shar”
sh eclipse.shar
unsetenv JAVA_HOME
cd eclipse
make
make install
make package
make clean
Pierwsze testy - mam wrażenie, że działa poprawnie.
Categories: Eclipse, FreeBSD Tags: