TuX as LLG logo
Software
DMX4Linux
Driver Suite for Linux
csv2iif.pl suite
convert PayPal transactions to IIF, OFX, QIF
Hardware
DMX30 Interface
128Ch SPP
DMX43 Interface
2out 2in EPP
LED Hardware
for Linux and Windows
EPROM Sampler
for 8 bits of sound
Misc
CatWeasel
Linux drivers for MK3/4 PCI
pg_trompe
PostgreSQL replication
trycatch
C exception/signal handling lib
Patches
to various software
Tools
and small scripts
Docs
misc documents
Links
to lighting stuff

Message Digest Aggregate Functions for PostgreSQL

This library provides message digest aggregate function for PostgreSQL which complement the built in MD5 function and the other message digests from the pgcrypto contrib module. You can use these aggregate functions to calculate a message digest for a column over multiple rows. This can be used to effectively compare two tables with just a single equality test for (each column). See the example below.

License

New and simplified BSD license

Releases

DateFileNews
2012-09-29 pg-mdagg-1.1.tar.gz for PostgreSQL 8.4.x and 9.x

Contact

For questions and suggestions contact Dirk Jagdmann <doj@cubic.org>.

Functions

md5agg(text) RETURNS text
Aggregate function which works on text columns. Returns a 32 character hex-string message digest. Unfortunately this function can not be named md5(text), because it would be indistinguishable from the expression function of the same signature.
md5(oid) RETURNS text
Expression function to calculate the MD5 digest of a large object. This complements the PostgreSQL md5(text) function. Returns a 32 character hex-string message digest.

Example

We create two tables which are nearly identical:

md5agg_a
i int t text
0a
1b
2c
3d
4e
5f
6g
7h
8i
9j
md5agg_b
i int t text
0a
1b
2c
3D
4e
5f
6g
7h
8i
9j

Comparing the i columns will evaluate to the same digest:

# SELECT md5agg(i) FROM (SELECT i::text FROM md5agg_a ORDER BY i) AS tab;
              md5agg
----------------------------------
 781e5e245d69b566979b86e28d23f2c7
(1 row)

# SELECT md5agg(i) FROM (SELECT i::text FROM md5agg_b ORDER BY i) AS tab;
              md5agg
----------------------------------
 781e5e245d69b566979b86e28d23f2c7
(1 row)
Note that calculating the message digest only makes sense, if the column values are fed into the aggregate function in a specific order. The SQL language makes it necessary to write this table expression to use the aggregate function.

If we now compare the text column we see an obvious difference in the calculated digest:

# SELECT md5agg(t) FROM (SELECT t FROM md5agg_a ORDER BY i) AS tab;
              md5agg
----------------------------------
 a925576942e94b2ef57a066101b48876
(1 row)

# SELECT md5agg(t) FROM (SELECT t FROM md5agg_b ORDER BY i) AS tab;
              md5agg
----------------------------------
 6884d0022fd5202573f69d6e2a7fa375
(1 row)
We can even combine both calculations into a single equality comparision:
# SELECT (SELECT md5agg(t) FROM (SELECT t FROM md5agg_a ORDER BY i) AS tab)
         =
         (SELECT md5agg(t) FROM (SELECT t FROM md5agg_b ORDER BY i) AS tab)
         as "compare col t must be FALSE";
 compare col t must be FALSE
-----------------------------
 f
(1 row)

TODO

  • let md5(oid) use md5 state and finalize function to reduce memory usage
  • use other digest algorithms: SHA-1, SHA-2, RIPEMD-160
  • maybe merge with pgcrypto contrib module

Search:
https://llg.cubic.org © 2001-2024 by Dirk Jagdmann