A custom calendar, address, link management system using PostgreSQL to store data
Installed on three computers: home, work, internet server
Keeping three instances in sync required too much manual attention, even though the application could save/load arbitrary data sets
Home and Work computers are not always connected to the internet server
What I need
asynchronous multi-master replication that does not need permanent network connections
General problems with AMM
Update problem: two [or more] simultaneous transactions update the same row
Insert problem: two [or more] inserts of the same value into a unique column (illustrated below)
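To illustrate the insert problem, a minimal sketch assuming a hypothetical "contacts" table with a unique email column (not from the original system):

    -- Hypothetical schema, present on both nodes:
    CREATE TABLE contacts (
        nr    serial8 PRIMARY KEY,
        email text UNIQUE
    );

    -- Node A, while disconnected from node B:
    INSERT INTO contacts (email) VALUES ('alice@example.com');

    -- Node B, at the same time:
    INSERT INTO contacts (email) VALUES ('alice@example.com');

    -- Both inserts succeed locally; replaying either row on the
    -- other node fails with a unique_violation.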
My view of the situation
We have layers of responsibilities:
1 user
2 application
2.5 replication
3 database
I don't care for theory
I can only work at one place at a time, thus will only work on one replicated database.
I want multiple database instances primarily for data loss prevention.
I need databases for locality. Usually one DB per location, so apps running at that location have a consistent view of that DB.
My solution for the theory
1) inserts into unique columns
→ unique columns should be avoided at the database level; alternatively, the application is responsible for providing values that are unique across all nodes (one possible scheme is sketched after this list)
2) two updates on the same row
→ each update has a 50% chance of winning. In practice this is no different from two users working on a single-master DB, because their submits are not synchronized.
3) two deletes on the same row
→ one will fail, but the end result is the same.
4) update on a deleted row
→ the update will fail, but the end result is the same as if the order of events had been reversed.
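One possible scheme for letting the application provide unique values (my illustration, not necessarily the author's): qualify every locally generated value with the node number, so values from different nodes can never collide.

    -- Illustrative only; the node number is hard-coded as 3 here.
    CREATE SEQUENCE app_unique_seq;
    SELECT '3-' || nextval('app_unique_seq');  -- yields '3-1'
    -- Node 5 running the same code yields '5-1': no collision possible.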
Constraints of my approach
table columns must be of data types supported by Perl DBI and DBD::Pg
the DB replication graph must be acyclic (a spanning tree)
Each table must contain a column "nr serial8 primary key" or "nr serial8 not null unique"
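A minimal sketch of a table satisfying this constraint; the "appointments" name and payload columns are made up, only the "nr" column is the stated requirement:

    CREATE TABLE appointments (
        nr     serial8 PRIMARY KEY,  -- required replication key
        starts timestamptz,
        title  text
    );
    -- equivalent alternative: nr serial8 NOT NULL UNIQUE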
Implementation
schema "replication" contains tables and functions
every DB is assigned a node number
triggers on the monitored tables catch insert/update/delete and write to a "log" table (sketched after this list)
events in the "log" table merge, because only the PK is stored there
Every DB only knows its neighbours
a Perl program reads the "log" table and replicates with one neighbour DB
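A minimal sketch of what the log table and trigger could look like; all names except the required "nr" column are assumptions, and ON CONFLICT needs PostgreSQL 9.5+:

    CREATE SCHEMA replication;

    -- Pending changes: only table name and PK are stored, so repeated
    -- events on the same row merge into a single entry.
    CREATE TABLE replication.log (
        tbl text NOT NULL,
        nr  int8 NOT NULL,
        PRIMARY KEY (tbl, nr)
    );

    CREATE FUNCTION replication.log_change() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'DELETE' THEN
            INSERT INTO replication.log (tbl, nr)
            VALUES (TG_TABLE_NAME, OLD.nr)
            ON CONFLICT DO NOTHING;  -- row already queued: merge
        ELSE
            INSERT INTO replication.log (tbl, nr)
            VALUES (TG_TABLE_NAME, NEW.nr)
            ON CONFLICT DO NOTHING;
        END IF;
        RETURN NULL;  -- AFTER row trigger: return value is ignored
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER contacts_replication
        AFTER INSERT OR UPDATE OR DELETE ON contacts
        FOR EACH ROW EXECUTE PROCEDURE replication.log_change();

The Perl program would then read replication.log, look up the current state of each listed row (a row that no longer exists implies a delete), exchange it with one neighbour, and clear the entries it has shipped.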