I'm merging several databases into one. One problem is that the same table in each database uses the same primary key values (starting from 1), even though the records themselves are different.
To solve this, I run the following (as pseudocode):
    last = 1
    for db in databases:
        ALTER TABLE table1 DROP CONSTRAINT table1_pkey;
        CREATE SEQUENCE temp_seq START last;
        UPDATE table1 SET table1_pk = nextval('temp_seq');
        ALTER TABLE table1 ADD PRIMARY KEY (table1_pk);
        last = nextval('temp_seq')
        DROP SEQUENCE temp_seq;
This goes through all the databases and renumbers their primary keys so that each database continues where the previous one ended: the first database gets keys 1-50, the second 51-125, the third 126-223, and so on. After this I dump and restore them into one database, and there is no overlap anymore.
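To make the numbering concrete, here is a minimal Python sketch of the bookkeeping only, with no database access. The function name `key_ranges` and the row counts 50, 75 and 98 are illustrative assumptions, chosen to match the ranges mentioned above.

```python
from typing import List, Tuple


def key_ranges(row_counts: List[int], start: int = 1) -> List[Tuple[int, int]]:
    """Return the (first, last) primary key assigned to each database.

    Mirrors the loop above: each database's keys start where the
    previous database's keys ended.
    """
    ranges = []
    last = start
    for count in row_counts:
        # The sequence hands out keys last, last + 1, ..., last + count - 1.
        ranges.append((last, last + count - 1))
        # The next database starts right after the last assigned key.
        last += count
    return ranges


# Hypothetical row counts for three databases.
print(key_ranges([50, 75, 98]))  # [(1, 50), (51, 125), (126, 223)]
```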
Everything works well, except that the renumbering step is extremely slow on larger databases (many GBs of data). I found that PostgreSQL runs VACUUM ANALYZE after every UPDATE table1 SET table1_pk = nextval('temp_seq'), which makes everything even slower. As I understand it, this is done to prevent transaction ID wraparound. From the PostgreSQL docs: https://www.postgresql.org/docs/9.3/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND
PostgreSQL's MVCC transaction semantics depend on being able to compare transaction ID (XID) numbers: a row version with an insertion XID greater than the current transaction's XID is "in the future" and should not be visible to the current transaction. But since transaction IDs have limited size (32 bits) a cluster that runs for a long time (more than 4 billion transactions) would suffer transaction ID wraparound: the XID counter wraps around to zero, and all of a sudden transactions that were in the past appear to be in the future — which means their output become invisible. In short, catastrophic data loss. (Actually the data is still there, but that's cold comfort if you cannot get at it.) To avoid this, it is necessary to vacuum every table in every database at least once every two billion transactions.
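As far as I understand the quoted passage, the "two billion" figure comes from comparing 32-bit XIDs with modulo arithmetic. Below is a rough Python sketch of that comparison, written to check my own understanding. The function name `xid_precedes` is my own, and the sketch ignores PostgreSQL's special reserved XIDs, so it is an illustration and not the server's actual code.

```python
def xid_precedes(a: int, b: int) -> bool:
    """Return True if XID a is considered older than XID b.

    Assumption: normal XIDs are compared modulo 2**32 by looking at the
    sign of the 32-bit difference (special reserved XIDs are ignored).
    """
    diff = (a - b) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit difference as a signed value.
    if diff >= 0x80000000:
        diff -= 0x100000000
    return diff < 0


# An XID assigned shortly before the counter wrapped is still "in the past".
print(xid_precedes(4294967290, 10))         # True
# Once the gap exceeds 2**31, the old XID appears to be "in the future".
print(xid_precedes(100, 100 + 2**31 + 1))   # False
```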
Is this the correct reason? Is there a faster way to reset the primary keys? Or is there a way to tune the vacuuming parameters without risking data loss?