Speeding up a manual PostgreSQL primary key reset?

by Asgmch   Last Updated June 13, 2019 09:06 AM

I'm merging several databases into one. One problem is that the corresponding tables in each database use the same primary key values (each starting from 1), even though the rows are otherwise different records.

To solve this, I run the following (as pseudocode):

last = 1
for db in databases:
   ALTER TABLE table1 DROP CONSTRAINT table1_pkey
   CREATE SEQUENCE temp_seq START last
   UPDATE table1 SET table1_pk = nextval('temp_seq')
   ALTER TABLE table1 ADD PRIMARY KEY (table1_pk)
   last = nextval('temp_seq')
   DROP SEQUENCE temp_seq
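For a single database, the body of that loop comes out as plain SQL roughly like this (a sketch using the table and column names from the pseudocode; 51 stands in for the running offset carried over from the previous database):

```sql
BEGIN;
-- Drop the old primary key so the values can be rewritten freely.
ALTER TABLE table1 DROP CONSTRAINT table1_pkey;
-- Start numbering where the previous database left off (placeholder value).
CREATE SEQUENCE temp_seq START 51;
-- Renumber every row; this rewrites the whole table.
UPDATE table1 SET table1_pk = nextval('temp_seq');
ALTER TABLE table1 ADD PRIMARY KEY (table1_pk);
-- Read the next free value in the same session to carry into the next database.
SELECT nextval('temp_seq');
DROP SEQUENCE temp_seq;
COMMIT;
```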

This goes through all the databases and renumbers their primary keys so the ranges are contiguous: the first database ends up with keys 1-50, the second with 51-125, the third with 126-223, and so on. After this I dump and restore them all into one database and there is no overlap anymore.

Everything works, except that the key reset step is extremely slow on bigger databases (many GB of data). I found out that autovacuum runs VACUUM ANALYZE on the table after every UPDATE table1 SET table1_pk = nextval('temp_seq'), which makes everything even slower. As I understand it, this is done to prevent transaction ID wraparound? From the PostgreSQL docs: https://www.postgresql.org/docs/9.3/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND
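For what it's worth, autovacuum can be switched off per table through a storage parameter, so one thing I considered (a sketch, I'm not sure it is safe in my situation) is disabling it on the table for the duration of the migration and vacuuming manually afterwards:

```sql
-- Turn off routine autovacuum for this table only (the forced
-- anti-wraparound vacuum still runs regardless of this setting).
ALTER TABLE table1 SET (autovacuum_enabled = false);

-- ... run the UPDATE-based primary key reset here ...

-- Clean up the dead row versions left by the big UPDATE, then re-enable.
VACUUM ANALYZE table1;
ALTER TABLE table1 SET (autovacuum_enabled = true);
```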

PostgreSQL's MVCC transaction semantics depend on being able to compare transaction ID (XID) numbers: a row version with an insertion XID greater than the current transaction's XID is "in the future" and should not be visible to the current transaction. But since transaction IDs have limited size (32 bits), a cluster that runs for a long time (more than 4 billion transactions) would suffer transaction ID wraparound: the XID counter wraps around to zero, and all of a sudden transactions that were in the past appear to be in the future — which means their output becomes invisible. In short, catastrophic data loss. (Actually the data is still there, but that's cold comfort if you cannot get at it.) To avoid this, it is necessary to vacuum every table in every database at least once every two billion transactions.

Is this the correct reason? Is there a faster way to do the primary key reset, or a way to tune the vacuum parameters without risking data loss?
