I’ve been working with moving large databases in PostgreSQL and have found that PostgreSQL is a true memory hog when inserting millions of rows in one statement. If the target table has a number of foreign key references PostgreSQL seems to choke horribly and use insane amounts of memory, 1GB per million rows in my original case.
We have a rather simple table that is composed of six integer columns, two timestamps, and one boolean. I first ran into this issue when inserting with a select statement that had a simple join and where clause. Initially I thought the select statement was the one causing the out of memory issues. But I found running the select alone worked and changing the statement to a select into piped the data into a new table without any issues.
I created a simple test case to reproduce this with the following schema,
CREATE TABLE table1 (
table1_id integer NOT NULL,
name character varying NOT NULL,
PRIMARY KEY(table1_id)
);
CREATE TABLE table2 (
table2_id integer NOT NULL,
name character varying NOT NULL,
PRIMARY KEY(table2_id)
);
CREATE TABLE table3 (
table3_id integer NOT NULL,
name character varying NOT NULL,
PRIMARY KEY(table3_id)
);
CREATE TABLE table4 (
table4_id integer NOT NULL,
name character varying NOT NULL,
PRIMARY KEY(table4_id)
);
CREATE TABLE test_target (
table1_id integer NOT NULL,
table2_id integer NOT NULL,
visible boolean NOT NULL,
date_added timestamp with time zone NOT NULL,
date_updated timestamp with time zone NOT NULL,
table3_id1 integer NOT NULL,
table3_id2 integer NOT NULL,
table4_id1 integer NOT NULL,
table4_id2 integer NOT NULL,
FOREIGN KEY (table1_id) REFERENCES table1(table1_id),
FOREIGN KEY (table2_id) REFERENCES table2(table2_id),
FOREIGN KEY (table3_id1) REFERENCES table3(table3_id),
FOREIGN KEY (table3_id2) REFERENCES table3(table3_id),
FOREIGN KEY (table4_id1) REFERENCES table4(table4_id),
FOREIGN KEY (table4_id2) REFERENCES table4(table4_id)
);
Running this INSERT / SELECT below can cause the PostgreSQL to keep expanding in memory usage. On a 32-bit machine it aborts due to an out of memory error around 2GB, but on a 64-bit Linux machine it keeps using memory until all of the main memory and swap is full and then the oom-killer process is spawned and starts killing processes.
INSERT INTO test_target
(
table1_id, table2_id, visible,
date_added, date_updated,
table3_id1, table3_id2,
table4_id1, table4_id2
)
SELECT
0 as table1_id, 0 as table2_id,
TRUE as visible,
now() as date_added,
now() as date_updated,
0 as table3_id1, 0 as table3_id2,
0 as table4_id1, 0 as table4_id2
FROM generate_series(1, 13000000);
I've tried this on PostgreSQL 8.3.7 under Linux and I found the same behavior in PostgreSQL 8.4 Beta for Windows.
A simple workaround I found is to simply drop the foreign keys before the insert, and re-add the foreign keys after the data is loaded into the table.