Optimizing is the key

Oct 1, 2007 15:10 GMT

Avoid allowing each insertion to be committed separately, because PostgreSQL does a significant amount of work for every row that is added. Instead, turn off autocommit (or wrap the insertions in an explicit transaction) and do just one commit at the end. The main advantage of loading the data in one big transaction is that if the insertion of one row fails, everything inserted up to that point is rolled back, so you won't be left with partial data.
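
For example, here is a minimal sketch of the single-transaction approach, assuming a hypothetical country table with country_name and population columns (the values are purely illustrative):

BEGIN;
-- every INSERT below belongs to the same transaction, so nothing is committed yet
INSERT INTO country (country_name, population) VALUES ('Austria', 8200000);
INSERT INTO country (country_name, population) VALUES ('Australia', 20700000);
-- a single commit at the end makes all the rows visible at once;
-- if any INSERT had failed, a ROLLBACK would discard them all
COMMIT;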

Another way to optimize PostgreSQL is to use COPY instead of INSERT. Because the COPY command is optimized for loading large numbers of rows, you can load all of them with a single command, doing in one step what would otherwise take a long series of INSERT commands. A further advantage of COPY is that you don't have to turn off autocommit when populating a table, because COPY is a single command.
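
For loading a table this way, a minimal sketch could look as follows, assuming the same hypothetical country table and a hypothetical data file in the directory used by the example below:

COPY country (country_name, population) FROM '/usr1/proj/bray/sql/country_data.copy';
-- by default COPY expects tab-delimited text, and the file
-- must be readable by the database server process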

COPY can also write data out to a file. For example: COPY (SELECT * FROM country WHERE country_name LIKE 'A%') TO '/usr1/proj/bray/sql/a_list_countries.copy'; This command copies all the countries whose names start with A into a file. You will notice that using COPY is faster than using INSERT commands, even for large numbers of rows.

In addition, you can use the PREPARE and EXECUTE commands. With PREPARE you parse and plan an INSERT statement once, and with EXECUTE you can then run it every time you need to, with fresh parameter values each time.
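
A minimal sketch, again assuming the hypothetical country table from the earlier examples:

PREPARE insert_country (varchar, integer) AS
    INSERT INTO country (country_name, population) VALUES ($1, $2);
-- the statement is parsed and planned once, then run as often as needed
EXECUTE insert_country('Argentina', 39500000);
EXECUTE insert_country('Armenia', 3000000);
-- the prepared statement lives until the session ends
-- or until DEALLOCATE insert_country is issued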

maintenance_work_mem specifies the maximum amount of memory to be used by maintenance operations, such as VACUUM, CREATE INDEX, and ALTER TABLE ADD FOREIGN KEY. The default value is 16MB, but temporarily increasing it can speed up CREATE INDEX and ALTER TABLE ADD FOREIGN KEY commands. Keep in mind that this helps only if you actually run such commands; it does not speed up the data loading itself.
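
A minimal sketch of raising the setting for the current session only, assuming a hypothetical index on the country table and that the chosen value fits comfortably in available RAM:

SET maintenance_work_mem = '256MB';   -- affects only the current session
CREATE INDEX country_name_idx ON country (country_name);
RESET maintenance_work_mem;           -- return to the server default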

Another way to increase performance is to raise the value of checkpoint_segments, which sets the maximum distance between automatic WAL checkpoints, measured in log file segments. The parameter can be changed in the postgresql.conf file. Whenever a checkpoint occurs, all dirty pages must be flushed to disk, so temporarily increasing checkpoint_segments during bulk data loads reduces the number of checkpoints that are required.
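
As an illustration, the relevant postgresql.conf line might look like this during a bulk load (32 is only an example value; the default in PostgreSQL releases of that era was 3):

checkpoint_segments = 32    # temporarily raised for the bulk load

After editing the file, reload the server configuration (for example with pg_ctl reload) and lower the value again once the load has finished.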