Skip to main content

Fast db insert with postgresql binary copy

Project description

https://travis-ci.org/altaurog/pgcopy.svg?branch=master

pgcopy is a small system for very fast bulk insertion of data into a PostgreSQL database table using binary copy.

Installation

To install:

$ pip install pgcopy

pgcopy requires pytz and the psycopg2 db adapter. nose is required to run the tests.

Use

pgcopy provides facility for copying data from an iterable of tuple-like objects using a CopyManager, which must be instantiated with a psycopg2 db connection, the table name, and an iterable containing the names of the columns to be inserted in the order in which they will be provided. pgcopy inspects the database to determine the datatypes of the columns.

For example:

from datetime import datetime
from pgcopy import CopyManager
import psycopg2
cols = ('id', 'timestamp', 'location', 'temperature')
now = datetime.now()
records = [
        (0, now, 'Jerusalem', 72.2),
        (1, now, 'New York', 75.6),
        (2, now, 'Moscow', 54.3),
    ]
conn = psycopg2.connect(database='weather_db')
mgr = CopyManager(conn, 'measurements_table', cols)
mgr.copy(records)

# don't forget to commit!
conn.commit()

By default, a temporary file on disk is used. If there’s enough memory, you can get a slight performance benefit with in-memory storage:

from io import BytesIO
mgr.copy(records, BytesIO)

A db schema can be specified in the table name using dot notation:

mgr = CopyManager(conn, 'myschema.measurements', cols)

Supported datatypes

Currently the following PostgreSQL datatypes are supported:

  • bool

  • smallint

  • integer

  • bigint

  • real

  • double precision

  • char

  • varchar

  • text

  • bytea

  • date

  • timestamp

  • timestamp with time zone

  • numeric (data must be decimal.Decimal)

  • json

  • jsonb

  • uuid

Unicode strings in the data to be inserted (all values of type str in Python 3) should be encoded as bytes before passing them to copy. Values intended to be NULL in the database should be encoded as None rather than as empty strings.

Testing

For a fast test run using current environment, use nose:

$ nosetests

For more thorough testing, Tox configuration will run tests on python versions 2.7 and 3.3 - 3.6:

$ tox

Additionally, test can be run with no local requirements other than the ubiquitous docker:

$ docker-compose up pgcopy

Benchmarks

Below are simple benchmarks for 100000 records. This gives a general idea of the kind of speedup available with pgcopy:

$ nosetests -c tests/benchmark.cfg
          ExecuteManyBenchmark:   7.75s
               PGCopyBenchmark:   0.54s
----------------------------------------------------------------------
Ran 2 tests in 9.101s

Replacing a Table

When possible, faster insertion may be realized by inserting into an empty table with no indices or constraints. In a case where the entire contents of the table can be reinserted, the Replace context manager automates the process. On entry, it creates a new table like the original, with a temporary name. Default column values are included. It provides the temporary name for populating the table within the context. On exit, it recreates the constraints, indices, triggers, and views on the new table, then replaces the old table with the new. It can be used so:

from pgcopy import CopyManager, Replace
with Replace(conn, 'mytable') as temp_name:
    mgr = CopyManager(conn, temp_name, cols)
    mgr.copy(records)

Replace renames new db objects like the old, where possible. Names of foreign key and check constraints will be mangled. As of v0.6 there is also pgcopy.util.RenameReplace, which instead of dropping the original objects renames them using a transformation function.

Note that on PostgreSQL 9.1 and earlier, concurrent queries on the table will fail once the table is dropped.

See Also

cpgcopy, a Cython implementation, about twice as fast.

Changelog

1.3.1

date:

14 Feb, 2018

  • Mention commit in the README

1.3.0

date:

22 Aug, 2017

  • Support unlimited varchar fields (thanks John A. Bachman)

  • Updated documentation regarding string encoding in Python 3 (thanks John A. Bachman)

  • Fix bug in varchar truncation

  • Fix bug in numeric type formatter (reported by Peter Van Eynde)

1.2.0

date:

25 Mar, 2017

  • Support db schema (thanks Marcin Gozdalik)

1.1.0

date:

26 Jan, 2017

  • Support uuid, json, and jsonb types (thanks Igor Mastak)

  • Integrate Travis CI

  • Add docker test strategy

1.0.0

date:

19 Jan, 2017

  • Run tests with tox

  • Support Python 3

  • Initial release on PyPi

0.7

date:

19 Jan, 2017

  • Add support for serializing Python decimal.Decimal to PostgreSQL numeric.

0.6

date:

21 Oct, 2014

  • RenameReplace variant

0.5

date:

14 Jul, 2014

  • Support default values and sequences

0.4

date:

14 Jul, 2014

  • Fix Replace utility class bugs

  • Add view support to Replace

0.3

date:

8 Jul, 2014

  • Move Cython optimization to separate project

  • Add Replace utility class

0.2

date:

7 Jul, 2014

  • Cython optimization

0.1

date:

29 Jun, 2014

  • Initial version

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pgcopy-1.3.1.tar.gz (10.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pgcopy-1.3.1-py2.py3-none-any.whl (13.0 kB view details)

Uploaded Python 2Python 3

File details

Details for the file pgcopy-1.3.1.tar.gz.

File metadata

  • Download URL: pgcopy-1.3.1.tar.gz
  • Upload date:
  • Size: 10.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for pgcopy-1.3.1.tar.gz
Algorithm Hash digest
SHA256 a18c74dc61d9d99846f7af9c5977adca80cbb520bb85da68298e3ad97bafd017
MD5 866474994a464cc545293990b4ac1db5
BLAKE2b-256 31cd1912ae23c1494f8340a55a5ba0f5a9ea8a73148dcab36fa3fe63b0aa3088

See more details on using hashes here.

File details

Details for the file pgcopy-1.3.1-py2.py3-none-any.whl.

File metadata

File hashes

Hashes for pgcopy-1.3.1-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 44b3a3bc895fb509fe2b02168705835b8816662c13f7fad08e79ef0c59dece8e
MD5 563c16c594c02866a18064d5612f119c
BLAKE2b-256 859774577eebb7588c886b127fbbab25dcd8c66a4b342ce106e49161f32e322f

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page