ConnectorX

Load data from databases to dataframes, the fastest way.

ConnectorX enables you to load data from databases into Python in the fastest and most memory-efficient way.

What you need is one line of code:

import connectorx as cx

cx.read_sql("postgresql://username:password@server:port/database", "SELECT * FROM lineitem")
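The connection string above follows the usual URL layout, so a username or password containing characters such as "@", ":" or "/" would normally need percent-encoding before being placed in it. The helper below is a minimal sketch of that idea; build_conn_str is a hypothetical name, not part of ConnectorX, and whether a given source decodes percent-encoded credentials is an assumption worth checking against the ConnectorX documentation.

```python
from urllib.parse import quote


def build_conn_str(protocol, username, password, server, port, database):
    """Assemble protocol://username:password@server:port/database.

    Hypothetical helper (not a ConnectorX function). Credentials are
    percent-encoded so reserved URL characters cannot break the string.
    """
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"{protocol}://{user}:{pwd}@{server}:{port}/{database}"


conn_str = build_conn_str("postgresql", "alice", "p@ss:w/rd", "localhost", 5432, "tpch")
```

The resulting string can then be passed as the first argument of cx.read_sql in place of a hand-written one.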

Optionally, you can accelerate the data loading using parallelism by specifying a partition column.

import connectorx as cx

cx.read_sql("postgresql://username:password@server:port/database", "SELECT * FROM lineitem", partition_on="l_orderkey", partition_num=10)

The function partitions the query by evenly splitting the value range of the specified column into the given number of partitions. ConnectorX assigns one thread to each partition to load and write data in parallel. Currently, partitioning is supported on numerical columns (which cannot contain NULL) for SPJA (select-project-join-aggregate) queries.
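To make the even split concrete, here is a minimal Python sketch of how a column range could be divided into partition queries. It is an illustration only, not ConnectorX's Rust implementation: the names split_range and partition_queries, the half-open bounds, and the "AS part" alias are all assumptions made for this example.

```python
from typing import List, Tuple


def split_range(lo: int, hi: int, partition_num: int) -> List[Tuple[int, int]]:
    """Split the closed integer range [lo, hi] into half-open [start, end) chunks."""
    if partition_num < 1:
        raise ValueError("partition_num must be at least 1")
    if hi < lo:
        raise ValueError("hi must not be smaller than lo")
    total = hi - lo + 1                  # number of integer values to cover
    size = -(-total // partition_num)    # ceiling division
    bounds = []
    for i in range(partition_num):
        start = lo + i * size
        if start > hi:
            break                        # fewer values than requested partitions
        bounds.append((start, min(start + size, hi + 1)))
    return bounds


def partition_queries(query: str, column: str, lo: int, hi: int, partition_num: int) -> List[str]:
    """Wrap the original query once per partition with a range predicate."""
    return [
        f"SELECT * FROM ({query}) AS part WHERE {column} >= {start} AND {column} < {end}"
        for start, end in split_range(lo, hi, partition_num)
    ]


queries = partition_queries("SELECT * FROM lineitem", "l_orderkey", 1, 10, 3)
```

Half-open ranges are used here so that every key value falls into exactly one partition; an even split of the value range does not guarantee an even number of rows per partition if the column is skewed.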

Experimental: We now provide federated query support (PostgreSQL only, and partitioning is not supported for now). You can write a single query to join tables from two or more databases! (JRE >= 1.8 is required.)

import connectorx as cx

db1 = "postgresql://username1:password1@server1:port1/database1"
db2 = "postgresql://username2:password2@server2:port2/database2"

cx.read_sql({"db1": db1, "db2": db2}, "SELECT * FROM db1.nation n, db2.region r where n.n_regionkey = r.r_regionkey")

Check out more detailed usage and examples here. A general introduction of the project can be found in this blog post.

Installation

pip install connectorx

See here for how to build the Python wheel from source.

Performance

We compared different solutions in Python that provide the read_sql function by loading a 10x TPC-H lineitem table (8.6GB) from Postgres into a DataFrame, with 4-core parallelism.

Time chart, lower is better.

[Figure: time chart]

Memory consumption chart, lower is better.

[Figure: memory chart]

In conclusion, ConnectorX uses up to 3x less memory and 21x less time (3x less memory and 13x less time compared with Pandas). More details here.

How does ConnectorX achieve lightning speed while keeping the memory footprint low?

We observe that existing solutions copy the data multiple times when downloading it. Additionally, implementing a data-intensive application in Python brings additional cost.

ConnectorX is written in Rust and follows the "zero-copy" principle. This allows it to make full use of the CPU by being cache and branch predictor friendly. Moreover, the architecture of ConnectorX ensures the data is copied exactly once, directly from the source to the destination.

How does ConnectorX download the data?

Upon receiving the query, e.g. SELECT * FROM lineitem, ConnectorX will first issue a LIMIT 1 query SELECT * FROM lineitem LIMIT 1 to get the schema of the result set.

Then, if partition_on is specified, ConnectorX will issue SELECT MIN($partition_on), MAX($partition_on) FROM (SELECT * FROM lineitem) to know the range of the partition column. After that, the original query is split into partitions based on the min/max information, e.g. SELECT * FROM (SELECT * FROM lineitem) WHERE $partition_on > 0 AND $partition_on < 10000. ConnectorX will then run a count query to get the partition size (e.g. SELECT COUNT(*) FROM (SELECT * FROM lineitem) WHERE $partition_on > 0 AND $partition_on < 10000). If the partition is not specified, the count query will be SELECT COUNT(*) FROM (SELECT * FROM lineitem).
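The metadata queries described above can be sketched in a few lines of Python. This is only an illustration of the query shapes named in the text, run against an in-memory SQLite table as a stand-in database; metadata_queries is a hypothetical helper, and the "AS t" alias is an assumption added because some databases require a name for a subquery.

```python
import sqlite3


def metadata_queries(query, partition_on=None):
    """Build the schema, range, and count queries for a user query (illustrative)."""
    queries = {
        "schema": f"{query} LIMIT 1",
        "count": f"SELECT COUNT(*) FROM ({query}) AS t",
    }
    if partition_on is not None:
        queries["range"] = (
            f"SELECT MIN({partition_on}), MAX({partition_on}) FROM ({query}) AS t"
        )
    return queries


# Stand-in database: 100 rows with keys 1..100.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lineitem (l_orderkey INTEGER NOT NULL, l_comment TEXT)")
conn.executemany(
    "INSERT INTO lineitem VALUES (?, ?)", [(k, f"row {k}") for k in range(1, 101)]
)

qs = metadata_queries("SELECT * FROM lineitem", partition_on="l_orderkey")
columns = [d[0] for d in conn.execute(qs["schema"]).description]  # column names
lo, hi = conn.execute(qs["range"]).fetchone()                     # partition range
count = conn.execute(qs["count"]).fetchone()[0]                   # rows to allocate
conn.close()
```

Appending LIMIT 1 directly, as in the text's example, only works for queries that do not already end in their own LIMIT clause; a real implementation would need to handle that case.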

Finally, ConnectorX will use the schema info as well as the count info to allocate memory and download data by executing the queries normally.

Once downloading begins, there is one thread for each partition so that the data is downloaded in parallel at the partition level. Each thread issues the query of its partition to the database and then writes the returned data to the destination row-wise or column-wise (depending on the database) in a streaming fashion.
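The allocate-then-fill pattern described above can be imitated in plain Python. The sketch below is an analogy under stated assumptions, not ConnectorX's Rust code: SQLite in a temporary file stands in for the database, a Python list stands in for the destination buffer, and load_partitioned is a hypothetical name. Each thread opens its own connection and streams rows into the slice reserved for its partition.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor
from contextlib import closing


def load_partitioned(db_path, table, column, bounds):
    """Load `column` from `table` using one thread per (start, end) partition."""
    where = f"{column} >= ? AND {column} < ?"

    # Count rows per partition so the destination can be allocated up front.
    with closing(sqlite3.connect(db_path)) as conn:
        counts = [
            conn.execute(f"SELECT COUNT(*) FROM {table} WHERE {where}", b).fetchone()[0]
            for b in bounds
        ]
    offsets = [sum(counts[:i]) for i in range(len(counts))]
    out = [None] * sum(counts)  # preallocated destination buffer

    def worker(i):
        # Each thread streams its partition into its own slice of `out`.
        with closing(sqlite3.connect(db_path)) as conn:
            cursor = conn.execute(f"SELECT {column} FROM {table} WHERE {where}", bounds[i])
            for j, row in enumerate(cursor):
                out[offsets[i] + j] = row[0]

    with ThreadPoolExecutor(max_workers=max(1, len(bounds))) as pool:
        list(pool.map(worker, range(len(bounds))))  # list() re-raises worker errors
    return out


# Demo: a temporary SQLite file with keys 1..100, loaded in four partitions.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.db")
    with closing(sqlite3.connect(path)) as conn:
        conn.execute("CREATE TABLE lineitem (l_orderkey INTEGER NOT NULL)")
        conn.executemany("INSERT INTO lineitem VALUES (?)", [(k,) for k in range(1, 101)])
        conn.commit()
    loaded = load_partitioned(
        path, "lineitem", "l_orderkey", [(1, 26), (26, 51), (51, 76), (76, 101)]
    )
```

Because the partitions write to disjoint index ranges, the threads need no locking around the shared buffer; this is the property that knowing the per-partition counts in advance buys.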

Supported Sources & Destinations

Example connection string, supported protocols and data types for each data source can be found here.

For more planned data sources, please check out our discussion.

Sources

  • Postgres
  • MySQL
  • MariaDB (through mysql protocol)
  • SQLite
  • Redshift (through postgres protocol)
  • ClickHouse (through mysql protocol)
  • SQL Server
  • Azure SQL Database (through mssql protocol)
  • Oracle
  • BigQuery
  • ODBC (WIP)
  • ...

Destinations

  • Pandas
  • PyArrow
  • Modin (through Pandas)
  • Dask (through Pandas)
  • Polars (through PyArrow)

Documentation

Doc: https://sfu-db.github.io/connector-x/intro.html

Rust docs: stable, nightly

Next Plan

Check out our discussion to participate in deciding our next plan!

Historical Benchmark Results

https://sfu-db.github.io/connector-x/dev/bench/

Developer's Guide

Please see Developer's Guide for information about developing ConnectorX.

Support

You are always welcome to:

  1. Ask questions and propose new ideas in our GitHub discussion.
  2. Ask questions on Stack Overflow. Make sure to have #connectorx attached.

Organizations and Projects using ConnectorX

To add your project/organization here, reply to our post here.

Citing ConnectorX

If you use ConnectorX, please consider citing the following paper:

Xiaoying Wang, Weiyuan Wu, Jinze Wu, Yizhou Chen, Nick Zrymiak, Changbo Qu, Lampros Flokas, George Chow, Jiannan Wang, Tianzheng Wang, Eugene Wu, Qingqing Zhou. ConnectorX: Accelerating Data Loading From Databases to Dataframes. VLDB 2022.

BibTeX entry:

@article{connectorx2022,
  author    = {Xiaoying Wang and Weiyuan Wu and Jinze Wu and Yizhou Chen and Nick Zrymiak and Changbo Qu and Lampros Flokas and George Chow and Jiannan Wang and Tianzheng Wang and Eugene Wu and Qingqing Zhou},
  title     = {ConnectorX: Accelerating Data Loading From Databases to Dataframes},
  journal   = {Proc. {VLDB} Endow.},
  volume    = {15},
  number    = {11},
  pages     = {2994--3003},
  year      = {2022},
  url       = {https://www.vldb.org/pvldb/vol15/p2994-wang.pdf},
}

Built Distributions

  • connectorx-0.3.2-cp311-none-win_amd64.whl (42.9 MB): CPython 3.11, Windows x86-64
  • connectorx-0.3.2-cp311-cp311-manylinux_2_28_x86_64.whl (50.8 MB): CPython 3.11, manylinux: glibc 2.28+ x86-64
  • connectorx-0.3.2-cp311-cp311-macosx_11_0_arm64.whl (43.7 MB): CPython 3.11, macOS 11.0+ ARM64
  • connectorx-0.3.2-cp311-cp311-macosx_10_7_x86_64.whl (45.3 MB): CPython 3.11, macOS 10.7+ x86-64
  • connectorx-0.3.2-cp310-none-win_amd64.whl (42.9 MB): CPython 3.10, Windows x86-64
  • connectorx-0.3.2-cp310-cp310-manylinux_2_28_x86_64.whl (50.8 MB): CPython 3.10, manylinux: glibc 2.28+ x86-64
  • connectorx-0.3.2-cp310-cp310-macosx_11_0_arm64.whl (43.7 MB): CPython 3.10, macOS 11.0+ ARM64
  • connectorx-0.3.2-cp310-cp310-macosx_10_7_x86_64.whl (45.3 MB): CPython 3.10, macOS 10.7+ x86-64
  • connectorx-0.3.2-cp39-none-win_amd64.whl (42.9 MB): CPython 3.9, Windows x86-64
  • connectorx-0.3.2-cp39-cp39-manylinux_2_28_x86_64.whl (50.8 MB): CPython 3.9, manylinux: glibc 2.28+ x86-64
  • connectorx-0.3.2-cp39-cp39-macosx_11_0_arm64.whl (43.7 MB): CPython 3.9, macOS 11.0+ ARM64
  • connectorx-0.3.2-cp39-cp39-macosx_10_7_x86_64.whl (45.3 MB): CPython 3.9, macOS 10.7+ x86-64
  • connectorx-0.3.2-cp38-none-win_amd64.whl (42.9 MB): CPython 3.8, Windows x86-64
  • connectorx-0.3.2-cp38-cp38-manylinux_2_28_x86_64.whl (50.8 MB): CPython 3.8, manylinux: glibc 2.28+ x86-64
  • connectorx-0.3.2-cp38-cp38-macosx_11_0_arm64.whl (43.7 MB): CPython 3.8, macOS 11.0+ ARM64
  • connectorx-0.3.2-cp38-cp38-macosx_10_7_x86_64.whl (45.3 MB): CPython 3.8, macOS 10.7+ x86-64
