Skip to main content

pandabase links pandas DataFrames to SQL databases. Supports read, append, and upsert.

Project description

pandabase

Build Status

pandabase is a tool for reading & writing DataFrames to & from SQLite

By default, uses DataFrame.index as primary key. By using an explicit primary key, pandabase makes it easy to upsert pandas data into SQL databases.

Designed for time series datasets that need to be updated over time and stored to disk, but are used in-memory for computation.

Tested under Python 3.6 and 3.7, with new versions of Pandas and SQLAlchemy

Features

  • primary keys (any named index is assumed to be the PK)
  • also supports 'auto_index'
  • insert modes: 'create_only', 'upsert', and 'append'
  • replaces pd.DataFrame.to_sql and pd.read_sql
  • tested under SQLite
  • postgres support coming soon
  • automated tests in pytest
  • 96% coverage
  • also includes pandabase.companda.companda for rich comparisons of DataFrames

Design Considerations

  • Minimal dependencies: SQLAlchemy and Pandas are the only requirements
  • Database is the source of truth: will coerce incoming DataFrames to fit existing schema
    • but also is reasonably smart about how new tables are created from DataFrames
  • Not horrendously slow

License

MIT license

Thanks

Code partially stolen from Dataset and pandas.sql

Installation

From your inside your virtual environment of choice:

~/$ pip install pandabase

For latest version:

~/$ git clone https://github.com/notsambeck/pandabase
~/$ cd pandabase
~/pandabase/$ pip install -r requirements.txt
~/pandabase/$ pip install .

Usage

# Python >= 3.6
>>> import pandas as pd
>>> import pandabase
>>> my_data = pd.DataFrame(index=range(7, 12), 
                           columns=['some_number'],
                           data=pd.np.random.random((5,1)))
>>> my_data.index.name = 'made_up_name'        # index must be named to use as PK
>>> pandabase.to_sql(my_data, table_name='my_table', con='sqlite:///new_sqlite_db.sqlite', how='create_only')
Table('my_table', ...
>>> exit()

That's all!

Your data is now persistently stored in a SQLite database, using my_data.index as primary key. To append or update data, replace 'create_only' with 'append' or 'upsert'. To store records without an explicit index, use 'autoindex=True'.

~/pandabase$ ls
brand_new_sqlite_db.sqlite
>>> import pandabase
>>> df = pandabase.read_sql('my_table', con='sqlite:///new_sqlite_db.sqlite'))
>>> df
    some_number 
7   0.722416 
8   0.076045 
9   0.213118 
10  0.453716 
11  0.406995

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

pandabase-0.1.1-py3-none-any.whl (11.5 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page