Python writable in-memory virtual filesystem for SQLite
Project description
sqlite-memory-vfs
Python virtual filesystem for SQLite to read from and write to memory.
While SQLite supports the special filename :memory: that allows the creation of empty databases in memory, and sqlite_deserialize allows the population of an in-memory database from a contiguous block of raw bytes of a serialized database, there is no built-in way to populate such a database using non-contiguous raw bytes of a serialized database. The function sqlite_serialize can also only serialize a database to a contiguous block of memory. This virtual filesystem overcomes these limitations, and so allows larger databases to be downloaded and queried without hitting disk.
Based on simonwo's gist and uktrade's sqlite-s3vfs, and inspired by phiresky's sql.js-httpvfs, dacort's Stack Overflow answer and michalc's sqlite-s3-query.
Installation
sqlite-memory-vfs can be installed from PyPI using pip.
pip install sqlite-memory-vfs
This will automatically install APSW along with any other dependencies.
Deserializing (getting a regular SQLite file into the VFS)
This library allows the raw bytes of a SQLite database to be queried without having to save it to disk. This can be done by using the deserialize_iter method of MemoryVFS, passing it an iterable of bytes instances that contain the SQLite database.
import apsw
import httpx
import sqlite_memory_vfs
memory_vfs = sqlite_memory_vfs.MemoryVFS()
# Any iterable of bytes can be used. In this example, they come via HTTP
with httpx.stream("GET", "https://data.api.trade.gov.uk/v1/datasets/uk-trade-quotas/versions/v1.0.366/data?format=sqlite") as r:
memory_vfs.deserialize_iter('quota_balances.sqlite', r.iter_bytes())
with apsw.Connection('quota_balances.sqlite', vfs=memory_vfs.name) as db:
cursor = db.cursor()
cursor.execute('SELECT * FROM quotas;')
print(cursor.fetchall())
If the deserialize_iter step is ommitted an empty database is automatically created in memory.
See the APSW documentation for more usage examples.
Serializing (getting a regular SQLite file out of the VFS)
The bytes corresponding to each SQLite database in the VFS can be extracted with the serialize_iter function, which returns an iterable of bytes
with open('my_db.sqlite', 'wb') as f:
for chunk in memory_vfs.serialize_iter('my_db.sqlite'):
f.write(chunk)
Comparison with sqlite_deserialize
The main reason for using sqlite-memory-vfs over sqlite_deserialize is the lower memory usage for larger databases. For example the following may not even complete due to running out of memory:
import resource
import apsw
import httpx
url = "https://data.api.trade.gov.uk/v1/datasets/uk-tariff-2021-01-01/versions/v4.0.46/data?format=sqlite"
with apsw.Connection(':memory:') as db:
db.deserialize('main', httpx.get(url).read())
cursor = db.cursor()
cursor.execute('SELECT * FROM measures;')
print(cursor.fetchall())
print('Max memory usage:', resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
But the following does / should output a lower value of memory usage:
import resource
import apsw
import httpx
import sqlite_memory_vfs
url = "https://data.api.trade.gov.uk/v1/datasets/uk-tariff-2021-01-01/versions/v4.0.46/data?format=sqlite"
memory_vfs = sqlite_memory_vfs.MemoryVFS()
with httpx.stream("GET", url) as r:
memory_vfs.deserialize_iter('tariff.sqlite', r.iter_bytes())
with apsw.Connection('tariff.sqlite', vfs=memory_vfs.name) as db:
cursor = db.cursor()
cursor.execute('SELECT count(*) FROM measures;')
print(cursor.fetchall())
print('Max memory usage:', resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
Tests
The tests require the dev dependencies installed
pip install -e ".[dev]"
and can then run with pytest
pytest
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file sqlite_memory_vfs-0.0.9.tar.gz.
File metadata
- Download URL: sqlite_memory_vfs-0.0.9.tar.gz
- Upload date:
- Size: 6.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/4.0.2 CPython/3.11.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
dac853139ebd19b6842dbb09644eb2f8572d309d6859e23e6ebc2e49aed7d0d3
|
|
| MD5 |
9dd6c84f99d581440b9e2979fd127e2c
|
|
| BLAKE2b-256 |
6ebfc7cba9ef6f256c2a5f0362480e8e1fa66f9751c5aa092adeec969dfb3206
|
File details
Details for the file sqlite_memory_vfs-0.0.9-py3-none-any.whl.
File metadata
- Download URL: sqlite_memory_vfs-0.0.9-py3-none-any.whl
- Upload date:
- Size: 5.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/4.0.2 CPython/3.11.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
da630d208e0b2375913cc3e7526adce17db587e90b5d33e47841a44a445b3edc
|
|
| MD5 |
9d8f756ed66ea88c6a1b1207b6a57123
|
|
| BLAKE2b-256 |
28d1fb941d160e91669be780f6b2977c31ff61717b3114a40a917ecf0d5072bb
|