Efficient matrix representations for working with tabular data.

Installation

Simply install via conda-forge!

conda install -c conda-forge tabmat

Getting Started

The easiest way to start with tabmat is to use the convenience constructor tabmat.from_pandas.

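import numpy as np
import pandas as pd
import tabmat as tm

# A single dense numeric column with 100 rows.
dense_array = np.random.normal(size=(100, 1))
# Convert a DataFrame built from it with the convenience constructor.
X = tm.from_pandas(pd.DataFrame(dense_array, columns=["dense"]))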

Use case

TL;DR: We provide matrix classes for efficiently building statistical algorithms with data that is partially dense, partially sparse and partially categorical.

Data used in economics, actuarial science, and many other fields is often tabular, containing rows and columns. Such data frequently shares further properties:

It often is very sparse.
It often contains a mix of dense and sparse columns.
It often contains categorical data, processed into many columns of indicator values created by "one-hot encoding."
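
To make the last property concrete, here is a minimal NumPy sketch of one-hot encoding. The column values and variable names are invented for illustration and are not part of tabmat; the sketch only shows why such data ends up very sparse.

```python
import numpy as np

# Hypothetical categorical column with three levels.
column = np.array(["b", "a", "c", "a", "b", "a"])

# np.unique returns the sorted levels and, with return_inverse=True,
# the integer code of the level found in each row.
levels, codes = np.unique(column, return_inverse=True)

# One-hot encoding: one indicator column per level, exactly one 1 per row.
one_hot = np.zeros((column.size, levels.size))
one_hot[np.arange(column.size), codes] = 1.0

# Each row has a single non-zero, so the density is 1 / number of levels.
density = one_hot.sum() / one_hot.size
print(levels, codes, density)
```

With k levels, only a fraction 1/k of the entries is non-zero, which is why a column with many levels produces a wide and very sparse block of indicators.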

High-performance statistical applications often require fast computation of certain operations, such as

Computing sandwich products of the data, transpose(X) @ diag(d) @ X. A sandwich product shows up in the solution to weighted least squares, as well as in the Hessian of the likelihood in generalized linear models such as Poisson regression.
Matrix-vector products, possibly on only a subset of the rows or columns. For example, when limiting computation to an "active set" in an L1-penalized coordinate descent implementation, we may only need to compute a matrix-vector product on a small subset of the columns.
Computing all operations on standardized predictors, which have mean zero and standard deviation one. This helps with numerical stability and optimizer efficiency in a wide range of machine learning algorithms.
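
The first two operations can be sketched with plain NumPy. This is an illustrative sketch, not tabmat's implementation: the helper names sandwich and matvec_on_columns, the data, and the choice of active set are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # data matrix
d = rng.uniform(0.5, 2.0, size=100)    # per-row weights
v = rng.normal(size=5)                 # coefficient vector


def sandwich(X, d):
    """Compute X.T @ diag(d) @ X without forming the n-by-n diagonal matrix."""
    # Scaling row i of X by d[i] is equivalent to diag(d) @ X.
    return X.T @ (X * d[:, None])


def matvec_on_columns(X, v, cols):
    """Compute X[:, cols] @ v[cols], a product restricted to an active set."""
    return X[:, cols] @ v[cols]


hessian_like = sandwich(X, d)
active = np.array([0, 3])              # hypothetical active set of columns
partial = matvec_on_columns(X, v, active)
```

Avoiding the explicit diagonal matrix keeps the memory cost at the size of X rather than n by n, and restricting the product to the active columns skips work for coefficients that are currently zero.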

This library and its design

We designed this library with the above use cases in mind. We built it first for estimating generalized linear models, but expect it will be useful in a variety of econometric and statistical use cases. It was born out of our need for speed, and its API is motivated by the desire to work with a single, unified matrix interface inside our statistical algorithms.

Design principles:

Speed and memory efficiency are paramount.
You don't need to sacrifice functionality by using this library: DenseMatrix and SparseMatrix subclass np.ndarray and scipy.sparse.csc_matrix respectively, and inherit behavior from those classes wherever it is not improved on.
As much as possible, syntax follows NumPy syntax, and dimension-reducing operations (like sum) return NumPy arrays, following NumPy conventions about the dimensions of results. The aim is to make these classes as close as possible to being drop-in replacements for numpy.ndarray. This is not always possible, however, due to the differing APIs of numpy.ndarray and scipy.sparse.
Other operations, such as toarray, mimic SciPy sparse syntax.
All matrix classes support matrix-vector products, sandwich products, and getcol.

Individual subclasses may support significantly more operations.

Matrix types

DenseMatrix represents dense matrices, subclassing numpy.ndarray. It additionally supports methods getcol, toarray, sandwich, standardize, and unstandardize.
SparseMatrix represents column-major sparse data, subclassing scipy.sparse.csc_matrix. It additionally supports methods sandwich and standardize.
CategoricalMatrix represents one-hot encoded categorical matrices. Because all the non-zeros in these matrices are ones and because each row has only one non-zero, the data can be represented and multiplied much more efficiently than a generic sparse matrix.
SplitMatrix represents matrices with dense, sparse and categorical parts, allowing for a significant speedup in matrix multiplications.
StandardizedMatrix efficiently and sparsely represents a matrix that has had its columns normalized to have mean zero and variance one. Even if the underlying matrix is sparse, such a normalized matrix will be dense. However, by storing the scaling and shifting factors separately, StandardizedMatrix retains the original matrix sparsity.
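
Two of the ideas above can be illustrated with plain NumPy. The sketch below is written under stated assumptions and does not show how tabmat implements these classes: all variable names and data are hypothetical. The first part multiplies a one-hot matrix by a vector using only the integer level codes; the second computes a product with a standardized matrix without ever building the dense standardized copy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Categorical: multiply a one-hot matrix by a vector via indexing.
codes = np.array([1, 0, 2, 0, 1, 0])   # level index of each row
v = np.array([10.0, 20.0, 30.0])       # one coefficient per level

# Row i of the one-hot matrix has its single 1 in column codes[i],
# so (one_hot @ v)[i] is simply v[codes[i]].
cat_product = v[codes]

# Standardized: keep the mostly-zero matrix, store shift and scale separately.
X = rng.normal(size=(50, 4))
X[rng.random(X.shape) < 0.8] = 0.0     # make most entries zero
X[0, :] = 1.0                          # guarantee no column is constant
mean = X.mean(axis=0)                  # shifting factors
std = X.std(axis=0)                    # scaling factors
w = rng.normal(size=4)

# ((X - mean) / std) @ w  ==  X @ (w / std) - (mean / std) @ w
# The right-hand side only touches the original X, never a dense copy.
lazy_product = X @ (w / std) - (mean / std) @ w
```

In both cases the saving comes from never materializing a matrix that is implied by cheaper information: the codes in the categorical case, and the original matrix plus two vectors in the standardized case.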

[Figure: Wide data set]

Benchmarks

See here for detailed benchmarking.

API documentation

See here for detailed API documentation.

Release history

Version	Release date
4.0.0 (this version)	Apr 23, 2024
4.0.0a3 (pre-release)	Jan 26, 2024
4.0.0a2 (pre-release)	Aug 29, 2023
4.0.0a1 (pre-release)	Aug 17, 2023
4.0.0a0 (pre-release)	Aug 15, 2023
4.0.0.dev1 (pre-release)	Aug 14, 2023
3.1.14	Feb 28, 2024
3.1.13	Oct 17, 2023
3.1.12	Oct 16, 2023
3.1.11	Oct 13, 2023
3.1.10	Jun 23, 2023
3.1.9	Jun 16, 2023
3.1.8	Jun 13, 2023
3.1.7	Mar 28, 2023
3.1.6	Mar 27, 2023
3.1.5	Mar 20, 2023
3.1.4	Feb 7, 2023
3.1.3	Jan 30, 2023
3.1.2	Jul 1, 2022
3.1.1	Jul 1, 2022
3.1.0	Mar 7, 2022
3.0.8	Jan 3, 2022
3.0.7	Nov 23, 2021
3.0.6	Nov 12, 2021
3.0.5	Nov 5, 2021
3.0.4	Nov 3, 2021
3.0.3	Oct 15, 2021
3.0.1	Oct 8, 2021
3.0.0	Oct 7, 2021

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

File	Size	Uploaded	Type
tabmat-4.0.0.tar.gz	2.2 MB	Apr 23, 2024	Source

Built Distributions

File	Size	Uploaded	Python	Platform
tabmat-4.0.0-cp312-cp312-win_amd64.whl	663.1 kB	Apr 23, 2024	CPython 3.12	Windows x86-64
tabmat-4.0.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl	7.3 MB	Apr 23, 2024	CPython 3.12	manylinux: glibc 2.17+ x86-64
tabmat-4.0.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl	7.2 MB	Apr 23, 2024	CPython 3.12	manylinux: glibc 2.17+ ARM64
tabmat-4.0.0-cp312-cp312-macosx_11_0_arm64.whl	1.6 MB	Apr 23, 2024	CPython 3.12	macOS 11.0+ ARM64
tabmat-4.0.0-cp312-cp312-macosx_10_9_x86_64.whl	1.8 MB	Apr 23, 2024	CPython 3.12	macOS 10.9+ x86-64
tabmat-4.0.0-cp311-cp311-win_amd64.whl	665.6 kB	Apr 23, 2024	CPython 3.11	Windows x86-64
tabmat-4.0.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl	7.4 MB	Apr 23, 2024	CPython 3.11	manylinux: glibc 2.17+ x86-64
tabmat-4.0.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl	7.4 MB	Apr 23, 2024	CPython 3.11	manylinux: glibc 2.17+ ARM64
tabmat-4.0.0-cp311-cp311-macosx_11_0_arm64.whl	1.6 MB	Apr 23, 2024	CPython 3.11	macOS 11.0+ ARM64
tabmat-4.0.0-cp311-cp311-macosx_10_9_x86_64.whl	1.8 MB	Apr 23, 2024	CPython 3.11	macOS 10.9+ x86-64
tabmat-4.0.0-cp310-cp310-win_amd64.whl	664.9 kB	Apr 23, 2024	CPython 3.10	Windows x86-64
tabmat-4.0.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl	7.2 MB	Apr 23, 2024	CPython 3.10	manylinux: glibc 2.17+ x86-64
tabmat-4.0.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl	7.1 MB	Apr 23, 2024	CPython 3.10	manylinux: glibc 2.17+ ARM64
tabmat-4.0.0-cp310-cp310-macosx_11_0_arm64.whl	1.6 MB	Apr 23, 2024	CPython 3.10	macOS 11.0+ ARM64
tabmat-4.0.0-cp310-cp310-macosx_10_9_x86_64.whl	1.8 MB	Apr 23, 2024	CPython 3.10	macOS 10.9+ x86-64
tabmat-4.0.0-cp39-cp39-win_amd64.whl	666.0 kB	Apr 23, 2024	CPython 3.9	Windows x86-64
tabmat-4.0.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl	7.2 MB	Apr 23, 2024	CPython 3.9	manylinux: glibc 2.17+ x86-64
tabmat-4.0.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl	7.1 MB	Apr 23, 2024	CPython 3.9	manylinux: glibc 2.17+ ARM64
tabmat-4.0.0-cp39-cp39-macosx_11_0_arm64.whl	1.6 MB	Apr 23, 2024	CPython 3.9	macOS 11.0+ ARM64
tabmat-4.0.0-cp39-cp39-macosx_10_9_x86_64.whl	1.8 MB	Apr 23, 2024	CPython 3.9	macOS 10.9+ x86-64

Hashes for tabmat-4.0.0.tar.gz

Algorithm	Hash digest
SHA256	`60a9f93ed16a540458957b5ef56c6b99424e999145aad31fbae8ebbfd985f444`
MD5	`9fc2cdcb7625af79504e2067692345e2`
BLAKE2b-256	`87e8c45b6050167a671f828b888e162638b97ff9d2ea7d8a6796d72fbf8a3253`

Hashes for tabmat-4.0.0-cp312-cp312-win_amd64.whl

Algorithm	Hash digest
SHA256	`08f4c58504cd7fdb443fabef0871054f6b46f4541e2b5f447a2298169ef46a09`
MD5	`43b25aeff3708572b10fc7d71ee498af`
BLAKE2b-256	`79c78808c0249ae0b3f5eff2047200efe2ef0cab733bdfad105ef57a3c4c554e`

Hashes for tabmat-4.0.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

Algorithm	Hash digest
SHA256	`a73df3d6959f6a23ce74bb528133b8d355460ce80b0b863387eddc65c01c13c1`
MD5	`f3e5ac5e65e233742b67af52dd0446a4`
BLAKE2b-256	`65d9b285b49bd4e1b3d5c50c01d80df3bf3194432f1c3fe0d98fa63f2a283e60`

Hashes for tabmat-4.0.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl

Algorithm	Hash digest
SHA256	`a8de44c99afefaf2791169fc6724506c2421df83779c383caa03cf3e9079ab53`
MD5	`7ddc0db25cd81bf9936982e0b72d18cc`
BLAKE2b-256	`15fbd60ed862d37dc9a052ebed96117474e56d31d28e7d479f28afa62e36bf8c`

Hashes for tabmat-4.0.0-cp312-cp312-macosx_11_0_arm64.whl

Algorithm	Hash digest
SHA256	`67d39bb1465c08989c0f84b9262bf697ee277d7e463148190bdb6e3d521ed441`
MD5	`823d6b49aa8ac370f3b5a55007e129c6`
BLAKE2b-256	`50e7e0f9be9a716ce94219d449becb7059b01faf6ec6aa723560884bbfd40f02`

Hashes for tabmat-4.0.0-cp312-cp312-macosx_10_9_x86_64.whl

Algorithm	Hash digest
SHA256	`82c7cf0ee3ffba97bfef78f02aa04cb8a1c91c7409e62d9162bc7fa8ee1b9ed5`
MD5	`9d0e10c0e6dec4b37844889e43d7812f`
BLAKE2b-256	`0d41ba8ad9aa952517250720832a19df7330d823c22e86eddd53d400f30bf6db`

Hashes for tabmat-4.0.0-cp311-cp311-win_amd64.whl

Algorithm	Hash digest
SHA256	`9046b9a88a6657e9ebae70319390ad2cf60f8f6b0a39c926633dbfa024ca8422`
MD5	`6ba0526d19f35f5cd9ee2b6d29c8c523`
BLAKE2b-256	`82c8290a9a49aeea818fbafa9d5aa943bcdbcb2840af57d124f2261c150bfeb9`

Hashes for tabmat-4.0.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

Algorithm	Hash digest
SHA256	`070036e57eeee3900214550dd41dcdaeb2ef340c5288dcc79b6ae85e92d78846`
MD5	`946a38f37827387ca2efd4115a7eaff1`
BLAKE2b-256	`2cf4b09a85732ea1d90ae27b01f7fa17b934cd23eee8d07dc2970f883cbc3a85`

Hashes for tabmat-4.0.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl

Algorithm	Hash digest
SHA256	`a782155f39b0a81bf99e31591a43a6891a39d8cd2316c60d5ef5db840d6dee3e`
MD5	`b81821d64f3ba6633131bda6fddabb42`
BLAKE2b-256	`018af930e55a4ff9e75d42afb67ca59b9ef09e2d812d41f014f116fbe003cc4a`

Hashes for tabmat-4.0.0-cp311-cp311-macosx_11_0_arm64.whl

Algorithm	Hash digest
SHA256	`cecdcd88c52625aa7d0ef39d65c6fa2cbca4861860b20c3f18787ca5d0118f45`
MD5	`a584686a271f09bf5cf6fc143c5624b7`
BLAKE2b-256	`9e23c4b098252dbaa456b09de782e2c55de0b4420be31e55a3cfbb4d38c95751`

Hashes for tabmat-4.0.0-cp311-cp311-macosx_10_9_x86_64.whl

Algorithm	Hash digest
SHA256	`7d1b5603a897148b313184ed33ffd63d8c1bda245222a34b4b46a3d293e494c1`
MD5	`48e0fa9c7cbe349ec723f622ce611839`
BLAKE2b-256	`94ddfec14b0e62cd02879f27caa36df5949f8c930efb5d5393ba98fde496e739`

Hashes for tabmat-4.0.0-cp310-cp310-win_amd64.whl

Algorithm	Hash digest
SHA256	`4fa55483361cff21fa83721708917cf12a9f4194043024c27d28e02910c51c98`
MD5	`e0bca016b34d576305cb29c0ec217359`
BLAKE2b-256	`17e54788d0d4cf0d1e9aed0c7e28de7ae916f817c2e39acdd7a49d3b4bf06f0a`

Hashes for tabmat-4.0.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

Algorithm	Hash digest
SHA256	`65c34ae8b6ed4fa16ed639d0a04d7586f981456c1b4249f6c825c2f04fac8267`
MD5	`294929cd5ba569b53e3486b0619ded31`
BLAKE2b-256	`69e60a8534bdaf0b11ca27b3e02fe7cbe468c8b5d7b7b4dc43696734461d18a9`

Hashes for tabmat-4.0.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl

Algorithm	Hash digest
SHA256	`cee9c2ad7911472a0635bfce3e901b8cc56704ad12ff5109f587805bd9c3e631`
MD5	`099b577c55d33457fe2c8011727e02f7`
BLAKE2b-256	`5ec1231cb0b6c9651343c78b73d01ce5c4a901d901d14bbef514aed8c02c980c`

Hashes for tabmat-4.0.0-cp310-cp310-macosx_11_0_arm64.whl

Algorithm	Hash digest
SHA256	`4fc486be7090f562535915b7cd9243018961d51387866afa06596a7df8225cf1`
MD5	`de4c618d80b7f15e2073b3b0154637f3`
BLAKE2b-256	`3d01514ae7f0c0ab8e6711bbd78b4538d663e5c04b3ef675f38e57d45e54c1bb`

Hashes for tabmat-4.0.0-cp310-cp310-macosx_10_9_x86_64.whl

Algorithm	Hash digest
SHA256	`bbd7ec6ecd7391dd714160915178bf6120c1ecd880cbeb0c47deead6d5742d01`
MD5	`b0a2248972f3cca8812346108cd6f47b`
BLAKE2b-256	`0d582c1f0c1cfd97b097951420a843486f08c38a6c94ef370572759eefaff651`

Hashes for tabmat-4.0.0-cp39-cp39-win_amd64.whl

Algorithm	Hash digest
SHA256	`6b7820eb71f8690d8440e6f74f8370e612ae63489bfd79fb99511808d90d8c38`
MD5	`0d8d6e3fa9c5f573a87cf10401f40b94`
BLAKE2b-256	`7bc4ee87d60d61a0cbebfcbaac8d21525f09111afb86e5cf61030d83147e24aa`

Hashes for tabmat-4.0.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

Algorithm	Hash digest
SHA256	`16853534fb7b06a4309d5785c2b82c7f60c0640e90dbf13b1c08180958d00259`
MD5	`09c78193f669125e8b973d5b1a6a543f`
BLAKE2b-256	`b2563a4e6a6f086f9285076eb29d833bdc2c0df9faf8f4d88b8afd286c148eb5`

Hashes for tabmat-4.0.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl

Algorithm	Hash digest
SHA256	`d3503639280baf30a00868f9b8f8eac0ed775a8637d2347cd82992df0c12ce99`
MD5	`3bdd6114089a8ba016defa27c70cd9fe`
BLAKE2b-256	`8da587169242a552945b52cc11f06da957ad2e5fa6a0704b0f8aceb420c4d095`

Hashes for tabmat-4.0.0-cp39-cp39-macosx_11_0_arm64.whl

Algorithm	Hash digest
SHA256	`a1929fb064c0b1352277087bb27c644fbc46420f7125daae0c0bbd912e94b997`
MD5	`3a43f3af21736c551ced331897cb9d77`
BLAKE2b-256	`f70c86b7faabd913f0a3b4e329100cf06e81c33dabfd3d9cfc932dd8ae36a4ca`

Hashes for tabmat-4.0.0-cp39-cp39-macosx_10_9_x86_64.whl

Algorithm	Hash digest
SHA256	`45fc292d674403b312cd6b4440c9d62963014cb4c5a56970e4d668deca76ccfe`
MD5	`097c85e21882e8eda835f700fd76d0d4`
BLAKE2b-256	`134b3153dd795686e4c0456172f20254e79f160e23120f52667fe23c54ed5a4c`