
Modin: Make your pandas code run faster by changing one line of code.

Project description

Scale your pandas workflows by changing one line of code

To use Modin, replace the pandas import:

# import pandas as pd
import modin.pandas as pd
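If a script also has to run on machines where Modin is not installed, one option is to fall back to plain pandas, since the API is meant to be the same. This is a minimal sketch, not something the Modin docs prescribe; the `USING_MODIN` flag is a hypothetical name.

```python
# Prefer Modin; fall back to plain pandas if Modin cannot be imported.
try:
    import modin.pandas as pd
    USING_MODIN = True
except ImportError:
    import pandas as pd
    USING_MODIN = False

# The rest of the code is identical either way.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})
print(USING_MODIN, df["a"].sum())
```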

Installation

Modin can be installed from PyPI:

pip install modin

If you don't have Ray or Dask installed, you will need to install Modin with one of the targets:

pip install modin[ray] # Install Modin dependencies and Ray to run on Ray
pip install modin[dask] # Install Modin dependencies and Dask to run on Dask
pip install modin[all] # Install all of the above

Modin will automatically detect which engine you have installed and use that for scheduling computation!
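To see which engines Modin could pick up on your machine, you can check which engine packages are importable. This is an illustrative sketch of the idea, not Modin's own detection logic; `installed_engines` is a hypothetical helper, and it only checks for the top-level package (Modin's Dask engine also needs `distributed`).

```python
import importlib.util


def installed_engines(candidates=("ray", "dask")):
    """Return the names in `candidates` whose top-level package can be found.

    Hypothetical helper: it does not import the packages or check versions.
    """
    return [name for name in candidates if importlib.util.find_spec(name) is not None]


print(installed_engines())
```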

Pandas API Coverage

pandas Object   | Modin's Ray Engine Coverage | Modin's Dask Engine Coverage
pd.DataFrame    |                             |
pd.Series       |                             |
pd.read_csv     |                             |
pd.read_table   |                             |
pd.read_parquet |                             |
pd.read_sql     |                             |
pd.read_feather |                             |
pd.read_excel   |                             |
pd.read_json    | ✳️                          | ✳️
pd.read_<other> | ✴️                          | ✴️

Some pandas APIs are easier to implement than others, so if something is missing, feel free to open an issue!
Choosing a Compute Engine

If you want to choose a specific compute engine to run on, you can set the environment variable MODIN_ENGINE and Modin will do computation with that engine:

export MODIN_ENGINE=ray  # Modin will use Ray
export MODIN_ENGINE=dask  # Modin will use Dask

This can also be done within a notebook/interpreter before you import Modin:

import os

# Pick exactly one engine before importing Modin:
os.environ["MODIN_ENGINE"] = "ray"  # Modin will use Ray
# os.environ["MODIN_ENGINE"] = "dask"  # use this line instead for Dask

import modin.pandas as pd

Note: You should not change the engine after you have imported Modin, as doing so will result in undefined behavior.
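One way to avoid that mistake is to set the variable through a small guard that refuses to run once Modin is already imported. This is a sketch under assumptions: `set_modin_engine` is a hypothetical helper, and it treats any imported `modin` module as "too late", which is stricter than strictly necessary.

```python
import os
import sys


def set_modin_engine(engine):
    """Set MODIN_ENGINE, but only if Modin has not been imported yet.

    Hypothetical helper: the engine should be chosen before `import modin.pandas`.
    """
    if engine not in ("ray", "dask"):
        raise ValueError(f"unsupported engine: {engine!r}")
    if "modin" in sys.modules:
        raise RuntimeError("Modin is already imported; set MODIN_ENGINE before importing it")
    os.environ["MODIN_ENGINE"] = engine
```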

Which engine should I use?

If you are on Windows, you must use Dask, because Ray does not support Windows. If you are on Linux or macOS, you can install and use either engine. Modin abstracts away the complexity of both engines, so you don't need any prior knowledge of either one: feel free to pick whichever you like!
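The guidance above reduces to a simple platform check. A minimal sketch, assuming you only want a default when the user has not chosen an engine; `default_engine` is a hypothetical helper, and preferring Ray on Linux and macOS is an arbitrary choice since either engine works there.

```python
import platform


def default_engine(system=None):
    """Return "dask" on Windows and "ray" elsewhere (hypothetical helper)."""
    system = system or platform.system()  # e.g. "Windows", "Linux", "Darwin"
    return "dask" if system == "Windows" else "ray"
```

For example, calling `os.environ.setdefault("MODIN_ENGINE", default_engine())` before `import modin.pandas as pd` leaves an explicit user choice untouched.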

Advanced usage

In Modin, you can start a custom environment in Dask or Ray, and Modin will connect to that environment automatically. For example, if you'd like to limit the amount of resources that Modin uses, you can start a Dask Client or initialize Ray, and Modin will use those instances. Make sure you've set the correct environment variable so Modin knows which engine to connect to!

For Ray:

import ray
ray.init(plasma_directory="/path/to/custom/dir", object_store_memory=10**10)
# Modin will connect to the existing Ray environment
import modin.pandas as pd

For Dask:

from distributed import Client
client = Client(n_workers=6)
# Modin will connect to the Dask Client
import modin.pandas as pd

This gives you the flexibility to start with custom resource constraints and limit the amount of resources Modin uses.
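A common way to pick such a limit is to derive it from the machine's core count. The sketch below leaves half of the cores free; the fraction and the helper name `worker_budget` are assumptions. The resulting number could be passed as `Client(n_workers=...)` for Dask or `ray.init(num_cpus=...)` for Ray before importing `modin.pandas`.

```python
import os


def worker_budget(fraction=0.5, cpu_count=None):
    """Number of workers/CPUs to hand to the engine, always at least 1.

    Hypothetical helper: `cpu_count` overrides os.cpu_count() (useful for testing).
    """
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(1, int(cpus * fraction))


n = worker_budget()
# Then, before `import modin.pandas as pd`, for example:
#   from distributed import Client; client = Client(n_workers=n)
#   or: import ray; ray.init(num_cpus=n)
print(n)
```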

Full Documentation

Visit the complete documentation on readthedocs: https://modin.readthedocs.io

Scale your pandas workflow by changing a single line of code.

import modin.pandas as pd
import numpy as np

frame_data = np.random.randint(0, 100, size=(2**10, 2**8))
df = pd.DataFrame(frame_data)

When run locally (without a cluster), Modin will create and manage a local Dask or Ray cluster for execution.

To use Modin, you do not need to know how many cores your system has and you do not need to specify how to distribute the data. In fact, you can continue using your previous pandas notebooks while experiencing a considerable speedup from Modin, even on a single machine. Once you've changed your import statement, you're ready to use Modin just like you would pandas.

Faster pandas, even on your laptop

The modin.pandas DataFrame is an extremely light-weight parallel DataFrame. Modin transparently distributes the data and computation, so all you need to do is keep using the pandas API as you did before installing Modin. Unlike other parallel DataFrame systems, Modin is light-weight and robust, which is why it can provide speed-ups of up to 4x on a laptop with 4 physical cores.
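Conceptually, distributing a computation means splitting the data into partitions, running the same operation on each partition concurrently, and combining the partial results. The sketch below shows that pattern on plain Python lists; it is not Modin's implementation, and `split_rows`, `column_sums`, and `parallel_column_sums` are hypothetical names. It uses threads only to stay self-contained: because of Python's GIL, pure-Python threads will not actually spread this work across cores, whereas Modin hands partitions to Ray or Dask workers.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(rows, n_parts):
    """Split a list of rows into up to n_parts contiguous row partitions."""
    size = -(-len(rows) // n_parts)  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def column_sums(part):
    """Per-column sums for one partition (the 'map' step)."""
    return [sum(col) for col in zip(*part)]


def parallel_column_sums(rows, n_parts=4):
    """Sum each partition concurrently, then combine the partial sums (the 'reduce' step)."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    if not rows:
        return []
    parts = split_rows(rows, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(column_sums, parts))
    return [sum(vals) for vals in zip(*partials)]


rows = [[i, 2 * i] for i in range(10)]
print(parallel_column_sums(rows))  # [45, 90]
```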

In pandas, you can only use one CPU core at a time, no matter what kind of computation you are doing. With Modin, you can use all of the CPU cores on your machine. Even in read_csv, we see large gains by efficiently distributing the work across your entire machine.

import modin.pandas as pd

df = pd.read_csv("my_dataset.csv")
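To check the speed-up on your own data, you can time the same call under pandas and Modin. A minimal timing sketch; `best_time` is a hypothetical helper, the commented comparison assumes both libraries are installed and that `my_dataset.csv` exists, and since Modin may still be finishing work in the background when a call returns, treat single timings as rough.

```python
import time


def best_time(fn, *args, repeat=3, **kwargs):
    """Fastest wall-clock time in seconds over `repeat` calls of fn(*args, **kwargs)."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best


# Example comparison (assumes pandas, Modin and the CSV file are available):
#   import pandas
#   import modin.pandas as mpd
#   print(best_time(pandas.read_csv, "my_dataset.csv"))
#   print(best_time(mpd.read_csv, "my_dataset.csv"))
print(best_time(sorted, list(range(1000))))
```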

Modin is a DataFrame designed for datasets from 1MB to 1TB+

We have focused heavily on bridging the gap between DataFrame solutions for small data (e.g. pandas) and for large data. Data scientists often need different tools to do the same thing on different sizes of data. The DataFrame solutions that exist for 1KB do not scale to 1TB+, and the overheads of the solutions for 1TB+ are too costly for datasets in the 1KB range. Because Modin is light-weight, robust, and scalable, you get a fast DataFrame for both small and large data. With preliminary cluster and out-of-core support, Modin is a DataFrame library with great single-node performance and high scalability in a cluster.

Modin Architecture

We designed Modin to be modular so we can plug in different components as they develop and improve:

[Figure: Modin architecture]
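The idea of swapping components without touching the code that uses them can be illustrated with a small registry that maps an engine name to a factory. This is a toy sketch of the plug-in pattern, not Modin's actual class structure; `ENGINES`, `register_engine`, `run_on`, and the thread-based engine are all hypothetical.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, Dict

# Registry of engine name -> factory that builds an executor for that engine.
ENGINES: Dict[str, Callable[[], Executor]] = {}


def register_engine(name):
    """Decorator that registers an executor factory under `name`."""
    def decorator(factory):
        ENGINES[name] = factory
        return factory
    return decorator


@register_engine("threads")
def make_thread_engine():
    return ThreadPoolExecutor(max_workers=2)


def run_on(engine_name, fn, items):
    """Apply fn to every item using the named engine; callers never see the engine type."""
    factory = ENGINES[engine_name]  # raises KeyError for an unknown engine
    with factory() as executor:
        return list(executor.map(fn, items))
```

Adding another backend would only require registering a new factory; `run_on` stays unchanged.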

Visit the documentation for more information, and check out the difference between Modin and Dask!

modin.pandas is currently under active development. Requests and contributions are welcome!

More information and Getting Involved

Project details



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

modin-0.8.3.post1.tar.gz (395.9 kB)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

modin-0.8.3.post1-py3-none-win_amd64.whl (537.3 kB)

Uploaded Python 3, Windows x86-64

modin-0.8.3.post1-py3-none-win32.whl (537.3 kB)

Uploaded Python 3, Windows x86

modin-0.8.3.post1-py3-none-manylinux1_x86_64.whl (537.4 kB)

Uploaded Python 3

modin-0.8.3.post1-py3-none-manylinux1_i686.whl (537.3 kB)

Uploaded Python 3

modin-0.8.3.post1-py3-none-macosx_10_9_x86_64.whl (537.4 kB)

Uploaded Python 3, macOS 10.9+ x86-64

File details

Details for the file modin-0.8.3.post1.tar.gz.

File metadata

  • Download URL: modin-0.8.3.post1.tar.gz
  • Upload date:
  • Size: 395.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.0 importlib_metadata/3.7.3 packaging/20.4 pkginfo/1.6.1 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.50.2 CPython/3.8.5

File hashes

Hashes for modin-0.8.3.post1.tar.gz
Algorithm Hash digest
SHA256 1c6ed764a70860048a3436a4a1e2f266a971d6714bb1c6f4647578ad7874f70c
MD5 221fffa13146af8e947d9451bb264304
BLAKE2b-256 0e64646486fcf3b449ddf3a292b6e041f3eaa180cfbb2aaaf9217e965138cbda

See more details on using hashes here.

File details

Details for the file modin-0.8.3.post1-py3-none-win_amd64.whl.

File metadata

  • Download URL: modin-0.8.3.post1-py3-none-win_amd64.whl
  • Upload date:
  • Size: 537.3 kB
  • Tags: Python 3, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.0 importlib_metadata/3.7.3 packaging/20.4 pkginfo/1.6.1 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.50.2 CPython/3.8.5

File hashes

Hashes for modin-0.8.3.post1-py3-none-win_amd64.whl
Algorithm Hash digest
SHA256 8ec6c5fb3d819f0adf953d291f28aaf0d1c0dd4079b296433f5469a34b93fa06
MD5 a038a896608b89edf037addb2329831f
BLAKE2b-256 326010dffa4b44af7d365a0413c7bc889c5437c002d16ad5e004bfdecbfb3479

File details

Details for the file modin-0.8.3.post1-py3-none-win32.whl.

File metadata

  • Download URL: modin-0.8.3.post1-py3-none-win32.whl
  • Upload date:
  • Size: 537.3 kB
  • Tags: Python 3, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.0 importlib_metadata/3.7.3 packaging/20.4 pkginfo/1.6.1 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.50.2 CPython/3.8.5

File hashes

Hashes for modin-0.8.3.post1-py3-none-win32.whl
Algorithm Hash digest
SHA256 c54d77d38d200de698f14358b15ddcf24f2461baeec85fa264c884e742458c3a
MD5 cb4a6b977cacb043777a08cd7313618d
BLAKE2b-256 67cc7d54a96210d399ee932e5bdf4f68feae468cab9f5e66e0476dfd42cb9c51

File details

Details for the file modin-0.8.3.post1-py3-none-manylinux1_x86_64.whl.

File metadata

  • Download URL: modin-0.8.3.post1-py3-none-manylinux1_x86_64.whl
  • Upload date:
  • Size: 537.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.0 importlib_metadata/3.7.3 packaging/20.4 pkginfo/1.6.1 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.50.2 CPython/3.8.5

File hashes

Hashes for modin-0.8.3.post1-py3-none-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 8ba5ae50ac49434793c6a7c47318642e390e50629ae072f9a0b351a0ec5f05d1
MD5 e4981142cae47fe09f7dbf1ffcfbad7f
BLAKE2b-256 18143fd2e799a5d75632999fd20914f3c3af2decd01a3230b55c8a840bf28070

File details

Details for the file modin-0.8.3.post1-py3-none-manylinux1_i686.whl.

File metadata

  • Download URL: modin-0.8.3.post1-py3-none-manylinux1_i686.whl
  • Upload date:
  • Size: 537.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.0 importlib_metadata/3.7.3 packaging/20.4 pkginfo/1.6.1 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.50.2 CPython/3.8.5

File hashes

Hashes for modin-0.8.3.post1-py3-none-manylinux1_i686.whl
Algorithm Hash digest
SHA256 68bfdb5d90e5442ae4638035d990f068da12ea45f5f6922212ddf47e38493f11
MD5 078580b0784d12698e7e60f5515f5bfc
BLAKE2b-256 ba55ef97e2d7020c502ff661fbb17d6e831d3f5749ae2c2b440cd8b9d8bda044

File details

Details for the file modin-0.8.3.post1-py3-none-macosx_10_9_x86_64.whl.

File metadata

  • Download URL: modin-0.8.3.post1-py3-none-macosx_10_9_x86_64.whl
  • Upload date:
  • Size: 537.4 kB
  • Tags: Python 3, macOS 10.9+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.0 importlib_metadata/3.7.3 packaging/20.4 pkginfo/1.6.1 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.50.2 CPython/3.8.5

File hashes

Hashes for modin-0.8.3.post1-py3-none-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 99e87102c0d30d6a615ad10ad1ab468c29610114a5f0af623c50db7baef264ca
MD5 68192e851f355cae67877a7d140d33f6
BLAKE2b-256 de155674b528f362d3770805e32da1f5b74e57313441032d0a97e56946af6381
