
Optimus is the missing library for cleaning and preprocessing data in a distributed fashion with PySpark.

Project description

Optimus is the missing library for cleaning and preprocessing data in a distributed fashion. It builds on Apache Spark, whose Catalyst optimizer plans and optimizes the underlying DataFrame operations. It implements several handy tools for data wrangling and munging that will make your life much easier. Its main advantages over other public data-cleaning libraries are that the same code runs on your laptop or on a large cluster, and that it is easy to install, use, and understand.

## Requirements:

* Apache Spark 1.6
* Python 3.5

## Installation:

In your terminal, run:

```bash
pip install optimuspyspark
```
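If you want pip to refuse a download that does not match the published SHA256 digests for release 0.3.2 (listed under the file details on this page), pip's hash-checking mode can enforce them at install time. This is a minimal sketch, assuming a requirements file you create yourself (the name `requirements.txt` is arbitrary). In hash-checking mode pip requires every package it installs, including dependencies, to be pinned with a hash; passing `--no-deps` restricts the check to Optimus itself, so Spark and any other dependencies must then be installed separately.

```text
# requirements.txt (hypothetical file). Both digests are listed so pip
# accepts either the wheel or the source distribution of 0.3.2.
optimuspyspark==0.3.2 \
    --hash=sha256:3acce3fbbe36efc63b9dbec90fc1c92a9cb368d425bd189e3a4fc4c98ff98f85 \
    --hash=sha256:fc0e7415496d41030f18a50606b00f2f4008375de261da65d71c04e998c04496
```

Then install with `pip install --require-hashes --no-deps -r requirements.txt`.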



Download files


Source distribution: optimuspyspark-0.3.2.tar.gz (23.3 kB)

Built distribution (wheel, Python 3): optimuspyspark-0.3.2-py3-none-any.whl (27.1 kB)

File details

Hashes for optimuspyspark-0.3.2.tar.gz:

| Algorithm | Hash digest |
|---|---|
| SHA256 | fc0e7415496d41030f18a50606b00f2f4008375de261da65d71c04e998c04496 |
| MD5 | 01a28a4fee6e6002a37f2b61a135deea |
| BLAKE2b-256 | 897b5e4f8db809675aaf29d64e1217f6e3cad98a00bf5ff228459876a114f853 |


Hashes for optimuspyspark-0.3.2-py3-none-any.whl:

| Algorithm | Hash digest |
|---|---|
| SHA256 | 3acce3fbbe36efc63b9dbec90fc1c92a9cb368d425bd189e3a4fc4c98ff98f85 |
| MD5 | 1b1c1d28eb1710e70e5462c989823f35 |
| BLAKE2b-256 | 7ea36e9945a1f670970108ca67110ab0ce0c3ba2f12c1477c007a614a6baa5ec |
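To check a manually downloaded file against the SHA256 digests published for release 0.3.2, you can hash it with Python's standard `hashlib`. This is a minimal sketch, not part of Optimus: the helper names `sha256_of` and `verify_download` are hypothetical, and the digests are copied from the file details on this page. It avoids f-strings so it also runs on Python 3.5.

```python
import hashlib
from pathlib import Path

# SHA256 digests published for optimuspyspark 0.3.2, keyed by file name.
EXPECTED_SHA256 = {
    "optimuspyspark-0.3.2.tar.gz":
        "fc0e7415496d41030f18a50606b00f2f4008375de261da65d71c04e998c04496",
    "optimuspyspark-0.3.2-py3-none-any.whl":
        "3acce3fbbe36efc63b9dbec90fc1c92a9cb368d425bd189e3a4fc4c98ff98f85",
}


def sha256_of(path, chunk_size=65536):
    """Hash the file in fixed-size chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(str(path), "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Return True if the file's SHA256 matches the published digest for its name.

    Raises KeyError if no digest is published for that file name.
    """
    name = Path(path).name
    if name not in EXPECTED_SHA256:
        raise KeyError("no published digest for {!r}".format(name))
    return sha256_of(path) == EXPECTED_SHA256[name]
```

For example, `verify_download("optimuspyspark-0.3.2-py3-none-any.whl")` returns `True` only if the wheel in the current directory is byte-for-byte the published file. Matching on the file name keeps a wheel from being checked against the sdist digest by mistake.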

