Optimus is the missing framework for cleaning and pre-processing data in a distributed fashion.


Optimus


Get started 🏃

Try Optimus

To launch a live notebook server and try Optimus on Binder or Colab, click one of the following badges:

Installation (pip):

In your terminal, run: pip install pyoptimus

Requirements

Python 3.7 or 3.8
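
As a minimal sketch, the interpreter can be checked against the versions listed above before installing. The helper name is_supported_python is hypothetical and not part of Optimus; only the two version pairs come from this README.

```python
import sys

# Versions listed under Requirements in this README.
SUPPORTED = {(3, 7), (3, 8)}


def is_supported_python(version=None):
    """Return True if (major, minor) is one of the listed versions.

    `version` is any tuple starting with (major, minor); it defaults to
    the running interpreter.
    """
    major, minor = (version or sys.version_info)[:2]
    return (major, minor) in SUPPORTED


if __name__ == "__main__":
    state = "listed" if is_supported_python() else "not listed"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is {state} "
          "in the README requirements")
```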

Examples

The 10 minutes to Optimus notebook covers the basics you need to start working.

You can also go to Examples to find specific notebooks about data cleaning, data munging, profiling, data enrichment, and how to create ML and DL models.

Also check the Cheat Sheet.

Start Optimus

Start Optimus using "pandas", "dask", "cudf" or "dask_cudf".

from optimus import Optimus
op = Optimus("pandas")
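
A small sketch of validating the engine name before starting Optimus, so that a typo fails early with a clear message. The helpers normalize_engine and start_optimus are hypothetical names for illustration, and lowercasing the name is an assumption of this sketch; only the four engine strings and the Optimus(...) call come from this README.

```python
# Engine names listed in this README.
SUPPORTED_ENGINES = ("pandas", "dask", "cudf", "dask_cudf")


def normalize_engine(name):
    """Return the engine name in lowercase, or raise ValueError if unknown."""
    engine = name.strip().lower()
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(
            f"unknown engine {name!r}; expected one of {SUPPORTED_ENGINES}"
        )
    return engine


def start_optimus(name="pandas"):
    """Validate the engine name, then start Optimus with it."""
    engine = normalize_engine(name)
    # Imported lazily so the validation above works without Optimus installed.
    from optimus import Optimus
    return Optimus(engine)
```

With Optimus installed, op = start_optimus("dask") would behave like the snippet above with a different engine.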

Loading data

Optimus can load CSV, JSON, Parquet, Avro, and Excel data from a local file or a URL.

# csv
df = op.load.csv("../examples/data/foo.csv")

# json
df = op.load.json("../examples/data/foo.json")

# using a url
df = op.load.json("https://raw.githubusercontent.com/hi-primus/optimus/develop-21.8/examples/data/foo.json")

# parquet
df = op.load.parquet("../examples/data/foo.parquet")

# ...or anything else
df = op.load.file("../examples/data/titanic3.xls")

You can also load data from Oracle, Redshift, MySQL, and PostgreSQL.
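
The file loaders above can be selected from the file extension. The following is a sketch under assumptions: pick_loader and load_any are hypothetical helpers, not part of Optimus, and they only use the loader names that appear in this README (csv, json, parquet, with file as the fallback).

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extension -> attribute name on op.load, limited to loaders shown in this README.
LOADERS = {".csv": "csv", ".json": "json", ".parquet": "parquet"}


def pick_loader(source):
    """Return the op.load method name to use for a local path or a URL."""
    # urlparse drops any query string or fragment; local paths pass through as-is.
    path = urlparse(source).path
    suffix = PurePosixPath(path).suffix.lower()
    return LOADERS.get(suffix, "file")


def load_any(op, source):
    """Load `source` with the matching loader, falling back to op.load.file."""
    return getattr(op.load, pick_loader(source))(source)
```

For example, load_any(op, "../examples/data/foo.csv") would call op.load.csv, while an .xls path would go through op.load.file.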

Saving Data

# csv
df.save.csv("data/foo.csv")

# json
df.save.json("data/foo.json")

# parquet
df.save.parquet("data/foo.parquet")

You can also save data to Oracle, Redshift, MySQL, and PostgreSQL.
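
The file save calls above write to a data/ directory. As a hedged sketch, a wrapper can create the target directory first, in case the engine does not do so itself (this README does not say either way). The helper save_as is a hypothetical name; it only relies on the df.save.csv, df.save.json, and df.save.parquet calls shown in this README.

```python
from pathlib import Path

# Extension -> attribute name on df.save, limited to savers shown in this README.
SAVERS = {".csv": "csv", ".json": "json", ".parquet": "parquet"}


def save_as(df, target):
    """Create the parent directory, then save `df` based on the file extension."""
    path = Path(target)
    saver = SAVERS.get(path.suffix.lower())
    if saver is None:
        raise ValueError(f"no saver known for {path.suffix!r}")
    # Make sure e.g. data/ exists before writing data/foo.csv.
    path.parent.mkdir(parents=True, exist_ok=True)
    getattr(df.save, saver)(str(path))
    return path
```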

Create dataframes

You can also create a dataframe from scratch:

df = op.create.dataframe({
    'A': ['a', 'b', 'c', 'd'],
    'B': [1, 3, 5, 7],
    'C': [2, 4, 6, None],
    'D': ['1980/04/10', '1980/04/10', '1980/04/10', '1980/04/10']
})
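
Before passing a dictionary of columns like the one above to op.create.dataframe, it can help to confirm that every column has the same length and to see where values are missing. This is a plain-Python sketch; check_columns is a hypothetical helper and not part of Optimus.

```python
def check_columns(data):
    """Return (row_count, missing_per_column) for a dict of column lists.

    Raises ValueError if the columns do not all have the same length.
    """
    lengths = {name: len(values) for name, values in data.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"columns differ in length: {lengths}")
    # Count None entries per column.
    missing = {
        name: sum(value is None for value in values)
        for name, values in data.items()
    }
    row_count = next(iter(lengths.values()), 0)
    return row_count, missing


# Same data as in the README example.
data = {
    'A': ['a', 'b', 'c', 'd'],
    'B': [1, 3, 5, 7],
    'C': [2, 4, 6, None],
    'D': ['1980/04/10', '1980/04/10', '1980/04/10', '1980/04/10'],
}
rows, missing = check_columns(data)
print(rows, missing)
```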

Using display, you can show your data with extra information such as the column number, the column data type, and marked white spaces.

display(df)

Cleaning and Processing

Optimus was created to make data cleaning a breeze. The API is designed to be easy for newcomers and familiar to people coming from pandas. Optimus extends the standard DataFrame functionality by adding the .rows and .cols accessors.

For example, you can load data from a URL, transform it, and apply some predefined cleaning functions:

new_df = (
    df
    .rows.sort("rank", "desc")
    .cols.lower(["names", "function"])
    .cols.date_format("date arrival", "yyyy/MM/dd", "dd-MM-YYYY")
    .cols.years_between("date arrival", "dd-MM-YYYY", output_cols="from arrival")
    .cols.normalize_chars("names")
    .cols.remove_special_chars("names")
    .rows.drop(df["rank"]>8)
    .cols.rename("*", str.lower)
    .cols.trim("*")
    .cols.unnest("japanese name", output_cols="other names")
    .cols.unnest("last position seen", separator=",", output_cols="pos")
    .cols.drop(["last position seen", "japanese name", "date arrival", "cybertronian", "nulltype"])
)
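
To make the intent of a few of these steps concrete, here is a plain-Python sketch (no Optimus involved) of what sorting by rank, dropping rows with rank above 8, lowercasing and trimming names, and reformatting the date are assumed to do. The sample rows are made up for illustration, the helper names are hypothetical, and the mapping from the Optimus date patterns to strptime codes is an assumption of this sketch.

```python
from datetime import datetime


def reformat_date(value, src="%Y/%m/%d", dst="%d-%m-%Y"):
    """Convert a date string such as '1980/04/10' to '10-04-1980'."""
    return datetime.strptime(value, src).strftime(dst)


def clean(rows, max_rank=8):
    """Apply a subset of the README chain to a list of dict records."""
    # Assumed equivalent of .rows.drop(df["rank"] > 8): keep rank <= 8.
    kept = [row for row in rows if row["rank"] <= max_rank]
    # Assumed equivalent of .rows.sort("rank", "desc").
    kept.sort(key=lambda row: row["rank"], reverse=True)
    # Assumed equivalent of .cols.lower, .cols.trim and .cols.date_format.
    return [
        {
            **row,
            "names": row["names"].strip().lower(),
            "date arrival": reformat_date(row["date arrival"]),
        }
        for row in kept
    ]


# Made-up sample records, not taken from the Optimus example data.
sample = [
    {"names": " Alpha ", "rank": 10, "date arrival": "1980/04/10"},
    {"names": "Beta", "rank": 7, "date arrival": "1980/04/10"},
    {"names": "GAMMA ", "rank": 8, "date arrival": "1975/12/01"},
]
cleaned = clean(sample)
print(cleaned)
```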

Need help? 🛠️

Feedback

Feedback is what drives Optimus' future, so please take a couple of minutes to help shape the Optimus roadmap: http://bit.ly/optimus_survey

If you want to make a suggestion or a feature request, use https://github.com/hi-primus/optimus/issues

Troubleshooting

If you have issues, see our Troubleshooting Guide.

Contributing to Optimus 💡

Contributions go far beyond pull requests and commits. We are very happy to receive any kind of contribution, including:

Documentation updates, enhancements, designs, or bugfixes.
Spelling or grammar fixes.
README.md corrections or redesigns.
Adding unit or functional tests.
Triaging GitHub issues, especially determining whether an issue still persists or is reproducible.
Blogging, speaking about, or creating tutorials about Optimus and its many features.
Helping others on our official chats.

Backers and Sponsors

Become a backer or a sponsor and get your image on our README on GitHub with a link to your site.



