Dataframe manipulation tools

yx

To install: pip install yx

Overview

The yx package provides tools for manipulating pandas DataFrames. It includes functions for handling NaN values, casting types, and digitizing data, as well as more involved operations such as conditional replacement and rolling out list-valued columns into rows. It is aimed at data preprocessing, cleaning, and transformation tasks in data analysis and machine learning workflows.

Features

  • NaN Handling: Functions to replace NaN values and check for NaNs in various ways.
  • Type Handling: Utilities to determine and cast data types of DataFrame columns, especially for numeric types.
  • Data Transformation: Functions to digitize data, perform conditional replacements, and manipulate DataFrame columns and rows.
  • Utility Functions: A collection of helper functions to perform operations like flattening hierarchical indices, removing columns, and filtering data.
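To illustrate what "flattening hierarchical indices" means, here is a minimal sketch that uses plain pandas only. It does not call yx, and the yx helper for this task may differ in name, signature, and output format. The underscore separator and the column names are assumptions chosen for the example.

```python
import pandas as pd

# Aggregating with several functions yields two-level (hierarchical) column labels.
df = pd.DataFrame({'group': ['a', 'a', 'b'], 'value': [1, 2, 3]})
agg = df.groupby('group').agg({'value': ['mean', 'max']})

# Flatten each (level_0, level_1) tuple into a single string label.
# The '_' separator is an arbitrary choice for this sketch.
flat = agg.copy()
flat.columns = ['_'.join(map(str, col)) for col in agg.columns.to_flat_index()]

print(list(flat.columns))
```

Single-level labels are easier to select by name and to write out to flat file formats such as CSV.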

Usage Examples

Checking and Replacing Empty Cells

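You can check for and replace "empty" cells. Per the is_empty documentation below, a cell counts as empty if it is NaN-like, an empty iterable, or a falsy value other than the boolean False: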

import numpy as np
import pandas as pd
from yx import is_empty, replace_empty_cells_with

df = pd.DataFrame({'data': [None, np.nan, 0, '', [], False, True]})
# Element-wise check; on pandas >= 2.1, DataFrame.map replaces the deprecated applymap
print(df.applymap(is_empty))
# Replace empty cells with a specified value (e.g., 'Empty')
df = replace_empty_cells_with(df, 'Empty')
print(df)

Type Handling

Determine the most common numerical type in a column and cast all possible columns to numeric types:

import numpy as np
import pandas as pd
from yx import common_numerical_type, cast_all_cols_to_numeric_if_possible

df = pd.DataFrame({'numbers': [1, 2.5, np.nan, 4]})
print(common_numerical_type(df['numbers']))  # Outputs: float
df = cast_all_cols_to_numeric_if_possible(df)
print(df.dtypes)
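For comparison, the idea of "cast to numeric if possible" can be sketched with pandas alone. This is not the yx implementation; the function name cast_numeric_where_possible is hypothetical, and the real cast_all_cols_to_numeric_if_possible may treat edge cases differently.

```python
import pandas as pd

def cast_numeric_where_possible(df):
    """Hypothetical helper: try pd.to_numeric on each column, keep the column unchanged on failure."""
    out = df.copy()
    for col in out.columns:
        try:
            out[col] = pd.to_numeric(out[col])
        except (ValueError, TypeError):
            # The column holds values that cannot be parsed as numbers; leave it as-is.
            pass
    return out

df = pd.DataFrame({
    'ints': ['1', '2', '3'],
    'floats': ['1.5', '2', '3'],
    'text': ['a', 'b', 'c'],
})
result = cast_numeric_where_possible(df)
print(result.dtypes)
```

Only columns whose every value parses as a number are converted, so a column with mixed content stays untouched rather than being partially coerced.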

Digitizing Data

Divide data into bins and then group by these digitized values:

import numpy as np
import pandas as pd
from yx import digitize_and_group

df = pd.DataFrame({'values': np.random.rand(10)})
digit_groups = digitize_and_group(df, digit_cols=['values'], digit_agg_fun='mean')
print(digit_groups)

Rolling Out Columns

Expand list-like entries in DataFrame columns into separate rows:

import pandas as pd
from yx import rollout_cols

df = pd.DataFrame({
    'A': [1, 2],
    'B': [[10, 20], [30]]
})
df_rolled = rollout_cols(df, cols_to_rollout='B')
print(df_rolled)

Function Documentation

Below are some of the key functions provided by the yx package:

is_empty(cell)

Checks if a DataFrame cell is considered "empty". A cell is deemed empty if it is a NaN-like value, an empty iterable, or a falsy value excluding the boolean value False.
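The rule described above can be sketched in a few lines. This is an illustration written from the description only, not the yx source; the name is_empty_sketch is hypothetical, and the real is_empty may differ on edge cases such as pandas missing-value markers.

```python
import math
import numpy as np

def is_empty_sketch(cell):
    """Hypothetical re-implementation of the documented rule, for illustration only."""
    # Booleans are never empty: False is explicitly excluded by the description.
    if isinstance(cell, (bool, np.bool_)):
        return False
    if cell is None:
        return True
    # NaN-like floating-point values.
    if isinstance(cell, (float, np.floating)) and math.isnan(cell):
        return True
    # Sized containers (str, list, tuple, dict, set, arrays) are empty when their length is 0.
    try:
        return len(cell) == 0
    except TypeError:
        pass
    # Remaining falsy scalars, such as 0.
    return not bool(cell)

cells = [None, float('nan'), 0, '', [], False, True]
results = [is_empty_sketch(c) for c in cells]
print(results)
```

The boolean check comes first because 0 == False in Python, so an equality test alone could not tell the two apart.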

conditional_replace(df, replacement, condition)

Replaces values in a DataFrame based on a specified condition function.
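A comparable effect can be obtained with plain pandas, which may help clarify what a condition function does here. The sketch below does not call conditional_replace and is not its implementation; the is_negative predicate and the replacement value 0 are assumptions for the example.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, -2, 3], 'b': [-4, 5, -6]})

def is_negative(x):
    """Example element-wise condition (an assumption for this sketch)."""
    return x < 0

# Build a boolean mask by applying the condition to every cell, column by column.
mask = df.apply(lambda col: col.map(is_negative))

# DataFrame.mask replaces the cells where the mask is True with the given value.
replaced = df.mask(mask, 0)
print(replaced)
```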

common_numerical_type(iterable)

Determines the most common numerical type (int or float) in an iterable, handling NaNs and None values gracefully.

digitize_and_group(df, digit_cols=None, digit_agg_fun='mean', agg_fun='mean', **kwargs)

Digitizes the columns of a DataFrame into specified bins and groups them, optionally aggregating using a specified function.
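The underlying digitize-then-group pattern can be sketched with numpy and pandas directly. This is not how digitize_and_group is implemented, and it does not show how yx chooses its bins; the bin edges, the 'bin' column, and the 'score' column are assumptions for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'values': [0.05, 0.15, 0.45, 0.55, 0.95],
    'score': [1, 2, 3, 4, 5],
})

# Bin edges chosen for this example: [0.0, 0.5) maps to bin 1, [0.5, 1.0) to bin 2.
bin_edges = np.array([0.0, 0.5, 1.0])
df['bin'] = np.digitize(df['values'].to_numpy(), bin_edges)

# Group rows by bin label and aggregate each group with the mean.
grouped = df.groupby('bin')[['values', 'score']].mean()
print(grouped)
```

Here the first three rows fall into bin 1 and the last two into bin 2, so each output row summarizes one interval of the digitized column.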

rollout_cols(df, cols_to_rollout=None)

Expands list-like entries in specified DataFrame columns into separate rows, aligning other column values accordingly.
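pandas ships a built-in with similar semantics, DataFrame.explode, which can serve as a point of comparison. The sketch below uses it in place of rollout_cols; whether rollout_cols produces the same index and dtypes is not confirmed here.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2],
    'B': [[10, 20], [30]],
})

# Each element of a list in 'B' becomes its own row; 'A' is repeated to stay aligned.
exploded = df.explode('B').reset_index(drop=True)
print(exploded)
```

The call to reset_index(drop=True) is optional: explode keeps the original row label for every expanded row, which yields duplicate index values.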

These utilities make it easier to preprocess and manipulate data efficiently in Python, leveraging the power of pandas and numpy.
