Core functionality for lightweight, collaborative data science projects
ballet
A lightweight framework for collaborative data science projects through feature engineering.
Ballet is under active development; please report all bugs.
- Free software: MIT license
- Documentation: https://hdi-project.github.io/ballet
- Homepage: https://github.com/HDI-Project/ballet
Overview
Ballet projects maintain a feature engineering pipeline invariant: at any point, the code and features within a project repository can be used for end-to-end feature engineering for a given dataset. To expand on an existing feature engineering pipeline, well-structured feature source code submissions can be proposed by contributors and extensively validated for compatibility and performance.
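The invariant can be illustrated with a minimal, standard-library-only sketch. It is not Ballet's implementation: `SketchFeature` and `build_feature_matrix` are hypothetical names invented for this example, and a real project would use `ballet.Feature` instead. The point is only that the list of accepted features can be applied end to end to a dataset at any time, including when the list is still empty.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


@dataclass(frozen=True)
class SketchFeature:
    """Hypothetical stand-in for a feature: one input column plus a transform."""
    name: str
    input: str
    transform: Callable[[float], float]


def build_feature_matrix(
    features: Sequence[SketchFeature], rows: Sequence[Row]
) -> List[List[float]]:
    """Apply every feature to every row: one output column per feature."""
    return [[f.transform(row[f.input]) for f in features] for row in rows]


rows = [{"area": 50.0, "rooms": 2.0}, {"area": 80.0, "rooms": 4.0}]

# An "empty" pipeline is still usable end to end: it yields zero columns.
pipeline: List[SketchFeature] = []
empty_matrix = build_feature_matrix(pipeline, rows)

# Each accepted contribution grows the pipeline; the invariant still holds.
pipeline.append(SketchFeature("area_sq", "area", lambda x: x * x))
pipeline.append(SketchFeature("rooms_half", "rooms", lambda x: x / 2))
matrix = build_feature_matrix(pipeline, rows)
```

In this sketch the pipeline is simply an ordered list, so the invariant reduces to "every element of the list can be applied to the dataset without error".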
How do you use the Ballet framework? First, you render a brand new ballet project from a provided project template using a quickstart command and push it to GitHub. This project contains an "empty" feature engineering pipeline. Next, you and your collaborators write feature engineering source code and submit pull requests to include your new features in the project and grow the pipeline. Features are instances of ballet.Feature, usually leveraging ballet.eng, a library of versatile transformers and transformer building blocks for developing features that learn. Once new pull requests are received by your project, a continuous integration service runs a streaming logical feature selection algorithm. This is part of an extensive feature validation suite that makes sure both that the proposed features are useful and that they can be safely integrated into your project. If the proposed feature is accepted, it can be safely merged.
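The two questions the validation suite asks of a proposed feature (is it safe to integrate, and is it useful?) can be sketched as follows. This is a deliberately simplified stand-in, not Ballet's validators or its feature selection algorithm: the function names, the Pearson-correlation criterion, and both thresholds are assumptions chosen only to make the idea concrete.

```python
import math
from typing import Callable, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def is_safe(transform: Callable[[float], float], inputs: Sequence[float]) -> bool:
    """Safety check: the transform runs on every input and returns finite numbers."""
    try:
        out = [transform(x) for x in inputs]
    except Exception:
        return False
    return all(isinstance(v, (int, float)) and math.isfinite(v) for v in out)


def should_accept(
    transform: Callable[[float], float],
    inputs: Sequence[float],
    target: Sequence[float],
    accepted_columns: Sequence[Sequence[float]],
    min_relevance: float = 0.3,   # assumed threshold
    max_redundancy: float = 0.95,  # assumed threshold
) -> bool:
    """Accept a candidate only if it is safe, relevant, and not redundant."""
    if not is_safe(transform, inputs):
        return False
    column = [float(transform(x)) for x in inputs]
    if abs(pearson(column, target)) < min_relevance:
        return False
    return all(abs(pearson(column, c)) <= max_redundancy for c in accepted_columns)


inputs = [1.0, 2.0, 3.0, 4.0, 5.0]
target = [2.1, 3.9, 6.2, 8.0, 9.9]
accepted: List[List[float]] = []

# First proposal: safe and strongly related to the target, so it is accepted.
first_ok = should_accept(lambda x: 2 * x, inputs, target, accepted)
if first_ok:
    accepted.append([2 * x for x in inputs])

# Second proposal carries the same information as the first: rejected as redundant.
second_ok = should_accept(lambda x: x + 1, inputs, target, accepted)

# Third proposal raises ZeroDivisionError on one row: rejected as unsafe.
third_ok = should_accept(lambda x: 1 / (x - 3), inputs, target, accepted)
```

Candidates are evaluated one at a time against the features already accepted, which mirrors the pull-request workflow described above: each submission is judged in the context of the current pipeline rather than in a single batch.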
History
0.6 (2019-11-12)
- Implement GFSSF validators and random validators
- Improve validators and allow validators to be configured in ballet.yml
- Improve project template
- Create ballet CLI
- Bug fixes and performance improvements
0.5 (2018-10-14)
- Add project template and ballet-quickstart command
- Add project structure checks and feature API checks
- Implement multi-stage validation routine driver
0.4 (2018-09-21)
- Implement Modeler for versatile modeling and evaluation
- Change project name
0.3 (2018-04-28)
- Implement PullRequestFeatureValidator
- Add util.travis, util.modutil, util.git util modules
0.2
- Implement ArrayLikeEqualityTestingMixin
- Implement collect_contrib_features
0.1
- First release on PyPI