Alpha version of the Rasgo Python interface.

Project description

pyRasgo is a Python SDK for interacting with the Rasgo API. Rasgo accelerates feature engineering for data scientists.

Visit us at https://www.rasgoml.com/ to turn your data into Features in minutes!

Documentation is available at: https://docs.rasgoml.com/rasgo-docs/pyrasgo/

Package Dependencies

  • idna>=2.5,<3
  • more-itertools
  • pandas
  • pyarrow>=3.0
  • pydantic
  • pyyaml
  • requests
  • snowflake-connector-python>=2.4.0
  • tqdm

Release Notes

  • v0.2.6a1 (Sept 1, 2021)

    • Adds support for creating features from Python source code (assuming an existing parent source).
    • Users can provide a Python function that transforms a source and creates a new set of features from the results of that function.
  • v0.2.5 (Aug 18, 2021)

    • adds handling and user notification for highly null dataframes, which would otherwise not work well with evaluate.profile or evaluate.feature_importance
  • v0.2.4 (Aug 4, 2021)

    • supports registering tables whose names are reserved Snowflake keywords (e.g. ACCOUNT, CASE, ORDER) as Rasgo Sources
  • v0.2.3 (July 30, 2021)

    • introduces publish.features_from_source_code() function. This function allows users to pass a custom SQL string to create a view in Snowflake using their own code. The function registers a child source based on the parent source provided as input, then registers features from the new child source table.
    • introduces new workflow to publish.source_data() function. Pass in source_type="sql", sql_definition="<valid sql select string>" to create a new Rasgo DataSource as a view in Snowflake using custom SQL.
    • makes the features parameter optional in publish.features_from_source() function. If the parameter is not passed, all columns in the underlying table that are not in the dimensions list will be registered as features.
    • adds trigger_stats parameter to all publish methods. When set to False, statistical profiling will not run for the data objects being published. Default = True.
    • adds verbose parameter to all publish methods. When set to True, prints status messages to stdout to update users on progress of the function. Default = False.
    • introduces .sourceCode property on Rasgo DataSource and FeatureSet classes to display the SQL or Python source code used to build the underlying table
    • introduces .render_sql_definition() method on Collection class to display the SQL used to create the underlying collection view
    • introduces .dimensions property on Rasgo Collection class to display all unique dimension columns in a Collection
    • introduces trigger_stats parameter in collection.generate_training_data() method to allow users to generate a SQL view without kicking off correlation and join stats. Set to False to suppress stats jobs. Default = True.
    • adds support for optional CatBoost parameter train_dir in evaluate.feature_importance() function, which lets users control where temporary training files are generated
  • v0.2.2 (July 14, 2021)

    • Allow for consistency in evaluate.feature_importance() evaluation metrics for unchanged dataframes
    • Allow users to control certain CatBoost parameters when running evaluate.feature_importance()
  • v0.2.1 (July 1, 2021)

    • expand evaluate.feature_importance() to support calculating importance for collections
  • v0.2.0 (June 24, 2021)

    • introduce publish.experiment() method to fast track dataframes to Rasgo objects
    • fix register bug
  • v0.1.14 (June 17, 2021)

    • improve new user signup experience in register() method
    • fix dataframe bug when experiment wasn't set
  • v0.1.13 (June 16, 2021)

    • intelligently run Regressor or Classifier model in evaluate.feature_importance()
    • improve model performance statistics in evaluate.feature_importance(): include AUC, Logloss, precision, recall for classification
  • v0.1.12 (June 11, 2021)

    • support fully qualified table names (FQTN) in the table parameter of publish.source_data()
    • trim timestamps in dataframe profiles to second grain
  • v0.1.11 (June 9, 2021)

    • hotfix for unexpected histogram output
  • v0.1.10 (June 8, 2021)

    • pin pyarrow dependency to versions below 4.0 to prevent segmentation fault errors
  • v0.1.9 (June 8, 2021)

    • improve model performance in evaluate.feature_importance() by adding test set to catboost eval
  • v0.1.8 (June 7, 2021)

    • evaluate.train_test_split() function supports non-timeseries dataframes
    • evaluate.feature_importance() function now runs on an 80% training set
    • adds timeseries_index parameter to evaluate.feature_importance() & prune.features() functions
  • v0.1.7 (June 2, 2021)

    • expands dataframe series type recognition for profiling
  • v0.1.6 (June 2, 2021)

    • cleans up dataframe profiles to enhance stats and visualization for non-numeric data
  • v0.1.5 (June 2, 2021)

    • introduces pip install "pyrasgo[df]" option which will install: shap, catboost, & scikit-learn
  • v0.1.4 (June 2, 2021)

    • various improvements to dataframe profiles & feature_importance
  • v0.1.3 (May 27, 2021)

    • introduces experiment tracking on dataframes
    • fixes errors when running feature_importance on dataframes with NaN values
  • v0.1.2 (May 26, 2021)

    • generates column profile automatically when running feature_importance
  • v0.1.1 (May 24, 2021)

    • supports sharing public dataframe profiles
    • enforces assignment of granularity to dimensions in Publish methods based on list ordering
  • v0.1.0 (May 17, 2021)

    • introduces dataframe methods: evaluate, prune, transform
    • supports free pyrasgo trial registration
  • v0.0.79 (April 19, 2021)

    • support additional datetime data types on Features
    • resolve import errors
  • v0.0.78 (April 5, 2021)

    • adds include_shared param to get_collections() method
  • v0.0.77 (April 5, 2021)

    • adds convenience method to rename a Feature’s displayName
    • adds convenience method to promote a Feature from Sandbox to Production status
    • fixes permissions bug when trying to read Community data sources from a public org
  • v0.0.76 (April 5, 2021)

    • adds columns to DataSource primitive
    • adds verbose error message to inform users when a Feature name conflict is preventing creation
  • v0.0.75 (April 5, 2021)

    • introduce interactive Rasgo primitives
  • v0.0.74 (March 25, 2021)

    • upgrade Snowflake python connector dependency to 2.4.0
    • upgrade pyarrow dependency to 3.0
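
The v0.2.6a1 note above describes passing a Python function that transforms a source into a new set of features. The sketch below shows only what such a function might look like, assuming the source arrives as a pandas DataFrame. The function name, the column names, and the sample data are all hypothetical, and the pyRasgo call that would register the function is deliberately left out because these notes do not document it.

```python
import pandas as pd


def transform(source_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical transform: derive two features from an orders source."""
    # Keep the dimension column so the new features can be joined back later.
    features = source_df[["order_id"]].copy()
    # Feature 1: total value of each order.
    features["order_total"] = source_df["quantity"] * source_df["unit_price"]
    # Feature 2: flag orders of ten or more units (threshold is illustrative).
    features["is_bulk_order"] = source_df["quantity"] >= 10
    return features


# Hypothetical stand-in for the parent source table.
source = pd.DataFrame(
    {
        "order_id": [1, 2, 3],
        "quantity": [2, 12, 5],
        "unit_price": [3.0, 1.5, 4.0],
    }
)

result = transform(source)
print(result)
```

Keeping the function free of side effects (it returns a new DataFrame and leaves the input untouched) makes it easy to test locally before handing it to any publishing step.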

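The v0.2.3 note on publish.features_from_source() says that when the features parameter is omitted, every column not in the dimensions list is registered as a feature. A minimal sketch of that defaulting rule follows. The helper name and the column names are hypothetical, the comparison is a plain case-sensitive string match, and this is an illustration of the rule as described rather than pyRasgo's actual implementation.

```python
from typing import List, Optional, Sequence


def default_feature_columns(
    columns: Sequence[str],
    dimensions: Sequence[str],
    features: Optional[Sequence[str]] = None,
) -> List[str]:
    """Hypothetical helper mirroring the defaulting rule in the v0.2.3 note."""
    # An explicit features list always wins.
    if features is not None:
        return list(features)
    # Otherwise every non-dimension column becomes a feature,
    # preserving the table's column order.
    dimension_set = set(dimensions)
    return [col for col in columns if col not in dimension_set]


table_columns = ["DATE", "STORE_ID", "SALES", "VISITS"]
print(default_feature_columns(table_columns, dimensions=["DATE", "STORE_ID"]))
print(default_feature_columns(table_columns, ["DATE", "STORE_ID"], features=["SALES"]))
```
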