Alpha version of the Rasgo Python interface.

Project description

pyRasgo is a Python SDK for interacting with the Rasgo API. Rasgo accelerates feature engineering for data scientists.

Visit us at https://www.rasgoml.com/ to turn your data into Features in minutes!

Documentation is available at: https://docs.rasgoml.com/rasgo-docs/pyrasgo/

Package Dependencies

  • idna>=2.5,<3
  • more-itertools
  • pandas
  • pyarrow>=3.0
  • pydantic
  • pyyaml
  • requests
  • snowflake-connector-python>=2.4.0
  • tqdm
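
The version bounds in this list can be checked against an existing environment before installing. The snippet below is a minimal sketch, not part of pyrasgo: the helper names (`release_tuple`, `satisfies`, `check_installed`) are hypothetical, and the comparison is a simplification that only looks at the leading numeric parts of a version string, so pre-release suffixes are ignored.

```python
from importlib import metadata

# Bounds copied from the dependency list above: (minimum, exclusive maximum).
# None means the list states no bound on that side.
CONSTRAINTS = {
    "idna": ("2.5", "3"),
    "pyarrow": ("3.0", None),
    "snowflake-connector-python": ("2.4.0", None),
}


def release_tuple(version):
    """Turn '2.10.1' into (2, 10, 1), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def _pad(a, b):
    """Right-pad two tuples with zeros so that (3,) and (3, 0) compare equal."""
    n = max(len(a), len(b))
    return a + (0,) * (n - len(a)), b + (0,) * (n - len(b))


def satisfies(version, minimum=None, maximum=None):
    """Return True if minimum <= version < maximum under the simplified rules."""
    v = release_tuple(version)
    if minimum is not None:
        left, right = _pad(v, release_tuple(minimum))
        if left < right:
            return False
    if maximum is not None:
        left, right = _pad(v, release_tuple(maximum))
        if left >= right:
            return False
    return True


def check_installed():
    """Map each constrained package to True, False, or None (not installed)."""
    report = {}
    for name, (low, high) in CONSTRAINTS.items():
        try:
            report[name] = satisfies(metadata.version(name), low, high)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report
```

Calling `check_installed()` returns a dictionary keyed by package name. For strict PEP 440 semantics, a dedicated version-parsing library is a better choice than this hand-rolled comparison.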

Release Notes

  • v0.2.3(Alpha)

    • introduces publish.features_from_source_code() function. This function lets customers pass a custom SQL string to create a view in Snowflake using their own code. It registers a child source based on the parent source provided as input, then registers features from the new child source table. (NOTE: custom Python functionality is coming later; the MVP supports custom SQL only)
    • introduces new workflow to publish.source_data() function. Pass in source_type="sql", sql_definition="<valid sql select string>" to create a new Rasgo DataSource as a view in Snowflake using custom SQL. (NOTE: custom Python functionality is coming later; the MVP supports custom SQL only)
    • makes the features parameter optional in publish.features_from_source() function. If the parameter is not passed, all columns in the underlying table that are not in the dimensions list are registered as features
    • adds verbose parameter to all publish methods. When set to True, prints status messages to stdout to update users on progress of the function. Default = False.
    • introduces .sourceCode property on Rasgo DataSource and FeatureSet classes to display the SQL or Python source code used to build the underlying table
    • introduces .display_source_code() method on Rasgo DataSource, FeatureSet, Feature classes to display the SQL or Python code used to build the underlying table (NOTE: This is redundant with the above property. Including both in alpha preview for feedback on which experience is better)
    • introduces .rebuild_from_source_code() method on Rasgo DataSource, FeatureSet, Feature classes to run the SQL or Python code used to build the underlying table, effectively rebuilding that table. (NOTE: This is not functional in alpha preview. More work is needed before including this in next version push.)
    • introduces .render_sql_definition() method on Collection class to display the SQL used to create the underlying collection view
    • introduces .dimensions property on Rasgo Collection class to display all unique dimension columns in a Collection
    • introduces trigger_stats parameter in collection.generate_training_data() method to allow users to generate a SQL view without kicking off correlation and join stats jobs. Set to False to suppress stats jobs. Default = True. (NOTE: This is not fully functional in alpha preview; more API work is needed before adding this to next version push. Only including for feedback on experience, not functionality.)
  • v0.2.2(July 14, 2021)

    • Allow for consistency in evaluate.feature_importance() evaluation metrics for unchanged dataframes
    • Allow users to control certain CatBoost parameters when running evaluate.feature_importance()
  • v0.2.1(July 01, 2021)

    • expand evaluate.feature_importance() to support calculating importance for collections
  • v0.2.0(June 24, 2021)

    • introduce publish.experiment() method to fast track dataframes to Rasgo objects
    • fix register bug
  • v0.1.14(June 17, 2021)

    • improve new user signup experience in register() method
    • fix dataframe bug when experiment wasn't set
  • v0.1.13(June 16, 2021)

    • intelligently run Regressor or Classifier model in evaluate.feature_importance()
    • improve model performance statistics in evaluate.feature_importance(): include AUC, Logloss, precision, recall for classification
  • v0.1.12(June 11, 2021)

    • support fully qualified table names (fqtn) in the table parameter of publish.source_data()
    • trim timestamps in dataframe profiles to second grain
  • v0.1.11(June 9, 2021)

    • hotfix for unexpected histogram output
  • v0.1.10(June 8, 2021)

    • pin pyarrow dependency to versions below 4.0 to prevent segmentation fault errors
  • v0.1.9(June 8, 2021)

    • improve model performance in evaluate.feature_importance() by adding test set to catboost eval
  • v0.1.8(June 7, 2021)

    • evaluate.train_test_split() function supports non-timeseries dataframes
    • evaluate.feature_importance() function now runs on an 80% training set
    • adds timeseries_index parameter to evaluate.feature_importance() & prune.features() functions
  • v0.1.7(June 2, 2021)

    • expands dataframe series type recognition for profiling
  • v0.1.6(June 2, 2021)

    • cleans up dataframe profiles to enhance stats and visualization for non-numeric data
  • v0.1.5(June 2, 2021)

    • introduces pip install "pyrasgo[df]" option which will install: shap, catboost, & scikit-learn
  • v0.1.4(June 2, 2021)

    • various improvements to dataframe profiles & feature_importance
  • v0.1.3(May 27, 2021)

    • introduces experiment tracking on dataframes
    • fixes errors when running feature_importance on dataframes with NaN values
  • v0.1.2(May 26, 2021)

    • generates column profile automatically when running feature_importance
  • v0.1.1(May 24, 2021)

    • supports sharing public dataframe profiles
    • enforces assignment of granularity to dimensions in Publish methods based on list ordering
  • v0.1.0(May 17, 2021)

    • introduces dataframe methods: evaluate, prune, transform
    • supports free pyrasgo trial registration
  • v0.0.79(April 19, 2021)

    • support additional datetime data types on Features
    • resolve import errors
  • v0.0.78(April 5, 2021)

    • adds include_shared param to get_collections() method
  • v0.0.77(April 5, 2021)

    • adds convenience method to rename a Feature’s displayName
    • adds convenience method to promote a Feature from Sandbox to Production status
    • fixes permissions bug when trying to read Community data sources from a public org
  • v0.0.76(April 5, 2021)

    • adds columns to DataSource primitive
    • adds verbose error message to inform users when a Feature name conflict is preventing creation
  • v0.0.75(April 5, 2021)

    • introduce interactive Rasgo primitives
  • v0.0.74(March 25, 2021)

    • upgrade Snowflake python connector dependency to 2.4.0
    • upgrade pyarrow dependency to 3.0
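
The v0.2.3 note on publish.features_from_source() says that when the features parameter is omitted, every column in the underlying table that is not in the dimensions list is registered as a feature. The snippet below is a minimal sketch of that selection rule only. It does not call pyrasgo, and the function name `resolve_features` and the column names are hypothetical, chosen for illustration.

```python
def resolve_features(columns, dimensions, features=None):
    """Pick the feature columns for a table.

    If features is given, it is returned unchanged. Otherwise every column
    that is not listed as a dimension is treated as a feature, keeping the
    table's column order.
    """
    if features is not None:
        return list(features)
    dimension_set = set(dimensions)
    return [column for column in columns if column not in dimension_set]


# Hypothetical table layout, for illustration only.
table_columns = ["DATE", "TICKER", "CLOSE", "VOLUME"]
table_dimensions = ["DATE", "TICKER"]

print(resolve_features(table_columns, table_dimensions))
# prints ['CLOSE', 'VOLUME']
```

Passing an explicit list, for example `resolve_features(table_columns, table_dimensions, features=["CLOSE"])`, bypasses the default and returns only the named columns. How the library itself matches column names (for instance, case sensitivity) is not stated in these notes, so this sketch compares names exactly.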

Download files

Source Distribution

pyrasgo-0.2.3a1.tar.gz (56.0 kB)

Uploaded Source

Built Distribution

pyrasgo-0.2.3a1-py3-none-any.whl (73.1 kB)

Uploaded Python 3
