Alpha version of the Rasgo Python interface.

Project description

PyRasgo is a Python SDK for interacting with the Rasgo API. Rasgo accelerates feature engineering for data scientists.

Visit us at https://www.rasgoml.com/ to turn your data into Features in minutes!

Documentation is available at: https://docs.rasgoml.com/rasgo-docs/pyrasgo/

Package Dependencies

  • idna>=3.3
  • more-itertools
  • pandas
  • pyarrow>=5.0.0
  • pydantic
  • pyyaml
  • requests
  • snowflake-connector-python>=2.7.0
  • tqdm
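
To check whether an environment meets the minimum versions listed above, something along these lines may help. This is a minimal sketch using only the standard library. The helper names `parse_version`, `at_least`, and `check_minimums` are hypothetical and not part of PyRasgo, and the simple numeric comparison ignores pre-release suffixes.

```python
from importlib import metadata

# Minimum versions copied from the dependency list above; packages listed
# without a version constraint are left out.
MINIMUMS = {
    "idna": "3.3",
    "pyarrow": "5.0.0",
    "snowflake-connector-python": "2.7.0",
}


def parse_version(text):
    """Turn '2.7.0' into (2, 7, 0), keeping only the leading numeric parts."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def at_least(installed, minimum):
    """Return True when `installed` is numerically >= `minimum`."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '5.0' and '5.0.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_minimums(minimums=MINIMUMS):
    """Return {package: status}; status is 'ok', 'too old', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if at_least(installed, minimum) else "too old"
    return report


if __name__ == "__main__":
    for package, status in check_minimums().items():
        print(f"{package}: {status}")
```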

Release Notes

  • v0.5.3 (Apr 6, 2022)

    • Transform creation tweaks

  • v0.5.2 (Apr 5, 2022)

    • Adds the table_name parameter to rasgo.publish.dataset() to specify the table name for the dataset
  • v0.5.1 (Apr 4, 2022)

    • Add deprecation warnings for pre-1.0 functions
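
The release note does not say how these warnings are raised. A common Python pattern, shown here as a sketch and not as PyRasgo's actual implementation, is a decorator that emits a `DeprecationWarning` on each call. The names `deprecated` and `old_function` are hypothetical.

```python
import functools
import warnings


def deprecated(message):
    """Decorator that emits a DeprecationWarning each time the function is called."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated: {message}",
                category=DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller, not the wrapper
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecated("it will be removed in 1.0")
def old_function(x):
    # Hypothetical stand-in for a pre-1.0 function.
    return x * 2


if __name__ == "__main__":
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure the warning is not filtered out
        result = old_function(21)
    print(result, caught[0].category.__name__)
```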
  • v0.5.0 (Mar 23, 2022)

    • Update Snowflake connection to use correct role
  • v0.4.36 (Mar 22, 2022)

    • Make ds.refresh_table() always complete the refresh by the time the function finishes running
  • v0.4.35 (Mar 16, 2022)

    • Adds a PyRasgo Primitive for an Accelerator
    • Adds the following methods for working with Accelerators in PyRasgo
      • rasgo.get.accelerator()
      • rasgo.get.accelerators()
      • rasgo.create.accelerator()
      • rasgo.delete.accelerator()
      • rasgo.create.dataset_from_accelerator()
      • Accelerator.apply()
  • v0.4.34 (Mar 11, 2022)

    • Add to_dbt() function to Datasets
      • Use this method to export a published Dataset as a DBT Model
    • Support tracking of dataset dependencies passed to transforms that accept lists of datasets (multi-join)
  • v0.4.33 (Mar 11, 2022)

    • Handles long running ds.refresh_table() process
  • v0.4.32 (Mar 10, 2022)

    • Fixed uses of apply transform
  • v0.4.31 (Mar 04, 2022)

    • Raise an error if you supply an arg to transform which doesn't exist
    • Fix dependency management for transform arguments of type table_list
  • v0.4.30 (Mar 01, 2022)

    • When ds.columns is called on a dataset retrieved from the API and its columns are not yet set, fetch, cache, and return them
    • New method rasgo.update.column() to set/update metadata about a ds column
  • v0.4.29 (Mar 01, 2022)

    • Bugfixes
  • v0.4.28 (Feb 28, 2022)

    • Fetches all datasets on a call to rasgo.get.datasets()
  • v0.4.27 (Feb 25, 2022)

    • Adds ability to set tags when creating a transform in the function rasgo.create.transform()
  • v0.4.26 (Feb 22, 2022)

    • Allow published datasets with tables as their output to be refreshed using dataset.refresh_table()
  • v0.4.25 (Feb 22, 2022)

    • Creates more informative generated Data Warehouse table names. Table and view names created in PyRasgo now follow this pattern:
      • RASGO_SDK__OP<op_num>__<transform_name>_transform__<guid>
    • Adds a proper error message, with steps to fix the problem, when publishing a DataFrame with incompatible pandas date types
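
The naming pattern in this entry can be illustrated with a small helper that fills in and parses the placeholders. This is only a sketch: `build_table_name` and `parse_table_name` are hypothetical names, and the GUID format (a `uuid4` hex string here) is an assumption, since the release note does not specify it.

```python
import re
import uuid


def build_table_name(op_num, transform_name, guid=None):
    """Fill in the documented pattern with the given values."""
    if guid is None:
        # Assumption: the source does not say how the GUID is formatted.
        guid = uuid.uuid4().hex
    return f"RASGO_SDK__OP{op_num}__{transform_name}_transform__{guid}"


# Regex mirroring the pattern; assumes the GUID contains no underscores.
NAME_PATTERN = re.compile(
    r"^RASGO_SDK__OP(?P<op_num>\d+)__"
    r"(?P<transform_name>.+)_transform__"
    r"(?P<guid>[^_]+)$"
)


def parse_table_name(name):
    """Return the pattern's parts as a dict, or None if the name does not match."""
    match = NAME_PATTERN.match(name)
    return match.groupdict() if match else None


if __name__ == "__main__":
    name = build_table_name(4, "join")
    print(name)
    print(parse_table_name(name))
```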
  • v0.4.24 (Feb 21, 2022)

    • Adds the optional parameter generate_stats to toggle stats generation when publishing with rasgo.publish.table/df() (defaults to True if not passed)
  • v0.4.23 (Feb 17, 2022)

    • Adds the parameter parents to specify parent dataset dependencies of table or pandas dataframe when publishing with rasgo.publish.table/df()
  • v0.4.22 (Feb 15, 2022)

    • Allows users to get the PyRasgo code used to generate a dataset with the function dataset.generate_py()
  • v0.4.21 (Feb 08, 2022)

    • Enable users to append to an existing Rasgo Dataset using rasgo.publish.df(fqtn="MY.FQTN.STRING", if_exists="append")
  • v0.4.20 (Feb 07, 2022)

    • Add render_only optional parameter to Dataset.transform() to support printing the SQL that will be executed by an applied transform instead of creating a new Dataset.
      • This option allows testing of transform arguments without having to execute the transform
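
The idea behind render_only can be sketched without a warehouse connection: render the SQL from a template and stop before executing it. The example below is a toy illustration, not PyRasgo code. `apply_transform`, `FILTER_TEMPLATE`, and the `executor` callback are hypothetical, the standard library's `string.Template` stands in for the real templating engine, and returning the rendered string is a choice made for this sketch.

```python
from string import Template

# Hypothetical transform template; string.Template is a stand-in here.
FILTER_TEMPLATE = Template("SELECT * FROM $source_table WHERE $condition")


def apply_transform(source_table, condition, render_only=False, executor=None):
    """Render the transform SQL; execute it only when render_only is False."""
    sql = FILTER_TEMPLATE.substitute(source_table=source_table, condition=condition)
    if render_only:
        # Show what would run, but do not touch the warehouse.
        print(sql)
        return sql
    if executor is None:
        raise ValueError("an executor is required when render_only is False")
    return executor(sql)


if __name__ == "__main__":
    apply_transform("DB.SCHEMA.ORDERS", "AMOUNT > 100", render_only=True)
```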
  • v0.4.19 (Feb 02, 2022)

    • Bug fixes
  • v0.4.18 (Feb 02, 2022)

    • Add optional rasgo.publish.dataset() parameter table_type to support materializing a dataset as a table instead of a view.
  • v0.4.17 (Feb 01, 2022)

    • Add Dataset.generate_yaml() to allow users to export their datasets and associated operation sets as a YAML string
    • Add Dataset.versions attribute to support retrieving all versions of a Dataset
  • v0.4.16 (Jan 31, 2022)

    • Add Dataset.run_stats() to allow users to trigger stats generation for a dataset
    • Add Dataset.profile() to give users a link to the Rasgo UI, where they can view details on their Dataset, including any generated stats
  • v0.4.15 (Jan 27, 2022)

    • Update timeseries tracking attribute name to time_index to match keyword
  • v0.4.14 (Jan 26, 2022)

    • Remove unnecessary import
  • v0.4.13 (Jan 26, 2022)

    • Add the ability to publish dataset attributes when publishing a dataset
  • v0.4.12 (Jan 21, 2022)

    • Change experimental_async to async_compute, default to True
  • v0.4.11 (Jan 25, 2022)

    • Bug fixes
  • v0.4.10 (Jan 24, 2022)

    • Adds dataset snapshot information to Dataset.snapshots and provides a hook to return a snapshot's data with Dataset.to_df(snapshot_index=<int>)
  • v0.4.9 (Jan 17, 2022)

    • Adds parameters filters, order_by, and columns to dataset.to_df() and dataset.preview() methods
  • v0.4.8 (Jan 14, 2022)

    • Adds experimental_async flag to transforms to take advantage of experimental long-running operation creation
  • v0.4.7 (Jan 13, 2022)

    • Return errors for operation creation
  • v0.4.6 (Jan 12, 2022)

    • Adds support for long running operation creations
  • v0.4.5 (Dec 21, 2021)

    • Fixes dependency installation
  • v0.4.4 (Dec 21, 2021)

    • Adds support for Python versions 3.7.12, 3.8, 3.9, and 3.10
  • v0.4.3 (Dec 17, 2021)

    • Adds the method rasgo.update.transform() to update a transform
  • v0.4.2 (Dec 15, 2021)

    • Adds the ability to reference Dataset attributes directly
      • Dataset.id
      • Dataset.name
      • Dataset.description
      • Dataset.status
      • Dataset.fqtn
      • Dataset.columns
      • Dataset.created_date
      • Dataset.update_date
      • Dataset.attributes
      • Dataset.dependencies
      • Dataset.sql
    • Adds a function for getting Datasets by fqtn
      • rasgo.get.dataset(fqtn='MY_FQTN')
  • v0.4.1 (Dec 13, 2021)

    • "Updates"
  • v0.4.0 (Dec 07, 2021)

    • Add Rasgo Datasets
      • Datasets are the new, single primitive available in Rasgo. Users can explore, transform, and create new data warehouse tables using this single primitive object.
      • Transforming a previously saved Dataset will produce a new Dataset definition that builds on top of the transformed Dataset. This new dataset will consist of a new operation that references the transformed Dataset as the source_table in the applied transform. Further transforms will add to the list of operations until .save is called to persist the created operations as a new Dataset in Rasgo.
      • New Rasgo Functions:
        • rasgo.get.datasets - Get a list of all available Datasets
        • rasgo.get.dataset - Get a single Dataset by ID, including the list of operations that created it (if they exist)
        • rasgo.update.dataset - Update name and description
        • rasgo.delete.dataset - Delete a Dataset
        • rasgo.publish.dataset - Save a new dataset to Rasgo. Can only save new Datasets that have been created by transforming old Datasets
        • rasgo.publish.df - Publish a Pandas DataFrame as a Rasgo Dataset
        • rasgo.publish.table - Publish an existing table as a Rasgo dataset
      • Dataset Primitive Functions:
        • Dataset.transform - Transform a previously existing Dataset with a given Transform to create a new Dataset definition
          • You can also reference transforms by name directly.
          • e.g. dataset.join(...) as opposed to dataset.transform(transform_name='join', ...)
        • Dataset.to_df - Read a Dataset into a Pandas DataFrame
        • Dataset.preview - Get a Pandas DataFrame consisting of the top 10 rows produced by this Dataset
      • Dataset Attributes:
        • Dataset.sql - A sql string representation of the operations that produce this dataset (if they exist)
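
The two behaviours described in this entry, transforms that accumulate as a list of operations and transforms that can be called by name, can be mimicked with a toy class. This is an illustrative sketch only and not the PyRasgo `Dataset` class: `ToyDataset` and its attributes are hypothetical, and apart from `join` the transform names are made up.

```python
class ToyDataset:
    """Stand-in for illustration only; not the PyRasgo Dataset class."""

    # "join" appears in the release note; the other names are hypothetical.
    KNOWN_TRANSFORMS = {"join", "filter", "aggregate"}

    def __init__(self, name, operations=()):
        self.name = name
        self.operations = list(operations)

    def transform(self, transform_name, **arguments):
        """Return a new definition with one more operation appended."""
        if transform_name not in self.KNOWN_TRANSFORMS:
            raise ValueError(f"Unknown transform: {transform_name}")
        new_op = {"transform_name": transform_name, "arguments": arguments}
        # The original object is left untouched; operations accumulate on the copy.
        return ToyDataset(self.name, self.operations + [new_op])

    def __getattr__(self, attr):
        # Called only when normal attribute lookup fails, so dataset.join(...)
        # is routed to dataset.transform(transform_name="join", ...).
        if attr in type(self).KNOWN_TRANSFORMS:
            def shortcut(**arguments):
                return self.transform(transform_name=attr, **arguments)
            return shortcut
        raise AttributeError(attr)


if __name__ == "__main__":
    orders = ToyDataset("orders")
    joined = orders.join(join_table="customers")
    filtered = joined.filter(condition="amount > 100")
    print([op["transform_name"] for op in filtered.operations])
```

In this sketch each call returns a new object, which mirrors the note that transforming a saved Dataset produces a new definition and leaves the original unchanged.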
  • v0.3.4 (Dec 03, 2021)

    • Temporary hotfix: DataSource.to_dict() returns sourceTable attribute as a table name, instead of fqtn. Plan is to revert to fqtn in a future version when publish methods offer first-class handling of fqtn.
  • v0.3.3 (Nov 08, 2021)

    • Added detailed Transform Argument Definitions during Transform creation
    • Allow null values for User Defined Transform arguments
  • v0.3.2 (Oct 13, 2021)

    • Adds Jinja as the templating engine for User Defined Transforms
    • Source transforms may now be previewed, tested and deleted to enable a full creation experience.
    • Adds Rasgo template functions to enable dynamic template building
  • v0.3.1 (Sept 27, 2021)

    • Adds filter and limit params to read.collection_snapshot_data function
    • Fixes Collection response model bug
  • v0.3.0 (Sept 22, 2021)

    • Deprecates FeatureSet primitive (see docs for migration path: https://docs.rasgoml.com/rasgo-docs/pyrasgo-version-log/version-0.3)
    • Adds support for creating features using python source code
    • Adds support for user-defined transformation functionality
    • Adds methods to interact with Collection snapshots (DEPRECATED):
      • get.collection_snapshots()
      • read.collection_snapshot_data()
    • Adds methods to Collection primitive:
      • .preview() to view data in a pandas df
      • .get_compatible_features() to list features available to join
    • Adds .to_dict and .to_yml methods to DataSource primitive
  • v0.2.5 (Aug 18, 2021)

    • adds handling and user notification for highly null dataframes which would otherwise not function well with evaluate.profile or evaluate.feature_importance
  • v0.2.4 (Aug 4, 2021)

    • supports tables named as restricted Snowflake keywords (e.g. ACCOUNT, CASE, ORDER) to be registered as Rasgo Sources
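
In Snowflake, an identifier that collides with a reserved keyword has to be wrapped in double quotes, and quoted identifiers are case-sensitive. The sketch below shows that quoting step in isolation. `quote_if_reserved` and `qualify` are hypothetical helpers, not PyRasgo functions, and `RESERVED` holds only the three examples from the release note, not the full keyword list.

```python
# Only the examples named in the release note; not Snowflake's full list.
RESERVED = {"ACCOUNT", "CASE", "ORDER"}


def quote_if_reserved(name):
    """Wrap the identifier in double quotes when it is a reserved keyword."""
    if name.upper() in RESERVED:
        # Quoted identifiers are case-sensitive, so the casing is kept as given.
        return f'"{name}"'
    return name


def qualify(database, schema, table):
    """Build a fully qualified table name, quoting parts where needed."""
    return ".".join(quote_if_reserved(part) for part in (database, schema, table))


if __name__ == "__main__":
    print(qualify("DB", "PUBLIC", "ORDER"))
    print(qualify("DB", "PUBLIC", "CUSTOMERS"))
```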
  • v0.2.3 (July 30, 2021)

    • introduces publish.features_from_source_code() function. This function allows users to pass a custom SQL string to create a view in Snowflake using their own code. It registers a child source based on the parent source provided as input, then registers features from the new child source table.
    • introduces new workflow to publish.source_data() function. Pass in source_type="sql", sql_definition="<valid sql select string>" to create a new Rasgo DataSource as a view in Snowflake using custom SQL.
    • makes the features parameter optional in publish.features_from_source() function. If param is not passed, all columns in the underlying table that are not in the dimensions list will be registered as features
    • adds trigger_stats parameter to all publish methods. When set to False, statistical profiling will not run for the data objects being published. Default = True
    • adds verbose parameter to all publish methods. When set to True, prints status messages to stdout to update users on progress of the function. Default = False.
    • introduces .sourceCode property on Rasgo DataSource and FeatureSet classes to display the SQL or python source code used to build the underlying table
    • introduces .render_sql_definition() method on Collection class to display the SQL used to create the underlying collection view
    • introduces .dimensions property on Rasgo Collection class to display all unique dimension columns in a Collection
    • introduces trigger_stats parameter in collection.generate_training_data() method to allow users to generate a sql view without kicking off correlation and join stats. Set to False to suppress stats jobs. Default=True.
    • Add support for optional catboost parameter train_dir in evaluate.feature_importance() function, which allows users to dictate where temporary training files are generated
  • v0.2.2 (July 14, 2021)

    • Allow for consistency in evaluate.feature_importance() evaluation metrics for unchanged dataframes
    • Allow users to control certain CatBoost parameters when running evaluate.feature_importance()
  • v0.2.1 (July 01, 2021)

    • expand evaluate.feature_importance() to support calculating importance for collections
  • v0.2.0 (June 24, 2021)

    • introduce publish.experiment() method to fast track dataframes to Rasgo objects
    • fix register bug
  • v0.1.14 (June 17, 2021)

    • improve new user signup experience in register() method
    • fix dataframe bug when experiment wasn't set
  • v0.1.13 (June 16, 2021)

    • intelligently run Regressor or Classifier model in evaluate.feature_importance()
    • improve model performance statistics in evaluate.feature_importance(): include AUC, Logloss, precision, recall for classification
  • v0.1.12 (June 11, 2021)

    • support fqtn in publish.source_data(table) parameter
    • trim timestamps in dataframe profiles to second grain
  • v0.1.11 (June 9, 2021)

    • hotfix for unexpected histogram output
  • v0.1.10 (June 8, 2021)

    • pin pyarrow dependency to < version 4.0 to prevent segmentation fault errors
  • v0.1.9 (June 8, 2021)

    • improve model performance in evaluate.feature_importance() by adding test set to catboost eval
  • v0.1.8 (June 7, 2021)

    • evaluate.train_test_split() function supports non-timeseries dataframes
    • evaluate.feature_importance() function now runs on an 80% training set
    • adds timeseries_index parameter to evaluate.feature_importance() & prune.features() functions
  • v0.1.7 (June 2, 2021)

    • expands dataframe series type recognition for profiling
  • v0.1.6 (June 2, 2021)

    • cleans up dataframe profiles to enhance stats and visualization for non-numeric data
  • v0.1.5 (June 2, 2021)

    • introduces pip install "pyrasgo[df]" option which will install: shap, catboost, & scikit-learn
  • v0.1.4 (June 2, 2021)

    • various improvements to dataframe profiles & feature_importance
  • v0.1.3 (May 27, 2021)

    • introduces experiment tracking on dataframes
    • fixes errors when running feature_importance on dataframes with NaN values
  • v0.1.2 (May 26, 2021)

    • generates column profile automatically when running feature_importance
  • v0.1.1 (May 24, 2021)

    • supports sharing public dataframe profiles
    • enforces assignment of granularity to dimensions in Publish methods based on list ordering
  • v0.1.0 (May 17, 2021)

    • introduces dataframe methods: evaluate, prune, transform
    • supports free pyrasgo trial registration
  • v0.0.79 (April 19, 2021)

    • support additional datetime data types on Features
    • resolve import errors
  • v0.0.78 (April 5, 2021)

    • adds include_shared param to get_collections() method
  • v0.0.77 (April 5, 2021)

    • adds convenience method to rename a Feature’s displayName
    • adds convenience method to promote a Feature from Sandbox to Production status
    • fixes permissions bug when trying to read Community data sources from a public org
  • v0.0.76 (April 5, 2021)

    • adds columns to DataSource primitive
    • adds verbose error message to inform users when a Feature name conflict is preventing creation
  • v0.0.75 (April 5, 2021)

    • introduce interactive Rasgo primitives
  • v0.0.74 (March 25, 2021)

    • upgrade Snowflake python connector dependency to 2.4.0
    • upgrade pyarrow dependency to 3.0

Project details



This version

0.5.3

Download files


Source Distribution

pyrasgo-0.5.3.tar.gz (93.6 kB)

Uploaded Source

Built Distribution

pyrasgo-0.5.3-py3-none-any.whl (121.8 kB)

Uploaded Python 3
