The machine learning client library for interacting with Snowflake to build machine learning solutions.
Snowpark ML
Snowpark ML is a set of tools, including SDKs and underlying infrastructure, to build and deploy machine learning models. With Snowpark ML, you can preprocess data, train, manage, and deploy ML models all within Snowflake using a single SDK, and benefit from Snowflake's proven performance, scalability, stability, and governance at every stage of the machine learning workflow.
Key Components of Snowpark ML
The Snowpark ML Python SDK provides a number of APIs to support each stage of an end-to-end machine learning development and deployment process, and includes two key components.
Snowpark ML Development [Public Preview]
Snowpark ML Development provides a collection of Python APIs enabling efficient ML model development directly in Snowflake:

- Modeling API (`snowflake.ml.modeling`) for data preprocessing, feature engineering and model training in Snowflake. This includes the `snowflake.ml.modeling.preprocessing` module for scalable data transformations on large data sets utilizing the compute resources of underlying Snowpark Optimized High Memory Warehouses, and a large collection of ML model development classes based on sklearn, xgboost, and lightgbm. A sketch of a typical flow follows this list.
- Framework Connectors: Optimized, secure and performant data provisioning for Pytorch and Tensorflow frameworks in their native data loader formats.
- FileSet API: FileSet provides a Python fsspec-compliant API for materializing data into a Snowflake internal stage from a query or Snowpark Dataframe, along with a number of convenience APIs.
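The following sketch shows a typical Modeling API flow, scaling features and training a classifier entirely on warehouse compute. The session setup is omitted and the column names are placeholders; consult the developer guide for the authoritative API.

```python
from snowflake.ml.modeling.preprocessing import StandardScaler
from snowflake.ml.modeling.xgboost import XGBClassifier

# train_df is assumed to be an existing Snowpark DataFrame;
# FEATURE_COLS and "LABEL" are placeholder column names.
FEATURE_COLS = ["AGE", "INCOME"]

# Fit and apply the scaler inside Snowflake, without pulling data client-side.
scaler = StandardScaler(input_cols=FEATURE_COLS, output_cols=FEATURE_COLS)
train_df = scaler.fit(train_df).transform(train_df)

# Train an XGBoost classifier through the sklearn-style interface.
clf = XGBClassifier(
    input_cols=FEATURE_COLS,
    label_cols=["LABEL"],
    output_cols=["PREDICTION"],
)
clf.fit(train_df)
predictions = clf.predict(train_df)  # returns a Snowpark DataFrame
```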
Snowpark Model Management [Public Preview]
Snowpark Model Management complements the Snowpark ML Development API and provides model management capabilities along with integrated deployment into Snowflake. Currently, the API consists of:

- Registry: A Python API for managing models within Snowflake, which also supports deployment of ML models into Snowflake as native MODEL objects running in a Snowflake warehouse (see the sketch below).
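Here is a minimal sketch of the registry flow. The model name, version name, and dataframes are placeholders, and `clf` stands for a fitted model such as the classifier trained above:

```python
from snowflake.ml.registry import Registry

# session is an existing snowflake.snowpark.Session.
reg = Registry(session=session)

# Log the model as a native MODEL object; a ModelVersion is returned.
mv = reg.log_model(clf, model_name="MY_MODEL", version_name="V1")

# Later: look the version up again and run inference in the warehouse.
mv = reg.get_model("MY_MODEL").version("V1")
result_df = mv.run(test_df, function_name="predict")
```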
Getting started
Have your Snowflake account ready
If you don't have a Snowflake account yet, you can sign up for a 30-day free trial account.
Installation
Follow the installation instructions in the Snowflake documentation.
Python versions 3.8, 3.9 and 3.10 are supported. You can use Miniconda or Anaconda to create a Conda environment (recommended), or virtualenv to create a virtual environment.
Conda channels
The Snowflake Conda Channel contains the official Snowpark ML package releases.
The recommended approach is to install `snowflake-ml-python` from this conda channel:
```bash
conda install \
  -c https://repo.anaconda.com/pkgs/snowflake \
  --override-channels \
  snowflake-ml-python
```
See the developer guide for installation instructions.
The latest version of the `snowflake-ml-python` package is also published in a conda channel in this repository. Package versions in this channel may not yet be present in the official Snowflake conda channel. Install `snowflake-ml-python` from this channel with the following (being sure to replace `<version_specifier>` with the desired version, e.g. `1.0.10`):
```bash
conda install \
  -c https://raw.githubusercontent.com/snowflakedb/snowflake-ml-python/conda/releases/ \
  -c https://repo.anaconda.com/pkgs/snowflake \
  --override-channels \
  snowflake-ml-python==<version_specifier>
```
Note that until a snowflake-ml-python package version is available in the official Snowflake conda channel, there may be compatibility issues: server-side functionality that snowflake-ml-python depends on may not yet be released.
Release History
1.2.0
Bug Fixes
- Model Registry: Fix "XGBoost version not compiled with GPU support" error when running CPU inference against open-source XGBoost models deployed to SPCS.
- Model Registry: Fix model deployment to SPCS on Windows machines.
New Features
- Model Development: Introduced XGBoost external memory training feature. This feature enables training XGBoost models on large datasets that don't fit into memory.
- Registry: New `Registry` class named `snowflake.ml.registry.Registry`, providing similar APIs as the old one but working with the new MODEL object in Snowflake SQL. We also provide `snowflake.ml.model.Model` and `snowflake.ml.model.ModelVersion` to represent a model and a specific version of a model.
- Model Development: Add support for `fit_predict` method in `AgglomerativeClustering`, `DBSCAN`, and `OPTICS` classes (see the sketch below).
- Model Development: Add support for `fit_transform` method in `MDS`, `SpectralEmbedding` and `TSNE` classes.
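As an illustrative sketch of the new `fit_predict` support, clustering a Snowpark DataFrame with `DBSCAN` (column names are placeholders):

```python
from snowflake.ml.modeling.cluster import DBSCAN

# df is an existing Snowpark DataFrame with the feature columns below.
dbscan = DBSCAN(
    input_cols=["X1", "X2"],     # placeholder feature columns
    output_cols=["CLUSTER_ID"],  # column that will hold the cluster label
)
labels_df = dbscan.fit_predict(df)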
Additional Notes
- Model Registry: `snowflake.ml.registry.model_registry.ModelRegistry` has been deprecated starting from version 1.2.0. It will stay in the Private Preview phase. For future implementations, please use `snowflake.ml.registry.Registry`, except when specifically required. The old model registry will be removed once all its primary functionalities are fully integrated into the new registry.
1.1.2
Bug Fixes
- Generic: Fix an issue where the stack trace is unexpectedly hidden by telemetry.
- Model Development: Execute model signature inference without materializing the full dataframe in memory.
- Model Registry: Fix occasional 'snowflake-ml-python library does not exist' error when deploying to SPCS.
Behavior Changes
- Model Registry: When calling `predict` with a Snowpark DataFrame, both inferred and normalized column names are accepted.
- Model Registry: When logging a Snowpark ML Modeling Model, sample input data or a manually provided signature will be ignored since they are not necessary.
New Features
- Model Development: SQL implementation of binary `precision_score` metric.
1.1.1
Bug Fixes
- Model Registry: The `predict` target method on registered models is now compatible with unsupervised estimators.
- Model Development: Fix incorrect `confusion_matrix` results when the number of rows is not divisible by the batch size.
New Features
- Introduced `passthrough_cols` param in the Modeling API. This new param is helpful in scenarios requiring automatic `input_cols` inference, where specific columns, like index columns, should nevertheless be excluded during training or inference. A sketch follows.
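A sketch of the intended usage, assuming the parameter is spelled `passthrough_cols` and using placeholder column names:

```python
from snowflake.ml.modeling.linear_model import LinearRegression

# input_cols is omitted, so the feature columns are inferred from train_df;
# ROW_ID is carried through untouched instead of being treated as a feature.
lr = LinearRegression(
    label_cols=["TARGET"],
    passthrough_cols=["ROW_ID"],
    output_cols=["PREDICTION"],
)
lr.fit(train_df)  # train_df is an existing Snowpark DataFrame
```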
1.1.0
Bug Fixes
- Model Registry: Fix pandas dataframe input not handling the first row properly.
- Model Development: `OrdinalEncoder` and `LabelEncoder` `output_columns` do not need to be valid Snowflake identifiers. They would previously be excluded if the normalized name did not match the name specified in `output_columns`.
New Features
- Model Registry: Add support for invoking a public endpoint on the SPCS service, by providing the "enable_ingress" SPCS deployment option.
- Model Development: Add support for distributed HPO - GridSearchCV and RandomizedSearchCV execution will be distributed on multi-node warehouses.
1.0.12
Bug Fixes
- Model Registry: Fix a regression where container logging is not shown during model deployment to SPCS.
- Model Development: Enhance the column capacity of OrdinalEncoder.
- Model Registry: Fix unbound `batch_size` error when deploying a model other than Hugging Face Pipeline and LLM with GPU on SPCS.
Behavior Changes
- Model Registry: Raise an early error when deploying to SPCS with a db/schema name that starts with an underscore.
- Model Registry: The `conda-forge` channel is now automatically added to channel lists when deploying to SPCS.
- Model Registry: `relax_version` will no longer strip all version specifiers; instead it will relax `==x.y.z` specifiers to `>=x.y, <(x+1)`.
- Model Registry: A Python version with a different patch level but the same major and minor version will no longer result in a warning when loading the model via the Model Registry, and will be considered for use when deploying to SPCS.
- Model Registry: When logging a `snowflake.ml.model.models.huggingface_pipeline.HuggingFacePipelineModel` object, versions of locally installed libraries won't be picked as dependencies of models; instead some pre-defined dependencies are picked up to improve user experience.
New Features
- Model Registry: Enable best-effort SPCS job/service log streaming when logging level is set to INFO.
1.0.11
New Features
- Model Registry: Add `log_artifact()` public method.
- Model Development: Add support for `kneighbors`.
Behavior Changes
- Model Registry: Change `log_model()` argument from `TrainingDataset` to a list of `Artifact`.
- Model Registry: Change `get_training_dataset()` to `get_artifact()`.
Bug Fixes
- Model Development: Fix support for XGBoost and LightGBM models using sklearn `GridSearchCV` and `RandomizedSearchCV` model selectors.
- Model Development: `DecimalType` is now supported as a `DataType`.
- Model Development: Fix metrics compatibility with Snowpark Dataframes that use Snowflake identifiers.
- Model Registry: Resolve `delete_deployment` not deleting the SPCS service in certain cases.
1.0.10
Behavior Changes
- Model Development: `precision_score`, `recall_score`, `f1_score`, `fbeta_score`, `precision_recall_fscore_support`, `mean_absolute_error`, `mean_squared_error`, and `mean_absolute_percentage_error` metric calculations are now distributed (see the sketch after this list).
- Model Registry: `deploy` will now return `Deployment` for deployment information.
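These distributed metrics are invoked against Snowpark DataFrames; a sketch with placeholder column names:

```python
from snowflake.ml.modeling.metrics import mean_squared_error

# pred_df is a Snowpark DataFrame holding both labels and predictions;
# the computation is pushed down to the warehouse.
mse = mean_squared_error(
    df=pred_df,
    y_true_col_names="LABEL",
    y_pred_col_names="PREDICTION",
)
```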
New Features
- Model Registry: When the model signature is auto-inferred, it will be printed to the log for reference.
- Model Registry: For SPCS deployment, `Deployment` details will contain `image_name`, `service_spec` and `service_function_sql`.
Bug Fixes
- Model Development: Fix an issue leading to UTF-8 decoding errors when using modeling modules on Windows.
- Model Development: Fix an issue where alias definitions cause `SnowparkSQLUnexpectedAliasException` during inference.
- Model Registry: Fix an issue where signature inference could be incorrect when using a Snowpark DataFrame as sample input.
- Model Registry: Fix overly strict data type validation when predicting. For example, if the signature has an INT8 feature and an INT64 dataframe is provided whose values are all within range, prediction will no longer fail.
1.0.9 (2023-09-28)
Behavior Changes
- Model Development: `log_loss` metric calculation is now distributed.
Bug Fixes
- Model Registry: Fix an issue where building images fails with specific Docker setups.
- Model Registry: Fix an issue where the local ML library could not be embedded when the library is imported by `zipimport`.
- Model Registry: Fix out-of-date documentation about the `platform` argument of the `deploy` function.
- Model Registry: Fix an issue where a GPU-trained PyTorch model could not be deployed to a platform where GPU is not available.
1.0.8 (2023-09-15)
Bug Fixes
- Model Development: Ordinal encoder can be used with mixed input column types.
- Model Development: Fix an issue when the sklearn default value is `np.nan`.
- Model Registry: Fix an issue where an incorrect docker executable is used when building images.
- Model Registry: Fix an issue where specifying the `token` argument when using `snowflake.ml.model.models.huggingface_pipeline.HuggingFacePipelineModel` with `transformers < 4.32.0` is not effective.
- Model Registry: Fix an issue where an incorrect system function call is used when deploying to SPCS.
- Model Registry: Fix an issue when using a `transformers.pipeline` that does not have a `tokenizer`.
- Model Registry: Fix incorrectly-inferred image repository name during model deployment to SPCS.
- Model Registry: Fix GPU resource retention issue caused by failed or stuck previous deployments in SPCS.
1.0.7 (2023-09-05)
Bug Fixes
- Model Development & Model Registry: Fix an error related to `pandas.io.json.json_normalize`.
- Allow disabling telemetry.
1.0.6 (2023-09-01)
New Features
- Model Registry: Add `create_if_not_exists` parameter to the constructor.
- Model Registry: Added `get_or_create_model_registry` API.
- Model Registry: Added support for using GPU inference when deploying XGBoost (`xgboost.XGBModel` and `xgboost.Booster`), PyTorch (`torch.nn.Module` and `torch.jit.ScriptModule`) and TensorFlow (`tensorflow.Module` and `tensorflow.keras.Model`) models to Snowpark Container Services.
- Model Registry: When inferring model signature, `Sequence` of built-in types, `Sequence` of `numpy.ndarray`, `Sequence` of `torch.Tensor`, and `Sequence` of `tensorflow.Tensor` can be used instead of only `List` of them.
- Model Registry: Added `get_training_dataset` API.
- Model Development: Size of metrics result can exceed the previous 8MB limit.
- Model Registry: Added support to save/load/deploy HuggingFace pipeline objects (`transformers.Pipeline`) and our wrapper (`snowflake.ml.model.models.huggingface_pipeline.HuggingFacePipelineModel`) for them. Use the wrapper to specify configurations; the model for the pipeline will be loaded dynamically when deploying. Currently, the following tasks are supported to log without manually specifying model signatures:
- "conversational"
- "fill-mask"
- "question-answering"
- "summarization"
- "table-question-answering"
- "text2text-generation"
- "text-classification" (alias "sentiment-analysis" available)
- "text-generation"
- "token-classification" (alias "ner" available)
- "translation"
- "translation_xx_to_yy"
- "zero-shot-classification"
Bug Fixes
- Model Development: Fixed a bug when using `SimpleImputer` with numpy >= 1.25.
- Model Development: Fixed a bug when inferring the type of label columns.
Behavior Changes
- Model Registry: `log_model()` now returns a `ModelReference` object instead of a model ID.
- Model Registry: When deploying a model with only one target method, the `target_method` argument can be omitted.
- Model Registry: When using snowflake-ml-python with a version newer than what is available in the Snowflake Anaconda Channel, the `embed_local_ml_library` option will be set to `True` automatically if not specified.
- Model Registry: When deploying a model to Snowpark Container Services and using GPU, the default value of `num_workers` will be 1.
- Model Registry: `keep_order` and `output_with_input_features` in the deploy options have been removed. The behavior is now controlled by the type of the input when calling `model.predict()`. If the input is a `pandas.DataFrame`, the behavior will be the same as `keep_order=True` and `output_with_input_features=False` before. If the input is a `snowpark.DataFrame`, the behavior will be the same as `keep_order=False` and `output_with_input_features=True` before.
- Model Registry: When logging and deploying PyTorch (`torch.nn.Module` and `torch.jit.ScriptModule`) and TensorFlow (`tensorflow.Module` and `tensorflow.keras.Model`) models, we no longer accept models whose input is a list of tensors and whose output is a list of tensors. Instead, we now accept models whose input is one or more tensors as positional arguments, and whose output is a tensor or a tuple of tensors. The input and output dataframes when predicting remain the same as before; that is, every column is an array feature and contains a tensor.
1.0.5 (2023-08-17)
New Features
- Model Registry: Added support to save/load/deploy xgboost `Booster` models.
- Model Registry: Added support to get the model name and the model version from model references.
Bug Fixes
- Model Registry: Restore the db/schema back to the session after `create_model_registry()`.
- Model Registry: Fixed an issue where the UDF name created when deploying a model is not identical to what is provided and cannot be correctly dropped when the deployment is dropped.
- `connection_params.SnowflakeLoginOptions()`: Added support for `private_key_path`. A sketch follows.
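A sketch of building a session from these login options (the connection configuration, including any `private_key_path` entry, is assumed to be set up out of band):

```python
from snowflake.ml.utils import connection_params
from snowflake.snowpark import Session

# Reads connection settings (account, user, and, for key-pair
# authentication, private_key_path) from the standard configuration.
params = connection_params.SnowflakeLoginOptions()
session = Session.builder.configs(params).create()
```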
1.0.4 (2023-07-28)
New Features
- Model Registry: Added support to save/load/deploy Tensorflow models (`tensorflow.Module`).
- Model Registry: Added support to save/load/deploy MLFlow PyFunc models (`mlflow.pyfunc.PyFuncModel`).
- Model Development: Input dataframes can now be joined against data loaded from staged files.
- Model Development: Added support for non-English languages.
Bug Fixes
- Model Registry: Fix an issue that model dependencies are incorrectly reported as unresolvable on certain platforms.
1.0.3 (2023-07-14)
Behavior Changes
- Model Registry: When predicting with a model whose output is a list of NumPy ndarrays, the output will not be flattened; instead, every ndarray will act as a feature (column) in the output.
New Features
- Model Registry: Added support to save/load/deploy PyTorch models (`torch.nn.Module` and `torch.jit.ScriptModule`).
Bug Fixes
- Model Registry: Fix an issue where the model registry cannot be created when the database or schema name provided to `create_model_registry` contains special characters.
- Model Registry: Fix an issue where `get_model_description` returns with additional quotes.
- Model Registry: Fix incorrect error message when attempting to remove an unset tag of a model.
- Model Registry: Fix a typo in the default deployment table name.
- Model Registry: A Snowpark dataframe used as sample input, or as input to the `predict` method, that contains a column with the Snowflake `NUMBER(precision, scale)` data type where `scale = 0` will no longer lead to an error, and will now be correctly recognized as the `INT64` data type in the model signature.
- Model Registry: Fix an issue that prevented a model logged on a system whose default encoding is not UTF-8 compatible from being deployed.
- Model Registry: Added an earlier and better error message when any file name in the model, or the file name of the model itself, contains characters that cannot be encoded using ASCII. Deploying such a model is currently not supported.
1.0.2 (2023-06-22)
Behavior Changes
- Model Registry: Prohibit non-snowflake-native models from being logged.
- Model Registry: The `_use_local_snowml` parameter in the options of `deploy()` has been removed.
- Model Registry: A default `False` `embed_local_ml_library` parameter has been added to the options of `log_model()`. With this set to `False` (default), the version of the local snowflake-ml-python library will be recorded and used when deploying the model. With this set to `True`, the local snowflake-ml-python library will be embedded into the logged model and will be used when you load or deploy the model.
New Features
- Model Registry: A new optional argument named `code_paths` has been added to the arguments of `log_model()` for users to specify additional code paths to be imported when loading and deploying the model.
- Model Registry: A new optional argument named `options` has been added to the arguments of `log_model()` to specify any additional options when saving the model.
- Model Development: Added metrics:
- d2_absolute_error_score
- d2_pinball_score
- explained_variance_score
- mean_absolute_error
- mean_absolute_percentage_error
- mean_squared_error
Bug Fixes
- Model Development: `accuracy_score()` now works when the given label column names are lists with a single value.
1.0.1 (2023-06-16)
Behavior Changes
- Model Development: Changed Metrics APIs to imitate sklearn metrics modules: `accuracy_score()`, `confusion_matrix()`, `precision_recall_fscore_support()`, and `precision_score()` methods move from their respective modules to `metrics.classification`.
- Model Registry: The default table/stage created by the Registry now uses "SYSTEM" as a prefix.
- Model Registry: The `get_model_history()` method has been enhanced to include the history of model deployment.
New Features
- Model Registry: A default `False` flag named `replace_udf` has been added to the options of `deploy()`. Setting this to `True` will allow overwriting an existing UDF with the same name when deploying.
- Model Development: Added metrics:
- f1_score
- fbeta_score
- recall_score
- roc_auc_score
- roc_curve
- log_loss
- precision_recall_curve
- Model Registry: A new argument named `permanent` has been added to the arguments of `deploy()`. Setting this to `True` allows the creation of a permanent deployment without needing to specify the UDF location.
- Model Registry: A new method `list_deployments()` has been added to enumerate all permanent deployments originating from a specific model.
- Model Registry: A new method `get_deployment()` has been added to fetch a deployment by its deployment name.
- Model Registry: A new method `delete_deployment()` has been added to remove an existing permanent deployment.
1.0.0 (2023-06-09)
Behavior Changes
- Model Registry: The `predict()` method moves from Registry to ModelReference.
- Model Registry: The `_snowml_wheel_path` parameter in the options of `deploy()` is replaced with `_use_local_snowml`, with a default value of `False`. Setting this to `True` will have the same effect of uploading local SnowML code when executing the model in the warehouse.
- Model Registry: Removed the `id` field from the `ModelReference` constructor.
- Model Development: Preprocessing and Metrics move to the modeling package: `snowflake.ml.modeling.preprocessing` and `snowflake.ml.modeling.metrics`.
- Model Development: The `get_sklearn_object()` method is renamed to `to_sklearn()`, `to_xgboost()`, and `to_lightgbm()` for the respective native models.
New Features
- Added `PolynomialFeatures` transformer to the `snowflake.ml.modeling.preprocessing` module.
- Added metrics:
- accuracy_score
- confusion_matrix
- precision_recall_fscore_support
- precision_score
Bug Fixes
- Model Registry: Model version can now be any string (not required to be a valid identifier).
- Model Deployment: The `deploy()` & `predict()` methods now correctly escape identifiers.
0.3.2 (2023-05-23)
Behavior Changes
- Use cloudpickle to serialize and deserialize models throughout the codebase, removing the dependency on joblib.
New Features
- Model Deployment: Added support for snowflake.ml models.
0.3.1 (2023-05-18)
Behavior Changes
- Standardized registry API with the following:
- Create & open registry take the same set of arguments.
- Create & open can choose the schema to use.
- set_tag, set_metric, etc. now explicitly call out arg names as tag_name, metric_name, etc.
New Features
- Changes to support Python 3.9 and 3.10.
- Added KBinsDiscretizer.
- Support for deployment of XGBoost models & int8 types of data.
0.3.0 (2023-05-11)
Behavior Changes
- Big Model Registry refresh:
- Fixed API discrepancies between register_model & log_model.
- Model can be referred to by name + version (no opaque internal ID is required).
New Features
- Model Registry: Added support to save/load/deploy SKL & XGB models.
0.2.3 (2023-04-27)
Bug Fixes
- Allow using OneHotEncoder along with sklearn style estimators in a pipeline.
New Features
- Model Registry: Added support for `delete_model`. Use `delete_artifact=False` to keep the underlying model data and just unregister.
0.2.2 (2023-04-11)
New Features
- Initial version of the snowflake-ml modeling package.
- Provides support for training most scikit-learn and xgboost estimators and transformers.
Bug Fixes
- Minor fixes in preprocessing package.
0.2.1 (2023-03-23)
New Features
- New in Preprocessing:
- SimpleImputer
- Covariance Matrix
- Optimization of Ordinal Encoder client computations.
Bug Fixes
- Minor fixes in OneHotEncoder.
0.2.0 (2023-02-27)
New Features
- Model Registry
- PyTorch & Tensorflow connectors via the generic FileSet API
- New to Preprocessing:
- Binarizer
- Normalizer
- Pearson correlation Matrix
- Optimization in Ordinal Encoder to cache vocabulary in temp tables.
0.1.3 (2023-02-02)
New Features
- Initial version of transformers including:
- Label Encoder
- Max Abs Scaler
- Min Max Scaler
- One Hot Encoder
- Ordinal Encoder
- Robust Scaler
- Standard Scaler