DataBricks CLI eXtensions aka dbx

DataBricks CLI eXtensions, aka dbx, is a CLI tool for advanced Databricks job management.

Concept

dbx simplifies the job launch and deployment process across multiple environments. It also helps you package your project and deliver it to your Databricks environment in a versioned fashion. Designed CLI-first, it is built to be used both inside CI/CD pipelines and as part of local tooling for fast prototyping.

Requirements

  • Python Version > 3.6

  • pip or conda

Installation

  • with pip:

pip install dbx

  • with conda:

conda install dbx

Quickstart

As a prerequisite, you need to install databricks-cli and configure a profile. These instructions are based on Databricks Runtime 7.3 LTS ML. Even if you don’t need ML libraries, we still recommend the ML-based version because it supports the %pip magic.

For Python-based deployments, we recommend using cicd-templates for a quick start. However, if you don’t like the project structure defined in cicd-templates, feel free to follow the instructions below for full customization.

After configuring the profile, please do the following in the root of your project:

  • Configure your project environments and storage locations:

dbx configure

  • Create a conf/deployment.json file with the specs defined in the documentation

  • Run dbx deploy to perform an initial deployment

  • Run dbx launch --job=<job-name> --trace to launch the job and trace its status
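
The steps above can be scripted, for example inside a CI job. The following is a minimal sketch, not part of dbx itself: the helper names quickstart_commands and run_quickstart and the job name my-job are hypothetical, and only the three dbx commands shown in the list above are used. By default it only prints the commands (dry run), so it can be tried without a configured profile.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List

# Path named in the quickstart; dbx deploy expects this file to exist.
DEPLOYMENT_FILE = Path("conf/deployment.json")


def quickstart_commands(job_name: str) -> List[List[str]]:
    """Return the quickstart dbx commands in the order the README lists them."""
    return [
        ["dbx", "configure"],
        ["dbx", "deploy"],
        ["dbx", "launch", f"--job={job_name}", "--trace"],
    ]


def run_quickstart(job_name: str, dry_run: bool = True) -> List[str]:
    """Print each command; execute it only when dry_run is False.

    Hypothetical helper: assumes it runs from the project root and that
    databricks-cli already has a configured profile.
    """
    if not dry_run and not DEPLOYMENT_FILE.exists():
        raise FileNotFoundError(f"{DEPLOYMENT_FILE} must be created before deploying")

    rendered = []
    for cmd in quickstart_commands(job_name):
        line = " ".join(shlex.quote(part) for part in cmd)
        rendered.append(line)
        print(line)
        if not dry_run:
            # check=True stops the sequence at the first failing command.
            subprocess.run(cmd, check=True)
    return rendered


if __name__ == "__main__":
    run_quickstart("my-job")  # "my-job" is a placeholder job name
```

Calling run_quickstart("my-job", dry_run=False) would execute the commands for real; keeping the dry run as the default avoids launching a job by accident.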

Documentation

Please refer to the docs provided in the docs folder.

Differences from other tools

  • databricks-cli: dbx is NOT a replacement for databricks-cli. Quite the opposite: dbx depends heavily on databricks-cli and takes most of the APIs it uses directly from the databricks-cli SDK.

  • mlflow cli: dbx is NOT a replacement for mlflow cli. dbx uses some of the MLflow APIs under the hood to store serialized job objects, but doesn’t use the mlflow CLI directly.

  • Databricks Terraform Provider: While dbx is primarily oriented toward versioned job management, Databricks Terraform Provider covers a much wider set of infrastructure settings. In comparison, dbx doesn’t provide infrastructure management capabilities, but offers more flexible deployment and launch options.

  • cicd-templates: cicd-templates is a Python project template that actively uses dbx for job management and CI-related operations. You can choose whether to use this template or to use dbx separately and define the project structure on your own.

  • Databricks Stack CLI: Databricks Stack CLI is a great component for managing a stack of objects. dbx concentrates on versioning and packaging jobs together, rather than treating files and notebooks as separate components.

Limitations

  • Python > 3.6

  • dbx execute can only be used on clusters with Databricks ML Runtime 7.X

Feedback

Issues with dbx? Found a bug? Have a great idea for an addition? Feel free to file an issue.

Contributing

Please find more details about contributing to dbx in the contributing doc.
