
Lightning support for Intel Habana accelerators


Lightning ⚡ Intel Habana


The Habana® Gaudi® AI Processor (HPU) is a training processor built on a heterogeneous architecture comprising a cluster of fully programmable Tensor Processing Cores (TPC), a configurable Matrix Math engine, and associated development tools and libraries.

The TPC core is a VLIW SIMD processor with an instruction set and hardware tailored to serve training workloads efficiently. The Gaudi memory architecture includes on-die SRAM and local memories in each TPC. Gaudi is also the first DL training processor with integrated RDMA over Converged Ethernet (RoCE v2) engines on-chip.

On the software side, the PyTorch Habana bridge interfaces between the framework and SynapseAI software stack to enable the execution of deep learning models on the Habana Gaudi device.

Gaudi offers a significant cost-efficiency benefit, allowing you to run more deep learning training while minimizing expenses.

For more information, check out Gaudi Architecture and Gaudi Developer Docs.


Installing Lightning Habana

To install Lightning Habana, run the following command:

pip install -U lightning lightning-habana

NOTE

Use either lightning or pytorch-lightning when working with the plugin. Mixing strategies, plugins, etc. from both packages is not yet validated.


Using PyTorch Lightning with HPU

To enable PyTorch Lightning with the HPU accelerator, pass accelerator=HPUAccelerator() to the Trainer class.

from lightning import Trainer
from lightning_habana.pytorch.accelerator import HPUAccelerator

# Run on one HPU.
trainer = Trainer(accelerator=HPUAccelerator(), devices=1)
# Run on multiple HPUs.
trainer = Trainer(accelerator=HPUAccelerator(), devices=8)
# Choose the number of devices automatically.
trainer = Trainer(accelerator=HPUAccelerator(), devices="auto")

Passing devices=1 enables the Habana accelerator for single-card training using SingleHPUStrategy.

Passing devices>1 enables the Habana accelerator for distributed training. It uses HPUParallelStrategy, which is based on the DDP strategy and integrates Habana's collective communication library (HCCL) to support scale-up within a node and scale-out across multiple nodes.
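The strategy selection above can also be made explicit rather than inferred from the device count. A minimal sketch, assuming HPUParallelStrategy is importable from lightning_habana.pytorch.strategies (running it requires Gaudi hardware with the SynapseAI stack installed):

```python
from lightning import Trainer
from lightning_habana.pytorch.accelerator import HPUAccelerator
from lightning_habana.pytorch.strategies import HPUParallelStrategy

# Explicitly select the distributed strategy instead of relying on
# the default chosen from the devices count.
trainer = Trainer(
    accelerator=HPUAccelerator(),
    devices=8,
    strategy=HPUParallelStrategy(),
)
```

Explicit strategy construction is useful when you need to customize distributed behavior beyond the defaults that devices=8 would select for you.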

Support Matrix

SynapseAI             1.13.0
PyTorch               2.1.0
(PyTorch) Lightning*  2.1.x
Lightning Habana      1.3.0-dev0
DeepSpeed**           Forked from v0.10.3 of the official DeepSpeed repo.

* covers both packages lightning and pytorch-lightning

For more information, check out the HPU Support Matrix.

