FBGEMM_GPU

FBGEMM_GPU (FBGEMM GPU kernel library) is a collection of high-performance CUDA GPU operators for training and inference.

The library provides efficient table-batched embedding bag, data layout transformation, and quantization operators.
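To make the term "table-batched embedding bag" concrete, here is a minimal pure-Python sketch of the idea: several embedding tables are looked up in one call, and the rows selected for each (table, sample) bag are summed. The function name `pooled_lookup` and the flat `indices`/`offsets` layout are assumptions chosen for illustration; this is not FBGEMM_GPU's API or its kernel implementation.

```python
def pooled_lookup(tables, indices, offsets, batch_size):
    """Sum-pool embedding rows for every (table, sample) bag.

    tables:  list of T tables, each a list of rows (lists of floats).
    indices: flat list of row ids for all bags, in table-major order.
    offsets: T * batch_size + 1 boundaries into `indices`.
    Returns one pooled vector per bag, in the same table-major order.
    """
    pooled_rows = []
    for t, table in enumerate(tables):
        dim = len(table[0])
        for b in range(batch_size):
            bag = t * batch_size + b
            pooled = [0.0] * dim
            # An empty bag (equal start and end offsets) pools to zeros.
            for row_id in indices[offsets[bag]:offsets[bag + 1]]:
                pooled = [p + v for p, v in zip(pooled, table[row_id])]
            pooled_rows.append(pooled)
    return pooled_rows


if __name__ == "__main__":
    tables = [
        [[1.0, 2.0], [3.0, 4.0]],   # table 0: 2 rows, dimension 2
        [[10.0], [20.0], [30.0]],   # table 1: 3 rows, dimension 1
    ]
    indices = [0, 1, 1, 2, 0]
    offsets = [0, 2, 3, 3, 5]
    print(pooled_lookup(tables, indices, offsets, batch_size=2))
```

Batching the tables this way means a single call handles all lookups, which is the property the GPU operator exploits; the benchmarks in the bench folder exercise the real operator.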

Currently tested with CUDA 11.3, 11.5, 11.6, and 11.7 in CI. In all cases, we test with PyTorch packages which are built with CUDA 11.7.

Only Intel/AMD CPUs with AVX2 extensions are currently supported.
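A quick way to check the AVX2 requirement on Linux is to look for the `avx2` flag in `/proc/cpuinfo`. The helper below is a small sketch under that assumption (the name `has_avx2` is illustrative, and other operating systems need a different check).

```python
def has_avx2(cpuinfo_text):
    """Return True if a 'flags' line in /proc/cpuinfo-style text lists avx2."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "avx2" in value.split():
            return True
    return False


if __name__ == "__main__":
    # /proc/cpuinfo exists on Linux only.
    try:
        with open("/proc/cpuinfo") as f:
            print("AVX2 supported:", has_avx2(f.read()))
    except FileNotFoundError:
        print("No /proc/cpuinfo on this platform")
```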

General build and install instructions are as follows:

Build dependencies: scikit-build, cmake, ninja, jinja2, torch, cudatoolkit, and for testing: hypothesis.

conda install scikit-build jinja2 ninja cmake hypothesis

If you plan to build from source and don't have nvml.h on your system, you can install it with the command below.

conda install -c conda-forge cudatoolkit-dev

Certain operations require the NVML library (libnvidia-ml.so) to be present. When installing from source, pass the path to libnvidia-ml.so via --nvml_lib_path (e.g. python setup.py install --nvml_lib_path path_to_libnvidia-ml.so).
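If you are unsure where libnvidia-ml.so lives, a short search over likely directories can produce the value for --nvml_lib_path. The sketch below is illustrative only: the helper name `find_nvml` and the candidate directories are assumptions, and the right location depends on your driver installation.

```python
import os


def find_nvml(search_dirs, names=("libnvidia-ml.so", "libnvidia-ml.so.1")):
    """Return the first existing NVML library path under search_dirs, else None."""
    for directory in search_dirs:
        for name in names:
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate):
                return candidate
    return None


if __name__ == "__main__":
    # These directories are guesses; adjust them for your distribution.
    guesses = ["/usr/lib/x86_64-linux-gnu", "/usr/lib64", "/usr/local/cuda/lib64/stubs"]
    path = find_nvml(guesses)
    print(path or "libnvidia-ml.so not found in the guessed directories")
```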

PIP install

The prebuilt wheels currently support only sm70/sm80 (V100/A100) GPUs:

# Release GPU
conda install pytorch cuda -c pytorch -c "nvidia/label/cuda-11.7.1"
pip install fbgemm-gpu

# Release CPU-only
conda install pytorch cuda -c pytorch -c "nvidia/label/cuda-11.7.1"
pip install fbgemm-gpu-cpu

# Nightly GPU
conda install pytorch cuda -c pytorch-nightly -c "nvidia/label/cuda-11.7.1"
pip install fbgemm-gpu-nightly

# Nightly CPU-only
pip install --pre torch -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
pip install fbgemm-gpu-nightly-cpu

Build from source

Additional dependencies: cuDNN is currently required. Download cuDNN and follow NVIDIA's installation instructions before building.

# Requires PyTorch 1.13 or later
conda install pytorch cuda -c pytorch-nightly -c "nvidia/label/cuda-11.7.1"
git clone --recursive https://github.com/pytorch/FBGEMM.git
cd FBGEMM/fbgemm_gpu
# if you are updating an existing checkout
git submodule sync
git submodule update --init --recursive

# Specify CUDA version to use
# (may not be needed with only a single version installed)
export CUDA_BIN_PATH=/usr/local/cuda-11.3/
export CUDACXX=/usr/local/cuda-11.3/bin/nvcc

# Specify cuDNN library and header paths.  We tested CUDA 11.6 and 11.7 with
# cuDNN version 8.5.0.96
export CUDNN_LIBRARY=${HOME}/cudnn-linux-x86_64-8.5.0.96_cuda11-archive/lib
export CUDNN_INCLUDE_DIR=${HOME}/cudnn-linux-x86_64-8.5.0.96_cuda11-archive/include

# in fbgemm_gpu folder
# build for the CUDA architecture supported by current system (or all architectures if no CUDA device present)
python setup.py install
# or build it for specific CUDA architectures (see PyTorch documentation for usage of TORCH_CUDA_ARCH_LIST)
python setup.py install -DTORCH_CUDA_ARCH_LIST="7.0;8.0"
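To work out which value to pass as TORCH_CUDA_ARCH_LIST for the GPUs in a machine, one option is to query PyTorch for each device's compute capability and join the results with semicolons. This is a sketch, not part of the FBGEMM_GPU build; `arch_list` is a hypothetical helper name.

```python
def arch_list(capabilities):
    """Format (major, minor) compute capabilities as a TORCH_CUDA_ARCH_LIST value."""
    unique = sorted(set(capabilities))
    return ";".join(f"{major}.{minor}" for major, minor in unique)


if __name__ == "__main__":
    try:
        import torch
        caps = [torch.cuda.get_device_capability(i)
                for i in range(torch.cuda.device_count())]
    except ImportError:
        caps = []
    print(arch_list(caps) or "no CUDA device detected")
```

For a machine with one V100 and one A100, the helper yields the same "7.0;8.0" string used in the command above.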

Usage Example:

cd bench
python split_table_batched_embeddings_benchmark.py uvm

Build on ROCm

FBGEMM_GPU supports running on AMD (ROCm) devices. A Docker container is recommended for setting up the ROCm environment; installation on bare metal is also supported. The instructions below use ROCm 5.3 as an example.

Build in a Docker container

Pull Docker container and run

docker pull rocm/pytorch:rocm5.4_ubuntu20.04_py3.8_pytorch_staging_base
sudo docker run -it --network=host --shm-size 16G --device=/dev/kfd --device=/dev/dri \
                --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
                --ipc=host --env PYTORCH_ROCM_ARCH="gfx906;gfx908;gfx90a" -u 0 \
                rocm/pytorch:rocm5.4_ubuntu20.04_py3.8_pytorch_staging_base

In the container

pip3 install --pre torch torchvision --extra-index-url https://download.pytorch.org/whl/nightly/rocm5.3/
cd ~
git clone https://github.com/pytorch/FBGEMM.git
cd FBGEMM/fbgemm_gpu
# if you are updating an existing checkout
git submodule sync
git submodule update --init --recursive
pip install -r requirements.txt
pip install --upgrade hypothesis

# in fbgemm_gpu folder
# build for the current ROCm architecture
gpu_arch="$(/opt/rocm/bin/rocminfo | grep -o -m 1 'gfx.*')"
export PYTORCH_ROCM_ARCH=$gpu_arch
python setup.py install develop
# or build for specific ROCm architectures
export PYTORCH_ROCM_ARCH="gfx906;gfx908"
python setup.py install develop
# otherwise the build will be for the default architectures gfx906;gfx908;gfx90a
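The architecture detection in the commands above can also be done from Python, which makes it easier to reuse in build scripts. The sketch below extracts the first gfx target name from rocminfo-style text; the helper name `first_gfx_arch` is an assumption, and unlike the `grep -o 'gfx.*'` form it returns only the target name rather than the rest of the line.

```python
import re


def first_gfx_arch(rocminfo_output):
    """Return the first gfx target name (e.g. 'gfx90a') in the text, or None."""
    match = re.search(r"gfx[0-9a-f]+", rocminfo_output)
    return match.group(0) if match else None


if __name__ == "__main__":
    sample = "  Name:                    gfx90a\n  Marketing Name:          \n"
    print(first_gfx_arch(sample))
```

In practice the input would be the captured output of /opt/rocm/bin/rocminfo, and the result would be exported as PYTORCH_ROCM_ARCH before running the build.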

Build on bare metal

Refer to the ROCm 5.3 installation instructions. The following example installs it on Ubuntu 20.04:

sudo apt-get update
wget https://repo.radeon.com/amdgpu-install/5.3/ubuntu/focal/amdgpu-install_5.3.50300-1_all.deb
sudo apt-get install ./amdgpu-install_5.3.50300-1_all.deb
sudo amdgpu-install --usecase=hiplibsdk,rocm --no-dkms

MIOpen is required and needs to be installed separately.

sudo apt-get install miopen-hip miopen-hip-dev

The remaining steps are the same as in the "In the container" section.

Run the tests on ROCm

Set the FBGEMM_TEST_WITH_ROCM=1 environment variable when running tests on ROCm.

cd test
FBGEMM_TEST_WITH_ROCM=1 python split_table_batched_embeddings_test.py

Issues

The build is CMake-based and keeps state across install runs, so specifying the CUDA architectures on the command line once is enough. However, after a failed build (for example, one caused by missing dependencies), this cached state can cause problems. In that case, the following command removes the stale state:

python setup.py clean

Examples

The tests (in the test folder) and benchmarks (in the bench folder) are good examples of how to use FBGEMM_GPU.

Build Notes

FBGEMM_GPU uses a scikit-build, CMake-based build flow.

Dependencies

FBGEMM_GPU requires nvcc and an NVIDIA GPU with compute capability 3.5 or higher.

CUB

CUB is included with CUDA 11.1 and later. The steps below are still needed for lower CUDA versions (once they are tested).

For the CUB build-time dependency, if you are using conda, run:

conda install -c bottler nvidiacub

Otherwise, download the CUB library from https://github.com/NVIDIA/cub/releases and unpack it to a folder of your choice. Before building, define the environment variable CUB_DIR and point it to the directory that contains the CMakeLists.txt for CUB. For example, on Linux/Mac:

curl -LO https://github.com/NVIDIA/cub/archive/1.10.0.tar.gz
tar xzf 1.10.0.tar.gz
export CUB_DIR=$PWD/cub-1.10.0
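Because a wrong CUB_DIR only shows up later as a build failure, it can help to sanity-check it first. The sketch below tests the one condition stated above, namely that the directory contains CMakeLists.txt; the helper name `cub_dir_problem` is hypothetical and not part of the build system.

```python
import os


def cub_dir_problem(environ):
    """Return a message describing what is wrong with CUB_DIR, or None if it looks usable."""
    cub_dir = environ.get("CUB_DIR")
    if not cub_dir:
        return "CUB_DIR is not set"
    if not os.path.isfile(os.path.join(cub_dir, "CMakeLists.txt")):
        return f"no CMakeLists.txt found in {cub_dir}"
    return None


if __name__ == "__main__":
    print(cub_dir_problem(os.environ) or "CUB_DIR looks usable")
```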

PyTorch, Jinja2, scikit-build

PyTorch, Jinja2, and scikit-build are required to build and run the table-batched embedding bag operator. Note that the implementation of this operator requires PyTorch 1.9 or later.
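A minimum-version requirement like this is easy to get wrong with plain string comparison ("1.13" sorts before "1.9" as text). The sketch below compares the numeric major and minor components instead; `meets_minimum` is an illustrative helper, and the version strings in the usage lines are made-up examples of the format PyTorch reports in `torch.__version__`.

```python
import re


def meets_minimum(version, minimum=(1, 9)):
    """Compare the leading major.minor of a version string with a minimum tuple."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


if __name__ == "__main__":
    for candidate in ("1.13.0+cu117", "1.8.1"):
        print(candidate, meets_minimum(candidate))
```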

conda install scikit-build jinja2 ninja cmake

Running FBGEMM_GPU

To run the tests or benchmarks after building FBGEMM_GPU (if tests or benchmarks are built), use the following commands:

# run the tests and benchmarks of table batched embedding bag op,
# data layout transform op, quantized ops, etc.
cd test
python split_table_batched_embeddings_test.py
python quantize_ops_test.py
python sparse_ops_test.py
python split_embedding_inference_converter_test.py
cd ../bench
python split_table_batched_embeddings_benchmark.py

To run the tests and benchmarks in CPU-only mode on a GPU-capable device, set CUDA_VISIBLE_DEVICES=-1:

CUDA_VISIBLE_DEVICES=-1 python split_table_batched_embeddings_test.py
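When tests are launched from a script rather than a shell, the same effect can be had by passing a modified environment to the child process. This is a minimal sketch; the helper names `cpu_only_env` and `run_cpu_only` are assumptions for illustration.

```python
import os
import subprocess
import sys


def cpu_only_env(base_env=None):
    """Return a copy of the environment with CUDA devices hidden."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = "-1"
    return env


def run_cpu_only(script):
    """Run a Python script with CUDA_VISIBLE_DEVICES=-1 and return its exit code."""
    return subprocess.run([sys.executable, script], env=cpu_only_env()).returncode
```

For example, `run_cpu_only("split_table_batched_embeddings_test.py")` from inside the test folder mirrors the shell command above. Copying the environment, rather than mutating `os.environ`, keeps the parent process able to see the GPUs.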

How FBGEMM_GPU works

For a high-level overview, design philosophy and brief descriptions of various parts of FBGEMM_GPU please see our Wiki (work in progress).

Full documentation

We have used comments extensively in our source files. The best and most up-to-date documentation is available there.

Building API Documentation

See docs/README.md.

Join the FBGEMM community

See the CONTRIBUTING file for how to help out.

License

FBGEMM is BSD licensed, as found in the LICENSE file.

Release history

1.5.0: Jan 26, 2026
1.5.0rc2 (pre-release, yanked): Feb 19, 2026
1.5.0rc1 (pre-release, yanked; stated reason: "1.5.0rc1"): Jan 15, 2026
1.5.0rc0 (pre-release, yanked; stated reason: "1.5.0rc0"): Jan 7, 2026
1.4.2 (yanked; stated reason: "1.4.2"): Dec 5, 2025
1.4.0: Dec 5, 2025
1.3.0: Aug 20, 2025
1.2.0: Apr 24, 2025
1.1.0: Jan 29, 2025
1.0.0: Oct 18, 2024
0.8.0: Jul 25, 2024
0.7.0: Apr 25, 2024
0.6.0: Jan 29, 2024
0.5.0: Oct 5, 2023
0.4.1: Mar 24, 2023
0.4.0 (this version): Mar 15, 2023
0.3.2: Dec 14, 2022
0.3.0: Oct 27, 2022
0.2.0: Jun 28, 2022
0.1.1: May 3, 2022
0.0.1: Dec 25, 2025

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

fbgemm_gpu-0.4.0-cp310-cp310-manylinux1_x86_64.whl (226.7 MB), uploaded Mar 15, 2023, CPython 3.10
fbgemm_gpu-0.4.0-cp39-cp39-manylinux1_x86_64.whl (226.7 MB), uploaded Mar 15, 2023, CPython 3.9
fbgemm_gpu-0.4.0-cp38-cp38-manylinux1_x86_64.whl (226.7 MB), uploaded Mar 15, 2023, CPython 3.8

File details

Details for the file fbgemm_gpu-0.4.0-cp310-cp310-manylinux1_x86_64.whl.

File metadata

Download URL: fbgemm_gpu-0.4.0-cp310-cp310-manylinux1_x86_64.whl
Upload date: Mar 15, 2023
Size: 226.7 MB
Tags: CPython 3.10
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.7.1 importlib_metadata/4.11.3 pkginfo/1.8.3 requests/2.28.1 requests-toolbelt/0.9.1 tqdm/4.64.1 CPython/3.10.9

File hashes

Hashes for fbgemm_gpu-0.4.0-cp310-cp310-manylinux1_x86_64.whl
Algorithm	Hash digest
SHA256	`a18c319ecb8cf7d0c7988b66eb9bdefcb7ed3e5f55cf7bc42084b2017708c8e0`
MD5	`8f42980acb0aa9e130008bfb8ffd01b0`
BLAKE2b-256	`22aff6e56d52d87fbef410d086135092a7557a36479ca1900f853263b755e2a3`

File details

Details for the file fbgemm_gpu-0.4.0-cp39-cp39-manylinux1_x86_64.whl.

File metadata

Download URL: fbgemm_gpu-0.4.0-cp39-cp39-manylinux1_x86_64.whl
Upload date: Mar 15, 2023
Size: 226.7 MB
Tags: CPython 3.9
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.7.1 importlib_metadata/4.11.3 pkginfo/1.8.3 requests/2.28.1 requests-toolbelt/0.9.1 tqdm/4.64.1 CPython/3.9.16

File hashes

Hashes for fbgemm_gpu-0.4.0-cp39-cp39-manylinux1_x86_64.whl
Algorithm	Hash digest
SHA256	`3ae8d0fee49582b480d0d5b905fb1da485638f0e87a5ca77f7bf751c1a610b60`
MD5	`8ac9233194a0fea7581d2ed2ab2cca82`
BLAKE2b-256	`c7650fb1ea1565e55781fee0091f61220bdf49ac0912d5d1797e98223dfc1240`

File details

Details for the file fbgemm_gpu-0.4.0-cp38-cp38-manylinux1_x86_64.whl.

File metadata

Download URL: fbgemm_gpu-0.4.0-cp38-cp38-manylinux1_x86_64.whl
Upload date: Mar 15, 2023
Size: 226.7 MB
Tags: CPython 3.8
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.7.1 importlib_metadata/4.11.3 pkginfo/1.8.3 requests/2.28.1 requests-toolbelt/0.9.1 tqdm/4.64.1 CPython/3.8.16

File hashes

Hashes for fbgemm_gpu-0.4.0-cp38-cp38-manylinux1_x86_64.whl
Algorithm	Hash digest
SHA256	`ce7ca8745cbda95d1444fd84e4270b1ff17ae8444a4911a09ec7d7318071f1b4`
MD5	`6c6cace05a96baad456f6470ecbbc517`
BLAKE2b-256	`4db5c9b4d9f8fba48b35c80ffc61914b0916ef9b68b99ebc8b17ef90c53f1a92`
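The SHA256 digests listed for each wheel can be checked against a downloaded file before installing it. The sketch below uses Python's standard hashlib and reads the file in chunks, since the wheels are over 200 MB; the helper names `sha256_of` and `matches` are illustrative.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's SHA256 with an expected hex digest, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches("fbgemm_gpu-0.4.0-cp38-cp38-manylinux1_x86_64.whl", "ce7ca8745cbda95d1444fd84e4270b1ff17ae8444a4911a09ec7d7318071f1b4")` should return True for an intact download of the CPython 3.8 wheel.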

fbgemm-gpu 0.4.0
