FBGEMM_GPU

FBGEMM_GPU (FBGEMM GPU kernel library) is a collection of high-performance CUDA GPU operators for training and inference.

The library provides efficient table batched embedding bag, data layout transformation, and quantization operators.

Currently tested with PyTorch 1.11 and CUDA 11.3 (previously tested with PyTorch 1.9). Automated CI testing is planned.

Only Intel/AMD CPUs with AVX2 extensions are currently supported.
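As a quick way to check the AVX2 requirement before building, the following is a minimal sketch that assumes a Linux host exposing CPU features in /proc/cpuinfo. The helper names `cpu_flags` and `has_avx2` are illustrative and not part of FBGEMM_GPU.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_avx2(cpuinfo_text):
    """True when the 'avx2' flag appears in any 'flags' line."""
    return "avx2" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # assumption: Linux-style procfs
    if cpuinfo.exists():
        print("AVX2 supported:", has_avx2(cpuinfo.read_text()))
    else:
        print("/proc/cpuinfo not found; check the CPU feature list manually")
```

On other operating systems the same information has to come from a platform-specific tool, so treat this only as a convenience check.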

General build and install instructions are as follows:

Build dependencies: "pytorch", "scikit-build", "cmake", "ninja", "jinja2", "torch>0.9", "cudatoolkit"; for testing: "hypothesis".

# requires PyTorch 1.11 or later
conda install pytorch cudatoolkit=11.3 -c pytorch-nightly
conda install scikit-build jinja2 ninja cmake hypothesis
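After installing the dependencies, it can help to confirm that the environment matches the tested combination (PyTorch 1.11 or later with CUDA 11.3). The sketch below is one way to do that. The helpers `version_tuple` and `is_tested_combo` are hypothetical names introduced here, and only `torch.__version__` and `torch.version.cuda` are read from PyTorch.

```python
import re


def version_tuple(version):
    """Turn a string such as '1.11.0+cu113' into a (major, minor) tuple."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_tested_combo(torch_version, cuda_version):
    """True when PyTorch is 1.11 or later and CUDA is 11.3."""
    return (version_tuple(torch_version) >= (1, 11)
            and version_tuple(cuda_version) == (11, 3))


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        cuda = torch.version.cuda  # None for CPU-only builds
        print("torch", torch.__version__, "cuda", cuda)
        if cuda is not None:
            print("matches tested combination:",
                  is_tested_combo(torch.__version__, cuda))
```

Other version combinations may work as well; the check only reports whether the environment matches what this document says was tested.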

PIP install

Prebuilt wheels are currently built only for sm70/sm80 (V100/A100 GPUs):

pip install fbgemm-gpu-nightly

Build from source

Additional dependency: cuDNN must currently be installed.

git clone --recursive https://github.com/pytorch/FBGEMM.git
cd FBGEMM/fbgemm_gpu
# if you are updating an existing checkout
git submodule sync
git submodule update --init --recursive

# Specify CUDA version to use
# (may not be needed with only a single version installed)
export CUDA_BIN_PATH=/usr/local/cuda-11.3/
export CUDACXX=/usr/local/cuda-11.3/bin/nvcc

# if using CUDA 10 or earlier, set CUB_DIR to the CUB installation directory
export CUB_DIR=${CUB_DIR}
# in fbgemm_gpu folder
# build for the CUDA architecture supported by current system (or all architectures if no CUDA device present)
python setup.py install
# or build it for specific CUDA architectures (see PyTorch documentation for usage of TORCH_CUDA_ARCH_LIST)
python setup.py install -DTORCH_CUDA_ARCH_LIST="7.0;8.0"
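The value passed to TORCH_CUDA_ARCH_LIST above is a semicolon-separated list of compute capabilities. As an illustration, the sketch below formats (major, minor) pairs into that form and assembles the install command. The helpers `arch_list` and `install_command` are hypothetical names for this example. On a machine with GPUs, the pairs could come from `torch.cuda.get_device_capability`.

```python
def arch_list(capabilities):
    """Format (major, minor) pairs as a value such as '7.0;8.0'."""
    unique = sorted(set(capabilities))  # drop duplicates, lowest first
    return ";".join(f"{major}.{minor}" for major, minor in unique)


def install_command(capabilities):
    """Build the argument list for the source install shown above."""
    return ["python", "setup.py", "install",
            f"-DTORCH_CUDA_ARCH_LIST={arch_list(capabilities)}"]


if __name__ == "__main__":
    # V100 (sm70) and A100 (sm80), the two architectures named in this document
    print(" ".join(install_command([(8, 0), (7, 0)])))
```

When the list is passed to `subprocess.run` as separate arguments, no shell quoting of the semicolon is needed. When typing the command in a shell, keep the quotes shown in the original command.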

Usage Example:

cd bench
python split_table_batched_embeddings_benchmark.py uvm

Issues

The build is CMake-based and keeps state across install runs, so specifying the CUDA architectures on the command line once is enough. However, after a failed build (for example, one caused by missing dependencies) this cached state can cause problems, and running

python setup.py clean

to remove stale cached state can be helpful.

Examples

The tests (in the test folder) and benchmarks (in the bench folder) are good examples of how to use FBGEMM_GPU.

Build Notes

FBGEMM_GPU uses a scikit-build, CMake-based build flow.

Dependencies

FBGEMM_GPU requires nvcc and an NVIDIA GPU with compute capability 3.5 or higher.
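To see whether the visible GPUs satisfy the compute capability requirement, a minimal sketch along the following lines may help. It assumes PyTorch is already installed. `MIN_CAPABILITY` and `meets_minimum` are names made up for this example, while the `torch.cuda` calls are standard PyTorch functions.

```python
MIN_CAPABILITY = (3, 5)  # minimum compute capability stated in this document


def meets_minimum(capability, minimum=MIN_CAPABILITY):
    """Compare (major, minor) tuples lexicographically."""
    return tuple(capability) >= tuple(minimum)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed; cannot query the GPU")
    else:
        if not torch.cuda.is_available():
            print("No CUDA device visible to PyTorch")
        else:
            for index in range(torch.cuda.device_count()):
                cap = torch.cuda.get_device_capability(index)
                name = torch.cuda.get_device_name(index)
                print(index, name, cap, "ok" if meets_minimum(cap) else "too old")
```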

CUB is now included with CUDA 11.1 and later; the instructions below are still needed for lower CUDA versions (once they are tested).

For the CUB build time dependency, if you are using conda, you can continue with

conda install -c bottler nvidiacub

Otherwise download the CUB library from https://github.com/NVIDIA/cub/releases and unpack it to a folder of your choice. Define the environment variable CUB_DIR before building and point it to the directory that contains CMakeLists.txt for CUB. For example on Linux/Mac,

curl -LO https://github.com/NVIDIA/cub/archive/1.10.0.tar.gz
tar xzf 1.10.0.tar.gz
export CUB_DIR=$PWD/cub-1.10.0
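Because the build expects CUB_DIR to point at the directory that contains CMakeLists.txt for CUB, a small pre-build check can catch a wrong path early. The following is a sketch under that assumption. `cub_dir_problem` is a hypothetical helper and not part of the FBGEMM_GPU build scripts.

```python
import os
from pathlib import Path


def cub_dir_problem(cub_dir):
    """Return a description of what is wrong with cub_dir, or None if it looks usable."""
    if not cub_dir:
        return "CUB_DIR is not set"
    path = Path(cub_dir)
    if not path.is_dir():
        return f"{path} is not a directory"
    if not (path / "CMakeLists.txt").is_file():
        return f"{path} does not contain CMakeLists.txt"
    return None


if __name__ == "__main__":
    problem = cub_dir_problem(os.environ.get("CUB_DIR"))
    print(problem or "CUB_DIR looks usable")
```

The check only looks at the directory layout; it does not confirm that the CUB version is compatible with the installed CUDA toolkit.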

PyTorch, Jinja2, scikit-build

PyTorch, Jinja2 and scikit-build are required to build and run the table batched embedding bag operator. Note that the implementation of this op requires PyTorch 1.9 or later.

conda install scikit-build jinja2 ninja cmake

Running FBGEMM_GPU

To run the tests or benchmarks after building FBGEMM_GPU (if tests or benchmarks are built), use the following commands:

# run the tests and benchmarks of table batched embedding bag op,
# data layout transform op, quantized ops, etc.
cd test
python split_table_batched_embeddings_test.py
python quantize_ops_test.py
python sparse_ops_test.py
python split_embedding_inference_converter_test.py
cd ../bench
python split_table_batched_embeddings_benchmark.py
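To run the test scripts listed above in one go and collect their exit codes, something like the following sketch could be used from the fbgemm_gpu folder. The `run_scripts` helper is an assumption of this example; the script names are the ones shown in the commands above.

```python
import subprocess
import sys
from pathlib import Path

TEST_SCRIPTS = [
    "split_table_batched_embeddings_test.py",
    "quantize_ops_test.py",
    "sparse_ops_test.py",
    "split_embedding_inference_converter_test.py",
]


def run_scripts(scripts, directory):
    """Run each script found in `directory`; map its name to the exit code (None if missing)."""
    results = {}
    for name in scripts:
        if not (Path(directory) / name).is_file():
            results[name] = None  # skip scripts that are not present
            continue
        completed = subprocess.run([sys.executable, name], cwd=directory)
        results[name] = completed.returncode
    return results


if __name__ == "__main__":
    for name, code in run_scripts(TEST_SCRIPTS, "test").items():
        if code is None:
            status = "missing"
        elif code == 0:
            status = "ok"
        else:
            status = f"failed ({code})"
        print(f"{name}: {status}")
```

Using `sys.executable` keeps the scripts in the same Python environment that FBGEMM_GPU was installed into.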

How FBGEMM_GPU works

For a high-level overview, design philosophy and brief descriptions of various parts of FBGEMM_GPU please see our Wiki (work in progress).

Full documentation

We have extensively used comments in our source files. The best and most up-to-date documentation is available in the source files.

Join the FBGEMM community

See the CONTRIBUTING file for how to help out.

License

FBGEMM is BSD licensed, as found in the LICENSE file.

Release history

Version	Release date
0.7.0	Apr 25, 2024
0.6.0	Jan 29, 2024
0.5.0	Oct 5, 2023
0.4.1	Mar 24, 2023
0.4.0	Mar 15, 2023
0.3.2	Dec 14, 2022
0.3.0	Oct 27, 2022
0.2.0	Jun 28, 2022
0.1.1	May 3, 2022
0.0.2	May 1, 2022
0.0.1 (this version)	Mar 2, 2022

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

fbgemm_gpu-0.0.1-cp39-cp39-manylinux1_x86_64.whl (80.3 MB)
Uploaded Mar 2, 2022, CPython 3.9

fbgemm_gpu-0.0.1-cp38-cp38-manylinux1_x86_64.whl (80.3 MB)
Uploaded Mar 2, 2022, CPython 3.8

fbgemm_gpu-0.0.1-cp37-cp37m-manylinux1_x86_64.whl (80.3 MB)
Uploaded Mar 2, 2022, CPython 3.7m

Hashes for fbgemm_gpu-0.0.1-cp39-cp39-manylinux1_x86_64.whl
Algorithm	Hash digest
SHA256	`9cdd7e7bb1559a2d3a68daa9a263411b414e5085089605edfe804226e48a0cab`
MD5	`e696fecf95f96b55847029501099dfdf`
BLAKE2b-256	`a709344f01f70f9ac4a1824cf4b28588c29515c92855a19dd9870ca0da34cd27`

Hashes for fbgemm_gpu-0.0.1-cp38-cp38-manylinux1_x86_64.whl
Algorithm	Hash digest
SHA256	`f8c71ad6e53db9a27bbf7f7183589deff4fc86a81c5e42953532a35aa6a78c3a`
MD5	`1e61157c1974c693d77838398ea3c461`
BLAKE2b-256	`f29bbe6685aee2946b190573481ae799e436d3229b8078365706953d3353c18a`

Hashes for fbgemm_gpu-0.0.1-cp37-cp37m-manylinux1_x86_64.whl
Algorithm	Hash digest
SHA256	`2374767501f27d3d22556e07d46e6f2b06486f41cd215628c3ad38b114229670`
MD5	`95295d6bd2d59b9a15110b605bed1e9b`
BLAKE2b-256	`4a3d1d455d438e77ca625f29c0239e9b5083b4616560ebe43570932add484d7a`
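
The SHA256 digests above can be used to verify a downloaded wheel. The following is a minimal sketch using Python's standard hashlib module. It assumes the wheel files sit in the current directory, and the helper names `sha256_of` and `matches` are illustrative.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the tables above
EXPECTED_SHA256 = {
    "fbgemm_gpu-0.0.1-cp39-cp39-manylinux1_x86_64.whl":
        "9cdd7e7bb1559a2d3a68daa9a263411b414e5085089605edfe804226e48a0cab",
    "fbgemm_gpu-0.0.1-cp38-cp38-manylinux1_x86_64.whl":
        "f8c71ad6e53db9a27bbf7f7183589deff4fc86a81c5e42953532a35aa6a78c3a",
    "fbgemm_gpu-0.0.1-cp37-cp37m-manylinux1_x86_64.whl":
        "2374767501f27d3d22556e07d46e6f2b06486f41cd215628c3ad38b114229670",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large wheels are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """True when the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    for name, expected in EXPECTED_SHA256.items():
        if Path(name).is_file():
            print(name, "OK" if matches(name, expected) else "MISMATCH")
        else:
            print(name, "not found in the current directory")
```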