CLI tool for downloading and quantizing LLMs

These details have not been verified by PyPI

Project links

repository

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

quantkit

A tool for downloading and converting HuggingFace models without drama.

Install

pip3 install llm-quantkit

Usage

Usage: quantkit [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  download    Download model from huggingface.
  safetensor  Download and/or convert a pytorch model to safetensor format.
  awq         Download and/or convert a model to AWQ format.
  exl2        Download and/or convert a model to EXL2 format.
  gptq        Download and/or convert a model to GPTQ format.

The first argument after the command should be an HF repo id (e.g. mistralai/Mistral-7B-v0.1) or a local directory that already contains model files.
--hf-cache downloads the model to the HF cache and places symlinks to it in the output directory.
--no-cache downloads the model to the output directory without symlinks.
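
The two cache modes can be compared side by side. The following is a minimal sketch, not part of quantkit itself: the `run` helper and the output directory names are hypothetical, and it assumes `--hf-cache` can be combined with `-out` on the `download` command. By default it is a dry run that only prints the commands; set DRY_RUN=0 to actually execute them.

```shell
# Hypothetical dry-run helper: prints the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

repo="mistralai/Mistral-7B-v0.1"

# Model files live in the HF cache; the output directory holds symlinks to them.
run quantkit download "$repo" --hf-cache -out mistral7b-linked

# Model files are written directly into the output directory, no symlinks.
run quantkit download "$repo" --no-cache -out mistral7b-copy
```

The symlinked layout avoids a second copy when other HF tools already share the cache, while the --no-cache layout gives a self-contained directory that can be moved or archived.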

AWQ defaults to 4 bits, group size 128, zero-point True.
GPTQ defaults to 4 bits, group size 128, activation-order False.
EXL2 defaults to 8 head bits but has no default bitrate.
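
To make the GPTQ defaults explicit in a script, a small helper can fall back to the documented values when no override is given. This is a sketch under assumptions: `gptq_cmd` is a hypothetical function that only prints a command line, the output naming scheme is illustrative, and only the `-b` and `--group-size` flags shown in the examples are used (the activation-order setting is left at its default because this page does not name its flag).

```shell
# Hypothetical helper: print a quantkit gptq command line,
# falling back to the documented defaults when arguments are omitted.
gptq_cmd() {
  model="$1"
  bits="${2:-4}"          # documented default: 4 bits
  group_size="${3:-128}"  # documented default: group size 128
  # ${model##*/} strips an "org/" prefix so a repo id yields a clean directory name.
  echo "quantkit gptq $model -out ${model##*/}-GPTQ -b $bits --group-size $group_size"
}

gptq_cmd mistral7b        # defaults spelled out
gptq_cmd mistral7b 4 32   # override the group size
```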

Examples

Download a model from HF without using the HF cache:

quantkit download teknium/Hermes-Trismegistus-Mistral-7B --no-cache

Download only the safetensors version of a model (useful for models that ship both PyTorch and safetensors weights):

quantkit download mistralai/Mistral-7B-v0.1 --no-cache --safetensors-only -out mistral7b

Download and convert a model to safetensor, deleting the original pytorch bins:

quantkit safetensor migtissera/Tess-10.7B-v1.5b --delete-original

Download and convert a model to AWQ:

quantkit awq mistralai/Mistral-7B-v0.1 -out Mistral-7B-v0.1-AWQ

Convert a model to GPTQ (4 bits / group-size 32):

quantkit gptq mistral7b -out Mistral-7B-v0.1-GPTQ -b 4 --group-size 32

Convert a model to EXL2 (exllamav2) with 8 bits and 8 head bits:

quantkit exl2 mistralai/Mistral-7B-v0.1 -out Mistral-7B-v0.1-exl2-b8-h8 -b 8 -hb 8
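
The examples above can be chained so that a model is downloaded once and then converted from the local directory into each format. The following is a sketch that assumes the local directory `mistral7b` is accepted by `awq` and `exl2` in the same way the GPTQ example uses it; the `plan` function is hypothetical and only prints the commands so they can be reviewed before running.

```shell
repo="mistralai/Mistral-7B-v0.1"
name="${repo##*/}"   # Mistral-7B-v0.1
src="mistral7b"      # local directory shared by all conversions

# Hypothetical helper: print one command per step, in execution order.
plan() {
  echo "quantkit download $repo --no-cache --safetensors-only -out $src"
  echo "quantkit awq $src -out $name-AWQ"
  echo "quantkit gptq $src -out $name-GPTQ -b 4 --group-size 32"
  echo "quantkit exl2 $src -out $name-exl2-b8-h8 -b 8 -hb 8"
}

plan          # review the commands
# plan | sh   # uncomment to execute them in order
```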

Still in beta.

Release history

0.27  May 21, 2024
0.26  Apr 24, 2024
0.25  Apr 12, 2024
0.24  Apr 6, 2024
0.23  Apr 2, 2024
0.22  Mar 21, 2024
0.21  Mar 20, 2024
0.20  Mar 20, 2024
0.19  Mar 20, 2024
0.18  Mar 20, 2024
0.17  Mar 15, 2024 (this version)
0.16  Mar 14, 2024
0.15  Mar 13, 2024
0.14  Mar 13, 2024
0.13  Mar 13, 2024
0.12  Mar 13, 2024
0.11  Mar 13, 2024
0.1   Mar 13, 2024

Download files

Source Distribution: llm-quantkit-0.17.tar.gz (7.3 kB), uploaded Mar 15, 2024
Built Distribution: llm_quantkit-0.17-py3-none-any.whl (8.3 kB, Python 3), uploaded Mar 15, 2024

Hashes for llm-quantkit-0.17.tar.gz
Algorithm	Hash digest
SHA256	`9d75da614877c0873b7992cc481bb25a0c1f045f3c9ff0fe46b002765762f71e`
MD5	`c6a27b96617bcc89639f4cca614f4ef6`
BLAKE2b-256	`edc37ee7c1e5eb70f89cf9bc567eb15da3b18534b384097dcab99c3440d0b595`

Hashes for llm_quantkit-0.17-py3-none-any.whl
Algorithm	Hash digest
SHA256	`cb6f26485963c5eee0b6d4ea7db689748cab9635d18998ac64bb693e632cee2f`
MD5	`22a2e726a8e2bf254e30592acf51c755`
BLAKE2b-256	`d9f107af3da575774286089599720017881642ada5cda4c9e05213e9ea30459a`