C Transformers

Python bindings for Transformer models implemented in C/C++ using the GGML library.

Supported Models

Models               Model Type
GPT-2                gpt2
GPT-J, GPT4All-J     gptj
GPT-NeoX, StableLM   gpt_neox
LLaMA                llama
MPT                  mpt
Dolly V2             dolly-v2
StarCoder            starcoder

More models coming soon.
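
The model_type strings above are what you pass to from_pretrained. As a small convenience, the table can be kept in a lookup so that a typo fails early. This is only a sketch: MODEL_TYPES and model_type_for are hypothetical names for illustration and are not part of ctransformers.

```python
# Model types copied from the Supported Models table.
MODEL_TYPES = {
    "GPT-2": "gpt2",
    "GPT-J": "gptj",
    "GPT4All-J": "gptj",
    "GPT-NeoX": "gpt_neox",
    "StableLM": "gpt_neox",
    "LLaMA": "llama",
    "MPT": "mpt",
    "Dolly V2": "dolly-v2",
    "StarCoder": "starcoder",
}


def model_type_for(name: str) -> str:
    """Return the model_type string for a model family name (hypothetical helper)."""
    key = name.strip().lower()
    for family, model_type in MODEL_TYPES.items():
        if family.lower() == key:
            return model_type
    raise ValueError(f"Unsupported model family: {name!r}")
```

For example, model_type_for('StableLM') gives the string to use as model_type for a StableLM model file.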

Installation

pip install ctransformers

Usage

It provides a unified interface for all models:

from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained('/path/to/ggml-gpt-2.bin', model_type='gpt2')

print(llm('AI is going to'))

If you get an illegal instruction error, try lib='avx' or lib='basic':

llm = AutoModelForCausalLM.from_pretrained('/path/to/ggml-gpt-2.bin', model_type='gpt2', lib='avx')

It provides a generator interface for more control:

tokens = llm.tokenize('AI is going to')

for token in llm.generate(tokens):
    print(llm.detokenize(token))

This allows you to use a custom tokenizer.
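
The generator loop above can be wrapped in a helper that caps the number of tokens and stops at an end-of-sequence token. The sketch below uses only the method names documented in this README (tokenize, generate, detokenize, is_eos_token). EchoLLM is a hypothetical stand-in so the snippet runs without a model file, and generate_text is an illustrative name, not a library function. Whether generate already stops at end-of-sequence on its own is not stated here, so the loop checks explicitly.

```python
from typing import Iterator, List, Sequence


class EchoLLM:
    """Hypothetical stand-in exposing the documented method names; not a real model."""

    EOS = 0

    def tokenize(self, text: str) -> List[int]:
        # Toy tokenizer: one token per character (its code point).
        return [ord(ch) for ch in text]

    def detokenize(self, tokens: Sequence[int]) -> str:
        return "".join(chr(t) for t in tokens)

    def is_eos_token(self, token: int) -> bool:
        return token == self.EOS

    def generate(self, tokens: Sequence[int]) -> Iterator[int]:
        # Toy behaviour: replay the prompt in reverse, then emit end-of-sequence.
        yield from reversed(list(tokens))
        yield self.EOS


def generate_text(llm, prompt: str, max_new_tokens: int = 16) -> str:
    """Collect up to max_new_tokens tokens from llm.generate and decode them."""
    generated: List[int] = []
    for token in llm.generate(llm.tokenize(prompt)):
        if llm.is_eos_token(token) or len(generated) >= max_new_tokens:
            break
        generated.append(token)
    return llm.detokenize(generated)
```

With a real model, pass the object returned by AutoModelForCausalLM.from_pretrained in place of EchoLLM(). To use a custom tokenizer, replace the llm.tokenize and llm.detokenize calls with your own functions.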

It also provides access to the low-level C API. See the Documentation section below.

Hugging Face Hub

It can be used with models hosted on the Hub:

llm = AutoModelForCausalLM.from_pretrained('marella/gpt-2-ggml')

If a model repo has multiple model files (.bin files), specify a model file using:

llm = AutoModelForCausalLM.from_pretrained('marella/gpt-2-ggml', model_file='ggml-model.bin')

It can also be used with your own models uploaded to the Hub. For a better user experience, upload only one model per repo.

To use it with your own model, add a config.json file to your model repo specifying the model_type:

{
  "model_type": "gpt2"
}

You can also specify additional parameters under task_specific_params.text-generation:

{
  "model_type": "gpt2",
  "task_specific_params": {
    "text-generation": {
      "top_k": 40,
      "top_p": 0.95,
      "temperature": 0.8,
      "repetition_penalty": 1.1,
      "last_n_tokens": 64
    }
  }
}

See marella/gpt-2-ggml for a minimal example and marella/gpt-2-ggml-example for a full example.
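
Before uploading, it can help to check that a config.json has the expected shape. The sketch below parses the example above with the standard json module and pulls out model_type and the text-generation parameters. read_generation_params is a hypothetical helper written for this illustration; it does not show how the library itself reads the file.

```python
import json

# The full example from above, as it would appear in config.json.
CONFIG_TEXT = """
{
  "model_type": "gpt2",
  "task_specific_params": {
    "text-generation": {
      "top_k": 40,
      "top_p": 0.95,
      "temperature": 0.8,
      "repetition_penalty": 1.1,
      "last_n_tokens": 64
    }
  }
}
"""


def read_generation_params(text: str):
    """Return (model_type, text-generation params) from config.json contents."""
    config = json.loads(text)
    model_type = config["model_type"]  # required key; raises KeyError if missing
    params = config.get("task_specific_params", {}).get("text-generation", {})
    return model_type, params


model_type, params = read_generation_params(CONFIG_TEXT)
print(model_type, params)
```

A minimal config with only model_type yields an empty parameter dict.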

LangChain

LangChain is a framework for developing applications powered by language models. A LangChain LLM object can be created using:

from ctransformers.langchain import CTransformers

llm = CTransformers(model='/path/to/ggml-gpt-2.bin', model_type='gpt2')

print(llm('AI is going to'))

If you get an illegal instruction error, try lib='avx' or lib='basic':

llm = CTransformers(model='/path/to/ggml-gpt-2.bin', model_type='gpt2', lib='avx')

It can also be used with models hosted on the Hugging Face Hub:

llm = CTransformers(model='marella/gpt-2-ggml')

Additional parameters can be passed using the config parameter:

config = {'max_new_tokens': 256, 'repetition_penalty': 1.1}

llm = CTransformers(model='marella/gpt-2-ggml', config=config)

It can be used with other LangChain modules:

from langchain import PromptTemplate, LLMChain

template = """Question: {question}

Answer:"""

prompt = PromptTemplate(template=template, input_variables=['question'])

llm_chain = LLMChain(prompt=prompt, llm=llm)

print(llm_chain.run('What is AI?'))
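
To see the text the model receives from the chain above, the template can be filled in by hand. This sketch assumes the template's {question} placeholder is substituted the same way Python's str.format does it, which holds for simple placeholders like this one; it does not call LangChain.

```python
template = """Question: {question}

Answer:"""

# Fill the placeholder the way a simple format-style template would.
prompt_text = template.format(question="What is AI?")
print(prompt_text)
```

The model is then expected to continue the text after "Answer:".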

Documentation

Config

Parameter           Type       Description                                               Default
top_k               int        The top-k value to use for sampling.                      40
top_p               float      The top-p value to use for sampling.                      0.95
temperature         float      The temperature to use for sampling.                      0.8
repetition_penalty  float      The repetition penalty to use for sampling.               1.1
last_n_tokens       int        The number of last tokens to use for repetition penalty.  64
seed                int        The seed value to use for sampling tokens.                -1
max_new_tokens      int        The maximum number of new tokens to generate.             256
stop                List[str]  A list of sequences to stop generation when encountered.  None
stream              bool       Whether to stream the generated text.                     False
reset               bool       Whether to reset the model state before generating text.  True
batch_size          int        The batch size to use for evaluating tokens.              8
threads             int        The number of threads to use for evaluating tokens.       -1
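
The defaults in the table can be mirrored in a small dataclass to build config dicts and catch misspelled parameter names. This is a sketch only: GenerationDefaults and with_overrides are hypothetical names, and the class is not the library's own Config class.

```python
from dataclasses import asdict, dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class GenerationDefaults:
    """Hypothetical mirror of the Config table; not the library's Config class."""

    top_k: int = 40
    top_p: float = 0.95
    temperature: float = 0.8
    repetition_penalty: float = 1.1
    last_n_tokens: int = 64
    seed: int = -1
    max_new_tokens: int = 256
    stop: Optional[List[str]] = None
    stream: bool = False
    reset: bool = True
    batch_size: int = 8
    threads: int = -1


def with_overrides(**overrides) -> dict:
    """Return the table defaults with overrides applied; unknown names raise TypeError."""
    return asdict(replace(GenerationDefaults(), **overrides))


config = with_overrides(max_new_tokens=128, temperature=0.2)
print(config["max_new_tokens"], config["top_k"])
```

A misspelled name such as with_overrides(max_tokens=128) raises TypeError instead of being silently ignored.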

class AutoModelForCausalLM


classmethod AutoModelForCausalLM.from_pretrained

from_pretrained(
    model_path_or_repo_id: str,
    model_type: Optional[str] = None,
    model_file: Optional[str] = None,
    config: Optional[ctransformers.hub.AutoConfig] = None,
    lib: Optional[str] = None,
    **kwargs
) -> LLM

Loads the language model from a local file or remote repo.

Args:

  • model_path_or_repo_id: The path to a model file or directory or the name of a Hugging Face Hub model repo.
  • model_type: The model type.
  • model_file: The name of the model file in repo or directory.
  • config: AutoConfig object.
  • lib: The path to a shared library or one of avx2, avx, basic.

Returns: LLM object.

class LLM

method LLM.__init__

__init__(
    model_path: str,
    model_type: str,
    config: Optional[ctransformers.llm.Config] = None,
    lib: Optional[str] = None
)

Loads the language model from a local file.

Args:

  • model_path: The path to a model file.
  • model_type: The model type.
  • config: Config object.
  • lib: The path to a shared library or one of avx2, avx, basic.

property LLM.config

The config object.


property LLM.model_path

The path to the model file.


property LLM.model_type

The model type.


method LLM.detokenize

detokenize(tokens: Sequence[int]) -> str

Converts a list of tokens to text.

Args:

  • tokens: The list of tokens.

Returns: The combined text of all tokens.


method LLM.eval

eval(
    tokens: Sequence[int],
    batch_size: Optional[int] = None,
    threads: Optional[int] = None
) -> None

Evaluates a list of tokens.

Args:

  • tokens: The list of tokens to evaluate.
  • batch_size: The batch size to use for evaluating tokens. Default: 8
  • threads: The number of threads to use for evaluating tokens. Default: -1

method LLM.generate

generate(
    tokens: Sequence[int],
    top_k: Optional[int] = None,
    top_p: Optional[float] = None,
    temperature: Optional[float] = None,
    repetition_penalty: Optional[float] = None,
    last_n_tokens: Optional[int] = None,
    seed: Optional[int] = None,
    batch_size: Optional[int] = None,
    threads: Optional[int] = None,
    reset: Optional[bool] = None
) -> Generator[int, NoneType, NoneType]

Generates new tokens from a list of tokens.

Args:

  • tokens: The list of tokens to generate tokens from.
  • top_k: The top-k value to use for sampling. Default: 40
  • top_p: The top-p value to use for sampling. Default: 0.95
  • temperature: The temperature to use for sampling. Default: 0.8
  • repetition_penalty: The repetition penalty to use for sampling. Default: 1.1
  • last_n_tokens: The number of last tokens to use for repetition penalty. Default: 64
  • seed: The seed value to use for sampling tokens. Default: -1
  • batch_size: The batch size to use for evaluating tokens. Default: 8
  • threads: The number of threads to use for evaluating tokens. Default: -1
  • reset: Whether to reset the model state before generating text. Default: True

Returns: The generated tokens.
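
One way to picture generate is as a loop over the lower-level methods documented in this section: reset, eval, sample and is_eos_token. The sketch below is an assumption about how those pieces fit together, not the library's implementation. CountdownModel is a hypothetical stand-in that lets the loop run without a model file, and generate_tokens is an illustrative name.

```python
from typing import Iterator, List, Sequence


class CountdownModel:
    """Hypothetical stand-in with the documented method names; not a real model."""

    def __init__(self) -> None:
        self.context: List[int] = []

    def reset(self) -> None:
        self.context.clear()

    def eval(self, tokens: Sequence[int]) -> None:
        # A real model would run the network here; the toy just records tokens.
        self.context.extend(tokens)

    def sample(self) -> int:
        # Toy rule: the next token is the last one minus one; 0 acts as end-of-sequence.
        return max(self.context[-1] - 1, 0)

    def is_eos_token(self, token: int) -> bool:
        return token == 0


def generate_tokens(llm, tokens: Sequence[int], reset: bool = True) -> Iterator[int]:
    """Yield sampled tokens until end-of-sequence, feeding each one back via eval."""
    if reset:
        llm.reset()
    llm.eval(tokens)
    while True:
        token = llm.sample()
        if llm.is_eos_token(token):
            break
        yield token
        llm.eval([token])
```

With reset=False the stand-in keeps its earlier context, which mirrors the documented purpose of the reset parameter.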


method LLM.is_eos_token

is_eos_token(token: int) -> bool

Checks if a token is an end-of-sequence token.

Args:

  • token: The token to check.

Returns: True if the token is an end-of-sequence token else False.


method LLM.reset

reset() -> None

Resets the model state.


method LLM.sample

sample(
    top_k: Optional[int] = None,
    top_p: Optional[float] = None,
    temperature: Optional[float] = None,
    repetition_penalty: Optional[float] = None,
    last_n_tokens: Optional[int] = None,
    seed: Optional[int] = None
) -> int

Samples a token from the model.

Args:

  • top_k: The top-k value to use for sampling. Default: 40
  • top_p: The top-p value to use for sampling. Default: 0.95
  • temperature: The temperature to use for sampling. Default: 0.8
  • repetition_penalty: The repetition penalty to use for sampling. Default: 1.1
  • last_n_tokens: The number of last tokens to use for repetition penalty. Default: 64
  • seed: The seed value to use for sampling tokens. Default: -1

Returns: The sampled token.
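
To make the sampling parameters concrete, here is a plain-Python sketch of a common way to combine repetition penalty, temperature, top-k and top-p when picking a token from raw scores. It is a generic illustration and may differ from what the C/C++ code does. The names sample_token, logits and recent_tokens are hypothetical, and treating a negative seed as "no fixed seed" is an assumption.

```python
import math
import random
from typing import Sequence


def sample_token(
    logits: Sequence[float],
    recent_tokens: Sequence[int] = (),
    top_k: int = 40,
    top_p: float = 0.95,
    temperature: float = 0.8,
    repetition_penalty: float = 1.1,
    last_n_tokens: int = 64,
    seed: int = -1,
) -> int:
    """Pick a token id from raw scores (generic sketch, not the library's code)."""
    # Assumption: a negative seed means "do not fix the random seed".
    rng = random.Random(None if seed < 0 else seed)

    # Repetition penalty: make tokens seen in the recent window less likely.
    scores = list(logits)
    window = recent_tokens[-last_n_tokens:] if last_n_tokens > 0 else ()
    for token in set(window):
        if 0 <= token < len(scores):
            if scores[token] > 0:
                scores[token] /= repetition_penalty
            else:
                scores[token] *= repetition_penalty

    # Top-k: keep the k highest-scoring candidates, then apply temperature.
    ranked = sorted(range(len(scores)), key=lambda t: scores[t], reverse=True)[:top_k]
    scaled = [scores[t] / temperature for t in ranked]

    # Softmax over the remaining candidates (shifted by the max for stability).
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept_tokens, kept_probs, cumulative = [], [], 0.0
    for token, p in zip(ranked, probs):
        kept_tokens.append(token)
        kept_probs.append(p)
        cumulative += p
        if cumulative >= top_p:
            break

    return rng.choices(kept_tokens, weights=kept_probs, k=1)[0]
```

Setting top_k=1 reduces this to greedy decoding, and lower temperature values concentrate probability on the highest-scoring tokens.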


method LLM.tokenize

tokenize(text: str) -> List[int]

Converts text into a list of tokens.

Args:

  • text: The text to tokenize.

Returns: The list of tokens.


method LLM.__call__

__call__(
    prompt: str,
    max_new_tokens: Optional[int] = None,
    top_k: Optional[int] = None,
    top_p: Optional[float] = None,
    temperature: Optional[float] = None,
    repetition_penalty: Optional[float] = None,
    last_n_tokens: Optional[int] = None,
    seed: Optional[int] = None,
    batch_size: Optional[int] = None,
    threads: Optional[int] = None,
    stop: Optional[Sequence[str]] = None,
    stream: Optional[bool] = None,
    reset: Optional[bool] = None
) -> Union[str, Generator[str, NoneType, NoneType]]

Generates text from a prompt.

Args:

  • prompt: The prompt to generate text from.
  • max_new_tokens: The maximum number of new tokens to generate. Default: 256
  • top_k: The top-k value to use for sampling. Default: 40
  • top_p: The top-p value to use for sampling. Default: 0.95
  • temperature: The temperature to use for sampling. Default: 0.8
  • repetition_penalty: The repetition penalty to use for sampling. Default: 1.1
  • last_n_tokens: The number of last tokens to use for repetition penalty. Default: 64
  • seed: The seed value to use for sampling tokens. Default: -1
  • batch_size: The batch size to use for evaluating tokens. Default: 8
  • threads: The number of threads to use for evaluating tokens. Default: -1
  • stop: A list of sequences to stop generation when encountered. Default: None
  • stream: Whether to stream the generated text. Default: False
  • reset: Whether to reset the model state before generating text. Default: True

Returns: The generated text.
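
The stop and stream parameters interact: when text arrives in pieces, a stop sequence can be split across two pieces. The sketch below shows one way to cut a stream of text pieces at the first stop sequence by holding back a few trailing characters. It illustrates the idea only and is not the library's implementation. stream_until_stop is a hypothetical name, and the pieces would come from a call such as llm(prompt, stream=True).

```python
from typing import Iterable, Iterator, Sequence


def stream_until_stop(pieces: Iterable[str], stop: Sequence[str] = ()) -> Iterator[str]:
    """Yield text pieces, ending just before the first stop sequence appears."""
    # Hold back enough characters to catch a stop sequence split across pieces.
    holdback = max(max((len(s) for s in stop), default=0) - 1, 0)
    buffer = ""
    for piece in pieces:
        buffer += piece
        hits = [buffer.find(s) for s in stop if s and s in buffer]
        if hits:
            cut = min(hits)
            if cut:
                yield buffer[:cut]
            return
        # Emit everything except the tail that could still begin a stop sequence.
        emit_upto = len(buffer) - holdback
        if emit_upto > 0:
            yield buffer[:emit_upto]
            buffer = buffer[emit_upto:]
    if buffer:
        yield buffer


text = "".join(stream_until_stop(["Hello", " world\n", "\nignored"], stop=["\n\n"]))
print(text)
```

Without stop sequences the pieces pass through unchanged.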

License

MIT
