Unified flow interface for data augmentation for LLM training.

These details have not been verified by PyPI

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

🌊 uniflow

uniflow is a unified interface to solve data augmentation problem for LLM training.

Quick Install

pip3 install uniflow

See more details at the full installation.

Overview

For all the flows, you must first import the Client interface and the specific uniflow configuration that you want to use.

from uniflow.client import Client
from uniflow.config import UNIFLOW_CONFIG

Then you can create a Client object to run the a particular flow.

client = Client(UNIFLOW_CONFIG)

Here is a table of the different pre-defined configurations you can use and their corresponding Flows, Model Configs, LLMs:

Config	Flow	Model Config	LLM
Config	`ModelFLow`	`ModelConfig`	`gpt-3.5-turbo-1106`
OpenAIConfig	`ModelFLow`	`OpenAIModelConfig`	`gpt-3.5-turbo-1106`
OpenAIJsonConfig	`OpenAIJsonModelFlow`	`ModelConfig`	`gpt-3.5-turbo-1106`
OpenAIFewShotConfig	`FewShotModelFlow`	`OpenAIModelConfig`	`gpt-3.5-turbo-1106`
HuggingfaceConfig	`HuggingFaceModelFlow`	`HuggingfaceModelConfig`	`mistralai/Mistral-7B-Instruct-v0.1`
LMQGConfig	`LMQGModelFlow`	`LMQGModelConfig`	`lmqg/t5-base-squad-qg-ae`

You can also run your own custom flows by passing in a custom configurations and/or parameters to the Client object. See the Custom Configuration section for more details.

Every flow takes a list of dictionaries as input. Each dictionary in the input list should have data to prompt in a format similar one of the following:

A simple list of context dicts, which can be used with a few-shot model:
```
data = [
{"context": """It was a sunny day and the sky color is blue.""",},
{"context": """My name is bobby and I am a talent software engineer working on AI/ML.""",}
]
```
This will be appended to the few_shot_template in the configuration. See the few_shot_model.ipynb notebook for a working example.

A more complex list of dictionaries with a single instruction and an examples list:

data = [{
  "instruction": """Generate one question and its corresponding answer based on the context. Following the format of the examples below to include context, question, and answer in the response.""",
  "examples": [
      {
          "context": """The quick brown fox jumps over the lazy dog.""",
          "question": """What is the color of the fox?""",
          "answer": """brown."""
      },
      {
          "context": """The quick brown fox jumps over the lazy black dog.""",
          "question": """What is the color of the dog?""",
          "answer": """black."""
      },
      {
          "context": """It was a sunny day and the sky color is blue.""",
          "question": """""",
          "answer": """""",
      }
    ],
},
...]

Next, use the client object to run the flow on the input data.

output = client.run(data)

For example if you want to run the FewShotModelFlow flow with the OpenAIFewShotConfig configuration, you can do the following:

from uniflow.client import Client
from uniflow.config import OpenAIFewShotConfig

data = [
    {"context": """It was a sunny day and the sky color is blue.""",},
    {"context": """My name is bobby and I am a talent software engineer working on AI/ML.""",}
    ]

config = OpenAIFewShotConfig()
client = Client(config)

output = client.run(data)

output[0]['output'][0]

You will see output similar to the following:

{'response': [{'context': 'It was a sunny day and the sky color is blue.',
   'question': 'what color is the sky on a sunny day?',
   'answer': 'blue.'}],
 'error': 'Failed to deserialize 0 examples'}

Flows

The Flow determines which uniflow Model is run. All the different flows are built on the Flow class, which converts the input data into nodes. The Model determines how the input data is serialized and deserialized.

Here is a table of the different Flows and their corresponding Models:

Flow	Model
`ModelFlow`	`JsonModel`
`FewShotModelFlow`	`FewShotModel`
`OpenAIJsonModelFlow`	`OpenAIJsonModel`
`HuggingFaceModelFlow`	`Model`
`LMQGModelFlow`	`Model`

Model

Serialization

This model serializes the input data into a single string, with the format:

KEY0: VALUE0\n
KEY1: VALUE1\n
...

Deserialization

It returns the raw data in the following object:

{
    RESPONSE: data,
    ERROR: "Failed to deserialize 0 examples",
}

JsonModel

Serialization

This model first appends the data to the few_shot_template in the configuration. It then serializes the data using json.dumps.

Deserialization

It iterates through the data and parses the output data into a output_list. Then returns the output_list in the following object:

{
    RESPONSE: output_list,
    ERROR: "Failed to deserialize 0 examples",
}

FewShotModel

Serialization

The FewShotModel serializes the input data into a few more steps. First, it adds the input data to the examples list within the few_shot_template. Then it serializes the data into a single string, with the format:

KEY0: VALUE0\n
KEY1: VALUE1\n
...

Deserialization

It first runs through a data filtering function to parse the output data into a output_list. Then returns the output_list in the following object:

{
    RESPONSE: output_list,
    ERROR: "Failed to deserialize 0 examples",
}

OpenAIJsonModel

Serialization

Same as the Model.

Deserialization

Same as from the Model.

Model Config

The Model Config is a configuration that is passed to the base Config object and determines which LLM model is used. It also has parameters that are specific to the LLM model.

ModelConfig

The base config is called ModelConfig and has the following parameters:

Parameter	Type	Default	Description
`model_name`	str	gpt-3.5-turbo-1106	name of the LLM model used

OpenAIModelConfig

The OpenAIModelConfig inherits from the ModelConfig and has the following additional parameters:

Parameter	Type	Default	Description
`num_calls`	int	1	The number of calls to make to the OpenAI API.
`temperature`	float	1.5	The temperature to use for the OpenAI API.
`response_format`	Dict[str, str]	{"type": "text"}	The response format to use for the OpenAI API.

OpenAIJsonModelConfig

The OpenAIJsonModelConfig inherits from the ModelConfig and has the following additional parameters:

Parameter	Type	Default	Description
`num_calls`	int	1	The number of calls to make to the OpenAI API.
`temperature`	float	1.5	The temperature to use for the OpenAI API.
`response_format`	Dict[str, str]	{"type": "json_object"}	The response format to use for the OpenAI API.

HuggingfaceModelConfig

The HuggingfaceModelConfig inherits from the ModelConfig, but overrides the model_name parameter to use the mistralai/Mistral-7B-Instruct-v0.1 model by default.

Parameter	Type	Default	Description
`model_name`	str	mistralai/Mistral-7B-Instruct-v0.1	name of the LLM model used

LMQGModelConfig

The LMQGModelConfig inherits from the ModelConfig, but overrides the model_name parameter to use the lmqg/t5-base-squad-qg-ae model by default.

Parameter	Type	Default	Description
`model_name`	str	lmqg/t5-base-squad-qg-ae	name of the LLM model used

Few Shot Template

The OpenAIFewShotConfig is the only configuration that has a default feww_shot_template. The default template is the following:

 {
            "instruction": """Generate one question and its corresponding answer based on the context.
    Following the format of the examples below. The response must include original context, question, and answer.""",
            "examples": [
                {
                    "context": """The quick brown fox jumps over the lazy dog.""",
                    "question": """What is the color of the fox?""",
                    "answer": """brown.""",
                },
                {
                    "context": """The quick brown fox jumps over the lazy black dog.""",
                    "question": """What is the color of the dog?""",
                    "answer": """black.""",
                },
            ],
        }

Custom Configuration

You can also configure the flows by passing custom configurations or arguments to the Client object.

Every configuration inherits from the Config configuration, which has the following parameters:

Parameter	Type	Description
`flow_name`	str	The name of the flow to run.
`few_shot_template`	Dict[str, str]	The template to use for the `FewShotModelFlow`.
`num_threads`	int	The number of threads to use for the flow.
`model_config`	`ModelConfig`	The configuration to pass to the model.

You can further configure the model_config by passing in one of the Model Configs with custom parameters.

Here is an example of how to pass in a custom configuration to the Client object:

from uniflow.client import Client
from uniflow.config import OpenAIFewShotConfig
from uniflow.model.config import OpenAIModelConfig

data = [
    {"context": """It was a sunny day and the sky color is blue.""",},
    {"context": """My name is bobby and I am a talent software engineer working on AI/ML.""",}
    ]

config = OpenAIFewShotConfig(
  num_threads=2,
  model_config=OpenAIModelConfig(
    model_name="gpt-4",
    num_calls=2,
    temperature=0.5,
  ),
)
client = Client(config)
output = client.run(data)

As you can see, we are passing in a custom parameters to the OpenAIModelConfig to the OpenAIFewShotConfig configurations according to our needs.

Examples

For more examples, see the example folder.

Installation

To get started with uniflow, you can install it using pip in a conda environment.

First, create a conda environment on your terminal using:

conda create -n uniflow python=3.10 -y
conda activate uniflow  # some OS requires `source activate uniflow`

Next, install the compatible pytorch based on your OS.

If you are on a GPU, install pytorch based on your cuda version. You can find your CUDA version via nvcc -V.

pip3 install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu121  # cu121 means cuda 12.1

If you are on a CPU instance,
```
pip3 install torch
```

Then, install uniflow:

pip3 install uniflow

Finally, if you are running the HuggingfaceModelFlow, you will also need to install the transformers, accelerate, bitsandbytes, scipy libraries:

pip3 install transformers accelerate bitsandbytes scipy

Congrats you have finished the installation!

Dev Setup

If you are interested in contributing to us, here are the preliminary development setups.

API keys

If you are running one of the following OpenAI flows, you will have to set up your OpenAI API key.

To do so, create a .env file in your root uniflow folder. Then add the following line to the .env file:

OPENAI_API_KEY=YOUR_API_KEY

Backend Dev Setup

conda create -n uniflow python=3.10
conda activate uniflow
cd uniflow
pip3 install poetry
poetry install --no-root

EC2 Dev Setup

If you are on EC2, you can launch a GPU instance with the following config:

EC2 g4dn.xlarge (if you want to run a pretrained LLM with 7B parameters)
Deep Learning AMI PyTorch GPU 2.0.1 (Ubuntu 20.04)
EBS: at least 100G

Project details

These details have not been verified by PyPI

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Release history Release notifications | RSS feed

0.0.31

Mar 29, 2024

0.0.30

Mar 18, 2024

0.0.29

Mar 11, 2024

0.0.28

Mar 11, 2024

0.0.27

Mar 7, 2024

0.0.26

Feb 27, 2024

0.0.25

Feb 24, 2024

0.0.24

Feb 20, 2024

0.0.23

Feb 15, 2024

0.0.22

Feb 4, 2024

0.0.21

Jan 26, 2024

0.0.20

Jan 25, 2024

0.0.19

Jan 25, 2024

0.0.18

Jan 22, 2024

0.0.17

Jan 19, 2024

0.0.16

Jan 17, 2024

0.0.15

Jan 15, 2024

0.0.14

Jan 9, 2024

0.0.13

Jan 8, 2024

0.0.12

Jan 6, 2024

0.0.11

Dec 31, 2023

0.0.10

Dec 27, 2023

0.0.9

Dec 23, 2023

This version

0.0.8

Dec 11, 2023

0.0.7

Nov 7, 2023

0.0.6

Nov 6, 2023

0.0.5

Nov 3, 2023

0.0.4

Oct 20, 2023

0.0.3

Oct 20, 2023

0.0.2

Oct 20, 2023

0.0.1

Oct 13, 2023

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

uniflow-0.0.8.tar.gz (21.9 kB view hashes)

Uploaded Dec 11, 2023 Source

Built Distribution

uniflow-0.0.8-py3-none-any.whl (24.5 kB view hashes)

Uploaded Dec 11, 2023 Python 3

Hashes for uniflow-0.0.8.tar.gz

Hashes for uniflow-0.0.8.tar.gz
Algorithm	Hash digest
SHA256	`b6467a1ff6d3465008648eee3ec8768a637e0a7af3108014864395e45323d89c`
MD5	`11e2be8d4d11f210cb9ce3f12d00c525`
BLAKE2b-256	`18c15ddcbd0b7d32409ded37be797e559694ca454a99bbf516d40de08422afad`

Hashes for uniflow-0.0.8-py3-none-any.whl

Hashes for uniflow-0.0.8-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9a62ae4ceadd56d691facbbaf663be024a1a0a27e25e00638f0985f57005f6b2`
MD5	`8cf4aa6147c0e2379ce48a018f38be7b`
BLAKE2b-256	`d10c3a548bdb7bc9f52502eff50710a2bd79bfde2c5f09df1b642de89da81884`

uniflow 0.0.8

Navigation

Verified details

Maintainers

Unverified details

Meta

Classifiers

Project description

🌊 uniflow

Quick Install

Overview

Flows

Model

Serialization

Deserialization

JsonModel

Serialization

Deserialization

FewShotModel

Serialization

Deserialization

OpenAIJsonModel

Serialization

Deserialization

Model Config

ModelConfig

OpenAIModelConfig

OpenAIJsonModelConfig

HuggingfaceModelConfig

LMQGModelConfig

Few Shot Template

Custom Configuration

Examples

Installation

Dev Setup

API keys

Backend Dev Setup

EC2 Dev Setup

Project details

Verified details

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution