A toolkit for efficiently fine-tuning LLMs
🎉 News
- [2023.08.30] XTuner is released, with multiple fine-tuned adapters on HuggingFace.
📖 Introduction
XTuner is a toolkit for efficiently fine-tuning LLMs, developed by the MMRazor and MMDeploy teams.
- Efficiency: Supports LLM fine-tuning on consumer-grade GPUs. Fine-tuning a 7B LLM requires as little as 8GB of GPU memory, so nearly any GPU (even free resources such as Colab) can be used to fine-tune custom LLMs.
- Versatility: Supports various LLMs (InternLM, Llama2, ChatGLM2, Qwen, Baichuan, ...), datasets (MOSS_003_SFT, Alpaca, WizardLM, oasst1, Open-Platypus, Code Alpaca, Colorist, ...) and algorithms (QLoRA, LoRA), so users can choose the solution that best fits their requirements.
- Compatibility: Compatible with the DeepSpeed 🚀 and HuggingFace 🤗 training pipelines, so it integrates easily into existing workflows.
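The 8GB figure in the Efficiency point can be sanity-checked with a back-of-the-envelope calculation. The sketch below assumes QLoRA's 4-bit quantized base weights and a round 7 billion parameters; both numbers are assumptions for illustration, not measurements from XTuner. It counts only the base weights and ignores quantization constants, LoRA weights, optimizer state and activations, which all add to the real footprint.

```shell
# Rough size of the base-model weights alone, in MiB.
# Assumptions (not taken from XTuner): 7e9 parameters, fixed bits per weight.
weight_mib() {
    params=$1
    bits=$2
    # bytes = params * bits / 8; 1 MiB = 1048576 bytes (integer division)
    echo $(( params * bits / 8 / 1048576 ))
}

echo "4-bit weights : $(weight_mib 7000000000 4) MiB"
echo "16-bit weights: $(weight_mib 7000000000 16) MiB"
```

Under these assumptions the 4-bit weights take roughly 3.3 GiB, against roughly 13 GiB at 16 bits, which leaves room for the remaining training state on an 8GB card.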
🌟 Demos
🔥 Supports
Models | SFT Datasets | Data Pipelines | Algorithms |
🛠️ Quick Start
Installation
Install XTuner with pip
pip install xtuner
or from source
git clone https://github.com/InternLM/xtuner.git
cd xtuner
pip install -e .
Chat
Examples of Plugins-based Chat 🔥🔥🔥
XTuner provides tools to chat with pretrained / fine-tuned LLMs.
- For example, we can start the chat with Llama2-7B-Plugins by
xtuner chat hf meta-llama/Llama-2-7b-hf --adapter xtuner/Llama-2-7b-qlora-moss-003-sft --bot-name Llama2 --prompt-template moss_sft --with-plugins calculate solve search --command-stop-word "<eoc>" --answer-stop-word "<eom>" --no-streamer
For more examples, please see chat.md.
Fine-tune
XTuner supports efficient fine-tuning (e.g., QLoRA) of LLMs.
- Step 0, prepare the config. XTuner provides many ready-to-use configs, and we can view all of them by
xtuner list-cfg
Or, if none of the provided configs meets your requirements, copy one to a directory of your choice with the command below, then modify the copy as needed
xtuner copy-cfg ${CONFIG_NAME} ${SAVE_DIR}
- Step 1, start fine-tuning. For example, we can start the QLoRA fine-tuning of InternLM-7B on the oasst1 dataset by
# On a single GPU
xtuner train internlm_7b_qlora_oasst1_e3
# On multiple GPUs
(DIST) NPROC_PER_NODE=${GPU_NUM} xtuner train internlm_7b_qlora_oasst1_e3
(SLURM) srun ${SRUN_ARGS} xtuner train internlm_7b_qlora_oasst1_e3 --launcher slurm
For more examples, please see finetune.md.
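The single-GPU and multi-GPU (DIST) launch variants above differ only in the `NPROC_PER_NODE` prefix, so a small wrapper can choose between them. The function below is a hypothetical helper, not part of XTuner; it only prints the command so the result can be inspected before anything is run.

```shell
# Hypothetical helper (not part of XTuner): print the launch command for a
# given config name and GPU count instead of executing it.
train_cmd() {
    config=$1
    gpus=${2:-1}   # default to a single GPU when no count is given
    if [ "$gpus" -gt 1 ]; then
        echo "NPROC_PER_NODE=$gpus xtuner train $config"
    else
        echo "xtuner train $config"
    fi
}

train_cmd internlm_7b_qlora_oasst1_e3
train_cmd internlm_7b_qlora_oasst1_e3 4
```

To actually start training, the printed command can be passed to `eval`, e.g. `eval "$(train_cmd internlm_7b_qlora_oasst1_e3 4)"`. On NVIDIA machines the GPU count could be taken from `nvidia-smi -L | wc -l`.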
Deployment
- Step 0, convert the pth adapter to a HuggingFace adapter by
xtuner convert adapter_pth2hf \
    ${CONFIG} \
    ${PATH_TO_PTH_ADAPTER} \
    ${SAVE_PATH_TO_HF_ADAPTER}
or directly merge the pth adapter into the pretrained LLM by
xtuner convert merge_adapter \
    ${CONFIG} \
    ${PATH_TO_PTH_ADAPTER} \
    ${SAVE_PATH_TO_MERGED_LLM} \
    --max-shard-size 2GB
- Step 1, deploy the fine-tuned LLM with any other framework, such as LMDeploy 🚀.
pip install lmdeploy
python -m lmdeploy.pytorch.chat ${NAME_OR_PATH_TO_LLM} \
    --max_new_tokens 256 \
    --temperature 0.8 \
    --top_p 0.95 \
    --seed 0
🎯 We are working closely with LMDeploy to implement the deployment of plugins-based chat!
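The two conversion options in Step 0 take the same three positional arguments and differ only in the subcommand and the `--max-shard-size` flag. A minimal sketch of a wrapper that picks between them is shown below; `convert_cmd` and its `hf` / `merge` mode names are hypothetical, and the function only prints the command rather than running it.

```shell
# Hypothetical helper (not part of XTuner): print the Step 0 conversion
# command for either output format.
#   mode "hf"    -> standalone HuggingFace adapter
#   mode "merge" -> adapter merged into the pretrained LLM
convert_cmd() {
    mode=$1
    config=$2
    pth=$3
    out=$4
    case "$mode" in
        hf)    echo "xtuner convert adapter_pth2hf $config $pth $out" ;;
        merge) echo "xtuner convert merge_adapter $config $pth $out --max-shard-size 2GB" ;;
        *)     echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

# The paths below are placeholders for illustration only.
convert_cmd hf internlm_7b_qlora_oasst1_e3 ./adapter.pth ./adapter_hf
convert_cmd merge internlm_7b_qlora_oasst1_e3 ./adapter.pth ./merged_llm
```

Keeping a standalone adapter is convenient when the same base model serves several adapters, as in the `--adapter` option of `xtuner chat`; merging produces a single model directory to hand to a deployment framework.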
Evaluation
- We recommend using OpenCompass, a comprehensive and systematic LLM evaluation library, which currently supports 50+ datasets with about 300,000 questions.
🤝 Contributing
We appreciate all contributions to XTuner. Please refer to CONTRIBUTING.md for the contribution guidelines.
🎖️ Acknowledgement
License
This project is released under the Apache License 2.0. Please also adhere to the licenses of the models and datasets being used.