OpenAI CLIP text encoders for multiple languages!

Development Status
- 4 - Beta
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Programming Language
- Python :: 3.6
Topic
- Scientific/Engineering :: Artificial Intelligence

Project description

Multilingual-CLIP

OpenAI CLIP text encoders for any language

Overview

OpenAI recently released the paper Learning Transferable Visual Models From Natural Language Supervision, in which they present the CLIP (Contrastive Language–Image Pre-training) model. This model is trained to connect text and images by matching their corresponding vector representations using a contrastive learning objective. CLIP consists of two separate models, a visual encoder and a text encoder. These were trained on a whopping 400 million images and corresponding captions. OpenAI has since released a set of their smaller CLIP models, which can be found on the official CLIP GitHub.
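To make the contrastive objective more concrete, here is a minimal pure-Python sketch of a symmetric contrastive loss over a batch of paired text and image vectors. It is an illustration, not OpenAI's training code: the function names, the toy vectors, and the fixed temperature of 0.07 are all assumptions chosen for readability.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cross_entropy_row(logits, target):
    """Negative log-softmax of logits[target], computed in a stable way."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target]

def contrastive_loss(text_embs, image_embs, temperature=0.07):
    """Symmetric loss; text k is assumed to be paired with image k."""
    n = len(text_embs)
    sims = [[cosine(t, img) / temperature for img in image_embs] for t in text_embs]
    # Text -> image: every row should peak on its diagonal entry.
    t2i = sum(cross_entropy_row(sims[k], k) for k in range(n)) / n
    # Image -> text: the same check on the transposed matrix.
    cols = [[sims[r][c] for r in range(n)] for c in range(n)]
    i2t = sum(cross_entropy_row(cols[k], k) for k in range(n)) / n
    return (t2i + i2t) / 2

# Toy 2-dimensional vectors (hypothetical, not real CLIP outputs).
matched_text = [[1.0, 0.0], [0.0, 1.0]]
matched_image = [[0.9, 0.1], [0.2, 0.8]]
shuffled_image = [matched_image[1], matched_image[0]]

loss_matched = contrastive_loss(matched_text, matched_image)
loss_shuffled = contrastive_loss(matched_text, shuffled_image)
print(loss_matched < loss_shuffled)
```

The loss is low when each caption is most similar to its own image and high when the pairing is scrambled, which is the signal that pulls matching text and image vectors together.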

This repository contains

Pre-trained CLIP-Text encoders for multiple languages
PyTorch & TensorFlow inference code
TensorFlow training code

Requirements

Other versions may work equally well, but we have worked with the following:

Python = 3.6.9
Transformers = 4.8.1
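If you want to see how your environment compares with the versions listed above, a small check along these lines may help. It is only a sketch: the helper names `current_versions` and `differences` are made up for this example, and a mismatch does not necessarily mean the package will fail.

```python
import sys

# Versions the authors report having worked with (taken from the list above).
TESTED = {"python": (3, 6, 9), "transformers": "4.8.1"}

def current_versions():
    """Return the running Python version and the installed transformers version."""
    try:
        import transformers
        transformers_version = transformers.__version__
    except ImportError:
        transformers_version = None  # transformers is not installed here
    return {
        "python": tuple(sys.version_info[:3]),
        "transformers": transformers_version,
    }

def differences(found, tested=TESTED):
    """Map each name whose version differs to a (found, tested) pair."""
    return {
        name: (found[name], wanted)
        for name, wanted in tested.items()
        if found[name] != wanted
    }

print(differences(current_versions()))
```

An empty dictionary means the environment matches the tested versions exactly; anything else lists what differs so you can judge whether it matters.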

Install

pip install multilingual-clip torch

You can also choose to pip install tensorflow instead of torch.

Inference Usage

The example below uses PyTorch. Inference code for TensorFlow is also available in inference_example.py.

from multilingual_clip import pt_multilingual_clip
import transformers

texts = [
    'Three blind horses listening to Mozart.',
    'Älgen är skogens konung!',
    'Wie leben Eisbären in der Antarktis?',
    'Вы знали, что все белые медведи левши?'
]
model_name = 'M-CLIP/XLM-Roberta-Large-Vit-L-14'

# Load Model & Tokenizer
model = pt_multilingual_clip.MultilingualCLIP.from_pretrained(model_name)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)

embeddings = model.forward(texts, tokenizer)
print(embeddings.shape)
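Once you have text embeddings, a common next step is to rank captions against an image embedding by cosine similarity. The sketch below shows that step in pure Python with toy vectors. The helper names and the numbers are assumptions for illustration; in practice the text vectors would come from `embeddings` above (for example via `embeddings.detach().tolist()`, assuming a PyTorch tensor) and the image vector from the matching CLIP vision model.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def rank_by_similarity(query, candidates):
    """Return (order, scores): candidate indices from most to least similar."""
    q = l2_normalize(query)
    scores = [
        sum(a * b for a, b in zip(q, l2_normalize(c)))
        for c in candidates
    ]
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return order, scores

# Toy stand-ins with 3 dimensions (real embeddings are much larger).
image_embedding = [0.2, 0.9, 0.1]
text_embeddings = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.2],
    [0.0, 0.1, 0.9],
]

order, scores = rank_by_similarity(image_embedding, text_embeddings)
print(order)  # index of the best-matching caption comes first
```

Because the text encoders are trained to share an embedding space with the CLIP image encoder, the same ranking works regardless of the language the caption is written in.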

Install for development

Set up a virtualenv:

python3 -m venv .env
source .env/bin/activate
pip install -e .

Pre-trained Models

Every text encoder is a transformer available on Hugging Face, with an additional linear layer on top. For more information about a specific model, click the Model Name to see its model card.

Name	Model Base	Vision Model	Vision Dimensions	Pre-trained Languages	#Parameters
LABSE Vit-L/14	LaBSE	OpenAI ViT-L/14	768	109 Languages	110 M
XLM-R Large Vit-B/32	XLM-Roberta-Large	OpenAI ViT-B/32	512	100 Languages	344 M
XLM-R Large Vit-L/14	XLM-Roberta-Large	OpenAI ViT-L/14	768	100 Languages	344 M
XLM-R Large Vit-B/16+	XLM-Roberta-Large	Open CLIP ViT-B-16-plus-240	640	100 Languages	344 M
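The "transformer plus linear layer" structure can be sketched in a few lines. The example below pools token vectors and projects the result to the vision model's dimension. Note the assumptions: mask-aware mean pooling is one plausible way to get a sentence vector and is not confirmed by the text above, and the function names and toy sizes are invented for this illustration.

```python
def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1, skipping padding."""
    dim = len(token_vectors[0])
    kept = sum(attention_mask)
    return [
        sum(vec[d] * m for vec, m in zip(token_vectors, attention_mask)) / kept
        for d in range(dim)
    ]

def linear(vector, weight, bias):
    """Apply y = W x + b, with `weight` given as a list of output rows."""
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weight, bias)
    ]

# Toy sizes: 3 tokens (the last is padding), hidden size 2, output size 3.
# A real model would map e.g. a 1024-dim hidden state to 512, 640 or 768.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bias = [0.0, 0.5, -1.0]

pooled = mean_pool(tokens, mask)          # [2.0, 3.0]
projected = linear(pooled, weight, bias)  # [2.0, 3.5, 4.0]
print(projected)
```

The output size of the linear layer is what the Vision Dimensions column refers to: it has to equal the embedding size of the paired vision model so that text and image vectors can be compared directly.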

Validation & Training Curves

The table below reports Txt2Img Recall@10 on the human-translated MS-COCO test set.

Name	En	De	Es	Fr	Zh	It	Pl	Ko	Ru	Tr	Jp
OpenAI CLIP Vit-B/32	90.3	-	-	-	-	-	-	-	-	-	-
OpenAI CLIP Vit-L/14	91.8	-	-	-	-	-	-	-	-	-	-
OpenCLIP ViT-B-16+-	94.3	-	-	-	-	-	-	-	-	-	-
LABSE Vit-L/14	91.6	89.6	89.5	89.9	88.9	90.1	89.8	80.8	85.5	89.8	73.9
XLM-R Large Vit-B/32	91.8	88.7	89.1	89.4	89.3	89.8	91.4	82.1	86.1	88.8	81.0
XLM-R Large Vit-L/14	92.4	90.6	91.0	90.0	89.7	91.1	91.3	85.2	85.8	90.3	81.9
XLM-R Large Vit-B/16+	95.0	93.0	93.6	93.1	94.0	93.1	94.4	89.0	90.0	93.0	84.2
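For readers unfamiliar with the metric, Txt2Img Recall@k is the percentage of captions whose correct image appears among the k highest-scoring images. The sketch below computes it from a caption-by-image similarity matrix. It assumes that caption t belongs to image t, which is a simplification made for this example and not a description of the actual evaluation script.

```python
def recall_at_k(similarity, k=10):
    """similarity[t][i] scores caption t against image i.

    Caption t's ground-truth image is assumed to be image t.
    Returns the recall as a percentage.
    """
    hits = 0
    for t, row in enumerate(similarity):
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if t in top_k:
            hits += 1
    return 100.0 * hits / len(similarity)

# Toy 3x3 similarity matrix (hypothetical scores).
sim = [
    [0.9, 0.1, 0.3],  # caption 0: image 0 ranks first
    [0.2, 0.4, 0.8],  # caption 1: image 1 ranks second
    [0.1, 0.2, 0.7],  # caption 2: image 2 ranks first
]

r1 = recall_at_k(sim, k=1)
r2 = recall_at_k(sim, k=2)
print(round(r1, 1), round(r2, 1))
```

With k=1 only two of the three captions retrieve their image at the top, while k=2 is lenient enough to count all three as hits.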

The training curves for these models are available on Weights and Biases.

Legacy Usage and Models

Older versions of M-CLIP stored the linear weights separately from Hugging Face, whereas the new models have them incorporated directly in the Hugging Face repository. More information about these older models can be found in this section.

Download CLIP Model

$ conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=11.0
$ pip install ftfy regex tqdm
$ pip install git+https://github.com/openai/CLIP.git

Replace cudatoolkit=11.0 above with the appropriate CUDA version for your machine, or with cpuonly when installing on a machine without a GPU. For more information, please see the official CLIP repository.

Download Linear Weights

# Linear Model Weights
$ bash legacy_get-weights.sh

Inference

from multilingual_clip import multilingual_clip

print(multilingual_clip.AVAILABLE_MODELS.keys())

model = multilingual_clip.load_model('M-BERT-Distil-40')

embeddings = model(['Älgen är skogens konung!', 'Wie leben Eisbären in der Antarktis?', 'Вы знали, что все белые медведи левши?'])
print(embeddings.shape)
# Yields: torch.Size([3, 640])

For a more elaborate example that compares the textual embeddings to the CLIP image embeddings, see this Colab notebook.

Legacy Pre-trained Models

Every text encoder is a transformer available on Hugging Face, with an additional linear layer on top. None of the models has been extensively tested, but for more information and qualitative test results for a specific model, click the Model Name to see its model card.

*** Make sure to update to the most recent version of the repository when downloading a new model, and re-run the shell script to download the Linear Weights. ***

Name	Model Base	Vision Model	Pre-trained Languages	Target Languages	#Parameters
Multilingual
M-BERT Distil 40	M-BERT Distil	RN50x4	101 Languages	40 Languages	66 M
M-BERT Base 69	M-BERT Base	RN50x4	101 Languages	68 Languages	110 M
M-BERT Base ViT-B	M-BERT Base	ViT-B/32	101 Languages	68 Languages	110 M
Monolingual
Swe-CLIP 500k	KB-BERT	RN50x4	Swedish	Swedish	110 M
Swe-CLIP 2M	KB-BERT	RN50x4	Swedish	Swedish	110 M

Training a new model

This folder contains the code used for training the above models. If you wish to train your own model, you must do the following:

Prepare a set of translated sentence pairs from English -> Your Language(s).
Compute regular CLIP-Text embeddings for the English sentences.
Edit Training.py to load your data.
Train a new CLIP-Text encoder via Teacher Learning.
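The idea behind the steps above can be sketched as follows: the original CLIP text encoder acts as the teacher, and the new encoder (the student) is trained so that its embedding of a translated sentence matches the teacher's embedding of the English sentence. This sketch is not the code in Training.py. It assumes a mean-squared-error objective and a student that is just one linear layer trained with plain gradient descent, and all names and numbers are hypothetical.

```python
def mse(student, teacher):
    """Mean squared error between two equal-length embedding vectors."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(teacher)

def student_forward(weight, features):
    """Toy student: a single linear layer without bias."""
    return [sum(w * x for w, x in zip(row, features)) for row in weight]

def train_step(weight, features, target, lr=0.1):
    """One gradient-descent step on the MSE between student and teacher."""
    output = student_forward(weight, features)
    n = len(target)
    # d(MSE)/d(weight[r][c]) = 2 * (output[r] - target[r]) * features[c] / n
    return [
        [w - lr * 2.0 * (output[r] - target[r]) * x / n for w, x in zip(row, features)]
        for r, row in enumerate(weight)
    ]

# Stand-in for the student's representation of a translated sentence.
translated_features = [1.0, 0.5]
# Stand-in for the CLIP text embedding of the English sentence (the teacher).
teacher_embedding = [0.3, -0.2, 0.7]

weight = [[0.0, 0.0] for _ in teacher_embedding]
loss_before = mse(student_forward(weight, translated_features), teacher_embedding)
for _ in range(200):
    weight = train_step(weight, translated_features, teacher_embedding)
loss_after = mse(student_forward(weight, translated_features), teacher_embedding)
print(loss_after < loss_before)
```

Because the targets are computed once from English text, no images are needed during training, which is why the pre-computation step comes before the training step.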

Pre-computed CLIP Embeddings & Translation Data

This Google Drive folder contains pre-computed CLIP-Text embeddings for a large portion of the image captions of GCC + MSCOCO + VizWiz.

The Google Drive folder also contains the translation data used to train the currently available models. Good luck!

Contribution

If you have trained a CLIP text encoder specific to your language, or another model covering a language not supported here, please feel free to contact us. We will either upload your model and credit you, or simply link to your already uploaded model.

Contact

If you have questions about the code or anything else related to this GitHub page, please open an issue.

For other purposes, feel free to contact me directly at: Fredrik.Carlsson@ri.se

Acknowledgements

License

Distributed under the MIT License. See LICENSE for more information.

Release history

Version	Release date
1.0.10 (this version)	Jun 2, 2022
1.0.9	Jun 2, 2022
1.0.8	Jun 2, 2022
1.0.7	Jun 2, 2022
1.0.6	Jun 2, 2022

Download files

Source Distribution

multilingual_clip-1.0.10.tar.gz (17.7 kB)

Uploaded Jun 2, 2022 Source

Built Distribution

multilingual_clip-1.0.10-py3-none-any.whl (20.4 kB)

Uploaded Jun 2, 2022 Python 3

File details

Details for the file multilingual_clip-1.0.10.tar.gz.

File metadata

Download URL: multilingual_clip-1.0.10.tar.gz
Upload date: Jun 2, 2022
Size: 17.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.8.12

File hashes

Hashes for multilingual_clip-1.0.10.tar.gz
Algorithm	Hash digest
SHA256	`eea1ef03ce91735636ddcd4c887f6ea54a7e45f47d4a06deef1dbe2ce8dec19c`
MD5	`9d768b7825801951b6dcd857b1e5aa98`
BLAKE2b-256	`a818a9aecb457c904696e9800d2ac538f364b23a3c7b8815326a45d9f3741a24`

File details

Details for the file multilingual_clip-1.0.10-py3-none-any.whl.

File metadata

Download URL: multilingual_clip-1.0.10-py3-none-any.whl
Upload date: Jun 2, 2022
Size: 20.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.8.12

File hashes

Hashes for multilingual_clip-1.0.10-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b9acf95b8309c85a0db5e9c88c5f1b400687e08d72408c460731ae31e71dc73a`
MD5	`8e0c536e981ef2e1b41b6b044407b33f`
BLAKE2b-256	`daf7575f65ab34993153e9bc88ea5e58d59475bd9f191d2db29729e01c4231f6`
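If you download one of these files manually, you can compare it against the SHA256 digest listed above. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `sha256_of` and `matches` are made up for this example, and the file path you pass in depends on where you saved the download.

```python
import hashlib

def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches(path, expected_hex):
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches("multilingual_clip-1.0.10-py3-none-any.whl", "b9acf95b8309c85a0db5e9c88c5f1b400687e08d72408c460731ae31e71dc73a")` should return True for an intact copy of the wheel saved under that name in the current directory.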
