llm_embed(model_id, text) SQL function for Datasette
datasette-llm-embed
Datasette plugin adding an llm_embed(model_id, text) SQL function.
Installation
```bash
datasette install datasette-llm-embed
```
Usage
Adds a SQL function that can be called like this:
```sql
select llm_embed('sentence-transformers/all-mpnet-base-v2', 'This is some text')
```
This embeds the provided text using the specified embedding model and returns a binary blob, suitable for use with plugins such as datasette-faiss.
The models need to be installed using LLM plugins such as llm-sentence-transformers.
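This section does not spell out the byte layout of the returned blob. The following minimal sketch assumes it is a packed sequence of little-endian 32-bit floats, which is believed to be the layout the LLM library's own encode/decode helpers use; check this against the version you have installed. It shows how such a blob maps to and from a list of floats. The names encode_vector and decode_vector are hypothetical and are not part of the plugin.

```python
import struct


def encode_vector(values):
    """Pack floats as little-endian float32 (assumed blob layout)."""
    return struct.pack("<" + "f" * len(values), *values)


def decode_vector(blob):
    """Unpack a blob of little-endian float32 values into a list of floats."""
    if len(blob) % 4:
        raise ValueError("blob length must be a multiple of 4 bytes")
    return list(struct.unpack("<" + "f" * (len(blob) // 4), blob))


blob = encode_vector([0.5, -1.0, 0.25])
print(len(blob))            # 12: three floats x 4 bytes each
print(decode_vector(blob))  # [0.5, -1.0, 0.25]
```

Under this assumption, each dimension takes four bytes. A 768-dimensional all-mpnet-base-v2 embedding would therefore occupy 3,072 bytes.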
Use llm_embed_cosine(a, b) to calculate the cosine similarity between two vector blobs:
```sql
select llm_embed_cosine(
  llm_embed('sentence-transformers/all-mpnet-base-v2', 'This is some text'),
  llm_embed('sentence-transformers/all-mpnet-base-v2', 'This is some other text')
)
```
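Cosine similarity is the dot product of two vectors divided by the product of their magnitudes. It ranges from -1 to 1 and ignores vector length. The following plain-Python sketch computes it for two blobs and again assumes the little-endian float32 layout. It illustrates the formula and is not the plugin's actual implementation.

```python
import math
import struct


def decode_vector(blob):
    # Assumed blob layout: little-endian float32 values, 4 bytes each.
    return struct.unpack("<" + "f" * (len(blob) // 4), blob)


def encode_vector(values):
    # Inverse of decode_vector, used here only to build example inputs.
    return struct.pack("<" + "f" * len(values), *values)


def cosine_similarity(blob_a, blob_b):
    a, b = decode_vector(blob_a), decode_vector(blob_b)
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity(encode_vector([1.0, 0.0]), encode_vector([1.0, 0.0])))  # 1.0
print(cosine_similarity(encode_vector([1.0, 0.0]), encode_vector([0.0, 1.0])))  # 0.0
```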
The llm_embed_decode() function can be used to decode a binary BLOB into a JSON array of floats:
```sql
select llm_embed_decode(
  llm_embed('sentence-transformers/all-mpnet-base-v2', 'This is some text')
)
```
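Datasette plugins typically add SQL functions through the prepare_connection plugin hook, which receives a SQLite connection. The following sketch illustrates what a decoder like this does. It registers a hypothetical embed_decode() function on an in-memory database using the standard library's sqlite3.create_function() and assumes the little-endian float32 blob layout. It is not the plugin's code.

```python
import json
import sqlite3
import struct


def embed_decode(blob):
    """Hypothetical stand-in for llm_embed_decode(): blob -> JSON array text."""
    if blob is None:
        return None
    # Assumed layout: little-endian float32 values, 4 bytes each.
    floats = struct.unpack("<" + "f" * (len(blob) // 4), blob)
    return json.dumps(list(floats))


conn = sqlite3.connect(":memory:")
# Register a one-argument SQL function backed by the Python callable.
conn.create_function("embed_decode", 1, embed_decode)

blob = struct.pack("<3f", 1.0, 2.5, -0.5)
(result,) = conn.execute("select embed_decode(?)", (blob,)).fetchone()
print(result)  # [1.0, 2.5, -0.5]
```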
Models that require API keys
If your embedding model needs an API key, for example the ada-002 model from OpenAI, you can configure that key in metadata.yml (or metadata.json) like this:
```yaml
plugins:
  datasette-llm-embed:
    keys:
      ada-002:
        $env: OPENAI_API_KEY
```
The key here must be the full model ID, not an alias.
You can then set the OPENAI_API_KEY environment variable to the key you want to use before starting Datasette:
```bash
export OPENAI_API_KEY=sk-1234567890
```
Once configured, calls like this will use the API key that has been provided:
```sql
select llm_embed('ada-002', 'This is some text')
```
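Datasette resolves {"$env": ...} references when a plugin reads its configuration. The following plain-Python sketch imitates that lookup to show why the entry under keys must match the exact model ID passed to llm_embed(). The function resolve_key is hypothetical. It mirrors the metadata shape above and is not the plugin's or Datasette's implementation.

```python
import os


def resolve_key(plugin_config, model_id):
    """Look up the API key for an exact model ID (hypothetical helper).

    Expects {"keys": {model_id: value}}, where value is either a literal
    string or {"$env": "VAR_NAME"} naming an environment variable.
    """
    value = plugin_config.get("keys", {}).get(model_id)
    if isinstance(value, dict) and "$env" in value:
        return os.environ.get(value["$env"])
    return value


config = {"keys": {"ada-002": {"$env": "OPENAI_API_KEY"}}}
os.environ["OPENAI_API_KEY"] = "sk-1234567890"
print(resolve_key(config, "ada-002"))      # sk-1234567890
print(resolve_key(config, "other-model"))  # None: only exact IDs match
```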
Development
To set up this plugin locally, first check out the code. Then create a new virtual environment:
```bash
cd datasette-llm-embed
python3 -m venv venv
source venv/bin/activate
```
Now install the dependencies and test dependencies:
```bash
pip install -e '.[test]'
```
To run the tests:
```bash
pytest
```