
Visual Prompting for Large Multimodal Models (LMMs)


multimodal-maestro



👋 hello

Multimodal-Maestro gives you more control over large multimodal models to get the outputs you want. With more effective prompting tactics, you can get multimodal models to do tasks you didn't know (or think!) were possible. Curious how it works? Try our HF space!

🚧 The project is still under construction, and the API is prone to change.

💻 install

⚠️ Our package has been renamed to maestro. Install the package in a Python 3.8 to 3.11 environment.

pip install maestro
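
To confirm that the active interpreter falls inside the supported range before installing, a quick check along these lines may help. This is only a minimal sketch: `is_supported_python` is a hypothetical helper, not part of maestro, and the bounds are read from the install note above and assumed to be inclusive.

```python
import sys

# Bounds taken from the install note; assumed inclusive on the minor version.
MIN_VERSION = (3, 8)
MAX_VERSION = (3, 11)


def is_supported_python(version=None):
    """Return True if the (major, minor) version lies within the supported range."""
    if version is None:
        # Default to the interpreter running this script.
        version = sys.version_info[:2]
    major_minor = tuple(version[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


print("supported interpreter:", is_supported_python())
```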

🚀 examples

GPT-4 Vision

Find dog.

>>> The dog is prominently featured in the center of the image with the label [9].
👉 read more
  • load image

    import cv2
    
    image = cv2.imread("...")
    
  • create and refine marks

    import maestro
    
    generator = maestro.SegmentAnythingMarkGenerator(device='cuda')
    marks = generator.generate(image=image)
    marks = maestro.refine_marks(marks=marks)
    
  • visualize marks

    mark_visualizer = maestro.MarkVisualizer()
    marked_image = mark_visualizer.visualize(image=image, marks=marks)
    


  • prompt

    prompt = "Find dog."
    
    # api_key is not defined in this snippet; set it to your own API key first
    response = maestro.prompt_image(api_key=api_key, image=marked_image, prompt=prompt)
    
    >>> "The dog is prominently featured in the center of the image with the label [9]."
    
  • extract related marks

    # pass the refined marks from the "create and refine marks" step
    masks = maestro.extract_relevant_masks(text=response, detections=marks)
    
    >>> {'6': array([
    ...     [False, False, False, ..., False, False, False],
    ...     [False, False, False, ..., False, False, False],
    ...     [False, False, False, ..., False, False, False],
    ...     ...,
    ...     [ True,  True,  True, ..., False, False, False],
    ...     [ True,  True,  True, ..., False, False, False],
    ...     [ True,  True,  True, ..., False, False, False]])
    ... }
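
The last step pairs the model's text answer with the marks drawn on the image. The idea can be sketched as follows, assuming the model cites marks as bracketed integers such as `[9]` and that masks live in a dict keyed by label string. `extract_mark_labels` and `select_masks` are hypothetical helpers written for illustration, not maestro's implementation, and the toy masks are made-up data.

```python
import re


def extract_mark_labels(text):
    """Return the unique bracketed integer labels in text, in order of first appearance."""
    labels = []
    for label in re.findall(r"\[(\d+)\]", text):
        if label not in labels:
            labels.append(label)
    return labels


def select_masks(text, masks):
    """Keep only the masks whose label is cited in text."""
    return {label: masks[label] for label in extract_mark_labels(text) if label in masks}


response = "The dog is prominently featured in the center of the image with the label [9]."
# Toy 2x2 boolean masks stand in for real segmentation masks.
masks = {
    "6": [[False, False], [True, True]],
    "9": [[True, False], [True, False]],
}
selected = select_masks(response, masks)
print(sorted(selected))
```

Labels cited in the text but missing from the mask dict are skipped rather than raising, which keeps the sketch tolerant of a model that names a mark that was filtered out earlier.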
    


🚧 roadmap

  • Rewrite the maestro API.
  • Update HF space.
  • Add a documentation page.
  • Add GroundingDINO prompting strategy.
  • Add CogVLM demo.
  • Add Qwen-VL demo.

💜 acknowledgement

🦸 contribution

We would love your help in making this repository even better! If you notice a bug or have a suggestion for improvement, feel free to open an issue or submit a pull request.
