Python3 library for converting between various image annotation dataset formats.

Project description

The image-dataset-converter library converts between various dataset formats for image annotation datasets. Filters can be supplied as well, e.g., for cleaning up the data.

Dataset formats:

  • depth data: CSV (r/w), grayscale (r/w), numpy (r/w), PFM (r/w)

  • image classification: ADAMS (r/w), sub-dir (r/w)

  • image segmentation: blue-channel (r/w), grayscale (r/w), indexed PNG (r/w), layer segments (r/w)

  • object detection: ADAMS (r/w), COCO (r/w), OPEX (r/w), ROI (r/w), VOC (r/w), YOLO (r/w)

Examples can be found here:

https://github.com/waikato-datamining/image-dataset-converter-examples

Changelog

0.1.0 (2025-10-31)

  • added dims-to-metadata filter for transferring image dimension to metadata

  • centralized comparison code in idc.api._comparison (available via idc.api)

  • split-records filter now allows specifying the meta-data field in which to store the split name

  • the tee meta-filter can now forward or drop the incoming data based on a meta-data evaluation

  • added sub-process filter for processing data with a sub-flow of filters, which can be made conditional on a meta-data evaluation

  • the metadata-from-name filter can work on the path now as well (must be present)

  • the label-present-is filter now lets the user specify a minimum number of pixels that need to be present in the layers

  • switched to kasperl library for base API and generic pipeline plugins

  • requiring seppl>=0.3.0 now

  • added @abc.abstractmethod decorator where appropriate

  • the use-mask filter now sets the image format PNG in the generated output container

  • the readers from-grayscale-dp, from-indexed-png-is, from-blue-channel-is and from-grayscale-is now support reading only the annotations

  • added idc.api.binarize_image convenience function

  • added idc.api.image_to_bytesio helper method

  • the idc-exec tool now uses all remaining parameters as the pipeline components rather than having to specify them via the -p/--pipeline parameter, making it easy to simply prefix the idc-exec command to an existing idc-convert command-line

  • added the text-file and csv-file generators that work off files to populate the variable(s)

  • idc-exec can load pipelines from file now as well, useful when dealing with large pipelines

  • added --load_pipeline option to idc-convert

  • added from-text-file reader and to-text-file writer

  • readers now locate files the first time the read() method gets called rather than during initialization, to allow more dynamic placeholders

  • added block, stop filters for controlling the flow of data (via meta-data conditions)

  • added email support with get-email reader and send-email writer

  • added list-files reader for listing files in a directory

  • added list-to-sequence stream filter that forwards list items one by one

  • added console writer that outputs the data coming through on stdout

  • added move-files filter which moves incoming files to a target directory

  • added watch-dir meta-reader that uses the watchdog library to react to file-system events rather than using fixed-interval polling like poll-dir

  • added delete-files writer

  • added copy-files filter

  • added count-specks filter that adds counts of small objects to meta-data

  • added support for caching plugins via IDC_CLASS_CACHE environment variable

  • od-to-is now handles labels or regexp pattern correctly

  • added is-to-od filter that generates object detection annotations from contours determined in image segmentation layers

  • added to-metadata writer that outputs the meta-data of an image

  • added attach-metadata filter that loads meta-data from a directory and attaches it to the data passing through

  • added load-data filter to turn file names into data containers

  • added annotation-to-storage and annotation-from-storage filters

  • annotation data is now being type-checked when setting it

  • added delete-storage filter for removing objects from internal storage
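
The list-to-sequence entry above describes a stream filter that forwards list items one by one. As a rough illustration of that idea only (not the library's implementation), the behavior can be sketched with a plain Python generator. The function name and the choice to pass non-list items through unchanged are assumptions made for this example.

```python
from typing import Any, Iterable, Iterator


def list_to_sequence(stream: Iterable[Any]) -> Iterator[Any]:
    """Hypothetical sketch: flatten one level of lists in a stream of items."""
    for item in stream:
        if isinstance(item, (list, tuple)):
            # forward the elements of the list one at a time
            yield from item
        else:
            # assumption: anything that is not a list is forwarded as-is
            yield item


if __name__ == "__main__":
    print(list(list_to_sequence([["a.png", "b.png"], "c.png", ["d.png"]])))
```

Downstream components then only ever see single items, which is what makes such a filter useful after a component that emits whole lists, such as a file-listing reader.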

0.0.13 (2025-07-15)

  • requiring seppl>=0.2.20 now for improved help requests in idc-convert tool

0.0.12 (2025-07-11)

  • dropped numpy<2.0.0 restriction

  • added grayscale-to-binary filter

  • fix: sort-pixels, rgb-to-grayscale filters

  • added ensure_grayscale and grayscale_required_info convenience methods (package: idc.api)

  • added ensure_binary and binary_required_info convenience methods (package: idc.api)

  • added --dump_pipeline option to idc-convert for saving the pipeline command

  • the rename filter now supports lower/upper case placeholders of name and extension as well

  • requiring seppl>=0.2.17 now for skippable plugin support and avoiding deprecated use of pkg_resources

  • added any-to-rgb filter for turning binary/grayscale images back into RGB ones

  • using wai_common instead of wai.common now

  • requiring fast_opex>=0.0.4 now

  • added label-to-metadata filter for transferring labels into meta-data

  • added metadata-to-placeholder filter for transferring meta-data into placeholders

  • added basic support for images with associated depth information: DepthData, DepthInformation

  • added depth-to-grayscale filter for converting depth information to grayscale image

  • prefixed image segmentation methods like from_bluechannel and to_bluechannel with imgseg_

  • added depth information readers from-grayscale-dp, from-numpy-dp, from-csv-dp and from-pfm-dp

  • added depth information writers to-grayscale-dp, to-numpy-dp, to-csv-dp and to-pfm-dp

  • added apply-ext-mask filter for applying external PNG masks to image containers (image and/or annotations)

  • added apply-label-mask filter for applying image segmentation label masks to their base images

  • added label-present-ic and label-present-is that ensure that certain label(s) are present or otherwise discard the image

  • filter label-present was renamed to label-present-od but keeping label-present as alias for the time being

  • fix: imgseg_to_bluechannel, imgseg_to_indexedpng and imgseg_to_grayscale now handle overlapping pixels correctly, no longer adding them up and introducing additional labels

  • discard-by-name filter can use names of files in specified paths now as well

  • fixed the construction of the error messages in the pyfunc reader/filter/writer classes
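
The overlapping-pixels fix listed above can be illustrated with a small, library-independent sketch. When several binary layers are merged into one index image, adding the layer indices produces values that belong to no label wherever layers overlap (e.g., 1 + 2 = 3). Assigning the index avoids that. The function name, the data layout (nested lists) and the rule that later layers win are assumptions for this example, not the library's actual code.

```python
from typing import List, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # 2D grid of 0/1 values


def combine_layers(layers: Sequence[Tuple[int, Mask]], background: int = 0) -> List[List[int]]:
    """Hypothetical sketch: merge (label_index, mask) layers into one index image."""
    height = len(layers[0][1])
    width = len(layers[0][1][0])
    result = [[background] * width for _ in range(height)]
    for index, mask in layers:
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    # assign rather than add, so overlaps never create new label values
                    result[y][x] = index
    return result


if __name__ == "__main__":
    layer_1 = (1, [[1, 1, 0]])
    layer_2 = (2, [[0, 1, 1]])
    print(combine_layers([layer_1, layer_2]))
```

In this sketch the middle pixel is covered by both layers and ends up with index 2, whereas summing would have produced 3, a label that does not exist.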

0.0.11 (2025-04-03)

  • fix: idc-registry --list writers now returns writer plugins instead of reader ones

0.0.10 (2025-04-03)

  • added set-placeholder filter for dynamically setting (temporary) placeholders at runtime

  • added --resume_from option to relevant readers that allows resuming the data processing from the first file that matches this glob expression (e.g., */012345.png)

  • requiring seppl>=0.2.14 now for resume support

  • using underscores now instead of dashes in dependencies (setup.py)

  • the array_to_image method no longer performs unnecessary conversions of Image objects

  • the dirs generator can limit directories now to ones that have files matching a specific regexp (--file_regexp), to avoid the "Failed to locate any files using: …" error message when a reader doesn’t find any matching files

  • requiring seppl>=0.2.15 now for new --split_group support

  • added the from-multi meta-reader that combines multiple base readers and returns their output

  • added the to-multi meta-writer that forwards the data to multiple base writers

  • added the use-mask filter for using the image segmentation annotations (= mask) as the new base image
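
The resume option described above skips files until the first one matching a glob expression. A minimal sketch of that idea using only the standard library might look as follows. The function name is hypothetical, and returning an empty list when nothing matches is an assumption of this example; the readers' actual behavior may differ.

```python
import fnmatch
from itertools import dropwhile
from typing import Iterable, List


def resume_from(files: Iterable[str], pattern: str) -> List[str]:
    """Hypothetical sketch: drop files until the first one matching the glob."""
    # dropwhile stops skipping at the first match and keeps everything after it
    return list(dropwhile(lambda f: not fnmatch.fnmatch(f, pattern), files))


if __name__ == "__main__":
    files = ["data/012344.png", "data/012345.png", "data/012346.png"]
    print(resume_from(files, "*/012345.png"))
```

Note that fnmatch lets `*` match path separators as well, so `*/012345.png` matches the file regardless of how deeply it is nested.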

0.0.9 (2025-03-14)

  • using wai_logging instead of wai.logging as dependency now

0.0.8 (2025-03-14)

  • requiring seppl>=0.2.13 now for placeholder support

  • added placeholder support to tools: idc-convert, idc-exec

  • added placeholder support to readers: from-adams-ic, from-subdir-ic, from-blue-channel-is, from-grayscale-is, from-indexed-png-is, from-layer-segments-is, from-adams-od, from-coco-od, from-opex-od, from-roicsv-od, from-voc-od, from-yolo-od, from-data, from-pyfunc, poll-dir

  • added placeholder support to filters: write-labels

  • added placeholder support to writers: to-adams-ic, to-subdir-ic, to-blue-channel-is, to-grayscale-is, to-indexed-png-is, to-layer-segments-is, to-adams-od, to-coco-od, to-opex-od, to-roicsv-od, to-voc-od, to-yolo-od, to-data

0.0.7 (2025-03-12)

  • added safe_deepcopy method to idc.api._utils which creates a deep copy of an object if not None

  • added rgb-to-grayscale filter to convert color images into grayscale ones

  • added sort-pixels filter for grayscale images

  • the following filters can operate on lists of records now as well: inspec, metadata, metadata-from-name

  • added metadata-od filter for filtering object-detection annotations based on their meta-data (e.g., scores from model predictions)

  • the filters discard-negatives and discard-invalid-images now output how many were discarded/kept when processing finishes
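
The safe_deepcopy entry above describes a helper that deep-copies an object unless it is None. The following is a sketch of that described behavior built on the standard copy module; it is not taken from the library's source and the exact signature there may differ.

```python
import copy
from typing import Optional, TypeVar

T = TypeVar("T")


def safe_deepcopy(obj: Optional[T]) -> Optional[T]:
    """Sketch: return a deep copy of obj, or None if obj is None."""
    if obj is None:
        return None
    return copy.deepcopy(obj)


if __name__ == "__main__":
    meta = {"labels": ["cat", "dog"]}
    clone = safe_deepcopy(meta)
    clone["labels"].append("bird")
    print(meta)   # the original is untouched by changes to the copy
    print(safe_deepcopy(None))
```

A helper like this is convenient for optional fields such as meta-data, where the caller would otherwise need a None check before every copy.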

0.0.6 (2025-02-26)

  • LayerSegmentsImageSegmentationReader now suggests using the --lenient flag in its exception message when the image is not binary

  • added the discard-by-name filter that allows the user to discard images based on name, using either an exact match or a regexp (matching sense can be inverted)

  • requiring seppl>=0.2.10 now

  • added support for aliases

  • added to_bluechannel, to_grayscale and to_indexedpng image segmentation methods to idc.api

  • added the generate_palette_list method to idc.api which turns a predefined palette name or comma-separated list of RGB values into a flat list of int values, e.g., used for indexed PNG files

  • exposed method save_image through idc.api

  • filter-labels now handles not specifying any labels and only regexp

  • write-labels filter now allows specification of custom separator

  • write-labels: fixed retrieval of image-segmentation labels

  • using simple_palette_utils dependency now

  • idc-convert tool now flags aliases on the help screen with *

  • the from-voc-od reader now has the -r/--image_rel_path option which gets injected before the folder property from the XML file
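
The generate_palette_list entry above mentions turning a comma-separated list of RGB values into a flat list of int values. The sketch below covers only that comma-separated case (not predefined palette names) and uses a hypothetical function name; the validation rules are assumptions for illustration, not the library's behavior.

```python
from typing import List


def parse_rgb_list(spec: str) -> List[int]:
    """Hypothetical sketch: parse 'R,G,B,R,G,B,...' into a flat list of ints."""
    values = [int(part.strip()) for part in spec.split(",")]
    if len(values) % 3 != 0:
        raise ValueError("expected a multiple of three values (R,G,B triplets)")
    if any(v < 0 or v > 255 for v in values):
        raise ValueError("RGB values must be in the range 0-255")
    return values


if __name__ == "__main__":
    # red followed by green, as one flat list
    print(parse_rgb_list("255,0,0, 0,255,0"))
```

A flat list of this shape is the form that palette-based image APIs such as Pillow's Image.putpalette accept for indexed PNG files.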

0.0.5 (2025-01-13)

  • added setuptools as dependency

  • switched to underscores in project name

  • using 90% as default quality for JPEG images now, can be overridden with environment variable IDC_JPEG_QUALITY

  • added methods to idc.api module: jpeg_quality(), array_to_image(…), empty_image(…)
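
The JPEG quality entry above describes a default of 90 that can be overridden with the IDC_JPEG_QUALITY environment variable. A sketch of how such a lookup could work is shown below. The function name is hypothetical and falling back to the default on a non-numeric value is an assumption; the library's jpeg_quality() may handle that case differently.

```python
import os
from typing import Mapping, Optional

DEFAULT_JPEG_QUALITY = 90


def jpeg_quality_from_env(environ: Optional[Mapping[str, str]] = None) -> int:
    """Hypothetical sketch: read the JPEG quality from the environment."""
    env = os.environ if environ is None else environ
    raw = env.get("IDC_JPEG_QUALITY")
    if raw is None:
        return DEFAULT_JPEG_QUALITY
    try:
        return int(raw)
    except ValueError:
        # assumption: ignore values that are not integers
        return DEFAULT_JPEG_QUALITY


if __name__ == "__main__":
    print(jpeg_quality_from_env({}))
    print(jpeg_quality_from_env({"IDC_JPEG_QUALITY": "75"}))
```

Passing the environment as a parameter keeps the function easy to test without modifying the real process environment.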

0.0.4 (2024-07-16)

  • limiting numpy to <2.0.0 due to problems with imgaug library

0.0.3 (2024-07-02)

  • switched to the fast-opex library

  • helper method from_indexedpng was using incorrect label index (off by 1)

  • Data.save_image method now ensures that source/target files exist before calling os.path.samefile

  • requiring seppl>=0.2.6 now

  • readers now support default globs, allowing the user to just specify directories as input (and the default glob gets appended)

  • the to-yolo-od writer now has an option for predefined labels (for enforcing label order)

  • the to-yolo-od writer now stores the labels/labels_csv files in the respective output folders rather than using an absolute file name

  • the bluechannel/grayscale/indexed-png image segmentation readers/writers can use a value other than 0 now for the background

  • split filter has been renamed to split-records
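
The default-glob entry above says that a user can specify just a directory as input and the reader's default glob gets appended. That behavior can be sketched as follows; the function name and the example glob are assumptions, and each reader presumably defines its own default.

```python
import os


def apply_default_glob(path: str, default_glob: str) -> str:
    """Hypothetical sketch: append default_glob if path is an existing directory."""
    if os.path.isdir(path):
        return os.path.join(path, default_glob)
    # anything else (a file name or an explicit glob) is left untouched
    return path


if __name__ == "__main__":
    print(apply_default_glob(os.getcwd(), "*.xml"))
    print(apply_default_glob("images/*.png", "*.xml"))
```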

0.0.2 (2024-06-13)

  • added generic plugins that take user Python functions: from-pyfunc, pyfunc-filter, to-pyfunc

  • added idc-exec tool that uses generator to produce variable/value pairs that are used to expand the provided pipeline template which then gets executed

  • added polygon-simplifier filter for reducing number of points in polygons

  • moved several geometry/image related functions from imgaug library into core library to avoid duplication

  • added python-image-complete as dependency

  • the ImageData class now uses the python-image-complete library to determine the file format rather than loading the image into memory in order to determine that

  • the convert-image-format filter now correctly creates a new container with the converted image data

  • the to-coco-od writer only allows sorting of categories when using predefined categories now

  • the from-opex-od reader now handles absent meta-data correctly

  • added the AnnotationsOnlyWriter mixin for writers that can skip the base image and just output the annotations

0.0.1 (2024-05-06)

  • initial release
