Python3 library for converting between various image annotation dataset formats.
Project description
The image-dataset-converter library converts between various image annotation dataset formats. Filters can be supplied as well, e.g., for cleaning up the data.
Dataset formats:
depth data: CSV (r/w), grayscale (r/w), numpy (r/w), PFM (r/w)
image classification: ADAMS (r/w), sub-dir (r/w)
image segmentation: blue-channel (r/w), grayscale (r/w), indexed PNG (r/w), layer segments (r/w)
object detection: ADAMS (r/w), COCO (r/w), OPEX (r/w), ROI (r/w), VOC (r/w), YOLO (r/w)
Examples can be found here:
https://github.com/waikato-datamining/image-dataset-converter-examples
Changelog
0.1.0 (2025-10-31)
added dims-to-metadata filter for transferring image dimension to metadata
centralized comparison code in idc.api._comparison (available via idc.api)
split-records filter now allows specifying the meta-data field in which to store the split name
the tee meta-filter can now forward or drop the incoming data based on a meta-data evaluation
added sub-process filter for processing data with a sub-flow of filters; execution can be made conditional on a meta-data evaluation
the metadata-from-name filter can work on the path now as well (must be present)
the label-present-is filter now lets the user specify a minimum number of pixels that need to be present in the layers
switched to kasperl library for base API and generic pipeline plugins
requiring seppl>=0.3.0 now
added @abc.abstractmethod decorator where appropriate
the use-mask filter now sets the image format PNG in the generated output container
the readers from-grayscale-dp, from-indexed-png-is, from-blue-channel-is and from-grayscale-is now support reading only the annotations
added idc.api.binarize_image convenience function
added idc.api.image_to_bytesio helper method
the idc-exec tool now uses all remaining parameters as the pipeline components rather than having to specify them via the -p/--pipeline parameter, making it easy to simply prefix the idc-exec command to an existing idc-convert command-line
added the text-file and csv-file generators that work off files to populate the variable(s)
idc-exec can load pipelines from file now as well, useful when dealing with large pipelines
added --load_pipeline option to idc-convert
added from-text-file reader and to-text-file writer
readers now locate files the first time the read() method gets called rather than in the initialized(), to allow more dynamic placeholders
added block, stop filters for controlling the flow of data (via meta-data conditions)
added email support with get-email reader and send-email writer
added list-files reader for listing files in a directory
added list-to-sequence stream filter that forwards list items one by one
added console writer for outputting the data on stdout that is coming through
added move-files filter which moves incoming files to a target directory
added watch-dir meta-reader that uses the watchdog library to react to file-system events rather than using fixed-interval polling like poll-dir
added delete-files writer
added copy-files filter
added count-specks filter that adds counts of small objects to meta-data
added support for caching plugins via IDC_CLASS_CACHE environment variable
od-to-is now handles labels or regexp pattern correctly
added is-to-od filter that generates object detection annotations from contours determined in image segmentation layers
added to-metadata writer that outputs the meta-data of an image
added attach-metadata filter that loads meta-data from a directory and attaches it to the data passing through
added load-data filter to turn file names into data containers
added annotation-to-storage and annotation-from-storage filters
annotation data is now being type-checked when setting it
added delete-storage filter for removing objects from internal storage
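The 0.1.0 note about readers locating files on the first read() call rather than during initialization can be illustrated with a small sketch. The class below is hypothetical and is not the library's reader API; it only assumes a glob pattern as input and returns file names instead of data containers.

```python
import glob
import os
import tempfile
from typing import List, Optional


class LazyReaderSketch:
    """Hypothetical reader that defers file lookup until read() is first called."""

    def __init__(self, pattern: str):
        self.pattern = pattern
        self._files: Optional[List[str]] = None

    def initialize(self) -> None:
        # Deliberately does not touch the file system yet.
        self._files = None

    def read(self) -> List[str]:
        # Resolve the glob lazily, on first use only.
        if self._files is None:
            self._files = sorted(glob.glob(self.pattern))
        return self._files


with tempfile.TemporaryDirectory() as tmp:
    reader = LazyReaderSketch(os.path.join(tmp, "*.png"))
    reader.initialize()
    # The file only appears after initialization, yet read() still finds it.
    open(os.path.join(tmp, "a.png"), "wb").close()
    found = [os.path.basename(f) for f in reader.read()]
```

Deferring the lookup means that anything influencing the pattern between initialization and the first read (such as a placeholder set at runtime) is still taken into account.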
0.0.13 (2025-07-15)
requiring seppl>=0.2.20 now for improved help requests in idc-convert tool
0.0.12 (2025-07-11)
dropped numpy<2.0.0 restriction
added grayscale-to-binary filter
fix: sort-pixels, rgb-to-grayscale filters
added ensure_grayscale and grayscale_required_info convenience methods (package: idc.api)
added ensure_binary and binary_required_info convenience methods (package: idc.api)
added --dump_pipeline option to idc-convert for saving the pipeline command
the rename filter now supports lower/upper case placeholders of name and extension as well
requiring seppl>=0.2.17 now for skippable plugin support and avoiding deprecated use of pkg_resources
added any-to-rgb filter for turning binary/grayscale images back into RGB ones
using wai_common instead of wai.common now
requiring fast_opex>=0.0.4 now
added label-to-metadata filter for transferring labels into meta-data
added metadata-to-placeholder filter for transferring meta-data into placeholders
added basic support for images with associated depth information: DepthData, DepthInformation
added depth-to-grayscale filter for converting depth information to grayscale image
prefixed image segmentation methods like from_bluechannel and to_bluechannel with imgseg_
added depth information readers from-grayscale-dp, from-numpy-dp, from-csv-dp and from-pfm-dp
added depth information writers to-grayscale-dp, to-numpy-dp, to-csv-dp and to-pfm-dp
added apply-ext-mask filter for applying external PNG masks to image containers (image and/or annotations)
added apply-label-mask filter for applying image segmentation label masks to their base images
added label-present-ic and label-present-is that ensure that certain label(s) are present or otherwise discard the image
filter label-present was renamed to label-present-od but keeping label-present as alias for the time being
fix: imgseg_to_bluechannel, imgseg_to_indexedpng and imgseg_to_grayscale now handle overlapping pixels correctly, no longer adding them up and introducing additional labels
discard-by-name filter can use names of files in specified paths now as well
fixed the construction of the error messages in the pyfunc reader/filter/writer classes
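The overlapping-pixel fix noted for 0.0.12 can be illustrated with plain Python lists. This is a sketch only, not the library's imgseg_* code: the function name is hypothetical, and the choice that the later layer wins on overlap is an assumption made for the example.

```python
from typing import Dict, List

Mask = List[List[int]]


def combine_layers_sketch(layers: Dict[int, Mask], background: int = 0) -> Mask:
    """Merge binary layer masks into one label-index image.

    Assumption: where layers overlap, the later layer overwrites the earlier one.
    """
    first = next(iter(layers.values()))
    height, width = len(first), len(first[0])
    combined = [[background] * width for _ in range(height)]
    for index, mask in layers.items():
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    combined[y][x] = index  # overwrite, never add
    return combined


layers = {
    1: [[1, 1, 0]],  # label 1 covers the first two pixels
    2: [[0, 1, 1]],  # label 2 covers the last two pixels
}
combined = combine_layers_sketch(layers)

# Naively summing index * mask yields index 3 at the overlapping pixel,
# i.e. a label that does not exist in the data.
naive = [[sum(idx * m[0][x] for idx, m in layers.items()) for x in range(3)]]
```

Here `combined` contains only the labels 1 and 2, whereas `naive` introduces the spurious index 3 in the middle pixel.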
0.0.11 (2025-04-03)
fix: idc-registry --list writers now returns writer plugins instead of reader ones
0.0.10 (2025-04-03)
added set-placeholder filter for dynamically setting (temporary) placeholders at runtime
added --resume_from option to relevant readers that allows resuming the data processing from the first file that matches this glob expression (e.g., */012345.png)
requiring seppl>=0.2.14 now for resume support
using underscores now instead of dashes in dependencies (setup.py)
the array_to_image method no longer performs unnecessary conversions of Image objects
the dirs generator can limit directories now to ones that have files matching a specific regexp (--file_regexp), to avoid the Failed to locate any files using: … error message when a reader doesn’t find any matching files
requiring seppl>=0.2.15 now for new --split_group support
added the from-multi meta-reader that combines multiple base readers and returns their output
added the to-multi meta-writer that forwards the data to multiple base writers
added the use-mask filter for using the image segmentation annotations (= mask) as the new base image
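The resume behaviour described for 0.0.10 (skip files until the first one matching a glob expression) can be sketched as follows. The function is hypothetical and uses Python's fnmatch module as a stand-in; how the readers actually match the expression is not stated in the changelog.

```python
import fnmatch
from typing import Iterable, Iterator


def resume_from_sketch(files: Iterable[str], pattern: str) -> Iterator[str]:
    """Yield files starting at the first one that matches the glob pattern."""
    resumed = False
    for path in files:
        if not resumed and fnmatch.fnmatch(path, pattern):
            resumed = True
        if resumed:
            yield path


files = [
    "data/012343.png",
    "data/012344.png",
    "data/012345.png",
    "data/012346.png",
]
remaining = list(resume_from_sketch(files, "*/012345.png"))
```

With the example pattern from the changelog, `remaining` starts at `data/012345.png` and includes every file after it.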
0.0.9 (2025-03-14)
using wai_logging instead of wai.logging as dependency now
0.0.8 (2025-03-14)
requiring seppl>=0.2.13 now for placeholder support
added placeholder support to tools: idc-convert, idc-exec
added placeholder support to readers: from-adams-ic, from-subdir-ic, from-blue-channel-is, from-grayscale-is, from-indexed-png-is, from-layer-segments-is, from-adams-od, from-coco-od, from-opex-od, from-roicsv-od, from-voc-od, from-yolo-od, from-data, from-pyfunc, poll-dir
added placeholder support to filters: write-labels
added placeholder support to writers: to-adams-ic, to-subdir-ic, to-blue-channel-is, to-grayscale-is, to-indexed-png-is, to-layer-segments-is, to-adams-od, to-coco-od, to-opex-od, to-roicsv-od, to-voc-od, to-yolo-od, to-data
0.0.7 (2025-03-12)
added safe_deepcopy method to idc.api._utils which creates a deep copy of an object if not None
added rgb-to-grayscale filter to convert color images into gray scale ones
added sort-pixels filter for grayscale images
the following filters can operate on lists of records now as well: inspec, metadata, metadata-from-name
added metadata-od filter for filtering object-detection annotations based on their meta-data (e.g., scores from model predictions)
the filters discard-negatives and discard-invalid-images now output how many were discarded/kept when processing finishes
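The safe_deepcopy helper from 0.0.7 is described as creating a deep copy of an object if it is not None. A minimal sketch of that behaviour, written as a stand-alone function rather than the library's own code, could look like this:

```python
import copy
from typing import Optional, TypeVar

T = TypeVar("T")


def safe_deepcopy_sketch(obj: Optional[T]) -> Optional[T]:
    """Return a deep copy of obj, or None if obj is None."""
    if obj is None:
        return None
    return copy.deepcopy(obj)


meta = {"scores": [0.9, 0.4]}
clone = safe_deepcopy_sketch(meta)
clone["scores"].append(0.1)  # modifies the copy only, not the original
```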
0.0.6 (2025-02-26)
LayerSegmentsImageSegmentationReader now suggests using the --lenient flag in the exception raised when an image is not binary
added the discard-by-name filter that allows the user to discard images based on name, either by exact match or regexp (matching sense can be inverted)
requiring seppl>=0.2.10 now
added support for aliases
added to_bluechannel, to_grayscale and to_indexedpng image segmentation methods to idc.api
added the generate_palette_list method to idc.api which turns a predefined palette name or comma-separated list of RGB values into a flat list of int values, e.g., used for indexed PNG files
exposed method save_image through idc.api
filter-labels now handles not specifying any labels and only regexp
write-labels filter now allows specification of custom separator
write-labels: fixed retrieval of image-segmentation labels
using simple_palette_utils dependency now
idc-convert tool now flags aliases on the help screen with *
the from-voc-od reader now has the -r/--image_rel_path option which gets injected before the folder property from the XML file
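The generate_palette_list entry for 0.0.6 mentions turning a comma-separated list of RGB values into a flat list of int values. The sketch below covers only that case (predefined palette names are left out); the function name and the validation rules are assumptions, not the library's implementation.

```python
from typing import List


def palette_from_rgb_string(rgb: str) -> List[int]:
    """Parse 'R,G,B,R,G,B,...' into a flat list of ints."""
    values = [int(v.strip()) for v in rgb.split(",") if v.strip()]
    # Assumption: values must form complete RGB triplets in the range 0..255.
    if len(values) % 3 != 0:
        raise ValueError("Expected a multiple of three values, got %d" % len(values))
    if any(v < 0 or v > 255 for v in values):
        raise ValueError("RGB values must be in the range 0..255")
    return values


flat = palette_from_rgb_string("0,0,0, 255,0,0")
```

A flat list of this shape is what Pillow's `Image.putpalette` accepts for indexed PNG images.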
0.0.5 (2025-01-13)
added setuptools as dependency
switched to underscores in project name
using 90% as default quality for JPEG images now, can be overridden with environment variable IDC_JPEG_QUALITY
added methods to idc.api module: jpeg_quality(), array_to_image(…), empty_image(…)
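The JPEG quality override from 0.0.5 (default of 90%, overridable via IDC_JPEG_QUALITY) can be sketched as below. The function name is hypothetical, and falling back to the default on an unparsable value is an assumption; the behaviour of the library's own jpeg_quality() in that case is not documented here.

```python
import os

DEFAULT_JPEG_QUALITY = 90  # default stated in the changelog


def jpeg_quality_sketch() -> int:
    """Return the JPEG quality, honouring the IDC_JPEG_QUALITY variable."""
    raw = os.environ.get("IDC_JPEG_QUALITY")
    if raw is None:
        return DEFAULT_JPEG_QUALITY
    try:
        return int(raw)
    except ValueError:
        # Assumption: invalid values fall back to the default.
        return DEFAULT_JPEG_QUALITY
```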
0.0.4 (2024-07-16)
limiting numpy to <2.0.0 due to problems with imgaug library
0.0.3 (2024-07-02)
switched to the fast-opex library
helper method from_indexedpng was using incorrect label index (off by 1)
Data.save_image method now ensures that source/target files exist before calling os.path.samefile
requiring seppl>=0.2.6 now
readers now support default globs, allowing the user to just specify directories as input (and the default glob gets appended)
the to-yolo-od writer now has an option for predefined labels (for enforcing label order)
the to-yolo-od writer now stores the labels/labels_csv files in the respective output folders rather than using an absolute file name
the bluechannel/grayscale/indexed-png image segmentation readers/writers can use a value other than 0 now for the background
split filter has been renamed to split-records
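The default-glob behaviour from 0.0.3 (a directory given as input gets the reader's default glob appended) can be sketched like this. The function name and the example glob are hypothetical; each reader defines its own default.

```python
import os
import tempfile


def apply_default_glob(input_path: str, default_glob: str) -> str:
    """Append default_glob if input_path is a directory, else return it unchanged."""
    if os.path.isdir(input_path):
        return os.path.join(input_path, default_glob)
    return input_path


with tempfile.TemporaryDirectory() as tmp:
    expanded = apply_default_glob(tmp, "*.xml")
    dir_was_expanded = expanded == os.path.join(tmp, "*.xml")

# An explicit glob is not a directory and is passed through unchanged.
untouched = apply_default_glob("/data/voc/*.xml", "*.xml")
```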
0.0.2 (2024-06-13)
added generic plugins that take user Python functions: from-pyfunc, pyfunc-filter, to-pyfunc
added idc-exec tool that uses generator to produce variable/value pairs that are used to expand the provided pipeline template which then gets executed
added polygon-simplifier filter for reducing number of points in polygons
moved several geometry/image related functions from imgaug library into core library to avoid duplication
added python-image-complete as dependency
the ImageData class now uses the python-image-complete library to determine the file format rather than loading the image into memory in order to determine that
the convert-image-format filter now correctly creates a new container with the converted image data
the to-coco-od writer only allows sorting of categories when using predefined categories now
the from-opex-od reader now handles absent meta-data correctly
added the AnnotationsOnlyWriter mixin for writers that can skip the base image and just output the annotations
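The idea behind idc-exec from 0.0.2 (a generator produces variable/value pairs that expand a pipeline template) can be sketched in plain Python. Everything here is illustrative: the `$DIR` syntax comes from Python's string.Template and is not necessarily the placeholder syntax idc-exec uses, and the component names in the template are made up.

```python
from string import Template
from typing import Dict, Iterable, Iterator, List


def dirs_generator_sketch(dirs: List[str]) -> Iterator[Dict[str, str]]:
    """Hypothetical generator yielding one variable/value mapping per directory."""
    for d in dirs:
        yield {"DIR": d}


def expand_pipeline(template: str, variables: Iterable[Dict[str, str]]) -> Iterator[str]:
    """Expand the pipeline template once per variable/value mapping."""
    tpl = Template(template)
    for mapping in variables:
        yield tpl.substitute(mapping)


# 'some-reader' and 'some-writer' are placeholder names, not real plugins.
commands = list(expand_pipeline(
    "some-reader -i $DIR/*.xml some-writer -o $DIR/out",
    dirs_generator_sketch(["/data/a", "/data/b"]),
))
```

Each expanded string corresponds to one pipeline execution; the actual tool then runs the pipeline rather than just building the command.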
0.0.1 (2024-05-06)
initial release
Source Distribution
File details
Details for the file image_dataset_converter-0.1.0.tar.gz.
File metadata
- Download URL: image_dataset_converter-0.1.0.tar.gz
- Upload date:
- Size: 96.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 340681286e7b33119402d782ff8308352a34ef98fe2fef4a0f3f62ff3209ac16 |
| MD5 | fa43cd85b3d4a2099fb3a87e769e1bb0 |
| BLAKE2b-256 | b06b3257290d9c4e51d92c1a28a7d141daac954c59dee90e2dd4397f3ff828c9 |