Skip to main content

Connected components on 3D images, supports multiple labels.

Project description

Build Status PyPI version

Connected Components 3D

Implementation of connected components in three dimensions using a 26-connected neighborhood. This package uses a 3D variant of the two pass method by Rosenfeld and Pflatz augmented with Union-Find and a decision tree based on the 2D 8-connected work of Wu, Otoo, and Suzuki. This implementation is compatible with images containing many different labels, not just binary images. It can be used with 2D or 3D images.

I wrote this package because I was working on densely labeled 3D biomedical images of brain tissue (e.g. 512x512x512 voxels). Other off the shelf implementations I reviewed were limited to binary images. This rendered these other packages too slow for my use case as it required masking each label and running the connected components algorithm once each time. For reference, there are often between hundreds to thousands of labels in a given volume. The benefit of this package is that it labels all connected components in one shot, improving performance by one or more orders of magnitude.

Python pip Installaction

If binaries are available for your platform:

pip install connected-components-3d

Otherwise:

Requires a C++ compiler.

pip install numpy
pip install connected-components-3d --no-binary :all:

Python Manual Installation

Requires a C++ compiler.

pip install -r requirements.txt
python setup.py develop

Python Use

from cc3d import connected_components
import numpy as np

labels_in = np.ones((512, 512, 512), dtype=np.int32)
labels_out = connected_components(labels_in)

# You can extract individual components like so:
segids = [ x for x in np.unique(labels_out) if x != 0 ]
extracted_image = labels_out * (labels_out == segid[0])

If you know approximately how many labels you are going to generate, you can save substantial memory by specifying a number a bit above that range. The max label ID in your input labels must be less than max_labels.

labels_out = connected_components(labels_in, max_labels=20000)

Note: C and Fortran order arrays will be processed in row major and column major order respectively, so the numbering of labels will be "transposed". The scare quotes are there because the dimensions of the array will not change.

C++ Use

#include "cc3d.hpp"

// 3d array represented as 1d array
int* labels = new int[512*512*512](); 

uint32_t* cc_labels = cc3d::connected_components3d<int>(
  labels, /*sx=*/512, /*sy=*/512, /*sz=*/512
);

Algorithm Description

The algorithm contained in this package is an elaboration into 3D images of the 2D image connected components algorithm described by Rosenfeld and Pflatz (RP) in 1968 [1] (which is well illustrated by this youtube video) using an equivalency list implemented as Tarjan's Union-Find disjoint set with path compression and balancing [2] and augmented with a decision tree based on work by Wu, Otoo, and Suzuki (WOS). [3]

First Principles in 2D

In RP's 4-connected two-pass method for binary 2D images, the algorithm raster scans and every time it first encounters a foreground pixel (the pixels to its top and left are background), it marks it with a new label. If there is a preexisting label in its neighborhood, it uses that label instead. Whenever two labels are adjacent, it records they are equivalent so that they can be relabeled consistently in the second pass. This equivalency table can be constructed in several ways, but some popular approaches are Union-Find with path compression with balancing by rank and Selkow's algorithm (which can avoid pipeline stalls). [4] However, Selkow's algorithm is designed for two trees of depth two, appropriate for binary images. We would like to process multiple labels at the same time, making Union-Find preferable.

In the second pass, the pixels are relabeled using the equivalency table. Union-Find establishes one label as the root label of a tree, and the root is considered the representative label. Each pixel is then labeled with the representative label. Union-Find is therefore appropriate for representing disjoint sets. Path compression with balancing radically reduces the height of the tree, which accelerates the second pass.

WOS approached the problem of accelerating 8-connected 2D connected components on binary images. 8-connected labeling is achieved by extending RP's forward pass mask to the top left and top right corner pixels. In Union-Find based connected components algorithms, the unify step in the first pass is the most expensive step. WOS showed how to optimize away a large fraction of these calls using a decision tree that takes advantage of local topology. For example, since the top-center neighbor of the current pixel is also adjacent to the other mask elements, all of which have already been processed by virtue of the raster scan direction, if it is present it is sufficient to copy its value and move on. If it is absent, pick one of the remaining foreground pixels, copy their value, and use unify for the mask element on the right as it is now known to be non-neighboring with the left hand side. WOS's algorithm continues in this fashion until a match is found or all mask elements are processed at which point a new label is created.

For several years, this algorithm was the world's fastest, though it has been superceded by a newer work that exchanges the static decision tree for a dynamic one or precalculated generated one amongst other improvements. However, WOS's work is significant for both its simplicity and speed and thus serves as the inspiration for this library.

Extending to 3D

The approach presented below is very similar to that of Sutheebanjard [6]. To move to a 3D 26-connected neighborhood, the mask must be extended into three dimensions in order to connect neighboring planes. Observe that the 8-connected mask covers the trailing half of the neighborhood (the part that will have been already processed) such that the current pixel can rely on those labels. Thus the mask for the 26-connected neighborhood covers only two out of three potential planes: the entire lower plane (nine voxels), and a mask identical to WOS's (four voxels) on the current plane. While some further optimizations are possible, to begin, the problem can be conceptually decomposed into two parts: establishing a 9-connected link to the bottom plane and then an 8-connected link to the current plane. This works because the current pixel functions as a hub that transmits the connection information from the 9-connected step to the 8-connected step.

Fig. 1: Mask for an 8-connected plane. If J,K,L, and M are all eliminated, only N remains and a new label is assigned.

j k l
m n .
. . .

The very first Z plane (Z=0) the algorithm runs against is special: the edge effect omits the bottom plane of the mask. Therefore, as the remaining mask is only comprosed of the 8-connected 2D mask, after this pass, the bottom of the image is 8-connected. At Z=1, the 9-connected part of the mask kicks in, forming connections to Z=0, making the current plane now (8 + 9) 17-connected. At Z=2, the 9-connected bottom mask now forms connections from Z=1 to Z=2 on the top, making Z=1 (17 + 9) 26-connected. By induction, when this process proceeds to completion it results in a 26-connected labeling of the volume.

Following inspiration from WOS, we construct a decision tree on the densely labeled bottom plane that minimizes the number of unifications we need to perform.

Fig 2. The mask for the lower plane in 3D.

a b c
d e f
g h i

As e is connected to all other voxels, if present, it can simply be copied. If e is absent, b and h fully cover the mask. If b is absent, h, a, c comprise a covering. If h is absent, b, g, i are one. Below is a list of coverings such that each proceeding entry in the list assumes the first letters in the entries above are background.

  1. e
  2. b, (h | g, i)
  3. h, a, c
  4. d, (f | c, i)
  5. f, g, a
  6. a, c, g, i
  7. c, g, i
  8. g, i
  9. i

The decision tree is then constructed such that each of these coverings will be evaluated using the fewest unifications possible. It's possible to further optimize this by noting that e and b are both fully connected to the upper 2D mask. Therefore, if either of them are present, we can skip the 8-connected unification step. It's also possible to try the DF covering first if B is background, which would save one unification versus HAC given even statistics, but it seems to be slightly slower on the dataset I attempted. To move from binary data to multilabel data, I simply replaced tests for foreground and background with tests for matching labels.

In order to make a reasonably fast implementation, I implemented union-find with union by rank and path compression. I conservatively used two arrays (a uint64 rank array, and the IDs as the image data type) equal to the size of the image for the union-find data structure instead of a sparse map. The union-find data structure plus the output labels means the memory consumption will be input + output + rank + equivalences. If your input labels are 32-bit, the memory usage will be 4x the input size. This becomes more problematic when 64-bit labels are used, but if you know something about your data, you can decrease the size of the union-find data structure.

For more information on the history of connected components algorithms, and an even faster approach for 2D 8-connected components, consult Grana et al's paper on Block Based Decision Trees. [5]

References

  1. A. Rosenfeld and J. Pfaltz. "Sequential Operations in Digital Picture Processing". Journal of the ACM. Vol. 13, Issue 4, Oct. 1966, Pg. 471-494. doi: 10.1145/321356.321357 (link)
  2. R. E. Tarjan. "Efficiency of a good but not linear set union algorithm". Journal of the ACM, 22:215-225, 1975. (link)
  3. K. Wu, E. Otoo, K. Suzuki. "Two Strategies to Speed up Connected Component Labeling Algorithms". Lawrence Berkely National Laboratory. LBNL-29102, 2005. (link)
  4. S. Selkow. "The Tree-to-Tree Editing Problem". Information Processing Letters. Vol. 6, No. 6. June 1977. doi: 10.1016/0020-0190(77)90064-3 (link)
  5. C. Grana, D. Borghesani, R. Cucchiara. "Optimized Block-based Connected Components Labeling with Decision Trees". IEEE Transactions on Image Porcessing. Vol. 19, Iss. 6. June 2010. doi: 10.1109/TIP.2010.2044963 (link)
  6. P. Sutheebanjard. "Decision Tree for 3-D Connected Components Labeling". Proc. 2012 International Symposium on Information Technology in Medicine and EEducation. doi: 10.1109/ITiME.2012.6291402 (link)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

connected-components-3d-1.1.1.tar.gz (183.6 kB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

connected_components_3d-1.1.1-cp37-cp37m-macosx_10_9_x86_64.whl (111.7 kB view details)

Uploaded CPython 3.7mmacOS 10.9+ x86-64

connected_components_3d-1.1.1-cp36-cp36m-macosx_10_13_x86_64.whl (111.4 kB view details)

Uploaded CPython 3.6mmacOS 10.13+ x86-64

connected_components_3d-1.1.1-cp27-cp27m-macosx_10_14_intel.whl (117.7 kB view details)

Uploaded CPython 2.7mmacOS 10.14+ Intel (x86-64, i386)

File details

Details for the file connected-components-3d-1.1.1.tar.gz.

File metadata

  • Download URL: connected-components-3d-1.1.1.tar.gz
  • Upload date:
  • Size: 183.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.6.3 requests-toolbelt/0.8.0 tqdm/4.29.1 CPython/3.6.6

File hashes

Hashes for connected-components-3d-1.1.1.tar.gz
Algorithm Hash digest
SHA256 f52312cb9da83e4e722b53db9742315b50f8feadfbd6fc01465807a4777e2977
MD5 0d6ec2eed50df2cbaedf67a913024ee6
BLAKE2b-256 224b4b5eb441c96d88bc5407dc083c739d27165eaebcdebb0c0a9293c9d26538

See more details on using hashes here.

File details

Details for the file connected_components_3d-1.1.1-cp37-cp37m-manylinux1_x86_64.whl.

File metadata

File hashes

Hashes for connected_components_3d-1.1.1-cp37-cp37m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 4a0d1c05b6413cd08c3e4c212476751f9ac343b6b27b4323456787c110e05ff4
MD5 84fc39ef0a1c5340849171011227700d
BLAKE2b-256 3fed63603b1309a436e76686dd573d057a89b8b06a07e8e96dab7befbe94ea42

See more details on using hashes here.

File details

Details for the file connected_components_3d-1.1.1-cp37-cp37m-macosx_10_9_x86_64.whl.

File metadata

  • Download URL: connected_components_3d-1.1.1-cp37-cp37m-macosx_10_9_x86_64.whl
  • Upload date:
  • Size: 111.7 kB
  • Tags: CPython 3.7m, macOS 10.9+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.2

File hashes

Hashes for connected_components_3d-1.1.1-cp37-cp37m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 2d8b5ec66497680101526229ce82e0cdb055f3489c8bdac0815d0bf1b2e60d22
MD5 2b66e10e90e12bc3b317c4c8287b0114
BLAKE2b-256 932e02ce467a7fe762b850eca86156bbfe37bbd2e27cbfc8089fb8ae4f22daae

See more details on using hashes here.

File details

Details for the file connected_components_3d-1.1.1-cp36-cp36m-manylinux1_x86_64.whl.

File metadata

File hashes

Hashes for connected_components_3d-1.1.1-cp36-cp36m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 7e342927023eb6f14ef10c4d45a3c35cbd325ec9981f72c064dc148a394381ac
MD5 b027999ed4324e8483a81f52051ad80e
BLAKE2b-256 9f07e310ab87f862a9f5561d7a30fd4bc94990ea1de6af5a1ff7fbac14a8c5cd

See more details on using hashes here.

File details

Details for the file connected_components_3d-1.1.1-cp36-cp36m-macosx_10_13_x86_64.whl.

File metadata

  • Download URL: connected_components_3d-1.1.1-cp36-cp36m-macosx_10_13_x86_64.whl
  • Upload date:
  • Size: 111.4 kB
  • Tags: CPython 3.6m, macOS 10.13+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.2

File hashes

Hashes for connected_components_3d-1.1.1-cp36-cp36m-macosx_10_13_x86_64.whl
Algorithm Hash digest
SHA256 bc9ece593871ec4107aead30709b979db134a4bbe2f005088b1a3dff1109ad01
MD5 44293ac31c3e3eaea50ab81baf6b747d
BLAKE2b-256 b96e86616a60c809d55e50571ff6bef85b03a5f3742162b0f4ebb39136657505

See more details on using hashes here.

File details

Details for the file connected_components_3d-1.1.1-cp35-cp35m-manylinux1_x86_64.whl.

File metadata

File hashes

Hashes for connected_components_3d-1.1.1-cp35-cp35m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 1c00fbe773f56e76582a8a8fccf0e195c40cb509ded342f000fa87672d73353d
MD5 fdf7f0ead788be5c7b8b5e2b8afedb8d
BLAKE2b-256 f40d17c8c200ea210f99d178db4fac55b01a88c6f444145b9e49c680ddc63629

See more details on using hashes here.

File details

Details for the file connected_components_3d-1.1.1-cp34-cp34m-manylinux1_x86_64.whl.

File metadata

File hashes

Hashes for connected_components_3d-1.1.1-cp34-cp34m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 2d85da0d2351c7e91a4b2fdba4d0eeeb422a6f9ef3ec80a9f4f435cb6975a64e
MD5 d71499f2093f4dc3458f23bde547d179
BLAKE2b-256 978e92c4d3b4c0ecfe010c1b05d0c5a7a275122214e27469f83c115f0917b7d0

See more details on using hashes here.

File details

Details for the file connected_components_3d-1.1.1-cp27-cp27m-manylinux1_x86_64.whl.

File metadata

File hashes

Hashes for connected_components_3d-1.1.1-cp27-cp27m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 9ce5d0be3ac0bcff7e78bbb56f8f76177253647da4fccb4c6045a668beb4a7b3
MD5 7fe771b64040b6ecbf044e52c67745b8
BLAKE2b-256 aa240a0b7bb0fd5a84e309e7402138fe279f2dec9cec11c4bb064765e89e1bfa

See more details on using hashes here.

File details

Details for the file connected_components_3d-1.1.1-cp27-cp27m-macosx_10_14_intel.whl.

File metadata

  • Download URL: connected_components_3d-1.1.1-cp27-cp27m-macosx_10_14_intel.whl
  • Upload date:
  • Size: 117.7 kB
  • Tags: CPython 2.7m, macOS 10.14+ Intel (x86-64, i386)
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.2

File hashes

Hashes for connected_components_3d-1.1.1-cp27-cp27m-macosx_10_14_intel.whl
Algorithm Hash digest
SHA256 17dfbc2ca6b0a578b50e8a41b13abdad31f566556041a08b5b6973109327c0e6
MD5 c0924294ebde62358992e01b57a52dd7
BLAKE2b-256 f18ffaed20247ff78291b6e31577a7e74b8a55fbcb6a4357e8abc50b9e34e5c4

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page