Skip to main content

Connected components on 3D images, supports multiple labels.

Project description

[![Build Status](https://travis-ci.org/seung-lab/connected-components-3d.svg?branch=master)](https://travis-ci.org/seung-lab/connected-components-3d) [![PyPI version](https://badge.fury.io/py/connected-components-3d.svg)](https://badge.fury.io/py/connected-components-3d)

Connected Components 3D
=======================

Implementation of connected components in three dimensions using a 26-connected neighborhood. This package uses a 3D variant of the two pass method by Rosenfeld and Pflatz augmented with Union-Find. This implementation is compatible with images containing many different labels, not just binary images. It can be used with 2D or 3D images.

I wrote this package because I was working on densely labeled 3D biomedical images of brain tissue (e.g. 512x512x512 voxels). Other off the shelf implementations I reviewed were limited to binary images. This rendered these other packages too slow for my use case as it required masking each label and running the connected components algorithm once each time. For reference, there are often between hundreds to thousands of labels in a given volume. The benefit of this package is that it labels all connected components in one shot, improving performance by one or more orders of magnitude.

## Python `pip` Installaction

If binaries are available for your platform:

```bash
pip install connected-components-3d
```

Otherwise:

*Requires a C++ compiler.*

```bash
pip install numpy
pip install connected-components-3d --no-binary :all:
```

## Python Manual Installation

*Requires a C++ compiler.*

```bash
pip install -r requirements.txt
python setup.py develop
```

## Python Use

```python
from cc3d import connected_components
import numpy as np

labels_in = np.ones((512, 512, 512), dtype=np.int32)
labels_out = connected_components(labels_in)

# You can extract individual components like so:
segids = [ x for x in np.unique(labels_out) if x != 0 ]
extracted_image = labels_out * (labels_out == segid[0])
```

If you know approximately how many labels you are going to generate, you can save substantial memory by specifying a number a bit above that range. The max label ID in your input labels must be less than `max_labels`.

```python
labels_out = connected_components(labels_in, max_labels=20000)
```

*Note: C and Fortran order arrays will be processed in row major and column major order respectively, so the numbering of labels will be "transposed". The scare quotes are there because the dimensions of the array will not change.*

## C++ Use

```cpp
#include "cc3d.hpp"

// 3d array represented as 1d array
int* labels = new int[512*512*512]();

int* cc_labels = cc3d::connected_components3d<int>(
labels, /*sx=*/512, /*sy=*/512, /*sz=*/512
);
```

## Algorithm Description

The algorithm contained in this package is an elaboration into 3D images of the 2D image connected components algorithm described by Rosenfeld and Pflatz in 1968 [1] (which is well illustrated by [this youtube video](https://www.youtube.com/watch?v=ticZclUYy88)) using an equivalency list implemented as a Union-Find disjoint set with path compression [2].

In RP's two-pass method for 2D images, you raster scan and every time you first encounter a foreground pixel, mark it with a new label if the pixels to its top and left are background. If there is a preexisting label in its neighborhood, use that label instead. Whenever you see that two labels are adjacent, record they are equivalent as this will be used in the second pass. This equivalency table can be constructed in several ways, but some popular approaches are Union-Find with path compression and Selkow's algorithm (which can avoid pipeline stalls). However, Selkow's algorithm is designed for two trees of depth two, appropriate for binary images. We would like to process multiple labels at the same time, making union-find mandatory.

In the second pass, the pixels are relabeled using the equivalency table. Union-Find (disjoint sets) establishes one label as the root label of a tree, and the root is considered the representative label. Each pixel is then labeled with the representative label.

To move to a 3D 26-connected neighborhood, we must first note that RP's method is 4-connected, in that they only examine the pixel to the top and left. In 2D, the 8-connected version would have looked at the top, left, top-left, and top-right. In a 3D 26-connected case, we have to look at the top, left, top-left, top-right, the same pattern shifted by z-1, and the voxel located at (x, y, z-1).

In the literature, modern connected components algorithms appear to do better than the simple one I selected by about 2x-5x depending on the data.
There appear to be some modern competing approaches involving decision trees, and an approach called "Light Speed Labeling". I picked this algorithm mainly because it is easy to understand and implement.

In order to make a reasonably fast implementation, I implemented union-find with union by rank and path compression. I conservatively used two arrays (a uint32 rank array, and the IDs as the image data type) equal to the size of the image for the union-find data structure instead of a sparse map. The union-find data structure plus the output labels means the memory consumption will be input + output + rank + equivalences. If your input labels are 32-bit, the memory usage will be 4x the input size. This becomes more problematic when 64-bit labels are used, but if you know something about your data, you can decrease the size of the union-find data structure.

## References

1. A. Rosenfeld and J. Pfaltz. "Sequential Operations in Digital Picture Processing". Journal of the ACM. Vol. 13, Issue 4, Oct. 1966, Pg. 471-494. doi: 10.1145/321356.321357 ([link](https://dl.acm.org/citation.cfm?id=321357))
2. R. E. Tarjan. "Efficiency of a good but not linear set union algorithm". Journal of the ACM, 22:215-225, 1975. ([link](https://dl.acm.org/citation.cfm?id=321884))



Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

connected-components-3d-1.0.4.tar.gz (166.5 kB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

File details

Details for the file connected-components-3d-1.0.4.tar.gz.

File metadata

  • Download URL: connected-components-3d-1.0.4.tar.gz
  • Upload date:
  • Size: 166.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.6.3 requests-toolbelt/0.8.0 tqdm/4.29.1 CPython/3.6.6

File hashes

Hashes for connected-components-3d-1.0.4.tar.gz
Algorithm Hash digest
SHA256 7ea28a6a8c2071ccd3e3db9f42f6ea69bf901f3b6975dd209ca81e0ac22f4fc9
MD5 73ae5a905758262861adf1de65ce9e0b
BLAKE2b-256 86590dd588f7e54de563ea34d47c612dd868f9f7e7331ac1c6d823421888701b

See more details on using hashes here.

File details

Details for the file connected_components_3d-1.0.4-cp37-cp37m-manylinux1_x86_64.whl.

File metadata

File hashes

Hashes for connected_components_3d-1.0.4-cp37-cp37m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 243a5a2118762067d2d0a2172d126a29b1182b13109f563de465edf5bc039222
MD5 99da8aca1c90e4af18a6cecc28f59361
BLAKE2b-256 0a9d51b29bf6604f71777c9ac76dee63405dfaf32cdb66266d0daef5ad66f8ae

See more details on using hashes here.

File details

Details for the file connected_components_3d-1.0.4-cp36-cp36m-manylinux1_x86_64.whl.

File metadata

File hashes

Hashes for connected_components_3d-1.0.4-cp36-cp36m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 6e08d1738ee54d40dbb163e737d0b4106b8df46194ca7ab10887c56048fbcf74
MD5 56813e98b66922304380035997fbd6af
BLAKE2b-256 b7e7dd14a2890e0fbe97a25e43c10624a9f881554a6ed0e33496f074902adfa7

See more details on using hashes here.

File details

Details for the file connected_components_3d-1.0.4-cp35-cp35m-manylinux1_x86_64.whl.

File metadata

File hashes

Hashes for connected_components_3d-1.0.4-cp35-cp35m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 65ea236b6341811131355df8526e3559b49a68f2fa86bc06f6a30a0a178e2990
MD5 7f37b13c2fb7cbd6f3e8252bb84dfd37
BLAKE2b-256 e2c1cbc4b19b5c2bdc1d372bf70c24137b675c4f37b465675f50d52a5c5bdc4d

See more details on using hashes here.

File details

Details for the file connected_components_3d-1.0.4-cp34-cp34m-manylinux1_x86_64.whl.

File metadata

File hashes

Hashes for connected_components_3d-1.0.4-cp34-cp34m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 af916875ce3994617805c12bc88312adb5ecfa6e4de1bc91c3679eb94ad6a133
MD5 fd1954d7b77e3bdb30019d5c4c406171
BLAKE2b-256 d5c697ad011cc7c9f6720361555d69bacddc25f1fe81b40d6759aaa3e3ceaecc

See more details on using hashes here.

File details

Details for the file connected_components_3d-1.0.4-cp27-cp27m-manylinux1_x86_64.whl.

File metadata

File hashes

Hashes for connected_components_3d-1.0.4-cp27-cp27m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 9edf68efcfa3d8cb25c8a84fdbfc5e336822798488dccb5415155c146c93e343
MD5 bf0fdd468240ffaaa10f2a4db0e76fd2
BLAKE2b-256 6bafb84bb308720c6b36d6c9b8db423f09ef541ecd4eda1ef8803e4ecef0cc14

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page