Implementation of the k-modes and k-prototypes clustering algorithms

kmodes

Description

Python implementations of the k-modes and k-prototypes clustering algorithms. Relies on numpy for a lot of the heavy lifting.

k-modes is used for clustering categorical variables. It defines clusters based on the number of matching categories between data points. (This is in contrast to the more well-known k-means algorithm, which clusters numerical data based on Euclidean distance.) The k-prototypes algorithm combines k-modes and k-means and is able to cluster mixed numerical / categorical data.
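The two dissimilarity measures described above can be sketched in a few lines of plain Python. This is only an illustration, not the package's numpy-based implementation: the function names are hypothetical, and the weight `gamma` that balances the categorical part against the numerical part in the mixed measure is an assumed parameter name taken from Huang's formulation.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Count the attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(records):
    """Return the most frequent category of each attribute (the cluster mode)."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*records)]


def mixed_dissimilarity(num_a, num_b, cat_a, cat_b, gamma=1.0):
    """Squared Euclidean distance on the numerical attributes plus
    gamma-weighted mismatches on the categorical attributes."""
    numeric = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    return numeric + gamma * matching_dissimilarity(cat_a, cat_b)


records = [
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "small", "round"),
]

# The mode plays the role that the mean plays in k-means.
mode = column_modes(records)
distances = [matching_dissimilarity(r, mode) for r in records]

# Mixed record: two numerical attributes and two categorical attributes.
mixed = mixed_dissimilarity(
    [1.0, 2.0], [2.0, 4.0], ("red", "small"), ("red", "large"), gamma=0.5
)
```

Here `mode` is the per-attribute majority category, `distances` holds each record's mismatch count against that mode, and `mixed` adds a weighted mismatch count to a squared Euclidean term.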

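Implemented are:

  • k-modes [HUANG97] [HUANG98]
  • k-modes with initialization based on density [CAO09]
  • k-prototypes [HUANG97]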

The code is modeled after the clustering algorithms in scikit-learn and has the same familiar interface.
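To make the scikit-learn-style interface concrete, here is a toy estimator that exposes `fit`, `predict` and `fit_predict` and stores its results in trailing-underscore attributes. It is a hypothetical sketch, not the package's code: the class name `TinyKModes`, the attribute name `centroids_`, and the naive initialization (the first distinct records) are all assumptions made for illustration.

```python
from collections import Counter


class TinyKModes:
    """Hypothetical toy estimator; not the kmodes package's implementation."""

    def __init__(self, n_clusters=2, max_iter=10):
        self.n_clusters = n_clusters
        self.max_iter = max_iter

    @staticmethod
    def _mismatches(a, b):
        # Simple matching dissimilarity: number of differing attributes.
        return sum(x != y for x, y in zip(a, b))

    def _nearest(self, record):
        # Index of the centroid with the fewest mismatches.
        costs = [self._mismatches(record, c) for c in self.centroids_]
        return costs.index(min(costs))

    def fit(self, X):
        X = [tuple(r) for r in X]
        # Naive initialization (an assumption): first n_clusters distinct records.
        self.centroids_ = list(dict.fromkeys(X))[: self.n_clusters]
        for _ in range(self.max_iter):
            labels = [self._nearest(r) for r in X]
            updated = []
            for k, old in enumerate(self.centroids_):
                members = [r for r, label in zip(X, labels) if label == k]
                if members:
                    # New centroid: the per-attribute mode of the members.
                    updated.append(tuple(
                        Counter(col).most_common(1)[0][0] for col in zip(*members)
                    ))
                else:
                    updated.append(old)  # keep an empty cluster's centroid
            if updated == self.centroids_:
                break  # converged: no centroid changed
            self.centroids_ = updated
        self.labels_ = [self._nearest(r) for r in X]
        return self

    def predict(self, X):
        return [self._nearest(tuple(r)) for r in X]

    def fit_predict(self, X):
        return self.fit(X).labels_


X = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
model = TinyKModes(n_clusters=2)
labels = model.fit_predict(X)
new_label = model.predict([("b", "y")])
```

As with scikit-learn estimators, hyperparameters go to the constructor, `fit` returns the estimator itself, and everything learned from the data lives in attributes ending in an underscore.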

Simple usage examples of both k-modes (‘soybean.py’) and k-prototypes (‘stocks.py’) are included in the examples directory.

I would love to have more people play around with this and give me feedback on my implementation. If you come across any issues in running or installing kmodes, please submit a bug report.

Enjoy!

Installation

kmodes can be installed using pip:

pip install kmodes

Alternatively, you can build the latest development version from source:

git clone https://github.com/nicodv/kmodes.git
cd kmodes
python setup.py install

Usage

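import numpy as np
from kmodes import kmodes

# random categorical data: 100 points, 10 attributes, category codes 0-19
data = np.random.choice(20, (100, 10))

# 4 clusters, Huang initialization; the best of 5 runs (lowest cost) is kept
km = kmodes.KModes(n_clusters=4, init='Huang', n_init=5, verbose=1)
# fit the model and return the cluster label of each data point
clusters = km.fit_predict(data)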

References

[HUANG97] Huang, Z.: Clustering large data sets with mixed numeric and categorical values, Proceedings of the First Pacific Asia Knowledge Discovery and Data Mining Conference, Singapore, pp. 21-34, 1997.

[HUANG98] Huang, Z.: Extensions to the k-modes algorithm for clustering large data sets with categorical values, Data Mining and Knowledge Discovery 2(3), pp. 283-304, 1998.

[CAO09] Cao, F., Liang, J., Bai, L.: A new initialization method for categorical data clustering, Expert Systems with Applications 36(7), pp. 10223-10228, 2009.

Download files

Source Distribution

kmodes-0.4.tar.gz (9.8 kB)

Uploaded Source

Built Distribution

kmodes-0.4-py2.py3-none-any.whl (13.8 kB)

Uploaded Python 2, Python 3

File details

Details for the file kmodes-0.4.tar.gz.

File metadata

  • Download URL: kmodes-0.4.tar.gz
  • Upload date:
  • Size: 9.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for kmodes-0.4.tar.gz
Algorithm     Hash digest
SHA256        0602dcc585c8f650cc732dba644f1e8334fe1107f75edcde9282e8cda860500e
MD5           b373c0670cc093aadfba68067b3e18c9
BLAKE2b-256   65bc9beac91d3d997bc85f238e977ddb8c50b3dfaca129e5c689de057c451e4f
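A downloaded file can be checked against the published SHA256 digest with the standard library alone. The sketch below is one possible way to do this; the helper names `sha256_of_file` and `matches_published_digest` are hypothetical, and the file is assumed to have already been downloaded to a local path.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare the file's digest with the published one, ignoring letter case."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())
```

Calling `matches_published_digest("kmodes-0.4.tar.gz", ...)` with the SHA256 value listed above should return `True` for an intact download, assuming the archive sits in the current directory.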

File details

Details for the file kmodes-0.4-py2.py3-none-any.whl.

File hashes

Hashes for kmodes-0.4-py2.py3-none-any.whl
Algorithm     Hash digest
SHA256        6bd6b20a33fd7f4d320235e55b2bac72947c4a76921f8642a1f0fc6422379bb4
MD5           4dd7330df5ae1c42c706fef8da0dfbda
BLAKE2b-256   d74f3e6b4a538c16f607c582017a8efa8c39d9d44d599b2a0b97090be91e3a62
