K-means clustering with weights

ek

To install: pip install ek

Overview

The ek package implements the K-means clustering algorithm with per-sample weights. This is useful when some data points matter more than others and should have a greater influence on where the cluster centers end up.
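To make the effect of weights concrete, here is a minimal sketch of a weighted centroid in plain Python. The helper name weighted_centroid is hypothetical and is not part of the ek API; it only illustrates the usual weighted-mean definition, which the package is assumed (not confirmed) to follow.

```python
def weighted_centroid(points, weights):
    """Return the weighted mean of a list of equal-length points."""
    total = sum(weights)
    dims = len(points[0])
    return [
        sum(w * p[d] for p, w in zip(points, weights)) / total
        for d in range(dims)
    ]


points = [[0.0, 0.0], [4.0, 0.0]]

# Equal weights give the ordinary midpoint.
print(weighted_centroid(points, [1, 1]))  # [2.0, 0.0]

# Tripling the weight of the second point pulls the centroid toward it.
print(weighted_centroid(points, [1, 3]))  # [3.0, 0.0]
```

A point with weight 3 behaves like three identical copies of that point, which is why it dominates the mean.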

Main Features

  • Weighted K-means Clustering: Allows clustering with weighted data points, which can be crucial for datasets where some instances are more important than others.
  • Compatibility with Scikit-learn: The implementation is designed to be compatible with Scikit-learn's clustering framework, making it easy to integrate with existing codebases that use Scikit-learn for machine learning tasks.
  • Support for Sparse Data: Efficiently handles sparse matrices, which is beneficial for high-dimensional data.
  • Custom Initialization Methods: Supports various methods for initializing cluster centers, including a weighted version of the k-means++ initialization.
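The weighted k-means++ initialization mentioned above can be sketched as follows. This is one common way to fold weights into k-means++ seeding (sample each new center with probability proportional to weight times squared distance to the nearest center already chosen); it is an assumption about the general technique, not a description of ek's internals. The function names are hypothetical, and the sketch assumes n_clusters does not exceed the number of distinct points with positive weight.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeanspp_init(points, weights, n_clusters, seed=0):
    """Pick n_clusters initial centers from points, favoring heavy, far-away points."""
    rng = random.Random(seed)
    indices = range(len(points))

    # First center: sampled with probability proportional to its weight.
    first = rng.choices(indices, weights=weights, k=1)[0]
    chosen = [first]

    # Squared distance from every point to its nearest chosen center.
    nearest = [sq_dist(p, points[first]) for p in points]

    while len(chosen) < n_clusters:
        # Sampling score: weight times squared distance to the nearest center.
        # Already-chosen points score zero and are never picked again.
        scores = [w * d for w, d in zip(weights, nearest)]
        idx = rng.choices(indices, weights=scores, k=1)[0]
        chosen.append(idx)
        nearest = [min(d, sq_dist(p, points[idx])) for p, d in zip(points, nearest)]

    return [points[i] for i in chosen]


X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]
weights = [1, 2, 1, 1, 1, 2]
print(weighted_kmeanspp_init(X, weights, n_clusters=2, seed=0))
```

Because the score multiplies weight by squared distance, a heavy point far from the existing centers is the most likely next seed.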

Installation

To install the package, use the following pip command:

pip install ek

Usage

Basic Example

Here is a simple example of how to use the ek package to perform weighted K-means clustering:

import numpy as np
from ek import KMeansWeighted

# Sample data
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# Weights for each data point
weights = np.array([1, 2, 1, 1, 1, 2])

# Number of clusters
n_clusters = 2

# Create a KMeansWeighted instance
kmeans = KMeansWeighted(n_clusters=n_clusters)

# Fit the model
kmeans.fit(X, weights)

# Get cluster labels
labels = kmeans.labels_

# Print the labels
print(labels)
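To see what a result on this data looks like, the following dependency-free sketch runs a weighted variant of Lloyd's algorithm on the same points and weights. It is not the ek implementation: the function weighted_kmeans and the fixed starting centers are assumptions chosen to make the run deterministic. With ek itself, the label numbering may come out swapped, since cluster indices are arbitrary.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans(points, weights, centers, max_iter=100):
    """Alternate nearest-center assignment and weighted-mean updates."""
    centers = [list(c) for c in centers]
    dims = len(points[0])
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            for p in points
        ]
        # Update step: each center becomes the weighted mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [(p, w) for p, w, lab in zip(points, weights, labels) if lab == k]
            if not members:
                new_centers.append(centers[k])  # keep the center of an empty cluster
                continue
            total = sum(w for _, w in members)
            new_centers.append(
                [sum(w * p[d] for p, w in members) / total for d in range(dims)]
            )
        if new_centers == centers:
            break  # centers stopped moving
        centers = new_centers
    return labels, centers


X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]
weights = [1, 2, 1, 1, 1, 2]

# Start from one point on each side (an arbitrary choice for this sketch).
labels, centers = weighted_kmeans(X, weights, centers=[X[0], X[3]])
print(labels)   # [0, 0, 0, 1, 1, 1]
print(centers)  # [[1.0, 2.5], [10.0, 1.5]]
```

The unweighted centers would both sit at y = 2.0. The doubled weights on [1, 4] and [10, 0] shift them to y = 2.5 and y = 1.5 respectively.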

Advanced Usage

For more control, pass additional constructor parameters such as init (the initialization method), max_iter (the maximum number of iterations), and tol (the convergence tolerance). The example below reuses X and weights from the basic example.

kmeans = KMeansWeighted(n_clusters=3, init='random', max_iter=100, tol=1e-4)
kmeans.fit(X, weights)

Documentation

Classes and Functions

KMeansWeighted

A class for K-means clustering with weights.

  • Parameters:

    • n_clusters: Number of clusters.
    • init: Method for initialization ('k-means++_with_weights', 'random' or an ndarray).
    • max_iter: Maximum number of iterations.
    • tol: Tolerance for convergence.
    • precompute_distances: Whether to precompute distances ('auto', True, False).
    • verbose: Verbosity mode.
    • random_state: Seed or numpy.random.RandomState instance.
    • copy_x: If True, input data is copied.
    • n_jobs: Number of parallel jobs to run.
  • Methods:

    • fit(X, weights): Compute K-means clustering.
    • fit_predict(X, weights): Compute clustering and predict cluster indices.
    • fit_transform(X, weights): Compute clustering and transform X to cluster-distance space.
    • transform(X): Transform X to cluster-distance space.
    • predict(X): Predict the closest cluster each sample in X belongs to.
    • score(X): Negative of the K-means objective evaluated on X, so higher values indicate a better fit.
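The relationship between transform, predict, and score can be sketched in plain Python once the cluster centers are known. The functions below are hypothetical stand-ins that mirror the method names, not the ek code. The sketch assumes Euclidean distance and an unweighted objective for score, since score(X) takes no weights in the list above.

```python
import math


def transform(points, centers):
    """Distance from every point to every center (cluster-distance space)."""
    return [[math.dist(p, c) for c in centers] for p in points]


def predict(points, centers):
    """Index of the closest center for each point."""
    return [row.index(min(row)) for row in transform(points, centers)]


def score(points, centers):
    """Negative sum of squared distances to the closest center."""
    return -sum(min(row) ** 2 for row in transform(points, centers))


centers = [[1.0, 2.5], [10.0, 1.5]]     # example centers, assumed already fitted
new_points = [[0.0, 2.5], [12.0, 1.5]]

print(transform(new_points, centers))   # one row per point, one column per center
print(predict(new_points, centers))     # [0, 1]
print(score(new_points, centers))       # -5.0
```

The first point lies 1 unit from its nearest center and the second lies 2 units away, so the objective is 1 + 4 = 5 and the score is its negative.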

This package is designed to be easy to use while providing the flexibility needed for more complex clustering tasks. The implementation is optimized for performance and can handle large datasets efficiently.

Download files

Source Distribution

ek-0.0.6.tar.gz (16.4 kB view details)

Uploaded Source

Built Distribution

ek-0.0.6-py3-none-any.whl (15.5 kB view details)

Uploaded Python 3

File details

Details for the file ek-0.0.6.tar.gz.

File metadata

  • Download URL: ek-0.0.6.tar.gz
  • Upload date:
  • Size: 16.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.13

File hashes

Hashes for ek-0.0.6.tar.gz
Algorithm Hash digest
SHA256 cf6cecdd542300e8490ea05a43e25c35d3a7fcc522e6eec8e30c723560657760
MD5 5b0cc45a6ca7609be96f5ba2bd6ab9be
BLAKE2b-256 f9a028aeda5870b6aac312cd710e976ab2044e8887f972aa213f37b51bf0a6ec

File details

Details for the file ek-0.0.6-py3-none-any.whl.

File metadata

  • Download URL: ek-0.0.6-py3-none-any.whl
  • Upload date:
  • Size: 15.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.13

File hashes

Hashes for ek-0.0.6-py3-none-any.whl
Algorithm Hash digest
SHA256 67ea55912e920e4c8405468c9c63b818fb61f33c1d4ce817dfdf367fc2bc416a
MD5 ace9653f90c2884339bf6df65fdba625
BLAKE2b-256 7565a4867cd376772628fa25e63a7639cab5d05cb50127230c4dc13a3b6be0fc
