Quantifier of universal similarity amongst arbitrary data streams without a priori knowledge, features, or training.

Project description

Time Smash

Time Smash is a time series clustering and classification suite built on the notion of universal similarity among data streams. It is intended for cases where there is no a priori knowledge of the "correct" features to use for time series data.

  • Featurization algorithms: SymbolicDerivative, InferredHMMLikelihood, Csmash
  • Distance measure: LikelihoodDistance

Example publications

For questions or suggestions, contact: research@paraknowledge.ai

Usage examples

SymbolicDerivative

from timesmash import SymbolicDerivative
from sklearn.ensemble import RandomForestClassifier

train = [[1, 0, 1, 0, 1, 0], [1, 1, 0, 1, 1, 0]]
train_label = [[0], [1]]
test = [[0, 1, 1, 0, 1, 1]]
train_features, test_features = SymbolicDerivative().fit_transform(
    train=train, test=test, label=train_label
)
clf = RandomForestClassifier().fit(train_features, train_label)
label = clf.predict(test_features)
print("Predicted label: ", label)

LikelihoodDistance

from timesmash import LikelihoodDistance
from sklearn.cluster import KMeans
train = [[1, 0, 1.1, 0, 11.2, 0], [1, 1, 0, 1, 1, 0], [0, 0.9, 0, 1, 0, 1], [0, 1, 1, 0, 1, 1]]
dist_calc = LikelihoodDistance().fit(train)
dist = dist_calc.produce()
clusters = KMeans(n_clusters=2).fit(dist).labels_
print("Clusters: ", clusters)
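The example above passes the distance matrix to KMeans, which treats each row as a feature vector. If you would rather cluster on the pairwise distances directly, a medoid-based method is one option. The following is a minimal pure-Python sketch of k-medoids on a precomputed matrix. It is not part of timesmash: the `k_medoids` helper and the distance values are hypothetical stand-ins for the output of `produce()`.

```python
def k_medoids(dist, k, max_iter=100):
    """Cluster items given a square pairwise-distance matrix (list of lists).

    Illustrative helper, not a timesmash function.
    """
    n = len(dist)
    medoids = list(range(k))  # deterministic start: the first k items
    labels = []
    for _ in range(max_iter):
        # Assign every item to its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Keep the old medoid if a cluster ends up empty.
                new_medoids.append(medoids[c])
                continue
            # The new medoid minimizes the total distance to its cluster members.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels, medoids


# Hypothetical 4 x 4 distance matrix (items 0 and 2 are close, as are 1 and 3).
dist = [
    [0.0, 0.9, 0.2, 0.8],
    [0.9, 0.0, 0.85, 0.1],
    [0.2, 0.85, 0.0, 0.9],
    [0.8, 0.1, 0.9, 0.0],
]
labels, medoids = k_medoids(dist, k=2)
print("Clusters: ", labels)
```

With a real distance matrix, a library implementation that accepts precomputed distances would normally be preferable to this hand-rolled loop.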

InferredHMMLikelihood

from timesmash import InferredHMMLikelihood
from sklearn.ensemble import RandomForestClassifier

train = [[1, 0, 1, 0, 1, 0], [1, 1, 0, 1, 1, 0]]
train_label = [[0], [1]]
test = [[0, 1, 1, 0, 1, 1]]
train_features, test_features = InferredHMMLikelihood().fit_transform(
    train=train, test=test, label=train_label
)
clf = RandomForestClassifier().fit(train_features, train_label)
label = clf.predict(test_features)
print("Predicted label: ", label)

ClusteredHMMClassifier:

from timesmash import Quantizer, InferredHMMLikelihood, LikelihoodDistance
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
import pandas as pd

train = pd.DataFrame(
    [[1, 0, 1, 0, 1, 0], [1, 1, 0, 1, 1, 0], [1, 0, 1, 0, 1, 0], [1, 1, 0, 1, 1, 0]]
)
train_label = pd.DataFrame([[0], [1], [0], [1]])
test = pd.DataFrame([[0, 1, 1, 0, 1, 1]])

qtz = Quantizer().fit(train, label=train_label)
new_labels = train_label.copy()
for label, dataframe in train_label.groupby(train_label.columns[0]):
    dist = LikelihoodDistance(quantizer=qtz).fit(train.loc[dataframe.index]).produce()
    sub_labels = KMeans(n_clusters=2, random_state=0).fit(dist).labels_
    new_label_names = [str(label) + "_" + str(i) for i in sub_labels]
    new_labels.loc[dataframe.index, train_label.columns[0]] = new_label_names

featurizer = InferredHMMLikelihood(quantizer=qtz, epsilon=0.01)
train_features, test_features = featurizer.fit_transform(
    train=train, test=test, label=new_labels
)

clf = RandomForestClassifier().fit(train_features, train_label)
print("Predicted label: ", clf.predict(test_features))
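In the example above, the featurizer is fit on refined sub-class names of the form `label_subcluster`, while the classifier is still trained on the original labels. The sketch below isolates that naming scheme. The helper names and the sub-cluster assignments are hypothetical and only illustrate the string handling, not any timesmash API.

```python
def make_sub_labels(labels, sub_clusters):
    """Combine each class label with its sub-cluster id, e.g. 0 and 1 -> '0_1'."""
    return [f"{label}_{sub}" for label, sub in zip(labels, sub_clusters)]


def parent_label(sub_label):
    """Recover the original class label (as a string) from a sub-class name."""
    return sub_label.rsplit("_", 1)[0]


labels = [0, 1, 0, 1]
sub_clusters = [0, 1, 1, 0]  # hypothetical per-sample sub-cluster ids

sub_labels = make_sub_labels(labels, sub_clusters)
parents = [parent_label(name) for name in sub_labels]

print(sub_labels)
print(parents)
```

Splitting on the last underscore keeps the mapping reversible even if the original label itself contains an underscore.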

XHMMFeatures for anomaly detection:

import pandas as pd
from timesmash import XHMMFeatures
from sklearn.neighbors import LocalOutlierFactor

# Two channels of training data; rows are indexed by subject.
channel1_train = pd.DataFrame([[0,1,0,1,0,1,0,1,0,1],[1,0,1,0,1,0,1,0,1,0]], index=['person_1', 'person_2'])
channel2_train = pd.DataFrame([[0,1,0,1,0,1,0,1,0,1],[1,0,1,0,1,0,1,0,1,0]], index=['person_1', 'person_2'])
labels = pd.DataFrame([1,1], index=['person_1', 'person_2'])

alg = XHMMFeatures(n_quantizations=1)
features_train = alg.fit_transform([channel1_train,channel2_train], labels)

# novelty=True lets the fitted detector score unseen data via predict().
clf = LocalOutlierFactor(novelty=True)
clf.fit(features_train)

# The second test row in each channel has 9 values; pandas pads the missing entry with NaN.
channel1_test = pd.DataFrame([[0,1,0,1,0,1,0,1,0,1],[1,0,1,0,1,0,1,0,1]], index=['person_test_1', 'person_test_2'])
channel2_test = pd.DataFrame([[0,1,0,1,0,1,0,1,0,1],[0,1,0,1,0,1,0,1,0]], index=['person_test_1', 'person_test_2'])

features_test = alg.transform([channel1_test,channel2_test])
print(clf.predict(features_test))
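scikit-learn outlier detectors such as LocalOutlierFactor return 1 for inliers and -1 for outliers from `predict`. A small sketch of turning that array into a list of flagged subjects follows. The `flag_outliers` helper and the prediction values are hypothetical, and it assumes the rows of the feature matrix keep the order of the test index.

```python
def flag_outliers(index, predictions):
    """Return the names whose prediction is -1 (the scikit-learn outlier code)."""
    return [name for name, pred in zip(index, predictions) if pred == -1]


index = ["person_test_1", "person_test_2"]
predictions = [1, -1]  # hypothetical output of clf.predict(features_test)

outliers = flag_outliers(index, predictions)
print(outliers)
```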

XHMMFeatures for classification:

import pandas as pd
from timesmash import XHMMFeatures
from sklearn.ensemble import RandomForestClassifier

d1_train = pd.DataFrame([[0,1,0,1,0,1,0,1,0,1],[1,0,1,0,1,0,1,0,1,0]], index=['person_1', 'person_2'])
d2_train = pd.DataFrame([[1,0,1,0,1,0,1,0,1,0],[1,0,1,0,1,0,1,0,1,0]], index=['person_1', 'person_2'])
labels = pd.DataFrame([0,1], index=['person_1', 'person_2'])

alg = XHMMFeatures(n_quantizations=1)
features_train = alg.fit_transform([d1_train,d2_train], labels)

clf = RandomForestClassifier()
clf.fit(features_train, labels)

d1_test = pd.DataFrame([[1,0,1,0,1,0,1,0,1]], index=['person_test'])
d2_test = pd.DataFrame([[0,1,0,1,0,1,0,1,0]], index=['person_test'])

features_test = alg.transform([d1_test,d2_test])

print(clf.predict(features_test))

XHMMClustering for multichannel clustering:

import pandas as pd
from timesmash import XHMMClustering

channel1 = pd.DataFrame(
    [
        [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
        [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
        [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
        [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    ],
    index=["person_1", "person_2", "person_3", "person_4"],
)
channel2 = pd.DataFrame(
    [
        [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
        [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
        [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
        [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    ],
    index=["person_1", "person_2", "person_3", "person_4"],
)
alg = XHMMClustering(n_quantizations=1).fit(
    [channel1, channel2]
)
clusters = alg.labels_
print(clusters)
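To read the result by subject, the cluster labels can be paired with the DataFrame index. The sketch below groups names by cluster id. The `group_by_cluster` helper and the label values are hypothetical, and it assumes `labels_` is ordered like the input rows.

```python
from collections import defaultdict


def group_by_cluster(index, cluster_labels):
    """Map each cluster id to the list of row names assigned to it."""
    groups = defaultdict(list)
    for name, cluster in zip(index, cluster_labels):
        groups[cluster].append(name)
    return dict(groups)


index = ["person_1", "person_2", "person_3", "person_4"]
cluster_labels = [0, 1, 1, 0]  # hypothetical values standing in for alg.labels_

groups = group_by_cluster(index, cluster_labels)
print(groups)
```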

Download files

Source Distribution

timesmash-0.2.26.tar.gz (32.9 MB)

Uploaded Source

Built Distribution

timesmash-0.2.26-py3-none-any.whl (33.1 MB)

Uploaded Python 3
