Quantifier of universal similarity amongst arbitrary data streams without a priori knowledge, features, or training.
Project description
Time Smash
Time series clustering and classification suite using notions of Universal similarity among data streams, especially without a priori knowledge about the "correct" features to use for time series data.
- Featurization algorithms: SymbolicDerivative, InferredHMMLikelihood, Csmash
- Distance measure: LikelihoodDistance
Example publications
-
Huang, Yi, Victor Rotaru, and Ishanu Chattopadhyay. "Sequence likelihood divergence for fast time series comparison." Knowledge and Information Systems 65, no. 7 (2023): 3079-3098. https://link.springer.com/article/10.1007/s10115-023-01855-0
-
Chattopadhyay, Ishanu, and Hod Lipson. "Data smashing: uncovering lurking order in data." Journal of The Royal Society Interface 11, no. 101 (2014): 20140826. https://royalsocietypublishing.org/doi/10.1098/rsif.2014.0826
-
Timesmash: Process-Aware Fast Time Series Clustering and Classification https://easychair.org/publications/preprint/qpVv
For questions or suggestions contact:research@paraknowledge.ai
Usage examples
SymbolicDerivative
from timesmash import SymbolicDerivative
from sklearn.ensemble import RandomForestClassifier
train = [[1, 0, 1, 0, 1, 0], [1, 1, 0, 1, 1, 0]]
train_label = [[0], [1]]
test = [[0, 1, 1, 0, 1, 1]]
train_features, test_features = SymbolicDerivative().fit_transform(
train=train, test=test, label=train_label
)
clf = RandomForestClassifier().fit(train_features, train_label)
label = clf.predict(test_features)
print("Predicted label: ", label)
LikelihoodDistance
from timesmash import LikelihoodDistance
from sklearn.cluster import KMeans
train = [[1, 0, 1.1, 0, 11.2, 0], [1, 1, 0, 1, 1, 0], [0, 0.9, 0, 1, 0, 1], [0, 1, 1, 0, 1, 1]]
dist_calc = LikelihoodDistance().fit(train)
dist = dist_calc.produce()
from sklearn.cluster import KMeans
clusters = KMeans(n_clusters = 2).fit(dist).labels_
print("Clusters: " clusters)
InferredHMMLikelihood
from timesmash import InferredHMMLikelihood
from sklearn.ensemble import RandomForestClassifier
train = [[1, 0, 1, 0, 1, 0], [1, 1, 0, 1, 1, 0]]
train_label = [[0], [1]]
test = [[0, 1, 1, 0, 1, 1]]
train_features, test_features = InferredHMMLikelihood().fit_transform(
train=train, test=test, label=train_label
)
clf = RandomForestClassifier().fit(train_features, train_label)
label = clf.predict(test_features)
print("Predicted label: ", label)
ClusteredHMMClassifier:
from timesmash import Quantizer, InferredHMMLikelihood, LikelihoodDistance
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
train = pd.DataFrame(
[[1, 0, 1, 0, 1, 0], [1, 1, 0, 1, 1, 0], [1, 0, 1, 0, 1, 0], [1, 1, 0, 1, 1, 0]]
)
train_label = pd.DataFrame([[0], [1], [0], [1]])
test = pd.DataFrame([[0, 1, 1, 0, 1, 1]])
qtz = Quantizer().fit(train, label=train_label)
new_labels = train_label.copy()
for label, dataframe in train_label.groupby(train_label.columns[0]):
dist = LikelihoodDistance(quantizer=qtz).fit(train.loc[dataframe.index]).produce()
sub_labels = KMeans(n_clusters=2, random_state=0).fit(dist).labels_
new_label_names = [str(label) + "_" + str(i) for i in sub_labels]
new_labels.loc[dataframe.index, train_label.columns[0]] = new_label_names
featurizer = InferredHMMLikelihood(quantizer=qtz, epsilon=0.01)
train_features, test_features = featurizer.fit_transform(
train=train, test=test, label=new_labels
)
clf = RandomForestClassifier().fit(train_features, train_label)
print("Predicted label: ", clf.predict(test_features))
XHMMFeatures for anomaly detection:
import pandas as pd
from timesmash import XHMMFeatures
from sklearn.neighbors import LocalOutlierFactor
channel1_train = pd.DataFrame([[0,1,0,1,0,1,0,1,0,1],[1,0,1,0,1,0,1,0,1,0]], index=['person_1', 'person_2'])
channel2_train = pd.DataFrame([[0,1,0,1,0,1,0,1,0,1],[1,0,1,0,1,0,1,0,1,0]], index=['person_1', 'person_2'])
labels = pd.DataFrame([1,1], index=['person_1', 'person_2'])
alg = XHMMFeatures(n_quantizations=1)
features_train = alg.fit_transform([channel1_train,channel2_train], labels)
clf = LocalOutlierFactor(novelty=True)
clf.fit(features_train)
channel1_test = pd.DataFrame([[0,1,0,1,0,1,0,1,0,1],[1,0,1,0,1,0,1,0,1]], index=['person_test_1', 'person_test_2'])
channel2_test= pd.DataFrame([[0,1,0,1,0,1,0,1,0,1],[0,1,0,1,0,1,0,1,0]], index=['person_test_1', 'person_test_2'])
features_test = alg.transform([channel1_test,channel2_test])
print(clf.predict(features_test))
XHMMFeatures for classification:
import pandas as pd
from timesmash import XHMMFeatures
from sklearn.ensemble import RandomForestClassifier
d1_train = pd.DataFrame([[0,1,0,1,0,1,0,1,0,1],[1,0,1,0,1,0,1,0,1,0]], index=['person_1', 'person_2'])
d2_train = pd.DataFrame([[1,0,1,0,1,0,1,0,1,0],[1,0,1,0,1,0,1,0,1,0]], index=['person_1', 'person_2'])
labels = pd.DataFrame([0,1], index=['person_1', 'person_2'])
alg = XHMMFeatures(n_quantizations=1)
features_train = alg.fit_transform([d1_train,d2_train], labels)
clf = RandomForestClassifier()
clf.fit(features_train, labels)
d1_test = pd.DataFrame([[1,0,1,0,1,0,1,0,1]], index=['person_test'])
d2_test= pd.DataFrame([[0,1,0,1,0,1,0,1,0]], index=['person_test'])
features_test = alg.transform([d1_test,d2_test])
print(clf.predict(features_test))
XHMMClustering for multichannel clustering:
import pandas as pd
from timesmash import XHMMClustering
channel1 = pd.DataFrame(
[
[0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
[1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
[1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
],
index=["person_1", "person_2", "person_3", "person_4"],
)
channel2 = pd.DataFrame(
[
[1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
[1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
[0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
],
index=["person_1", "person_2", "person_3", "person_4"],
)
alg = XHMMClustering(n_quantizations=1).fit(
[channel1, channel2]
)
clusters = alg.labels_
print(clusters)
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file timesmash-0.2.26.tar.gz.
File metadata
- Download URL: timesmash-0.2.26.tar.gz
- Upload date:
- Size: 32.9 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.6.0 importlib_metadata/4.6.3 pkginfo/1.7.1 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.64.1 CPython/3.10.8
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
1c0ee0038552589ec7ab6c6dcf7d1e403451ac7cb0b60cca4f229d785028c9ff
|
|
| MD5 |
9a6c3d0159ffac88ee6e128cc526578d
|
|
| BLAKE2b-256 |
16ea6975cb44e75c397b867eee78591732bff6911c6cb77d957d6849df7a4516
|
File details
Details for the file timesmash-0.2.26-py3-none-any.whl.
File metadata
- Download URL: timesmash-0.2.26-py3-none-any.whl
- Upload date:
- Size: 33.1 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.6.0 importlib_metadata/4.6.3 pkginfo/1.7.1 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.64.1 CPython/3.10.8
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
8db671cf02517f32b92f240798d072154e6b9d578a0bf10f5d98e551cb093346
|
|
| MD5 |
699fde4abeca7dc697490e489e082a3c
|
|
| BLAKE2b-256 |
72f372e33344cebba8df5570e543aae800cc0f9038f8c6383e791e5f697c5377
|