Artan
Model-parallel Bayesian filtering with Apache Spark.
Overview
This library provides support for running various Bayesian filters in parallel with Apache Spark. It uses the arbitrary stateful transformation capabilities of Spark DataFrames to define model-parallel Bayesian filters, so it is suitable for latent state estimation of many similar small-scale systems rather than a single large system.
Both Structured Streaming and batch processing modes are supported. Implemented filters extend SparkML Transformers, so you can transform a DataFrame of measurements into a DataFrame of estimated states with Kalman filters (extended, unscented, etc.) and various other filters as part of your SparkML Pipeline.
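To make the idea concrete, here is a minimal sketch of the recursion a Kalman filter applies, run independently per key. This is plain Python, not the Artan API: the function, key names, and measurement values below are made up for illustration, and real usage goes through Spark DataFrames as described above.

```python
def kalman_1d(measurements, q=1e-3, r=0.25, x0=0.0, p0=1.0):
    """Filter noisy scalar measurements of a roughly constant state.

    q: process noise variance, r: measurement noise variance,
    x0/p0: initial state estimate and its variance.
    """
    x, p = x0, p0
    states = []
    for z in measurements:
        p = p + q               # predict: uncertainty grows by process noise
        k = p / (p + r)         # Kalman gain
        x = x + k * (z - x)     # update: correct estimate toward measurement
        p = (1.0 - k) * p       # updated uncertainty shrinks
        states.append(x)
    return states

# Many similar small systems keyed by id -- each keeps its own filter state,
# which is the model-parallel pattern the library implements on Spark.
records = {
    "sensor_a": [1.1, 0.9, 1.2, 1.0],
    "sensor_b": [5.2, 4.8, 5.1, 4.9],
}
estimates = {key: kalman_1d(zs) for key, zs in records.items()}
```

Each key's estimates converge toward that key's underlying state, independently of the other keys; Spark distributes these per-key recursions across the cluster.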
Artan requires Scala 2.11, Spark 2.4+ and Python 3.6+.
Download
This project has been published to the Maven Central Repository. When submitting jobs on your cluster, you can use spark-submit with the --packages parameter to download all required dependencies, including Python packages.
spark-submit --packages='com.github.ozancicek:artan_2.11:0.2.0'
For SBT:
libraryDependencies += "com.github.ozancicek" %% "artan" % "0.2.0"
For Python:
pip install artan
Note that pip will only install the Python dependencies. To submit PySpark jobs, the --packages='com.github.ozancicek:artan_2.11:0.2.0' argument should be specified in order to download the necessary jars.
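Putting the two steps together, a PySpark job can be submitted with the same --packages argument; my_artan_job.py below is a hypothetical script name used for illustration.

```shell
# Install the Python package locally, then submit the job with the
# Scala artifact pulled in via --packages.
pip install artan
spark-submit --packages='com.github.ozancicek:artan_2.11:0.2.0' my_artan_job.py
```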
Docs and Examples
Visit docs and examples for all sample scripts.
Streaming examples