Use MCMC to do benchmark analysis

benchmcmc — Benchmark analysis with MCMC

Install

Install from PyPI with pip:

pip install benchmcmc

The package depends on pymc3 and matplotlib, which pip installs along with it.

Quickstart

To get started quickly, generate a synthetic benchmark file and analyse it:

benchmcmc --generate 69 11 10 131 10 9 --beta > bench.txt
benchmcmc bench.txt

Introduction

This package takes a series of benchmark data and analyses whether performance changed at some point in the series.

Suppose that you run your benchmarks on every commit (e.g. by looping over the output of git rev-list) and collect the following performance data (in requests per second, in seconds, or in some other measure):

13.64
12.82
11.69
15.12
12.30
18.46
13.51
14.33
13.84
12.77
... (180 rows omitted)
10.93
11.02
12.45
11.78
12.12
13.51
10.66
10.18
10.81
12.19

The data seem to be centered around 13+ε at the beginning and around 12+ε at the end. Visualized:

[Figure: scatterplot of performance over time]

There seems to be a slight drop in values before the 100th point, but it's not easy to determine exactly where the change occurred.

Suppose that you want to know whether the measurements at the start and at the end are likely to come from two different distributions and, if so, where the switchpoint was.

[Figure: traceplot]

Analysis

Running benchmcmc on the data produces the traceplot above, which suggests that the performance likely went from ~13.5 to ~12.25 around the 69th or 75th data point.

This helps you pin down when a performance change might have occurred.
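To make the switchpoint idea concrete, here is a minimal sketch in plain Python. It is not what benchmcmc does: the package samples with MCMC via pymc3, whereas this stand-in simply tries every split point and keeps the one where two separate means fit the data best (least squares). The names sse and best_switchpoint, and the toy series, are hypothetical and chosen only for illustration.

```python
def sse(xs):
    """Sum of squared deviations of xs from their own mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def best_switchpoint(values):
    """Return (k, mean_before, mean_after) for the best two-mean split.

    k is the index of the first value that belongs to the second segment.
    """
    # Try every split that leaves at least one value on each side.
    k = min(range(1, len(values)),
            key=lambda i: sse(values[:i]) + sse(values[i:]))
    before, after = values[:k], values[k:]
    return k, sum(before) / len(before), sum(after) / len(after)


# Toy series with a visible drop after the fourth value.
values = [13.5, 13.4, 13.6, 13.5, 12.2, 12.3, 12.25, 12.2]
k, mean_before, mean_after = best_switchpoint(values)
print(f"switchpoint at index {k}: {mean_before:.2f} -> {mean_after:.2f}")
```

On the toy series the split lands at index 4. Unlike an MCMC analysis, this gives a single point estimate and says nothing about how uncertain the switchpoint is, which is the information the traceplot conveys.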

Generating synthetic data

Run benchmcmc --generate to produce synthetic benchmark data.

$ benchmcmc --generate 100 15 3 100 14 3 [--beta] > benchmarkfile.txt

This generates 200 samples: 100 from N(mu=15, sigma=3) followed by 100 from N(mu=14, sigma=3).

With --beta you get somewhat more realistic performance data, with a lower bound of mu, which matters most for lower values of mu.
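For the plain normal case (without --beta), the kind of data described above can be sketched with the standard library alone. This is an illustration under assumptions, not the package's own generator: the function name generate_benchmark and the seed parameter are hypothetical, and the --beta variant is not reproduced because its exact distribution is not described here.

```python
import random


def generate_benchmark(n1, mu1, sigma1, n2, mu2, sigma2, seed=None):
    """Return n1 draws from N(mu1, sigma1) followed by n2 from N(mu2, sigma2)."""
    rng = random.Random(seed)  # a fixed seed makes the series reproducible
    before = [rng.gauss(mu1, sigma1) for _ in range(n1)]
    after = [rng.gauss(mu2, sigma2) for _ in range(n2)]
    return before + after


# Same shape as the command above: 100 samples around 15, then 100 around 14.
samples = generate_benchmark(100, 15, 3, 100, 14, 3, seed=0)
for value in samples[:5]:
    print(f"{value:.2f}")
```

Printing one value per line mirrors the layout of the data listing in the Introduction.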

Running a script on a history

Suppose that you want to time python script.py on every commit in your history, where script.py is a script in your Git tree.

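LOGFILE=/tmp/timescript
: > "$LOGFILE"    # start with an empty log file
for commit in $(git rev-list master)
do
    git checkout --quiet "$commit"
    # write "<short hash>," without a trailing newline
    printf "%s," "$(git rev-parse --short HEAD)" >> "$LOGFILE"
    # GNU time appends the elapsed wall-clock seconds and a newline
    /usr/bin/time -a -o "$LOGFILE" --format=%e python script.py
done
git checkout --quiet master    # return to the branch tip
tac "$LOGFILE"    # reverse the log so the oldest commit comes first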

When run in a repository, it outputs timing data in the format

commit,time

Here is an example of the output:

484fde8,0.04
58a1cdb,0.04
d26b797,0.04
81f4b9a,0.04
3ae1e11,0.04
7689ca2,0.04
8c76b29,0.04
43db50c,0.04
b34b146,0.04
4c56a54,0.04
9c08050,0.07
b22278d,0.07
7a9c111,0.07
065b6a5,0.07
6cc7cdd,0.07
ec7f042,0.07
b3ba887,0.08
a32ce81,0.07
9136914,0.07
b456714,0.07
504cf73,0.07
8002774,0.07
e1f5f9f,0.09
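The log above carries a commit hash in front of each time, while the benchmark listing in the Introduction has one number per line. Assuming that one-number-per-line layout is what you want to analyse (whether benchmcmc also accepts the commit,time layout directly is not stated here), a small hypothetical helper can strip the hashes. The name times_from_log is made up for this sketch.

```python
def times_from_log(lines):
    """Extract elapsed times from 'commit,time' lines, skipping blank lines."""
    times = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        _commit, elapsed = line.split(",", 1)
        times.append(float(elapsed))
    return times


# A few rows taken from the example output above.
log = """484fde8,0.04
4c56a54,0.04
9c08050,0.07
e1f5f9f,0.09
"""

times = times_from_log(log.splitlines())
for t in times:
    print(t)
```

Redirecting the printed values to a file gives a series in the same shape as the data shown in the Introduction.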

Release history

0.0.7: Dec 7, 2020
0.0.6: Oct 27, 2020
0.0.5: Oct 27, 2020
0.0.4: Oct 26, 2020 (this version)
0.0.3: Oct 25, 2020
0.0.2: Oct 25, 2020
0.0.1: Oct 24, 2020
0.0.0: Oct 24, 2020

Download files

Source distribution: benchmcmc-0.0.4.tar.gz (5.1 kB), uploaded Oct 26, 2020

Hashes for benchmcmc-0.0.4.tar.gz

Algorithm	Hash digest
SHA256	`822f14056a575e087b6572efa92b759f0a5673f5348ddaba2393fac12894ae56`
MD5	`3e338e00ed5c2d782bac6ab45b391830`
BLAKE2b-256	`7aaf158d9cdb16868c3e9fedbdbfa3e4dc00aa297e6153dab80351a485e7473b`