A coverage-guided fuzzer for Python and Python extensions.

Atheris: A Coverage-Guided, Native Python Fuzzer

Atheris is a coverage-guided Python fuzzing engine. It can fuzz Python code as well as native extensions written for CPython. Atheris is based on libFuzzer. When fuzzing native code, Atheris can be used in combination with Address Sanitizer or Undefined Behavior Sanitizer to catch additional bugs.

Installation Instructions

Atheris supports Linux (32- and 64-bit) and Mac OS X, Python versions 3.6-3.10.

You can install prebuilt versions of Atheris with pip:

pip3 install atheris

These wheels come with a built-in libFuzzer, which is fine for fuzzing Python code. If you plan to fuzz native extensions, you may need to build from source to ensure the libFuzzer version in Atheris matches your Clang version.

Building from Source

Atheris relies on libFuzzer, which is distributed with Clang. If you have a sufficiently new version of clang on your path, installation from source is as simple as:

# Build latest release from source
pip3 install --no-binary atheris atheris
# Build development code from source
git clone https://github.com/google/atheris.git
cd atheris
pip3 install .

If you don't have clang installed or it's too old, you'll need to download and build the latest version of LLVM. Follow the instructions in Installing Against New LLVM below.

Mac

Apple Clang doesn't come with libFuzzer, so you'll need to install a new version of LLVM from head. Follow the instructions in Installing Against New LLVM below.

Installing Against New LLVM

# Building LLVM
git clone https://github.com/llvm/llvm-project.git
cd llvm-project
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS='clang;compiler-rt' -G "Unix Makefiles" ../llvm
make -j 10  # This step is very slow

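# Installing Atheris: CLANG_BIN points at the clang built above; substitute
# either install command from "Building from Source", e.g. the latest release:
CLANG_BIN="$(pwd)/bin/clang" pip3 install --no-binary atheris atheris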

Using Atheris

Example

import sys

import atheris

with atheris.instrument_imports():
  import some_library

def TestOneInput(data):
  some_library.parse(data)

atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()

When fuzzing Python, Atheris will report a failure if the Python code under test throws an uncaught exception.
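In practice, this means a harness should swallow only the exceptions that the code under test documents for malformed input, and let everything else propagate. The following is a minimal sketch of that pattern. parse_key_value is a hypothetical stand-in for your library. Importing atheris inside main() is a design choice for this sketch, not an Atheris requirement: it lets the harness logic be unit-tested without Atheris installed. Call main() from your script's entry point to start fuzzing.

```python
import sys


def parse_key_value(text):
    """Hypothetical code under test: parse "key=value" into a (key, value) pair."""
    key, sep, value = text.partition("=")
    if not sep or not key.strip():
        raise ValueError("expected key=value")
    return key.strip(), value.strip()


def TestOneInput(data):
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return  # not UTF-8: uninteresting for this parser, not a bug
    try:
        parse_key_value(text)
    except ValueError:
        return  # documented rejection of malformed input, not a bug
    # Any other exception propagates, and Atheris reports it as a failure.


def main():
    import atheris  # lazy import so the harness can be tested without atheris

    atheris.instrument_func(parse_key_value)  # instrumented in place
    atheris.Setup(sys.argv, atheris.instrument_func(TestOneInput))
    atheris.Fuzz()
```

Decoding and parsing are wrapped separately so that each `except` clause covers exactly one expected failure mode. A broad `except Exception` would hide the very bugs the fuzzer is looking for.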

Python coverage

Atheris collects Python coverage information by instrumenting bytecode. There are 3 options for adding this instrumentation to the bytecode:

  • You can instrument the libraries you import:

    with atheris.instrument_imports():
      import foo
      from bar import baz
    

    This will cause instrumentation to be added to foo and bar, as well as any libraries they import.

  • Or, you can instrument individual functions:

    @atheris.instrument_func
    def my_function(foo, bar):
      print("instrumented")
    
  • Or finally, you can instrument everything:

    atheris.instrument_all()
    

    Put this right before atheris.Setup(). This will find every Python function currently loaded in the interpreter, and instrument it. This might take a while.

Atheris can additionally instrument regular expression checks, e.g. re.search. To enable this feature, add atheris.enabled_hooks.add("RegEx") to your script before your code calls re.compile. Internally, this will import the re module and instrument the necessary functions. This is currently an experimental feature.
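The following is a minimal sketch of where the hook fits. The pattern and the planted RuntimeError are hypothetical, standing in for a bug hidden behind a regex check. The pattern is compiled lazily on first use, so enabling the hook in main() still happens before re.compile runs. As in the other sketches here, atheris is imported inside main() only so the logic can be exercised without it.

```python
import re
import sys

_PATTERN = None  # compiled on first use, i.e. after the hook is enabled


def get_pattern():
    """Compile the (hypothetical) pattern lazily and cache it."""
    global _PATTERN
    if _PATTERN is None:
        _PATTERN = re.compile(r"^id=(\d{4})-([a-z]+)$")
    return _PATTERN


def TestOneInput(data):
    text = data.decode("utf-8", errors="ignore")
    match = get_pattern().search(text)
    if match and match.group(2) == "boom":
        # Planted failure, standing in for a bug guarded by a regex check.
        raise RuntimeError("Boom")


def main():
    import atheris  # lazy import so the logic above can run without atheris

    atheris.enabled_hooks.add("RegEx")  # must come before re.compile is called
    atheris.Setup(sys.argv, atheris.instrument_func(TestOneInput))
    atheris.Fuzz()
```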

Why am I getting "No interesting inputs were found"?

You might see this error:

ERROR: no interesting inputs were found. Is the code instrumented for coverage? Exiting.

You'll get this error if the first 2 calls to TestOneInput didn't produce any coverage events. Even if you have instrumented some Python code, this can happen if the instrumentation isn't reached in those first 2 calls (for example, because your TestOneInput is nontrivial and returns before reaching any instrumented code). You can resolve this by adding an atheris.instrument_func decorator to TestOneInput, using atheris.instrument_all(), or moving your TestOneInput function into an instrumented module.

Visualizing Python code coverage

Examining which lines are executed is helpful for understanding the effectiveness of your fuzzer. Atheris is compatible with coverage.py: you can run your fuzzer using the coverage.py module as you would for any other Python program. Here's an example:

python3 -m coverage run your_fuzzer.py -atheris_runs=10000  # Times to run
python3 -m coverage html
(cd htmlcov && python3 -m http.server 8000)

Coverage reports are only generated when your fuzzer exits gracefully. This happens if:

  • you specify -atheris_runs=<number>, and that many runs have elapsed.
  • your fuzzer exits by Python exception.
  • your fuzzer exits by sys.exit().

No coverage report will be generated if your fuzzer exits due to a crash in native code, or due to libFuzzer's -runs flag (use -atheris_runs). If your fuzzer exits via other methods, such as SIGINT (Ctrl+C), Atheris will attempt to generate a report but may be unable to (depending on your code). For consistent reports, we recommend always using -atheris_runs=<number>.

If you'd like to examine coverage when running with your corpus, you can do that with the following command:

python3 -m coverage run your_fuzzer.py corpus_dir/* -atheris_runs=$(( 1 + $(ls corpus_dir | wc -l) ))

This will cause Atheris to run on each file in corpus_dir, then exit. Note: Atheris uses empty data as the first input even if there is no empty file in corpus_dir, which is why the run count is one more than the number of files. Importantly, if you leave off the -atheris_runs argument, no coverage report will be generated.
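The run-count arithmetic can also be scripted. The following is a minimal sketch using two hypothetical helpers, corpus_files and coverage_command, which are not part of Atheris or coverage.py. It assumes you want to replay only regular, non-hidden files. Like `ls` and `*`, it skips hidden names, but unlike them it also skips subdirectories.

```python
import os
import sys


def corpus_files(corpus_dir):
    """Regular, non-hidden files in corpus_dir, sorted by path."""
    return sorted(
        os.path.join(corpus_dir, name)
        for name in os.listdir(corpus_dir)
        if not name.startswith(".")
        and os.path.isfile(os.path.join(corpus_dir, name))
    )


def coverage_command(fuzzer, corpus_dir):
    """Build a `coverage run` command that replays each corpus file, then exits."""
    files = corpus_files(corpus_dir)
    # +1 because Atheris runs an empty input first, even if no corpus file is empty.
    runs = len(files) + 1
    return [sys.executable, "-m", "coverage", "run", fuzzer, *files,
            f"-atheris_runs={runs}"]
```

For example, pass `coverage_command("your_fuzzer.py", "corpus_dir")` to `subprocess.run`, then run `python3 -m coverage html` as before. Using `sys.executable` runs coverage under the same interpreter as the calling script, which is the one where Atheris is installed.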

Using coverage.py will significantly slow down your fuzzer, so only use it for visualizing coverage; don't use it all the time.

Fuzzing Native Extensions

In order for fuzzing native extensions to be effective, your native extensions must be instrumented. See Native Extension Fuzzing for instructions.

Structure-aware Fuzzing

Atheris is based on a coverage-guided mutation-based fuzzer (LibFuzzer). This has the advantage of not requiring any grammar definition for generating inputs, making its setup easier. The disadvantage is that it will be harder for the fuzzer to generate inputs for code that parses complex data types. Often the inputs will be rejected early, resulting in low coverage.

Atheris supports custom mutators (as offered by LibFuzzer) to produce grammar-aware inputs.

Example (Atheris-equivalent of the example in the LibFuzzer docs):

import sys
import zlib

import atheris

@atheris.instrument_func
def TestOneInput(data):
  try:
    decompressed = zlib.decompress(data)
  except zlib.error:
    return

  if len(decompressed) < 2:
    return

  try:
    if decompressed.decode() == 'FU':
      raise RuntimeError('Boom')
  except UnicodeDecodeError:
    pass

To reach the RuntimeError crash, the fuzzer needs to be able to produce inputs that are valid compressed data and satisfy the checks after decompression. It is very unlikely that Atheris will be able to produce such inputs: mutations on the input data will most probably result in invalid data that will fail at decompression-time.

To overcome this issue, you can define a custom mutator function (equivalent to LLVMFuzzerCustomMutator). The mutator below always returns valid compressed data: it decompresses the input (falling back to b'Hi' if that fails), mutates the decompressed bytes, and recompresses them. To enable Atheris to use it, pass the custom mutator function to atheris.Setup.

def CustomMutator(data, max_size, seed):
  try:
    decompressed = zlib.decompress(data)
  except zlib.error:
    decompressed = b'Hi'
  else:
    decompressed = atheris.Mutate(decompressed, len(decompressed))
  return zlib.compress(decompressed)

atheris.Setup(sys.argv, TestOneInput, custom_mutator=CustomMutator)
atheris.Fuzz()

As seen in the example, the custom mutator may request Atheris to mutate data using atheris.Mutate() (this is equivalent to LLVMFuzzerMutate).

You can experiment with custom_mutator_example.py and see that without the mutator Atheris would not be able to find the crash, while with the mutator this is achieved in a matter of seconds.

$ python3 example_fuzzers/custom_mutator_example.py --no_mutator
[...]
#2      INITED cov: 2 ft: 2 corp: 1/1b exec/s: 0 rss: 37Mb
#524288 pulse  cov: 2 ft: 2 corp: 1/1b lim: 4096 exec/s: 262144 rss: 37Mb
#1048576        pulse  cov: 2 ft: 2 corp: 1/1b lim: 4096 exec/s: 349525 rss: 37Mb
#2097152        pulse  cov: 2 ft: 2 corp: 1/1b lim: 4096 exec/s: 299593 rss: 37Mb
#4194304        pulse  cov: 2 ft: 2 corp: 1/1b lim: 4096 exec/s: 279620 rss: 37Mb
[...]

$ python3 example_fuzzers/custom_mutator_example.py
[...]
INFO: found LLVMFuzzerCustomMutator (0x7f9c989fb0d0). Disabling -len_control by default.
[...]
#2      INITED cov: 2 ft: 2 corp: 1/1b exec/s: 0 rss: 37Mb
#3      NEW    cov: 4 ft: 4 corp: 2/11b lim: 4096 exec/s: 0 rss: 37Mb L: 10/10 MS: 1 Custom-
#12     NEW    cov: 5 ft: 5 corp: 3/21b lim: 4096 exec/s: 0 rss: 37Mb L: 10/10 MS: 7 Custom-CrossOver-Custom-CrossOver-Custom-ChangeBit-Custom-
 === Uncaught Python exception: ===
RuntimeError: Boom
Traceback (most recent call last):
  File "example_fuzzers/custom_mutator_example.py", line 62, in TestOneInput
    raise RuntimeError('Boom')
[...]

Custom crossover functions (equivalent to LLVMFuzzerCustomCrossOver) are also supported. You can pass the custom crossover function to the invocation of atheris.Setup. See its usage in custom_crossover_fuzz_test.py.
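The following is a minimal sketch of a crossover for the zlib example above. The callback signature is an assumption: this sketch takes `CustomCrossover(data1, data2, max_out_size, seed)`, by analogy with the custom mutator's `(data, max_size, seed)`, so check custom_crossover_fuzz_test.py for the exact form. The function splices a prefix of one decompressed parent onto a suffix of the other and recompresses the result, so offspring stay valid zlib data.

```python
import random
import zlib


def CustomCrossover(data1, data2, max_out_size, seed):
    """Splice two zlib-compressed parents into one valid zlib-compressed child.

    The (data1, data2, max_out_size, seed) signature is an assumption.
    """
    rng = random.Random(seed)  # deterministic for a given seed
    try:
        first = zlib.decompress(data1)
        second = zlib.decompress(data2)
    except zlib.error:
        # If either parent is not valid zlib data, start from a known-valid
        # value, as the custom mutator does.
        first = second = b"Hi"

    # Prefix of one parent + suffix of the other (either part may be empty).
    child = first[: rng.randint(0, len(first))] + second[rng.randint(0, len(second)):]

    # Halve the child until its compressed form fits the size limit.
    out = zlib.compress(child)
    while len(out) > max_out_size and child:
        child = child[: len(child) // 2]
        out = zlib.compress(child)

    # Even empty data compresses to a few bytes; give up on tiny limits.
    return out if len(out) <= max_out_size else b""
```

Assuming the keyword is `custom_crossover`, you would pass it as `atheris.Setup(sys.argv, TestOneInput, custom_crossover=CustomCrossover)`. Seeding `random.Random` with the provided seed keeps each crossover reproducible.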

Structure-aware Fuzzing with Protocol Buffers

libprotobuf-mutator has bindings to use it together with Atheris to perform structure-aware fuzzing using protocol buffers.

See the documentation for atheris_libprotobuf_mutator.

Integration with OSS-Fuzz

Atheris is fully supported by OSS-Fuzz, Google's continuous fuzzing service for open source projects. For integrating with OSS-Fuzz, please see https://google.github.io/oss-fuzz/getting-started/new-project-guide/python-lang.

API

The atheris module provides three key functions: instrument_imports(), Setup() and Fuzz().

In your source file, import all libraries you wish to fuzz inside a with atheris.instrument_imports():-block, like this:

# library_a will not get instrumented
import library_a

with atheris.instrument_imports():
    # library_b will get instrumented
    import library_b

Generally, it's best to import atheris first and then import all other libraries inside of a with atheris.instrument_imports() block.

Next, define a fuzzer entry point function and pass it to atheris.Setup() along with the fuzzer's arguments (typically sys.argv). Finally, call atheris.Fuzz() to start fuzzing. You must call atheris.Setup() before atheris.Fuzz().

instrument_imports(include=[], exclude=[])

  • include: A list of fully-qualified module names that shall be instrumented.
  • exclude: A list of fully-qualified module names that shall NOT be instrumented.

This should be used together with a with-statement. All modules imported in said statement will be instrumented. However, because Python imports each module only once, this cannot be used to instrument any previously imported module, including modules required by Atheris. To add coverage to those modules, use instrument_all() instead.

A full list of modules that have already been imported (and therefore cannot be instrumented by instrument_imports) can be retrieved as follows:

import sys
import atheris
print(sys.modules.keys())

instrument_func(func)

  • func: The function to instrument.

This will instrument the specified Python function and then return func. This is typically used as a decorator, but it can also be called directly on an existing function. Note that func is instrumented in place, so this will affect all call sites of the function.

This cannot be called on a bound method - call it on the unbound version.

instrument_all()

This will scan over all objects in the interpreter and call instrument_func on every Python function. This works even on core Python interpreter functions, something which instrument_imports cannot do.

This function is experimental.

Setup(args, test_one_input, internal_libfuzzer=None)

  • args: A list of strings: the process arguments to pass to the fuzzer, typically sys.argv. This argument list may be modified in-place, to remove arguments consumed by the fuzzer. See the LibFuzzer docs for a list of such options.
  • test_one_input: your fuzzer's entry point. Must take a single bytes argument; it is invoked repeatedly with fuzzer-generated inputs.
  • internal_libfuzzer: Indicates whether libFuzzer will be provided by Atheris or by an external library (see native_extension_fuzzing.md). If unspecified, Atheris will determine this automatically. When fuzzing pure Python, the internal libFuzzer is appropriate, so leave this unspecified or set it to True.

Fuzz()

This starts the fuzzer. You must have called Setup() before calling this function. This function does not return.

In many cases Setup() and Fuzz() could be combined into a single function, but they are separated because you may want the fuzzer to consume the command-line arguments it handles before passing any remaining arguments to another setup function.

FuzzedDataProvider

Often, a bytes object is not a convenient input for the code being fuzzed. Similar to libFuzzer, Atheris provides a FuzzedDataProvider to translate these bytes into other input forms.

You can construct the FuzzedDataProvider with:

fdp = atheris.FuzzedDataProvider(input_bytes)

The FuzzedDataProvider then supports the following functions:

def ConsumeBytes(count: int)

Consume count bytes.

def ConsumeUnicode(count: int)

Consume Unicode characters. The result might contain surrogate pair characters, which according to the specification are invalid in this situation. However, many core software tools (e.g. Windows file paths) support them, so other software often needs to handle them too.

def ConsumeUnicodeNoSurrogates(count: int)

Consume unicode characters, but never generate surrogate pair characters.

def ConsumeString(count: int)

Alias for ConsumeBytes in Python 2, or ConsumeUnicode in Python 3.

def ConsumeInt(bytes: int)

Consume a signed integer of the given size in bytes (interpreted in two's complement notation).

def ConsumeUInt(bytes: int)

Consume an unsigned integer of the given size in bytes.

def ConsumeIntInRange(min: int, max: int)

Consume an integer in the range [min, max].

def ConsumeIntList(count: int, bytes: int)

Consume a list of count integers of size bytes.

def ConsumeIntListInRange(count: int, min: int, max: int)

Consume a list of count integers in the range [min, max].

def ConsumeFloat()

Consume an arbitrary floating-point value. Might produce weird values like NaN and Inf.

def ConsumeRegularFloat()

Consume an arbitrary numeric floating-point value; never produces a special type like NaN or Inf.

def ConsumeProbability()

Consume a floating-point value in the range [0, 1].

def ConsumeFloatInRange(min: float, max: float)

Consume a floating-point value in the range [min, max].

def ConsumeFloatList(count: int)

Consume a list of count arbitrary floating-point values. Might produce weird values like NaN and Inf.

def ConsumeRegularFloatList(count: int)

Consume a list of count arbitrary numeric floating-point values; never produces special types like NaN or Inf.

def ConsumeProbabilityList(count: int)

Consume a list of count floats in the range [0, 1].

def ConsumeFloatListInRange(count: int, min: float, max: float)

Consume a list of count floats in the range [min, max].

def PickValueInList(l: list)

Given a list, pick a random value from it.

def ConsumeBool()

Consume either True or False.
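As an illustration, the following is a minimal sketch that combines several of the functions above to turn raw bytes into structured arguments. METHODS, build_request and describe_request are hypothetical stand-ins for your own code. build_request relies only on the FuzzedDataProvider methods listed above, so it can be unit-tested with a stand-in provider object. For the same reason, atheris is imported inside main().

```python
import sys

METHODS = ["GET", "POST", "PUT", "DELETE"]


def build_request(fdp):
    """Derive hypothetical request parameters from a FuzzedDataProvider."""
    return {
        "method": fdp.PickValueInList(METHODS),
        "retries": fdp.ConsumeIntInRange(0, 5),
        "timeout": fdp.ConsumeProbability() * 30.0,  # scale [0, 1] to [0, 30]
        "verbose": fdp.ConsumeBool(),
        "path": fdp.ConsumeUnicodeNoSurrogates(64),  # no surrogate characters
    }


def describe_request(request):
    """Hypothetical code under test: render a request as a log line."""
    line = "{method} /{path} retries={retries} timeout={timeout:.1f}s".format(**request)
    return line + (" (verbose)" if request["verbose"] else "")


def main():
    import atheris  # lazy import: the helpers above stay testable without atheris

    def TestOneInput(data):
        describe_request(build_request(atheris.FuzzedDataProvider(data)))

    atheris.instrument_func(describe_request)  # instrumented in place
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()
```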

Download files

Download the file for your platform.

Source Distribution

atheris-2.1.1.tar.gz (278.0 kB)

Tags: Source

Built Distributions

atheris-2.1.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (29.9 MB)

Tags: CPython 3.10, manylinux: glibc 2.17+ x86-64

atheris-2.1.1-cp310-cp310-macosx_10_9_universal2.whl (7.7 MB)

Tags: CPython 3.10, macOS 10.9+ universal2 (ARM64, x86-64)

atheris-2.1.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (29.9 MB)

Tags: CPython 3.9, manylinux: glibc 2.17+ x86-64

atheris-2.1.1-cp39-cp39-macosx_12_0_x86_64.whl (7.2 MB)

Tags: CPython 3.9, macOS 12.0+ x86-64

atheris-2.1.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (29.9 MB)

Tags: CPython 3.8, manylinux: glibc 2.17+ x86-64

atheris-2.1.1-cp38-cp38-macosx_12_0_x86_64.whl (7.2 MB)

Tags: CPython 3.8, macOS 12.0+ x86-64

atheris-2.1.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (30.2 MB)

Tags: CPython 3.7m, manylinux: glibc 2.17+ x86-64

atheris-2.1.1-cp37-cp37m-macosx_12_0_x86_64.whl (7.2 MB)

Tags: CPython 3.7m, macOS 12.0+ x86-64

atheris-2.1.1-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (30.2 MB)

Tags: CPython 3.6m, manylinux: glibc 2.17+ x86-64

File details

Details for the file atheris-2.1.1.tar.gz.

File metadata

  • Download URL: atheris-2.1.1.tar.gz
  • Upload date:
  • Size: 278.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.7

File hashes

Hashes for atheris-2.1.1.tar.gz
Algorithm Hash digest
SHA256 c9050aff68e32f3843e0125c651c39dc2cdfbc8adfd1528036c1857ae23a955d
MD5 16a67729fbde59672497331dd2480815
BLAKE2b-256 6ba122f68df030fc8053dd2990afd881dacdc68547a572cd2e0e47e68ee46585

File details

Details for the file atheris-2.1.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for atheris-2.1.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 97d5111b9de4d581b8a51cac3ebcacd81b2d2a69661f99b9316194ea6a01f6c7
MD5 c6930b9b1318daabaed820576c5089b6
BLAKE2b-256 11f6ec2b9c72e2ffbcecc5b2b27e54ecc13783ed87e5cb2435b7b6c797001547

File details

Details for the file atheris-2.1.1-cp310-cp310-macosx_10_9_universal2.whl.

File hashes

Hashes for atheris-2.1.1-cp310-cp310-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 131cce292147ae30ec5c6388fc7c6d432f66efe652f8ea1bb94ccffa80551e11
MD5 06de6583d0da48d4b0203bbff4150ee3
BLAKE2b-256 f79e2c3f9253cfe977cf3da7085becdc543e61994b141388c3dd26e9bb554dbd

File details

Details for the file atheris-2.1.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for atheris-2.1.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 27936e8602075faecf9b63a650c512461ece612b2f06b4882bf7e114618da6bd
MD5 62cb19554b628bdf1f6ae1d9e96d0e48
BLAKE2b-256 d93ca81ac889fe564f889685fef93eea61e22c1fd21f3c50858631862f7bd92b

File details

Details for the file atheris-2.1.1-cp39-cp39-macosx_12_0_x86_64.whl.

File hashes

Hashes for atheris-2.1.1-cp39-cp39-macosx_12_0_x86_64.whl
Algorithm Hash digest
SHA256 1bcd00732b0ddae3beec963b24a82b88126095bfcc40fd4045509dc1b272351d
MD5 4dd68fabac38de796f7bae95f53d4f04
BLAKE2b-256 d73acf2c40ad9a9d1926987cae9aa93b03d01b31cd7c1fe80561848d0dd09a5f

File details

Details for the file atheris-2.1.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for atheris-2.1.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 687a74bf22467240882187f7ff1ae75fd4f5967f6564a8d969fc5c5aa7f4ec88
MD5 4af84b34c1d81b2f99210c63e6812a07
BLAKE2b-256 0ebeb5b9710d68820b55f65c93396a6bf4bef9194fa772215da63aed960a2057

File details

Details for the file atheris-2.1.1-cp38-cp38-macosx_12_0_x86_64.whl.

File hashes

Hashes for atheris-2.1.1-cp38-cp38-macosx_12_0_x86_64.whl
Algorithm Hash digest
SHA256 2f68a5dafb12b15de1da1bf72fef18a996b7889dfa8a7b08dac4051ffd2970f1
MD5 4b3223207b046939a7c434a47c2063ab
BLAKE2b-256 3fc43d8002af0c6a07fea89eb1c8ecc712767264990bcb0a921cd7dcebafe8e4

File details

Details for the file atheris-2.1.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for atheris-2.1.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 fcb7f7e73aabaa2c7c01b204acf5f518f18c23eaa36e8c0d3ac325c1271290d6
MD5 e5a6318708caf80a918eaefacffd512e
BLAKE2b-256 7c7da808a201941055a3353a1d2f9a3661f553e125797ea31182b6f6daf4c35b

File details

Details for the file atheris-2.1.1-cp37-cp37m-macosx_12_0_x86_64.whl.

File hashes

Hashes for atheris-2.1.1-cp37-cp37m-macosx_12_0_x86_64.whl
Algorithm Hash digest
SHA256 a70c3473d9e1381c19c29c005e91806578634816a49c94a14f2ea3c2adb64d3a
MD5 aa69315980118f7a78c5a8160da99915
BLAKE2b-256 c9a9d0e6a5291b22eef6b8ce8d3eb21abb354a6525d002f83b36c70889659a6a

File details

Details for the file atheris-2.1.1-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for atheris-2.1.1-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 eb8a9abf29acce06a2ac881ef10604bc2b363b549f1f9526f408376361c6b20e
MD5 42849bf27a810c600ed4b674f5c915fe
BLAKE2b-256 5f2d3c0e4dc670fa5d5f6a32f70426966edaaaa004444a65cbf7857bf2a1d571
