An abstraction layer for distributed computation

Fugue

Join Fugue-Project on Slack

Fugue is a pure abstraction layer that adapts to different computing frameworks such as Spark and Dask. It aims to unify the core concepts of distributed computing and to help you decouple your logic from any specific computing framework.

Installation

pip install fugue

Fugue has these extras:

For example, a common use case is:

pip install fugue[sql,spark]
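
After installing with extras, it can be useful to confirm that the optional dependencies are actually importable. The following is a minimal sketch using only the standard library. The module names "fugue" and "pyspark" are assumptions about what the core package and the spark extra provide, and missing_modules is a hypothetical helper, not part of Fugue.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# Assumed module names: "fugue" for the core package, "pyspark" for the spark extra.
print(missing_modules(["fugue", "pyspark"]))
```

An empty list means every listed module was found. Any name that is printed points to an extra that still needs to be installed.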

Docs and Tutorials

To read the complete static docs, click here

The best way to start is to go through the tutorials, which are available in an interactive notebook environment.

Run the tutorial using binder:

However, Binder runs slowly because its machines aren't powerful enough for a distributed framework such as Spark. Parallel executions can become sequential, so some of the performance comparison examples will not give representative numbers.

Run the tutorial using docker

Alternatively, you should get decent performance by running the tutorial's Docker image on your own machine:

docker run -p 8888:8888 fugueproject/tutorials:latest

Contributing Code

There are three steps to setting up a development environment:

  1. Create a virtual environment with your choice of environment manager
  2. Install the requirements
  3. Install the git hook scripts

Creating an environment

Below are examples of how to create and activate an environment with virtualenv and conda.

Using virtualenv

python3 -m venv venv
. venv/bin/activate

Using conda

conda create --name fugue-dev
conda activate fugue-dev
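
To confirm that the environment is active before installing anything, a small check like the one below can help. It is a sketch under two assumptions: a venv is detected by comparing sys.prefix with sys.base_prefix, and an activated conda environment is detected through the CONDA_DEFAULT_ENV variable that conda activate sets. The function name active_environment is hypothetical.

```python
import os
import sys


def active_environment(prefix=sys.prefix, base_prefix=sys.base_prefix, environ=os.environ):
    """Describe the active Python environment, or return None if none is detected."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base installation.
    if prefix != base_prefix:
        return f"venv at {prefix}"
    # conda activate exports the environment name in CONDA_DEFAULT_ENV.
    conda_env = environ.get("CONDA_DEFAULT_ENV")
    if conda_env:
        return f"conda env {conda_env}"
    return None


print(active_environment() or "no environment active")
```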

Installing requirements

The Fugue repo has a Makefile that can be used to install the requirements. It supports installation with both pip and conda. Instructions for installing make on Windows can be found later in this document.

Pip install requirements

make setupinpip

Conda install requirements

make setupinconda

Manually install requirements

Windows users who don't have the make command can use the package manager of their choice. For pip:

pip3 install -r requirements.txt

For Anaconda users, first install pip in the newly created environment. If pip install is used without installing pip, conda will use the system-wide pip.

conda install pip
pip install -r requirements.txt
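
One way to check whether the pip found on PATH belongs to the active environment rather than the system is to compare its location with the interpreter's prefix. This is a minimal sketch and pip_belongs_to is a hypothetical helper. It assumes the environment's pip executable lives somewhere under the environment directory.

```python
import shutil
import sys
from pathlib import Path


def pip_belongs_to(prefix, pip_path):
    """Return True if pip_path is located inside the environment rooted at prefix."""
    if pip_path is None:
        # No pip executable was found on PATH at all.
        return False
    return Path(prefix).resolve() in Path(pip_path).resolve().parents


# shutil.which returns the first pip on PATH, or None when there is none.
print(pip_belongs_to(sys.prefix, shutil.which("pip")))
```

If this prints False inside a freshly created conda environment, pip install would likely target a different Python installation than the one you intend.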

Notes for Windows Users

Windows users will need to download Microsoft C++ Build Tools, found here.

make is a GNU command that does not come with Windows. An installer can be downloaded here. After installing, add the bin directory to your PATH environment variable.

Installing git hook scripts

Fugue has pre-commit hooks that check whether code is ready to be committed. The make commands above install them. If you installed the requirements manually, install the git hook scripts with:

pre-commit install

Update History

0.4.3

  • Unified checkpoints and persist
  • Drop columns and na implementations in both programming and sql interfaces
  • Presort takes array as input
  • Fixed jinja template rendering issue
  • Fixed path format detection bug

0.4.2

  • Require pandas 1.0 because of parquet schema
  • Improved Fugue SQL extension parsing logic
  • Doc for contributors to set up their environment

0.4.1

  • Added set operations to programming interface: union, subtract, intersect
  • Added distinct to programming interface
  • Ensured partitioning follows SQL convention: groups with null keys are NOT removed
  • Switched join, union, subtract, intersect, distinct to QPD implementations, so they follow SQL convention
  • Set operations in Fugue SQL can directly operate on Fugue statements (e.g. TRANSFORM USING t1 UNION TRANSFORM USING t2)
  • Fixed bugs
  • Added onboarding document for contributors
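
The SQL convention referred to in the 0.4.1 notes can be illustrated with a plain SQL engine. The sketch below uses SQLite from the Python standard library, not Fugue itself, and the table and column names are made up for illustration. It shows that GROUP BY keeps rows with null keys as a group of their own and that UNION removes duplicate rows, treating nulls as equal to each other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k TEXT, v INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("a", 1), ("a", 2), (None, 3), (None, 4)],
)

# Rows whose key is NULL are not dropped; they form a single group.
groups = conn.execute(
    "SELECT k, SUM(v) FROM t GROUP BY k ORDER BY k"
).fetchall()
print(groups)  # [(None, 7), ('a', 3)]

# UNION (without ALL) deduplicates, and NULL keys count as duplicates of each other.
union_rows = conn.execute(
    "SELECT k FROM t UNION SELECT k FROM t ORDER BY k"
).fetchall()
print(union_rows)  # [(None,), ('a',)]

conn.close()
```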

<=0.4.0

  • Main features of Fugue core and Fugue SQL
  • Supported backends: Pandas, Spark and Dask
