Python-graphblas

Python library for GraphBLAS: high-performance sparse linear algebra for scalable graph analytics.

Install

Install the latest version of Python-graphblas via conda:

$ conda install -c conda-forge python-graphblas

or pip:

$ pip install python-graphblas

This will also install the SuiteSparse:GraphBLAS compiled C library.

Description

Python-graphblas currently works with SuiteSparse:GraphBLAS, but the goal is to make it work with all implementations of the GraphBLAS spec.

This library follows the GraphBLAS C API specification as closely as possible while making the improvements that Python syntax allows. Because the spec always passes in the output object to be written to, this library does the same, which is very different from how Python normally operates. Users familiar with other Python data libraries (NumPy, pandas, etc.) may find it strange that calls do not create a new object each time.

At the highest level, the goal is to separate output, mask, and accumulator on the left side of the assignment operator = and put the computation on the right side. Unfortunately, that approach doesn't always work very well with how Python handles assignment, so instead we (ab)use the left-shift << notation to give the same flavor of assignment. This opens up all kinds of nice possibilities.

This is an example of how the mapping works:

// C call
GrB_mxm(M, mask, GrB_PLUS_INT64, GrB_MIN_PLUS_SEMIRING_INT64, A, B, NULL);
# Python call
M(mask.V, accum=binary.plus) << A.mxm(B, semiring.min_plus)

The expression on the right, A.mxm(B, semiring.min_plus), creates a delayed object that does no computation. Once it is used in the << expression with M, the whole statement is translated into the equivalent GraphBLAS call.
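
To make the semantics of that call concrete, here is a minimal plain-Python sketch of what a masked, accumulated min-plus multiply computes. It does not use python-graphblas: sparse matrices are modeled as dicts keyed by (row, col), and the helper names min_plus_mxm and masked_accum_update are hypothetical, introduced only for this illustration.

```python
# Plain-Python model (not the library) of:
#   M(mask.V, accum=binary.plus) << A.mxm(B, semiring.min_plus)
# A sparse matrix is a dict {(row, col): value}; missing keys are empty.

def min_plus_mxm(A, B):
    """Return T with T[i, j] = min over k of (A[i, k] + B[k, j])."""
    T = {}
    for (i, k), a in A.items():
        for (k2, j), b in B.items():
            if k == k2:
                candidate = a + b
                if (i, j) not in T or candidate < T[i, j]:
                    T[i, j] = candidate
    return T

def masked_accum_update(M, mask, T):
    """Add T into M in place, but only where the mask value is truthy."""
    for key, t in T.items():
        if mask.get(key):            # value mask: empty or falsy blocks the write
            if key in M:
                M[key] = M[key] + t  # accumulate with the existing entry
            else:
                M[key] = t
    return M

A = {(0, 0): 1, (0, 1): 5}
B = {(0, 1): 2, (1, 1): 1}
T = min_plus_mxm(A, B)               # {(0, 1): 3}, i.e. min(1 + 2, 5 + 1)

M = {(0, 1): 10, (1, 1): 7}
mask = {(0, 1): True, (1, 1): False}
masked_accum_update(M, mask, T)      # M is now {(0, 1): 13, (1, 1): 7}
```

Entries of M outside the mask, or with no matching entry in T, are left untouched because no replace flag is modeled here.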

Delayed objects also have a .new() method which can be used to force computation and return a new object. This is convenient and often appropriate, but will create many unnecessary objects if used in a loop. It also loses the ability to perform accumulation with existing results. For best performance, follow the standard GraphBLAS approach of (1) creating the output object outside the loop and (2) reusing that object within each iteration, even if it doesn't feel very Pythonic.
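
The difference between updating an existing object and calling .new() can be sketched with a toy model. The Vec and Delayed classes and the scale method below are hypothetical stand-ins written for this illustration, not the library's implementation; they only mimic the pattern of a delayed expression that is evaluated either into an existing output or into a fresh object.

```python
class Delayed:
    """Holds a computation without running it."""

    def __init__(self, compute):
        self._compute = compute

    def new(self):
        # Force the computation and allocate a fresh result object.
        return Vec(self._compute())


class Vec:
    def __init__(self, values):
        self.values = list(values)

    def scale(self, factor):
        # No work happens here; it is deferred until << or .new().
        return Delayed(lambda: [x * factor for x in self.values])

    def update(self, delayed):
        # Evaluate the delayed expression into this existing object.
        self.values[:] = delayed._compute()

    def __lshift__(self, delayed):
        self.update(delayed)


v = Vec([1, 2, 3])

out = Vec([0, 0, 0])          # (1) create the output once, outside the loop
out_id = id(out)
for factor in (2, 3):
    out << v.scale(factor)    # (2) reuse the same output in each iteration

fresh = v.scale(2).new()      # convenient, but allocates a new object per call
```

After the loop, out is still the same object and holds the result of the last iteration, while each .new() call would have produced a separate object.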

Descriptor flags are set on the elements they apply to, which keeps the logic close to what it affects. Here is the same call with descriptor bits set. ttcsr indicates transposing the first and second matrices, complementing the structure of the mask, and replacing the output.

// C call
GrB_mxm(M, mask, GrB_PLUS_INT64, GrB_MIN_PLUS_SEMIRING_INT64, A, B, desc.ttcsr);
# Python call
M(~mask.S, accum=binary.plus, replace=True) << A.T.mxm(B.T, semiring.min_plus)

The objects receiving the flag operations (A.T, ~mask, etc) are also delayed objects. They hold on to the state but do no computation, allowing the correct descriptor bits to be set in a single GraphBLAS call.
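
As a rough illustration of what the mask-related flags mean, the following plain-Python sketch models a masked write with optional structural, complement, and replace behavior. It leaves out the accumulator and the transposes to isolate the mask logic. The function masked_write and its keyword names are hypothetical and exist only in this sketch; matrices are again dicts keyed by (row, col).

```python
def masked_write(C, T, mask, *, structural=False, complement=False, replace=False):
    """Return the new contents of C after writing T through the mask."""
    result = {}
    for key in set(C) | set(T):
        if structural:
            allowed = key in mask              # mask.S: presence is enough
        else:
            allowed = bool(mask.get(key))      # mask.V: value must be truthy
        if complement:
            allowed = not allowed              # ~mask
        if allowed:
            if key in T:
                result[key] = T[key]           # take the computed value
            # an entry of C missing from T is dropped when no accum is used
        elif not replace and key in C:
            result[key] = C[key]               # outside the mask: keep old value
        # with replace=True, old values outside the mask are cleared
    return result


C = {(0, 0): 1, (0, 1): 2, (1, 1): 3}
T = {(0, 1): 20, (1, 0): 30}
mask = {(0, 0): 0, (0, 1): 1}

plain = masked_write(C, T, mask)
comp_struct = masked_write(C, T, mask, structural=True, complement=True)
comp_struct_replace = masked_write(
    C, T, mask, structural=True, complement=True, replace=True
)
```

With the value mask only position (0, 1) is writable. With the complemented structural mask, only positions absent from the mask are writable, and replace=True additionally clears everything else.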

If no mask or accumulator is used, the call looks like this:

M << A.mxm(B, semiring.min_plus)

The use of << to indicate updating is actually just syntactic sugar for a real .update() method. The above expression could be written as:

M.update(A.mxm(B, semiring.min_plus))

Operations

M(mask, accum) << A.mxm(B, semiring)        # mxm
w(mask, accum) << A.mxv(v, semiring)        # mxv
w(mask, accum) << v.vxm(B, semiring)        # vxm
M(mask, accum) << A.ewise_add(B, binaryop)  # eWiseAdd
M(mask, accum) << A.ewise_mult(B, binaryop) # eWiseMult
M(mask, accum) << A.kronecker(B, binaryop)  # kronecker
M(mask, accum) << A.T                       # transpose
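
The practical difference between eWiseAdd and eWiseMult is which positions appear in the result: the union of the two patterns versus their intersection. A minimal plain-Python sketch, using dicts keyed by (row, col) rather than the library (the function names mirror the methods above but are defined here only for illustration):

```python
import operator


def ewise_add(A, B, op):
    """Union of patterns: op is applied only where both inputs have a value."""
    out = {}
    for key in A.keys() | B.keys():
        if key in A and key in B:
            out[key] = op(A[key], B[key])
        elif key in A:
            out[key] = A[key]          # value passes through unchanged
        else:
            out[key] = B[key]
    return out


def ewise_mult(A, B, op):
    """Intersection of patterns: positions present in only one input vanish."""
    return {key: op(A[key], B[key]) for key in A.keys() & B.keys()}


A = {(0, 0): 1, (0, 1): 2}
B = {(0, 1): 10, (1, 1): 20}

added = ewise_add(A, B, operator.add)
multiplied = ewise_mult(A, B, operator.mul)
```

Note that the names refer to the pattern of the result, not to the operator: ewise_add can be used with any binary operator, including multiplication.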

Extract

M(mask, accum) << A[rows, cols]             # rows and cols are a list or a slice
w(mask, accum) << A[rows, col_index]        # extract column
w(mask, accum) << A[row_index, cols]        # extract row
s = A[row_index, col_index].value           # extract single element

Assign

M(mask, accum)[rows, cols] << A             # rows and cols are a list or a slice
M(mask, accum)[rows, col_index] << v        # assign column
M(mask, accum)[row_index, cols] << v        # assign row
M(mask, accum)[rows, cols] << s             # assign scalar to many elements
M[row_index, col_index] << s                # assign scalar to single element
                                            # (mask and accum not allowed)
del M[row_index, col_index]                 # remove single element

Apply

M(mask, accum) << A.apply(unaryop)
M(mask, accum) << A.apply(binaryop, left=s)   # bind-first
M(mask, accum) << A.apply(binaryop, right=s)  # bind-second
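
Bind-first and bind-second differ only in which argument of the binary operator the scalar occupies, which matters for non-commutative operators. A small plain-Python sketch under that reading (the helper names are hypothetical, and matrices are modeled as dicts keyed by (row, col)):

```python
import operator


def apply_bind_first(A, op, left):
    """Compute op(left, x) for every stored value x."""
    return {key: op(left, x) for key, x in A.items()}


def apply_bind_second(A, op, right):
    """Compute op(x, right) for every stored value x."""
    return {key: op(x, right) for key, x in A.items()}


A = {(0, 0): 10, (1, 2): 4}

first = apply_bind_first(A, operator.sub, 1)    # 1 - x
second = apply_bind_second(A, operator.sub, 1)  # x - 1
```

In both cases only stored values are touched, so the result has the same pattern as A.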

Reduce

v(mask, accum) << A.reduce_rowwise(op)      # reduce row-wise
v(mask, accum) << A.reduce_columnwise(op)   # reduce column-wise
s(accum) << A.reduce_scalar(op)
s(accum) << v.reduce(op)
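
For intuition, a row-wise reduction collapses each row's stored values with the given operator, and a scalar reduction collapses all stored values. The sketch below models this in plain Python with dicts; the function names echo the methods above but are local to this example, and rows with no stored values simply produce no entry in the result.

```python
import functools
import operator


def reduce_rowwise(A, op):
    """Combine the stored values of each row into a vector {row: value}."""
    out = {}
    for (i, _j), value in A.items():
        out[i] = op(out[i], value) if i in out else value
    return out


def reduce_scalar(A, op):
    """Combine all stored values (assumes A has at least one entry)."""
    return functools.reduce(op, A.values())


A = {(0, 0): 1, (0, 2): 4, (2, 1): 5}

row_sums = reduce_rowwise(A, operator.add)
row_maxes = reduce_rowwise(A, max)
total = reduce_scalar(A, operator.add)
```

Row 1 has no stored values, so it is absent from both row_sums and row_maxes.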

Creating new Vectors / Matrices

A = Matrix.new(dtype, num_rows, num_cols)   # new_type
B = A.dup()                                 # dup
A = Matrix.from_coo([row_indices], [col_indices], [values])  # build

New from delayed

Delayed objects can be used to create a new object using the .new() method:

C = A.mxm(B, semiring).new()

Properties

size = v.size                               # size
nrows = M.nrows                             # nrows
ncols = M.ncols                             # ncols
nvals = M.nvals                             # nvals
rindices, cindices, vals = M.to_coo()       # extractTuples
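
The from_coo and to_coo calls above exchange data as three parallel sequences (coordinate format): row indices, column indices, and values. A plain-Python sketch of that layout follows. The functions here are local models using a dict keyed by (row, col), not the library's implementation, and as a simplification a duplicate coordinate just overwrites the earlier value.

```python
def from_coo(rows, cols, values):
    """Build a {(row, col): value} matrix from three parallel sequences."""
    if not (len(rows) == len(cols) == len(values)):
        raise ValueError("rows, cols, and values must have the same length")
    return {(r, c): v for r, c, v in zip(rows, cols, values)}


def to_coo(M):
    """Return (rows, cols, values); sorted here to get a deterministic order."""
    keys = sorted(M)
    rows = [r for r, _c in keys]
    cols = [c for _r, c in keys]
    values = [M[key] for key in keys]
    return rows, cols, values


M = from_coo([2, 0, 0], [1, 0, 2], [5.0, 1.0, 4.0])
rindices, cindices, vals = to_coo(M)
nvals = len(M)                      # number of stored values
```

The k-th element of each returned list describes the same stored entry, so the three lists always have length nvals.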

Initialization

There is a mechanism to initialize graphblas with a context prior to use. This allows for setting the backend to use as well as the blocking/non-blocking mode. If the context is not initialized, a default initialization will be performed automatically.

import graphblas as gb
# Context initialization must happen before any other imports
gb.init('suitesparse', blocking=True)

# Now we can import other items from graphblas
from graphblas import binary, semiring
from graphblas import Matrix, Vector, Scalar

Performant User Defined Functions

Python-graphblas requires numba, which compiles user-defined Python functions to native code for use in GraphBLAS.

Example customized UnaryOp:

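from graphblas import Vector, unary

def force_odd_func(x):
    # Bump even values up to the next odd number; leave odd values alone
    if x % 2 == 0:
        return x + 1
    return x

# Register the function as a UnaryOp, available as unary.force_odd
unary.register_new('force_odd', force_odd_func)

v = Vector.from_coo([0, 1, 3], [1, 2, 3])
w = v.apply(unary.force_odd).new()
w  # indices=[0, 1, 3], values=[1, 3, 3]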

Similar methods exist for BinaryOp, Monoid, and Semiring.

Import/Export connectors to the Python ecosystem

graphblas.io contains functions for converting to and from:

import graphblas as gb

# numpy arrays
# 1-D array becomes Vector, 2-D array becomes Matrix
A = gb.io.from_numpy(m)
m = gb.io.to_numpy(A)

# scipy.sparse matrices
A = gb.io.from_scipy_sparse(m)
m = gb.io.to_scipy_sparse(A, format='csr')

# networkx graphs
A = gb.io.from_networkx(g)
g = gb.io.to_networkx(A)
