VDJ assignment and antibody sequence annotation. Scalable from a single sequence to billions of sequences.

abstar

install

pip install abstar

use

To run abstar on a single FASTA or FASTQ file:
abstar -i <input-file> -o <output-directory> -t <temp-directory>

To iteratively run abstar on all files in an input directory:
abstar -i <input-directory> -o <output-directory> -t <temp-directory>

To run abstar using the included test data as input:
abstar -o <output-directory> -t <temp-directory> --use-test-data

The test data file contains 1,000 sequences, but one of them is not a valid antibody recombination, so only 999 sequences should be processed successfully.
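As a quick sanity check on a test-data run, the number of output records can be counted and compared against the expected 999. The sketch below is only an illustration: it assumes the output directory contains text files with one JSON record per line, which this README does not state, and the function names are hypothetical rather than part of abstar.

```python
import json
from pathlib import Path


def count_json_records(path):
    """Count the lines in `path` that parse as JSON objects.

    Assumption: output files hold one JSON record per line.
    """
    count = 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip anything that is not valid JSON
            if isinstance(record, dict):
                count += 1
    return count


def count_output_records(output_dir):
    """Sum record counts over every regular file in `output_dir`."""
    return sum(
        count_json_records(p) for p in Path(output_dir).iterdir() if p.is_file()
    )
```

If the assumption holds, `count_output_records("<output-directory>")` should return 999 after a run with `--use-test-data`.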

When using BaseSpace as the input data source, you can optionally provide all of the required directories:
abstar -i <input-directory> -o <output-directory> -t <temp-directory> -b

Or you can simply provide a single project directory, and all required directories will be created in the project directory:
abstar -p <project-directory> -b
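When abstar is one step in a larger workflow, the invocations above can be assembled and launched from Python. This is a minimal sketch that uses only the flags shown in this section (-i, -o, -t, -p, -b, --use-test-data); the helper names `build_abstar_command` and `run_abstar` are hypothetical and are not part of abstar.

```python
import subprocess
from typing import List, Optional


def build_abstar_command(
    input_path: Optional[str] = None,
    output_dir: Optional[str] = None,
    temp_dir: Optional[str] = None,
    project_dir: Optional[str] = None,
    basespace: bool = False,
    use_test_data: bool = False,
) -> List[str]:
    """Assemble an abstar command line from the flags described above."""
    cmd = ["abstar"]
    if project_dir is not None:
        # -p: a single project directory that will hold all required directories
        cmd += ["-p", project_dir]
    else:
        if input_path is not None:
            cmd += ["-i", input_path]  # a single file or a directory of files
        if output_dir is not None:
            cmd += ["-o", output_dir]
        if temp_dir is not None:
            cmd += ["-t", temp_dir]
    if use_test_data:
        cmd.append("--use-test-data")
    if basespace:
        cmd.append("-b")
    return cmd


def run_abstar(cmd: List[str]) -> int:
    """Run the command and return its exit code (abstar must be on PATH)."""
    return subprocess.run(cmd, check=False).returncode
```

For example, `build_abstar_command(input_path="seqs.fasta", output_dir="out", temp_dir="tmp")` produces the argument list for the single-file case, which can then be passed to `run_abstar`.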

additional options

-l LOG_LOCATION, --log LOG_LOCATION Change the log directory location. Default is the parent directory of <output-directory>.

-m, --merge Input directory should contain paired FASTQ (or gzipped FASTQ) files. Paired files will be merged with PANDAseq prior to processing with abstar. Note that when using the BaseSpace option (-b, --basespace), this option is implied.

-b, --basespace Download a sequencing run from BaseSpace, which is Illumina's cloud storage environment. Since Illumina sequencers produce paired-end reads, --merge is implied.

-u N, --uaid N Sequences contain a unique antibody ID (UAID, or molecular barcode) of length N. The UAID will be parsed from the beginning of each input sequence and added to the JSON output. Negative values result in the UAID being parsed from the end of the sequence.

-s SPECIES, --species SPECIES Select the species from which the input sequences are derived. Supported options are 'human', 'mouse', and 'macaque'. Default is 'human'.

-c, --cluster Runs abstar in distributed mode on a Celery cluster.

-h, --help Prints detailed information about all runtime options.

-D, --debug Enables much more verbose logging.
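The sign convention of the -u/--uaid option can be illustrated with plain string slicing. The function below is a hypothetical sketch of the behavior described above, not abstar's own code, and it makes no claim about whether abstar also trims the UAID from the sequence before annotation.

```python
def parse_uaid(sequence: str, n: int) -> str:
    """Return the UAID of length |n| from a sequence.

    Positive n reads from the beginning of the sequence,
    negative n reads from the end, and 0 means no UAID.
    """
    if n > 0:
        return sequence[:n]
    if n < 0:
        return sequence[n:]
    return ""
```

With the sequence `ACGTACGTTTGG`, a value of 4 selects the first four bases (`ACGT`) and a value of -4 selects the last four (`TTGG`).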

api

Most core abstar functions are available through a public API, making it easier to run abstar as a component of integrated analysis pipelines. See the abstar documentation for more detail about the API.

helper scripts

A few helper scripts are included with abstar:
batch_mongoimport automates the import of multiple JSON output files into a MongoDB database.
build_abstar_germline_db creates abstar germline databases from IMGT-gapped FASTA files of V, D and J gene segments.
make_basespace_credfile makes a credentials file for BaseSpace, which is required if downloading sequences from BaseSpace with abstar. Developer credentials are required; Illumina's BaseSpace developer documentation explains how to obtain them.

testing

To run the test suite, clone or download the repository and run pytest ./ from the top-level directory.

requirements

Python 3.8+
abutils
biopython
celery
nwalign3
pymongo
pytest
scikit-bio

All of the above dependencies can be installed with pip, and will be installed automatically when installing abstar with pip.
If you're new to Python, a great way to get started is to install the Anaconda Python distribution, which includes pip as well as a ton of useful scientific Python packages.

sequence merging requires PANDAseq
batch_mongoimport requires MongoDB
BaseSpace downloading requires the BaseSpace Python SDK
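Before a first run, it can help to confirm that the requirements above are actually present. The sketch below checks the Python version, the importable module names, and the external executables using only the standard library. The import names `Bio` (biopython) and `skbio` (scikit-bio) and the executable names `pandaseq` and `mongoimport` are the usual ones for those projects; the helper functions are hypothetical, and the BaseSpace Python SDK is left out because its import name is not given here.

```python
import importlib.util
import shutil
import sys

# Import names for the Python dependencies listed above
PYTHON_MODULES = ["abutils", "Bio", "celery", "nwalign3", "pymongo", "pytest", "skbio"]
# Executables needed only for sequence merging and batch_mongoimport
EXTERNAL_TOOLS = ["pandaseq", "mongoimport"]


def python_is_supported(version_info=sys.version_info):
    """True if the interpreter version is at least 3.8."""
    return tuple(version_info[:2]) >= (3, 8)


def missing_modules(names):
    """Return the module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_executables(names):
    """Return the executable names that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]
```

Calling `missing_modules(PYTHON_MODULES)` and `missing_executables(EXTERNAL_TOOLS)` returns empty lists when everything is installed; a missing external tool only matters if the corresponding feature is used.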
