Gene cluster prediction with Conditional Random Fields.

Hi, I'm GECCO!

🦎 Overview

GECCO (Gene Cluster prediction with Conditional Random Fields) is a fast and scalable method for identifying putative novel Biosynthetic Gene Clusters (BGCs) in genomic and metagenomic data using Conditional Random Fields (CRFs).

🔧 Installing GECCO

GECCO is implemented in Python and supports Python 3.6 and later. It requires additional libraries, which can be installed directly from PyPI, the Python Package Index.

Use pip to install GECCO on your machine:

$ pip install gecco-tool

If you'd rather use Conda, a package is available in the bioconda channel. You can install it with:

$ conda install -c bioconda gecco

This will install GECCO, its dependencies, and the data needed to run predictions. This requires around 100MB of data to be downloaded, so it could take some time depending on your Internet connection. Once done, you will have a gecco command available in your $PATH.
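To confirm from a script that the installation put the gecco command on your $PATH, a minimal check with the standard library's shutil.which could look like the sketch below. The helper name find_command is hypothetical and not part of GECCO.

```python
import shutil
from typing import Optional


def find_command(name: str = "gecco") -> Optional[str]:
    """Return the path of `name` if it is found on $PATH, otherwise None."""
    # Hypothetical helper: shutil.which searches $PATH for an executable.
    return shutil.which(name)


if __name__ == "__main__":
    path = find_command()
    print(f"gecco found at {path}" if path else "gecco is not on $PATH")
```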

Note that GECCO uses HMMER3, which can only run on PowerPC and recent x86-64 machines running a POSIX operating system. Therefore, Linux and OSX are supported platforms, but GECCO will not be able to run on Windows.
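A rough pre-flight check for these platform requirements can be written with the standard os and platform modules. This is a sketch under assumptions: the set of accepted machine strings is our own guess at how x86-64 and 64-bit PowerPC are usually reported, and it does not check CPU features, so it cannot tell whether an x86-64 processor is recent enough for HMMER3.

```python
import os
import platform

# Assumption: common machine names reported for x86-64 and 64-bit PowerPC.
SUPPORTED_MACHINES = {"x86_64", "amd64", "ppc64", "ppc64le"}


def is_supported_platform(os_name: str, machine: str) -> bool:
    """Return True for a POSIX OS on an x86-64 or PowerPC machine.

    CPU instruction-set features are NOT checked by this sketch.
    """
    return os_name == "posix" and machine.lower() in SUPPORTED_MACHINES


if __name__ == "__main__":
    ok = is_supported_platform(os.name, platform.machine())
    print("platform looks supported" if ok else "platform is likely unsupported")
```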

🧬 Running GECCO

Once GECCO is installed, you can run it from the terminal by giving it a FASTA or GenBank file with the genomic sequence you want to analyze, as well as an output directory:

$ gecco run --genome some_genome.fna -o some_output_dir
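If you need to run GECCO from a Python pipeline rather than interactively, one option is to wrap the same command line with subprocess. The functions build_gecco_command and run_gecco below are hypothetical helpers, not part of GECCO's API; they only use the --genome and -o options shown above and pass any further options through unchanged.

```python
import subprocess
from typing import List, Sequence


def build_gecco_command(genome: str, output_dir: str,
                        extra_args: Sequence[str] = ()) -> List[str]:
    """Assemble the argument list for `gecco run` on one genome file.

    `extra_args` is appended as-is, e.g. ["--jobs", "4"].
    """
    return ["gecco", "run", "--genome", genome, "-o", output_dir, *extra_args]


def run_gecco(genome: str, output_dir: str,
              extra_args: Sequence[str] = ()) -> None:
    """Run GECCO; raises subprocess.CalledProcessError on a non-zero exit status."""
    subprocess.run(build_gecco_command(genome, output_dir, extra_args), check=True)
```

For example, run_gecco("some_genome.fna", "some_output_dir") runs the same command as the shell example above. Passing the arguments as a list avoids shell quoting issues, and check=True makes a failed run raise an exception instead of passing silently.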

Additional parameters of interest are:

  • --jobs, which controls the number of threads that will be spawned by GECCO whenever a step can be parallelized. The default, 0, will autodetect the number of CPUs on the machine using multiprocessing.cpu_count.
  • --cds, controlling the minimum number of consecutive genes a BGC region must have to be detected by GECCO (default is 3).
  • --threshold, controlling the minimum probability for a gene to be considered part of a BGC region. Using a lower number will increase the number (and possibly length) of predictions, but reduce accuracy.
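To make the interplay of --jobs, --threshold and --cds more concrete, here is a simplified illustration. resolve_jobs mirrors the documented --jobs default, and call_regions shows how a probability cutoff and a minimum run length together turn per-gene probabilities into candidate regions. This is not GECCO's actual region-extraction code: the function names, the >= comparison, and the example probabilities are assumptions made for illustration only.

```python
import multiprocessing
from typing import List, Optional, Tuple


def resolve_jobs(jobs: int) -> int:
    """Mirror the documented --jobs default: 0 means one thread per detected CPU."""
    return multiprocessing.cpu_count() if jobs == 0 else jobs


def call_regions(probabilities: List[float], threshold: float,
                 cds: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) gene indices, end exclusive, for every run of at
    least `cds` consecutive genes whose probability is >= `threshold`."""
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, p in enumerate(probabilities):
        if p >= threshold:
            if start is None:
                start = i  # a new run of high-probability genes begins here
        elif start is not None:
            if i - start >= cds:  # the run just ended; keep it if long enough
                regions.append((start, i))
            start = None
    # A run may extend to the last gene and never be closed inside the loop.
    if start is not None and len(probabilities) - start >= cds:
        regions.append((start, len(probabilities)))
    return regions


if __name__ == "__main__":
    probs = [0.1, 0.9, 0.95, 0.8, 0.2, 0.9, 0.9, 0.1]
    print(call_regions(probs, threshold=0.5, cds=3))   # [(1, 4)]
    print(call_regions(probs, threshold=0.15, cds=3))  # [(1, 7)]
```

With these toy probabilities and a threshold of 0.5, only the genes at indices 1 to 3 form a long enough run; the two genes at indices 5 and 6 fall short of cds=3. Lowering the threshold to 0.15 lets the 0.2 gene bridge both runs into one longer region, which is the trade-off described for --threshold.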

🔖 Reference

GECCO can be cited using the following preprint:

Accurate de novo identification of biosynthetic gene clusters with GECCO. Laura M Carroll, Martin Larralde, Jonas Simon Fleck, Ruby Ponnudurai, Alessio Milanese, Elisa Cappio Barazzone, Georg Zeller. bioRxiv 2021.05.03.442509; doi:10.1101/2021.05.03.442509

💭 Feedback

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug report, please include as much information as you can about the issue, and try to reproduce the bug with a simple, minimal example.

🏗️ Contributing

Contributions are more than welcome! See CONTRIBUTING.md for more details.

⚖️ License

This software is provided under the GNU General Public License v3.0 or later. GECCO is developed by the Zeller Team at the European Molecular Biology Laboratory in Heidelberg.

Download files

Source Distribution

gecco-tool-0.8.2.tar.gz (847.0 kB)

Built Distribution

gecco_tool-0.8.2-py2.py3-none-any.whl (40.5 MB)

File details

Details for the file gecco-tool-0.8.2.tar.gz.

File metadata

  • Download URL: gecco-tool-0.8.2.tar.gz
  • Size: 847.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.6.2 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.0 CPython/3.9.6

File hashes

Hashes for gecco-tool-0.8.2.tar.gz
Algorithm Hash digest
SHA256 2089696c3dcdc368708159b49d73ed26189cefbf823cc87d4932386a3c2421b6
MD5 526b654944aa06d3ff9eafed612b3030
BLAKE2b-256 b088f5e1ebe136e7e6de6d46cf6d3f76f52d2ae5a3060e96bd3ba74f0584d4e3
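
Before installing a downloaded archive, you can check it against the SHA256 digest listed above. The following is a minimal standard-library sketch using hashlib; the helper names are hypothetical. pip can also enforce digests itself through its hash-checking mode (--require-hashes).

```python
import hashlib
from typing import Iterable


def sha256_of_chunks(chunks: Iterable[bytes]) -> str:
    """Return the SHA256 hex digest of a sequence of byte chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    with open(path, "rb") as handle:
        return sha256_of_chunks(iter(lambda: handle.read(chunk_size), b""))


def verify_sha256(path: str, expected: str) -> bool:
    """Compare a file's digest with a published hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected.strip().lower()


# Usage, with the SHA256 digest published above for the source distribution:
# verify_sha256("gecco-tool-0.8.2.tar.gz",
#               "2089696c3dcdc368708159b49d73ed26189cefbf823cc87d4932386a3c2421b6")
```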

File details

Details for the file gecco_tool-0.8.2-py2.py3-none-any.whl.

File metadata

  • Download URL: gecco_tool-0.8.2-py2.py3-none-any.whl
  • Size: 40.5 MB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.6.2 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.0 CPython/3.9.6

File hashes

Hashes for gecco_tool-0.8.2-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 9e3276cadebdfd95958d8285eb4590a6a8cff2d70c37e4ca04c322347848caed
MD5 bc59b12e912dc674db633afb84c0321b
BLAKE2b-256 5bc09ddb54650763fbe6515ea56595428574afd2d847cab292fc79a9bfdcba07
