
An Illumina Sample Sheet parsing utility.


.. raw:: html

    <h1 align="center">

sample-sheet

.. raw:: html

    </h1>

.. raw:: html

    <p align="center">

A Python 3.6 library for handling Illumina sample sheets

.. raw:: html

    </p>

.. raw:: html

    <p align="center">

Installation · Tutorial · Command Line Utility · Contributing

.. raw:: html

    </p>


The intent of this library is to obviate the need to use Illumina’s
proprietary `Experiment
Manager <https://support.illumina.com/sequencing/sequencing_software/experiment_manager.html>`__
and to enable interactive reading, *de novo* creation, and writing of
Sample Sheets for all Illumina platforms. As of ``v0.5.0`` this library
supports the entire Illumina specification for a sample sheet as defined
in `this
manual <https://www.illumina.com/content/dam/illumina-marketing/documents/products/technotes/sequencing-sheet-format-specifications-technical-note-970-2017-004.pdf>`__.

.. raw:: html

    <h3 align="center">

Installation

.. raw:: html

    </h3>

::

    ❯ pip install sample_sheet

.. raw:: html

    <h3 align="center">

Tutorial

.. raw:: html

    </h3>

To demonstrate the features of this library, we use a test file available in
this repository at the relative location
`sample-sheet/tests/resources/paired-end-single-index.csv <tests/resources/paired-end-single-index.csv>`__.

.. code:: python

    from sample_sheet import SampleSheet

    host = 'https://raw.githubusercontent.com/'
    url = host + 'clintval/sample-sheet/master/tests/resources/paired-end-single-index.csv'

    sample_sheet = SampleSheet(url)
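
An Illumina sample sheet is a CSV file split into bracketed sections such as
``[Header]``, ``[Reads]``, ``[Settings]``, and ``[Data]``. To illustrate what
parsing involves, here is a minimal standard-library sketch that groups rows
by section. It is not this library's implementation; the ``split_sections``
helper and the inline excerpt are hypothetical.

.. code:: python

    import csv
    import io

    # A small excerpt in the sample-sheet layout (illustrative data only).
    RAW = """\
    [Header],,
    IEMFileVersion,4,
    Investigator Name,jdoe,
    ,,
    [Reads],,
    151,,
    151,,
    ,,
    [Data],,
    Sample_ID,Sample_Name,index
    1823A,1823A-tissue,GAATCTGA
    """


    def split_sections(text):
        """Group CSV rows under the most recent ``[Section]`` marker."""
        sections = {}
        current = None
        for row in csv.reader(io.StringIO(text)):
            if not any(cell.strip() for cell in row):
                continue  # Blank separator line such as ",,".
            first = row[0].strip()
            if first.startswith("[") and first.endswith("]"):
                current = first[1:-1]
                sections[current] = []
            elif current is not None:
                if current != "Data":
                    # Key/value sections are padded with empty cells; drop them.
                    while row and row[-1] == "":
                        row.pop()
                sections[current].append(row)
        return sections


    sections = split_sections(RAW)
    print(list(sections))  # ['Header', 'Reads', 'Data']

Key/value sections lose their trailing padding cells, while ``[Data]`` rows are
kept whole so that an empty column (for example, an unused ``index2``) stays
aligned with its header.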

The metadata of the sample sheet can be accessed with the ``Header``,
``Reads``, and ``Settings`` attributes:

.. code:: python

    >>> sample_sheet.Header.Assay
    'SureSelectXT'

    >>> sample_sheet.Reads
    [151, 151]

    >>> sample_sheet.is_paired_end
    True

    >>> sample_sheet.Settings.BarcodeMismatches
    '2'
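
In the file, ``[Header]`` and ``[Settings]`` hold key/value rows, while
``[Reads]`` lists one read length per row. The following hedged sketch turns
such rows into values like those above: setting values stay strings (``'2'``),
while read lengths become integers. The helper names and input rows are
hypothetical, and treating exactly two ``[Reads]`` entries as paired-end is an
assumption about how ``is_paired_end`` is decided.

.. code:: python

    # Rows as they might look after splitting the file into sections
    # (hypothetical input; trailing padding cells already removed).
    header_rows = [["IEMFileVersion", "4"], ["Assay", "SureSelectXT"]]
    reads_rows = [["151"], ["151"]]
    settings_rows = [["BarcodeMismatches", "2"]]


    def key_values(rows):
        """Map the first cell of each row to the second, keeping strings."""
        return {row[0]: (row[1] if len(row) > 1 else "") for row in rows}


    def read_lengths(rows):
        """Each [Reads] row holds one read length; convert it to an int."""
        return [int(row[0]) for row in rows]


    header = key_values(header_rows)
    settings = key_values(settings_rows)
    reads = read_lengths(reads_rows)
    # Assumption: two template reads are taken to mean a paired-end run.
    is_paired_end = len(reads) == 2

    print(header["Assay"])                      # SureSelectXT
    print(reads)                                # [151, 151]
    print(is_paired_end)                        # True
    print(repr(settings["BarcodeMismatches"]))  # '2'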

The samples can be accessed directly or *via* iteration:

.. code:: python

    >>> sample_sheet.samples
    [Sample({"Sample_ID": "1823A", "Sample_Name": "1823A-tissue", "index": "GAATCTGA"}),
     Sample({"Sample_ID": "1823B", "Sample_Name": "1823B-tissue", "index": "AGCAGGAA"}),
     Sample({"Sample_ID": "1824A", "Sample_Name": "1824A-tissue", "index": "GAGCTGAA"}),
     Sample({"Sample_ID": "1825A", "Sample_Name": "1825A-tissue", "index": "AAACATCG"}),
     Sample({"Sample_ID": "1826A", "Sample_Name": "1826A-tissue", "index": "GAGTTAGC"}),
     Sample({"Sample_ID": "1826B", "Sample_Name": "1823A-tissue", "index": "CGAACTTA"}),
     Sample({"Sample_ID": "1829A", "Sample_Name": "1823B-tissue", "index": "GATAGACA"})]

    >>> for sample in sample_sheet:
    ...     print(sample)
    ...     break
    1823A

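Each row of the ``[Data]`` section describes one sample, with column names
taken from the section's first row. A minimal sketch of that mapping using the
standard library's ``csv.DictReader``, assuming the ``[Data]`` body has
already been isolated (the inline rows reuse the first two samples above; this
is an illustration, not how the library builds ``Sample`` objects):

.. code:: python

    import csv
    import io

    # Body of a [Data] section: column names first, then one row per sample.
    DATA = """\
    Sample_ID,Sample_Name,index
    1823A,1823A-tissue,GAATCTGA
    1823B,1823B-tissue,AGCAGGAA
    """

    # DictReader takes field names from the first row, so each sample is a mapping.
    samples = list(csv.DictReader(io.StringIO(DATA)))

    for sample in samples:
        print(sample["Sample_ID"])  # 1823A
        break  # Only look at the first sample, as in the tutorial.

    print([sample["index"] for sample in samples])  # ['GAATCTGA', 'AGCAGGAA']
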
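If a column labeled ``Read_Structure`` is provided *per* sample, then each
sample's read structure is parsed into a ``ReadStructure`` object that exposes
properties such as its total cycle count and its tokens.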

.. code:: python

    >>> first_sample, *_ = sample_sheet.samples
    >>> first_sample.Read_Structure
    ReadStructure(structure="151T8B151T")

    >>> first_sample.Read_Structure.total_cycles
    310

    >>> first_sample.Read_Structure.tokens
    ['151T', '8B', '151T']
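
A read structure such as ``151T8B151T`` is a sequence of tokens, each a cycle
count followed by a segment type. The sketch below reproduces the ``tokens``
and ``total_cycles`` values shown above with a regular expression. It is an
independent illustration, not the library's ``ReadStructure`` class. The helper
names are hypothetical, and the accepted segment letters (``T`` template,
``B`` sample barcode, ``M`` molecular barcode, ``S`` skip) are an assumption
based on the common read-structure convention.

.. code:: python

    import re

    # One token is a cycle count followed by a single segment-type letter.
    # Assumption: the letter set T/B/M/S is not taken from this library.
    TOKEN = re.compile(r"(\d+)([TBMS])")


    def tokenize(structure):
        """Split a read structure such as '151T8B151T' into its tokens."""
        tokens = [match.group(0) for match in TOKEN.finditer(structure)]
        # Reject input containing characters that no token accounts for.
        if "".join(tokens) != structure:
            raise ValueError(f"Not a valid read structure: {structure!r}")
        return tokens


    def total_cycles(structure):
        """Sum the cycle counts (the digits before each letter) of all tokens."""
        return sum(int(token[:-1]) for token in tokenize(structure))


    print(tokenize("151T8B151T"))      # ['151T', '8B', '151T']
    print(total_cycles("151T8B151T"))  # 310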

Sample Sheet Creation
^^^^^^^^^^^^^^^^^^^^^

Sample sheets can be created *de novo* and written to a file-like
object. The following snippet shows how to add attributes to mandatory
sections, add an optional user-defined section, and add samples before
writing the result to standard output.

.. code:: python

    import sys

    from sample_sheet import Sample, SampleSheet

    sample_sheet = SampleSheet()

    # Fill out the [Header] section of the sample sheet.
    sample_sheet.Header.IEM4FileVersion = 4

    # To use a key that contains whitespace, call the `add_attr` method and
    # pass the whitespace-containing key as the alternate `name`.
    sample_sheet.Header.add_attr(attr='Investigator_Name', value='jdoe', name='Investigator Name')

    # An optional [Manifests] section can be added.
    sample_sheet.add_section('Manifests')

    # Fill out the [Settings] section of the sample sheet.
    sample_sheet.Settings.CreateFastqForIndexReads = 1
    sample_sheet.Settings.BarcodeMismatches = 2

    # Describe a paired-end run with 151 template cycles per read.
    sample_sheet.Reads = [151, 151]

    # Create a first single-indexed sample with both a name and an ID.
    sample = Sample(dict(Sample_ID='1823A', Sample_Name='1823A-tissue', index='ACGT'))

    sample_sheet.add_sample(sample)

    # Write the sample sheet to a file-like object, here standard output.
    sample_sheet.write(sys.stdout)

.. code:: python

    """
    [Header],,
    IEM4FileVersion,4,
    Investigator Name,jdoe,
    ,,
    [Reads],,
    151,,
    151,,
    ,,
    [Manifests],,
    ,,
    [Settings],,
    CreateFastqForIndexReads,1,
    BarcodeMismatches,2,
    ,,
    [Data],,
    Sample_ID,Sample_Name,index
    1823A,1823A-tissue,ACGT
    """

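The output above pads every line with trailing commas so it has as many fields
as the ``[Data]`` header row. The following standard-library sketch reproduces
that layout (without the optional ``[Manifests]`` section). It is an
illustration under that padding assumption, not how ``SampleSheet.write`` is
implemented, and ``write_sample_sheet`` is a hypothetical name.

.. code:: python

    import csv
    import io


    def write_sample_sheet(header, reads, settings, columns, samples):
        """Serialize sections, padding every line with empty cells so that it
        has as many fields as the [Data] header row."""
        width = len(columns)
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")

        def row(*cells):
            # Pad to the [Data] width; an empty call yields a blank ",," line.
            writer.writerow(list(cells) + [""] * (width - len(cells)))

        row("[Header]")
        for key, value in header.items():
            row(key, value)
        row()
        row("[Reads]")
        for length in reads:
            row(length)
        row()
        row("[Settings]")
        for key, value in settings.items():
            row(key, value)
        row()
        row("[Data]")
        writer.writerow(columns)
        for sample in samples:
            # Missing fields are written as empty cells.
            writer.writerow([sample.get(column, "") for column in columns])
        return out.getvalue()


    text = write_sample_sheet(
        header={"IEM4FileVersion": 4, "Investigator Name": "jdoe"},
        reads=[151, 151],
        settings={"CreateFastqForIndexReads": 1, "BarcodeMismatches": 2},
        columns=["Sample_ID", "Sample_Name", "index"],
        samples=[{"Sample_ID": "1823A", "Sample_Name": "1823A-tissue", "index": "ACGT"}],
    )
    print(text, end="")

With these inputs the printed text matches the output above, minus the
``[Manifests]`` section and its trailing blank line.
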
IPython Integration
^^^^^^^^^^^^^^^^^^^

A quick summary of the samples can be displayed as ASCII Markdown or, when
run in an IPython environment, as HTML-rendered Markdown:

.. code:: python

    >>> sample_sheet.experimental_design
    """
    | Sample_ID   | Sample_Name   | Library_ID   | Description      |
    |:------------|:--------------|:-------------|:-----------------|
    | 1823A       | 1823A-tissue  | 2017-01-20   | 0.5x treatment   |
    | 1823B       | 1823B-tissue  | 2017-01-20   | 0.5x treatment   |
    | 1824A       | 1824A-tissue  | 2017-01-20   | 1.0x treatment   |
    | 1825A       | 1825A-tissue  | 2017-01-20   | 10.0x treatment  |
    | 1826A       | 1826A-tissue  | 2017-01-20   | 100.0x treatment |
    | 1826B       | 1823A-tissue  | 2017-01-17   | 0.5x treatment   |
    | 1829A       | 1823B-tissue  | 2017-01-17   | 0.5x treatment   |
    """

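The ASCII form is an ordinary Markdown pipe table. Here is a minimal sketch of
rendering one from per-sample records, padding each column to its widest cell.
``markdown_table`` is a hypothetical helper, not the library's
``experimental_design`` implementation, and it does not cover the HTML
rendering used inside IPython.

.. code:: python

    def markdown_table(records, columns):
        """Render dicts as a left-aligned Markdown pipe table."""
        # Each column is as wide as its longest cell, header included.
        widths = {
            column: max([len(column)] + [len(str(record.get(column, ""))) for record in records])
            for column in columns
        }

        def line(cells):
            padded = (str(cell).ljust(widths[column]) for column, cell in zip(columns, cells))
            return "| " + " | ".join(padded) + " |"

        # ':---' marks a left-aligned column; it spans the cell plus its padding.
        rule = "|" + "|".join(":" + "-" * (widths[column] + 1) for column in columns) + "|"
        body = [line([record.get(column, "") for column in columns]) for record in records]
        return "\n".join([line(columns), rule, *body])


    records = [
        {"Sample_ID": "1823A", "Description": "0.5x treatment"},
        {"Sample_ID": "1826A", "Description": "100.0x treatment"},
    ]
    table = markdown_table(records, ["Sample_ID", "Description"])
    print(table)
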
.. raw:: html

    <h3 align="center">

Command Line Utility

.. raw:: html

    </h3>

The ``sample-sheet summary`` command prints a tabular summary of a sample sheet:

.. code:: bash

    ❯ sample-sheet summary paired-end-single-index.csv
    ┌Header─────────────┬─────────────────────────────────┐
    │ IEM1FileVersion   │ 4                               │
    │ Investigator_Name │ jdoe                            │
    │ Experiment_Name   │ exp001                          │
    │ Date              │ 11/16/2017                      │
    │ Workflow          │ SureSelectXT                    │
    │ Application       │ NextSeq FASTQ Only              │
    │ Assay             │ SureSelectXT                    │
    │ Description       │ A description of this flow cell │
    │ Chemistry         │ Default                         │
    └───────────────────┴─────────────────────────────────┘
    ┌Settings──────────────────┬──────────┐
    │ CreateFastqForIndexReads │ 1        │
    │ BarcodeMismatches        │ 2        │
    │ Reads                    │ 151, 151 │
    └──────────────────────────┴──────────┘
    ┌Identifiers┬──────────────┬────────────┬──────────┬────────┐
    │ Sample_ID │ Sample_Name  │ Library_ID │ index    │ index2 │
    ├───────────┼──────────────┼────────────┼──────────┼────────┤
    │ 1823A     │ 1823A-tissue │ 2017-01-20 │ GAATCTGA │        │
    │ 1823B     │ 1823B-tissue │ 2017-01-20 │ AGCAGGAA │        │
    │ 1824A     │ 1824A-tissue │ 2017-01-20 │ GAGCTGAA │        │
    │ 1825A     │ 1825A-tissue │ 2017-01-20 │ AAACATCG │        │
    │ 1826A     │ 1826A-tissue │ 2017-01-20 │ GAGTTAGC │        │
    │ 1826B     │ 1823A-tissue │ 2017-01-17 │ CGAACTTA │        │
    │ 1829A     │ 1823B-tissue │ 2017-01-17 │ GATAGACA │        │
    └───────────┴──────────────┴────────────┴──────────┴────────┘
    ┌Descriptions──────────────────┐
    │ Sample_ID │ Description      │
    ├───────────┼──────────────────┤
    │ 1823A     │ 0.5x treatment   │
    │ 1823B     │ 0.5x treatment   │
    │ 1824A     │ 1.0x treatment   │
    │ 1825A     │ 10.0x treatment  │
    │ 1826A     │ 100.0x treatment │
    │ 1826B     │ 0.5x treatment   │
    │ 1829A     │ 0.5x treatment   │
    └───────────┴──────────────────┘

.. raw:: html

    <h3 align="center">

Contributing

.. raw:: html

    </h3>

Pull requests, feature requests, and issues are welcome!

To make a development install:

.. code:: bash

    ❯ git clone git@github.com:clintval/sample-sheet.git
    ❯ pip install -e 'sample-sheet[fancytest]'

Running the test suite prints a coverage report similar to the following:

::

    Name                            Stmts   Miss  Cover
    ---------------------------------------------------
    sample_sheet/__init__.py            1      0   100%
    sample_sheet/_sample_sheet.py     334      0   100%
    ---------------------------------------------------
    TOTAL                             335      0   100%

    OK! 65 tests, 0 failures, 0 errors in 0.1s
