🚀 Quickstart

This is a workflow (written in Nextflow) to analyse mitochondrial sequencing experiments. It is heavily inspired by the gatk-workflows/gatk4-mitochondria-pipeline and by the epi2me-labs/wf-template.

  1. Install Nextflow on your system (see the official documentation).
  2. Download the GRCh38 release of the Human Genome and its index files. Save all of these files in the same directory.

    Links from Google Cloud Life Sciences
  3. Put your FASTQ files (or symbolic links created with ln -s <original> <link>) in one directory. Name the files so that each sample can be identified from the file name prefix, as in the example below.

    Example
    $ ls test_data/*.fq.gz
    test_data/SAMPLE-A_R1.fq.gz  test_data/SAMPLE-C_R1.fq.gz
    test_data/SAMPLE-A_R2.fq.gz  test_data/SAMPLE-C_R2.fq.gz
    test_data/SAMPLE-B_R1.fq.gz  test_data/SAMPLE-D_R1.fq.gz
    test_data/SAMPLE-B_R2.fq.gz  test_data/SAMPLE-D_R2.fq.gz
    
  4. Run the Human Mitochondrial Workflow

    Example
    ./nextflow run lmtani/wf-human-mito \
        --fastq "test_data/*_R{1,2}.fq.gz" \
        --reference /path/to/Homo_sapiens_assembly38.fasta \
        --outdir name-your-output-directory
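
Before launching the run, it can help to confirm that every sample prefix from step 3 has both read files. The following is a minimal sketch, not part of the workflow itself: `check_pairs` is a hypothetical helper name, and it assumes the `<prefix>_R1.fq.gz` / `<prefix>_R2.fq.gz` naming used in the example listing.

```shell
# Hypothetical helper (not shipped with the workflow): report, for each
# <prefix>_R1.fq.gz in a directory, whether the matching <prefix>_R2.fq.gz exists.
# It only starts from R1 files, so an R2 file without an R1 is not reported.
check_pairs() {
    local dir="$1" status=0 r1 r2 sample
    for r1 in "$dir"/*_R1.fq.gz; do
        # If the glob matched nothing, the literal pattern is left in $r1.
        if [ ! -e "$r1" ]; then
            echo "no R1 files found in $dir" >&2
            return 1
        fi
        r2="${r1%_R1.fq.gz}_R2.fq.gz"
        sample="$(basename "${r1%_R1.fq.gz}")"
        if [ -e "$r2" ]; then
            echo "$sample OK"
        else
            echo "$sample MISSING_R2"
            status=1
        fi
    done
    return "$status"
}
```

After defining the function in a bash session, `check_pairs test_data` prints one line per sample prefix and returns a non-zero status if any sample lacks its R2 file.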
    

When the run finishes, you will have variants (VCF) and an alignment (BAM) for each sample, plus a CSV file with information about each sample (coverage, haplogroup, etc.). For details about the meaning of each column, see the 📦 Outputs section.

Example
$ ls -1 name-your-output-directory/
alignments/            # Alignments (BAM)
all_samples.csv        # Information about each sample
execution/             # Pipeline execution details
variants/              # Variants (VCF)
workspace/             # Outputs of each process in the pipeline
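
To see which columns the per-sample CSV contains before consulting the Outputs section, the header row can be listed one column per line. This is a small sketch under two assumptions: the file has a comma-separated header on its first line, and no column name contains a quoted comma. `list_columns` is a hypothetical helper name, not part of the workflow.

```shell
# Hypothetical helper: print the header fields of a CSV file, numbered,
# one per line. Assumes a plain comma-separated header without quoted commas.
list_columns() {
    awk -F',' 'NR == 1 {
        sub(/\r$/, "")                      # tolerate Windows line endings
        for (i = 1; i <= NF; i++) print i ": " $i
        exit
    }' "$1"
}
```

For example, `list_columns name-your-output-directory/all_samples.csv` prints the column names in the order they appear in the file.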