🚀 Quickstart
This is a workflow (written in Nextflow) to analyse mitochondrial sequencing experiments. It is heavily inspired by gatk-workflows/gatk4-mitochondria-pipeline and by epi2me-labs/wf-template.
- Install Nextflow on your system (see the official documentation).
- Download the GRCh38 release of the human genome and its index files. All of these files must be saved in the same directory.
Links from Google Cloud Life Sciences
- Homo_sapiens_assembly38.fasta
- Homo_sapiens_assembly38.dict
- Homo_sapiens_assembly38.fasta.fai
- Homo_sapiens_assembly38.fasta.64.alt
- Homo_sapiens_assembly38.fasta.64.amb
- Homo_sapiens_assembly38.fasta.64.ann
- Homo_sapiens_assembly38.fasta.64.bwt
- Homo_sapiens_assembly38.fasta.64.pac
- Homo_sapiens_assembly38.fasta.64.sa
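Since all of the files listed above need to sit in the same directory, it may help to check for them before running anything. The following is a minimal sketch, not part of the workflow itself: `check_reference_dir` is a hypothetical helper name, and the only thing it relies on is the file names from the list above.

```shell
# Hypothetical helper (not part of wf-human-mito): report which of the
# reference files listed above are missing from a given directory.
check_reference_dir() {
    local ref_dir="$1"
    local base="Homo_sapiens_assembly38"
    local missing=0
    local suffix
    for suffix in fasta dict fasta.fai fasta.64.alt fasta.64.amb \
                  fasta.64.ann fasta.64.bwt fasta.64.pac fasta.64.sa; do
        if [ ! -e "${ref_dir}/${base}.${suffix}" ]; then
            echo "missing: ${ref_dir}/${base}.${suffix}" >&2
            missing=$((missing + 1))
        fi
    done
    # Succeed (exit status 0) only when all nine files are present.
    [ "$missing" -eq 0 ]
}
```

After defining the function in your shell, call it with the directory that holds the reference, for example `check_reference_dir /path/to/reference-directory`. Missing files are printed to standard error and the function returns a non-zero status.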
- Put your FASTQ files (or symbolic links to them, created with ln -s <original> <link>) in one directory. Name the files so that each sample can be identified from the file-name prefix.
Example
$ ls test_data/*.fq.gz
test_data/SAMPLE-A_R1.fq.gz  test_data/SAMPLE-C_R1.fq.gz
test_data/SAMPLE-A_R2.fq.gz  test_data/SAMPLE-C_R2.fq.gz
test_data/SAMPLE-B_R1.fq.gz  test_data/SAMPLE-D_R1.fq.gz
test_data/SAMPLE-B_R2.fq.gz  test_data/SAMPLE-D_R2.fq.gz
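To see which sample names a directory like the one above would yield, the naming pattern can be checked with a few lines of shell. This is a sketch under two assumptions: the files follow the `<sample>_R1.fq.gz` / `<sample>_R2.fq.gz` naming shown in the listing, and the sample name is everything before `_R1` or `_R2`. `list_samples` is a hypothetical helper, not something the workflow provides.

```shell
# Hypothetical helper (not part of wf-human-mito): print the sample prefix of
# every R1/R2 pair in a directory and flag any R1 file without a matching R2.
list_samples() {
    local fastq_dir="$1"
    local status=0
    local r1 sample
    for r1 in "${fastq_dir}"/*_R1.fq.gz; do
        [ -e "$r1" ] || continue      # the glob matched nothing
        sample=$(basename "$r1" _R1.fq.gz)
        if [ -e "${fastq_dir}/${sample}_R2.fq.gz" ]; then
            echo "$sample"
        else
            echo "no R2 file for sample: $sample" >&2
            status=1
        fi
    done
    return "$status"
}
```

Running `list_samples test_data` against the example directory would print one prefix per line (SAMPLE-A through SAMPLE-D). A non-zero return status means at least one R1 file has no R2 partner.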
- Run the Human Mitochondrial Workflow
Example
./nextflow run lmtani/wf-human-mito \
    --fastq "test_data/*_R{1,2}.fq.gz" \
    --reference /path/to/Homo_sapiens_assembly38.fasta \
    --outdir name-your-output-directory
When the run finishes, you will have variants (VCF) and alignments (BAM) for each sample, plus a CSV file with information about each sample (coverage, haplogroup, etc.). For details about the meaning of each column, please see the 📦 Outputs section.
Example
$ ls -1 name-your-output-directory/
alignments/ # Alignments (BAM)
all_samples.csv # Information about each sample
execution/ # Pipeline execution details
variants/ # Variants (VCF)
workspace/ # Outputs of each process in the pipeline
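For a quick look at which columns `all_samples.csv` contains, the header line can be split into one name per line. This is only a sketch: it assumes the first line of the file is a comma-separated header with no quoted commas, which the source does not state explicitly, and `csv_columns` is a hypothetical helper name.

```shell
# Hypothetical helper (not part of wf-human-mito): print one column name per
# line, assuming the first line of the CSV is a plain comma-separated header.
csv_columns() {
    head -n 1 "$1" | tr ',' '\n'
}
```

For example, `csv_columns name-your-output-directory/all_samples.csv` lists the column names, which can then be looked up in the 📦 Outputs section.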