mafft_and_snp - WDL Atlas

WORKFLOW mafft_and_snp

File Path	`pipes/WDL/workflows/mafft_and_snp.wdl`
WDL Version	1.0
Type	workflow

Imports

Namespace	Path
`nextstrain`	`../tasks/tasks_nextstrain.wdl`
`utils`	`../tasks/tasks_utils.wdl`

Workflow: mafft_and_snp

Align assemblies with mafft and find SNPs with snp-sites.

Author: Broad Viral Genomics

Email: viral-ngs@broadinstitute.org

Inputs

Name	Type	Description	Default
`assembly_fastas`	`Array[File]`	Set of assembled genomes to align and build trees. These must represent a single chromosome/segment of a genome only. Fastas may be one-sequence-per-individual or a concatenated multi-fasta (unaligned) or a mixture of the two. They may be compressed (gz, bz2, zst, lz4), uncompressed, or a mixture.	-
`ref_fasta`	`File`	A reference assembly (not included in assembly_fastas) to align assembly_fastas against. Typically from NCBI RefSeq or similar. Uncompressed.	-
`min_unambig_genome`	`Int`	Minimum number of called bases in genome to pass prefilter.	-
`exclude_sites`	`File?`	-	-
`vcf_reference`	`File?`	-	-
`tree_builder_args`	`String?`	-	-
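Optional inputs with default values (21). Apart from `run_iqtree`, these are task-level inputs listed by bare name, so `docker`, `cpus`, and `disk_size` repeat once per task that defines them.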
`run_iqtree`	`Boolean`	-	false
`cpus`	`Int`	-	4
`docker`	`String`	-	"quay.io/broadinstitute/viral-core:2.5.1"
`disk_size`	`Int`	-	750
`remove_reference`	`Boolean`	-	false
`keep_length`	`Boolean`	-	true
`large`	`Boolean`	-	false
`memsavetree`	`Boolean`	-	false
`docker`	`String`	-	"quay.io/broadinstitute/viral-phylo:2.5.1.0"
`mem_size`	`Int`	-	500
`cpus`	`Int`	-	64
`disk_size`	`Int`	-	750
`allow_wildcard_bases`	`Boolean`	-	true
`docker`	`String`	-	"quay.io/biocontainers/snp-sites:2.5.1--hed695b0_0"
`disk_size`	`Int`	-	750
`method`	`String`	-	"iqtree"
`substitution_model`	`String`	-	"GTR"
`cpus`	`Int`	-	64
`machine_mem_gb`	`Int`	-	32
`docker`	`String`	-	"docker.io/nextstrain/base:build-20240318T173028Z"
`disk_size`	`Int`	-	1250
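
WDL runners such as Cromwell and miniwdl accept workflow inputs as a JSON object keyed by `<workflow name>.<input name>`. The sketch below builds such an object for the four workflow-level inputs listed above. The helper name `build_inputs` and the file paths are hypothetical, chosen only for illustration.

```python
import json


def build_inputs(assembly_fastas, ref_fasta, min_unambig_genome, run_iqtree=False):
    """Return an inputs dict keyed as <workflow>.<input> for mafft_and_snp."""
    if not assembly_fastas:
        raise ValueError("assembly_fastas must contain at least one file")
    return {
        "mafft_and_snp.assembly_fastas": list(assembly_fastas),
        "mafft_and_snp.ref_fasta": ref_fasta,
        "mafft_and_snp.min_unambig_genome": int(min_unambig_genome),
        "mafft_and_snp.run_iqtree": bool(run_iqtree),
    }


if __name__ == "__main__":
    # Hypothetical paths; replace with real local or cloud URIs.
    inputs = build_inputs(
        ["samples/batch1.fasta.gz", "samples/sample42.fasta"],
        "refs/reference.fasta",
        min_unambig_genome=20000,
        run_iqtree=True,
    )
    print(json.dumps(inputs, indent=2))
```

Leaving `run_iqtree` at its default of `false` skips the tree-building call, in which case the optional `ml_tree` output is not produced.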

Outputs

Name	Type	Expression
`combined_assemblies`	`File`	`filter_sequences_by_length.filtered_fasta`
`multiple_alignment`	`File`	`mafft.aligned_sequences`
`unmasked_snps`	`File`	`snp_sites.snps_vcf`
`ml_tree`	`File?`	`draft_augur_tree.aligned_tree`

Calls

This workflow calls the following tasks or subworkflows:

CALL TASKS `zcat` ↗

Input Mappings (2)

Input	Value
`infiles`	`assembly_fastas`
`output_name`	`"all_samples_combined_assembly.fasta.gz"`
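
To illustrate what this step accomplishes, here is a minimal sketch that concatenates a mix of plain, gzip, and bzip2 FASTA files into one gzip-compressed file. It is not the `zcat` task's implementation, which this page does not show. The function names are hypothetical, and zst and lz4 inputs are left out because the Python standard library has no readers for them.

```python
import bz2
import gzip

# Map file suffixes to standard-library openers (assumption: the suffix
# reflects the actual compression of the file).
_OPENERS = {".gz": gzip.open, ".bz2": bz2.open}


def open_maybe_compressed(path):
    """Open a file for binary reading, decompressing by suffix."""
    if path.endswith((".zst", ".lz4")):
        raise NotImplementedError("zst/lz4 need third-party libraries")
    for suffix, opener in _OPENERS.items():
        if path.endswith(suffix):
            return opener(path, "rb")
    return open(path, "rb")


def concat_to_gzip(infiles, output_path):
    """Concatenate decompressed inputs into one gzip-compressed output."""
    with gzip.open(output_path, "wb") as out:
        for path in infiles:
            with open_maybe_compressed(path) as src:
                data = src.read()
            out.write(data)
            # Guard against a missing trailing newline so the next
            # FASTA header does not fuse with the previous sequence.
            if data and not data.endswith(b"\n"):
                out.write(b"\n")
```

Reading each file fully into memory keeps the sketch short; a streaming copy would be preferable for large inputs.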

CALL TASKS `filter_sequences_by_length` ↗

Input Mappings (2)

Input	Value
`sequences_fasta`	`zcat.combined`
`min_non_N`	`min_unambig_genome`
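
The prefilter idea (keep only genomes with at least `min_unambig_genome` called bases) can be sketched as follows. This is an illustration under stated assumptions, not the task's code: the function names are hypothetical, and the sketch treats `N` and `-` as uncalled, while the real task may also treat other ambiguity codes differently.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_non_n(seq):
    """Count characters that are neither N/n nor a gap (assumed 'called')."""
    return sum(1 for base in seq if base not in "Nn-")


def filter_by_min_non_n(records, min_non_n):
    """Keep records whose called-base count meets the threshold."""
    return [(h, s) for h, s in records if count_non_n(s) >= min_non_n]
```

For example, with a threshold of 4, a record `ACGTNN` passes (4 called bases) while `NNNN` is dropped.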

CALL TASKS `mafft` ↗ → mafft_one_chr

Input Mappings (3)

Input	Value
`sequences`	`filter_sequences_by_length.filtered_fasta`
`ref_fasta`	`ref_fasta`
`basename`	`"all_samples_aligned.fasta"`

CALL TASKS `snp_sites` ↗

Input Mappings (1)

Input	Value
`msa_fasta`	`mafft.aligned_sequences`
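
Conceptually, SNP calling from a multiple alignment means finding the columns where the aligned sequences disagree. The sketch below shows that idea only. It is not how snp-sites is implemented, the function name is hypothetical, and ignoring `N` and `-` is an assumption of this example (the tool's own handling of ambiguous bases is governed by options such as `allow_wildcard_bases`).

```python
def variable_sites(aligned_seqs, ignore=("N", "-")):
    """Return 0-based alignment columns with two or more distinct called bases."""
    seqs = [s.upper() for s in aligned_seqs]
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    ignored = set(ignore)
    sites = []
    for col in range(length):
        # Collect the distinct bases in this column, minus uncalled symbols.
        alleles = {s[col] for s in seqs} - ignored
        if len(alleles) > 1:
            sites.append(col)
    return sites
```

The equal-length check reflects the fact that the input here is `mafft.aligned_sequences`, an alignment, rather than the unaligned assemblies.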

CALL TASKS `draft_augur_tree` ↗

Input Mappings (1)

Input	Value
`msa_or_vcf`	`mafft.aligned_sequences`

Images

Container images used by tasks in this workflow:

🐳 viral-core

quay.io/broadinstitute/viral-core:2.5.1

Used by 2 tasks:

zcat
filter_sequences_by_length

🐳 Parameterized Image

⚙️ Parameterized

Configured via input:
docker

Used by 1 task:

mafft

🐳 Parameterized Image

⚙️ Parameterized

Configured via input:
docker

Used by 1 task:

snp_sites

🐳 Parameterized Image

⚙️ Parameterized

Configured via input:
docker

Used by 1 task:

draft_augur_tree


flowchart TD
    Start([mafft_and_snp])
    N1["zcat"]
    N2["filter_sequences_by_length"]
    N3["mafft<br/>mafft_one_chr"]
    N4["snp_sites"]
    subgraph C1 ["↔️ if run_iqtree"]
        direction TB
        N5["draft_augur_tree"]
    end
    N1 --> N2
    N2 --> N3
    N3 --> N4
    N3 --> N5
    Start --> N1
    N5 --> End([End])
    N4 --> End([End])
    classDef taskNode fill:#a371f7,stroke:#8b5cf6,stroke-width:2px,color:#fff
    classDef workflowNode fill:#58a6ff,stroke:#1f6feb,stroke-width:2px,color:#fff
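
The dependency edges in the flowchart above can be checked with a short topological sort. This sketch uses Python's standard `graphlib` module (Python 3.9+); the function name `execution_order` is hypothetical, and the conditional branch models the `if run_iqtree` subgraph.

```python
from graphlib import TopologicalSorter


def execution_order(run_iqtree=False):
    """Return one valid execution order for the workflow's calls."""
    # Each key maps to the set of calls it depends on.
    deps = {
        "zcat": set(),
        "filter_sequences_by_length": {"zcat"},
        "mafft": {"filter_sequences_by_length"},
        "snp_sites": {"mafft"},
    }
    if run_iqtree:
        # draft_augur_tree runs only inside the conditional block.
        deps["draft_augur_tree"] = {"mafft"}
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(execution_order(run_iqtree=True))
```

`snp_sites` and `draft_augur_tree` both depend only on `mafft`, so a runner is free to execute them in parallel; the sorter simply returns one valid ordering.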

version 1.0

import "../tasks/tasks_nextstrain.wdl" as nextstrain
import "../tasks/tasks_utils.wdl" as utils

workflow mafft_and_snp {
    meta {
        description: "Align assemblies with mafft and find SNPs with snp-sites."
        author: "Broad Viral Genomics"
        email:  "viral-ngs@broadinstitute.org"
    }

    input {
        Array[File]     assembly_fastas
        File            ref_fasta
        Int             min_unambig_genome
        Boolean         run_iqtree=false
    }

    parameter_meta {
        assembly_fastas: {
          description: "Set of assembled genomes to align and build trees. These must represent a single chromosome/segment of a genome only. Fastas may be one-sequence-per-individual or a concatenated multi-fasta (unaligned) or a mixture of the two. They may be compressed (gz, bz2, zst, lz4), uncompressed, or a mixture.",
          patterns: ["*.fasta", "*.fa", "*.fasta.gz", "*.fasta.zst"]
        }
        ref_fasta: {
          description: "A reference assembly (not included in assembly_fastas) to align assembly_fastas against. Typically from NCBI RefSeq or similar. Uncompressed.",
          patterns: ["*.fasta", "*.fa"]
        }
        min_unambig_genome: {
          description: "Minimum number of called bases in genome to pass prefilter."
        }
    }

    call utils.zcat {
        input:
            infiles     = assembly_fastas,
            output_name = "all_samples_combined_assembly.fasta.gz"
    }
    call utils.filter_sequences_by_length {
        input:
            sequences_fasta = zcat.combined,
            min_non_N       = min_unambig_genome
    }
    call nextstrain.mafft_one_chr as mafft {
        input:
            sequences = filter_sequences_by_length.filtered_fasta,
            ref_fasta = ref_fasta,
            basename  = "all_samples_aligned.fasta"
    }
    call nextstrain.snp_sites {
        input:
            msa_fasta = mafft.aligned_sequences
    }
    if(run_iqtree) {
        call nextstrain.draft_augur_tree {
            input:
                msa_or_vcf = mafft.aligned_sequences
        }
    }

    output {
        File  combined_assemblies = filter_sequences_by_length.filtered_fasta
        File  multiple_alignment  = mafft.aligned_sequences
        File  unmasked_snps       = snp_sites.snps_vcf
        File? ml_tree             = draft_augur_tree.aligned_tree
    }
}
