mafft_and_snp
pipes/WDL/workflows/mafft_and_snp.wdl

WORKFLOW mafft_and_snp

File Path pipes/WDL/workflows/mafft_and_snp.wdl
WDL Version 1.0
Type workflow

Imports

Namespace Path
nextstrain ../tasks/tasks_nextstrain.wdl
utils ../tasks/tasks_utils.wdl

Workflow: mafft_and_snp

Align assemblies with mafft and find SNPs with snp-sites.

Author: Broad Viral Genomics
viral-ngs@broadinstitute.org

Inputs

Name Type Description Default
assembly_fastas Array[File] Set of assembled genomes to align and build trees. These must represent a single chromosome/segment of a genome only. Fastas may be one-sequence-per-individual or a concatenated multi-fasta (unaligned) or a mixture of the two. They may be compressed (gz, bz2, zst, lz4), uncompressed, or a mixture. -
ref_fasta File A reference assembly (not included in assembly_fastas) to align assembly_fastas against. Typically from NCBI RefSeq or similar. Uncompressed. -
min_unambig_genome Int Minimum number of called bases in genome to pass prefilter. -
exclude_sites File? - -
vcf_reference File? - -
tree_builder_args String? - -
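21 optional inputs with default values (not listed here). The workflow source also declares run_iqtree (Boolean, default false), which controls whether draft_augur_tree is called.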
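
The required inputs above are usually supplied to a WDL runner as a JSON file keyed by `<workflow>.<input>`. The following is a minimal sketch of building such a file. The function name `build_inputs`, the file paths, and the threshold value are hypothetical and chosen only for illustration; the key names come from the workflow source.

```python
import json


def build_inputs(assembly_fastas, ref_fasta, min_unambig_genome, run_iqtree=False):
    """Return workflow inputs keyed as '<workflow>.<input>'."""
    if not assembly_fastas:
        raise ValueError("assembly_fastas must contain at least one file")
    return {
        "mafft_and_snp.assembly_fastas": list(assembly_fastas),
        "mafft_and_snp.ref_fasta": ref_fasta,
        "mafft_and_snp.min_unambig_genome": int(min_unambig_genome),
        # Declared in the workflow source with a default of false.
        "mafft_and_snp.run_iqtree": bool(run_iqtree),
    }


if __name__ == "__main__":
    inputs = build_inputs(
        ["samples/batch1.fasta.gz", "samples/sample42.fasta"],  # hypothetical paths
        "refs/reference.fasta",  # hypothetical path, uncompressed as required
        min_unambig_genome=15000,  # hypothetical threshold
    )
    print(json.dumps(inputs, indent=2))
```

The printed JSON can be saved to a file and passed to whichever runner is in use; the exact command-line option depends on the runner.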

Outputs

Name Type Expression
combined_assemblies File filter_sequences_by_length.filtered_fasta
multiple_alignment File mafft.aligned_sequences
unmasked_snps File snp_sites.snps_vcf
ml_tree File? draft_augur_tree.aligned_tree
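
Because ml_tree has the optional type File?, downstream code should expect it to be absent or null whenever the tree step does not run. The sketch below shows one way to handle that, assuming the runner reports outputs as a dictionary keyed by `mafft_and_snp.<output>`. That layout and the helper name `summarize_outputs` are assumptions for illustration, not part of the workflow.

```python
def summarize_outputs(outputs):
    """Collect the three required outputs and the optional tree, if any."""
    prefix = "mafft_and_snp."
    required = ["combined_assemblies", "multiple_alignment", "unmasked_snps"]
    summary = {}
    for name in required:
        value = outputs.get(prefix + name)
        if value is None:
            raise KeyError(f"missing required output: {name}")
        summary[name] = value
    # ml_tree is File? in the workflow, so None is a valid value here.
    summary["ml_tree"] = outputs.get(prefix + "ml_tree")
    return summary
```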

Calls

This workflow calls the following tasks or subworkflows:

CALL TASKS zcat

Input Mappings (2)
Input Value
infiles assembly_fastas
output_name "all_samples_combined_assembly.fasta.gz"
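
As a rough illustration of what this step accomplishes, the sketch below concatenates a mix of plain, gzip, and bzip2 FASTA files into one gzip-compressed file. It is not the zcat task's implementation, which is not shown on this page. Compression is detected by file extension, only formats available in the Python standard library are handled (zst and lz4 would need third-party packages), and adding a newline between files that lack a trailing one is a choice of this sketch.

```python
import bz2
import gzip


def open_maybe_compressed(path):
    """Open a file for binary reading, decompressing by extension (assumption)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rb")
    if path.endswith(".bz2"):
        return bz2.open(path, "rb")
    return open(path, "rb")


def concat_to_gzip(infiles, output_name):
    """Write the decompressed contents of infiles, in order, to one gzip file."""
    with gzip.open(output_name, "wb") as out:
        for path in infiles:
            last = b"\n"
            with open_maybe_compressed(path) as src:
                # Stream in 1 MiB chunks so large inputs are not held in memory.
                for chunk in iter(lambda: src.read(1 << 20), b""):
                    out.write(chunk)
                    last = chunk[-1:]
            # Keep the next file's header from fusing with this file's last line.
            if last != b"\n":
                out.write(b"\n")
```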

CALL TASKS filter_sequences_by_length

Input Mappings (2)
Input Value
sequences_fasta zcat.combined
min_non_N min_unambig_genome
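
The prefilter keeps only sequences with at least min_unambig_genome called bases. The sketch below illustrates the idea under the assumption that a "called base" is any character other than N or a gap; IUPAC ambiguity codes such as R or Y are counted as called here. The actual task may count differently, and the helper names are hypothetical.

```python
def count_called_bases(seq):
    """Count characters that are neither N nor a gap (assumed definition)."""
    seq = seq.upper()
    return len(seq) - seq.count("N") - seq.count("-")


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def prefilter(records, min_unambig_genome):
    """Keep records whose called-base count meets the threshold."""
    return [(h, s) for h, s in records if count_called_bases(s) >= min_unambig_genome]
```

For example, with a threshold of 2, a record whose sequence is `NNNN-A` has one called base and would be dropped.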

CALL TASKS mafft → mafft_one_chr

Input Mappings (3)
Input Value
sequences filter_sequences_by_length.filtered_fasta
ref_fasta ref_fasta
basename "all_samples_aligned.fasta"

CALL TASKS snp_sites

Input Mappings (1)
Input Value
msa_fasta mafft.aligned_sequences
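
To make the SNP-finding step concrete, the sketch below reports the alignment columns where at least two different unambiguous bases (A, C, G, T) occur. This is a simplified stand-in, not snp-sites' own algorithm: the tool may treat gaps and ambiguous characters differently, and it writes a VCF rather than a list of indices. The function name is hypothetical.

```python
def variable_sites(aligned):
    """Return 0-based column indices with two or more distinct A/C/G/T bases."""
    seqs = [s.upper() for s in aligned]
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    sites = []
    for col in range(length):
        # Ignore gaps, N, and other ambiguity codes when comparing bases.
        bases = {s[col] for s in seqs} & set("ACGT")
        if len(bases) > 1:
            sites.append(col)
    return sites
```

Note that VCF positions are 1-based, so a column index from this sketch would need 1 added before being compared with a VCF record.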

CALL TASKS draft_augur_tree (conditional: called only when run_iqtree is true)

Input Mappings (1)
Input Value
msa_or_vcf mafft.aligned_sequences

Images

Container images used by tasks in this workflow:

Image Source Used by
viral-core quay.io/broadinstitute/viral-core:2.5.1 zcat, filter_sequences_by_length
Parameterized Configured via input: docker mafft
Parameterized Configured via input: docker snp_sites
Parameterized Configured via input: docker draft_augur_tree

mafft_and_snp - WDL Source Code

version 1.0

import "../tasks/tasks_nextstrain.wdl" as nextstrain
import "../tasks/tasks_utils.wdl" as utils

workflow mafft_and_snp {
    meta {
        description: "Align assemblies with mafft and find SNPs with snp-sites."
        author: "Broad Viral Genomics"
        email:  "viral-ngs@broadinstitute.org"
    }

    input {
        Array[File]     assembly_fastas
        File            ref_fasta
        Int             min_unambig_genome
        Boolean         run_iqtree=false
    }

    parameter_meta {
        assembly_fastas: {
          description: "Set of assembled genomes to align and build trees. These must represent a single chromosome/segment of a genome only. Fastas may be one-sequence-per-individual or a concatenated multi-fasta (unaligned) or a mixture of the two. They may be compressed (gz, bz2, zst, lz4), uncompressed, or a mixture.",
          patterns: ["*.fasta", "*.fa", "*.fasta.gz", "*.fasta.zst"]
        }
        ref_fasta: {
          description: "A reference assembly (not included in assembly_fastas) to align assembly_fastas against. Typically from NCBI RefSeq or similar. Uncompressed.",
          patterns: ["*.fasta", "*.fa"]
        }
        min_unambig_genome: {
          description: "Minimum number of called bases in genome to pass prefilter."
        }
    }

    call utils.zcat {
        input:
            infiles     = assembly_fastas,
            output_name = "all_samples_combined_assembly.fasta.gz"
    }
    call utils.filter_sequences_by_length {
        input:
            sequences_fasta = zcat.combined,
            min_non_N       = min_unambig_genome
    }
    call nextstrain.mafft_one_chr as mafft {
        input:
            sequences = filter_sequences_by_length.filtered_fasta,
            ref_fasta = ref_fasta,
            basename  = "all_samples_aligned.fasta"
    }
    call nextstrain.snp_sites {
        input:
            msa_fasta = mafft.aligned_sequences
    }
    if(run_iqtree) {
        call nextstrain.draft_augur_tree {
            input:
                msa_or_vcf = mafft.aligned_sequences
        }
    }

    output {
        File  combined_assemblies = filter_sequences_by_length.filtered_fasta
        File  multiple_alignment  = mafft.aligned_sequences
        File  unmasked_snps       = snp_sites.snps_vcf
        File? ml_tree             = draft_augur_tree.aligned_tree
    }
}