augur_from_mltree
pipes/WDL/workflows/augur_from_mltree.wdl

WORKFLOW augur_from_mltree

File Path pipes/WDL/workflows/augur_from_mltree.wdl
WDL Version 1.0
Type workflow

Imports

Namespace Path
nextstrain ../tasks/tasks_nextstrain.wdl

Workflow: augur_from_mltree

Take a premade maximum likelihood tree (Newick format), run the remainder of the augur pipeline (timetree modifications, ancestral inference, etc.), and convert the result to a JSON representation suitable for Nextstrain visualization. See https://nextstrain.org/docs/getting-started/ and https://nextstrain-augur.readthedocs.io/en/stable/

Author: Broad Viral Genomics
viral-ngs@broadinstitute.org

Inputs

Name Type Description Default
raw_tree File Maximum likelihood tree (newick format). -
msa_or_vcf File Multiple sequence alignment (aligned fasta) or variants (vcf format). -
sample_metadata File Metadata in tab-separated text format. See https://nextstrain-augur.readthedocs.io/en/stable/faq/metadata.html for details. -
ref_fasta File A reference assembly (not included in assembly_fastas) to align assembly_fastas against. Typically from NCBI RefSeq or similar. -
genbank_gb File A 'genbank' formatted gene annotation file that is used to calculate coding consequences of observed mutations. Must correspond to the same coordinate space as ref_fasta. Typically downloaded from the same NCBI accession number as ref_fasta. -
auspice_config File A file specifying options to customize the auspice export; see: https://nextstrain.github.io/auspice/customise-client/introduction -
clades_tsv File? A TSV file containing clade mutation positions in four columns: [clade gene site alt]; see: https://nextstrain.org/docs/tutorials/defining-clades -
ancestral_traits_to_infer Array[String]? A list of metadata traits to use for ancestral node inference (see https://nextstrain-augur.readthedocs.io/en/stable/usage/cli/traits.html). Multiple traits may be specified; must correspond exactly to column headers in metadata file. Omitting these values will skip ancestral trait inference, and ancestral nodes will not have estimated values for metadata. -
gen_per_year Int? - -
clock_rate Float? - -
clock_std_dev Float? - -
root String? - -
covariance Boolean? - -
precision Int? - -
branch_length_inference String? - -
coalescent String? - -
vcf_reference File? - -
weights File? - -
sampling_bias_correction Float? - -
vcf_reference File? - -
root_sequence File? - -
output_vcf File? - -
genes File? - -
vcf_reference_output File? - -
vcf_reference File? - -
lat_longs_tsv File? - -
colors_tsv File? - -
geo_resolutions Array[String]? - -
color_by_metadata Array[String]? - -
description_md File? - -
maintainers Array[String]? - -
title String? - -
29 optional inputs with default values
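
As a minimal sketch of how the inputs above might be supplied, the following Python builds an inputs JSON with keys of the form `workflow_name.input_name`, the convention used by WDL runners such as Cromwell. Every file path and both trait names (`country`, `region`) are hypothetical placeholders, not values taken from this workflow.

```python
import json

WORKFLOW = "augur_from_mltree"

# Required inputs from the table above; every path is a hypothetical placeholder.
required = {
    "raw_tree": "tree_raw.nwk",
    "msa_or_vcf": "aligned.fasta",
    "sample_metadata": "metadata.tsv",
    "ref_fasta": "reference.fasta",
    "genbank_gb": "reference.gb",
    "auspice_config": "auspice_config.json",
}

# Optional inputs; omit a key entirely to skip the step that depends on it.
# The trait names are assumed metadata column headers, for illustration only.
optional = {
    "clades_tsv": "clades.tsv",
    "ancestral_traits_to_infer": ["country", "region"],
}


def build_inputs(required, optional=None):
    """Prefix each input name with the workflow name."""
    merged = {**required, **(optional or {})}
    return {f"{WORKFLOW}.{name}": value for name, value in merged.items()}


inputs = build_inputs(required, optional)
inputs_json = json.dumps(inputs, indent=2)
print(inputs_json)
```

Writing `inputs_json` to a file gives a document that can be passed to the runner alongside the workflow; the task-level optional inputs listed above are left at their defaults here.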

Outputs

Name Type Expression
time_tree File refine_augur_tree.tree_refined
auspice_input_json File export_auspice_json.virus_json

Calls

This workflow calls the following tasks or subworkflows:

CALL TASKS refine_augur_tree

Input Mappings (3)
Input Value
raw_tree raw_tree
msa_or_vcf msa_or_vcf
metadata sample_metadata

CALL TASKS ancestral_traits

Input Mappings (3)
Input Value
tree refine_augur_tree.tree_refined
metadata sample_metadata
columns select_first([ancestral_traits_to_infer, []])
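
In the WDL source this call sits inside an `if` block, so it only runs when `ancestral_traits_to_infer` is both defined and non-empty. The Python below is an illustrative model of that condition (`defined(x) && length(select_first([x, []])) > 0`), not part of the workflow; `should_run_ancestral_traits` is a hypothetical helper name.

```python
from typing import Optional, Sequence


def should_run_ancestral_traits(traits: Optional[Sequence[str]]) -> bool:
    """Model of the WDL guard around the ancestral_traits call.

    None stands in for an undefined optional input. The WDL expression falls
    back to an empty array via select_first before taking the length.
    """
    fallback = traits if traits is not None else []
    return traits is not None and len(fallback) > 0


for value in (None, [], ["country"]):  # "country" is an assumed column name
    print(value, "->", should_run_ancestral_traits(value))
```

Both an omitted input and an explicitly empty list skip the call, which is why the export step has to tolerate a missing traits JSON.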

CALL TASKS ancestral_tree

Input Mappings (2)
Input Value
tree refine_augur_tree.tree_refined
msa_or_vcf msa_or_vcf

CALL TASKS translate_augur_tree

Input Mappings (3)
Input Value
tree refine_augur_tree.tree_refined
nt_muts ancestral_tree.nt_muts_json
genbank_gb genbank_gb

CALL TASKS assign_clades_to_nodes

Input Mappings (5)
Input Value
tree_nwk refine_augur_tree.tree_refined
nt_muts_json ancestral_tree.nt_muts_json
aa_muts_json translate_augur_tree.aa_muts_json
ref_fasta ref_fasta
clades_tsv select_first([clades_tsv])
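
This call only runs when `clades_tsv` is provided. As a rough sketch of the four-column layout described for that input (`clade`, `gene`, `site`, `alt`), the Python below writes and re-reads such a file. The header row and all row values are assumptions for illustration; consult the augur documentation for the exact format it accepts.

```python
import csv
import io

COLUMNS = ["clade", "gene", "site", "alt"]

# Hypothetical clade definitions; none of these values come from real data.
rows = [
    {"clade": "clade_A", "gene": "GENE1", "site": 10, "alt": "K"},
    {"clade": "clade_B", "gene": "GENE1", "site": 25, "alt": "T"},
]


def write_clades_tsv(rows) -> str:
    """Serialize clade definitions as tab-separated text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def check_clades_tsv(text: str):
    """Parse the text and fail if the header is not the expected four columns."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    if reader.fieldnames != COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    return list(reader)


clades_text = write_clades_tsv(rows)
parsed = check_clades_tsv(clades_text)
print(clades_text)
```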

CALL TASKS export_auspice_json

Input Mappings (4)
Input Value
tree refine_augur_tree.tree_refined
sample_metadata sample_metadata
node_data_jsons select_all([refine_augur_tree.branch_lengths, ancestral_traits.node_data_json, ancestral_tree.nt_muts_json, translate_augur_tree.aa_muts_json, assign_clades_to_nodes.node_clade_data_json])
auspice_config auspice_config
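
Two of the five entries passed to `node_data_jsons` come from conditional calls, so they may be undefined; `select_all` drops undefined entries and keeps the rest in order. The Python below is an illustrative model of that behavior, with `None` standing in for an undefined WDL value and hypothetical file names standing in for the task outputs.

```python
from typing import List, Optional


def select_all(values: List[Optional[str]]) -> List[str]:
    """Model of WDL select_all: drop undefined (None) entries, keep order."""
    return [v for v in values if v is not None]


def node_data_jsons(run_traits: bool, run_clades: bool) -> List[str]:
    """List the node-data files the export step would receive.

    File names are placeholders for the corresponding task outputs.
    """
    candidates = [
        "branch_lengths.json",                       # refine_augur_tree
        "traits.json" if run_traits else None,       # ancestral_traits (conditional)
        "nt_muts.json",                              # ancestral_tree
        "aa_muts.json",                              # translate_augur_tree
        "clades.json" if run_clades else None,       # assign_clades_to_nodes (conditional)
    ]
    return select_all(candidates)


print(node_data_jsons(run_traits=False, run_clades=False))
print(node_data_jsons(run_traits=True, run_clades=True))
```

With both optional inputs omitted, the export step still receives the three unconditional files.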

Images

Container images used by tasks in this workflow:

Parameterized image

Configured via input:
docker

Used by 6 tasks:
  • refine_augur_tree
  • ancestral_tree
  • translate_augur_tree
  • export_auspice_json
  • ancestral_traits
  • assign_clades_to_nodes

augur_from_mltree - WDL Source Code

version 1.0

import "../tasks/tasks_nextstrain.wdl" as nextstrain

workflow augur_from_mltree {
    meta {
        description: "Take a premade maximum likelihood tree (Newick format), run the remainder of the augur pipeline (timetree modifications, ancestral inference, etc.), and convert the result to a JSON representation suitable for Nextstrain visualization. See https://nextstrain.org/docs/getting-started/ and https://nextstrain-augur.readthedocs.io/en/stable/"
        author: "Broad Viral Genomics"
        email:  "viral-ngs@broadinstitute.org"
    }

    input {
        File           raw_tree
        File           msa_or_vcf
        File           sample_metadata
        File           ref_fasta
        File           genbank_gb
        File           auspice_config
        File?          clades_tsv
        Array[String]? ancestral_traits_to_infer
    }

    parameter_meta {
        raw_tree: {
          description: "Maximum likelihood tree (newick format).",
          patterns: ["*.nwk", "*.newick"]
        }
        msa_or_vcf: {
          description: "Multiple sequence alignment (aligned fasta) or variants (vcf format).",
          patterns: ["*.fasta", "*.fa", "*.vcf", "*.vcf.gz"]
        }
        sample_metadata: {
          description: "Metadata in tab-separated text format. See https://nextstrain-augur.readthedocs.io/en/stable/faq/metadata.html for details.",
          patterns: ["*.txt", "*.tsv"]
        }
        ref_fasta: {
          description: "A reference assembly (not included in assembly_fastas) to align assembly_fastas against. Typically from NCBI RefSeq or similar.",
          patterns: ["*.fasta", "*.fa"]
        }
        genbank_gb: {
          description: "A 'genbank' formatted gene annotation file that is used to calculate coding consequences of observed mutations. Must correspond to the same coordinate space as ref_fasta. Typically downloaded from the same NCBI accession number as ref_fasta.",
          patterns: ["*.gb", "*.gbf"]
        }
        ancestral_traits_to_infer: {
          description: "A list of metadata traits to use for ancestral node inference (see https://nextstrain-augur.readthedocs.io/en/stable/usage/cli/traits.html). Multiple traits may be specified; must correspond exactly to column headers in metadata file. Omitting these values will skip ancestral trait inference, and ancestral nodes will not have estimated values for metadata."
        }
        auspice_config: {
          description: "A file specifying options to customize the auspice export; see: https://nextstrain.github.io/auspice/customise-client/introduction",
          patterns: ["*.json", "*.txt"]
        }
        clades_tsv: {
          description: "A TSV file containing clade mutation positions in four columns: [clade  gene    site    alt]; see: https://nextstrain.org/docs/tutorials/defining-clades",
          patterns: ["*.tsv", "*.txt"]
        }
    }

    call nextstrain.refine_augur_tree {
        input:
            raw_tree   = raw_tree,
            msa_or_vcf = msa_or_vcf,
            metadata   = sample_metadata
    }
    if(defined(ancestral_traits_to_infer) && length(select_first([ancestral_traits_to_infer,[]]))>0) {
        call nextstrain.ancestral_traits {
            input:
                tree     = refine_augur_tree.tree_refined,
                metadata = sample_metadata,
                columns  = select_first([ancestral_traits_to_infer,[]])
        }
    }
    call nextstrain.ancestral_tree {
        input:
            tree       = refine_augur_tree.tree_refined,
            msa_or_vcf = msa_or_vcf
    }
    call nextstrain.translate_augur_tree {
        input:
            tree       = refine_augur_tree.tree_refined,
            nt_muts    = ancestral_tree.nt_muts_json,
            genbank_gb = genbank_gb
    }
    if(defined(clades_tsv)) {
        call nextstrain.assign_clades_to_nodes {
            input:
                tree_nwk     = refine_augur_tree.tree_refined,
                nt_muts_json = ancestral_tree.nt_muts_json,
                aa_muts_json = translate_augur_tree.aa_muts_json,
                ref_fasta    = ref_fasta,
                clades_tsv   = select_first([clades_tsv])
        }
    }
    call nextstrain.export_auspice_json {
        input:
            tree            = refine_augur_tree.tree_refined,
            sample_metadata = sample_metadata,
            node_data_jsons = select_all([
                                refine_augur_tree.branch_lengths,
                                ancestral_traits.node_data_json,
                                ancestral_tree.nt_muts_json,
                                translate_augur_tree.aa_muts_json,
                                assign_clades_to_nodes.node_clade_data_json]),
            auspice_config  = auspice_config
    }

    output {
        File time_tree          = refine_augur_tree.tree_refined
        File auspice_input_json = export_auspice_json.virus_json
    }
}
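
The call structure in the source above implies an execution order, since each call can only start once the outputs it references exist. The following Python sketch (not part of the workflow) encodes those references as a dependency map and derives one valid order with the standard library's `graphlib`, available in Python 3.9 and later. The edges are read off the input mappings; a workflow engine may schedule independent calls in parallel.

```python
from graphlib import TopologicalSorter

# Each call maps to the set of calls whose outputs it consumes.
DEPENDS_ON = {
    "refine_augur_tree": set(),
    "ancestral_traits": {"refine_augur_tree"},          # conditional call
    "ancestral_tree": {"refine_augur_tree"},
    "translate_augur_tree": {"refine_augur_tree", "ancestral_tree"},
    "assign_clades_to_nodes": {                          # conditional call
        "refine_augur_tree",
        "ancestral_tree",
        "translate_augur_tree",
    },
    "export_auspice_json": {
        "refine_augur_tree",
        "ancestral_traits",
        "ancestral_tree",
        "translate_augur_tree",
        "assign_clades_to_nodes",
    },
}

# static_order() yields every call after all of its dependencies.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(order)
```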