augur_from_mltree - WDL Atlas

WORKFLOW augur_from_mltree

File Path	`pipes/WDL/workflows/augur_from_mltree.wdl`
WDL Version	1.0
Type	workflow

Imports

Namespace	Path
`nextstrain`	`../tasks/tasks_nextstrain.wdl`

Workflow: augur_from_mltree

Takes a premade maximum-likelihood tree (Newick format), runs the remainder of the augur pipeline (timetree modifications, ancestral inference, etc.), and converts the result to a JSON representation suitable for Nextstrain visualization. See https://nextstrain.org/docs/getting-started/ and https://nextstrain-augur.readthedocs.io/en/stable/

Author: Broad Viral Genomics

Email: viral-ngs@broadinstitute.org

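Inputs

The first eight rows are the workflow's own inputs (see the `input` block in the source). The remaining rows are optional inputs of the called tasks. This is why names such as `vcf_reference`, `docker`, `disk_size`, and `machine_mem_gb` appear more than once.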

Name	Type	Description	Default
`raw_tree`	`File`	Maximum likelihood tree (newick format).	-
`msa_or_vcf`	`File`	Multiple sequence alignment (aligned fasta) or variants (vcf format).	-
`sample_metadata`	`File`	Metadata in tab-separated text format. See https://nextstrain-augur.readthedocs.io/en/stable/faq/metadata.html for details.	-
`ref_fasta`	`File`	A reference assembly, typically from NCBI RefSeq or similar. In this workflow it is used only by `assign_clades_to_nodes`.	-
`genbank_gb`	`File`	A GenBank-formatted gene annotation file used to calculate the coding consequences of observed mutations. It must use the same coordinate space as ref_fasta and is typically downloaded from the same NCBI accession as ref_fasta.	-
`auspice_config`	`File`	A file specifying options to customize the Auspice export; see https://nextstrain.github.io/auspice/customise-client/introduction	-
`clades_tsv`	`File?`	A TSV file of clade-defining mutations in four columns (`clade`, `gene`, `site`, `alt`); see https://nextstrain.org/docs/tutorials/defining-clades. If omitted, clade assignment (`assign_clades_to_nodes`) is skipped.	-
`ancestral_traits_to_infer`	`Array[String]?`	A list of metadata traits to use for ancestral node inference (see https://nextstrain-augur.readthedocs.io/en/stable/usage/cli/traits.html). Multiple traits may be specified, and each must exactly match a column header in the metadata file. Omitting this input or passing an empty list skips ancestral trait inference, so ancestral nodes will not have estimated metadata values.	-
`gen_per_year`	`Int?`	-	-
`clock_rate`	`Float?`	-	-
`clock_std_dev`	`Float?`	-	-
`root`	`String?`	-	-
`covariance`	`Boolean?`	-	-
`precision`	`Int?`	-	-
`branch_length_inference`	`String?`	-	-
`coalescent`	`String?`	-	-
`vcf_reference`	`File?`	-	-
`weights`	`File?`	-	-
`sampling_bias_correction`	`Float?`	-	-
`vcf_reference`	`File?`	-	-
`root_sequence`	`File?`	-	-
`output_vcf`	`File?`	-	-
`genes`	`File?`	-	-
`vcf_reference_output`	`File?`	-	-
`vcf_reference`	`File?`	-	-
`lat_longs_tsv`	`File?`	-	-
`colors_tsv`	`File?`	-	-
`geo_resolutions`	`Array[String]?`	-	-
`color_by_metadata`	`Array[String]?`	-	-
`description_md`	`File?`	-	-
`maintainers`	`Array[String]?`	-	-
`title`	`String?`	-	-
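The Python below sketches one way to supply the workflow's own inputs. It builds an inputs JSON with fully qualified `augur_from_mltree.<input>` keys, the naming convention used by Cromwell-style inputs files. `build_inputs` is a hypothetical helper, not part of viral-ngs. The file names in the usage block are placeholders.

```python
import json
from typing import Dict, List, Optional, Union

WORKFLOW = "augur_from_mltree"


def build_inputs(
    raw_tree: str,
    msa_or_vcf: str,
    sample_metadata: str,
    ref_fasta: str,
    genbank_gb: str,
    auspice_config: str,
    clades_tsv: Optional[str] = None,
    ancestral_traits_to_infer: Optional[List[str]] = None,
) -> Dict[str, Union[str, List[str]]]:
    """Return an inputs dict keyed by '<workflow>.<input>' (hypothetical helper)."""
    inputs: Dict[str, Union[str, List[str]]] = {
        f"{WORKFLOW}.raw_tree": raw_tree,
        f"{WORKFLOW}.msa_or_vcf": msa_or_vcf,
        f"{WORKFLOW}.sample_metadata": sample_metadata,
        f"{WORKFLOW}.ref_fasta": ref_fasta,
        f"{WORKFLOW}.genbank_gb": genbank_gb,
        f"{WORKFLOW}.auspice_config": auspice_config,
    }
    # Unset optional inputs are left out entirely; the workflow then skips
    # clade assignment and/or ancestral trait inference.
    if clades_tsv is not None:
        inputs[f"{WORKFLOW}.clades_tsv"] = clades_tsv
    if ancestral_traits_to_infer:
        inputs[f"{WORKFLOW}.ancestral_traits_to_infer"] = list(ancestral_traits_to_infer)
    return inputs


if __name__ == "__main__":
    # Hypothetical file names; substitute your own paths or URIs.
    example = build_inputs(
        "ml_tree.nwk", "aligned.fasta", "metadata.tsv",
        "reference.fasta", "reference.gb", "auspice_config.json",
        ancestral_traits_to_infer=["country"],
    )
    print(json.dumps(example, indent=2))
```

Dropping an empty trait list here matches the workflow's behavior, because an empty list also skips `ancestral_traits`.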
29 optional inputs with default values
`generate_timetree`	`Boolean`	-	true
`keep_root`	`Boolean`	-	true
`keep_polytomies`	`Boolean`	-	false
`date_confidence`	`Boolean`	-	true
`date_inference`	`String?`	-	"marginal"
`clock_filter_iqd`	`Int?`	-	4
`divergence_units`	`String?`	-	"mutations"
`docker`	`String`	-	"docker.io/nextstrain/base:build-20240318T173028Z"
`disk_size`	`Int`	-	750
`machine_mem_gb`	`Int`	-	75
`confidence`	`Boolean`	-	true
`machine_mem_gb`	`Int`	-	32
`docker`	`String`	-	"docker.io/nextstrain/base:build-20240318T173028Z"
`disk_size`	`Int`	-	750
`inference`	`String`	-	"joint"
`keep_ambiguous`	`Boolean`	-	false
`infer_ambiguous`	`Boolean`	-	false
`keep_overhangs`	`Boolean`	-	false
`docker`	`String`	-	"docker.io/nextstrain/base:build-20240318T173028Z"
`disk_size`	`Int`	-	300
`docker`	`String`	-	"docker.io/nextstrain/base:build-20240318T173028Z"
`disk_size`	`Int`	-	300
`docker`	`String`	-	"docker.io/nextstrain/base:build-20240318T173028Z"
`disk_size`	`Int`	-	300
`include_root_sequence`	`Boolean`	-	true
`out_basename`	`String`	-	basename(basename(tree,".nwk"),"_timetree")
`machine_mem_gb`	`Int`	-	64
`docker`	`String`	-	"docker.io/nextstrain/base:build-20240318T173028Z"
`disk_size`	`Int`	-	300
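The `out_basename` default above nests two calls to WDL's `basename(path, suffix)`. That function returns the final path component and strips `suffix` when the name ends with it. The Python below approximates this for POSIX-style paths and URIs. `wdl_basename` and `default_out_basename` are illustrative names, and real engines may treat edge cases differently.

```python
import posixpath


def wdl_basename(path: str, suffix: str = "") -> str:
    """Approximate WDL basename(): last path component, minus `suffix` if it ends with it."""
    name = posixpath.basename(path)
    if suffix and name.endswith(suffix):
        name = name[: -len(suffix)]
    return name


def default_out_basename(tree: str) -> str:
    """Python analogue of basename(basename(tree, ".nwk"), "_timetree")."""
    return wdl_basename(wdl_basename(tree, ".nwk"), "_timetree")
```

With this sketch, `default_out_basename("gs://bucket/run1_timetree.nwk")` returns `"run1"` and `default_out_basename("out/tree.nwk")` returns `"tree"`. The order matters: `"_timetree"` can only be stripped after `".nwk"` has been removed.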

Outputs

Name	Type	Expression
`time_tree`	`File`	`refine_augur_tree.tree_refined`
`auspice_input_json`	`File`	`export_auspice_json.virus_json`
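You can sanity-check `auspice_input_json` by walking its tree. The sketch below assumes the output follows the Auspice v2 JSON layout: a top-level `tree` object whose nodes carry a `name` and, for internal nodes, a `children` list. `tip_names` and `summarize_auspice_json` are hypothetical helpers. The traversal is iterative to avoid Python's recursion limit on deep trees.

```python
import json
from typing import Any, Dict, List


def tip_names(root: Dict[str, Any]) -> List[str]:
    """Return leaf names of an Auspice-style tree, left to right (iterative DFS)."""
    tips: List[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        children = node.get("children", [])
        if children:
            # Push in reverse so the leftmost child is visited first.
            stack.extend(reversed(children))
        else:
            tips.append(node["name"])
    return tips


def summarize_auspice_json(path: str) -> Dict[str, Any]:
    """Load an exported JSON and report its declared version and tip count."""
    with open(path) as handle:
        data = json.load(handle)
    tips = tip_names(data["tree"])
    return {"version": data.get("version"), "n_tips": len(tips), "first_tips": tips[:5]}
```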

Calls

This workflow calls the following tasks or subworkflows:

CALL TASKS `refine_augur_tree`

Input Mappings (3)

Input	Value
`raw_tree`	`raw_tree`
`msa_or_vcf`	`msa_or_vcf`
`metadata`	`sample_metadata`

CALL TASKS `ancestral_traits` (conditional: runs only when `ancestral_traits_to_infer` is a non-empty list)

Input Mappings (3)

Input	Value
`tree`	`refine_augur_tree.tree_refined`
`metadata`	`sample_metadata`
`columns`	`select_first([ancestral_traits_to_infer, []])`
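Both the `columns` mapping and the call's `if` guard rely on `select_first`, which returns the first defined element of an array. The `[ancestral_traits_to_infer, []]` fallback turns the optional `Array[String]?` into a plain array, so `length()` can be applied to it. The Python below mimics that guard, assuming an undefined WDL value maps to `None`. It illustrates the semantics and is not engine code.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def select_first(values: Sequence[Optional[T]]) -> T:
    """Mimic WDL select_first: return the first non-None element or fail."""
    for value in values:
        if value is not None:
            return value
    # WDL raises a runtime error when no element is defined.
    raise ValueError("select_first: no defined values")


def runs_ancestral_traits(ancestral_traits_to_infer: Optional[list]) -> bool:
    """Mirror: defined(x) && length(select_first([x, []])) > 0."""
    return (
        ancestral_traits_to_infer is not None
        and len(select_first([ancestral_traits_to_infer, []])) > 0
    )
```

In this sketch, `runs_ancestral_traits(None)` and `runs_ancestral_traits([])` are both `False`, while `runs_ancestral_traits(["country"])` is `True`.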

CALL TASKS `ancestral_tree`

Input Mappings (2)

Input	Value
`tree`	`refine_augur_tree.tree_refined`
`msa_or_vcf`	`msa_or_vcf`

CALL TASKS `translate_augur_tree`

Input Mappings (3)

Input	Value
`tree`	`refine_augur_tree.tree_refined`
`nt_muts`	`ancestral_tree.nt_muts_json`
`genbank_gb`	`genbank_gb`

CALL TASKS `assign_clades_to_nodes` (conditional: runs only when `clades_tsv` is provided)

Input Mappings (5)

Input	Value
`tree_nwk`	`refine_augur_tree.tree_refined`
`nt_muts_json`	`ancestral_tree.nt_muts_json`
`aa_muts_json`	`translate_augur_tree.aa_muts_json`
`ref_fasta`	`ref_fasta`
`clades_tsv`	`select_first([clades_tsv])`
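On its own, `select_first([clades_tsv])` would fail for an undefined file, but the surrounding `if(defined(clades_tsv))` block guarantees the file is set. To help prepare that file, the sketch below parses the four-column `clade`/`gene`/`site`/`alt` layout into a per-clade list of defining mutations. `parse_clades_tsv` is a hypothetical helper, not part of augur. The example rows are invented, and using `nuc` as the gene label for nucleotide positions is an assumption based on the Nextstrain clade-definition convention.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Tuple

Mutation = Tuple[str, int, str]  # (gene, site, alt)


def parse_clades_tsv(text: str) -> Dict[str, List[Mutation]]:
    """Group clade-defining mutations by clade from a clade/gene/site/alt TSV."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    expected = ["clade", "gene", "site", "alt"]
    if reader.fieldnames != expected:
        raise ValueError(f"expected header {expected}, got {reader.fieldnames}")
    clades: Dict[str, List[Mutation]] = defaultdict(list)
    for row in reader:
        clades[row["clade"]].append((row["gene"], int(row["site"]), row["alt"]))
    return dict(clades)


# Hypothetical example content (tab-separated, with a header row).
EXAMPLE = "clade\tgene\tsite\talt\nA\tnuc\t100\tT\nA\tnuc\t200\tG\nB\tS\t501\tY\n"
```

Here `parse_clades_tsv(EXAMPLE)["B"]` returns `[("S", 501, "Y")]`. The sketch covers only the basic four-column form and does not handle other features the clade file format may support.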

CALL TASKS `export_auspice_json`

Input Mappings (4)

Input	Value
`tree`	`refine_augur_tree.tree_refined`
`sample_metadata`	`sample_metadata`
`node_data_jsons`	`select_all([refine_augur_tree.branch_lengths, ancestral_traits.node_data_json, ancestral_tree.nt_muts_json, translate_augur_tree.aa_muts_json, assign_clades_to_nodes.node_clade_data_json])`
`auspice_config`	`auspice_config`
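Calls inside an `if` block expose their outputs as optional values outside the block. As a result, `ancestral_traits.node_data_json` and `assign_clades_to_nodes.node_clade_data_json` are undefined when those calls are skipped. `select_all` drops undefined elements, so `export_auspice_json` receives only the node-data files that exist. Below is a Python analogue in which `None` stands in for an undefined value and the file names are hypothetical.

```python
from typing import List, Optional, Sequence


def select_all(values: Sequence[Optional[str]]) -> List[str]:
    """Mimic WDL select_all: keep defined (non-None) elements in their original order."""
    return [value for value in values if value is not None]


def node_data_jsons(traits_json: Optional[str], clades_json: Optional[str]) -> List[str]:
    """Assemble the export inputs; the two optional entries come from conditional calls."""
    return select_all([
        "branch_lengths.json",  # refine_augur_tree.branch_lengths
        traits_json,            # ancestral_traits.node_data_json (None if skipped)
        "nt_muts.json",         # ancestral_tree.nt_muts_json
        "aa_muts.json",         # translate_augur_tree.aa_muts_json
        clades_json,            # assign_clades_to_nodes.node_clade_data_json (None if skipped)
    ])
```

When neither optional input is set, the sketch yields three files. When both are set, it yields all five, in the same order as the WDL array.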

Images

Container images used by tasks in this workflow:

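🐳 Parameterized image

Configured via each task's `docker` input. Every `docker` default listed in the inputs table is `docker.io/nextstrain/base:build-20240318T173028Z`.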

Used by 6 tasks:

refine_augur_tree
ancestral_tree
translate_augur_tree
export_auspice_json
ancestral_traits
assign_clades_to_nodes


Workflow Graph

```mermaid
flowchart TD
    Start([augur_from_mltree])
    N1["refine_augur_tree"]
    subgraph C1 ["↔️ if defined(ancestral_traits_to_infer) && length(select_first([ancestral_traits_to_infer, []])) > 0"]
        direction TB
        N2["ancestral_traits"]
    end
    N3["ancestral_tree"]
    N4["translate_augur_tree"]
    subgraph C2 ["↔️ if defined(clades_tsv)"]
        direction TB
        N5["assign_clades_to_nodes"]
    end
    N6["export_auspice_json"]
    N1 --> N2
    N1 --> N3
    N3 --> N4
    N1 --> N4
    N3 --> N5
    N1 --> N5
    N4 --> N5
    N5 --> N6
    N1 --> N6
    N4 --> N6
    N3 --> N6
    N2 --> N6
    Start --> N1
    N6 --> End([End])
    classDef taskNode fill:#a371f7,stroke:#8b5cf6,stroke-width:2px,color:#fff
    classDef workflowNode fill:#58a6ff,stroke:#1f6feb,stroke-width:2px,color:#fff
```
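The edges in the flowchart determine which calls could run concurrently. As an illustration, the Python below feeds those edges to the standard library's `graphlib.TopologicalSorter` (Python 3.9+) and groups the calls into dependency waves. It treats the two conditional calls as if they always run. Actual scheduling is up to the WDL engine.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Edges copied from the flowchart: each call maps to the calls it depends on.
DEPENDENCIES: Dict[str, Set[str]] = {
    "refine_augur_tree": set(),
    "ancestral_traits": {"refine_augur_tree"},
    "ancestral_tree": {"refine_augur_tree"},
    "translate_augur_tree": {"refine_augur_tree", "ancestral_tree"},
    "assign_clades_to_nodes": {"refine_augur_tree", "ancestral_tree", "translate_augur_tree"},
    "export_auspice_json": {
        "refine_augur_tree", "ancestral_traits", "ancestral_tree",
        "translate_augur_tree", "assign_clades_to_nodes",
    },
}


def execution_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group calls into waves; calls in the same wave do not depend on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

Under this sketch, `ancestral_traits` and `ancestral_tree` form the only wave with more than one call. Every other step waits on its predecessors.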

WDL Source Code

```wdl
version 1.0

import "../tasks/tasks_nextstrain.wdl" as nextstrain

workflow augur_from_mltree {
    meta {
        description: "Take a premade maximum likelihood tree (Newick format) and run the remainder of the augur pipeline (timetree modificaitons, ancestral inference, etc) and convert to json representation suitable for Nextstrain visualization. See https://nextstrain.org/docs/getting-started/ and https://nextstrain-augur.readthedocs.io/en/stable/"
        author: "Broad Viral Genomics"
        email:  "viral-ngs@broadinstitute.org"
    }

    input {
        File           raw_tree
        File           msa_or_vcf
        File           sample_metadata
        File           ref_fasta
        File           genbank_gb
        File           auspice_config
        File?          clades_tsv
        Array[String]? ancestral_traits_to_infer
    }

    parameter_meta {
        raw_tree: {
          description: "Maximum likelihood tree (newick format).",
          patterns: ["*.nwk", "*.newick"]
        }
        msa_or_vcf: {
          description: "Multiple sequence alignment (aligned fasta) or variants (vcf format).",
          patterns: ["*.fasta", "*.fa", "*.vcf", "*.vcf.gz"]
        }
        sample_metadata: {
          description: "Metadata in tab-separated text format. See https://nextstrain-augur.readthedocs.io/en/stable/faq/metadata.html for details.",
          patterns: ["*.txt", "*.tsv"]
        }
        ref_fasta: {
          description: "A reference assembly (not included in assembly_fastas) to align assembly_fastas against. Typically from NCBI RefSeq or similar.",
          patterns: ["*.fasta", "*.fa"]
        }
        genbank_gb: {
          description: "A 'genbank' formatted gene annotation file that is used to calculate coding consequences of observed mutations. Must correspond to the same coordinate space as ref_fasta. Typically downloaded from the same NCBI accession number as ref_fasta.",
          patterns: ["*.gb", "*.gbf"]
        }
        ancestral_traits_to_infer: {
          description: "A list of metadata traits to use for ancestral node inference (see https://nextstrain-augur.readthedocs.io/en/stable/usage/cli/traits.html). Multiple traits may be specified; must correspond exactly to column headers in metadata file. Omitting these values will skip ancestral trait inference, and ancestral nodes will not have estimated values for metadata."
        }
        auspice_config: {
          description: "A file specifying options to customize the auspice export; see: https://nextstrain.github.io/auspice/customise-client/introduction",
          patterns: ["*.json", "*.txt"]
        }
        clades_tsv: {
          description: "A TSV file containing clade mutation positions in four columns: [clade  gene    site    alt]; see: https://nextstrain.org/docs/tutorials/defining-clades",
          patterns: ["*.tsv", "*.txt"]
        }
    }

    call nextstrain.refine_augur_tree {
        input:
            raw_tree   = raw_tree,
            msa_or_vcf = msa_or_vcf,
            metadata   = sample_metadata
    }
    if(defined(ancestral_traits_to_infer) && length(select_first([ancestral_traits_to_infer,[]]))>0) {
        call nextstrain.ancestral_traits {
            input:
                tree     = refine_augur_tree.tree_refined,
                metadata = sample_metadata,
                columns  = select_first([ancestral_traits_to_infer,[]])
        }
    }
    call nextstrain.ancestral_tree {
        input:
            tree       = refine_augur_tree.tree_refined,
            msa_or_vcf = msa_or_vcf
    }
    call nextstrain.translate_augur_tree {
        input:
            tree       = refine_augur_tree.tree_refined,
            nt_muts    = ancestral_tree.nt_muts_json,
            genbank_gb = genbank_gb
    }
    if(defined(clades_tsv)) {
        call nextstrain.assign_clades_to_nodes {
            input:
                tree_nwk     = refine_augur_tree.tree_refined,
                nt_muts_json = ancestral_tree.nt_muts_json,
                aa_muts_json = translate_augur_tree.aa_muts_json,
                ref_fasta    = ref_fasta,
                clades_tsv   = select_first([clades_tsv])
        }
    }
    call nextstrain.export_auspice_json {
        input:
            tree            = refine_augur_tree.tree_refined,
            sample_metadata = sample_metadata,
            node_data_jsons = select_all([
                                refine_augur_tree.branch_lengths,
                                ancestral_traits.node_data_json,
                                ancestral_tree.nt_muts_json,
                                translate_augur_tree.aa_muts_json,
                                assign_clades_to_nodes.node_clade_data_json]),
            auspice_config  = auspice_config
    }

    output {
        File time_tree          = refine_augur_tree.tree_refined
        File auspice_input_json = export_auspice_json.virus_json
    }
}
```
