augur_from_msa_with_subsampler
pipes/WDL/workflows/augur_from_msa_with_subsampler.wdl

File Path: pipes/WDL/workflows/augur_from_msa_with_subsampler.wdl
WDL Version: 1.0
Type: workflow

Imports

Namespace Path
interhost ../tasks/tasks_interhost.wdl
nextstrain ../tasks/tasks_nextstrain.wdl
reports ../tasks/tasks_reports.wdl
utils ../tasks/tasks_utils.wdl

Workflow: augur_from_msa_with_subsampler

Build trees, and convert to json representation suitable for Nextstrain visualization. See https://nextstrain.org/docs/getting-started/ and https://nextstrain-augur.readthedocs.io/en/stable/

Author: Broad Viral Genomics
viral-ngs@broadinstitute.org

Inputs

Name Type Description Default
aligned_msa_fasta File Multiple sequence alignment (aligned fasta). -
sample_metadata Array[File]+ Metadata in tab-separated text format. See https://nextstrain-augur.readthedocs.io/en/stable/faq/metadata.html for details. At least one tab file must be provided--if multiple are provided, they will be joined via a full left outer join using the 'strain' column as the join ID. -
ref_fasta File? A reference assembly (not included in assembly_fastas) to align assembly_fastas against. Typically from NCBI RefSeq or similar. -
genbank_gb File? A 'genbank' formatted gene annotation file that is used to calculate coding consequences of observed mutations. Must correspond to the same coordinate space as ref_fasta. Typically downloaded from the same NCBI accession number as ref_fasta. -
auspice_config File A file specifying options to customize the auspice export; see: https://nextstrain.github.io/auspice/customise-client/introduction -
clades_tsv File? A TSV file containing clade mutation positions in four columns: [clade gene site alt]; see: https://nextstrain.org/docs/tutorials/defining-clades -
ancestral_traits_to_infer Array[String]? A list of metadata traits to use for ancestral node inference (see https://nextstrain-augur.readthedocs.io/en/stable/usage/cli/traits.html). Multiple traits may be specified; must correspond exactly to column headers in metadata file. Omitting these values will skip ancestral trait inference, and ancestral nodes will not have estimated values for metadata. -
mask_bed File? Optional list of sites to mask when building trees. -
case_data File - -
id_column String - -
geo_column String - -
keep_file File? - -
remove_file File? - -
filter_file File? - -
seed_num Int? - -
start_date String? - -
end_date String? - -
exclude_sites File? - -
vcf_reference File? - -
tree_builder_args String? - -
gen_per_year Int? - -
clock_rate Float? - -
clock_std_dev Float? - -
root String? - -
covariance Boolean? - -
precision Int? - -
branch_length_inference String? - -
coalescent String? - -
vcf_reference File? - -
weights File? - -
sampling_bias_correction Float? - -
min_date Float? - -
max_date Float? - -
pivot_interval Int? - -
pivot_interval_units String? - -
narrow_bandwidth Float? - -
wide_bandwidth Float? - -
proportion_wide Float? - -
minimal_frequency Float? - -
stiffness Float? - -
inertia Float? - -
vcf_reference File? - -
root_sequence File? - -
output_vcf File? - -
genes File? - -
vcf_reference_output File? - -
vcf_reference File? - -
lat_longs_tsv File? - -
colors_tsv File? - -
geo_resolutions Array[String]? - -
color_by_metadata Array[String]? - -
description_md File? - -
maintainers Array[String]? - -
title String? - -
An additional 54 optional inputs with default values are not listed here.

Outputs

Name Type Expression
selected_metadata File subsample_by_cases.selected_metadata
sampling_stats_file File subsample_by_cases.sampling_stats
masked_subsampled_msa File masked_sequences
ml_tree File draft_augur_tree.aligned_tree
time_tree File refine_augur_tree.tree_refined
node_data_jsons Array[File] select_all([refine_augur_tree.branch_lengths, ancestral_traits.node_data_json, ancestral_tree.nt_muts_json, translate_augur_tree.aa_muts_json, assign_clades_to_nodes.node_clade_data_json])
auspice_input_json File export_auspice_json.virus_json
tip_frequencies_json File tip_frequencies.node_data_json
root_sequence_json File export_auspice_json.root_sequence_json

Calls

This workflow calls the following tasks or subworkflows:

CALL TASKS tsv_join

Input Mappings (4)
Input Value
input_tsvs sample_metadata
id_col 'strain'
out_basename "metadata-merged"
out_suffix ".txt.zst"

CALL TASKS subsample_by_cases

Input Mappings (1)
Input Value
metadata select_first(flatten([[tsv_join.out_tsv], sample_metadata]))
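
Per the workflow source, tsv_join is a conditional call that only runs when more than one metadata file is supplied, so tsv_join.out_tsv is an optional value. The expression above therefore prefers the merged file when it exists and otherwise falls back to the first (and only) entry of sample_metadata. A minimal sketch of that idiom follows; the workflow name and the merged_tsv input are hypothetical stand-ins (merged_tsv plays the role of tsv_join.out_tsv), not part of the real workflow.

```wdl
version 1.0

# Illustrative sketch only: names here are hypothetical.
workflow pick_metadata_sketch {
    input {
        File?        merged_tsv       # stands in for the optional tsv_join.out_tsv
        Array[File]+ sample_metadata  # at least one metadata file
    }

    # [[merged_tsv], sample_metadata] is an array of two arrays.
    # flatten() concatenates them into one Array[File?], with merged_tsv first.
    # select_first() returns the first defined element, so the merged file wins
    # when present; otherwise the first entry of sample_metadata is used.
    File metadata = select_first(flatten([[merged_tsv], sample_metadata]))

    output {
        File chosen_metadata = metadata
    }
}
```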

CALL TASKS filter_sequences_to_list

Input Mappings (2)
Input Value
sequences aligned_msa_fasta
keep_list [subsample_by_cases.selected_sequences]

CALL TASKS augur_mask_sites

Input Mappings (2)
Input Value
sequences filter_sequences_to_list.filtered_fasta
mask_bed mask_bed
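
Per the workflow source, augur_mask_sites runs only when mask_bed is defined, and downstream calls read a masked_sequences value that falls back to the filtered alignment when masking was skipped. The sketch below shows that pattern in isolation. The mask_sketch task is a hypothetical placeholder that copies its input through unchanged; it is not the real augur_mask_sites task, and the docker image is an arbitrary assumption.

```wdl
version 1.0

# Hypothetical placeholder task; it does not perform any real masking.
task mask_sketch {
    input {
        File sequences
        File mask_bed
    }
    command <<<
        cp "~{sequences}" masked.fasta
    >>>
    output {
        File masked_sequences = "masked.fasta"
    }
    runtime {
        docker: "ubuntu:22.04"  # assumption: any image that provides cp
    }
}

workflow optional_mask_sketch {
    input {
        File  sequences
        File? mask_bed
    }

    # The call runs only when the optional input was supplied.
    if (defined(mask_bed)) {
        call mask_sketch {
            input:
                sequences = sequences,
                # select_first([x]) unwraps File? to File inside the guard
                mask_bed  = select_first([mask_bed])
        }
    }

    # Outside the if block the call output has type File?, so fall back to
    # the unmasked input when the call was skipped.
    File final_sequences = select_first([mask_sketch.masked_sequences, sequences])

    output {
        File out = final_sequences
    }
}
```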

CALL TASKS draft_augur_tree

Input Mappings (1)
Input Value
msa_or_vcf masked_sequences

CALL TASKS refine_augur_tree

Input Mappings (3)
Input Value
raw_tree draft_augur_tree.aligned_tree
msa_or_vcf masked_sequences
metadata subsample_by_cases.selected_metadata

CALL TASKS ancestral_traits

Input Mappings (3)
Input Value
tree refine_augur_tree.tree_refined
metadata subsample_by_cases.selected_metadata
columns select_first([ancestral_traits_to_infer, []])
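
The columns mapping above substitutes an empty array when ancestral_traits_to_infer is omitted, and the workflow source guards the call so that it runs only when the list is defined and non-empty. A minimal sketch of that defaulting idiom, using a hypothetical workflow name and a hypothetical traits input:

```wdl
version 1.0

# Illustrative sketch only: names here are hypothetical.
workflow default_empty_sketch {
    input {
        Array[String]? traits  # stands in for ancestral_traits_to_infer
    }

    # An omitted input becomes an empty array rather than an undefined value.
    Array[String] traits_or_empty = select_first([traits, []])

    # True only when at least one trait was actually requested.
    Boolean run_traits = length(traits_or_empty) > 0

    output {
        Boolean should_run = run_traits
    }
}
```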

CALL TASKS tip_frequencies

Input Mappings (2)
Input Value
tree refine_augur_tree.tree_refined
metadata subsample_by_cases.selected_metadata

CALL TASKS ancestral_tree

Input Mappings (2)
Input Value
tree refine_augur_tree.tree_refined
msa_or_vcf masked_sequences

CALL TASKS translate_augur_tree

Input Mappings (3)
Input Value
tree refine_augur_tree.tree_refined
nt_muts ancestral_tree.nt_muts_json
genbank_gb select_first([genbank_gb])

CALL TASKS assign_clades_to_nodes

Input Mappings (5)
Input Value
tree_nwk refine_augur_tree.tree_refined
nt_muts_json ancestral_tree.nt_muts_json
aa_muts_json translate_augur_tree.aa_muts_json
ref_fasta select_first([ref_fasta])
clades_tsv select_first([clades_tsv])

CALL TASKS export_auspice_json

Input Mappings (4)
Input Value
tree refine_augur_tree.tree_refined
sample_metadata subsample_by_cases.selected_metadata
node_data_jsons select_all([refine_augur_tree.branch_lengths, ancestral_traits.node_data_json, ancestral_tree.nt_muts_json, translate_augur_tree.aa_muts_json, assign_clades_to_nodes.node_clade_data_json])
auspice_config auspice_config
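
Several of the calls feeding node_data_jsons are conditional in the workflow source (ancestral_traits, translate_augur_tree, assign_clades_to_nodes), so their outputs are optional. select_all drops the undefined entries and yields a plain Array[File]. A minimal sketch with hypothetical input names:

```wdl
version 1.0

# Illustrative sketch only: names here are hypothetical.
workflow select_all_sketch {
    input {
        File  always_present  # e.g. an output of an unconditional call
        File? maybe_present   # e.g. an output of a call inside an if block
    }

    # select_all() keeps only the defined elements, so the result has one
    # entry when maybe_present is omitted and two entries otherwise.
    Array[File] present = select_all([always_present, maybe_present])

    output {
        Array[File] files = present
    }
}
```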

Images

Container images used by tasks in this workflow:

Parameterized image (configured via input: docker)

Used by 1 task:
  • subsample_by_cases

Parameterized image (configured via input: docker)

Used by 2 tasks:
  • filter_sequences_to_list
  • tsv_join

Parameterized image (configured via input: docker)

Used by 9 tasks:
  • draft_augur_tree
  • refine_augur_tree
  • tip_frequencies
  • ancestral_tree
  • export_auspice_json
  • augur_mask_sites
  • ancestral_traits
  • translate_augur_tree
  • assign_clades_to_nodes

augur_from_msa_with_subsampler - WDL Source Code

version 1.0

import "../tasks/tasks_interhost.wdl" as interhost
import "../tasks/tasks_nextstrain.wdl" as nextstrain
import "../tasks/tasks_reports.wdl" as reports
import "../tasks/tasks_utils.wdl" as utils

workflow augur_from_msa_with_subsampler {
    meta {
        description: "Build trees, and convert to json representation suitable for Nextstrain visualization. See https://nextstrain.org/docs/getting-started/ and https://nextstrain-augur.readthedocs.io/en/stable/"
        author: "Broad Viral Genomics"
        email:  "viral-ngs@broadinstitute.org"
        allowNestedInputs: true
    }

    input {
        File           aligned_msa_fasta
        Array[File]+   sample_metadata
        File?          ref_fasta
        File?          genbank_gb
        File           auspice_config
        File?          clades_tsv
        Array[String]? ancestral_traits_to_infer
        File?          mask_bed
    }

    parameter_meta {
        aligned_msa_fasta: {
          description: "Multiple sequence alignment (aligned fasta).",
          patterns: ["*.fasta", "*.fa", "*.fasta.gz", "*.fa.gz", "*.fasta.zst", "*.fa.zst"]
        }
        sample_metadata: {
          description: "Metadata in tab-separated text format. See https://nextstrain-augur.readthedocs.io/en/stable/faq/metadata.html for details. At least one tab file must be provided--if multiple are provided, they will be joined via a full left outer join using the 'strain' column as the join ID.",
          patterns: ["*.txt", "*.tsv", "*.txt.gz", "*.txt.zst", "*.tsv.gz", "*.tsv.zst"]
        }
        ref_fasta: {
          description: "A reference assembly (not included in assembly_fastas) to align assembly_fastas against. Typically from NCBI RefSeq or similar.",
          patterns: ["*.fasta", "*.fa"]
        }
        genbank_gb: {
          description: "A 'genbank' formatted gene annotation file that is used to calculate coding consequences of observed mutations. Must correspond to the same coordinate space as ref_fasta. Typically downloaded from the same NCBI accession number as ref_fasta.",
          patterns: ["*.gb", "*.gbf"]
        }
        ancestral_traits_to_infer: {
          description: "A list of metadata traits to use for ancestral node inference (see https://nextstrain-augur.readthedocs.io/en/stable/usage/cli/traits.html). Multiple traits may be specified; must correspond exactly to column headers in metadata file. Omitting these values will skip ancestral trait inference, and ancestral nodes will not have estimated values for metadata."
        }
        auspice_config: {
          description: "A file specifying options to customize the auspice export; see: https://nextstrain.github.io/auspice/customise-client/introduction",
          patterns: ["*.json", "*.txt"]
        }
        clades_tsv: {
          description: "A TSV file containing clade mutation positions in four columns: [clade  gene    site    alt]; see: https://nextstrain.org/docs/tutorials/defining-clades",
          patterns: ["*.tsv", "*.txt"]
        }
        mask_bed: {
          description: "Optional list of sites to mask when building trees.",
          patterns: ["*.bed"]
        }
    }

    # merge tsvs if necessary
    if(length(sample_metadata)>1) {
        call utils.tsv_join {
            input:
                input_tsvs   = sample_metadata,
                id_col       = 'strain',
                out_basename = "metadata-merged",
                out_suffix   = ".txt.zst"
        }
    }

    # subsample and filter genomic data based on epi case data
    call interhost.subsample_by_cases {
        input:
            metadata = select_first(flatten([[tsv_join.out_tsv], sample_metadata]))
    }
    call nextstrain.filter_sequences_to_list {
        input:
            sequences = aligned_msa_fasta,
            keep_list = [subsample_by_cases.selected_sequences]
    }

    # standard augur pipeline
    if(defined(mask_bed)) {
        call nextstrain.augur_mask_sites {
            input:
                sequences = filter_sequences_to_list.filtered_fasta,
                mask_bed  = mask_bed
        }
    }
    File masked_sequences = select_first([augur_mask_sites.masked_sequences, filter_sequences_to_list.filtered_fasta])
    call nextstrain.draft_augur_tree {
        input:
            msa_or_vcf = masked_sequences
    }
    call nextstrain.refine_augur_tree {
        input:
            raw_tree   = draft_augur_tree.aligned_tree,
            msa_or_vcf = masked_sequences,
            metadata   = subsample_by_cases.selected_metadata
    }
    if(defined(ancestral_traits_to_infer) && length(select_first([ancestral_traits_to_infer,[]]))>0) {
        call nextstrain.ancestral_traits {
            input:
                tree     = refine_augur_tree.tree_refined,
                metadata = subsample_by_cases.selected_metadata,
                columns  = select_first([ancestral_traits_to_infer,[]])
        }
    }
    call nextstrain.tip_frequencies {
        input:
            tree     = refine_augur_tree.tree_refined,
            metadata = subsample_by_cases.selected_metadata
    }
    call nextstrain.ancestral_tree {
        input:
            tree       = refine_augur_tree.tree_refined,
            msa_or_vcf = masked_sequences
    }
    if(defined(genbank_gb)) {
        call nextstrain.translate_augur_tree {
            input:
                tree       = refine_augur_tree.tree_refined,
                nt_muts    = ancestral_tree.nt_muts_json,
                genbank_gb = select_first([genbank_gb])
        }
    }
    if(defined(clades_tsv) && defined(ref_fasta)) {
        call nextstrain.assign_clades_to_nodes {
            input:
                tree_nwk     = refine_augur_tree.tree_refined,
                nt_muts_json = ancestral_tree.nt_muts_json,
                aa_muts_json = translate_augur_tree.aa_muts_json,
                ref_fasta    = select_first([ref_fasta]),
                clades_tsv   = select_first([clades_tsv])
        }
    }
    call nextstrain.export_auspice_json {
        input:
            tree            = refine_augur_tree.tree_refined,
            sample_metadata = subsample_by_cases.selected_metadata,
            node_data_jsons = select_all([
                                refine_augur_tree.branch_lengths,
                                ancestral_traits.node_data_json,
                                ancestral_tree.nt_muts_json,
                                translate_augur_tree.aa_muts_json,
                                assign_clades_to_nodes.node_clade_data_json]),
            auspice_config  = auspice_config
    }

    output {
        File        selected_metadata     = subsample_by_cases.selected_metadata
        File        sampling_stats_file   = subsample_by_cases.sampling_stats

        File        masked_subsampled_msa = masked_sequences
        
        File        ml_tree               = draft_augur_tree.aligned_tree
        File        time_tree             = refine_augur_tree.tree_refined
        
        Array[File] node_data_jsons       = select_all([
                    refine_augur_tree.branch_lengths,
                    ancestral_traits.node_data_json,
                    ancestral_tree.nt_muts_json,
                    translate_augur_tree.aa_muts_json,
                    assign_clades_to_nodes.node_clade_data_json])

        File        auspice_input_json    = export_auspice_json.virus_json
        File        tip_frequencies_json  = tip_frequencies.node_data_json
        File        root_sequence_json    = export_auspice_json.root_sequence_json
    }
}