augur_from_msa_with_subsampler

WORKFLOW augur_from_msa_with_subsampler

File Path	`pipes/WDL/workflows/augur_from_msa_with_subsampler.wdl`
WDL Version	1.0
Type	workflow

Imports

Namespace	Path
`interhost`	`../tasks/tasks_interhost.wdl`
`nextstrain`	`../tasks/tasks_nextstrain.wdl`
`reports`	`../tasks/tasks_reports.wdl`
`utils`	`../tasks/tasks_utils.wdl`

Workflow: augur_from_msa_with_subsampler

Build trees and convert them to a JSON representation suitable for Nextstrain visualization. See https://nextstrain.org/docs/getting-started/ and https://nextstrain-augur.readthedocs.io/en/stable/

Author: Broad Viral Genomics

viral-ngs@broadinstitute.org

Inputs

Name	Type	Description	Default
`aligned_msa_fasta`	`File`	Multiple sequence alignment (aligned fasta).	-
`sample_metadata`	`Array[File]+`	Metadata in tab-separated text format. See https://nextstrain-augur.readthedocs.io/en/stable/faq/metadata.html for details. At least one tab-separated file must be provided; if multiple are provided, they will be joined via a full left outer join using the 'strain' column as the join ID.	-
`ref_fasta`	`File?`	A reference assembly, typically from NCBI RefSeq or similar. In this workflow it is passed to `assign_clades_to_nodes`, which runs only when both `ref_fasta` and `clades_tsv` are defined.	-
`genbank_gb`	`File?`	A 'genbank' formatted gene annotation file that is used to calculate coding consequences of observed mutations. Must correspond to the same coordinate space as ref_fasta. Typically downloaded from the same NCBI accession number as ref_fasta.	-
`auspice_config`	`File`	A file specifying options to customize the auspice export; see: https://nextstrain.github.io/auspice/customise-client/introduction	-
`clades_tsv`	`File?`	A TSV file containing clade mutation positions in four columns: [clade gene site alt]; see: https://nextstrain.org/docs/tutorials/defining-clades	-
`ancestral_traits_to_infer`	`Array[String]?`	A list of metadata traits to use for ancestral node inference (see https://nextstrain-augur.readthedocs.io/en/stable/usage/cli/traits.html). Multiple traits may be specified; must correspond exactly to column headers in metadata file. Omitting these values will skip ancestral trait inference, and ancestral nodes will not have estimated values for metadata.	-
`mask_bed`	`File?`	Optional list of sites to mask when building trees.	-
`case_data`	`File`	-	-
`id_column`	`String`	-	-
`geo_column`	`String`	-	-
`keep_file`	`File?`	-	-
`remove_file`	`File?`	-	-
`filter_file`	`File?`	-	-
`seed_num`	`Int?`	-	-
`start_date`	`String?`	-	-
`end_date`	`String?`	-	-
`exclude_sites`	`File?`	-	-
`vcf_reference`	`File?`	-	-
`tree_builder_args`	`String?`	-	-
`gen_per_year`	`Int?`	-	-
`clock_rate`	`Float?`	-	-
`clock_std_dev`	`Float?`	-	-
`root`	`String?`	-	-
`covariance`	`Boolean?`	-	-
`precision`	`Int?`	-	-
`branch_length_inference`	`String?`	-	-
`coalescent`	`String?`	-	-
`vcf_reference`	`File?`	-	-
`weights`	`File?`	-	-
`sampling_bias_correction`	`Float?`	-	-
`min_date`	`Float?`	-	-
`max_date`	`Float?`	-	-
`pivot_interval`	`Int?`	-	-
`pivot_interval_units`	`String?`	-	-
`narrow_bandwidth`	`Float?`	-	-
`wide_bandwidth`	`Float?`	-	-
`proportion_wide`	`Float?`	-	-
`minimal_frequency`	`Float?`	-	-
`stiffness`	`Float?`	-	-
`inertia`	`Float?`	-	-
`vcf_reference`	`File?`	-	-
`root_sequence`	`File?`	-	-
`output_vcf`	`File?`	-	-
`genes`	`File?`	-	-
`vcf_reference_output`	`File?`	-	-
`vcf_reference`	`File?`	-	-
`lat_longs_tsv`	`File?`	-	-
`colors_tsv`	`File?`	-	-
`geo_resolutions`	`Array[String]?`	-	-
`color_by_metadata`	`Array[String]?`	-	-
`description_md`	`File?`	-	-
`maintainers`	`Array[String]?`	-	-
`title`	`String?`	-	-
54 optional inputs with default values
`prefer_first`	`Boolean`	-	true
`machine_mem_gb`	`Int`	-	7
`date_column`	`String`	-	"date"
`unit`	`String`	-	"week"
`baseline`	`Float`	-	0.0001
`docker`	`String`	-	"quay.io/broadinstitute/subsampler"
`machine_mem_gb`	`Int`	-	30
`out_fname`	`String`	-	sub(sub(basename(sequences,".zst"),".vcf",".filtered.vcf"),".fasta$",".filtered.fasta")
`docker`	`String`	-	"quay.io/broadinstitute/viral-core:2.5.1"
`disk_size`	`Int`	-	750
`docker`	`String`	-	"docker.io/nextstrain/base:build-20240318T173028Z"
`disk_size`	`Int`	-	750
`method`	`String`	-	"iqtree"
`substitution_model`	`String`	-	"GTR"
`cpus`	`Int`	-	64
`machine_mem_gb`	`Int`	-	32
`docker`	`String`	-	"docker.io/nextstrain/base:build-20240318T173028Z"
`disk_size`	`Int`	-	1250
`generate_timetree`	`Boolean`	-	true
`keep_root`	`Boolean`	-	true
`keep_polytomies`	`Boolean`	-	false
`date_confidence`	`Boolean`	-	true
`date_inference`	`String?`	-	"marginal"
`clock_filter_iqd`	`Int?`	-	4
`divergence_units`	`String?`	-	"mutations"
`docker`	`String`	-	"docker.io/nextstrain/base:build-20240318T173028Z"
`disk_size`	`Int`	-	750
`machine_mem_gb`	`Int`	-	75
`confidence`	`Boolean`	-	true
`machine_mem_gb`	`Int`	-	32
`docker`	`String`	-	"docker.io/nextstrain/base:build-20240318T173028Z"
`disk_size`	`Int`	-	750
`method`	`String`	-	"kde"
`censored`	`Boolean`	-	false
`include_internal_nodes`	`Boolean`	-	false
`machine_mem_gb`	`Int`	-	64
`docker`	`String`	-	"docker.io/nextstrain/base:build-20240318T173028Z"
`out_basename`	`String`	-	basename(tree,'.nwk')
`disk_size`	`Int`	-	200
`inference`	`String`	-	"joint"
`keep_ambiguous`	`Boolean`	-	false
`infer_ambiguous`	`Boolean`	-	false
`keep_overhangs`	`Boolean`	-	false
`docker`	`String`	-	"docker.io/nextstrain/base:build-20240318T173028Z"
`disk_size`	`Int`	-	300
`docker`	`String`	-	"docker.io/nextstrain/base:build-20240318T173028Z"
`disk_size`	`Int`	-	300
`docker`	`String`	-	"docker.io/nextstrain/base:build-20240318T173028Z"
`disk_size`	`Int`	-	300
`include_root_sequence`	`Boolean`	-	true
`out_basename`	`String`	-	basename(basename(tree,".nwk"),"_timetree")
`machine_mem_gb`	`Int`	-	64
`docker`	`String`	-	"docker.io/nextstrain/base:build-20240318T173028Z"
`disk_size`	`Int`	-	300
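
As a rough illustration of how the required inputs above might be assembled into an inputs JSON, here is a minimal Python sketch. It assumes the common `<workflow>.<input>` key convention and assumes that `case_data`, `id_column`, and `geo_column` are nested inputs of the `subsample_by_cases` call; the table above does not say which task they belong to. The helper `build_inputs` and all file names and column values are hypothetical.

```python
import json

WORKFLOW = "augur_from_msa_with_subsampler"


def build_inputs(msa, metadata_files, auspice_config,
                 case_data, id_column, geo_column, **optional):
    """Return an inputs dict keyed as '<workflow>.<input>' (hypothetical helper)."""
    if not metadata_files:
        # sample_metadata is Array[File]+, so an empty list is invalid.
        raise ValueError("sample_metadata needs at least one file")
    inputs = {
        f"{WORKFLOW}.aligned_msa_fasta": msa,
        f"{WORKFLOW}.sample_metadata": list(metadata_files),
        f"{WORKFLOW}.auspice_config": auspice_config,
        # Assumption: these three are nested inputs of subsample_by_cases.
        f"{WORKFLOW}.subsample_by_cases.case_data": case_data,
        f"{WORKFLOW}.subsample_by_cases.id_column": id_column,
        f"{WORKFLOW}.subsample_by_cases.geo_column": geo_column,
    }
    # Optional workflow-level inputs such as mask_bed or genbank_gb.
    for name, value in optional.items():
        inputs[f"{WORKFLOW}.{name}"] = value
    return inputs


# All paths and column names below are placeholders.
example = build_inputs(
    msa="aligned.fasta",
    metadata_files=["metadata.tsv"],
    auspice_config="auspice_config.json",
    case_data="cases.tsv",
    id_column="strain",
    geo_column="division",
    mask_bed="mask.bed",
)
print(json.dumps(example, indent=2))
```

Whether nested keys are accepted depends on the execution engine honoring `allowNestedInputs`, which this workflow sets in its `meta` block.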

Outputs

Name	Type	Expression
`selected_metadata`	`File`	`subsample_by_cases.selected_metadata`
`sampling_stats_file`	`File`	`subsample_by_cases.sampling_stats`
`masked_subsampled_msa`	`File`	`masked_sequences`
`ml_tree`	`File`	`draft_augur_tree.aligned_tree`
`time_tree`	`File`	`refine_augur_tree.tree_refined`
`node_data_jsons`	`Array[File]`	`select_all([refine_augur_tree.branch_lengths, ancestral_traits.node_data_json, ancestral_tree.nt_muts_json, translate_augur_tree.aa_muts_json, assign_clades_to_nodes.node_clade_data_json])`
`auspice_input_json`	`File`	`export_auspice_json.virus_json`
`tip_frequencies_json`	`File`	`tip_frequencies.node_data_json`
`root_sequence_json`	`File`	`export_auspice_json.root_sequence_json`

Calls

This workflow calls the following tasks or subworkflows:

CALL TASKS `tsv_join` ↗

Input Mappings (4)

Input	Value
`input_tsvs`	`sample_metadata`
`id_col`	`'strain'`
`out_basename`	`"metadata-merged"`
`out_suffix`	`".txt.zst"`

CALL TASKS `subsample_by_cases` ↗

Input Mappings (1)

Input	Value
`metadata`	`select_first(flatten([[tsv_join.out_tsv], sample_metadata]))`
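
The `metadata` expression above picks the merged table when `tsv_join` ran and otherwise falls back to the first (and only) metadata file. The following Python sketch mimics that behavior, modeling an undefined WDL optional as `None`; the function names mirror the WDL built-ins but are local stand-ins, not part of any library.

```python
def select_first(values):
    """Return the first value that is not None, like WDL's select_first."""
    for value in values:
        if value is not None:
            return value
    raise ValueError("select_first: no defined value")


def flatten(nested):
    """Flatten one level of nesting, like WDL's flatten."""
    return [item for inner in nested for item in inner]


def choose_metadata(tsv_join_out, sample_metadata):
    """Mirror select_first(flatten([[tsv_join.out_tsv], sample_metadata]))."""
    return select_first(flatten([[tsv_join_out], sample_metadata]))


# tsv_join is skipped for a single file, so its output is undefined (None).
single = choose_metadata(None, ["metadata.tsv"])
# With several files, the merged output comes first and wins.
merged = choose_metadata("metadata-merged.txt.zst", ["a.tsv", "b.tsv"])
print(single, merged)
```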

CALL TASKS `filter_sequences_to_list` ↗

Input Mappings (2)

Input	Value
`sequences`	`aligned_msa_fasta`
`keep_list`	`[subsample_by_cases.selected_sequences]`

CALL TASKS `augur_mask_sites` ↗

Input Mappings (2)

Input	Value
`sequences`	`filter_sequences_to_list.filtered_fasta`
`mask_bed`	`mask_bed`

CALL TASKS `draft_augur_tree` ↗

Input Mappings (1)

Input	Value
`msa_or_vcf`	`masked_sequences`

CALL TASKS `refine_augur_tree` ↗

Input Mappings (3)

Input	Value
`raw_tree`	`draft_augur_tree.aligned_tree`
`msa_or_vcf`	`masked_sequences`
`metadata`	`subsample_by_cases.selected_metadata`

CALL TASKS `ancestral_traits` ↗

Input Mappings (3)

Input	Value
`tree`	`refine_augur_tree.tree_refined`
`metadata`	`subsample_by_cases.selected_metadata`
`columns`	`select_first([ancestral_traits_to_infer, []])`

CALL TASKS `tip_frequencies` ↗

Input Mappings (2)

Input	Value
`tree`	`refine_augur_tree.tree_refined`
`metadata`	`subsample_by_cases.selected_metadata`

CALL TASKS `ancestral_tree` ↗

Input Mappings (2)

Input	Value
`tree`	`refine_augur_tree.tree_refined`
`msa_or_vcf`	`masked_sequences`

CALL TASKS `translate_augur_tree` ↗

Input Mappings (3)

Input	Value
`tree`	`refine_augur_tree.tree_refined`
`nt_muts`	`ancestral_tree.nt_muts_json`
`genbank_gb`	`select_first([genbank_gb])`

CALL TASKS `assign_clades_to_nodes` ↗

Input Mappings (5)

Input	Value
`tree_nwk`	`refine_augur_tree.tree_refined`
`nt_muts_json`	`ancestral_tree.nt_muts_json`
`aa_muts_json`	`translate_augur_tree.aa_muts_json`
`ref_fasta`	`select_first([ref_fasta])`
`clades_tsv`	`select_first([clades_tsv])`

CALL TASKS `export_auspice_json` ↗

Input Mappings (4)

Input	Value
`tree`	`refine_augur_tree.tree_refined`
`sample_metadata`	`subsample_by_cases.selected_metadata`
`node_data_jsons`	`select_all([refine_augur_tree.branch_lengths, ancestral_traits.node_data_json, ancestral_tree.nt_muts_json, translate_augur_tree.aa_muts_json, assign_clades_to_nodes.node_clade_data_json])`
`auspice_config`	`auspice_config`
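
Because three of the five node data sources come from conditional calls, `select_all` drops whichever ones did not run. A small Python sketch of that filtering, again using `None` for an undefined optional (the file names are placeholders):

```python
def select_all(values):
    """Keep only defined values, like WDL's select_all."""
    return [value for value in values if value is not None]


def node_data_jsons(branch_lengths, nt_muts,
                    traits=None, aa_muts=None, clades=None):
    """Mirror the order used in the node_data_jsons expression."""
    return select_all([branch_lengths, traits, nt_muts, aa_muts, clades])


# Only the unconditional calls ran.
minimal = node_data_jsons("branch_lengths.json", "nt_muts.json")
# translate_augur_tree also ran because genbank_gb was provided.
with_aa = node_data_jsons("branch_lengths.json", "nt_muts.json",
                          aa_muts="aa_muts.json")
print(minimal, with_aa)
```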

Images

Container images used by tasks in this workflow:

Image	Configured via input	Used by
Parameterized	`docker`	`subsample_by_cases`
Parameterized	`docker`	`filter_sequences_to_list`, `tsv_join`
Parameterized	`docker`	`draft_augur_tree`, `refine_augur_tree`, `tip_frequencies`, `ancestral_tree`, `export_auspice_json`, `augur_mask_sites`, `ancestral_traits`, `translate_augur_tree`, `assign_clades_to_nodes`


flowchart TD
    Start([augur_from_msa_with_subsampler])
    subgraph C1 ["↔️ if length(sample_metadata) > 1"]
        direction TB
        N1["tsv_join"]
    end
    N2["subsample_by_cases"]
    N3["filter_sequences_to_list"]
    subgraph C2 ["↔️ if defined(mask_bed)"]
        direction TB
        N4["augur_mask_sites"]
    end
    N5["draft_augur_tree"]
    N6["refine_augur_tree"]
    subgraph C3 ["↔️ if defined(ancestral_traits_to_infer) && length(select_first([ancestral_traits_to_infer, []])) > 0"]
        direction TB
        N7["ancestral_traits"]
    end
    N8["tip_frequencies"]
    N9["ancestral_tree"]
    subgraph C4 ["↔️ if defined(genbank_gb)"]
        direction TB
        N10["translate_augur_tree"]
    end
    subgraph C5 ["↔️ if defined(clades_tsv) && defined(ref_fasta)"]
        direction TB
        N11["assign_clades_to_nodes"]
    end
    N12["export_auspice_json"]
    N1 --> N2
    N2 --> N3
    N3 --> N4
    N4 --> N5
    N3 --> N5
    N4 --> N6
    N5 --> N6
    N2 --> N6
    N3 --> N6
    N2 --> N7
    N6 --> N7
    N2 --> N8
    N6 --> N8
    N4 --> N9
    N6 --> N9
    N3 --> N9
    N9 --> N10
    N6 --> N10
    N9 --> N11
    N6 --> N11
    N10 --> N11
    N2 --> N12
    N11 --> N12
    N6 --> N12
    N10 --> N12
    N9 --> N12
    N7 --> N12
    Start --> N1
    N8 --> End([End])
    N12 --> End([End])
    classDef taskNode fill:#a371f7,stroke:#8b5cf6,stroke-width:2px,color:#fff
    classDef workflowNode fill:#58a6ff,stroke:#1f6feb,stroke-width:2px,color:#fff
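
The conditional subgraphs in the flowchart can be summarized as a function from inputs to the set of calls that run. This Python sketch follows the conditions as written in the workflow and lists calls in source order, which is not necessarily execution order; `planned_calls` is a hypothetical helper, and `None` stands for an undefined optional input.

```python
def planned_calls(sample_metadata, mask_bed=None,
                  ancestral_traits_to_infer=None, genbank_gb=None,
                  clades_tsv=None, ref_fasta=None):
    """List the calls that would run, in the order they appear in the WDL."""
    calls = []
    if len(sample_metadata) > 1:
        calls.append("tsv_join")
    calls += ["subsample_by_cases", "filter_sequences_to_list"]
    if mask_bed is not None:
        calls.append("augur_mask_sites")
    calls += ["draft_augur_tree", "refine_augur_tree"]
    # Must be defined and non-empty; None and [] are both falsy here.
    if ancestral_traits_to_infer:
        calls.append("ancestral_traits")
    calls += ["tip_frequencies", "ancestral_tree"]
    if genbank_gb is not None:
        calls.append("translate_augur_tree")
    if clades_tsv is not None and ref_fasta is not None:
        calls.append("assign_clades_to_nodes")
    calls.append("export_auspice_json")
    return calls


# Placeholder file names; only the required metadata input is given.
print(planned_calls(["metadata.tsv"]))
```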

version 1.0

import "../tasks/tasks_interhost.wdl" as interhost
import "../tasks/tasks_nextstrain.wdl" as nextstrain
import "../tasks/tasks_reports.wdl" as reports
import "../tasks/tasks_utils.wdl" as utils

workflow augur_from_msa_with_subsampler {
    meta {
        description: "Build trees, and convert to json representation suitable for Nextstrain visualization. See https://nextstrain.org/docs/getting-started/ and https://nextstrain-augur.readthedocs.io/en/stable/"
        author: "Broad Viral Genomics"
        email:  "viral-ngs@broadinstitute.org"
        allowNestedInputs: true
    }

    input {
        File           aligned_msa_fasta
        Array[File]+   sample_metadata
        File?          ref_fasta
        File?          genbank_gb
        File           auspice_config
        File?          clades_tsv
        Array[String]? ancestral_traits_to_infer
        File?          mask_bed
    }

    parameter_meta {
        aligned_msa_fasta: {
          description: "Multiple sequence alignment (aligned fasta).",
          patterns: ["*.fasta", "*.fa", "*.fasta.gz", "*.fa.gz", "*.fasta.zst", "*.fa.zst"]
        }
        sample_metadata: {
          description: "Metadata in tab-separated text format. See https://nextstrain-augur.readthedocs.io/en/stable/faq/metadata.html for details. At least one tab file must be provided--if multiple are provided, they will be joined via a full left outer join using the 'strain' column as the join ID.",
          patterns: ["*.txt", "*.tsv", "*.txt.gz", "*.txt.zst", "*.tsv.gz", "*.tsv.zst"]
        }
        ref_fasta: {
          description: "A reference assembly (not included in assembly_fastas) to align assembly_fastas against. Typically from NCBI RefSeq or similar.",
          patterns: ["*.fasta", "*.fa"]
        }
        genbank_gb: {
          description: "A 'genbank' formatted gene annotation file that is used to calculate coding consequences of observed mutations. Must correspond to the same coordinate space as ref_fasta. Typically downloaded from the same NCBI accession number as ref_fasta.",
          patterns: ["*.gb", "*.gbf"]
        }
        ancestral_traits_to_infer: {
          description: "A list of metadata traits to use for ancestral node inference (see https://nextstrain-augur.readthedocs.io/en/stable/usage/cli/traits.html). Multiple traits may be specified; must correspond exactly to column headers in metadata file. Omitting these values will skip ancestral trait inference, and ancestral nodes will not have estimated values for metadata."
        }
        auspice_config: {
          description: "A file specifying options to customize the auspice export; see: https://nextstrain.github.io/auspice/customise-client/introduction",
          patterns: ["*.json", "*.txt"]
        }
        clades_tsv: {
          description: "A TSV file containing clade mutation positions in four columns: [clade  gene    site    alt]; see: https://nextstrain.org/docs/tutorials/defining-clades",
          patterns: ["*.tsv", "*.txt"]
        }
        mask_bed: {
          description: "Optional list of sites to mask when building trees.",
          patterns: ["*.bed"]
        }
    }

    # merge tsvs if necessary
    if(length(sample_metadata)>1) {
        call utils.tsv_join {
            input:
                input_tsvs   = sample_metadata,
                id_col       = 'strain',
                out_basename = "metadata-merged",
                out_suffix   = ".txt.zst"
        }
    }

    # subsample and filter genomic data based on epi case data
    call interhost.subsample_by_cases {
        input:
            metadata = select_first(flatten([[tsv_join.out_tsv], sample_metadata]))
    }
    call nextstrain.filter_sequences_to_list {
        input:
            sequences = aligned_msa_fasta,
            keep_list = [subsample_by_cases.selected_sequences]
    }

    # standard augur pipeline
    if(defined(mask_bed)) {
        call nextstrain.augur_mask_sites {
            input:
                sequences = filter_sequences_to_list.filtered_fasta,
                mask_bed  = mask_bed
        }
    }
    File masked_sequences = select_first([augur_mask_sites.masked_sequences, filter_sequences_to_list.filtered_fasta])
    call nextstrain.draft_augur_tree {
        input:
            msa_or_vcf = masked_sequences
    }
    call nextstrain.refine_augur_tree {
        input:
            raw_tree   = draft_augur_tree.aligned_tree,
            msa_or_vcf = masked_sequences,
            metadata   = subsample_by_cases.selected_metadata
    }
    if(defined(ancestral_traits_to_infer) && length(select_first([ancestral_traits_to_infer,[]]))>0) {
        call nextstrain.ancestral_traits {
            input:
                tree     = refine_augur_tree.tree_refined,
                metadata = subsample_by_cases.selected_metadata,
                columns  = select_first([ancestral_traits_to_infer,[]])
        }
    }
    call nextstrain.tip_frequencies {
        input:
            tree     = refine_augur_tree.tree_refined,
            metadata = subsample_by_cases.selected_metadata
    }
    call nextstrain.ancestral_tree {
        input:
            tree       = refine_augur_tree.tree_refined,
            msa_or_vcf = masked_sequences
    }
    if(defined(genbank_gb)) {
        call nextstrain.translate_augur_tree {
            input:
                tree       = refine_augur_tree.tree_refined,
                nt_muts    = ancestral_tree.nt_muts_json,
                genbank_gb = select_first([genbank_gb])
        }
    }
    if(defined(clades_tsv) && defined(ref_fasta)) {
        call nextstrain.assign_clades_to_nodes {
            input:
                tree_nwk     = refine_augur_tree.tree_refined,
                nt_muts_json = ancestral_tree.nt_muts_json,
                aa_muts_json = translate_augur_tree.aa_muts_json,
                ref_fasta    = select_first([ref_fasta]),
                clades_tsv   = select_first([clades_tsv])
        }
    }
    call nextstrain.export_auspice_json {
        input:
            tree            = refine_augur_tree.tree_refined,
            sample_metadata = subsample_by_cases.selected_metadata,
            node_data_jsons = select_all([
                                refine_augur_tree.branch_lengths,
                                ancestral_traits.node_data_json,
                                ancestral_tree.nt_muts_json,
                                translate_augur_tree.aa_muts_json,
                                assign_clades_to_nodes.node_clade_data_json]),
            auspice_config  = auspice_config
    }

    output {
        File        selected_metadata     = subsample_by_cases.selected_metadata
        File        sampling_stats_file   = subsample_by_cases.sampling_stats

        File        masked_subsampled_msa = masked_sequences
        
        File        ml_tree               = draft_augur_tree.aligned_tree
        File        time_tree             = refine_augur_tree.tree_refined
        
        Array[File] node_data_jsons       = select_all([
                    refine_augur_tree.branch_lengths,
                    ancestral_traits.node_data_json,
                    ancestral_tree.nt_muts_json,
                    translate_augur_tree.aa_muts_json,
                    assign_clades_to_nodes.node_clade_data_json])

        File        auspice_input_json    = export_auspice_json.virus_json
        File        tip_frequencies_json  = tip_frequencies.node_data_json
        File        root_sequence_json    = export_auspice_json.root_sequence_json
    }
}
