WORKFLOW augur_from_msa_with_subsampler
| File Path | pipes/WDL/workflows/augur_from_msa_with_subsampler.wdl |
|---|---|
| WDL Version | 1.0 |
| Type | workflow |
Imports
| Namespace | Path |
|---|---|
| interhost | ../tasks/tasks_interhost.wdl |
| nextstrain | ../tasks/tasks_nextstrain.wdl |
| reports | ../tasks/tasks_reports.wdl |
| utils | ../tasks/tasks_utils.wdl |
Workflow: augur_from_msa_with_subsampler
Build phylogenetic trees from a multiple sequence alignment and convert them to a JSON representation suitable for Nextstrain visualization. See https://nextstrain.org/docs/getting-started/ and https://nextstrain-augur.readthedocs.io/en/stable/
Author: Broad Viral Genomics
Inputs
| Name | Type | Description | Default |
|---|---|---|---|
| aligned_msa_fasta | File | Multiple sequence alignment (aligned fasta). | - |
| sample_metadata | Array[File]+ | Metadata in tab-separated text format. See https://nextstrain-augur.readthedocs.io/en/stable/faq/metadata.html for details. At least one tab file must be provided--if multiple are provided, they will be joined via a full left outer join using the 'strain' column as the join ID. | - |
| ref_fasta | File? | A reference assembly (not included in assembly_fastas) to align assembly_fastas against. Typically from NCBI RefSeq or similar. | - |
| genbank_gb | File? | A 'genbank' formatted gene annotation file that is used to calculate coding consequences of observed mutations. Must correspond to the same coordinate space as ref_fasta. Typically downloaded from the same NCBI accession number as ref_fasta. | - |
| auspice_config | File | A file specifying options to customize the auspice export; see: https://nextstrain.github.io/auspice/customise-client/introduction | - |
| clades_tsv | File? | A TSV file containing clade mutation positions in four columns: [clade gene site alt]; see: https://nextstrain.org/docs/tutorials/defining-clades | - |
| ancestral_traits_to_infer | Array[String]? | A list of metadata traits to use for ancestral node inference (see https://nextstrain-augur.readthedocs.io/en/stable/usage/cli/traits.html). Multiple traits may be specified; must correspond exactly to column headers in metadata file. Omitting these values will skip ancestral trait inference, and ancestral nodes will not have estimated values for metadata. | - |
| mask_bed | File? | Optional list of sites to mask when building trees. | - |
| case_data | File | - | - |
| id_column | String | - | - |
| geo_column | String | - | - |
| keep_file | File? | - | - |
| remove_file | File? | - | - |
| filter_file | File? | - | - |
| seed_num | Int? | - | - |
| start_date | String? | - | - |
| end_date | String? | - | - |
| exclude_sites | File? | - | - |
| vcf_reference | File? | - | - |
| tree_builder_args | String? | - | - |
| gen_per_year | Int? | - | - |
| clock_rate | Float? | - | - |
| clock_std_dev | Float? | - | - |
| root | String? | - | - |
| covariance | Boolean? | - | - |
| precision | Int? | - | - |
| branch_length_inference | String? | - | - |
| coalescent | String? | - | - |
| vcf_reference | File? | - | - |
| weights | File? | - | - |
| sampling_bias_correction | Float? | - | - |
| min_date | Float? | - | - |
| max_date | Float? | - | - |
| pivot_interval | Int? | - | - |
| pivot_interval_units | String? | - | - |
| narrow_bandwidth | Float? | - | - |
| wide_bandwidth | Float? | - | - |
| proportion_wide | Float? | - | - |
| minimal_frequency | Float? | - | - |
| stiffness | Float? | - | - |
| inertia | Float? | - | - |
| vcf_reference | File? | - | - |
| root_sequence | File? | - | - |
| output_vcf | File? | - | - |
| genes | File? | - | - |
| vcf_reference_output | File? | - | - |
| vcf_reference | File? | - | - |
| lat_longs_tsv | File? | - | - |
| colors_tsv | File? | - | - |
| geo_resolutions | Array[String]? | - | - |
| color_by_metadata | Array[String]? | - | - |
| description_md | File? | - | - |
| maintainers | Array[String]? | - | - |
| title | String? | - | - |
| 54 optional inputs with default values | | | |
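The required inputs in the table above (those whose type has no trailing `?`) are usually supplied through a WDL inputs JSON file whose keys are prefixed with the workflow name. The sketch below builds such a file in Python. All file names and column values are hypothetical placeholders. It also assumes that `case_data`, `id_column`, and `geo_column` belong to the `subsample_by_cases` call and therefore carry the call name in their key; this page does not state that, so check the WDL source before relying on it.

```python
import json

WORKFLOW = "augur_from_msa_with_subsampler"


def build_inputs(workflow_level, subsampler_level):
    """Prefix input names the way a WDL inputs JSON file expects."""
    inputs = {f"{WORKFLOW}.{name}": value for name, value in workflow_level.items()}
    # Assumption: these inputs are declared on the subsample_by_cases task,
    # so their keys include the call name.
    inputs.update(
        {
            f"{WORKFLOW}.subsample_by_cases.{name}": value
            for name, value in subsampler_level.items()
        }
    )
    return inputs


example_inputs = build_inputs(
    workflow_level={
        "aligned_msa_fasta": "msa.fasta",          # hypothetical path
        "sample_metadata": ["metadata.tsv"],       # Array[File]+ needs >= 1 file
        "auspice_config": "auspice_config.json",   # hypothetical path
    },
    subsampler_level={
        "case_data": "cases.tsv",                  # hypothetical path
        "id_column": "strain",                     # hypothetical column name
        "geo_column": "division",                  # hypothetical column name
    },
)

print(json.dumps(example_inputs, indent=2))
```

Optional inputs can be added to the same dictionary with the same key prefixing; omitted optional inputs are simply left undefined.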
Outputs
| Name | Type | Expression |
|---|---|---|
| selected_metadata | File | subsample_by_cases.selected_metadata |
| sampling_stats_file | File | subsample_by_cases.sampling_stats |
| masked_subsampled_msa | File | masked_sequences |
| ml_tree | File | draft_augur_tree.aligned_tree |
| time_tree | File | refine_augur_tree.tree_refined |
| node_data_jsons | Array[File] | select_all([refine_augur_tree.branch_lengths, ancestral_traits.node_data_json, ancestral_tree.nt_muts_json, translate_augur_tree.aa_muts_json, assign_clades_to_nodes.node_clade_data_json]) |
| auspice_input_json | File | export_auspice_json.virus_json |
| tip_frequencies_json | File | tip_frequencies.node_data_json |
| root_sequence_json | File | export_auspice_json.root_sequence_json |
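The `node_data_jsons` output relies on the WDL function `select_all`, which drops undefined values from an array of optionals. A minimal Python sketch of that behavior follows, with `None` standing in for an undefined WDL value. Which calls are actually skipped in a given run is not shown on this page; the `None` entries and file names below are illustrative assumptions.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def select_all(values: List[Optional[T]]) -> List[T]:
    """Mimic WDL select_all: keep only the defined (non-None) entries."""
    return [value for value in values if value is not None]


# Hypothetical run in which only two of the five node-data producers ran.
node_data_jsons = select_all(
    [
        "branch_lengths.json",  # refine_augur_tree.branch_lengths
        None,                   # ancestral_traits.node_data_json (skipped)
        "nt_muts.json",         # ancestral_tree.nt_muts_json
        None,                   # translate_augur_tree.aa_muts_json (skipped)
        None,                   # assign_clades_to_nodes.node_clade_data_json (skipped)
    ]
)

print(node_data_jsons)
```

The practical consequence is that the length of `node_data_jsons` depends on which optional inputs were provided.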
Calls
This workflow calls the following tasks or subworkflows:
CALL
TASKS
tsv_join
Input Mappings (4)
| Input | Value |
|---|---|
| input_tsvs | sample_metadata |
| id_col | 'strain' |
| out_basename | "metadata-merged" |
| out_suffix | ".txt.zst" |
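To illustrate what joining several metadata tables on the `strain` column can look like, here is a small Python sketch. It is not the `tsv_join` implementation. It assumes that every ID from every table is kept, that missing cells are left blank, and that the first non-empty value wins when tables disagree; the actual task may resolve conflicts differently.

```python
import csv
import io


def join_tsvs(tsv_texts, id_col="strain"):
    """Join TSV tables on id_col, keeping every ID and every column seen."""
    columns = [id_col]
    rows = {}  # id -> merged row, in first-seen order
    for text in tsv_texts:
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        for col in reader.fieldnames or []:
            if col not in columns:
                columns.append(col)
        for record in reader:
            merged = rows.setdefault(record[id_col], {id_col: record[id_col]})
            for col, value in record.items():
                # Assumption: the first non-empty value for a cell wins.
                if value and not merged.get(col):
                    merged[col] = value
    table = [[row.get(col, "") for col in columns] for row in rows.values()]
    return columns, table


# Two tiny hypothetical metadata tables.
table_a = "strain\tdate\nA\t2020-01-01\nB\t2020-02-01\n"
table_b = "strain\tcountry\nB\tUSA\nC\tPeru\n"

columns, joined = join_tsvs([table_a, table_b])
for row in [columns] + joined:
    print("\t".join(row))
```

Strain `B` appears in both tables and ends up with both a date and a country, while `A` and `C` keep a blank cell for the column they lack.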
CALL
TASKS
subsample_by_cases
Input Mappings (1)
| Input | Value |
|---|---|
| metadata | select_first(flatten([[tsv_join.out_tsv], sample_metadata])) |
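The `metadata` expression combines two WDL functions: `flatten` concatenates nested arrays, and `select_first` returns the first defined value. The Python sketch below mimics that logic, with `None` standing in for an undefined value. That `tsv_join.out_tsv` can be undefined (for example, if the join is skipped when only one metadata file is given) is an inference from the expression, not something this page states.

```python
def flatten(nested):
    """Mimic WDL flatten: concatenate an array of arrays."""
    return [item for inner in nested for item in inner]


def select_first(values):
    """Mimic WDL select_first: return the first defined (non-None) value."""
    for value in values:
        if value is not None:
            return value
    raise ValueError("select_first: no defined value")


def pick_metadata(joined_tsv, sample_metadata):
    """Prefer the joined table; otherwise fall back to the first input file."""
    return select_first(flatten([[joined_tsv], sample_metadata]))


# Hypothetical file names.
print(pick_metadata("metadata-merged.txt.zst", ["a.tsv", "b.tsv"]))
print(pick_metadata(None, ["only.tsv"]))
```

Because the joined table is listed first, it takes precedence whenever it exists; the original files are only a fallback.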
CALL
TASKS
filter_sequences_to_list
Input Mappings (2)
| Input | Value |
|---|---|
| sequences | aligned_msa_fasta |
| keep_list | [subsample_by_cases.selected_sequences] |
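Conceptually, this step reduces the alignment to the sequences chosen by the subsampler. The Python sketch below shows one way to filter FASTA text by a keep list. It is an illustration only: it assumes IDs are matched against the first whitespace-delimited token of each header line, which may differ from how `filter_sequences_to_list` matches names.

```python
def filter_fasta(fasta_text, keep_ids):
    """Return only the FASTA records whose ID is in keep_ids."""
    keep = set(keep_ids)
    kept_lines = []
    emit = False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            # Assumption: the sequence ID is the first token of the header.
            emit = bool(fields) and fields[0] in keep
        if emit:
            kept_lines.append(line)
    return "\n".join(kept_lines) + ("\n" if kept_lines else "")


# Tiny hypothetical alignment and keep list.
msa = ">A desc\nACGT\n>B\nAC-T\n>C\nACGA\n"
filtered = filter_fasta(msa, ["A", "C"])
print(filtered, end="")
```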
CALL
TASKS
augur_mask_sites
Input Mappings (2)
| Input | Value |
|---|---|
| sequences | filter_sequences_to_list.filtered_fasta |
| mask_bed | mask_bed |
CALL
TASKS
draft_augur_tree
Input Mappings (1)
| Input | Value |
|---|---|
| msa_or_vcf | masked_sequences |
CALL
TASKS
refine_augur_tree
Input Mappings (3)
| Input | Value |
|---|---|
| raw_tree | draft_augur_tree.aligned_tree |
| msa_or_vcf | masked_sequences |
| metadata | subsample_by_cases.selected_metadata |
CALL
TASKS
ancestral_traits
Input Mappings (3)
| Input | Value |
|---|---|
| tree | refine_augur_tree.tree_refined |
| metadata | subsample_by_cases.selected_metadata |
| columns | select_first([ancestral_traits_to_infer, []]) |
CALL
TASKS
tip_frequencies
Input Mappings (2)
| Input | Value |
|---|---|
| tree | refine_augur_tree.tree_refined |
| metadata | subsample_by_cases.selected_metadata |
CALL
TASKS
ancestral_tree
Input Mappings (2)
| Input | Value |
|---|---|
| tree | refine_augur_tree.tree_refined |
| msa_or_vcf | masked_sequences |
CALL
TASKS
translate_augur_tree
Input Mappings (3)
| Input | Value |
|---|---|
| tree | refine_augur_tree.tree_refined |
| nt_muts | ancestral_tree.nt_muts_json |
| genbank_gb | select_first([genbank_gb]) |
CALL
TASKS
assign_clades_to_nodes
Input Mappings (5)
| Input | Value |
|---|---|
| tree_nwk | refine_augur_tree.tree_refined |
| nt_muts_json | ancestral_tree.nt_muts_json |
| aa_muts_json | translate_augur_tree.aa_muts_json |
| ref_fasta | select_first([ref_fasta]) |
| clades_tsv | select_first([clades_tsv]) |
CALL
TASKS
export_auspice_json
Input Mappings (4)
| Input | Value |
|---|---|
| tree | refine_augur_tree.tree_refined |
| sample_metadata | subsample_by_cases.selected_metadata |
| node_data_jsons | select_all([refine_augur_tree.branch_lengths, ancestral_traits.node_data_json, ancestral_tree.nt_muts_json, translate_augur_tree.aa_muts_json, assign_clades_to_nodes.node_clade_data_json]) |
| auspice_config | auspice_config |
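The input mappings above imply an execution order: a call can only start once every call whose outputs it consumes has finished. The Python sketch below encodes those dependencies and derives one valid order with the standard-library `graphlib` module (Python 3.9+). The edges are read from the mappings on this page, with one assumption: `masked_sequences` is treated as derived from `augur_mask_sites`, since its definition is not shown here.

```python
from graphlib import TopologicalSorter

# call -> upstream calls whose outputs appear in its input mappings
DEPENDENCIES = {
    "tsv_join": set(),
    "subsample_by_cases": {"tsv_join"},
    "filter_sequences_to_list": {"subsample_by_cases"},
    "augur_mask_sites": {"filter_sequences_to_list"},
    # Assumption: masked_sequences comes from augur_mask_sites.
    "draft_augur_tree": {"augur_mask_sites"},
    "refine_augur_tree": {"draft_augur_tree", "augur_mask_sites", "subsample_by_cases"},
    "ancestral_traits": {"refine_augur_tree", "subsample_by_cases"},
    "tip_frequencies": {"refine_augur_tree", "subsample_by_cases"},
    "ancestral_tree": {"refine_augur_tree", "augur_mask_sites"},
    "translate_augur_tree": {"refine_augur_tree", "ancestral_tree"},
    "assign_clades_to_nodes": {
        "refine_augur_tree",
        "ancestral_tree",
        "translate_augur_tree",
    },
    "export_auspice_json": {
        "refine_augur_tree",
        "subsample_by_cases",
        "ancestral_traits",
        "ancestral_tree",
        "translate_augur_tree",
        "assign_clades_to_nodes",
    },
}

# One valid execution order; independent calls may appear in any relative order.
order = list(TopologicalSorter(DEPENDENCIES).static_order())
for step, call in enumerate(order, start=1):
    print(step, call)
```

Calls with no path between them, such as `tip_frequencies` and `ancestral_tree`, can run in parallel on an engine that supports it.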
Images
Container images used by tasks in this workflow:
Parameterized Image
⚙️ Parameterized
Configured via input: docker
Used by 1 task:
- subsample_by_cases
Parameterized Image
⚙️ Parameterized
Configured via input: docker
Used by 2 tasks:
- filter_sequences_to_list
- tsv_join
Parameterized Image
⚙️ Parameterized
Configured via input: docker
Used by 9 tasks:
- draft_augur_tree
- refine_augur_tree
- tip_frequencies
- ancestral_tree
- export_auspice_json
- augur_mask_sites
- ancestral_traits
- translate_augur_tree
- assign_clades_to_nodes