WORKFLOW subsample_by_metadata_with_focal
| File Path | pipes/WDL/workflows/subsample_by_metadata_with_focal.wdl |
|---|---|
| WDL Version | 1.0 |
| Type | workflow |
Imports
| Namespace | Path |
|---|---|
| nextstrain | ../tasks/tasks_nextstrain.wdl |
| reports | ../tasks/tasks_reports.wdl |
| utils | ../tasks/tasks_utils.wdl |
Workflow: subsample_by_metadata_with_focal
Filter and subsample a global sequence set with a bias towards a geographic area of interest.
Inputs
| Name | Type | Description | Default |
|---|---|---|---|
| sample_metadata_tsvs | Array[File]+ | Tab-separated metadata files that contain binning variables and values. Must contain all samples: output will be filtered to the IDs present in these files. | - |
| sequences_fasta | File | Sequences in FASTA format. | - |
| priorities | File? | - | - |
| lab_highlight_loc | String? | - | - |
| sequences_per_group | Int? | - | - |
| group_by | String? | - | - |
| include | File? | - | - |
| exclude | File? | - | - |
| min_date | Float? | - | - |
| max_date | Float? | - | - |
| min_length | Int? | - | - |
| priority | File? | - | - |
| subsample_seed | Int? | - | - |
| exclude_where | Array[String]? | - | - |
| include_where | Array[String]? | - | - |
| include | File? | - | - |
| exclude | File? | - | - |
| min_date | Float? | - | - |
| max_date | Float? | - | - |
| min_length | Int? | - | - |
| subsample_seed | Int? | - | - |
| include_where | Array[String]? | - | - |
| include | File? | - | - |
| exclude | File? | - | - |
| min_date | Float? | - | - |
| max_date | Float? | - | - |
| min_length | Int? | - | - |
| subsample_seed | Int? | - | - |
| include_where | Array[String]? | - | - |

Names repeat after priorities because those rows are unset optional inputs of individual calls, listed without their call name. Judging by the input mappings under Calls, lab_highlight_loc appears to belong to derived_cols, and the three groups of filter inputs appear to belong, in order, to prefilter, subsample_focal, and subsample_global.

22 optional inputs with default values are not shown.
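As an illustration of how the required and optional inputs above might be supplied, the following Python sketch builds an inputs document keyed by `<workflow>.<input>`, the convention used by engines such as Cromwell. The helper `build_inputs`, the file paths, and the focal values are hypothetical; `focal_variable` and `focal_value` are taken from the expressions under Calls and are assumed here to be String inputs.

```python
import json

WORKFLOW = "subsample_by_metadata_with_focal"


def build_inputs(metadata_tsvs, sequences_fasta, **overrides):
    """Return a dict keyed by '<workflow>.<input>' for an inputs JSON file.

    Hypothetical helper: only the input names come from the table above.
    """
    if not metadata_tsvs:
        # Array[File]+ requires at least one element.
        raise ValueError("sample_metadata_tsvs must contain at least one file")
    inputs = {
        f"{WORKFLOW}.sample_metadata_tsvs": list(metadata_tsvs),
        f"{WORKFLOW}.sequences_fasta": sequences_fasta,
    }
    for name, value in overrides.items():
        if value is not None:  # unset optional inputs are left out entirely
            inputs[f"{WORKFLOW}.{name}"] = value
    return inputs


example_inputs = build_inputs(
    ["metadata.tsv"],             # hypothetical path
    "sequences.fasta",            # hypothetical path
    focal_variable="division",    # hypothetical value, assumed String
    focal_value="Massachusetts",  # hypothetical value, assumed String
    priorities=None,              # optional File?, omitted here
)
print(json.dumps(example_inputs, indent=2))
```

Leaving an optional input out of the document, rather than setting it to an empty value, lets the workflow fall back to its own default or treat the input as undefined.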
Outputs
| Name | Type | Expression |
|---|---|---|
| metadata_merged | File | derived_cols.derived_metadata |
| keep_list | File | fasta_to_ids.ids_txt |
| subsampled_sequences | File | cat_fasta.combined |
| focal_kept | Int | subsample_focal.sequences_out |
| global_kept | Int | subsample_global.sequences_out |
| sequences_kept | Int | subsample_focal.sequences_out + subsample_global.sequences_out |
Calls
This workflow calls the following tasks or subworkflows:
CALL TASKS tsv_join
Input Mappings (3)
| Input | Value |
|---|---|
| input_tsvs | sample_metadata_tsvs |
| id_col | 'strain' |
| out_basename | "metadata-merged" |
CALL TASKS derived_cols
Input Mappings (1)
| Input | Value |
|---|---|
| metadata_tsv | select_first(flatten([[tsv_join.out_tsv], sample_metadata_tsvs])) |
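The metadata_tsv expression combines two WDL standard-library functions: flatten concatenates an array of arrays, and select_first returns the first defined element. Wrapping tsv_join.out_tsv this way suggests the join output can be undefined (for example, if the call is conditional), in which case the first input metadata file is used instead. The Python sketch below mimics that resolution; the function `metadata_for_derived_cols` and the file names are hypothetical.

```python
def flatten(nested):
    """Concatenate a list of lists into one list (like WDL flatten)."""
    return [item for inner in nested for item in inner]


def select_first(values):
    """Return the first non-None value (like WDL select_first)."""
    for value in values:
        if value is not None:
            return value
    raise ValueError("select_first: no defined value")


def metadata_for_derived_cols(tsv_join_out, sample_metadata_tsvs):
    """Mirror select_first(flatten([[tsv_join.out_tsv], sample_metadata_tsvs]))."""
    return select_first(flatten([[tsv_join_out], sample_metadata_tsvs]))


# Join output is defined: it takes precedence over the raw inputs.
joined = metadata_for_derived_cols("metadata-merged.tsv", ["a.tsv", "b.tsv"])

# Join output is undefined: fall back to the first input file.
fallback = metadata_for_derived_cols(None, ["a.tsv"])

print(joined, fallback)
```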
CALL TASKS prefilter → filter_subsample_sequences
Input Mappings (2)
| Input | Value |
|---|---|
| sequences_fasta | sequences_fasta |
| sample_metadata_tsv | derived_cols.derived_metadata |
CALL TASKS subsample_focal → filter_subsample_sequences
Input Mappings (6)
| Input | Value |
|---|---|
| sequences_fasta | prefilter.filtered_fasta |
| sample_metadata_tsv | derived_cols.derived_metadata |
| exclude_where | ["~{focal_variable}!=~{focal_value}"] |
| sequences_per_group | focal_bin_max |
| group_by | focal_bin_variable |
| priority | priorities |
CALL TASKS subsample_global → filter_subsample_sequences
Input Mappings (6)
| Input | Value |
|---|---|
| sequences_fasta | prefilter.filtered_fasta |
| sample_metadata_tsv | derived_cols.derived_metadata |
| exclude_where | ["~{focal_variable}=~{focal_value}"] |
| sequences_per_group | global_bin_max |
| group_by | global_bin_variable |
| priority | priorities |
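The subsample_focal and subsample_global calls use complementary exclude_where conditions on the same prefiltered set, so each sequence falls into exactly one of the two subsets, and each subset is then capped per group. The Python sketch below illustrates that partition-then-cap idea under several assumptions: conditions use the `column=value` and `column!=value` forms shown in the mappings, the cap keeps the first records seen per group (the real task may sample randomly or by priority), and the records and column values are invented for illustration.

```python
from collections import defaultdict


def parse_condition(condition):
    """Split 'column!=value' or 'column=value' into (column, negated, value)."""
    if "!=" in condition:
        column, value = condition.split("!=", 1)
        return column, True, value
    column, value = condition.split("=", 1)
    return column, False, value


def exclude_where(records, condition):
    """Drop the records that match the condition and keep the rest."""
    column, negated, value = parse_condition(condition)

    def matches(record):
        equal = record.get(column) == value
        return not equal if negated else equal

    return [r for r in records if not matches(r)]


def cap_per_group(records, group_by, sequences_per_group):
    """Keep at most sequences_per_group records per group (first seen wins)."""
    counts = defaultdict(int)
    kept = []
    for record in records:
        key = record.get(group_by)
        if counts[key] < sequences_per_group:
            counts[key] += 1
            kept.append(record)
    return kept


# Invented metadata rows; only the 'strain' column name comes from the page.
records = [
    {"strain": "s1", "region": "A", "division": "A1", "country": "X"},
    {"strain": "s2", "region": "A", "division": "A1", "country": "X"},
    {"strain": "s3", "region": "A", "division": "A2", "country": "X"},
    {"strain": "s4", "region": "B", "division": "B1", "country": "Y"},
    {"strain": "s5", "region": "B", "division": "B2", "country": "Y"},
    {"strain": "s6", "region": "C", "division": "C1", "country": "Z"},
]
focal_variable, focal_value = "region", "A"

# Focal subset: exclude everything that is NOT the focal value.
focal = cap_per_group(
    exclude_where(records, f"{focal_variable}!={focal_value}"), "division", 1
)
# Global subset: exclude everything that IS the focal value.
global_ = cap_per_group(
    exclude_where(records, f"{focal_variable}={focal_value}"), "country", 1
)
sequences_kept = len(focal) + len(global_)
print([r["strain"] for r in focal], [r["strain"] for r in global_], sequences_kept)
```

Because the two conditions are exact complements, the subsets never overlap, which is why the sequences_kept output can be a plain sum of the two counts. Grouping the focal subset by a finer variable than the global subset is what biases the result towards the area of interest.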
CALL TASKS cat_fasta → concatenate
Input Mappings (2)
| Input | Value |
|---|---|
| infiles | [subsample_focal.filtered_fasta, subsample_global.filtered_fasta] |
| output_name | "subsampled.fasta" |
CALL TASKS fasta_to_ids
Input Mappings (1)
| Input | Value |
|---|---|
| sequences_fasta | cat_fasta.combined |
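The last two calls concatenate the focal and global FASTA files and derive the keep_list output from the combined file. The Python sketch below shows one way such a step could work. It is not the tasks' actual implementation: taking the whole header line after `>` as the ID is an assumption, and the sequences are invented.

```python
def concatenate(texts):
    """Join FASTA texts in order, making sure each part ends with a newline."""
    return "".join(t if t.endswith("\n") else t + "\n" for t in texts if t)


def fasta_to_ids(fasta_text):
    """Return each record's header line without the leading '>'."""
    return [
        line[1:].strip()
        for line in fasta_text.splitlines()
        if line.startswith(">")
    ]


focal_fasta = ">s1\nACGT\n>s3\nACGA"  # invented records, no trailing newline
global_fasta = ">s4\nACGG\n"          # invented record

combined = concatenate([focal_fasta, global_fasta])
keep_list = fasta_to_ids(combined)
print(keep_list)
```

Adding the missing newline before joining keeps the last sequence of one file from running into the first header of the next.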
Images
Container images used by tasks in this workflow:
Parameterized Image (configured via input: docker)
Used by 2 tasks:
- derived_cols
- tsv_join
Parameterized Image (configured via input: docker)
Used by 3 tasks:
- prefilter
- subsample_focal
- subsample_global
ubuntu
Used by 2 tasks:
- cat_fasta
- fasta_to_ids