WORKFLOW
sarscov2_genbank
File Path
pipes/WDL/workflows/sarscov2_genbank.wdl
WDL Version
1.0
Type
workflow
Imports
Namespace
Path
ncbi
../tasks/tasks_ncbi.wdl
reports
../tasks/tasks_reports.wdl
utils
../tasks/tasks_utils.wdl
Workflow: sarscov2_genbank
Prepare SARS-CoV-2 assemblies for Genbank submission. Each assembly is QC-checked with NCBI's VADR tool, and genomes that do not pass its tests are filtered out of the main submission package.
Author: Broad Viral Genomics
Name
Type
Description
Default
assemblies_fasta
Array[File]+
Genomes to prepare for Genbank submission. One file per genome: all segments/chromosomes included in one file. All fasta files must contain exactly the same number of sequences as reference_fasta (which must equal the number of files in reference_annot_tbl).
-
author_list
String?
A string containing a space-delimited list of authors, each given as surname, first name, and (optional) middle initial separated by commas. Ex. 'Lastname,Firstname, Last-hyphenated,First,M., Last,F.'
-
author_sbt_defaults_yaml
File
A YAML file with default values to use for the submitter, submitter affiliation, and author affiliation. It may optionally list authors to place at the start and end of author_list. Example: gs://pathogen-public-dbs/other-related/default_sbt_values.yaml
-
author_sbt_j2_template
File
A jinja2-format template for the sbt file expected by NCBI. Example: gs://pathogen-public-dbs/other-related/author_template.sbt.j2
-
biosample_attributes
File
A post-submission attributes file from NCBI BioSample, which can be downloaded from https://submit.ncbi.nlm.nih.gov/subs/ by clicking 'Download attributes file with BioSample accessions'.
-
assembly_stats_tsv
File
A four-column, tab-delimited text file with one row per sequence and the following header columns: SeqID, Assembly Method, Coverage, Sequencing Technology.
-
fasta_rename_map
File?
-
-
vadr_model_tar
File?
-
-
vadr_model_tar_subdir
String?
-
-
filter_to_accession
String?
-
-
organism_name_override
String?
-
-
sequence_id_override
String?
-
-
isolate_prefix_override
String?
-
-
source_overrides_json
File?
-
-
submission_name
String
-
-
submission_uid
String
-
-
spuid_namespace
String
-
-
account_name
String
-
-
username
String?
-
-
submitting_lab_name
String
-
-
filter_to_accession
String?
-
-
organism_name_override
String?
-
-
sequence_id_override
String?
-
-
isolate_prefix_override
String?
-
-
source_overrides_json
File?
-
-
submission_name
String
-
-
submission_uid
String
-
-
spuid_namespace
String
-
-
account_name
String
-
-
username
String?
-
-
submitting_lab_name
String
-
-
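The author_list format is easiest to see by splitting an example apart. The following is a minimal sketch, not the parser used by the generate_author_sbt_file task: it assumes authors are separated by whitespace and that each author's fields are comma-separated with no spaces inside them (so a surname containing a space would be split incorrectly). The function name parse_author_list is hypothetical.

```python
from typing import List, Tuple


def parse_author_list(author_list: str) -> List[Tuple[str, str, str]]:
    """Split an author_list string into (surname, first, middle) tuples.

    Sketch only: assumes authors are separated by whitespace and each author's
    fields are comma-separated with no spaces inside them.
    """
    authors = []
    for token in author_list.split():
        # Drop the trailing comma that separates this author from the next one.
        fields = [f for f in token.rstrip(",").split(",") if f]
        if len(fields) not in (2, 3):
            raise ValueError(f"expected 'Surname,First[,M.]', got {token!r}")
        surname, first = fields[0], fields[1]
        middle = fields[2] if len(fields) == 3 else ""  # middle initial is optional
        authors.append((surname, first, middle))
    return authors


# The documented example string:
print(parse_author_list("Lastname,Firstname, Last-hyphenated,First,M., Last,F."))
```

Run on the documented example, this yields three authors, with "M." recognized as the middle initial of the second one. The same rule also accepts the initials-only form used in the workflow's source comment, e.g. "Lastname,A.B., Lastname,C.,".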
39 optional inputs with default values
min_genome_bases
Int
-
15000
max_vadr_alerts
Int
-
0
taxid
Int
-
2697049
gisaid_prefix
String
-
'hCoV-19/'
out_basename
String
-
basename(genome_fasta,".fasta")
docker
String
-
"quay.io/broadinstitute/viral-core:2.5.1"
docker
String
-
"ubuntu"
out_basename
String
-
basename(genome_fasta,'.fasta')
docker
String
-
"mirror.gcr.io/staphb/vadr:1.6.4"
mem_size
Int
-
16
cpus
Int
-
4
cpus
Int
-
4
cpus
Int
-
4
biosample_col_for_fasta_headers
String
-
"sample_name"
src_to_attr_map
Map[String,String]
-
{}
sanitize_seq_ids
Boolean
-
true
out_basename
String
-
basename(basename(biosample_attributes,".txt"),".tsv")
docker
String
-
"python:slim"
docker
String
-
"quay.io/broadinstitute/viral-core:2.5.1"
out_base
String
-
"authors"
docker
String
-
"quay.io/broadinstitute/py3-bio:0.1.2"
wizard
String
-
"BankIt_SARSCoV2_api"
docker
String
-
"quay.io/broadinstitute/viral-baseimage:0.3.0"
continent
String
-
"North America"
strict
Boolean
-
true
address_map
String
-
'{}'
authors_map
String
-
'{}'
biosample_col_for_fasta_headers
String
-
"sample_name"
src_to_attr_map
Map[String,String]
-
{}
sanitize_seq_ids
Boolean
-
true
out_basename
String
-
basename(basename(biosample_attributes,".txt"),".tsv")
docker
String
-
"python:slim"
docker
String
-
"quay.io/broadinstitute/viral-core:2.5.1"
wizard
String
-
"BankIt_SARSCoV2_api"
docker
String
-
"quay.io/broadinstitute/viral-baseimage:0.3.0"
continent
String
-
"North America"
strict
Boolean
-
true
address_map
String
-
'{}'
authors_map
String
-
'{}'
Outputs
Name
Type
Expression
submission_zip
File
passing_package_genbank.submission_zip
submission_xml
File
passing_package_genbank.submission_xml
submit_ready
File
passing_package_genbank.submit_ready
num_successful
Int
length(select_all(passing_assemblies))
num_weird
Int
length(select_all(weird_assemblies))
num_input
Int
length(assemblies_fasta)
vadr_outputs
Array[File]
vadr.outputs_tgz
gisaid_fasta
File
passing_prefix_gisaid.renamed_fasta
gisaid_meta_csv
File
passing_gisaid_meta.meta_csv
weird_genbank_zip
File
weird_package_genbank.submission_zip
weird_genbank_xml
File
weird_package_genbank.submission_xml
weird_gisaid_fasta
File
weird_prefix_gisaid.renamed_fasta
weird_gisaid_meta_csv
File
weird_gisaid_meta.meta_csv
Calls
This workflow calls the following tasks or subworkflows:
CALL
TASKS
rename_fasta_header
→ rename_fasta_header
Input Mappings (2)
Input
Value
genome_fasta
assembly
new_name
read_map(select_first([fasta_rename_map]))[fasta_basename]
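When fasta_rename_map is provided, each assembly's new sequence name is looked up by the assembly's file basename; when it is absent, select_first falls back to the original assembly. The Python sketch below mimics that lookup, assuming WDL 1.0 semantics for read_map (a two-column, tab-separated file with no header) and basename (strip the directory, then the given suffix). The helper names and the sample names are hypothetical.

```python
import posixpath
from typing import Dict


def read_map(tsv_text: str) -> Dict[str, str]:
    """Approximate WDL read_map(): each line of a two-column TSV becomes key -> value."""
    mapping: Dict[str, str] = {}
    for line in tsv_text.splitlines():
        if not line.strip():
            continue  # this sketch ignores blank lines
        columns = line.split("\t")
        if len(columns) != 2:
            raise ValueError(f"expected exactly two tab-separated columns: {line!r}")
        mapping[columns[0]] = columns[1]
    return mapping


def wdl_basename(path: str, suffix: str = "") -> str:
    """Approximate WDL basename(path, suffix): drop directories, then a trailing suffix."""
    name = posixpath.basename(path)
    if suffix and name.endswith(suffix):
        name = name[: -len(suffix)]
    return name


def new_header_name(assembly_path: str, rename_map_tsv: str) -> str:
    """Mirror read_map(select_first([fasta_rename_map]))[basename(assembly, ".fasta")].

    A basename missing from the map raises KeyError in this sketch.
    """
    return read_map(rename_map_tsv)[wdl_basename(assembly_path, ".fasta")]


# Hypothetical two-column rename map: file basename -> new sequence name.
rename_map = (
    "sample1\tSARS-CoV-2/human/USA/MA-0001/2021\n"
    "sample2\tSARS-CoV-2/human/USA/MA-0002/2021\n"
)
print(new_header_name("gs://my-bucket/assemblies/sample2.fasta", rename_map))
```

Note that the key is the basename with only the ".fasta" suffix removed, so an assembly named sample2.fa would not match a "sample2" row.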
CALL
TASKS
assembly_bases
→ assembly_bases
Input Mappings (1)
Input
Value
fasta
assembly
CALL
TASKS
vadr
→ vadr
Input Mappings (4)
Input
Value
genome_fasta
renamed_assembly
vadr_opts
"--glsearch -s -r --nomisc --mkey sarscov2 --lowsim5seq 6 --lowsim3seq 6 --alt_fail lowscore,insertnn,deletinn"
minlen
50
maxlen
30000
CALL
TASKS
passing_fasta
→ concatenate
Input Mappings (2)
Input
Value
infiles
select_all(passing_assemblies)
output_name
"assemblies-passing.fasta"
CALL
TASKS
passing_ids
→ fasta_to_ids
Input Mappings (1)
Input
Value
sequences_fasta
passing_fasta.combined
CALL
TASKS
weird_fasta
→ concatenate
Input Mappings (2)
Input
Value
infiles
select_all(weird_assemblies)
output_name
"assemblies-weird.fasta"
CALL
TASKS
weird_ids
→ fasta_to_ids
Input Mappings (1)
Input
Value
sequences_fasta
weird_fasta.combined
CALL
TASKS
passing_source_modifiers
→ biosample_to_genbank
Input Mappings (4)
Input
Value
biosample_attributes
biosample_attributes
num_segments
1
taxid
taxid
filter_to_ids
passing_ids.ids_txt
CALL
TASKS
passing_structured_cmt
→ structured_comments
Input Mappings (2)
Input
Value
assembly_stats_tsv
assembly_stats_tsv
filter_to_ids
passing_ids.ids_txt
CALL
TASKS
generate_author_sbt
→ generate_author_sbt_file
Input Mappings (3)
Input
Value
author_list
author_list
defaults_yaml
author_sbt_defaults_yaml
j2_template
author_sbt_j2_template
CALL
TASKS
passing_package_genbank
→ package_special_genbank_ftp_submission
Input Mappings (4)
Input
Value
sequences_fasta
passing_fasta.combined
source_modifier_table
passing_source_modifiers.genbank_source_modifier_table
author_template_sbt
generate_author_sbt.sbt_file
structured_comment_table
passing_structured_cmt.structured_comment_table
CALL
TASKS
passing_prefix_gisaid
→ prefix_fasta_header
Input Mappings (3)
Input
Value
genome_fasta
passing_fasta.combined
prefix
gisaid_prefix
out_basename
"gisaid-passing-sequences"
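The prefix_gisaid calls derive GISAID-style sequence names from the concatenated FASTA by adding gisaid_prefix (default 'hCoV-19/'). As a hedged sketch, assuming prefix_fasta_header simply prepends the prefix to every header line and leaves sequence lines unchanged (the task's actual implementation is not shown on this page), the transformation looks like this. The function name prefix_fasta_headers and the example records are hypothetical.

```python
def prefix_fasta_headers(fasta_text: str, prefix: str = "hCoV-19/") -> str:
    """Prepend `prefix` to every FASTA header line, leaving sequence lines as-is.

    Sketch only: assumes this is all prefix_fasta_header does.
    """
    out_lines = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            out_lines.append(">" + prefix + line[1:])  # keep the '>' marker first
        else:
            out_lines.append(line)
    return "\n".join(out_lines) + ("\n" if out_lines else "")


# Hypothetical two-record FASTA.
example = ">USA/MA-0001/2021\nACGTNNACGT\n>USA/MA-0002/2021\nACGTACGTAA\n"
print(prefix_fasta_headers(example), end="")
```

Because only header lines change, the same sequences can back both the Genbank package and the GISAID FASTA.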
CALL
TASKS
passing_gisaid_meta
→ gisaid_meta_prep
Input Mappings (4)
Input
Value
source_modifier_table
passing_source_modifiers.genbank_source_modifier_table
structured_comments
passing_structured_cmt.structured_comment_table
fasta_filename
"gisaid-passing-sequences.fasta"
out_name
"gisaid-passing-meta.csv"
CALL
TASKS
weird_source_modifiers
→ biosample_to_genbank
Input Mappings (4)
Input
Value
biosample_attributes
biosample_attributes
num_segments
1
taxid
taxid
filter_to_ids
weird_ids.ids_txt
CALL
TASKS
weird_structured_cmt
→ structured_comments
Input Mappings (2)
Input
Value
assembly_stats_tsv
assembly_stats_tsv
filter_to_ids
weird_ids.ids_txt
CALL
TASKS
weird_package_genbank
→ package_special_genbank_ftp_submission
Input Mappings (4)
Input
Value
sequences_fasta
weird_fasta.combined
source_modifier_table
weird_source_modifiers.genbank_source_modifier_table
author_template_sbt
generate_author_sbt.sbt_file
structured_comment_table
weird_structured_cmt.structured_comment_table
CALL
TASKS
weird_prefix_gisaid
→ prefix_fasta_header
Input Mappings (3)
Input
Value
genome_fasta
weird_fasta.combined
prefix
gisaid_prefix
out_basename
"gisaid-weird-sequences"
CALL
TASKS
weird_gisaid_meta
→ gisaid_meta_prep
Input Mappings (4)
Input
Value
source_modifier_table
weird_source_modifiers.genbank_source_modifier_table
structured_comments
weird_structured_cmt.structured_comment_table
fasta_filename
"gisaid-weird-sequences.fasta"
out_name
"gisaid-weird-meta.csv"
Images
Container images used by tasks in this workflow:
ubuntu
Used by 4 tasks:
passing_fasta
passing_ids
weird_fasta
weird_ids
⚙️ Parameterized
Configured via input:
docker
Used by 6 tasks:
passing_source_modifiers
passing_prefix_gisaid
passing_gisaid_meta
weird_source_modifiers
weird_prefix_gisaid
weird_gisaid_meta
⚙️ Parameterized
Configured via input:
docker
Used by 3 tasks:
passing_structured_cmt
weird_structured_cmt
rename_fasta_header
⚙️ Parameterized
Configured via input:
docker
⚙️ Parameterized
Configured via input:
docker
Used by 2 tasks:
passing_package_genbank
weird_package_genbank
⚙️ Parameterized
Configured via input:
docker
flowchart TD
Start([sarscov2_genbank])
subgraph S1 ["🔃 scatter assembly in assemblies_fasta"]
direction TB
subgraph C1 ["↔️ if defined(fasta_rename_map)"]
direction TB
N1["rename_fasta_header"]
end
N2["assembly_bases"]
N3["vadr"]
end
N4["passing_fasta (concatenate)"]
N5["passing_ids (fasta_to_ids)"]
N6["weird_fasta (concatenate)"]
N7["weird_ids (fasta_to_ids)"]
N8["passing_source_modifiers (biosample_to_genbank)"]
N9["passing_structured_cmt (structured_comments)"]
N10["generate_author_sbt (generate_author_sbt_file)"]
N11["passing_package_genbank (package_special_genbank_ftp_submission)"]
N12["passing_prefix_gisaid (prefix_fasta_header)"]
N13["passing_gisaid_meta (gisaid_meta_prep)"]
N14["weird_source_modifiers (biosample_to_genbank)"]
N15["weird_structured_cmt (structured_comments)"]
N16["weird_package_genbank (package_special_genbank_ftp_submission)"]
N17["weird_prefix_gisaid (prefix_fasta_header)"]
N18["weird_gisaid_meta (gisaid_meta_prep)"]
N1 --> N3
N1 --> N4
N4 --> N5
N1 --> N6
N6 --> N7
N5 --> N8
N5 --> N9
N4 --> N11
N10 --> N11
N9 --> N11
N8 --> N11
N4 --> N12
N9 --> N13
N8 --> N13
N7 --> N14
N7 --> N15
N6 --> N16
N15 --> N16
N14 --> N16
N10 --> N16
N6 --> N17
N15 --> N18
N14 --> N18
Start --> N1
Start --> N2
Start --> N10
N18 --> End([End])
N2 --> End([End])
N12 --> End([End])
N13 --> End([End])
N17 --> End([End])
N3 --> End([End])
N11 --> End([End])
N16 --> End([End])
classDef taskNode fill:#a371f7,stroke:#8b5cf6,stroke-width:2px,color:#fff
classDef workflowNode fill:#58a6ff,stroke:#1f6feb,stroke-width:2px,color:#fff
version 1.0
import "../tasks/tasks_ncbi.wdl" as ncbi
import "../tasks/tasks_reports.wdl" as reports
import "../tasks/tasks_utils.wdl" as utils
workflow sarscov2_genbank {
meta {
description: "Prepare SARS-CoV-2 assemblies for Genbank submission. This includes QC checks with NCBI's VADR tool and filters out genomes that do not pass its tests."
author: "Broad Viral Genomics"
email: "viral-ngs@broadinstitute.org"
}
input {
Array[File]+ assemblies_fasta
String? author_list # of the form "Lastname,A.B., Lastname,C.,"; optional alternative to names in author_sbt_defaults_yaml
File author_sbt_defaults_yaml # defaults to fill in for author_sbt file (including both author and non-author fields)
File author_sbt_j2_template
File biosample_attributes
File assembly_stats_tsv
File? fasta_rename_map
Int min_genome_bases = 15000
Int max_vadr_alerts = 0
Int taxid = 2697049
String gisaid_prefix = 'hCoV-19/'
}
parameter_meta {
assemblies_fasta: {
description: "Genomes to prepare for Genbank submission. One file per genome: all segments/chromosomes included in one file. All fasta files must contain exactly the same number of sequences as reference_fasta (which must equal the number of files in reference_annot_tbl).",
patterns: ["*.fasta"]
}
author_list: {
description: "A string containing a space-delimited list with of author surnames separated by first name and (optional) middle initial. Ex. 'Lastname,Firstname, Last-hypenated,First,M., Last,F.'"
}
author_sbt_defaults_yaml: {
description: "A YAML file with default values to use for the submitter, submitter affiliation, and author affiliation. Optionally including authors at the start and end of the author_list. Example: gs://pathogen-public-dbs/other-related/default_sbt_values.yaml",
patterns: ["*.yaml","*.yml"]
}
author_sbt_j2_template: {
description: "A jinja2-format template for the sbt file expected by NCBI. Example: gs://pathogen-public-dbs/other-related/author_template.sbt.j2"
}
biosample_attributes: {
description: "A post-submission attributes file from NCBI BioSample, which is available at https://submit.ncbi.nlm.nih.gov/subs/ and clicking on 'Download attributes file with BioSample accessions'.",
patterns: ["*.txt", "*.tsv"]
}
assembly_stats_tsv: {
description: "A four column tab text file with one row per sequence and the following header columns: SeqID, Assembly Method, Coverage, Sequencing Technology",
patterns: ["*.txt", "*.tsv"]
}
}
scatter(assembly in assemblies_fasta) {
if(defined(fasta_rename_map)) {
String fasta_basename = basename(assembly, ".fasta")
call ncbi.rename_fasta_header {
input:
genome_fasta = assembly,
new_name = read_map(select_first([fasta_rename_map]))[fasta_basename]
}
}
call reports.assembly_bases {
input:
fasta = assembly
}
File renamed_assembly = select_first([rename_fasta_header.renamed_fasta, assembly])
call ncbi.vadr {
input:
genome_fasta = renamed_assembly,
vadr_opts = "--glsearch -s -r --nomisc --mkey sarscov2 --lowsim5seq 6 --lowsim3seq 6 --alt_fail lowscore,insertnn,deletinn",
minlen = 50,
maxlen = 30000
}
if (assembly_bases.assembly_length_unambiguous >= min_genome_bases) {
if (vadr.num_alerts <= max_vadr_alerts) {
File passing_assemblies = renamed_assembly
}
if (vadr.num_alerts > max_vadr_alerts) {
File weird_assemblies = renamed_assembly
}
}
}
# prep the good ones
call utils.concatenate as passing_fasta {
input:
infiles = select_all(passing_assemblies),
output_name = "assemblies-passing.fasta"
}
call utils.fasta_to_ids as passing_ids {
input:
sequences_fasta = passing_fasta.combined
}
# prep the weird ones
call utils.concatenate as weird_fasta {
input:
infiles = select_all(weird_assemblies),
output_name = "assemblies-weird.fasta"
}
call utils.fasta_to_ids as weird_ids {
input:
sequences_fasta = weird_fasta.combined
}
# package genbank
call ncbi.biosample_to_genbank as passing_source_modifiers {
input:
biosample_attributes = biosample_attributes,
num_segments = 1,
taxid = taxid,
filter_to_ids = passing_ids.ids_txt
}
call ncbi.structured_comments as passing_structured_cmt {
input:
assembly_stats_tsv = assembly_stats_tsv,
filter_to_ids = passing_ids.ids_txt
}
call ncbi.generate_author_sbt_file as generate_author_sbt {
input:
author_list = author_list,
defaults_yaml = author_sbt_defaults_yaml,
j2_template = author_sbt_j2_template
}
call ncbi.package_special_genbank_ftp_submission as passing_package_genbank {
input:
sequences_fasta = passing_fasta.combined,
source_modifier_table = passing_source_modifiers.genbank_source_modifier_table,
author_template_sbt = generate_author_sbt.sbt_file,
structured_comment_table = passing_structured_cmt.structured_comment_table
}
# translate to gisaid
call ncbi.prefix_fasta_header as passing_prefix_gisaid {
input:
genome_fasta = passing_fasta.combined,
prefix = gisaid_prefix,
out_basename = "gisaid-passing-sequences"
}
call ncbi.gisaid_meta_prep as passing_gisaid_meta {
input:
source_modifier_table = passing_source_modifiers.genbank_source_modifier_table,
structured_comments = passing_structured_cmt.structured_comment_table,
fasta_filename = "gisaid-passing-sequences.fasta",
out_name = "gisaid-passing-meta.csv"
}
# package genbank
call ncbi.biosample_to_genbank as weird_source_modifiers {
input:
biosample_attributes = biosample_attributes,
num_segments = 1,
taxid = taxid,
filter_to_ids = weird_ids.ids_txt
}
call ncbi.structured_comments as weird_structured_cmt {
input:
assembly_stats_tsv = assembly_stats_tsv,
filter_to_ids = weird_ids.ids_txt
}
call ncbi.package_special_genbank_ftp_submission as weird_package_genbank {
input:
sequences_fasta = weird_fasta.combined,
source_modifier_table = weird_source_modifiers.genbank_source_modifier_table,
author_template_sbt = generate_author_sbt.sbt_file,
structured_comment_table = weird_structured_cmt.structured_comment_table
}
# translate to gisaid
call ncbi.prefix_fasta_header as weird_prefix_gisaid {
input:
genome_fasta = weird_fasta.combined,
prefix = gisaid_prefix,
out_basename = "gisaid-weird-sequences"
}
call ncbi.gisaid_meta_prep as weird_gisaid_meta {
input:
source_modifier_table = weird_source_modifiers.genbank_source_modifier_table,
structured_comments = weird_structured_cmt.structured_comment_table,
fasta_filename = "gisaid-weird-sequences.fasta",
out_name = "gisaid-weird-meta.csv"
}
output {
File submission_zip = passing_package_genbank.submission_zip
File submission_xml = passing_package_genbank.submission_xml
File submit_ready = passing_package_genbank.submit_ready
Int num_successful = length(select_all(passing_assemblies))
Int num_weird = length(select_all(weird_assemblies))
Int num_input = length(assemblies_fasta)
Array[File] vadr_outputs = vadr.outputs_tgz
File gisaid_fasta = passing_prefix_gisaid.renamed_fasta
File gisaid_meta_csv = passing_gisaid_meta.meta_csv
File weird_genbank_zip = weird_package_genbank.submission_zip
File weird_genbank_xml = weird_package_genbank.submission_xml
File weird_gisaid_fasta = weird_prefix_gisaid.renamed_fasta
File weird_gisaid_meta_csv = weird_gisaid_meta.meta_csv
}
}
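The nested if blocks inside the scatter sort each assembly into one of three outcomes: passing, weird, or dropped. The Python sketch below restates that logic under the default thresholds (min_genome_bases = 15000, max_vadr_alerts = 0). QcResult, classify, and summarize are hypothetical names, and the two numeric fields stand in for assembly_bases.assembly_length_unambiguous and vadr.num_alerts. It also shows why num_successful + num_weird can be smaller than num_input: an assembly below min_genome_bases is defined in neither passing_assemblies nor weird_assemblies, so select_all drops it from both.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class QcResult:
    """Per-assembly QC values (hypothetical container for this sketch)."""
    name: str
    unambiguous_bases: int  # stands in for assembly_bases.assembly_length_unambiguous
    vadr_alerts: int  # stands in for vadr.num_alerts


def classify(result: QcResult, min_genome_bases: int = 15000,
             max_vadr_alerts: int = 0) -> Optional[str]:
    """Return "passing", "weird", or None (dropped), mirroring the nested if blocks."""
    if result.unambiguous_bases < min_genome_bases:
        return None  # too short: lands in neither passing nor weird assemblies
    if result.vadr_alerts <= max_vadr_alerts:
        return "passing"
    return "weird"  # long enough, but more VADR alerts than allowed


def summarize(results: List[QcResult], min_genome_bases: int = 15000,
              max_vadr_alerts: int = 0) -> Dict[str, int]:
    """Compute the three count outputs the way select_all() would."""
    labels = [classify(r, min_genome_bases, max_vadr_alerts) for r in results]
    return {
        "num_successful": labels.count("passing"),
        "num_weird": labels.count("weird"),
        "num_input": len(results),
    }


if __name__ == "__main__":
    demo = [
        QcResult("sample_a", 29_800, 0),  # passing
        QcResult("sample_b", 29_500, 3),  # weird
        QcResult("sample_c", 12_000, 0),  # dropped: below 15000 unambiguous bases
    ]
    print(summarize(demo))  # {'num_successful': 1, 'num_weird': 1, 'num_input': 3}
```

Raising max_vadr_alerts moves assemblies from the weird set into the passing set; lowering min_genome_bases lets shorter assemblies reach the VADR check at all.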