# Workflow: classify_single

- **File Path:** pipes/WDL/workflows/classify_single.wdl
- **WDL Version:** 1.0
- **Type:** workflow
- **Author:** Broad Viral Genomics

Runs raw reads through taxonomic classification (Kraken2), human read depletion (based on Kraken2), de novo assembly (SPAdes), and FastQC/MultiQC of reads.

## Imports

| Namespace | Path |
| --- | --- |
| metagenomics | ../tasks/tasks_metagenomics.wdl |
| read_utils | ../tasks/tasks_read_utils.wdl |
| assembly | ../tasks/tasks_assembly.wdl |
| ncbi | ../tasks/tasks_ncbi.wdl |
| reports | ../tasks/tasks_reports.wdl |
| utils | ../tasks/tasks_utils.wdl |
## Inputs

Names that appear more than once (e.g. `machine_mem_gb`, `taxonomic_ids`) are nested inputs belonging to different task calls (the workflow sets `allowNestedInputs: true`).

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| reads_bams | Array[File]+ | Reads to classify. May be unmapped or mapped or both, paired-end or single-end. Multiple input files will be merged first. | - |
| ncbi_taxdump_tgz | File | An NCBI taxdump.tar.gz file that contains, at the minimum, a nodes.dmp and names.dmp file. | - |
| spikein_db | File? | ERCC spike-in sequences | - |
| trim_clip_db | File | Adapter sequences to remove via trimmomatic prior to SPAdes assembly | - |
| kraken2_db_tgz | File | Pre-built Kraken database tarball containing three files: hash.k2d, opts.k2d, and taxo.k2d. | - |
| krona_taxonomy_db_kraken2_tgz | File | Krona taxonomy database containing a single file: taxonomy.tab, or possibly just a compressed taxonomy.tab | - |
| taxid_to_ref_accessions_tsv | File? | - | - |
| sample_name | String? | - | - |
| reheader_table | File? | - | - |
| machine_mem_gb | Int? | - | - |
| min_base_qual | Int? | - | - |
| taxonomic_ids | Array[Int]? | - | - |
| minimum_hit_groups | Int? | - | - |
| taxonomic_ids | Array[Int]? | - | - |
| minimum_hit_groups | Int? | - | - |
| spades_min_contig_len | Int? | - | - |
| spades_options | String? | - | - |
| machine_mem_gb | Int? | - | - |
| emailAddress | String | - | - |
| skani_m | Int? | - | - |
| skani_s | Int? | - | - |
| skani_c | Int? | - | - |
| skani_n | Int? | - | - |
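The `kraken2_db_tgz` input must be a tarball containing exactly the three Kraken2 index files `hash.k2d`, `opts.k2d`, and `taxo.k2d`. As a hypothetical pre-flight check (not part of the workflow itself), a small sketch like this can confirm a tarball has the required members before a run is launched:

```python
import tarfile

REQUIRED_K2D = {"hash.k2d", "opts.k2d", "taxo.k2d"}

def missing_kraken2_members(tgz_path):
    """Return which required Kraken2 .k2d files are absent from a db tarball."""
    with tarfile.open(tgz_path, mode="r:*") as tar:
        # Members may be nested under a directory, so compare basenames only.
        basenames = {name.rsplit("/", 1)[-1] for name in tar.getnames()}
    return REQUIRED_K2D - basenames
```

An empty return set means the tarball looks usable; a non-empty set names what is missing.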
### 38 optional inputs with default values

| Name | Type | Default |
| --- | --- | --- |
| taxa_to_dehost | Array[String] | ["Vertebrata"] |
| taxa_to_avoid_assembly | Array[String] | ["Vertebrata", "other sequences", "Bacteria"] |
| out_basename | String | basename(in_bams[0], ".bam") |
| run_fastqc | Boolean | false |
| docker | String | "quay.io/broadinstitute/viral-core:2.5.1" |
| disk_size | Int | 750 |
| machine_mem_gb | Int | 4 |
| topNHits | Int | 3 |
| filter_bam_to_proper_primary_mapped_reads | Boolean | true |
| do_not_require_proper_mapped_pairs_when_filtering | Boolean | false |
| keep_singletons_when_filtering | Boolean | false |
| keep_duplicates_when_filtering | Boolean | false |
| docker | String | "quay.io/broadinstitute/viral-core:2.5.1" |
| confidence_threshold | Float? | 0.05 |
| machine_mem_gb | Int | 90 |
| docker | String | "quay.io/broadinstitute/viral-classify:2.5.1.0" |
| withoutChildren | Boolean | false |
| run_fastqc | Boolean | false |
| machine_mem_gb | Int | 26 + 10 * ceil(size(classified_reads_txt_gz, "GB")) |
| docker | String | "quay.io/broadinstitute/viral-classify:2.5.1.0" |
| withoutChildren | Boolean | false |
| run_fastqc | Boolean | false |
| machine_mem_gb | Int | 26 + 10 * ceil(size(classified_reads_txt_gz, "GB")) |
| docker | String | "quay.io/broadinstitute/viral-classify:2.5.1.0" |
| method | String | "mvicuna" |
| machine_mem_gb | Int | 7 |
| docker | String | "quay.io/broadinstitute/viral-core:2.5.1" |
| spades_n_reads | Int | 10000000 |
| sample_name | String | basename(basename(reads_unmapped_bam, ".bam"), ".taxfilt") |
| docker | String | "quay.io/broadinstitute/viral-assemble:2.5.1.0" |
| focal_taxon | String | "Viruses" |
| docker | String | "quay.io/broadinstitute/viral-classify:2.5.1.0" |
| docker | String | "quay.io/broadinstitute/viral-phylo:2.5.1.0" |
| docker | String | "quay.io/broadinstitute/viral-assemble:2.5.1.0" |
| machine_mem_gb | Int | 4 |
| cpu | Int | 2 |
| disk_size | Int | 100 |
| set_default_keys | Array[String] | [] |
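Two of the `machine_mem_gb` defaults are dynamic WDL expressions, `26 + 10 * ceil(size(classified_reads_txt_gz, "GB"))`, which scale a task's memory request with the size of its input file. A minimal sketch of the same arithmetic in Python (assuming, per the WDL 1.0 spec, that the `"GB"` unit of `size()` is decimal, i.e. 10^9 bytes; the function name is illustrative only):

```python
import math

def filter_task_mem_gb(input_bytes):
    """Mirror the WDL default 26 + 10 * ceil(size(classified_reads_txt_gz, "GB"))."""
    size_gb = input_bytes / 1e9  # WDL's "GB" unit is decimal gigabytes
    return 26 + 10 * math.ceil(size_gb)
```

So an empty input keeps the 26 GB floor, and every started decimal gigabyte of input adds another 10 GB to the request.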
## Outputs

| Name | Type | Expression |
| --- | --- | --- |
| cleaned_reads_unaligned_bam | File | deplete.bam_filtered_to_taxa |
| deduplicated_reads_unaligned | File | rmdup_ubam.dedup_bam |
| contigs_fasta | File | spades.contigs_fasta |
| read_counts_raw | Int | deplete.classified_taxonomic_filter_read_count_pre |
| read_counts_depleted | Int | deplete.classified_taxonomic_filter_read_count_post |
| read_counts_dedup | Int | rmdup_ubam.dedup_read_count_post |
| read_counts_prespades_subsample | Int | spades.subsample_read_count |
| kraken2_summary_report | File | kraken2.kraken2_summary_report |
| kraken2_reads_report | File | kraken2.kraken2_reads_report |
| kraken2_krona_plot | File | kraken2.krona_report_html |
| kraken2_top_taxa_report | File | report_primary_kraken_taxa.ranked_focal_report |
| kraken2_focal_taxon_name | String | report_primary_kraken_taxa.focal_tax_name |
| kraken2_focal_total_reads | Int | report_primary_kraken_taxa.total_focal_reads |
| kraken2_top_taxon_id | String | report_primary_kraken_taxa.tax_id |
| kraken2_top_taxon_name | String | report_primary_kraken_taxa.tax_name |
| kraken2_top_taxon_rank | String | report_primary_kraken_taxa.tax_rank |
| kraken2_top_taxon_num_reads | Int | report_primary_kraken_taxa.num_reads |
| kraken2_top_taxon_pct_of_focal | Float | report_primary_kraken_taxa.percent_of_focal |
| skani_num_hits | Int | length(select_first([select_references.top_matches_per_cluster_basenames, []])) |
| skani_contigs_to_refs_dist_tsv | File? | select_references.skani_dist_full_tsv |
| skani_hits_taxids | Array[Int]? | skani_hit_taxid |
| skani_hits_taxnames | Array[String]? | skani_hit_taxname |
| raw_fastqc | File? | merge_raw_reads.fastqc |
| cleaned_fastqc | File? | deplete.fastqc_html_report |
| spikein_report | File? | spikein.report |
| spikein_tophit | String? | spikein.top_hit_id |
| spikein_pct_of_total_reads | String? | spikein.pct_total_reads_mapped |
| spikein_pct_lesser_hits | String? | spikein.pct_lesser_hits_of_mapped |
| kraken2_viral_classify_version | String | kraken2.viralngs_version |
| deplete_viral_classify_version | String | deplete.viralngs_version |
| spades_viral_assemble_version | String | spades.viralngs_version |
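The `skani_num_hits` expression is guarded with `select_first([..., []])` because `select_references` only runs inside the `if defined(taxid_to_ref_accessions_tsv)` block; when that conditional is skipped, the empty-array fallback makes the count 0 instead of leaving the output undefined. A Python sketch of those semantics (function names hypothetical):

```python
def select_first(values):
    """Approximate WDL's select_first(): the first defined (non-None) element."""
    for v in values:
        if v is not None:
            return v
    raise ValueError("select_first: no defined value")

def skani_num_hits(top_matches):
    # length(select_first([select_references.top_matches_per_cluster_basenames, []]))
    return len(select_first([top_matches, []]))
```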
## Calls

This workflow calls the following tasks or subworkflows (call aliases and task names taken from the source below):

### merge_raw_reads → read_utils.merge_and_reheader_bams

| Input | Value |
| --- | --- |
| in_bams | reads_bams |

### spikein → reports.align_and_count

| Input | Value |
| --- | --- |
| reads_bam | reads_bam |
| ref_db | select_first([spikein_db]) |

### kraken2 → metagenomics.kraken2

| Input | Value |
| --- | --- |
| reads_bam | reads_bam |
| kraken2_db_tgz | kraken2_db_tgz |
| krona_taxonomy_db_tgz | krona_taxonomy_db_kraken2_tgz |

### deplete → metagenomics.filter_bam_to_taxa

| Input | Value |
| --- | --- |
| classified_bam | reads_bam |
| classified_reads_txt_gz | kraken2.kraken2_reads_report |
| ncbi_taxonomy_db_tgz | ncbi_taxdump_tgz |
| exclude_taxa | true |
| taxonomic_names | taxa_to_dehost |
| out_filename_suffix | "hs_depleted" |

### filter_acellular → metagenomics.filter_bam_to_taxa

| Input | Value |
| --- | --- |
| classified_bam | reads_bam |
| classified_reads_txt_gz | kraken2.kraken2_reads_report |
| ncbi_taxonomy_db_tgz | ncbi_taxdump_tgz |
| exclude_taxa | true |
| taxonomic_names | taxa_to_avoid_assembly |
| out_filename_suffix | "acellular" |

### rmdup_ubam → read_utils.rmdup_ubam

| Input | Value |
| --- | --- |
| reads_unmapped_bam | filter_acellular.bam_filtered_to_taxa |

### spades → assembly.assemble

| Input | Value |
| --- | --- |
| reads_unmapped_bam | rmdup_ubam.dedup_bam |
| trim_clip_db | trim_clip_db |
| always_succeed | true |

### report_primary_kraken_taxa → metagenomics.report_primary_kraken_taxa

| Input | Value |
| --- | --- |
| kraken_summary_report | kraken2.kraken2_summary_report |

### string_split → utils.string_split

| Input | Value |
| --- | --- |
| joined_string | taxon[3] |
| delimiter | ":" |

### download_annotations → ncbi.download_annotations

| Input | Value |
| --- | --- |
| accessions | string_split.tokens |
| combined_out_prefix | sub(taxon[3], ":", "-") |

### select_references → assembly.select_references

| Input | Value |
| --- | --- |
| reference_genomes_fastas | download_annotations.combined_fasta |
| contigs_fasta | spades.contigs_fasta |

### tax_lookup → utils.fetch_row_from_tsv

| Input | Value |
| --- | --- |
| tsv | select_first([taxid_to_ref_accessions_tsv]) |
| idx_col | "accessions" |
| idx_val | sub(top_match, "-", ":") |
| add_header | ["taxid", "isolate_prefix", "taxname", "accessions"] |
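The reference-selection path round-trips colon-joined accession lists through filenames: `download_annotations` uses `sub(taxon[3], ":", "-")` because Singularity rejects colons in filenames, and `tax_lookup` reverses the substitution to key back into the TSV by its `accessions` column. A Python sketch of that round trip and a `fetch_row_from_tsv`-style lookup (helper names are hypothetical; the real tasks live in `tasks_utils.wdl`):

```python
def split_accessions(joined, delimiter=":"):
    """Mirror utils.string_split on the accession list in taxon[3]."""
    return joined.split(delimiter)

def to_safe_basename(accessions):
    """Mirror sub(taxon[3], ":", "-"): make the accession list filename-safe."""
    return accessions.replace(":", "-")

def from_safe_basename(basename):
    """Mirror sub(top_match, "-", ":"): recover the colon-joined TSV key."""
    return basename.replace("-", ":")

def fetch_row_from_tsv(rows, header, idx_col, idx_val):
    """Return the first row (as a dict) whose idx_col field equals idx_val."""
    col = header.index(idx_col)
    for row in rows:
        if row[col] == idx_val:
            return dict(zip(header, row))
    raise KeyError(idx_val)
```

Note the implicit assumption that accession strings themselves contain no dashes (GenBank/RefSeq accessions use underscores and dots, e.g. NC_001498.1); otherwise the reverse substitution would corrupt the key.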
## Images

Container images used by tasks in this workflow:

- Parameterized via input `docker`; used by 2 tasks: merge_raw_reads, rmdup_ubam
- Parameterized via input `docker`; used by 4 tasks: kraken2, deplete, filter_acellular, report_primary_kraken_taxa
- Parameterized via input `docker`
- Parameterized via input `docker`
## Workflow Diagram

```mermaid
flowchart TD
    Start([classify_single])
    N1["merge_raw_reads (merge_and_reheader_bams)"]
    subgraph C1 ["↔️ if defined(spikein_db)"]
        direction TB
        N2["spikein (align_and_count)"]
    end
    N3["kraken2"]
    N4["deplete (filter_bam_to_taxa)"]
    N5["filter_acellular (filter_bam_to_taxa)"]
    N6["rmdup_ubam"]
    N7["spades (assemble)"]
    N8["report_primary_kraken_taxa"]
    subgraph C2 ["↔️ if defined(taxid_to_ref_accessions_tsv)"]
        direction TB
        subgraph S1 ["🔃 scatter taxon in read_tsv(select_first([taxid_to_ref_accessions_tsv]))"]
            direction TB
            N9["string_split"]
            N10["download_annotations"]
        end
        N11["select_references"]
        subgraph S2 ["🔃 scatter top_match in select_references.top_matches_per_cluster_basenames"]
            direction TB
            N12["tax_lookup (fetch_row_from_tsv)"]
        end
    end
    N1 --> N2
    N1 --> N3
    N1 --> N4
    N3 --> N4
    N1 --> N5
    N3 --> N5
    N5 --> N6
    N6 --> N7
    N3 --> N8
    N9 --> N10
    N7 --> N11
    N10 --> N11
    N11 --> N12
    Start --> N1
    Start --> N9
    N8 --> End([End])
    N2 --> End([End])
    N4 --> End([End])
    N12 --> End([End])
    classDef taskNode fill:#a371f7,stroke:#8b5cf6,stroke-width:2px,color:#fff
    classDef workflowNode fill:#58a6ff,stroke:#1f6feb,stroke-width:2px,color:#fff
```
version 1.0
import "../tasks/tasks_metagenomics.wdl" as metagenomics
import "../tasks/tasks_read_utils.wdl" as read_utils
import "../tasks/tasks_assembly.wdl" as assembly
import "../tasks/tasks_ncbi.wdl" as ncbi
import "../tasks/tasks_reports.wdl" as reports
import "../tasks/tasks_utils.wdl" as utils
workflow classify_single {
meta {
description: "Runs raw reads through taxonomic classification (Kraken2), human read depletion (based on Kraken2), de novo assembly (SPAdes), and FASTQC/multiQC of reads."
author: "Broad Viral Genomics"
email: "viral-ngs@broadinstitute.org"
allowNestedInputs: true
}
input {
Array[File]+ reads_bams
File ncbi_taxdump_tgz
File? spikein_db
File trim_clip_db
File kraken2_db_tgz
File krona_taxonomy_db_kraken2_tgz
Array[String] taxa_to_dehost = ["Vertebrata"]
Array[String] taxa_to_avoid_assembly = ["Vertebrata", "other sequences", "Bacteria"]
File? taxid_to_ref_accessions_tsv
}
parameter_meta {
reads_bams: {
description: "Reads to classify. May be unmapped or mapped or both, paired-end or single-end. Multiple input files will be merged first.",
patterns: ["*.bam"]
}
spikein_db: {
description: "ERCC spike-in sequences",
patterns: ["*.fasta", "*.fasta.gz", "*.fasta.zst"]
}
trim_clip_db: {
description: "Adapter sequences to remove via trimmomatic prior to SPAdes assembly",
patterns: ["*.fasta", "*.fasta.gz", "*.fasta.zst"]
}
kraken2_db_tgz: {
description: "Pre-built Kraken database tarball containing three files: hash.k2d, opts.k2d, and taxo.k2d.",
patterns: ["*.tar.gz", "*.tar.lz4", "*.tar.bz2", "*.tar.zst"]
}
krona_taxonomy_db_kraken2_tgz: {
description: "Krona taxonomy database containing a single file: taxonomy.tab, or possibly just a compressed taxonomy.tab",
patterns: ["*.tab.zst", "*.tab.gz", "*.tab", "*.tar.gz", "*.tar.lz4", "*.tar.bz2", "*.tar.zst"]
}
ncbi_taxdump_tgz: {
description: "An NCBI taxdump.tar.gz file that contains, at the minimum, a nodes.dmp and names.dmp file.",
patterns: ["*.tar.gz", "*.tar.lz4", "*.tar.bz2", "*.tar.zst"]
}
deduplicated_reads_unaligned: {
description: "Deduplication on unaligned reads in BAM format using mvicuna or cdhit.",
category: "other"
}
kraken2_krona_plot: {
description: "Visualize the results of the Kraken2 analysis with Krona, which displays taxonomic hierarchical data in a multi-layered pie chart.",
category:"other"
}
kraken2_summary_report: {
description: "Kraken report output file.",
category: "other"
}
}
call read_utils.merge_and_reheader_bams as merge_raw_reads {
input:
in_bams = reads_bams
}
File reads_bam = merge_raw_reads.out_bam
if(defined(spikein_db)) {
call reports.align_and_count as spikein {
input:
reads_bam = reads_bam,
ref_db = select_first([spikein_db])
}
}
call metagenomics.kraken2 as kraken2 {
input:
reads_bam = reads_bam,
kraken2_db_tgz = kraken2_db_tgz,
krona_taxonomy_db_tgz = krona_taxonomy_db_kraken2_tgz
}
call metagenomics.filter_bam_to_taxa as deplete {
input:
classified_bam = reads_bam,
classified_reads_txt_gz = kraken2.kraken2_reads_report,
ncbi_taxonomy_db_tgz = ncbi_taxdump_tgz,
exclude_taxa = true,
taxonomic_names = taxa_to_dehost,
out_filename_suffix = "hs_depleted"
}
call metagenomics.filter_bam_to_taxa as filter_acellular {
input:
classified_bam = reads_bam,
classified_reads_txt_gz = kraken2.kraken2_reads_report,
ncbi_taxonomy_db_tgz = ncbi_taxdump_tgz,
exclude_taxa = true,
taxonomic_names = taxa_to_avoid_assembly,
out_filename_suffix = "acellular"
}
call read_utils.rmdup_ubam {
input:
reads_unmapped_bam = filter_acellular.bam_filtered_to_taxa
}
call assembly.assemble as spades {
input:
reads_unmapped_bam = rmdup_ubam.dedup_bam,
trim_clip_db = trim_clip_db,
always_succeed = true
}
call metagenomics.report_primary_kraken_taxa {
input:
kraken_summary_report = kraken2.kraken2_summary_report
}
if(defined(taxid_to_ref_accessions_tsv)) {
# download (multi-segment) genomes for each reference, fasta filename = colon-concatenated accession list
scatter(taxon in read_tsv(select_first([taxid_to_ref_accessions_tsv]))) {
# taxon = [taxid, isolate_prefix, taxname, semicolon_delim_accession_list]
call utils.string_split {
input:
joined_string = taxon[3],
delimiter = ":"
}
call ncbi.download_annotations {
input:
accessions = string_split.tokens,
combined_out_prefix = sub(taxon[3], ":", "-") # singularity does not like colons in filenames
}
}
# subset reference genomes to those with ANI hits to contigs and cluster reference hits by any ANI similarity to each other
call assembly.select_references {
input:
reference_genomes_fastas = download_annotations.combined_fasta,
contigs_fasta = spades.contigs_fasta
}
# get taxid and taxname from taxid_to_ref_accessions_tsv
scatter(top_match in select_references.top_matches_per_cluster_basenames) {
call utils.fetch_row_from_tsv as tax_lookup {
input:
tsv = select_first([taxid_to_ref_accessions_tsv]),
idx_col = "accessions",
idx_val = sub(top_match, "-", ":"),
add_header = ["taxid", "isolate_prefix", "taxname", "accessions"]
}
Int skani_hit_taxid = tax_lookup.map["taxid"]
String skani_hit_taxname = tax_lookup.map["taxname"]
}
}
output {
File cleaned_reads_unaligned_bam = deplete.bam_filtered_to_taxa
File deduplicated_reads_unaligned = rmdup_ubam.dedup_bam
File contigs_fasta = spades.contigs_fasta
Int read_counts_raw = deplete.classified_taxonomic_filter_read_count_pre
Int read_counts_depleted = deplete.classified_taxonomic_filter_read_count_post
Int read_counts_dedup = rmdup_ubam.dedup_read_count_post
Int read_counts_prespades_subsample = spades.subsample_read_count
File kraken2_summary_report = kraken2.kraken2_summary_report
File kraken2_reads_report = kraken2.kraken2_reads_report
File kraken2_krona_plot = kraken2.krona_report_html
File kraken2_top_taxa_report = report_primary_kraken_taxa.ranked_focal_report
String kraken2_focal_taxon_name = report_primary_kraken_taxa.focal_tax_name
Int kraken2_focal_total_reads = report_primary_kraken_taxa.total_focal_reads
String kraken2_top_taxon_id = report_primary_kraken_taxa.tax_id
String kraken2_top_taxon_name = report_primary_kraken_taxa.tax_name
String kraken2_top_taxon_rank = report_primary_kraken_taxa.tax_rank
Int kraken2_top_taxon_num_reads = report_primary_kraken_taxa.num_reads
Float kraken2_top_taxon_pct_of_focal = report_primary_kraken_taxa.percent_of_focal
Int skani_num_hits = length(select_first([select_references.top_matches_per_cluster_basenames, []]))
File? skani_contigs_to_refs_dist_tsv = select_references.skani_dist_full_tsv
Array[Int]? skani_hits_taxids = skani_hit_taxid
Array[String]? skani_hits_taxnames = skani_hit_taxname
File? raw_fastqc = merge_raw_reads.fastqc
File? cleaned_fastqc = deplete.fastqc_html_report
File? spikein_report = spikein.report
String? spikein_tophit = spikein.top_hit_id
String? spikein_pct_of_total_reads = spikein.pct_total_reads_mapped
String? spikein_pct_lesser_hits = spikein.pct_lesser_hits_of_mapped
String kraken2_viral_classify_version = kraken2.viralngs_version
String deplete_viral_classify_version = deplete.viralngs_version
String spades_viral_assemble_version = spades.viralngs_version
}
}