WORKFLOW
assemble_refbased
File Path
pipes/WDL/workflows/assemble_refbased.wdl
WDL Version
1.0
Type
workflow
Imports
Namespace
Path
assembly
../tasks/tasks_assembly.wdl
intrahost
../tasks/tasks_intrahost.wdl
reports
../tasks/tasks_reports.wdl
read_utils
../tasks/tasks_read_utils.wdl
Workflow: assemble_refbased
Reference-based microbial consensus calling. Aligns NGS reads to a singular reference genome, calls a new consensus sequence, and emits: new assembly, reads aligned to provided reference, reads aligned to new assembly, various figures of merit, plots, and QC metrics. The user may provide unaligned reads spread across multiple input files and this workflow will parallelize alignment per input file before merging results prior to consensus calling.
Author: Broad Viral Genomics
This workflow is called as a subworkflow by 7 other workflows:
Name
Type
Description
Default
reads_unmapped_bams
Array[File]+
Unaligned reads in BAM format
-
reference_fasta
File
Reference genome to align reads to.
-
sample_original_name
String?
-
-
novocraft_license
File?
-
-
trim_coords_bed
File?
optional primers to trim in reference coordinate space (0-based BED format)
-
machine_mem_gb
Int?
-
-
min_keep_length
Int?
-
-
sliding_window
Int?
-
-
primer_offset
Int?
-
-
machine_mem_gb
Int?
-
-
reheader_table
File?
-
-
amplicon_set
String?
-
-
max_coverage_depth
Int?
-
-
base_q_threshold
Int?
-
-
mapping_q_threshold
Int?
-
-
read_length_threshold
Int?
-
-
plotXLimits
String?
-
-
plotYLimits
String?
-
-
machine_mem_gb
Int?
-
-
reheader_table
File?
-
-
max_coverage_depth
Int?
-
-
base_q_threshold
Int?
-
-
mapping_q_threshold
Int?
-
-
read_length_threshold
Int?
-
-
plotXLimits
String?
-
-
plotYLimits
String?
-
-
51 optional inputs with default values
sample_name
String
Base name of output files. The 'SM' field in BAM read group headers are also rewritten to this value. Avoid spaces and other filename-unfriendly characters.
basename(reads_unmapped_bams[0],'.bam')
aligner
String
Read aligner software to use. Options: novoalign, bwa, minimap2. Minimap2 can automatically handle Illumina, PacBio, or Oxford Nanopore reads as long as the 'PL' field in the BAM read group header is set properly (novoalign and bwa are Illumina-only).
"minimap2"
min_coverage
Int
Minimum read coverage required to call a position unambiguous.
3
major_cutoff
Float
If the major allele is present at a frequency higher than this cutoff, we will call an unambiguous base at that position. If it is equal to or below this cutoff, we will call an ambiguous base representing all possible alleles at that position.
0.75
skip_mark_dupes
Boolean
skip Picard MarkDuplicates step after alignment. This is recommended to be set to true for PCR amplicon based data. (Default: false)
false
align_to_ref_options
Map[String,String]
-
{"novoalign": "-r Random -l 40 -g 40 -x 20 -t 501 -k", "bwa": "-k 12 -B 1", "minimap2": ""}
align_to_self_options
Map[String,String]
-
{"novoalign": "-r Random -l 40 -g 40 -x 20 -t 100", "bwa": "", "minimap2": ""}
docker
String
-
"quay.io/broadinstitute/viral-core:2.5.1"
sample_name
String
Base name of output files. The 'SM' field in BAM read group headers are also rewritten to this value. Avoid spaces and other filename-unfriendly characters.
basename(basename(basename(reads_unmapped_bam,".bam"),".taxfilt"),".clean")
min_quality
Int?
-
1
docker
String
-
"andersenlabapps/ivar:1.3.1"
bam_basename
String
-
basename(aligned_bam,".bam")
disk_size
Int
-
375
run_fastqc
Boolean
-
false
docker
String
-
"quay.io/broadinstitute/viral-core:2.5.1"
disk_size
Int
-
750
machine_mem_gb
Int
-
4
out_basename
String
-
basename(aligned_bam,'.bam')
docker
String
-
"quay.io/broadinstitute/viral-phylo:2.5.1.0"
max_amp_len
Int
-
5000
max_amplicons
Int
-
500
machine_mem_gb
Int
-
32
docker
String
-
"quay.io/broadinstitute/viral-core:2.5.1"
docker
String
-
"quay.io/broadinstitute/viral-core:2.5.1"
skip_mark_dupes
Boolean
skip Picard MarkDuplicates step after alignment. This is recommended to be set to true for PCR amplicon based data. (Default: false)
false
plot_only_non_duplicates
Boolean
-
false
bin_large_plots
Boolean
-
false
binning_summary_statistic
String?
-
"max"
plot_width_pixels
Int?
-
1100
plot_height_pixels
Int?
-
850
plot_pixels_per_inch
Int?
-
100
docker
String
-
"quay.io/broadinstitute/viral-core:2.5.1"
mark_duplicates
Boolean
-
false
machine_mem_gb
Int
-
15
docker
String
-
"quay.io/broadinstitute/viral-assemble:2.5.1.0"
docker
String
-
"quay.io/broadinstitute/viral-core:2.5.1"
sample_name
String
Base name of output files. The 'SM' field in BAM read group headers are also rewritten to this value. Avoid spaces and other filename-unfriendly characters.
basename(basename(basename(reads_unmapped_bam,".bam"),".taxfilt"),".clean")
run_fastqc
Boolean
-
false
docker
String
-
"quay.io/broadinstitute/viral-core:2.5.1"
disk_size
Int
-
750
machine_mem_gb
Int
-
4
out_basename
String
-
basename(aligned_bam,'.bam')
docker
String
-
"quay.io/broadinstitute/viral-phylo:2.5.1.0"
skip_mark_dupes
Boolean
skip Picard MarkDuplicates step after alignment. This is recommended to be set to true for PCR amplicon based data. (Default: false)
false
plot_only_non_duplicates
Boolean
-
false
bin_large_plots
Boolean
-
false
binning_summary_statistic
String?
-
"max"
plot_width_pixels
Int?
-
1100
plot_height_pixels
Int?
-
850
plot_pixels_per_inch
Int?
-
100
docker
String
-
"quay.io/broadinstitute/viral-core:2.5.1"
Outputs
Name
Type
Expression
assembly_fasta
File
call_consensus.refined_assembly_fasta
align_to_ref_variants_vcf_gz
File
call_consensus.sites_vcf_gz
assembly_length
Int
call_consensus.assembly_length
assembly_length_unambiguous
Int
call_consensus.assembly_length_unambiguous
reference_genome_length
Int
align_to_ref.reference_length[0]
assembly_mean_coverage
Float
plot_ref_coverage.mean_coverage
dist_to_ref_snps
Int
call_consensus.dist_to_ref_snps
dist_to_ref_indels
Int
call_consensus.dist_to_ref_indels
primer_trimmed_read_count
Array[Int]
ivar_trim.primer_trimmed_read_count
primer_trimmed_read_percent
Array[Float]
ivar_trim.primer_trimmed_read_percent
ivar_trim_stats
Array[Map[String,String]]
ivar_stats
ivar_trim_stats_tsv
Array[Array[String]]
ivar_stats_row
replicate_concordant_sites
Int
run_discordance.concordant_sites
replicate_discordant_snps
Int
run_discordance.discordant_snps
replicate_discordant_indels
Int
run_discordance.discordant_indels
num_read_groups
Int
run_discordance.num_read_groups
num_libraries
Int
run_discordance.num_libraries
replicate_discordant_vcf
File
run_discordance.discordant_sites_vcf
align_to_ref_per_input_aligned_flagstat
Array[File]
align_to_ref.aligned_bam_flagstat
align_to_ref_per_input_reads_provided
Array[Int]
align_to_ref.reads_provided
align_to_ref_per_input_reads_aligned
Array[Int]
align_to_ref.reads_aligned
align_to_ref_merged_aligned_trimmed_only_bam
File
aligned_trimmed_bam
align_to_ref_fastqc
File
select_first([merge_align_to_ref.fastqc, align_to_ref.aligned_only_reads_fastqc[0]])
align_to_ref_merged_coverage_plot
File
plot_ref_coverage.coverage_plot
align_to_ref_merged_coverage_tsv
File
plot_ref_coverage.coverage_tsv
align_to_ref_merged_reads_aligned
Int
plot_ref_coverage.reads_aligned
align_to_ref_merged_read_pairs_aligned
Int
plot_ref_coverage.read_pairs_aligned
align_to_ref_merged_bases_aligned
Int
floor(plot_ref_coverage.bases_aligned)
align_to_ref_isnvs_vcf
File
isnvs_ref.report_vcf
picard_metrics_wgs
File
alignment_metrics.wgs_metrics
picard_metrics_alignment
File
alignment_metrics.alignment_metrics
picard_metrics_insert_size
File
alignment_metrics.insert_size_metrics
samtools_ampliconstats
File
alignment_metrics.amplicon_stats
samtools_ampliconstats_parsed
File
alignment_metrics.amplicon_stats_parsed
align_to_self_merged_aligned_and_unaligned_bam
Array[File]
align_to_self.aligned_bam
align_to_self_merged_aligned_only_bam
File
aligned_self_bam
align_to_self_merged_coverage_plot
File
plot_self_coverage.coverage_plot
align_to_self_merged_coverage_tsv
File
plot_self_coverage.coverage_tsv
align_to_self_merged_reads_aligned
Int
plot_self_coverage.reads_aligned
align_to_self_merged_read_pairs_aligned
Int
plot_self_coverage.read_pairs_aligned
align_to_self_merged_bases_aligned
Int
floor(plot_self_coverage.bases_aligned)
align_to_self_merged_mean_coverage
Float
plot_self_coverage.mean_coverage
align_to_self_isnvs_vcf
File
isnvs_self.report_vcf
assembly_method
String
"viral-ngs/assemble_refbased"
align_to_ref_viral_core_version
String
align_to_ref.viralngs_version[0]
ivar_version
String
ivar_trim.ivar_version[0]
viral_assemble_version
String
call_consensus.viralngs_version
Calls
This workflow calls the following tasks or subworkflows:
Input Mappings (6)
Input
Value
reference_fasta
reference_fasta
reads_unmapped_bam
reads_unmapped_bam
novocraft_license
novocraft_license
skip_mark_dupes
skip_mark_dupes
aligner
aligner
aligner_options
align_to_ref_options[aligner]
Input Mappings (2)
Input
Value
aligned_bam
align_to_ref.aligned_only_reads_bam
trim_coords_bed
trim_coords_bed
Input Mappings (3)
Input
Value
in_bams
ivar_trim.aligned_trimmed_bam
sample_name
sample_name
out_basename
"~{sample_name}.align_to_ref.trimmed"
Input Mappings (2)
Input
Value
reference_fasta
reference_fasta
aligned_bam
aligned_trimmed_bam
Input Mappings (4)
Input
Value
aligned_bam
aligned_trimmed_bam
ref_fasta
reference_fasta
primers_bed
trim_coords_bed
min_coverage
min_coverage
Input Mappings (4)
Input
Value
reads_aligned_bam
aligned_trimmed_bam
reference_fasta
reference_fasta
min_coverage
min_coverage + 1
out_basename
sample_name
Input Mappings (2)
Input
Value
aligned_reads_bam
aligned_trimmed_bam
sample_name
sample_name
CALL
TASKS
call_consensus
↗
→ refine_assembly_with_aligned_reads
Input Mappings (6)
Input
Value
reference_fasta
reference_fasta
reads_aligned_bam
aligned_trimmed_bam
min_coverage
min_coverage
major_cutoff
major_cutoff
out_basename
sample_name
sample_name
select_first([sample_original_name, sample_name])
Input Mappings (6)
Input
Value
reference_fasta
call_consensus.refined_assembly_fasta
reads_unmapped_bam
reads_unmapped_bam
novocraft_license
novocraft_license
skip_mark_dupes
skip_mark_dupes
aligner
aligner
aligner_options
align_to_self_options[aligner]
Input Mappings (3)
Input
Value
in_bams
align_to_self.aligned_only_reads_bam
sample_name
sample_name
out_basename
"~{sample_name}.merge_align_to_self"
Input Mappings (2)
Input
Value
reference_fasta
call_consensus.refined_assembly_fasta
aligned_bam
aligned_self_bam
Input Mappings (2)
Input
Value
aligned_reads_bam
aligned_self_bam
sample_name
sample_name
Images
Container images used by tasks in this workflow:
⚙️ Parameterized
Configured via input:
docker
⚙️ Parameterized
Configured via input:
docker
Used by 6 tasks:
alignment_metrics
run_discordance
align_to_ref
merge_align_to_ref
align_to_self
merge_align_to_self
~{docker}
Used by 2 tasks:
plot_ref_coverage
plot_self_coverage
⚙️ Parameterized
Configured via input:
docker
⚙️ Parameterized
Configured via input:
docker
Zoom In
Zoom Out
Fit
Reset
🖱️ Scroll to zoom • Drag to pan • Double-click to reset • ESC to close
flowchart TD
Start([assemble_refbased])
subgraph S1 ["🔃 scatter reads_unmapped_bam in reads_unmapped_bams"]
direction TB
N1["align_to_refalign_reads "]
N2["ivar_trim"]
end
subgraph C1 ["↔️ if length(reads_unmapped_bams) > 1"]
direction TB
N3["merge_align_to_refmerge_and_reheader_bams "]
end
N4["isnvs_reflofreq "]
N5["alignment_metrics"]
N6["run_discordance"]
N7["plot_ref_coverageplot_coverage "]
N8["call_consensusrefine_assembly_with_aligned_reads "]
subgraph S2 ["🔃 scatter reads_unmapped_bam in reads_unmapped_bams"]
direction TB
N9["align_to_selfalign_reads "]
end
subgraph C2 ["↔️ if length(reads_unmapped_bams) > 1"]
direction TB
N10["merge_align_to_selfmerge_and_reheader_bams "]
end
N11["isnvs_selflofreq "]
N12["plot_self_coverageplot_coverage "]
N1 --> N2
N2 --> N3
N3 --> N4
N2 --> N4
N3 --> N5
N2 --> N5
N3 --> N6
N2 --> N6
N3 --> N7
N2 --> N7
N3 --> N8
N2 --> N8
N8 --> N9
N9 --> N10
N9 --> N11
N8 --> N11
N10 --> N11
N9 --> N12
N10 --> N12
Start --> N1
N7 --> End([End])
N6 --> End([End])
N4 --> End([End])
N12 --> End([End])
N11 --> End([End])
N5 --> End([End])
classDef taskNode fill:#a371f7,stroke:#8b5cf6,stroke-width:2px,color:#fff
classDef workflowNode fill:#58a6ff,stroke:#1f6feb,stroke-width:2px,color:#fff
version 1.0
import "../tasks/tasks_assembly.wdl" as assembly
import "../tasks/tasks_intrahost.wdl" as intrahost
import "../tasks/tasks_reports.wdl" as reports
import "../tasks/tasks_read_utils.wdl" as read_utils
workflow assemble_refbased {
meta {
description: "Reference-based microbial consensus calling. Aligns NGS reads to a singular reference genome, calls a new consensus sequence, and emits: new assembly, reads aligned to provided reference, reads aligned to new assembly, various figures of merit, plots, and QC metrics. The user may provide unaligned reads spread across multiple input files and this workflow will parallelize alignment per input file before merging results prior to consensus calling."
author: "Broad Viral Genomics"
email: "viral-ngs@broadinstitute.org"
allowNestedInputs: true
# https://wdl-aid.readthedocs.io/en/latest/usage.html -- can exclude certain inputs from documentation; attempt to do this on with docker nominated variables
}
parameter_meta {
sample_name: {
description: "Base name of output files. The 'SM' field in BAM read group headers are also rewritten to this value. Avoid spaces and other filename-unfriendly characters.",
category: "common"
}
reads_unmapped_bams: {
description: "Unaligned reads in BAM format",
patterns: ["*.bam"]
}
reference_fasta: {
description: "Reference genome to align reads to.",
patterns: ["*.fasta"]
}
aligner: {
description: "Read aligner software to use. Options: novoalign, bwa, minimap2. Minimap2 can automatically handle Illumina, PacBio, or Oxford Nanopore reads as long as the 'PL' field in the BAM read group header is set properly (novoalign and bwa are Illumina-only)."
}
skip_mark_dupes: {
description: "skip Picard MarkDuplicates step after alignment. This is recommended to be set to true for PCR amplicon based data. (Default: false)"
}
trim_coords_bed: {
description: "optional primers to trim in reference coordinate space (0-based BED format)",
patterns: ["*.bed"],
category: "common"
}
min_coverage: {
description: "Minimum read coverage required to call a position unambiguous."
}
major_cutoff: {
description: "If the major allele is present at a frequency higher than this cutoff, we will call an unambiguous base at that position. If it is equal to or below this cutoff, we will call an ambiguous base representing all possible alleles at that position."
}
assembly_fasta: {
description: "The new assembly / consensus sequence for this sample"
}
align_to_ref_variants_vcf_gz: {
description: "All variants in the input reads against the original reference genome. This VCF file is used to create the assembly_fasta"
}
assembly_length: {
description: "The length of the sequence described in assembly_fasta, inclusive of any uncovered regions denoted by Ns"
}
assembly_length_unambiguous: {
description: "The number of called consensus bases in assembly_fasta (excludes regions of the genome that lack read coverage)"
}
}
input {
Array[File]+ reads_unmapped_bams
File reference_fasta
String sample_name = basename(reads_unmapped_bams[0], '.bam')
String? sample_original_name
String aligner="minimap2"
File? novocraft_license
Int min_coverage=3
Float major_cutoff=0.75
Boolean skip_mark_dupes=false
File? trim_coords_bed
Map[String,String] align_to_ref_options = {
"novoalign": "-r Random -l 40 -g 40 -x 20 -t 501 -k",
"bwa": "-k 12 -B 1",
"minimap2": ""
}
Map[String,String] align_to_self_options = {
"novoalign": "-r Random -l 40 -g 40 -x 20 -t 100",
"bwa": "",
"minimap2": ""
}
}
scatter(reads_unmapped_bam in reads_unmapped_bams) {
call assembly.align_reads as align_to_ref {
input:
reference_fasta = reference_fasta,
reads_unmapped_bam = reads_unmapped_bam,
novocraft_license = novocraft_license,
skip_mark_dupes = skip_mark_dupes,
aligner = aligner,
aligner_options = align_to_ref_options[aligner]
}
call assembly.ivar_trim {
input:
aligned_bam = align_to_ref.aligned_only_reads_bam,
trim_coords_bed = trim_coords_bed
}
Map[String,String] ivar_stats = {
'file': basename(reads_unmapped_bam, '.bam'),
'trim_percent': ivar_trim.primer_trimmed_read_percent,
'trim_count': ivar_trim.primer_trimmed_read_count
}
Array[String] ivar_stats_row = [ivar_stats['file'], ivar_stats['trim_percent'], ivar_stats['trim_count']]
}
if(length(reads_unmapped_bams)>1) {
call read_utils.merge_and_reheader_bams as merge_align_to_ref {
input:
in_bams = ivar_trim.aligned_trimmed_bam,
sample_name = sample_name,
out_basename = "~{sample_name}.align_to_ref.trimmed"
}
}
File aligned_trimmed_bam = select_first([merge_align_to_ref.out_bam, ivar_trim.aligned_trimmed_bam[0]])
call intrahost.lofreq as isnvs_ref {
input:
reference_fasta = reference_fasta,
aligned_bam = aligned_trimmed_bam
}
call reports.alignment_metrics {
input:
aligned_bam = aligned_trimmed_bam,
ref_fasta = reference_fasta,
primers_bed = trim_coords_bed,
min_coverage = min_coverage
}
call assembly.run_discordance {
input:
reads_aligned_bam = aligned_trimmed_bam,
reference_fasta = reference_fasta,
min_coverage = min_coverage+1,
out_basename = sample_name
}
call reports.plot_coverage as plot_ref_coverage {
input:
aligned_reads_bam = aligned_trimmed_bam,
sample_name = sample_name
}
call assembly.refine_assembly_with_aligned_reads as call_consensus {
input:
reference_fasta = reference_fasta,
reads_aligned_bam = aligned_trimmed_bam,
min_coverage = min_coverage,
major_cutoff = major_cutoff,
out_basename = sample_name,
sample_name = select_first([sample_original_name, sample_name])
}
scatter(reads_unmapped_bam in reads_unmapped_bams) {
call assembly.align_reads as align_to_self {
input:
reference_fasta = call_consensus.refined_assembly_fasta,
reads_unmapped_bam = reads_unmapped_bam,
novocraft_license = novocraft_license,
skip_mark_dupes = skip_mark_dupes,
aligner = aligner,
aligner_options = align_to_self_options[aligner]
}
}
if(length(reads_unmapped_bams)>1) {
call read_utils.merge_and_reheader_bams as merge_align_to_self {
input:
in_bams = align_to_self.aligned_only_reads_bam,
sample_name = sample_name,
out_basename = "~{sample_name}.merge_align_to_self"
}
}
File aligned_self_bam = select_first([merge_align_to_self.out_bam, align_to_self.aligned_only_reads_bam[0]])
call intrahost.lofreq as isnvs_self {
input:
reference_fasta = call_consensus.refined_assembly_fasta,
aligned_bam = aligned_self_bam
}
call reports.plot_coverage as plot_self_coverage {
input:
aligned_reads_bam = aligned_self_bam,
sample_name = sample_name
}
output {
File assembly_fasta = call_consensus.refined_assembly_fasta
File align_to_ref_variants_vcf_gz = call_consensus.sites_vcf_gz
Int assembly_length = call_consensus.assembly_length
Int assembly_length_unambiguous = call_consensus.assembly_length_unambiguous
Int reference_genome_length = align_to_ref.reference_length[0]
Float assembly_mean_coverage = plot_ref_coverage.mean_coverage
Int dist_to_ref_snps = call_consensus.dist_to_ref_snps
Int dist_to_ref_indels = call_consensus.dist_to_ref_indels
Array[Int] primer_trimmed_read_count = ivar_trim.primer_trimmed_read_count
Array[Float] primer_trimmed_read_percent = ivar_trim.primer_trimmed_read_percent
Array[Map[String,String]] ivar_trim_stats = ivar_stats
Array[Array[String]] ivar_trim_stats_tsv = ivar_stats_row
Int replicate_concordant_sites = run_discordance.concordant_sites
Int replicate_discordant_snps = run_discordance.discordant_snps
Int replicate_discordant_indels = run_discordance.discordant_indels
Int num_read_groups = run_discordance.num_read_groups
Int num_libraries = run_discordance.num_libraries
File replicate_discordant_vcf = run_discordance.discordant_sites_vcf
Array[File] align_to_ref_per_input_aligned_flagstat = align_to_ref.aligned_bam_flagstat
Array[Int] align_to_ref_per_input_reads_provided = align_to_ref.reads_provided
Array[Int] align_to_ref_per_input_reads_aligned = align_to_ref.reads_aligned
File align_to_ref_merged_aligned_trimmed_only_bam = aligned_trimmed_bam
File align_to_ref_fastqc = select_first([merge_align_to_ref.fastqc, align_to_ref.aligned_only_reads_fastqc[0]])
File align_to_ref_merged_coverage_plot = plot_ref_coverage.coverage_plot
File align_to_ref_merged_coverage_tsv = plot_ref_coverage.coverage_tsv
Int align_to_ref_merged_reads_aligned = plot_ref_coverage.reads_aligned
Int align_to_ref_merged_read_pairs_aligned = plot_ref_coverage.read_pairs_aligned
Int align_to_ref_merged_bases_aligned = floor(plot_ref_coverage.bases_aligned)
File align_to_ref_isnvs_vcf = isnvs_ref.report_vcf
File picard_metrics_wgs = alignment_metrics.wgs_metrics
File picard_metrics_alignment = alignment_metrics.alignment_metrics
File picard_metrics_insert_size = alignment_metrics.insert_size_metrics
File samtools_ampliconstats = alignment_metrics.amplicon_stats
File samtools_ampliconstats_parsed = alignment_metrics.amplicon_stats_parsed
Array[File] align_to_self_merged_aligned_and_unaligned_bam = align_to_self.aligned_bam
File align_to_self_merged_aligned_only_bam = aligned_self_bam
File align_to_self_merged_coverage_plot = plot_self_coverage.coverage_plot
File align_to_self_merged_coverage_tsv = plot_self_coverage.coverage_tsv
Int align_to_self_merged_reads_aligned = plot_self_coverage.reads_aligned
Int align_to_self_merged_read_pairs_aligned = plot_self_coverage.read_pairs_aligned
Int align_to_self_merged_bases_aligned = floor(plot_self_coverage.bases_aligned)
Float align_to_self_merged_mean_coverage = plot_self_coverage.mean_coverage
File align_to_self_isnvs_vcf = isnvs_self.report_vcf
String assembly_method = "viral-ngs/assemble_refbased"
String align_to_ref_viral_core_version = align_to_ref.viralngs_version[0]
String ivar_version = ivar_trim.ivar_version[0]
String viral_assemble_version = call_consensus.viralngs_version
}
}
version 1.0
import "../tasks/tasks_assembly.wdl" as assembly
import "../tasks/tasks_intrahost.wdl" as intrahost
import "../tasks/tasks_reports.wdl" as reports
import "../tasks/tasks_read_utils.wdl" as read_utils
workflow assemble_refbased {
meta {
description: "Reference-based microbial consensus calling. Aligns NGS reads to a singular reference genome, calls a new consensus sequence, and emits: new assembly, reads aligned to provided reference, reads aligned to new assembly, various figures of merit, plots, and QC metrics. The user may provide unaligned reads spread across multiple input files and this workflow will parallelize alignment per input file before merging results prior to consensus calling."
author: "Broad Viral Genomics"
email: "viral-ngs@broadinstitute.org"
allowNestedInputs: true
# https://wdl-aid.readthedocs.io/en/latest/usage.html -- can exclude certain inputs from documentation; attempt to do this on with docker nominated variables
}
parameter_meta {
sample_name: {
description: "Base name of output files. The 'SM' field in BAM read group headers are also rewritten to this value. Avoid spaces and other filename-unfriendly characters.",
category: "common"
}
reads_unmapped_bams: {
description: "Unaligned reads in BAM format",
patterns: ["*.bam"]
}
reference_fasta: {
description: "Reference genome to align reads to.",
patterns: ["*.fasta"]
}
aligner: {
description: "Read aligner software to use. Options: novoalign, bwa, minimap2. Minimap2 can automatically handle Illumina, PacBio, or Oxford Nanopore reads as long as the 'PL' field in the BAM read group header is set properly (novoalign and bwa are Illumina-only)."
}
skip_mark_dupes: {
description: "skip Picard MarkDuplicates step after alignment. This is recommended to be set to true for PCR amplicon based data. (Default: false)"
}
trim_coords_bed: {
description: "optional primers to trim in reference coordinate space (0-based BED format)",
patterns: ["*.bed"],
category: "common"
}
min_coverage: {
description: "Minimum read coverage required to call a position unambiguous."
}
major_cutoff: {
description: "If the major allele is present at a frequency higher than this cutoff, we will call an unambiguous base at that position. If it is equal to or below this cutoff, we will call an ambiguous base representing all possible alleles at that position."
}
assembly_fasta: {
description: "The new assembly / consensus sequence for this sample"
}
align_to_ref_variants_vcf_gz: {
description: "All variants in the input reads against the original reference genome. This VCF file is used to create the assembly_fasta"
}
assembly_length: {
description: "The length of the sequence described in assembly_fasta, inclusive of any uncovered regions denoted by Ns"
}
assembly_length_unambiguous: {
description: "The number of called consensus bases in assembly_fasta (excludes regions of the genome that lack read coverage)"
}
}
input {
Array[File]+ reads_unmapped_bams
File reference_fasta
String sample_name = basename(reads_unmapped_bams[0], '.bam')
String? sample_original_name
String aligner="minimap2"
File? novocraft_license
Int min_coverage=3
Float major_cutoff=0.75
Boolean skip_mark_dupes=false
File? trim_coords_bed
Map[String,String] align_to_ref_options = {
"novoalign": "-r Random -l 40 -g 40 -x 20 -t 501 -k",
"bwa": "-k 12 -B 1",
"minimap2": ""
}
Map[String,String] align_to_self_options = {
"novoalign": "-r Random -l 40 -g 40 -x 20 -t 100",
"bwa": "",
"minimap2": ""
}
}
scatter(reads_unmapped_bam in reads_unmapped_bams) {
call assembly.align_reads as align_to_ref {
input:
reference_fasta = reference_fasta,
reads_unmapped_bam = reads_unmapped_bam,
novocraft_license = novocraft_license,
skip_mark_dupes = skip_mark_dupes,
aligner = aligner,
aligner_options = align_to_ref_options[aligner]
}
call assembly.ivar_trim {
input:
aligned_bam = align_to_ref.aligned_only_reads_bam,
trim_coords_bed = trim_coords_bed
}
Map[String,String] ivar_stats = {
'file': basename(reads_unmapped_bam, '.bam'),
'trim_percent': ivar_trim.primer_trimmed_read_percent,
'trim_count': ivar_trim.primer_trimmed_read_count
}
Array[String] ivar_stats_row = [ivar_stats['file'], ivar_stats['trim_percent'], ivar_stats['trim_count']]
}
if(length(reads_unmapped_bams)>1) {
call read_utils.merge_and_reheader_bams as merge_align_to_ref {
input:
in_bams = ivar_trim.aligned_trimmed_bam,
sample_name = sample_name,
out_basename = "~{sample_name}.align_to_ref.trimmed"
}
}
File aligned_trimmed_bam = select_first([merge_align_to_ref.out_bam, ivar_trim.aligned_trimmed_bam[0]])
call intrahost.lofreq as isnvs_ref {
input:
reference_fasta = reference_fasta,
aligned_bam = aligned_trimmed_bam
}
call reports.alignment_metrics {
input:
aligned_bam = aligned_trimmed_bam,
ref_fasta = reference_fasta,
primers_bed = trim_coords_bed,
min_coverage = min_coverage
}
call assembly.run_discordance {
input:
reads_aligned_bam = aligned_trimmed_bam,
reference_fasta = reference_fasta,
min_coverage = min_coverage+1,
out_basename = sample_name
}
call reports.plot_coverage as plot_ref_coverage {
input:
aligned_reads_bam = aligned_trimmed_bam,
sample_name = sample_name
}
call assembly.refine_assembly_with_aligned_reads as call_consensus {
input:
reference_fasta = reference_fasta,
reads_aligned_bam = aligned_trimmed_bam,
min_coverage = min_coverage,
major_cutoff = major_cutoff,
out_basename = sample_name,
sample_name = select_first([sample_original_name, sample_name])
}
scatter(reads_unmapped_bam in reads_unmapped_bams) {
call assembly.align_reads as align_to_self {
input:
reference_fasta = call_consensus.refined_assembly_fasta,
reads_unmapped_bam = reads_unmapped_bam,
novocraft_license = novocraft_license,
skip_mark_dupes = skip_mark_dupes,
aligner = aligner,
aligner_options = align_to_self_options[aligner]
}
}
if(length(reads_unmapped_bams)>1) {
call read_utils.merge_and_reheader_bams as merge_align_to_self {
input:
in_bams = align_to_self.aligned_only_reads_bam,
sample_name = sample_name,
out_basename = "~{sample_name}.merge_align_to_self"
}
}
File aligned_self_bam = select_first([merge_align_to_self.out_bam, align_to_self.aligned_only_reads_bam[0]])
call intrahost.lofreq as isnvs_self {
input:
reference_fasta = call_consensus.refined_assembly_fasta,
aligned_bam = aligned_self_bam
}
call reports.plot_coverage as plot_self_coverage {
input:
aligned_reads_bam = aligned_self_bam,
sample_name = sample_name
}
output {
File assembly_fasta = call_consensus.refined_assembly_fasta
File align_to_ref_variants_vcf_gz = call_consensus.sites_vcf_gz
Int assembly_length = call_consensus.assembly_length
Int assembly_length_unambiguous = call_consensus.assembly_length_unambiguous
Int reference_genome_length = align_to_ref.reference_length[0]
Float assembly_mean_coverage = plot_ref_coverage.mean_coverage
Int dist_to_ref_snps = call_consensus.dist_to_ref_snps
Int dist_to_ref_indels = call_consensus.dist_to_ref_indels
Array[Int] primer_trimmed_read_count = ivar_trim.primer_trimmed_read_count
Array[Float] primer_trimmed_read_percent = ivar_trim.primer_trimmed_read_percent
Array[Map[String,String]] ivar_trim_stats = ivar_stats
Array[Array[String]] ivar_trim_stats_tsv = ivar_stats_row
Int replicate_concordant_sites = run_discordance.concordant_sites
Int replicate_discordant_snps = run_discordance.discordant_snps
Int replicate_discordant_indels = run_discordance.discordant_indels
Int num_read_groups = run_discordance.num_read_groups
Int num_libraries = run_discordance.num_libraries
File replicate_discordant_vcf = run_discordance.discordant_sites_vcf
Array[File] align_to_ref_per_input_aligned_flagstat = align_to_ref.aligned_bam_flagstat
Array[Int] align_to_ref_per_input_reads_provided = align_to_ref.reads_provided
Array[Int] align_to_ref_per_input_reads_aligned = align_to_ref.reads_aligned
File align_to_ref_merged_aligned_trimmed_only_bam = aligned_trimmed_bam
File align_to_ref_fastqc = select_first([merge_align_to_ref.fastqc, align_to_ref.aligned_only_reads_fastqc[0]])
File align_to_ref_merged_coverage_plot = plot_ref_coverage.coverage_plot
File align_to_ref_merged_coverage_tsv = plot_ref_coverage.coverage_tsv
Int align_to_ref_merged_reads_aligned = plot_ref_coverage.reads_aligned
Int align_to_ref_merged_read_pairs_aligned = plot_ref_coverage.read_pairs_aligned
Int align_to_ref_merged_bases_aligned = floor(plot_ref_coverage.bases_aligned)
File align_to_ref_isnvs_vcf = isnvs_ref.report_vcf
File picard_metrics_wgs = alignment_metrics.wgs_metrics
File picard_metrics_alignment = alignment_metrics.alignment_metrics
File picard_metrics_insert_size = alignment_metrics.insert_size_metrics
File samtools_ampliconstats = alignment_metrics.amplicon_stats
File samtools_ampliconstats_parsed = alignment_metrics.amplicon_stats_parsed
Array[File] align_to_self_merged_aligned_and_unaligned_bam = align_to_self.aligned_bam
File align_to_self_merged_aligned_only_bam = aligned_self_bam
File align_to_self_merged_coverage_plot = plot_self_coverage.coverage_plot
File align_to_self_merged_coverage_tsv = plot_self_coverage.coverage_tsv
Int align_to_self_merged_reads_aligned = plot_self_coverage.reads_aligned
Int align_to_self_merged_read_pairs_aligned = plot_self_coverage.read_pairs_aligned
Int align_to_self_merged_bases_aligned = floor(plot_self_coverage.bases_aligned)
Float align_to_self_merged_mean_coverage = plot_self_coverage.mean_coverage
File align_to_self_isnvs_vcf = isnvs_self.report_vcf
String assembly_method = "viral-ngs/assemble_refbased"
String align_to_ref_viral_core_version = align_to_ref.viralngs_version[0]
String ivar_version = ivar_trim.ivar_version[0]
String viral_assemble_version = call_consensus.viralngs_version
}
}