
WORKFLOW classify_multi

File Path pipes/WDL/workflows/classify_multi.wdl
WDL Version 1.0
Type workflow

Imports

Namespace Path
metagenomics ../tasks/tasks_metagenomics.wdl
read_utils ../tasks/tasks_read_utils.wdl
taxon_filter ../tasks/tasks_taxon_filter.wdl
assembly ../tasks/tasks_assembly.wdl
reports ../tasks/tasks_reports.wdl

Workflow: classify_multi

Runs raw reads through taxonomic classification (Kraken2), Kraken2-based human read depletion, de novo assembly (SPAdes), and FastQC/MultiQC read quality reporting.

Author: Broad Viral Genomics
viral-ngs@broadinstitute.org

Inputs

Name Type Description Default
reads_bams Array[File]+ Reads to classify. May be unmapped or mapped or both, paired-end or single-end. -
ncbi_taxdump_tgz File An NCBI taxdump.tar.gz file that contains, at the minimum, a nodes.dmp and names.dmp file. -
spikein_db File ERCC spike-in sequences -
trim_clip_db File Adapter sequences to remove via trimmomatic prior to SPAdes assembly -
kraken2_db_tgz File Pre-built Kraken database tarball containing three files: hash.k2d, opts.k2d, and taxo.k2d. -
krona_taxonomy_db_kraken2_tgz File Krona taxonomy database containing a single file: taxonomy.tab, or possibly just a compressed taxonomy.tab -
machine_mem_gb Int? - -
min_base_qual Int? - -
taxonomic_ids Array[Int]? - -
minimum_hit_groups Int? - -
spades_min_contig_len Int? - -
spades_options String? - -
title String? - -
comment String? - -
template String? - -
tag String? - -
ignore_analysis_files String? - -
ignore_sample_names String? - -
sample_names File? - -
exclude_modules Array[String]? - -
module_to_use Array[String]? - -
output_data_format String? - -
config File? - -
config_yaml String? - -
query_column Int? - -
taxid_column Int? - -
score_column Int? - -
magnitude_column Int? - -

72 optional inputs with default values in total; input names that reach multiple task calls (e.g. the MultiQC options, which are exposed once per MultiQC call) are listed once above.
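The required inputs above are typically supplied as a Cromwell/miniwdl-style inputs JSON keyed by `classify_multi.<input name>`. A minimal sketch that writes such a file, assuming hypothetical local file paths (substitute your own databases and BAMs):

```python
import json

# Hypothetical paths -- substitute your own files. Only the six required
# inputs are shown; the workflow's optional inputs fall back to their defaults.
inputs = {
    "classify_multi.reads_bams": ["sample1.bam", "sample2.bam"],
    "classify_multi.ncbi_taxdump_tgz": "taxdump.tar.gz",
    "classify_multi.spikein_db": "ercc_spikeins.fasta",
    "classify_multi.trim_clip_db": "adapters.fasta",
    "classify_multi.kraken2_db_tgz": "kraken2_db.tar.zst",
    "classify_multi.krona_taxonomy_db_kraken2_tgz": "krona_taxonomy.tab.zst",
}

# Write a Cromwell/miniwdl-compatible inputs file.
with open("classify_multi.inputs.json", "w") as f:
    json.dump(inputs, f, indent=2)
```

The resulting file can then be passed to your WDL runner of choice (e.g. `-i classify_multi.inputs.json` for Cromwell).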

Outputs

Name Type Expression
cleaned_reads_unaligned_bams Array[File] deplete.bam_filtered_to_taxa
deduplicated_reads_unaligned Array[File] rmdup_ubam.dedup_bam
contigs_fastas Array[File] spades.contigs_fasta
read_counts_raw Array[Int] deplete.classified_taxonomic_filter_read_count_pre
read_counts_depleted Array[Int] deplete.classified_taxonomic_filter_read_count_post
read_counts_dedup Array[Int] rmdup_ubam.dedup_read_count_post
read_counts_prespades_subsample Array[Int] spades.subsample_read_count
multiqc_report_raw File multiqc_raw.multiqc_report
multiqc_report_cleaned File multiqc_cleaned.multiqc_report
multiqc_report_dedup File multiqc_dedup.multiqc_report
spikein_counts File spike_summary.count_summary
kraken2_merged_krona File krona_merge_kraken2.krona_report_html
kraken2_summary File metag_summary_report.krakenuniq_aggregate_taxlevel_summary
kraken2_summary_reports Array[File] kraken2.kraken2_summary_report
kraken2_krona_by_sample Array[File] kraken2.krona_report_html
kraken2_viral_classify_version String kraken2.viralngs_version[0]
deplete_viral_classify_version String deplete.viralngs_version[0]
spades_viral_assemble_version String spades.viralngs_version[0]
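The paired read-count outputs lend themselves to simple per-sample QC arithmetic: the fraction of reads removed by Vertebrata depletion is `1 - post/pre`. A small sketch with made-up counts standing in for `read_counts_raw` and `read_counts_depleted`:

```python
# Made-up per-sample counts standing in for the workflow outputs
# read_counts_raw (pre-depletion) and read_counts_depleted (post-depletion).
read_counts_raw = [1_000_000, 800_000, 500_000]
read_counts_depleted = [120_000, 640_000, 500_000]

def depletion_fraction(pre, post):
    """Fraction of reads removed by taxonomic depletion (0 when pre == 0)."""
    return 0.0 if pre == 0 else 1.0 - post / pre

fractions = [depletion_fraction(pre, post)
             for pre, post in zip(read_counts_raw, read_counts_depleted)]
for frac in fractions:
    print(f"{frac:.1%}")  # 88.0%, 20.0%, 0.0% for the three samples above
```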

Calls

This workflow calls the following tasks or subworkflows:

CALL TASKS fastqc_raw → fastqc

Input Mappings (1)
Input Value
reads_bam raw_reads

CALL TASKS spikein → align_and_count

Input Mappings (2)
Input Value
reads_bam raw_reads
ref_db spikein_db

CALL TASKS kraken2

Input Mappings (3)
Input Value
reads_bam raw_reads
kraken2_db_tgz kraken2_db_tgz
krona_taxonomy_db_tgz krona_taxonomy_db_kraken2_tgz

CALL TASKS deplete → filter_bam_to_taxa

Input Mappings (6)
Input Value
classified_bam raw_reads
classified_reads_txt_gz kraken2.kraken2_reads_report
ncbi_taxonomy_db_tgz ncbi_taxdump_tgz
exclude_taxa true
taxonomic_names ["Vertebrata"]
out_filename_suffix "hs_depleted"

CALL TASKS fastqc_cleaned → fastqc

Input Mappings (1)
Input Value
reads_bam deplete.bam_filtered_to_taxa

CALL TASKS filter_acellular → filter_bam_to_taxa

Input Mappings (6)
Input Value
classified_bam raw_reads
classified_reads_txt_gz kraken2.kraken2_reads_report
ncbi_taxonomy_db_tgz ncbi_taxdump_tgz
exclude_taxa true
taxonomic_names ["Vertebrata", "other sequences", "Bacteria"]
out_filename_suffix "acellular"

CALL TASKS rmdup_ubam

Input Mappings (1)
Input Value
reads_unmapped_bam clean_reads

CALL TASKS spades → assemble

Input Mappings (3)
Input Value
reads_unmapped_bam rmdup_ubam.dedup_bam
trim_clip_db trim_clip_db
always_succeed true

CALL TASKS multiqc_raw → MultiQC

Input Mappings (2)
Input Value
input_files fastqc_raw.fastqc_zip
file_name "multiqc-raw.html"

CALL TASKS multiqc_cleaned → MultiQC

Input Mappings (2)
Input Value
input_files fastqc_cleaned.fastqc_zip
file_name "multiqc-cleaned.html"

CALL TASKS multiqc_dedup → MultiQC

Input Mappings (2)
Input Value
input_files rmdup_ubam.dedup_fastqc_zip
file_name "multiqc-dedup.html"

CALL TASKS spike_summary → align_and_count_summary

Input Mappings (1)
Input Value
counts_txt spikein.report

CALL TASKS metag_summary_report → aggregate_metagenomics_reports

Input Mappings (1)
Input Value
kraken_summary_reports kraken2.kraken2_summary_report

CALL TASKS krona_merge_kraken2 → krona

Input Mappings (4)
Input Value
reports_txt_gz kraken2.kraken2_summary_report
krona_taxonomy_db_tgz krona_taxonomy_db_kraken2_tgz
input_type "kraken2"
out_basename "merged-kraken2.krona"

Images

Container images used by tasks in this workflow:

All four images are supplied at runtime via each task's `docker` input (rendered as ~{docker} in the WDL) rather than pinned in the source:

🐳 Parameterized image (configured via input: docker), used by 8 tasks:
  • multiqc_raw
  • multiqc_cleaned
  • multiqc_dedup
  • spike_summary
  • metag_summary_report
  • fastqc_raw
  • spikein
  • fastqc_cleaned

🐳 Parameterized image (configured via input: docker), used by 4 tasks:
  • krona_merge_kraken2
  • kraken2
  • deplete
  • filter_acellular

🐳 Parameterized image (configured via input: docker), used by 1 task:
  • rmdup_ubam

🐳 Parameterized image (configured via input: docker), used by 1 task:
  • spades

classify_multi - Workflow Graph


classify_multi - WDL Source Code

version 1.0

import "../tasks/tasks_metagenomics.wdl" as metagenomics
import "../tasks/tasks_read_utils.wdl" as read_utils
import "../tasks/tasks_taxon_filter.wdl" as taxon_filter
import "../tasks/tasks_assembly.wdl" as assembly
import "../tasks/tasks_reports.wdl" as reports

workflow classify_multi {
    meta {
         description: "Runs raw reads through taxonomic classification (Kraken2), human read depletion (based on Kraken2), de novo assembly (SPAdes), and FASTQC/multiQC of reads."
         author: "Broad Viral Genomics"
         email:  "viral-ngs@broadinstitute.org"
    }

    input {
        Array[File]+ reads_bams

        File ncbi_taxdump_tgz

        File spikein_db
        File trim_clip_db

        File kraken2_db_tgz
        File krona_taxonomy_db_kraken2_tgz
    }

    parameter_meta {
        reads_bams: {
          description: "Reads to classify. May be unmapped or mapped or both, paired-end or single-end.",
          patterns: ["*.bam"]
        }
        spikein_db: {
          description: "ERCC spike-in sequences",
          patterns: ["*.fasta", "*.fasta.gz", "*.fasta.zst"]
        }
        trim_clip_db: {
          description: "Adapter sequences to remove via trimmomatic prior to SPAdes assembly",
          patterns: ["*.fasta", "*.fasta.gz", "*.fasta.zst"]
        }
        kraken2_db_tgz: {
          description: "Pre-built Kraken database tarball containing three files: hash.k2d, opts.k2d, and taxo.k2d.",
          patterns: ["*.tar.gz", "*.tar.lz4", "*.tar.bz2", "*.tar.zst"]
        }
        krona_taxonomy_db_kraken2_tgz: {
          description: "Krona taxonomy database containing a single file: taxonomy.tab, or possibly just a compressed taxonomy.tab",
          patterns: ["*.tab.zst", "*.tab.gz", "*.tab", "*.tar.gz", "*.tar.lz4", "*.tar.bz2", "*.tar.zst"]
        }
        ncbi_taxdump_tgz: {
          description: "An NCBI taxdump.tar.gz file that contains, at the minimum, a nodes.dmp and names.dmp file.",
          patterns: ["*.tar.gz", "*.tar.lz4", "*.tar.bz2", "*.tar.zst"]
        }
    }

    scatter(raw_reads in reads_bams) {
        call reports.fastqc as fastqc_raw {
            input: reads_bam = raw_reads
        }
        call reports.align_and_count as spikein {
            input:
                reads_bam = raw_reads,
                ref_db = spikein_db
        }
    }

    scatter(raw_reads in reads_bams) {
        # separate scatter blocks speed up the gathers in DNAnexus and provide independent failure blocks
        call metagenomics.kraken2 as kraken2 {
            input:
                reads_bam             = raw_reads,
                kraken2_db_tgz        = kraken2_db_tgz,
                krona_taxonomy_db_tgz = krona_taxonomy_db_kraken2_tgz
        }
        call metagenomics.filter_bam_to_taxa as deplete {
            input:
                classified_bam          = raw_reads,
                classified_reads_txt_gz = kraken2.kraken2_reads_report,
                ncbi_taxonomy_db_tgz    = ncbi_taxdump_tgz,
                exclude_taxa            = true,
                taxonomic_names         = ["Vertebrata"],
                out_filename_suffix     = "hs_depleted"
        }
        call reports.fastqc as fastqc_cleaned {
            input: reads_bam = deplete.bam_filtered_to_taxa
        }
        call metagenomics.filter_bam_to_taxa as filter_acellular {
            input:
                classified_bam          = raw_reads,
                classified_reads_txt_gz = kraken2.kraken2_reads_report,
                ncbi_taxonomy_db_tgz    = ncbi_taxdump_tgz,
                exclude_taxa            = true,
                taxonomic_names         = ["Vertebrata", "other sequences", "Bacteria"],
                out_filename_suffix     = "acellular"
        }
    }

    scatter(clean_reads in filter_acellular.bam_filtered_to_taxa) {
        call read_utils.rmdup_ubam {
           input:
                reads_unmapped_bam = clean_reads
        }
        call assembly.assemble as spades {
            input:
                reads_unmapped_bam = rmdup_ubam.dedup_bam,
                trim_clip_db       = trim_clip_db,
                always_succeed     = true
        }
    }

    call reports.MultiQC as multiqc_raw {
        input:
            input_files = fastqc_raw.fastqc_zip,
            file_name   = "multiqc-raw.html"
    }

    call reports.MultiQC as multiqc_cleaned {
        input:
            input_files = fastqc_cleaned.fastqc_zip,
            file_name   = "multiqc-cleaned.html"
    }

    call reports.MultiQC as multiqc_dedup {
        input:
            input_files = rmdup_ubam.dedup_fastqc_zip,
            file_name   = "multiqc-dedup.html"
    }

    call reports.align_and_count_summary as spike_summary {
        input:
            counts_txt = spikein.report
    }

    call reports.aggregate_metagenomics_reports as metag_summary_report {
        input:
            kraken_summary_reports = kraken2.kraken2_summary_report
    }

    call metagenomics.krona as krona_merge_kraken2 {
        input:
            reports_txt_gz        = kraken2.kraken2_summary_report,
            krona_taxonomy_db_tgz = krona_taxonomy_db_kraken2_tgz,
            input_type            = "kraken2",
            out_basename          = "merged-kraken2.krona"
    }

    output {
        Array[File] cleaned_reads_unaligned_bams    = deplete.bam_filtered_to_taxa
        Array[File] deduplicated_reads_unaligned    = rmdup_ubam.dedup_bam
        Array[File] contigs_fastas                  = spades.contigs_fasta
        
        Array[Int]  read_counts_raw                 = deplete.classified_taxonomic_filter_read_count_pre
        Array[Int]  read_counts_depleted            = deplete.classified_taxonomic_filter_read_count_post
        Array[Int]  read_counts_dedup               = rmdup_ubam.dedup_read_count_post
        Array[Int]  read_counts_prespades_subsample = spades.subsample_read_count
        
        File        multiqc_report_raw              = multiqc_raw.multiqc_report
        File        multiqc_report_cleaned          = multiqc_cleaned.multiqc_report
        File        multiqc_report_dedup            = multiqc_dedup.multiqc_report
        File        spikein_counts                  = spike_summary.count_summary
        File        kraken2_merged_krona            = krona_merge_kraken2.krona_report_html
        File        kraken2_summary                 = metag_summary_report.krakenuniq_aggregate_taxlevel_summary
        
        Array[File] kraken2_summary_reports         = kraken2.kraken2_summary_report
        Array[File] kraken2_krona_by_sample         = kraken2.krona_report_html
        
        String      kraken2_viral_classify_version  = kraken2.viralngs_version[0]
        String      deplete_viral_classify_version  = deplete.viralngs_version[0]
        String      spades_viral_assemble_version   = spades.viralngs_version[0]
    }
}