WDL Atlas - Index

Overview

105 Internal WDL Files

Workflows

174 Tasks

Workflows

Main workflow files that orchestrate tasks and subworkflows.

WORKFLOW align_and_count_multiple_report - pipes/WDL/workflows/align_and_count_multiple_report.wdl
Count the number of times reads map to provided reference sequences. Useful for counting spike-ins, etc.
WORKFLOW align_and_count_report - pipes/WDL/workflows/align_and_count.wdl
Align reads to reference with minimap2 and count the number of hits. Results are returned in the format of 'samtools idxstats'.
WORKFLOW align_and_plot - pipes/WDL/workflows/align_and_plot.wdl
Align reads to reference and produce coverage plots and statistics.
WORKFLOW amplicon16S_analysis - pipes/WDL/workflows/classify_qiime2_multi.wdl
Run 16S amplicon sequencing analysis (from BAM input) with QIIME.
WORKFLOW assemble_denovo - pipes/WDL/workflows/assemble_denovo.wdl
Assisted de novo viral genome assembly from raw reads.
WORKFLOW assemble_denovo_metagenomic - pipes/WDL/workflows/assemble_denovo_metagenomic.wdl
Performs viral de novo assembly on metagenomic reads against a large range of possible reference genomes. Runs raw reads through taxonomic classification (Kraken2), human read depletion (based on Kraken2), de novo assembly (SPAdes), and FASTQC/multiQC of reads. Scaffold de novo contigs against a set of possible references and subsequently polish with reads. This workflow accepts a very large set of input reference genomes. It will subset the reference genomes to those with ANI hits to the provided contigs/MAGs and cluster the reference hits by any ANI similarity to each other. It will choose the top reference from each cluster and produce one assembly for each cluster. This is intended to allow for the presence of multiple diverse viral taxa (coinfections) while forcing a choice of the best assembly from groups of related reference genomes.
WORKFLOW assemble_refbased - pipes/WDL/workflows/assemble_refbased.wdl (used by 7 workflows)
Reference-based microbial consensus calling. Aligns NGS reads to a singular reference genome, calls a new consensus sequence, and emits: new assembly, reads aligned to provided reference, reads aligned to new assembly, various figures of merit, plots, and QC metrics. The user may provide unaligned reads spread across multiple input files and this workflow will parallelize alignment per input file before merging results prior to consensus calling.
WORKFLOW augur_export_only - pipes/WDL/workflows/augur_export_only.wdl
Convert a Newick-formatted phylogenetic tree, together with other config settings and node values, into a json file suitable for Auspice visualization. See https://nextstrain-augur.readthedocs.io/en/stable/usage/cli/export.html
WORKFLOW augur_from_assemblies - pipes/WDL/workflows/augur_from_assemblies.wdl
Align assemblies, build trees, and convert to json representation suitable for Nextstrain visualization. See https://nextstrain.org/docs/getting-started/ and https://nextstrain-augur.readthedocs.io/en/stable/
WORKFLOW augur_from_beast_mcc - pipes/WDL/workflows/augur_from_beast_mcc.wdl
Visualize BEAST output with Nextstrain. This workflow converts a BEAST MCC tree (.tree file) into an Auspice v2 json file. See https://nextstrain-augur.readthedocs.io/en/stable/faq/import-beast.html for details.
WORKFLOW augur_from_mltree - pipes/WDL/workflows/augur_from_mltree.wdl
Take a premade maximum likelihood tree (Newick format), run the remainder of the augur pipeline (timetree modifications, ancestral inference, etc.), and convert to json representation suitable for Nextstrain visualization. See https://nextstrain.org/docs/getting-started/ and https://nextstrain-augur.readthedocs.io/en/stable/
WORKFLOW augur_from_msa - pipes/WDL/workflows/augur_from_msa.wdl
Build trees, and convert to json representation suitable for Nextstrain visualization. See https://nextstrain.org/docs/getting-started/ and https://nextstrain-augur.readthedocs.io/en/stable/
WORKFLOW augur_from_msa_with_subsampler - pipes/WDL/workflows/augur_from_msa_with_subsampler.wdl
Build trees, and convert to json representation suitable for Nextstrain visualization. See https://nextstrain.org/docs/getting-started/ and https://nextstrain-augur.readthedocs.io/en/stable/
WORKFLOW bams_multiqc - pipes/WDL/workflows/bams_multiqc.wdl
Run FastQC on a set of BAM files, and then MultiQC to summarize all outputs.
WORKFLOW beast_gpu - pipes/WDL/workflows/beast_gpu.wdl
Runs BEAST (v1) on a GPU instance. Use with care--this can be expensive if run incorrectly.
WORKFLOW calc_bam_read_depths - pipes/WDL/workflows/calc_bam_read_depths.wdl
Generates read depth tables.
WORKFLOW chunk_megablast - pipes/WDL/workflows/megablast_chunk.wdl
Chunk megablast function
WORKFLOW classify_kaiju - pipes/WDL/workflows/classify_kaiju.wdl
Taxonomic classification of reads with kaiju.
WORKFLOW classify_kraken2 - pipes/WDL/workflows/classify_kraken2.wdl
Taxonomic classification of sequences via kraken2 (or kraken2x, depending on the database provided).
WORKFLOW classify_krakenuniq - pipes/WDL/workflows/classify_krakenuniq.wdl
Taxonomic classification of reads using krakenuniq v1.
WORKFLOW classify_multi - pipes/WDL/workflows/classify_multi.wdl
Runs raw reads through taxonomic classification (Kraken2), human read depletion (based on Kraken2), de novo assembly (SPAdes), and FASTQC/multiQC of reads.
WORKFLOW classify_single - pipes/WDL/workflows/classify_single.wdl
Runs raw reads through taxonomic classification (Kraken2), human read depletion (based on Kraken2), de novo assembly (SPAdes), and FASTQC/multiQC of reads.
WORKFLOW coverage_table - pipes/WDL/workflows/coverage_table.wdl
WORKFLOW demux_deplete - pipes/WDL/workflows/demux_deplete.wdl (used by 1 workflow)
Picard-based demultiplexing and basecalling from a tarball of a raw BCL directory, followed by QC metrics, depletion, and SRA submission prep.
WORKFLOW demux_metadata_only - pipes/WDL/workflows/demux_metadata_only.wdl
Picard-based demultiplexing and basecalling from a tarball of a raw BCL directory, followed by QC metrics, depletion, and SRA submission prep.
WORKFLOW demux_only - pipes/WDL/workflows/demux_only.wdl
Picard-based demultiplexing and basecalling from a tarball of a raw BCL directory.
WORKFLOW demux_plus - pipes/WDL/workflows/demux_plus.wdl
Picard-based demultiplexing and basecalling from a tarball of a raw BCL directory, followed by basic metagenomics and QC metrics. Intended for automatic triggering post upload on DNAnexus.
WORKFLOW deplete_only - pipes/WDL/workflows/deplete_only.wdl
Taxonomic depletion of reads matching unwanted taxa (such as human).
WORKFLOW detect_cross_contamination - pipes/WDL/workflows/detect_cross_contamination.wdl
Detect cross-contamination between samples using consensus-level and sub-consensus variation.
WORKFLOW detect_cross_contamination_precalled_vcfs - pipes/WDL/workflows/detect_cross_contamination_precalled_vcfs.wdl
Detect cross-contamination between samples using consensus-level and sub-consensus variation, from consensus genomes and pre-called LoFreq vcf files.
WORKFLOW diff_genome_sets - pipes/WDL/workflows/diff_genome_sets.wdl
WORKFLOW downsample - pipes/WDL/workflows/downsample.wdl
Random subsampling of reads.
WORKFLOW dump_gcloud_env_info - pipes/WDL/workflows/dump_gcloud_env_info.wdl
Write system and gcloud environment info to output files.
WORKFLOW fastq_to_ubam - pipes/WDL/workflows/fastq_to_ubam.wdl
Convert reads from fastq format (single or paired) to unaligned BAM format.
WORKFLOW fetch_annotations - pipes/WDL/workflows/fetch_annotations.wdl
WORKFLOW fetch_sra_to_bam - pipes/WDL/workflows/fetch_sra_to_bam.wdl
Retrieve reads from the NCBI Sequence Read Archive (SRA) in unaligned BAM format with relevant metadata encoded.
WORKFLOW filter_classified_bam_to_taxa - pipes/WDL/workflows/filter_classified_bam_to_taxa.wdl
Taxonomic filtration of reads utilizing output from a classifier such as kraken1/2/uniq. Can filter out or filter to a specified taxonomic grouping.
WORKFLOW filter_sequences - pipes/WDL/workflows/filter_sequences.wdl
Filter and subsample a sequence set. See https://nextstrain-augur.readthedocs.io/en/stable/usage/cli/filter.html
WORKFLOW genbank_gather - pipes/WDL/workflows/genbank_gather.wdl
Consolidate all genbank submission files for individual genomes into bulk submission packages grouped by submission pathway.
WORKFLOW genbank_single - pipes/WDL/workflows/genbank_single.wdl
Prepare assemblies for Genbank submission. This includes annotation by simple coordinate transfer from Genbank annotations and a multiple alignment. See https://viral-pipelines.readthedocs.io/en/latest/ncbi_submission.html for details.
WORKFLOW isnvs_lofreq - pipes/WDL/workflows/isnvs_lofreq.wdl
Variant calls by LoFreq against reference_fasta.
WORKFLOW isnvs_merge_to_vcf - pipes/WDL/workflows/isnvs_merge_to_vcf.wdl
WORKFLOW isnvs_one_sample - pipes/WDL/workflows/isnvs_one_sample.wdl
Intrahost variant calling with V-Phaser2. Requires an assembled genome and a BAM of aligned reads against that same genome.
WORKFLOW kraken2_build - pipes/WDL/workflows/kraken2_build.wdl
Build a Kraken2 (or 2X) database.
WORKFLOW mafft - pipes/WDL/workflows/mafft.wdl
MAFFT multiple-alignment for a set of possibly multi-segment genomes.
WORKFLOW mafft_and_snp - pipes/WDL/workflows/mafft_and_snp.wdl
Align assemblies with mafft and find SNPs with snp-sites.
WORKFLOW mafft_and_snp_annotated - pipes/WDL/workflows/mafft_and_snp_annotated.wdl
Align assemblies with mafft and find SNPs with snp-sites.
WORKFLOW mafft_and_trim - pipes/WDL/workflows/mafft_and_trim.wdl
MAFFT based multiple alignment followed by trimal-based edge trimming.
WORKFLOW megablast - pipes/WDL/workflows/blastoff.wdl
WORKFLOW merge_bams - pipes/WDL/workflows/merge_bams.wdl
Merge, reheader, or merge-and-reheader BAM files.
WORKFLOW merge_metagenomics - pipes/WDL/workflows/merge_metagenomics.wdl
Combine metagenomic reports from single samples into an aggregate report.
WORKFLOW merge_tar_chunks - pipes/WDL/workflows/merge_tar_chunks.wdl
Combine multiple tar files (possibly compressed by gzip, bz2, lz4, zstd, etc) into a single tar file. Originally meant for combining streaming upload chunks from a sequencing run.
WORKFLOW merge_vcfs - pipes/WDL/workflows/merge_vcfs.wdl
Merge VCFs from multiple samples using GATK3.
WORKFLOW merge_vcfs_and_annotate - pipes/WDL/workflows/merge_vcfs_and_annotate.wdl
Merge VCFs emitted by GATK UnifiedGenotyper and annotate with snpEff.
WORKFLOW metagenomic_denovo - pipes/WDL/workflows/metagenomic_denovo.wdl
Assisted de novo viral genome assembly (SPAdes, scaffolding, and polishing) from metagenomic raw reads. Runs raw reads through taxonomic classification (Kraken2), human read depletion (based on Kraken2 and optionally using BWA, BLASTN, and/or BMTAGGER databases), and FASTQC/multiQC of reads.
WORKFLOW multiqc_only - pipes/WDL/workflows/multiqc_only.wdl
Combine multiple FastQC reports into a single MultiQC summary.
WORKFLOW nextclade_single - pipes/WDL/workflows/nextclade_single.wdl
Run Nextclade on a single genome
WORKFLOW populate_library_and_sample_tables_from_flowcell - pipes/WDL/workflows/populate_library_and_sample_tables_from_flowcell.wdl
Terra only: Populate per-library-lane and per-sample tables from existing demultiplexed flowcell output
WORKFLOW qiime_import_bam - pipes/WDL/workflows/bam_to_qiime.wdl
Importing BAM files into QIIME
WORKFLOW reconstruct_from_alignments - pipes/WDL/workflows/reconstruct_from_alignments.wdl
Infer disease transmission events from sequence (consensus + intrahost variation) data using the reconstructR tool
WORKFLOW sarscov2_batch_relineage - pipes/WDL/workflows/sarscov2_batch_relineage.wdl (used by 1 workflow)
Re-call Nextclade and Pangolin lineages on a flowcell's worth of SARS-CoV-2 genomes
WORKFLOW sarscov2_biosample_load - pipes/WDL/workflows/sarscov2_biosample_load.wdl (used by 1 workflow)
Load Broad CRSP metadata and register samples with NCBI BioSample. Return attributes table, id map, etc.
WORKFLOW sarscov2_data_release - pipes/WDL/workflows/sarscov2_data_release.wdl
Submit data bundles to databases and repositories
WORKFLOW sarscov2_genbank - pipes/WDL/workflows/sarscov2_genbank.wdl
Prepare SARS-CoV-2 assemblies for Genbank submission. This includes QC checks with NCBI's VADR tool and filters out genomes that do not pass its tests.
WORKFLOW sarscov2_gisaid_ingest - pipes/WDL/workflows/sarscov2_gisaid_ingest.wdl
Sanitize data downloaded from GISAID for use in Nextstrain/augur. See: https://nextstrain.github.io/ncov/data-prep#curate-data-from-the-full-gisaid-database
WORKFLOW sarscov2_illumina_full - pipes/WDL/workflows/sarscov2_illumina_full.wdl
Full SARS-CoV-2 analysis workflow starting from raw Illumina flowcell (tar.gz) and metadata and performing assembly, spike-in analysis, qc, lineage assignment, and packaging for data release.
WORKFLOW sarscov2_lineages - pipes/WDL/workflows/sarscov2_lineages.wdl (used by 1 workflow)
Call Nextclade and Pangolin lineages on a single SARS-CoV-2 genome
WORKFLOW sarscov2_nextclade_multi - pipes/WDL/workflows/sarscov2_nextclade_multi.wdl
Create Nextclade visualizations on many SARS-CoV-2 genomes
WORKFLOW sarscov2_nextstrain - pipes/WDL/workflows/sarscov2_nextstrain.wdl
Align assemblies, build trees, and convert to json representation suitable for Nextstrain visualization. See https://nextstrain.org/docs/getting-started/ and https://nextstrain-augur.readthedocs.io/en/stable/
WORKFLOW sarscov2_nextstrain_aligned_input - pipes/WDL/workflows/sarscov2_nextstrain_aligned_input.wdl
Take aligned assemblies, build trees, and convert to json representation suitable for Nextstrain visualization. See https://nextstrain.org/docs/getting-started/ and https://nextstrain-augur.readthedocs.io/en/stable/
WORKFLOW sarscov2_sequencing_reports - pipes/WDL/workflows/sarscov2_sequencing_reports.wdl
Produce per-state and per-collaborator weekly reports of SARS-CoV-2 surveillance data.
WORKFLOW sarscov2_sra_to_genbank - pipes/WDL/workflows/sarscov2_sra_to_genbank.wdl
Full SARS-CoV-2 analysis workflow starting from SRA data and metadata and performing assembly, spike-in analysis, qc, lineage assignment, and packaging assemblies for data release.
WORKFLOW scaffold_and_refine - pipes/WDL/workflows/scaffold_and_refine.wdl
Scaffold de novo contigs against a set of possible references and subsequently polish with reads.
WORKFLOW scaffold_and_refine_multitaxa - pipes/WDL/workflows/scaffold_and_refine_multitaxa.wdl
Scaffold de novo contigs against a set of possible references and subsequently polish with reads. This workflow accepts a very large set of input reference genomes. It will subset the reference genomes to those with ANI hits to the provided contigs/MAGs and cluster the reference hits by any ANI similarity to each other. It will choose the top reference from each cluster and produce one assembly for each cluster. This is intended to allow for the presence of multiple diverse viral taxa (coinfections) while forcing a choice of the best assembly from groups of related reference genomes.
WORKFLOW simulate_illumina_reads - pipes/WDL/workflows/simulate_illumina_reads.wdl
Generate synthetic Illumina read sets for testing using wgsim. Takes a space-separated string of colon-separated pairs where each pair consists of a GenBank accession and a coverage value (e.g., 'KJ660346.2:12.5x NC_004296.1:0.9X'), downloads the sequences, and simulates Illumina reads.
WORKFLOW submit_biosample - pipes/WDL/workflows/submit_biosample.wdl
Register samples with NCBI BioSample. Return attributes table.
WORKFLOW submit_genbank - pipes/WDL/workflows/submit_genbank.wdl
Submit FTP-eligible genomes to NCBI Genbank (currently only flu A/B/C and SARS-CoV-2)
WORKFLOW submit_sra - pipes/WDL/workflows/submit_sra.wdl
Submit reads to SRA
WORKFLOW subsample_by_metadata - pipes/WDL/workflows/subsample_by_metadata.wdl
Filter and subsample a sequence set. See https://nextstrain-augur.readthedocs.io/en/stable/usage/cli/filter.html
WORKFLOW subsample_by_metadata_with_focal - pipes/WDL/workflows/subsample_by_metadata_with_focal.wdl
Filter and subsample a global sequence set with a bias towards a geographic area of interest.
WORKFLOW subsampler_only - pipes/WDL/workflows/subsample_by_casecounts.wdl
WORKFLOW terra_table_to_tsv - pipes/WDL/workflows/terra_table_to_tsv.wdl
Download data table in Terra workspace to tsv file.
WORKFLOW terra_tsv_to_table - pipes/WDL/workflows/terra_tsv_to_table.wdl
Upload tsv files to Terra data table: insert-or-update on existing rows/columns
WORKFLOW trimal - pipes/WDL/workflows/trimal.wdl
Trim a multiple sequence alignment with Trimal.
WORKFLOW unpack_archive_to_bucket - pipes/WDL/workflows/unpack_archive_to_bucket.wdl
Unpack archive(s) to a target location within a Google Storage bucket
WORKFLOW update_data_tables - pipes/WDL/workflows/terra_update_assemblies.wdl
Create data tables in Terra workspace from provided tsv load file.
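
The simulate_illumina_reads entry above describes its input as a space-separated string of colon-separated accession:coverage pairs. As a rough illustration of that format only, the sketch below parses such a string in Python. The function name parse_coverage_spec is hypothetical and is not taken from the workflow; how the workflow itself parses the string is not shown in this index.

```python
from typing import List, Tuple


def parse_coverage_spec(spec: str) -> List[Tuple[str, float]]:
    """Split 'ACCESSION:COVERAGE ...' into (accession, coverage) pairs.

    Hypothetical helper for illustration; not part of the workflow.
    """
    pairs = []
    for token in spec.split():
        # Split on the last colon so the accession keeps its version suffix.
        accession, sep, coverage = token.rpartition(":")
        if not sep or not accession or not coverage:
            raise ValueError(f"expected ACCESSION:COVERAGE, got {token!r}")
        # Accept an optional trailing 'x' or 'X', as in the example string.
        pairs.append((accession, float(coverage.rstrip("xX"))))
    return pairs


# Example string copied from the workflow description.
pairs = parse_coverage_spec("KJ660346.2:12.5x NC_004296.1:0.9X")
for accession, coverage in pairs:
    print(accession, coverage)
```

A malformed token (no colon, or a non-numeric coverage) raises ValueError in this sketch; the real workflow may handle such input differently.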

Mixed Files (Workflow + Tasks)

Files containing both workflow definition and task implementations.

MIXED CreateEntericsQCViz - pipes/WDL/workflows/create_enterics_qc_viz.wdl (1 task)
MIXED CreateEntericsQCVizGeneral - pipes/WDL/workflows/create_enterics_qc_viz_general.wdl (1 task)
MIXED genbank_dump - pipes/WDL/workflows/sarscov2_genbank_ingest.wdl (1 task)

Task Libraries

Files containing reusable tasks.

TASKS tasks_16S_amplicon - pipes/WDL/tasks/tasks_16S_amplicon.wdl (6 tasks)
Parse demultiplexed fastq BAM files into QIIME-readable files.
TASKS tasks_assembly - pipes/WDL/tasks/tasks_assembly.wdl (10 tasks)
TASKS tasks_demux - pipes/WDL/tasks/tasks_demux.wdl (6 tasks)
TASKS tasks_interhost - pipes/WDL/tasks/tasks_interhost.wdl (9 tasks)
Run subsampler to get downsampled dataset and metadata proportional to epidemiological case counts.
TASKS tasks_intrahost - pipes/WDL/tasks/tasks_intrahost.wdl (5 tasks)
TASKS tasks_megablast - pipes/WDL/tasks/tasks_megablast.wdl (4 tasks)
Trim reads via trimmomatic, remove duplicate reads, and subsample to a desired read count (default of 100,000), bam in, bam out.
TASKS tasks_metagenomics - pipes/WDL/tasks/tasks_metagenomics.wdl (11 tasks)
Runs Krakenuniq classification
TASKS tasks_ncbi - pipes/WDL/tasks/tasks_ncbi.wdl (21 tasks)
TASKS tasks_ncbi_tools - pipes/WDL/tasks/tasks_ncbi_tools.wdl (10 tasks)
This searches NCBI SRA for accessions using the Entrez interface, collects associated metadata, and returns read sets as unaligned BAM files with metadata loaded in. Useful metadata from BioSample is also output from this task directly. This has been tested with both SRA and ENA accessions. This queries the NCBI production database, and as such, the output of this task is non-deterministic given the same input.
TASKS tasks_nextstrain - pipes/WDL/tasks/tasks_nextstrain.wdl (26 tasks)
TASKS tasks_read_utils - pipes/WDL/tasks/tasks_read_utils.wdl (9 tasks)
TASKS tasks_reports - pipes/WDL/tasks/tasks_reports.wdl (11 tasks)
Produce various standard metrics and coverage plots via Picard and Samtools for aligned BAM files.
TASKS tasks_sarscov2 - pipes/WDL/tasks/tasks_sarscov2.wdl (6 tasks)
Pangolin classification of one SARS-CoV-2 sample.
TASKS tasks_taxon_filter - pipes/WDL/tasks/tasks_taxon_filter.wdl (4 tasks)
Runs a full human read depletion pipeline and removes PCR duplicates. Input database files (bmtaggerDbs, blastDbs, bwaDbs) may be any combination of: .fasta, .fasta.gz, or tarred up indexed fastas (using the software's indexing method) as .tar.gz, .tar.bz2, .tar.lz4, or .tar.zst.
TASKS tasks_terra - pipes/WDL/tasks/tasks_terra.wdl (6 tasks)
gcloud storage cp without additional authentication only works on Terra
TASKS tasks_utils - pipes/WDL/tasks/tasks_utils.wdl (27 tasks)
This is nothing more than unix cat.
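
The tasks_taxon_filter entry above lists the accepted forms for depletion database files: .fasta, .fasta.gz, or tarred indexed fastas as .tar.gz, .tar.bz2, .tar.lz4, or .tar.zst. The sketch below shows one way to sort a file name into those three groups by suffix. It is an assumption-laden illustration: the function name classify_db_input and the example file names are hypothetical, and the task itself may detect formats differently.

```python
def classify_db_input(filename: str) -> str:
    """Classify a database file name by suffix (illustrative only)."""
    name = filename.lower()
    # Tarred, pre-indexed databases in any of the listed compressions.
    if name.endswith((".tar.gz", ".tar.bz2", ".tar.lz4", ".tar.zst")):
        return "indexed tarball"
    # Check the compressed suffix before the plain one.
    if name.endswith(".fasta.gz"):
        return "compressed fasta"
    if name.endswith(".fasta"):
        return "fasta"
    raise ValueError(f"unrecognized database file: {filename!r}")


# Hypothetical file names, used only to exercise each branch.
for example in ("host.fasta", "host.fasta.gz", "host_bwa_index.tar.zst"):
    print(example, "->", classify_db_input(example))
```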