Overview
105 Internal WDL Files
89 Workflows
174 Tasks
Workflows
Main workflow files that orchestrate tasks and subworkflows.
-
WORKFLOW
align_and_count_multiple_report
-
pipes/WDL/workflows/align_and_count_multiple_report.wdl
Count the number of times reads map to provided reference sequences. Useful for counting spike-ins, etc. -
WORKFLOW
align_and_count_report
-
pipes/WDL/workflows/align_and_count.wdl
Align reads to reference with minimap2 and count the number of hits. Results are returned in the format of 'samtools idxstats'. -
WORKFLOW
align_and_plot
-
pipes/WDL/workflows/align_and_plot.wdl
Align reads to reference and produce coverage plots and statistics. -
WORKFLOW
amplicon16S_analysis
-
pipes/WDL/workflows/classify_qiime2_multi.wdl
Run 16S amplicon sequencing analysis (from BAM-format reads) with QIIME. -
WORKFLOW
assemble_denovo
-
pipes/WDL/workflows/assemble_denovo.wdl
Assisted de novo viral genome assembly from raw reads. -
WORKFLOW
assemble_denovo_metagenomic
-
pipes/WDL/workflows/assemble_denovo_metagenomic.wdl
Performs viral de novo assembly on metagenomic reads against a large range of possible reference genomes. Runs raw reads through taxonomic classification (Kraken2), human read depletion (based on Kraken2), de novo assembly (SPAdes), and FASTQC/multiQC of reads. Scaffold de novo contigs against a set of possible references and subsequently polish with reads. This workflow accepts a very large set of input reference genomes. It will subset the reference genomes to those with ANI hits to the provided contigs/MAGs and cluster the reference hits by any ANI similarity to each other. It will choose the top reference from each cluster and produce one assembly for each cluster. This is intended to allow for the presence of multiple diverse viral taxa (coinfections) while forcing a choice of the best assembly from groups of related reference genomes. -
WORKFLOW
assemble_refbased
-
pipes/WDL/workflows/assemble_refbased.wdl (used by 7 workflows)
Reference-based microbial consensus calling. Aligns NGS reads to a single reference genome, calls a new consensus sequence, and emits: new assembly, reads aligned to provided reference, reads aligned to new assembly, various figures of merit, plots, and QC metrics. The user may provide unaligned reads spread across multiple input files; this workflow parallelizes alignment per input file and merges the results before consensus calling. -
WORKFLOW
augur_export_only
-
pipes/WDL/workflows/augur_export_only.wdl
Convert a newick formatted phylogenetic tree with other config settings and node values into a json suitable for auspice visualization. See https://nextstrain-augur.readthedocs.io/en/stable/usage/cli/export.html -
WORKFLOW
augur_from_assemblies
-
pipes/WDL/workflows/augur_from_assemblies.wdl
Align assemblies, build trees, and convert to json representation suitable for Nextstrain visualization. See https://nextstrain.org/docs/getting-started/ and https://nextstrain-augur.readthedocs.io/en/stable/ -
WORKFLOW
augur_from_beast_mcc
-
pipes/WDL/workflows/augur_from_beast_mcc.wdl
Visualize BEAST output with Nextstrain. This workflow converts a BEAST MCC tree (.tree file) into an Auspice v2 json file. See https://nextstrain-augur.readthedocs.io/en/stable/faq/import-beast.html for details. -
WORKFLOW
augur_from_mltree
-
pipes/WDL/workflows/augur_from_mltree.wdl
Take a premade maximum likelihood tree (Newick format), run the remainder of the augur pipeline (timetree modifications, ancestral inference, etc.), and convert to json representation suitable for Nextstrain visualization. See https://nextstrain.org/docs/getting-started/ and https://nextstrain-augur.readthedocs.io/en/stable/ -
WORKFLOW
augur_from_msa
-
pipes/WDL/workflows/augur_from_msa.wdl
Build trees, and convert to json representation suitable for Nextstrain visualization. See https://nextstrain.org/docs/getting-started/ and https://nextstrain-augur.readthedocs.io/en/stable/ -
WORKFLOW
augur_from_msa_with_subsampler
-
pipes/WDL/workflows/augur_from_msa_with_subsampler.wdl
Build trees, and convert to json representation suitable for Nextstrain visualization. See https://nextstrain.org/docs/getting-started/ and https://nextstrain-augur.readthedocs.io/en/stable/ -
WORKFLOW
bams_multiqc
-
pipes/WDL/workflows/bams_multiqc.wdl
Run FastQC on a set of BAM files, and then MultiQC to summarize all outputs. -
WORKFLOW
beast_gpu
-
pipes/WDL/workflows/beast_gpu.wdl
Runs BEAST (v1) on a GPU instance. Use with care: this can be expensive if run incorrectly. -
WORKFLOW
calc_bam_read_depths
-
pipes/WDL/workflows/calc_bam_read_depths.wdl
Generates read depth tables. -
WORKFLOW
chunk_megablast
-
pipes/WDL/workflows/megablast_chunk.wdl
Chunk megablast function -
WORKFLOW
classify_kaiju
-
pipes/WDL/workflows/classify_kaiju.wdl
Taxonomic classification of reads with kaiju. -
WORKFLOW
classify_kraken2
-
pipes/WDL/workflows/classify_kraken2.wdl
Taxonomic classification of sequences via kraken2 (or kraken2x, depending on the database provided). -
WORKFLOW
classify_krakenuniq
-
pipes/WDL/workflows/classify_krakenuniq.wdl
Taxonomic classification of reads using krakenuniq v1. -
WORKFLOW
classify_multi
-
pipes/WDL/workflows/classify_multi.wdl
Runs raw reads through taxonomic classification (Kraken2), human read depletion (based on Kraken2), de novo assembly (SPAdes), and FASTQC/multiQC of reads. -
WORKFLOW
classify_single
-
pipes/WDL/workflows/classify_single.wdl
Runs raw reads through taxonomic classification (Kraken2), human read depletion (based on Kraken2), de novo assembly (SPAdes), and FASTQC/multiQC of reads. -
WORKFLOW
coverage_table
-
pipes/WDL/workflows/coverage_table.wdl -
WORKFLOW
demux_deplete
-
pipes/WDL/workflows/demux_deplete.wdl (used by 1 workflow)
Picard-based demultiplexing and basecalling from a tarball of a raw BCL directory, followed by QC metrics, depletion, and SRA submission prep. -
WORKFLOW
demux_metadata_only
-
pipes/WDL/workflows/demux_metadata_only.wdl
Picard-based demultiplexing and basecalling from a tarball of a raw BCL directory, followed by QC metrics, depletion, and SRA submission prep. -
WORKFLOW
demux_only
-
pipes/WDL/workflows/demux_only.wdl
Picard-based demultiplexing and basecalling from a tarball of a raw BCL directory. -
WORKFLOW
demux_plus
-
pipes/WDL/workflows/demux_plus.wdl
Picard-based demultiplexing and basecalling from a tarball of a raw BCL directory, followed by basic metagenomics and QC metrics. Intended for automatic triggering post upload on DNAnexus. -
WORKFLOW
deplete_only
-
pipes/WDL/workflows/deplete_only.wdl
Taxonomic depletion of reads matching unwanted taxa (such as human). -
WORKFLOW
detect_cross_contamination
-
pipes/WDL/workflows/detect_cross_contamination.wdl
Detect cross-contamination between samples using consensus-level and sub-consensus variation. -
WORKFLOW
detect_cross_contamination_precalled_vcfs
-
pipes/WDL/workflows/detect_cross_contamination_precalled_vcfs.wdl
Detect cross-contamination between samples using consensus-level and sub-consensus variation, from consensus genomes and pre-called LoFreq vcf files. -
WORKFLOW
diff_genome_sets
-
pipes/WDL/workflows/diff_genome_sets.wdl -
WORKFLOW
downsample
-
pipes/WDL/workflows/downsample.wdl
Random subsampling of reads. -
WORKFLOW
dump_gcloud_env_info
-
pipes/WDL/workflows/dump_gcloud_env_info.wdl
Write system and gcloud environment info to output files. -
WORKFLOW
fastq_to_ubam
-
pipes/WDL/workflows/fastq_to_ubam.wdl
Convert reads from fastq format (single or paired) to unaligned BAM format. -
WORKFLOW
fetch_annotations
-
pipes/WDL/workflows/fetch_annotations.wdl -
WORKFLOW
fetch_sra_to_bam
-
pipes/WDL/workflows/fetch_sra_to_bam.wdl
Retrieve reads from the NCBI Sequence Read Archive in unaligned BAM format with relevant metadata encoded. -
WORKFLOW
filter_classified_bam_to_taxa
-
pipes/WDL/workflows/filter_classified_bam_to_taxa.wdl
Taxonomic filtration of reads utilizing output from a classifier such as kraken1/2/uniq. Can filter out or filter to a specified taxonomic grouping. -
WORKFLOW
filter_sequences
-
pipes/WDL/workflows/filter_sequences.wdl
Filter and subsample a sequence set. See https://nextstrain-augur.readthedocs.io/en/stable/usage/cli/filter.html -
WORKFLOW
genbank_gather
-
pipes/WDL/workflows/genbank_gather.wdl
Consolidate all genbank submission files for individual genomes into bulk submission packages grouped by submission pathway. -
WORKFLOW
genbank_single
-
pipes/WDL/workflows/genbank_single.wdl
Prepare assemblies for Genbank submission. This includes annotation by simple coordinate transfer from Genbank annotations and a multiple alignment. See https://viral-pipelines.readthedocs.io/en/latest/ncbi_submission.html for details. -
WORKFLOW
isnvs_lofreq
-
pipes/WDL/workflows/isnvs_lofreq.wdl
Call variants with LoFreq against reference_fasta. -
WORKFLOW
isnvs_merge_to_vcf
-
pipes/WDL/workflows/isnvs_merge_to_vcf.wdl -
WORKFLOW
isnvs_one_sample
-
pipes/WDL/workflows/isnvs_one_sample.wdl
Intrahost variant calling with V-Phaser2. Requires an assembled genome and a BAM of aligned reads against that same genome. -
WORKFLOW
kraken2_build
-
pipes/WDL/workflows/kraken2_build.wdl
Build a Kraken2 (or 2X) database. -
WORKFLOW
mafft
-
pipes/WDL/workflows/mafft.wdl
MAFFT multiple-alignment for a set of possibly multi-segment genomes. -
WORKFLOW
mafft_and_snp
-
pipes/WDL/workflows/mafft_and_snp.wdl
Align assemblies with mafft and find SNPs with snp-sites. -
WORKFLOW
mafft_and_snp_annotated
-
pipes/WDL/workflows/mafft_and_snp_annotated.wdl
Align assemblies with mafft and find SNPs with snp-sites. -
WORKFLOW
mafft_and_trim
-
pipes/WDL/workflows/mafft_and_trim.wdl
MAFFT based multiple alignment followed by trimal-based edge trimming. -
WORKFLOW
megablast
-
pipes/WDL/workflows/blastoff.wdl -
WORKFLOW
merge_bams
-
pipes/WDL/workflows/merge_bams.wdl
Merge, reheader, or merge-and-reheader BAM files. -
WORKFLOW
merge_metagenomics
-
pipes/WDL/workflows/merge_metagenomics.wdl
Combine metagenomic reports from single samples into an aggregate report. -
WORKFLOW
merge_tar_chunks
-
pipes/WDL/workflows/merge_tar_chunks.wdl
Combine multiple tar files (possibly compressed by gzip, bz2, lz4, zstd, etc) into a single tar file. Originally meant for combining streaming upload chunks from a sequencing run. -
WORKFLOW
merge_vcfs
-
pipes/WDL/workflows/merge_vcfs.wdl
Merge VCFs from multiple samples using GATK3. -
WORKFLOW
merge_vcfs_and_annotate
-
pipes/WDL/workflows/merge_vcfs_and_annotate.wdl
Merge VCFs emitted by GATK UnifiedGenotyper and annotate with snpEff. -
WORKFLOW
metagenomic_denovo
-
pipes/WDL/workflows/metagenomic_denovo.wdl
Assisted de novo viral genome assembly (SPAdes, scaffolding, and polishing) from metagenomic raw reads. Runs raw reads through taxonomic classification (Kraken2), human read depletion (based on Kraken2 and optionally using BWA, BLASTN, and/or BMTAGGER databases), and FASTQC/multiQC of reads. -
WORKFLOW
multiqc_only
-
pipes/WDL/workflows/multiqc_only.wdl
Combine multiple FastQC reports into a single MultiQC summary. -
WORKFLOW
nextclade_single
-
pipes/WDL/workflows/nextclade_single.wdl
Run Nextclade on a single genome -
WORKFLOW
populate_library_and_sample_tables_from_flowcell
-
pipes/WDL/workflows/populate_library_and_sample_tables_from_flowcell.wdl
Terra only: Populate per-library-lane and per-sample tables from existing demultiplexed flowcell output -
WORKFLOW
qiime_import_bam
-
pipes/WDL/workflows/bam_to_qiime.wdl
Importing BAM files into QIIME -
WORKFLOW
reconstruct_from_alignments
-
pipes/WDL/workflows/reconstruct_from_alignments.wdl
Infer disease transmission events from sequence (consensus + intrahost variation) data using the reconstructR tool -
WORKFLOW
sarscov2_batch_relineage
-
pipes/WDL/workflows/sarscov2_batch_relineage.wdl (used by 1 workflow)
Re-call Nextclade and Pangolin lineages on a flowcell's worth of SARS-CoV-2 genomes -
WORKFLOW
sarscov2_biosample_load
-
pipes/WDL/workflows/sarscov2_biosample_load.wdl (used by 1 workflow)
Load Broad CRSP metadata and register samples with NCBI BioSample. Return attributes table, id map, etc. -
WORKFLOW
sarscov2_data_release
-
pipes/WDL/workflows/sarscov2_data_release.wdl
Submit data bundles to databases and repositories -
WORKFLOW
sarscov2_genbank
-
pipes/WDL/workflows/sarscov2_genbank.wdl
Prepare SARS-CoV-2 assemblies for Genbank submission. This includes QC checks with NCBI's VADR tool and filters out genomes that do not pass its tests. -
WORKFLOW
sarscov2_gisaid_ingest
-
pipes/WDL/workflows/sarscov2_gisaid_ingest.wdl
Sanitize data downloaded from GISAID for use in Nextstrain/augur. See: https://nextstrain.github.io/ncov/data-prep#curate-data-from-the-full-gisaid-database -
WORKFLOW
sarscov2_illumina_full
-
pipes/WDL/workflows/sarscov2_illumina_full.wdl
Full SARS-CoV-2 analysis workflow starting from raw Illumina flowcell (tar.gz) and metadata and performing assembly, spike-in analysis, qc, lineage assignment, and packaging for data release. -
WORKFLOW
sarscov2_lineages
-
pipes/WDL/workflows/sarscov2_lineages.wdl (used by 1 workflow)
Call Nextclade and Pangolin lineages on a single SARS-CoV-2 genome -
WORKFLOW
sarscov2_nextclade_multi
-
pipes/WDL/workflows/sarscov2_nextclade_multi.wdl
Create Nextclade visualizations on many SARS-CoV-2 genomes -
WORKFLOW
sarscov2_nextstrain
-
pipes/WDL/workflows/sarscov2_nextstrain.wdl
Align assemblies, build trees, and convert to json representation suitable for Nextstrain visualization. See https://nextstrain.org/docs/getting-started/ and https://nextstrain-augur.readthedocs.io/en/stable/ -
WORKFLOW
sarscov2_nextstrain_aligned_input
-
pipes/WDL/workflows/sarscov2_nextstrain_aligned_input.wdl
Take aligned assemblies, build trees, and convert to json representation suitable for Nextstrain visualization. See https://nextstrain.org/docs/getting-started/ and https://nextstrain-augur.readthedocs.io/en/stable/ -
WORKFLOW
sarscov2_sequencing_reports
-
pipes/WDL/workflows/sarscov2_sequencing_reports.wdl
Produce per-state and per-collaborator weekly reports of SARS-CoV-2 surveillance data. -
WORKFLOW
sarscov2_sra_to_genbank
-
pipes/WDL/workflows/sarscov2_sra_to_genbank.wdl
Full SARS-CoV-2 analysis workflow starting from SRA data and metadata and performing assembly, spike-in analysis, qc, lineage assignment, and packaging assemblies for data release. -
WORKFLOW
scaffold_and_refine
-
pipes/WDL/workflows/scaffold_and_refine.wdl
Scaffold de novo contigs against a set of possible references and subsequently polish with reads. -
WORKFLOW
scaffold_and_refine_multitaxa
-
pipes/WDL/workflows/scaffold_and_refine_multitaxa.wdl
Scaffold de novo contigs against a set of possible references and subsequently polish with reads. This workflow accepts a very large set of input reference genomes. It will subset the reference genomes to those with ANI hits to the provided contigs/MAGs and cluster the reference hits by any ANI similarity to each other. It will choose the top reference from each cluster and produce one assembly for each cluster. This is intended to allow for the presence of multiple diverse viral taxa (coinfections) while forcing a choice of the best assembly from groups of related reference genomes. -
WORKFLOW
simulate_illumina_reads
-
pipes/WDL/workflows/simulate_illumina_reads.wdl
Generate synthetic Illumina read sets for testing using wgsim. Takes a space-separated string of colon-separated pairs where each pair consists of a GenBank accession and a coverage value (e.g., 'KJ660346.2:12.5x NC_004296.1:0.9X'), downloads the sequences, and simulates Illumina reads. -
WORKFLOW
submit_biosample
-
pipes/WDL/workflows/submit_biosample.wdl
Register samples with NCBI BioSample. Return attributes table. -
WORKFLOW
submit_genbank
-
pipes/WDL/workflows/submit_genbank.wdl
Submit FTP-eligible genomes to NCBI Genbank (currently only flu A/B/C and SARS-CoV-2) -
WORKFLOW
submit_sra
-
pipes/WDL/workflows/submit_sra.wdl
Submit reads to SRA -
WORKFLOW
subsample_by_metadata
-
pipes/WDL/workflows/subsample_by_metadata.wdl
Filter and subsample a sequence set. See https://nextstrain-augur.readthedocs.io/en/stable/usage/cli/filter.html -
WORKFLOW
subsample_by_metadata_with_focal
-
pipes/WDL/workflows/subsample_by_metadata_with_focal.wdl
Filter and subsample a global sequence set with a bias towards a geographic area of interest. -
WORKFLOW
subsampler_only
-
pipes/WDL/workflows/subsample_by_casecounts.wdl -
WORKFLOW
terra_table_to_tsv
-
pipes/WDL/workflows/terra_table_to_tsv.wdl
Download data table in Terra workspace to tsv file. -
WORKFLOW
terra_tsv_to_table
-
pipes/WDL/workflows/terra_tsv_to_table.wdl
Upload tsv files to Terra data table: insert-or-update on existing rows/columns -
WORKFLOW
trimal
-
pipes/WDL/workflows/trimal.wdl
Trim a multiple sequence alignment with Trimal. -
WORKFLOW
unpack_archive_to_bucket
-
pipes/WDL/workflows/unpack_archive_to_bucket.wdl
Unpack archive(s) to a target location within a Google Storage bucket -
WORKFLOW
update_data_tables
-
pipes/WDL/workflows/terra_update_assemblies.wdl
Create data tables in Terra workspace from provided tsv load file.
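The simulate_illumina_reads entry above describes its input as a space-separated string of colon-separated accession and coverage pairs. As an illustration of that format only, the following Python sketch parses such a string into pairs. The function name parse_coverage_spec is hypothetical and is not taken from the workflow itself, and treating the trailing 'x' or 'X' as an optional suffix is an assumption based on the single example given in the description.

```python
from typing import List, Tuple


def parse_coverage_spec(spec: str) -> List[Tuple[str, float]]:
    """Parse 'ACCESSION:COVERAGE ...' into (accession, coverage) pairs.

    Hypothetical helper for illustration; not part of the workflow.
    """
    pairs = []
    for token in spec.split():
        # Split on the last colon so the accession version ('.2') stays intact.
        accession, sep, coverage = token.rpartition(":")
        if not sep or not accession:
            raise ValueError(f"expected ACCESSION:COVERAGE, got {token!r}")
        # Assumption: a trailing 'x' or 'X' is a unit suffix and may be dropped.
        pairs.append((accession, float(coverage.rstrip("xX"))))
    return pairs


if __name__ == "__main__":
    print(parse_coverage_spec("KJ660346.2:12.5x NC_004296.1:0.9X"))
    # prints [('KJ660346.2', 12.5), ('NC_004296.1', 0.9)]
```

The workflow's own parsing may differ in how it validates accessions or handles malformed tokens; this sketch only mirrors the example string quoted in the description.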
Mixed Files (Workflow + Tasks)
Files containing both workflow definition and task implementations.
-
MIXED
CreateEntericsQCViz
-
pipes/WDL/workflows/create_enterics_qc_viz.wdl (1 task) -
MIXED
CreateEntericsQCVizGeneral
-
pipes/WDL/workflows/create_enterics_qc_viz_general.wdl (1 task) -
MIXED
genbank_dump
-
pipes/WDL/workflows/sarscov2_genbank_ingest.wdl (1 task)
Task Libraries
Files containing reusable tasks.
-
TASKS
tasks_16S_amplicon
-
pipes/WDL/tasks/tasks_16S_amplicon.wdl (6 tasks)
Parsing demultiplexed fastq BAM files into qiime readable files. -
TASKS
tasks_assembly
-
pipes/WDL/tasks/tasks_assembly.wdl (10 tasks) -
TASKS
tasks_demux
-
pipes/WDL/tasks/tasks_demux.wdl (6 tasks) -
TASKS
tasks_interhost
-
pipes/WDL/tasks/tasks_interhost.wdl (9 tasks)
Run subsampler to get downsampled dataset and metadata proportional to epidemiological case counts. -
TASKS
tasks_intrahost
-
pipes/WDL/tasks/tasks_intrahost.wdl (5 tasks) -
TASKS
tasks_megablast
-
pipes/WDL/tasks/tasks_megablast.wdl (4 tasks)
Trim reads via trimmomatic, remove duplicate reads, and subsample to a desired read count (default of 100,000), bam in, bam out. -
TASKS
tasks_metagenomics
-
pipes/WDL/tasks/tasks_metagenomics.wdl (11 tasks)
Runs Krakenuniq classification -
TASKS
tasks_ncbi
-
pipes/WDL/tasks/tasks_ncbi.wdl (21 tasks) -
TASKS
tasks_ncbi_tools
-
pipes/WDL/tasks/tasks_ncbi_tools.wdl (10 tasks)
This searches NCBI SRA for accessions using the Entrez interface, collects associated metadata, and returns read sets as unaligned BAM files with metadata loaded in. Useful metadata from BioSample is also output from this task directly. This has been tested with both SRA and ENA accessions. This queries the NCBI production database, and as such, the output of this task is non-deterministic given the same input. -
TASKS
tasks_nextstrain
-
pipes/WDL/tasks/tasks_nextstrain.wdl (26 tasks) -
TASKS
tasks_read_utils
-
pipes/WDL/tasks/tasks_read_utils.wdl (9 tasks) -
TASKS
tasks_reports
-
pipes/WDL/tasks/tasks_reports.wdl (11 tasks)
Produce various standard metrics and coverage plots via Picard and Samtools for aligned BAM files. -
TASKS
tasks_sarscov2
-
pipes/WDL/tasks/tasks_sarscov2.wdl (6 tasks)
Pangolin classification of one SARS-CoV-2 sample. -
TASKS
tasks_taxon_filter
-
pipes/WDL/tasks/tasks_taxon_filter.wdl (4 tasks)
Runs a full human read depletion pipeline and removes PCR duplicates. Input database files (bmtaggerDbs, blastDbs, bwaDbs) may be any combination of: .fasta, .fasta.gz, or tarred up indexed fastas (using the software's indexing method) as .tar.gz, .tar.bz2, .tar.lz4, or .tar.zst. -
TASKS
tasks_terra
-
pipes/WDL/tasks/tasks_terra.wdl (6 tasks)
gcloud storage cp without additional authentication only works on Terra -
TASKS
tasks_utils
-
pipes/WDL/tasks/tasks_utils.wdl (27 tasks)
This is nothing more than unix cat.
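The tasks_taxon_filter description above lists the accepted forms of depletion database input: .fasta, .fasta.gz, or pre-indexed fastas packed as .tar.gz, .tar.bz2, .tar.lz4, or .tar.zst. A minimal Python sketch of sorting file names into those three groups follows. The function name classify_db_input and the returned labels are hypothetical, and matching on the file-name suffix alone is an assumption; the task itself may inspect files differently.

```python
# Suffixes named in the tasks_taxon_filter description for tarred, indexed fastas.
TAR_SUFFIXES = (".tar.gz", ".tar.bz2", ".tar.lz4", ".tar.zst")


def classify_db_input(filename: str) -> str:
    """Return a label for a depletion database file based on its suffix.

    Hypothetical helper for illustration; labels are not used by the pipeline.
    """
    if filename.endswith(TAR_SUFFIXES):
        return "indexed_tarball"
    if filename.endswith(".fasta.gz"):
        return "gzipped_fasta"
    if filename.endswith(".fasta"):
        return "fasta"
    raise ValueError(f"unrecognized database file type: {filename!r}")


if __name__ == "__main__":
    for name in ("hg19.fasta", "rRNA.fasta.gz", "hg19.bwa_idx.tar.zst"):
        print(name, "->", classify_db_input(name))
```

The example file names are made up for the demonstration. Checking the tar suffixes first keeps a name such as 'x.fasta.tar.gz' in the tarball group rather than the plain fasta group.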