Overview

105 Internal WDL Files
89 Workflows
174 Tasks

Workflows

Main workflow files that orchestrate tasks and subworkflows.

  • WORKFLOW align_and_count_multiple_report - pipes/WDL/workflows/align_and_count_multiple_report.wdl
    Count the number of times reads map to provided reference sequences. Useful for counting spike-ins, etc.
  • WORKFLOW align_and_count_report - pipes/WDL/workflows/align_and_count.wdl
    Align reads to reference with minimap2 and count the number of hits. Results are returned in the format of 'samtools idxstats'.
  • WORKFLOW align_and_plot - pipes/WDL/workflows/align_and_plot.wdl
    Align reads to reference and produce coverage plots and statistics.
  • WORKFLOW amplicon16S_analysis - pipes/WDL/workflows/classify_qiime2_multi.wdl
    Runs 16S amplicon sequencing analysis (from BAM-format input) with qiime.
  • WORKFLOW assemble_denovo - pipes/WDL/workflows/assemble_denovo.wdl
    Assisted de novo viral genome assembly from raw reads.
  • WORKFLOW assemble_denovo_metagenomic - pipes/WDL/workflows/assemble_denovo_metagenomic.wdl
    Performs viral de novo assembly on metagenomic reads against a large range of possible reference genomes. Runs raw reads through taxonomic classification (Kraken2), human read depletion (based on Kraken2), de novo assembly (SPAdes), and FASTQC/multiQC of reads. Scaffold de novo contigs against a set of possible references and subsequently polish with reads. This workflow accepts a very large set of input reference genomes. It will subset the reference genomes to those with ANI hits to the provided contigs/MAGs and cluster the reference hits by any ANI similarity to each other. It will choose the top reference from each cluster and produce one assembly for each cluster. This is intended to allow for the presence of multiple diverse viral taxa (coinfections) while forcing a choice of the best assembly from groups of related reference genomes.
  • WORKFLOW assemble_refbased - pipes/WDL/workflows/assemble_refbased.wdl (used by 7 workflows)
    Reference-based microbial consensus calling. Aligns NGS reads to a singular reference genome, calls a new consensus sequence, and emits: new assembly, reads aligned to provided reference, reads aligned to new assembly, various figures of merit, plots, and QC metrics. The user may provide unaligned reads spread across multiple input files and this workflow will parallelize alignment per input file before merging results prior to consensus calling.
  • WORKFLOW augur_export_only - pipes/WDL/workflows/augur_export_only.wdl
    Convert a newick formatted phylogenetic tree with other config settings and node values into a json suitable for auspice visualization. See https://nextstrain-augur.readthedocs.io/en/stable/usage/cli/export.html
  • WORKFLOW augur_from_assemblies - pipes/WDL/workflows/augur_from_assemblies.wdl
    Align assemblies, build trees, and convert to json representation suitable for Nextstrain visualization. See https://nextstrain.org/docs/getting-started/ and https://nextstrain-augur.readthedocs.io/en/stable/
  • WORKFLOW augur_from_beast_mcc - pipes/WDL/workflows/augur_from_beast_mcc.wdl
    Visualize BEAST output with Nextstrain. This workflow converts a BEAST MCC tree (.tree file) into an Auspice v2 json file. See https://nextstrain-augur.readthedocs.io/en/stable/faq/import-beast.html for details.
  • WORKFLOW augur_from_mltree - pipes/WDL/workflows/augur_from_mltree.wdl
    Take a premade maximum likelihood tree (Newick format) and run the remainder of the augur pipeline (timetree modifications, ancestral inference, etc) and convert to json representation suitable for Nextstrain visualization. See https://nextstrain.org/docs/getting-started/ and https://nextstrain-augur.readthedocs.io/en/stable/
  • WORKFLOW augur_from_msa - pipes/WDL/workflows/augur_from_msa.wdl
    Build trees, and convert to json representation suitable for Nextstrain visualization. See https://nextstrain.org/docs/getting-started/ and https://nextstrain-augur.readthedocs.io/en/stable/
  • WORKFLOW augur_from_msa_with_subsampler - pipes/WDL/workflows/augur_from_msa_with_subsampler.wdl
    Build trees, and convert to json representation suitable for Nextstrain visualization. See https://nextstrain.org/docs/getting-started/ and https://nextstrain-augur.readthedocs.io/en/stable/
  • WORKFLOW bams_multiqc - pipes/WDL/workflows/bams_multiqc.wdl
    Run FastQC on a set of BAM files, and then MultiQC to summarize all outputs.
  • WORKFLOW beast_gpu - pipes/WDL/workflows/beast_gpu.wdl
    Runs BEAST (v1) on a GPU instance. Use with care: this can be expensive if run incorrectly.
  • WORKFLOW calc_bam_read_depths - pipes/WDL/workflows/calc_bam_read_depths.wdl
    Generates read depth tables.
  • WORKFLOW chunk_megablast - pipes/WDL/workflows/megablast_chunk.wdl
    Chunk megablast function
  • WORKFLOW classify_kaiju - pipes/WDL/workflows/classify_kaiju.wdl
    Taxonomic classification of reads with kaiju.
  • WORKFLOW classify_kraken2 - pipes/WDL/workflows/classify_kraken2.wdl
    Taxonomic classification of sequences via kraken2 (or kraken2x, depending on the database provided).
  • WORKFLOW classify_krakenuniq - pipes/WDL/workflows/classify_krakenuniq.wdl
    Taxonomic classification of reads using krakenuniq v1.
  • WORKFLOW classify_multi - pipes/WDL/workflows/classify_multi.wdl
    Runs raw reads through taxonomic classification (Kraken2), human read depletion (based on Kraken2), de novo assembly (SPAdes), and FASTQC/multiQC of reads.
  • WORKFLOW classify_single - pipes/WDL/workflows/classify_single.wdl
    Runs raw reads through taxonomic classification (Kraken2), human read depletion (based on Kraken2), de novo assembly (SPAdes), and FASTQC/multiQC of reads.
  • WORKFLOW coverage_table - pipes/WDL/workflows/coverage_table.wdl
  • WORKFLOW demux_deplete - pipes/WDL/workflows/demux_deplete.wdl (used by 1 workflow)
    Picard-based demultiplexing and basecalling from a tarball of a raw BCL directory, followed by QC metrics, depletion, and SRA submission prep.
  • WORKFLOW demux_metadata_only - pipes/WDL/workflows/demux_metadata_only.wdl
    Picard-based demultiplexing and basecalling from a tarball of a raw BCL directory, followed by QC metrics, depletion, and SRA submission prep.
  • WORKFLOW demux_only - pipes/WDL/workflows/demux_only.wdl
    Picard-based demultiplexing and basecalling from a tarball of a raw BCL directory.
  • WORKFLOW demux_plus - pipes/WDL/workflows/demux_plus.wdl
    Picard-based demultiplexing and basecalling from a tarball of a raw BCL directory, followed by basic metagenomics and QC metrics. Intended for automatic triggering post upload on DNAnexus.
  • WORKFLOW deplete_only - pipes/WDL/workflows/deplete_only.wdl
    Taxonomic depletion of reads matching unwanted taxa (such as human).
  • WORKFLOW detect_cross_contamination - pipes/WDL/workflows/detect_cross_contamination.wdl
    Detect cross-contamination between samples using consensus-level and sub-consensus variation.
  • WORKFLOW detect_cross_contamination_precalled_vcfs - pipes/WDL/workflows/detect_cross_contamination_precalled_vcfs.wdl
    Detect cross-contamination between samples using consensus-level and sub-consensus variation, from consensus genomes and pre-called LoFreq vcf files.
  • WORKFLOW diff_genome_sets - pipes/WDL/workflows/diff_genome_sets.wdl
  • WORKFLOW downsample - pipes/WDL/workflows/downsample.wdl
    Random subsampling of reads.
  • WORKFLOW dump_gcloud_env_info - pipes/WDL/workflows/dump_gcloud_env_info.wdl
    Write system and gcloud environment info to output files.
  • WORKFLOW fastq_to_ubam - pipes/WDL/workflows/fastq_to_ubam.wdl
    Convert reads from fastq format (single or paired) to unaligned BAM format.
  • WORKFLOW fetch_annotations - pipes/WDL/workflows/fetch_annotations.wdl
  • WORKFLOW fetch_sra_to_bam - pipes/WDL/workflows/fetch_sra_to_bam.wdl
    Retrieve reads from the NCBI Sequence Read Archive (SRA) in unaligned BAM format with relevant metadata encoded.
  • WORKFLOW filter_classified_bam_to_taxa - pipes/WDL/workflows/filter_classified_bam_to_taxa.wdl
    Taxonomic filtration of reads utilizing output from a classifier such as kraken1/2/uniq. Can filter out or filter to a specified taxonomic grouping.
  • WORKFLOW filter_sequences - pipes/WDL/workflows/filter_sequences.wdl
    Filter and subsample a sequence set. See https://nextstrain-augur.readthedocs.io/en/stable/usage/cli/filter.html
  • WORKFLOW genbank_gather - pipes/WDL/workflows/genbank_gather.wdl
    Consolidate all genbank submission files for individual genomes into bulk submission packages grouped by submission pathway.
  • WORKFLOW genbank_single - pipes/WDL/workflows/genbank_single.wdl
    Prepare assemblies for Genbank submission. This includes annotation by simple coordinate transfer from Genbank annotations and a multiple alignment. See https://viral-pipelines.readthedocs.io/en/latest/ncbi_submission.html for details.
  • WORKFLOW isnvs_lofreq - pipes/WDL/workflows/isnvs_lofreq.wdl
    Call variants with LoFreq against reference_fasta.
  • WORKFLOW isnvs_merge_to_vcf - pipes/WDL/workflows/isnvs_merge_to_vcf.wdl
  • WORKFLOW isnvs_one_sample - pipes/WDL/workflows/isnvs_one_sample.wdl
    Intrahost variant calling with V-Phaser2. Requires an assembled genome and a BAM of aligned reads against that same genome.
  • WORKFLOW kraken2_build - pipes/WDL/workflows/kraken2_build.wdl
    Build a Kraken2 (or 2X) database.
  • WORKFLOW mafft - pipes/WDL/workflows/mafft.wdl
    MAFFT multiple-alignment for a set of possibly multi-segment genomes.
  • WORKFLOW mafft_and_snp - pipes/WDL/workflows/mafft_and_snp.wdl
    Align assemblies with mafft and find SNPs with snp-sites.
  • WORKFLOW mafft_and_snp_annotated - pipes/WDL/workflows/mafft_and_snp_annotated.wdl
    Align assemblies with mafft and find SNPs with snp-sites.
  • WORKFLOW mafft_and_trim - pipes/WDL/workflows/mafft_and_trim.wdl
    MAFFT based multiple alignment followed by trimal-based edge trimming.
  • WORKFLOW megablast - pipes/WDL/workflows/blastoff.wdl
  • WORKFLOW merge_bams - pipes/WDL/workflows/merge_bams.wdl
    Merge, reheader, or merge-and-reheader BAM files.
  • WORKFLOW merge_metagenomics - pipes/WDL/workflows/merge_metagenomics.wdl
    Combine metagenomic reports from single samples into an aggregate report.
  • WORKFLOW merge_tar_chunks - pipes/WDL/workflows/merge_tar_chunks.wdl
    Combine multiple tar files (possibly compressed by gzip, bz2, lz4, zstd, etc) into a single tar file. Originally meant for combining streaming upload chunks from a sequencing run.
  • WORKFLOW merge_vcfs - pipes/WDL/workflows/merge_vcfs.wdl
    Merge VCFs from multiple samples using GATK3.
  • WORKFLOW merge_vcfs_and_annotate - pipes/WDL/workflows/merge_vcfs_and_annotate.wdl
    Merge VCFs emitted by GATK UnifiedGenotyper and annotate with snpEff.
  • WORKFLOW metagenomic_denovo - pipes/WDL/workflows/metagenomic_denovo.wdl
    Assisted de novo viral genome assembly (SPAdes, scaffolding, and polishing) from metagenomic raw reads. Runs raw reads through taxonomic classification (Kraken2), human read depletion (based on Kraken2 and optionally using BWA, BLASTN, and/or BMTAGGER databases), and FASTQC/multiQC of reads.
  • WORKFLOW multiqc_only - pipes/WDL/workflows/multiqc_only.wdl
    Combine multiple FastQC reports into a single MultiQC summary.
  • WORKFLOW nextclade_single - pipes/WDL/workflows/nextclade_single.wdl
    Run Nextclade on a single genome
  • WORKFLOW populate_library_and_sample_tables_from_flowcell - pipes/WDL/workflows/populate_library_and_sample_tables_from_flowcell.wdl
    Terra only: Populate per-library-lane and per-sample tables from existing demultiplexed flowcell output
  • WORKFLOW qiime_import_bam - pipes/WDL/workflows/bam_to_qiime.wdl
    Importing BAM files into QIIME
  • WORKFLOW reconstruct_from_alignments - pipes/WDL/workflows/reconstruct_from_alignments.wdl
    Infer disease transmission events from sequence (consensus + intrahost variation) data using the reconstructR tool
  • WORKFLOW sarscov2_batch_relineage - pipes/WDL/workflows/sarscov2_batch_relineage.wdl (used by 1 workflow)
    Re-call Nextclade and Pangolin lineages on a flowcell's worth of SARS-CoV-2 genomes
  • WORKFLOW sarscov2_biosample_load - pipes/WDL/workflows/sarscov2_biosample_load.wdl (used by 1 workflow)
    Load Broad CRSP metadata and register samples with NCBI BioSample. Return attributes table, id map, etc.
  • WORKFLOW sarscov2_data_release - pipes/WDL/workflows/sarscov2_data_release.wdl
    Submit data bundles to databases and repositories
  • WORKFLOW sarscov2_genbank - pipes/WDL/workflows/sarscov2_genbank.wdl
    Prepare SARS-CoV-2 assemblies for Genbank submission. This includes QC checks with NCBI's VADR tool and filters out genomes that do not pass its tests.
  • WORKFLOW sarscov2_gisaid_ingest - pipes/WDL/workflows/sarscov2_gisaid_ingest.wdl
    Sanitize data downloaded from GISAID for use in Nextstrain/augur. See: https://nextstrain.github.io/ncov/data-prep#curate-data-from-the-full-gisaid-database
  • WORKFLOW sarscov2_illumina_full - pipes/WDL/workflows/sarscov2_illumina_full.wdl
    Full SARS-CoV-2 analysis workflow starting from raw Illumina flowcell (tar.gz) and metadata and performing assembly, spike-in analysis, qc, lineage assignment, and packaging for data release.
  • WORKFLOW sarscov2_lineages - pipes/WDL/workflows/sarscov2_lineages.wdl (used by 1 workflow)
    Call Nextclade and Pangolin lineages on a single SARS-CoV-2 genome
  • WORKFLOW sarscov2_nextclade_multi - pipes/WDL/workflows/sarscov2_nextclade_multi.wdl
    Create Nextclade visualizations on many SARS-CoV-2 genomes
  • WORKFLOW sarscov2_nextstrain - pipes/WDL/workflows/sarscov2_nextstrain.wdl
    Align assemblies, build trees, and convert to json representation suitable for Nextstrain visualization. See https://nextstrain.org/docs/getting-started/ and https://nextstrain-augur.readthedocs.io/en/stable/
  • WORKFLOW sarscov2_nextstrain_aligned_input - pipes/WDL/workflows/sarscov2_nextstrain_aligned_input.wdl
    Take aligned assemblies, build trees, and convert to json representation suitable for Nextstrain visualization. See https://nextstrain.org/docs/getting-started/ and https://nextstrain-augur.readthedocs.io/en/stable/
  • WORKFLOW sarscov2_sequencing_reports - pipes/WDL/workflows/sarscov2_sequencing_reports.wdl
    Produce per-state and per-collaborator weekly reports of SARS-CoV-2 surveillance data.
  • WORKFLOW sarscov2_sra_to_genbank - pipes/WDL/workflows/sarscov2_sra_to_genbank.wdl
    Full SARS-CoV-2 analysis workflow starting from SRA data and metadata and performing assembly, spike-in analysis, qc, lineage assignment, and packaging assemblies for data release.
  • WORKFLOW scaffold_and_refine - pipes/WDL/workflows/scaffold_and_refine.wdl
    Scaffold de novo contigs against a set of possible references and subsequently polish with reads.
  • WORKFLOW scaffold_and_refine_multitaxa - pipes/WDL/workflows/scaffold_and_refine_multitaxa.wdl
    Scaffold de novo contigs against a set of possible references and subsequently polish with reads. This workflow accepts a very large set of input reference genomes. It will subset the reference genomes to those with ANI hits to the provided contigs/MAGs and cluster the reference hits by any ANI similarity to each other. It will choose the top reference from each cluster and produce one assembly for each cluster. This is intended to allow for the presence of multiple diverse viral taxa (coinfections) while forcing a choice of the best assembly from groups of related reference genomes.
  • WORKFLOW simulate_illumina_reads - pipes/WDL/workflows/simulate_illumina_reads.wdl
    Generate synthetic Illumina read sets for testing using wgsim. Takes a space-separated string of colon-separated pairs where each pair consists of a GenBank accession and a coverage value (e.g., 'KJ660346.2:12.5x NC_004296.1:0.9X'), downloads the sequences, and simulates Illumina reads.
  • WORKFLOW submit_biosample - pipes/WDL/workflows/submit_biosample.wdl
    Register samples with NCBI BioSample. Return attributes table.
  • WORKFLOW submit_genbank - pipes/WDL/workflows/submit_genbank.wdl
    Submit FTP-eligible genomes to NCBI Genbank (currently only flu A/B/C and SARS-CoV-2)
  • WORKFLOW submit_sra - pipes/WDL/workflows/submit_sra.wdl
    Submit reads to SRA
  • WORKFLOW subsample_by_metadata - pipes/WDL/workflows/subsample_by_metadata.wdl
    Filter and subsample a sequence set. See https://nextstrain-augur.readthedocs.io/en/stable/usage/cli/filter.html
  • WORKFLOW subsample_by_metadata_with_focal - pipes/WDL/workflows/subsample_by_metadata_with_focal.wdl
    Filter and subsample a global sequence set with a bias towards a geographic area of interest.
  • WORKFLOW subsampler_only - pipes/WDL/workflows/subsample_by_casecounts.wdl
  • WORKFLOW terra_table_to_tsv - pipes/WDL/workflows/terra_table_to_tsv.wdl
    Download data table in Terra workspace to tsv file.
  • WORKFLOW terra_tsv_to_table - pipes/WDL/workflows/terra_tsv_to_table.wdl
    Upload tsv files to Terra data table: insert-or-update on existing rows/columns
  • WORKFLOW trimal - pipes/WDL/workflows/trimal.wdl
    Trim a multiple sequence alignment with Trimal.
  • WORKFLOW unpack_archive_to_bucket - pipes/WDL/workflows/unpack_archive_to_bucket.wdl
    Unpack archive(s) to a target location within a Google Storage bucket
  • WORKFLOW update_data_tables - pipes/WDL/workflows/terra_update_assemblies.wdl
    Create data tables in Terra workspace from provided tsv load file.
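
The simulate_illumina_reads entry above describes its input as a space-separated string of colon-separated accession/coverage pairs. As a rough illustration of that format only, the Python sketch below parses such a string into (accession, coverage) tuples. The function name parse_coverage_spec is hypothetical and is not taken from the workflow itself; treating the trailing 'x' or 'X' as an optional suffix is an assumption based on the example string in the description.

```python
from typing import List, Tuple


def parse_coverage_spec(spec: str) -> List[Tuple[str, float]]:
    """Parse 'ACCESSION:COVERAGE ...' into (accession, coverage) tuples.

    Hypothetical helper: it mirrors the input format described for
    simulate_illumina_reads, not the workflow's actual implementation.
    """
    pairs = []
    for token in spec.split():  # pairs are separated by whitespace
        accession, sep, coverage = token.partition(":")
        if not sep or not accession or not coverage:
            raise ValueError(f"expected ACCESSION:COVERAGE, got {token!r}")
        # Assumption: a trailing 'x' or 'X' is a fold-coverage suffix.
        pairs.append((accession, float(coverage.rstrip("xX"))))
    return pairs


if __name__ == "__main__":
    # The example string quoted in the workflow description.
    print(parse_coverage_spec("KJ660346.2:12.5x NC_004296.1:0.9X"))
```

Splitting on the first colon keeps the accession version suffix (such as ".2") intact, since the version is separated by a period rather than a colon.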

Mixed Files (Workflow + Tasks)

Files containing both workflow definition and task implementations.

Task Libraries

Files containing reusable tasks.

  • TASKS tasks_16S_amplicon - pipes/WDL/tasks/tasks_16S_amplicon.wdl (6 tasks)
    Parsing demultiplexed fastq BAM files into qiime readable files.
  • TASKS tasks_assembly - pipes/WDL/tasks/tasks_assembly.wdl (10 tasks)
  • TASKS tasks_demux - pipes/WDL/tasks/tasks_demux.wdl (6 tasks)
  • TASKS tasks_interhost - pipes/WDL/tasks/tasks_interhost.wdl (9 tasks)
    Run subsampler to get downsampled dataset and metadata proportional to epidemiological case counts.
  • TASKS tasks_intrahost - pipes/WDL/tasks/tasks_intrahost.wdl (5 tasks)
  • TASKS tasks_megablast - pipes/WDL/tasks/tasks_megablast.wdl (4 tasks)
    Trim reads via trimmomatic, remove duplicate reads, and subsample to a desired read count (default of 100,000), bam in, bam out.
  • TASKS tasks_metagenomics - pipes/WDL/tasks/tasks_metagenomics.wdl (11 tasks)
    Runs Krakenuniq classification
  • TASKS tasks_ncbi - pipes/WDL/tasks/tasks_ncbi.wdl (21 tasks)
  • TASKS tasks_ncbi_tools - pipes/WDL/tasks/tasks_ncbi_tools.wdl (10 tasks)
    This searches NCBI SRA for accessions using the Entrez interface, collects associated metadata, and returns read sets as unaligned BAM files with metadata loaded in. Useful metadata from BioSample is also output from this task directly. This has been tested with both SRA and ENA accessions. This queries the NCBI production database, and as such, the output of this task is non-deterministic given the same input.
  • TASKS tasks_nextstrain - pipes/WDL/tasks/tasks_nextstrain.wdl (26 tasks)
  • TASKS tasks_read_utils - pipes/WDL/tasks/tasks_read_utils.wdl (9 tasks)
  • TASKS tasks_reports - pipes/WDL/tasks/tasks_reports.wdl (11 tasks)
    Produce various standard metrics and coverage plots via Picard and Samtools for aligned BAM files.
  • TASKS tasks_sarscov2 - pipes/WDL/tasks/tasks_sarscov2.wdl (6 tasks)
    Pangolin classification of one SARS-CoV-2 sample.
  • TASKS tasks_taxon_filter - pipes/WDL/tasks/tasks_taxon_filter.wdl (4 tasks)
    Runs a full human read depletion pipeline and removes PCR duplicates. Input database files (bmtaggerDbs, blastDbs, bwaDbs) may be any combination of: .fasta, .fasta.gz, or tarred up indexed fastas (using the software's indexing method) as .tar.gz, .tar.bz2, .tar.lz4, or .tar.zst.
  • TASKS tasks_terra - pipes/WDL/tasks/tasks_terra.wdl (6 tasks)
    gcloud storage cp without additional authentication only works on Terra
  • TASKS tasks_utils - pipes/WDL/tasks/tasks_utils.wdl (27 tasks)
    This is nothing more than unix cat.
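
The tasks_taxon_filter entry above lists the file types accepted for depletion databases: plain or gzipped FASTA, or tarballs of pre-indexed FASTAs. The Python sketch below shows one way a caller might sort input paths into those two groups before launching a run. The function name classify_db_file and the returned labels are hypothetical; only the suffix lists come from the description, and the task itself may detect file types differently.

```python
from pathlib import PurePosixPath

# Suffixes named in the tasks_taxon_filter description.
FASTA_SUFFIXES = (".fasta", ".fasta.gz")
TARBALL_SUFFIXES = (".tar.gz", ".tar.bz2", ".tar.lz4", ".tar.zst")


def classify_db_file(path: str) -> str:
    """Return 'fasta' or 'indexed_tarball' for a depletion database path.

    Hypothetical helper for illustration; the labels are not used by
    the pipeline itself.
    """
    name = PurePosixPath(path).name.lower()
    if name.endswith(TARBALL_SUFFIXES):
        return "indexed_tarball"
    if name.endswith(FASTA_SUFFIXES):
        return "fasta"
    raise ValueError(f"unsupported database file type: {path!r}")


if __name__ == "__main__":
    # Made-up file names, used only to exercise the two branches.
    for example in ("host_genome.fasta.gz", "host_index.tar.zst"):
        print(example, "->", classify_db_file(example))
```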