WORKFLOW sarscov2_batch_relineage

File Path: pipes/WDL/workflows/sarscov2_batch_relineage.wdl
WDL Version: 1.0
Type: workflow

Imports

Namespace   Path
nextstrain  ../tasks/tasks_nextstrain.wdl
sarscov2    ../tasks/tasks_sarscov2.wdl
utils       ../tasks/tasks_utils.wdl

Workflow: sarscov2_batch_relineage

Re-call Nextclade and Pangolin lineages on a flowcell's worth of SARS-CoV-2 genomes

Subworkflow Usage

This workflow is called as a subworkflow by 1 other workflow:

Inputs

Name                         Type         Description  Default
flowcell_id                  String       -            -
genomes_fasta                Array[File]  -            -
metadata_annotated_tsv       File         -            -
metadata_raw_tsv             File         -            -
root_sequence                File?        -            -
auspice_reference_tree_json  File?        -            -
pathogen_json                File?        -            -
gene_annotations_json        File?        -            -
min_length                   Int?         -            -
max_ambig                    Float?       -            -
analysis_mode                String?      -            -
timezone                     String?      -            -
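11 optional inputs with default values are not shown in this table. The workflow source itself declares one input with a default: min_genome_bases (Int, default 24000), which is passed to filter_sequences_by_length as min_non_N.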
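
As a rough illustration of how the four required inputs above might be supplied, the following Python sketch builds an inputs JSON in the fully qualified `<workflow>.<input>` key style used by WDL runners such as Cromwell. The flowcell ID and every file path are hypothetical placeholders, not values taken from this page, and the helper name `build_inputs` is invented for the example.

```python
import json

WORKFLOW = "sarscov2_batch_relineage"

def build_inputs(flowcell_id, genomes_fasta, metadata_annotated_tsv, metadata_raw_tsv):
    """Return an inputs dict keyed as '<workflow>.<input>' (hypothetical helper)."""
    values = {
        "flowcell_id": flowcell_id,
        "genomes_fasta": list(genomes_fasta),
        "metadata_annotated_tsv": metadata_annotated_tsv,
        "metadata_raw_tsv": metadata_raw_tsv,
    }
    return {f"{WORKFLOW}.{name}": value for name, value in values.items()}

# All values below are placeholders for illustration only.
example_inputs = build_inputs(
    flowcell_id="FLOWCELL123",
    genomes_fasta=["sample1.fasta", "sample2.fasta"],
    metadata_annotated_tsv="assembly_metadata.final.tsv",
    metadata_raw_tsv="assembly_metadata.tsv",
)
inputs_json = json.dumps(example_inputs, indent=2)
print(inputs_json)
```

Optional inputs such as min_genome_bases can be added under the same key scheme when the default is not wanted.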

Outputs

Name                                Type  Expression
assembly_stats_relineage_tsv        File  merge_raw.out_tsv
assembly_stats_final_relineage_tsv  File  merge_annotated.out_tsv
nextclade_all_json                  File  nextclade_many_samples.nextclade_json
nextclade_all_tsv                   File  nextclade_many_samples.nextclade_tsv
nextclade_auspice_json              File  nextclade_many_samples.auspice_json
nextalign_msa                       File  nextclade_many_samples.nextalign_msa
pangolin_report                     File  pangolin_many_samples.pango_lineage_report
pangolin_msa                        File  pangolin_many_samples.msa_fasta

Calls

This workflow calls the following tasks or subworkflows:

CALL TASKS concatenate

Input Mappings (3)
Input Value
infiles genomes_fasta
output_name "all-genomes.fasta"
cpus 16

CALL TASKS filter_sequences_by_length

Input Mappings (2)
Input Value
sequences_fasta concatenate.combined
min_non_N min_genome_bases
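
The task's implementation is not shown on this page, so the following is only a sketch of what a "minimum non-N bases" filter could look like. It assumes that `min_non_N` means sequence length minus the number of `N` characters (case-insensitive); the function names are invented for the example, and the toy threshold of 5 stands in for the workflow default of 24000.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def count_non_n(sequence):
    """Assumed definition: total length minus the number of N characters."""
    seq = sequence.upper()
    return len(seq) - seq.count("N")

def filter_by_non_n(records, min_non_n):
    """Keep only records with at least min_non_n non-N bases."""
    return [(h, s) for h, s in records if count_non_n(s) >= min_non_n]

# Toy input: s1 has 8 non-N bases, s2 has only 2.
fasta_text = ">s1\nACGTNNAC\nGT\n>s2\nNNNNACNN\n"
kept = filter_by_non_n(parse_fasta(fasta_text), min_non_n=5)
print([h for h, _ in kept])
```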

CALL TASKS fasta_to_ids

Input Mappings (1)
Input Value
sequences_fasta filter_sequences_by_length.filtered_fasta
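
The IDs produced here drive both the Nextclade call and the later per-sample scatter, which reads them back one per line. A minimal sketch of that step follows, assuming that a sequence ID is the FASTA header text up to the first whitespace; the real task may treat headers differently, and `fasta_ids` is an invented name.

```python
def fasta_ids(fasta_text):
    """Return one ID per FASTA record (assumed: header text up to first whitespace)."""
    ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            ids.append(fields[0] if fields else "")
    return ids

# Write the IDs one per line, then read them back the way read_lines() would.
ids_txt = "\n".join(fasta_ids(">s1 some description\nACGT\n>s2\nGGCC\n")) + "\n"
sample_ids = ids_txt.splitlines()
print(sample_ids)
```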

CALL TASKS nextclade_many_samples

Input Mappings (4)
Input Value
genome_fastas [filter_sequences_by_length.filtered_fasta]
genome_ids_setdefault_blank fasta_to_ids.ids_txt
basename "nextclade-~{flowcell_id}"
dataset_name "sars-cov-2"

CALL TASKS pangolin_many_samples

Input Mappings (2)
Input Value
genome_fastas [filter_sequences_by_length.filtered_fasta]
basename "pangolin-~{flowcell_id}"

CALL TASKS today

No explicit input mappings

CALL TASKS merge_raw → tsv_join

Input Mappings (4)
Input Value
input_tsvs [write_tsv(flatten([[meta_header], metadata])), metadata_raw_tsv]
id_col "sample_sanitized"
out_suffix ".tsv"
out_basename basename(metadata_raw_tsv,'.tsv') + ".relineage_~{today.date}"
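
The first element of `input_tsvs` is built inline from the scatter results: `flatten([[meta_header], metadata])` puts the header row in front of the per-sample rows, and `write_tsv` serializes the result. The Python sketch below models that expression under the assumption that the per-sample task outputs behave like dictionaries keyed by sample ID. All lineage, clade, and version strings are made-up placeholders.

```python
META_HEADER = [
    "sample_sanitized",
    "pango_lineage", "scorpio_call",
    "nextclade_clade", "nextclade_aa_subs", "nextclade_aa_dels",
    "pangolin_version", "nextclade_version",
]

def flatten(nested):
    """Concatenate a list of lists one level deep, like WDL's flatten()."""
    return [item for inner in nested for item in inner]

def build_rows(sample_ids, pangolin, nextclade):
    """Model of the scatter: one row per sample ID via per-sample lookups."""
    return [
        [
            sid,
            pangolin["pango_lineage"][sid],
            pangolin["scorpio_call"][sid],
            nextclade["nextclade_clade"][sid],
            nextclade["aa_subs_csv"][sid],
            nextclade["aa_dels_csv"][sid],
            pangolin["pangolin_versions"],   # same value for every sample
            nextclade["nextclade_version"],  # same value for every sample
        ]
        for sid in sample_ids
    ]

def write_tsv(rows):
    """Serialize rows as tab-separated lines, like WDL's write_tsv()."""
    return "".join("\t".join(row) + "\n" for row in rows)

# Placeholder task outputs for two samples.
pangolin = {
    "pango_lineage": {"s1": "lineage-A", "s2": "lineage-B"},
    "scorpio_call": {"s1": "", "s2": "call-B"},
    "pangolin_versions": "pangolin-version-placeholder",
}
nextclade = {
    "nextclade_clade": {"s1": "clade-A", "s2": "clade-B"},
    "aa_subs_csv": {"s1": "sub1,sub2", "s2": "sub3"},
    "aa_dels_csv": {"s1": "", "s2": "del1"},
    "nextclade_version": "nextclade-version-placeholder",
}

rows = build_rows(["s1", "s2"], pangolin, nextclade)
table = flatten([[META_HEADER], rows])  # header row followed by sample rows
tsv_text = write_tsv(table)
print(tsv_text)
```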

CALL TASKS merge_annotated → tsv_join

Input Mappings (4)
Input Value
input_tsvs [write_tsv(flatten([[meta_header], metadata])), metadata_annotated_tsv]
id_col "sample_sanitized"
out_suffix ".tsv"
out_basename basename(metadata_annotated_tsv,'.tsv') + ".relineage_~{today.date}"
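
Both `tsv_join` calls merge the freshly computed lineage table with an existing metadata file on `sample_sanitized`. The task's implementation is not shown on this page; the sketch below assumes an outer join in which a non-empty value from an earlier file takes precedence over later files, which would explain why the new lineage table is listed first. It also models the `out_basename` expression. The `collection_date` column, the file path, and the date format are hypothetical.

```python
import os

def parse_tsv(text):
    """Split TSV text into (header, list of row dicts)."""
    lines = [line for line in text.splitlines() if line]
    header = lines[0].split("\t")
    rows = [dict(zip(header, line.split("\t"))) for line in lines[1:]]
    return header, rows

def tsv_join(tsv_texts, id_col):
    """Outer-join TSVs on id_col (assumed: earlier files win on non-empty values)."""
    out_header, merged = [], {}
    for text in tsv_texts:
        header, rows = parse_tsv(text)
        for col in header:
            if col not in out_header:
                out_header.append(col)
        for row in rows:
            target = merged.setdefault(row[id_col], {})
            for col, value in row.items():
                if not target.get(col):
                    target[col] = value
    lines = ["\t".join(out_header)]
    for row in merged.values():
        lines.append("\t".join(row.get(col, "") for col in out_header))
    return "\n".join(lines) + "\n"

def out_basename(metadata_path, date):
    """Model of: basename(path, '.tsv') + ".relineage_~{today.date}"."""
    base = os.path.basename(metadata_path)
    if base.endswith(".tsv"):
        base = base[: -len(".tsv")]
    return f"{base}.relineage_{date}"

relineage = "sample_sanitized\tpango_lineage\ns1\tlineage-new\n"
existing = (
    "sample_sanitized\tpango_lineage\tcollection_date\n"
    "s1\tlineage-old\t2021-01-01\n"
    "s2\tlineage-other\t2021-01-02\n"
)
joined = tsv_join([relineage, existing], id_col="sample_sanitized")
name = out_basename("/data/assembly_metadata.tsv", "2024-01-31")
print(joined)
print(name)
```

Under these assumptions, s1 receives the new lineage while keeping its other metadata, and s2, which is absent from the new table, passes through unchanged.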

Images

Container images used by tasks in this workflow:

🐳 ubuntu

ubuntu

Used by 2 tasks:
  • concatenate
  • fasta_to_ids

🐳 Parameterized Image

Configured via input: docker

Used by 3 tasks:
  • filter_sequences_by_length
  • merge_raw
  • merge_annotated

🐳 Parameterized Image

Configured via input: docker

Used by 1 task:
  • nextclade_many_samples

🐳 Parameterized Image

Configured via input: docker

Used by 1 task:
  • pangolin_many_samples

🐳 viral-baseimage

quay.io/broadinstitute/viral-baseimage:0.3.0

Used by 1 task:
  • today

sarscov2_batch_relineage - WDL Source Code

version 1.0

import "../tasks/tasks_nextstrain.wdl" as nextstrain
import "../tasks/tasks_sarscov2.wdl" as sarscov2
import "../tasks/tasks_utils.wdl" as utils

workflow sarscov2_batch_relineage {
    meta {
        description: "Re-call Nextclade and Pangolin lineages on a flowcell's worth of SARS-CoV-2 genomes"
        allowNestedInputs: true
    }

    input {
        String      flowcell_id
        Array[File] genomes_fasta
        File        metadata_annotated_tsv
        File        metadata_raw_tsv
        Int         min_genome_bases = 24000
    }

    call utils.concatenate {
        input:
            infiles     = genomes_fasta,
            output_name = "all-genomes.fasta",
            cpus        = 16
    }

    call utils.filter_sequences_by_length {
        input:
            sequences_fasta = concatenate.combined,
            min_non_N       = min_genome_bases
    }

    call utils.fasta_to_ids {
        input:
            sequences_fasta = filter_sequences_by_length.filtered_fasta
    }

    call nextstrain.nextclade_many_samples {
        input:
            genome_fastas               = [filter_sequences_by_length.filtered_fasta],
            genome_ids_setdefault_blank = fasta_to_ids.ids_txt,
            basename                    = "nextclade-~{flowcell_id}",
            dataset_name                = "sars-cov-2"
    }

    call sarscov2.pangolin_many_samples {
        input:
            genome_fastas = [filter_sequences_by_length.filtered_fasta],
            basename      = "pangolin-~{flowcell_id}"
    }

    # Build one metadata row per sample ID. The per-sample pangolin and
    # nextclade outputs are Maps indexed by sample ID.
    scatter(sample_sanitized in read_lines(fasta_to_ids.ids_txt)) {
        Array[String] metadata = [
            sample_sanitized,
            pangolin_many_samples.pango_lineage[sample_sanitized],
            pangolin_many_samples.scorpio_call[sample_sanitized],
            nextclade_many_samples.nextclade_clade[sample_sanitized],
            nextclade_many_samples.aa_subs_csv[sample_sanitized],
            nextclade_many_samples.aa_dels_csv[sample_sanitized],
            pangolin_many_samples.pangolin_versions,
            nextclade_many_samples.nextclade_version
        ]
    }
    # Outside the scatter, `metadata` is an Array[Array[String]], one row per sample.
    # Column names for those rows, in the same order.
    Array[String] meta_header = [
        'sample_sanitized',
        'pango_lineage', 'scorpio_call',
        'nextclade_clade', 'nextclade_aa_subs', 'nextclade_aa_dels',
        'pangolin_version', 'nextclade_version'
    ]

    call utils.today

    call utils.tsv_join as merge_raw {
        input:
            input_tsvs = [write_tsv(flatten([[meta_header], metadata])),
                metadata_raw_tsv],
            id_col = "sample_sanitized",
            out_suffix = ".tsv",
            out_basename = basename(metadata_raw_tsv, '.tsv') + ".relineage_~{today.date}"
    }

    call utils.tsv_join as merge_annotated {
        input:
            input_tsvs = [write_tsv(flatten([[meta_header], metadata])),
                metadata_annotated_tsv],
            id_col = "sample_sanitized",
            out_suffix = ".tsv",
            out_basename = basename(metadata_annotated_tsv, '.tsv') + ".relineage_~{today.date}"
    }

    output {
        File   assembly_stats_relineage_tsv = merge_raw.out_tsv
        File   assembly_stats_final_relineage_tsv = merge_annotated.out_tsv
        File   nextclade_all_json           = nextclade_many_samples.nextclade_json
        File   nextclade_all_tsv            = nextclade_many_samples.nextclade_tsv
        File   nextclade_auspice_json       = nextclade_many_samples.auspice_json
        File   nextalign_msa                = nextclade_many_samples.nextalign_msa
        File   pangolin_report              = pangolin_many_samples.pango_lineage_report
        File   pangolin_msa                 = pangolin_many_samples.msa_fasta
    }
}