sarscov2_gisaid_ingest
pipes/WDL/workflows/sarscov2_gisaid_ingest.wdl

WORKFLOW sarscov2_gisaid_ingest

File Path pipes/WDL/workflows/sarscov2_gisaid_ingest.wdl
WDL Version 1.0
Type workflow

Imports

Namespace | Path
nextstrain | ../tasks/tasks_nextstrain.wdl
terra | ../tasks/tasks_terra.wdl

Workflow: sarscov2_gisaid_ingest

Sanitize data downloaded from GISAID for use in Nextstrain/augur. See: https://nextstrain.github.io/ncov/data-prep#curate-data-from-the-full-gisaid-database

Inputs

Name | Type | Description | Default
sequences_gisaid_fasta | File | Multiple sequences downloaded from GISAID | -
metadata_gisaid_tsv | File | Tab-separated metadata file for sequences downloaded from GISAID and passed in via sequences_gisaid_fasta. | -
gcs_out | String? | If specified, GCP bucket prefix for storage of the output data. | -
prefix_to_strip | String? | - | -
3 optional inputs with default values

Outputs

Name | Type | Expression
sequences_gisaid_sanitized_fasta | File | sanitize_gisaid.sequences_gisaid_sanitized_fasta
metadata_gisaid_sanitized_tsv | File | sanitize_gisaid.metadata_gisaid_sanitized_tsv

Calls

This workflow calls the following tasks or subworkflows:

CALL TASKS sanitize_gisaid → nextstrain_ncov_sanitize_gisaid_data

Input Mappings (2)
Input | Value
sequences_gisaid_fasta | sequences_gisaid_fasta
metadata_gisaid_tsv | metadata_gisaid_tsv

CALL TASKS gcs_dump → gcs_copy (conditional: runs only when gcs_out is defined)

Input Mappings (2)
Input | Value
infiles | [sanitize_gisaid.metadata_gisaid_sanitized_tsv, sanitize_gisaid.sequences_gisaid_sanitized_fasta]
gcs_uri_prefix | "~{gcs_out}/"
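
In the workflow source, the gcs_dump call sits inside an if(defined(gcs_out)) block, so it is skipped entirely when gcs_out is not supplied. The following is a minimal, self-contained sketch of that pattern, not the workflow's actual code: the task echo_prefix and the workflow conditional_call_sketch are hypothetical stand-ins (the real call is terra.gcs_copy), and the ubuntu:22.04 image is an arbitrary choice for illustration.

```wdl
version 1.0

# Hypothetical stand-in task; this is NOT the real terra.gcs_copy.
task echo_prefix {
    input {
        String prefix
    }
    command <<<
        echo "would copy to ~{prefix}"
    >>>
    output {
        String message = read_string(stdout())
    }
    runtime {
        # Arbitrary image chosen for this sketch only.
        docker: "ubuntu:22.04"
    }
}

# Hypothetical workflow illustrating the conditional-call pattern.
workflow conditional_call_sketch {
    input {
        String? gcs_out
    }

    # The call is scheduled only when the optional input was provided.
    if (defined(gcs_out)) {
        call echo_prefix {
            input:
                prefix = "~{gcs_out}/"
        }
    }

    output {
        # Outputs of a call inside an if-block are optional outside of it.
        String? message = echo_prefix.message
    }
}
```

One consequence of this design is that anything produced inside the conditional block has an optional type when referenced from outside it, which is why the sketch declares its output as String?.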

Images

Container images used by tasks in this workflow:

🐳 Parameterized Image
Configured via input: docker

Used by 1 task:
  • sanitize_gisaid
🐳 viral-baseimage

quay.io/broadinstitute/viral-baseimage:0.3.0

Used by 1 task:
  • gcs_dump


sarscov2_gisaid_ingest - WDL Source Code

version 1.0

import "../tasks/tasks_nextstrain.wdl" as nextstrain
import "../tasks/tasks_terra.wdl" as terra

workflow sarscov2_gisaid_ingest {
    meta {
        description: "Sanitize data downloaded from GISAID for use in Nextstrain/augur. See: https://nextstrain.github.io/ncov/data-prep#curate-data-from-the-full-gisaid-database"
    }
    input {
        File sequences_gisaid_fasta
        File metadata_gisaid_tsv

        String? gcs_out
    }
    parameter_meta {
        sequences_gisaid_fasta: {
            description: "Multiple sequences downloaded from GISAID",
            patterns: ["*.fasta","*.fasta.xz","*.fasta.gz"]
        }
        metadata_gisaid_tsv: {
            description: "Tab-separated metadata file for sequences downloaded from GISAID and passed in via sequences_gisaid_fasta.",
            patterns: ["*.txt", "*.tsv","*.tsv.xz","*.tsv.gz"]
        }
        gcs_out: {
            description: "If specified, GCP bucket prefix for storage of the output data."
        }
    }
    call nextstrain.nextstrain_ncov_sanitize_gisaid_data as sanitize_gisaid {
        input:
            sequences_gisaid_fasta = sequences_gisaid_fasta,
            metadata_gisaid_tsv    = metadata_gisaid_tsv
    }

    if(defined(gcs_out)) {
        call terra.gcs_copy as gcs_dump {
            input:
                infiles        = [sanitize_gisaid.metadata_gisaid_sanitized_tsv, sanitize_gisaid.sequences_gisaid_sanitized_fasta],
                gcs_uri_prefix = "~{gcs_out}/"
        }
    }

    output {
        File sequences_gisaid_sanitized_fasta = sanitize_gisaid.sequences_gisaid_sanitized_fasta
        File metadata_gisaid_sanitized_tsv    = sanitize_gisaid.metadata_gisaid_sanitized_tsv
    }
}
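
Because the workflow takes two File inputs and one optional String, it can also be called as a subworkflow. The sketch below shows one way this might look. It assumes the wrapper file sits in the same directory as sarscov2_gisaid_ingest.wdl; the wrapper name example_gisaid_ingest_wrapper, the call alias ingest_gisaid, and the output names sanitized_fasta and sanitized_tsv are hypothetical and not part of the original pipeline.

```wdl
version 1.0

# Assumption: this wrapper lives next to sarscov2_gisaid_ingest.wdl.
import "sarscov2_gisaid_ingest.wdl" as ingest

# Hypothetical wrapper workflow, for illustration only.
workflow example_gisaid_ingest_wrapper {
    input {
        File    sequences_gisaid_fasta
        File    metadata_gisaid_tsv
        String? gcs_out
    }

    # Pass the inputs straight through; gcs_out may be left undefined,
    # in which case the subworkflow skips its upload step.
    call ingest.sarscov2_gisaid_ingest as ingest_gisaid {
        input:
            sequences_gisaid_fasta = sequences_gisaid_fasta,
            metadata_gisaid_tsv    = metadata_gisaid_tsv,
            gcs_out                = gcs_out
    }

    output {
        # Re-export the two sanitized files under hypothetical names.
        File sanitized_fasta = ingest_gisaid.sequences_gisaid_sanitized_fasta
        File sanitized_tsv   = ingest_gisaid.metadata_gisaid_sanitized_tsv
    }
}
```

Downstream calls in such a wrapper can then consume ingest_gisaid.sequences_gisaid_sanitized_fasta and ingest_gisaid.metadata_gisaid_sanitized_tsv directly, since both are declared as non-optional File outputs.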