sarscov2_biosample_load

File Path: pipes/WDL/workflows/sarscov2_biosample_load.wdl
WDL Version: 1.0
Type: workflow

Imports

Namespace   Path
ncbi_tools  ../tasks/tasks_ncbi_tools.wdl
sarscov2    ../tasks/tasks_sarscov2.wdl
utils       ../tasks/tasks_utils.wdl

Workflow: sarscov2_biosample_load

Load Broad CRSP metadata and register samples with NCBI BioSample. Return attributes table, id map, etc.

Author: Broad Viral Genomics
Email: viral-ngs@broadinstitute.org

Subworkflow Usage

This workflow is called as a subworkflow by 1 other workflow.

Inputs

Name                  Type    Description  Default
sample_meta_crsp      File?   -            -
id_salt               File    -            -
biosample_submit_tsv  File?   -            -
bioproject            String  -            -
ftp_config_js         File    -            -
14 optional inputs with default values

Outputs

Name                  Type           Expression
biosample_attributes  File           tsv_join.out_tsv
id_map_tsv            File?          crsp_meta_etl.collab_ids_tsv
collab_ids_tsv        File?          crsp_meta_etl.collab_ids_tsv
collab_ids_addcols    Array[String]  select_first([crsp_meta_etl.collab_ids_addcols, []])

Calls

This workflow calls the following tasks or subworkflows:

CALL TASKS crsp_meta_etl

Input Mappings (3)
Input             Value
sample_meta_crsp  select_first([sample_meta_crsp])
salt              read_string(id_salt)
bioproject        bioproject

CALL TASKS md5sum

Input Mappings (1)
Input    Value
in_file  meta_submit_tsv

CALL TASKS biosample_tsv_filter_preexisting

Input Mappings (2)
Input            Value
meta_submit_tsv  meta_submit_tsv
out_basename     basename(meta_submit_tsv, '.tsv')

CALL TASKS biosample_submit_tsv_ftp_upload

Input Mappings (3)
Input            Value
meta_submit_tsv  biosample_tsv_filter_preexisting.meta_unsubmitted_tsv
config_js        ftp_config_js
target_path      "/~{prod_test}/biosample/~{basename(meta_submit_tsv, '.tsv')}/~{md5sum.md5}"
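
The target_path value above is built by string interpolation from three pieces: the prod_test setting, the submission table's file name without its .tsv extension, and the md5sum task's output. A minimal sketch of that expression in isolation follows. The workflow name, the example path, and the checksum string are hypothetical stand-ins, not values from the pipeline.

```wdl
version 1.0

# Hypothetical standalone workflow that only evaluates the path expression.
workflow target_path_demo {
    input {
        String prod_test = "Test"                   # "Production" or "Test"
        String tsv_path  = "/data/batch_2021_03.tsv" # hypothetical submission table path
        String md5       = "0123abcd"               # hypothetical stand-in for md5sum.md5
    }

    # basename() strips the directory and the given suffix: "batch_2021_03"
    String target_path = "/~{prod_test}/biosample/~{basename(tsv_path, '.tsv')}/~{md5}"

    output {
        # evaluates to "/Test/biosample/batch_2021_03/0123abcd"
        String path = target_path
    }
}
```

Because the checksum is the last path component, two submission tables that share a file name but differ in content would resolve to different target paths.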

CALL TASKS tsv_join

Input Mappings (3)
Input         Value
input_tsvs    select_all([biosample_tsv_filter_preexisting.biosample_attributes_tsv, biosample_submit_tsv_ftp_upload.attributes_tsv, meta_submit_tsv])
id_col        "isolate"
out_basename  basename(meta_submit_tsv, '.tsv') + "-attributes"
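
Outputs of calls placed inside an if block are optional when referenced outside it, which is why this workflow wraps them in select_first and select_all. The sketch below illustrates the two functions with plain strings. The workflow and variable names are hypothetical and chosen only for illustration.

```wdl
version 1.0

# Hypothetical demo of the WDL 1.0 functions select_first and select_all.
workflow optional_merge_demo {
    input {
        String? supplied                # may be left unset by the caller
        String  fallback = "from_etl"   # always defined
    }

    # first defined element; the run fails if every element is undefined
    String chosen = select_first([supplied, fallback])

    # all defined elements, in order, with undefined ones dropped
    Array[String] present = select_all([supplied, fallback])

    output {
        String        chosen_value   = chosen    # "from_etl" when supplied is unset
        Array[String] present_values = present   # ["from_etl"] when supplied is unset
    }
}
```

In the same way, tsv_join receives the upload call's attributes table only when that call actually ran.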

Images

Container images used by tasks in this workflow:

🐳 ubuntu
Image: ubuntu
Used by 1 task:
  • md5sum

🐳 Parameterized Image
Configured via input: docker
Used by 2 tasks:
  • biosample_tsv_filter_preexisting
  • biosample_submit_tsv_ftp_upload

🐳 viral-core
Image: quay.io/broadinstitute/viral-core:2.5.1
Used by 1 task:
  • tsv_join

🐳 Parameterized Image
Configured via input: docker
Used by 1 task:
  • crsp_meta_etl

sarscov2_biosample_load - WDL Source Code

version 1.0

import "../tasks/tasks_ncbi_tools.wdl" as ncbi_tools
import "../tasks/tasks_sarscov2.wdl" as sarscov2
import "../tasks/tasks_utils.wdl" as utils


workflow sarscov2_biosample_load {
    meta {
        description: "Load Broad CRSP metadata and register samples with NCBI BioSample. Return attributes table, id map, etc."
        author: "Broad Viral Genomics"
        email:  "viral-ngs@broadinstitute.org"
        allowNestedInputs: true
    }

    input {
        File?  sample_meta_crsp
        File   id_salt
        File?  biosample_submit_tsv
        String bioproject
        File   ftp_config_js
        String prod_test = "Production" # Production or Test
    }

    # derive the submission table from CRSP metadata only when one was not supplied
    if(!defined(biosample_submit_tsv)) {
        call sarscov2.crsp_meta_etl {
            input:
                sample_meta_crsp = select_first([sample_meta_crsp]),
                salt = read_string(id_salt),
                bioproject = bioproject
        }
    }

    # prefer the supplied table; otherwise use the one produced by crsp_meta_etl
    File meta_submit_tsv = select_first([biosample_submit_tsv, crsp_meta_etl.biosample_submit_tsv])

    # checksum of the submission table, used below in the FTP target path
    call utils.md5sum {
        input:
            in_file = meta_submit_tsv
    }

    # see if anything already exists in NCBI
    call ncbi_tools.biosample_tsv_filter_preexisting {
        input:
            meta_submit_tsv = meta_submit_tsv,
            out_basename = basename(meta_submit_tsv, '.tsv')
    }

    # register anything that isn't already in NCBI
    if (biosample_tsv_filter_preexisting.num_not_found > 0) {
        call ncbi_tools.biosample_submit_tsv_ftp_upload {
            input:
                meta_submit_tsv = biosample_tsv_filter_preexisting.meta_unsubmitted_tsv,
                config_js = ftp_config_js,
                target_path = "/~{prod_test}/biosample/~{basename(meta_submit_tsv, '.tsv')}/~{md5sum.md5}"
        }
    }

    # merge all results and attributes
    call utils.tsv_join {
        input:
            input_tsvs = select_all([
                biosample_tsv_filter_preexisting.biosample_attributes_tsv,
                biosample_submit_tsv_ftp_upload.attributes_tsv,
                meta_submit_tsv
            ]),
            id_col = "isolate",
            out_basename = basename(meta_submit_tsv, '.tsv') + "-attributes"
    }

    output {
        File           biosample_attributes = tsv_join.out_tsv
        File?          id_map_tsv           = crsp_meta_etl.collab_ids_tsv
        File?          collab_ids_tsv       = crsp_meta_etl.collab_ids_tsv
        Array[String]  collab_ids_addcols   = select_first([crsp_meta_etl.collab_ids_addcols, []])
    }

}
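
Since the page notes that this workflow is also used as a subworkflow, a minimal sketch of a caller may help. The parent workflow name, the import location (a sibling file in the same workflows directory), and the choice of "Test" for prod_test are assumptions for illustration only. They do not describe the actual calling workflow.

```wdl
version 1.0

# Assumed location: a sibling of sarscov2_biosample_load.wdl in the workflows directory.
import "sarscov2_biosample_load.wdl" as biosample_load

# Hypothetical parent workflow.
workflow register_samples_example {
    input {
        File   biosample_submit_tsv   # pre-built table, so the crsp_meta_etl step is skipped
        File   id_salt                # still required by the subworkflow's input block
        String bioproject
        File   ftp_config_js
    }

    call biosample_load.sarscov2_biosample_load as load {
        input:
            biosample_submit_tsv = biosample_submit_tsv,
            id_salt              = id_salt,
            bioproject           = bioproject,
            ftp_config_js        = ftp_config_js,
            prod_test            = "Test"
    }

    output {
        File          biosample_attributes = load.biosample_attributes
        File?         collab_ids_tsv       = load.collab_ids_tsv
        Array[String] collab_ids_addcols   = load.collab_ids_addcols
    }
}
```

When biosample_submit_tsv is supplied this way, collab_ids_tsv is undefined and collab_ids_addcols falls back to an empty array, because both come from the skipped crsp_meta_etl call.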