reconstruct_from_alignments
pipes/WDL/workflows/reconstruct_from_alignments.wdl

WORKFLOW reconstruct_from_alignments

File Path     pipes/WDL/workflows/reconstruct_from_alignments.wdl
WDL Version   1.0
Type          workflow

Imports

Namespace     Path
interhost     ../tasks/tasks_interhost.wdl
intrahost     ../tasks/tasks_intrahost.wdl
nextstrain    ../tasks/tasks_nextstrain.wdl
reports       ../tasks/tasks_reports.wdl
read_utils    ../tasks/tasks_read_utils.wdl
utils         ../tasks/tasks_utils.wdl

Workflow: reconstruct_from_alignments

Infers disease transmission events from sequence data (consensus sequences plus intrahost variation) using the reconstructR tool.

Author: Broad Viral Genomics
Email: viral-ngs@broadinstitute.org

Inputs

Name                   Type           Description                                              Default
aligned_trimmed_bams   Array[File]+   Aligned reads (against ref_fasta), one BAM per sample    -
assembly_fastas        Array[File]+   Consensus assemblies to combine and align with MAFFT     -
ref_fasta              File           Reference genome for the MSA, iSNV calls, and coverage   -
max_coverage_depth     Int?           -                                                        -
base_q_threshold       Int?           -                                                        -
mapping_q_threshold    Int?           -                                                        -
read_length_threshold  Int?           -                                                        -
plotXLimits            String?        -                                                        -
plotYLimits            String?        -                                                        -
date_csv               File           -                                                        -
n_iters                Int            -                                                        -
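A further 26 optional inputs have default values and are not listed. Only aligned_trimmed_bams, assembly_fastas, and ref_fasta are declared in the workflow's own input block; the other inputs belong to the called tasks and are exposed because the workflow sets allowNestedInputs: true.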
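The Inputs table includes date_csv, n_iters, and the coverage thresholds even though the workflow's input block declares only three inputs. The minimal WDL sketch below shows that mechanism under stated assumptions: a hypothetical task summarize (not part of viral-ngs) has a required n_iters input that its call leaves unsupplied, and the workflow sets allowNestedInputs: true so that engines honoring it, such as Cromwell, accept the value from the inputs JSON. The ubuntu:22.04 image is a placeholder choice.

```wdl
version 1.0

# Hypothetical task for illustration only; not part of viral-ngs.
# Its input names loosely mirror date_csv and n_iters from the Inputs table.
task summarize {
    input {
        File dates
        Int  n_iters        # required, but not supplied by the call below
        Int  seed = 1       # has a default, so it counts as an optional input
    }
    command <<<
        echo "n_iters=~{n_iters} seed=~{seed}"
        wc -l < ~{dates}
    >>>
    output {
        File report = stdout()
    }
    runtime {
        docker: "ubuntu:22.04"
    }
}

workflow nested_inputs_demo {
    meta {
        allowNestedInputs: true
    }
    input {
        File dates
    }
    # n_iters is left open here; with allowNestedInputs the engine can take it
    # from the inputs JSON as nested_inputs_demo.summarize.n_iters
    call summarize {
        input:
            dates = dates
    }
    output {
        File report = summarize.report
    }
}
```

In this sketch the inputs JSON would carry nested_inputs_demo.dates and nested_inputs_demo.summarize.n_iters. By the same Cromwell convention, a required input of reconstruct_from_alignments that is not in its input block would be keyed reconstruct_from_alignments.<call>.<input>, where <call> is the call name shown under Calls (an alias such as isnvs_ref, not the underlying task name lofreq).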

Outputs

Name                             Type          Expression
msa_fasta                        File          mafft.aligned_sequences
lofreq_isnvs                     Array[File]   isnvs_ref.report_vcf
depth_csv                        File          merge_coverage_per_position.coverage_multi_sample_per_position_csv
reconstructr_tabulated_tsv_gz    File          reconstructr.tabulated_tsv_gz
reconstructr_deciphered_tsv_gz   File          reconstructr.deciphered_tsv_gz

Calls

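This workflow calls the following tasks. get_bam_samplename, isnvs_ref, and plot_ref_coverage run inside scatter(bam in aligned_trimmed_bams), once per BAM: bam in their input mappings is the scatter variable, and their outputs reach merge_coverage_per_position and reconstructr as arrays in the order of aligned_trimmed_bams.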

CALL TASKS zcat

Input Mappings (2)
Input          Value
infiles        assembly_fastas
output_name    "all_assemblies.fasta.gz"

CALL TASKS mafft → mafft_one_chr

Input Mappings (4)
Input               Value
sequences           zcat.combined
ref_fasta           ref_fasta
remove_reference    true
basename            "mafft_msa.fasta"

CALL TASKS get_bam_samplename

Input Mappings (1)
Input    Value
bam      bam

CALL TASKS isnvs_ref → lofreq

Input Mappings (3)
Input              Value
reference_fasta    ref_fasta
aligned_bam        bam
out_basename       get_bam_samplename.sample_name

CALL TASKS plot_ref_coverage → plot_coverage

Input Mappings (2)
Input                Value
aligned_reads_bam    bam
sample_name          get_bam_samplename.sample_name

CALL TASKS merge_coverage_per_position

Input Mappings (2)
Input            Value
coverage_tsvs    plot_ref_coverage.coverage_tsv
ref_fasta        ref_fasta

CALL TASKS reconstructr

Input Mappings (4)
Input          Value
ref_fasta      ref_fasta
lofreq_vcfs    isnvs_ref.report_vcf
msa_fasta      mafft.aligned_sequences
depth_csv      merge_coverage_per_position.coverage_multi_sample_per_position_csv
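Several of the mappings above cross the scatter boundary: get_bam_samplename.sample_name is used as a single value within the same iteration, while isnvs_ref.report_vcf and plot_ref_coverage.coverage_tsv arrive at reconstructr and merge_coverage_per_position as arrays. The sketch below reproduces that gather pattern with two hypothetical tasks, get_name and join_names, and a placeholder ubuntu:22.04 image; it is illustrative only and not part of viral-ngs.

```wdl
version 1.0

# Hypothetical tasks for illustration only; not part of viral-ngs.
task get_name {
    input {
        File bam
    }
    command <<<
        basename ~{bam} .bam
    >>>
    output {
        String name = read_string(stdout())
    }
    runtime {
        docker: "ubuntu:22.04"
    }
}

task join_names {
    input {
        Array[String] names
    }
    command <<<
        cat ~{write_lines(names)}
    >>>
    output {
        File listing = stdout()
    }
    runtime {
        docker: "ubuntu:22.04"
    }
}

workflow scatter_gather_demo {
    input {
        Array[File]+ bams
    }
    scatter (bam in bams) {
        # Inside the scatter, bam is one File and get_name.name is a String.
        call get_name {
            input:
                bam = bam
        }
    }
    # Outside the scatter, get_name.name is gathered into Array[String],
    # in the same order as bams.
    call join_names {
        input:
            names = get_name.name
    }
    output {
        Array[String] names   = get_name.name
        File          listing = join_names.listing
    }
}
```

Iterations may run in parallel, but WDL keeps gathered arrays in the order of the scattered array, which is how each file in lofreq_isnvs lines up with its BAM in aligned_trimmed_bams.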

Images

Container images used by tasks in this workflow:

🐳 viral-core

quay.io/broadinstitute/viral-core:2.5.1

Used by 2 tasks:
  • zcat
  • get_bam_samplename

🐳 Parameterized Image

Configured via input: docker

Used by 2 tasks:
  • mafft
  • isnvs_ref

🐳 Parameterized Image

Image string: ~{docker}

Used by 2 tasks:
  • merge_coverage_per_position
  • plot_ref_coverage

🐳 Parameterized Image

Configured via input: docker

Used by 1 task:
  • reconstructr
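A "parameterized" image means the task reads its container from a String input rather than a hard-coded runtime value. The sketch below shows that pattern with a hypothetical task count_fasta_records (not part of viral-ngs); the default image ubuntu:22.04 is a placeholder assumption, not one of the images this workflow uses.

```wdl
version 1.0

# Hypothetical task for illustration only; not part of viral-ngs.
task count_fasta_records {
    input {
        File   fasta
        # Overridable container image; this default is a placeholder.
        String docker = "ubuntu:22.04"
    }
    command <<<
        # grep -c prints 0 but exits 1 when nothing matches; || true keeps the task from failing
        grep -c '^>' ~{fasta} || true
    >>>
    output {
        Int n_records = read_int(stdout())
    }
    runtime {
        # Same effect as docker: "~{docker}"
        docker: docker
    }
}

workflow docker_override_demo {
    meta {
        allowNestedInputs: true
    }
    input {
        File fasta
    }
    call count_fasta_records {
        input:
            fasta = fasta
    }
    output {
        Int n_records = count_fasta_records.n_records
    }
}
```

With nested inputs allowed, the image in this sketch could be overridden from the inputs JSON as docker_override_demo.count_fasta_records.docker. For this workflow the analogous key would presumably be reconstruct_from_alignments.<call>.docker, using the call alias (for example isnvs_ref).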

reconstruct_from_alignments - WDL Source Code

version 1.0

import "../tasks/tasks_interhost.wdl" as interhost
import "../tasks/tasks_intrahost.wdl" as intrahost
import "../tasks/tasks_nextstrain.wdl" as nextstrain
import "../tasks/tasks_reports.wdl" as reports
import "../tasks/tasks_read_utils.wdl" as read_utils
import "../tasks/tasks_utils.wdl" as utils

workflow reconstruct_from_alignments {
    meta {
        description: "Infer disease transmission events from sequence (consensus + intrahost variation) data using the reconstructR tool"
        author: "Broad Viral Genomics"
        email:  "viral-ngs@broadinstitute.org"
        allowNestedInputs: true
    }

    input {
        Array[File]+   aligned_trimmed_bams
        Array[File]+   assembly_fastas
        File           ref_fasta
    }

    # create multiple sequence alignment of fastas
    call utils.zcat {
        input:
            infiles     = assembly_fastas,
            output_name = "all_assemblies.fasta.gz"
    }
    call nextstrain.mafft_one_chr as mafft {
        input:
            sequences = zcat.combined,
            ref_fasta = ref_fasta,
            remove_reference = true,
            basename  = "mafft_msa.fasta"
    }

    # call iSNVs with lofreq and calculate coverage depths
    scatter(bam in aligned_trimmed_bams) {
        call read_utils.get_bam_samplename {
            input:
                bam = bam
        }
        call intrahost.lofreq as isnvs_ref {
            input:
                reference_fasta = ref_fasta,
                aligned_bam     = bam,
                out_basename    = get_bam_samplename.sample_name
        }
        call reports.plot_coverage as plot_ref_coverage {
            input:
                aligned_reads_bam = bam,
                sample_name       = get_bam_samplename.sample_name
        }
    }
    call reports.merge_coverage_per_position {
        input:
            coverage_tsvs = plot_ref_coverage.coverage_tsv,
            ref_fasta = ref_fasta
    }

    # call reconstructR
    call interhost.reconstructr {
        input:
            ref_fasta = ref_fasta,
            lofreq_vcfs = isnvs_ref.report_vcf,
            msa_fasta = mafft.aligned_sequences,
            depth_csv = merge_coverage_per_position.coverage_multi_sample_per_position_csv
    }

    output {
        File         msa_fasta                      = mafft.aligned_sequences
        Array[File]  lofreq_isnvs                   = isnvs_ref.report_vcf
        File         depth_csv                      = merge_coverage_per_position.coverage_multi_sample_per_position_csv
        File         reconstructr_tabulated_tsv_gz  = reconstructr.tabulated_tsv_gz
        File         reconstructr_deciphered_tsv_gz = reconstructr.deciphered_tsv_gz
    }
}