DEEPVARIANT

Genaro Pimienta
Feb 7

Updated: Feb 9

In this blog post, I explain the architecture of DeepVariant, developed by the Google Brain team.

Published in 2018, DeepVariant is a neural network-based variant caller that discovers genomic variants with 99.9% accuracy.

A universal SNP and small-indel variant caller using deep neural networks —2018

Unlike Bayesian variant callers (e.g., HaplotypeCaller), DeepVariant calls SNPs and INDELs from data produced by several sequencing technologies, using a model trained for each:

Illumina short reads
PacBio long reads
Oxford Nanopore ultralong reads

SNP: single nucleotide polymorphism

INDEL: insertion or deletion

Before DeepVariant became available, the gold-standard variant caller used in actionable genomics was HaplotypeCaller.

Embedded in the Genome Analysis Toolkit (GATK), HaplotypeCaller uses Bayesian statistics to identify genomic variants with ~99.9% accuracy.

Scaling accurate genetic variant discovery to tens of thousands of samples —2017

However, because the Bayesian model in HaplotypeCaller (and in most other variant callers available today) is optimized for Illumina short reads, it underperforms on long reads generated by PacBio or Oxford Nanopore Technologies (ONT).

DEEPVARIANT

DeepVariant is based on a convolutional neural network (CNN) with an Inception architecture.

Named GoogLeNet (or Inception-v1) when first published in 2014, this CNN architecture won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) for image classification.

Going deeper with convolutions —2014

Thus far, four versions of Inception have been developed. DeepVariant is based on Inception-v3.

Rethinking the Inception Architecture for Computer Vision —2015

The Inception architecture has four components (Figure 1):

Input Tensor
Stem block
Body block
Head block

Figure 1. Inception CNN architecture. Multiple copies of the Inception module are implemented in this type of CNN. For simplicity, only one Inception module is shown in this cartoon.

In the following sections I describe these four components in the context of DeepVariant.

Reading my previous blog posts WHAT IS AN ARTIFICIAL NEURAL NETWORK? and CONVOLUTIONAL NEURAL NETWORKS will help you understand the text below.

INPUT TENSOR

The input Tensor is an image of the pileup output generated by a read mapping algorithm (Figure 2). Its height corresponds to the reads stacked at each position in the genome (read mapping coverage), and its width to the window of genomic positions covered by the image.

Figure 2. Pileup output. Read mapping algorithms: BWA-MEM for Illumina short reads and Minimap2 for PacBio and Oxford Nanopore long reads.
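To make the height and width concrete, here is a minimal sketch that lays aligned reads out as a grid, one row per read and one column per genome position. It assumes gap-free reads (no insertions or deletions) and uses letters instead of numeric channels; `AlignedRead` and `pileup_grid` are hypothetical illustrations, not DeepVariant code, which also handles indels, read ordering, and much larger windows.

```python
from dataclasses import dataclass


@dataclass
class AlignedRead:
    start: int  # 0-based reference position of the read's first aligned base
    bases: str  # aligned bases; assumed gap-free for simplicity


def pileup_grid(reads, window_start, width, height, empty="."):
    """Lay reads out as a height x width grid: one row per read, one column per position."""
    grid = [[empty] * width for _ in range(height)]
    for row, read in enumerate(reads[:height]):  # reads beyond `height` rows are dropped
        for offset, base in enumerate(read.bases):
            col = read.start + offset - window_start  # genome position -> column index
            if 0 <= col < width:  # keep only bases that fall inside the window
                grid[row][col] = base
    return grid


reads = [AlignedRead(0, "ACGTA"), AlignedRead(2, "GTTCA"), AlignedRead(4, "ACC")]
for line in pileup_grid(reads, window_start=1, width=6, height=4):
    print("".join(line))
```

In this toy window, column 3 (genome position 4) holds A in the first and third reads but T in the second: the kind of mismatch column the CNN learns to interpret.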

The depth of the Tensor image consists of six channels, each holding one feature per read and per position, taken from the sequencing and read mapping output (Figure 3).

The six features are:

Read base (sequencing instrument)
Base call quality (sequencing instrument)
Read mapping quality (read mapping algorithm)
Read alignment strand (read mapping algorithm)
Metadata tag: mapped read supports predicted variant (read mapping algorithm)
Metadata tag: mapped base differs from reference (read mapping algorithm)

Figure 3. 3D Tensor with six feature channels.
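As a sketch of how the depth of one pixel could be filled, the function below maps the six features listed above to numbers between 0 and 1. The name `encode_cell`, the base codes, and the quality caps (`max_base_q=40`, `max_map_q=60`) are assumptions for illustration only; DeepVariant uses its own encoding constants.

```python
# Illustrative scaling only: DeepVariant uses its own encoding constants.
BASE_CODES = {"A": 0.25, "C": 0.5, "G": 0.75, "T": 1.0}  # hypothetical base-to-value map


def encode_cell(base, base_quality, mapping_quality, forward_strand,
                supports_variant, ref_base, max_base_q=40, max_map_q=60):
    """Return the six channel values for one base of one read (one pixel of the Tensor)."""
    return [
        BASE_CODES.get(base, 0.0),                    # 1. read base
        min(base_quality, max_base_q) / max_base_q,   # 2. base call quality, capped and scaled
        min(mapping_quality, max_map_q) / max_map_q,  # 3. read mapping quality, capped and scaled
        1.0 if forward_strand else 0.0,               # 4. alignment strand
        1.0 if supports_variant else 0.0,             # 5. read supports the candidate variant
        1.0 if base != ref_base else 0.0,             # 6. base differs from the reference
    ]


print(encode_cell("T", 30, 60, True, True, ref_base="A"))
print(encode_cell("A", 50, 20, False, False, ref_base="A"))
```

Stacking one such six-value vector for every (read, position) cell of the pileup grid yields a height x width x 6 Tensor.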

STEM BLOCK

Equipped with three convolutional layers (dimensions 7x7, 1x1, and 3x3) and two pooling layers (dimensions 3x3), the stem block extracts low-level representational features from the input Tensor (Figure 4).
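Strided convolutions and pooling in the stem shrink the height and width of the input. The sketch below traces this with the standard output-size formula, out = floor((n + 2p - k) / s) + 1. The strides and padding (GoogLeNet-style: stride 2 for the 7x7 convolution and both pools) and the 100 x 221 input are assumptions chosen for illustration; channel counts are not modelled.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Output length along one axis: floor((n + 2*padding - kernel) / stride) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


# Assumed stem layers as (name, kernel, stride, padding).
STEM = [
    ("conv 7x7", 7, 2, 3),
    ("pool 3x3", 3, 2, 1),
    ("conv 1x1", 1, 1, 0),
    ("conv 3x3", 3, 1, 1),
    ("pool 3x3", 3, 2, 1),
]


def trace_stem(height, width, layers=STEM):
    """Return (layer name, height, width) after each stem layer."""
    shapes = []
    for name, k, s, p in layers:
        height, width = out_size(height, k, s, p), out_size(width, k, s, p)
        shapes.append((name, height, width))
    return shapes


for name, h, w in trace_stem(100, 221):
    print(f"{name}: {h} x {w}")
```

Under these assumptions, only the strided layers reduce the grid; the 1x1 and padded 3x3 convolutions preserve height and width.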

In the stem block, the 1x1 convolutional layer, also called a bottleneck layer, is an adaptation of the Network-in-Network concept published in 2014.

Network In Network —2014

In DeepVariant, the bottleneck layer keeps the height and width of its input intact while changing (typically reducing) the number of channels. Fewer channels make the convolutions that follow cheaper to compute, without sacrificing the level of detail of the features extracted from the input dataset.
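A 1x1 convolution is a per-pixel linear mix of channels, which is why it preserves height and width. The minimal sketch below shows this on nested lists (no bias, no activation), then counts multiplications for an illustrative 5x5 convolution with and without a 1x1 bottleneck. The feature map, weights, and layer sizes (28 x 28 map, 256 input channels, 16-channel bottleneck, 32 output channels) are hypothetical values, not DeepVariant's.

```python
def conv1x1(feature_map, weights):
    """Apply a 1x1 convolution.

    feature_map: H x W x C_in nested lists; weights: C_out x C_in.
    Returns H x W x C_out: each output pixel mixes the channels of the same
    input pixel, so height and width are unchanged.
    """
    return [
        [[sum(w * x for w, x in zip(w_row, pixel)) for w_row in weights] for pixel in row]
        for row in feature_map
    ]


def conv_multiplies(h, w, k, c_in, c_out):
    """Multiplications for a k x k convolution producing an h x w x c_out output."""
    return h * w * k * k * c_in * c_out


fmap = [[[1, 2, 3, 4], [0, 1, 0, 1]],
        [[2, 2, 2, 2], [1, 0, 0, 0]]]   # 2 x 2 x 4 input
weights = [[0.25, 0.25, 0.25, 0.25],    # output channel 0: mean of the 4 inputs
           [1, 0, 0, -1]]               # output channel 1: first minus last input
print(conv1x1(fmap, weights))           # 2 x 2 x 2 output

# 5x5 convolution (256 -> 32 channels) on a 28 x 28 map,
# directly versus after a 1x1 bottleneck down to 16 channels.
direct = conv_multiplies(28, 28, 5, 256, 32)
bottleneck = conv_multiplies(28, 28, 1, 256, 16) + conv_multiplies(28, 28, 5, 16, 32)
print(direct, bottleneck)
```

With these sizes the bottleneck path needs roughly 12 times fewer multiplications, which is the efficiency argument made above.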

Check this playlist: C4W1L01 Computer Vision

BODY BLOCK

The body block consists of multiple Inception modules, which extract high-level features from the incoming layer (the last layer of the stem block) in a computationally efficient manner (Figure 6).

An Inception module runs several convolutional branches in parallel, built from bottleneck (1x1) filters and asymmetric (1x3 and 3x1) filters, and concatenates their outputs (Figure 6). By factorizing large filters into such smaller ones, and by stacking multiple Inception modules, the CNN avoids computationally taxing 7x7 and 5x5 filters without losing representational depth.

The depth (level of detail) of the features extracted by the stacked Inception modules is what makes DeepVariant so accurate at predicting genomic variants.

Figure 6. Inception module. The depth of the CNN in DeepVariant is proportional to the number of Inception modules stacked in the Body block.
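The savings from factorizing large filters can be checked with simple arithmetic, of the kind used to motivate factorization in Inception-v3. This sketch counts weights per (input channel, output channel) pair and the receptive field of each filter stack, assuming stride 1 and an equal number of channels in every layer of a stack; it ignores biases, activations, and the parallel branches of a real module. The function names are hypothetical.

```python
def weights_per_channel_pair(kernels):
    """Total weights of a stack of filters, per (input channel, output channel) pair."""
    return sum(h * w for h, w in kernels)


def receptive_field(kernels):
    """Height and width of the input region seen by a stride-1 stack of filters."""
    return (1 + sum(h - 1 for h, _ in kernels), 1 + sum(w - 1 for _, w in kernels))


# Each entry: (original filter, factorized stack of smaller filters).
FACTORIZATIONS = {
    "5x5 -> 3x3 + 3x3": ([(5, 5)], [(3, 3), (3, 3)]),
    "3x3 -> 1x3 + 3x1": ([(3, 3)], [(1, 3), (3, 1)]),
    "7x7 -> 1x7 + 7x1": ([(7, 7)], [(1, 7), (7, 1)]),
}

for name, (full, factorized) in FACTORIZATIONS.items():
    before = weights_per_channel_pair(full)
    after = weights_per_channel_pair(factorized)
    print(f"{name}: receptive field {receptive_field(factorized)}, "
          f"{before} -> {after} weights ({1 - after / before:.0%} fewer)")
```

Each factorized stack sees the same input region as the filter it replaces while using fewer weights; an Inception module then concatenates the outputs of its parallel branches along the channel axis.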

HEAD BLOCK

The head block comprises one or more multilayer perceptron classifiers. The first classifier ingests a flattened vector containing the features extracted by the Inception modules.

From these features, the multilayer perceptron predicts whether the pileup contains a genomic variant at the candidate site, expressed as probabilities for three genotype classes: homozygous reference (no variant), heterozygous, and homozygous alternate.
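A minimal sketch of the head's last step, assuming a single dense layer followed by softmax stands in for the multilayer perceptron. The feature map, weights, and function names are hypothetical and untrained; the three outputs correspond to the genotype classes DeepVariant reports for a candidate site (homozygous reference, heterozygous, homozygous alternate).

```python
import math

GENOTYPES = ("hom-ref (0/0)", "het (0/1)", "hom-alt (1/1)")


def flatten(feature_map):
    """H x W x C nested lists -> one flat list of features."""
    return [v for row in feature_map for pixel in row for v in pixel]


def dense(x, weights, biases):
    """Fully connected layer; weights has shape n_out x n_in."""
    return [sum(w * xi for w, xi in zip(w_row, x)) + b for w_row, b in zip(weights, biases)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


features = [[[0.2, 0.9]], [[0.4, 0.1]]]  # hypothetical 2 x 1 x 2 feature map
x = flatten(features)
weights = [[1.0, -1.0, 0.5, 0.0],        # hypothetical, untrained weights
           [-0.5, 2.0, 0.0, 1.0],
           [0.0, 0.5, -1.0, 0.5]]
probs = softmax(dense(x, weights, biases=[0.0, 0.0, 0.0]))
for genotype, p in zip(GENOTYPES, probs):
    print(f"{genotype}: {p:.3f}")
```

The softmax turns the three scores into probabilities that sum to 1, so the predicted genotype is simply the class with the highest probability.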