
RECURRENT NEURAL NETWORKS


In my previous blogpost entitled WHAT IS AN ARTIFICIAL NEURAL NETWORK?, I explained that feedforward neural networks (FNN) are the prototypical deep learning algorithm. I also mentioned that FNNs are used in regression and classification tasks, and that their performance is modulated by activation functions and numeric constants (weights and biases), which are fine-tuned during model training (Figure 1).



Figure 1. Deep neural network. With the exception of the circles labeled softmax in the output layer, the symbols in this figure are the same as in the figure from my previous blogpost. Softmax is an activation function used for probability classification tasks.


In this blogpost I discuss recurrent neural networks (RNN), which are a specialized form of FNN.


RECURRENT NEURAL NETWORKS

RNNs are a type of deep learning architecture used to interpret patterns in sequential data, such as the words in a sentence or paragraph. RNNs are often used for language translation and word prediction.


When exposed to sequential data, the FNN embedded in the RNN operates as a recurrent unit: it interprets a sentence (or paragraph) one word at a time (Figure 2).



Figure 2. RNNs are feedforward networks with a hidden state and a feedback loop. Two activation functions are used in RNNs: neuron activation in the feedforward network is transformed with a ReLU function, and the hidden state, updated at each step, is transformed with a tanh function. The output is represented as ŷ to distinguish it from the target value (y).
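To make Figure 2 more concrete, here is a minimal sketch of a single recurrent step in NumPy. The dimensions, the random weights, and names such as recurrent_step are toy placeholders of my own rather than values from a trained model; the hidden state is updated with tanh and the output ŷ with softmax, as in Figures 1 and 2.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (my own placeholders): 8-dimensional word vectors,
# 16 hidden units, 4 output classes.
input_dim, hidden_dim, output_dim = 8, 16, 4

# Weights and biases are normally learned during training; random here for illustration.
W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # input-to-hidden weights
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # hidden-to-hidden (feedback loop) weights
W_hy = rng.normal(scale=0.1, size=(output_dim, hidden_dim))  # hidden-to-output weights
b_h = np.zeros(hidden_dim)
b_y = np.zeros(output_dim)

def recurrent_step(x_t, h_prev):
    """One step of the recurrent unit: update the hidden state with tanh,
    then produce an output ŷ with softmax."""
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)   # hidden-state update (tanh)
    logits = W_hy @ h_t + b_y
    y_hat = np.exp(logits) / np.exp(logits).sum()     # softmax over output classes
    return h_t, y_hat

# One "word" (a toy embedding) and an initial hidden state of zeros.
x_t = rng.normal(size=input_dim)
h_t, y_hat = recurrent_step(x_t, np.zeros(hidden_dim))
print(y_hat)   # predicted class probabilities after a single step
```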


UNROLLED RECURRENT NEURAL NETWORK

The recurrent unit is a conceptual abstraction. To understand how it works, the RNN is visualized in its unrolled form (another conceptual abstraction).


As it iterates (recurs) across the sequential string of words, the unrolled recurrent unit uses a feedback loop to recycle information from the words it interpreted in previous steps (Figure 3).


Every time the unrolled recurrent unit advances along the sequential input, the context and meaning of the previous words accumulate in a hidden state. Using this hidden state as a form of memory of previous events, the recurrent unit captures term dependencies among non-contiguous words in a sentence. The flow of information recycled by the feedback loop is regulated by a tanh function (Figure 3).



Figure 3.  The unrolled RNN.
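The unrolling in Figure 3 amounts to running the same recurrent step in a loop, carrying the hidden state from one word to the next. The snippet below is a self-contained toy version of that loop (random stand-in weights, a five-word "sentence" of random vectors); only the tanh hidden-state update is shown.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim = 8, 16

# Random stand-ins for the learned weights (same toy shapes as in the previous sketch).
W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))

# A toy "sentence": five random word vectors.
sentence = [rng.normal(size=input_dim) for _ in range(5)]

h = np.zeros(hidden_dim)                   # hidden state starts empty
for t, x_t in enumerate(sentence):
    h = np.tanh(W_xh @ x_t + W_hh @ h)     # feedback loop: h carries context from earlier words
    print(f"after word {t}, hidden-state norm = {np.linalg.norm(h):.3f}")
```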



SHORT-TERM MEMORY

As explained in my previous blogpost WHAT IS AN ARTIFICIAL NEURAL NETWORK?, fine-tuning a neural network, if not done carefully, can result in vanishing or exploding gradients. When this happens, model training is compromised.


RNNs are prone to vanishing gradients because of their architecture. The performance of an RNN drops in proportion to the length of the input sequence, and vanishing gradients affect the first terms in a sequence the most. As the recurrent unit advances along a string of words, its hidden state loses information about the first terms it interpreted. This loss of performance is called short-term memory.
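A rough way to see why this happens: during training, the error signal travels backwards through every step of the unrolled unit, and at each step it is multiplied by the recurrent weights and the tanh derivative. With the toy, random weights below (my own placeholders, not a real model), that repeated multiplication quickly drives the gradient toward zero, which is the vanishing-gradient behaviour behind short-term memory.

```python
import numpy as np

rng = np.random.default_rng(1)
hidden_dim = 16

# Random stand-ins for the hidden-to-hidden weights and for a hidden state.
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
h = rng.uniform(-0.5, 0.5, size=hidden_dim)

# Propagating the error back through one tanh step multiplies the gradient by
# W_hh transposed and by the tanh derivative (1 - h**2). Repeating this over
# many steps shrinks the gradient, so the first words stop influencing training.
grad = np.ones(hidden_dim)
for steps_back in range(1, 21):
    grad = W_hh.T @ ((1 - h**2) * grad)    # one step of backpropagation through time
    if steps_back % 5 == 0:
        print(f"{steps_back} steps back: gradient norm = {np.linalg.norm(grad):.2e}")
```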


To overcome short-term memory in RNNs, three types of RNN architecture have been developed: bidirectional RNNs, long short-term memory (LSTM), and gated recurrent units (GRU). I explain these three types of RNN below.


BIDIRECTIONAL

Bidirectional RNNs overcome the short-term memory problem by implementing hidden states that flow in opposite directions (Figure 4). When training a bidirectional RNN, gradient loss is minimized because the hidden states receive information from both ends of the string of words.


Figure 4. In a bidirectional RNN the hidden state is regulated by term dependencies in both directions.
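As a sketch of the idea in Figure 4, the code below runs two independent tanh recurrent passes over the same toy sequence, one left-to-right and one right-to-left, and concatenates the two hidden states at each position. All names, dimensions, and weights are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
input_dim, hidden_dim = 8, 16

# Separate random weights for the forward and backward passes (placeholders).
W_fx = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_fh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
W_bx = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_bh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))

sentence = [rng.normal(size=input_dim) for _ in range(5)]   # toy word vectors

def run_pass(words, W_x, W_h):
    """Run a simple tanh recurrent pass and keep the hidden state at each position."""
    h, states = np.zeros(hidden_dim), []
    for x_t in words:
        h = np.tanh(W_x @ x_t + W_h @ h)
        states.append(h)
    return states

forward = run_pass(sentence, W_fx, W_fh)                # left-to-right context
backward = run_pass(sentence[::-1], W_bx, W_bh)[::-1]   # right-to-left context
combined = [np.concatenate([f, b]) for f, b in zip(forward, backward)]
print(combined[0].shape)   # (32,): each position now sees context from both directions
```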


LONG SHORT-TERM MEMORY (LSTM)

To ensure that the unrolling recurrent unit captures only relevant information, LSTM RNNs use three "information gates" (Figure 5):

  • Forget gate

  • Input gate

  • Output gate


These gates are regulated by the status of a "memory cell state", which reflects the information gathered by the hidden state as it moves along the sequential input.



Figure 5. In an LSTM, the unrolling recurrent unit operates as a memory cell, which triggers the three gates depending on what information is considered relevant.
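The snippet below sketches one step of a standard LSTM cell, with the forget, input, and output gates and the memory cell state described above. The weights are random placeholders and the dimensions are toy values; the sigmoid/tanh pairing follows the usual LSTM formulation rather than anything specific to this post.

```python
import numpy as np

rng = np.random.default_rng(3)
input_dim, hidden_dim = 8, 16

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One random weight matrix and bias per gate, plus one for the candidate cell update.
W_f, W_i, W_o, W_c = (rng.normal(scale=0.1, size=(hidden_dim, input_dim + hidden_dim))
                      for _ in range(4))
b_f, b_i, b_o, b_c = (np.zeros(hidden_dim) for _ in range(4))

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([x_t, h_prev])
    f = sigmoid(W_f @ z + b_f)         # forget gate: what to erase from the cell state
    i = sigmoid(W_i @ z + b_i)         # input gate: what new information to write
    o = sigmoid(W_o @ z + b_o)         # output gate: what to expose as the hidden state
    c_tilde = np.tanh(W_c @ z + b_c)   # candidate update for the memory cell
    c_t = f * c_prev + i * c_tilde     # memory cell state
    h_t = o * np.tanh(c_t)             # new hidden state
    return h_t, c_t

h, c = lstm_step(rng.normal(size=input_dim), np.zeros(hidden_dim), np.zeros(hidden_dim))
print(h.shape, c.shape)
```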


GATED RECURRENT UNITS (GRU)

GRUs are optimized LSTMs that feature two information gates (a reset gate and an update gate). The reset gate determines whether the information contained in the hidden state from a previous step is inherited by the next one. The update gate, on the other hand, controls which information from an input step is ingested by the network (Figure 6).




Figure 6. GRUs are an optimized type of LSTM RNN, with two gates instead of three.
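And here is the corresponding sketch for one GRU step, with only the reset and update gates. Again, the weights and dimensions are illustrative placeholders following the standard GRU formulation.

```python
import numpy as np

rng = np.random.default_rng(4)
input_dim, hidden_dim = 8, 16

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Random placeholder weights for the reset gate, the update gate, and the candidate state.
W_r, W_z, W_h = (rng.normal(scale=0.1, size=(hidden_dim, input_dim + hidden_dim))
                 for _ in range(3))

def gru_step(x_t, h_prev):
    zx = np.concatenate([x_t, h_prev])
    r = sigmoid(W_r @ zx)                                        # reset gate: how much past to keep
    z = sigmoid(W_z @ zx)                                        # update gate: how much to refresh
    h_tilde = np.tanh(W_h @ np.concatenate([x_t, r * h_prev]))   # candidate hidden state
    return (1 - z) * h_prev + z * h_tilde                        # blended new hidden state

h = gru_step(rng.normal(size=input_dim), np.zeros(hidden_dim))
print(h.shape)
```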



How vanishing gradients arise is a subject that deserves a more detailed discussion, but covering it here would make this blogpost unmanageable.


To learn more about vanishing gradients visit the following links:

BlackBoard AI

DeepBean

Deeplizard


Stay tuned!

GPR


Disclosure: At BioTech Writing and Consulting we believe in the use of AI in Data Science, but we do not use AI to generate text or images.




