Introduction to Encoder-Decoder Models — ELI5 Way

Hi All, welcome to my blog “Introduction to Encoder-Decoder Models — ELI5 Way”. My name is Niranjan Kumar and I’m a Senior Consultant Data Science at Allstate India.

In this article, we will discuss the basic concepts of Encoder-Decoder models and their applications in tasks such as language modeling, image captioning, text entailment, and machine transliteration.

Citation Note: The content and the structure of this article are based on my understanding of the deep learning lectures from One-Fourth Labs — PadhAI.

Before we discuss the concepts of Encoder-Decoder models, we will start by revisiting the task of language modeling.

Language Modeling — Recap

Language modeling is the task of predicting which word or letter comes next. Unlike in a feed-forward neural network (FNN) or a convolutional neural network (CNN), in sequence modeling the current output depends on the previous inputs as well as the current one, and the length of the input is not fixed.

Given the first ‘t-1’ words, we are interested in predicting the tᵗʰ word based on those previous words. Let’s see how we solve language modeling using Recurrent Neural Networks.

Language Modeling — RNN

Let’s look at the problem of auto-complete in WhatsApp. As soon as you open the keyboard to type, you notice a letter suggested as the first character of the message. In this problem, whenever we type a character, the network tries to predict the next possible character based on the previously typed characters.

The input to the function at time step t is represented as xₜ. The weights associated with the input are denoted by U, and the hidden representation (sₜ) is computed as a function of the hidden representation from the previous time step (sₜ₋₁) and the current input, along with a bias. Writing W for the recurrent weights, b for the bias, and σ for a nonlinearity such as the sigmoid, the hidden representation is given by the following equation,

sₜ = σ(U xₜ + W sₜ₋₁ + b)

Once we compute the hidden representation of the input, the final output (yₜ) from the network is a softmax function (represented as O) of the hidden representation and the weights associated with it (V), along with a bias (c),

yₜ = O(V sₜ + c)
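To make the two computations concrete, here is a minimal sketch of a single RNN time step in plain Python. The names U, W, b, V, c, the sigmoid nonlinearity, the one-hot input, and all the numbers are assumptions chosen for illustration; they are not trained values.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def sigmoid(z):
    return [1.0 / (1.0 + math.exp(-a)) for a in z]

def softmax(z):
    m = max(z)                                  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in z]
    total = sum(exps)
    return [e / total for e in exps]

def rnn_step(x_t, s_prev, U, W, b, V, c):
    """One time step: returns the new hidden state s_t and the output y_t."""
    pre = [u + w + bi for u, w, bi in zip(matvec(U, x_t), matvec(W, s_prev), b)]
    s_t = sigmoid(pre)                          # hidden representation
    y_t = softmax([v + ci for v, ci in zip(matvec(V, s_t), c)])
    return s_t, y_t

# Toy setup (assumed): 3-character vocabulary, 2 hidden units, arbitrary weights.
vocab = ["a", "b", "c"]
U = [[0.5, -0.3, 0.1], [0.2, 0.4, -0.6]]        # hidden x vocab
W = [[0.1, 0.0], [-0.2, 0.3]]                   # hidden x hidden
b = [0.0, 0.1]
V = [[0.7, -0.1], [0.0, 0.5], [-0.4, 0.2]]      # vocab x hidden
c = [0.0, 0.0, 0.0]

x_t = [0.0, 1.0, 0.0]                           # one-hot vector for the typed character "b"
s_prev = [0.0, 0.0]                             # no history yet
s_t, y_t = rnn_step(x_t, s_prev, U, W, b, V, c)
print(dict(zip(vocab, y_t)))                    # probability of each candidate next character
```

The output is a probability for every character in the vocabulary; the suggestion shown to the user would be the character with the highest probability.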

Encoder-Decoder Model — Language Modeling

In this section, we will see how we were already using the Encoder-Decoder model for language modeling without even knowing it.

In language modeling, we are interested in finding the probability distribution of the tᵗʰ word based on the previous information.
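One way to write this down, using symbols that are introduced here only for illustration (yₜ for the word at position t and 𝒱 for the vocabulary), is:

```latex
P(y_t = j \mid y_1, y_2, \dots, y_{t-1}), \qquad j \in \{1, \dots, |\mathcal{V}|\}
```

The model outputs one such probability for every word j in the vocabulary, and the predicted word is typically the one with the highest probability.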

Encoder Model

  • In the RNN, the output of each time step is fed, along with the next input, into the next time step.
  • At each time step, the hidden representation (sₜ) of the word is computed as a function of the output of the previous time step (sₜ₋₁) and the current input, along with the bias.
  • The final hidden state vector (sₜ) contains all the encoded information from the previous hidden representations and previous inputs.
  • Here, the Recurrent Neural Network is acting as the Encoder.

Decoder Model

  • We pass the encoded vector to the output layer, which decodes it into the probability distribution of the next possible word.
  • The output layer is a softmax function that takes the hidden state representation and the weights associated with it, along with the bias, as inputs.
  • Since the output layer consists of a linear transformation and a bias operation, it can be viewed as a simple feed-forward neural network.
  • Here, the Feed-Forward Neural Network is acting as the Decoder.
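The split described in the two lists above can be sketched as two small functions: an `encode` function that runs the RNN over the whole input and keeps only the final state, and a `decode` function that applies a linear layer followed by softmax. The function names, the sigmoid nonlinearity, the three-word vocabulary, and the weight values are all hypothetical choices made for this sketch.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def encode(inputs, U, W, b):
    """Encoder: run the RNN over every input and return the final hidden state."""
    s = [0.0] * len(b)
    for x in inputs:
        pre = [u + w + bi for u, w, bi in zip(matvec(U, x), matvec(W, s), b)]
        s = [1.0 / (1.0 + math.exp(-a)) for a in pre]   # sigmoid
    return s

def decode(s, V, c):
    """Decoder: a linear layer plus softmax, i.e. a simple feed-forward network."""
    logits = [v + ci for v, ci in zip(matvec(V, s), c)]
    m = max(logits)
    exps = [math.exp(a - m) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy vocabulary and arbitrary (untrained) weights, assumed for illustration.
one_hot = {"the": [1, 0, 0], "cat": [0, 1, 0], "sat": [0, 0, 1]}
U = [[0.5, -0.3, 0.1], [0.2, 0.4, -0.6]]
W = [[0.1, 0.0], [-0.2, 0.3]]
b = [0.0, 0.1]
V = [[0.7, -0.1], [0.0, 0.5], [-0.4, 0.2]]
c = [0.0, 0.0, 0.0]

state = encode([one_hot["the"], one_hot["cat"]], U, W, b)   # encoded vector
probs = decode(state, V, c)                                 # distribution over the next word
print(dict(zip(one_hot, probs)))
```

Because the encoder state is updated at every step, feeding the same words in a different order produces a different encoded vector, which is what lets the decoder take word order into account.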

Encoder-Decoder Applications

In this section, we will discuss some applications of the Encoder-Decoder model.

Image Captioning

Image captioning is the task of automatically generating a caption based on what is shown in the image.

  • In image captioning, we pass the image through a Convolutional Neural Network (CNN), which extracts the features of the image in the form of a feature representation vector.
  • The feature representation vector, after pre-processing, is passed through an RNN or LSTM to generate the caption.
  • The CNN is used to encode the image.
  • The RNN is then used to decode a sentence from the embedding.
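As a rough illustration of the encoding half, the sketch below builds a toy "CNN" from one convolution layer, a ReLU, and global average pooling, and turns a tiny image into a feature vector. Real captioning systems use deep pretrained networks; the function names, the 4x4 image, and the hand-picked filters here are all assumptions made for the example.

```python
def conv2d_valid(image, kernel):
    """Slide a k x k kernel over the image (no padding, stride 1).

    As in most deep learning libraries, this is technically cross-correlation.
    """
    k = len(kernel)
    rows, cols = len(image) - k + 1, len(image[0]) - k + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(k) for dj in range(k))
             for j in range(cols)]
            for i in range(rows)]

def encode_image(image, kernels):
    """Toy CNN encoder: convolution -> ReLU -> global average pooling."""
    features = []
    for kernel in kernels:
        fmap = conv2d_valid(image, kernel)
        activated = [max(0.0, v) for row in fmap for v in row]   # ReLU
        features.append(sum(activated) / len(activated))         # one number per filter
    return features

# A 4x4 grayscale "image" with a vertical edge, and two hand-picked 2x2 filters.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
kernels = [[[-1, 1], [-1, 1]],     # responds to a dark-to-bright change from left to right
           [[1, 1], [-1, -1]]]     # responds to a bright-to-dark change from top to bottom

feature_vector = encode_image(image, kernels)
print(feature_vector)
```

The first filter responds to the vertical edge and the second does not, so the feature vector summarises what is in the image. In a captioning model, a vector like this would be handed to the RNN decoder as its starting point.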

Text Entailment

Text entailment is the task of determining whether a given piece of text T entails another text, called the “hypothesis”.

For example,

Input: It is raining outside.

Output: The ground is wet.

In this problem, both the input and the output are sequences of characters, so both the encoder and the decoder networks are RNNs or LSTMs.

Machine Transliteration

Transliteration means “writing the same word in another language or script”. Translation tells you the meaning of words in another language, whereas transliteration does not tell you the meaning of the words but helps you pronounce them.

Input: INDIA

Output: इंडिया


  • Each character of the input is converted into a one-hot vector representation and fed into the RNN as input.
  • At the last time step of the encoder, the final hidden representation of all the previous inputs is passed as the input to the decoder.
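The one-hot conversion mentioned above can be sketched in a few lines. The 26-letter uppercase alphabet and the helper name `one_hot` are assumptions for this example; a real system would fix its own character set.

```python
def one_hot(char, alphabet):
    """Return a vector with a 1 at the character's index and 0 everywhere else."""
    vec = [0] * len(alphabet)
    vec[alphabet.index(char)] = 1
    return vec

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
encoder_inputs = [one_hot(ch, alphabet) for ch in "INDIA"]

for ch, vec in zip("INDIA", encoder_inputs):
    print(ch, vec.index(1))    # position of the single 1 in each vector
```

Each of the five vectors has length 26 and contains exactly one 1, and the two occurrences of "I" map to the same vector. These vectors are what the encoder RNN consumes, one per time step.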


  • The decoder model, which can be an RNN or LSTM network, decodes the state representation vector and gives the probability distribution over characters at each step.
  • The softmax function is used to generate the probability distribution vector for each character, which in turn helps to generate the complete transliterated word.
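The decoding steps above can be sketched as a greedy loop: at every step the decoder RNN updates its state, applies softmax to get a distribution over the target characters, and emits the most probable one until it produces an end marker. The `<end>` token, the function name `greedy_decode`, the sigmoid nonlinearity, the hard-coded starting state, and the random weights are all assumptions for illustration. Since nothing is trained, the printed string is arbitrary and will not be the correct transliteration.

```python
import math
import random

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def softmax(z):
    m = max(z)
    exps = [math.exp(a - m) for a in z]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_decode(s0, target_chars, params, max_len=10):
    """Decoder RNN: emit the most probable character at each step until <end>."""
    U, W, b, V, c = params
    s, prev, output = s0, None, []
    for _ in range(max_len):
        # One-hot vector of the previously emitted character (all zeros at the start).
        x = [1.0 if ch == prev else 0.0 for ch in target_chars]
        pre = [u + w + bi for u, w, bi in zip(matvec(U, x), matvec(W, s), b)]
        s = [1.0 / (1.0 + math.exp(-a)) for a in pre]            # sigmoid
        probs = softmax([v + ci for v, ci in zip(matvec(V, s), c)])
        prev = target_chars[probs.index(max(probs))]             # greedy choice
        if prev == "<end>":
            break
        output.append(prev)
    return "".join(output)

# Target-side characters (Devanagari code points) plus a hypothetical end marker.
target_chars = ["<end>", "इ", "ं", "ड", "ि", "य", "ा"]
hidden = 4
rng = random.Random(0)                                           # fixed seed, untrained weights

def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

params = (rand_matrix(hidden, len(target_chars)),   # U: input weights
          rand_matrix(hidden, hidden),              # W: recurrent weights
          [0.0] * hidden,                           # b
          rand_matrix(len(target_chars), hidden),   # V: output weights
          [0.0] * len(target_chars))                # c

encoded_state = [0.1, -0.2, 0.3, 0.0]   # stands in for the encoder's final hidden state
result = greedy_decode(encoded_state, target_chars, params)
print(result)
```

With trained weights, the same loop would be expected to emit the characters of the transliterated word one at a time.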

Where to go from here?

If you want to learn more about Neural Networks using Keras & TensorFlow 2.0 (Python or R), check out the Artificial Neural Networks course by Abhishek and Pukhraj from Starttechacademy. They explain the fundamentals of deep learning in a simple manner.

Recommended Reading — The ELI5 Project Machine Learning

  • Long Short Term Memory and Gated Recurrent Unit’s Explained — ELI5 Way: In this post, we will learn the intuition behind the working of LSTM and GRU. (towardsdatascience.com)
  • Recurrent Neural Networks (RNN) Explained — the ELI5 way: Sequence Labeling and Sequence Classification using RNN. (towardsdatascience.com)
  • Understanding Convolution Neural Networks — the ELI5 way: Learn about Convolution Operation and CNN’


In this post, we discussed how a basic encoder-decoder model is used in the task of language modeling, with an RNN as the encoder and an FNN as the decoder. After that, we discussed the applications of encoder-decoder models in solving more complex tasks like image captioning, text entailment, and machine transliteration.

In my next post, we will discuss the Attention Mechanism. So make sure you follow me on Medium to get notified as soon as it drops.

Until then, Peace 🙂


Author Bio

Niranjan Kumar is a Senior Consultant Data Science at Allstate India. He is passionate about Deep Learning and Artificial Intelligence. Apart from writing on Medium, he also works as a freelance data science writer. Check out his articles here.

You can connect with him on LinkedIn or follow him on Twitter for updates about upcoming articles on deep learning and machine learning.

Disclaimer — There might be some affiliate links in this post to relevant resources. You can purchase the bundle at the lowest price possible. I will receive a small commission if you purchase the course.


  1. Deep Learning — PadhAI
  2. Understanding Translation vs. Transliteration
  3. Deep Learning (CS7015)
