Recurrent Neural Network — RNN
RNNs can be adapted to different types of problems by changing the way the cells are arranged in the graph. We will look at some examples of these configurations and how they are used to solve specific problems.
We will also learn about a major limitation of the SimpleRNN cell, and how two variants of the SimpleRNN cell - long short-term memory (LSTM) and gated recurrent unit (GRU) - overcome this limitation. Both LSTM and GRU are drop-in replacements for the SimpleRNN cell, so just replacing the RNN cell with one of these variants can often result in a major performance improvement in your network. While LSTM and GRU are not the only variants, it has been shown empirically that they are the best choices for most sequence problems.
Finally, we will also learn about some tips to improve the performance of our RNNs and when and how to apply them.
In this chapter, we will cover the following topics:
- SimpleRNN cell
- Basic RNN implementation in Keras for generating text
- RNN topologies
- LSTM, GRU, and other RNN variants
Vanishing and exploding gradients
Just like traditional neural networks, training an RNN involves backpropagation. The difference in this case is that, since the parameters are shared by all time steps, the gradient at each output depends not only on the current time step, but also on all previous ones. This process is called backpropagation through time (BPTT).
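The effect of BPTT that causes vanishing and exploding gradients can be sketched numerically: the gradient flowing back T steps is repeatedly multiplied by the recurrent weight matrix, so its norm scales like the largest singular value of that matrix raised to the power T. The matrix size and the 0.5/1.5 scale factors below are illustrative choices, not values from the chapter:

```python
import numpy as np

def backprop_norm(scale, steps=50, size=4, seed=42):
    """Norm of a gradient pushed back `steps` time steps through BPTT."""
    rng = np.random.default_rng(seed)
    # Build a matrix whose singular values all equal `scale`
    # (a scaled random orthogonal matrix)
    q, _ = np.linalg.qr(rng.normal(size=(size, size)))
    W = scale * q
    grad = np.ones(size)
    for _ in range(steps):
        grad = W.T @ grad  # one backward step through time
    return np.linalg.norm(grad)

vanishing = backprop_norm(0.5)   # singular values < 1: gradient shrinks
exploding = backprop_norm(1.5)   # singular values > 1: gradient blows up
print(vanishing, exploding)
```

With singular values below one the gradient becomes vanishingly small after a few dozen steps, so early time steps stop learning; above one it grows without bound, destabilizing training. This is exactly the problem the LSTM and GRU cells in the next sections are designed to mitigate.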
Long short-term memory - LSTM
LSTM with Keras - sentiment analysis
Keras provides an LSTM layer that we will use here to construct and train a many-to-one RNN. Our network takes in a sentence (a sequence of words) and outputs a sentiment value (positive or negative). Our training set is a dataset of about 7,000 short sentences from the UMICH SI650 sentiment classification competition on Kaggle (https://www.kaggle.com/c/si650winter11#description). Each sentence is labeled 1 or 0 for positive or negative sentiment respectively, which our network will learn to predict.
- snippet.python
from keras.layers.core import Activation, Dense, Dropout, SpatialDropout1D
from keras.layers.embeddings import Embedding
from keras.layers.recurrent import LSTM
from keras.models import Sequential
from keras.preprocessing import sequence
from sklearn.model_selection import train_test_split
import collections
import matplotlib.pyplot as plt
import nltk
import numpy as np
import os
Before we start, we want to do a bit of exploratory analysis on the data. Specifically, we need to know how many unique words there are in the corpus and how many words there are in each sentence:
- snippet.python
maxlen = 0
word_freqs = collections.Counter()
num_recs = 0
ftrain = open(os.path.join(DATA_DIR, "umich-sentiment-train.txt"), 'rb')
for line in ftrain:
    # each line is "label<TAB>sentence"; the file is read as bytes
    label, sentence = line.strip().split(b"\t")
    words = nltk.word_tokenize(sentence.decode("ascii", "ignore").lower())
    if len(words) > maxlen:
        maxlen = len(words)
    for word in words:
        word_freqs[word] += 1
    num_recs += 1
ftrain.close()
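The counts gathered above feed the next preprocessing step: building a fixed-size vocabulary and converting each sentence into a padded sequence of word indices, which is the input format the Embedding layer expects. A minimal framework-free sketch of that step (MAX_FEATURES, MAX_SENTENCE_LENGTH, and the toy corpus are illustrative, not values from the chapter):

```python
import collections

# Illustrative vocabulary size and sentence length
MAX_FEATURES = 2000
MAX_SENTENCE_LENGTH = 40

# Toy stand-in for the word_freqs counter built from the training file
word_freqs = collections.Counter(
    "i love this movie i really love it".split())

# Index 0 is reserved for padding, index 1 for out-of-vocabulary words
word2index = {w: i + 2
              for i, (w, _) in enumerate(word_freqs.most_common(MAX_FEATURES))}
word2index["PAD"] = 0
word2index["UNK"] = 1

def to_padded_sequence(words, maxlen=MAX_SENTENCE_LENGTH):
    """Map words to indices and left-pad with zeros to a fixed length,
    mimicking what keras.preprocessing.sequence.pad_sequences does."""
    seq = [word2index.get(w, word2index["UNK"]) for w in words]
    return [0] * (maxlen - len(seq)) + seq[:maxlen]

x = to_padded_sequence("i love it".split())
print(len(x))  # 40
```

In the actual pipeline this is done with `keras.preprocessing.sequence.pad_sequences`; the sketch just makes explicit what the fixed-length integer sequences look like before they reach the network.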
Bidirectional RNNs
At a given time step t, the output of the RNN depends on the outputs at all previous time steps. However, the output may depend on future elements of the sequence as well. This is especially true for applications such as NLP, where the attributes of the word or phrase we are trying to predict may depend on the context given by the entire enclosing sentence, not just the words that came before it. Bidirectional RNNs also help a network architecture place equal emphasis on the beginning and end of the sequence, and increase the data available for training.
Bidirectional RNNs are two RNNs stacked on top of each other, reading the input in opposite directions.
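The idea can be sketched without a framework: run the same kind of recurrent cell over the sequence in both directions and combine the two hidden states at each step, for example by concatenation (in Keras, the Bidirectional wrapper handles this). The cell below is a plain tanh SimpleRNN with illustrative dimensions:

```python
import numpy as np

def simple_rnn(x, Wx, Wh, b):
    """Run a tanh SimpleRNN over x of shape (time, features);
    return the hidden state at every time step."""
    h = np.zeros(Wh.shape[0])
    states = []
    for x_t in x:
        h = np.tanh(x_t @ Wx + h @ Wh + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
T, features, units = 5, 3, 4
x = rng.normal(size=(T, features))
params = (rng.normal(size=(features, units)),   # input weights
          rng.normal(size=(units, units)),      # recurrent weights
          np.zeros(units))                      # bias

fwd = simple_rnn(x, *params)                 # reads input left to right
bwd = simple_rnn(x[::-1], *params)[::-1]     # reads right to left, realigned
bidir = np.concatenate([fwd, bwd], axis=-1)  # shape (T, 2 * units)
print(bidir.shape)  # (5, 8)
```

Note that each direction would normally have its own weights; the sketch reuses one set only for brevity. The doubled output width is why a bidirectional layer with n units produces 2n features per time step.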
Stateful RNNs
RNNs can be stateful, which means that they can maintain state across batches during training.
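What "maintaining state across batches" means can be sketched with a plain tanh recurrent cell (dimensions are illustrative): the final hidden state of one batch becomes the initial state of the next, so a long sequence can be processed in consecutive chunks without losing context. In Keras you would instead pass stateful=True to the recurrent layer and call reset_states() when a new sequence begins.

```python
import numpy as np

def rnn_chunk(x, h, Wx, Wh, b):
    """Run a tanh RNN cell over one chunk, starting from hidden state h;
    return the final hidden state."""
    for x_t in x:
        h = np.tanh(x_t @ Wx + h @ Wh + b)
    return h

rng = np.random.default_rng(1)
features, units = 3, 4
Wx = rng.normal(size=(features, units))
Wh = rng.normal(size=(units, units))
b = np.zeros(units)

long_seq = rng.normal(size=(20, features))

# Stateful processing: carry the hidden state across four "batches"
# of five time steps each, instead of resetting it to zeros every time.
h = np.zeros(units)
for chunk in np.split(long_seq, 4):
    h = rnn_chunk(chunk, h, Wx, Wh, b)

# Carrying state over chunks is equivalent to one pass over the whole sequence
h_full = rnn_chunk(long_seq, np.zeros(units), Wx, Wh, b)
assert np.allclose(h, h_full)
```

A stateless layer, by contrast, would reset h to zeros at each chunk boundary and so could never learn dependencies longer than one batch.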