I recently got my hands on ESIM, an influential model in the field of Natural Language Inference, while freeloading on Colab's free GPUs along the way.

Introduction to the ESIM model

The paper Enhanced LSTM for Natural Language Inference proposes ESIM to determine the relationship between two sentences. The model consists of three parts:

Input Encoding

First, the two sentences, premise and hypothesis, are fed in as word-embedding sequences $a = (a_1, \dots, a_{\ell_a})$ and $b = (b_1, \dots, b_{\ell_b})$. After a BiLSTM pass, the new context-aware representations $\bar{a}_i$ and $\bar{b}_j$ are obtained.

Local Inference

The paper takes the inner product of the encoded word vectors as the correlation between two words, i.e. $e_{ij} = \bar{a}_i^\top \bar{b}_j$. Computing this attention score between every pair of words in the two sentences thus yields a similarity matrix $e$.


Here's an interesting idea: to judge how similar two sentences are, check whether they can represent each other. That is, each word vector $\bar{a}_i$ in the premise is re-expressed with the hypothesis's word vectors $\bar{b}_j$, and vice versa.

The formulas in the paper are:

$$\tilde{a}_i = \sum_{j=1}^{\ell_b} \frac{\exp(e_{ij})}{\sum_{k=1}^{\ell_b} \exp(e_{ik})}\, \bar{b}_j, \qquad \tilde{b}_j = \sum_{i=1}^{\ell_a} \frac{\exp(e_{ij})}{\sum_{k=1}^{\ell_a} \exp(e_{kj})}\, \bar{a}_i$$
Because the model does not know in advance which pair $\bar{a}_i$ and $\bar{b}_j$ are related, it enumerates all pairs. The similarity matrix computed above supplies the weights: the weight at each position is the softmax over the corresponding row of $e$ (when computing $\tilde{a}_i$) or over the corresponding column (when computing $\tilde{b}_j$).
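A minimal PyTorch sketch of this attention-and-alignment step (the tensor names and toy sizes are mine, and the padding masks used in the real implementation below are omitted):

import torch
import torch.nn.functional as F

batch, la, lb, hidden = 2, 5, 7, 4          # toy sizes
a_bar = torch.randn(batch, la, 2 * hidden)  # stands in for the BiLSTM output of the premise
b_bar = torch.randn(batch, lb, 2 * hidden)  # stands in for the BiLSTM output of the hypothesis

# e_ij = <a_bar_i, b_bar_j>  ->  batch * la * lb
e = torch.matmul(a_bar, b_bar.transpose(1, 2))

# soft alignment: each a_i is re-expressed with the b_j's, and vice versa
a_tilde = torch.matmul(F.softmax(e, dim=-1), b_bar)                  # batch * la * (2*hidden)
b_tilde = torch.matmul(F.softmax(e.transpose(1, 2), dim=-1), a_bar)  # batch * lb * (2*hidden)
print(a_tilde.shape, b_tilde.shape)  # torch.Size([2, 5, 8]) torch.Size([2, 7, 8])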

In order to enhance the local inference information, the intermediate results obtained so far are concatenated together with their element-wise difference and product:

$$m_a = [\bar{a};\, \tilde{a};\, \bar{a} - \tilde{a};\, \bar{a} \odot \tilde{a}], \qquad m_b = [\bar{b};\, \tilde{b};\, \bar{b} - \tilde{b};\, \bar{b} \odot \tilde{b}]$$
Inference Composition

The inference composition takes the enhanced vectors $m_a$ and $m_b$ from the previous part and feeds them through another BiLSTM to capture the context of the two sequences.

The BiLSTM outputs are then summarized by average pooling and max pooling, the pooled vectors of the two sentences are concatenated, and the result is sent to fully connected layers for the final classification.
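And a minimal sketch of that final aggregation, mirroring what apply_multiple does in the implementation further down (toy sizes again):

import torch
import torch.nn.functional as F

batch, seq_len, hidden = 2, 5, 4
v = torch.randn(batch, seq_len, 2 * hidden)  # stands in for the composition BiLSTM output

# pool over the time dimension: each gives batch * (2*hidden)
v_avg = F.avg_pool1d(v.transpose(1, 2), v.size(1)).squeeze(-1)
v_max = F.max_pool1d(v.transpose(1, 2), v.size(1)).squeeze(-1)

v_rep = torch.cat([v_avg, v_max], dim=1)  # batch * (4*hidden)
print(v_rep.shape)                        # torch.Size([2, 16])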

Import the libraries you need

import os
import time
import logging
import pickle
from tqdm import tqdm_notebook as tqdm

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

import torchtext
from torchtext import data, datasets
from torchtext.vocab import GloVe

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

import nltk
from nltk import word_tokenize
import spacy
from keras_preprocessing.text import Tokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)
cuda

Mount Google Drive

from google.colab import drive
drive.mount('/content/drive')
Go to this URL in a browser: https://accounts.google.com/o/oauth2/xxxxxxxx
Enter your authorization code: ··········
Mounted at /content/drive

!nvidia-smi

Fri Aug  9 04:45:35 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   60C    P0    62W / 149W |   6368MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

Prepare the data using Torchtext

The Torchtext usage below follows this example: github.com/pytorch/exa…

GloVe in Torchtext can be used directly, but unlike TorchVision it does not read the raw source files on the fly; it only reads from a cache. So the best workflow is:

  1. Download GloVe locally first
  2. Open a terminal in the download directory and run Torchtext there once to generate the cache
  3. Pass the cache parameter whenever GloVe is used afterwards, so that Torchtext reads the huge GloVe file from the cache instead of downloading it again (a sketch follows the list)

But since we are freeloading on Colab anyway, the bandwidth is free (~ ̄▽ ̄)~
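A minimal sketch of that workflow; the cache directory name and the 6B/100d choice are just for illustration, and the behaviour is assumed from the Torchtext version used in this post:

# Step 1: download glove.6B.zip manually and drop it into the cache directory,
#         e.g. ./glove_cache/glove.6B.zip (Torchtext skips the download if the
#         zip is already there).

# Step 2: run this once; Torchtext extracts the vectors and writes a .pt cache
from torchtext.vocab import GloVe
_ = GloVe(name="6B", dim=100, cache="./glove_cache")

# Step 3: every later run with the same cache argument loads the .pt file
#         directly instead of downloading anything
glove_vectors = GloVe(name="6B", dim=100, cache="./glove_cache")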

Torchtext can also load the SNLI dataset directly, but it expects the following directory structure (a sketch of pointing it at an existing copy follows the listing):

  • root
    • snli_1.0
      • snli_1.0_train.jsonl
      • snli_1.0_dev.jsonl
      • snli_1.0_test.jsonl
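If a copy of the jsonl files is already stored somewhere (for example on Drive), the root argument of datasets.SNLI.splits can point at that directory so the download is skipped. A minimal sketch, with an illustrative path and the same Field definitions as the full code below:

from torchtext import data, datasets

TEXT = data.Field(batch_first=True, lower=True, tokenize="spacy")
LABEL = data.Field(sequential=False)

# root must contain the snli_1.0/ folder shown above; this path is illustrative
snli_root = "/content/drive/My Drive/Colab Notebooks/data"
train, dev, test = datasets.SNLI.splits(TEXT, LABEL, root=snli_root)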
TEXT = data.Field(batch_first=True, lower=True, tokenize="spacy")
LABEL = data.Field(sequential=False)

# Separate training, validation, and test sets
tic = time.time()
train, dev, test = datasets.SNLI.splits(TEXT, LABEL)
print(f"Cost: {(time.time() - tic) / 60:2.f} min")

# Load the GloVe pre-training vector
tic = time.time()
glove_vectors = GloVe(name='6B', dim=100)
print(f"Creat GloVe done. Cost: {(time.time() - tic) / 60:2.f} min")

# Create the vocabulary
tic = time.time()
TEXT.build_vocab(train, dev, test, vectors=glove_vectors)
LABEL.build_vocab(train)
print(f"Build vocab done. Cost: {(time.time() - tic) / 60:2.f} min")

print(f"TEXT.vocab.vectors.size(): {TEXT.vocab.vectors.size()}")
num_words = int(TEXT.vocab.vectors.size()[0])

# Save the stoi dictionaries for tokens and labels
if os.path.exists("/content/drive/My Drive/Colab Notebooks"):
    glove_stoi_path = "/content/drive/My Drive/Colab Notebooks/vocab_label_stoi.pkl"
else:
    glove_stoi_path = "./vocab_label_stoi.pkl"
pickle.dump([TEXT.vocab.stoi, LABEL.vocab.stoi], open(glove_stoi_path, "wb"))

batch_sz = 128

train_iter, dev_iter, test_iter = data.BucketIterator.splits(
    datasets=(train, dev, test),
    batch_sizes=(batch_sz, batch_sz, batch_sz),
    shuffle=True,
    device=device
)
Cost: 0.00 min
Build vocab done. Cost: 0.12 min
TEXT.vocab.vectors.size(): torch.Size([34193, 100])

General Parameter Configuration

It's best to keep all the hyperparameters in one global config (the "alchemy recipe"), so they are easy to adjust.

class Config:

    def __init__(self):
        # For data
        self.batch_first = True
        try:
            self.batch_size = batch_sz
        except NameError:
            self.batch_size = 512

        # For Embedding
        self.n_embed = len(TEXT.vocab)
        self.d_embed = TEXT.vocab.vectors.size()[-1]

        # For Linear
        self.linear_size = self.d_embed

        # For LSTM
        self.hidden_size = 300

        # For output
        self.d_out = len(LABEL.vocab)  # number of output classes
        self.dropout = 0.5

        # For training
        self.save_path = r"/content/drive/My Drive/Colab Notebooks" if os.path.exists(
            r"/content/drive/My Drive/Colab Notebooks") else ". /"
        self.snapshot = os.path.join(self.save_path, "ESIM.pt")

        self.device = device
        self.epoch = 64
        self.scheduler_step = 3
        self.lr = 0.0004
        self.early_stop_ratio = 0.985  # end the training session early


args = Config()

ESIM model code implementation

Code reference: github.com/pengshuang/…

The use of nn.BatchNorm1d

Normalizing the data removes the problem of different scales across dimensions. Geometrically, it reshapes an "ellipsoid" in n-dimensional space into a "sphere", which makes the model easier and faster to train.

However, normalizing over the entire input dataset would take a lot of time. Batch Normalization is a compromise: it normalizes only over the current batch of batch_size samples. In probabilistic terms, the distribution of all samples is estimated from the distribution of the batch_size samples.

PyTorch's nn.BatchNorm1d applies batch normalization to one-dimensional feature data, which brings two caveats:

  1. During training (i.e. with model.train() active), each batch must contain at least 2 samples; during evaluation (model.eval()) there is no batch-size restriction
  2. The feature dimension must be the second dimension, i.e. the input shape has to be (batch, features) or (batch, features, seq_len)

In my earlier data processing, each batch has shape batch * seq_len * embed_dim after the embedding lookup, so there are three dimensions; moreover, after torchtext's data.BucketIterator.splits processing, seq_len is dynamic (it equals the length of the longest sentence in the current batch). If such a tensor is fed into BatchNorm1d without any reshaping, you will usually see the following error (a sketch of the required transpose follows the error message):

RuntimeError: running_mean should contain xxx elements not yyy
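A minimal sketch of the workaround, assuming we did want to batch-normalize a batch * seq_len * embed_dim tensor (names and sizes are illustrative):

import torch
import torch.nn as nn

batch, seq_len, embed_dim = 4, 10, 100
x = torch.randn(batch, seq_len, embed_dim)

bn = nn.BatchNorm1d(embed_dim)               # num_features must match the feature dimension

# y = bn(x)                                  # RuntimeError: running_mean should contain 10 elements not 100
y = bn(x.transpose(1, 2)).transpose(1, 2)    # move embed_dim to dim 1, normalize, move back
print(y.shape)                               # torch.Size([4, 10, 100])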

Whether a BatchNorm1d layer needs to be added after the Embedding

The reference implementation is very clean and shows the author's skill. However, the author apparently does not initialize the Embedding with pre-trained word vectors, whereas I use pre-trained GloVe vectors and will not fine-tune them, so is nn.BatchNorm1d after the embedding still necessary?

Since blindly adding layers to the network rarely helps, the best approach is to first check whether the GloVe word vectors are already roughly "normalized" in each dimension.

glove = TEXT.vocab.vectors

means, stds = glove.mean(dim=0).numpy(), glove.std(dim=0).numpy()
dims = [i for i in range(glove.shape[1])]

plt.scatter(dims, means)
plt.scatter(dims, stds)
plt.legend(["mean"."std"])
plt.xlabel("Dims")
plt.ylabel("Features")
plt.show()

print(f"mean(means)={means.mean():4.f}, std(means)={means.std():4.f}")
print(f"mean(stds)={stds.mean():4.f}, std(stds)={stds.std():4.f}")
Copy the code

mean(means)=0.0032, std(means)=0.0809
mean(stds)=0.4361, std(stds)=0.0541

As the figure shows, the distribution of each dimension is fairly stable, so I do not plan to use nn.BatchNorm1d after the Embedding layer.

The use of nn.LSTM

nn.LSTM(input_size, hidden_size, num_layers, bias=True,
        batch_first=False, dropout=0, bidirectional=False)

nn.LSTM defaults to batch_first=False, which I find awkward, so I set it to True.

The LSTM input/output format is listed below (shapes shown for batch_first=False; with batch_first=True the batch dimension comes first). If h_0 and c_0 are omitted from the input, the LSTM automatically initializes them to all zeros. A small shape check follows the list.

  • Inputs: input, (h_0, c_0)
  • Outputs: output, (h_n, c_n)
  • input: (seq_len, batch, input_size)
  • output: (seq_len, batch, num_directions * hidden_size)
  • h / c: (num_layers * num_directions, batch, hidden_size)
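A quick shape check with batch_first=True and a bidirectional layer like the ones used below (toy sizes):

import torch
import torch.nn as nn

batch, seq_len, input_size, hidden_size = 3, 7, 100, 300
lstm = nn.LSTM(input_size, hidden_size, num_layers=1,
               batch_first=True, bidirectional=True)

x = torch.randn(batch, seq_len, input_size)
output, (h_n, c_n) = lstm(x)   # h_0, c_0 default to zeros

print(output.shape)  # torch.Size([3, 7, 600])  batch * seq_len * (num_directions*hidden_size)
print(h_n.shape)     # torch.Size([2, 3, 300])  (num_layers*num_directions) * batch * hidden_size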
class ESIM(nn.Module):

    def __init__(self, args):
        super(ESIM, self).__init__()
        self.args = args

        self.embedding = nn.Embedding(
            args.n_embed, args.d_embed)  # parameter initialization can be done later
        # self.bn_embed = nn.BatchNorm1d(args.d_embed)

        self.lstm1 = nn.LSTM(args.d_embed, args.hidden_size,
                             num_layers=1, batch_first=True, bidirectional=True)
        self.lstm2 = nn.LSTM(args.hidden_size * 8, args.hidden_size,
                             num_layers=1, batch_first=True, bidirectional=True)

        self.fc = nn.Sequential(
            nn.BatchNorm1d(args.hidden_size * 8),
            nn.Linear(args.hidden_size * 8, args.linear_size),
            nn.ELU(inplace=True),
            nn.BatchNorm1d(args.linear_size),
            nn.Dropout(args.dropout),
            nn.Linear(args.linear_size, args.linear_size),
            nn.ELU(inplace=True),
            nn.BatchNorm1d(args.linear_size),
            nn.Dropout(args.dropout),
            nn.Linear(args.linear_size, args.d_out),
            nn.Softmax(dim=-1))

    def submul(self, x1, x2):
        mul = x1 * x2
        sub = x1 - x2
        return torch.cat([sub, mul], -1)

    def apply_multiple(self, x):
        # input: batch_size * seq_len * (2 * hidden_size)
        p1 = F.avg_pool1d(x.transpose(1, 2), x.size(1)).squeeze(-1)
        p2 = F.max_pool1d(x.transpose(1, 2), x.size(1)).squeeze(-1)
        # output: batch_size * (4 * hidden_size)
        return torch.cat([p1, p2], 1)

    def soft_attention_align(self, x1, x2, mask1, mask2):
        ''' x1: batch_size * seq_len * dim x2: batch_size * seq_len * dim '''
        # attention: batch_size * seq_len * seq_len
        attention = torch.matmul(x1, x2.transpose(1, 2))
        # mask is used to prevent outliers when calculating Softmax
        mask1 = mask1.float().masked_fill_(mask1, float('-inf'))
        mask2 = mask2.float().masked_fill_(mask2, float('-inf'))

        # weight: batch_size * seq_len * seq_len
        weight1 = F.softmax(attention + mask2.unsqueeze(1), dim=-1)
        x1_align = torch.matmul(weight1, x2)
        weight2 = F.softmax(attention.transpose(
            1, 2) + mask1.unsqueeze(1), dim=-1)
        x2_align = torch.matmul(weight2, x1)

        # x_align: batch_size * seq_len * hidden_size
        return x1_align, x2_align

    def forward(self, sent1, sent2):
        """ sent1: batch * la sent2: batch * lb """
        mask1, mask2 = sent1.eq(0), sent2.eq(0)
        x1, x2 = self.embedding(sent1), self.embedding(sent2)
        # x1, x2 = self.bn_embed(x1), self.bn_embed(x2)

        # batch * [la | lb] * dim
        o1, _ = self.lstm1(x1)
        o2, _ = self.lstm1(x2)

        # Local Inference
        # batch * [la | lb] * hidden_size
        q1_align, q2_align = self.soft_attention_align(o1, o2, mask1, mask2)

        # Inference Composition
        # batch_size * seq_len * (8 * hidden_size)
        q1_combined = torch.cat([o1, q1_align, self.submul(o1, q1_align)], -1)
        q2_combined = torch.cat([o2, q2_align, self.submul(o2, q2_align)], -1)

        # batch_size * seq_len * (2 * hidden_size)
        q1_compose, _ = self.lstm2(q1_combined)
        q2_compose, _ = self.lstm2(q2_combined)

        # Aggregate
        q1_rep = self.apply_multiple(q1_compose)
        q2_rep = self.apply_multiple(q2_compose)

        # Classifier
        similarity = self.fc(torch.cat([q1_rep, q2_rep], -1))
        return similarity


def take_snapshot(model, path):
    """ Save model training results to Drive to prevent loss after Colab reset. ""
    torch.save(model.state_dict(), path)
    print(f"Snapshot has been saved to {path}")


def load_snapshot(model, path):
    model.load_state_dict(torch.load(path))
    print(f"Load snapshot from {path} done.")


model = ESIM(args)
# if os.path.exists(args.snapshot):
# load_snapshot(model, args.snapshot)

# The Embedding vectors are not trained (GloVe stays frozen)
model.embedding.weight.data.copy_(TEXT.vocab.vectors)
model.embedding.weight.requires_grad = False

model.to(args.device)
ESIM(
  (embedding): Embedding(34193, 100)
  (lstm1): LSTM(100, 300, batch_first=True, bidirectional=True)
  (lstm2): LSTM(2400, 300, batch_first=True, bidirectional=True)
  (fc): Sequential(
    (0): BatchNorm1d(2400, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (1): Linear(in_features=2400, out_features=100, bias=True)
    (2): ELU(alpha=1.0, inplace)
    (3): BatchNorm1d(100, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (4): Dropout(p=0.5)
    (5): Linear(in_features=100, out_features=100, bias=True)
    (6): ELU(alpha=1.0, inplace)
    (7): BatchNorm1d(100, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (8): Dropout(p=0.5)
    (9): Linear(in_features=100, out_features=4, bias=True)
    (10): Softmax()
  )
)

The training stage

Here are a few details:

The shape of batch.label

batch.label is a one-dimensional vector of shape (batch,), while Y_pred is a two-dimensional tensor of shape (batch, n_classes); after extracting the index of the maximum value with .topk(1).indices it is still two-dimensional, with shape (batch, 1).

So if you do not expand batch.label with an extra dimension, PyTorch will broadcast it against the (batch, 1) predictions, the comparison result becomes (batch, batch) instead of (batch, 1), and the computed accuracy ends up ridiculously high. That is what the following code is for:

(Y_pred.topk(1).indices == batch.label.unsqueeze(1))
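A tiny illustration of the broadcasting pitfall with made-up numbers:

import torch

Y_pred = torch.tensor([[0.1, 0.7, 0.2],
                       [0.6, 0.3, 0.1]])        # batch=2, 3 classes
label = torch.tensor([1, 2])                    # shape (2,)

pred_idx = Y_pred.topk(1).indices               # shape (2, 1) -> [[1], [0]]

print((pred_idx == label).shape)                # torch.Size([2, 2])  broadcast: wrong
print((pred_idx == label.unsqueeze(1)).shape)   # torch.Size([2, 1])  element-wise: right
print((pred_idx == label.unsqueeze(1)).sum())   # tensor(1): only the first sample is correct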

Tensor and scalar division

In Python 3.6 the division operator / returns a float by default, but PyTorch's integer tensor division does not, which is another detail that is easy to overlook.

(Y_pred.topk(1).indices == batch.label.unsqueeze(1))

The code above yields a bool tensor (actually torch.uint8 here), and after calling .sum() the result is a torch.LongTensor. But integer division in PyTorch does not produce floating point numbers.

# For example, the following division yields 0
In [2]: torch.LongTensor([1]) / torch.LongTensor([5])
Out[2]: tensor([0])

acc accumulates the number of correctly classified samples in each batch. Because of automatic type promotion, acc ends up being a torch.LongTensor, so you have to extract the integer value with .item() before computing the accuracy. Ignoring this detail gives an accuracy of 0.
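A short sketch of the fix with made-up numbers (the commented-out line reflects the integer-division behaviour of the PyTorch version used in this post):

import torch

correct = torch.tensor([1, 0, 1, 1]).sum()   # torch.LongTensor, value 3
cnt = 4

# acc_wrong = correct / cnt                  # integer tensor division -> tensor(0) here
acc_right = correct.item() / cnt             # Python int / int -> 0.75
print(acc_right)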

def training(model, data_iter, loss_fn, optimizer):
    """ Training part """
    model.train()
    data_iter.init_epoch()
    acc, cnt, avg_loss = 0, 0, 0.0

    for batch in data_iter:
        Y_pred = model(batch.premise, batch.hypothesis)
        loss = loss_fn(Y_pred, batch.label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        avg_loss += loss.item() / len(data_iter)
        # unsqueeze because label is a one-dimensional vector, same thing
        acc += (Y_pred.topk(1).indices == batch.label.unsqueeze(1)).sum()
        cnt += len(batch.premise)

    return avg_loss, (acc.item() / cnt)  # If item is not extracted, accuracy will be 0


def validating(model, data_iter, loss_fn):
    """ Verification part """
    model.eval()
    data_iter.init_epoch()
    acc, cnt, avg_loss = 0, 0, 0.0

    with torch.set_grad_enabled(False):
        for batch in data_iter:
            Y_pred = model(batch.premise, batch.hypothesis)

            avg_loss += loss_fn(Y_pred, batch.label).item() / len(data_iter)
            acc += (Y_pred.topk(1).indices == batch.label.unsqueeze(1)).sum()
            cnt += len(batch.premise)

    return avg_loss, (acc.item() / cnt)


def train(model, train_data, val_data):
    """ "Training process """
    optimizer = optim.Adam(model.parameters(), lr=args.lr)
    loss_fn = nn.CrossEntropyLoss()
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode='min', factor=0.5, patience=args.scheduler_step, verbose=True)

    train_losses, val_losses, train_accs, val_accs = [], [], [], []

    # Before train
    tic = time.time()
    train_loss, train_acc = validating(model, train_data, loss_fn)
    val_loss, val_acc = validating(model, val_data, loss_fn)
    train_losses.append(train_loss)
    val_losses.append(val_loss)
    train_accs.append(train_acc)
    val_accs.append(val_acc)
    min_val_loss = val_loss
    print(f"Epoch: 0/{args.epoch}\t"
          f"Train loss: {train_loss:4.f}\tacc: {train_acc:4.f}\t"
          f"Val loss: {val_loss:4.f}\tacc: {val_acc:4.f}\t"
          f"Cost time: {(time.time()-tic):2.f}s")

    try:
        for epoch in range(args.epoch):
            tic = time.time()
            train_loss, train_acc = training(
                model, train_data, loss_fn, optimizer)
            val_loss, val_acc = validating(model, val_data, loss_fn)
            train_losses.append(train_loss)
            val_losses.append(val_loss)
            train_accs.append(train_acc)
            val_accs.append(val_acc)
            scheduler.step(val_loss)

            print(f"Epoch: {epoch + 1}/{args.epoch}\t"
                  f"Train loss: {train_loss:4.f}\tacc: {train_acc:4.f}\t"
                  f"Val loss: {val_loss:4.f}\tacc: {val_acc:4.f}\t"
                  f"Cost time: {(time.time()-tic):2.f}s")

            if val_loss < min_val_loss:  # Instant save
                min_val_loss = val_loss
                take_snapshot(model, args.snapshot)

            # Early - stop:
            # if len(val_losses) >= 3 and (val_loss - min_val_loss) / min_val_loss > args.early_stop_ratio:
            # print(f"Early stop with best loss: {min_val_loss:.5f}")
            # break
            # args.early_stop_ratio *= args.early_stop_ratio

    except KeyboardInterrupt:
        print("Interrupted by user")

    return train_losses, val_losses, train_accs, val_accs


train_losses, val_losses, train_accs, val_accs = train(
    model, train_iter, dev_iter)
Epoch: 0/64   Train loss: 1.3871  acc: 0.3335  Val loss: 1.3871  acc: 0.3331  Cost time: 364.32s
Epoch: 1/64   Train loss: …       acc: …       Val loss: 0.9643  acc: 0.7760  Cost time: …
Snapshot has been saved to /content/drive/My Drive/Colab Notebooks/ESIM.pt
Epoch: 2/64   Train loss: 0.9476  acc: 0.7925  Val loss: 0.9785  acc: 0.7605  Cost time: 1003.32s
Epoch: 3/64   Train loss: 0.9305  acc: 0.8100  Val loss: 0.9204  acc: 0.8217  Cost time: 999.49s
Snapshot has been saved to /content/drive/My Drive/Colab Notebooks/ESIM.pt
Epoch: 4/64   Train loss: 0.9183  acc: 0.8227  Val loss: 0.9154  acc: 0.8260  Cost time: …
Snapshot has been saved to /content/drive/My Drive/Colab Notebooks/ESIM.pt
Epoch: 5/64   Train loss: 0.9084  acc: 0.8329  Val loss: 0.9251  acc: 0.8156  Cost time: 996.99s
...
Epoch: 21/64  Train loss: 0.8236  acc: 0.9198  Val loss: 0.8912  acc: 0.8514  Cost time: 992.48s
Epoch: 22/64  Train loss: 0.8210  acc: 0.9224  Val loss: 0.8913  acc: 0.8514  Cost time: 996.35s
Epoch    22: reducing learning rate of group 0 to 5.0000e-05.
Epoch: 23/64  Train loss: 0.8195  acc: 0.9239  Val loss: 0.8940  acc: 0.8485  Cost time: 1000.48s
Epoch: 24/64  Train loss: 0.8169  acc: 0.9266  Val loss: 0.8937  acc: 0.8490  Cost time: 1006.78s
Interrupted by user

Draw the loss-accuracy curve

iters = [i + 1 for i in range(len(train_losses))]

# Prevent KeyboardInterrupt from causing unequal length of loss between sets
min_len = min(len(train_losses), len(val_losses))

# Draw a two-ordinate graph
fig, ax1 = plt.subplots()
ax1.plot(iters[:min_len], train_losses[:min_len], '-', label='train loss')
ax1.plot(iters[:min_len], val_losses[:min_len], '-', label='val loss')
ax1.set_xlabel("Epoch")
ax1.set_ylabel("Loss")

# create subaxes
ax2 = ax1.twinx()
ax2.plot(iters[:min_len], train_accs[:min_len], ':', label='train acc')
ax2.plot(iters[:min_len], val_accs[:min_len], '-', label='val acc')
ax2.set_ylabel("Accuracy")

# Add a legend to the two-ordinate graph
handles1, labels1 = ax1.get_legend_handles_labels()
handles2, labels2 = ax2.get_legend_handles_labels()
plt.legend(handles1 + handles2, labels1 + labels2, loc='center right')
plt.show()

Making predictions

Beyond good training metrics, the model should also be usable in practice.

nlp = spacy.load("en")

# Reload the best model parameters saved during training
load_snapshot(model, args.snapshot)
# For small inputs, running on the CPU is faster
model.to(torch.device("cpu"))

with open(r"/content/drive/My Drive/Colab Notebooks/vocab_label_stoi.pkl"."rb") as f:
    vocab_stoi, label_stoi = pickle.load(f)
Load snapshot from /content/drive/My Drive/Colab Notebooks/ESIM.pt done.
def sentence2tensor(stoi, sent1: str, sent2: str):
    """ Convert two sentences into tensors """
    sent1 = [str(token) for token in nlp(sent1.lower())]
    sent2 = [str(token) for token in nlp(sent2.lower())]

    tokens1, tokens2 = [], []

    for token in sent1:
        tokens1.append(stoi[token])

    for token in sent2:
        tokens2.append(stoi[token])

    delt_len = len(tokens1) - len(tokens2)

    if delt_len > 0:
        tokens2.extend([1] * delt_len)
    else:
        tokens1.extend([1] * (-delt_len))

    tensor1 = torch.LongTensor(tokens1).unsqueeze(0)
    tensor2 = torch.LongTensor(tokens2).unsqueeze(0)

    return tensor1, tensor2


def use(model, premise: str, hypothesis: str):
    """ Testing with a model """
    label_itos = {0: '<unk>', 1: 'entailment', 2: 'contradiction', 3: 'neutral'}

    model.eval()
    with torch.set_grad_enabled(False):
        tensor1, tensor2 = sentence2tensor(vocab_stoi, premise, hypothesis)
        predict = model(tensor1, tensor2)
        top1 = predict.topk(1).indices.item()

    print(f"The answer is '{label_itos[top1]}'")

    prob = predict.cpu().squeeze().numpy()
    plt.bar(["<unk>"."entailment"."contradiction"."neutral"], prob)
    plt.ylabel("probability")
    plt.show()

After two sentences are entered, the most likely relationship is printed and the probability of each class is shown as a bar chart.

# entailment
use(model,
    "A statue at a museum that no seems to be looking at.",
    "There is a statue that not many people seem to be interested in.")

# contradiction
use(model,
    "A land rover is being driven across a river.",
    "A sedan is stuck in the middle of a river.")

# neutral
use(model,
    "A woman with a green headscarf, blue shirt and a very big grin."."The woman is young.")
The answer is 'entailment'

The answer is 'contradiction'

The answer is 'neutral'