Author: Shifang
Bert is so well known that it is used everywhere, both in competitions and in practice. The first time the author (Shifang) ran a Bert model, the framework was PaddlePaddle, and training on my own dataset was still painful: not only did I have to read piles of configuration files and preprocessing code, I often had no idea what had gone wrong when an error was reported. Now TensorFlow has been updated to 2.4, and the arrival of TensorFlow Hub has lowered the barrier to using pre-trained models. Let's take a look at how to build a Bert twin-tower recall model in ten minutes.
TensorFlow Hub
Go to the TensorFlow website, open TensorFlow Hub, and you can browse all kinds of pre-trained models. Find a pre-trained Bert model (as shown below) and download it; as described on its page, it is a 12-layer, 768-dimensional, 12-head model.
Looking further down the page, we see the accompanying preprocessing model.
Download it too, and then we can build the Bert towers.
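If you prefer to load both pieces straight from tfhub.dev instead of from the downloaded folders, something like the sketch below should work. The two handles are the standard ones for the BERT-base uncased encoder and its matching preprocessor; treat them as an example rather than part of the original setup, which uses local copies.

import tensorflow_hub as hub
import tensorflow_text as text  # registers the custom ops the preprocessor needs

# Example tfhub.dev handles; the code later in this post uses locally downloaded copies instead.
preprocessor = hub.load("https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
encoder = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4", trainable=True)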
Bert towers
import os
import shutil
import pickle
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text as text
from official.nlp import optimization
from tensorflow.keras import *
from tqdm import tqdm
import numpy as np
import pandas as pd
import json
import re
import random

# read in your own text dataset here
with open('./data/train_data.pickle', 'rb') as f:
    train_data = pickle.load(f)

# generator
def train_generator():
    np.random.shuffle(train_data)
    for i in range(len(train_data)):
        yield train_data[i][0], train_data[i][1]

# dataset
ds_tr = tf.data.Dataset.from_generator(train_generator, output_types=(tf.string, tf.string))

def get_model(dim_size, model_name):
    # the preprocessor downloaded above
    preprocessor = hub.load('./bert_en_uncased_preprocess/3')
    # text for the left tower
    text_source = tf.keras.layers.Input(shape=(), dtype=tf.string)
    # text for the right tower
    text_target = tf.keras.layers.Input(shape=(), dtype=tf.string)
    tokenize = hub.KerasLayer(preprocessor.tokenize)
    tokenized_inputs_source = [tokenize(text_source)]
    tokenized_inputs_target = [tokenize(text_target)]
    seq_length = 512  # this specifies the maximum length of your sequence text
    bert_pack_inputs = hub.KerasLayer(preprocessor.bert_pack_inputs,
                                      arguments=dict(seq_length=seq_length))
    encoder_inputs_source = bert_pack_inputs(tokenized_inputs_source)
    encoder_inputs_target = bert_pack_inputs(tokenized_inputs_target)
    bert_model = hub.KerasLayer(model_name)
    bert_encoder_source, bert_encoder_target = bert_model(encoder_inputs_source), bert_model(encoder_inputs_target)
    # in-batch loss logits
    # can also directly dot bert_encoder_source['pooled_output'] and bert_encoder_target['pooled_output']
    matrix_logit = tf.linalg.matmul(bert_encoder_source['pooled_output'],
                                    bert_encoder_target['pooled_output'],
                                    transpose_a=False, transpose_b=True)
    matrix_logit = matrix_logit / tf.sqrt(dim_size)
    model = models.Model(inputs=[text_source, text_target],
                         outputs=[bert_encoder_source['pooled_output'],
                                  bert_encoder_target['pooled_output'],
                                  matrix_logit])
    return model

bert_double_tower = get_model(128.0, './small_bert_bert_en_uncased_L-2_H-128_A-2_1/3')
bert_double_tower.summary()
We can see that the Bert twin-tower model has been constructed.
Then define the loss and you can train!
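The idea behind the in-batch loss used below: for a batch of N (source, target) pairs, the N x N similarity matrix is treated as N classification problems, and the label is the N x N identity matrix, so every other target in the batch acts as a negative. A toy sketch (batch size 4 is an arbitrary choice for illustration, not part of the model code):

toy_logits = tf.random.normal((4, 4))   # stand-in for matrix_logit on a batch of 4 pairs
toy_labels = tf.eye(4)                   # diagonal marks the true (source, target) pairs
toy_loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True)(toy_labels, toy_logits)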
optimizer = optimizers.Adam(learning_rate=5e-5)
loss_func_softmax = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
train_loss = metrics.Mean(name='train_loss')
train_acc = metrics.CategoricalAccuracy(name='train_accuracy')
def train_step(model, features):
    with tf.GradientTape() as tape:
        p_source, p_target, pred = model(features)
        # in-batch negatives: the i-th source matches the i-th target, so the label is the identity matrix
        label = tf.eye(tf.shape(pred)[0])
        loss = loss_func_softmax(label, pred)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    train_loss.update_state(loss)
    train_acc.update_state(label, pred)
def train_model(model, bz, epochs):
    for epoch in tf.range(epochs):
        steps = 0
        for feature in ds_tr.prefetch(buffer_size=tf.data.experimental.AUTOTUNE).batch(bz):
            logs_s = 'At Epoch={},STEP={}'
            tf.print(tf.strings.format(logs_s, (epoch, steps)))
            train_step(model, feature)
            steps += 1
        # reset the metrics at the end of each epoch
        train_loss.reset_states()
        train_acc.reset_states()
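For completeness, here is one way the pieces above might be used; the batch size, epoch count, and example texts are placeholders rather than values from the original post.

# assumed hyper-parameters, tune them for your own data
train_model(bert_double_tower, bz=32, epochs=3)

# after training, the first two outputs are the pooled source/target vectors,
# which can be indexed for nearest-neighbour recall
queries = tf.constant(["example query text"])
docs = tf.constant(["example candidate text"])
q_vec, d_vec, _ = bert_double_tower([queries, docs])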