Build data iterators for deep learning models

I have been learning the Keras framework recently, and I have to say that I find Keras nicer to work with than PyTorch.

So, let’s take a look at some of the most commonly used data iterators in deep learning.

import os
import re

# Read a data file
def _read_file(filename):
    """Read a file and collapse its contents into a single line."""
    with open(filename, 'r', encoding='utf-8') as f:
        s = f.read().strip().replace('\n', '. ').replace('\t', ' ').replace('\u3000', ' ')
        # Collapse runs of repeated sentence separators into one
        return re.sub(r'(\. )+', '. ', s)

# Article iterator: yields (text, category) pairs
def get_data_iterator(data_path):
    for category in os.listdir(data_path):
        category_path = os.path.join(data_path, category)
        for file_name in os.listdir(category_path):
            yield _read_file(os.path.join(category_path, file_name)), category

it = get_data_iterator(data_path)
print(next(it))
''' ("Japan, America fight for the Title Fight to the death Meet with life and death. The women's World Cup final and the Copa America quarterfinals will no doubt be the focus of attention for football fans and punters around the world on Sunday. Can Japan, the biggest surprise at the Women's World Cup, pull off an Asian miracle? Can the United States, the dominant team in women's soccer, pull off another triple crown? Brazil and Paraguay have narrow rivals. Who will win? Much will be revealed in the wee hours of Monday morning. Japan and America are fighting for the crown. This women's World Cup is about subversion and counter-subversion. Host favourites Germany were beaten by Japan in extra time in the quarter-finals, while fellow favourites Sweden were thrashed 3-1 by Japan in the semi-finals. The United States maintained the dignity of the women's soccer powerhouse, beating Brazil 5-3 in a penalty shootout in the quarterfinals and beating France 3-1 in the semifinals. The U.S. and Japan came into the tournament in strikingly similar fashion, winning the first two sets of the group, losing the final round, drawing in 90 minutes in the quarterfinals, and beating each other 3-1 in the semifinals. The final, whether Japan or the United States wins, will make new history in the Women's World Cup. When two men meet, they will die. There were plenty of surprises at this Copa America. The narrow path between Brazil and Paraguay seems more legendary. The two teams were drawn in Group B, but both of them had drawn in the first two rounds of the group stage. Brazil came from behind to beat Ecuador 4-2 in the second half to top the group, while Paraguay drew 3-3 with Venezuela to finish third, edging out third-placed Costa Rica on goal difference for a place in the last eight. Brazil had to draw Paraguay in the group stage in the last minute. Will their luck repeat in the knockout rounds? Paraguay seemed to lack luck in their previous three group games. Could that be compensated for this? In the other Copa America quarterfinal, Chile topped Group C with 2 wins and 1 draw. Venezuela, the least favored team in Group B, clinched a place in the group with Brazil and Paraguay in the first two rounds. They are unbeaten in the group with three games, one win and two draws, and scored the same four goals as Chile, but conceded one more than Chile. But since they were able to keep a clean sheet against the mighty Brazil, it was no surprise to see another success. ", "Lottery ticket") '''

''' After a bunch of processing... '''

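The processing step is elided above. As a rough idea of what it might involve, here is a minimal sketch, not the author's actual code: it maps each character to an integer id, pads or truncates every sequence to a fixed length, and one-hot encodes the label. The names encode_and_pad, one_hot, pad_sequences_iterator, vocab and categories are all hypothetical, and plain lists are used instead of NumPy arrays to keep the sketch dependency-free. Padding and truncation happen at the front of the sequence, which is the default behaviour of the Keras pad_sequences utility.

```python
def encode_and_pad(text, vocab, max_length, pad_id=0, unk_id=1):
    """Map characters to ids, then left-pad with pad_id or keep the last max_length ids."""
    ids = [vocab.get(ch, unk_id) for ch in text][-max_length:]
    return [pad_id] * (max_length - len(ids)) + ids


def one_hot(label, categories):
    """Return a one-hot list for label, given the ordered list of all categories."""
    return [1.0 if c == label else 0.0 for c in categories]


def pad_sequences_iterator(samples, vocab, categories, max_length):
    """Yield (padded ids, one-hot label) for each (text, category) pair."""
    for text, category in samples:
        yield encode_and_pad(text, vocab, max_length), one_hot(category, categories)


# Toy usage with a hypothetical three-character vocabulary
vocab = {"a": 2, "b": 3, "c": 4}
categories = ["sports", "lottery"]
samples = [("abc", "lottery"), ("abcabz", "sports")]
for ids, label in pad_sequences_iterator(samples, vocab, categories, max_length=5):
    print(ids, label)
```

A real pipeline would build the vocabulary from the training corpus and would probably tokenize into words rather than characters; the shape of the output (a fixed-length id sequence plus a one-hot label) is the part that matters for the iterators that follow.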
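# Build a data iterator that loops over the dataset forever
def get_handled_data_iterator(data_path):
    while True:
        # Re-create the iterator on every pass: an exhausted generator cannot be restarted.
        # get_pad_sequences_iterator and sequences_max_length come from the processing step above.
        pad_sequences_iter = get_pad_sequences_iterator(data_path, sequences_max_length)
        for pad_sequences, label_one_hot in pad_sequences_iter:
            yield pad_sequences, label_one_hot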
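
One detail matters when looping over a generator indefinitely: once a Python generator is exhausted it cannot be restarted, so a fresh iterator has to be created at the start of every pass over the data. A minimal sketch of that pattern, where loop_forever, make_iter and numbers are hypothetical names used only for illustration:

```python
from itertools import islice


def loop_forever(make_iter):
    """Yield items endlessly, calling make_iter() for a fresh iterator on each pass."""
    while True:
        for item in make_iter():
            yield item


def numbers():
    """A stand-in for a data iterator over a tiny two-sample dataset."""
    yield 1
    yield 2


# Take five items from a two-item dataset: the iterator wraps around
print(list(islice(loop_forever(numbers), 5)))  # [1, 2, 1, 2, 1]
```

Passing a function that builds the iterator, rather than the iterator itself, is what makes the wrap-around possible.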

import random
import numpy as np

# Build the batch iterator
def batch_iter(data_path, batch_size=64, shuffle=True):
    """Generate batches of data."""
    handled_data_iter = get_handled_data_iterator(data_path)
    while True:
        data_list = []
        for _ in range(batch_size):
            data = next(handled_data_iter)
            data_list.append(data)
        if shuffle:
            random.shuffle(data_list)
        
        pad_sequences_list = []
        label_one_hot_list = []
        for data in data_list:
            pad_sequences, label_one_hot = data
            pad_sequences_list.append(pad_sequences.tolist())
            label_one_hot_list.append(label_one_hot.tolist())

        yield np.array(pad_sequences_list), np.array(label_one_hot_list)

it = batch_iter(data_path, batch_size=2)
print(next(it))
''' (array([[ 751,  257,  223, ...,  661,  551,    8],
       [ 772,  751,  307, ...,  296, 2015, 1169]]), array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])) '''
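
Shuffling inside a batch only reorders samples that were already next to each other in the stream, and the article iterator reads files one category at a time. If more mixing is wanted, one common option is a shuffle buffer, sketched below. This is an alternative technique rather than part of the code above, and shuffle_buffer, buffer_size and seed are hypothetical names.

```python
import random


def shuffle_buffer(iterable, buffer_size, seed=None):
    """Yield items in approximately random order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in iterable:
        buffer.append(item)
        if len(buffer) >= buffer_size:
            # Move a randomly chosen element to the end and emit it
            idx = rng.randrange(len(buffer))
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
            yield buffer.pop()
    # Drain whatever is left once the input is exhausted
    rng.shuffle(buffer)
    yield from buffer


shuffled = list(shuffle_buffer(range(10), buffer_size=4, seed=0))
print(sorted(shuffled) == list(range(10)))  # True: every item appears exactly once
```

The buffer works lazily, so it can also wrap an endless iterator; a larger buffer_size gives better mixing at the cost of memory.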

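You can then pass the batch iterator to the model for training. Note that fit_generator is deprecated in recent versions of TensorFlow, where model.fit accepts generators directly.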

model.fit_generator(batch_iter(data_path, batch_size=64),
                    steps_per_epoch,
                    epochs=100,
                    verbose=1,
                    callbacks=None,
                    validation_data=None,
                    validation_steps=None,
                    class_weight=None)
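
The call above does not show where steps_per_epoch comes from. Because the batch iterator never ends, Keras needs this value to know when an epoch is over; a common choice is the number of samples divided by the batch size, rounded up. A minimal sketch, assuming the one-folder-per-category layout read by get_data_iterator (count_samples and compute_steps_per_epoch are hypothetical helpers):

```python
import math
import os


def count_samples(data_path):
    """Count the files in every category sub-folder of data_path."""
    total = 0
    for category in os.listdir(data_path):
        category_path = os.path.join(data_path, category)
        if os.path.isdir(category_path):
            total += len(os.listdir(category_path))
    return total


def compute_steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


print(compute_steps_per_epoch(1000, 64))  # 16
```

With these helpers, the value could be computed as compute_steps_per_epoch(count_samples(data_path), 64) before training starts.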

Let’s train the model
