Cuddle cats with code! This article is participating in the [Cat Essay Campaign].

Meow language dataset

Link: the dataset paper

Researchers from the University of Milan's computer science department collected 440 sounds from 21 cats of two breeds: the Maine Coon and the European Shorthair.

The recordings capture the cats in three states:

  • Brushing: the cat is brushed at home by its owner
  • Isolation in an unfamiliar environment: the cat is moved by its owner to an unfamiliar environment (journeys are kept short and use familiar means of transport to avoid distressing the animal; transport lasts less than 30 minutes, the cat then gets up to 30 minutes with its owner to recover from the trip, and is finally left alone in the unfamiliar environment for up to 5 minutes)
  • Waiting for food: the owner begins preparing food in a normal environment familiar to the cat

In addition, each sound file provides the following information (see the parsing sketch after this list):

  • Unique ID of the cat
  • Breed
  • Sex (female, intact; female, neutered; male, intact; male, neutered)
  • Unique ID of the cat's owner
  • Recording session
  • Vocalization counter
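
The labeling code later in this post reads the cat's state from the first character of each file name ('B', 'F', or 'I'), so the recording context at least is encoded in the name. Assuming the other fields are also packed into the name as underscore-separated tokens in the order listed above (an assumption — check it against your downloaded files), a tiny parser could look like:

def parse_meta(file_name):
    """Hypothetical parser: the field order and layout are assumptions"""
    stem = file_name.rsplit(".", 1)[0]   # drop the .wav extension
    fields = stem.split("_")
    contexts = {"B": "brushing", "F": "waiting for food", "I": "isolation"}
    return {"context": contexts.get(fields[0][0], "unknown"), "fields": fields}

# parse_meta("B_XXXXX_MC_FN_YYYYY_101.wav")   # made-up file name for illustration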

Neural network meow recognition

Next, using this meow dataset, we build a classification neural network with MegEngine and judge a cat's state from its call.

Data processing

Place the downloaded meow dataset under ./dataset/ (the path used in the code below). Each sound file is first converted into feature values.

Each sound file is 1 to 2 seconds long and converts into at least 8,600 sample values. To get fixed-length inputs, take the middle 8,000 values of each file as its features.

Take 80% of the files as the training set and the rest as the test set.

The code is as follows:

import os
import wave
import numpy as np
import random

def file_name(file_dir):
    """Get all file names in the folder"""
    for _, _, files in os.walk(file_dir):
        return files

def Read_WAV(wav_path):
    """Convert a sound file to an array of sample values"""
    wav_file = wave.open(wav_path, 'r')
    numframes = wav_file.getnframes()           # number of sample points
    Wav_Data = wav_file.readframes(numframes)
    wav_file.close()
    Wav_Data = np.frombuffer(Wav_Data, dtype=np.int16)   # np.fromstring is deprecated
    return Wav_Data

# fetch data
data = []
data_path = "./dataset/"
for f in file_name(data_path):
    s = Read_WAV(data_path + f)
    s = (s - min(s)) / (max(s) - min(s))    # min-max normalize to [0, 1]
    s = s.astype(np.float32)                # MegEngine parameters default to float32

    # Assign the label from the first character of the file name
    if f[0] == 'B':          # brushing
        label = 0
    elif f[0] == 'F':        # waiting for food
        label = 1
    elif f[0] == 'I':        # isolation
        label = 2
    else:
        print("error " + f)
        continue

    # take the middle 8000 values as features
    begin = (s.shape[0] - 8000) // 2
    s = s[begin : begin + 8000]
    
    data.append([s, label])

random.shuffle(data)      # Scramble data

# Divide training set and test set
train_data = data[:352]      # 80% of 440 files = 352 for training
test_data = data[352:]       # the remaining 88 files for testing

Create a data reader

You need to define a custom Dataset class and override its __len__ and __getitem__ methods:

from megengine.data import DataLoader
from megengine.data.dataset import Dataset
from megengine.data.sampler import RandomSampler, SequentialSampler

# custom Dataset
class myDataSet(Dataset):
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        return self.data[index][0], self.data[index][1]

trainData = myDataSet(train_data)
testData = myDataSet(test_data)

batch_size = 64

train_sampler = RandomSampler(trainData, batch_size=batch_size)
test_sampler = SequentialSampler(testData, batch_size=batch_size)

train_dataloader = DataLoader(trainData, train_sampler)
test_dataloader  = DataLoader(testData, test_sampler)
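
As a quick sanity check, you can pull a single batch from the loader (a minimal sketch; the shapes assume the 8,000-value features and batch size 64 set above, and the final batch will be smaller since 352 = 5 × 64 + 32):

for batch_data, batch_label in train_dataloader:
    print(batch_data.shape)    # expected: (64, 8000)
    print(batch_label.shape)   # expected: (64,)
    break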

Building a neural network

Construct a simple fully connected neural network.

import megengine.module as M
import megengine.functional as F

class myNet(M.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = M.Linear(8000, 2048)
        self.fc2 = M.Linear(2048, 512)
        self.fc3 = M.Linear(512, 64)
        self.classifier = M.Linear(64, 3)

        self.relu = M.ReLU()

    def forward(self, x):
        x = F.flatten(x, 1)
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.relu(self.fc3(x))
        x = self.classifier(x)
        return x

# instantiate the network
net = myNet()
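
Before training, it's worth confirming the network accepts the expected input shape (a minimal sketch with random data; the float32 cast matches MegEngine's default parameter dtype):

import numpy as np
import megengine as mge

dummy = mge.tensor(np.random.rand(2, 8000).astype("float32"))   # fake batch of 2 clips
out = net(dummy)
print(out.shape)    # expected: (2, 3), one logit per state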

Training

Let's start training the network~

from megengine.optimizer import SGD
from megengine.autodiff import GradManager
import numpy as np
import megengine as mge

optimizer = SGD(net.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
gm = GradManager().attach(net.parameters())

losses = []

net.train()

total_epochs = 300
for epoch in range(total_epochs):
    total_loss = 0
    for step, (batch_data, batch_label) in enumerate(train_dataloader):
        batch_data = mge.tensor(batch_data)
        batch_label = mge.tensor(batch_label).astype(np.int32)

        with gm:
            pred = net(batch_data)
            loss = F.loss.cross_entropy(pred, batch_label)
            gm.backward(loss)
            optimizer.step().clear_grad()

        total_loss += loss.numpy().item()

    if epoch % 50 == 0:
        print("epoch: {}, loss {}".format(epoch, total_loss/len(train_data)))
    losses.append(total_loss/len(train_data))

Draw the loss curve

As the curve shows, the loss has leveled off after 300 epochs of training.

But it never dropped very far…

import matplotlib.pyplot as plt

x = [i for i in range(len(losses))]
plt.plot(x, losses)
plt.show()

Evaluate the model

The final model's recognition accuracy was only 51.42% QAQ

Nowhere near the 95% accuracy of the methods in the paper

net.eval()
correct = 0

for step, (batch_data, batch_label) in enumerate(test_dataloader):   # evaluate on the test set, not the training set
    batch_data = mge.tensor(batch_data)
    pred = net(batch_data)

    pred = pred.numpy()
    correct += np.sum(np.argmax(pred, axis=1) == batch_label)

print(correct / len(test_data) * 100)
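
To see which of the three states get mixed up, a confusion matrix helps (a minimal sketch reusing the net and test_dataloader defined above; rows are true labels, columns are predictions, in the order brushing / food / isolation):

import numpy as np
import megengine as mge

confusion = np.zeros((3, 3), dtype=np.int64)
for batch_data, batch_label in test_dataloader:
    pred = net(mge.tensor(batch_data)).numpy().argmax(axis=1)
    for t, p in zip(np.asarray(batch_label).astype(int), pred):
        confusion[t, p] += 1
print(confusion)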

Conclusion

This was my first time using MegEngine to build and train a network. Training was fast and writing the code felt much like other frameworks, so it was easy to get started.

Since it was a first try, I only attempted a simple fully connected network, and the accuracy was not great. Other models can be tried later to improve it.
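
As one possible starting point, here is a minimal sketch of a small 1-D convolutional network for the same 8,000-sample inputs (not a tuned model, and it assumes M.Conv1d is available in your MegEngine version; strided convolutions stand in for pooling to keep the layer list short):

import megengine.module as M
import megengine.functional as F

class myConvNet(M.Module):
    def __init__(self):
        super().__init__()
        # strided 1-D convolutions shrink the 8000-sample waveform step by step
        self.conv1 = M.Conv1d(1, 16, kernel_size=64, stride=8)    # -> (N, 16, 993)
        self.conv2 = M.Conv1d(16, 32, kernel_size=32, stride=8)   # -> (N, 32, 121)
        self.fc = M.Linear(32 * 121, 64)
        self.classifier = M.Linear(64, 3)
        self.relu = M.ReLU()

    def forward(self, x):
        x = F.expand_dims(x, 1)            # (N, 8000) -> (N, 1, 8000)
        x = self.relu(self.conv1(x))
        x = self.relu(self.conv2(x))
        x = F.flatten(x, 1)
        x = self.relu(self.fc(x))
        return self.classifier(x)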