Petting cats with code! This post is participating in the [Cat Essay Campaign].
Meow language dataset
Link: dataset paper
Researchers from the computer science department of the University of Milan collected 440 sounds from 21 cats of two breeds: the Maine Coon and the European Shorthair.
The sounds were recorded with the cats in three states:
- Brushing: the cat is brushed at home by its owner
- Isolation in an unfamiliar environment: the cat is taken by its owner to an unfamiliar environment. (To avoid causing the animals discomfort, transport distances are kept short and conventional vehicles are used; journeys last less than 30 minutes. The cat is then given up to 30 minutes with its owner to recover from the trip, before being left alone in the unfamiliar environment for up to 5 minutes.)
- Waiting for food: the owner starts preparing the cat's food in a normal environment familiar to the cat
In addition, each sound file is annotated with:
- the cat's unique ID
- breed
- sex (intact female; neutered female; intact male; neutered male)
- the owner's unique ID
- recording session
- vocalization counter
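The labels used later in this post come straight from these file names: in the dataset, each file name begins with a letter for the recording context (B = brushing, F = waiting for food, I = isolation). Here is a minimal sketch of reading the context back out of a file name; the example file name below is made up for illustration, and the exact naming scheme is documented in the dataset paper:
# The context is encoded in the first letter of the file name (B/F/I).
# The example file name is illustrative; see the dataset paper for the exact scheme.
CONTEXTS = {"B": "brushing", "F": "waiting for food", "I": "isolation"}

def context_of(filename):
    return CONTEXTS.get(filename[0], "unknown")

print(context_of("B_example_cat_01.wav"))  # -> brushing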
Neural network meow recognition
Next, we use MegEngine to build a classification network on this meow dataset and predict a cat's state from its calls.
Data processing
Place the downloaded meow dataset under the path ./dataset/. Each sound file is first converted into feature values.
Each sound file is 1 to 2 seconds long and yields at least 8,600 values; we trim to a uniform length by taking the middle 8,000 values as the features.
Take 80% of the files as the training set and the rest as the test set.
The code is as follows:
import os
import wave
import numpy as np
import random

def file_name(file_dir):
    """Get all file names in the folder"""
    for _, _, files in os.walk(file_dir):
        return files

def Read_WAV(wav_path):
    """Convert a sound file to feature values"""
    wav_file = wave.open(wav_path, 'r')
    numframes = wav_file.getnframes()  # number of sample points
    Wav_Data = wav_file.readframes(numframes)
    Wav_Data = np.frombuffer(Wav_Data, dtype=np.int16)  # np.fromstring is deprecated
    return Wav_Data

# Fetch the data
data = []
data_path = "./dataset/"
for f in file_name(data_path):
    s = Read_WAV(data_path + f)
    # Min-max normalization; cast to float32 for MegEngine
    s = ((s - np.min(s)) / (np.max(s) - np.min(s))).astype(np.float32)
    # Assign the label from the first letter of the file name
    if f[0] == 'B':
        label = 0
    elif f[0] == 'F':
        label = 1
    elif f[0] == 'I':
        label = 2
    else:
        print("error " + f)
        continue
    # Take the middle 8000 values as features
    begin = (s.shape[0] - 8000) // 2
    s = s[begin : begin + 8000]
    data.append([s, label])

random.shuffle(data)  # Shuffle the data
# Split into training set (80%) and test set
train_data = data[:352]
test_data = data[352:]
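Before moving on, a quick sanity check (an illustrative snippet, not part of the pipeline) confirms the split sizes and the feature shape:
# Quick sanity check on the split and feature shape (illustrative)
print(len(train_data), len(test_data))  # expect 352 and 88 for the 440 files
print(train_data[0][0].shape)           # each feature vector has shape (8000,)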
Create a data reader
We need to define a custom Dataset class and override the __len__ and __getitem__ methods:
from megengine.data import DataLoader
from megengine.data.dataset import Dataset
from megengine.data.sampler import RandomSampler, SequentialSampler

# Custom Dataset
class myDataSet(Dataset):
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        return self.data[index][0], self.data[index][1]

trainData = myDataSet(train_data)
testData = myDataSet(test_data)

batch_size = 64
train_sampler = RandomSampler(trainData, batch_size=batch_size)
test_sampler = SequentialSampler(testData, batch_size=batch_size)
train_dataloader = DataLoader(trainData, train_sampler)
test_dataloader = DataLoader(testData, test_sampler)
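To confirm the reader works as expected, we can peek at one batch; MegEngine's DataLoader yields numpy arrays, so a quick illustrative check looks like this:
# Peek at a single batch (illustrative check)
for batch_data, batch_label in train_dataloader:
    print(batch_data.shape, batch_label.shape)  # e.g. (64, 8000) and (64,)
    break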
Building a neural network
Construct a simple fully connected neural network.
import megengine.module as M
import megengine.functional as F

class myNet(M.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = M.Linear(8000, 2048)
        self.fc2 = M.Linear(2048, 512)
        self.fc3 = M.Linear(512, 64)
        self.classifier = M.Linear(64, 3)
        self.relu = M.ReLU()

    def forward(self, x):
        x = F.flatten(x, 1)
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.relu(self.fc3(x))
        x = self.classifier(x)
        return x

# Instantiate the network
net = myNet()
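Before training, we can push a random batch through the untrained network to check that the shapes line up (illustrative only; the batch size 4 is arbitrary):
import numpy as np
import megengine as mge

# Run a random batch through the untrained net to verify shapes (illustrative)
dummy = mge.tensor(np.random.rand(4, 8000).astype(np.float32))
print(net(dummy).shape)  # expect (4, 3): one score per state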
Training
Now let's start training the network~
from megengine.optimizer import SGD
from megengine.autodiff import GradManager
import numpy as np
import megengine as mge

optimizer = SGD(net.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
gm = GradManager().attach(net.parameters())

losses = []
net.train()
total_epochs = 300
for epoch in range(total_epochs):
    total_loss = 0
    for step, (batch_data, batch_label) in enumerate(train_dataloader):
        batch_data = mge.tensor(batch_data)
        batch_label = mge.tensor(batch_label).astype(np.int32)
        with gm:
            pred = net(batch_data)
            loss = F.loss.cross_entropy(pred, batch_label)
            gm.backward(loss)
        optimizer.step().clear_grad()
        total_loss += loss.numpy().item()
    if epoch % 50 == 0:
        print("epoch: {}, loss {}".format(epoch, total_loss / len(train_data)))
    losses.append(total_loss / len(train_data))
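It is also worth saving the trained weights so the evaluation below can be rerun without retraining; here is a minimal sketch using megengine.save (the file path is arbitrary):
# Save the trained weights (the file path is arbitrary)
mge.save(net.state_dict(), "./meow_net.pkl")
# To restore later: net.load_state_dict(mge.load("./meow_net.pkl"))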
Drawing the loss curve
After training for 300 epochs, the loss has leveled off.
But it never showed a significant decrease…
import matplotlib.pyplot as plt

plt.plot(losses)  # x axis: epoch, y axis: average loss
plt.show()
Evaluating the model
The final recognition accuracy is only 51.42% QAQ.
Nowhere near the roughly 95% accuracy that other methods in the paper achieve.
net.eval()
correct = 0
for step, (batch_data, batch_label) in enumerate(test_dataloader):  # evaluate on the test set
    batch_data = mge.tensor(batch_data)
    pred = net(batch_data)
    pred = pred.numpy()
    correct += np.sum(np.argmax(pred, axis=1) == batch_label)
print(correct / len(test_data) * 100)
Conclusion
This was my first time building and training a network with MegEngine. Training felt fast, and writing the code was similar to working with other frameworks, so it was easy to get started.
Since it was a first attempt, I only tried a simple fully connected neural network, and the accuracy was not great. Later I can try other models to improve the accuracy, as sketched below.
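For instance, a small 1D convolutional network may fit raw waveforms better than plain fully connected layers. Below is only a sketch of the idea, assuming MegEngine's Conv1d module; the layer sizes are arbitrary and untested on this dataset:
import megengine.module as M
import megengine.functional as F

# Sketch of a small 1D CNN (layer sizes are arbitrary, not tuned)
class myConvNet(M.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = M.Conv1d(1, 16, kernel_size=64, stride=8)   # (N,1,8000) -> (N,16,993)
        self.conv2 = M.Conv1d(16, 32, kernel_size=16, stride=4)  # (N,16,993) -> (N,32,245)
        self.fc = M.Linear(32 * 245, 64)
        self.classifier = M.Linear(64, 3)
        self.relu = M.ReLU()

    def forward(self, x):
        x = F.expand_dims(x, 1)  # (N, 8000) -> (N, 1, 8000)
        x = self.relu(self.conv1(x))
        x = self.relu(self.conv2(x))
        x = F.flatten(x, 1)
        x = self.relu(self.fc(x))
        return self.classifier(x)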