Cat-petting with code! This article is participating in the [Cat Essay Activity]
Background
I’ve seen plenty of videos online of other shovel officers’ cat masters playing the piano. Our cat master is no less noble, but without a piano at home he has no way to show off his artistic talent.
As a qualified shovel officer, who else could I blame but myself? When the conditions aren’t there, you create them. So we decided to build an “electronic” piano for our cat master so he could chase his musical dreams.
That said, time is limited (someone has to earn canned-food money for the cat master), so there was no way to buy mechanical parts for a real keyboard. The plan became a “beggar’s edition”: give the cat master a set of “virtual” piano keys instead.
Besides, as we all know, there are only two big obstacles to learning the piano: the left hand and the right hand. So we are going to do something different and play the piano with the head.
We therefore decided to make a “cat head piano” for the cat master: a camera captures the position of the cat’s head, and different positions map to different keys and different notes. This solves the time and money problem, saves the cat master from losing his temper at us if he can’t learn it, and still lets him enjoy the joy of music. Three birds with one stone, what’s not to like?
(Cat master: is this how you fool your employees?)
Overall design scheme
To build the cat head piano, three problems need to be solved:
- How to capture video of the cat master
- How to detect the position of the cat’s head
- How to play the sounds and display the keyboard
First, video capture. I ordered a cheap USB camera during the Nov 11 sale, but thanks to slow shipping it still hasn’t arrived, so for now I record video on my phone to simulate the camera.
Cat head detection is the hardest part, and I plan to solve it with deep learning. First, collect some pictures of the cat master and label the cat’s head with an annotation tool. Then train the open-source YOLOv5 model from Ultralytics. Finally, use SenseTime’s recently open-sourced OpenPPL for model inference and deployment.
As for playing sounds, I plan to use mingus. The video processing and the graphical interface are handled with OpenCV.
Image acquisition and annotation
Using a phone to capture images is easy enough; the hard part is getting the cat master to cooperate.
Here I bribed him with yogurt and walked him around the house several times, and finally picked out a few usable videos.
(Cat master: aren’t I cute enough to deserve a drink for free?)
When I was picking out the videos I hadn’t installed OpenCV on my Windows PC yet, so I used a website to convert the video into images: www.img2go.com/convert-to-… Of course, OpenCV’s VideoCapture can do this too.
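For reference, extracting frames with VideoCapture only takes a few lines (the file and directory names here are made up):

import cv2

cap = cv2.VideoCapture('cat.mp4')              # the phone video
idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if idx % 10 == 0:                          # keep one frame out of every ten
        cv2.imwrite('frames/%05d.jpg' % idx, frame)
    idx += 1
cap.release()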
After converting the video into images, I labeled them with an annotation tool found on GitHub: github.com/tzutalin/la…
- As we all know, cats are liquid, so labeling the whole cat body is not good for detection; we chose to label only the cat’s head
- Because the background is simple and the task is narrow, cat head detection works after labeling only a small number of pictures
- Pay attention to the annotation format and make sure the labels can be read by the YOLOv5 training code (an example label line is shown below)
In the end we annotated 430 pictures: 400 for the training set, 20 for the validation set, and 10 for the test set.
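For reference, YOLOv5 expects one .txt label file per image, with one line per object in the form class x_center y_center width height, all normalized to [0, 1]. A single cat head (class 0) would look something like the line below (the numbers are made up for illustration):

0 0.512 0.430 0.210 0.185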
YOLOv5 model training
Model training
Clone YOLOv5 from GitHub: github.com/ultralytics…
The training process does not require many code changes; we directly use the training script train.py from the repo:
python .\train.py --data .\data\cat.yaml --weights .\weights\yolov5s.pt --img 160 --epochs 3000
In order to accelerate the training speed, the image size can be appropriately reduced. The image size used in this experiment is (160, 160).
Model test
After training, use pictures that were not labeled to check whether the model works. The test script is likewise based on YOLOv5’s detect.py
(Figure: detection results of the early-stage model on training images)
(Figure: detection results of the early-stage model on validation images)
As you can see, the model can already detect the cat master correctly
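For anyone reproducing this step, the test run is roughly the following command (the weights path is illustrative, and the exact flags may differ slightly between YOLOv5 versions):

python .\detect.py --weights .\runs\train\exp\weights\best.pt --source .\test_images --img 160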
Model export
To make deployment easier, we convert the PyTorch model to an ONNX model. YOLOv5’s export script export.py can be used directly for this conversion.
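For reference, the export is roughly a one-liner like the following (the weights path is illustrative; older YOLOv5 releases export ONNX without the --include flag):

python .\export.py --weights .\runs\train\exp\weights\best.pt --img 160 --include onnx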
Inference deployment
Inference framework installation
For the deployment framework we chose SenseTime’s OpenPPL, which runs efficiently on x86/CUDA and comes with a Python interface, making it easy to build a deployment project: github.com/openppl-pub…
Download & compile OpenPPL:
git clone https://github.com/openppl-public/ppl.nn.git
cd ppl.nn
./build.sh -DHPCC_USE_X86_64=ON -DHPCC_USE_OPENMP=ON -DPPLNN_ENABLE_PYTHON_API=ON
After compiling, the pyppl package is generated in the ./pplnn-build/install/lib directory. Set PYTHONPATH so that pyppl can be imported.
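For example, when running from the ppl.nn directory on Linux:

export PYTHONPATH=$PYTHONPATH:$(pwd)/pplnn-build/install/lib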
Model inference
Following the Python example in the repo, I wrote a ModelRunner that, given an input, runs the network and returns its outputs:
import copy
import numpy as np
from pyppl import nn as pplnn
from pyppl import common as pplcommon

# g_pplnntype2numpytype is a dict mapping OpenPPL data types to numpy dtypes,
# copied from the Python example (tools/pplnn.py) in the ppl.nn repo


def RegisterEngines():
    engines = []
    # create x86 engine
    x86_options = pplnn.X86EngineOptions()
    x86_engine = pplnn.X86EngineFactory.Create(x86_options)
    engines.append(pplnn.Engine(x86_engine))
    return engines


class ModelRunner(object):
    def __init__(self, model_path):
        self.__initialize(model_path)

    def __initialize(self, model_path):
        # register engines
        engines = RegisterEngines()
        if len(engines) == 0:
            raise Exception('failed to register engines')
        # create runtime builder
        runtime_builder = pplnn.OnnxRuntimeBuilderFactory.CreateFromFile(model_path, engines)
        if not runtime_builder:
            raise Exception('failed to create runtime builder from file: %s' % (model_path))
        # create runtime
        self.runtime = runtime_builder.CreateRuntime()
        if not self.runtime:
            raise Exception('failed to create runtime')

    def get_input_tensor_shape(self):
        return self.runtime.GetInputTensor(0).GetShape().GetDims()

    def forward(self, input):
        if not self.runtime:
            raise Exception('runtime not created')
        # get input tensor info
        tensor = self.runtime.GetInputTensor(0)
        shape = tensor.GetShape()
        np_data_type = g_pplnntype2numpytype[shape.GetDataType()]
        dims = shape.GetDims()
        # feed input data
        input = np.ascontiguousarray(input)  # use contiguousarray to avoid calc error
        status = tensor.ConvertFromHost(input)
        if status != pplcommon.RC_SUCCESS:
            raise Exception('failed to set input data')
        # start inference
        status = self.runtime.Run()
        if status != pplcommon.RC_SUCCESS:
            raise Exception('failed to run')
        # wait for inference to finish
        status = self.runtime.Sync()
        if status != pplcommon.RC_SUCCESS:
            raise Exception('failed to sync')
        # get output data
        out_datas = {}
        for i in range(self.runtime.GetOutputCount()):
            # get output tensor info
            tensor = self.runtime.GetOutputTensor(i)
            tensor_name = tensor.GetName()
            # fetch output data
            tensor_data = tensor.ConvertToHost()
            if not tensor_data:
                raise Exception('failed to get output ' + tensor_name)
            out_data = np.array(tensor_data, copy=False)
            out_datas[tensor_name] = copy.deepcopy(out_data)
        return out_datas
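A quick usage sketch (the model file name is just an example):

runner = ModelRunner('cat_head.onnx')      # exported YOLOv5 ONNX model (example name)
print(runner.get_input_tensor_shape())     # e.g. [1, 3, 160, 160]
outputs = runner.forward(input_tensor)     # input_tensor: preprocessed array (see the next section)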
Data pre-processing and post-processing
The data preprocessing code is relatively simple:
# preprocess
img = cv2.resize(img, (self.input_img_w, self.input_img_h)) # resize
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) # BGR -> RGB
img = img.transpose(2, 0, 1) # HWC -> CHW
img = img.astype(dtype = np.float32) # uint8 -> fp32
img /= 255 # normalize
img = np.expand_dims(img, axis=0) # add batch dimension
Since the input shape is (160, 160) and all training images share the same shape, letterbox is not used here.
As for post-processing: a standard YOLOv5 model has three outputs, which need to be combined with the anchors of the different levels to compute the final box positions. However, the ONNX model exported by the repo already does this for us, so we only need to filter the results by box_score and class_score and run NMS. I won’t post the post-processing code here
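For readers who want a rough idea anyway, here is a minimal sketch of what that filtering might look like. It is not the project’s actual code: it assumes the exported model has a single output of shape (1, N, 5 + num_classes) whose columns are x, y, w, h, objectness and the class scores, and the thresholds are chosen arbitrarily:

import cv2
import numpy as np

def postprocess(pred, conf_thres=0.4, iou_thres=0.45):
    pred = pred[0]                                  # (N, 5 + num_classes)
    scores = pred[:, 4] * pred[:, 5:].max(axis=1)   # box_score * best class_score
    keep = scores > conf_thres
    boxes_xywh = pred[keep, :4]                     # center-format boxes
    scores = scores[keep]
    # NMSBoxes expects [x, y, w, h] with the top-left corner
    boxes = boxes_xywh.copy()
    boxes[:, 0] -= boxes[:, 2] / 2
    boxes[:, 1] -= boxes[:, 3] / 2
    idxs = cv2.dnn.NMSBoxes(boxes.tolist(), scores.tolist(), conf_thres, iou_thres)
    return [(boxes_xywh[i], scores[i]) for i in np.array(idxs).flatten()]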
Making the keys
Key sounds
I use Python’s mingus library for key sound, which is very simple to install:
pip3 install mingus
pip3 install fluidsynth
Playing notes is also very simple, with just a few lines of code:
from mingus.midi import fluidsynth
fluidsynth.init('/usr/share/sounds/sf2/FluidR3_GM.sf2', 'alsa') # for ubuntu
fluidsynth.play_Note(64, 0, 100) # Standard a1
The keyboard display
The keyboard graphics are drawn with OpenCV.
There are four kinds of keys on a piano keyboard in total: three kinds of white keys and one kind of black key.
from enum import Enum

class KeyType(Enum):
    WHITE_KEY = 0
    WHITE_KEY_LEFT = 1
    WHITE_KEY_RIGHT = 2
    BLACK_KEY = 3
PianoKey exposes a play(self, position) interface: once position falls within the key’s area, the key is considered pressed and the sound corresponding to that key is played.
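The article does not show the class itself, so the following is only a rough sketch of what such a key might look like; the rectangle-based geometry and the note handling are my own assumptions, not the original implementation:

from mingus.midi import fluidsynth

class PianoKey(object):
    def __init__(self, x, y, w, h, midi_note, key_type):
        self.rect = (x, y, w, h)       # key area in image coordinates (assumed)
        self.midi_note = midi_note     # MIDI note number this key plays
        self.key_type = key_type       # KeyType, used when drawing the key
        self.pressed = False

    def play(self, position):
        # position is the point to test, e.g. the center of the detected cat head
        x, y, w, h = self.rect
        px, py = position
        self.pressed = x <= px < x + w and y <= py < y + h
        if self.pressed:
            fluidsynth.play_Note(self.midi_note, 0, 100)
        return self.pressed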
Here is a picture of the keyboard (yellow marks the key being pressed, with a color scheme only a programmer could love):
The final result
Put the above modules together and you get our “cat head piano”
Here is a final screenshot: the red box is the detected cat head, and the red dot is the center of the detection box. It is this dot that presses the keys:
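For completeness, the main loop that ties the modules together looks roughly like the sketch below. The names preprocess, postprocess, keys and draw_keyboard stand in for the pieces described above, and the scaling between model-input coordinates and the displayed frame is omitted:

cap = cv2.VideoCapture('cat.mp4')                        # or the USB camera index, once it arrives
while True:
    ok, frame = cap.read()
    if not ok:
        break
    outputs = runner.forward(preprocess(frame))          # ModelRunner inference
    detections = postprocess(list(outputs.values())[0])  # filtered cat-head boxes
    if detections:
        (x, y, w, h), score = detections[0]              # take one detected cat head
        center = (int(x), int(y))                        # box center (cx, cy)
        for key in keys:
            key.play(center)                             # press whichever key the head is over
    draw_keyboard(frame, keys)                           # hypothetical drawing helper
    cv2.imshow('cat head piano', frame)
    if cv2.waitKey(1) == 27:                             # ESC to quit
        break
cap.release()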
Video demo
Video demo: www.bilibili.com/video/BV17h…
Github Repo link: github.com/ZichenTian/…