Getting a cat fix with code! This article is participating in the [Cat Essay Activity].

Background

I've seen a lot of videos online of fellow shovel-poop officers getting their cat masters to play the piano. We also have a noble cat master at home, but with no piano in the house, there has been no way for him to show off his artistic talent.

As a qualified shovel-poop officer, I can shortchange anyone except the cat master: if the conditions aren't there, you create them. So we decided to build an "electronic" piano for our cat master so he could pursue his musical dreams.

That said, even though we call it an "electronic" piano, time is limited (I still have to earn canned-food money for the cat master), and there was no way to buy mechanical parts for a real keyboard. So the plan is a "budget version": a set of "virtual" piano keys for the cat master.

Besides, as we all know, there are only two big obstacles to learning the piano: the left hand and the right hand. So we are going to do something different and play the piano with the head.

So we decided to make a "cat head piano" for the cat master: a camera captures the position of the cat's head, and different positions correspond to different keys and different notes. This solves the time and money problem, avoids the cat master losing his temper at us if the lessons don't go well, and still lets him enjoy the joy of music. Three birds with one stone, what's not to like?

(Cat master: Is this how they fool their employees?)

Overall design scheme

To build the cat head piano, three problems need to be solved:

  • How to capture video of the cat master
  • How to detect the position of the cat's head
  • How to play the sounds and display the keyboard

First, video capture. I bought a cheap USB camera during the Nov 11 sale, but it hasn't arrived yet because of slow delivery, so for now I record video on my phone to simulate the camera.

Cat head detection is the hardest part, and I'm going to use deep learning for it. First, collect some pictures of the cat master and label the cat's head with an annotation tool. Then train Ultralytics' open-source YOLOv5 model. Finally, use SenseTime's recently open-sourced OpenPPL for model inference deployment.

As for playing the sounds, I'm going to use mingus. Video processing and the graphical interface are handled with OpenCV.

Image acquisition and annotation

Using a phone to capture images is fine, but the challenge is getting the cat master to cooperate.

Here I bribed him with yogurt, walked him around the house several times, and finally got some usable videos.

(Cat master: I'm this cute and all I get is a sip of yogurt?)

When I extracted frames from the video, I didn't have OpenCV installed on my Windows PC yet, so I used a website to convert the video into pictures: www.img2go.com/convert-to-… Of course, OpenCV's VideoCapture can also do this, as sketched below.
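For reference, here is a minimal frame-extraction sketch with OpenCV (the video path, output directory, and sampling interval are placeholders of my choosing):

import os
import cv2

os.makedirs('frames', exist_ok=True)    # output directory for the extracted frames
cap = cv2.VideoCapture('cat.mp4')       # path to the phone video (placeholder)
idx = saved = 0
while True:
    ok, frame = cap.read()
    if not ok:                          # stop when the video ends
        break
    if idx % 10 == 0:                   # keep one frame out of every 10
        cv2.imwrite('frames/%05d.jpg' % saved, frame)
        saved += 1
    idx += 1
cap.release()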

After converting the video to images, I used an annotation tool found on GitHub: github.com/tzutalin/la…

  • As we all know, cats are fluid, so labeling the whole cat body is not good for detection; we chose to label only the cat's head
  • Because the background is simple and the task is narrow, cat head detection can be trained with only a few labeled pictures
  • Pay attention to the annotation format: make sure it is one the YOLOv5 model can read (see the note below)

We ended up annotating 430 pictures: 400 for the training set, 20 for the validation set, and 10 for the test set.
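A note on that format: YOLOv5 expects the standard YOLO layout, i.e. one .txt file per image with one line per box in the form "class x_center y_center width height", where the coordinates are normalized to [0, 1]. For our single cat-head class, a line looks like "0 0.51 0.43 0.21 0.20" (values made up here for illustration). labelImg can write this format directly when its YOLO output mode is selected.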

YOLOv5 model training

Model training

Clone YOLOv5 from GitHub: github.com/ultralytics…

Training requires few changes to the code; we directly use the training script train.py from the repo:

python .\train.py --data .\data\cat.yaml --weights .\weights\yolov5s.pt --img 160 --epochs 3000

To speed up training, the image size can be reduced appropriately; this experiment uses (160, 160).
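The cat.yaml passed to --data is assumed to be a standard YOLOv5 dataset config: it points to the training and validation image folders and declares nc: 1 with a single class name (for example cat_head).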

Model test

After training, use the unlabeled pictures to verify that the model works. The test script also follows YOLOv5's detect.py.
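For a quick check, something like python detect.py --weights runs/train/exp/weights/best.pt --source <path to test images> --img 160 should work; the exact weight path and flags depend on the YOLOv5 version and the training run.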

Detection results from the early stage of model training:

Detection results from the early stage of model validation:

As you can see, the model can already correctly detect the cat master's head.

Model export

To make deployment easier, we convert the PyTorch model to an ONNX model. YOLOv5's export script export.py can be used directly for the conversion.
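Depending on the YOLOv5 version, the command is roughly python export.py --weights runs/train/exp/weights/best.pt --img 160 --include onnx (older versions ship the script as models/export.py and export ONNX without an --include flag); check the repo for the exact options.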

Inference deployment

Inference framework installation

For the deployment framework I chose SenseTime's OpenPPL, which runs efficiently on x86/CUDA and has a Python interface that makes it easy to build deployment projects: github.com/openppl-pub…

Download & compile OpenPPL:

git clone https://github.com/openppl-public/ppl.nn.git
cd ppl.nn
./build.sh -DHPCC_USE_X86_64=ON -DHPCC_USE_OPENMP=ON -DPPLNN_ENABLE_PYTHON_API=ON

After compiling, the pyppl package is generated in the ./pplnn-build/install/lib directory. Add that directory to PYTHONPATH so that pyppl can be imported.
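For example, assuming a bash shell and the build directory above: export PYTHONPATH=$PYTHONPATH:$(pwd)/pplnn-build/install/lib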

Model inference

Following the Python example in the repo, I wrote a ModelRunner that produces the network outputs for a given input:

import copy
import numpy as np

# pyppl is the Python package built above; the import style follows the OpenPPL Python example
from pyppl import nn as pplnn
from pyppl import common as pplcommon

# g_pplnntype2numpytype maps OpenPPL data types to numpy dtypes; its definition
# (a small dict) follows the OpenPPL Python example and is omitted here

def RegisterEngines():
    engines = []

    # create x86 engine
    x86_options = pplnn.X86EngineOptions()
    x86_engine = pplnn.X86EngineFactory.Create(x86_options)

    engines.append(pplnn.Engine(x86_engine))
    return engines

class ModelRunner(object):
    def __init__(self, model_path):
        self.__initialize(model_path)

    def __initialize(self, model_path):
        # register engines
        engines = RegisterEngines()
        if len(engines) == 0:
            raise Exception('failed to register engines')

        # create runtime builder
        runtime_builder = pplnn.OnnxRuntimeBuilderFactory.CreateFromFile(model_path, engines)
        if not runtime_builder:
            raise Exception('failed to create runtime builder from file: %s' % (model_path))

        # create runtime
        self.runtime = runtime_builder.CreateRuntime()
        if not self.runtime:
            raise Exception('failed to create runtime')

    def get_input_tensor_shape(self):
        return self.runtime.GetInputTensor(0).GetShape().GetDims()

    def forward(self, input):
        if not self.runtime:
            raise Exception('runtime not created')

        # get input tensor info
        tensor = self.runtime.GetInputTensor(0)
        shape = tensor.GetShape()
        np_data_type = g_pplnntype2numpytype[shape.GetDataType()]
        dims = shape.GetDims()

        # feed input data
        input = np.ascontiguousarray(input) # use contiguousarray to avoid calc error
        status = tensor.ConvertFromHost(input)
        if status != pplcommon.RC_SUCCESS:
            raise Exception('failed to set input data')

        # start to inference
        status = self.runtime.Run()
        if status != pplcommon.RC_SUCCESS:
            raise Exception('failed to run')

        # wait for inference finished
        status = self.runtime.Sync()
        if status != pplcommon.RC_SUCCESS:
            raise Exception('failed to sync')

        # get output data
        out_datas = {}
        for i in range(self.runtime.GetOutputCount()):
            # get output tensor info
            tensor = self.runtime.GetOutputTensor(i)
            tensor_name = tensor.GetName()
            # fetch output data
            tensor_data = tensor.ConvertToHost()
            if not tensor_data:
                raise Exception('failed to get output ' + tensor_name)

            out_data = np.array(tensor_data, copy=False)
            out_datas[tensor_name] = copy.deepcopy(out_data)

        return out_datas
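A quick usage sketch of ModelRunner (the model file name is a placeholder; the preprocessing that produces img is shown in the next section):

runner = ModelRunner('cat_head.onnx')   # exported ONNX model
outputs = runner.forward(img)           # img: preprocessed (1, 3, 160, 160) float32 array
                                        # outputs: dict mapping output tensor names to numpy arrays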

Data pre-processing and post-processing

The data preprocessing code is relatively simple:

# preprocess
img = cv2.resize(img, (self.input_img_w, self.input_img_h)) # resize
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)                  # BGR -> RGB
img = img.transpose(2, 0, 1)                                # HWC -> CHW
img = img.astype(dtype = np.float32)                        # uint8 -> fp32
img /= 255                                                  # normalize
img = np.expand_dims(img, axis=0)                           # add batch dimension

Since the input shape is fixed at (160, 160) and the training images were resized the same way, no letterbox padding is used.

As for post-processing, standard YOLOv5 has three output feature maps, which need to be combined with the anchors at each level to compute the output box positions. However, the ONNX model exported by the repo already does this for us, so we just need to filter the results by box_score and class_score and run NMS. I won't post the full post-processing code here.
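Still, for completeness, here is a minimal sketch of that filtering step (not the repo's actual code). It assumes the exported single-class model gives one output of shape (1, N, 6), where each row is cx, cy, w, h, box_score, class_score in input-image pixels:

import numpy as np
import cv2

def postprocess(pred, conf_thres=0.25, iou_thres=0.45):
    pred = pred.reshape(-1, pred.shape[-1])     # (N, 6)
    scores = pred[:, 4] * pred[:, 5]            # box_score * class_score
    mask = scores > conf_thres
    pred, scores = pred[mask], scores[mask]

    # convert (cx, cy, w, h) to top-left (x, y, w, h) for NMS
    boxes = pred[:, :4].copy()
    boxes[:, 0] -= boxes[:, 2] / 2
    boxes[:, 1] -= boxes[:, 3] / 2

    keep = cv2.dnn.NMSBoxes(boxes.tolist(), scores.tolist(), conf_thres, iou_thres)
    keep = np.array(keep, dtype=np.int64).flatten()
    return boxes[keep], scores[keep]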

Making the keys

Key sounds

I use Python's mingus library for the key sounds, which is very simple to install:

pip3 install mingus
pip3 install fluidsynth

Playing notes is also very simple, with just a few lines of code:

from mingus.midi import fluidsynth
fluidsynth.init('/usr/share/sounds/sf2/FluidR3_GM.sf2', 'alsa')    # for ubuntu
fluidsynth.play_Note(64, 0, 100)                                   # Standard a1
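If you prefer note names over raw MIDI numbers, mingus also accepts Note objects, e.g. from mingus.containers import Note and then fluidsynth.play_Note(Note("A-4")). This is standard mingus usage, not something specific to this project.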

The keyboard display

The keyboard graphics are drawn with OpenCV.

There are four kinds of keys on a piano keyboard: three kinds of white keys and one kind of black key.

from enum import Enum

class KeyType(Enum):
    WHITE_KEY = 0
    WHITE_KEY_LEFT = 1
    WHITE_KEY_RIGHT = 2
    BLACK_KEY = 3

The PianoKey class has a play(self, position) interface: once position falls within the key's area, the key is considered pressed and the note bound to that key is played, as sketched below.
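Here is a minimal sketch of that idea (the real implementation is in the repo; the rect/note attributes and the direct fluidsynth call are my assumptions):

from mingus.midi import fluidsynth

class PianoKey(object):
    def __init__(self, rect, note):
        self.rect = rect          # (x1, y1, x2, y2) area of the key on the canvas
        self.note = note          # MIDI note number bound to this key
        self.pressed = False      # whether the key was already down last frame

    def play(self, position):
        x, y = position
        x1, y1, x2, y2 = self.rect
        inside = x1 <= x <= x2 and y1 <= y <= y2
        if inside and not self.pressed:
            fluidsynth.play_Note(self.note, 0, 100)   # sound the note once when the head enters
        self.pressed = inside
        return inside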

Here is a picture of the keyboard (yellow marks the key being pressed; the color scheme is classic programmer taste):

The final result

Putting the above modules together gives us our "cat head piano".

Here is a final screenshot: the red box is the detected cat head, and the red dot is the center of the detection box. This dot is what touches the keys:

Video demo

Video demo: www.bilibili.com/video/BV17h…

Github Repo link: github.com/ZichenTian/…