
preface

I originally didn't plan to write this, but after looking around I found very few MediaPipe tutorials in Chinese, and none of them comprehensive, so I'm making a note of it here.

Environment setup

If you use Anaconda you may already have it installed; if not, you just need to run the following command:

pip install mediapipe

Also, before starting you should already be comfortable with Python 3, PyCharm, and basic OpenCV usage.

Quick start (gesture capture)

Here is the classic example you will see everywhere.

import mediapipe as mp
import cv2


cap = cv2.VideoCapture(0)

mpHand = mp.solutions.hands # MP hand capture
Hand = mpHand.Hands() # Find the hand in the picture
mphanddraw = mp.solutions.drawing_utils # draw tool

while True:
    flag,img = cap.read()


    RGBImage = cv2.cvtColor(img,cv2.COLOR_BGR2RGB) # Convert the picture
    result = Hand.process(RGBImage)

    if(result.multi_hand_landmarks): # If there are hands, then you will get a list of hands to record the coordinates of the hands

        for handlist in result.multi_hand_landmarks:
            mphanddraw.draw_landmarks(img,handlist,mpHand.HAND_CONNECTIONS)
            # HAND_CONNECTIONS draws the connections between the points

    cv2.imshow("Hands",img)

    if cv2.waitKey(1) == ord("q"): break

cap.release()
cv2.destroyAllWindows()

The comments should make this clear enough, so I won't elaborate here.

Getting the coordinates of the hand

As you can see from the previous image, handlist contains the complete coordinates of one hand, from which 21 points can be drawn. So handlist is really the ids and coordinates of 21 landmarks (the coordinates are normalized, i.e. given as a fraction of the frame size); each point corresponds to the figure below.
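Since the coordinates are normalized, converting one to pixels is a one-liner. A minimal sketch (`to_pixels` is a made-up helper name, not part of MediaPipe):

```python
# Landmark x/y are normalized to [0, 1]; multiply by the frame size to
# get pixel coordinates (to_pixels is a hypothetical helper name).
def to_pixels(lm_x, lm_y, frame_w, frame_h):
    """Convert a normalized landmark coordinate to integer pixels."""
    return int(lm_x * frame_w), int(lm_y * frame_h)

# e.g. a landmark at (0.5, 0.25) on a 640x480 frame lands at (320, 120)
print(to_pixels(0.5, 0.25, 640, 480))
```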

So we can capture the state of our hands very clearly. Now let’s redraw our hand.

import mediapipe as mp
import cv2


cap = cv2.VideoCapture(0)

mpHand = mp.solutions.hands # MP hand capture
Hand = mpHand.Hands() # Find the hand in the picture
mphanddraw = mp.solutions.drawing_utils # draw tool

while True:
    flag,img = cap.read()


    RGBImage = cv2.cvtColor(img,cv2.COLOR_BGR2RGB) # Convert the picture
    result = Hand.process(RGBImage)

    if(result.multi_hand_landmarks): # If there are hands, then you will get a list of hands to record the coordinates of the hands

        for handlist in result.multi_hand_landmarks:

            for id,lm in enumerate(handlist.landmark):
                h,w,c = img.shape
                cx,cy = int(lm.x * w),int(lm.y * h)
                cv2.circle(img, (cx, cy), 15, (255, 0, 255), cv2.FILLED)
            mphanddraw.draw_landmarks(img,handlist,mpHand.HAND_CONNECTIONS,)
            # HAND_CONNECTIONS draws the connections between the points

    cv2.imshow("Hands",img)

    if cv2.waitKey(1) == ord("q"): break

cap.release()
cv2.destroyAllWindows()

Then we can optimize the code. Since we can capture the coordinates of every joint of a hand directly on our picture, we can estimate and judge the hand's posture. Now it gets fun. For example:

import mediapipe as mp
import cv2
import math

cap = cv2.VideoCapture(0)
cap.set(3, 1280)  # frame width
cap.set(4, 720)   # frame height

mpHand = mp.solutions.hands # MP hand capture
Hand = mpHand.Hands() # Find the hand in the picture
mphanddraw = mp.solutions.drawing_utils # draw tool

while True:
    flag,img = cap.read()


    RGBImage = cv2.cvtColor(img,cv2.COLOR_BGR2RGB) # Convert the picture
    result = Hand.process(RGBImage)

    if(result.multi_hand_landmarks): # If there are hands, then you will get a list of hands to record the coordinates of the hands

        hands_data = result.multi_hand_landmarks

        for handlist in hands_data:
            h, w, c = img.shape


            shizhi_postion = (int(handlist.landmark[8].x*w),int(handlist.landmark[8].y*h))
            muzhi_postion = (int(handlist.landmark[4].x * w), int(handlist.landmark[4].y * h))
            cv2.line(img, muzhi_postion, shizhi_postion, (255, 0, 0), 5)

            location = (shizhi_postion[0] - muzhi_postion[0]) ** 2 \
                       + (shizhi_postion[1] - muzhi_postion[1]) ** 2

            location = int(math.sqrt(location))  # distance between the two fingertips in pixels

            showpostion = (int((muzhi_postion[0] + shizhi_postion[0]) / 2),
                           int((muzhi_postion[1] + shizhi_postion[1]) / 2))  # midpoint of the line

            cv2.putText(img, str(location), showpostion, cv2.FONT_HERSHEY_PLAIN, 1, (255, 0, 255), 1)

            for id,lm in enumerate(handlist.landmark):

                cx,cy = int(lm.x * w),int(lm.y * h)

                # We mark the distance between thumb and index finger

                cv2.circle(img, (cx, cy), 15, (255, 0, 255), cv2.FILLED)
                cv2.putText(img, str(id), (cx, cy), cv2.FONT_HERSHEY_PLAIN, 1, (0, 255, 255), 1)



            mphanddraw.draw_landmarks(img,handlist,mpHand.HAND_CONNECTIONS,)
            # HAND_CONNECTIONS draws the connections between the points

    cv2.imshow("Hands",img)

    if cv2.waitKey(1) == ord("q"): break

cap.release()
cv2.destroyAllWindows()

So it’s fun.

Description of the returned data

This return value is very important. Look at the previous example: hands_data holds the landmark sets of all detected hands, one per hand on screen, so len(hands_data) is the number of hands.

Each handlist is the set of 21 landmarks for one hand.

handlist.landmark is a list like [{point 0}, {point 1}, …], so getting the thumb tip's pixel x coordinate is just handlist.landmark[4].x * w.
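To make that layout concrete, here is a runnable sketch that mimics the structure with SimpleNamespace stand-ins (the field names `landmark`, `x`, `y` and the thumb-tip index 4 are the real MediaPipe ones; the hand data itself is faked so no camera is needed):

```python
from types import SimpleNamespace

# Fake one detected hand: 21 landmarks with normalized x/y, mimicking
# the shape of result.multi_hand_landmarks (values invented for the demo).
point = SimpleNamespace(x=0.4, y=0.6)
fake_hand = SimpleNamespace(landmark=[point] * 21)
hands_data = [fake_hand]                    # one hand on screen

print(len(hands_data))                      # number of hands -> 1
w, h = 1280, 720
tip = hands_data[0].landmark[4]             # landmark 4 = thumb tip
print(int(tip.x * w), int(tip.y * h))       # pixel position of the thumb tip
```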

Different operators

I'm not sure what the official name is, so I'll just call them operators (MediaPipe calls them solutions). What do they do? They use different algorithmic models to extract different features for us to process.

And you can see there are many different operators. They are invoked in a similar way, though the processing differs slightly. Of course, let's cover hands first; we'll talk about the others later. By the end of this series, maybe we can build a gesture system that controls the computer with body posture, for example using the right hand as a mouse. Or, combined with a VR game driver, we wouldn't need a gamepad at all, only a camera with decent resolution. And since MediaPipe runs directly on the CPU, it works on devices without a GPU; Google officially says it can be used on mobile, Linux, and so on.

OK, with the previous code we could actually build a simple volume controller: we just need to map the finger spacing to a volume level. But that's not our topic; today's topic is recognizing gestures, like the numbers 1 to 5.
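As a sketch of that volume-controller idea: linearly map the pinch distance onto a 0-100 volume. The 20-200 px range is an assumption you would tune for your camera and resolution, `distance_to_volume` is a made-up helper name, and actually setting the system volume is left out:

```python
def distance_to_volume(dist, d_min=20, d_max=200):
    """Clamp a thumb-index distance (px) and map it linearly to 0-100."""
    dist = max(d_min, min(d_max, dist))
    return round((dist - d_min) / (d_max - d_min) * 100)

print(distance_to_volume(20))    # fingers touching  -> 0
print(distance_to_volume(110))   # halfway           -> 50
print(distance_to_volume(300))   # clamped above max -> 100
```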

Gesture recognition case

OK, let's move on to our case. This case was actually part of my earlier post "Opencv Quick to Use (Basic Usage & Gesture Recognition)"; that part was copied directly at the time, and it turned out that author had himself copied the code from a video I posted (a little ridicule: it was written badly and had a serious OOM problem). So I'm writing one myself; it's fun anyway.

Finger state judgment

First of all, comparing the official landmark diagram with the earlier screenshots, it's not hard to see what to do: we just compare the y coordinates of the joints to tell whether a finger is raised.

Pretty intuitive. But there's another problem: the thumb. The thumb is too short for this trick to work directly, so we need to use the x coordinate instead. That raises a further problem: our left and right hands are mirrored, so sometimes we have to decide whether a hand is the left or the right one.

And telling left from right is pretty simple: just look at the x coordinates of landmarks 1 and 5 relative to each other. There is still a catch, though: the back of one hand projects onto the image the same way as the palm of the other, so the check can be fooled when the back of the hand faces the camera.
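A minimal sketch of that check (`guess_hand` is a hypothetical helper; which x ordering means "right" depends on whether your frame is mirrored, so treat the mapping as an assumption to calibrate, and remember it only holds with the palm toward the camera):

```python
from types import SimpleNamespace

def guess_hand(landmark):
    """Guess 'right'/'left' from the x order of landmarks 1 and 5
    (thumb base vs index base), assuming the palm faces the camera."""
    return "right" if landmark[1].x > landmark[5].x else "left"

# Fake landmark list: only indices 1 and 5 matter for this check.
pts = [SimpleNamespace(x=0.0) for _ in range(21)]
pts[1] = SimpleNamespace(x=0.7)   # thumb base to the right...
pts[5] = SimpleNamespace(x=0.4)   # ...of the index base
print(guess_hand(pts))            # -> right
```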

coding

import mediapipe as mp
import cv2
import math

cap = cv2.VideoCapture(0)

mpHand = mp.solutions.hands # MP hand capture
Hand = mpHand.Hands() # Find the hand in the picture
mphanddraw = mp.solutions.drawing_utils # draw tool


TipsId = [4, 8, 12, 16, 20] # landmark ids of the five fingertips
while True:
    flag,img = cap.read()

    RGBImage = cv2.cvtColor(img,cv2.COLOR_BGR2RGB) # Convert the picture
    result = Hand.process(RGBImage)

    if(result.multi_hand_landmarks): # If there are hands, then you will get a list of hands to record the coordinates of the hands

        hands_data = result.multi_hand_landmarks

        for handlist in hands_data:
            h, w, c = img.shape

            fingers = []

            # Judge the thumb switch
            if(handlist.landmark[TipsId[0] -3].x < handlist.landmark[TipsId[0] +1].x):
                if handlist.landmark[TipsId[0]].x >  handlist.landmark[TipsId[0] -1].x:
                    fingers.append(0)
                else:
                    fingers.append(1)
            else:
                if handlist.landmark[TipsId[0]].x < handlist.landmark[TipsId[0] - 1].x:
                    fingers.append(0)
                else:
                    fingers.append(1)
            # Judge other fingers
            for id in range(1, 5):
                if handlist.landmark[TipsId[id]].y > handlist.landmark[TipsId[id] - 2].y:
                    fingers.append(0)
                else:
                    fingers.append(1)
            # Get the number of raised fingers
            totoalfingle = fingers.count(1)
            cv2.putText(img, str(totoalfingle), (50, 50), cv2.FONT_HERSHEY_PLAIN,
                        5, (255, 255, 255), 5)

            # this is just for drawing finger joints, you can ignore this code
            for id,lm in enumerate(handlist.landmark):

                cx,cy = int(lm.x * w),int(lm.y * h)
                cv2.circle(img, (cx, cy), 15, (255, 0, 255), cv2.FILLED)
                cv2.putText(img, str(id), (cx, cy), cv2.FONT_HERSHEY_PLAIN, 1, (0, 255, 255), 1)



            mphanddraw.draw_landmarks(img,handlist,mpHand.HAND_CONNECTIONS,)


    cv2.imshow("Hands",img)

    if cv2.waitKey(1) == ord("q"): break

cap.release()
cv2.destroyAllWindows()


The effect

Upgraded version (Christmas Vader)

We could have stopped here, but then I remembered that we can overlay not just text but also images on the frame. So maybe it can be used for something more interesting, such as confessing to somebody (which I have no use for right now, sadly, so go ahead and feed me dog food, thank you!). The effect and code:

import mediapipe as mp
import cv2
import os

cap = cv2.VideoCapture(0)
cap.set(3, 1280)  # frame width
cap.set(4, 720)   # frame height
mpHand = mp.solutions.hands # MP hand capture
Hand = mpHand.Hands() # Find the hand in the picture
mphanddraw = mp.solutions.drawing_utils # draw tool

MediaPath = "Media"
picsdir = os.listdir(MediaPath)
pics = []
for pic in picsdir:
    img = cv2.imread(f"{MediaPath}/{pic}")
    pics.append(img)




TipsId = [4, 8, 12, 16, 20] # landmark ids of the five fingertips
while True:
    flag,img = cap.read()

    RGBImage = cv2.cvtColor(img,cv2.COLOR_BGR2RGB) # Convert the picture
    result = Hand.process(RGBImage)

    if(result.multi_hand_landmarks): # If there are hands, then you will get a list of hands to record the coordinates of the hands

        hands_data = result.multi_hand_landmarks

        for handlist in hands_data:
            h, w, c = img.shape

            fingers = []

            # Judge the thumb switch
            if(handlist.landmark[TipsId[0] -2].x < handlist.landmark[TipsId[0] +1].x):
                if handlist.landmark[TipsId[0]].x >  handlist.landmark[TipsId[0] -1].x:
                    fingers.append(0)
                else:
                    fingers.append(1)
            else:
                if handlist.landmark[TipsId[0]].x < handlist.landmark[TipsId[0] - 1].x:
                    fingers.append(0)
                else:
                    fingers.append(1)
            # Judge other fingers
            for id in range(1, 5):
                if handlist.landmark[TipsId[id]].y > handlist.landmark[TipsId[id] - 2].y:
                    fingers.append(0)
                else:
                    fingers.append(1)
            # Get the number of fingers and draw the picture
            totoalfingle = fingers.count(1)

            coverpic = pics[totoalfingle - 1]   # picture indexed by finger count
            hc, wc, cc = coverpic.shape
            img[0:hc, 0:wc] = coverpic          # rows (height) first, then columns (width)

            # this is just for drawing finger joints, you can ignore this code
            for id,lm in enumerate(handlist.landmark):

                cx,cy = int(lm.x * w),int(lm.y * h)
                cv2.circle(img, (cx, cy), 15, (255, 0, 255), cv2.FILLED)

            mphanddraw.draw_landmarks(img,handlist,mpHand.HAND_CONNECTIONS,)


    cv2.imshow("Hands",img)

    if cv2.waitKey(1) == ord("q"): break

cap.release()
cv2.destroyAllWindows()
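The overlay step above is plain NumPy slice assignment; the only trap is that the row (height) index comes first. A quick check with dummy arrays instead of camera frames:

```python
import numpy as np

frame = np.zeros((720, 1280, 3), dtype=np.uint8)       # black 1280x720 frame
overlay = np.full((200, 300, 3), 255, dtype=np.uint8)  # white 300x200 picture

hc, wc, _ = overlay.shape
frame[0:hc, 0:wc] = overlay        # rows (height) first, then columns (width)

print(frame[100, 100].tolist())    # inside the overlay  -> [255, 255, 255]
print(frame[500, 500].tolist())    # outside the overlay -> [0, 0, 0]
```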

Prepare the pictures yourself (and if anyone's confession succeeds, remember to give me a kick).

conclusion

That's really all there is to the most basic usage. How you architect things on top of it afterwards is up to you; it doesn't matter much, just build on this template.