
A whim

First, why I wrote this article. Some time ago I ran into a problem: I could not copy text out of a PDF, and I really needed that text. I tried converting the PDF to a Word document so that I could copy from there, but the conversion tools all cost money, so I dropped that idea. I also thought of taking a screenshot of the text, sending it to my phone, and extracting the text there with the text-extraction feature of mobile QQ.

The result is very good and it does what I need, but it is too much trouble: take a screenshot, then send it to the phone. So I started thinking: could I build this feature myself?

What is learning to program for, if not solving problems? So I considered how QQ implements this feature. Without a doubt it is text recognition (OCR): take an image, recognize the text in it, and display the result.

Preparation

Once the principle is clear, the next question is how to do the text recognition. Implementing it myself is obviously unrealistic, so after a Baidu search I decided to use the text recognition API that Baidu provides. To keep things simple, I implement the whole process in Python.

Apply for the Baidu text recognition API

Let's first apply for Baidu's text recognition API on the Baidu AI Open Platform:

Click the first official link and go to the console:

Log in if you have an account. After logging in you will see the console. Click Text Recognition in the left navigation bar:

Then click Create App and fill in the information. I have already created one here:

Once the app is created, set it aside for now. Its three values (App ID, API Key and Secret Key) will be needed later.

Module introduction

Before writing the code, I will introduce the modules used. First, the feature I want to build: take a screenshot (with QQ, WeChat, or the system screenshot tool) of the text I need, save the image to the computer, extract the text in the image through text recognition, and print it. Let's take a look at the result:

That’s the general function, and you can expand it as you like.

keyboard

First, the keyboard module. We want to grab the screenshot directly; saving it by hand each time would be too much trouble. The image should be saved automatically once the screenshot is finished, so we start by listening for keyboard input. Install the keyboard module by running this command in a CMD window:

pip install keyboard

Let's try the module out.

import keyboard

keyboard.wait(hotkey='s')  # block until 's' is pressed
print("Key 's' was pressed")

The wait function blocks until the given key is pressed. Its hotkey argument names the key or key combination to wait for, so after the program starts it waits until you press 's'. Run the program, press 's', and the output is:

Key 's' was pressed

With that, the rest is simple. I use QQ's screenshot tool, whose shortcut is Ctrl + Alt + A, so that is the shortcut we listen for. The code is as follows:

import keyboard

print("Start screenshot")
keyboard.wait(hotkey='ctrl+alt+a')  # QQ screenshot shortcut
print("Pressed 'ctrl+alt+a'")
keyboard.wait(hotkey='enter')  # Enter finishes the screenshot
print("Pressed 'enter'")
print("End screenshot")

Run the program, take a screenshot as usual, and look at the output:

Start screenshot
Pressed 'ctrl+alt+a'
Pressed 'enter'
End screenshot

That completes listening for the screenshot operation.

ImageGrab

ImageGrab belongs to Pillow, a very good image processing library. We use it to save the captured image. Install Pillow first:

pip install Pillow

ImageGrab lives in the PIL package, which is installed under the name Pillow, so the import reads from PIL. We only need one function from this module. The code is as follows:

import keyboard
from PIL import ImageGrab

print("Start screenshot")
keyboard.wait(hotkey='ctrl+alt+a')
print("Pressed 'ctrl+alt+a'")
keyboard.wait(hotkey='enter')
print("Pressed 'enter'")

# Save the clipboard snapshot
image = ImageGrab.grabclipboard()
image.save('screen.png')

print("End screenshot")

The wait function of the keyboard module waits for us to take the screenshot. Once it is taken, the grabclipboard function of the ImageGrab module takes a snapshot of the current clipboard and returns it as an image, which is then saved with the image's save method. If only a file name is given, the image is saved in the current working directory.

Now let's run the program and capture an arbitrary image:

This does save the screenshot, but careful readers will notice that the first screenshot raises an error, and the second screenshot saves the content of the first one. Why? The program runs too fast: grabclipboard reads the clipboard before the new screenshot has arrived, so it gets whatever was there before. On the first run there is no image in the clipboard yet, hence the error. With the problem found, how do we solve it? Since the read happens too early, we slow it down by adding a short delay. The modified code is as follows:

import keyboard
from PIL import ImageGrab
import time

print("Start screenshot")
keyboard.wait(hotkey='ctrl+alt+a')
print("Pressed 'ctrl+alt+a'")
keyboard.wait(hotkey='enter')
print("Pressed 'enter'")

# The screenshot takes a moment to reach the clipboard; without this
# delay we would read the previous screenshot instead.
time.sleep(0.1)

# Save the clipboard snapshot
image = ImageGrab.grabclipboard()
image.save('screen.png')

print("End screenshot")

A 0.1 second delay before reading the clipboard solves this problem nicely. sleep comes from the time module, which is part of the Python standard library, so there is nothing to install; import time is all that is needed.

At this point, the captured image is saved.
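
The fixed 0.1 second delay works, but how long the clipboard takes to update may vary from machine to machine. As a minimal sketch of an alternative, the helper below polls until an image shows up instead of sleeping for a fixed time. The name wait_for_clipboard_image and its timeout and interval parameters are made up for this illustration; grab can be any function that takes no arguments, for example ImageGrab.grabclipboard.

```python
import time


def wait_for_clipboard_image(grab, timeout=2.0, interval=0.05):
    """Poll grab() until it returns an image-like object or the timeout expires.

    grab is any zero-argument callable (hypothetical parameter for this sketch),
    for example PIL.ImageGrab.grabclipboard.
    Returns the image, or None if nothing usable appeared in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = grab()
        # grabclipboard() can return None (no image) or a list of file names;
        # only an object with a save() method is treated as an image here.
        if hasattr(result, "save"):
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

With Pillow it would be called as image = wait_for_clipboard_image(ImageGrab.grabclipboard). Polling only covers the case where the clipboard holds no image yet. If an older screenshot is still in the clipboard it is returned immediately, so a short sleep before polling may still be useful.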

baidu-aip

Next is Baidu's text recognition API. For a full introduction, see Baidu's official technical documentation; I only cover the parts we need here. The Python SDK is installed with pip install baidu-aip.

AipOcr

AipOcr is a Python SDK client for OCR that provides a series of interactive methods for developers using OCR.

from aip import AipOcr

""" Your APPID AK SK """
APP_ID = 'Your App ID'
API_KEY = 'Your Api Key'
SECRET_KEY = 'Your Secret Key'

client = AipOcr(APP_ID, API_KEY, SECRET_KEY)

The three values come from the application management page shown earlier; just paste them in. This creates the client. The client can also be configured, for example with connection timeouts, but we will not do that here.
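
Pasting the keys into the source file is the quickest route, but it also means the keys travel with the code whenever it is shared. As a hedged sketch, the three values could be read from environment variables instead. The variable names below and the load_credentials helper are hypothetical and not part of the Baidu SDK.

```python
import os

# Hypothetical variable names for this sketch; choose whatever fits your setup.
ENV_NAMES = ("BAIDU_OCR_APP_ID", "BAIDU_OCR_API_KEY", "BAIDU_OCR_SECRET_KEY")


def load_credentials(env=os.environ):
    """Return (app_id, api_key, secret_key) read from the given mapping."""
    missing = [name for name in ENV_NAMES if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in ENV_NAMES)
```

The client would then be created with client = AipOcr(*load_credentials()), and the script itself no longer contains any secrets.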

General text recognition

""" Read the picture """
def get_file_content(filePath):
    with open(filePath, 'rb') as fp:
        return fp.read()

image = get_file_content('example.jpg')

The get_file_content function opens the image at the given path in binary mode and returns its contents as bytes. With the binary data in hand, text recognition can be performed with either of two methods:

  • basicGeneral
  • basicAccurate

Both methods perform text recognition; basicAccurate is the high-accuracy version. Pick whichever suits you.
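
Since the two methods are called the same way, switching between them can be reduced to a flag. The sketch below assumes a client object that exposes basicGeneral and basicAccurate as described above; the recognize helper and its accurate parameter are names invented for this example.

```python
def recognize(client, image_bytes, accurate=True):
    """Run OCR on image_bytes with the chosen method and return the raw result.

    client is expected to offer basicGeneral(image) and basicAccurate(image),
    as AipOcr does in this article.
    """
    method = client.basicAccurate if accurate else client.basicGeneral
    return method(image_bytes)
```

For instance, recognize(client, image, accurate=False) would use the general version, which may be enough for clean screenshots.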

Let’s test whether we can successfully extract the image text, such as the following picture:

from aip import AipOcr

# Call the Baidu API to recognize the image content
APP_ID = '18076523'
API_KEY = 'vlLcZ6VGb88qoAr5IN0OTShw'
SECRET_KEY = '8KzHr2AvEREYGGwdwIMFZSwTUoPB6LC4'

client = AipOcr(APP_ID, API_KEY, SECRET_KEY)  # Create the client object

# Get the binary data of the image
def get_file_content(filePath):
    with open(filePath, 'rb') as fp:
        return fp.read()

image = get_file_content('screen.png')

# Call text recognition (high-accuracy version)
text = client.basicAccurate(image)

print(text)

Running results:

We have the result, so let's process the data. We can ignore log_id. words_result_num is the number of recognized text chunks: the API splits the text in the image into several chunks, four here. The text itself is stored in the words_result list, where each element is a dictionary whose words key holds the text. So the text is easy to retrieve:

from aip import AipOcr

# Call the Baidu API to recognize the image content
APP_ID = '18076523'
API_KEY = 'vlLcZ6VGb88qoAr5IN0OTShw'
SECRET_KEY = '8KzHr2AvEREYGGwdwIMFZSwTUoPB6LC4'

client = AipOcr(APP_ID, API_KEY, SECRET_KEY)  # Create the client object

# Get the binary data of the image
def get_file_content(filePath):
    with open(filePath, 'rb') as fp:
        return fp.read()

image = get_file_content('screen.png')

# Call text recognition (high-accuracy version)
text = client.basicAccurate(image)

# Process the returned data
textList = text['words_result']
for i in textList:
    print(i['words'])

Running results:

So far, we have mastered the text recognition of pictures.
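
The response handling described above can be isolated into a small function and tried without calling the API. In the sketch below, the sample dictionary is invented data that only mirrors the fields named in this article (log_id, words_result_num, words_result, words). The check for an error_code key is an assumption about how failed requests are reported, and extract_lines is a hypothetical helper name.

```python
def extract_lines(result):
    """Return the recognized text chunks from an OCR result dictionary."""
    # Assumption: a failed request carries an error_code key instead of words_result.
    if "error_code" in result:
        raise RuntimeError(
            f"OCR failed: {result.get('error_code')} {result.get('error_msg')}"
        )
    return [item["words"] for item in result.get("words_result", [])]


# Invented sample shaped like the result discussed in the article
sample = {
    "log_id": 1234567890,
    "words_result_num": 2,
    "words_result": [{"words": "first line"}, {"words": "second line"}],
}

print("\n".join(extract_lines(sample)))
```

Keeping this step in its own function makes it easy to join the chunks into one string, or to handle a failed request in one place.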

Program source code

Here is all the code for the program:

import time
import keyboard
from PIL import ImageGrab
from aip import AipOcr

# 1. Capture the image
keyboard.wait(hotkey='ctrl+alt+a')  # Wait for the screenshot shortcut

keyboard.wait(hotkey='enter')  # Wait for the screenshot to be finished

# The screenshot takes a moment to reach the clipboard; without this
# delay we would read the previous screenshot instead.
time.sleep(0.1)

# 2. Save the picture to your computer
image = ImageGrab.grabclipboard()
image.save('screen.png')  # Save the captured image

# 3. Call the Baidu API to recognize the image content
APP_ID = '18076523'
API_KEY = 'vlLcZ6VGb88qoAr5IN0OTShw'
SECRET_KEY = '8KzHr2AvEREYGGwdwIMFZSwTUoPB6LC4'

client = AipOcr(APP_ID, API_KEY, SECRET_KEY)  # Create the client object

# Get the binary data of the image
def get_file_content(filePath):
    with open(filePath, 'rb') as fp:
        return fp.read()

image = get_file_content('screen.png')

# Call text recognition (high-accuracy version)
text = client.basicAccurate(image)
textList = text['words_result']
for i in textList:
    print(i['words'])

The result is the one shown at the beginning. This solves the problem of not being able to copy text from a PDF. Since a screenshot can be taken anywhere, content that cannot be copied elsewhere, such as documents on Baidu Wenku, can be copied this way too.

In closing

Finally, I want to say: treat learning as something fun. Programming is flexible. When you run into a problem, think about building a small tool for it. You solve your own problem and learn a lot along the way, so why not?