Can’t tell Tan Zhuo from Hao Lei? Let’s download 200 photos of each and let deep learning help us tell them apart.
The problem
In my earlier article, How to Recognize Images with Python and Deep Neural Networks?, I showed you how deep learning can teach a computer to tell the robots Wall-E and Doraemon apart.
It wasn’t long before a reader left a message asking:
Teacher, I want to train an image classifier myself. Where can I batch-download labeled training images?
Let me tell you how I find images when I write a tutorial.
The largest image library, of course, is Google.
Under Google Images, type “Walle”.
How’s that? I think the search results match the requirements.
Not only do you find a batch of high-quality images, but Google has labeled them for you.
The next step, of course, is to download the images.
I asked my students to try it for real: each was to find two different sets of images and train a deep learning classifier by following the tutorial.
None of the solutions I offered them (several different Chrome add-ons) worked well.
Some downloaded only a few images and then stopped working, or even crashed the whole browser.
Some downloaded the same images repeatedly.
The students told me that the easiest and most effective way was to click and download the images manually, one by one…
This is obviously not a good idea.
Pain points
Wanting to batch-download high-quality, labeled images from Google’s image library efficiently is hardly a niche need.
Is it true that no one has tried to solve this public pain point?
Today, by accident, I found a great GitHub project called google-images-download.
The GitHub repo link is here.
The project launched only 5 months ago and has already reached 2,000 stars, so it seems to be very popular.
google-images-download is a Python script.
With it, you can run a Google image search and a bulk download with a single command.
It is also cross-platform, with support for Linux, Windows and macOS.
It’s a blessing for lazy people.
The installation
Installing google-images-download is simple.
Just run the following command in the terminal:
pip install google_images_download
The installation is complete.
Of course, this requires that you already have Python installed on your system.
If you haven’t installed it yet, or if you’re not familiar with terminal commands, my article How to Install the Python Runtime Environment Anaconda? explains how to download and install Anaconda and how to work on the command line.
Try it out
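Before moving on, it can help to confirm that the command really landed on your PATH. The following is only a minimal sketch using Python's standard library; the helper name find_command is my own, not part of google-images-download.

```python
import shutil


def find_command(name):
    """Return the full path of a command found on PATH, or None if it is missing."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_command("googleimagesdownload")
    if path is None:
        # Typical cause: pip installed the package into a different Python environment.
        print("googleimagesdownload was not found on PATH")
    else:
        print("googleimagesdownload found at", path)
```

If the command is not found, re-running the pip command inside the environment you actually use is a sensible first thing to try.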
Enter the download directory:
cd ~/Downloads
Let’s try downloading some pictures.
In Dying to Survive, there is an actress named Tan Zhuo, who gives a fine performance. But at first, I thought she was Hao Lei.
Let’s try to download some pictures of Tan Zhuo.
Run this in the terminal:
googleimagesdownload -k "谭卓" -l 20
To clarify, -k stands for “keyword”. It is followed by the keyword you are searching for, wrapped in double quotation marks.
As you can see, using Chinese keywords is also fine.
Then comes -l, which stands for “limit”: it specifies the maximum number of images you want to download.
In this case, we want 20.
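If you would rather launch the download from a Python script, the same two parameters can be assembled into an argument list. This is just a sketch: build_download_command is a hypothetical helper of mine, and it uses only the -k and -l flags described above.

```python
import shlex


def build_download_command(keyword, limit):
    """Build the argument list for one googleimagesdownload run."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # -k is the search keyword, -l is the maximum number of images.
    return ["googleimagesdownload", "-k", keyword, "-l", str(limit)]


command = build_download_command("Tan Zhuo", 20)
# shlex.join quotes the keyword so the line can be pasted into a shell.
print(shlex.join(command))
```

To actually run it, you could pass the list to subprocess.run(command, check=True). Passing a list rather than a single string means a keyword with spaces needs no manual quoting.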
Here’s how it works:
The execution is complete.
As you can see, an error occurred during the download process.
But the program persevered and helped us complete the download process.
Let’s see what happens.
The downloaded images are stored under ~/Downloads/downloads/, in a subdirectory that google-images-download has helpfully created and named after the keyword.
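Since some downloads can fail, you may want to check how many images actually arrived in each keyword subdirectory. Here is a rough sketch, assuming the layout described above (one folder per keyword); count_images and the list of file suffixes are my own choices, not something the tool defines.

```python
from pathlib import Path

# Assumed set of image suffixes; extend it if your results use other formats.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def count_images(root):
    """Map each subdirectory name under root to its number of image files."""
    counts = {}
    for folder in sorted(Path(root).iterdir()):
        if folder.is_dir():
            counts[folder.name] = sum(
                1
                for f in folder.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


if __name__ == "__main__":
    root = Path.home() / "Downloads" / "downloads"
    if root.is_dir():
        for name, n in count_images(root).items():
            print(name, n)
```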
Let’s open it up in Finder:
Even after looking for a long time, I still can’t tell some of these photos from Hao Lei.
To separate the two actresses once and for all, let’s download another 200 photos, this time of Hao Lei.
Following the command above, we execute:
googleimagesdownload -k "郝蕾" -l 200
Then… an error:
The solution
Don’t panic when you encounter problems.
You have to look at the error message carefully.
Notice that a keyword appears in it: chromedriver.
What is it?
Let’s go back to the GitHub page of google-images-download and search for the keyword chromedriver.
You will immediately find the following results:
It turns out that if you want more than 100 images, the program must call Selenium and ChromeDriver.
Selenium is installed automatically when you install google-images-download.
All you need to do is download ChromeDriver and specify its path.
The download link is here.
Please select the appropriate version according to your operating system type:
I chose the macOS version.
After downloading, there is only one file in the zip package. Unzip it and put it in the ~/Downloads directory.
Then, execute:
googleimagesdownload -k "郝蕾" -l 200 --chromedriver="./chromedriver"
The new part here is the --chromedriver parameter, which tells google-images-download where the unzipped chromedriver file is located.
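The rule "more than 100 images needs ChromeDriver" is easy to forget, so a script can check it up front instead of failing halfway. A minimal sketch follows; download_command is a hypothetical helper, and the threshold of 100 is simply the figure quoted from the project page above.

```python
SELENIUM_THRESHOLD = 100  # above this many images the tool needs Selenium and ChromeDriver


def download_command(keyword, limit, chromedriver=None):
    """Build the argument list, insisting on a chromedriver path for large requests."""
    if limit > SELENIUM_THRESHOLD and chromedriver is None:
        raise ValueError("more than 100 images requested: pass the chromedriver path")
    command = ["googleimagesdownload", "-k", keyword, "-l", str(limit)]
    if chromedriver is not None:
        command.append("--chromedriver=" + chromedriver)
    return command


# Small request: no chromedriver needed.
print(download_command("Tan Zhuo", 20))
# Large request: the path to the unzipped chromedriver file is required.
print(download_command("Hao Lei", 200, chromedriver="./chromedriver"))
```

Failing early with a clear message is a design choice; the tool itself reports the problem only once the download is attempted.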
This time the machine worked hard and helped us download Hao Lei’s photos.
Downloading 200 pictures takes a while. Please be patient.
Finished.
There were again some errors along the way, and some images were not downloaded correctly.
Fortunately, it didn’t make much difference to the overall result.
To be on the safe side, I suggest setting the download limit somewhat higher than the number of images you actually need.
Give yourself a margin of safety.
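How much margin is enough? One rough way to estimate it is to divide the number of images you want by the fraction you expect to succeed. The sketch below assumes a failure rate that you supply yourself; the 10% in the example is purely illustrative, not a measured figure for this tool.

```python
import math


def limit_with_margin(wanted, expected_failure_rate):
    """Return the limit to request so that about `wanted` images survive failures."""
    if not 0 <= expected_failure_rate < 1:
        raise ValueError("expected_failure_rate must be in [0, 1)")
    # If a fraction r of downloads fails, request wanted / (1 - r), rounded up.
    return math.ceil(wanted / (1 - expected_failure_rate))


# Illustrative only: wanting 200 images with an assumed 10% failure rate.
print(limit_with_margin(200, 0.1))
```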
Let’s open the Hao Lei subdirectory under ~/Downloads/downloads/ and take a look:
Can you tell them apart this time?
Homework
I have a homework assignment for you.
You’ve learned how to download photos of Tan Zhuo and Hao Lei with one command.
Can you use the convolutional neural network knowledge we covered earlier, together with TuriCreate (or TensorFlow), to build a model that recognizes photos of the two actresses?
Please let me know your accuracy results after you finish the homework.
Of course, it would be even better if you could use the script we introduced today to download other image collections and perform deep learning exercises.
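As a possible starting point for the homework, the downloaded folders can be turned into labeled training and test lists, using each folder name as the label. This is only a preparation sketch built on the standard library, not TuriCreate or TensorFlow code; labeled_split, the 20% test fraction, and the fixed seed are all my own assumptions.

```python
import random
from pathlib import Path


def labeled_split(root, test_fraction=0.2, seed=0):
    """Return (train, test) lists of (path, label); the label is the folder name."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue
        files = sorted(f for f in folder.iterdir() if f.is_file())
        rng.shuffle(files)
        n_test = int(len(files) * test_fraction)
        # Split per folder so both classes appear in the test set.
        test += [(f, folder.name) for f in files[:n_test]]
        train += [(f, folder.name) for f in files[n_test:]]
    return train, test
```

Holding out a test set per folder keeps both actresses represented when you measure accuracy.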
And feel free to send me your feedback.
If you like this article, please give it a thumbs up. You can also follow and pin my official account “Nkwangshuyi” on WeChat.
If you’re interested in data science, check out the index post for my tutorial series, How to Get Started in Data Science Effectively. There are more interesting problems and solutions there.