Using Puppeteer to scrape the cat pictures you want to steal
I was worn out and wanted to write something just for fun. I love "cloud cat-petting" (admiring other people's cats online), so I wrote a small crawler that scrapes cat pictures every day to recharge my energy. It works much better than a "programmer encouragement specialist" (just kidding)!
Warm-up
Puppeteer is the Google Chrome team's official Headless Chrome tool. It is a Node library that provides a high-level API for controlling Headless Chrome over the DevTools Protocol. It can also be configured to use full (non-headless) Chrome. Chrome has long been the dominant browser, so Headless Chrome is likely to become the industry standard for automated testing of web applications. With Puppeteer you effectively get the capabilities of both Linux and Chrome at once, which opens up a wide range of uses. This repository was created to try all sorts of tricks with GoogleChrome Puppeteer, with the aim of having fun while learning some interesting techniques.
Most of the things you can do manually in a browser can be done with Puppeteer! Here are a few examples to start with:
- Generate screenshots and PDF of the page.
- Crawl a SPA and generate pre-rendered content (that is, "SSR").
- Grab the content you need from the website.
- Automatic form submission, UI testing, keyboard input, etc.
- Create an up-to-date automated test environment. Use the latest JavaScript and browser features to run tests directly in the latest version of Chrome.
- Capture a timeline trace of your site to help diagnose performance issues.
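As a rough illustration of the first item in the list, here is a minimal sketch that saves a screenshot and a PDF of one page. It assumes a `browser` object returned by `puppeteer.launch()`; the helper name `capturePage` and the output file names are made up for this example. The Puppeteer docs have long noted that PDF generation is supported only in headless mode, so treat the `page.pdf()` call as headless-only unless your version says otherwise.

```javascript
// Sketch only: `capturePage` and the file names are illustrative,
// not part of the original project.
// `browser` is expected to be the object returned by puppeteer.launch().
async function capturePage(browser, url, baseName) {
  const page = await browser.newPage();
  try {
    await page.goto(url);
    // Save a full-page screenshot as PNG.
    await page.screenshot({ path: `${baseName}.png`, fullPage: true });
    // Save the same page as an A4 PDF.
    await page.pdf({ path: `${baseName}.pdf`, format: 'A4' });
    return [`${baseName}.png`, `${baseName}.pdf`];
  } finally {
    // Close the tab even if navigation or rendering fails.
    await page.close();
  }
}
```

Passing the browser in, rather than launching it inside the function, lets several captures share one Chrome instance.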
Getting to the point
Using Puppeteer
Getting the desired elements
As the warm-up section showed, Puppeteer simulates a user's actions, so the first step in using Puppeteer is to open a browser.
1. Open the browser
const puppeteer = require('puppeteer');
const options = {
  // Uncomment headless: false to run with a visible window, i.e. a regular Chrome with the address bar and other UI
  // headless: false,
  // getNPMConfig is the author's own helper (not shown here) that supplies the path to the Chrome executable
  executablePath: getNPMConfig('chrome'),
};
const browser = await puppeteer.launch(options);
2. Create a tab
const page = await browser.newPage();
3. Go to the cat page
// Using the Bilibili space of Huahua and Sanmao CatLive as an example
await page.goto('https://space.bilibili.com/9008159/dynamic');
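Steps 1 to 3 can be wrapped in one helper so that Chrome is always shut down, even when something fails halfway. This is only a sketch: `withPage` is a hypothetical helper, not a Puppeteer API, and `launchOptions` stands for whatever you would pass to `puppeteer.launch()`.

```javascript
// Sketch only: `withPage` is a hypothetical helper, not part of Puppeteer.
// `puppeteer` is the module from require('puppeteer').
async function withPage(puppeteer, launchOptions, url, work) {
  const browser = await puppeteer.launch(launchOptions); // step 1: open the browser
  try {
    const page = await browser.newPage(); // step 2: create a tab
    await page.goto(url);                 // step 3: go to the address
    return await work(page);
  } finally {
    // Always close the browser so no Chrome process is left behind.
    await browser.close();
  }
}

// Usage (not run here):
// const puppeteer = require('puppeteer');
// const title = await withPage(
//   puppeteer, {}, 'https://space.bilibili.com/9008159/dynamic',
//   (page) => page.title());
```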
4. The key step: get the DOM elements
Because this is a dynamically rendered page, fetching the HTML and parsing it with jsdom is likely to fail, since the elements may not exist yet when the HTML arrives. That is why I chose Puppeteer: its waitForSelector method waits until the specified DOM element has loaded.
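When the selector never appears, `waitForSelector` rejects once its timeout (30 seconds by default) runs out. The sketch below shows one way to turn that into a `null` result instead of a crash. `waitForOrNull` is a hypothetical helper, and the check on the error's `name` is an assumption about how Puppeteer labels its timeout errors.

```javascript
// Sketch only: `waitForOrNull` is a hypothetical helper.
async function waitForOrNull(page, selector, timeoutMs = 10000) {
  try {
    // Resolves with an element handle once the selector matches.
    return await page.waitForSelector(selector, { timeout: timeoutMs });
  } catch (err) {
    // Assumption: Puppeteer's timeout errors carry the name 'TimeoutError'.
    if (err && err.name === 'TimeoutError') return null;
    throw err; // anything else is unexpected, so rethrow it
  }
}
```

A caller can then decide whether a missing list means "no posts today" or a changed page layout.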
Let's take a look at the DOM elements we need:
The outermost parent of .zoom-list is .s-space, and each image has the class .img-content. The image itself is not set through the src attribute of an <img> tag but through background-image. So we can steal the cat!
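Since the picture lives in `background-image`, the value we read looks like `url("...")` rather than a bare address. A common shortcut is to split the string on the double quote; the sketch below is a slightly more tolerant alternative that also accepts single quotes or no quotes. `extractBackgroundUrl` is a hypothetical helper name.

```javascript
// Sketch only: `extractBackgroundUrl` is a hypothetical helper.
function extractBackgroundUrl(backgroundImage) {
  // Matches url("..."), url('...') and url(...) and captures the inner part.
  const match = /url\(\s*(["']?)(.*?)\1\s*\)/.exec(backgroundImage || '');
  return match ? match[2] : null;
}

// Usage:
// extractBackgroundUrl('url("https://example.com/cat.jpg")')
// returns 'https://example.com/cat.jpg'
```

Returning `null` for values such as `none` makes it easy to filter out elements that have no picture.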
// It is best to wait for an outer element: if you wait for an inner element and it does not appear within Puppeteer's timeout, an error is thrown (this step is optional)
await page.waitForSelector('.s-space .zoom-list');
// $$eval: the first argument is the selector for the DOM elements to fetch; the second is a callback that operates on the matched elements
const content = await page.$$eval('.s-space .zoom-list .card', items =>
  items.map(
    item =>
      item
        .querySelector('.img-content')
        .style.backgroundImage.split('"')[1]
  )
);
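The extraction above throws if any card lacks an `.img-content` child. A more defensive variant might look like the sketch below. `collectImageUrls` is a hypothetical helper, the selectors are the ones used in this article, and the idea that some cards carry no image (for example, text-only posts) is an assumption rather than something verified against the page.

```javascript
// Sketch only: `collectImageUrls` is a hypothetical helper.
async function collectImageUrls(page) {
  const urls = await page.$$eval('.s-space .zoom-list .card', (cards) =>
    // This callback runs inside the page, so it cannot use variables from Node.
    cards.map((card) => {
      const img = card.querySelector('.img-content');
      if (!img) return null; // assumed case: a card without an image
      const parts = img.style.backgroundImage.split('"');
      return parts.length > 1 ? parts[1] : null;
    })
  );
  // Drop the cards that had no usable image URL.
  return urls.filter((url) => url !== null);
}
```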
This gives us the paths to our images. Help yourself!
Finally, send an email
We use Nodemailer to send the email. If you have not used it before, you can read this article. Also note that, at least in my attempts, Nodemailer only sent HTML-styled content when it ran on a server. We just need to do some simple processing on the data we obtained:
const htmlContent = content.reduce((acc, cur) => {
  acc += `<div><img style='height: 100px' src='${cur}' alt='' /></div>`;
  return acc;
}, '');
// Then add the html field to the options object passed as the first argument to sendMail:
// options = { ..., html: htmlContent }
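To show where the `html` field ends up, here is a hedged sketch of the sending side. `buildMailOptions` and `sendCatMail` are hypothetical helpers, and the subject, addresses, SMTP host, port and credentials are all placeholders. The only Nodemailer calls used are `createTransport` and `sendMail`.

```javascript
// Sketch only: `buildMailOptions` and `sendCatMail` are hypothetical helpers,
// and the subject and addresses are placeholders.
function buildMailOptions(imageUrls, from, to) {
  // One <div><img></div> per URL, joined into a single HTML string.
  const html = imageUrls
    .map((url) => `<div><img style="height: 100px" src="${url}" alt="" /></div>`)
    .join('');
  return { from, to, subject: 'Cat pictures', html };
}

// `transporter` is expected to come from nodemailer.createTransport(...).
async function sendCatMail(transporter, imageUrls, from, to) {
  return transporter.sendMail(buildMailOptions(imageUrls, from, to));
}

// Usage (not run here; host, port and credentials are placeholders):
// const nodemailer = require('nodemailer');
// const transporter = nodemailer.createTransport({
//   host: 'smtp.example.com', port: 465, secure: true,
//   auth: { user: 'me@example.com', pass: process.env.SMTP_PASS },
// });
// await sendCatMail(transporter, content, 'me@example.com', 'me@example.com');
```

Using `map` and `join` produces the same string as the `reduce` version; it is a matter of taste.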
The email content is as follows:
You can style it more nicely; I only made a simple example. The code does not yet implement a daily scheduled crawl. You can improve it so that it scrapes some pictures every day and sends them to your own mailbox to enjoy! After everything else you have been soaking up, it's time for a change.
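One possible way to add the daily schedule is with plain timers and no extra dependency, as in the sketch below. `msUntilNext` and `scheduleDaily` are hypothetical helpers, and the approach assumes the Node process keeps running; a system scheduler such as cron would be another option.

```javascript
// Sketch only: `msUntilNext` and `scheduleDaily` are hypothetical helpers.
// Returns the milliseconds from `now` until the next local hour:minute.
function msUntilNext(hour, minute, now = new Date()) {
  const next = new Date(now.getTime());
  next.setHours(hour, minute, 0, 0);
  // If that time has already passed today, move to tomorrow.
  if (next <= now) next.setDate(next.getDate() + 1);
  return next.getTime() - now.getTime();
}

// Runs `task` once a day at the given local time, re-arming after each run.
function scheduleDaily(task, hour, minute) {
  setTimeout(async () => {
    try {
      await task();
    } catch (err) {
      console.error('Daily task failed:', err);
    }
    scheduleDaily(task, hour, minute);
  }, msUntilNext(hour, minute));
}

// Usage (not run here): scrape and mail every day at 09:00 local time.
// scheduleDaily(async () => { /* scrape, then send the email */ }, 9, 0);
```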
I hope everyone balances work and rest. Health comes first!
The code is on GitHub. I ran into some problems along the way (the selector pitfall!!). Thanks to Water Song for the patience, and thanks to the other experts in the group for their solutions!