Write a Google Chrome plugin in 15 minutes
Background
- Some time ago, for business reasons, my company needed to crawl data from many overseas clothing websites. At first it was comfortable to use Node's superagent module on traditional sites with no anti-crawling measures; a whole site could be finished in the time it takes to drink a cup of tea.
- The good times did not last. With so many sites to crawl, superagent quickly failed on any site with even light verification, so I fell back on the tried-and-tested headless browser approach and crawled with puppeteer. It is fairly powerful: it can hide the WebDriver flag, set the IP, and even get through slider CAPTCHA login checks. It handled more than 95% of the websites.
- Then one day the company wanted to collect product details from 1688, which turned out to have a very strong anti-crawling mechanism, including a slider verification that was hard to pass by simulated dragging. At first, front-end colleagues analyzed the cookies and found that, after logging in, the cookie could be written directly into the crawler's requests. But the cookie had to be updated every time, which was tedious.
- The company did not need the whole site's catalog, only specified product links. In other words, the user is already on the product detail page and gives you the URL, and you collect that page's information into the company's own system.
- For this kind of page a Chrome plugin is a great fit: there is no need to worry about login or similar steps, you just collect from the page the user already has open.
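The cookie approach described above comes down to sending the logged-in browser's cookies along with each crawler request. A minimal sketch, assuming cookie objects shaped like `{ name, value }`; `toCookieHeader` is a hypothetical helper for illustration, not part of the original project:

```javascript
// Hypothetical helper: join cookies into the value of a "Cookie" request header.
function toCookieHeader(cookies) {
  return cookies
    .filter(c => c && c.name)            // skip empty entries
    .map(c => `${c.name}=${c.value}`)    // name=value pairs
    .join('; ')                          // pairs are separated by "; "
}

const header = toCookieHeader([
  { name: 'session', value: 'abc' },
  { name: 'token', value: '123' }
])
console.log(header) // session=abc; token=123
```

The resulting string would be set as the `Cookie` header of the crawler's requests, which is why it has to be refreshed whenever the login expires.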
Preface
- With a foundation in HTML, CSS and JS and a read through the documentation, you can write a Chrome plugin in minutes. This article shows the simplest way to get the page HTML, get a page cookie, and get the page title. Once you have the HTML, the crawling itself is simple, haha ~
The result
Getting started
- Directory analysis
  - css: common CSS
    - reset.css
    - popup.css
  - imgs: images
  - js: business logic
    - clipboard.js: copies text
    - popup.js: script for the plugin popup page
    - content_script.js: runs in the page you currently have open
    - background.js: the server API calls are implemented here
  - plugins
    - axios-0.21.1: Axios, used for requests
  - background.html: the background page, which can be thought of as an admin back end for your plugin (not required)
  - manifest.json: the configuration file, similar to the configuration in uni-app
  - popup.html: the plugin's page file
A quick look
- css and imgs only hold styles and images; there is not much to say about them
- In js, the main file is content_script.js
- plugins simply holds the Axios library
- popup.html is important; it is the page the plugin displays
Step 1: Create the directory
There are only a few files. For now, treat the file names as fixed, much like the App.vue entry file in a Vue or React project.
Step 2: Edit manifest.json
{
  "name": "crawler",
  "manifest_version": 2,
  "version": "1.0.0",
  "author": "duqingyu",
  "description": "crawler-demo",
  "browser_action": {
    "default_title": "My Google Plugin", // Tooltip shown when hovering over the plugin icon
    "default_popup": "popup.html" // Page shown when the plugin icon is clicked
  },
  "icons": {
    "16": "imgs/logo.png", // Find a logo and put it in the imgs directory; mine is the Red Bull in the picture above
    "24": "imgs/logo.png",
    "48": "imgs/logo.png"
  },
  "background": {
    "page": "background.html" // Hard-coded; not really used in this article, so it can be ignored
  },
  "permissions": [ // Permissions used in this article
    "tabs", // Access the current tab
    "notifications", // Show desktop notifications
    "cookies", // Read cookies
    "http://*/*",
    "https://*/*",
    "declarativeContent"
  ],
  "content_scripts": [
    {
      "matches": [
        "https://*/*", // Pages that content_script.js is injected into; here, any page
        "http://*/*"
      ],
      "js": [
        "js/clipboard.js", // Inject clipboard.js to copy text
        "js/content_script.js" // Inject content_script.js
      ],
      "run_at": "document_start"
    }
  ]
}
Step 3: Edit popup.html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <link rel="stylesheet" href="css/reset.css" />
    <link rel="stylesheet" href="css/popup.css" />
    <script src="js/popup.js" type="module"></script>
  </head>
  <body>
    <div class="popup-wrapper">
      <div class="title"></div>
      <ul class="content-box">
        <li id="crawlerPageHtml" class="item">
          <img class="icon" src="./imgs/htm.png" />
          <span class="text">Get the page HTML</span>
        </li>
        <li class="item">
          <img class="icon" src="./imgs/cookie.png" />
          <input class="input" id="cookieKey" type="text" placeholder="Enter the cookie key" />
          <button id="crawlerPageCookie">Get cookie</button>
        </li>
        <li id="crawlerPageTitle" class="item">
          <img class="icon" src="./imgs/title.png" />
          <span class="text">Get the page title</span>
        </li>
      </ul>
    </div>
  </body>
</html>
- At this point, clicking the plugin icon already shows the popup, but it has no JS behavior yet
Step 4: Edit popup.js to add events to the popup
// Get the current tab
function getCurrentTab() {
  return new Promise(resolve => {
    chrome.tabs.query({ active: true, currentWindow: true }, tabs => {
      resolve(tabs[0])
    })
  })
}

// Desktop notification
function notification({ iconUrl, title, content }) {
  chrome.notifications.create(null, {
    type: 'basic',
    title,
    iconUrl: iconUrl || 'imgs/tip.png',
    // imageUrl: imgUrl,
    message: content,
    contextMessage: 'Google Capture'
  })
}

// Initialization: bind the click handlers
function init() {
  const crawlerPageHtml = document.getElementById('crawlerPageHtml')
  crawlerPageHtml.onclick = crawlerHtml
  const crawlerPageCookie = document.getElementById('crawlerPageCookie')
  crawlerPageCookie.onclick = crawlerCookie
  const crawlerPageTitle = document.getElementById('crawlerPageTitle')
  crawlerPageTitle.onclick = crawlerTitle
}

// Get the page HTML
async function crawlerHtml() {
  const tab = await getCurrentTab()
  tab && getHtml(tab)
}

// Get the page cookie
async function crawlerCookie() {
  const tab = await getCurrentTab()
  tab && getCookie(tab)
}

// Get the page title
async function crawlerTitle() {
  const tab = await getCurrentTab()
  tab && getTitle(tab)
}

// Ask the content script for the page HTML
function getHtml(tab) {
  chrome.tabs.sendMessage(tab.id, { type: 'html' }, async data => {
    alert(data)
    notification({
      title: 'HTML collection result',
      content: 'Collection successful'
    })
    // Now you can send the data to the server with an Ajax request
    // const bgTab = chrome.extension.getBackgroundPage()
    // const res = await bgTab.sendData({ data })
    // notification({
    //   title: 'Collection result',
    //   content: res.code === 1 ? 'Collection successful' : res.msg || 'Collection failed'
    // })
  })
}

// Look up the cookie whose name was typed into the input
function getCookie(tab) {
  const name = document.getElementById('cookieKey').value
  if (!name) {
    return notification({
      title: 'Cookie result',
      content: 'Please enter the key for the cookie'
    })
  }
  chrome.cookies.get({ url: tab.url, name }, cookie => {
    if (!cookie) {
      return notification({
        title: 'Cookie result',
        content: 'No matching cookie found, please clear the cache, refresh and try again'
      })
    }
    console.log(23, cookie)
    const cookieText = `${cookie.name}=${cookie.value}`
    copyCookie(tab, cookieText)
  })
}

// Ask the content script to copy the cookie text to the clipboard
function copyCookie(tab, cookie) {
  chrome.tabs.sendMessage(tab.id, { type: 'clipboard', text: cookie }, () => {
    notification({
      title: 'Cookie result',
      content: 'Cookie copied to clipboard'
    })
  })
}

// Show the page title in a notification
function getTitle(tab) {
  return notification({
    title: 'Page title',
    content: tab.title
  })
}

init()
- chrome.tabs and chrome.notifications are built-in Chrome extension APIs, used here to get the current tab and to show desktop notifications.
- init() binds the click handlers when the popup loads
- Note that the JS must live in a separate file; Chrome plugins do not allow inline scripts
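The callback-style calls in popup.js can also be wrapped in a Promise so that they read naturally with async/await. The following is only a sketch: `sendToTab` is a hypothetical helper, and passing the `chrome` object in as a parameter is an assumption made here so that the function can be tried outside the browser with a stand-in object.

```javascript
// Wrap chrome.tabs.sendMessage in a Promise.
// `chromeApi` is the real `chrome` object inside the extension; it is a
// parameter here (an assumption of this sketch) so a stand-in can be used.
function sendToTab(chromeApi, tabId, message) {
  return new Promise((resolve, reject) => {
    chromeApi.tabs.sendMessage(tabId, message, response => {
      // chrome.runtime.lastError is set when the message could not be delivered
      const err = chromeApi.runtime && chromeApi.runtime.lastError
      if (err) {
        reject(new Error(err.message))
      } else {
        resolve(response)
      }
    })
  })
}

// Stand-in for the real `chrome` object, for demonstration only.
const fakeChrome = {
  runtime: {},
  tabs: {
    sendMessage(tabId, message, callback) {
      callback(`tab ${tabId} received ${message.type}`)
    }
  }
}

sendToTab(fakeChrome, 1, { type: 'html' }).then(reply => console.log(reply))
```

Inside the popup this would be called as `await sendToTab(chrome, tab.id, { type: 'html' })`.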
Step 5: Edit content_script.js so that the popup can communicate with the current page
// Listen for messages from the popup
chrome.runtime.onMessage.addListener((msg, sender, sendResponse) => {
  if (msg.type === 'html') {
    const node = document.querySelector('html')
    sendResponse(node.innerHTML)
  } else if (msg.type === 'clipboard') {
    // Copy to the clipboard, then reply once the copy has finished
    Promise.resolve(clipboard(msg.text)).then(() => sendResponse())
    return true // Keep the message channel open for the asynchronous reply
  }
  // Add more message types here
  // code...
})
- content_script.js is injected into each page you open, and it can use the DOM API there
- So the HTML can be read directly with document.querySelector('html') and returned to the popup
- The clipboard method comes from clipboard.js, which the manifest injects alongside content_script.js
- At this point the feature is basically done, and you can happily process the data and send it to the server ~
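As more message types are added, the if/else chain in the listener can be replaced by a table of handlers. This is a minimal sketch, not the original project's code: `createListener` and the `{ error }` reply shape are assumptions made for illustration. It relies on the documented rule that an onMessage listener must return `true` when it will call `sendResponse` asynchronously.

```javascript
// Build an onMessage listener from a map of handlers keyed by message type.
// Handlers may return a plain value or a Promise.
function createListener(handlers) {
  return (msg, sender, sendResponse) => {
    const handler = handlers[msg.type]
    if (!handler) return false // unknown type: nothing to send back
    new Promise(resolve => resolve(handler(msg, sender)))
      // the { error } reply shape is an assumption of this sketch
      .then(sendResponse, err => sendResponse({ error: err.message }))
    return true // keep the message channel open for the asynchronous reply
  }
}

const listener = createListener({
  // in a real content script this would read the DOM instead
  html: () => '<head></head><body></body>',
  clipboard: async msg => {
    // copy msg.text to the clipboard here
  }
})

// In the extension: chrome.runtime.onMessage.addListener(listener)
listener({ type: 'html' }, {}, reply => console.log(reply))
```

Adding a new feature then only means adding one entry to the handler map.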
Step 6: Send Ajax requests (outside the scope of this article, covered only briefly)
- Download axios.js into the plugins directory
- Put the request code in background.js; for a simple plugin, all API-related operations can live directly in the background page.
=======background.html=======
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Background control page</title>
  </head>
  <body>
    This is the background control page
    <script src="plugins/axios-0.21.1.js"></script>
    <script src="js/background.js"></script>
  </body>
</html>
=======background.js=======
/**
 * Send data to the server
 * @param {String} username User name
 * @param {String} data Collected data
 */
async function sendData({ username, data } = {}) {
  return axios.post('/api/...', {
    username,
    data
  })
}
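Once `sendData` resolves, the popup still has to turn the reply into a notification. Note that axios resolves with a response object whose body sits under `data`. The sketch below assumes the server answers with JSON shaped like `{ code, msg }`, where `code === 1` means success, as the commented-out lines in popup.js suggest; `toNotification` is a hypothetical helper.

```javascript
// Turn an axios response into the fields used by the popup's notification() helper.
// Assumption: the response body looks like { code, msg } and code === 1 means success.
function toNotification(response) {
  const body = (response && response.data) || {}
  return {
    title: 'Collection result',
    content: body.code === 1 ? 'Collection successful' : body.msg || 'Collection failed'
  }
}

console.log(toNotification({ data: { code: 1 } }))
console.log(toNotification({ data: { code: 0, msg: 'Duplicate product' } }))
```

In the popup, the result would be passed straight on, for example `notification(toNotification(res))`.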
Installation
- Open the Chrome extensions management page chrome://extensions/, turn on developer mode, and load the plugin folder directly with "Load unpacked"
- To share the plugin with someone else, click "Pack extension" on the same page, select the directory, and send them the generated .crx file
Summary
- popup.html is the plugin's page
- popup.js binds the events and does the data retrieval; when DOM access is needed, for example to get page nodes, it talks to content_script.js through the chrome.tabs.sendMessage messaging mechanism
- Once you have the data, use Axios to interact with the server
- For requirements like this one, a Chrome plugin is a very handy tool
- The directory layout in this article is only one example. In practice it can be organized very flexibly, and plugins can be packaged in different ways. This article only briefly covers some of the development steps
Appendix
- Full source code on GitHub; stars are welcome
- Personal blog