Preface

Following on from the last article, I found time during a slow stretch at work to continue the series. This time we'll look at how to store the data the crawler collects.

Installing the required dependencies

1. mysql (the database driver; I assume you are already familiar with the database itself)

npm install mysql

2. co-mysql (lets the callback-style mysql API be used in async/await form)

npm install co-mysql

Basic setup: a simple database configuration

When it comes to databases, the first thing that comes to mind is the information needed to log in, so let's start with a small piece of database configuration. First, create a new file named config.js. (If anything in the code below is unfamiliar, you may need to look it up on Baidu.)

const mysql = require('mysql');

var host = 'localhost';
var IP = 'http://127.0.0.1:3000'
var pool = mysql.createPool({
  host: '127.0.0.1',
  user: 'root',
  password: '', // the value is missing in the original; fill in your own
  database: 'personal-db',
  connectTimeout: 30000
})

module.exports = {
  IP: IP,
  pool: pool,
  host: host,
}
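Hard-coding credentials in config.js is fine for a demo, but a common alternative is to read them from environment variables. The sketch below shows the idea. The helper name `buildPoolOptions` and the variable names `DB_HOST`, `DB_USER`, `DB_PASSWORD` and `DB_NAME` are assumptions made for illustration and are not part of the original project.

```javascript
// Hypothetical helper: build the options object that would be passed to
// mysql.createPool(). The DB_* variable names are illustrative assumptions.
function buildPoolOptions(env = process.env) {
  return {
    host: env.DB_HOST || '127.0.0.1',
    user: env.DB_USER || 'root',
    password: env.DB_PASSWORD || '',
    database: env.DB_NAME || 'personal-db',
    connectTimeout: 30000, // milliseconds, same value as in config.js
  };
}

// Example: override only the password and keep the other defaults.
const options = buildPoolOptions({ DB_PASSWORD: 'secret' });
console.log(options.database); // personal-db
```

In config.js the result would simply replace the inline object, for example `mysql.createPool(buildPoolOptions())`.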

Implementing the code logic

1. Let's start with the code, which is actually quite simple. Let's walk through it.

const mysql = require('./config');
const wrapper = require('co-mysql')

let db = wrapper(mysql.pool); // use co-mysql to connect to our database

2. The MySQL insert statement

let addSql = 'INSERT INTO finance SET ?'
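To make the role of the wrapper clearer, here is a conceptual sketch of how a callback-style `query(sql, values, callback)` method can be exposed as a function that returns a Promise, which is what makes `await db.query(...)` possible. This is not co-mysql's actual source code; `wrapPool` and `fakePool` are hypothetical names, and the fake pool stands in for a real database so the sketch runs on its own.

```javascript
// Conceptual sketch only: this is NOT co-mysql's implementation.
// It turns a callback-style query method into one that returns a Promise.
function wrapPool(pool) {
  return {
    query(sql, values) {
      return new Promise((resolve, reject) => {
        pool.query(sql, values, (err, rows) => {
          if (err) reject(err);
          else resolve(rows);
        });
      });
    },
  };
}

// Hypothetical stand-in for a real pool, so no database is needed here.
const fakePool = {
  query(sql, values, callback) {
    setImmediate(() => callback(null, [{ sql, values }]));
  },
};

const demoDb = wrapPool(fakePool);
const pending = demoDb.query('SELECT ?', [1]);
pending.then(rows => console.log(rows[0].sql)); // logs the SQL text passed in
```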

Many readers may ask why their SQL keeps reporting a syntax error even though they wrote it exactly as the Runoob tutorial shows. The tutorial is not at fault: we are installing the latest version of the mysql package, so the syntax is slightly different.

The older style (see www.runoob.com/nodejs/node…) looks like this; readers with SQL basics can skip it.

var addSql = 'INSERT INTO websites(Id,name,url,alexa,country) VALUES(0,?,?,?,?)';

For the current syntax, refer to the latest documentation (www.npmjs.com/package/mys.).
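With the mysql package, a `?` placeholder that receives an object is expanded into a list of `key = value` pairs, which is why `INSERT INTO finance SET ?` can take the scraped object directly. The function below is a simplified illustration of that expansion, not the library's real escaping code. `toSetClause` is a hypothetical name, and it only handles numbers and strings.

```javascript
// Simplified illustration of how an object becomes a SET clause.
// Not the mysql package's implementation: real escaping covers many more cases.
function toSetClause(row) {
  return Object.entries(row)
    .map(([key, value]) => {
      // Quote the column name with backticks.
      const column = '`' + key.replace(/`/g, '``') + '`';
      // Numbers are written as-is; everything else becomes a quoted string.
      const literal = typeof value === 'number'
        ? String(value)
        : "'" + String(value).replace(/\\/g, '\\\\').replace(/'/g, "\\'") + "'";
      return column + ' = ' + literal;
    })
    .join(', ');
}

const clause = toSetClause({ articleTitle: 'Hello', createdTime: '10:30' });
console.log('INSERT INTO finance SET ' + clause);
```

In real code, always let the driver do the escaping by passing the object as the second argument to `query`, rather than building the string by hand.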

3. Implementing it in code

await db.query(addSql, obj).catch(err => { console.log(err); }) // these few lines are all it takes to store a record
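Because `SET ?` uses the object's keys as column names, an extra or misspelled key makes the insert fail. One way to guard against that, sketched here, is to copy only the expected fields before calling `db.query`. The column list is assumed from the fields the crawler collects; the actual `finance` table definition is not shown in this article, and `pickColumns` is a hypothetical helper.

```javascript
// Assumed column names, taken from the fields the crawler builds.
const FINANCE_COLUMNS = [
  'articleTitle', 'articleIntroduction', 'imageAddress',
  'createdTime', 'detailPage', 'detailInfo',
];

// Hypothetical helper: keep only known columns, defaulting missing ones to ''.
function pickColumns(record, columns = FINANCE_COLUMNS) {
  const row = {};
  for (const name of columns) {
    row[name] = record[name] == null ? '' : record[name];
  }
  return row;
}

const row = pickColumns({ articleTitle: 'Hello', unexpected: 123 });
console.log(Object.keys(row).length); // 6
```

The cleaned object can then be passed in place of the raw one, as in `db.query(addSql, row)`.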

Complete code

1. This version also fixes some minor problems from the last article.

const puppeteer = require('puppeteer')
const baseUrl = 'https://www.yicai.com';
const mysql = require('./config');
const wrapper = require('co-mysql')

let db = wrapper(mysql.pool);
let addSql = 'INSERT INTO finance SET ?'

// Wait for the given number of milliseconds
const sleep = time => new Promise(resolve => {
  setTimeout(resolve, time);
});

(async () => {
  const browser = await puppeteer.launch({
    headless: false, // start with the browser UI visible
    // slowMo: (the value is missing in the original)
    args: ['--no-sandbox'],
    dumpio: false,
    devtools: true, // developer mode
  });
  const page = await browser.newPage();
  await page.goto(baseUrl, { waitUntil: 'networkidle2' });

  // An ad layer appears and interferes with the click event, so hide it
  await page.$eval('.m-layer', (el, value) => el.setAttribute('style', value), 'display: none')
  // The original called page.waitFor(2000); that method is not available in
  // recent Puppeteer releases, so the sleep helper is used instead
  await sleep(2000)
  // Wait for a node to load
  await page.waitForSelector('.u-btn')
  for (let index = 0; index < 1; index++) {
    await sleep(2000);
    await page.click('.u-btn')
  }

  const result = await page.evaluate(() => {
    let apiUrl = 'https://www.yicai.com';
    let $ = window.$;
    // var items = $('.m-con a')
    var items = $('#headList').children('a')
    var links = []
    if (items.length >= 1) {
      items.each((index, item) => {
        let it = $(item)
        let articleTitle = it.find('h2').text()
        let articleIntroduction = it.find('p').text()
        let imageAddress = it.find('img').attr('src')
        let createdTime = it.find('span').text()
        let detailPage = it.attr('href')
        links.push({
          articleTitle,
          articleIntroduction,
          imageAddress,
          createdTime,
          detailPage: apiUrl + detailPage
        })
      })
    }
    return links
  })

  // Pause for 1 s
  await sleep(1000);

  let allResult = []
  for (let i = 0; i < result.length; i++) {
    // Open the article detail page
    await page.goto(result[i].detailPage, { waitUntil: 'networkidle2' });
    const element = await page.$(".m-text");
    if (element) {
      await page.waitForSelector('.m-text')
    }
    var obj = await page.evaluate((result, i) => {
      let $ = window.$;
      let detailInfo = ''; // article details; stays empty when #multi-text is absent
      if ($("#multi-text").length > 0) {
        detailInfo = $('#multi-text')[0].innerHTML
      }
      result[i].detailInfo = detailInfo
      return result[i]
    }, result, i)
    // Store the record; log the error and carry on if the insert fails
    await db.query(addSql, obj).catch(err => { console.log(err); })
    allResult.push(obj)
    await sleep(2000);
    await page.goBack();
  }
  console.log(JSON.stringify(allResult))
  await page.close();

  // ---------- page test module ----------
  // await page.goto('https://www.yicai.com/image/100730324.html', {
  //   waitUntil: 'networkidle2'
  // });
  // const element = await page.$(".m-text");
  // if (element) {
  //   await page.waitForSelector('.m-text')
  // }
  // var obj = await page.evaluate(() => {
  //   let obj = {}
  //   let $ = window.$;
  //   if ($("#multi-text").length > 0) {
  //     var pageDom = $('#multi-text')
  //     if (pageDom) {
  //       console.log('11111111111111')
  //       // https://www.yicai.com/video/100728928.html
  //       // https://www.yicai.com/image/100730324.html
  //       detailInfo = pageDom[0].innerHTML
  //     } else {
  //       console.log('222222222222222')
  //     }
  //   }
  // })
  // ---------------------------------------
})();
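The crawler builds each detail link by concatenating the site root with the `href` it scraped, which assumes every `href` is a site-relative path. A slightly more defensive variant, sketched below with the standard `URL` class, also copes with links that are already absolute. `toAbsoluteUrl` is a hypothetical helper and is not part of the original code.

```javascript
// Hypothetical helper: resolve a scraped href against the site root.
// Relative paths are joined to the base; absolute URLs pass through unchanged.
function toAbsoluteUrl(href, base = 'https://www.yicai.com') {
  return new URL(href, base).href;
}

console.log(toAbsoluteUrl('/image/100730324.html'));
console.log(toAbsoluteUrl('https://www.yicai.com/video/100728928.html'));
```

Both calls yield a full `https://www.yicai.com/...` address, so the same function could replace the string concatenation inside `page.evaluate`.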

2. A screenshot of the result, with nothing faked. Be honest.

Closing words

I write these articles to learn, and honesty matters. There are many shortcomings in this one; I hope readers passing by will offer advice so that we can improve together.