I am taking part in the Mid-Autumn Festival Creative Submission Contest; see "Mid-Autumn Festival Creative Submission Contest" for details.
The Mid-Autumn Festival is next week, so I wish you a happy Mid-Autumn Festival in advance.
Today we will use JS to write a program that crawls the first 100 pages of mooncake listings on Jingdong and see how many mooncakes are sold each day around the Mid-Autumn Festival.
The data is for reference only and accuracy is not guaranteed.
Thank you for your support. Staying up late to write this was not easy.
Technologies used
- Tampermonkey, a Google Chrome extension
- Native JavaScript DOM manipulation
- Fetch requests
- async/await delays
- Express to create the data storage API and the statistics API
- Node.js to read the JSON files
- Deployment to the Tencent Cloud Serverless service
Statistical data presentation
Note that the data for 2021-9-7 is mock data, included so that the thanLastDay field for 2021-9-8 can be calculated.
Field description:
{
    "date": "2021-9-8", // date
    "total": "8.9026 trillion", // total sales as of this date
    "thanLastDay": "76.87 million" // increase in total sales over the previous day
}
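The thanLastDay field is the difference between one day's total and the previous day's total. The following is a minimal sketch of that calculation, assuming dates are keyed as YYYY-M-D strings as in the sample above. The helper names previousDayKey and increaseOverPreviousDay and the totals object are hypothetical, and the numbers are made up for illustration.

```javascript
// Hypothetical helper: build the "YYYY-M-D" key for the day before `dateKey`.
function previousDayKey(dateKey) {
  const [year, month, day] = dateKey.split('-').map(Number);
  // Months are zero-based in the Date constructor; day - 1 rolls over month and year boundaries.
  const d = new Date(year, month - 1, day - 1);
  return `${d.getFullYear()}-${d.getMonth() + 1}-${d.getDate()}`;
}

// Hypothetical helper: increase over the previous day, or null when there is no previous data.
function increaseOverPreviousDay(totalsByDate, dateKey) {
  const prev = totalsByDate[previousDayKey(dateKey)];
  if (prev === undefined) return null;
  return totalsByDate[dateKey] - prev;
}

// Made-up totals, for illustration only.
const totals = { '2021-9-7': 1000, '2021-9-8': 1250 };
console.log(increaseOverPreviousDay(totals, '2021-9-8')); // 250
console.log(increaseOverPreviousDay(totals, '2021-9-7')); // null
```

Building the date from its numeric parts avoids relying on how a particular JavaScript engine parses a non-ISO string such as "2021-9-8".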
Now let’s get started
1. Install the Tampermonkey plug-in
If you can access Google services directly, install it from the official link below:
Chrome.google.com/webstore/de…
If you cannot, search Baidu for Tampermonkey. Many websites describe how to install it locally, but I won't link to them here to avoid infringement.
2. Write a script to crawl Jingdong mooncake data
After installation, the Tampermonkey icon appears in the upper right corner of the browser, as shown in the figure below.
First open the Jingdong home page, search for mooncakes, and go to the product list.
Then click Dashboard (the management panel) to open the script list page, where you can turn a script on or off.
Then click the + sign to create a new script.
I have a simple script here that you can paste in:
// ==UserScript==
// @name         JD mooncake
// @namespace    http://tampermonkey.net/
// @version      0.1
// @description  Crawl 100 pages of product data
// @author
// @match        https://search.jd.com/**
// @icon         https://www.google.com/s2/favicons?domain=jd.com
// @grant        none
// ==/UserScript==
(function() {
    'use strict';
    // Convert the review-count text to a number, e.g. "2万+" (万 = 10,000) becomes 20000
    function getNumber(str) {
        if (str.includes('万+')) {
            return parseInt(str) * 10000
        }
        return parseInt(str)
    }
    // Wait for the given number of seconds
    function sleep(time) {
        return new Promise((resolve) => {
            setTimeout(resolve, time * 1000)
        })
    }
    async function main() {
        // Wait for the first page data to load
        await sleep(3)
        for (let i = 0; i < 100; i++) {
            // Scroll to the bottom
            window.scrollTo(0, 18000)
            // Wait for the bottom data to load
            await sleep(3)
            // Scroll to the bottom again in case any data has not loaded
            window.scrollTo(0, 18000)
            // Wait for the bottom data to load
            await sleep(2)
            // Calculate the total sales of all products on this page
            await getTotal()
            // Jump to the next page
            document.querySelector('#J_bottomPage > span.p-num > a.pn-next').click()
            // Wait for the next page of data
            await sleep(3)
        }
    }
    async function getTotal() {
        let pageTotal = 0
        document.querySelectorAll('#J_goodsList > ul > li').forEach(el => {
            // Product price
            const price = parseFloat(el.querySelector('.p-price i').innerText)
            // Number of product reviews (used here as the sales count)
            const saleNum = getNumber(el.querySelector('.p-commit a').innerText)
            console.log(price, saleNum)
            // Add this product's sales to the page total
            pageTotal += price * saleNum
        })
        // Send this page's sales total to the storage API
        // (the Express service from section 3, listening on port 9000 after the change in section 4)
        const res = await fetch('http://localhost:9000/save', {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
            },
            body: JSON.stringify({pageTotal})
        })
        const json = await res.json()
        console.log('Success:', json);
    }
    // Run the program
    main()
})();
- First comes a for loop fixed at 100 iterations, because Jingdong's product list has 100 pages in total.
- Scroll to the bottom of the page, because part of the list data is loaded asynchronously via Ajax.
- The sleep function waits a fixed time, using async/await syntax.
- Then wait 3 seconds and scroll to the bottom again in case some data has not loaded.
- Then use document.querySelectorAll to get all the items on the page.
- Then use querySelector on each item to get its price and number of reviews.
- Calculate the total sales for the page, pageTotal.
- Then use fetch to call the Node.js storage API, which stores the sales calculated for the current page for later analysis.
- Finally, I opened the Jingdong home page, searched for mooncakes, entered the search results page, and waited for the script to page through to the last page, page 100. The whole run takes a long time, so you can do something else until data collection finishes.
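The per-page calculation described above can be sketched without a browser. This is only an illustration, assuming the review count appears as text such as "500+" or "2万+" (万 means 10,000). The names parseCount and pageTotalOf and the sample items are hypothetical, and the values are made up rather than taken from Jingdong.

```javascript
// Hypothetical helper: turn review-count text such as "2万+" or "500+" into a number.
function parseCount(text) {
  const value = parseFloat(text);
  if (Number.isNaN(value)) return 0; // missing or unparsable text counts as zero
  return text.includes('万') ? value * 10000 : value;
}

// Hypothetical helper: sum price * review count over the items of one page.
function pageTotalOf(items) {
  return items.reduce((sum, item) => sum + item.price * parseCount(item.countText), 0);
}

// Made-up items, for illustration only.
const sampleItems = [
  { price: 100, countText: '2万+' },
  { price: 50, countText: '500+' },
  { price: 80, countText: '' },
];
console.log(pageTotalOf(sampleItems)); // 2025000
```

Unlike the parseInt call in the script, this sketch uses parseFloat and treats unparsable text as zero, so one item with a missing count does not turn the whole page total into NaN.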
Now, let's take a look at the demo.
[Juejin does not support video uploads, so the demo video is not available here.]
3. Build the storage and analysis APIs with Express
The code is as follows:
const express = require('express')
const cors = require('cors')
const path = require('path')
const fs = require('fs')
const app = express()
app.use(express.json())
app.use(express.urlencoded({ extended: true }))
app.use(cors())
// Get the statistical data
app.get('/get', (req, res) => {
    const data = []
    // Get the total sales for the specified date, in units of 万 (10,000)
    const getTotal = (date) => {
        const filePath = path.join(__dirname, 'data', `${date}.json`)
        if (!fs.existsSync(filePath)) {
            return 0
        }
        const data = JSON.parse(fs.readFileSync(filePath))
        if (data.today) {
            return data.total;
        }
        // Start from 0 so that every page total, including the first, is converted to 万
        const total = data.data.reduce((total, currentValue) => {
            return total + Math.floor(currentValue) / 10000;
        }, 0)
        // Cache the total so it does not have to be recalculated next time
        data.total = total; // unit: 万 (10,000)
        fs.writeFileSync(filePath, JSON.stringify(data))
        return total;
    }
    // Get the day before the specified date
    const getLastDay = (dateTime) => {
        let date_ob = new Date(dateTime);
        date_ob.setDate(date_ob.getDate() - 1)
        let date = date_ob.getDate();
        let month = date_ob.getMonth() + 1;
        let year = date_ob.getFullYear();
        let today = year + "-" + month + "-" + date;
        return today
    }
    // All dates that have statistics
    const dateList = fs.readdirSync(path.join(__dirname, 'data'))
    // Build the response, calculating the increase over the previous day
    dateList.forEach(fileName => {
        const date = fileName.replace('.json', '')
        data.push({
            date,
            // 亿 = 100 million
            total: Math.floor(getTotal(date) / 10000) + '亿',
            // 万 = 10,000
            thanLastDay: getTotal(getLastDay(date)) !== 0
                ? Math.floor(getTotal(date) - getTotal(getLastDay(date))) + '万'
                : 'No data at present'
        })
    })
    // Sort in descending order by date
    res.send(data.sort((a, b) => new Date(b.date) - new Date(a.date)))
});
// Store the day's sales, one page per request (100 pages in total)
app.post('/save', (req, res) => {
    // Get the current date
    let date_ob = new Date();
    let date = date_ob.getDate();
    let month = date_ob.getMonth() + 1;
    let year = date_ob.getFullYear();
    let today = year + "-" + month + "-" + date;
    // File path
    const filePath = path.join(__dirname, 'data', `${today}.json`)
    // Create the storage file if it does not exist
    if (!fs.existsSync(filePath)) {
        fs.writeFileSync(filePath, JSON.stringify({ data: [] }))
    }
    // Read the file
    const data = JSON.parse(fs.readFileSync(filePath))
    // Store the sales of all items on the current page
    data.data.push(req.body.pageTotal)
    // Write to the JSON file
    fs.writeFileSync(filePath, JSON.stringify(data))
    // Return data
    res.send(data);
});
// Port 3000 here; section 4 changes it to 9000, the port used by the userscript and the URLs below
app.listen(3000, function () {
    console.log('Service started successfully: http://localhost:3000');
});
There are two main APIs:
GET - http://localhost:9000/get
It returns the statistics, structured as follows:
[
    {
        "date": "2021-9-8", // date
        "total": "8.8615 trillion", // total sales
        "thanLastDay": "43.38 million" // increase in sales over the previous day
    },
    {
        "date": "2021-9-7",
        "total": "8.8615 trillion",
        "thanLastDay": "No data at present"
    }
]
POST - http://localhost:9000/save
It stores the sales of each page for the day. The data is saved in the data/<current date>.json file:
{"data": [885434000, 692030500, 234544840, 601344769.5, 172129350, 182674704.6, 133972752.6, 205753590, 80450922, 77355786.19999999, 151456533, 110421752, 92058113.7, 303276508, 174283087.7, 271311291.3, 63696476.8, 141753035.7, 338476616.4, 270641094, 86462147, 27128625, 36139929, 45965566.900000006, 72166439.10000001, 192549501, 10540359.4, 69775609.4, 22760644, 18128574.6, 4775594.2, 11293833.100000001, 69100044.5, 18697712.7, 5837212.3, 10642395.6, 12401900.700000003, 7687292.750000001, 5542854.199999999, 6173778.3, 15844723.86, 312611521.7, 322072634.2, 57924578, 365159510, 31830203.6, 37628351.7, 11473636.700000001, 25383806.799999997, 30270479.9, 82777935.4, 71801949, 17886438.4, 76748973.5, 29326328.4, 11953917.4, 5390966.8, 25723722.5, 9660846, 33003014.7, 35118788.5, 11297238.8, 7611442.84, 19172848.34, 6824560, 18840682.700000003, 13633325.1, 61348156.3, 32949962.4, 28584186.1, 25574649.3, 40607000.4, 27084038.700000003, 34280644.35, 13503164.6, 7837763.899999999, 27559845.42, 12587807.8, 11210537.2, 10225227.48, 14791757.24, 14573441.399999999, 5919098.6, 7467049.7, 26552201.6, 6259477.100000001, 7240613.68, 5715078, 5421074.500000001, 6174596.500000001, 12098670, 3628428.2, 5442460.100000001, 6925294.8, 16266156.259999998, 7562844.060000001, 16977870.1, 6701592.3999999985, 6060801, 6081381.699999999]}
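The stored values are per-page totals in yuan, while the statistics API reports figures in 万 (10,000) and 亿 (100,000,000). Below is a minimal sketch of summing a day's page totals and formatting the result in those units. The names sumPageTotals, toWan, and toYi and the sample array are hypothetical, and the numbers are made up.

```javascript
const WAN = 10000;     // 万 = ten thousand
const YI = 100000000;  // 亿 = one hundred million

// Hypothetical helper: sum the per-page totals (in yuan), dropping fractions of a yuan.
function sumPageTotals(pageTotals) {
  // The initial value 0 ensures the first element is processed like all the others.
  return pageTotals.reduce((sum, value) => sum + Math.floor(value), 0);
}

// Hypothetical helpers: format a yuan amount in whole 万 or whole 亿.
function toWan(yuan) { return Math.floor(yuan / WAN) + '万'; }
function toYi(yuan) { return Math.floor(yuan / YI) + '亿'; }

// Made-up page totals, for illustration only.
const samplePages = [120000000.5, 80000000.25, 50000.9];
const dayTotal = sumPageTotals(samplePages);
console.log(dayTotal);        // 200050000
console.log(toWan(dayTotal)); // 20005万
console.log(toYi(dayTotal));  // 2亿
```

Keeping the sum in yuan and converting only when formatting makes it harder to mix units by accident.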
- The project mainly uses fs.writeFileSync and fs.readFileSync to write and read the JSON files.
- The cors() middleware enables cross-origin requests.
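As a rough sketch of the read-modify-write pattern described above, the helper below appends one page total to a day's JSON file, creating the directory and the file when they are missing. The name appendPageTotal and the temporary directory are hypothetical and used only for this illustration; the fs, os, and path calls are standard Node.js APIs.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: append one page total to <dir>/<dateKey>.json.
function appendPageTotal(dir, dateKey, pageTotal) {
  fs.mkdirSync(dir, { recursive: true }); // does nothing when the directory already exists
  const filePath = path.join(dir, `${dateKey}.json`);
  // Start from an empty list when the file does not exist yet.
  const data = fs.existsSync(filePath)
    ? JSON.parse(fs.readFileSync(filePath, 'utf8'))
    : { data: [] };
  data.data.push(pageTotal);
  fs.writeFileSync(filePath, JSON.stringify(data));
  return data;
}

// Usage with a throwaway directory and made-up values.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'mooncake-'));
appendPageTotal(demoDir, '2021-9-8', 100.5);
const result = appendPageTotal(demoDir, '2021-9-8', 200);
console.log(result.data); // [ 100.5, 200 ]
```

Creating the directory up front avoids the error that fs.writeFileSync raises when the target folder does not exist.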
4. Deploy to the Tencent Cloud Serverless service
Finally, I deployed the Express service to the cloud so that everyone can see it.
- Change the listening port of the Express project to 9000 (Tencent Cloud requires 9000)
- Create the scf_bootstrap startup file
#!/bin/sh
npm run start
- Log in to the Tencent Cloud Serverless console and click Function Service on the left
- Click the New button
- Select "Custom Create"
  - Function type: Select Web Function.
  - Function name: Fill in your own function name.
  - Region: Select your function's deployment region. The default is Guangzhou.
  - Operating environment: Select Nodejs 12.16.
  - Deployment mode: Select Code Deployment and upload your local project.
  - Submit method: Select Upload Folder locally.
  - Function code: Select the specific local folder that contains the function code.
  - Click Complete
See the video below for details.
[Juejin does not support video uploads, so the video is not available here.]
After a successful deployment, Tencent Cloud provides an address that can be used to test the service:
service-ehtglv6w-1258235229.gz.apigw.tencentcs.com/release/get
Note:
- Tencent Cloud Serverless includes a certain free usage quota; see the Tencent Cloud documentation for details.
- Serverless does not allow files to be modified, so the /save service will report an error. The solution is to mount a CFS file system, but I won't bother because it costs money.
GitHub source code:
Github.com/cmdfas/expr…
5. Summary
With that, everything is done: crawling 100 pages of data per day with Tampermonkey, storing it in JSON files with Express, and calculating the daily increase. This fulfills the requirement of calculating mooncake sales on a daily basis.