This article is taking part in the "Digitalstar Project", competing for the creator gift pack and the creation incentive prize.

Preface

The Digitalstar Project is in full swing, with only a week to go. Are you curious how many people are taking part, how many articles have been written, and how the articles are distributed across each track? Let's find out together with Bighead Fish.

Disclaimer

1. This article only looks at the data from a few angles; the official results are what count, so please don't take this article as authoritative O(╥﹏╥)O

2. The article is for reference only and carries no ill intent

3. Data source: the Digitalstar Project participation sheet, covering articles submitted as of October 22

4. To save space, some dimensions (for example, the number of submissions and views) only keep the top 30 entries

5. May we all write better and better on Juejin and produce more good articles

Source code implementation

Overall approach

Below is the overall data from October 01 to October 22. Let's see how we got it!

1. The data source

To produce the data below, we first need a data source. That source is, of course, the Excel sheet maintained by the Juejin operations team. I originally thought the sheet could be downloaded directly, but I couldn't find an export entry, so I had to copy and paste it into a local Excel file manually.

2. Read Excel data

To read the data source from step 1, we can use the open-source library XLSX, which makes it very convenient to convert the sheets into the JSON the front end needs. For example, suppose we have the Excel file below.

const XLSX = require('xlsx')
// Read file data
const workbook = XLSX.readFile('./read.xlsx')
const articalClassifyInfo = workbook.Sheets

for (const sheetName in articalClassifyInfo) {
  // Workspace data
  console.log(sheetName, XLSX.utils.sheet_to_json(workbook.Sheets[sheetName]))
}
/*
  Table 1 [ { name: 'bighead', age: 100 } ]
  Table 2 [ { name: 'bighead', age: 1000 } ]
*/


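If you only need one specific worksheet instead of looping over all of them, the workbook also exposes the sheet names directly. A minimal sketch, assuming the same read.xlsx as above (the sheet names shown are just the ones from the example):

const XLSX = require('xlsx')

const workbook = XLSX.readFile('./read.xlsx')
// All sheet names, in order
console.log(workbook.SheetNames) // e.g. [ 'Table 1', 'Table 2' ]
// Pick a single sheet by name and convert only that one to JSON
const firstSheetName = workbook.SheetNames[0]
const rows = XLSX.utils.sheet_to_json(workbook.Sheets[firstSheetName])
console.log(rows) // e.g. [ { name: 'bighead', age: 100 } ]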

3. Write Excel data

The whole solution not only needs to read the data, it also needs to write the final results back to Excel. XLSX makes this just as easy; for example, to produce the table below.


const XLSX = require('xlsx')
// The JSON data
const ws1Json = [
  {
    'name': 'Bighead 1', 'age': 100
  },
  {
    'name': 'Bighead 1', 'age': 1000
  }
]
const ws2Json = [
  {
    'name': 'Bighead 2', 'age': 200
  },
  {
    'name': 'Bighead 2', 'age': 2000
  }
]
// Represents an Excel workbook
const wb = XLSX.utils.book_new()
// A sheet in the workbook
const ws1 = XLSX.utils.json_to_sheet(ws1Json)
// Another sheet in the workbook
const ws2 = XLSX.utils.json_to_sheet(ws2Json)
// Add the data sheets to the workbook
XLSX.utils.book_append_sheet(wb, ws1, 'sheet 1')
XLSX.utils.book_append_sheet(wb, ws2, 'sheet 2')
// Write the file
XLSX.writeFile(wb, './write.xlsx')

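By default json_to_sheet derives the column order from the keys of the first object. If you want to pin the column order explicitly, you can pass the header option; a minimal sketch, reusing the same kind of data as above:

const XLSX = require('xlsx')

const rows = [
  { 'name': 'Bighead 1', 'age': 100 },
  { 'name': 'Bighead 1', 'age': 1000 }
]
// The header option fixes the column order instead of relying on object key order
const ws = XLSX.utils.json_to_sheet(rows, { header: ['name', 'age'] })
const wb = XLSX.utils.book_new()
XLSX.utils.book_append_sheet(wb, ws, 'sheet 1')
XLSX.writeFile(wb, './write-ordered.xlsx')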

Doesn't it feel easy to read and write Excel without knowing much about Excel itself?

4. The article detail interface

The details of an article, such as the number of views, comments, likes and favorites, can be obtained through this interface; only the article ID is required.

// Request parameters

method: 'post'
url: 'https://api.juejin.cn/content_api/v1/article/detail'
data: {
  article_id: 'xxx'
}

// Response values, only the data we care about are listed here
const { article_info, author_user_info } = result.data.data || {}
// Views, likes, comments, favorites
const { view_count: viewCount, digg_count: diggCount, comment_count: commentCount, collect_count: collectCount } = article_info || {}
// User name userId
const { user_name: userName, user_id: userId } = author_user_info || {}


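Since the interface only needs the article ID, we still have to pull that ID out of each article link copied from the spreadsheet. A minimal sketch of that step (the example link below is made up for illustration):

// Extract the numeric article ID from a Juejin post link
const getArticleId = (link) => {
  const matched = link.match(/\/(\d+)/)
  return matched ? matched[1] : null
}

// Hypothetical link, for illustration only
console.log(getArticleId('https://juejin.cn/post/7021234567890123456'))
// => '7021234567890123456'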

5. Overall implementation

const axios = require('axios')
const randomUa = require('random-ua')
const XLSX = require('xlsx')
// Get the article category newsletter
const getClassifyInfo = () => {
  const workbook = XLSX.readFile('./star.xlsx')

  return workbook.Sheets
}

// Get individual article details
const getArticalDetail = async (articleId) => {
  return new Promise(async (resolve) => {
    let resultData = null

    try {
      const result = await axios({
        method: 'post',
        url: 'https://api.juejin.cn/content_api/v1/article/detail',
        data: {
          article_id: articleId
        },
        headers: {
          origin: 'https://juejin.cn',
          referer: 'https://juejin.cn/',
          'user-agent': randomUa.generate()
        }
      })

      const { article_info, author_user_info } = result.data.data || {}
      // Views, likes, comments, favorites
      const { view_count: viewCount, digg_count: diggCount, comment_count: commentCount, collect_count: collectCount } = article_info || {}
      // User name userId
      const { user_name: userName, user_id: userId } = author_user_info || {}

      resultData = {
        userName,
        userId,
        viewCount,
        diggCount,
        commentCount,
        collectCount
      }
    } catch (err) {
      console.log('Error getting single article, article ID is:', articleId)
    }

    resolve(resultData)
  })
}
// Sleep for a random period of time to avoid triggering anti-crawler risk control
const sleep = (timeout) => {
  return new Promise((resolve) => {
    setTimeout(() => {
      resolve()
    }, timeout)
  })
}

const init = async () => {
  const articalClassifyInfo = getClassifyInfo()
  const wb = XLSX.utils.book_new()
  let count = 0
  let startTime = Date.now()
  // articalClassifyInfo has four sheets: front end, back end, Android and ios
  for (const sheetName in articalClassifyInfo) {
    // Read the contents of each sheet, an array
    const sheetContent = XLSX.utils.sheet_to_json(articalClassifyInfo[sheetName])

    let sheetJson = {}

    for (const classifyInfo of sheetContent) {
      // Read the link and intercept the article id
      const articalLink = classifyInfo['link']
      const articalId = articalLink.match(/\/(\d+)/)[1]

      count += 1
      // Sleep for a random time (up to one second) between requests
      await sleep(Math.random() * 1000)

      console.log(`------- Start fetching article data for ${articalLink}, article ${count}`)

      const articalDetail = await getArticalDetail(articalId)

      console.log(`------- Finished fetching article data for ${articalLink}`)
      // Skip this article if its details could not be fetched
      if (!articalDetail) continue
      const name1 = 'Username'
      const name2 = 'Number of entries'
      const name3 = 'Reading volume'
      const name4 = 'Likes'
      const name5 = 'Number of comments'
      const name6 = 'Collection number'

      const originArticaldetail = sheetJson[articalDetail.userId] || {}
      // Count each author's article data according to the userId as the unique key
      sheetJson[articalDetail.userId] = {
        [ name1 ]: articalDetail.userName,
        [ name2 ]: (originArticaldetail[ name2 ] || 0) + 1,
        [ name3 ]: (originArticaldetail[ name3 ] || 0) + articalDetail.viewCount,
        [ name4 ]: (originArticaldetail[ name4 ] || 0) + articalDetail.diggCount,
        [ name5 ]: (originArticaldetail[ name5 ] || 0) + articalDetail.commentCount,
        [ name6 ]: (originArticaldetail[ name6 ] || 0) + articalDetail.collectCount,
      }
    }
    // Generate each sheet table
    const ws = XLSX.utils.json_to_sheet(Object.values(sheetJson))

    XLSX.utils.book_append_sheet(wb, ws, sheetName)
  }

  let endTime = Date.now()

  console.log('Overall time', endTime - startTime)
  // Write the results to Excel
  XLSX.writeFile(wb, 'out.xlsx')
}

init()


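As mentioned in the disclaimer, the tables below only keep the top 30 entries per dimension. That trimming is not part of the script above; one possible way to do it before writing a sheet, assuming the aggregated sheetJson from the implementation, is sketched here:

// Sort authors by view count and keep only the top 30 rows before writing the sheet
const topRows = Object.values(sheetJson)
  .sort((a, b) => (b['Reading volume'] || 0) - (a['Reading volume'] || 0))
  .slice(0, 30)
const ws = XLSX.utils.json_to_sheet(topRows)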

An overview of the data

The detailed data can be viewed here; click through for the source code address.

Number of authors on each track

The front end is still way too competitive O(╥﹏╥)O

Track       Number of authors
Front end   165
Back end    143
Android     30
iOS         12

Distribution of article counts per author on each track

I envy the authors who can write 10 or even dozens of articles. How do they keep up such high output?

Track       1 article   2 articles   3~4 articles   6~10 articles   11~20 articles   21+ articles
Front end   51          30           38             31              12               4
Back end    28          20           33             31              21               10
Android     6           4            5              12              2                1
iOS         3           1            3              3               2                0

Number of submissions

Username   Number of submissions
Mancuoj 67
Doing will 28
Watermelon watermelon 27
It’s persimmon 23
_ Yang Chen 19
Honest front man 19
XianEr XianEr 15
_Battle 14
YK bacteria 14
jsmask 14
stevezhao6 14
Frozen fish 12
Please call me Ken 11
The other side of the sea 11
Axjy 11
BraveWang 11
A dust ss 10
Time footsteps 9
tangxd3 9
zekelove 9
xn213 8
Early risers _ nuts 8
The battlefield packet 7
Try again tomorrow 7
Small Bryant _ 7
Fish sauce 7
Jing Yu 7
Dumb little Y 6
Still, the world is beautiful 6
Saltwater fish at ease 6

Views

Username   Views
Front end bighead fish 76286
The world of mortals refined heart 65526
Tya o 43850
Sunshine_Lin 31597
_ Yang Chen 18936
_Battle 15347
The front-end picker 14022
The front end little wisdom 13956
Carmelo Anthony, Nuggets 13908
YK bacteria 13048
CUGGZ 12884
jsmask 10560
Fishing alone cold river snow 10227
Tan Guangzhi 10062
Run, Lou 8977
Cold grass 8220
According to technology 7625
Dumb little Y 7152
Ned 6417
Mancuoj 6325
Try again tomorrow 6200
Still, the world is beautiful 6081
The cloud of the world 5892
Bingbing, you little cutie 5570
Honest front man 5567
A guard old ape 5086
DevUI team 4245
Time footsteps 4198
Light burn aware of humidity 4196
The battlefield packet 4167

Likes

Username   Likes
The world of mortals refined heart 2613
Front end bighead fish 2595
Sunshine_Lin 1451
Tya o 966
Mancuoj 901
_Battle 626
CUGGZ 539
The front-end picker 432
Tan Guangzhi 415
Run, Lou 399
The front end little wisdom 398
Please call me Ken 380
_ Yang Chen 359
Carmelo Anthony, Nuggets 348
Doing will 324
The battlefield packet 281
YK bacteria 273
Honest front man 266
Fishing alone cold river snow 213
Dumb little Y 209
The sea has 194
jsmask 192
Still, the world is beautiful 167
Light burn aware of humidity 161
Cold grass 156
The cloud of the world 150
Bingbing, you little cutie 148
Frozen fish 133
According to technology 126
Try again tomorrow 123

Comments

Username   Comments
The world of mortals refined heart 527
Front end bighead fish 383
Tya o 223
Sunshine_Lin 160
The front-end picker 111
Mancuoj 72
Still, the world is beautiful 70
Bingbing, you little cutie 68
_ Yang Chen 67
CUGGZ 64
Curly_Brackets 61
Cold grass 60
Run, Lou 59
YK bacteria 57
Fishing alone cold river snow 53
_Battle 50
Dumb little Y 50
The front end little wisdom 49
Frozen fish 46
The battlefield packet 45
vaelcy 43
jsmask 42
The cloud of the world 41
Doing will 40
The sea has 38
According to technology 36
ndz 36
DevUI team 35
Ned 34
AEI 34

Favorites

Username   Favorites
Front end bighead fish 3849
The world of mortals refined heart 2930
Sunshine_Lin 2226
CUGGZ 1047
Tya o 988
Run, Lou 549
The front end little wisdom 526
_ Yang Chen 522
Tan Guangzhi 499
Carmelo Anthony, Nuggets 442
The front-end picker 367
_Battle 268
Fishing alone cold river snow 239
Orange sb. 175
The battlefield packet 156
Try again tomorrow 148
iwhao 140
Light burn aware of humidity 134
A guard old ape 114
jsmask 112
Bingbing, you little cutie 109
The sea has 106
The cloud of the world 95
Honest front man 90
Time footsteps 90
HelloGitHub 85
Mancuoj 84
Ned 81
Zhengcai cloud front-end team 77
dragonir 76

In closing

Welcome to discuss in the comments section. After the Digitalstar Project ends, Juejin officials will draw 100 winners from the comments section; see the event article for details.