This article shows how to crawl data from the entire Nuggets (Juejin) site, analyze it, and finally sort it into rankings.
0821 update: the ranking of the top 5000 Nuggets users by total likes has been released | Nuggets total likes, top 5000 ranking (20190821).
0827 update: [Third-party Nuggets feature] Nuggets personal data statistics, a third-party Nuggets user dashboard
I started this project because I suddenly wanted to see which high-quality authors the Nuggets site has. So as not to miss any of them, I chose to crawl every article on the site directly, then extract the authors and rank them. Follow the authors and read their articles, all in one place!
Project repository: Juejin-Spider. Stars and issues are welcome.
A Nuggets spider with data analysis. It mainly produces the following rankings and statistics (click a ranking to view it directly):
- Total number of tags on the Nuggets site
- Articles under each tag on the Nuggets site
- Nuggets user ranking (top 5000)
- Article ranking by comment count
- Article ranking by likes
- Article ranking by page views
First, here is the Nuggets top 50 user ranking, go follow them! The full top 5,000 is here
🎉 level, 👦 number of followers, 🏠 company
- (1)[🎉 4][👦 67909] [🏠 Nuggets] Yin Ming
- (2)[🎉 5][👦 47061][🏠] Rare Earth
- (3)[🎉 5][👦 45676] [🏠 Alibaba] HollisChuang
- (4)[🎉 5][👦 44229] [🏠] Tencent Cloud Plus Community
- (5)[🎉 3][👦 37565] [🏠 front-end external review network] front-end external review network
- (6)[🎉 0][👦 37062] [🏠 SN] Ding Yi1
- (7)[🎉 3][👦 34825] [🏠 Tencent AlloyTeam -> Tencent Cloud -> Shopee] Li CHENGXI
- (8)[🎉 3][👦 34588] [🏠] Liutao
- (9)[🎉 3][👦 33436] [🏠 Yi Express] Ink cold
- (10)[🎉 1][👦 30516] [🏠 former Nuggets] NeXT
- (11)[🎉 4][👦 28101][🏠 WeChat official account [Ocean Number]] Superman Wang Xiaojian
- (12)[🎉 4][👦 27221] [🏠] stormzhangV
- (13)[🎉 5][👦 25833] [🏠] Java3y
- (14)[🎉 2][👦 25707] [🏠 call technology] Call technology _Zoran
- (15)[🎉 5][👦 25237] [🏠 Meituan] Technical team
- (16)[🎉 0][👦 23913] [🏠] Liu Xin
- (17)[🎉 6][👦 23829] [🏠 Song Xiaocai] yck
- (18)[🎉 5][👦 22345][🏠 WeChat official account “crossoverJie”] crossoverJie
- (19)[🎉 6][👦 21367] [🏠] Technical fat
- (20)[🎉 5][👦 21170] [🏠] Architectural notes of Huperia
- (21)[🎉 3][👦 21100] [🏠 Alibaba Group] Xianyu Technology
- (22)[🎉 1][👦 20815] [🏠 Didi] Sun Fusheng
- (23)[🎉 5][👦 20785][🏠 formerly NetEase, now Hello] Wood Yiyang said
- (24)[🎉 2][👦 20642] [🏠 Yiyun Technology] AleCC
- (25)[🎉 0][👦 20562][🏠 drabs travel] five_years_struggle
- (26)[🎉 5][👦 20196] [🏠 ThoughtWorks Entry] SnailClimb
- (27)[🎉 2][👦 20065] [🏠 ofo] Monkeys moved to the rescue
- (28)[🎉 3][👦 20058] [🏠 HUAWEI] Rain god grandpa
- (29)[🎉 2][👦 19307] [🏠 Fintech] Taotao.li
- (30)[🎉 4][👦 19068][🏠 WeChat official account [Code Hole]] Old money
- (31)[🎉 2][👦 18847] [🏠] Phoenix tail
- (32)[🎉 5][👦 18465] [🏠] Hu Yu
- (33)[🎉 5][👦 18390] [🏠 Tencent wechat] Carson_Ho
- (34)[🎉 2][👦 18318][🏠] zhisheng
- (35)[🎉 0][👦 17887] [🏠 freelance] IT program lion
- (36)[🎉 3][👦 17741] [🏠 Goertek] is great
- (37)[🎉 4][👦 17633] [🏠 pure source code analysis, the current source code analysis 500+] impression channel source _ to persuade people _ do not accept
- (38)[🎉 3][👦 17588] [🏠 Fat Orange Network] KyXu
- (39)[🎉 5][👦 17535] [🏠 Fundebug] Fundebug
- (40)[🎉 0][👦 16984] [🏠 Tencent] flike
- (41)[🎉 3][👦 16962][🏠 Baidu] Beard big ha
- (42)[🎉 4][👦 16827] [🏠] Old Driver iOS Weekly
- (43)[🎉 4][👦 16364] [🏠] The heart of the machine
- (44)[🎉 1][👦 15699][🏠] AXE
- (45)[🎉 3][👦 15466] [🏠] Mockplus
- (46)[🎉 5][👦 15448] [🏠 Tencent Technology (Shenzhen) Co., LTD.] Tencent IVWEB team
- (47)[🎉 6][👦 15421] [🏠 Shanghai] OBKoro1
- (48)[🎉 5][👦 15362][🏠 ELEME] Sunshine
- (49)[🎉 2][👦 15164] [🏠 ucashin.com] MrMuscles
- (50)[🎉 3][👦 15077] [🏠] Disabled
Scripts
Fetching all tags on the site
Get information about every tag on the Nuggets site
npm run tagList
The tag information is written to src/assets/tagList/tagList.json. Each tag contains the following fields, of which title and id are the important ones:
{
  "id": "5597a063e4b08a686ce57030", "title": "Back-end",
  "createdAt": "2015-07-04T00:59:16Z", "updatedAt": "2017-06-18T23:34:00Z",
  "color": "#C679FF", "icon": "https://p1-jj.byteimg.com/tos-cn-i-t2oaga2asx/leancloud-assets/d83da9d012ddb7ae85f4.png~tplv-t2oaga2asx-image.image", "background": "",
  "showOnNav": true, "relationTagId": "", "alias": "backend houduan", "isCategory": true,
  "entryCount": 19840, "subscribersCount": 295562, "isSubscribe": false
}
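Because only a few fields matter, a small helper can reduce the tag file to what later steps use. This is a minimal sketch that assumes tagList.json holds a JSON array of tag objects like the one above. The helper names summarizeTags and loadTagSummary are hypothetical.

```javascript
const fs = require('fs')

// Keep only id, title and entryCount (field names from the sample tag above)
// and order tags from largest to smallest.
function summarizeTags(tagList) {
  const tags = tagList
    .map(({ id, title, entryCount }) => ({ id, title, entryCount }))
    .sort((a, b) => b.entryCount - a.entryCount)
  return { tagCount: tags.length, tags }
}

// Reads the file produced by `npm run tagList` (assumed to be a JSON array)
function loadTagSummary(file = 'src/assets/tagList/tagList.json') {
  return summarizeTags(JSON.parse(fs.readFileSync(file, 'utf8')))
}
```

The tagCount field corresponds to the "total number of tags" statistic listed at the top of the article.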
Crawling all articles on the site
This step collects every article under every tag on the site. How long it takes depends on your network speed and machine performance, so please wait patiently for collection to finish.
The data collected in this step is very important, because it is the basis for all subsequent analysis.
The collected data is stored under src/assets/articleData as a set of JSON files, one per tag. Each file holds the metadata of all articles under that tag.
npm run allTagData
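The crawler's internals are not shown here, so the following is only a sketch of a per-tag paging loop. It rests on three assumptions. First, fetchPage(tagId, page) is a hypothetical stand-in for whatever request the spider actually sends. Second, that function resolves to an empty array once a tag has no more articles. Third, naming each output file after the tag id is my own choice.

```javascript
const fs = require('fs')
const path = require('path')

// Hypothetical: fetchPage(tagId, page) resolves to an array of article objects,
// or to an empty array when the tag has no more pages.
async function crawlTag(tagId, fetchPage) {
  const articles = []
  for (let page = 0; ; page++) {
    const items = await fetchPage(tagId, page)
    if (!items || items.length === 0) break // no more pages for this tag
    articles.push(...items)
  }
  return articles
}

// Write one JSON file per tag (file name = tag id, an assumption)
function saveTagArticles(dir, tag, articles) {
  fs.mkdirSync(dir, { recursive: true })
  const file = path.join(dir, `${tag.id}.json`)
  fs.writeFileSync(file, JSON.stringify(articles, null, 2))
  return file
}
```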
Each article in these JSON files is an object like the following:
{
  "collectionCount": 5, // likes
  "userRankIndex": 5.4006856695164, "buildTime": 1565582852.8327,
  "commentsCount": 2, // comments
  "gfw": false, "objectId": "5d40d29d518825221b4cbb40", "checkStatus": true, "isEvent": false, "entryView": "",
  "subscribersCount": 0, // unused
  "ngxCachedTime": 1565627197, "verifyStatus": true,
  "tags": [{ "ngxCachedTime": 1565627193, "ngxCached": true, "title": "React.js", "id": "555e99ffe4b00c57d99556aa" }],
  "updatedAt": "2019-08-12T04:07:32.818Z", "rankIndex": 0.005346156248974, "hot": false, "autoPass": false,
  "originalUrl": "https://juejin.cn/post/6844903903058739213", // article URL
  "verifyCreatedAt": "2019-07-31T01:xx:xx.238Z", "createdAt": "2019-07-31T01:xx:xx.238Z",
  "user": {
    "community": {
      "weibo": { "uid": "5345591282", "nickname": "Trace of Time A88" },
      "wechat": { "avatarLarge": "http://thirdwx.qlogo.cn/mmopen/vi_32/cabLXAUXiavVhiaDh2050AOOEToUvnZTWsSNqqKZC4hzPzHABC7fxwv6VxwebIxfKdaRkYDZoic8UXfonLDyiafuiaw/132" },
      "github": { "username": "lxfriday", "avatarLarge": "https://avatars0.githubusercontent.com/u/20264467?v=4", "uid": "20264467" }
    },
    "collectedEntriesCount": 154, // articles this user has liked
    "company": "xxx", // company
    "followersCount": 35, // number of followers
    "followeesCount": 70, // number of users this user follows
    "role": "guest", // user role
    "postedPostsCount": 19, // number of articles published
    "level": 2, // user level
    "isAuthor": false,
    "postedEntriesCount": 2, // number of shared entries
    "totalCommentsCount": 16, // total comments
    "ngxCachedTime": 1565627197,
    "viewedEntriesCount": 1347, // number of articles viewed
    "jobTitle": "Front end", // job title
    "subscribedTagsCount": 166, // number of tags followed
    "totalCollectionsCount": 120, // total likes received
    "username": "Cloud sky", // user name
    "avatarLarge": "https://p1-jj.byteimg.com/tos-cn-i-t2oaga2asx/gold-user-assets/2019/7/14/16bf1155693d96c2~tplv-t2oaga2asx-image.image",
    "objectId": "57a0c28979bc440054958498" // user id
  },
  "author": "", "screenshot": "https://p1-jj.byteimg.com/tos-cn-i-t2oaga2asx/gold-user-assets/2019/7/29/16c3e3d979a96831~tplv-t2oaga2asx-image.image",
  "original": true, "hotIndex": 21.2095,
  "content": "The Component constructor does not override the PureComponent constructor when _assign copies the object's attributes, as you can see in the following example. Change PureComponent to Component and userInfo will change normally.",
  "title": "React source code series: Component, PureComponent, function Component analysis",
  "lastCommentTime": "2019-08-03T16:53:20.577Z", "type": "post", "english": false,
  "category": {
    "ngxCached": true, "title": "frontend", "id": "5562b415e4b00c57d9b94ac8", "name": "Front end", "ngxCachedTime": 1565627098
  },
  "viewsCount": 267, // views
  "summaryInfo": "After processing, the difference between the three components is not the same type and look not to understand can look at this article https://www.zhihu.com/question/34183746 in js and the difference and relationship between function on the properties of the object is an enumeration, So after redirecting the PureComponent to the constructor...",
  "isCollected": false
}
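All the rankings below only need a handful of these fields. This is a minimal sketch of flattening one article into a ranking record. The field names come from the sample object above. The helper name toRankRecord and the choice of the first tag as "the" tag are assumptions.

```javascript
// Flatten one crawled article into the fields the rankings use.
// Missing fields fall back to 0 or '' so that sorting never sees undefined.
function toRankRecord(article) {
  const user = article.user || {}
  const tags = article.tags || []
  return {
    id: article.objectId,
    title: article.title,
    url: article.originalUrl,
    tag: tags.length > 0 ? tags[0].title : '', // first tag only (assumption)
    likes: article.collectionCount || 0,
    comments: article.commentsCount || 0,
    views: article.viewsCount || 0,
    author: user.username || '',
    authorId: user.objectId || '',
  }
}
```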
Follower ranking
Get the ranking of users on the site by follower count
npm run follower
After executing the script, two files are generated:
- src/assets/calcUserRank/user followerRank.json: the ranked metadata
- src/assets/calcUserRank/user followerRank.md: a Markdown document organized by rank
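A user appears once for every article they wrote, so authors have to be deduplicated before they can be ranked. This is a minimal sketch that keys users by user.objectId and sorts them by followersCount. For brevity it uses a plain sort, whereas the project itself uses the min-heap described under Technical analysis. The function name is hypothetical.

```javascript
// Collect each author once (keyed by user.objectId), then rank by followersCount.
function rankUsersByFollowers(articles, topN = 5000) {
  const users = new Map()
  for (const { user } of articles) {
    if (user && user.objectId && !users.has(user.objectId)) {
      users.set(user.objectId, user)
    }
  }
  return [...users.values()]
    .sort((a, b) => (b.followersCount || 0) - (a.followersCount || 0))
    .slice(0, topN)
}
```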
Likes ranking
Get the ranking of articles on the site by likes
npm run dianzan
After executing the script, two files are generated:
- src/assets/calcDianzanRank/thumb up rank.json: the ranked metadata
- src/assets/calcDianzanRank/thumb up rank.md: a Markdown document organized by rank
Example:
- (1)[👍 5409][📌 programmer] Front-end 100 Q: If you can understand 80%, please give me your resume
- (2)[👍 4416][📌 vue.js] 2018 front-end interview summary: read and understand it, and your salary goes up by at least 3K | Nuggets technical essay contest
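The Markdown files use one line per entry in the format shown in the example above. Here is a minimal sketch of producing such lines. The helper names are hypothetical, and the records are assumed to carry tag and title fields plus a numeric count field.

```javascript
// Render one entry as "- (rank)[emoji count][📌 tag] title"
function formatRankLine(rank, emoji, count, tag, title) {
  return `- (${rank})[${emoji} ${count}][📌 ${tag}] ${title}`
}

// records must already be sorted; countKey names the numeric field to show
function formatRanking(records, emoji, countKey) {
  return records
    .map((r, i) => formatRankLine(i + 1, emoji, r[countKey], r.tag, r.title))
    .join('\n')
}
```

The same formatter serves the likes (👍), page-view (👀) and comment (🐶) rankings. Only the emoji and the count field change.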
Site-wide article page-view ranking
Get the ranking of articles on the site by page views
npm run view
After executing the script, two files are generated:
- src/assets/calcViewRank/views rank.json: the ranked metadata
- src/assets/calcViewRank/views rank.md: a Markdown document organized by rank
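Every ranking script ends the same way, by writing a JSON file with the ranked metadata and a Markdown file with one line per entry. This is a minimal sketch of that last step. The function name writeRankFiles and the toLine callback are hypothetical. The real scripts may structure this differently.

```javascript
const fs = require('fs')
const path = require('path')

// Write <baseName>.json (ranked metadata) and <baseName>.md (one line per entry).
// toLine(item, rank) turns one ranked item into a Markdown line.
function writeRankFiles(dir, baseName, ranked, toLine) {
  fs.mkdirSync(dir, { recursive: true })
  const jsonFile = path.join(dir, `${baseName}.json`)
  const mdFile = path.join(dir, `${baseName}.md`)
  fs.writeFileSync(jsonFile, JSON.stringify(ranked, null, 2))
  fs.writeFileSync(mdFile, ranked.map((item, i) => toLine(item, i + 1)).join('\n') + '\n')
  return { jsonFile, mdFile }
}
```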
Site-wide article comment ranking
Get the ranking of articles on the site by comment count
npm run comment
After executing the script, two files are generated:
- src/assets/calcCommentRank/calcCommentRank.json: the ranked metadata
- src/assets/calcCommentRank/calcCommentRank.md: a Markdown document organized by rank
Technical analysis
- async: concurrency control
- chalk: colorful command-line output
- request: sending HTTP requests
- request-promise: wraps request in Promises so it works with async/await
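The project relies on the async library for concurrency control. To show the idea without that dependency, here is a plain-JavaScript sketch that keeps at most `limit` requests in flight and returns results in input order. It is a stand-in, not the async library's API. The name runWithLimit is hypothetical, and `limit` is assumed to be at least 1.

```javascript
// Run worker(item, index) over items with at most `limit` promises in flight.
// Results keep the same order as items.
async function runWithLimit(items, limit, worker) {
  const results = new Array(items.length)
  let next = 0
  // Each runner repeatedly claims the next unprocessed index until none remain
  async function runner() {
    while (next < items.length) {
      const index = next++
      results[index] = await worker(items[index], index)
    }
  }
  const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner())
  await Promise.all(runners)
  return results
}
```

Because JavaScript runs this on a single thread, `next++` needs no locking. A runner only yields at `await`, so two runners can never claim the same index.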
Project development tooling
- commitlint: standardizes commit messages
- eslint: needs no introduction
- prettier: automatic code formatting
- husky: provides Git hooks
- lint-staged: runs formatting and ESLint checks only on the currently changed (staged) files
- jest: tests the correctness of the sorting algorithm
How are the top 1000 and top 5000 computed from 200,000 (20W) records?
Build a min-heap (small top heap) of size K and feed the rest of the data through it. If a record is not larger than the heap top, discard it. Otherwise, replace the heap top with it and sift down to restore the min-heap. After all the data has been processed, the heap holds the K largest values, and sorting that array gives the final ranking!
- Sorting algorithm: sortPrev
- Used to compute the article page-view ranking
// Sift the node at index i down: the smaller child moves up so the minimum stays on top
function heapify(arr, len, i, compareVal) {
  let min = i
  const l = 2 * i + 1
  const r = 2 * i + 2
  if (l < len && compareVal(arr[l]) < compareVal(arr[min])) min = l
  if (r < len && compareVal(arr[r]) < compareVal(arr[min])) min = r
  if (min !== i) {
    swap(arr, i, min)
    heapify(arr, len, min, compareVal)
  }
}

// Swap two array elements in place
function swap(arr, i, j) {
  const tmp = arr[i]
  arr[i] = arr[j]
  arr[j] = tmp
}

// Build a min-heap in place; compareVal gets the comparison value from a dataUnit object
function createHeap(target, compareVal = v => v) {
  for (let i = Math.floor((target.length - 1) / 2); i >= 0; i--) {
    heapify(target, target.length, i, compareVal)
  }
}

// If dataUnit beats the current minimum, replace the heap top and restore the heap
function findMaxPrev(dataUnit, target, compareVal = v => v) {
  if (compareVal(dataUnit) > compareVal(target[0])) {
    target[0] = dataUnit
    heapify(target, target.length, 0, compareVal)
  }
}
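The functions above are the building blocks. The driver that ties them together is not shown, so here is a self-contained sketch of the full top-K selection: seed a heap with the first K items, stream the rest through it, and then sort. The helpers are repeated so the block runs on its own. The name topK is hypothetical.

```javascript
function swap(arr, i, j) {
  const tmp = arr[i]
  arr[i] = arr[j]
  arr[j] = tmp
}

// Sift down so the subtree rooted at i is a min-heap
function heapify(arr, len, i, compareVal) {
  let min = i
  const l = 2 * i + 1
  const r = 2 * i + 2
  if (l < len && compareVal(arr[l]) < compareVal(arr[min])) min = l
  if (r < len && compareVal(arr[r]) < compareVal(arr[min])) min = r
  if (min !== i) {
    swap(arr, i, min)
    heapify(arr, len, min, compareVal)
  }
}

// Returns the k largest items of data, sorted from largest to smallest
function topK(data, k, compareVal = v => v) {
  if (k <= 0) return []
  // 1. Seed a min-heap with the first k items
  const heap = data.slice(0, k)
  for (let i = Math.floor(heap.length / 2) - 1; i >= 0; i--) {
    heapify(heap, heap.length, i, compareVal)
  }
  // 2. Stream the rest: anything larger than the heap top replaces it
  for (let i = k; i < data.length; i++) {
    if (compareVal(data[i]) > compareVal(heap[0])) {
      heap[0] = data[i]
      heapify(heap, heap.length, 0, compareVal)
    }
  }
  // 3. The heap now holds the top k; sort it for the final ranking
  return heap.sort((a, b) => compareVal(b) - compareVal(a))
}
```

For example, `topK(articles, 1000, a => a.viewsCount)` would give a page-view top 1000. With n ≈ 200,000 and K = 5000, this costs O(n log K) time and O(K) extra memory, versus O(n log n) for sorting everything.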
Rankings
Page-view ranking
👀 page views, 📌 tag
- (1)[👀 817784][📌 Android] Source code analysis: Dragonfly FM suspected of defrauding investors and advertisers
- (2)[👀 471926][📌 vue.js] Hands-on: building an admin dashboard with Vue, part 1 (basics)
- (3)[👀 336824][📌 vue.js]
- (4)[👀 261110][📌 interview] Interview secrets for mid-to-senior front-end developers at big tech companies: getting you through the "Golden March, Silver April" hiring season and straight into a big company (part 1)
- (5)[👀 173030][📌 programmer] Front-end 100 Q: If you can understand 80%, please give me your resume
- (6)[👀 147633][📌 Go] Gos: Go MODULE Solution 💪
- (7)[👀 143114][📌 JavaScript] The first WeChat mini program development tutorial!
- (8)[👀 140469][📌 vue.js]
- (9)[👀 139499][📌 vue.js] 2018 front-end interview summary: read and understand it, and your salary goes up by at least 3K | Nuggets technical essay contest
- (10)[👀 137958][📌 vue.js] Hands-on: building an admin dashboard with Vue, part 4 (vueAdmin, a minimal admin base template)
- (11)[👀 120472][📌 JavaScript] 28 JavaScript skills that a qualified intermediate front-end engineer must master
- (12)[👀 116779][📌 programmer] Suspected of being pushed out over internal infighting, a ZTE programmer born in the 1970s fell from the company building
- (13)[👀 105344][📌 JavaScript] This time, thoroughly understand the JavaScript execution mechanism
- (14)[👀 100848][📌 GitHub] 2018 Recommended book list for Java Backend engineers
- (15)[👀 98813][📌 JavaScript] Self-check list of a [qualified] front-end engineer
- (16)[👀 95634][📌 JavaScript] Knowing these 20 regular expressions will save you 1,000 lines of code
- (17)[👀 89452][📌 Front End] New ES6, ES7, ES8, ES9, and ES10 features
- (18)[👀 88587][📌 Android] RxJava2: this one article is all you need
- (19)[👀 86096][📌 vue.js] Hands-on: using icons elegantly
- (20)[👀 84639][📌 open source] China's first post-2000s CEO openly plagiarized my open-source work
Likes ranking
👍 likes, 📌 tag
- (1)[👍 5488][📌 programmer] Front-end 100 Q: If you can understand 80%, please give me your resume
- (2)[👍 4431][📌 vue.js] 2018 front-end interview summary: read and understand it, and your salary goes up by at least 3K | Nuggets technical essay contest
- (3)[👍 4368][📌 JavaScript] This time, thoroughly understand the JavaScript execution mechanism
- (4)[👍 4216][📌 interview] A qualified (excellent) front-end should read these articles
- (5)[👍 4183][📌 interview] Interview secrets for mid-to-senior front-end developers at big tech companies: getting you through the "Golden March, Silver April" hiring season and straight into a big company (part 1)
- (6)[👍 3890][📌 JavaScript] Self-check list of a [qualified] front-end engineer
- (7)[👍 3807][📌 vue.js] A nearly 20,000-word mini program guide released
- (8)[👍 3701][📌 JavaScript] 28 JavaScript skills that a qualified intermediate front-end engineer must master
- (9)[👍 3664][📌 react.js] Technology Fat's 155 episodes of front-end video tutorials, all free to watch
- (10)[👍 3551][📌 Android] Kotlin resources: to learn Kotlin, this tutorial is all you need
- (11)[👍 3342][📌 HTML] A roundup of common front-end plugins and utility libraries: don't reinvent the wheel!!
- (12)[👍 3338][📌 vue.js] New Year gift: Technology Fat's 262 free front-end videos to make your journey easier
- (13)[👍 3205][📌 JavaScript] Knowing these 20 regular expressions will save you 1,000 lines of code
- (14)[👍 3202][📌 front-end] Summarizes the interview experience of 100 front-end interviews from the beginning of 2017 to the beginning of 2018 (including answers)
- (15)[👍 2958][📌 Front-end Framework] A Brief Introduction to front-end Architecture of Large-scale Projects
- (16)[👍 2932][📌 react.js] 2018 front-end interviews: a record of each question and how to handle it (refined) | Nuggets technical essay contest
- (17)[👍 2902][📌 vue.js] Hands-on: using Vue
- (18)[👍 2879][📌 JavaScript] Personal Sharing — Web front-end learning resource sharing
- (19)[👍 2871][📌 CSS] 49 CSS knowledge points you may not know
- (20)[👍 2846][📌 JavaScript] The 15-part JavaScript In-depth series officially concludes!
- (21)[👍 2743][📌 react.js] Nuggets WeChat group daily quality articles, first half of 2018: front-end
- (22)[👍 2643][📌 Backend] Backend Architect Technical Atlas
- (23)[👍 2538][📌 vue.js] Problems and Solutions in Vue Project (Update)
- (24)[👍 2520][📌 JavaScript] Webpack
- (25)[👍 2481][📌 Android] Spent 4 months compiling 50 in-depth Android articles
- (26)[👍 2468][📌 vue.js] Encapsulation and API management of Axios in Vue
- (27)[👍 2439][📌 CSS] Practical guide! Common layouts plus analysis of well-known sites
- (28)[👍 2427][📌 JavaScript] JavaScript series 20 officially concluded!
- (29)[👍 2371][📌 react.js] Year-end review puts together a “front-end checklist” for you
- (30)[👍 2340][📌 CSS] What you need to know about mobile adaptation
- (31)[👍 2301][📌 HTML] Front-end Developer Guide (2017)
- (32)[👍 2279][📌 front-end] There is always a list of programming books you want (GitHub)
- (33)[👍 2247][📌 programmer] A front-end developer's 2018, looking ahead to 2019 | Nuggets annual essay contest
- (34)[👍 2243][📌 react.js] Just read these articles (updated June 2019)
- (35)[👍 2239][📌 react.js] April front-end knowledge collection (this month's articles not to be missed)
- (36)[👍 2219][📌 JavaScript] Front-end advanced necessary, github quality resources to share!
- (37)[👍 2211][📌 HTML] How to get started with Nginx
- (38)[👍 2207][📌 Android] Android Interview Repository
- (39)[👍 2191][📌 JavaScript] “Mid-advanced Front-end interview” JavaScript handwritten code untouchable secrets
- (40)[👍 2189][📌 JavaScript] JS regular expressions
- (41)[👍 2178][📌 Front end] New ES6, ES7, ES8, ES9, and ES10 features
- (42)[👍 2177][📌 react.js] After the Golden March interviews | Nuggets technical essay contest, 25,000 words
- (43)[👍 2155][📌 vue.js] Vue 2.x pitfalls: filling the gaps (a summary of XXX questions often asked in the group, with solutions that may not be reliable)
- (44)[👍 2141][📌 CSS] WebPack4 – Experience it at first, click it together 11 times
- (45)[👍 2140][📌 angular.js] Encyclopedia of front-end knowledge
- (46)[👍 2132][📌 interview] Interview questions I recorded, with answers that may not be good enough (mostly Vue) | Nuggets technical essay contest
- (47)[👍 2064][📌 JavaScript] A more elegant way of writing JavaScript complex judgments
- (48)[👍 2060][📌 interview] In January
- (49)[👍 2019][📌 CSS] Fix flex layout once and for all
- (50)[👍 2013][📌 JavaScript] The first WeChat mini program development tutorial!
'Nuggets' === 'front-end community'?
Comment ranking
🐶 number of comments, 📌 tag
- (1)[🐶 756][📌 programmer] A front-end developer's 2018, looking ahead to 2019 | Nuggets annual essay contest
- (2)[🐶 607][📌 vue.js] New Year gift: Technology Fat's 262 free front-end videos to make your journey easier
- (3)[🐶 570][📌 rare earth] than we have from the beginning to | Nuggets
- (4)[🐶 468][📌 JavaScript] Self-check list of a [qualified] front-end engineer
- (5)[🐶 456][📌 boiling point] AMA: I'm Android developer throwline (Zhu Kai), do you have any questions for me?
- (6)[🐶 452][📌 boiling point] What does your computer desktop look like? | Programmers' desktops
- (7)[🐶 445][📌 JavaScript] This time, thoroughly understand the JavaScript execution mechanism
- (8)[🐶 438][📌 TypeScript] Discard JS and use TypeScript
- (9)[🐶 418][📌 boiling point] Boiling point: Share your company’s Mid-Autumn Festival benefits
- (10)[🐶 404][📌 Boiling Point] Boiling Point: Tell me what you are studying now.
- (11)[🐶 403][📌 boiling point] The boss asks how long a requirement will take to develop. What do you answer?
- (12)[🐶 398][📌 open source] China's first post-2000s CEO openly plagiarized my open-source work
- (13)[🐶 396][📌 interview] Interview secrets for mid-to-senior front-end developers at big tech companies: getting you through the "Golden March, Silver April" hiring season and straight into a big company (part 1)
- (14)[🐶 391][📌 programmer] Boiling Point: We are halfway through 2017, sum up the first half of the year in one sentence
- (15)[🐶 388][📌 programmer] Boiling Point 16: What songs do you listen to when you write code?
- (16)[🐶 387][📌 Google] What is your favorite Google developer technology? Commenters can win Google limited-edition speakers and laptop bags
- (17)[🐶 359][📌 front-end] Personal views on IT training institutions
- (18)[🐶 357][📌 Entrepreneurship] Boiling Point issue 36: What do you think of the 996 work schedule? Answer to win an Octocat and a Nuggets T-shirt
- (19)[🐶 354][📌 Front-end Framework] A Brief Introduction to front-end Architecture of Large-scale Projects
- (20)[🐶 344][📌 GitHub]
That's it for the rankings. I also counted the total number of articles on the Nuggets site and the total number of users who have posted under the tags:
- Total number of articles on the site after deduplication: about 100,000 (the statistical error may be large); before deduplication there are more than 200,000
- Total number of users who have posted articles under the tags: about 1.5 million
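The gap between the raw and deduplicated counts most likely comes from articles carrying several tags, which makes them appear in several per-tag files. This is a minimal sketch of the deduplication. It assumes objectId uniquely identifies an article and user.objectId uniquely identifies an author, as in the sample object. Both helper names are hypothetical.

```javascript
// Keep the first copy of each article, keyed by objectId
function dedupeArticles(articles) {
  const seen = new Set()
  const unique = []
  for (const article of articles) {
    if (!seen.has(article.objectId)) {
      seen.add(article.objectId)
      unique.push(article)
    }
  }
  return unique
}

// Count distinct authors by user.objectId, skipping articles without a user
function countAuthors(articles) {
  return new Set(articles.map(a => a.user && a.user.objectId).filter(Boolean)).size
}
```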
Check out the npm scripts and start playing
npm run all runs the entire data capture and ranking pipeline in one go. Because the amount of data is large, it takes about half an hour.
"scripts": {
"all": "npm run tagList && npm run allTagData && npm run dianzan && npm run view && npm run comment && npm run follower",
"start": "npm run tagList",
"tagList": "TASK=tagList node App.js",
"allTagData": "TASK=allTagData node App.js",
"composeArticleData": "TASK=composeArticleData node App.js",
"userData": "TASK=userData node App.js",
"dianzan": "TASK=dianzan node App.js",
"view": "TASK=view node App.js",
"comment": "TASK=comment node App.js",
"follower": "TASK=follower node App.js",
"lint": "eslint .",
"test": "jest"
},
Finally, feel free to follow me on GitHub and my WeChat official account
- GitHub
- WeChat official account