It was a sunny afternoon when I bought the first fund of my life. From then on, my chicken 🐔 (my fund, that is) fell into an abyss of eternal doom. Afterwards I was even silly enough to top up my position a few times, and to this day the pit has not been filled…
“It’s time to use some of the power of the seal,” I said, clutching my wrinkled, deflated purse. Carrying the Node sword, I set out from the novice village to slay dragons… oh no, to slay chickens. And so the slow slaughter began.
Asking around
I canvassed the village elder named “Net” and finally got hold of three crucial scrolls of information that gave me a glimpse into the Kingdom of Chicken Spirits.
-
Scroll 1 – household register scroll: fund.eastmoney.com/allfund.htm…
It lists the code and name of every chicken, and the sheer number makes me shudder: there are more than 7,000 of them.
-
Scroll 2 – file scroll: fund.eastmoney.com/f10/000001.html
This scroll is magical: change the code at the end of the address and you can see the basic file of the corresponding chicken. Know your chicken, and you can fight a hundred battles without danger.
-
Scroll 3 – M scroll: fund.eastmoney.com/f10/F10Data…
To my surprise, the power of this scroll is truly overbearing. It is a dynamic scroll: change the fund code (code), the start date (sdate), the end date (edate), and the number of records (per) in the address spell, and it shows the chicken’s life history, whether it is fat or thin, happy or unhappy…
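As a minimal sketch of how the address spell of Scroll 3 is assembled, the helper below builds the query string from those four parameters. The function name buildHistoryUrl is hypothetical, the base address is the one the crawler itself uses later in this article, and the sample values are only illustrative.

```javascript
// Hypothetical helper: assemble the Scroll 3 address from its four parameters.
// The base address is the one used by the crawler later in this article.
const BASE_URL = 'http://fund.eastmoney.com/f10/F10DataApi.aspx';

function buildHistoryUrl(code, sdate, edate, per) {
  // URLSearchParams keeps the insertion order of the keys
  const params = new URLSearchParams({
    type: 'lsjz',
    code: code,
    sdate: sdate,
    edate: edate,
    per: String(per),
  });
  return `${BASE_URL}?${params.toString()}`;
}

console.log(buildHistoryUrl('040008', '2018-03-20', '2018-05-04', 30));
```

Building the query with URLSearchParams instead of string concatenation is a design choice of this sketch, not something the original crawler does.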
With that, I had the whole map of the Kingdom of Chicken Spirits in mind.
The science of uniting
The ancient scrolls had given me enough clues, and I knew that in this article I carried the aura of the legendary hero. So, without even forming a hand seal, I summoned a small divine beast that shelters in the V8 forest: the crawler. Being of Node pedigree, it moves swiftly and has a keen sense of smell. Give it a single chicken feather and it can find the nest for me, but to cover the whole chicken kingdom it still needs training.
First I need to acquire the following equipment, so that the crawler, the chickens, and humans can communicate properly.
const express = require('express'); // Set up the service
const events = require('events'); // Event listener
const request = require('request'); // Send the request
const iconv = require('iconv-lite'); // Decode web pages
const cheerio = require('cheerio'); // Page parsing
const MongoClient = require('mongodb').MongoClient; // Database client
const app = express(); // Server instance
const Event = new events.EventEmitter(); // Event listening instance
const dbUrl = "mongodb://localhost:27017/"; // Database connection address
I gave the cute little divine beast a tacky name, FundSpider, and equipped it with a packaged olfactory enhancer called fetch:
// Fund crawler
class FundSpider {
  // Database name, collection name, number of concurrent requests per fragment
  constructor(dbName = 'fund', collectionName = 'fundData', fragmentSize = 1000) {
    this.dbUrl = "mongodb://localhost:27017/";
    this.dbName = dbName;
    this.collectionName = collectionName;
    this.fragmentSize = fragmentSize;
  }
  // Fetch the page at the given url; for pages not encoded in UTF-8, specify the page encoding
  fetch(url, coding, callback) {
    request({url: url, encoding: null}, (error, response, body) => {
      if (!error && response.statusCode === 200) {
        // Decode the raw buffer, then load the page into a jQuery-style selector
        let _body = coding === "utf-8" ? body : iconv.decode(body, coding);
        callback(null, cheerio.load('<body>' + _body + '</body>'));
      } else {
        callback(error, cheerio.load('<body></body>'));
      }
    });
  }
}
Now, as another FundSpider method, teach it to sift out the code of each chicken:
// Get all fund codes in one batch
fetchFundCodes(callback) {
  let url = "http://fund.eastmoney.com/allfund.html";
  // The original page is encoded in GB2312 and needs to be decoded accordingly
  this.fetch(url, 'gb2312', (err, $) => {
    let fundCodesArray = [];
    if (!err) {
      $("body").find('.num_right').find("li").each((i, item) => {
        let codeItem = $(item);
        let codeAndName = $(codeItem.find("a")[0]).text();
        let codeAndNameArr = codeAndName.split(")");
        let code = codeAndNameArr[0].substr(1);
        let fundName = codeAndNameArr[1];
        if (code) {
          fundCodesArray.push(code);
        }
      });
    }
    callback(err, fundCodesArray);
  });
}
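The string handling inside fetchFundCodes can be tried in isolation. This is a minimal sketch that assumes each link text has the form "(code)name"; the helper name parseCodeAndName and the sample fund name are made up for illustration.

```javascript
// Hypothetical helper mirroring the split/substr logic in fetchFundCodes.
// Assumes the link text looks like "(000001)Some Fund Name".
function parseCodeAndName(text) {
  const parts = text.split(')');
  const code = parts[0].slice(1); // drop the leading "("
  const fundName = parts[1] || '';
  return { code: code, fundName: fundName };
}

console.log(parseCodeAndName('(000001)Sample Fund'));
```

An empty string yields an empty code, which is why the crawler only pushes a code onto the array when it is non-empty.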
Next, I built a positioning and tracking device for the crawler, so that it can look up a chicken's file by its code:
// Get the basic information of a fund by its code
fetchFundInfo(code, callback) {
  let fundUrl = "http://fund.eastmoney.com/f10/" + code + ".html";
  let fundData = {fundCode: code};
  this.fetch(fundUrl, "utf-8", (err, $) => {
    if (!err) {
      let dataRow = $("body").find(".detail .box").find("tr");
      fundData.fundName = $($(dataRow[0]).find("td")[0]).text(); // Full name of the fund
      fundData.fundNameShort = $($(dataRow[0]).find("td")[1]).text(); // Short name of the fund
      fundData.fundType = $($(dataRow[1]).find("td")[1]).text(); // Fund type
      fundData.releaseDate = $($(dataRow[2]).find("td")[0]).text(); // Release date
      fundData.buildDate = $($(dataRow[2]).find("td")[1]).text(); // Date of establishment/scale
      fundData.assetScale = $($(dataRow[3]).find("td")[0]).text(); // Asset size
      fundData.shareScale = $($(dataRow[3]).find("td")[1]).text(); // Share size
      fundData.administrator = $($(dataRow[4]).find("td")[0]).text(); // Fund management company
      fundData.custodian = $($(dataRow[4]).find("td")[1]).text(); // Fund custodian
      fundData.manager = $($(dataRow[5]).find("td")[0]).text(); // Fund manager
      fundData.bonus = $($(dataRow[5]).find("td")[1]).text(); // Dividends paid
      fundData.managementRate = $($(dataRow[6]).find("td")[0]).text(); // Management fee rate
      fundData.trusteeshipRate = $($(dataRow[6]).find("td")[1]).text(); // Custody fee rate
      fundData.saleServiceRate = $($(dataRow[7]).find("td")[0]).text(); // Sales service fee rate
      fundData.subscriptionRate = $($(dataRow[7]).find("td")[1]).text(); // Maximum subscription fee rate
    }
    callback(err, fundData);
  });
}
This information has hardly changed since the Kingdom of Chicken Spirits was founded (even though they only became spirits after the founding). If I had to summon the crawler every time I wanted to look through the files, I could not afford to feed it. Fortunately, the novice gift pack included a MongoDB treasure chest with free storage and retrieval. Once all the files are saved in it, reading them later takes no effort at all.
During training I found that the crawler's specialty is concurrent tracking. But when all 7,000-plus chickens were looked up at once, about one third of them always went missing, which was apparently asking a bit too much. To control the crawler's tracking rhythm, it was time for a new partner.
// Concurrency controller: limits the number of concurrent calls in a single batch
class ConcurrentCtrl {
  // Caller context, number of calls per fragment (recommended not to exceed 1000), function to call, full array of arguments, database collection
  constructor(parent, splitNum, fn, dataArray = [], collection) {
    this.parent = parent;
    this.splitNum = splitNum;
    this.fn = fn;
    this.dataArray = dataArray;
    this.length = dataArray.length; // Total number of items
    this.itemNum = Math.ceil(this.length / splitNum); // Number of fragments
    this.restNum = (this.length % splitNum) === 0 ? splitNum : (this.length % splitNum); // Number of items in the last fragment
    this.collection = collection;
  }
  // go(0) starts the calls; whenever the running count reaches a multiple of the fragment size, the next fragment is launched
  go(index) {
    if ((index % this.splitNum) === 0) {
      if (index / this.splitNum !== (this.itemNum - 1)) {
        this.fn.call(this.parent, this.collection, this.dataArray.slice(index, index + this.splitNum));
      } else {
        this.fn.call(this.parent, this.collection, this.dataArray.slice(index, index + this.restNum));
      }
    }
  }
}
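To make the fragment arithmetic concrete, here is a small sketch that splits an array into fragments of a given size. The fragment count is computed as in ConcurrentCtrl; the function name splitIntoFragments is hypothetical, and no crawler or database is involved.

```javascript
// Hypothetical helper: split an array into fragments of at most splitNum items.
// The fragment count uses the same Math.ceil formula as ConcurrentCtrl;
// Array.prototype.slice takes care of the shorter last fragment.
function splitIntoFragments(dataArray, splitNum) {
  const itemNum = Math.ceil(dataArray.length / splitNum); // number of fragments
  const fragments = [];
  for (let i = 0; i < itemNum; i++) {
    fragments.push(dataArray.slice(i * splitNum, (i + 1) * splitNum));
  }
  return fragments;
}

// 7 codes in fragments of 3 give fragment sizes 3, 3 and 1
const codes = ['a', 'b', 'c', 'd', 'e', 'f', 'g'];
console.log(splitIntoFragments(codes, 3).map((fragment) => fragment.length));
```

ConcurrentCtrl does not build all the fragments up front; it launches the next one only when the running count of finished items reaches a multiple of the fragment size.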
With its help, capping the crawler's concurrency at about 1,000 per batch makes for an ideal rhythm. Next, the crawler was taught to put the file of every chicken spirit it captures into the MongoDB treasure chest automatically. Starting small, I first told the crawler exactly what to do within each concurrent batch.
// Fetch one fragment of fund information concurrently and save it to the given database collection
fundFragmentSave(collection, codesArray) {
  for (let i = 0; i < codesArray.length; i++) {
    this.fetchFundInfo(codesArray[i], (error, fundData) => {
      if (error) {
        Event.emit("error_fundItem", codesArray[i]);
        Event.emit("fundItem", codesArray[i]);
      } else {
        // Use the fund code as the unique identifier of each record, which makes querying and sorting easy
        fundData["_id"] = fundData.fundCode;
        collection.save(fundData, (err, res) => {
          Event.emit("correct_fundItem", codesArray[i]);
          Event.emit("fundItem", codesArray[i]);
          if (err) throw err;
        });
      }
    });
  }
}
In this way, the crawler learned to report in at any time while tracking. After each chicken is tracked, it emits a signal named fundItem, and it emits error_fundItem or correct_fundItem on failure or success respectively.
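The signalling pattern can be sketched on its own with Node's built-in events module. The event names are the ones used above; the trackAll function and the fake task list are invented for illustration, and no network or database is involved.

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for the crawler: reports each finished item through events
// and counts them the way the fundItem listener does.
function trackAll(items, emitter) {
  const summary = { done: 0, errors: [], finished: false };
  emitter.on('error_fundItem', (code) => summary.errors.push(code));
  emitter.on('fundItem', () => {
    summary.done++;
    if (summary.done >= items.length) {
      summary.finished = true;
    }
  });
  for (const item of items) {
    // Pretend that items with ok === false failed to download
    if (item.ok) {
      emitter.emit('correct_fundItem', item.code);
    } else {
      emitter.emit('error_fundItem', item.code);
    }
    emitter.emit('fundItem', item.code);
  }
  return summary;
}

const summary = trackAll(
  [{ code: '000001', ok: true }, { code: '040008', ok: false }],
  new EventEmitter()
);
console.log(summary);
```

Because emit calls listeners synchronously, the error list is already updated by the time the fundItem listener runs, which is why the crawler emits error_fundItem before fundItem.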
Then, with its new partner ConcurrentCtrl, the crawler can easily track targets thousands of miles away. Just hand it a codesArray of fund codes:
// Concurrently fetch the basic information for the given array of fund codes and save it to the database
fundToSave(error, codesArray = []) {
  if (!error) {
    let codesLength = codesArray.length;
    let itemNum = 0; // Number of funds crawled so far
    let errorItems = []; // Fund codes that failed to crawl
    let errorItemNum = 0; // Number of fund codes that failed to crawl
    let correctItems = []; // Fund codes crawled successfully
    let correctItemNum = 0; // Number of fund codes crawled successfully
    console.log(`Total number of fund codes: ${codesLength}`);
    // Database connection
    MongoClient.connect(this.dbUrl, (err, db) => {
      if (err) throw err;
      // Database instance
      let fundDB = db.db(this.dbName);
      // Collection instance
      let dbCollection = fundDB.collection(this.collectionName);
      // Concurrency controller instance
      let concurrentCtrl = new ConcurrentCtrl(this, this.fragmentSize, this.fundFragmentSave, codesArray, dbCollection);
      // Event listeners
      Event.on("fundItem", (_code) => {
        // Count
        itemNum++;
        console.log(`index: ${itemNum} --- code: ${_code}`);
        // Concurrency control
        concurrentCtrl.go(itemNum);
        // All fund information has been fetched
        if (itemNum >= codesLength) {
          console.log("save finished");
          if (errorItems.length > 0) {
            console.log("---error code----");
            console.log(errorItems);
          }
          // Close the database
          db.close();
        }
      });
      Event.on("error_fundItem", (_code) => {
        errorItems.push(_code);
        errorItemNum++;
        console.log(`error index: ${errorItemNum} --- error code: ${_code}`);
      });
      Event.on("correct_fundItem", (_code) => {
        correctItemNum++;
      });
      // Start the first fragment
      concurrentCtrl.go(0);
    });
  } else {
    console.log("fundToSave error");
  }
}
And so the great art of chicken catching was mastered: at the macro level it can survey the household files of the entire chicken kingdom, and at the micro level it can quietly take out just a few:
// Fetch and save fund information; if an array of fund codes is given, only those funds are fetched and updated in the database
fundSave(_codesArray) {
  if (!_codesArray) {
    // Fetch and save the information of all funds
    this.fetchFundCodes((err, codesArray) => {
      this.fundToSave(err, codesArray);
    });
  } else {
    // Guard against non-array arguments
    _codesArray = Object.prototype.toString.call(_codesArray) === '[object Array]' ? _codesArray : [];
    if (_codesArray.length > 0) {
      // Fetch and save only the given funds
      this.fundToSave(null, _codesArray);
    } else {
      console.log("not enough codes to fetch");
    }
  }
}
How do you start it? Here is the incantation, but don’t forget to take the lid off the MongoDB chest first (that is, start the MongoDB service).
let fundSpider = new FundSpider("fund", "fundData", 1000);
// Update and save the basic information of all funds
fundSpider.fundSave();
// Update and save the basic information of the funds with codes 000001 and 040008
// fundSpider.fundSave(['000001','040008']);
Go, Pika-bug! I watched the crawler spawn 1,000 phantoms that all vanished at once. After 10 seconds of meditation, I opened the MongoDB chest and saw something like this:
I looked up to the sky and laughed: at last I knew everything there was to know about you chickens! Ah ha ha ha!
Ah, wait. Even if I know each chicken's age, family background, and how many properties it owns, the chickens of the world can never all be killed, and chicken spirits even less so. What do I want this iron rod for? So what if I have the files? (˙-˙) I am still in a state of insecurity and full of positive…
What I need: targeted killing of chickens
I almost forgot that there is a third, dynamic scroll: the M scroll. With its power I can learn whether any chicken has eaten its fill, whether it is fat or thin, and whether it is worth catching. It looks like the crawler needs one more skill.
// Convert a Date to a "YYYY-MM-DD" string
getDateStr(dd) {
  let y = dd.getFullYear();
  let m = (dd.getMonth() + 1) < 10 ? "0" + (dd.getMonth() + 1) : (dd.getMonth() + 1);
  let d = dd.getDate() < 10 ? "0" + dd.getDate() : dd.getDate();
  return y + "-" + m + "-" + d;
}
// Crawl and parse the fund's unit net value, growth rate, etc.
fetchFundUrl(url, callback) {
  this.fetch(url, 'gb2312', (err, $) => {
    let fundData = [];
    if (!err) {
      let table = $('body').find("table");
      let tbody = table.find("tbody");
      try {
        tbody.find("tr").each((i, trItem) => {
          let fundItem = {};
          let tdArray = $(trItem).find("td").map((j, tdItem) => {
            return $(tdItem);
          });
          fundItem.date = tdArray[0].text(); // Net value date
          fundItem.unitNet = tdArray[1].text(); // Unit net value
          fundItem.accumulatedNet = tdArray[2].text(); // Cumulative net value
          fundItem.changePercent = tdArray[3].text(); // Daily growth rate
          fundData.push(fundItem);
        });
        callback(err, fundData);
      } catch (e) {
        console.log(e);
        callback(e, []);
      }
    } else {
      // Report the failure as well, so the caller is never left waiting
      callback(err, fundData);
    }
  });
}
// Get the fund's net value history for the selected date range by fund code
// Fund code, start date, end date, number of records, callback function
fetchFundData(code, sdate, edate, per = 9999, callback) {
  let fundUrl = "http://fund.eastmoney.com/f10/F10DataApi.aspx?type=lsjz";
  let date = new Date();
  let dateNow = new Date();
  // By default the start date is 3 years before the current date
  sdate = sdate ? sdate : this.getDateStr(new Date(date.setFullYear(date.getFullYear() - 3)));
  edate = edate ? edate : this.getDateStr(dateNow);
  fundUrl += ("&code=" + code + "&sdate=" + sdate + "&edate=" + edate + "&per=" + per);
  console.log(fundUrl);
  this.fetchFundUrl(fundUrl, callback);
}
Use as follows:
let fundSpider = new FundSpider();
fundSpider.fetchFundData('040008', '2018-03-20', '2018-05-04', 30, (err, data) => {
  console.log(data);
});
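The default date range used by fetchFundData can be checked without any network access. This sketch reimplements the date formatting with padStart (a substitution for the manual zero padding above) and takes the current date as a parameter so that the result is reproducible. The function names toDateStr and defaultRange are hypothetical.

```javascript
// Format a Date as "YYYY-MM-DD" in local time, using padStart for zero padding.
function toDateStr(date) {
  const y = date.getFullYear();
  const m = String(date.getMonth() + 1).padStart(2, '0');
  const d = String(date.getDate()).padStart(2, '0');
  return `${y}-${m}-${d}`;
}

// Hypothetical helper: fall back to "3 years before now" and "now" when dates are omitted.
function defaultRange(sdate, edate, now = new Date()) {
  const threeYearsAgo = new Date(now.getTime());
  threeYearsAgo.setFullYear(now.getFullYear() - 3);
  return {
    sdate: sdate || toDateStr(threeYearsAgo),
    edate: edate || toDateStr(now),
  };
}

// Months are zero-based, so this is 4 May 2018
console.log(defaultRange(undefined, undefined, new Date(2018, 4, 4)));
```

Copying the date before calling setFullYear keeps the caller's Date object untouched, which is the reason the original code creates two separate Date instances.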
On the road of cultivation, one accumulates deeply and releases sparingly. I condensed all the insight I needed about the Kingdom of Chicken Spirits into three eternal gems:
// Interface for querying all fund codes
app.get('/fetchFundCodes', (req, res) => {
  let fundSpider = new FundSpider();
  res.header("Access-Control-Allow-Origin", "*");
  fundSpider.fetchFundCodes((err, data) => {
    res.send(data.toString());
  });
});
// Interface for querying a fund's file by its code
app.get('/fetchFundInfo/:code', (req, res) => {
  let fundSpider = new FundSpider();
  res.header("Access-Control-Allow-Origin", "*");
  fundSpider.fetchFundInfo(req.params.code, (err, data) => {
    res.send(JSON.stringify(data));
  });
});
// Interface for the fund's net value history
app.get('/fetchFundData/:code/:per', (req, res) => {
  let fundSpider = new FundSpider();
  res.header("Access-Control-Allow-Origin", "*");
  fundSpider.fetchFundData(req.params.code, undefined, undefined, req.params.per, (err, data) => {
    res.send(JSON.stringify(data));
  });
});
app.listen(1234, () => {
  console.log("service start on port 1234");
});
A duel
I arrived beneath the walls of the Kingdom of Chicken Spirits, the gems freshly set in my great Node sword glinting in the sun. I pointed my sword at the gate and shouted:
“It’s time for all of you, oh no, some of you chickens to die!”
The guardian of the city appeared at the top of the wall. He caught a glimpse of the gems on my sword and said coldly:
“Hmph, all you have is cold statistics. If I put a hundred plucked chickens in front of you and gave you an hour, I don’t think you could pick out the one you want from those numbers.”
He opened the gate and released 100 chickens, which stood 10 meters away from me, looking unafraid.
The noisy clucking flustered me a little. Just as he had said, I looked at these chickens, alike down to the last feather, and sweat began to drip from my forehead. My sword hung raised in the air, but I did not dare bring it down.
“Passing by and seeing you in trouble, I give you a treasure to help you.”
Suddenly a deep voice sounded around me. It was an old man. Half believing him, I took what he offered: a silver sheet with a very smooth surface. What? This is a data foil! It can flatten a jumble of numbers onto a two-dimensional chart! I was overjoyed to receive such an artifact.
“May I have your name, please?”
“Icarus ~”
The voice lingered, but the man was already far away.
I carefully threw the two-dimensional foil toward the center of the gate. In an instant everything fell silent. The guardian of the city froze in place, wide-eyed, and the chicken spirits, like sheets of paper, were spread flat on the wall.
// Fund data visualization (front-end code)
const React = require("react");
const Echarts = require("echarts");
const EcStat = require("echarts-stat");
const fetch = require("isomorphic-unfetch");
class FundChart extends React.Component {
  constructor(props) {
    super(props);
    // Button toggle flag
    this.state = {
      switchIndex: 1
    };
  }
  // Get the fund file
  fetchFundInfo(code, callback) {
    return fetch(`http://localhost:1234/fetchFundInfo/${code}`).then((res) => {
      res.json().then((data) => {
        callback(data);
      });
    }).catch((err) => {
      console.log(err);
    });
  }
  // Get fund net value change data
  fetchFundData(code, per, callback) {
    return fetch(`http://localhost:1234/fetchFundData/${code}/${per.toString()}`).then((res) => {
      res.text().then((data) => {
        callback(JSON.parse(data));
      });
    }).catch((err) => {
      console.log(err);
    });
  }
  // Build the options that ECharts draws
  getChart(fundData) {
    // Net value at the starting point
    let startUnitNet = parseFloat(fundData[0].unitNet);
    // Calculate the percentage change of the net value at each point relative to the starting point
    // The date is the abscissa and the net value change is the ordinate
    let data = fundData.map(function(item) {
      return [item.date, parseFloat((100.0 * ((parseFloat(item.unitNet) - startUnitNet) / startUnitNet)).toFixed(2))];
    });
    // Use the array index as the abscissa and the net value change as the ordinate for the scatter plot and regression analysis
    let dataRegression = data.map(function(item, i) {
      return [i, item[1]];
    });
    // Line chart abscissa array
    let dateList = data.map(function(item) {
      return item[0];
    });
    // Line chart ordinate array
    let valueList = data.map(function(item) {
      return item[1];
    });
    // Calculate the linear regression
    let myRegression = EcStat.regression('linear', dataRegression);
    // Sort the points of the regression line
    myRegression.points.sort(function(a, b) {
      return a[0] - b[0];
    });
    // Fitted linear regression equation y = Kx + B
    let K = myRegression.parameter.gradient;
    let B = myRegression.parameter.intercept;
    let optionFold = {
      title: [{
        left: 'center',
      }],
      tooltip: {
        trigger: 'axis'
      },
      xAxis: [{
        data: dateList
      }],
      yAxis: [{
        splitLine: {
          show: false
        }
      }],
      series: [{
        type: 'line',
        showSymbol: false,
        data: valueList,
        itemStyle: {
          color: '#3385ff'
        }
      }]
    };
    let optionRegression = {
      title: {
        subtext: 'linear regression',
        left: 'center'
      },
      tooltip: {
        trigger: 'axis',
        axisPointer: {
          type: 'cross'
        }
      },
      xAxis: {
        type: 'value',
        splitLine: {
          lineStyle: {
            type: 'dashed'
          }
        }
      },
      yAxis: {
        type: 'value',
        splitLine: {
          lineStyle: {
            type: 'dashed'
          }
        }
      },
      series: [{
        name: 'scatter',
        type: 'scatter',
        itemStyle: {
          color: '#3385ff'
        },
        label: {
          emphasis: {
            show: true,
            position: 'left'
          }
        },
        data: dataRegression
      }, {
        name: 'line',
        type: 'line',
        showSymbol: false,
        data: myRegression.points,
        markPoint: {
          itemStyle: {
            normal: {
              color: 'transparent'
            }
          },
          label: {
            normal: {
              show: true,
              position: 'left',
              formatter: myRegression.expression,
              textStyle: {
                color: '#333',
                fontSize: 14
              }
            }
          },
          data: [{
            coord: myRegression.points[myRegression.points.length - 1]
          }]
        }
      }]
    };
    return {
      optionFold: optionFold,
      optionRegression: optionRegression,
      regression: myRegression,
      K: K,
      B: B
    };
  }
  // Draw the charts
  drawChart(fundData, fundInfo) {
    if (!this.chartFold) {
      this.chartFold = Echarts.init(document.getElementById('chart_fold'));
    }
    if (!this.chartPoints) {
      this.chartPoints = Echarts.init(document.getElementById('chart_points'));
    }
    if (fundData && (fundData.length > 0)) {
      // Update the chart drawing
      let chartObj = this.getChart(fundData);
      this.chartFold.setOption(chartObj.optionFold);
      this.chartPoints.setOption(chartObj.optionRegression);
    } else if (fundInfo) {
      // Update the chart title (skipped when no fund file was passed in)
      this.chartFold.setOption({
        title: {
          text: fundInfo.fundNameShort
        }
      });
      this.chartPoints.setOption({
        title: {
          text: fundInfo.fundNameShort
        }
      });
    }
  }
  // Time range button toggle
  dateSwitch(index, per) {
    this.setState({
      switchIndex: index
    }, () => {
      this.fetchFundData(this.props.code, per, (data) => {
        this.drawChart(data.reverse());
      });
    });
  }
  // Time range buttons
  getSwitchBtns() {
    let switchArray = [
      ['Last Week', 7],
      ['Last Month', 30],
      ['Last 3 Months', 90],
      ['Last Half Year', 180],
      ['Last Year', 365],
      ['Last 3 Years', 1095]
    ];
    let switchIndex = this.state.switchIndex;
    return (
      <div>
        {switchArray.map((item, i) => {
          let active = (i === switchIndex);
          let label = item[0];
          let per = item[1];
          return (<button key={i} className={"switch-btn" + (active ? " active" : "")} onClick={this.dateSwitch.bind(this, i, per)}>{label}</button>);
        })}
      </div>
    );
  }
  componentDidMount() {
    // By default, fund data for the last month is loaded
    this.fetchFundData(this.props.code, 30, (data) => {
      this.drawChart(data.reverse());
    });
    // Get the fund title
    this.fetchFundInfo(this.props.code, (data) => {
      console.log(data);
      this.drawChart([], data);
    });
  }
  render() {
    return (
      <div className="fundChart-container">
        <div id="chartbox" className="chart-box">
          <div className="chart-fold" id="chart_fold"></div>
          <div className="chart-points" id="chart_points"></div>
        </div>
        <div className="switch-box">
          {this.getSwitchBtns()}
        </div>
      </div>
    );
  }
}
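To see what the chart code computes without a browser, here is a minimal sketch of the two numeric steps in getChart: converting net values to percentage change relative to the first point, and fitting the line y = Kx + B. The least-squares fit is written by hand as a stand-in for EcStat.regression, the function names are hypothetical, and the sample net values are invented.

```javascript
// Percentage change of each net value relative to the first one, rounded to 2 decimals.
function toPercentChange(unitNets) {
  const start = unitNets[0];
  return unitNets.map((v) => parseFloat((100 * (v - start) / start).toFixed(2)));
}

// Ordinary least-squares fit of y = K * x + B, where x is the array index.
// Hand-written stand-in for EcStat.regression('linear', ...).
function linearFit(ys) {
  const n = ys.length;
  const meanX = (n - 1) / 2;
  const meanY = ys.reduce((sum, y) => sum + y, 0) / n;
  let sxy = 0;
  let sxx = 0;
  ys.forEach((y, x) => {
    sxy += (x - meanX) * (y - meanY);
    sxx += (x - meanX) * (x - meanX);
  });
  const K = sxx === 0 ? 0 : sxy / sxx; // slope
  const B = meanY - K * meanX; // intercept
  return { K: K, B: B };
}

// Invented sample net values
const changes = toPercentChange([1.0, 1.02, 1.04, 1.06]);
console.log(changes, linearFit(changes));
```

A positive K means the fund trended upward over the selected window, and a negative K means it trended downward.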
“Buy low, never buy high, and bottom-fish well!”
Shouting the formula and brandishing my sword, I cut many chickens to pieces, which scattered in the air.
I moved like a dragon, as if the enemy were not there, and fought like the wind, sword in hand, all the way into the palace.
At last I was stopped. Facing me was a fierce general of the Kingdom of Chicken Spirits, so powerful that he actually forced me back.
I clutched my chest and tried to suppress the taste of blood that surged up from my stomach:
“Dare I… may I ask your name?”
“I am the grand priest of the Kingdom of Chicken Spirits, Ancient Fur Coat!”
Ancient Fur Coat! The legendary grand priest who holds sway over the Kingdom of Chicken Spirits! It is said that the king of the kingdom exists in name only and that Ancient Fur Coat monopolizes power. He is a genius who holds the power of destiny, and the weathervane of the whole country!
“There are many things in the world that you can’t understand,”
said Ancient Fur Coat contemptuously.
“You are not the first intruder to die at my hands, but out of compassion I give all of you a name to be remembered by.”
“What name…?”
I was barely holding myself up, but curiosity made me ask.
“Leek”
With those words he swung his sickle and closed in. All I could see was the cold smile on his face before my whole world went silent.
By my second article, I found I had ended up making up a story... Maybe the title should be "Legend of the Leek"? In any case, making this up was not easy, so please credit the source when reposting. Thank you.
----------- Indecisive dividing line ---------
Since some readers said in the comments that they like source code, I offer up with both hands my immature GitHub address: https://github.com/youngdro/fundSpider. While you are there, please poke the sparkling star ✨. Later I will slowly move the rest of my stock over there (I may even be a fake programmer...).