Lately my friends have been complaining about where to get data when they are building side projects, and it's true that for a front-end developer, API data is almost always mocked. There are plenty of impressive Python and Node crawler demos online, but I haven't seen a good end-to-end scheme that fits naturally into our development workflow. To help my friends, and anyone else who needs data, let's walk through the whole process step by step. Since I'm a front-end developer myself, I know what we need and how to deal with it. Follow along and let's study together!
Preface
Learning never ends; like rowing upstream, if you don't advance you fall behind. I hope you can follow my ideas and build a simple implementation yourself rather than just reading along. In this article I will explain the operation and details of each step, some common Node.js APIs, and the basic syntax of Koa2, so you can also start your Koa2 learning here; it really is a pleasant web framework to use. In addition, the article covers a scheme and concrete implementation for cross-domain data requests, and finally data formatting and a basic request from the front end. These three players come together to put on a good show.
Technology stack
- http.request: the request method of Node's http module can act as an HTTP client and send requests to a server. A crawler needs to send HTTP requests to the target URL to obtain the page content
- cheerio: the page returned by a raw HTTP request is just a long, messy string, because there is no browser to parse it into a DOM. Fortunately, we can use the cheerio library to parse it into a DOM-like structure and then analyze the page with jQuery-style syntax
- koa-static: the Koa static-resource middleware, which lets us serve the static resources in our project
- koa2-cors: enables cross-domain Ajax requests for our data; the key point is that this is configured on the server side
- axios + Promise: because Node is single-threaded, a lot of asynchronous programming is unavoidable. Layer upon layer of nested callbacks is an outdated style, so let's use Promises instead (see the sketch right after this list)
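To make the Promise point concrete before we start, here is a minimal sketch of wrapping http.get in a Promise so later steps can be chained with .then instead of nested callbacks. The function name fetchPage is my own, not something from the project.

var http = require('http')

// wrap http.get in a Promise so callers can chain .then/.catch
function fetchPage (url) {
  return new Promise(function (resolve, reject) {
    http.get(url, function (res) {
      var html = ''
      res.on('data', function (chunk) { html += chunk }) // collect the page piece by piece
      res.on('end', function () { resolve(html) })       // hand back the full HTML
    }).on('error', reject)
  })
}

// usage:
// fetchPage('http://www.runoob.com/nodejs/nodejs-tutorial.html').then(function (html) { console.log(html) })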
The specific implementation
I. Environment construction
Create a new folder, cd into it, and initialize it to generate the package.json file
npm init -y
After package.json is generated, install the koa package with npm
npm install --save koa
The other dependencies are installed the same way, so I won't expand on each one; here they are together
npm install --save koa-static
npm install --save koa2-cors
NodeJS
Warm-up: laying the crawler groundwork
Before putting on an opera you have to rehearse: there should be a script, and everyone should know their role and when to appear on stage. That's what makes a good show. Let's do the same and warm up with some code. Create a new demo01.js file in our folder and type the following:
var http = require('http') // Node.js provides the http module, used to build HTTP servers and clients
var url = 'http://www.runoob.com/nodejs/nodejs-tutorial.html' // enter any url

http.get(url, function (res) { // send a GET request
  var html = ''
  res.on('data', function (data) {
    html += data // string concatenation
  })
  res.on('end', function () {
    console.log(html)
  })
}).on('error', function () {
  console.log('Error getting resource!')
})
Open the terminal, run node demo01.js, and you will see the HTML of the page printed out: the opening note of our show.
Let’s start our show
Now that we can get the full HTML of the page, we can dig out the resources we need from it. The cheerio library gives us a very fast and convenient API for this. Its purpose was covered in the technology stack above, so here is a direct demonstration of how to use it. First, bring in cheerio:
const cheerio = require('cheerio')
After requiring it, we load the HTML and get back a $ that behaves much like jQuery, which makes manipulating the DOM very easy:
var $ = cheerio.load(html)
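For instance, here is a quick sketch of what that jQuery-style syntax looks like once the page is loaded. The h3 and a selectors are just illustrations, not the real iMOOC markup.

$('h3').each(function () {           // iterate over every h3 on the page
  console.log($(this).text())        // print its text content
})
console.log($('a').length)           // count the links on the page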
The next step is to search the HTML for the resources we need. Everyone's needs are different; in this example we grab the video resources on iMOOC. To keep the warm-up code readable, we wrap this part in a function that takes the HTML as an argument.
function filterChapters(html) {
  var $ = cheerio.load(html)
  var chapters = $('.course-wrap') // select the chapter wrapper class
  var courseData = [] // create an array to hold our resources

  chapters.each(function () {
    var chapter = $(this)
    var chapterTitle = chapter.find('h3').text().replace(/\s/g, '')
    var videos = chapter.find('.video').children('li')

    var chapterData = {
      chapterTitle: chapterTitle,
      videos: []
    }

    videos.each(function () {
      var video = $(this).find('.J-media-item')
      var videoTitle = video.text().replace(/\n/g, '').replace(/\s/g, '')
      var id = video.attr('href').split('video/')[1] // cut the href down to the id
      var url = `http://www.imooc.com/video/${id}` // ES6 template string to build the video url

      chapterData.videos.push({
        title: videoTitle,
        id: id,
        url: url
      })
    })

    courseData.push(chapterData)
  })

  return courseData // return the resources we need
}
Note the cleanup step above: if we don't strip them out, the text will contain \n, \t and other whitespace characters and the resulting JSON will be malformed. That's not the format or data we want, so we remove them with a regular expression and the replace API:
var videoTitle = video.text().replace(/\n/g, "").replace(/\s/g, "");
Finishing up
Once we have the resources we need, they are still plain JavaScript objects rather than JSON, so we need to process them one more time:
var courseData = filterChapters(html)
let content = courseData.map((o) => {
  return JSON.stringify(o) // JSON.stringify() converts a JavaScript value to a JSON string
})
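As a side note, if you would rather end up with one JSON document instead of an array of individually stringified objects, JSON.stringify can serialize the whole array in a single call; a minimal alternative:

let content = JSON.stringify(courseData, null, 2) // serialize the whole array, pretty-printed with 2-space indentation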
Once we have the resource we really want, the next step is to save it. Create an index.json file to store it. fs is one of the most commonly used modules in Node; it covers the operations we need, such as reading and writing files. If you are interested, have a look at the fs documentation. We require fs and write the crawled data into our index.json file:
var fs = require('fs')

fs.writeFile('./index.json', content, function (err) { // file path, content to write, callback
  if (err) throw new Error('Write file failed' + err)
  console.log('Write file successfully')
})
Now that we’re done, let’s go and look at our results. Open up the index.json file and we can see the data we’ve captured
Isn't that exactly the data we need?! A quiet little moment of joy. Node.js is a great actor!
II. Koa2
Koa is a next-generation web framework for the Node.js platform. It is small but very extensible, and programming with it feels clean and lightweight. Why use it when plain Node.js can do the same thing? It is indeed possible to create a server with createServer, but as a programmer you should keep absorbing new knowledge, especially popular tools, to stay relevant. Compared with plain Node, Koa2 really is simple. We already installed Koa above, so I will go straight to how to use it; if you are new to it, take a look at the Koa website for the basic usage.
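For comparison, here is roughly what the plain-Node "hello world" with createServer looks like; Koa accomplishes the same job with middleware and much less ceremony.

var http = require('http')

http.createServer(function (req, res) {
  res.writeHead(200, { 'Content-Type': 'text/plain' }) // plain-text response
  res.end('hello world')
}).listen(3000)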
Thought 1: now that we have the resource we need, how do we serve it so that the front end can request it?
- With easyMock, you just copy the data you crawled, paste it into the mock, and it creates a URL for you to access
- Start a service with Koa2, mount our data, and access it through the port
A quick gripe about the easyMock route: I can't stand copying crawled data into a mock by hand. So let's begin our Koa2 journey.
const Koa = require('koa')
const path = require('path')
const static = require('koa-static')

const app = new Koa()
const staticPath = './static'

// set the directory for static files; we can then access our data by adding /index.json to the address bar
app.use(static(path.join(__dirname, staticPath)))

app.use(async (ctx) => { // print hello world on our page
  ctx.body = 'hello world'
})

app.listen(3000, () => { // start a server on port 3000
  console.log('[demo] static-use-middleware is starting at port 3000')
})
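If you would rather mount the data on a route than serve it as a static file, a minimal sketch could look like this. The /index route name is my own choice, not something from the project.

const fs = require('fs')

app.use(async (ctx) => {
  if (ctx.url === '/index') {
    ctx.type = 'application/json' // tell the browser this is JSON
    ctx.body = fs.readFileSync('./index.json', 'utf-8') // return the crawled data
  } else {
    ctx.body = 'hello world'
  }
})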
Thought 2: I happily went to request data from my port, only to run into a cross-origin error.
My own understanding of koa2-cors:
CORS divides requests into simple and non-simple requests. Simple requests are GET and POST requests without additional request headers, and a POST only counts as simple if its content type is not application/json (I don't understand this part very well, so please point out any errors and suggest corrections). The rest, such as PUT requests, POSTs with an application/json content type, and requests with custom headers, are non-simple requests. Handling a simple request is very easy: if all you need is for the response to get through, you only have to set the Access-Control-Allow-Origin response header.
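For a simple request you can even set that header by hand with a tiny piece of Koa middleware before reaching for the library; a minimal sketch:

app.use(async (ctx, next) => {
  ctx.set('Access-Control-Allow-Origin', '*') // allow any origin to read our responses
  await next()
})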
To solve the problem properly, we use the koa2-cors middleware:
const cors = require('koa2-cors')

app.use(cors({
  origin: function (ctx) {
    if (ctx.url === '/index') {
      return false // disable CORS for this route
    }
    return '*' // allow any origin for everything else
  }
}))
III. Axios
Since this article focuses on the crawler, I'll just use a Vue project I happen to be working on to demonstrate a basic axios request. You can learn more about axios on the axios GitHub page.
methods: {
  getData () {
    axios.get('http://localhost:3000/index.json', { // access the port we created
      dataType: 'json',
      contentType: 'application/json',
      crossDomain: true
    })
      .then(function (response) {
        console.log(response.data)
      })
      .catch(function (err) {
        console.log(err)
      })
  }
},
mounted () {
  this.getData() // async/await could make this request more elegant; mostly just being lazy...
}
After the request goes through, our data is printed in the console.
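As the comment in mounted() hints, the same request reads a bit more cleanly with async/await; here is a minimal sketch of the method rewritten that way:

async getData () {
  try {
    const response = await axios.get('http://localhost:3000/index.json') // same endpoint as above
    console.log(response.data)
  } catch (err) {
    console.log(err)
  }
}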
Conclusion:
The three of them together are just too powerful, and they put on a wonderful show for us. If you've read this far and want to learn it, why not do it yourself? My project is on GitHub and you can clone it. It really won't take much time, and it can solve your future worries about where data comes from, so why not? You can also follow the ideas in this article and build your own version; I look forward to seeing your improved works. Feel free to comment and leave a message below. I am a junior in college and currently looking for an internship; recommendations and introductions are welcome. In the next article I will introduce my Vue game. You can also follow me and study with me. Joy and sharing reap friendship.