Lately my friends have been complaining about where to get data when they build projects. It's true: for front-end developers, the data behind an interface is almost always mocked. Online you can see plenty of experts doing impressive things with Python and Node crawlers, but I haven't found a clear workflow that fits into our own development process. To help my friends, and anyone else who needs data, this article walks through the whole process step by step. Since I'm a front-end developer too, I know what people need and how to handle it. So follow along and let's learn together!

Preface

Learning never ends. I hope you can follow my approach, build a simple version yourself, and then extend it to your own needs; rather than standing by the water envying the fish, it is better to go home and weave a net. In this article I will explain each step in detail, cover some common Node.js APIs and the basic syntax of Koa2 (you can start learning Koa2 from this article; it really is a pleasant web framework to use), then the approach to and implementation of cross-origin data requests, and finally data formatting and a basic request. Now let the three leads take the stage and put on a good show.

Technology stack

  • http.request: The request method of Node's http module can act as an HTTP client and send requests to a server. A crawler needs to send HTTP requests to the target link to obtain the page content
  • Cheerio: Without a browser to parse the DOM, a page fetched over HTTP is just a messy string. Fortunately, the Cheerio library can parse it into a DOM-like structure, so we can analyze the page with jQuery syntax
  • koa-static: Koa's static resource middleware, which serves the static files in our project
  • koa2-cors: Enables cross-origin Ajax requests for the data. The key to this approach is configuration on the server side
  • Axios + Promise: Because Node is single-threaded, a lot of asynchronous programming is unavoidable. Layer upon layer of nested callbacks is an outdated style, so let's use promises instead

The specific implementation

I. Environment setup

Create a new folder, enter it, and generate the package.json file

npm init -y

After package.json is generated, install the koa package with npm

npm install --save koa

The other dependencies are installed the same way, so I won't go through them one by one; here they are together

npm install --save koa-static
npm install --save koa2-cors 

NodeJS

Warm-up: laying the crawler's foundation

Before an opera is performed it must be well rehearsed: there has to be a script, and every performer needs to know their role and when they come on stage. So before every performance you rehearse and warm up. That's what makes for a good show. Let's do the same and warm up with some code. Create a new demo01.js file in our folder and type in the following code

var http = require('http') // Node.js provides the http module for building HTTP servers and clients
var url = 'http://www.runoob.com/nodejs/nodejs-tutorial.html' // any URL will do

http.get(url, function (res) { // send a GET request
  var html = ''
  res.on('data', function (data) {
    html += data // string concatenation
  })
  res.on('end', function () {
    console.log(html)
  })
}).on('error', function () {
  console.log('Error getting resource!')
})

Open a terminal and run node demo01.js. You will see the HTML of the page printed out: the opening note of our show.
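The callback-style warm-up can also be wrapped in a Promise, in keeping with the promise style mentioned in the technology stack. The following is only a minimal sketch: fetchPage is a hypothetical helper name that does not appear in the original project, and it assumes the page is UTF-8 text served over plain HTTP.

```javascript
const http = require('http')

// Hypothetical helper (not from the original project): fetch a page and
// resolve with its HTML once the whole response has arrived.
function fetchPage (url) {
  return new Promise(function (resolve, reject) {
    http.get(url, function (res) {
      let html = ''
      res.setEncoding('utf8') // decode incoming chunks as UTF-8 text
      res.on('data', function (chunk) { html += chunk })
      res.on('end', function () { resolve(html) })
    }).on('error', reject) // network errors reject the promise
  })
}
```

It could then be used as fetchPage(url).then(console.log).catch(console.error), which keeps the parsing step out of nested callbacks.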

Let’s start our show

From this page we get all of the HTML, which means we can find the resources we need inside it. For this purpose there is a fast and convenient library for Node.js, Cheerio. Its role was introduced in the technology stack section, so here I will go straight to how to use it. First, bring in Cheerio

const cheerio = require('cheerio')

After requiring it, we load the HTML into it so that it behaves like jQuery, which is nice because querying the DOM becomes very easy

var $ = cheerio.load(html)

The next step is to find the resources we need in the HTML. Everyone's needs are different; the example here collects the video resources of a course on imooc. To keep the main body of the script (the request code from the warm-up) readable, we wrap this part in a function that takes the HTML as an argument.

function filterChapters(html) {
    var $ = cheerio.load(html)
    var chapters = $('.course-wrap') // select by class
    var courseData = [] // array to hold our resources
    chapters.each(function () {
        var chapter = $(this)
        var chapterTitle = chapter.find('h3').text().replace(/\s/g, '')
        var videos = chapter.find('.video').children('li')
        var chapterData = {
            chapterTitle: chapterTitle,
            videos: []
        }
        videos.each(function () {
            var video = $(this).find('.J-media-item')
            var videoTitle = video.text().replace(/\n/g, '').replace(/\s/g, '')
            var id = video.attr('href').split('video/')[1] // cut the id out of the href
            var url = `http://www.imooc.com/video/${id}` // ES6 template string builds the video url
            chapterData.videos.push({ title: videoTitle, id: id, url: url })
        })
        courseData.push(chapterData)
    })
    return courseData // return the resources we need
}
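In the function above, video.attr('href') returns undefined when a list item has no matching link, and calling .split on it would then throw. A small guard can make that step safer. This is a sketch only: videoFromHref is a hypothetical helper that is not part of the original code, and the URL pattern is the one the article already uses.

```javascript
// Hypothetical helper: turn a link's href into the { id, url } pair used above.
// Returns null when the href is missing or has no "video/" segment.
function videoFromHref (href) {
  if (typeof href !== 'string') return null
  const id = href.split('video/')[1] // the part after "video/" is the id
  if (!id) return null
  return { id: id, url: `http://www.imooc.com/video/${id}` }
}

console.log(videoFromHref('/video/1430')) // { id: '1430', url: 'http://www.imooc.com/video/1430' }
console.log(videoFromHref(undefined))     // null
```

Inside the loop, entries for which the helper returns null could simply be skipped instead of crashing the whole crawl.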

If we don't strip the whitespace, the resulting JSON will be cluttered with \n, \t and similar characters, which is not the data format we want. So we remove them with a regular expression and replace.

var videoTitle = video.text().replace(/\n/g, '').replace(/\s/g, '');
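As a side note, the \s character class already matches newlines and tabs, so a single replace is enough. The sketch below uses a made-up sample string; cleanText and collapseText are hypothetical names. The second function is an alternative, not what the article does: it keeps single spaces between words, which may be preferable for titles.

```javascript
// Remove every whitespace character (spaces, \n, \t, ...), as the article does.
function cleanText (text) {
  return text.replace(/\s/g, '')
}

// Alternative (an assumption, not the article's approach): collapse runs of
// whitespace to one space and trim the ends, so words stay separated.
function collapseText (text) {
  return text.replace(/\s+/g, ' ').trim()
}

const raw = '\n\t  1-1 Node.js \n basics \t' // made-up sample title
console.log(cleanText(raw))    // 1-1Node.jsbasics
console.log(collapseText(raw)) // 1-1 Node.js basics
```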

Finishing up

The resource we get back is still a JavaScript object rather than a JSON string, so we need to process it once more,

 var courseData = filterChapters(html)
        // JSON.stringify() converts a JavaScript value to a JSON string.
        // Serialize the whole array so that index.json is one valid JSON document.
        let content = JSON.stringify(courseData, null, 2)
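It is worth checking what actually ends up in the file at this step, because two ways of serializing the result behave differently. The sketch below uses made-up sample data; isValidJson is a hypothetical helper. Converting each item separately and letting the array be coerced to a string joins the items with commas but without the surrounding brackets, while serializing the whole array yields a document that JSON.parse accepts.

```javascript
// Made-up sample data in the same shape as the crawler's result.
const courseData = [
  { chapterTitle: 'Chapter1', videos: [] },
  { chapterTitle: 'Chapter2', videos: [] }
]

const perItem = courseData.map(function (o) { return JSON.stringify(o) })
const joined = String(perItem) // array-to-string coercion joins items with commas
const whole = JSON.stringify(courseData, null, 2) // one JSON array, pretty-printed

// Hypothetical helper: does the text parse as JSON?
function isValidJson (text) {
  try { JSON.parse(text); return true } catch (e) { return false }
}

console.log(isValidJson(joined)) // false
console.log(isValidJson(whole))  // true
```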

Once we have the resource we really want, the next step is to save it. Create an index.json file to store our resources. fs is one of the most commonly used modules in Node; it covers many of the file operations we need, such as reading and writing. If you are interested, take a look at the fs documentation. We use fs to write the crawled data into our index.json file

 var fs = require('fs')

 fs.writeFile('./index.json', content, function (err) { // file path, content to write, callback
            if (err) throw new Error('Write file failed: ' + err);
            console.log('Write file successfully')
 })
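The same write can be expressed with promises instead of a callback. This is a minimal sketch under stated assumptions: saveJson is a hypothetical helper, and it uses fs.promises, which requires a reasonably recent Node version. It also reads the file back and parses it, which is a cheap way to confirm that what was saved is valid JSON.

```javascript
const fs = require('fs')

// Hypothetical helper: write data as JSON, then read it back to confirm it parses.
async function saveJson (filePath, data) {
  await fs.promises.writeFile(filePath, JSON.stringify(data, null, 2), 'utf8')
  const text = await fs.promises.readFile(filePath, 'utf8')
  return JSON.parse(text) // throws if the file does not contain valid JSON
}
```

For the crawler this could be called as saveJson('./index.json', courseData).then(() => console.log('Write file successfully')).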

Now that we're done, let's look at the result. Open index.json and we can see the data we captured

Isn't that exactly the data we need?! I can't help grinning. Node.js is a great performer!

II. Koa2

Koa is a next-generation web framework for the Node.js platform. It is small but very extensible, and it makes for clean, compact code. Why use it when Node.js alone can do the same job? It is true that you can create a service with http.createServer, but as a programmer you should keep absorbing new knowledge, especially popular tools, to stay relevant. And Koa2 really is simple compared with raw Node. We already installed koa earlier, so here I will go straight to how to use it. If you are new to it, the Koa website is a good place to learn the basic usage.
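For comparison, here is roughly what the plain Node route mentioned above might look like. It is only a sketch: createJsonServer and the sample payload are made up for illustration, and the route /index.json is chosen to match the file name used in this article.

```javascript
const http = require('http')

// Made-up sample payload standing in for the crawled data.
const sample = [{ chapterTitle: 'Chapter1', videos: [] }]

// Hypothetical helper: a bare http server that serves the data at /index.json.
function createJsonServer (data) {
  return http.createServer(function (req, res) {
    if (req.method === 'GET' && req.url === '/index.json') {
      res.writeHead(200, { 'Content-Type': 'application/json; charset=utf-8' })
      res.end(JSON.stringify(data))
    } else {
      res.writeHead(404, { 'Content-Type': 'text/plain; charset=utf-8' })
      res.end('Not found')
    }
  })
}

// To try it out: createJsonServer(sample).listen(3000)
```

Routing, headers and error handling all have to be written by hand here, which is the kind of boilerplate a framework with middleware takes off your hands.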

Thought 1: once we have the resource we need, how do we put it online so it can be requested?

  • With easyMock, you copy the crawled data, drop it into a mock, and it generates a URL for you to access.
  • With Koa2, you start a service, mount the data, and access it through the port number

I have used mocks so much that I'm sick of them, and I couldn't stand pushing crawled data into yet another mock. So we begin our Koa2 journey.

const Koa = require('koa')
const path = require('path')
const static = require('koa-static')

const app = new Koa()
const staticPath = './static' // directory that holds the static files

app.use(static(path.join(__dirname, staticPath)))

app.use(async (ctx) => { // prints hello world on the page; static resources are reached by adding /index.json in the address bar
  ctx.body = 'hello world'
})

app.listen(3000, () => { // listen on port 3000
  console.log('[demo] static-use-middleware is starting at port 3000')
})

Thought 2: I happily went to request the data from my port and found that the request was blocked as cross-origin

My own understanding of koa2-cors:

CORS divides requests into simple and non-simple requests. Simple requests are GET, HEAD, and POST requests without custom request headers; for a POST request, the Content-Type must be application/x-www-form-urlencoded, multipart/form-data, or text/plain, so it cannot be application/json (I don't understand this part very well, so please point out any errors and suggest corrections). Everything else, such as PUT and DELETE, POST with an application/json Content-Type, and requests with custom headers, is a non-simple request and triggers a preflight OPTIONS request. Configuring for simple requests is easy: if all you need is for the response to get through, you only have to set the Access-Control-Allow-Origin response header
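The distinction can be sketched as a small classifier. This is a simplified illustration rather than the browser's full algorithm: isSimpleRequest is a hypothetical function, it only looks at the method and header names, and it ignores the finer rules browsers apply to header values.

```javascript
const SIMPLE_METHODS = ['GET', 'HEAD', 'POST']
const SIMPLE_HEADERS = ['accept', 'accept-language', 'content-language', 'content-type']
const SIMPLE_CONTENT_TYPES = [
  'application/x-www-form-urlencoded',
  'multipart/form-data',
  'text/plain'
]

// Simplified classifier (an illustration, not the full specification):
// returns true when the request would be sent without a preflight.
function isSimpleRequest (method, headers) {
  if (!SIMPLE_METHODS.includes(method.toUpperCase())) return false
  return Object.keys(headers || {}).every(function (name) {
    const key = name.toLowerCase()
    if (!SIMPLE_HEADERS.includes(key)) return false // custom header
    if (key !== 'content-type') return true
    // Compare only the MIME type, ignoring parameters such as charset.
    const mime = String(headers[name]).split(';')[0].trim().toLowerCase()
    return SIMPLE_CONTENT_TYPES.includes(mime)
  })
}

console.log(isSimpleRequest('GET', {}))                                      // true
console.log(isSimpleRequest('POST', { 'Content-Type': 'application/json' })) // false
console.log(isSimpleRequest('PUT', {}))                                      // false
```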

To solve the problem
const cors = require('koa2-cors')

app.use(cors({
  origin: function(ctx) {
    if (ctx.url === '/index') {
      return false;
    }
    return '*';
  },
}))
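To make the effect of the origin option concrete, the rule can be written out by hand against a Node-style response object. This is a rough sketch, not the middleware's actual implementation (which also handles preflight requests and other headers); allowedOrigin and applyCors are hypothetical names.

```javascript
// Mirrors the rule in the config above: no CORS header for /index, '*' otherwise.
function allowedOrigin (url) {
  return url === '/index' ? false : '*'
}

// Set the response header on a Node-style response when an origin is allowed.
function applyCors (req, res) {
  const origin = allowedOrigin(req.url)
  if (origin) res.setHeader('Access-Control-Allow-Origin', origin)
  return res
}
```

With this rule, a request for /index.json gets Access-Control-Allow-Origin: * and can be read by a page on another origin, while /index gets no header and stays blocked by the browser.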

III. Axios

This article focuses on the crawler, so I only demonstrate a basic axios request here, taken from a Vue project I have been working on. You can learn more about Axios on its GitHub page.

  methods: {
     getData () {
       axios.get('http://localhost:3000/index.json', { // request the port we created
           responseType: 'json' // axios option; dataType, contentType and crossDomain are jQuery options that axios ignores
        })
         .then(function (response) {
           console.log(response.data);
         })
         .catch(function (err) { console.log(err); });
     }
  },
  mounted () {
    this.getData() // async/await would make this request more elegant. Mostly I was just being lazy...
  }
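The async/await form mentioned in the comment might look like the sketch below. To keep it self-contained, the HTTP client is passed in as a parameter: client is assumed to be any object with an axios-style get(url) method that resolves to an object with a data property, and getData here is a standalone hypothetical function rather than the Vue method.

```javascript
// `client` is any object with an axios-style get(url) that resolves to { data }.
async function getData (client, url) {
  try {
    const response = await client.get(url)
    return response.data
  } catch (err) {
    console.log(err) // same error handling as the .catch() version
    return null
  }
}
```

In the Vue component this could be called as getData(axios, 'http://localhost:3000/index.json') from mounted, with the result assigned to a data property.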

After requesting the data, we printed out our data on the console

Conclusion:

The three of them together are really powerful, and they have put on a wonderful show for us. If you have read this far and want to know more, why not try it yourself? My project is on GitHub and you can clone it. It won't take much time, and it can solve your data source worries from now on, so why not? You can also follow the ideas in this article and build your own version; I look forward to seeing you share even better work with me. Feel free to leave a comment below. I am a junior in college and am currently looking for an internship; if you can recommend one, please get in touch. In the next article I will introduce my Vue game. You can also follow me and learn along with me. Sharing brings joy and friendship.