Part 1: Toutiao (Today's Headlines)
https://www.toutiao.com/i6805073960447771144/

When I started learning Node.js, I figured a crawler would be an easy place to begin. After all, a project with no data is hard to build and hard to analyze.

I use the Egg framework here. If you are not familiar with Egg, see the official documentation; it is not covered here.
Let's start with curl requests:
curl is a common command-line tool for sending requests to web servers. Its name stands for "client URL".

It is very powerful, with dozens of command-line options. Once you are comfortable with it, it can replace a graphical tool such as Postman.

With no options, curl sends a GET request to the given URL:

curl https://www.example.com

Making a request in Egg:

this.ctx.curl(url, option)

url: the request address
option:
method
Request method. Defaults to GET; can be GET, POST, DELETE, or PUT.
data
Data to send. It is stringified automatically.
dataType
String. The type of the response data; can be text or json.
headers
The request headers.
timeout
The request timeout, in milliseconds.
auth
A username:password string used for HTTP Basic authorization.
followRedirect
Whether to follow HTTP 3xx responses as redirects. Defaults to false.
gzip
Whether to accept gzip-compressed responses and decode them automatically. Defaults to false.
nestedQuerystring
By default, urllib serializes form data with querystring, which does not support nested objects. Setting this option to true makes it use qs instead, which does.
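
To see why nestedQuerystring exists, here is a small sketch using Node's built-in querystring module, the serializer described above as the default. The form object and its field names are made up for illustration.

```javascript
// Node's built-in querystring module; no extra install needed.
const querystring = require('querystring');

// Hypothetical form data with one nested object (names are illustrative).
const form = { id: 1, filter: { tag: 'news' } };

// querystring only serializes primitive values; the nested object
// is coerced to an empty string, so its contents are lost.
const encoded = querystring.stringify(form);
console.log(encoded); // id=1&filter=
```

The nested filter object disappears from the output, which is the case the nestedQuerystring option is meant to handle.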
If the request returns JSON, specify the data type:

this.ctx.curl('https://www.example.com', {dataType: 'json'})
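
A crawler often fetches HTML as well as JSON, so the shape of the returned data depends on dataType. As far as I understand the urllib documentation, data is a Buffer when no dataType is given, a string for text, and a parsed object for json. The helper below is a minimal sketch under that assumption; bodyAsString is a hypothetical name, not part of Egg.

```javascript
// Hypothetical helper: normalize result.data to a string,
// whichever dataType the request was made with.
function bodyAsString(data) {
  if (Buffer.isBuffer(data)) {
    // No dataType given (assumed default): raw bytes.
    return data.toString('utf8');
  }
  if (typeof data === 'string') {
    // dataType: 'text' already yields a string.
    return data;
  }
  // dataType: 'json' yields a parsed value; serialize it again.
  return JSON.stringify(data);
}
```

In practice it is simpler to pass the dataType you want up front; a helper like this only matters when the same code path handles several kinds of response.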

A complete GET or POST request then looks like this:

this.ctx.curl('https://www.example.com', {
  method: 'GET', // or 'POST'
  dataType: 'json',
  headers: {
    token: 'xxx'
  },
  data: {
    id: 1
  },
  // ...
})

Note that what the request returns is the entire response, not just the page content. The content we actually need is in its data property.
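
A minimal sketch of pulling the body out of the response: it assumes an Egg-style ctx whose curl resolves to an object with status, headers, and data. The name fetchBody and the status check are illustrative choices, not something the framework requires.

```javascript
// Hypothetical helper: request a JSON endpoint and return only the body.
async function fetchBody(ctx, url) {
  // The resolved value is the whole response, not just the body.
  const result = await ctx.curl(url, { dataType: 'json' });

  // Illustrative policy: treat anything other than 200 as a failure.
  if (result.status !== 200) {
    throw new Error(`Request to ${url} failed with status ${result.status}`);
  }

  // The content we actually want lives in result.data.
  return result.data;
}
```

Inside a controller or service this would be called as `const body = await fetchBody(this.ctx, url)`.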