First article: Toutiao (Today’s Headlines)
https://www.toutiao.com/i6805073960447771144/
While learning Node.js, I figured a crawler would be an easy place to start. After all, a project with no data is hard to build and hard to analyze.
I use the Egg framework here. If you are not familiar with Egg, see the official documentation; it is not covered in this article.
Let’s start with curl requests:
curl is a common command-line tool for sending requests to web servers. Its name stands for "client URL".
It is very powerful, with dozens of command-line options. Once you are comfortable with it, it can replace a graphical tool like Postman.
With no options, curl sends a GET request to the given URL:
curl https://www.example.com
Making a request in Egg:
this.ctx.curl(url, option)
url: the address to request.
option:
Option | Description
method | Request method. Defaults to GET. It can be GET, POST, DELETE or PUT.
data | Data to be sent. It is serialized automatically.
dataType | String. The type of the response data. It can be text or json.
headers | The request headers.
timeout | The request timeout, in milliseconds.
auth | username:password, used for HTTP Basic authorization.
followRedirect | Follow HTTP 3xx responses as redirects. Defaults to false.
gzip | Accept gzip-compressed responses and decode them automatically. Defaults to false.
nestedQuerystring | By default urllib serializes form data with querystring, which does not support nested objects. Setting this option to true uses qs instead, which does.
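The nestedQuerystring option is easier to picture with a concrete payload. The sketch below uses only Node's built-in querystring module to show what happens to a nested object. The payload (`page`, `filter`) and the `flattenNested` helper are made up for illustration; the helper only imitates the bracket notation that qs is known for and is not qs itself.

```javascript
const querystring = require('querystring');

// Hypothetical form payload containing one nested object.
const payload = { page: 1, filter: { tag: 'news' } };

// The built-in module coerces the nested object to an empty string.
const flat = querystring.stringify(payload);
console.log(flat); // page=1&filter=

// Illustrative helper (not part of qs or urllib): flattens nested keys
// into bracket notation so the nested value survives serialization.
function flattenNested(obj, prefix = '') {
  const pairs = [];
  for (const [key, value] of Object.entries(obj)) {
    const name = prefix ? `${prefix}[${key}]` : key;
    if (value !== null && typeof value === 'object') {
      pairs.push(...flattenNested(value, name));
    } else {
      pairs.push(`${name}=${encodeURIComponent(String(value))}`);
    }
  }
  return pairs;
}

const nested = flattenNested(payload).join('&');
console.log(nested); // page=1&filter[tag]=news
```

With the default serializer the `filter` value is lost, which is the situation nestedQuerystring is meant to avoid.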
If the request returns JSON, specify the data type so that the response body is parsed for you:
this.ctx.curl('https://www.example.com', {dataType: 'json'})
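To see why the data type matters: when dataType is not set, the body comes back as a raw Buffer, so the caller has to decode and parse it by hand. A minimal sketch of that manual step, with an invented body standing in for a real response:

```javascript
// Stand-in for the response body when dataType is not set (content is invented).
const rawBody = Buffer.from('{"id":1,"title":"hello"}', 'utf8');

// Roughly what dataType: 'text' saves you from doing.
const asText = rawBody.toString('utf8');

// Roughly what dataType: 'json' saves you from doing.
const asJson = JSON.parse(asText);

console.log(typeof asText, asJson.title); // string hello
```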
A complete GET or POST request then looks like this:
this.ctx.curl('https://www.example.com', {
  method: 'GET', // or 'POST'
  dataType: 'json',
  headers: {
    token: 'xxx'
  },
  data: {
    id: 1
  },
  // ...
})
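Wrapped in a small helper, a call like this might look like the following sketch. `requestJson`, its parameters, the 5000 ms timeout, and the stub `ctx` are hypothetical and exist only so the example runs without Egg or network access; in a real Egg app `ctx` is the request context and `ctx.curl` performs the actual HTTP call.

```javascript
// Hypothetical helper: builds the option object and delegates to ctx.curl.
async function requestJson(ctx, url, { method = 'GET', token, data } = {}) {
  const options = {
    method,
    dataType: 'json', // parse the response body as JSON
    headers: { token },
    data,
    timeout: 5000, // milliseconds; an arbitrary value chosen for this sketch
  };
  return ctx.curl(url, options);
}

// Stub context that records calls instead of sending real requests.
const calls = [];
const stubCtx = {
  async curl(url, options) {
    calls.push({ url, options });
    return { status: 200, headers: {}, data: { ok: true } };
  },
};

const pending = requestJson(stubCtx, 'https://www.example.com', {
  method: 'POST',
  token: 'xxx',
  data: { id: 1 },
});
pending.then(result => console.log(result.status, calls[0].options.method));
```

Passing `ctx` in as an argument keeps the helper easy to test, since a stub can replace the real context.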
It is worth noting that what the request returns is the entire response object, not just the page. The content we actually need, the body of the web page, is in its data property.
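In code, that means reading the `data` property off whatever `ctx.curl` resolves to. The sketch below assumes a hypothetical `fetchBody` helper and a stubbed `ctx` so that it can run on its own; the status check is an extra precaution added for this example, not something the flow described here requires.

```javascript
// Hypothetical helper: returns only the body from the response object.
async function fetchBody(ctx, url) {
  const result = await ctx.curl(url, { dataType: 'text' });
  // The response object carries status and headers alongside the body.
  const { status, data } = result;
  if (status < 200 || status >= 300) {
    throw new Error(`Request to ${url} failed with status ${status}`);
  }
  return data;
}

// Stub standing in for Egg's ctx; the HTML is invented.
const fakeCtx = {
  async curl() {
    return {
      status: 200,
      headers: { 'content-type': 'text/html' },
      data: '<html><body>article</body></html>',
    };
  },
};

const bodyPromise = fetchBody(fakeCtx, 'https://www.example.com');
bodyPromise.then(html => console.log(html.length));
```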