First, what is an HTTP client
HTTP is the common language of the entire Internet, and an HTTP client is arguably the most basic tool for fetching data from it: in essence, it turns a URL into a web page. With basic HTTP client functionality plus the rules and policies we need, we can do everything from content retrieval to data analysis.
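To make "a URL in, a web page out" concrete: the first thing any HTTP client does with a URL is split it into the pieces it needs to open a connection and build a request. The sketch below is a simplified, hypothetical parser (`ParsedUrl` and `parse_url` are illustrative names, not Workflow APIs); it ignores user info, IPv6 literals, and other edge cases a real client must handle.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified result of splitting a URL (not a Workflow type).
struct ParsedUrl {
    std::string scheme;  // "http" or "https"
    std::string host;    // e.g. "github.com"
    int port = 0;        // explicit port, or the scheme's default
    std::string path;    // everything after the authority, at least "/"
};

// Split "scheme://host[:port][/path]" into its parts.
// Returns false for inputs this sketch does not handle.
bool parse_url(const std::string& url, ParsedUrl& out) {
    const std::string sep = "://";
    std::size_t scheme_end = url.find(sep);
    if (scheme_end == std::string::npos)
        return false;
    out.scheme = url.substr(0, scheme_end);

    std::size_t host_begin = scheme_end + sep.size();
    std::size_t path_begin = url.find('/', host_begin);
    std::string authority = url.substr(
        host_begin,
        path_begin == std::string::npos ? std::string::npos : path_begin - host_begin);
    out.path = path_begin == std::string::npos ? "/" : url.substr(path_begin);

    std::size_t colon = authority.find(':');
    if (colon == std::string::npos) {
        out.host = authority;
        if (out.scheme == "http")
            out.port = 80;        // default HTTP port
        else if (out.scheme == "https")
            out.port = 443;       // default HTTPS port
        else
            return false;         // unknown scheme, no default port
    } else {
        out.host = authority.substr(0, colon);
        std::string port_str = authority.substr(colon + 1);
        // Accept only 1 to 5 decimal digits so std::stoi cannot throw.
        if (port_str.empty() || port_str.size() > 5 ||
            port_str.find_first_not_of("0123456789") != std::string::npos)
            return false;
        out.port = std::stoi(port_str);
    }
    return !out.host.empty();
}
```

For the demo URL below, this yields scheme `https`, host `github.com`, port 443 and path `/sogou/workflow`: exactly what a client needs to pick a connection and write the request line.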
With Workflow, a high-performance HTTP server takes only 10 lines of C++. Today we'll show that writing a high-performance HTTP client in C++ is just as simple!
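```cpp
// [http_client.cc]
#include <stdio.h>
#include "workflow/HttpMessage.h"
#include "workflow/WFTaskFactory.h"

int main(int argc, char *argv[])
{
    const char *url = "https://github.com/sogou/workflow";
    // Follow at most 2 redirects; retry at most 3 times on failure.
    WFHttpTask *task = WFTaskFactory::create_http_task(url, 2, 3,
        [](WFHttpTask *task) {
            if (task->get_state() != WFT_STATE_SUCCESS) {
                fprintf(stderr, "request failed, error %d\r\n", task->get_error());
                return;
            }
            // Print the status line of the response, e.g. "HTTP/1.1 200 OK".
            protocol::HttpResponse *resp = task->get_resp();
            fprintf(stderr, "%s %s %s\r\n",
                    resp->get_http_version(),
                    resp->get_status_code(),
                    resp->get_reason_phrase());
        });
    task->start();  // non-blocking: returns immediately
    getchar();      // press "Enter" to end.
    return 0;
}
```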
Once Workflow is installed, the code above can be compiled into a simple http_client with the following command:
```shell
g++ -o http_client http_client.cc --std=c++11 -lworkflow -lssl -lcrypto -lpthread
```
Running the executable ./http_client sends the request and prints the status line of the HTTP response:
```
HTTP/1.1 200 OK
```
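The line above is the response's status line, and the three getters in the demo return its three fields. Workflow parses the response for you, so the following is only an illustration of what those fields are: a hypothetical standalone parser (`StatusLine` and `parse_status_line` are names made up for this sketch) that splits a raw HTTP/1.x status line the same way.

```cpp
#include <cassert>
#include <string>

// Hypothetical holder for the three fields of an HTTP/1.x status line.
struct StatusLine {
    std::string version;  // e.g. "HTTP/1.1"
    std::string code;     // e.g. "200" (kept as text, like the demo's "%s")
    std::string reason;   // e.g. "OK"; may contain spaces or be empty
};

// Parse "HTTP-version SP status-code SP reason-phrase" (trailing CRLF optional).
bool parse_status_line(std::string line, StatusLine& out) {
    // Drop a trailing "\r\n" or "\n" if present.
    while (!line.empty() && (line.back() == '\n' || line.back() == '\r'))
        line.pop_back();

    std::size_t sp1 = line.find(' ');
    if (sp1 == std::string::npos)
        return false;
    std::size_t sp2 = line.find(' ', sp1 + 1);

    out.version = line.substr(0, sp1);
    out.code = line.substr(sp1 + 1,
                           sp2 == std::string::npos ? std::string::npos : sp2 - sp1 - 1);
    // Everything after the second space is the reason phrase.
    out.reason = sp2 == std::string::npos ? "" : line.substr(sp2 + 1);

    // Version must start with "HTTP/" and the status code must be three digits.
    return out.version.compare(0, 5, "HTTP/") == 0 && out.code.size() == 3 &&
           out.code.find_first_not_of("0123456789") == std::string::npos;
}
```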
Similarly, the other headers and the body of the response can be read from the same WFHttpTask through other APIs. Workflow is an asynchronous scheduling framework, so the task does not block the current thread; combined with internal connection reuse, this is what fundamentally ensures the high performance of our HTTP client.
Next, let's look at how this works in detail.
Second, the request process
1. Create an HTTP task
As the demo above shows, the request is made by starting an asynchronous HTTP task in Workflow. The function that creates the task is:
```cpp
WFHttpTask *create_http_task(const std::string& url,
                             int redirect_max, int retry_max,
                             http_callback_t callback);
```
The first parameter is the URL to request. The second and third are redirect_max and retry_max: in the example above, the task follows at most 2 redirects and retries at most 3 times. The fourth is the callback, which in the example is a lambda. Because Workflow tasks are asynchronous, we do not wait for the result; we are notified passively when the callback is invoked with the finished task. The callback type is:
```cpp
using http_callback_t = std::function<void (WFHttpTask *)>;
```
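The roles of redirect_max and retry_max can be sketched without any networking. Below, a hypothetical `fetch_with_policy` function takes an injected `fetch` function (standing in for the network) and a callback that receives the final result, in the same spirit as http_callback_t. It illustrates the policy only; in particular, counting retries separately for each redirect hop is an assumption of this sketch, and Workflow's own accounting may differ.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical outcome of one network attempt (not a Workflow type).
struct Reply {
    bool ok = false;        // false = transport error (timeout, reset, ...)
    int status = 0;         // HTTP status code when ok
    std::string location;   // Location header for 3xx replies
};

using fetch_fn = std::function<Reply(const std::string& url)>;
using done_fn = std::function<void(const Reply& final_reply, const std::string& final_url)>;

// Follow at most redirect_max redirects and retry each failed attempt at most
// retry_max times, then hand the final reply to the callback.
void fetch_with_policy(std::string url, int redirect_max, int retry_max,
                       const fetch_fn& fetch, const done_fn& callback) {
    int redirects = 0;
    while (true) {
        Reply reply;
        // 1 initial attempt + up to retry_max retries on transport errors.
        for (int attempt = 0; attempt <= retry_max; ++attempt) {
            reply = fetch(url);
            if (reply.ok)
                break;
        }
        bool is_redirect = reply.ok && reply.status >= 300 && reply.status < 400 &&
                           !reply.location.empty();
        if (!is_redirect || redirects == redirect_max) {
            callback(reply, url);  // final result: success, error, or last redirect
            return;
        }
        ++redirects;
        url = reply.location;      // follow the redirect
    }
}
```

With redirect_max = 2 and retry_max = 3 as in the demo, a URL that fails twice, then answers 301 to a second URL that answers 200, ends in a single callback carrying the 200 reply and the second URL.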
2. Fill in the headers and send the request
Network interaction is essentially request and response. For an HTTP client, once the task is created we get a chance to adjust the request before it goes out; in HTTP this mainly means filling in headers. For example, to keep the connection open after the response (a persistent, or long, connection) so that the next request can skip connection setup, we can set the Connection header to Keep-Alive:
```cpp
protocol::HttpRequest *req = task->get_req();
req->add_header_pair("Connection", "Keep-Alive");
task->start();
```
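Adding a header simply changes the bytes written to the socket. A minimal sketch of what such a request head looks like on the wire, using a hypothetical `SimpleRequest` type rather than Workflow's protocol::HttpRequest (a real client also handles request bodies and more):

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory request (protocol::HttpRequest plays this role in Workflow).
struct SimpleRequest {
    std::string method = "GET";
    std::string target = "/";
    std::string version = "HTTP/1.1";
    std::vector<std::pair<std::string, std::string>> headers;

    // Append one header, in the spirit of add_header_pair().
    void add_header(const std::string& name, const std::string& value) {
        headers.emplace_back(name, value);
    }

    // Render the request head as sent: request line, one "Name: value"
    // line per header, then an empty line that ends the head.
    std::string serialize() const {
        std::string out = method + " " + target + " " + version + "\r\n";
        for (const auto& h : headers)
            out += h.first + ": " + h.second + "\r\n";
        out += "\r\n";
        return out;
    }
};
```

Note that in HTTP/1.1 connections are persistent by default; an explicit `Connection: Keep-Alive` matters mainly for HTTP/1.0 peers, while `Connection: close` is what asks the server to drop the connection.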
Finally, task->start() sends the request. The http_client.cc example calls getchar() after start(): because the task is issued without blocking, main() would otherwise return immediately and the process would exit before the callback ran. Pausing the main thread, here until Enter is pressed, gives the response time to arrive; any other way of waiting works too.
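One alternative to getchar() is to let the callback itself wake up the waiting thread. The sketch below uses only the standard library: `start_async` is a hypothetical stand-in for task->start() that runs the callback on another thread, and a std::promise/std::future pair does the waiting. Workflow also ships its own helper for this (for example WFFacilities::WaitGroup), which this sketch does not use.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <memory>
#include <string>
#include <thread>

// Hypothetical asynchronous operation: returns immediately and later invokes
// the callback on another thread (a stand-in for task->start()).
void start_async(std::function<void(const std::string&)> callback) {
    // detach() mimics fire-and-forget; a real framework owns its own threads.
    std::thread(std::move(callback), std::string("HTTP/1.1 200 OK")).detach();
}

// Block the calling thread until the callback has delivered its result.
std::string run_and_wait() {
    // shared_ptr keeps the promise alive inside the callback, even after
    // this function has returned.
    auto done = std::make_shared<std::promise<std::string>>();
    std::future<std::string> result = done->get_future();
    start_async([done](const std::string& status) {
        done->set_value(status);  // makes the future ready, waking the waiter
    });
    return result.get();  // wait here instead of getchar()
}
```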
3. Process the response
According to the HTTP protocol, a response has three parts: the status line, the headers, and the body. To get the body, we can do this:
```cpp
const void *body;
size_t body_len;
// Get a pointer to the response body and its length.
task->get_resp()->get_parsed_body(&body, &body_len);
```
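get_parsed_body() hands back a pointer and a length rather than a string. For intuition about where the body sits in a response, here is a hypothetical `find_body` that locates it in a raw response held in memory and returns it the same way. It assumes the whole response is already in the buffer and ignores chunked transfer encoding and Content-Length framing, which a real parser must handle.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Locate the body of a raw, non-chunked HTTP response. On success, *body
// points into `raw` (no copy) and *body_len is its length, mirroring the
// pointer + length shape of get_parsed_body().
bool find_body(const std::string& raw, const void **body, std::size_t *body_len) {
    // The headers end at the first empty line ("\r\n\r\n").
    std::size_t head_end = raw.find("\r\n\r\n");
    if (head_end == std::string::npos)
        return false;
    std::size_t body_begin = head_end + 4;
    *body = raw.data() + body_begin;
    *body_len = raw.size() - body_begin;  // assume the rest of the buffer is body
    return true;
}
```

Because the pointer refers to memory owned by `raw`, it is valid only while `raw` is alive. By the same reasoning, the safest place to use the body from get_parsed_body() is inside the callback, while the response object still exists.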
Third, the foundations of high performance
The main payoff of writing an HTTP client in C++ is performance. How does Workflow achieve high concurrency? Through two mechanisms:
- Fully asynchronous scheduling;
- Connection reuse.
The first reuses threads and the second reuses connections. Both are managed by the framework rather than by the user, which greatly reduces the mental burden on developers.
1. Asynchronous scheduling mode
Whether the framework is synchronous or asynchronous directly determines how much concurrency our HTTP client can achieve. Why? Consider the thread model when a synchronous framework issues three HTTP tasks: each task holds a thread from the moment its request is sent until its response comes back.
Network latency is usually high, so a thread waiting synchronously for a response stays occupied for the entire wait while doing no useful work. Compare this with how an asynchronous framework works:
Once a task has been issued, the thread is free to do other work. We pass in a callback for asynchronous notification, so when the network reply arrives, a thread runs the callback and receives the result of the HTTP request. While many tasks are in flight at the same time, the same threads are reused, which is how concurrency on the order of hundreds of thousands of QPS becomes reachable.
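A toy model makes the difference concrete. The sketch below is a discrete-event calculation, not real networking and not how Workflow is implemented: one thread handles several requests, where sending a request costs `issue` ms of thread time, running its handler costs `handle` ms, and the network wait in between costs no thread time unless the thread blocks on it. All numbers are made-up assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Simulated time (ms) for ONE thread to finish all requests when it blocks on
// each request in turn: send it, wait for the network, run the handler.
long sync_total(const std::vector<long>& latency, long issue, long handle) {
    long t = 0;
    for (long l : latency)
        t += issue + l + handle;  // the thread is occupied for the whole wait
    return t;
}

// Same workload, but the thread sends every request first and runs a
// callback only when its reply arrives, so the network waits overlap.
long async_total(const std::vector<long>& latency, long issue, long handle) {
    std::vector<long> arrival;
    long t = 0;
    for (long l : latency) {
        t += issue;                 // thread spends `issue` ms sending
        arrival.push_back(t + l);   // reply arrives `l` ms after sending
    }
    std::sort(arrival.begin(), arrival.end());
    for (long a : arrival)
        t = std::max(t, a) + handle;  // run each callback once the thread is free
    return t;
}
```

For three requests with 100 ms of latency each and 1 ms of send and handler cost, the blocking thread needs 306 ms while the asynchronous one finishes in 104 ms; the gap grows with the number of requests in flight.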
2. Connection reuse
As mentioned earlier, persistent (keep-alive) connections improve efficiency. Why? Because the framework can reuse them. First, consider what happens if every request creates its own connection:
Holding a large number of connections wastes system resources, and connecting and closing for every request is slow: beyond the usual TCP handshake, many application-layer protocols (TLS for HTTPS, for instance) have their own costly connection setup. When a task is sent, Workflow automatically looks for an idle connection that can be reused and creates a new one only if none is available. You don't need to worry about the details of how connections are reused.
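The "reuse if possible, else create" logic can be sketched with a small pool keyed by peer. This is a hypothetical illustration only: `ConnectionPool` and `Connection` are invented names, there are no real sockets, and it omits what a production pool needs (thread safety, idle timeouts, per-host limits, detecting connections the server has closed).

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an open TCP (or TLS) connection.
struct Connection {
    int id;
    std::string peer;  // "host:port" this connection is attached to
};

// Minimal keyed pool: idle connections are kept per "host:port" and handed
// out again before any new connection is opened.
class ConnectionPool {
public:
    // Reuse an idle connection to `peer` if there is one, else "connect".
    Connection acquire(const std::string& peer) {
        std::vector<Connection>& idle = idle_[peer];
        if (!idle.empty()) {
            Connection c = idle.back();
            idle.pop_back();
            return c;
        }
        ++created_;  // here a real client would pay for TCP/TLS handshakes
        return Connection{created_, peer};
    }

    // Return a connection once its response has been fully read, so that
    // later requests to the same peer can reuse it (keep-alive).
    void release(const Connection& c) { idle_[c.peer].push_back(c); }

    int created() const { return created_; }

private:
    std::map<std::string, std::vector<Connection>> idle_;
    int created_ = 0;
};
```

Two sequential requests to the same host therefore cost one connection setup, while two concurrent ones (the second acquired before the first is released) still need two.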
3. Other features
Beyond raw performance, a practical HTTP client usually has many other requirements. Here are some common ones from real-world use:
- Large-scale parallel fetching, built on Workflow's serial and parallel task flows;
- Fetching a site's content in order, or at a specified rate, to avoid being blocked;
- Following redirects automatically, so that the final result arrives in one step;
- Accessing HTTP and HTTPS resources through a proxy.
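The "in order, at a specified rate" requirement boils down to spacing out send times. A minimal sketch, assuming a fixed minimum interval between requests to one site; the `pace` function is hypothetical, and in Workflow such pacing could be built by composing tasks (for example, timer tasks inside a series), which this sketch does not show.

```cpp
#include <cassert>
#include <vector>

// Given the times (ms) at which requests become ready, return the times at
// which they are actually sent so that two sends to the same site are never
// closer than `min_interval_ms`. Requests keep their original order.
std::vector<long> pace(const std::vector<long>& ready_ms, long min_interval_ms) {
    std::vector<long> send_ms;
    for (long ready : ready_ms) {
        // Earliest slot allowed by the rate limit (no limit for the first send).
        long earliest = send_ms.empty() ? ready : send_ms.back() + min_interval_ms;
        send_ms.push_back(ready > earliest ? ready : earliest);
    }
    return send_ms;
}
```

With a 200 ms interval (at most 5 requests per second), four requests ready at 0, 0, 0 and 500 ms go out at 0, 200, 400 and 600 ms.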
These requirements call for a framework that can orchestrate HTTP tasks very flexibly and has solid, practical support for needs such as redirects and SSL proxies. Workflow already implements all of them.
The project address
Github.com/sogou/workf…
You're welcome to try Workflow, and a star on GitHub is appreciated!