Translator: billyma
Even after all these years of working with REST APIs, I couldn’t resist tweeting the title of this article when I first learned about GraphQL and the problems it is trying to solve.
Don’t get me wrong. I’m not saying that GraphQL will “kill” REST or anything like that. REST will probably never go away, just as XML never went away. I just think GraphQL is to REST what JSON is to XML.
This article is not 100% in favor of GraphQL. A later section covers the cost of GraphQL’s flexibility, because great flexibility comes at a price.
I like “always start with WHY,” so let’s get started.
Why do we need GraphQL?
The three most important problems GraphQL solves are:
- Multiple round trips are required to fetch the data a view needs: With GraphQL, you can always fetch all the initial data a view requires in a single round trip to the server. To do the same with a REST API, we need to introduce unstructured parameters and conditions that are hard to manage and scale.
- Clients depend on servers: With GraphQL, the client speaks a request language that (1) eliminates the need for the server to hardcode the shape or size of the data, and (2) decouples clients from servers. This means we can maintain and improve clients separately from servers.
- Poor front-end developer experience: With GraphQL, developers express the data requirements of their user interfaces declaratively. They declare what data they need, not how to get it. There is a tight connection between the data a UI requires and the way a developer expresses that requirement in GraphQL.
This article takes a closer look at how GraphQL solves all of these problems.
Before we get started, if you’re not already familiar with GraphQL, you can start with a simple definition.
What is GraphQL?
GraphQL is a language. If we embed GraphQL in a software application, the application can declaratively communicate its data requirements to a back-end data service that also speaks GraphQL.
Just as a child can quickly learn a new language, while adults have a harder time picking one up, it’s easier to start with GraphQL from scratch than to introduce it into a mature application.
To teach a data service to speak GraphQL, we need to implement a runtime layer and expose it to the clients that want to communicate with the service. Think of this server-side layer as simply a translator of the GraphQL language, or a GraphQL-speaking agent that represents the data service. GraphQL is not a storage engine, so it cannot be a solution on its own. That’s why we can’t have a server that speaks only GraphQL; we need to implement a translating runtime.
This abstraction layer, which can be written in any language, defines a generic graph-based schema to publish the capabilities of the data service it represents. Client applications that speak GraphQL can query this schema within its capabilities. This approach decouples clients from servers and allows both to evolve and scale independently.
A GraphQL request can be either a query (a read operation) or a mutation (a write operation). In both cases, the request is a simple string that a GraphQL service can interpret, execute, and resolve with data in a specified format. The response format commonly used for mobile and web applications is JSON.
What is GraphQL? (In Plain English)
GraphQL was created for data communication. You have a client and a server, and they need to communicate with each other. The client needs to tell the server what data is needed, and the server needs to satisfy the client’s data needs with the actual data. GraphQL is an intermediary for this communication.
Screenshot from my Pluralsight course, Building Scalable APIs with GraphQL.
Why, you may ask, doesn’t the client communicate directly with the server? It certainly can.
There are several reasons to consider a GraphQL layer between clients and servers. One of them, perhaps the most popular, is efficiency. A client usually needs to ask the server about multiple resources, while the server usually knows how to reply with a single resource at a time. So the client ends up making multiple round trips to the server to collect all the data it needs.
With GraphQL, we can basically transfer this complexity of multiple requests to the server side and handle it through the GraphQL layer. The client makes a single request to the GraphQL layer and gets a response that perfectly matches the client’s needs.
Introducing a GraphQL layer has many other benefits. One big benefit, for example, is the ability to communicate with multiple services. When you have multiple clients requesting data from multiple services, a GraphQL layer in the middle can simplify and standardize that communication. This isn’t really a point against REST APIs, since the same thing is easy to accomplish there, but a GraphQL runtime offers a structured and standardized way of doing it.
Screenshot from my Pluralsight course, Building Scalable APIs with GraphQL.
Instead of connecting directly to two different data services (as in the slide above), we can have the client communicate with the GraphQL layer. The GraphQL layer then communicates with the two data services. In this way GraphQL isolates the client from the need to communicate in multiple languages, and it translates a single request into multiple requests to multiple services that speak different languages.
Imagine three people who speak three different languages and have different types of knowledge. Then imagine you have a question that can only be answered by combining the knowledge of all three. If you have a translator who speaks all three languages, putting together an answer to your question becomes easy. This is exactly what a GraphQL runtime does.
Computers are not smart enough to answer just any question (at least not yet), so they have to follow an algorithm somewhere. That’s why we need to define a schema on the GraphQL runtime, and that schema is what clients use.
This schema is basically a capabilities document that lists all the queries a client can ask the GraphQL layer. Because we are using a graph of nodes here, there is some flexibility in how the schema can be used. The schema also marks the limits of what the GraphQL layer can answer.
Still not clear enough? We can say that GraphQL is essentially a successor to REST APIs. So let me answer the question you’re most likely asking now.
What’s wrong with REST APIs?
The biggest problem with REST APIs is their multi-endpoint nature. It requires clients to make multiple round trips to get their data.
A REST API is usually a collection of endpoints, where each endpoint represents a resource. So when a client needs data from multiple resources, it has to make multiple round trips to the REST API to put together the data it needs.
In a REST API, there is no client request language. Clients have no control over what data the server returns, because there is no language through which they can ask. More precisely, the language available to clients is very limited.
For example, the READ endpoints of a REST API are either:
- GET /ResourceName - gets a list of all records from that resource; or
- GET /ResourceName/ResourceID - gets the single record identified by that ID.
A client cannot, for example, specify which fields to select for a record in that resource. This means the REST API service will always return all fields, regardless of which ones the client actually needs. GraphQL’s term for this problem is over-fetching: retrieving information that is not needed. It wastes network and memory resources for both the client and the server.
Another big problem with REST APIs is versioning. If you need to support multiple versions, that usually means new endpoints. Using and maintaining these endpoints causes more problems, and it can lead to code duplication on the server.
The REST API problems mentioned above are exactly the ones GraphQL is trying to solve. They are certainly not all of the problems with REST APIs, and I don’t want to get into what a REST API is and is not. I’m mostly talking about the popular resource-based HTTP endpoint APIs. Every one of those APIs eventually turns into a mix of regular REST endpoints plus custom ad hoc endpoints crafted for performance reasons. This is where GraphQL offers a much better alternative.
How does GraphQL do this?
There are many concepts and design decisions behind GraphQL, but perhaps the most important are:
- A GraphQL schema is a strongly typed schema. To create a GraphQL schema, we define fields that have types. Those types can be primitive or custom, and everything else in the schema requires a type. This rich type system enables powerful features such as an introspective API and strong tooling for both clients and servers.
- GraphQL speaks to the data as a graph, and data is naturally a graph. If you need to represent any data, the right structure is a graph. The GraphQL runtime lets us represent our data with a graph API that matches the natural graph shape of that data.
- GraphQL has a declarative nature for expressing data requirements. It gives clients a declarative language for expressing their data needs. This declarative nature creates a mental model around the GraphQL language that is close to the way we think about data requirements in English, and it makes working with a GraphQL API a lot easier than the alternatives.
This last concept is why I personally think GraphQL is a game changer.
These are high-level concepts. Let’s dig into some of the details.
To solve the multiple round trip problem, GraphQL makes the responding server just a single endpoint. Essentially, GraphQL takes the custom endpoint idea to an extreme and makes the whole server one custom endpoint that can reply to all data requests.
The other big concept that goes with the single endpoint is the rich client request language needed to work with it. Without a client request language, a single endpoint is useless. It needs a language to process a custom request and respond with the data for that custom request.
Having a client request language means that clients are in control. They can ask for exactly what they need, and the server will reply with exactly what they asked for. This solves the over-fetching problem.
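To make the over-fetching point concrete, here is a minimal sketch of a server returning only what a client selected. It does not use any GraphQL library. The `select` function, the nested-object selection format (standing in for a parsed query), and the extra fields on the record are all assumptions made for illustration.

```javascript
// Hypothetical full record, as a typical REST endpoint might return it.
const personRecord = {
  name: "Darth Vader",
  birthYear: "41.9BBY",
  eyeColor: "yellow",
  mass: 136,
  planet: { name: "Tatooine", climate: "arid" },
};

// Return only the fields named in `selection`.
// `true` selects a leaf value; a nested object selects sub-fields.
function select(record, selection) {
  const result = {};
  for (const [field, sub] of Object.entries(selection)) {
    const value = record[field];
    if (sub === true || value == null) {
      result[field] = value === undefined ? null : value;
    } else if (Array.isArray(value)) {
      result[field] = value.map((item) => select(item, sub));
    } else {
      result[field] = select(value, sub);
    }
  }
  return result;
}

// The client asks for exactly three things and receives exactly those.
const trimmed = select(personRecord, {
  name: true,
  birthYear: true,
  planet: { name: true },
});
console.log(JSON.stringify(trimmed));
```

The fields the view never uses (`eyeColor`, `mass`, `climate` in this made-up record) simply never leave the server.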
When it comes to versioning, GraphQL has an interesting take. We can avoid versioning altogether. Essentially, we can add new fields without removing the old ones, because we have a graph and we can flexibly grow it by adding more nodes. So we can leave paths on the graph for the old API and introduce new ones without labeling them as new versions. The API just grows; it does not version.
This is especially important for mobile clients because we have no control over the version of the API they are using. Once installed, mobile applications can continue to use the same old API for years. For the Web, it’s easy to control the version of the API because we just push the new code. With mobile apps, however, this is hard to do.
Not completely convinced? How about a one-to-one comparison between GraphQL and REST using an actual example?
RESTful APIs vs GraphQL APIs – example
Let’s say we’re developers responsible for building a brand new user interface that showcases the “Star Wars” movies and characters.
The first UI we are tasked with building is simple: a view that shows information about a single Star Wars character. Take Darth Vader, for example, and all the movies he has appeared in. This view needs to show the character’s name, birth year, planet name, and the titles of all the movies they appeared in.
It’s as simple as that, yet we are already dealing with three different resources: characters, planets, and movies. The relationship between these resources is simple too, and anyone can guess the shape of the data here. A character object belongs to one planet object and has one or more movie objects.
The JSON data for this UI might look something like:
{
  "data": {
    "person": {
      "name": "Darth Vader",
      "birthYear": "41.9BBY",
      "planet": {
        "name": "Tatooine"
      },
      "films": [
        { "title": "A New Hope" },
        { "title": "The Empire Strikes Back" },
        { "title": "Return of the Jedi" },
        { "title": "Revenge of the Sith" }
      ]
    }
  }
}
Assuming a data service gave us exactly this structure, here is one way to represent its view with React.js:
// The container component:
<PersonProfile person={data.person} />

// The PersonProfile component:
Name: {person.name}
Birth Year: {person.birthYear}
Planet: {person.planet.name}
Films: {person.films.map(film => film.title)}
This is a very simple example, and while our experience with Star Wars may be helpful, the relationship between UI and data is actually quite clear. The UI uses all the “keys” in our hypothetical JSON data object.
Now let’s look at how to request this data using a RESTful API.
We need to get information about a single person, and assuming we know the person’s ID, the RESTful API exposes this information as:
GET - /people/{id}
This request will return the person’s name, birth year, and other related information. A well-designed RESTful API also returns the ID of the character’s planet and an array of IDs for all the films the character appeared in.
The JSON response to this request might look something like this:
{
  "name": "Darth Vader",
  "birthYear": "41.9BBY",
  "planetId": 1,
  "filmIds": [1, 2, 3, 6],
  *** other information we do not need for now ***
}
Then to get the name of the planet, we ask again:
GET - /planets/1
Then to get the movie name, we make a request:
GET - /films/1
GET - /films/2
GET - /films/3
GET - /films/6
Once we get all six responses from the server, we can combine them to fit the data we need for our view.
Aside from the fact that we had to make six round trips to meet the simple data requirements of a simple user interface, our approach to getting the data was imperative. We give instructions on how to get the data and how to process it to make it ready to render the view.
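The imperative flow described above can be sketched roughly as follows. This is only an illustration: `fakeRestApi` is an in-memory stand-in for a real server, the `get` helper is hypothetical, and the person ID 4 is an assumption (the planet and film IDs come from the sample response above).

```javascript
// In-memory stand-in for a REST API; paths and payloads are hypothetical.
const fakeRestApi = {
  "/people/4": { name: "Darth Vader", birthYear: "41.9BBY", planetId: 1, filmIds: [1, 2, 3, 6] },
  "/planets/1": { name: "Tatooine" },
  "/films/1": { title: "A New Hope" },
  "/films/2": { title: "The Empire Strikes Back" },
  "/films/3": { title: "Return of the Jedi" },
  "/films/6": { title: "Revenge of the Sith" },
};

let roundTrips = 0;

// Simulates one HTTP GET; every call counts as one round trip.
async function get(path) {
  roundTrips += 1;
  if (!(path in fakeRestApi)) throw new Error(`404: ${path}`);
  return fakeRestApi[path];
}

// Imperative assembly: we spell out how to fetch and how to combine.
async function loadPersonView(id) {
  const person = await get(`/people/${id}`);
  const planet = await get(`/planets/${person.planetId}`);
  const films = await Promise.all(
    person.filmIds.map((filmId) => get(`/films/${filmId}`))
  );
  return {
    name: person.name,
    birthYear: person.birthYear,
    planet: { name: planet.name },
    films: films.map((film) => ({ title: film.title })),
  };
}

const viewPromise = loadPersonView(4);
viewPromise.then((view) => {
  console.log(`round trips: ${roundTrips}, films: ${view.films.length}`);
});
```

Even with the film requests running in parallel, the client still has to wait for the person before it can ask for anything else, and it still owns all of the stitching logic.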
If you don’t see what I mean, try it yourself. The Star Wars data has a RESTful API, currently hosted at swapi.co/. Go ahead and try to build our character data object with it. The keys might be a bit different, but the API endpoints will be the same. You will need to make exactly six API calls. You will also have to over-fetch information that the view doesn’t need.
Of course, this is just one implementation of a RESTful API for this data. There could be better implementations that make this view easier to build. For example, if the API server implemented nested resources and understood the relationship between a character and a film, we could read the films data with:
GET - /people/{id}/films
However, a pure RESTful API server would most likely not implement that, and we would need to ask our back-end engineers to create this custom endpoint for us. That’s the reality of scaling a RESTful API: we just keep adding custom endpoints to efficiently meet the growing needs of clients. Managing custom endpoints like these is hard.
Now let’s look at the GraphQL approach. GraphQL on the server embraces the custom endpoint idea and takes it to its extreme. The server will be just a single endpoint, and the channel does not matter. If we are doing this over HTTP, the HTTP method certainly won’t matter either. Let’s assume we have a single GraphQL endpoint exposed over HTTP at /graphql.
Since we want to ask for the data we need in a single round trip, we need a way to express our complete data requirements to the server. We do this with a GraphQL query:
GET or POST - /graphql?query={...}
A GraphQL query is just a string, but it has to include all the pieces of data that we need. This is where the declarative power comes in.
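As a rough sketch of what "just a string" means on the wire, here is one way a client could package a query for either HTTP method. The `query` parameter name for GET comes from the request line above; sending a JSON body with a `query` key for POST is a common convention that I am assuming here, and the query text itself is only a placeholder.

```javascript
// Endpoint path assumed by this article.
const endpoint = "/graphql";

// Placeholder query text; any valid GraphQL query string would do.
const query = "{ person(personID: 4) { name birthYear } }";

// GET: the query string travels URL-encoded in the `query` parameter.
const getUrl = `${endpoint}?query=${encodeURIComponent(query)}`;

// POST: the same string travels inside a JSON body (assumed convention).
const postRequest = {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ query }),
};

console.log(getUrl);
console.log(postRequest.body);
```

Either way the server receives the same string, which is why the HTTP method does not matter much.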
In English, here is how we declare our data requirement: we need a character’s name, birth year, planet name, and the titles of all their films. In GraphQL, this translates to:
{
  person(ID: ...) {
    name,
    birthYear,
    planet {
      name
    },
    films {
      title
    }
  }
}
Read the English requirement one more time and compare it to the GraphQL query. They are very similar. Now compare this GraphQL query to the JSON data we started with. You’ll see that the GraphQL query has the exact structure of the JSON data, just without all the “value” parts. If we think of this as a question-and-answer relationship, the question is the answer statement without the answer part.
If the answer is:
The planet closest to the sun is Mercury.
A good representation of the question is the same statement without the answer part:
(What is) the planet closest to the sun?
The same relationship applies to GraphQL queries. Take the JSON response, remove all the “answer” parts (the values for the keys), and you end up with a GraphQL query that is perfectly suited to represent the question about the JSON response.
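Here is a small sketch of that "remove the answers" idea. The `toSelection` function is hypothetical and deliberately naive: it keeps the keys of a JSON value and drops the values. Arguments such as the person ID cannot be recovered from a response, so the result is only the selection part of a query.

```javascript
// Turn a JSON value into the GraphQL selection that would ask for it,
// by keeping the keys and dropping the values.
function toSelection(value, indent = "") {
  // For a list, the shape of the first element describes every element.
  if (Array.isArray(value)) return toSelection(value[0], indent);
  // Scalars (and empty lists) have no sub-selection.
  if (value === null || typeof value !== "object") return "";
  const inner = indent + "  ";
  const lines = Object.entries(value).map(
    ([key, child]) => `${inner}${key}${toSelection(child, inner)}`
  );
  return ` {\n${lines.join("\n")}\n${indent}}`;
}

const response = {
  person: {
    name: "Darth Vader",
    birthYear: "41.9BBY",
    planet: { name: "Tatooine" },
    films: [{ title: "A New Hope" }, { title: "The Empire Strikes Back" }],
  },
};

const derivedQuery = toSelection(response).trim();
console.log(derivedQuery);
```

Printing `derivedQuery` gives back the shape of the query above, minus the argument on `person`.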
Now compare the GraphQL query to the declarative React UI we defined for the data. Everything in the GraphQL query is used in the UI, and everything used in the UI appears in the GraphQL query.
That’s the great mental model behind GraphQL’s design. The UI knows exactly what data it needs, and extracting that requirement is fairly easy. Coming up with a GraphQL query is simply the task of pulling out what the UI uses as variables.
If we reverse the pattern, it works as well. If we have a GraphQL query, we know exactly how to use its response in the UI, because the query has the same “structure” as the response. We don’t need to examine the response to know how to use it, and we don’t need any documentation about the API. It’s all built in.
The Star Wars data also has a GraphQL API, hosted at github.com/graphql/swa… . We can try to use it to build our character data object. There are a few differences in this API that we can explore later, but here is the official query you can use against it to read the data our view requires (with Darth Vader as the example):
{
  person(personID: 4) {
    name,
    birthYear,
    homeworld {
      name
    },
    filmConnection {
      films {
        title
      }
    }
  }
}
This request defines a response structure that is very close to the view, and remember, we got all this data in one round trip.
The price of GraphQL flexibility
Perfect solutions don’t really exist. With the flexibility GraphQL introduces, a door opens on some clear problems and concerns.
One important threat that GraphQL makes easier is resource exhaustion attacks (also known as denial-of-service attacks). A GraphQL server can be attacked with overly complex queries that consume all of its resources. It is very easy to query deeply nested relationships (user -> friends -> friends …), or to use field aliases to ask for the same field many times. Resource exhaustion attacks are not specific to GraphQL, but we have to be extra careful about them when working with GraphQL.
There are some mitigations. We can analyze the cost of a query in advance and enforce some kind of limit on the amount of data one can consume. We can also implement a timeout to kill requests that take too long to resolve. And since GraphQL is just a resolving layer, we can handle rate limiting at a lower level beneath GraphQL.
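One of the simplest forms of up-front analysis is a limit on nesting depth, used here as a crude proxy for cost. The sketch below counts braces in the raw query text instead of properly parsing it, so treat it as an illustration only. The function names and the limit of 4 are assumptions.

```javascript
// Measure the deepest level of `{ ... }` nesting in a query string.
// Simplification: assumes braces never appear inside string arguments.
function queryDepth(query) {
  let depth = 0;
  let max = 0;
  for (const ch of query) {
    if (ch === "{") {
      depth += 1;
      max = Math.max(max, depth);
    } else if (ch === "}") {
      depth -= 1;
    }
  }
  return max;
}

// Reject a query before execution if it nests deeper than `maxDepth`.
function assertDepthWithin(query, maxDepth) {
  const depth = queryDepth(query);
  if (depth > maxDepth) {
    throw new Error(`Query depth ${depth} exceeds limit ${maxDepth}`);
  }
  return depth;
}

const okQuery = "{ person { name films { title } } }";
const deepQuery = "{ user { friends { friends { friends { name } } } } }";

console.log(assertDepthWithin(okQuery, 4));
try {
  assertDepthWithin(deepQuery, 4);
} catch (err) {
  console.log(err.message);
}
```

A depth limit does nothing against the alias trick, so a real cost analysis would also need to count fields, not just levels.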
If the GraphQL API endpoint we are trying to protect is not public, and is meant for internal use by our own clients (web or mobile), we can use a whitelist approach and pre-approve the queries the server is allowed to execute. Clients then ask the server to execute only pre-approved queries, referring to each one by a unique identifier. Facebook is said to use this approach.
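A minimal sketch of the whitelist idea might look like this. The identifier, the shape of the request object, and the `lookupApprovedQuery` function are all hypothetical; the point is only that the client sends an ID and never arbitrary query text.

```javascript
// Pre-approved queries, keyed by a unique identifier (IDs are hypothetical).
const approvedQueries = new Map([
  ["personProfile", "{ person(personID: 4) { name birthYear } }"],
]);

// Resolve an incoming request to query text, or refuse it.
function lookupApprovedQuery(request) {
  const query = approvedQueries.get(request.queryId);
  if (query === undefined) {
    throw new Error(`Query "${request.queryId}" is not pre-approved`);
  }
  return query;
}

console.log(lookupApprovedQuery({ queryId: "personProfile" }));
```

Anything not in the map is rejected before the GraphQL layer ever sees it.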
Authentication and authorization are other concerns to think about when working with GraphQL. Do we handle them before, after, or during GraphQL’s resolving process?
To answer this question, think of GraphQL as a DSL (domain-specific language) on top of your own back-end data-fetching logic. It is just one layer that we can put between the clients and our actual data service (or services).
Then think of authentication and authorization as another layer. GraphQL will not help with the actual implementation of the authentication or authorization logic, because it is not meant for that. But if we want to put these layers behind GraphQL, we can use GraphQL to pass access tokens between the clients and the enforcing logic. This is very similar to how we do authentication and authorization with RESTful APIs.
Another task that GraphQL makes more challenging is client-side data caching. Responses from RESTful APIs are easier to cache because of their dictionary-like nature: a particular location identifies a particular piece of data, so we can use the location itself as the cache key.
With GraphQL, we can take a similar basic approach and use the query text as the key for caching its response. However, this approach is limited, not very efficient, and can lead to data consistency problems. The results of multiple GraphQL queries can easily overlap, and this basic caching approach does not account for the overlap.
A neat solution to this problem is to let a graph query mean a graph cache. If we normalize a GraphQL query response into a flat collection of records, giving each record a globally unique ID, we can cache those records instead of caching full responses.
This is not a simple process, though. Records will reference other records, and we will be managing a cyclic graph there. Populating and reading the cache requires traversing the queries. We need to write a layer to handle the caching logic, but this approach is generally far more efficient than response-based caching. Relay.js is one framework that adopts this caching strategy and manages it automatically behind the scenes.
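To give a feel for the normalization step, here is a minimal sketch. It is not how Relay.js is implemented; the `normalize` function, the `ref` marker, and the `id` values are assumptions, and the sketch requires every object in a response to carry a globally unique `id`.

```javascript
// Flatten a nested response into records keyed by `id`, replacing nested
// objects with references. Assumes every object carries a unique `id`.
function normalize(value, records) {
  if (Array.isArray(value)) {
    return value.map((item) => normalize(item, records));
  }
  if (value === null || typeof value !== "object") {
    return value;
  }
  // Merge into an existing record so overlapping responses share one copy.
  const record = records[value.id] || (records[value.id] = {});
  for (const [key, child] of Object.entries(value)) {
    record[key] = normalize(child, records);
  }
  return { ref: value.id };
}

const cache = {};

// First response: a person with a nested film.
normalize(
  {
    id: "person:4",
    name: "Darth Vader",
    films: [{ id: "film:1", title: "A New Hope" }],
  },
  cache
);

// Second, overlapping response: the same film with one more field.
normalize({ id: "film:1", title: "A New Hope", director: "George Lucas" }, cache);

console.log(Object.keys(cache));
console.log(cache["film:1"]);
```

Both responses end up feeding the same `film:1` record, which is exactly the overlap that query-text caching cannot handle.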
Perhaps the most important problem we should be concerned about with GraphQL is what is commonly referred to as N+1 SQL queries. GraphQL query fields are designed to be standalone functions, and resolving those fields with data from a database can result in a new database request for each resolved field.
With simple RESTful API endpoint logic, it is easy to analyze, detect, and solve N+1 problems by enhancing the constructed SQL queries. For GraphQL’s dynamically resolved fields, it is not that simple. Fortunately, Facebook is pioneering one possible solution to this problem: DataLoader.
As the name suggests, DataLoader is a utility that reads data from a database and makes it available to GraphQL resolver functions. Instead of reading the data directly with SQL queries, we can use DataLoader as an agent that reduces the number of actual SQL queries we send to the database.
DataLoader works by using a combination of batch processing and caching. If the same client request results in multiple requests to the database, the DataLoader can be used to combine these requests and load their responses in batches from the database. The DataLoader will also cache the response so that it can be used for subsequent requests for the same resource.
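The batching-plus-caching idea can be sketched in a few lines. To be clear, this is not the DataLoader library or its API; `TinyLoader`, `fetchFilmsByIds`, and the fake data are hypothetical, and the sketch only shows how several loads made in the same tick can collapse into one database call.

```javascript
// A tiny loader in the spirit of batching plus caching (hypothetical class).
class TinyLoader {
  constructor(batchFn) {
    this.batchFn = batchFn; // (keys) => Promise of values, in key order
    this.cache = new Map(); // key -> Promise of value
    this.queue = []; // pending { key, resolve, reject }
  }

  load(key) {
    // Caching: a repeated key reuses the promise from the first load.
    if (this.cache.has(key)) return this.cache.get(key);
    const promise = new Promise((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // Batching: schedule one flush for all loads made in this tick.
      if (this.queue.length === 1) queueMicrotask(() => this.flush());
    });
    this.cache.set(key, promise);
    return promise;
  }

  async flush() {
    const batch = this.queue;
    this.queue = [];
    try {
      const values = await this.batchFn(batch.map((item) => item.key));
      batch.forEach((item, i) => item.resolve(values[i]));
    } catch (err) {
      batch.forEach((item) => item.reject(err));
    }
  }
}

// Stand-in for a database call; counts how many "SQL queries" run.
let dbCalls = 0;
async function fetchFilmsByIds(ids) {
  dbCalls += 1;
  return ids.map((id) => ({ id, title: `Film ${id}` }));
}

const filmLoader = new TinyLoader(fetchFilmsByIds);
const loaded = Promise.all([1, 2, 3, 6, 1].map((id) => filmLoader.load(id)));
loaded.then((films) => {
  console.log(`db calls: ${dbCalls}, films returned: ${films.length}`);
});
```

Five field resolutions ask for films here, but the stand-in database is hit once with four distinct IDs, and the repeated ID is served from the cache.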
Thanks for reading!