GraphQL came up at the end of this summary.
When we previously rolled out an API scheme at one company, we did not choose GraphQL, because:
- GraphQL has a learning cost for data consumers
- GraphQL services are hard to keep stable and hard to rate limit
The learning cost isn't a big problem. Programmers instinctively like learning new things, which can give them a false sense of accomplishment.
The real issue is that rate limiting is genuinely hard. Opening up a GraphQL API is much like exposing a SQL interface to MySQL: a caller can look up a single row, or 100 million rows, in one query.
If every query is a primary-key lookup, a single MySQL instance can sustain millions of QPS. But if one query reads 100 million rows, it will exhaust the CPU and memory of that instance.
A similar situation in GraphQL looks like this:
query maliciousQuery {
  album(id: "some-id") {
    photos(first: 9999) {
      album {
        photos(first: 9999) {
          album {
            photos(first: 9999) {
              album {
                # ... Repeat this 10000 times ...
              }
            }
          }
        }
      }
    }
  }
}
This example comes from here. Nested queries like this make the cost of a query unpredictable.
Query complexity varies enormously from one query to the next, so limiting by the number of requests, as traditional REST-style APIs do, is not appropriate.
In June, Shopify published an article titled Rate Limiting GraphQL APIs by Calculating Query Complexity, explaining the rate-limiting strategy they use for GraphQL.
What follows is mostly a translation of that article.
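To get a feel for how quickly the nested query above grows, here is a rough sketch in Python. It assumes the worst case, where every album really has 9999 photos and the server resolves each one without caching or deduplication; the function name is made up for illustration.

```python
def worst_case_photos(page_size: int, depth: int) -> int:
    """Upper bound on photo objects resolved by `depth` nested
    photos(first: page_size) levels (worst case, no caching assumed)."""
    total = 0
    at_this_level = 1
    for _ in range(depth):
        # Every object from the previous level fans out into page_size photos.
        at_this_level *= page_size
        total += at_this_level
    return total


for depth in (1, 2, 3):
    print(depth, worst_case_photos(9999, depth))
```

The bound grows geometrically with the nesting depth, while the request count stays at exactly one, which is why counting requests says nothing about the load.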
Most of you are already aware of the limitations of RESTful APIs. There are two main ones:
- REST assumes that every request from a client costs the same, even when the API response contains a lot of data the client doesn't need
- POST, PUT, PATCH, and DELETE requests produce more side effects and consume more server resources than GET requests, yet REST treats them all equally.
With a monolithic REST-style API, every type of client has to accept response fields it doesn't need. Update and delete operations put more load on the service, yet under the request-and-response-based model they are counted as consuming the same resources as reads.
GraphQL mainly solves the dynamic-fields problem, but the cost of a GraphQL request varies from one request to another. So Shopify's scheme statically analyzes each GraphQL request to calculate its cost before executing it. This article mainly describes how they calculate it.
Objects: one point
The object is the basic unit of a query. It usually represents a single server-side operation, such as a database query or a call to an internal service.
Scalars and Enums: zero points
Scalars and enums are part of the object itself, so their cost is already counted in the object's cost. Here, scalars and enums are simply fields on an object, and returning more fields on an object adds very little cost.
query {
  shop {                    # Object - 1 point
    id                      # ID - 0 points
    name                    # String - 0 points
    timezoneOffsetMinutes   # Int - 0 points
    customerAccounts        # Enum - 0 points
  }
}
In this example, shop is an object and costs 1 point. id, name, timezoneOffsetMinutes, and customerAccounts are scalar or enum fields that cost 0 points. The total query cost is 1.
Connections: two points + number of objects returned
A GraphQL connection represents a one-to-many relationship. Shopify uses Relay-compatible connections, which means they follow a common specification and expose edges, node, cursor, and pageInfo.
The edges object contains the fields that describe the one-to-many relationship:
- node: the list of objects returned by the query
- cursor: the current position in that list
pageInfo has the Boolean fields hasPreviousPage and hasNextPage for navigating through the list.
A connection costs two points plus the number of objects requested. In this example, the connection expects to return five objects, so it costs seven points:
query {
  orders(first: 5, query: "fulfillment_status:shipped") {
    edges {
      node {
        id
        name
        displayFulfillmentStatus
      }
    }
  }
}
cursor and pageInfo add no extra cost, because their cost is already covered by the count of returned objects.
The following example also costs seven points, the same as the previous one:
query {
  orders(first: 5, query: "fulfillment_status:shipped") {
    edges {
      cursor
      node {
        id
        name
        displayFulfillmentStatus
      }
    }
    pageInfo {
      hasPreviousPage
      hasNextPage
    }
  }
}
Interfaces and Unions: one point
Interfaces and unions are similar to objects, except that they can return different types of objects.
Mutations: ten points
A mutation is a request that has side effects: it changes data or indexes in a database, and may even trigger webhooks and email notifications. Such a request consumes more resources than an ordinary query, so it costs 10 points.
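The scoring rules above can be sketched as a small static calculator. This is only an illustration in Python, not Shopify's implementation: the Field class, the kind labels, and the requested_cost function are all hypothetical, and summing the costs of child fields under an object is an assumption the article does not spell out. Nested objects inside a connection's nodes are not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Field:
    """Hypothetical, simplified stand-in for a parsed GraphQL field."""
    name: str
    kind: str  # "scalar", "enum", "object", "interface", "union", "connection", "mutation"
    first: Optional[int] = None  # requested page size (connections only)
    children: List["Field"] = field(default_factory=list)


# Fixed costs taken from the rules described in the article.
FLAT_COST = {"scalar": 0, "enum": 0, "object": 1,
             "interface": 1, "union": 1, "mutation": 10}


def requested_cost(f: Field) -> int:
    """Static cost of a field, computed before the query is executed."""
    if f.kind == "connection":
        if f.first is None:
            raise ValueError("a connection needs a page size to be costed")
        # Two points plus the number of objects requested; edges, node,
        # cursor and pageInfo add nothing on top of that.
        return 2 + f.first
    # Assumption: an object's cost is its own cost plus that of its children.
    return FLAT_COST[f.kind] + sum(requested_cost(c) for c in f.children)


shop = Field("shop", "object", children=[
    Field("id", "scalar"),
    Field("name", "scalar"),
    Field("timezoneOffsetMinutes", "scalar"),
    Field("customerAccounts", "enum"),
])
orders = Field("orders", "connection", first=5)

print(requested_cost(shop))    # 1
print(requested_cost(orders))  # 7
```

The two printed values match the shop and orders examples above.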
Get Query Cost information in the GraphQL response
You do not need to calculate query costs yourself. Shopify designed its API responses to include the query cost directly in the response body. You can run queries in the Shopify Admin API GraphiQL explorer to see their costs in real time.
query {
  shop {
    id
    name
    timezoneOffsetMinutes
    customerAccounts
  }
}
The calculated cost is shown in the extensions object:
{
  "data": {
    "shop": {
      "id": "gid://shopify/Shop/91615055400",
      "name": "My Shop",
      "timezoneOffsetMinutes": -420,
      "customerAccounts": "DISABLED"
    }
  },
  "extensions": {
    "cost": {
      "requestedQueryCost": 1,
      "actualQueryCost": 1,
      "throttleStatus": {
        "maximumAvailable": 1000.0,
        "currentlyAvailable": 999,
        "restoreRate": 50.0
      }
    }
  }
}
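A client can use throttleStatus to decide whether to send the next query right away. The sketch below is in Python and rests on one assumption the article does not state explicitly: that restoreRate is the number of points restored per second. The function name is made up for illustration.

```python
import json

# The cost portion of the sample response shown above.
SAMPLE = """
{"extensions": {"cost": {"requestedQueryCost": 1, "actualQueryCost": 1,
  "throttleStatus": {"maximumAvailable": 1000.0,
                     "currentlyAvailable": 999,
                     "restoreRate": 50.0}}}}
"""


def seconds_until_affordable(throttle_status: dict, cost: float) -> float:
    """How long to wait before a query of `cost` points fits the budget.

    Assumes restoreRate is expressed in points per second.
    """
    if cost > throttle_status["maximumAvailable"]:
        raise ValueError("this query can never fit within the budget")
    deficit = cost - throttle_status["currentlyAvailable"]
    return max(0.0, deficit / throttle_status["restoreRate"])


status = json.loads(SAMPLE)["extensions"]["cost"]["throttleStatus"]
print(seconds_until_affordable(status, 7))  # 0.0, since 999 points are available
```

Sleeping for the returned number of seconds before retrying is one simple way for a client to avoid being throttled.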
Getting detailed query cost information
Adding the header X-GraphQL-Cost-Include-Fields: true to the request makes the extensions object show a more detailed, per-field breakdown of the points:
{
  "data": {
    "shop": {
      "id": "gid://shopify/Shop/91615055400",
      "name": "My Shop",
      "timezoneOffsetMinutes": -420,
      "customerAccounts": "DISABLED"
    }
  },
  "extensions": {
    "cost": {
      "requestedQueryCost": 1,
      "actualQueryCost": 1,
      "throttleStatus": {
        "maximumAvailable": 1000.0,
        "currentlyAvailable": 999,
        "restoreRate": 50.0
      },
      "fields": [
        {
          "path": ["shop", "id"],
          "definedCost": 0,
          "requestedTotalCost": 0,
          "requestedChildrenCost": null
        },
        {
          "path": ["shop", "name"],
          "definedCost": 0,
          "requestedTotalCost": 0,
          "requestedChildrenCost": null
        },
        {
          "path": ["shop", "timezoneOffsetMinutes"],
          "definedCost": 0,
          "requestedTotalCost": 0,
          "requestedChildrenCost": null
        },
        {
          "path": ["shop", "customerAccounts"],
          "definedCost": 0,
          "requestedTotalCost": 0,
          "requestedChildrenCost": null
        },
        {
          "path": ["shop"],
          "definedCost": 1,
          "requestedTotalCost": 1,
          "requestedChildrenCost": 0
        }
      ]
    }
  }
}
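As a rough illustration of attaching that header, here is a Python sketch using only the standard library. The endpoint URL is a placeholder and not a real Shopify address, authentication is left out entirely, and the request is only built, never sent; build_costed_request is a hypothetical helper name.

```python
import json
import urllib.request

# Placeholder endpoint for illustration only; substitute your own.
ENDPOINT = "https://example.invalid/graphql.json"


def build_costed_request(query: str) -> urllib.request.Request:
    """Build (but do not send) a GraphQL POST that asks for per-field costs."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "Content-Type": "application/json",
            # The header described in the article.
            "X-GraphQL-Cost-Include-Fields": "true",
        },
        method="POST",
    )


req = build_costed_request("query { shop { id name } }")
print(req.get_method(), req.full_url)
```

A real client would also add whatever authentication the API requires before passing the request to urllib.request.urlopen.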
Understanding requested cost and actual query cost
Note that the results above contain two cost fields that look similar but mean different things:
- The requested query cost is the value obtained by statically analyzing the GraphQL query before it is executed
- The actual query cost is the value obtained by executing the query
Sometimes the actual cost is lower than the cost obtained by static analysis. For example, your query asks a connection for 100 objects, but only 10 are returned. In that case, the difference between the points deducted by static analysis and the actual cost is refunded to the API client.
In the following example, we request the first five products with low inventory, but only one product matches the query, so even though the requested cost is calculated as 7, the client does not lose 7 points.
query {
  products(first: 5, query: "inventory_total:<5") {
    edges {
      node {
        title
      }
    }
  }
}
The response reports both the requested cost and the actual query cost:
{
  "data": {
    "products": {
      "edges": [
        {
          "node": {
            "title": "Low inventory product"
          }
        }
      ]
    }
  },
  "extensions": {
    "cost": {
      "requestedQueryCost": 7,
      "actualQueryCost": 3,
      "throttleStatus": {
        "maximumAvailable": 1000.0,
        "currentlyAvailable": 997,
        "restoreRate": 50.0
      }
    }
  }
}
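The numbers in this response can be reproduced with a little arithmetic. The Python sketch below assumes the actual cost of a connection is two points plus the number of objects actually returned, which is inferred from the sample (one product, actual cost 3) and not stated as a rule in the article; the function name is hypothetical.

```python
def connection_costs(requested_first: int, returned_count: int) -> dict:
    """Requested cost, actual cost, and refund for a single connection."""
    requested = 2 + requested_first   # charged up front by static analysis
    actual = 2 + returned_count       # assumption: same rule on real results
    return {
        "requestedQueryCost": requested,
        "actualQueryCost": actual,
        "refund": requested - actual,
    }


costs = connection_costs(requested_first=5, returned_count=1)
print(costs)  # {'requestedQueryCost': 7, 'actualQueryCost': 3, 'refund': 4}

# Starting from a full budget of 1000 points, only the actual cost stays deducted.
print(1000 - costs["actualQueryCost"])  # 997
```

The final value matches currentlyAvailable in the response above.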
Validating the query cost model
The calculated query complexity and execution time have a linear correlation
Using the query complexity calculation rules, we were able to match the cost of the query to the load on the server almost linearly. This allows Shopify to effectively predict load and scale out its gateway layer infrastructure, and also gives users a stable platform to build apps on. We can also identify those high resource consumers and optimize their performance specifically.
By rate limiting on the calculated complexity of GraphQL queries, we get an API that is both more reliable and more flexible for clients than REST. This API model encourages users to request only the data they need, which makes the load on the server more predictable.
Other information:
- Shopify API rate limits
- Shopify Admin API GraphiQL explorer
- How Shopify Manages API Versioning and Breaking Changes
- ShipIt! Presents: A Look at Shopify’s API Health Report