Author: Jiyuan | Source: Alibaba Cloud Native official account

As we all know, the game industry is an evergreen sector of today's Internet industry. In 2019, before the pandemic, China's game market generated revenue of about 288.48 billion yuan, up 17.1% year on year. In 2020, the industry grew even faster because of the pandemic: playing games had already become one of the most popular forms of entertainment for Chinese netizens, especially during that period. According to incomplete statistics, by 2019 China had about 660 million mobile game users, accounting for 77.92% of its 847 million Internet users. Games, as a low-threshold, low-cost form of entertainment, have clearly become a habitual part of most people's lives.

For players, there are plenty of games on the market, so how a game gets discovered, recognized, and then played continuously is a question every game maker has to think about. After the 2018 events around game license (banhao) issuance, game makers came to cherish every title that obtains a license. This has made "polishing product quality in depth" and "scaling up operations" the two development directions of the game industry, and both new and old games are trying to act on them:

  • New games: invest more promotional resources and more polished game content to win players.

  • Old games: invest more effort and cost in user behavior analysis to iterate better versions.

Here we focus on new games. A game company may grind away at a title for three years, hoping it takes off when it is released. The question then becomes: how does the new game get seen by a wide audience?

First, let’s look at the categories of companies in the game industry:

  • Game developers: the companies that develop games and produce game content. For example, all the hero designs, battle scenes and battle logic of King of Glory are provided by the game development company.

  • Game publishers: a publisher's work is divided into three parts: marketing, operations and customer service. The publisher controls the lifeblood of the game. The core of marketing is to bring in players, while the core of operations is to maximize user value and earn more profit.

  • Game platforms/distributors: their core purpose is to expose your game so that as many people as possible can find it.

There are independent companies that specialize in one or all of these types of business, but the relationship between the three remains the same:

So it is easy to understand that publishing and operations are key if you want your game to reach as many people as possible. In layman's terms, if your game is being advertised on all the platforms you are familiar with, then at the very least the number of new sign-ups for your game will be significant. Hence a key term: buying volume (traffic acquisition).

According to the data, an average of more than 6,000 mobile game titles bought volume each month in 2019, compared with only 4,200 in 2018. Meanwhile, as super apps such as TikTok and Weibo tilt resources toward the game traffic-acquisition market, the effect and efficiency of mobile game volume buying have improved, and game makers are increasingly willing to buy volume to attract users.

However, while the targeting of volume buying keeps improving, its cost keeps rising as well. Only by rationally balancing volume buying, channels and integrated marketing can publicity and development resources deliver their full value.

In plain terms, buying volume means placing game ads on the major mainstream advertising platforms. Users who see an ad may click it and land on the game maker's promotion page, where some information about the user is collected. The game maker then analyzes this large volume of user data to drive further targeted promotion.

Core demand of game operations

Game makers spend money on volume buying precisely to obtain user information and new-user registration data for the ongoing operation of the game, so the core demand in this scenario is the completeness of the collected user information.

For example, suppose a game maker spends 50 million yuan a day on advertising and, during a certain period on a certain platform, the ads generate 10,000 clicks per second. The user information from every click during that period must be collected completely and stored for subsequent analysis. This places high demands on the data acquisition system.

The most critical point is whether the interface exposed by this part of the system can smoothly absorb the irregular traffic pulses during the buying period. Game makers usually advertise on multiple platforms, each at different times, so intermittent traffic pulses occur throughout the day. If anything goes wrong in this link, the money spent on buying volume is simply wasted.

Traditional architecture of data acquisition system

The figure above shows a fairly traditional data acquisition architecture. Its most critical part is the HTTP interface exposed for reporting data back: if this part fails, the data acquisition link is broken. This part typically faces two challenges:

  • When a traffic pulse arrives, can this part scale out quickly enough to absorb the traffic shock?

  • Game operation activities are tidal rather than daily, so how to optimize resource utilization also has to be considered.

Normally, the operations (O&M) team is informed in advance before a campaign and adds nodes to the service behind this link. But it is impossible to predict exactly how many nodes are needed; only a rough number can be estimated. This is a common scenario in traditional architectures, and it leads to two problems:

  • The traffic is larger than expected and too few nodes were added, so some traffic data is not collected.

  • The traffic is smaller than expected but many nodes were added, resulting in wasted resources.

Serverless architecture of the data acquisition system

We can use Function Compute (FC) to replace the part of the traditional architecture that exposes the HTTP interface for reporting data, which neatly solves the problems above. For reference, see the article "Double Optimization of Resources and Cost! How Serverless Disrupts Innovation in Programming Education".

Let’s start with the architecture diagram:

Both problems of the traditional architecture are solved by Function Compute's 100-millisecond-level elasticity. We do not need to estimate how much traffic a marketing campaign will bring, we do not need to worry about the performance of the data acquisition system, and the O&M team does not need to prepare ECS instances in advance.

Because Function Compute is extremely elastic, when there is no volume buying and no marketing campaign, the number of running instances is zero. When a buying campaign produces a traffic pulse, Function Compute quickly pulls up instances to carry the load; when traffic drops, it promptly releases instances that have no requests, scaling back down. The Serverless architecture therefore brings three advantages:

  • Developers can set it up quickly, with no O&M involvement.

  • It can smoothly absorb traffic of any size.

  • The number of instances Function Compute pulls up closely tracks the traffic curve, which optimizes resource utilization; combined with pay-as-you-go billing, this keeps cost optimal.

Architecture parsing

As the architecture diagram above shows, the data collection stage is split into two functions. The first exposes the HTTP interface to receive data; the second processes the data and then sends it to the message queue Kafka and the database RDS.

1. The function that receives data

We open the Function Compute console and create a function:

  • Function type: HTTP (i.e. trigger is HTTP)

  • Function name: receiveData

  • Run environment: Python3

  • Function instance type: elastic instance

  • Function execution memory: 512MB

  • Function timeout: 60 seconds

  • Function single instance concurrency: 1

  • Trigger type: HTTP trigger

  • Trigger name: defaultTrigger

  • Authentication method: Anonymous (no authentication required)

  • Request mode: GET, POST

Once the function is created, we write the code through the online editor:

# -*- coding: utf-8 -*-
import logging
import json
import urllib.parse

HELLO_WORLD = b'Hello world!\n'

def handler(environ, start_response):
    logger = logging.getLogger()
    context = environ['fc.context']
    request_uri = environ['fc.request_uri']
    for k, v in environ.items():
        if k.startswith('HTTP_'):
            # process custom request headers
            pass
    try:
        request_body_size = int(environ.get('CONTENT_LENGTH', 0))
    except ValueError:
        request_body_size = 0
    # Receive the reported data
    request_body = environ['wsgi.input'].read(request_body_size)
    request_body_str = urllib.parse.unquote(request_body.decode("GBK"))
    request_body_obj = json.loads(request_body_str)
    logger.info(request_body_obj["action"])
    logger.info(request_body_obj["articleAuthorId"])
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    return [HELLO_WORLD]

The code is very simple at this point: it just receives the parameters reported by the user. We can call the interface to verify it:
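As a minimal sketch of such a verification call from a local script: the URL below assumes the standard Function Compute HTTP-trigger path format, and the account ID, region, service name and payload field values are all placeholders to adjust for your own environment.

import json
import urllib.parse
import requests

# Placeholder endpoint of the receiveData function's HTTP trigger
url = ("https://<account-id>.cn-hangzhou.fc.aliyuncs.com"
       "/2016-08-15/proxy/<service-name>/receiveData/")

# Sample user info; the function logs the "action" and "articleAuthorId" fields
payload = {"action": "click_ad", "articleAuthorId": "123456"}

# The function URL-decodes the body before parsing it as JSON, so URL-encode it here
body = urllib.parse.quote(json.dumps(payload))

resp = requests.post(url, data=body)
print(resp.status_code, resp.text)  # expected: 200 Hello world!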

You can see the log of this call in the function’s log query:

At the same time, we can also look at the function's call trace to analyze how long each step takes, for example: receiving the request → cold start (when there is no active instance) → preparing the code → executing the initialization method → executing the entry function logic:

As the call trace shows, the previous request included cold start time because there was no active instance; the whole process took 418 ms, while the entry function code itself executed in only 8 ms.

When the interface is called again, the entry function logic is executed directly because an instance is already running, and the whole call takes only 2.3 ms:

2. The function that processes data

The first function was created through the Function Compute console UI, with Python 3 selected as the runtime. The official documentation lists the modules built into the preset Python 3 runtime; since the second function operates on Kafka and RDS, we need to check for the corresponding modules.

As the documentation shows, the built-in modules include the RDS SDK but not the Kafka SDK, so we need to install the Kafka SDK manually and create this function in a different way.

1) Funcraft

Funcraft is a command-line tool for developing and deploying Serverless applications. It helps us easily manage resources such as Function Compute, API Gateway and Log Service, and it develops, builds and deploys through a resource configuration file (template.yml). So we will use Fun to work on the second function, in four steps:

  • Install Fun.
  • Write a template.yml template file that describes the function.
  • Install the third party dependencies we need.
  • Upload and deploy the function.

2) Install Fun

Fun offers three installation options:

  • Install via the npm package – suitable for developers on all platforms (Windows/Mac/Linux) who already have npm installed.
  • Install by downloading the binary – suitable for all platforms (Windows/Mac/Linux).
  • Install via the Homebrew package manager – suitable for macOS developers.

The sample environment in this article is a Mac, so we install with npm; it takes a single command:

sudo npm install @alicloud/fun -g

After the installation is complete, enter the fun command in the terminal to view the version information:

$ fun --version
3.6.20

Before using Fun for the first time, run the fun config command and configure the Account ID, Access Key ID, Secret Access Key and Default Region Name as prompted. The Account ID and Access Key ID can be obtained from the upper-right corner of the Function Compute console home page:

fun config

? Aliyun Account ID ***01
? Aliyun Access Key ID ***qef6j
? Aliyun Access Key Secret ***UFJG
? Default region name cn-hangzhou
? The timeout in seconds for each SDK client invoking 60
? The maximum number of retries for each SDK client 3

3) Write template.yml

Create a new directory, and in it create a YAML file named template.yml that describes, in YAML, the configuration of the service and function to be created:

ROSTemplateFormatVersion: '2015-09-01'
Transform: 'Aliyun::Serverless-2018-04-03'
Resources:
  FCBigDataDemo:
    Type: 'Aliyun::Serverless::Service'
    Properties:
      Description: 'local invoke demo'
      VpcConfig:
        VpcId: 'vpc-xxxxxxxxxxx'
        VSwitchIds: [ 'vsw-xxxxxxxxxx' ]
        SecurityGroupId: 'sg-xxxxxxxxx'
      LogConfig:
        Project: fcdemo
        Logstore: fc_demo_store
    dataToKafka:
      Type: 'Aliyun::Serverless::Function'
      Properties:
        Initializer: index.my_initializer
        Handler: index.handler
        CodeUri: './'
        Description: ''
        Runtime: python3

Let’s parse the core of the above file:

  • FCBigDataDemo: the user-defined service name. The Type attribute below it indicates that this is a service, i.e. Aliyun::Serverless::Service.

  • Properties: the attributes under Properties are the configuration items of the service.

  • VpcConfig: VPC configuration of the service, including:

    • VpcId: indicates the VPC ID.

    • VSwitchIds: the vSwitch IDs. This is an array, and multiple vSwitches can be configured.

    • SecurityGroupId: indicates the ID of a security group.

  • LogConfig: service binding log service (SLS) configuration, including:

    • Project: indicates the log service Project.

    • Logstore: indicates the Logstore name.

  • dataToKafka: the user-defined name of the function under this service. The Type attribute below it indicates that this is a function, i.e. Aliyun::Serverless::Function.

  • Properties: the attributes under Properties are the configuration items of the function.

  • Initializer: Configures the initialization function.

  • Handler: Configures the entry function.

  • Runtime: the environment in which a function runs.

The directory structure is:

4) Install third-party dependencies

Once the templates for the service and function are written, we install the third-party dependency we need. In this scenario the second function requires the Kafka SDK, which can be installed with the Fun tool together with the Python package manager pip:

fun install --runtime python3 --package-type pip kafka-python

The following information is displayed after the command is executed:

At this point we will see a .fun folder generated in the directory, containing the dependencies we installed:

5) Deploy functions

Now that the template file is written and the Kafka SDK is installed, we need to add our code file index.py, whose content is as follows:

# -*- coding: utf-8 -*-
import logging
import json
import urllib.parse
from kafka import KafkaProducer

producer = None

def my_initializer(context):
    logger = logging.getLogger()
    logger.info("init kafka producer")
    global producer
    producer = KafkaProducer(bootstrap_servers='XX.XX.XX.XX:9092,XX.XX.XX.XX:9092,XX.XX.XX.XX:9092')

def handler(event, context):
    logger = logging.getLogger()
    # The payload from the first function is a JSON-encoded string whose content is itself JSON
    event_str = json.loads(event)
    event_obj = json.loads(event_str)
    logger.info(event_obj["action"])
    logger.info(event_obj["articleAuthorId"])
    # Send the message to Kafka
    global producer
    producer.send('ikf-demo', json.dumps(event_str).encode('utf-8'))
    producer.close()
    return 'hello world'

The code is very simple; here is a brief walkthrough:

  • my_initializer: when a function instance is pulled up, the my_initializer function runs first, and only then is the handler function executed. As long as the instance keeps running, my_initializer is not executed again for subsequent requests. It is used here to initialize the Kafka producer, avoiding repeated initialization of the producer.

  • handler: this function has only two pieces of logic: receive the reported data and send it to the specified Topic in Kafka.

  • Now we deploy the function with the fun deploy command (shown after this list), which does two things:

    • Create services and functions from the configuration in template.yml.
    • Upload index.py and .fun to the function.
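Assuming Fun has been configured with fun config as above, deployment is a single command, run from the project directory that contains template.yml, index.py and .fun:

fun deploy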

Log in to the Function Compute console, and you can see the service and function deployed with the fun command:

Open the function, and you can also see the directory structure of the third-party dependencies:

3. Invocation between functions

Now that both functions are created, the first function needs to pull up the second one after receiving data, so that the second can send the message to Kafka. This only requires a few changes to the first function:

# -*- coding: utf-8 -*-
import logging
import json
import urllib.parse
import fc2

HELLO_WORLD = b'Hello world!\n'

client = None

def my_initializer(context):
    logger = logging.getLogger()
    logger.info("init fc client")
    global client
    client = fc2.Client(
        endpoint="http://your_account_id.cn-hangzhou-internal.fc.aliyuncs.com",
        accessKeyID="your_ak",
        accessKeySecret="your_sk"
    )

def handler(environ, start_response):
    logger = logging.getLogger()
    context = environ['fc.context']
    request_uri = environ['fc.request_uri']
    for k, v in environ.items():
        if k.startswith('HTTP_'):
            # process custom request headers
            pass
    try:
        request_body_size = int(environ.get('CONTENT_LENGTH', 0))
    except ValueError:
        request_body_size = 0
    # Receive the reported data
    request_body = environ['wsgi.input'].read(request_body_size)
    request_body_str = urllib.parse.unquote(request_body.decode("GBK"))
    request_body_obj = json.loads(request_body_str)
    logger.info(request_body_obj["action"])
    logger.info(request_body_obj["articleAuthorId"])
    # Asynchronously invoke the second function to forward the data to Kafka
    global client
    client.invoke_function(
        'FCBigDataDemo',
        'dataToKafka',
        payload=json.dumps(request_body_str),
        headers={'x-fc-invocation-type': 'Async'}
    )
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    return [HELLO_WORLD]

As shown in the code above, three changes have been made to the code for the first function:

  • Import the Function Compute SDK: import fc2.

  • Add an initialization method that creates a Function Compute client:

def my_initializer(context):
    logger = logging.getLogger()
    logger.info("init fc client")
    global client
    client = fc2.Client(
        endpoint="http://your_account_id.cn-hangzhou-internal.fc.aliyuncs.com",
        accessKeyID="your_ak",
        accessKeySecret="your_sk"
    )

Note that when we add an initialization method to the code, we also need to specify the initializer entry point in the function configuration:

  • Invoke the second function through the Function Compute client:

global client
client.invoke_function(
    'FCBigDataDemo',
    'dataToKafka',
    payload=json.dumps(request_body_str),
    headers={'x-fc-invocation-type': 'Async'}
)

The invoke_function method takes four arguments:

  • First argument: the name of the service that the invoked function belongs to.

  • Second argument: the name of the function to invoke.

  • Third argument: the data to pass to the invoked function.

  • Fourth argument: the request headers for invoking the second function. The key x-fc-invocation-type sets whether the invocation is synchronous or asynchronous; setting it to Async makes the call asynchronous.

With this setup, we can verify the whole flow: send a request to the HTTP interface exposed by the first function → collect the data → invoke the second function → deliver the data to Kafka as a message.

The purpose of using two functions

Some readers may wonder why two functions are needed, instead of sending the data to Kafka directly from the first function. Let's start with this picture:

When a function is invoked asynchronously, the request data is first placed into a message queue by default, which provides the first level of peak shaving; each queue then pulls up multiple instances of the corresponding function, providing the second level of peak shaving. This is one of the core reasons the architecture can consistently handle a high volume of concurrent requests.

4. Configuration of Kafka

In the game operations scenario the data volume is large, so the performance requirements on Kafka are high. Compared with a self-built open-source cluster, using Kafka on the cloud saves a great deal of O&M work, for example:

  • We no longer need to maintain the nodes of the Kafka cluster.
  • We do not need to worry about data synchronization between primary and secondary nodes.
  • The Kafka cluster can be scaled quickly and dynamically, Topics can be added dynamically, and the number of partitions can be increased dynamically.
  • Complete metric monitoring and message query features are provided.

In short, everything covered by the SLA is handled by the cloud; we only need to focus on sending and consuming messages.

Open the Kafka console and activate a Kafka instance with one click according to the needs of the actual scenario. After activation, log in to the console; the Kafka access points are shown in the basic information:

  • Default access point: the access point for VPC (intranet) access scenarios.
  • SSL access point: the access point for public-network access scenarios.

Configure the default access point in the second Function Compute function:

...
producer = KafkaProducer(bootstrap_servers='XX.XX.XX.XX:9092,XX.XX.XX.XX:9092,XX.XX.XX.XX:9092')
...

Then click Topic Management on the left side of the console to create a Topic:

Configure the created Topic in the second Function Compute function:

...
producer.send('ikf-demo', json.dumps(event_str).encode('utf-8'))
...

The advantages of Kafka on the cloud were listed above; one of them is dynamically increasing the number of partitions of a Topic. We can adjust the partition count dynamically in the Topic list:

A single Topic supports up to 360 partitions, something an open-source self-built cluster cannot do.

Next, click Consumer Group on the left side of the console to create a consumer group:

At this point Kafka on the cloud is configured. A producer can send messages to the newly created Topic, and a consumer can subscribe to the Topic with the newly created consumer group ID (GID) to receive and consume messages.
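As a quick way to check that messages actually arrive before wiring up Flink, a minimal kafka-python consumer sketch can be run from inside the same VPC. The bootstrap servers are placeholders, and the group ID is assumed to be the 'ikf-demo' group used in the Flink example below:

from kafka import KafkaConsumer
import json

consumer = KafkaConsumer(
    'ikf-demo',                      # the Topic created above
    bootstrap_servers='XX.XX.XX.XX:9092,XX.XX.XX.XX:9092,XX.XX.XX.XX:9092',
    group_id='ikf-demo',             # assumed consumer group (GID)
    auto_offset_reset='latest',
    consumer_timeout_ms=10000        # stop iterating after 10 s of silence
)

for message in consumer:
    # The producer sends json.dumps(event_str), i.e. a JSON-encoded string whose
    # content is itself JSON, so decode it in two steps here as well.
    record_str = json.loads(message.value)
    record_obj = json.loads(record_str)
    print(record_obj.get("action"), record_obj.get("articleAuthorId"))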

Flink Kafka consumers

In this scenario, Kafka is usually followed by Flink for consumption, so here is a brief look at how to create a Kafka consumer in Flink and consume the data. The code snippet is as follows:

final ParameterTool parameterTool = ParameterTool.fromArgs(args);
String kafkaTopic = parameterTool.get("kafka-topic","ikf-demo");
String brokers = parameterTool.get("brokers", "XX.XX.XX.XX:9092,XX.XX.XX.XX:9092,XX.XX.XX.XX:9092");
Properties kafkaProps = new Properties();
kafkaProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, brokers);
kafkaProps.put(ConsumerConfig.GROUP_ID_CONFIG, "ikf-demo");
FlinkKafkaConsumer<UserBehaviorEvent> kafka = new FlinkKafkaConsumer<>(kafkaTopic, new UserBehaviorEventSchema(), kafkaProps);
kafka.setStartFromLatest();
kafka.setCommitOffsetsOnCheckpoints(false);
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStreamSource<UserBehaviorEvent> dataStreamByEventTime = env.addSource(kafka);

This is a very simple snippet that builds a Flink Kafka consumer and adds it as a Kafka source.

Verification through load testing

At this point the entire data acquisition architecture is in place, so let's verify its performance with a load test, using Alibaba Cloud PTS (Performance Testing Service).

Create a load test scenario

Open the PTS console and click Create Load Test / Create PTS Scenario in the left menu:

In the scenario configuration, the HTTP interface exposed by the first Function Compute function is used as the serial link, configured as shown below:

After the interface is configured, we configure the load:

  • Load mode:

    • Concurrency mode: specify how many virtual users send requests at the same time.

    • RPS mode: specify how many requests are sent per second.

  • Ramp-up mode: the load can be adjusted manually during the test, or increased automatically by a percentage.

  • Maximum concurrency: the number of virtual users sending requests at the same time.

  • Ramp-up percentage: if automatic ramp-up is used, the load increases by this percentage per step.

  • Single-step duration: how long each load step is held before full load is reached.

  • Total test duration: the total length of the load test.

For resource cost reasons, the number of concurrent users is set to 2,500 for this verification.

As the load test results above show, TPS peaked at 20,000, and 99.99% of the 5.49 million+ requests succeeded. The 369 exceptions, which can be inspected by clicking on them, were caused by request timeouts on the load-testing tool side.

Conclusion

At this point, the entire Serverless-based big data acquisition and transmission architecture has been built. The load test shows that its overall performance is good, and the architecture itself is simple and easy to understand. It is not limited to game operations: in fact, it applies to any big data acquisition and transmission scenario. Many customers are already running it in production on a Serverless architecture, or are on the road to adopting one.

There are many other application scenarios for Serverless, which I will share one by one. If you have questions, you can also join the DingTalk group (35712134); we will be there to answer them!
