I had the honor of speaking in the Front-End New Trends track at QCon 2021 on May 30, 2021. This article summarizes that talk, “Upgrading the Big Front-End R&D Model of Tencent Online Education Based on Serverless,” to share it with you.
Preface
First, a brief self-introduction. I am Haige, an engineer at Tencent. I joined the Tencent QQ Browser team through campus recruitment in 2016 and moved to the IMWeb team in 2017. I am currently a member of the IMWeb front-end team in the Online Education Department and lead the team’s full-stack R&D direction. This article should be a useful reference for front-end teams interested in full-stack development and for engineers who want to improve their team’s R&D efficiency. The views here are limited to technical exchange. No technology is inherently superior or inferior, right or wrong; what matters is whether it suits the present situation. Anything non-technical is my personal, subjective opinion, with its limitations, and does not represent the team. If you are interested, feel free to contact me at any time. Meeting interesting people is a pleasure, and I look forward to growing together with you.
At Internet companies abroad, the trend toward a single Software Engineer role that does not distinguish front end from back end is obvious, whereas in China the two are still clearly separated. In the short term it is hard for China to align with the R&D structure abroad, just as it is unrealistic to force a comparison of our R&D model and efficiency with Google and FB. Localization is inevitable. The Five Principles of Peaceful Coexistence still apply in this era: the trend is real, but the pace of adoption need not be forced. There is no need to be absolute either. What fits is best, and recognizing the general trend and following it matters a great deal. Copying wholesale is a bad habit that avoids thinking. Finding a path that suits us is what gives lasting core competitiveness.
Now that domestic Internet policy is gradually being standardized, money from consumer (C-end) users is harder and harder to earn. Without phenomenal products or technological breakthroughs, C-end revenue will not grow quickly in the short term, and the dividend of the mobile Internet wave is nearly exhausted. As the whole Internet industry shifts toward ToB and ToG, the atmosphere of the times is like a cup of strong wine whose fragrance carries for ten miles, and you cannot help but be drawn in. In that atmosphere, the term “full-stack development” seems to take on new meaning. Whether front-end full stack is a silver bullet no longer matters to me. It only needs to create value for the product and be result-oriented for the team. In the last year or two I have come to understand the idea of industry verticals better than before, and I have seen more and more vertical architects. In the traditional communications industry, the division of labor into pre-sales, sales, and after-sales has long been fixed, and the traditional Internet industry is gradually following suit. Most software engineers work in the sales stage, responsible for project delivery and implementation, and also take on part of the after-sales work, such as daily maintenance and promotion.
Meanwhile, as the cloud era takes hold and infrastructure gradually forms a closed loop, vertical industry specialization will intensify. Over time, basic technology will be “buried” and “forgotten”, and the Internet world will keep its role as the connector. Development engineers will open up data pipelines, threading the needle across an infrastructure that is already in place and very different from that of the last decade. (PS: We cannot rule out making chips in the future, but that is hard for most ordinary people and ordinary companies!)
The “traditional industry + Internet” model is used to “subvert” and “reconstruct” one traditional industry after another. In this era, business development will focus more on delivering product features, free from the constraints of programming languages and service frameworks, and the separation between front-end and back-end R&D will gradually be erased by basic technology. Front-end and back-end engineers alike will be able to meet customers’ feature demands independently, in a closed loop, on top of the basic capabilities of the existing technology system, and technical boundaries and the division of labor will become flatter. This is already under way. It is fast and urgent, and it can feel “suffocating” as the pressure grows, but it is really a stimulus. Today’s traditional industries are not those of the past. They now have technical teams of their own, so how can Internet companies expect to subvert them?
It is easy to overestimate technology over the next two years and to underestimate it over the next ten. So it is great to be young, to have the chance to witness the next decade of technology with all of you, and to take part in it as players. Back in the present, what we need to do is value-oriented programming: let technology find where it delivers value in the business and solve today’s pain points. That is the most important thing for engineers right now. Boundaries are there to be broken. Today’s upgrade of the full-stack development model is a journey our front-end team was bound to take as we build and experience the “future world”.
This talk covers full-stack development along the following three dimensions:
1. Why does the business need a full-stack R&D upgrade?
2. How do we implement the R&D model based on SCF?
3. How do we keep building out the full-stack system around TKE?
First, let me introduce the department’s three products: Tencent Classroom for vocational education, Penguin Tutoring for K12 education, and Tencent Happy Mouse for children’s English and thinking skills. The business covers learners of all ages. The whole Internet industry is in transition, and education is a field full of disruption and challenge, which makes it a fascinating industry right now. I am glad to see the world from this track, where online and offline are closely combined and traditional and modern models interact. The business is challenging and exciting. It is a basic consensus that no product in the industry today is a silver bullet. Unless you can create a monopoly, which is almost impossible, the keys to a product’s success have changed a lot. I look forward to growing with the emerging industrial Internet model and building up energy for the next decade of my career, which is the era of my generation.
The complexity of the problems and the quality and resilience of the team make the journey interesting, and the process of searching for the key to the current dilemma matters to us as much as the outcome itself. The general trend of the industry is changing, and the rules of the traditional Internet game have changed a great deal, while the new rules are not yet fully formed. The best thing to do right now is to learn. Gone are the days when a company could take off simply because you were there. This is not ten years ago, when whatever Tencent did was relatively likely to succeed (PS: I cannot say it was easy then, but the odds were better than today). Every successful product now has blood, sweat, and tears behind it. So moving from ROI to ROL is an important mindset shift right now (PS: the L in ROL stands for learning). It is important to be patient, resilient, and long-term oriented.
Topic 1: Why does the business need a full-stack R&D upgrade?
In fact, our team has been working toward full-stack development for a long time. From SSR in 2015, to logic services in 2017, to comprehensive, systematic construction in 2020, we have never stopped exploring full-stack R&D. On the right is our architecture diagram from before 2020, which had produced some team R&D specifications along with a certain amount of business implementation and technical exploration. Here I would like to share our practice and lessons since 2020, so that you can see how we have grown, and I look forward to growing with you.
First, the team has grown with the business, so we have the people to try more directions. Second, a shortage of back-end engineers limits rapid business iteration, so front-end engineers need to step up as the first line in delivering product features. Finally, many admin and back-office scenarios have low concurrency and little back-end technical depth, yet they demand a deep understanding of the business and carry high communication costs in actual development, which in the long run also limits the growth of back-end engineers.
Conclusion: when front-end engineers can deliver product features end to end across the full stack, the product can iterate quickly and the team’s business value and influence increase. The front end is not trying to take away back-end engineers’ jobs. It is mutually beneficial, and win-win thinking is very important!
Taking the front end full stack solves the business delivery problem: it supports rapid product iteration and raises the value of the front end itself. The challenges that follow are just as obvious. As front-end engineers move into back-end work, an unfamiliar technical area, the core challenge is how to reduce the chance of mistakes and improve the quality and performance of the business.
Topic 2: How to implement the R&D mode based on SCF
Full-stack development has many pain points, such as traffic, resources, and capacity expansion. The essence of business development is fast delivery, yet we cannot get away from many fundamental issues, such as reliability, scalability, and service performance. These problems are highly repetitive, and everyone runs into them.
This raises a question about the architect’s capability model. Before the cloud, we built many wheels ourselves and systems were relatively closed. For example, when I joined the company in 2016 we used the TAF framework, whose open-source counterpart is TARS: a microservice framework similar to Alibaba’s Dubbo. TAF is relatively closed. It does everything itself, including the gateway, logging, traffic dyeing (tagging), centralized nodes, and even the storage scheme, all closed-loop within the framework. The cloud era, by contrast, follows an open community model, and the core value of architects has shifted from implementing basic components to serving the business better by selecting and customizing components from the community. We can also give back to the community through co-construction and solve problems common to the industry. The architect’s technical thinking in the cloud era has changed from building an engine for the product to choosing the right engine, and that engine is open and free!
Technology has changed the developer’s capability model dramatically. Programming for the future is important!
Having looked at the tensions in front-end full stack, let’s look at the state of cloud computing across the industry. It has moved from physical machines, to cloud hosts, to containers, and then to cloud functions. For platform products, the trend is toward thicker and more standardized infrastructure. From a business developer’s point of view, services are getting lighter and thinner. Front-line developers focus more and more on the business itself and on business value rather than on secondary goals. Technologies like Serverless fit our business scenarios well and address common business pain points.
Serverless means server + less: not no servers at all, but less of them to deal with. Serverless refers to the idea of building and running applications without having to manage servers. It describes a finer-grained deployment model in which an application, bundled as one or more functions, is uploaded to the platform and then executed, scaled, and billed according to the exact demand at that moment. In the broad sense, Serverless is any solution that reduces server-side operation and maintenance.
Let’s look at the Serverless architecture. It consists of triggers, function instances, and the back-end services they call. When the cloud function platform receives an event that triggers a function, it starts a container to run the function code. If a new event arrives while the previous container is still processing an event, the platform starts a second function instance to handle it. This use-and-release model keeps each request independent, sufficiently atomic, and secure.
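The scheduling behavior described above can be sketched as a toy model. This is only an illustration in TypeScript, not the SCF API: `InstancePool`, `acquire`, and `release` are hypothetical names, and a real platform also deals with cold-start latency, concurrency limits, and instance recycling.

```typescript
// Hypothetical model of how a FaaS platform assigns events to function instances.
interface FunctionInstance {
  id: number;
  busy: boolean;
}

class InstancePool {
  private instances: FunctionInstance[] = [];

  // Route an event to an idle instance, or start a new one if all are busy.
  acquire(): FunctionInstance {
    let instance = this.instances.find((i) => !i.busy);
    if (!instance) {
      instance = { id: this.instances.length + 1, busy: false };
      this.instances.push(instance);
    }
    instance.busy = true;
    return instance;
  }

  // Mark the instance idle once its event has been handled.
  release(instance: FunctionInstance): void {
    instance.busy = false;
  }

  get size(): number {
    return this.instances.length;
  }
}

const pool = new InstancePool();
const firstEvent = pool.acquire();  // event 1 arrives: instance 1 starts
const secondEvent = pool.acquire(); // event 2 arrives while instance 1 is busy: instance 2 starts
pool.release(firstEvent);           // event 1 finishes
const thirdEvent = pool.acquire();  // event 3 reuses the now-idle instance 1
```

Each event gets an instance to itself while it runs, which is the isolation property the paragraph describes. An idle instance is reused in this sketch only to keep the model small.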
We went back and forth for a long time on whether to adopt it. I will not analyze the specific process here. It was rather tortuous, but someone always has to try. In the end, we chose to go with SCF, which stands for Serverless Cloud Function. We also had many rounds of discussion and did a lot of research on which Serverless product to use, which I will not detail here. Of course, at Tencent you cannot choose Alibaba Cloud (Aliyun) or AWS products, so the technology selection is somewhat constrained. If your team is a startup, you can compare the options side by side.
We also found a business entry point: admin and back-office systems. Their stability requirements are lower than those of C-end products, so the business has some tolerance. For back-end engineers this work also lacks technical challenge and room to grow, so they are willing to let front-end engineers take it over. With the front end taking this on as the breakthrough for rolling out full-stack development, we started from requirements with low concurrency and gradually extended to the C end. In fact, big front-end teams are themselves still learning and growing.
During the actual rollout of SCF we also ran into some problems, as shown in the figure above. The business side should keep an open mind and grow together with sister departments. Learning from each other and growing together is the rational approach. Believe that there are always more solutions than problems, because it is true.
Git is a natural carrier for specifications. For example, project names under the same Group are unlikely to collide, which solves the problem of functions in the same namespace overwriting each other. We also customized how cloud function capabilities are used: for example, the latest version serves as the test environment for our business, gateway included. We added some custom specifications as well. To simplify the development process we built a CLI scaffold, so the cost of getting familiar is paid only once, the R&D environment is integrated, and individual learning costs are hidden. We also integrated CI/CD for cloud functions, completing the DevOps pipeline. Everything is built around the most fundamental principles and tools: cloud functions, CI/CD, and closed-loop DevOps. The principle as I understand it is to keep optimizing around basic, principled goals.
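As an illustration of using Git as the carrier of a naming convention, here is a minimal sketch that derives a function name from the repository path. The function `functionNameFromRepo`, the `_` separator, the environment suffix, and the example host `git.example.com` are all assumptions made for illustration. The team’s actual rules are not given in this article.

```typescript
// Hypothetical convention: "<group>_<project>_<env>", derived from the Git remote URL.
function functionNameFromRepo(repoUrl: string, env: string): string {
  // Strip the protocol and host (HTTPS or SSH form) and the trailing ".git",
  // leaving only the "group/project" path.
  const path = repoUrl
    .replace(/^(?:https?:\/\/[^/]+\/|git@[^:]+:)/, "")
    .replace(/\.git$/, "");

  // Sanitize each path segment separately so the group/project boundary
  // stays visible in the final name.
  const segments = path
    .split("/")
    .map((segment) => segment.replace(/[^A-Za-z0-9-]+/g, "-"));

  return segments.concat(env).join("_");
}

// The same repository yields the same name regardless of the remote URL form.
const fromHttps = functionNameFromRepo("https://git.example.com/edu-web/course-api.git", "test");
const fromSsh = functionNameFromRepo("git@git.example.com:edu-web/course-api.git", "test");
// A different group/project split yields a different name.
const otherRepo = functionNameFromRepo("https://git.example.com/edu/web-course-api.git", "test");
```

Because the name is computed from the Group and project, two teams deploying into the same namespace would not overwrite each other’s functions as long as their repository paths differ.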
We tackle the problems one by one by breaking them down and refining them. The idea behind the solutions is not complicated: standardization, engineering, automation.
This is done by setting new norms for the team. After a year of effort by the whole team, dozens of core services have gradually been put into production.
CLI tools solve the development efficiency problem. Quality is assured through the gateway and through capabilities the cloud functions themselves provide, such as reserved instances and keepalive mechanisms. In truth this is also a forced, transitional plan, and history always moves forward through compromise. The team behind the gateway solution is also optimizing step by step and will later move to CLB and TAPISIX, which will take a new form, but the migration will be a relatively long process.
PS: At present, NGW is the gateway used by the education front end’s cloud functions. It carries the traffic of almost all cloud functions, including the back-end interfaces built by our low-code platform, and serves many businesses in ToB scenarios, with about 3 million visits per day. We also want to thank jsonsun from PCG for his cooperation.
The figure summarizes the work of rolling out SCF. We first formulated the specification, then improved the toolchain to close the loop on the R&D process step by step and solve the problems that came up along the way. The sequence was: first ensure minimum availability for the business and confirm that the technical investment is feasible, then fix the various defects and make the development tools easy to use, and finally build it out systematically and promote it within the team.
Topic 3: How to keep building out the full-stack system around TKE
After SCF was rolled out, the business gradually raised more demands that SCF could not safely meet, and this is where TKE (Tencent Kubernetes Engine) came in. The following diagram illustrates the concept and advantages of TKE. This also coincided with the company’s move to the cloud, so we decided on TKE. I believe many other common back-end architectures are still used within the company to implement business, but we chose TKE in the end. Sometimes technology selection and decisions are just the story of a few people.
Here are the TKE advantages I collected, and there are a lot of them. In fact, there were many problems when we first onboarded, and they are gradually being solved. Moving a business to the cloud is no small amount of work. The front end has it easier, since its historical burden is much lighter than the back end’s. After a year, all of our core Node.js services have been moved to TKE.
Here is a comparison of Node.js services running on SCF and in TKE containers. In essence, SCF is a FaaS function service: once an event triggers the function and execution completes, the container can be destroyed, and the program’s execution environment goes with it. As a result, in-memory caching, the capability back-end services rely on most, is gone. A traditional container service, by contrast, is a BaaS-style service: the process is resident, can hold state and long-lived connections, and can use in-memory caching to raise the concurrency it can serve. These advantages come with drawbacks: resident services are more exposed to memory leaks, and an unexpected exception that sends the program into an endless loop can drive CPU usage to 100%. So for stateful, high-performance, more flexible scenarios, TKE is the comfortable choice for hosting services.
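To make the in-memory caching point concrete, here is a minimal sketch of a cache that a resident Node.js process might keep between requests. `TtlCache` is a hypothetical class, not part of IMServer or TKE. It bounds both entry lifetime and entry count, since an unbounded cache is one common way a resident service ends up leaking memory.

```typescript
// Minimal in-process cache with a time-to-live and a size cap (illustrative only).
class TtlCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private maxEntries: number;
  private now: () => number;

  // `now` is injectable so the expiry logic can be exercised without waiting.
  constructor(ttlMs: number, maxEntries: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.maxEntries = maxEntries;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.store.delete(key); // expired: drop it so the memory can be reclaimed
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    // Remove any existing entry first so re-inserting moves the key to the end
    // of the Map's insertion order.
    this.store.delete(key);
    if (this.store.size >= this.maxEntries) {
      // Evict the oldest inserted key to keep memory bounded.
      const oldest = this.store.keys().next().value;
      if (oldest !== undefined) this.store.delete(oldest);
    }
    this.store.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get size(): number {
    return this.store.size;
  }
}
```

In a function instance that may be destroyed after each invocation, such a cache cannot be relied on to survive between requests, whereas in a resident container it persists for the life of the process.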
The challenges of operating TKE from the front end are not that different from what we watch for when operating traditional pages. They are mainly performance, monitoring, alerting, and stress testing. The dimensions are the same, and only the scenarios and the focus differ. Moving a Node.js service from SCF or CVM to TKE is essentially just a change of runtime environment. So I borrow the phrase “old wine in new bottles” to describe the challenges developers face. Of course, the core problem to solve is cost. Cost is the foundation of all software services.
First, cost is broken down and measured as efficiency + performance + quality. Second, from these three goals we derive a second dimension with three directions, namely the R&D framework, development tools, and business governance, to find concrete footholds for implementation. Third, from that second dimension we split out 12 finer-grained sub-directions and implement them step by step. In the future there will also be some middle-platform (zhongtai) solutions and shared technology accumulated by the team, to raise the team’s overall efficiency and technical level.
In the R&D framework dimension, our core concerns are the framework and its ecosystem. The framework is grounded in the specification: it starts from the specification and implements the team’s R&D norms. Those norms are still taking shape and being iterated. After all, Rome was not built in a day, and the Colosseum became the magnificent structure it is today only through the gradual improvement and continuous building of generations. Norms are the same. In implementing them we should grasp the big issues and let the small ones go, solving the core problems first and evolving gradually. The next step is to build the specification into the framework. Whether you use Egg or IMServer is not the point. What matters is that everyone agrees with it and keeps maintaining and following it. Then we customize basic components and business components for our scenarios and business to improve development efficiency. The completeness and stability of the underlying components is a sign that a team’s work in this direction has begun to take shape. After that, capabilities are implemented according to business scenarios. Along the way, some services with middle-platform characteristics emerge, and these gradually settle into middle-platform services that support the department’s businesses in stages.
So many engineers are involved in full-stack development, so why pull in so many people? As I understand it, it makes later promotion and onboarding easier, but more importantly, only after we have tried something together can we keep bringing a large team closer together. With more than 100 people on the front-end team, this was my first experience of such a large software development organization. If a unified full-stack specification becomes available in the future, we can still adopt and adapt to it quickly. Even if the current specification is replaced, the experience we have accumulated in implementing and exploring it will become a cornerstone of the team’s future achievements.
Here you can see all the services that have adopted the specification, and we automatically generate the basic unified view on the right. If a business needs customization, it can of course be customized. Unified data reporting keeps problem location efficient. The list on the left shows each service’s progress in adopting the specification, its expected completion time, and its owner, so that the work gets finished within the expected time. Of course, in many cases what technology alone cannot achieve can be helped along by management. When the soft approach does not work, a firmer one is available.
R&D efficiency tools help developers connect the development, testing, and operations domains. Here is a list of tools that address different aspects of the problem. Some come from the open-source community, some are excellent frameworks and solutions from within the company, and some we developed ourselves. In my view, who built them is not so important. They are all tools for the current stage, and the core is to serve today’s developers: keeping front-line developers happy is what matters most. PS: Going back to basics means solving developers’ problems.
Tolstoy, a debugging tool, is gradually adding mock support for RPC to make it more useful. To be honest, building Tolstoy is not technically complicated. What matters is maintaining it over the long term and keeping it going. The team intends to incubate it as far as possible. PS: Tolstoy is our internal tool for front-end and back-end integration (joint debugging).
Nohost supports front-end development and has been widely recognized within the company and in the industry. It can also support server-side Node.js services, such as the routing, tutoring, and classroom services, which currently run normally in the Node.js development environment that Nohost provides. We may introduce a Mesh solution to replace it in the future, but in the short term, debugging Node.js services in the test environment will continue to rely on Nohost for some time.
Monitoring draws on the industry approaches of TAFNode and Alinode. Part of the data goes to cloud monitoring and to the IX system we developed, which together safeguard the services. Logs are currently concentrated in ELK, with some data placed in Prometheus for easy computation and persistence.
Monitoring covers everything from the callers and callees of a service to interface latency and return codes, and from slow database queries to large-key reads and writes in Redis and queue lengths in Kafka. We worked through the monitoring metrics and implementation details together with the back-end engineers, and connected monitoring issues to TAPD (Tencent Agile Product Development, Tencent’s agile R&D collaboration platform, which embodies the team collaboration concepts and agile practices Tencent has built up over more than ten years) so that the process is automated. From detecting a problem through monitoring, to dispatching, handling, and following it up, to feedback after the fix and regular statistics, everything is handled by the pipeline. I also want to thank Simon and his team for their support.
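The monitoring-to-ticket flow can be sketched roughly as follows. The types, the rule format, the metric names, and the `toTickets` function are hypothetical and for illustration only. No real TAPD or monitoring API is called here, and in practice the resulting tickets would be handed to whatever interface the collaboration platform provides.

```typescript
// One observed metric value for a service (hypothetical shape).
interface MetricSample {
  service: string;
  metric: string;
  value: number;
}

// A threshold rule and the person who should receive the ticket (hypothetical shape).
interface AlertRule {
  metric: string;
  threshold: number;
  owner: string;
}

interface Ticket {
  title: string;
  owner: string;
}

// Turn every sample that exceeds its rule's threshold into a ticket for the owner.
function toTickets(samples: MetricSample[], rules: AlertRule[]): Ticket[] {
  const tickets: Ticket[] = [];
  for (const sample of samples) {
    for (const rule of rules) {
      if (rule.metric === sample.metric && sample.value > rule.threshold) {
        tickets.push({
          title: `[${sample.service}] ${sample.metric}=${sample.value} exceeds threshold ${rule.threshold}`,
          owner: rule.owner,
        });
      }
    }
  }
  return tickets;
}

const exampleTickets = toTickets(
  [
    { service: "course-api", metric: "latency_ms_p99", value: 850 },
    { service: "course-api", metric: "kafka_queue_length", value: 10 },
  ],
  [
    { metric: "latency_ms_p99", threshold: 500, owner: "alice" },
    { metric: "kafka_queue_length", threshold: 1000, owner: "bob" },
  ],
);
```

Keeping the rules as data means new indicators, such as slow queries or large Redis keys, can be added without changing the dispatch logic.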
The above is a business case from stress-testing a service on the Xinghai platform and locating problems with IX. It improved service QPS to some extent, reduced the number of machines, and saved business costs. In the future we will continue to invest more in automated testing and use red/blue (differential) flame graphs to help find and locate problems faster.
Here is an analysis of how costs are broken down and absorbed step by step. There is some overlap between categories. For example, the IMServer base framework both improves development efficiency and raises business quality. I will not go through the items one by one, since I trust you can read the text in the figure. PS: IMServer is our internal Node.js service framework.
Finally, here is the architecture diagram of the IMWeb team’s full-stack R&D, summarizing what the team has tried and settled on over the past period. We will keep improving and innovating. Full-stack R&D will continue to grow, from taking root, to thriving, to flourishing in front of everyone. I hope to have better implementation plans to share with you.
Topic 4: Where is the next crossroads?
Here I have chosen four basic themes. The first is cloud-integrated development with WebIDE, which needs continued exploration. Perhaps the VSCode-based plug-in ecosystem will fully support cloud integration in the future. The second is TF. In the first half of the year we cooperated with Zhiping on two projects, AI question recommendation and AI-assisted grading, and we ran an attention recognition test ourselves. More scenarios will follow that put machine learning capabilities to use. The third is low code. Low code is everywhere, but what the market offers are shallow low-code platforms, and our own is only a product that solves one specific scenario. We hope to see a true low-code platform in the future. We will try to launch a version of D2C this year and are looking forward to it. Finally, I chose WASM to end with. Back-end developers often complain about Node.js because of performance. It is 2021, and the model of software development has changed a lot. Making sound technology choices is what matters most. The point here is that WASM can bring browser-side performance optimization and, on the server side, the Rust language, both of which are highly anticipated.
To summarize, technological trends make full-stack development historically inevitable. In the future, the cost of people will gradually exceed the cost of machines, so how can engineers create more value? Once the boundaries between engineers are broken and infrastructure capabilities are complete, technical barriers for developers will gradually fall. Front-end and back-end engineers alike can deliver products end to end using the capabilities the platform provides, and both hope to raise the value of the technical community itself. The horizontal expansion of any technology brings not only innovation and a new division of work but also change within the organization. Full-stack R&D may take a different form in the future, but it will continue. I hope the organization sees new growth, and I will share it with you then. I learned and thought a great deal while implementing this with the team, which was a very valuable experience. Having the chance to take part in and drive something like this in a front-end team of more than 100 people is very fortunate, and I thank our leadership for the opportunity. We should stay curious and passionate about technology, keep a good attitude, continue investing in technical construction, and improve functionality iteration by iteration. There will always be problems that affect us to some degree, but we should stay patient and resilient as a team. No story is all smooth sailing. There are only the surprises we keep running into. In the process of implementation, we should keep restraint and a long-term view in mind. PS: Finally, I would like to thank every member of the IMWeb team who took part in this development and iteration. There are too many to thank one by one here.