TKEStack is an enterprise-grade container orchestration engine that combines power with ease of use. It is one of the incubation projects of the Open Atom Foundation. This article is compiled from a talk given by Ru Yingzhe, TKEStack open source project leader, and He Pengfei, TKEStack senior product manager, at the Cloud+ Community online salon. It introduces the open source methodology of TKEStack, and we hope it starts a conversation with you.


I. Introduction to TKEStack

TKEStack is Tencent Cloud's open source container platform. Many people ask: why use TKEStack when there is already Kubernetes? As a platform, Kubernetes leaves a lot to be desired. For example, it has no UI, image registry, permission management, logging, monitoring or other basic operation and maintenance capabilities. Kubernetes on its own is still very thin as a business platform. A container platform fills in these functions and packages K8S into a complete business platform.

TKEStack’s open source history

Tencent has long been building container-related computing platforms. In 2009, Dr. Wu Jun came to Tencent from Google and led the development of t-Borg, a business platform built in-house by Tencent. Around 2011 to 2013, we started to build business platforms on mature, community-recognized technologies. This was an important shift in technical direction toward mainstream industry solutions. Torca became a dedicated big data platform, hence HOT (Hadoop on Torca). In 2014, Docker began to show its power. We were very optimistic about Docker and Kubernetes, so we quickly switched over and started to build a general business platform.

In 2014, a new product, GaiaStack, was introduced to support Tencent's internal business. In 2019, Tencent reorganized and merged several business lines. The Kubernetes and public cloud teams jointly launched the TKEStack container platform, which combines GaiaStack's technology with TKE's technology. TKEStack was open sourced in 2019.

Last month, TKEStack officially joined the Open Atom Foundation.

II. Product positioning and direction

1. Containers are the de facto foundation of cloud native

Before I introduce the positioning and direction of the product, I should first mention the hottest technology direction right now: cloud native.

Cloud native means building and running elastically scalable applications on public, private and hybrid clouds. Containers are the foundation of the whole cloud native technology stack. According to CNCF's official overview, a large number of products and vendors are already active in the container space; a hundred flowers are blooming.

Two of the most recognizable products are OpenShift and Rancher, which are the benchmarks in the open source container community.

The first is Red Hat's OpenShift. Judging by code contributions to K8S, Red Hat's contribution rate is second only to Google's. It has strong technical capability and a polished product. The other is Rancher, which was acquired by SUSE this year and offers a very good product experience.

However, for enterprises in China, these two products also have some shortcomings.

The first is the difference in habits between domestic and foreign users. Foreign products cater more to foreign users, who prefer small, specialized products and command line tools. Most domestic users, however, want a one-stop platform with a graphical interface and one-stop service.

The other problem is the network. Downloading the images of foreign products from within China is often very slow or interrupted, which results in a poor experience.

Therefore, TKEStack needs to consider how to position itself as a container product. TKEStack is not about competing with OpenShift and Rancher on basic experience and overall polish. They are already very mature, and we have only just open sourced. It is difficult to reach their maturity in a short period of time.

2. Open source idea of TKEStack

Tencent Cloud's idea is to look for the blue ocean within cloud native. We believe that multi-cloud, heterogeneous hardware and heterogeneous infrastructure are where the field is heading.

Since 2009, Tencent has accumulated a great deal of container technology, and its internal businesses have accumulated much experience in the cloud native field. Building on these two points, Tencent's own technical strengths and our better understanding of domestic users' demands, we plan to build cloud native open source products and contribute to the cloud native ecosystem.

(1) Heterogeneous hardware

First of all, the hardware is heterogeneous. Most use cases run on X86 hardware, but there is a very important demand in China to support ARM. Tencent allows X86 and ARM machines to be deployed in the same cluster. On the GPU side, in addition to its existing GPU virtualization capability, TKEStack also adapts to Intel GPUs and to GPU-related products from Cambricon and Huawei, so that upper-layer users can use them transparently.

(2) Heterogeneous infrastructure

TKEStack's architecture uses one cluster to manage other clusters, so hybrid cloud follows naturally. Other clusters, including self-built ones, can be registered with TKEStack and use the image registry, authentication, logging, permissions and monitoring that TKEStack provides. Multiple K8S clusters can be managed from a single TKEStack interface.

Tencent has accumulated a number of technical solutions for cloud native applications. Two well-known ones are TAPP for stateful applications and Nvidia GPU virtualization, and some big data suites are also being integrated. Tencent hopes to bring its strongest internal open source components into TKEStack step by step, so that users can use them on TKEStack.

(3) Loose coupling

Another important point is loose coupling. TKEStack supports Tencent's self-developed businesses on the one hand and commercial products on the other, so it ultimately reaches users as more than one product. The open source product is available for users to use directly, and it is also used internally.

Such demands require TKEStack to be pluggable and to be assembled like building blocks. Product A needs logging while product B does not; you can choose any combination. TKEStack has this loose coupling capability.
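The building-block idea can be illustrated with a small sketch. It is not TKEStack's actual plug-in mechanism: the `ProductProfile` class and the component names are hypothetical, chosen only to show how two products might enable different subsets of optional components.

```python
from dataclasses import dataclass, field

# Hypothetical component names; the text only names logging and monitoring as examples.
AVAILABLE_COMPONENTS = frozenset({"registry", "auth", "logging", "monitoring"})


@dataclass
class ProductProfile:
    """A product assembled from optional platform components."""
    name: str
    components: set = field(default_factory=set)

    def enable(self, component: str) -> "ProductProfile":
        # Reject components the platform does not offer.
        if component not in AVAILABLE_COMPONENTS:
            raise ValueError(f"unknown component: {component}")
        self.components.add(component)
        return self  # returning self allows chained calls


# Product A uses logging, product B does not.
product_a = ProductProfile("A").enable("logging").enable("monitoring")
product_b = ProductProfile("B").enable("monitoring")
```

Keeping each component optional means a product only carries what it enables, which is the essence of the loose coupling described above.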

III. Open source methodology

TKEStack has accumulated a lot of experience and tried many methods in the almost one year since it was open sourced in November last year. Thanks to the guidance of Ma Quanyi, a senior open source expert, TKEStack can contribute not only code and technology but also present its accumulated experience as a methodology. What follows shares our experience and a summary of open source governance patterns.

1. Why open source

Why open source? Search engines return many answers to this question, but most of them focus on the value of open source to individual development and to society as a whole. Tencent, as an enterprise or a department, considers what advantages a project can bring to the company.

Only commercially viable open source projects have a chance to thrive. In the recent development of the IT industry there are several well-known projects, such as Apache, Linux and CentOS, that have deep open source roots and a history of commercial adoption.

First, open source has a huge Matthew effect on the marketing of commercial versions: the strong get stronger and the weak get weaker. Apache started as an open source project. As the project became popular, more and more users and developers joined it, and open source users naturally became users of the commercial versions or of commercial products built on it. Users who want to improve the commercial version naturally turn to the open source community to contribute code or improve the experience, which creates a virtuous cycle.

Second, open source is an important support for the commercial version. The open source community contributes resources and effort to the commercial product, and it helps that product enrich its features, adapt to more environments and build a complete ecosystem.

Third, some customers worry about being locked in to a single vendor. They are more willing to adopt a commercial version when it is built on an open source project.

Open source, business and ecosystem support one another: they establish standards, form an ecosystem and finally land in a commercial market. This becomes a virtuous circle that makes both the open source project and the commercial product bigger and better.

2. Why do open source projects need a methodology

The big projects just mentioned were all originally driven by a few brilliant open source people, following their personal preferences and passions.

But today's big projects, the ones that can form large communities or whole industries, are very difficult to drive and improve on individual enthusiasm alone. Projects at this scale urgently need a methodology to support and guide them step by step in developing strategies, defining features, building standards and building ecosystems.

For a consumer-facing (to C) open source project, the ecosystem is the most important thing to build. For a business-facing (to B) project, cooperation is especially valued. The choice of partners and the governance model are of great significance and value to the project.

3. Open source project architecture

(1) Pyramid model

The overall structure of an open source project starts with the pyramid model of open source strategy. Strategy sits at the apex of the pyramid: an enterprise going open source should first formulate a good strategy. Only once the strategy is set and the future direction of the product or project is clear can standards, products and an ecosystem follow.

Strategy is the premise. It points out the general direction and is the soul of the whole open source project. Before open sourcing, we must think about what we want the product or project to achieve, what its future direction is, what position it could hold in an ideal future, and what it can contribute to everyone.

Next come standards. A project should follow formal standards or industry de facto standards, and should pay even more attention to whether it can itself become an industry standard or de facto standard. Standards need not be explicit provisions and interface specifications. They can also be implicit, such as operating habits or UI design, which guide and shape users' ideas and habits.

Around the standards and the strategy, design the open source project as a real product. Be aware that an open source project has its own developers and its own real users. It is not just about what we want to build, but about treating the project as a real product and making it better and better.

Finally, build an ecosystem with the product and project at its core, and turn that ecosystem into a carrier for business. This is the pyramid of enterprise open source strategy.

(2) Hourglass model

Next comes the model of open source project governance, that is, how the whole open source project actually operates. The most important part of the hourglass is the neck. There sits the open source project as a product, which is our top priority: what is presented to the customer is a complete product with the open source project at its core.

The cornerstone of the product is the core developers, who steer the direction of the whole project and keep the community stable.

Below them are business partners who use the open source project. They are highly motivated to improve TKEStack's functionality and to optimize its product quality and experience.

On top of this there are upstream developers from the community or from other teams, who contribute adaptations to various environments and distinctive features that enrich TKEStack's product line.

(3) Ecological partner model

For TKEStack, the focus of an enterprise open source project is its ecosystem partners. How partners are chosen and how they cooperate forms the ecological partner model.

The partners fall into several categories. The first is the core developers and the TOC (technical committee), who form the core of the entire open source project. TKEStack can only stay stable when this core team is stable.

The TOC can include founders, team mentors, architects, product managers, operations managers and others, and acts as the committee that determines the direction of the entire open source project. Core developers can come from within the company or from teams that use the project heavily, and they contribute to its core code and architecture.

Beyond the core contributors there are loyal followers, such as portfolio companies and devoted fans of the project. They put their energy into the project, contribute adaptations for their business scenarios, or bring real customer needs that can then be realized in the open source project.

An open source project should be open and inclusive, welcoming all kinds of developers and even competitors. Competitors can also find points of cooperation in some areas and scenarios. When everyone starts from an open source point of view, for the sake of the ecosystem and the community, it is easier to negotiate with customers, individual developers or competitors.

The key to a project's success is to choose the right core partners, win over the right partners in the middle tier, and more broadly get more developers to contribute to the open source community toward a common goal.

4. Other constraints to the development and success of the project

Such a methodology needs a matching governance model to put the methodology and the open source ideals into practice, as well as strong marketing to publicize the open source project and the commercial products. Together these are a great booster for the open source community, the products and the projects, helping them grow better and faster.

IV. Open source governance model

1. Open source project lifecycle governance

The TKEStack open source project is governed across its whole life cycle. From strategic planning through product requirements, development iterations, release cycles, operation and maintenance, and operations management, every stage has its own processes and methods.

For strategic planning, the TOC holds regular meetings to draw up semi-annual or annual plans and set annual milestones for TKEStack as a whole.

Once drawn up, milestones and roadmaps are published on the code hosting site. The roadmap is then broken down into daily work, with weekly meetings to track milestones and the roadmap.

In the implementation stage there are two sources of feedback: internal users and external users. Internal requirements are tracked in the TAPD requirements management system, while external users' feedback is tracked through GitHub Issues.

2. Requirements process

Because the project is open source, external users send a lot of feedback about their needs and problems. Under the previous process, requirements would be tracked along two separate lines, which would cause a lot of conflict and confusion and leave the product manager exhausted from running between the two to coordinate internal and external requirements.

We have improved and automated this. For example, an automated task synchronizes external requirements and issue reports to the internal system every day, so that internal core developers can see external requirements, review them promptly and give their development opinions.
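The daily synchronization could look roughly like the sketch below. It is only an illustration under stated assumptions: the `ExternalIssue` type, the `plan_sync` and `to_internal_ticket` helpers, the ticket fields and the example URLs are all hypothetical, and no real GitHub or TAPD API is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalIssue:
    """Minimal stand-in for an issue filed by an external user."""
    number: int
    title: str
    url: str


def plan_sync(external_issues, synced_numbers):
    """Return external issues that do not yet have an internal ticket."""
    return [issue for issue in external_issues if issue.number not in synced_numbers]


def to_internal_ticket(issue):
    """Build a hypothetical internal ticket that links back to the issue."""
    return {
        "title": f"[external #{issue.number}] {issue.title}",
        "source_url": issue.url,
        "status": "needs_review",  # core developers review it next
    }


# Placeholder data; issue 1 was already synchronized on a previous day.
issues = [
    ExternalIssue(1, "Support ARM nodes", "https://example.invalid/issues/1"),
    ExternalIssue(2, "Log plug-in fails to start", "https://example.invalid/issues/2"),
]
tickets = [to_internal_ticket(i) for i in plan_sync(issues, synced_numbers={1})]
```

Tracking which issue numbers are already synchronized keeps the daily job idempotent, so rerunning it does not create duplicate internal tickets.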

For large requirements, we also publish proposals or outline designs for the whole community to see.

3. Development process

The whole development process, including development, testing and final submission, can be tracked in the system and is synchronized to GitHub in a timely manner, so that internal and external users and teams learn the status of a requirement as early as possible.

Development follows the standard GitHub flow: create a branch, open a PR, discuss, then deploy and test. For details, please refer to the GitHub website.

4. Test process

The other side of the development process is testing. TKEStack currently has three levels of testing, each with its own stage and goal.

Unit tests (UT) run on every build and ensure that the code compiles and is free of major errors. SmokeTest runs on each PR submission, with about a dozen test cases that ensure the submitted code does not affect the core functions of TKEStack. Finally, release tests run at the end of each release cycle to ensure the reliability of the final release. Together these form the three-level testing process.
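As a rough illustration, the three levels can be read as a mapping from a trigger to the test suite it runs. The `Trigger` enum and the suite labels below are hypothetical names for this sketch, not identifiers from TKEStack's CI configuration.

```python
from enum import Enum


class Trigger(Enum):
    """Events that start a test run in this sketch."""
    BUILD = "build"
    PULL_REQUEST = "pull_request"
    RELEASE = "release"


# Suite names are illustrative labels for the three levels in the text.
SUITE_FOR_TRIGGER = {
    Trigger.BUILD: "unit",          # every build: code compiles, no major errors
    Trigger.PULL_REQUEST: "smoke",  # every PR: core functions still work
    Trigger.RELEASE: "release",     # every release: full quality evaluation
}


def suite_for(trigger: Trigger) -> str:
    """Return the test level that a given trigger runs."""
    return SUITE_FOR_TRIGGER[trigger]
```

Whether a later level also reruns the earlier ones is not stated in the talk, so the sketch maps each trigger to exactly one level.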

5. Release and maintenance

Releases also have their own release and branch management process. The red line in the figure below is the main branch, which is tracked long term. All changes, such as fix, feat and docs, are submitted to the main branch.

Versions are released from the main branch according to plan. For example, to release v1.4.0, we pick a suitable commit on the main branch according to the plan, create the release-1.4 branch from it, and tag that branch v1.4.0 at release time. The release-1.4 branch is then maintained long term. Subsequent bug fixes are also released on this branch, and patch versions such as v1.4.1 are released from it periodically. The overall release and maintenance plans are public.
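The naming scheme in this example (tag v1.4.0 on branch release-1.4, followed by patch release v1.4.1) can be written down as a small helper. This is a sketch that assumes tags always have the form `vMAJOR.MINOR.PATCH`; the function names are made up for illustration.

```python
import re

# Assumed tag format: "v" followed by three dot-separated integers.
TAG_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def parse_tag(tag):
    """Split a tag such as 'v1.4.0' into (major, minor, patch)."""
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"not a release tag: {tag!r}")
    return tuple(int(part) for part in match.groups())


def release_branch(tag):
    """Name of the long-lived branch that a tag belongs to."""
    major, minor, _patch = parse_tag(tag)
    return f"release-{major}.{minor}"


def next_patch_tag(tag):
    """Tag of the next patch release on the same release branch."""
    major, minor, patch = parse_tag(tag)
    return f"v{major}.{minor}.{patch + 1}"
```

For the example in the text, `release_branch("v1.4.0")` gives `release-1.4` and `next_patch_tag("v1.4.0")` gives `v1.4.1`.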

6. Community governance

The final part of the governance model is community governance, which brings the various partners and teams into the governance of the open source community, in line with the methodology described earlier.

Core developers within the team and contributors within the company are responsible for TKEStack's core functions and infrastructure work, which safeguards the product's direction and core capabilities.

External partners' main strength is that they face real users. They can adapt the product to real environments and improve the quality of requirements, which matters greatly for product functionality. They are also more involved in building commercial offerings and in optimizing specific functions, such as logging and monitoring, for the needs of users' environments.

The many developers in the community can bring additional optimization experience, along with the industry's better practices and methods.

Tencent has also made many efforts for the development teams and individual developers in the community, such as organizing online and offline events to promote and introduce relevant knowledge. For example, TKEStack recently launched open source bounty tasks, regularly publishing development tasks that anyone can take part in. Tencent also gives back to the community with a variety of rewards to encourage people to participate in community activities.

V. Case introduction

1. Internal cloud migration

Tencent is now moving all of its businesses to the cloud, which is a gradual process. Some businesses depend on surrounding systems that have not yet moved to the cloud, so it is difficult to switch them directly to public cloud deployment in a short time.

In addition, moving to the cloud needs an intermediate stage. Machines in the cloud data center differ from machines in the IDC data center, and once all business has switched to the cloud, the IDC machines would sit idle.

To address these two issues, TKEStack has been deployed internally to support business cloud migration. The TKEStack interface is similar to the public cloud interface, and there is no significant difference in user experience between the two products. Businesses that cannot yet move to the cloud but must be made cloud-ready can complete that transformation on TKEStack first. When conditions are mature, they can move further onto Tencent's public cloud. TKEStack already supports internal business running on millions of cores.

2. TKEStack Enterprise Edition

Besides the open source edition and internal business, TKEStack also underpins commercial products. The commercial Enterprise Edition is enhanced in many respects compared with the internal deployment.

The Enterprise Edition differs somewhat from open source TKEStack: it wraps more around the core and presents more content driven by commercial needs.

The core capabilities of the Enterprise Edition, such as cluster scheduling, cluster creation and business management, are underlying capabilities contributed by TKEStack. Functions with heavier user experience requirements, such as microservices, operation and maintenance, and operations, are mostly built by the Enterprise Edition itself. Open source underlying capabilities and commercial-specific capabilities are thus combined into the TKEStack Enterprise Edition without affecting each other.

VI. Summary and outlook

Open source is like running a startup. It takes the right time, the right place and the right people.

The right time means doing the right thing at the right moment. At present everyone is working on cloud native, so a container platform attracts users. If we built an OpenStack platform instead, it would be far less attractive.

The right place means that, having picked a good direction, you must also be able to deliver. Tencent has more than ten years of experience in this field and strong technical capability. It has internal business experience, experience operating Tencent's public cloud, and the ability to support large-scale training. Together these lay the technical foundation for an open source container platform.

The right people: at present it is hard to find such an arrangement in China, where a group of people works purely on open source over the long term without any business KPI. That is the ideal. In practice, the team supports core business while also building a good open source product.

Balancing the two takes appropriate strategies to keep the product iterating. Once the product exists, more people need to join the ecosystem, identify with it, and use, maintain and contribute to it. The more people are involved in a product, the more vital the product and its open source project become.

These are the right time, the right place and the right people. The same holds for selling a product commercially: it takes a good business opportunity, the ability to build the product, and the right people to make the product popular.

The cloud computing infrastructure of the future must be multidimensional and heterogeneous. There are three dimensions: hardware is heterogeneous, infrastructure is heterogeneous (public cloud and private cloud), and business is heterogeneous too. It must run not only simple stateless services but also stateful tasks, covering online business, offline business and so on.

TKEStack’s vision is that we want to provide users with a one-stop universal infrastructure platform at the multidimensional and heterogeneous level.

TKEStack released version 1.4 in October this year. The main highlights include the application market, support for installing plug-ins on imported clusters, and the gradual introduction of minor-version cluster upgrades, along with fixes for many experience issues reported by users.

In December this year, TKEStack will be upgraded in several important areas, such as the integration of big data components, to keep bringing Tencent Cloud's strongest products into TKEStack. We will integrate an excellent big data suite within this year, so that TKEStack can deploy common big data suites with one click. High availability for clusters is also a hard problem in the industry: a private cloud environment has too many unknowns, which makes this scenario very troublesome.

Next year, TKEStack will continue to move up the stack, for example with application replication, and will at least gain the ability to use all of Tencent Cloud's public cloud products.

Currently, IPv6 is the most requested upgrade for the TKEStack platform, but it has to wait for official IPv6 support. Next year, Tencent's public and hybrid cloud offerings will also gain richer capabilities, such as DNS services reachable from all clusters and more tools for cluster migration, so that migrating an entire K8S cluster or its containers becomes simpler and has a lower threshold. These capabilities address problems common to many users.

Q&A

Q: How does TKE implement service discovery?

A: We currently do no additional service discovery; TKEStack uses the native K8S service discovery mechanism. The next version will soon integrate our own Service Mesh product and go further toward cloud native.

Q: Where do the external demands come from? Are they all from customers?

A: We are open source on GitHub. People in the community raise requirements, and our product colleagues review whether each one is suitable for us to do. If it is, we schedule it by priority.

Q: What does the big data component integrate?

A: There is no integration in the current version, but we will try to include it this year. It will contain commonly used big data and AI components, deployed as containers in one large package. Because there are still intellectual property issues to resolve, we hope to integrate it into TKEStack within this year and deploy it with one click, so that everyone's big data and AI workloads run better on TKEStack.

Q: How is TKEStack's community operations system built?

A: The operations system was just introduced in the talk. To repeat, it relies mainly on internal core developers and on the commercial promotion of the open source product. A few developers will certainly not be enough, so we will attract more people to join. Other teams have built many product capabilities in the course of containerization, and we will bring those in. There are also a number of commercial partners. Capabilities that they build well can be fed back into TKEStack; or they raise requirements for TKEStack and either implement them on their own or submit that part upstream. Commercial partners continue to contribute to TKEStack.

We are also focusing on attracting more external users to build with us. For a project as large as a container platform, we hope more people will get involved and make the product better. Some activities are planned, such as open source bounty events. Through the Open Atom Foundation, for example, we also hope to get more users involved in TKEStack.

Q: How does TKEStack design its own monitoring?

A: Prometheus is currently the de facto standard for cloud monitoring; cloud products and container products are all monitored with Prometheus. We install a Prometheus plug-in for each business cluster. Thanos, a solution popular in the industry, is already used by our internal businesses and will be integrated into TKEStack as a next step. In the future TKEStack will ship Prometheus by default, and where high availability is needed we will provide Thanos as an option.

Q: How many versions of TKE are there?

A: The main ones are TKE, the container service on the public cloud, and the TKE in Tencent's proprietary cloud. In addition, TKEStack is the independently deployed open source version, and derived from that open source version are the commercial versions, TKE Enterprise Edition, TCNP. Private cloud TKE uses the technology and product form of public cloud TKE and relies on IaaS resources in public or private clouds, such as cloud hosts and VPCs. TKEStack is deployed independently on bare metal with open source solutions and does not depend on other resources.

Q: Does TKEStack have to use Kubernetes? What is its relationship with Kubernetes?

A: Yes. TKEStack is an upper-layer encapsulation built on Kubernetes. Used directly, Kubernetes is driven from the command line and can do all kinds of scheduling, but it does not answer how users log in, how applications are managed from a console, or where to look at monitoring. TKEStack wraps Kubernetes to turn it into a truly usable business platform. That is the relationship between Kubernetes and TKEStack.

Q: When will TKE's ConfigMap automatically detect changes and redeploy services without human intervention?

A: TKEStack supports the native K8S ConfigMap function. For the time being only the UI for it has not been built. That will come soon: the version currently under development already includes the UI, and the next version will use the same UI settings as the public cloud.

Q: What is the difference between SmokeTest and ReleaseTest test cases?

A: SmokeTest runs for each submitted PR and mainly checks that the submitted code does not affect the basic core functions of TKEStack. Its test cases also have a time limit, finishing within half an hour to an hour where possible, so it is a simple, core test set. ReleaseTest is the automated testing done before every release. Its purpose is to evaluate the release and guarantee its quality, so its test set covers all aspects.

Q: How is it different from Rancher?

A: Rancher is a very well made and very mature product with a polished experience. TKEStack focuses on making the basic experience usable rather than pursuing every detail, because it is difficult to reach the same level in a short time. More effort goes into heterogeneous clusters, hybrid cloud and plug-ins. The experience still needs continued attention, and you are welcome to raise any issues you run into.

Q: Is TKE service discovery implemented for cross-cluster VPC access?

A: Service discovery across multiple clusters is not supported at present; TKEStack itself does not have this capability. We are planning a hybrid cloud product for early next year. Hybrid cloud is a large project: it has to solve not only service discovery but also networking, differing cluster versions, plug-ins and more. It will probably be completed in the first half of next year.