I. Background
Construction companies build private clouds to provide their R&D departments with secure, reliable, and efficient infrastructure resources, data storage services, DevOps pipelines, and O&M automation services.
II. Challenges
- Operations: manage infrastructure resources efficiently, with monitoring and alerting that is complete and easy to use;
- Product: flexible capacity expansion, with resources available on demand;
- R&D overall: improve resource utilization, improve product delivery efficiency, and reduce costs.
III. Feature Overview
1. Core Strengths
- Task scheduling: schedules tasks across the cluster, automatically placing services on compute nodes according to their resource requirements and the nodes' available resources.
- Resource isolation: isolates management/control nodes from service nodes, so that R&D applications and management services do not affect each other.
- High availability: automatically monitors running services and restarts any that fail based on their status.
- Network connectivity: provides unified IP address assignment and network connectivity.
- Unified orchestration management: orchestrates and manages delivered products uniformly with GitLab and K8s.
- Shared product components: provide unified deployment, verification, authorization, scheduling, and control capabilities for teams, forming the foundation of the private cloud services.
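A minimal sketch of how the strengths above surface in a Kubernetes Deployment. All names, the image tag, and the numbers here are hypothetical illustrations, not our production manifest: resource requests drive task scheduling, `replicas: 2` gives high availability, and the liveness probe triggers automatic restarts.

```python
# Sketch of a K8s Deployment as a Python dict (hypothetical names/values):
# - "resources.requests" is what the scheduler uses to place the pod on a node
#   with enough free capacity (task scheduling).
# - "replicas: 2" keeps two instances running (high availability).
# - "livenessProbe" makes the kubelet restart the container when it fails.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "demo-app", "namespace": "rd-apps"},
    "spec": {
        "replicas": 2,
        "selector": {"matchLabels": {"app": "demo-app"}},
        "template": {
            "metadata": {"labels": {"app": "demo-app"}},
            "spec": {
                "containers": [{
                    "name": "demo-app",
                    "image": "demo-app:1.0",
                    "resources": {
                        "requests": {"cpu": "500m", "memory": "256Mi"},
                        "limits": {"cpu": "1", "memory": "512Mi"},
                    },
                    "livenessProbe": {
                        "httpGet": {"path": "/healthz", "port": 8080},
                        "initialDelaySeconds": 10,
                        "periodSeconds": 15,
                    },
                }],
            },
        },
    },
}
```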
2. Infrastructure Platform (IaaS Cloud)
- Provides virtualization of core computing, network, and storage resources.
- Supports different operating systems, including mainstream Windows and Linux distributions.
- Mainly provides three kinds of services: cloud hosts, cloud networks, and cloud disks.
- Provides a visual web UI.
- Provides K8s cluster (container cloud) planning, deployment, and operation.
- Supports a variety of computing, storage, and network solutions.
- Integrates the Ansible automation tool for O&M.
- Supports both online and offline deployment.
- Provides a visual web UI.
3. Basic Service Platform (PaaS Cloud)
- Provides data storage, application services, DevOps, and O&M management services.
- Data storage: supports common options such as NFS, Ceph RBD, and Local Volume.
- Application services: self-healing and auto-scaling, scheduling and publishing, load balancing, message queuing, container engine, call-chain monitoring, metrics monitoring, log aggregation, service security, API management, resilience and fault tolerance, configuration management, etc.
- DevOps: agile R&D management, continuous delivery pipeline, code scanning, code repository, artifact repository, container registry, load-testing engine, etc.
- O&M management: log monitoring, resource monitoring, and alert management.
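As a sketch of what the data-storage layer provisions, here is an NFS-backed PersistentVolume expressed as a Python dict. The name, server address, export path, and size are hypothetical, not our actual configuration.

```python
# Sketch of an NFS PersistentVolume (all names/addresses are hypothetical).
# ReadWriteMany is the access mode NFS typically enables: many nodes can
# mount the same volume simultaneously.
nfs_pv = {
    "apiVersion": "v1",
    "kind": "PersistentVolume",
    "metadata": {"name": "pv-nfs-demo"},
    "spec": {
        "capacity": {"storage": "10Gi"},
        "accessModes": ["ReadWriteMany"],
        "persistentVolumeReclaimPolicy": "Retain",
        "nfs": {"server": "10.0.4.10", "path": "/export/demo"},
    },
}
```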
IV. Technology Implementation
1. Overall architecture
2. Technology selection
2.1. IaaS Cloud Technology
Technology | Description | Website |
---|---|---|
OpenStack Nova | Cloud hosts (compute) | www.openstack.org/software/re… |
OpenStack Keystone | Identity and authentication | www.openstack.org/software/re… |
OpenStack Glance | Disk images | www.openstack.org/software/re… |
OpenStack Neutron | Cloud networking | www.openstack.org/software/re… |
OpenStack Cinder | Cloud disks (block storage) | www.openstack.org/software/re… |
OpenStack Horizon | Visual dashboard UI | www.openstack.org/software/re… |
KubeOperator | Plan, deploy, and operate K8s clusters | github.com/KubeOperato… |
NFS | Network storage | nfs.sourceforge.net |
Ceph | Distributed storage | ceph.com |
2.2. PaaS Cloud Technology
Technology | Description | Website |
---|---|---|
Kubernetes (K8s) | Container cloud | kubernetes.io |
Helm | K8s package manager | helm.sh |
Docker | Application container engine | www.docker.com |
Weave Scope | K8s visual monitoring tool | www.weave.works |
Spring Cloud | Microservices framework | spring.io/projects/sp… |
Spring Cloud Alibaba | Microservices framework | github.com/alibaba/spr… |
Spring Boot | Container + MVC framework | spring.io/projects/sp… |
Knife4j | API documentation tool | github.com/xiaoymin/sw… |
Elasticsearch | Search engine | github.com/elastic/ela… |
RabbitMQ | Message queue | www.rabbitmq.com |
Redis | Distributed cache | redis.io |
MongoDB | NoSQL database | www.mongodb.com |
Logstash | Application log collection | github.com/logstash/lo… |
Jenkins | DevOps scheduling tool | github.com/jenkinsci/j… |
Prometheus | Resource monitoring system | prometheus.io |
Grafana | Monitoring dashboards | grafana.com |
Harbor | Docker image registry | github.com/goharbor/ha… |
SkyWalking | Integrated solution for distributed tracing, service-mesh telemetry analysis, metrics aggregation, and visualization | skywalking.apache.org |
Kibana | Log visualization dashboards | www.elastic.co/cn/download… |
Fluentd | Container log collection | github.com/kubernetes/… |
GitLab | Code repository | about.gitlab.com |
Nexus3 OSS | Artifact repository | www.sonatype.com |
SonarQube | Static code scanning | www.sonarqube.org |
YouTrack | Agile R&D management | www.jetbrains.com/youtrack |
JMeter | Load-testing engine | jmeter.apache.org |
Kuboard | Microservice management tool | github.com/eip-work/ku… |
3. Practice Process
3.1 OpenStack and CloudStack
As the two major open-source cloud platforms, OpenStack and CloudStack each have their strengths. CloudStack is an open-source product originally from Cloud.com. As a productized platform, CloudStack is mature: easy to install and deploy, with a complete upgrade process for staying in sync with the community. However, as the community version has grown to remain compatible with ever more products, CloudStack keeps getting bigger; for a company building a private cloud, many of its functions are unused and redundant.

OpenStack, meanwhile, has become the de facto standard framework for implementing cloud computing. Its advantage lies in its pluggable architecture: because the framework allows free selection of components, a private cloud deployment can install only the components it needs. That same pluggability has won OpenStack more vendor support and community activity, so an enterprise implementing it has more choices and gets faster responses to the problems it encounters. If the company expects to develop its own components later and to tune the cloud platform (VM I/O, CPU pinning, and so on) rather than rely solely on the community version, OpenStack is the better choice. With CloudStack, secondary development that is not merged into the community version must be re-merged at every upgrade, which is a lot of work; OpenStack extensions can be packaged as plug-ins that remain usable when the OpenStack version is upgraded. The trade-off is OpenStack's high productionization complexity: building it, upgrading it, and doing secondary development on it all require substantial development and testing manpower.
As a non-Internet company, we do not have mature O&M and R&D teams, and development and testing headcount is very tight, so for public-cloud needs we can simply buy cloud services from the major vendors. For the open-source private cloud, I ultimately prefer OpenStack.
3.2. KVM and VMWare
Native OpenStack provides the most complete support for KVM, which is itself a mature virtualization platform: it was merged into the Linux kernel in 2006, and as of Red Hat Enterprise Linux 6, Red Hat's virtualization support shifted from Xen to KVM. VMware is commercial software and, among virtualization platforms, probably the optimal solution in terms of I/O and stability; OpenStack ships a VMware driver and offers mature support for it. The reason for abandoning VMware was its expensive licensing. Currently, we use KVM as the main option.
3.3 CentOS and Ubuntu
The OpenStack community supports Ubuntu well. Ubuntu updates quickly and ships a relatively new kernel, which supports higher KVM versions and thus better performance for OpenStack users. In terms of stability, CentOS is a recompilation of the commercial Red Hat release, with mature stability, system optimization, and compatibility, and a relatively complete testing and release process. Since CentOS 7, it too has moved to a 3.x kernel. Considering system reliability and the team's prior technical accumulation, I still chose the CentOS series, which we find easier to manage than Ubuntu.
3.4. OpenStack & K8s
OpenStack is mainly oriented toward resource allocation: once the VMs are created, it takes no further responsibility for them. Functions such as service high availability, auto-scaling, and monitoring are left to the application side. The K8s container cloud, by contrast, focuses on services, emphasizing service capability, elasticity, and high availability rather than simply providing IT resources. In turn, application services should be refactored along cloud-native lines to take full advantage of the platform capabilities K8s provides. As far as our company is concerned, we still need OpenStack, because many legacy systems are not containerized, so we have to compromise with the real environment.
That said, both OpenStack and K8s have a high technical threshold; the most painful part is facing their architectures.
Don’t believe it? Okay, let me show you an OpenStack architecture diagram:
Let me just use a simple picture to make it a little bit clearer. As follows:
The architecture diagram of K8S is as follows:
It looks a little friendlier than the OpenStack architecture diagram, but it is technically just as complex.
3.5. Difficulties in embracing open source
Cloud computing today is relatively mature technically, in service levels, and commercially. For the vast majority of startups, the infrastructure strategy is necessarily the public cloud, with very little self-built or hosted IDC, so there is no dilemma about whether to move to the cloud. For a company of our size, however, we must consider things more comprehensively: changes to the infrastructure, a smooth business transition, transformation of the O&M model, adjustments to cost control, and many other details.
Although many factors need comparison and consideration, the main ones are personnel capability, technical control, and future development, with the human factor first. Typically an enterprise has business development, application, and infrastructure O&M personnel (system and network administrators, and so on). With an open-source solution, merely installing and using it in development and test environments is no problem; but deploying it to production, and further strengthening the platform (analyzing production problems, integrating internal systems, customizing the front end, tracking community optimizations and upgrades, and so on), all require a strong team that understands both development and infrastructure, and without one we found ourselves unable to do much.
The lower the layer of the technology stack, the higher and more complex the technical threshold, and the more dependent it is on high-end talent. Hardware resource virtualization, for example, requires professionals who understand the kernel, networking, OpenStack, K8s, and distributed storage such as Ceph. Few people are truly good at all of this, and a complete solution needs a whole team, which makes team building harder still. The talent shortage means high labor cost, the hidden cost of the technology investment. And because the barrier to entry is so high, staff turnover carries a higher risk that the existing technology platform becomes uncontrollable. This tends to be the biggest hidden administrative cost.
Of course, we could raise human-resource spending and recruit the best people. But as a business-focused company, we are expected to achieve business innovation and growth rather than extraordinary technical feats (which would be inconsistent with the company's ambitions). That basically means we will not spend unlimited, or even very large, amounts of money on these foundational technologies.
Therefore, looking at future development, a combination of private cloud (existing workloads plus test and development) and public cloud (production plus incremental growth) suits our direction. Open-source and commercial solutions will coexist for a long time; which to choose is determined by each company's technology strategy. For most enterprises, whose technology strategy serves front-end business functions, commercial solutions are the more natural lean.
3.6. Challenges brought by cloud native Architecture
Roughly speaking, we now have three major eras in the way applications are deployed:
- Traditional deployment era: early on, enterprises deployed applications directly on physical machines. Because resource-usage boundaries cannot be defined for applications on a physical machine, it is difficult to allocate computing resources properly: if multiple applications run on the same machine, one may consume most of the resources and starve the others. One solution is to run each application on its own physical machine, but that does not scale: resource utilization is low, and maintaining many physical machines is expensive.
- Virtualization deployment era: to address these problems, virtualization technology emerged. Users can run multiple VMs on a single physical machine. Virtualization separates applications from one another inside VMs, which limits illegal access between applications and provides a degree of security. It improves resource utilization on physical machines, makes it easier to install or update applications, and reduces hardware costs, enabling better scaling. Each VM can be regarded as a complete machine running on top of the virtualized physical machine, with all of a machine's components, including its own operating system.
- Container deployment era: containers are similar to VMs, but with a lower isolation level, sharing the host operating system, so they can be considered lightweight. Like a VM, each container has its own filesystem, CPU, memory, process space, and so on. The resources an application needs are packed into its container and decoupled from the underlying infrastructure, so containerized applications can be deployed across cloud providers and across Linux distributions. Container technology offers an elegant abstraction that balances the flexibility and openness developers want with the standardization and automation operations cares about. Container images quickly became the industry standard for application distribution.
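The density advantage of containers over VMs can be illustrated with rough arithmetic. The overhead figures below are illustrative assumptions, not measurements: a guest OS might cost on the order of a gibibyte per VM, while a container's runtime overhead is a few mebibytes.

```python
def instances_per_host(host_mem_gib, app_mem_gib, per_instance_overhead_gib):
    """How many app instances fit on one host, given per-instance runtime overhead."""
    return int(host_mem_gib // (app_mem_gib + per_instance_overhead_gib))

# Hypothetical numbers: a 64 GiB host running a 2 GiB application.
vm_density = instances_per_host(64, 2, 1.0)        # ~1 GiB guest-OS overhead per VM
container_density = instances_per_host(64, 2, 0.05)  # ~50 MiB runtime overhead per container
# Containers fit noticeably more instances on the same hardware.
```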
As we said earlier, our company is currently transitioning from the “virtualization deployment era” to the “container deployment era,” which brings with it higher requirements:
- Agile application creation and deployment: creating a container is easier and faster than creating a VM.
- Highly automated software delivery: builds can happen faster and more frequently, and deployments can easily be rolled back.
- Separation of development and operations concerns: applications can be built once and deployed to multiple infrastructures, with deployment-time concerns focused on how the infrastructure is provided and used, reducing coupling between development and operations.
- Observability: not only OS-level resource metrics, but also application health status and K8s cluster monitoring information.
- Environment consistency across development, testing, and production: containers in the development environment behave the same as those running in test and production.
- Portability across cloud providers and operating systems: containers run on distributions such as Ubuntu, RHEL, and CentOS, as well as on private clouds and public cloud vendors such as Aliyun.
- Application-centric management: in the VM era one ran an operating system on virtual hardware; in the container era one runs an application on the operating system's logical resources.
- Loosely coupled, distributed, elastic microservices: smaller, independent services that can be deployed and managed dynamically.
- Resource isolation: multi-tenancy support so that application performance is not disturbed.
- Resource utilization: resources must be used efficiently and at high density.
In view of these higher demands, we need to address them one by one in the construction of our private cloud.
3.7. Private Cloud Construction Route
Private cloud construction is rarely done in one step. The initial phase usually meets the most basic requirements, compute and storage virtualization, followed by network virtualization, then containers, monitoring, big data, orchestration, databases, and other shared applications. This mirrors the layering of cloud computing: from IaaS to PaaS to SaaS, one step at a time. R&D cloud construction generally follows the same law, though in practice the phases may overlap. There are usually several phases: phase one is a pilot; phase two builds, on the results of the pilot, a standard platform for new development and applications; phase three gradually moves older applications onto the private cloud.
For private cloud implementation:
- Generally, the private cloud infrastructure is used only by the company's R&D teams and supports a single tenant.
- Generally, four networks are used: a management network, a service network (VXLAN), an out-of-band network, and a storage network.
- Each network uses a different subnet.
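The four-network layout above can be sketched with Python's standard `ipaddress` module. The supernet and the /18 per-network prefixes are hypothetical, not our actual address plan.

```python
import ipaddress

# Carve one hypothetical /16 supernet into four equal, non-overlapping /18s,
# one per network role in the private cloud.
supernet = ipaddress.ip_network("10.20.0.0/16")
mgmt, service, oob, storage = supernet.subnets(new_prefix=18)

networks = {
    "management": mgmt,     # API endpoints, deployment, automation
    "service": service,     # tenant traffic (VXLAN)
    "out-of-band": oob,     # IPMI / hardware management
    "storage": storage,     # Ceph / NFS traffic
}
```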
3.8 Private Cloud Capacity Evaluation
The capacity of a private cloud must be evaluated against the company's business. In general there are web, application, and DB tiers, and the required resources are calculated from the number of services to run on the private cloud. For example, if the web tier needs Nginx and HA and load balancing must be considered, then two or more instances are needed; if multiple businesses share Nginx, multiple Nginx clusters are needed to spread the load.
The OpenStack Ceilometer module is dedicated to metering CPU, memory, and network usage; it can quantify how much resource is saved by using cloud resources compared with a traditional environment. The balance between cost and efficiency is usually hard to reach in the early stage of private cloud construction: equipment purchase, deployment, and staffing are all large expenditures, and the real advantage of a private cloud shows up later, in use.

There is certainly no universal standard for assessing capacity. It is usually considered along three axes: computing power, storage capacity, and network bandwidth, the three most basic core elements of a data center. Cloud computing wins by scale, but how do you decide the scale? Whether it is 20 nodes, 50 nodes, or 100+ nodes depends on your business needs. At small scale, for high availability, if the estimated required capacity is N, plan for 2N. A private cloud rarely reaches public-cloud scale, so you cannot casually absorb the failure of many large VMs; if host-machine resources really do go down, what then?

As for cost, it depends on your plan. We built on open source ourselves. If you cooperate with vendors, think carefully about how the work is divided and what is outsourced. If human resources and technical strength are limited, contract the implementation to a vendor, or the aftermath will be a mess. But the drawback of relying entirely on external vendors is also obvious: if you are not careful, you get locked in, paying every year with no way out, especially once the project has started and cannot be stopped.
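The 2N sizing rule mentioned above can be sketched in a few lines. The node size and workload figures are hypothetical, for illustration only.

```python
import math

def required_nodes(workload_vcpus, vcpus_per_node, ha_factor=2.0):
    """Estimate node count: raw demand rounded up, then the 2N rule for HA headroom."""
    raw = math.ceil(workload_vcpus / vcpus_per_node)
    return math.ceil(raw * ha_factor)

# Hypothetical: 600 vCPUs of demand on 40-vCPU nodes -> 15 raw nodes, 30 with 2N.
nodes = required_nodes(600, 40)
```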
3.9. Full-stack Monitoring
First, understand why monitoring is necessary: without monitoring data there is nothing to discuss on the operations side; without data, how do you measure any indicator? So, during the construction and growth of the private cloud, it is necessary to build a suitable monitoring system as early as possible. Common monitoring categories are:
Category | Main content | Usage scenario |
---|---|---|
Metrics monitoring | Counters, time series | Monitoring and alerting |
Log monitoring | Discrete events, structured/unstructured | Debugging and diagnosis |
Call-chain monitoring | Call chains | Debugging and diagnosis |
Health checks | Liveness probes | Monitoring and alerting |
Alerting system | Alert rules and delivery | Monitoring and alerting |
Metrics monitoring includes core business metrics and resource metrics. Core business metrics rely mainly on instrumentation, which is currently a blank area for us. For resource metrics, the cloud-native open-source monitoring system Prometheus that we integrated covers most requirements. Log monitoring covers the aggregation of application logs and platform logs; the technology here is relatively mature, and we basically integrated the mature ELK/EFK open-source stacks. For health checks, the K8s container cloud provides liveness and readiness probes, and the microservices framework also provides an application monitoring center that makes monitoring easy. An alerting system is a prerequisite of every monitoring system; Prometheus supports various alert rules and delivery methods, such as email and WeChat. Call-chain monitoring, in particular, is a necessary component of a modern microservice system, valuable for distributed tracing, service-mesh telemetry analysis, and metrics aggregation and visualization. Here we chose SkyWalking.
Comparison item | CAT | Zipkin | Pinpoint | SkyWalking |
---|---|---|---|---|
Call-chain visualization | Yes | Yes | Yes | Yes |
Reports | Very rich | Few | Medium | Medium |
Service map | Simple dependency graph | Simple | Good | Good |
Instrumentation | Intrusive | Intrusive | Non-intrusive (bytecode enhancement) | Non-intrusive (bytecode enhancement) |
Performance overhead | High | High | High | Low |
Heartbeat support | Yes | No | Yes | Yes |
Metric support | Yes | No | No | Yes |
Java/.NET support | Yes | Yes | Java only | Yes |
Chinese dashboard support | Good | No | No | Good |
Community support | Good; rich documentation; maintained by the author at Ctrip | Good; average documentation; no Chinese community yet | Average; lacking documentation; no Chinese community | Good; well documented; active Chinese community |
Domestic cases | Ctrip, Dianping, Lufax | JD.com and Alibaba use it but have not open-sourced their versions | None | Many |
Origin | eBay CAL (Centralized Application Logging) | Google Dapper | Google Dapper | Google Dapper |
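Returning to the alerting side for a moment: the alert-rule behavior described earlier, a condition that must hold for a sustained duration before firing, can be sketched in a few lines. This is a toy model of the idea only, not Prometheus's actual implementation, and the sample values are fabricated.

```python
def alert_fires(samples, threshold, for_n):
    """Fire only if the last `for_n` samples all exceed `threshold`.

    Mimics the spirit of a Prometheus alert rule with a `for:` duration:
    a momentary spike does not fire; a sustained breach does.
    """
    recent = samples[-for_n:]
    return len(recent) == for_n and all(v > threshold for v in recent)

cpu_usage = [0.45, 0.62, 0.91, 0.93, 0.95]          # fabricated utilization samples
firing = alert_fires(cpu_usage, threshold=0.9, for_n=3)  # last three all > 0.9
```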
To sum up, the design of the full-stack monitoring stack can be summarized in the following figure:
4.0. K8s cluster scale
In practice, a question that is often asked: should our company choose one K8s cluster or several?
- A single unified platform that supports multiple application workloads, environments, and multi-tenant isolation; or
- a set of small, application-centric clusters that support lifecycle management for different applications and environments.
We can compare the two options:
 | Single large cluster | Multiple small clusters | Note |
---|---|---|---|
Characteristics | Supports multiple application workloads, environments, and multi-tenant isolation | Supports lifecycle management for different applications and environments | |
Hard multi-tenancy (security, strong resource isolation) | Complex | Simple | |
Mixed scheduling of multiple resource types (GPUs, etc.) | Complex | Simple | |
Cluster management complexity | Lower | Higher (self-built) | |
Flexibility of cluster lifecycle | Difficult | Simple (different clusters can have different versions and scaling policies) | |
Complexity introduced by scale | Complex | Simple | |
Node management overhead | Lower | Higher | |
Operating system overhead | Lower | Higher | |
Node scheduling complexity (e.g. NUMA) | Higher | Lower | As deployment density increases, more careful resource scheduling is needed to guarantee application SLAs |
Node stability | Lower | Higher | As deployment density increases, the stability of the node itself decreases |
Node failure blast radius | Larger | Smaller | Failure of a large instance affects more application containers, and more resources must be reserved for downtime migration |
Master component pressure | Larger | Smaller | The number of worker nodes is one of the factors in Master capacity planning and stability; the NodeLease feature introduced in K8s 1.13 greatly reduces the pressure node count puts on Master components |
By default, the kubelet uses CFS quotas to enforce a pod's CPU limits. When many CPU-intensive applications run on a node, workloads may migrate between CPU cores and be affected by CPU cache affinity and scheduling latency. On large-specification instance types with many CPUs, applications such as Java and Golang show noticeably reduced performance when sharing CPUs; for large instances, configure a CPU management policy and allocate resources with CPU sets.
Another important consideration is NUMA support. On a NUMA-enabled physical machine or a large instance, memory-access throughput can be 30% lower than with the optimized configuration if NUMA is not handled properly. The Topology Manager can enable NUMA awareness (kubernetes.io/docs/tasks/…). However, K8s support for NUMA is still relatively basic and cannot fully exploit NUMA's performance.
A typical scenario for the latter is:
- Development and test environments use different clusters;
- different departments use different clusters for isolation;
- different applications use different clusters;
- different K8s versions use different clusters.
The main reason for using multiple small clusters is that the blast radius is relatively small, which effectively improves system availability; in addition, resource isolation can be implemented per cluster. The downside is increased management and O&M complexity.
It is worth mentioning that Kubernetes, derived from Google's Borg, has the vision of becoming a data-center operating system. Kubernetes also provides RBAC, namespaces, and other management capabilities, so multiple users can share a cluster with resource limits. But these are "soft multi-tenancy" capabilities and cannot achieve strong isolation between tenants. As multi-tenant best practices, we can suggest the following:
- Data plane: container isolation can be improved with PodSecurityPolicy (PSP); network isolation between applications with NetworkPolicies; and resource isolation between namespaces by binding nodes to namespaces.
- Control plane: the Kubernetes control plane comprises the master components (API Server, Scheduler, etcd), system add-ons such as CoreDNS and the Ingress Controller, and user extensions such as third-party Custom Resource Definition (CRD) controllers. Most of these components lack good security, resource, and fault isolation between tenants; a faulty CRD controller implementation can hang the cluster's API Server.
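As a minimal sketch of one of the data-plane controls above, here is a default-deny ingress NetworkPolicy for a tenant namespace, expressed as a Python dict. The policy and namespace names are hypothetical.

```python
# Default-deny ingress for everything in a (hypothetical) tenant namespace:
# an empty podSelector matches all pods, and listing "Ingress" in policyTypes
# with no ingress rules means no inbound traffic is allowed.
default_deny = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "NetworkPolicy",
    "metadata": {"name": "default-deny-ingress", "namespace": "tenant-a"},
    "spec": {
        "podSelector": {},
        "policyTypes": ["Ingress"],
    },
}
```

Tenants then add narrower allow policies on top of this baseline, which is the usual pattern for soft multi-tenancy network isolation.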
For now, Kubernetes’ support for hard isolation has many limitations, and the community is actively exploring several directions.
Another consideration is the scalability of Kubernetes itself. As we know, to preserve stability the size of a Kubernetes cluster is limited along multiple dimensions; generally, a cluster stays under 5,000 nodes. Cloud vendors have deep experience running Kubernetes at scale, but for most companies the operational and customization complexity of a very large cluster is unmanageable.
It is also worth mentioning that the Istio service mesh makes it easy to implement unified routing management for applications across multiple K8s clusters.
4.1. Computing Resources in the K8s Cluster
The following comparison can be used when considering computing resources for the K8s cluster:

Comparison item | Physical machine | Traditional virtual machine | Cloud host |
---|---|---|---|
I/O loss | Low | High | Slightly high |
Extra resource cost | Low | High | Medium |
Flexibility | Low | Medium | High |
- Traditional virtualization carries a large I/O loss; for I/O-intensive applications, physical machines perform better than traditional VMs.
- A physical machine carries less overhead (no virtualization management layer or guest OS), so a higher deployment density can be achieved, lowering infrastructure cost.
- On a physical machine, network and storage devices and the software ecosystem can be chosen flexibly.
In general, we recommend:
- For performance-sensitive applications, such as high-performance computing, physical machines are the better choice.
- Cloud hosts support live migration, which reduces O&M costs.
- In our practice, we divide the K8s cluster into a static resource pool and an elastic resource pool. The static pool can use physical machines or cloud host instances as required; for the elastic pool, we recommend cloud host instances of appropriate specifications, matched to application load, to optimize cost, avoid waste, and improve elastic supply.
4.2. Load-testing Engine
For the load-testing engine, we still prefer JMeter, the long-established open-source load-testing tool.
The design of the JMeter container cloud is as follows:
Current problems with conventional JMeter:
- When concurrency exceeds the capacity of a single node, configuration and maintenance in a multi-node environment become complicated.
- Multiple tests cannot run in parallel in the default configuration; the configuration must be changed to start additional processes.
- It is difficult to meet the elastic-scaling requirements of test resources in a cloud environment.
Improvements in the JMeter container cloud:
- One-click installation of load-test execution nodes;
- Multiple projects and tests can share the same test resource pool in parallel (multi-tenancy), improving resource utilization;
- With the K8S HPA, load-test execution nodes are started and released automatically according to the level of concurrency, so capacity expands on demand.
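The automatic scaling relies on the standard HPA algorithm, which Kubernetes documents as `desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue)`. A quick sketch of that calculation (the clamping bounds here are illustrative defaults, not values from our cluster):

```python
import math

def hpa_desired_replicas(current_replicas: int,
                         current_metric: float,
                         target_metric: float,
                         min_replicas: int = 1,
                         max_replicas: int = 10) -> int:
    """Replica count per the documented K8S HPA scaling formula:
    desired = ceil(current * currentMetric / targetMetric),
    clamped to [min_replicas, max_replicas]."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))

# 3 JMeter worker pods at 90% average CPU against a 60% target -> scale to 5.
print(hpa_desired_replicas(3, 90, 60))  # 5
```

When load drops back below the target, the same formula shrinks the worker set, which is what lets idle load-test nodes be released automatically.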
4.3 Other Details
There are many technical details in this process worth expanding on, but space is limited, so I will not list them one by one; given the opportunity, I may elaborate on them in a separate article.
V. Results Showcase
1. Basic Facility Platform (IaaS Cloud)
1.1 Login Authentication
Here we customized the default home page into an Alibaba Cloud-style home page:
1.2 Efficient Management of Physical Resources (Computing, Network, Storage)
Create a cloud host with one click via the visual UI:
Simple editing of cloud networks is supported:
Cloud disks are easy to mount:
1.3 Easy Management of System Images
Custom operating system images can be uploaded with ease:
1.4 Web Desktop
Desktop access via Web VNC:
1.5 Cluster Management Web UI
One-click, foolproof creation of K8S clusters, with support for cluster expansion:
2. Basic Service Platform (PaaS Cloud)
2.1 DevOps pipeline
Kubernetes container orchestration and management capabilities, integrated with DevOps toolchains, microservices, and application frameworks, help R&D teams achieve agile application delivery and automated operations management:
2.2 Load Balancing
The integrated Ingress defines the rule set for inbound connections to cluster services. It provides Layer-7 load balancing capabilities: externally accessible URLs, load balancing, SSL termination, and name-based virtual hosting. As the cluster's traffic access layer, it provides high reliability:
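For illustration, here is a sketch of what name-based virtual-host routing looks like as a `networking.k8s.io/v1` Ingress manifest, built as a plain dict. The host names, Service names, and TLS Secret name are hypothetical placeholders, not our real configuration.

```python
import json

def make_ingress(rules: dict, tls_secret: str) -> dict:
    """Build a minimal Ingress manifest.

    rules maps host name -> backend Service name (port 80 assumed).
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": "rd-gateway"},  # hypothetical name
        "spec": {
            # One TLS block covering all virtual hosts.
            "tls": [{"hosts": list(rules), "secretName": tls_secret}],
            # One rule per name-based virtual host.
            "rules": [
                {"host": host,
                 "http": {"paths": [{
                     "path": "/", "pathType": "Prefix",
                     "backend": {"service": {"name": svc,
                                             "port": {"number": 80}}}}]}}
                for host, svc in rules.items()
            ],
        },
    }

ingress = make_ingress({"git.example.internal": "gitlab",
                        "ci.example.internal": "jenkins"},
                       tls_secret="internal-tls")
print(json.dumps(ingress, indent=2))
```

Each host name maps to a different backend Service, which is exactly the name-based virtual hosting the Ingress layer provides.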
2.3. Code Repository
Git based project management platform. Provide the interface of web version and client version, provide git storage for user space, save some data documents or code of users and other data. An open source distributed version control system for handling version iterations in projects:
2.4 Agile R&D Management
Implements defect and issue tracking; provides efficient ways to plan, visualize, and manage R&D while supporting Scrum and Kanban processes; supports multiple shareable dashboards and task time management:
2.5 Call-Chain Monitoring
Provides an integrated solution for microservice distributed tracing, service mesh telemetry analysis, and metric aggregation and visualization:
2.6 Service Registration
Provides a set of easy-to-use features to quickly implement dynamic service discovery and service health checks:
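A minimal sketch of the idea behind dynamic service discovery with heartbeat-based health checks, in the spirit of registries such as Consul or Nacos. This toy class is illustrative only and is not the actual API of either product.

```python
import time

class ServiceRegistry:
    """In-memory toy registry: instances expire if heartbeats stop."""

    def __init__(self, ttl_seconds: float = 10.0):
        self.ttl = ttl_seconds
        self._instances = {}  # (service, addr) -> last heartbeat time

    def register(self, service: str, addr: str) -> None:
        self._instances[(service, addr)] = time.monotonic()

    def heartbeat(self, service: str, addr: str) -> None:
        # A heartbeat simply refreshes the instance's TTL.
        self.register(service, addr)

    def discover(self, service: str):
        """Return addresses whose last heartbeat is within the TTL."""
        now = time.monotonic()
        return sorted(addr for (svc, addr), seen in self._instances.items()
                      if svc == service and now - seen <= self.ttl)

registry = ServiceRegistry(ttl_seconds=10.0)
registry.register("order-api", "10.0.0.5:8080")
registry.register("order-api", "10.0.0.6:8080")
print(registry.discover("order-api"))
```

Callers ask the registry for live instances at request time instead of hard-coding addresses, which is what makes the service topology dynamic.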
2.7 Service Monitoring
Spring Boot applications use the Actuator to expose metrics at runtime; Spring Boot Admin monitors the applications through these metrics and displays them in a GUI:
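As a sketch, this is roughly how a dashboard could reduce the JSON exposed by Actuator's `/actuator/health` endpoint to a single status line. The payload below follows the typical Actuator response shape; the aggregation logic is our own simplified illustration, not Spring Boot Admin's implementation.

```python
import json

def summarize_health(payload: str) -> str:
    """Collapse an Actuator-style health JSON payload to one line."""
    health = json.loads(payload)
    components = health.get("components", {})
    # Collect any component whose status is not UP.
    down = [name for name, c in components.items()
            if c.get("status") != "UP"]
    if health.get("status") == "UP" and not down:
        return "UP"
    return ("DOWN: " + ", ".join(sorted(down))) if down \
        else health.get("status", "UNKNOWN")

sample = json.dumps({
    "status": "DOWN",
    "components": {
        "db": {"status": "UP"},
        "redis": {"status": "DOWN"},
    },
})
print(summarize_health(sample))  # DOWN: redis
```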
2.8 Log Aggregation
ELK (Elasticsearch, Logstash, and Kibana) is combined to build the log aggregation system:
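The key step in such a pipeline is the one Logstash performs before shipping an event to Elasticsearch: a grok-style pattern turns a raw application log line into structured fields that Kibana can filter on. The sketch below mimics that with a plain regex; the log format shown is an assumed example, not a fixed standard.

```python
import re

# Grok-style pattern for lines like:
# "2023-04-01 12:00:01 ERROR com.example.OrderService - message text"
LOG_PATTERN = re.compile(
    r"(?P<timestamp>\S+ \S+)\s+"   # date and time
    r"(?P<level>[A-Z]+)\s+"        # log level
    r"(?P<logger>\S+)\s+-\s+"      # logger name
    r"(?P<message>.*)"             # free-form message
)

def parse_log_line(line: str) -> dict:
    """Return structured fields, or the raw line if it doesn't match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else {"message": line}

event = parse_log_line(
    "2023-04-01 12:00:01 ERROR com.example.OrderService - timeout calling inventory")
print(event["level"], event["logger"])
```

Once every service's lines are parsed into the same field names, cross-service queries ("all ERROR events in the last hour") become trivial in Kibana.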
2.9 API Documentation
Swagger documentation from the microservices is aggregated, with support for online interface debugging:
2.10 Configuration Management
A dynamic configuration service that enables centralized, externalized, and dynamic management of application and service configurations across all environments:
2.11 Artifact Repository
Provides a Maven private registry and a binary artifact repository:
2.12 Container Registry
An enterprise-class registry for storing, managing, and distributing Docker images:
2.13 Static Code Scanning
A code quality analysis platform that makes it easy to manage code quality, detecting vulnerabilities and potential logic problems in project code. It also provides rich plug-ins that support analysis in multiple languages:
2.14 Visual Microservice Management
Visual UI management of applications and components lowers the barrier to using K8S containers in the cloud:
Containerized applications can be edited online via the Web:
Application logs are easy to view:
Configuration files can be edited online:
2.15 Load-Testing Engine
Load-test results are collected on a visual dashboard:
2.16 Resource Monitoring
K8S cluster resource monitoring:
Server resource monitoring:
Relational database resource monitoring:
Message queue resource monitoring:
NoSQL database resource monitoring:
K8S cluster database resource monitoring:
K8S cluster core component resource monitoring:
2.17 Visual Service Monitoring
Application service topology display:
2.18 Custom Alarms
Real-time alarms via DingTalk:
VI. Experience and Insights
- Build on open source and embrace open source;
- Prefer mature third-party platforms;
- Do not reinvent the wheel;
- The technology stack and components should match the R&D team's technical capabilities and mainstream technical direction;
- Follow the KISS principle (Keep It Simple, Stupid);
- Always consider ROI (return on investment);
- Evolve with the company's business development; big-company solutions are not necessarily appropriate, so avoid technology for technology's sake;
- No scheme needs to be the most comprehensive and perfect from day one; improve and evolve continuously in practice;
- Be demand-oriented: rely on open source, select scientifically, integrate rapidly, emphasize extensibility, and keep evolving and improving.
Vii. Future Outlook
- In the container age, K8S is not the only thing to watch. For infrastructure within the enterprise, "upward" and "downward" integration and compatibility are equally critical. "Upward" means serving users in their business scenarios: containers do not serve the business directly, so application deployment, service governance, and scheduling are also involved. "Downward" means combining containers with the infrastructure beneath them, where compatible resource types, stronger isolation, and higher resource utilization efficiency are the key issues.
- Cloud-native application management: we have already implemented cloud-native application management for projects in the production environment. Next we need to further expand the coverage of cloud-native applications and continuously improve R&D efficiency.
- Cloud-native architecture adoption: promote the adoption of the cloud-native system across middleware, storage systems, big data, and core business areas.