This article is available on GitHub; stars are welcome.
Java server development is such a broad field that it would take more than a few books to cover it. This article surveys many techniques and tools from a bird’s-eye view, without going into detail.
Table of contents
- Frameworks
  - Spring Boot
  - Vert.x
- Networking
  - The five-layer model
  - HTTP
  - TCP congestion control
  - Network I/O models
- Databases
  - Relational databases
  - Storage engines
  - NewSQL
  - NoSQL databases
  - Time-series databases
  - Columnar databases
  - Embedded databases
- Middleware
  - Web servers
  - Distributed caches
  - KV stores
  - Message queues
  - Scheduled tasks
  - RPC
  - Database middleware
  - Logging
  - Configuration center
- Microservices
  - Service registration and discovery
  - Circuit breaking and degradation
  - Distributed tracing/APM
  - API gateways
  - Service mesh
- Common open-source components
  - Data access
  - Utility libraries
  - Caching
  - Bytecode manipulation
  - HTTP clients
  - Reactive programming
  - Serialization
  - Distributed transactions
  - Event-driven frameworks
  - Rule engines
  - Testing
- Programming ideas
  - Principles
- Conclusion
Frameworks
Spring Boot
Spring Framework has become the de facto standard for Java server-side development. Countless services are built on it, and it integrates most of the components that server development needs. Spring Boot builds on it with auto-configuration and embedded servers, so a service can be up and running with very little setup.
Vert.x
The Spring framework has long been mainstream, but there are other good frameworks out there.
Vert.x is a toolkit for building reactive applications on the JVM in multiple languages. Beyond the core toolkit, it includes a Netty-based web framework, gRPC support, Redis clients and more, covering most of the components needed to develop web applications. Its central idea is an event-driven, non-blocking model, which makes it highly scalable. It follows the reactive programming model, a topic I’ll revisit later.
Networking
The five-layer model
When learning computer networking, it is common to take a compromise between the OSI and TCP/IP models and use a five-layer architecture: the physical, data link, network, transport and application layers. Each layer has its own terminology, such as throughput, subnet masks, VIPs and DNS, and these terms come up constantly in day-to-day work. To do server-side programming well, you need a clear grasp of basic networking concepts. I recommend reading Computer Networking: A Top-Down Approach.
Recommended Reading:
- Five layer protocol
- Computer network knowledge summary
HTTP
For server-side programming, the most important part of networking is HTTP. We need to know how the whole process works, from DNS resolution and the TCP connection all the way to the browser receiving the response. Adding CDNs, reverse proxies, traffic control and other services in between makes it more complicated, but thanks to the layered model of the network, we can also optimize the server’s response performance at these intermediate stages.
HTTP runs on top of TCP, and adding TLS (or its predecessor, SSL) in between gives HTTPS. Once you understand how headers are parsed and how the response body is sent, you can easily write a simple HTTP service. The protocol itself keeps improving: HTTP/2, and more recently HTTP/3, bring significant improvements in transport performance.
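As a rough illustration of how little is needed, here is a minimal sketch built only on the JDK’s bundled com.sun.net.httpserver package (module jdk.httpserver) rather than on any framework. The /hello path, the response text and port 8080 are made up for the example; the handler sets the status code and Content-Length, then writes the body.

```java
import com.sun.net.httpserver.HttpServer;

import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

class HelloHttpServer {
    // Binds to the given port (0 lets the OS choose a free one) and serves the hypothetical /hello path.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/hello", exchange -> {
            byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
            // Status line and headers are written first; the length becomes Content-Length.
            exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            // try-with-resources closes the response body stream once the bytes are written.
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start(); // with no executor set, handlers run on the server's own thread
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = start(8080);
        System.out.println("Try: curl http://localhost:" + server.getAddress().getPort() + "/hello");
    }
}
```

Calling start(0) lets the operating system pick a free port, which is convenient in tests.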
HTTP is transmitted in plain text, so it is vulnerable to man-in-the-middle attacks: traffic can be intercepted at many points along the way, such as routers and proxies. HTTP will therefore eventually fade away, and HTTPS will inevitably become the mainstream. Even so, many websites still use insecure SSL certificates, so many applications add their own encryption on top to better protect data in transit.
TCP congestion control
TCP uses a variety of congestion control strategies to keep the path between sender and receiver from becoming congested. There are many concrete algorithms, whose implementation details are hidden in the operating system kernel, and choosing a different algorithm can yield the best performance for a given scenario. For example, Google designed and published the BBR (Bottleneck Bandwidth and Round-trip propagation time) congestion control algorithm, which makes more efficient use of the network and brings especially large gains on long-distance links. It has been available in the Linux kernel since version 4.9.
Since many of these networking algorithms are hidden in the operating system kernel, the average computer user never needs to understand them, but a server developer who does can find ways to improve system throughput at this layer.
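BBR is too involved to sketch in a few lines, but the classic loss-based behavior it improves on can be simulated. The sketch below is a simplified, Reno-style model (an assumption for illustration, not kernel code and not BBR): the congestion window cwnd, counted in segments, doubles every round trip during slow start, grows by one segment per round trip once it reaches the threshold ssthresh, and is halved when a loss is detected. The loss schedule and the initial ssthresh are arbitrary parameters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class RenoCwndSimulation {
    // Returns cwnd (in segments) at the start of each round trip.
    static List<Integer> simulate(int rounds, int initialSsthresh, Set<Integer> lossRounds) {
        List<Integer> history = new ArrayList<>();
        int cwnd = 1;
        int ssthresh = initialSsthresh;
        for (int round = 0; round < rounds; round++) {
            history.add(cwnd);
            if (lossRounds.contains(round)) {
                // Loss detected (e.g. via triple duplicate ACKs): multiplicative decrease.
                ssthresh = Math.max(cwnd / 2, 2);
                cwnd = ssthresh;
            } else if (cwnd < ssthresh) {
                // Slow start: exponential growth, capped at ssthresh in this simplified model.
                cwnd = Math.min(cwnd * 2, ssthresh);
            } else {
                // Congestion avoidance: additive increase of one segment per round trip.
                cwnd += 1;
            }
        }
        return history;
    }

    public static void main(String[] args) {
        System.out.println(simulate(12, 16, Set.of(8)));
        // [1, 2, 4, 8, 16, 17, 18, 19, 20, 10, 11, 12]
    }
}
```

The resulting sawtooth (fast ramp, linear climb, sharp drop on loss) is one reason loss-based algorithms can underuse long links; BBR instead estimates bottleneck bandwidth and round-trip time.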
Network I/O models
Common I/O models include BIO (blocking I/O), NIO (non-blocking I/O), I/O multiplexing, event-driven I/O and AIO (asynchronous I/O). Take reading data as an example. With traditional BIO, calling the socket’s read method blocks until data arrives. With NIO, read returns the data if there is any and returns zero immediately if there is none, so the call never blocks. AIO goes one step further: not only does it avoid blocking while waiting for data to become ready, but even copying the data from kernel buffers into the application’s memory happens asynchronously.
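In Java, non-blocking mode is switched on per channel with configureBlocking(false). The sketch below uses a java.nio.channels.Pipe as a stand-in for a network connection so that it runs without any sockets; a SocketChannel in non-blocking mode follows the same read() contract (bytes read, 0 when nothing is available yet, -1 at end of stream).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.charset.StandardCharsets;

class NonBlockingReadDemo {
    // Returns {bytes read before any write, bytes read after a write}.
    static int[] demo() throws IOException {
        // A Pipe stands in for a socket; a non-blocking SocketChannel follows the same read() contract.
        Pipe pipe = Pipe.open();
        try {
            pipe.source().configureBlocking(false);
            ByteBuffer buffer = ByteBuffer.allocate(64);

            // Nothing written yet: read() returns 0 at once instead of parking the thread.
            int before = pipe.source().read(buffer);

            pipe.sink().write(StandardCharsets.US_ASCII.encode("ping"));

            // Data is waiting now, so read() copies it into the buffer and returns the byte count.
            int after = pipe.source().read(buffer);
            return new int[] {before, after};
        } finally {
            pipe.sink().close();
            pipe.source().close();
        }
    }

    public static void main(String[] args) throws IOException {
        int[] counts = demo();
        System.out.println("before write: " + counts[0] + ", after write: " + counts[1]);
    }
}
```

In blocking mode (the default), the first read() would instead have parked the calling thread until data arrived.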
Combining NIO, AIO and I/O multiplexing removes the thread-per-connection bottleneck and makes it possible to handle massive numbers of connections. Nginx, for example, uses an event-driven, non-blocking model built on I/O multiplexing (epoll on Linux) and therefore handles high concurrency better than Apache HTTP Server’s traditional process- or thread-per-connection model. In the Java world, Netty is an asynchronous, event-driven NIO framework based on the Reactor pattern. It is used widely across the Internet industry, for example in big data, telecommunications and gaming, and in open-source components such as Redis clients and web frameworks.
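I/O multiplexing is what allows a single thread to watch many connections, and it is the building block of the Reactor pattern that Netty implements. The following sketch uses the JDK’s java.nio.channels.Selector directly; pipes again stand in for sockets, and the three channels and their messages are arbitrary. On Linux the JDK’s default selector is typically backed by epoll.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class SelectorDemo {
    // One select() pass: returns the messages read from every channel that was ready.
    static List<String> pollOnce(Selector selector) throws IOException {
        List<String> received = new ArrayList<>();
        selector.select(); // blocks until at least one registered channel is readable
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove(); // selected keys must be removed by hand or they are reported again
            if (key.isReadable()) {
                ByteBuffer buf = ByteBuffer.allocate(64);
                int n = ((ReadableByteChannel) key.channel()).read(buf);
                if (n > 0) {
                    buf.flip();
                    received.add(StandardCharsets.UTF_8.decode(buf).toString());
                }
            }
        }
        return received;
    }

    // Three "connections" watched by one thread; only two of them send data.
    static List<String> demo() throws IOException {
        try (Selector selector = Selector.open()) {
            List<Pipe> pipes = new ArrayList<>();
            for (int i = 0; i < 3; i++) {
                Pipe pipe = Pipe.open();
                pipe.source().configureBlocking(false); // required before register()
                pipe.source().register(selector, SelectionKey.OP_READ);
                pipes.add(pipe);
            }
            pipes.get(0).sink().write(StandardCharsets.UTF_8.encode("from-0"));
            pipes.get(2).sink().write(StandardCharsets.UTF_8.encode("from-2"));
            List<String> received = pollOnce(selector);
            for (Pipe p : pipes) {
                p.sink().close();
                p.source().close();
            }
            return received;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo()); // both messages, in no guaranteed order
    }
}
```

A Reactor-style server runs this select-and-dispatch step in a loop, handing each ready key to a handler; Netty adds pools of such event loops, buffer management and a handler pipeline on top.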
Databases
Relational databases
MySQL is the most popular open-source database, PostgreSQL bills itself as the most advanced open-source database, SQL Server is Microsoft’s enterprise database, and Oracle is widely used by large companies. MySQL has the highest market share in server-side development, but it is also worth learning PostgreSQL and the so-called “enterprise databases”; next to them, MySQL can sometimes seem basic and limited.
In real work, database design is a balancing act. Sometimes we have to add redundant data to optimize query performance, while with large data volumes we must choose each column’s storage type carefully to avoid wasting space.
When the data volume is very large, we usually need to shard, splitting tables across multiple tables or databases.
- ShardingSphere
- Currently the mainstream sharding middleware in the Java world. It supports a client-side architecture and a proxy architecture, while a sidecar architecture is still under development.
- Vitess
- Vitess is YouTube’s open-source MySQL clustering system, built on a centralized database proxy architecture. It serves YouTube’s enormous data volume and request traffic.
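Both products hide routing behind SQL, but at the core of table sharding is a function that maps a sharding key to a physical table. A minimal sketch, assuming a hypothetical logical table user split into four physical tables user_0 to user_3 by the key modulo the shard count (ShardingSphere’s inline sharding expressions describe a similar rule, but this code is not its API):

```java
class ShardRouter {
    private final String logicalTable;
    private final int shardCount;

    ShardRouter(String logicalTable, int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.logicalTable = logicalTable;
        this.shardCount = shardCount;
    }

    // Maps a sharding key (e.g. a user ID) to a physical table name.
    String route(long shardingKey) {
        // floorMod keeps the index non-negative even for negative keys.
        long index = Math.floorMod(shardingKey, (long) shardCount);
        return logicalTable + "_" + index;
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter("user", 4);
        System.out.println(router.route(10)); // user_2
        System.out.println(router.route(-3)); // user_1
    }
}
```

Plain modulo makes resharding expensive, because changing the shard count moves most rows; range-based or consistent-hashing schemes are common alternatives.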
Storage engines
MySQL mainly uses the InnoDB storage engine, which uses a B+ tree index structure. Percona XtraDB is an enhanced version of InnoDB; Percona Server is compatible with MySQL, claims better performance, and has a certain market share of its own.
Besides InnoDB and its derivatives, RocksDB is another option: an LSM-tree (log-structured merge-tree) storage engine. Unlike traditional B+ tree engines, LSM-based storage is especially suited to write-heavy, read-light workloads, and since it was originally designed for persistent key-value storage, it performs very well as a KV store. Unfortunately, official MySQL cannot use RocksDB as a storage engine; currently it is supported by MariaDB and Percona Server (as MyRocks).
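To see why an LSM tree favors writes, here is a toy in-memory model of the idea (an illustration only, not how RocksDB is implemented): writes go into a sorted memtable, which is frozen into an immutable sorted run once it reaches a size limit, and reads check the memtable first and then the runs from newest to oldest. Real engines add a write-ahead log, on-disk SSTables, Bloom filters and background compaction, all omitted here; memtableLimit is an arbitrary parameter.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.NavigableMap;
import java.util.Optional;
import java.util.TreeMap;

class ToyLsmStore {
    // Deletions are recorded as writes of this marker; compared by identity so no real value collides.
    private static final String TOMBSTONE = new String("<tombstone>");

    private final int memtableLimit;
    private NavigableMap<String, String> memtable = new TreeMap<>();
    // Immutable sorted runs, newest first (stand-ins for on-disk SSTables).
    private final Deque<NavigableMap<String, String>> runs = new ArrayDeque<>();

    ToyLsmStore(int memtableLimit) {
        this.memtableLimit = memtableLimit;
    }

    // Writes only touch the in-memory table; flushed data is never updated in place.
    void put(String key, String value) {
        memtable.put(key, value);
        if (memtable.size() >= memtableLimit) {
            runs.addFirst(Collections.unmodifiableNavigableMap(memtable)); // "flush"
            memtable = new TreeMap<>();
        }
    }

    void delete(String key) {
        put(key, TOMBSTONE);
    }

    // Reads may consult several places: the memtable first, then runs from newest to oldest.
    Optional<String> get(String key) {
        String value = memtable.get(key);
        if (value == null) {
            for (NavigableMap<String, String> run : runs) {
                value = run.get(key);
                if (value != null) {
                    break;
                }
            }
        }
        return (value == null || value == TOMBSTONE) ? Optional.empty() : Optional.of(value);
    }

    int runCount() {
        return runs.size();
    }

    public static void main(String[] args) {
        ToyLsmStore store = new ToyLsmStore(2);
        store.put("a", "1");
        store.put("b", "2"); // memtable full: flushed as the first run
        store.put("a", "3"); // newer value in the memtable shadows the flushed "1"
        System.out.println(store.get("a").orElse("missing") + ", runs=" + store.runCount()); // 3, runs=1
    }
}
```

Writes stay cheap because nothing already flushed is rewritten, while reads may have to check several runs; compaction in a real engine exists to keep that read cost bounded.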
NewSQL
RocksDB is also heavily used as a storage engine in the emerging NewSQL field: TiDB, a popular NewSQL database, uses it (through its storage layer, TiKV) for data persistence.
NoSQL databases
- MongoDB
- MongoDB sits between relational and non-relational databases. It does not require a fixed schema and can store very large data sets.
Time-series databases
With the growth of the Internet, application scenarios keep multiplying. Collecting system running state and system metrics, for example, produces large amounts of time-stamped data. Such data is written far more often than it is read and arrives in huge volumes, which traditional databases handle poorly, and so time-series databases were born.
The main time-series databases are:
- influxdb
- Prometheus
- graphite
Columnar databases
Traditional relational databases store data by row, while the big data field uses column storage. The main advantage of column storage is that queries read only the columns they need, and it is better suited to parallel processing and data compression. Relational databases are good at OLTP and columnar databases at OLAP; beyond that, open-source products such as Kudu and Druid combine the advantages of column storage with optimizations for fast inserts, updates and real-time queries.
The main columnar databases are:
- HBase
- Cassandra
- kudu
- Druid
Embedded databases
Traditional relational databases can support enterprise-grade applications, but in many scenarios we only need something small, and an embedded database is a convenient choice. Embedded databases are also ideal for unit testing.
Popular embedded databases in Java are:
- h2database
- moby
Middleware
Web servers
- Nginx
- Nginx achieves high concurrency with an event-driven, asynchronous non-blocking model, whereas Apache traditionally ties up a process or thread for each request.
- The event-driven model suits I/O-bound services, while multiple processes or threads suit CPU-bound ones. Because most web services are I/O-bound, Nginx has gradually overtaken Apache in market share. The same property makes Nginx well suited to reverse proxying, and load balancing through it is a very mainstream solution.
- Tomcat, Jetty, WebLogic and other traditional Java Web servers
- With the rise of containerization, standalone servers like these have become less popular and their market share has gradually declined. In containerized deployments, Tomcat is usually embedded in the application, which lets developers focus on the business code rather than on the details of the server.
- OpenResty
- Great open source products often produce great derivative products, such as Percona for MySQL, OpenResty for Nginx, and Kong for OpenResty.
- Nginx has a high market share, but in many scenarios it is used only as a reverse proxy. OpenResty is designed to run web services directly inside Nginx.
- OpenResty is a web platform built on LuaJIT. Developers can easily use Lua to call Nginx modules, which makes it highly extensible. For example, the typical Nginx + Tomcat + MySQL architecture can be replaced with Nginx + Lua + Redis + Tomcat + MySQL.
- Kong is also technically a Web Server, but is generally used as an API gateway, more on this below.
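Weighted round-robin is Nginx’s default way of spreading requests across an upstream group. The sketch below reimplements, in Java, the “smooth” weighted round-robin idea used by Nginx’s upstream module, to show how weights turn into an interleaved sequence rather than bursts to one server. It is an illustration rather than Nginx code, and the server names and weights are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class SmoothWeightedRoundRobin {
    private static final class Peer {
        final String name;
        final int weight;
        int currentWeight;

        Peer(String name, int weight) {
            this.name = name;
            this.weight = weight;
        }
    }

    private final List<Peer> peers = new ArrayList<>();
    private int totalWeight;

    synchronized void add(String name, int weight) {
        if (weight <= 0) {
            throw new IllegalArgumentException("weight must be positive");
        }
        peers.add(new Peer(name, weight));
        totalWeight += weight;
    }

    // Each pick: every peer gains its weight, the largest current weight wins,
    // and the winner pays back the total weight so the others can catch up.
    synchronized String next() {
        if (peers.isEmpty()) {
            throw new IllegalStateException("no peers");
        }
        Peer best = null;
        for (Peer p : peers) {
            p.currentWeight += p.weight;
            if (best == null || p.currentWeight > best.currentWeight) {
                best = p;
            }
        }
        best.currentWeight -= totalWeight;
        return best.name;
    }

    public static void main(String[] args) {
        SmoothWeightedRoundRobin lb = new SmoothWeightedRoundRobin();
        lb.add("a", 5);
        lb.add("b", 1);
        lb.add("c", 1);
        List<String> picks = new ArrayList<>();
        for (int i = 0; i < 7; i++) {
            picks.add(lb.next());
        }
        System.out.println(picks); // [a, a, b, a, c, a, a]
    }
}
```

With weights 5, 1 and 1, server a gets five of every seven requests, while b and c are spread through the cycle instead of a receiving five requests in a single burst.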
Distributed caches
- Redis
- As a high-performance in-memory database, Redis is widely used. It supports a variety of data structures, and choosing the right one for each scenario lets you use it most effectively.
KV stores
- Pika
- Redis performs very well, but persisting data is a problem when it is used as a database, and Pika was built to solve it. Pika is based on RocksDB, with some modifications to its source code, so it persists KV data with very high performance and is only slightly slower than memory-based Redis. It is compatible with most of the Redis protocol, so using it feels almost the same as using Redis.
- Tair
- Tair is similar to Pika; it supports multiple storage engines, including MDB, RDB and LDB. LDB is based on LevelDB (open-sourced by Google; RocksDB is an optimized fork of it). Tair combines in-memory storage and persistence with a highly available distributed architecture. The open-source version is no longer maintained, and Alibaba Cloud offers Tair as an enterprise-grade storage service.
- SSDB
- SSDB is another Redis-compatible KV database, but it is now updated infrequently. Pika, by contrast, is still actively developed and has corporate backing.
Message queues
Message queues are used in many scenarios, such as peak shaving, decoupling systems, and publish/subscribe communication between systems. Beyond solving these problems, a message-driven architecture also improves scalability: a new feature can be added by attaching a new subscriber, without any intrusive change to the existing system.
Commonly used message queues such as Kafka and RabbitMQ each have their own strengths and weaknesses. Kafka dominates the big data field and has a high overall market share, while RabbitMQ is widely used because of its maturity.
Using message queues involves many details: defining message handlers, sending messages, publishing events, designing message serialization, recording messages, routing, retrying failed messages, message IDs, and more. Handling all of this while coding pulls attention away from the business code, so some ESB products deal with these details internally and expose a cleaner, higher-level API, letting development focus on the business logic itself. When a system runs into these problems, choosing an ESB product is a good way to improve development efficiency.
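To make these details concrete, here is a deliberately small in-process sketch; it is not the API of Kafka, RabbitMQ or any ESB product, and the topic name, retry limit and dead-letter handling are assumptions. Publishers only know the topic, subscribers can be added without touching them, and a handler that keeps failing is retried a bounded number of times before the message is parked as a dead letter.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

class InProcessBus {
    // A message that exhausted its retries, kept for inspection or manual replay.
    record DeadLetter(String topic, String payload, RuntimeException cause) {}

    private final Map<String, List<Consumer<String>>> handlers = new ConcurrentHashMap<>();
    private final List<DeadLetter> deadLetters = new CopyOnWriteArrayList<>();
    private final int maxAttempts;

    InProcessBus(int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
    }

    // New subscribers can be added without touching any publisher.
    void subscribe(String topic, Consumer<String> handler) {
        handlers.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(handler);
    }

    // Delivers synchronously to every handler; one failing handler does not affect the others.
    void publish(String topic, String payload) {
        for (Consumer<String> handler : handlers.getOrDefault(topic, List.of())) {
            deliver(topic, payload, handler);
        }
    }

    private void deliver(String topic, String payload, Consumer<String> handler) {
        for (int attempt = 1; ; attempt++) {
            try {
                handler.accept(payload);
                return;
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts) {
                    deadLetters.add(new DeadLetter(topic, payload, e));
                    return;
                }
                // A real system would back off here before retrying.
            }
        }
    }

    List<DeadLetter> deadLetters() {
        return List.copyOf(deadLetters);
    }

    public static void main(String[] args) {
        InProcessBus bus = new InProcessBus(3);
        bus.subscribe("order.created", p -> System.out.println("audit: " + p));
        bus.subscribe("order.created", p -> { throw new IllegalStateException("mail server down"); });
        bus.publish("order.created", "order-42");
        System.out.println("dead letters: " + bus.deadLetters().size()); // 1
    }
}
```

A real broker adds persistence, acknowledgements, asynchronous delivery and backoff between retries; the point here is only how handler registration, routing by topic and a retry policy fit together.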
Scheduled tasks
Simple scheduled tasks can be configured with Linux cron. In complex scenarios, a distributed task scheduling framework can be used. There are many ways to implement scheduled tasks:
- Quartz
- A veteran task scheduling library; many distributed task scheduling frameworks are built on top of it.
- Spring Scheduler
- Very convenient for simple scheduled tasks. Note, however, that most systems today are deployed as multiple instances, so when using it for scheduling it is best to put the tasks in a separate service and avoid coupling them with other systems.
- Distributed task scheduling systems from China
- Elastic Job and XXL-Job are currently popular. Elastic Job uses a decentralized architecture and relies on ZooKeeper to store scheduling data, while XXL-Job uses a centralized scheduling architecture and triggers tasks via RPC.
- PowerJob is a newer open-source task scheduling system with more powerful features, including MapReduce-style sharded processing, and is worth watching.
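For the simplest single-node case, the JDK is already enough. The sketch below uses ScheduledExecutorService.scheduleAtFixedRate; the 100 ms period, the run duration and the task body are placeholders. Like Spring’s @Scheduled, it has no cluster coordination, so every deployed instance would run the task, which is one reason the distributed schedulers above exist.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class FixedRateJob {
    // Runs a placeholder task every periodMillis for roughly durationMillis; returns the run count.
    static int runFor(long periodMillis, long durationMillis) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        ScheduledFuture<?> handle = scheduler.scheduleAtFixedRate(() -> {
            // If one run throws, all later runs are silently suppressed, so guard the body.
            try {
                System.out.println("tick " + runs.incrementAndGet());
            } catch (RuntimeException e) {
                e.printStackTrace();
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);

        Thread.sleep(durationMillis);
        handle.cancel(false); // let a run in progress finish
        scheduler.shutdown();
        scheduler.awaitTermination(1, TimeUnit.SECONDS);
        return runs.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("ran " + runFor(100, 350) + " times");
    }
}
```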
RPC
When it comes to RPC, we have to mention the fading Web Services, which use XML as the message format wrapped in the SOAP protocol. Because of their complexity and high performance overhead, they have gradually been replaced by REST services using JSON. REST is simpler and uses a lighter serialization format, so many systems today use HTTP for remote procedure calls.
When performance requirements are high, or for architectural reasons, teams choose dedicated RPC frameworks such as Dubbo, gRPC and Thrift, which generally use more efficient communication protocols and data formats. gRPC is often considered the best performer among them.
An RPC framework works much like an HTTP call, except that it uses a more compact protocol header and serialization format, and it adds dedicated support for service registration, discovery and load balancing. In Spring Cloud, OpenFeign is a very convenient option for inter-service calls; it uses HTTP, and if its performance is not sufficient, you can switch to a different serialization format or to gRPC.
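To illustrate what “a more compact protocol header” can mean, here is a hypothetical binary frame, not the wire format of Dubbo, gRPC or Thrift: a 2-byte magic number, a 1-byte version, an 8-byte request ID for matching responses to requests, and a 4-byte body length, followed by the already-serialized body. The layout, the magic value and the sample body string are assumptions for the sketch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class RpcFrame {
    static final short MAGIC = (short) 0xCAFE; // hypothetical marker that rejects foreign traffic early
    static final byte VERSION = 1;
    static final int HEADER_SIZE = 2 + 1 + 8 + 4; // magic + version + requestId + bodyLength

    final long requestId;
    final byte[] body;

    RpcFrame(long requestId, byte[] body) {
        this.requestId = requestId;
        this.body = body;
    }

    // Writes header then body; ByteBuffer is big-endian (network byte order) by default.
    ByteBuffer encode() {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE + body.length);
        buf.putShort(MAGIC).put(VERSION).putLong(requestId).putInt(body.length).put(body);
        return buf.flip();
    }

    // Reads one frame, validating the header before trusting the length field.
    static RpcFrame decode(ByteBuffer buf) {
        if (buf.getShort() != MAGIC) {
            throw new IllegalArgumentException("bad magic number");
        }
        byte version = buf.get();
        if (version != VERSION) {
            throw new IllegalArgumentException("unsupported version " + version);
        }
        long requestId = buf.getLong();
        int length = buf.getInt();
        if (length < 0 || length > buf.remaining()) {
            throw new IllegalArgumentException("truncated or corrupt frame");
        }
        byte[] body = new byte[length];
        buf.get(body);
        return new RpcFrame(requestId, body);
    }

    public static void main(String[] args) {
        byte[] body = "UserService.getUser(42)".getBytes(StandardCharsets.UTF_8); // stand-in for a serialized call
        ByteBuffer wire = new RpcFrame(7L, body).encode();
        RpcFrame decoded = RpcFrame.decode(wire);
        System.out.println(wire.capacity() + " bytes on the wire, requestId=" + decoded.requestId);
        // prints: 38 bytes on the wire, requestId=7
    }
}
```

The fixed header is 15 bytes and is parsed with a few integer reads, whereas an HTTP/1.1 request carries a text request line and headers that must be parsed line by line.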
Database middleware
A database is a huge product in its own right. Besides the middleware already mentioned, such as ShardingSphere and Vitess, there is another class of middleware specialized in data processing.
- otter
- A distributed database synchronization system supporting MySQL and Oracle.
- canal
- Incremental data subscription and consumption based on parsing MySQL’s incremental log (binlog).
- DataX-Web
- A distributed data synchronization tool that simplifies ETL work.
- gh-ost
- Changing a table’s structure may lock the table, and on very large tables this seriously affects online releases. You can do it by hand by creating a new table, migrating the data and renaming the tables, but that is error-prone and time-consuming. GitHub’s open-source online schema migration tool for MySQL is a great choice for doing this programmatically.
Logging
- ELK
- Logging systems generally use the ELK stack, which consists of three subsystems (Elasticsearch, Logstash and Kibana), so new functionality can be added in several ways. For monitoring and alerting, for example, you can write metrics to Prometheus via Logstash, or use alerting tools such as the Sentinl plugin for Kibana or ElastAlert.
- Logstash can collect data from many inputs, including Kafka; when log volumes are particularly large, logs are often sent to Kafka first.
- Sentry
- Logs are used for error detection in a large number of scenarios, and in addition to ELK there are systems that focus on application error reporting, such as Sentry.
Configuration center
As more and more systems are deployed based on Docker, the configuration center can not only simplify the configuration management of the system, but also simplify the release process of the system. Apollo is a popular open source configuration center at present. In addition, you can use tools such as ZooKeeper and Consul to implement unified configuration management.
Nacos, open-sourced by Alibaba, combines a configuration center and a service registry. It is also convenient as a configuration center, and its server is much simpler to deploy than Apollo.
Microservices
Because of the inherent drawbacks of monolithic applications, many large applications were split into multiple subsystems, a practice that was widespread even before the idea of service-oriented architecture took hold. The microservice concept goes further and proposes a new way of developing systems: splitting them into smaller, finer-grained services. As the number of services grows, problems such as service governance, circuit breaking and degradation, and distributed tracing arise, and the Spring Cloud framework emerged to address them.
Service registration and discovery
The main service registration and discovery components are Eureka, Consul and Nacos. They follow different CAP trade-offs (some support more than one), but whichever you use, requests can still be lost: during a rolling update, for example, the registry may fail to remove a stopped instance in time, so callers keep calling it. Mitigations include shortening the refresh interval through configuration (or by modifying the source if necessary) and using persistent connections so that an instance is removed from the registry as soon as its connection drops. The details deserve an article of their own.
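The stale-instance window comes from how registries decide that an instance is gone. A common approach is a lease renewed by heartbeats: an instance that stops renewing is evicted only after its lease expires, and until then callers can still be routed to it. The sketch below is a hypothetical in-memory registry, not the API of Eureka, Consul or Nacos; the lease length and instance names are made up, and the clock is injected so the timing is deterministic.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

class LeaseRegistry {
    private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>(); // instance -> time in ms
    private final long leaseMillis;
    private final LongSupplier clock;

    LeaseRegistry(long leaseMillis, LongSupplier clock) {
        this.leaseMillis = leaseMillis;
        this.clock = clock;
    }

    // Called when an instance starts and then periodically to renew its lease.
    void heartbeat(String instance) {
        lastHeartbeat.put(instance, clock.getAsLong());
    }

    // Called during graceful shutdown so callers stop being routed to the instance at once.
    void deregister(String instance) {
        lastHeartbeat.remove(instance);
    }

    // Returns instances whose lease is still valid, evicting expired ones as a side effect.
    List<String> liveInstances() {
        long now = clock.getAsLong();
        lastHeartbeat.entrySet().removeIf(e -> now - e.getValue() > leaseMillis);
        return lastHeartbeat.keySet().stream().sorted().toList();
    }
}
```

Shortening the lease narrows the window at the cost of more heartbeat traffic, and an explicit deregister() during graceful shutdown closes it for planned restarts such as rolling updates.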
Where possible, using messaging for inter-service communication is a better choice: besides better decoupling, it keeps the system running during rolling updates.
Circuit breaking and degradation
Excessive calls between services increase the coupling of the system to some extent: when another microservice fails or responds slowly, the whole system is affected. When necessary, calls to the faulty service should be cut off by a circuit breaker or degraded.
- Hystrix
- The circuit breaker that the Spring Cloud framework (through Spring Cloud Netflix) historically integrated by default. Netflix has since put it into maintenance mode.
- Sentinel
- The circuit-breaking component integrated in Spring Cloud Alibaba. It provides a standalone dashboard for adjusting circuit-breaking and degradation rules in real time, which makes it stronger than Hystrix in this respect.
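Hystrix and Sentinel differ in features and configuration, but the core of a circuit breaker is a small state machine. The sketch below is an illustration under stated assumptions, not either library’s API: after failureThreshold consecutive failures the breaker opens and calls go straight to the fallback (degradation); after openMillis it lets a trial call through (half-open), closing on success and reopening on failure. The thresholds and the injected clock are hypothetical parameters.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openMillis;
    private final LongSupplier clock; // injected so tests can control time
    private State state = State.CLOSED;
    private int consecutiveFailures;
    private long openedAt;

    CircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    // synchronized keeps the sketch simple but serializes calls; a production breaker
    // would not hold a lock while the remote call is in flight.
    synchronized <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < openMillis) {
                return fallback.get(); // fail fast without touching the struggling service
            }
            state = State.HALF_OPEN; // cool-down over: let one trial call through
        }
        try {
            T result = action.get();
            state = State.CLOSED;
            consecutiveFailures = 0;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;
                openedAt = clock.getAsLong();
            }
            return fallback.get(); // degrade instead of propagating the failure
        }
    }

    synchronized State state() {
        return state;
    }
}
```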
Distributed tracing/APM
Because services call each other, debugging is much more complicated than in a monolithic application. A distributed tracing tool simplifies debugging and gives a more intuitive view of application performance.
The main tracing components are:
- zipkin
- pinpoint
- SkyWalking
- jaeger
API gateways
In the Spring Cloud ecosystem, there was first the Zuul gateway and later Spring Cloud Gateway. Both are tightly integrated with Spring Cloud and easy to use, but because they are written in Java, in many scenarios they still fall short of specialized gateway products.
- Kong
- Kong is an open-source gateway built on OpenResty, with excellent performance and a rich set of plugins that cover many extensibility needs.
- Traefik
- Traefik is a gateway written in Go, positioned as a cloud-native edge router. It offers rich features, an easy-to-use dashboard, deep integration with cloud-native environments, and real-time traffic metrics that can be exported to Prometheus. The enterprise edition adds features such as distributed rate limiting and high availability that the open-source version lacks.
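Rate limiting is one of the main reasons to put a gateway in front of services. A common algorithm is the token bucket, sketched below for a single node; this is not Kong’s or Traefik’s implementation. capacity bounds the burst size, refillPerSecond sets the sustained rate, and both, like the injectable nanosecond clock, are parameters chosen for illustration.

```java
import java.util.function.LongSupplier;

class TokenBucket {
    private final double capacity;
    private final double refillPerNano;
    private final LongSupplier nanoClock;
    private double tokens;
    private long lastRefill;

    TokenBucket(double capacity, double refillPerSecond, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.refillPerNano = refillPerSecond / 1_000_000_000.0;
        this.nanoClock = nanoClock;
        this.tokens = capacity; // start full so an initial burst is allowed
        this.lastRefill = nanoClock.getAsLong();
    }

    TokenBucket(double capacity, double refillPerSecond) {
        this(capacity, refillPerSecond, System::nanoTime);
    }

    // Returns true if the request may pass, consuming one token.
    synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        // Refill lazily in proportion to elapsed time, never beyond capacity.
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerNano);
        lastRefill = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false; // a gateway would typically answer 429 Too Many Requests here
    }
}
```

A distributed version has to keep the bucket state in shared storage such as Redis, which is where gateway products differ most.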
Service mesh
In the evolution from monolithic applications to microservices, we find that service governance, circuit breaking and tracing are almost indispensable. Even with the Spring Cloud framework, we still have to deal with many microservice technical details. To separate this concern and push these capabilities down into the infrastructure, the service mesh was born.
What is a Service Mesh?
A service mesh is like TCP/IP between microservices: it handles network calls, rate limiting, circuit breaking and monitoring between services. Just as you don’t have to worry about the TCP/IP layer when writing applications (for example, RESTful applications over HTTP), with a service mesh you no longer have to handle inter-service concerns through service frameworks in your code. The work previously done by Spring Cloud, Netflix OSS and other middleware is now left to the service mesh.
The current mainstream service meshes are:
- Istio
- Linkerd
Common open-source components
Components already covered above are not repeated here.
Data access
- MyBatis Plus
- Mapper
- jOOQ
- JPA
- dynamic-datasource-spring-boot-starter
- sharding-jdbc
Utility libraries
- guava
- commons-lang3
- hutool
Caching
- redisson
- jetcache
- caffeine
Bytecode manipulation
- asm
- javassist
- cglib
HTTP clients
- okhttp
- Apache HttpClient
- retrofit
- openfeign
Reactive programming
- RxJava
- reactor-core
Serialization
- protobuf
- protostuff
- hessian
Distributed transactions
- seata
Event-driven frameworks
- AxonFramework
Rule engines
- drools
Testing
- junit
- mockito
- Spock
Programming ideas
Programming ideas are abstract, and to grasp them we must look past the surface to the essence. Good programming thinking is a collection of good ideas, and these ideas can be distilled into principles. Principles are an important part of programming thinking, and all programming can follow these general principles. On top of the principles, problems that are solved again and again while coding are distilled into patterns. Principles and patterns are the two main components of programming thinking; beyond them there are also different programming paradigms and methodologies.
Principles
Many of these principles apply not only to programming but to other fields as well, and I think that is why Steve Jobs advocated that everyone should learn to code: it teaches you a better way to think.
- Keep it simple
- Keep It Simple, Stupid (KISS)
- One of the most important principles: reliability comes from simplicity, and the best way to create good software is to keep both your systems and your code simple.
- You Ain’t Gonna Need It (YAGNI)
- Don’t add complexity you don’t need, and avoid overdesigning.
- Separation of Concerns (SoC)
- Encapsulate the parts related to each goal and treat them as separate concerns. This is an important principle for reducing complexity. The MVC and MVP patterns apply it by treating the model, view and controller (or presenter) as distinct concerns, so each concern is easier to understand and reuse.
- The same idea applies to coding itself. For example, it is much easier to first make the application work correctly and only then focus on efficiency, than to do both at the same time.
- Don’t repeat
- Don’t Repeat Yourself (DRY)
- One of the simplest and most understandable principles: every programmer should be ashamed of copy-and-paste code.
- Convention over Configuration (CoC)
- Using conventions as the default rules for configuration reduces the number of decisions developers have to make and the amount of code they write, gaining simplicity without losing flexibility.
- Simplifying project configuration is one of the problems the Spring Boot framework addresses, and it makes heavy use of CoC.
- The S.O.L.I.D. principles
- Single Responsibility Principle (SRP)
- A class should do one thing and do it well, so that it has only one reason to change.
- It’s a simple principle, but many programmers violate it in practice, for example by injecting many DAO objects into one service class that provides several unrelated services.
- Open/Closed Principle (OCP)
- Modules should be extensible without being modified, that is, open for extension but closed for modification.
- The proxy, strategy and observer design patterns are good implementations of this principle.
- Defining an API that accepts a function as an argument is effectively a variant of the strategy pattern, and it also reflects this principle.
- Liskov Substitution Principle (LSP)
- Subclasses must be substitutable for their base classes.
- This principle can be used as a benchmark for designing class inheritance.
- Interface Segregation Principle (ISP)
- It is better to split interfaces and use several specialized ones than to use a single general-purpose interface.
- A class can implement several interfaces, so why cram everything into one general interface out of laziness?
- Dependency Inversion Principle (DIP)
- High-level modules should not depend on the implementation of low-level modules; both should depend on abstractions.
- IoC (inversion of control) is a concrete implementation of DIP that is now deeply embedded in everyday programming. The Spring framework began as an IoC container and grew into a full development framework as many practical features were added.
- The Hollywood Principle (“don’t call us, we’ll call you”): components are passive, and the container is responsible for initializing and invoking them.
- High cohesion, low coupling
- Law of Demeter
- Also known as the Principle of Least Knowledge: the less a class knows about other classes, the better, because the more it knows, the more tightly it is coupled to them.
- The facade and mediator patterns are both applications of the Law of Demeter.
- This principle emphasizes low coupling.
- Common Closure Principle (CCP)
- When code in the application has to change, we want all the changes to fall within one package (closed to that change) rather than being spread across many packages.
- In a microservice architecture, if changing one feature often requires changing several services, the services were probably split poorly, in violation of CCP.
- This principle emphasizes high cohesion.
- Common Reuse Principle (CRP)
- Classes in a package are reused together, and classes that are not reused together should not be grouped together. Depending on a package means depending on everything it contains.
- CCP benefits the system’s maintainers and tends to make packages larger (it groups functionally related classes together), while CRP tends to make packages smaller (it removes classes that are not used together). Their starting points differ, but they do not conflict.
- This principle also emphasizes high cohesion.
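The remark under OCP about APIs that take a function as an argument can be made concrete. In the hypothetical example below, PriceCalculator is closed for modification: new pricing rules are added by passing a different DiscountPolicy (a strategy expressed as a functional interface, so a lambda works) rather than by editing the calculator. Because the calculator depends on the DiscountPolicy abstraction instead of concrete rules, it is also a small instance of DIP. All names are made up for illustration.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// A pricing strategy: a functional interface, so rules can be passed as lambdas.
@FunctionalInterface
interface DiscountPolicy {
    BigDecimal apply(BigDecimal amount);

    DiscountPolicy NONE = amount -> amount;

    static DiscountPolicy percentOff(int percent) {
        BigDecimal factor = BigDecimal.valueOf(100 - percent).movePointLeft(2); // 10 -> 0.90
        return amount -> amount.multiply(factor).setScale(2, RoundingMode.HALF_UP);
    }
}

// Closed for modification: new rules arrive as new DiscountPolicy values, not as edits here.
class PriceCalculator {
    private final DiscountPolicy policy;

    // The policy is injected, so the calculator depends on the abstraction (DIP).
    PriceCalculator(DiscountPolicy policy) {
        this.policy = policy;
    }

    BigDecimal total(BigDecimal unitPrice, int quantity) {
        return policy.apply(unitPrice.multiply(BigDecimal.valueOf(quantity)));
    }
}
```

A brand-new rule such as “5.00 off” is just another lambda, for example `amount -> amount.subtract(new BigDecimal("5.00"))`, and PriceCalculator never changes.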
Conclusion
As I said at the beginning, server-side development is a huge field, and there are many aspects this article hasn’t covered, such as security and DevOps. Maybe service mesh will become mainstream; maybe NewSQL will become standard. The longer a technology evolves, the more time it takes to learn. We also can’t ignore cloud computing: cloud services hide many details from developers, so in this era we can build products that serve tens of millions of users without knowing how every technology works.
You can look forward to what the future will bring, but don’t just wait for it: the future is already here, it’s just not evenly distributed yet.