Core principles of software development
Here are some of the core principles that software development should follow:
Don’t Repeat Yourself: The DRY principle is fundamental to software development: avoid repetitive work. It is part of what is now called geek culture. Duplicated code and duplicated effort are both wasteful, and eliminating them by whatever means is a core working principle.
Keep It Simple, Stupid: The KISS principle. In software design, avoid overthinking, over-design, and premature optimization most of the time. The simplest effective solution avoids the extra costs that a complex one brings, and it makes both later maintenance and further extension easier.
You Ain’t Gonna Need It: The YAGNI principle. Include only the functionality your application actually needs, and do not add features just because you think you might need them later. In a typical application, roughly 80% of requests go to 20% of the features.
Done is better than perfect: When facing a development task, it is usually best to build something that works first and then iterate. Trying to cover everything and consider every detail from the start makes it easy to get bogged down and delay the project.
Choose the most suitable things: This principle matters a great deal when selecting solutions and technologies. Faced with many technical options and open source implementations, do not blindly chase what is new; choose what fits best rather than what is most hyped.
Software process
Development is only one step in the software life cycle; the other steps involve techniques that are just as worth mastering.
Project management: Project management ensures that a software project proceeds on an orderly schedule and is delivered within a controlled time frame at an agreed level of quality. The waterfall model and the spiral model are traditional project management models.
In Internet development, agile development is the more popular approach. It centers on building a prototype quickly and then iterating quickly. Scrum is one of the most popular agile methods.
Test-driven development: A popular and effective approach in day-to-day development is test-driven development (TDD). At its heart is writing unit tests.
In simple terms, you write the unit tests for a function first, and then implement the function by gradually eliminating the compilation errors and failures that the tests report.
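The test-first flow described above can be sketched as follows. This is only an illustration: `PriceCalculator` and `applyDiscount` are hypothetical names, and plain checks stand in for a unit testing framework such as JUnit so that the example stays self-contained.

```java
// Hypothetical example: the class and method names are illustrative only.
class PriceCalculatorTest {
    // Step 1: this test is written first, before applyDiscount exists,
    // so at that point it does not even compile.
    static void testApplyDiscount() {
        check(PriceCalculator.applyDiscount(200, 10) == 180, "10% off 200");
        check(PriceCalculator.applyDiscount(99, 0) == 99, "0% leaves the price unchanged");
        boolean rejected = false;
        try {
            PriceCalculator.applyDiscount(100, 150);
        } catch (IllegalArgumentException expected) {
            rejected = true;
        }
        check(rejected, "a discount above 100% is rejected");
    }

    static void check(boolean condition, String description) {
        if (!condition) {
            throw new AssertionError("failed: " + description);
        }
    }

    public static void main(String[] args) {
        testApplyDiscount();
        System.out.println("all tests passed");
    }
}

// Step 2: the simplest implementation that makes the test compile and pass.
class PriceCalculator {
    /** Returns the price in cents after a whole-number percentage discount. */
    static long applyDiscount(long priceInCents, int percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be between 0 and 100");
        }
        return priceInCents * (100 - percent) / 100;
    }
}
```

Writing the rejected-input case into the test before the implementation exists is what forces the validation branch to be written at all.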
Continuous integration: After a feature is developed, it still has to go through testing, pre-release, deployment, and other steps. The whole process is called integration, and continuous integration means that this process runs automatically and repeatedly without human intervention. Jenkins and QuickBuild are typical continuous integration tools.
Daily development
Daily development covers the skills and tools you need to master for everyday work.
Editor: The most popular editors in development today include Emacs, Vim, and Sublime Text. I use Sublime Text the most; it covers most of my needs, including writing scripts and viewing code files.
Vim and Emacs have many more commands to remember than Sublime Text, so the barrier to getting started is somewhat higher.
Source code version management: Version management tools have evolved from CVS to SVN and now to Git, and distributed version management has become the de facto standard. On top of Git, Git Flow can be used as the branching model.
Project tools: GitHub is a third-party Git hosting service. It is currently the world’s largest open source code host and can also be used for private repositories.
Phabricator, open-sourced by Facebook, provides powerful task management, bug tracking, testing, and code management, but it has a relatively steep learning curve.
ZenTao is a project management tool developed in China, but its free version has limited functionality.
Third-party project management services such as Tower.im are another option, with the risk that your data is no longer private.
Runtime environment
After a back-end application is developed, it has to be deployed on a server to provide services. Deployment has moved from running services directly on physical machines to virtual environments, cloud environments, today’s popular containers, and most recently serverless technology. The goal throughout is to make the runtime environment easier to set up, maintain, and scale.
Linux: There’s no getting around Linux when it comes to back-end servers. At least for now, the vast majority of Internet back-end services run on server distributions of Linux. CentOS, Ubuntu, and Debian are the most popular distributions.
On Linux you need to master common shell commands such as ps, netstat, lsof, ss, df, and du. You also need to be familiar with performance analysis commands such as top, vmstat, iostat, and sar.
Application server: In Java, most development is for Web applications that provide services over HTTP. Except in performance-critical cases where you build your own HTTP service, you usually rely on a Java application server. The most commonly used ones are Tomcat and Jetty.
Strictly speaking, these are only Servlet containers; full JavaEE application servers such as JBoss and WebLogic are rarely used in the Internet world. These containers are also not meant to provide Web server capabilities such as URL rewriting and request forwarding, so they cannot play the role of a full Web server on their own. Nginx is the most popular Web server.
Load balancing: In an environment with high concurrent traffic, back-end services run as a cluster. A load balancer is required in front of the cluster to distribute requests across its nodes. LVS is the most popular layer 4 load balancing software, HAProxy supports both layer 4 and layer 7 load balancing, and Nginx is the most popular solution for layer 7 load balancing.
The best-performing option is hardware load balancing, represented by F5, but Internet teams rarely use it because of its high cost. It is also worth adding that nodes playing the same role need to be made highly available. For example, LVS often serves as the traffic entry point, so multiple LVS nodes are deployed in active/standby mode to keep the service available when one node fails. Keepalived is the most widely used tool for this kind of mutual backup.
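To make the idea of distributing requests across cluster nodes concrete, here is a minimal round-robin sketch. It is not how LVS, HAProxy, or Nginx are implemented; the class name and node names are hypothetical, and real balancers also handle health checks, weights, and connection tracking.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of round-robin selection over a fixed node list.
class RoundRobinBalancer {
    private final List<String> nodes;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinBalancer(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        // Defensive copy so later changes to the caller's list have no effect.
        this.nodes = Collections.unmodifiableList(new ArrayList<>(nodes));
    }

    /** Picks the next node in rotation; safe to call from multiple threads. */
    String next() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(counter.getAndIncrement(), nodes.size());
        return nodes.get(index);
    }
}
```

With nodes `a`, `b`, `c`, successive calls to `next()` return `a`, `b`, `c`, and then `a` again.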
Virtualization: Virtualization was widely used in private clouds a few years ago. It divides a physical host into multiple virtual hosts with isolated resources. Representative technologies for VPS (virtual private server) hosting include Microsoft Virtual Server, VMware ESX Server, and SWsoft Virtuozzo.
In addition, OpenStack can be used to build a private IaaS, Cloud Foundry provides a runtime environment for building a private PaaS, and Docker provides container services. All of these are virtualization technologies.
Third-party services
In principle, every software service can be developed in-house or deployed on your own servers. In practice, cost, schedule, or other constraints mean that many services still have to come from a third party.
IaaS: Infrastructure as a Service is the earliest model of cloud computing, and almost every cloud service provider now offers it. The strongest provider worldwide is Amazon’s AWS; in China it is probably Alibaba Cloud.
Even a provider as strong as AWS has operational failures from time to time, so the robustness and operational responsiveness of the cloud providers in China are often criticized. From my own experience, around 2010 Sheng Cloud’s cloud service was actually doing well, but for various reasons it has since lost most of its market share.
Besides Alibaba Cloud, UCloud is a fairly reliable company that focuses on cloud computing. QingCloud, which is positioned slightly more upmarket, is also a good choice. These providers offer PaaS services as well as IaaS.
PaaS: Platform as a Service means that you only need to submit code to a specified runtime environment. The platform does the rest, such as code packaging, deployment, and IP binding.
Besides building your own PaaS platform with Cloud Foundry, the most popular third-party PaaS services are Sina’s SAE, Baidu’s BAE, and Google’s GAE.
Domain name: Once you have an application that can provide a service, a domain name becomes necessary infrastructure. A good domain name represents the image of the company and is easier for users to remember and share. Domain names can currently be purchased from foreign registrars such as name.com and GoDaddy, or from Wanwang in China.
After obtaining a domain name, the next step is record filing (the ICP filing required in mainland China). Domain name providers generally offer a supporting service for this, or you can go through an agent. For domain name resolution, providers usually have a built-in resolution feature, or you can use an independent DNS service such as DNSPod.
CDN: A content delivery network serves requests from a nearby location. The provider caches heavily accessed content on nodes across the country so that each user is served by the nearest node, which reduces network latency and improves access speed.
In China, Qiniu and UPYUN both provide good CDN services. Integrated cloud providers such as Alibaba Cloud and UCloud offer CDN services as well.
Email: Sending email relies on a mail server and the SMTP protocol. You can run your own server or use a service such as Tencent Mail or NetEase Mail.
SMS sending: Sending verification codes and marketing messages by SMS is a common scenario. Because SMS requires carrier support, this part generally depends on a third-party agent, and there are many SMS gateway agents on the market.
Message push: Push notifications have become a standard feature of mobile apps. Getui is probably the leading third-party push service at present, and its large customer base gives it a big advantage in cross-app wake-up (the so-called wake-up alliance).
Open platform: Through an open platform, protocols such as OAuth can be used to obtain a user’s information from a third-party platform and implement third-party login. Weibo, WeChat, and QQ are currently the most common third-party login methods, and they generally use the OAuth protocol to serve third-party developers.
Payment interface: A payment interface is an essential component of software with built-in purchasing. Alipay and WeChat Pay are currently the most widely integrated, and both provide open platforms for merchants. Payment by directly bound bank card is also possible, in which case you integrate with a bank or UnionPay gateway interface.
Basic computer science knowledge
Computer science fundamentals such as data structures, algorithms, computer networks, operating systems, and computer organization are required skills in back-end development and other areas, and they underpin all software development. A solid grounding in them gives you a dependable basis when learning a technology and when using it to develop software, debug it, and troubleshoot problems.
Data structure: Data structures form the basis of programs. Classical data structures include strings, arrays, linked lists, hash tables, trees (binary trees, balanced trees, red-black trees, B-trees), stacks, queues, and graphs.
Algorithm: Classical sorting and search algorithms come up often in everyday development, for example bubble sort, insertion sort, selection sort, merge sort, quick sort, Shell sort, heap sort, and binary search.
In addition, weigh the advantages and disadvantages of recursion versus iteration when implementing an algorithm in a function or method. An algorithm’s performance is measured by its time complexity and space complexity.
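As an illustration of the recursion versus iteration trade-off, here is binary search written both ways. This is a minimal sketch with hypothetical names: both versions take O(log n) time, but the iterative one uses constant extra space while the recursive one uses a stack frame per halving step.

```java
class BinarySearch {
    /** Iterative version: O(log n) time, O(1) extra space. Returns the index or -1. */
    static int searchIterative(int[] sorted, int target) {
        int low = 0;
        int high = sorted.length - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2; // avoids overflow of (low + high)
            if (sorted[mid] == target) {
                return mid;
            }
            if (sorted[mid] < target) {
                low = mid + 1;
            } else {
                high = mid - 1;
            }
        }
        return -1;
    }

    /** Recursive version: same time complexity, O(log n) stack space. */
    static int searchRecursive(int[] sorted, int target, int low, int high) {
        if (low > high) {
            return -1;
        }
        int mid = low + (high - low) / 2;
        if (sorted[mid] == target) {
            return mid;
        }
        return sorted[mid] < target
                ? searchRecursive(sorted, target, mid + 1, high)
                : searchRecursive(sorted, target, low, mid - 1);
    }
}
```

The input array must already be sorted in ascending order; the recursive version is called with `low = 0` and `high = sorted.length - 1`.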
Business related algorithm: Beyond the basic algorithms above, business code often involves more complex ones, such as compression algorithms, the LRU cache algorithm, cache consistency, and the state machines from compiler theory.
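The LRU cache algorithm mentioned above can be sketched in a few lines on top of the JDK’s `LinkedHashMap`, which supports access ordering and an eviction hook. The class name `LruCache` is hypothetical, and this version is not thread-safe; it is only meant to show the eviction rule.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: an LRU cache built on LinkedHashMap's access order.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        // accessOrder = true moves an entry to the end whenever it is read or written.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion: evict the least recently used entry when over capacity.
        return size() > capacity;
    }
}
```

With a capacity of 2, inserting `a` and `b`, reading `a`, and then inserting `c` evicts `b`, because `b` is the entry that was used least recently.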
In addition, many increasingly popular machine learning algorithms are very useful in business scenarios: jieba and ICTCLAS for Chinese word segmentation; TF-IDF and TextRank for keyword extraction; topic models, Word2Vec, cosine similarity, and Euclidean distance for computing text similarity; naive Bayes for text classification; and clustering, collaborative filtering, user profiling, and latent factor models for recommendation.
Computer network: TCP/IP is the most fundamental network protocol suite. The layered design of the protocol stack (seven layers in the OSI model, four in the TCP/IP model) is its essence, and connection establishment, connection teardown, and the transitions between connection states are the basis for investigating and solving network problems.
Above TCP/IP, HTTP is the protocol through which most back-end applications serve the outside world. It is now about to enter the HTTP/2 era, which brings exciting new features such as multiplexing many requests over a single persistent connection.
In addition, HTTPS, which builds on HTTP, is gradually becoming the mainstream protocol for public back-end services because of its security. At the business level, the RESTful style built on HTTP is becoming the mainstream convention for external interfaces, and OAuth 2.0 is becoming the mainstream protocol for open platforms. Besides HTTP, SMTP is another application protocol that runs over TCP/IP and is mainly used for sending mail.
Design pattern: The experience of our predecessors in software development has been distilled into many classic design patterns, which help make software reusable, extensible, and maintainable. The factory method, simple factory, singleton, observer, proxy, builder, facade, adapter, and decorator patterns are important in many everyday development scenarios.
Data
Almost all Internet business now revolves around data, and data transmission, storage, analysis, and processing are the key parts.
Cache: Redis, currently the most widely used cache software, supports rich data structures such as strings, lists, and sorted sets. Understanding how the cache is implemented and how its memory eviction policies work helps you use it better.
In addition, because caching is expensive, you need to estimate capacity carefully and optimize what you store when using a cache.
Database: A large part of mastering databases is using indexes; it is fair to say that using indexes correctly is most of what it takes to use a database well. Most databases currently use a B-tree (or a variant of it) as the index data structure in order to take advantage of sequential disk reads and writes.
Different databases have their own advantages because they were designed for different purposes. For example, MongoDB supports sharding natively but, as a NoSQL store, is a poor fit for scenarios with heavy transactions and many relationships between records; HBase uses an LSM tree as its underlying data structure, sacrificing some read performance for high write performance.
Search engine: Search engines mainly serve full-text retrieval and multi-dimensional query scenarios. Knowing the data structures, clustering mode, and key configuration points of a search engine helps you use it better to serve business applications.
Message queue: A message queue has two roles, producer and consumer, and each places different requirements on the queue. For consumers, messages can be consumed in publish-subscribe mode or in queue mode.
The delivery guarantees of a message queue fall into three modes: At Most Once, At Least Once, and Exactly Once. Choose the guarantee that suits the specific business scenario. In addition, the reliability of a message queue depends on its high availability and on how well it safeguards messages.
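Under At Least Once delivery the same message may arrive more than once, so consumers are commonly written to be idempotent. The sketch below shows one way to do that by remembering message IDs. The class name and the in-memory `Set` are assumptions made for illustration; a real consumer would usually keep the processed IDs in durable storage, and this version is not thread-safe.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: de-duplicating redelivered messages by ID.
class IdempotentConsumer {
    private final Set<String> processedIds = new HashSet<>();
    private final List<String> applied = new ArrayList<>();

    /** Returns true if the message was applied, false if it was a duplicate delivery. */
    boolean handle(String messageId, String payload) {
        // Set.add returns false when the ID is already present.
        if (!processedIds.add(messageId)) {
            return false;
        }
        applied.add(payload); // stands in for the real business side effect
        return true;
    }

    /** Payloads that were actually applied, in arrival order. */
    List<String> applied() {
        return applied;
    }
}
```

Combining At Least Once delivery with an idempotent consumer is a common way to get an effectively-once result without relying on the queue itself to provide Exactly Once.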
Data storage and processing: Stored data is eventually used for analysis and processing. Data processing is divided into offline processing and real-time processing. Offline processing can handle very large volumes of data, but results generally arrive with a T+1 delay, so it suits scenarios with heavy computation where delayed results are acceptable.
A key problem in offline data analysis is data skew. Data skew means that data is unevenly distributed across partitions, so some nodes carry a low load while others carry a high load, which hurts overall performance.
Dealing with data skew is therefore very important for offline processing. Real-time processing, by contrast, generally uses a stream processing model and suits scenarios where the data can be turned into a stream and the results must be timely.
For real-time data analysis, it is important to consider concurrency when writing results to storage. A single Storm bolt does not face concurrency problems by itself, but the storage medium is read and written by many tasks at the same time.
A common approach is to buffer data over a time window and then write it in batches.
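The buffer-then-batch idea can be sketched as follows. Everything here is hypothetical: `BatchingWriter`, its parameters, and the `sink` callback are illustrative names, and the caller passes in the current time explicitly so that the behavior is easy to follow. It is not tied to Storm or to any particular storage system.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: buffer items and flush them as one batch when either
// the batch is full or the time window has elapsed.
class BatchingWriter<T> {
    private final int maxBatchSize;
    private final long windowMillis;
    private final Consumer<List<T>> sink; // stands in for a batched write to storage
    private final List<T> buffer = new ArrayList<>();
    private long windowStart;

    BatchingWriter(int maxBatchSize, long windowMillis, Consumer<List<T>> sink) {
        this.maxBatchSize = maxBatchSize;
        this.windowMillis = windowMillis;
        this.sink = sink;
    }

    /** Adds an item; nowMillis is the caller's clock reading for this item. */
    synchronized void add(T item, long nowMillis) {
        if (buffer.isEmpty()) {
            windowStart = nowMillis; // a new window starts with the first buffered item
        }
        buffer.add(item);
        if (buffer.size() >= maxBatchSize || nowMillis - windowStart >= windowMillis) {
            flush();
        }
    }

    /** Writes whatever is buffered as a single batch. */
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer)); // copy so the sink owns its batch
        buffer.clear();
    }
}
```

In this sketch the window is only checked when an item arrives; a production version would typically also flush from a timer so that a quiet stream does not hold data back indefinitely.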
Data synchronization: Besides logs, the other key data source for a data warehouse is the business database. Moving data from a business database into the data warehouse is called data synchronization. There are SQL-based synchronization schemes and incremental schemes based on the MySQL binlog.
Java
Java skills have two major components: Java programming and the JVM.
Let’s start with Java programming, the most basic skill of a Java engineer.
IDE: The most widely used Java IDEs today are Eclipse and IntelliJ IDEA. The former is the older IDE; it displaced JBuilder and NetBeans and took over most of the Java IDE market. The latter is a newcomer that is now in widespread use and poised to replace Eclipse thanks to incremental compilation, intelligent code analysis, and other performance improvements.
Core syntax: The most commonly used syntax is that of JDK 6. Java 7 introduced try-with-resources, switch on strings, and the diamond operator. Java 8 introduced lambdas and streams.
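The Java 7 and Java 8 features listed above look like this in practice. The class and method names are hypothetical and exist only to give each feature a small home.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical tour of the syntax features named in the text.
class SyntaxTour {
    /** Java 7: try-with-resources closes the reader automatically. */
    static String firstLine(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Java 7: switch on a String value. */
    static int defaultPort(String scheme) {
        switch (scheme) {
            case "http":
                return 80;
            case "https":
                return 443;
            default:
                return -1;
        }
    }

    /** Java 7 diamond operator, plus a Java 8 lambda, method reference, and stream. */
    static List<String> upperCaseLongerThan(List<String> words, int minLength) {
        List<String> copy = new ArrayList<>(words); // <> infers the type argument
        return copy.stream()
                .filter(word -> word.length() > minLength)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }
}
```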
Collection classes: The collection classes are the essence of the Java language. They include HashMap, ArrayList, LinkedList, HashSet, and TreeSet, as well as thread-safe collections such as ConcurrentHashMap and ConcurrentLinkedQueue. It is important to understand how they are implemented, how queries and modifications perform, and which scenarios each one suits.
Tools: Google Guava, Apache Commons, and FastJson provide many utility classes, collections, and other features that the JDK lacks. In addition, ASM bytecode manipulation and CGLIB code generation provide lower-level Java programming capabilities.
Advanced features: Beyond core Java programming, concurrent programming, generics, network programming, serialization, and RPC are all advanced topics. For concurrent programming, you need to master the concurrency tools provided through Executors, the fork/join framework introduced in Java 7, and synchronization tools such as CountDownLatch, Semaphore, and CyclicBarrier.
For network programming, you should be able to distinguish BIO, NIO, and AIO. For serialization, Protobuf and Kryo are efficient third-party alternatives to the JDK’s built-in serialization. Thrift, Hessian, Dubbo, and RMI are common choices for RPC: Hessian is based on HTTP, Dubbo is based on TCP, and Thrift supports both.
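As a small illustration of the concurrency tools named above, the sketch below uses a thread pool from `Executors` together with a `CountDownLatch` to wait for a group of workers. The `ParallelSum` class and its chunking scheme are hypothetical, chosen only to give the workers something to do.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: fan work out to a pool and wait for all workers with a latch.
class ParallelSum {
    /** Sums 1..n by splitting the range into one chunk per worker. */
    static long sumTo(int n, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        CountDownLatch done = new CountDownLatch(workers);
        AtomicLong total = new AtomicLong();
        int chunk = (n + workers - 1) / workers; // ceiling division
        try {
            for (int w = 0; w < workers; w++) {
                final int from = w * chunk + 1;
                final int to = Math.min(n, (w + 1) * chunk);
                pool.execute(() -> {
                    try {
                        long partial = 0;
                        for (int i = from; i <= to; i++) {
                            partial += i;
                        }
                        total.addAndGet(partial);
                    } finally {
                        done.countDown(); // always signal, even if the work fails
                    }
                });
            }
            // Block until every worker has counted down, or give up after 10 seconds.
            if (!done.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
            return total.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Counting down in a `finally` block matters here: without it, a failing worker would leave the waiting thread blocked until the timeout.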
JavaEE: JavaEE is now one of the most common areas of Java application, and Servlets are among its most fundamental components. Servlet 3.0 introduced asynchronous Servlets, which improve request-handling performance.
Project construction: The most commonly used Java build tools are Maven and Gradle, which provide dependency management, compilation, packaging, deployment, and related functions.
Programming framework: Spring is an unavoidable framework in Java programming. Beyond the IoC and AOP features of the Spring core, Spring MVC, Spring Data, Spring Cloud, and others have made life easier for Java developers and greatly improved development efficiency.
In addition, the ORM framework MyBatis is popular in the Java world; it maps database records to Java objects. Jersey provides a complete framework, from client to server, for developing services that follow RESTful conventions.
Testing: Testing is a necessary step in any programming. Black-box testing mainly refers to ordinary functional testing, while white-box testing mainly refers to testing the behavior and quality of the code itself.
Unit testing is the part development engineers should focus on, and test-driven development is an approach worth adopting. JUnit is currently the dominant unit testing solution in Java.
In general, the Java programming skills described above are enough for most programming tasks. But if you have done your best at the code level and still cannot meet performance requirements, you need to work at the JVM level. Mastering the JVM is a key step in Java development.
Virtual machine implementation: Besides the common HotSpot, Java virtual machine implementations include JRockit, J9, and Dalvik on mobile platforms. Most of the JVM optimizations usually discussed target the HotSpot virtual machine.
Class loading mechanism: The JVM’s class loading mechanism follows the parent delegation model: the current class loader first asks its parent to load a class and tries to load the class itself only if the parent cannot.
The OSGi framework breaks this model and uses a peer-to-peer, network-like class loading mechanism to achieve modular loading.
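The parent chain that delegation walks can be inspected from code. The sketch below (with a hypothetical class name) lists the loaders from a class’s own loader up to the bootstrap loader, which HotSpot represents as `null`. The exact loader names depend on the JDK version and on how the program is launched, so none are shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: list the class loader chain that parent delegation follows.
class LoaderChain {
    /** Returns loader class names from the given class's loader up to the bootstrap loader. */
    static List<String> chainOf(Class<?> type) {
        List<String> chain = new ArrayList<>();
        ClassLoader loader = type.getClassLoader();
        while (loader != null) {
            chain.add(loader.getClass().getName());
            loader = loader.getParent();
        }
        chain.add("bootstrap"); // the bootstrap loader appears as null in the API
        return chain;
    }
}
```

Core classes such as `String` are loaded by the bootstrap loader, so `chainOf(String.class)` contains only the `"bootstrap"` entry, while an application class reports at least one loader above it.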
Runtime memory composition: The program counter, stacks, method area, heap, and off-heap memory together make up the JVM’s runtime memory.
Java memory model: The Java memory model, with its shared main memory plus per-thread private working memory, is at the root of thread-safety problems.
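A shared counter is a compact way to see the problem. In the sketch below (hypothetical names), several threads increment both a plain `int` and an `AtomicInteger`. The plain increment is an unsynchronized read-modify-write, so updates may be lost and the final value is not predictable; the atomic counter always ends at the exact total.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: unsynchronized versus atomic updates to shared state.
class SharedCounter {
    /** Returns {plainResult, atomicResult} after each thread increments perThread times. */
    static int[] run(int threads, int perThread) {
        int[] plain = new int[1];
        AtomicInteger atomic = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    plain[0]++;               // data race: increments can be lost
                    atomic.incrementAndGet(); // atomic and visible to other threads
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining workers", e);
            }
        }
        return new int[] { plain[0], atomic.get() };
    }
}
```

Whether the plain counter actually loses updates on a given run depends on timing and hardware, which is exactly why such bugs are hard to reproduce.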
GC principles and tuning: Automatic GC is an advantage over languages such as C and C++. But because the JVM hides the details of GC, memory is hard to control precisely when memory and performance requirements are very demanding, which is to some extent a disadvantage.
To get the most out of GC in a given scenario, you can tune the GC configuration, for example the choice of garbage collectors for the young generation and the old generation and the various collection parameters.
In addition, frequent GC is often caused by code quality or external factors, and you need tools to locate and resolve such problems quickly.
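Before tuning, it helps to see which collectors a process is actually running and how often they fire. The sketch below reads that information through the standard `java.lang.management` API; the `GcStats` class name and the output format are assumptions made for illustration. Which collectors appear depends on the JVM and on startup options such as HotSpot’s `-XX:+UseG1GC`, `-Xms`, and `-Xmx`.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: summarize the garbage collectors active in this JVM.
class GcStats {
    /** Returns one line per collector with its collection count and accumulated time. */
    static List<String> summary() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : summary()) {
            System.out.println(line);
        }
    }
}
```

Sampling these counters periodically and comparing the deltas is a simple first check for the frequent-GC situations described above.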
Performance tuning and monitoring tools: The JDK ships with many powerful tuning and monitoring tools, including jmap, jstack, jcmd, jconsole, and jinfo.
In addition, BTrace is a very powerful online troubleshooting tool. It can dynamically insert code into a running Java process without a restart, intercepting execution to print logs and help troubleshoot problems.
System architecture
An application starts from scratch and evolves from a single application to vertical applications and then to a distributed service architecture. As shown below:
Single application: When the application and the team are both small, one application that contains all the features is enough. This keeps the number of deployment nodes and the deployment cost low. At this stage, ORM access to the database is the key to implementing the architecture.
Vertical application: As the number of users grows, so does the request volume. Adding nodes to a single application then wastes resources noticeably, because most interfaces receive little traffic and do not need to be scaled out to multiple nodes.
In this case, the single application can be split into several independent applications that each serve external requests. At this stage, an MVC framework that speeds up the development of each application is the key to implementing the architecture.
Distributed services: As vertical applications multiply, interactions between them become inevitable. Core business logic is extracted and deployed separately, gradually forming a stable service center. As the team grows, the services grow in number and shrink in granularity, which gradually produces a distributed service architecture. Once the granularity is fine enough and the services numerous enough, the result can be called microservices.
That is, once business boundaries have been designed, the original monolithic application is broken down into fine-grained services that communicate with one another in some way. The key to a microservice architecture is managing, scheduling, and maintaining the services well. Dubbo is currently one of the most widely used frameworks for microservice architecture, but it addresses only some of the problems involved; Spring Cloud covers essentially all aspects of a microservice architecture.
Deployment architecture
For Web applications, LVS+Nginx+Tomcat+MySQL+Redis can form a simple, general-purpose deployment architecture, as shown in the figure below:
LVS, as the front-most node, is responsible for forwarding traffic and load balancing at layer 4 of the network.
Multiple LVS nodes use Keepalived to back each other up for high availability.
As a reverse proxy, Nginx is responsible for forwarding traffic and load balancing at layer 7 of the network.
Tomcat is the business container in which the main application code runs.
Redis acts as a cache that shields the back-end database from high-concurrency requests.
MySQL persists data in master-slave mode.
The dashed line marks the database layer, which uses master-slave mode. You can also use a Redis cluster (Codis, etc.) and a MySQL cluster (Cobar, etc.) instead.
The original article was published at: 2018-05-16
Author: Ruran Hang
This article is from The cloud community partner “Mesozoic Technology”. For related information, you can follow “Mesozoic Technology”.