I. Language

Generally speaking, technical founders start a business in whatever language they know best, but think it over carefully: you will not always be writing the code alone. Whatever the language, code quality ultimately comes down to management, so let's be realistic and look purely at the language level. The currently popular Java, PHP, .NET, Python and Ruby each have their pros and cons. Python and Ruby programmers are still relatively hard to recruit, and both take some effort to tune for performance. The .NET platform means you cannot avoid paying for Windows Server. Java and PHP remain the most widely used. At the beginning almost everything a site needs is front-end work, where PHP has a slight edge: easy to pick up, simple in design, fast to write, and fast enough in performance. But its indifference to design is also its weakness: the code easily turns loose, hides bugs, and becomes hard to maintain. Java's advantage is that mature tools already cover the whole management process, strong typing avoids a class of silly bugs, and most Java programmers care about design patterns, practical or not, so the code at least looks tidy. That is also a disadvantage: beginners can be so busy with patterns that they never address the real need.

The front end is not just HTML and CSS. Everything responsible for interacting with the user is the front end, including the request handlers. For this layer PHP or Java is still recommended, mainly for speed of development and the size of the talent pool. For back-end work such as behavior analysis, banking interfaces, or asynchronous message processing, choose whatever language fits each particular business need.

II. Code version management

Assuming SVN is chosen, there are a few things to decide. One is the tree structure. At the start there may be only a trunk, but branches will be needed later: a development branch, a live branch, and eventually perhaps one branch per team. While the team is small, two branches are suggested, development and online: commit each feature to the development branch once it passes local testing, then, after integrated testing, merge to the online branch for release. If someone likes to use SVN as a portable hard drive, committing after every few lines, the merge headaches grow; such people can create their own branch, or even a local repository, commit there, and merge into the development branch after testing.

Deployment can be manual or automatic. Manual deployment is relatively simple: update the working copy on the server with svn up, or check out into a new directory and point the web root at it with ln -s. The more complex the application, the more complex the deployment, and there is no single standard; just do not deploy by uploading files over FTP. First, inconsistent file references raise the error rate; second, the developer's version easily drifts from the online version, and what started as fixing a typo turns into a disastrous rollback. With multiple servers, automatic deployment is recommended: take the server receiving new code out of the service pool temporarily and add it back after the update.
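As an illustration of the symlink-style switch, here is a minimal deploy helper in PHP. The release layout, the paths, and the assumption that the document root is already a symlink are all illustrative choices, not a prescription:

```php
<?php
// Minimal sketch of an atomic symlink switch for deployment.
// Assumes releases are checked out under /var/www/releases/<name> and the
// web server's document root is already a symlink at /var/www/current.

function activate_release(string $release): void
{
    $target  = "/var/www/releases/$release";  // new checkout from SVN
    $current = "/var/www/current";            // web root symlink
    $tmp     = $current . ".tmp";

    if (!is_dir($target)) {
        throw new RuntimeException("release not found: $target");
    }

    @unlink($tmp);              // remove a stale temp link if one exists
    symlink($target, $tmp);     // build the new link next to the old one
    rename($tmp, $current);     // atomic swap: readers never see a gap
}

activate_release('r1234');
```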

No matter how small the project, get into the habit of using version control; at the very least it doubles as a backup. My http://zhiyi.us is just a WordPress site, yet it still lives in SVN; even a change to one or two lines of CSS is the fruit of labor.

III. Server hardware

Don't envy the big, rich customers; look at the retail corner of the data center, where a single lonely server supports countless websites. If there is a bit more money, at least three machines in a standard configuration are recommended: web, database, and backup. The web server needs at least 8 GB of memory and two SATA disks in RAID 1; if money is looser, or there are many static files or images, give it 15K RPM SAS disks in RAID 1+0. The database server needs at least 16 GB of memory and 15K SAS in RAID 1+0. The backup server should match the database server. For hardware you can buy a barebones system, a chassis with motherboard and drive cage, and add your own CPU, memory and disks; buy a complete brand-name machine; or build a compatible white-box machine. Three machines come to roughly sixty to seventy thousand at market prices.

The web server runs the application plus an in-memory cache; the database server runs only the primary database (if it is MySQL); the backup server does rather more: its web, cache and database configurations mirror the other two, so if either the web or the database server has a problem, switch the IP over to the backup. The backup strategy can be DRBD, rsync, or any of many other open source solutions; rsync is the easiest, just put it in cron and let it run. Test the backup and the switchover repeatedly, pick whatever is safest and best suited to the business, and keep an off-site backup whenever possible.

IV. Data center

Three kinds of data center to avoid: a telecom room where Unicom access is painfully slow, a Unicom room where telecom access is painfully slow, and a Mobile or Tietong room where both are painfully slow. What about Netcom rooms? Dear reader, Netcom merged into Unicom and took its name long ago. Visit in person, look at several, and test a lot. In hub cities such as Beijing, Shanghai and Guangzhou there are still plenty of high-quality data centers; find one with good network quality and strict management, especially strict management. You do not want to learn, only after your site drops and you phone in, that someone else's maintenance knocked your cable loose; that is a bigger headache than a DoS attack. As for an outfit that pulls a few strands of fiber and calls itself a data center, that depends on your appetite for risk and your nerves. The data center matters enormously: it directly determines access speed, and access speed directly determines user experience. People may climb over the wall to look at the scenery outside, but nobody will buy a gaming VPN just to open your not-yet-famous website. Your site's Ajax may be brilliant, but if the document never finishes loading, some of that code never runs for the user.

V. Architecture

The initial architecture is usually simple: web load balancing + database master/slave + cache + distributed storage + queues. In broad strokes that really is all there is, and the details have been repeated in countless articles. Later there will be N more web servers, N more master-slave pairs, N more caches, N more designs for this and that. The basic scheme is off the shelf; the difference is that you handle better than everyone else the avalanche effect when caches expire, data consistency and the lag of master-slave replication, queue reliability and the retry strategy after failures, file storage efficiency and backup methods, and so on. Caches will fail some day, replication will break some day, queue writes will fail, power supplies will burn out. By Murphy's law, if you do not plan for these, the site will sooner or later become, as the joke goes, a coffee table covered in cups (read: tragedies).

VI. Server software

Linux, Nginx, PHP and MySQL are pretty much the standard stack; with the names settled, the versions still have to be chosen. There are many Linux distributions; absent special requirements, pick the one with the most users, the most active community, the easiest configuration, and the most complete and current packages, such as Debian or Ubuntu. As for RHEL and the like: are you actually running software that only runs on RHEL? For the rest, Nginx, PHP, MySQL, ActiveMQ and so on, unless you patch their source or your program genuinely cannot run on the new version, stay as current as you can. Newer versions mean more features, fewer bugs and better performance. There will always be someone repeating hearsay that the old version is stable. So-called stability is relative to a particular workload; on a website written in PHP, most people have never changed a line of the server software's source code, and in the vast majority of cases the upgrade goes smoothly. Breaking upgrades like JDK 5 to JDK 6 or Python 2 to Python 3 are rare. Read the changelog and the upgrade notes, weigh your own situation, and upgrade sooner rather than later, instead of still strolling along on PHP 4 while others write against PHP 6. Good open source projects take upgrades seriously and document them well; don't be afraid.

With these six points in place we have the runtime environment, the basic skeleton of the architecture, and the backup and failover scheme; it is time to move on to design and development. That side of things is endless, so the following sections cover a few highlights first.

VII. Database

Almost every operation ends up at the database, which is the hardest part to scale (and to store). For MySQL, decide before development which tables will use MyISAM and which will use InnoDB.

The replication strategy and the sharding strategy should also be settled up front. In general, MyISAM will do for tables that are rarely updated and need no transactions; InnoDB is for tables that need row-level locking and transaction support.

MyISAM's table locks are not necessarily the cause of poor performance, and InnoDB does not always take only row locks; read the relevant documentation for the details.

Modern web applications keep getting more complex, and we often design table structures with plenty of redundancy. That breaks the traditional normal forms, but it is worth it for speed; where the requirements are high you may even eliminate joins altogether.

Pay extra attention to data consistency in the application code. On the replication side, it is best to design a multi-master, multi-slave topology from the start and write the code against it from day one, with a few tricks to sidestep the replication-lag problem.
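One common trick is to pin a user's reads to the master for a few seconds after that user writes, so replication lag never shows them their own stale data. A minimal PHP sketch, in which the hosts, credentials and session flag are all assumptions:

```php
<?php
// Sketch: route a session's reads to the master briefly after it writes,
// so replication lag never returns stale data to the writer.
// Hostnames, credentials and the session-based flag are assumptions.

session_start();

const READ_FROM_MASTER_SECONDS = 3;   // should exceed typical replication lag

function master(): PDO { return new PDO('mysql:host=db-master;dbname=app', 'app', 'secret'); }
function slave(): PDO  { return new PDO('mysql:host=db-slave;dbname=app',  'app', 'secret'); }

function db_for_write(): PDO
{
    // Remember that this session just wrote; its next reads go to the master.
    $_SESSION['read_master_until'] = time() + READ_FROM_MASTER_SECONDS;
    return master();
}

function db_for_read(): PDO
{
    $recentlyWrote = time() < ($_SESSION['read_master_until'] ?? 0);
    return $recentlyWrote ? master() : slave();
}
```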

You also need to check data consistency across the databases; write your own operations tools for this or find ready-made ones.

Sharding strategy: a handful of tables will always grow huge, so sharding is unavoidable. There are many strategies, from simple partitioning to automatic rebalancing by hotness; choose one that suits your particular business.

Avoid auto-increment IDs as primary keys; they do not shard well.
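A minimal sketch of hash-based shard routing in PHP; the shard count, the key, and the DSN naming scheme are assumptions for illustration:

```php
<?php
// Sketch: route a record to a shard by hashing a natural key (e.g. the
// account name) instead of relying on a global auto-increment ID.
// SHARD_COUNT and the host naming scheme are illustrative assumptions.

const SHARD_COUNT = 16;

function shard_index(string $key): int
{
    // crc32 is cheap and spreads keys evenly enough for routing;
    // mask to keep the value non-negative on every platform.
    return (crc32($key) & 0x7fffffff) % SHARD_COUNT;
}

function shard_dsn(string $key): string
{
    $i = shard_index($key);
    return "mysql:host=db-shard-$i.internal;dbname=app";
}

// The same key always lands on the same shard.
echo shard_dsn('alice@example.com'), "\n";
```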

Stored procedures are hard to scale out. Leaning on them is common in traditional client/server systems, especially among developers who came over from OA systems.

A low-cost website is not one or two minicomputers running a database that handles all the business; it is a sea of cheap machines. Convenient horizontal scaling matters far more than the bit of parsing time and network traffic that stored procedures save.

NoSQL. It is only a concept, but in practice websites face ever more write-heavy workloads, hundreds of millions of rows of simple relational data to read, hot standby, and so on, none of which traditional relational databases are good at.

This has given rise to a crop of non-relational databases such as Redis, Tokyo Cabinet/Tokyo Tyrant, MongoDB and MemcacheDB, which in tests almost all manage at least 10,000 writes per second;

the pure in-memory ones exceed 50,000. MongoDB, for example, can be set up with replication + automatic sharding + failover in just a few configuration steps, and its document storage also simplifies the traditional cycle of designing the schema and then reworking it.

For many kinds of business such a database can replace MySQL.

VIII. Cache

The database is fragile, so there must be a cache in front of it to take the blows. In fact, when we optimize for speed we are almost always optimizing the cache: wherever the cache can answer, do not go running to the backend database.

Caches come in persistent and in-memory flavors. Generating static pages is the easiest persistent cache to understand; there are many others, such as Varnish's block cache and the MemcacheDB mentioned earlier.

For in-memory caches, memcached and Redis come first. Cache updates can be passive or active. The nice thing about passive updating is that it is easy to design: on a cache miss the application automatically goes to the database and fills the cache,

but it invites the avalanche effect: if a large part of the cache expires at once, database load spikes and the database is likely to go down. Active cache updating avoids that, but the application may find data missing that has not been pushed into the cache yet.

How the two should cooperate is something the program design needs to think through; a sketch of the passive side follows.
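A minimal passive-fill example with memcached in PHP. The randomized TTL is one simple way to keep a whole class of keys from expiring at the same moment; the key name, TTL values and the stand-in loader are assumptions:

```php
<?php
// Sketch: passive cache fill with a jittered TTL to soften the avalanche
// effect of many keys expiring at once. Key names and TTLs are assumptions.

$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

function cached_get(Memcached $mc, string $key, callable $loadFromDb): mixed
{
    $value = $mc->get($key);
    if ($value !== false) {
        return $value;                    // cache hit
    }
    $value = $loadFromDb();               // miss: go to the database once
    $ttl = 600 + random_int(0, 120);      // ~10 minutes plus jitter, not a fixed expiry
    $mc->set($key, $value, $ttl);
    return $value;
}

$profile = cached_get($mc, 'user:42:profile', function () {
    // stand-in for the real database query
    return ['id' => 42, 'name' => 'example'];
});
```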

IX. Queues

A single user action can trigger a whole chain of resource and function calls. If they all happen synchronously, the load is uncontrollable and the user experience suffers. Such operations can be put into a queue

and executed asynchronously by other modules, for example sending emails or SMS messages. There are plenty of open source queue servers; if the performance demands are modest you can even use a database as the queue,

and as long as the program's interface for reading and writing the queue stays unchanged, the underlying queue service can be swapped at any time, much like the Zend_Queue class in Zend Framework or the java.util.Queue interface.
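The point about keeping the read/write interface stable can be sketched in PHP like this; the interface, class names and the jobs table are assumptions, not Zend_Queue itself:

```php
<?php
// Sketch: a stable queue interface so the backend (a database table today,
// a real queue server tomorrow) can be swapped without touching callers.
// The table layout and class names are assumptions for illustration.

interface JobQueue
{
    public function push(string $payload): void;
    public function pop(): ?string;        // null when the queue is empty
}

class DatabaseQueue implements JobQueue
{
    public function __construct(private PDO $pdo) {}

    public function push(string $payload): void
    {
        $this->pdo->prepare('INSERT INTO jobs (payload) VALUES (?)')
                  ->execute([$payload]);
    }

    public function pop(): ?string
    {
        $this->pdo->beginTransaction();
        $row = $this->pdo->query(
            'SELECT id, payload FROM jobs ORDER BY id LIMIT 1 FOR UPDATE'
        )->fetch(PDO::FETCH_ASSOC);
        if ($row) {
            $this->pdo->prepare('DELETE FROM jobs WHERE id = ?')->execute([$row['id']]);
        }
        $this->pdo->commit();
        return $row ? $row['payload'] : null;
    }
}

// Callers only ever see JobQueue, e.g. $queue->push(json_encode($emailJob));
```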

X. File storage

Besides structured data we often have to store other things, images above all. This kind of data is plentiful and heavily accessed.

Typically images, from user avatars to uploaded photos, are generated in several thumbnail sizes, and distributing their storage is almost as hard as scaling the database.

Without professional storage hardware, most sites basically rely on NAS they assemble themselves, and that is where structure matters. Image access, for instance, is extremely uneven:

some images are uploaded and never looked at again, others are hit hundreds of thousands of times a day, and asynchronously backing up masses of small files is slow. To leave room for an image CDN later, it is best to split the image domain off from the main domain from the very beginning. Many websites set their cookies on .domain.tld,

and if the images live under the same domain, caching can be defeated by those cookies and the extra traffic, and access can be slowed by the browser's per-domain connection limit. A simple approach is to store images on an ordinary file system: hash the file, say with MD5, and use the first character of the result as the first-level directory, which gives 16 directories at level one,

0 through f. That character can also go into the hostname, 0.yourimg.com through f.yourimg.com (at the cost of extra client DNS lookups), which allows expansion to as many as 16 NAS clusters.

The second level can be year plus month, e.g. 201011; the third level the day; and a fourth level is optional depending on upload volume, say am/pm or even the hour. A path then looks like e/201008/25/am/e43ae391c839d82801920cf.jpg. With rsync you can script synchronization of just a given day or month and avoid the overhead of walking an enormous number of files.
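A minimal PHP sketch of that directory scheme; the yourimg.com domain and the am/pm granularity simply follow the example values above:

```php
<?php
// Sketch of the hash + date directory layout described above.
// The yourimg.com suffix and am/pm granularity follow the example in the text.

function image_path(string $localFile): array
{
    $hash   = md5_file($localFile);            // e.g. e43ae391c839d82801920cf...
    $level1 = $hash[0];                        // 0-f: 16 top-level directories
    $dir    = sprintf('%s/%s/%s/%s',
        $level1,
        date('Ym'),                            // second level: year + month
        date('d'),                             // third level: day
        date('a')                              // optional fourth level: am/pm
    );
    return [
        'host' => "$level1.yourimg.com",       // one subdomain per top-level dir
        'path' => "$dir/$hash.jpg",
    ];
}

// Example: print_r(image_path('/tmp/upload.jpg'));
```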

Ideally, you can use a dedicated distributed file system or a more specialized storage solution.