Translation: Zou Yongsheng



There is a key piece of magic in the engineering of the Internet that you rely on every day. It happens in TCP, one of the basic building blocks of the Internet.

TCP is a reliable transport-layer protocol. This means that when you send data using TCP, it arrives, and it does not arrive garbled or corrupted. We use TCP for many everyday things, such as browsing the Web and sending email. Thanks to the reliability of TCP, every message arrives intact, even if it is only junk mail.

By comparison, there is another way of transmitting data, called IP, which is unreliable. Nobody promises that the data you send will arrive, and it may get scrambled on the way. If you send a batch of data with IP, do not be surprised if only half of it arrives, and even the part that arrives is not guaranteed to be correct.

But here is the magic: TCP is built on top of IP. In other words, TCP is obliged to deliver data reliably using nothing but an unreliable tool.

To see why this is remarkable, consider an equivalent, if somewhat ludicrous, scenario from the real world. Imagine that we are transporting actors from Broadway to Hollywood by car, driving them across several states. Some of the cars crash along the way, and the poor actors in them die. Some actors get so drunk on the trip that they shave their heads or get tattoos on their noses, and arrive looking so different that Hollywood will no longer take them. And because each car takes a different route, the actors do not arrive in the order in which they set out.

Now suppose there were a service called Hollywood Express that delivered actors to Hollywood and guaranteed that they would arrive safely, in perfect condition, and in the order in which they departed. The magic part is that Hollywood Express has no way of delivering actors other than the unreliable one: loading them into cars and driving them across several states. Hollywood Express works by checking that each actor arrives in one piece and, if there is a problem, asking headquarters to send a duplicate of the damaged actor.
If the actors arrive in the wrong order, Hollywood Express puts them back in order. If a UFO crash damages the roads in Nevada, the cars are rerouted through Arizona. When the actors arrive, Hollywood Express does not tell the directors in California what happened along the way. As far as the directors can tell, their actors simply arrived a little later than usual, and they never hear about the UFO crash.

That, roughly, is the magic of TCP. It is what computer scientists like to call an abstraction: a simplification that hides something much more complicated going on underneath. It turns out that a lot of computer programming consists of building abstractions. What is a string library? It is a way of making it as easy for a program to manipulate strings as it is to manipulate numbers. What is a file system? It is a way of pretending that a hard drive is not a stack of spinning platters that store bits at particular locations, but a hierarchy of folders within folders containing individual files, each of which consists of one or more bytes.

Back to TCP. Earlier, for the sake of simplicity, I told a little lie, and some of you are probably fuming by now. I said that TCP guarantees that your message will arrive. It does not. If your pet snake has chewed through the network cable to your computer, no IP packets can get through, so there is nothing TCP can do and your message will not arrive. Likewise, if you have annoyed the company’s system administrator and been punished by being plugged into an overloaded hub, only some of your IP packets will get through. TCP will still work, but everything will be very slow.

This is what I call a *leaky abstraction*. TCP tries to provide a complete abstraction of a reliable connection over an unreliable network, but sometimes the network leaks through, and you notice the things the abstraction cannot quite protect you from.
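To make the Hollywood Express idea concrete, here is a minimal C++ sketch of reliable, in-order delivery layered on a link that drops and reorders packets. It is a toy and not how TCP is actually implemented: `Packet`, `UnreliableLink`, and `send_reliably` are hypothetical names, each packet carries a single character, and the sender is assumed to learn which packets arrived (real TCP has to send acknowledgements over the same unreliable network).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <random>
#include <string>
#include <vector>

// One numbered chunk of the message, standing in for an IP packet.
struct Packet {
    std::size_t seq;
    char payload;
};

// Hypothetical unreliable link: drops each packet with probability
// `loss` and shuffles the survivors so they arrive out of order.
class UnreliableLink {
public:
    UnreliableLink(double loss, unsigned seed) : loss_(loss), rng_(seed) {
        assert(loss >= 0.0 && loss <= 1.0);
    }

    std::vector<Packet> transmit(const std::vector<Packet>& batch) {
        std::bernoulli_distribution dropped(loss_);
        std::vector<Packet> arrived;
        for (const Packet& p : batch) {
            if (!dropped(rng_)) arrived.push_back(p);
        }
        std::shuffle(arrived.begin(), arrived.end(), rng_);
        return arrived;
    }

private:
    double loss_;
    std::mt19937 rng_;
};

// Sends `message` over `link`, resending whatever has not arrived yet.
// On success, stores the reassembled message in `out` and returns true.
// Returns false if `max_rounds` attempts were not enough, which is the
// point where the abstraction leaks.
bool send_reliably(const std::string& message, UnreliableLink& link,
                   int max_rounds, std::string& out) {
    std::map<std::size_t, char> received;  // keyed by seq, so kept in order
    for (int round = 0; round < max_rounds; ++round) {
        std::vector<Packet> pending;
        for (std::size_t i = 0; i < message.size(); ++i) {
            if (received.count(i) == 0) pending.push_back({i, message[i]});
        }
        if (pending.empty()) break;  // everything has arrived
        for (const Packet& p : link.transmit(pending)) {
            received[p.seq] = p.payload;
        }
    }
    if (received.size() != message.size()) return false;

    out.clear();
    for (const auto& entry : received) out.push_back(entry.second);
    return true;
}
```

With a link that loses half its packets, the caller only notices that delivery took more rounds. With a link that loses everything (the snake and the cable), `send_reliably` gives up and returns false, and the caller has to deal with the network after all.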
TCP is just one example of what I call the Law of Leaky Abstractions:

> All non-trivial abstractions, to some degree, are leaky.

Abstractions fail. Sometimes a little, sometimes a lot. There is leakage. Things go wrong. It happens wherever you have abstractions. Here are some examples.

- Something as simple as iterating over a large two-dimensional array can perform radically differently depending on whether you traverse it row by row or column by column, because one direction may cause far more page faults than the other. Programmers like to pretend that they have one big flat address space, but that is an abstraction over physical memory, and it leaks when a page fault makes some memory accesses take much longer than others.

- SQL is meant to abstract away the procedural steps needed to query a database, so that you state only what you want and let the database work out how to get it. But in some cases, certain SQL queries are thousands of times slower than other, logically equivalent queries. A famous example is that some SQL servers are much faster if you specify “where a=b and b=c and a=c” than if you specify only “where a=b and b=c”, even though the result set is the same. You are not supposed to have to care about the procedure, only the specification. But sometimes the abstraction leaks and causes terrible performance, and you have to break out the query plan analyzer, study what it is doing wrong, and work out how to make the query run faster.

- Even though network libraries like NFS and SMB let you treat files on remote machines “as if they were local,” sometimes the connection becomes very slow or goes down, the files stop behaving as if they were local, and as a programmer you have to write code to handle this. The “remote file is the same as a local file” abstraction leaks. Here is a concrete example for Unix system administrators.
If you put users’ home directories on NFS-mounted drives (one abstraction), and a user creates a .forward file to forward all of their email somewhere else (another abstraction), and the NFS server is down when new mail arrives, the mail will not be forwarded, because the .forward file cannot be found. The leak in the abstraction actually causes some messages to be discarded.

- C++ string classes are supposed to let you pretend that strings are first-class data. They try to abstract away the fact that strings are hard and make them as easy to use as integers. Almost all C++ string classes overload the + operator, so you can write s + "bar" to concatenate. But you know what? No matter how hard they try, there is no C++ string class on Earth that lets you write "foo" + "bar", because string literals in C++ are always treated as char* pointers, never as strings. The abstraction has sprung a leak that the language does not let you plug. (Amusingly, the history of the evolution of C++ can be described as a history of trying to plug the leaks in the string abstraction. Why they could not simply add a native string class to the language is beyond me.)

- You cannot drive as fast when it is raining, even though your car has windshield wipers, headlights, a roof, and a heater, all of which are there to shield you from the fact that it is raining. You still have to worry about hydroplaning (aquaplaning, as it is called in England), and sometimes the rain is so heavy that you cannot see far ahead, so you slow down. The weather can never be completely abstracted away, because of the Law of Leaky Abstractions.

One reason the Law of Leaky Abstractions is a problem is that it means abstractions do not simplify our lives as much as they were meant to. When I train someone to be a C++ programmer, it would be nice if I never had to teach them about char* and pointer arithmetic and could go straight to STL strings. But one day they will write "foo" + "bar", truly strange things will happen, and I will have to stop and teach them all about char* anyway.
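The string-literal leak is easy to reproduce. The sketch below uses `std::string` as the string class, and all of the function names are made up for illustration. With a standard compiler, "foo" + "bar" is rejected outright. A close relative, a literal plus an integer, compiles and quietly does pointer arithmetic, which is one way the "strange things" show up.

```cpp
#include <cassert>
#include <string>

// Fine: the left operand is a std::string, so operator+ is overloaded.
std::string join_with_string() {
    std::string s = "foo";
    return s + "bar";
}

// The usual workaround: turn one literal into a std::string first.
std::string join_literals() {
    return std::string("foo") + "bar";
}

// Pointer arithmetic in disguise: adding 1 to a char pointer skips the
// first character instead of appending anything.
std::string literal_plus_one() {
    const char* literal = "foo";
    return std::string(literal + 1);
}

// This does not compile, because both operands are character arrays that
// decay to pointers and the language has no operator+ for two pointers:
//     std::string broken = "foo" + "bar";

// Hypothetical self-check for the helpers above.
void check_string_examples() {
    assert(join_with_string() == "foobar");
    assert(join_literals() == "foobar");
    assert(literal_plus_one() == "oo");
}
```

Explaining why the third function returns "oo" rather than "foo1" is not possible without talking about pointers, which is exactly the detail the string class was supposed to hide.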
Likewise, one day they will be unable to call a Windows API function documented as taking an OUT LPTSTR argument until they have learned about char*, pointers, Unicode, wchar_t, and the TCHAR header files.

When teaching someone COM programming, it would be nice if I could just show them the Visual Studio wizards and all the code-generation features. But if anything goes wrong, they will have no idea what happened or how to debug and fix it. So I have to teach them all about IUnknown, CLSIDs, ProgIDs, and so on. There are so many details!

When teaching ASP.NET programming, it would be nice if I could just tell people that they can double-click on a control and then write code that runs on the server when the user clicks it. Indeed, ASP.NET abstracts away the difference between the HTML needed to handle a click on a hyperlink and the HTML needed to handle a click on a button. The problem is that the ASP.NET designers needed to hide the fact that in HTML a form cannot be submitted from a hyperlink. They do this by generating a few lines of JavaScript and attaching them to the hyperlink. The abstraction leaks, though: ASP.NET applications will not work correctly if the end user has JavaScript disabled, and a programmer who does not understand what ASP.NET is abstracting away will have no idea what is wrong.

The Law of Leaky Abstractions means that whenever somebody comes up with a dazzling new code-generation tool that is supposed to make us all more efficient, you hear a lot of people saying, “First learn how to do it manually, then use the automated tool to save time.” Code-generation tools are attempts to abstract some work away, and like all abstractions they leak, so you still need to understand what they hide. They save us time working, but they do not save us time learning.

Paradoxically, all this means that even as programming tools become more sophisticated and more abstract, becoming a proficient programmer is getting harder.

During my first internship at Microsoft, I wrote string library routines to run on the Macintosh. A typical assignment: write a version of strcat that returns a pointer to the end of the new string. A few lines of C code.
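For readers who have not written this kind of routine, here is one way such a function might look, written as C-style C++. This is a sketch and not the code from that internship. The name `strcat_end` is made up, and as with the standard strcat the caller is assumed to have allocated enough room in the destination buffer.

```cpp
#include <cassert>

// Appends `src` to the NUL-terminated string in `dest` and returns a
// pointer to the new terminating NUL, so the next append can start there
// instead of rescanning the whole string from the beginning.
char* strcat_end(char* dest, const char* src) {
    assert(dest != nullptr && src != nullptr);
    while (*dest != '\0') {           // find the current end of dest
        ++dest;
    }
    while ((*dest = *src) != '\0') {  // copy src, including its NUL
        ++dest;
        ++src;
    }
    return dest;                      // points at the terminator
}
```

Returning the end pointer lets a caller chain appends, as in `end = strcat_end(end, "more")`, without walking the string again each time, which the standard strcat cannot do because it returns the start of the destination.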
Everything I needed for that assignment came straight from K&R, a very thin book about the C programming language.

Today, to work on CityDesk, I need to know Visual Basic, COM, ATL, C++, InnoSetup, Internet Explorer internals, regular expressions, DOM, HTML, CSS, and XML.

A decade ago, we might have imagined that new programming paradigms would have made programming easier by now. Indeed, the abstractions we have created over the years do let us handle kinds of complexity in software development that we did not have to deal with 10 or 15 years ago, like GUI programming and network programming. And these great tools, like modern object-oriented languages, let us get a lot of work done very quickly. But then one day we run into a problem caused by a leak in the abstraction, and it takes two weeks to sort out. And when you need to hire a programmer to work mostly in VB, hiring someone who knows only VB is not good enough, because they will be completely stuck every time the VB abstraction leaks.

The Law of Leaky Abstractions is dragging us down.

This article is translated from [The Law of Leaky Abstractions](https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-abstractions/) by Joel Spolsky.

