Object-oriented programming, design pattern programming, interface-oriented programming, template programming (i.e., generic programming), functional programming, parallel programming for the multi-core era, machine learning programming for big data… You’ve had plenty to face over the years, and what I’ve seen is that many programming languages make you face XXX while trying to avoid pointers. I didn’t want to face so many things, so I joined the dark side of the pointer. I’m writing this article on pointer-oriented programming as my pledge of allegiance, marking my complete break with the forces of light in the software world.

There are very few programming languages in the world that provide pointers, such as assembly language, C/C++, and Pascal. I never learned Pascal. Assembly language is too dark, and I’m not good enough to handle it yet. C++, I think, is the scum of the dark side: it tries to get away from the pointer and into the light, but it does a lot of crap along the way. So I’ll stick with plain old C to illustrate the dark power of pointers.

Before you read on, recite three times what the Unix gurus said: when the venerable Ritchie invented C, he condemned programmers to a hell of buffer overflows, heap corruption, and rotten pointer bugs. Then console yourself with the thought: if hell cannot bend me, then I am darker and stronger than hell.

What are pointers?

Memory is a large but often insufficient space measured in bytes. A pointer is data stored in x consecutive bytes of memory: on 32-bit machines x is 4, and on 64-bit machines x is 8. For the sake of simplicity, this article talks only about pointers on 64-bit machines.

A pointer is a piece of data, which is nothing new. From the machine’s point of view, everything in a program is data stored in an array. Only a sentimental programmer would think, as Aristotle did, that programs are composed of objects plus methods, or of many functions. In fact, even from the point of view of Lisp, the language most remote from the machine, everything in a program is also data, stored in lists. If we ignore the fact that programs are data, it’s easy for code monkeys to go metaphysical, and they’ll spend a long, sinful, miserable Middle Ages worshipping one stick after another, with a few St. Augustines appearing in between.

So what data is stored in a pointer? A memory address.

Memory is a space measured in bytes, and each byte comes with an address assigned by the machine, not by our programs. You can think of the entire memory space as a building, each byte as a room in the building, and the address of each byte as the room number. The data stored in a pointer is then like a room number.
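
To make the room-number picture concrete, here is a minimal sketch that measures how many rooms apart two addresses are. The helper name byte_distance is my own invention for illustration, and the sketch assumes both addresses lie inside the same array, which is the only case where C defines the subtraction.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes ("rooms") from one address to another.
 * Assumption: both addresses point into the same array. */
ptrdiff_t byte_distance(const void *from, const void *to)
{
        assert(from != NULL && to != NULL);
        return (const char *)to - (const char *)from;
}

void demo_rooms(void)
{
        char rooms[4] = {0};
        /* neighbouring bytes have neighbouring room numbers */
        assert(byte_distance(&rooms[0], &rooms[1]) == 1);
        assert(byte_distance(&rooms[0], &rooms[3]) == 3);
}
```

Calling demo_rooms() simply checks that consecutive bytes of an array have consecutive addresses.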

If you’ve never learned C, you’re probably wondering at this point: why would we store memory addresses in memory? Well, have you ever stayed in a hotel? In a proper hotel, an escape route map is posted behind the door of each room, and it “stores” the numbers and layout of all the rooms on your floor. If you never look at the escape route map when you stay in a hotel, make it the first thing you do after you check in. It could save your life. Storing memory addresses in memory may not save your life, but it lets you construct an abstraction similar to a hotel escape route map: the abstraction and composition of data in memory.

Named and unnamed memory space

Now look at two lines of C code:

int foo = 10;
int *bar = &foo;

What is foo? foo represents a memory address. The int before foo is a type specifier that says foo is the address of the first of four consecutive bytes in memory. (On 64-bit machines, int is typically four bytes long.) The C compiler always determines the logical meaning of the data stored in the contiguous bytes starting at a memory address based on the type associated with that address. Therefore, when we declare foo as an int, the compiler treats the data stored in the four consecutive bytes starting at foo as an integer. In the code above, that integer is 10, which the assignment operator = stores in the four consecutive bytes starting at address foo.

From now on, remember that all variable names in C are essentially memory addresses. Instead of using a raw memory address, we use a name that makes sense. This is similar to how no one calls you by your ID number; they call you by your name.

C considers that the length of data is determined by its type. For example, data of type int is 4 bytes long, data of type char is 1 byte long, and the length of a user-defined struct depends on its definition. Thus every name that represents a memory address is essentially the starting address (or, more technically, the base address) of the storage space that some typed piece of data occupies in memory. Any memory space whose base address is represented by a name is called a named memory space.

Now, what is bar? bar is also a name for a memory address. Because bar is preceded by a *, we intend to store a memory address in the eight consecutive bytes starting at base address bar (remember, we’re on a 64-bit machine, where pointer data is eight bytes long): the address represented by foo, i.e. &foo. Here & is the address-of operator, and it says to foo: stop playing games with me and give me your ID number! The * is preceded by int, which means that the memory address stored in the eight consecutive bytes starting at bar is the base address of some memory space used to store integer data.

Since bar is the base address of a memory space that stores a memory address, bar is called a pointer. In this case, we can think of bar as a “reference” to the block of memory whose base address is foo, i.e. room number foo is stored in the room whose number is bar. Everything is a reference to memory space. In the above example, foo refers to a memory space directly, and bar refers to a memory space indirectly.

In the above example, bar refers to a named memory space. Are there any unnamed memory spaces? Look at the following code:

int *bar = malloc(sizeof(int));

malloc(sizeof(int)) is an unnamed memory space. It is an expression, and an expression describes a behavior, which calls for a verb rather than a noun. For example, “I’m writing an article”: you can’t describe this behavior with a noun alone; you have to use a verb. But any action that terminates can be expressed as a series of state changes, so any terminating action has a result, and that result can be described by a noun. malloc(sizeof(int)) terminates, and its result is the base address of 4 bytes of newly allocated memory. This base address has no name, so the corresponding memory space is an unnamed memory space. If we want to access this space, we have to give it a name, and when we refer to its base address with the pointer bar, it becomes named.
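
As a small sketch of this idea, the function below allocates an unnamed space for one int and hands back its base address, so that the caller can give it a name. The name make_int is hypothetical, and the NULL check and the call to free are additions that the one-line example above leaves out.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate unnamed heap space for one int and store value in it.
 * Returns NULL if malloc fails; the caller must free the result. */
int *make_int(int value)
{
        int *space = malloc(sizeof *space);
        if (space != NULL)
                *space = value;
        return space;
}

void demo_unnamed_space(void)
{
        int *bar = make_int(10);   /* the unnamed space is now reachable as *bar */
        assert(bar != NULL);
        assert(*bar == 10);
        free(bar);                 /* unnamed space is never reclaimed automatically */
}
```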

Dennis Ritchie, the creator of C, and Brian Kernighan, his co-author on The C Programming Language, called a named region of storage an object (not the object of “object-oriented programming”), and called an expression that refers to such an object an lvalue. That is, in C, foo and bar in the above example are lvalues because they can appear on the left side of the assignment operator.

Look at the following code:

int foo = 10;
int *bar = &foo;
printf("%d", *bar);

The *bar in the printf statement on the third line is also an lvalue, because it refers to a named storage space whose name is *bar. That storage space is in fact the one whose base address is foo. In the expression *bar, the * dereferences the pointer: it takes the memory address stored in the memory space based at bar and then accesses the memory space at that address. Since *bar is of type int, the program knows that it is accessing the four bytes starting at that address, so it can accurately extract the integer 10 and hand it to printf to display.
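
Because *bar is an lvalue, it can also sit on the left of an assignment. The sketch below is only an illustration of that point; store_through is a name I made up for it.

```c
#include <assert.h>

/* Store value into the space whose base address is held in target. */
void store_through(int *target, int value)
{
        *target = value;        /* *target is an lvalue, so it may appear left of = */
}

void demo_lvalue(void)
{
        int foo = 10;
        int *bar = &foo;
        store_through(bar, 20);
        assert(foo == 20);      /* foo changed although its name was never used */
        assert(*bar == foo);    /* both expressions refer to the same four bytes */
}
```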

The darkest part of pointers is that once you have the base address of a memory space, you can use it to reach any area of memory you want! That is, a pointer gives you access to memory space, and then you can let your program wander through memory (the stack, for instance) and trample on it, and then your program might crash. If your program contains a buffer overflow vulnerability, it can even be hijacked by other programs into executing code that is very bad for your system. This is called a buffer overflow attack. The C language does not provide any buffer protection mechanism; effective buffer protection depends on your C programming skills.
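
Since the protection has to come from the programmer, here is one possible sketch of a copy routine that refuses to walk past the end of its buffer. copy_bounded is a hypothetical helper of mine, not a standard library function; it relies only on strlen and memcpy from string.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy src into dst without writing more than dst_size bytes.
 * The result is always NUL-terminated when dst_size > 0.
 * Returns the number of characters copied, not counting the terminator. */
size_t copy_bounded(char *dst, size_t dst_size, const char *src)
{
        assert(dst != NULL && src != NULL);
        if (dst_size == 0)
                return 0;
        size_t n = strlen(src);
        if (n > dst_size - 1)
                n = dst_size - 1;       /* truncate instead of overflowing */
        memcpy(dst, src, n);
        dst[n] = '\0';
        return n;
}
```

With a four-byte buffer, copying the string "pointer" stores only "poi" plus the terminator, and the bytes after the buffer are left alone.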

That said, when we write C programs we don’t have to worry too much about buffer overflow attacks, because only widely used C programs are at real risk. And if, unfortunately, your C program is used by a lot of people, still don’t worry too much. Section 3.12 of Computer Systems: A Programmer’s Perspective, “Out-of-Bounds Memory References and Buffer Overflow,” tells us that modern operating systems randomize the location of the stack each time a program runs, making it difficult for an attacker to predict an address in the stack, at least on Linux. C compilers provide stack corruption detection, at least GCC does, based on the idea that the program places a canary value in its stack frame and terminates when it detects that the canary has been modified. The processor also restricts which memory regions may hold executable code, making it harder for an attacker to run code inserted into the program’s stack.

The stack and the heap

If I were to say that C is a language that partially supports garbage collection… you might think I’m out of my mind. In fact, all local variables in C, including pointers, are “reclaimed” when they go out of scope. Doesn’t that count as garbage collection?

From a C program’s point of view, memory is not one large but often insufficient space measured in bytes, but two. One of these spaces is called the stack and the other is called the heap. The storage space that a C program can “reclaim” is stack space. That is, all local variables in a function occupy storage space on the stack. Perhaps more technically, all lvalues are in the stack space (I’m not sure that’s true at all).

When a function finishes running, the stack space it occupied no longer belongs to it and will be taken over by the next function to run. So, in essence, a C program doesn’t collect stack space at all; the old data is simply overwritten by new data.

Heap space cannot be accessed directly by name in a program; we can only reach it through pointers, because a pointer can hold the memory address of heap space. For example, when malloc allocates space, the base address of the allocated space is typically stored in a pointer that lives in stack space.

The stack is usually much smaller than the heap, but even so it is almost impossible for a function to run out of stack space. If that does happen, the programmer who wrote the function should go on studying C. When stack space is exhausted, it is often because a function was written recursively and a bug causes infinite recursion. Another possibility is that the recursion is too deep, which can be solved by simulating a stack in heap space. There is also the case where a function defines an array so large that it cannot fit on the stack… This situation can always be solved by allocating heap space instead.
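
For the oversized-array case, a rough sketch looks like this. The helper make_big_array and the element count are assumptions chosen for illustration; how large the stack actually is depends on the system.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate count doubles on the heap and set each one to 0.0.
 * Returns NULL if the request is too large or malloc fails. */
double *make_big_array(size_t count)
{
        if (count > SIZE_MAX / sizeof(double))
                return NULL;            /* count * sizeof(double) would overflow */
        double *data = malloc(count * sizeof *data);
        if (data == NULL)
                return NULL;
        for (size_t i = 0; i < count; i++)
                data[i] = 0.0;
        return data;
}

void demo_big_array(void)
{
        /* 2 million doubles is about 16 MB, which is often more than a
         * default stack allows, so "double samples[2000000];" as a local
         * variable would be risky. On the heap it is unremarkable. */
        size_t count = 2000000;
        double *samples = make_big_array(count);
        assert(samples != NULL);
        samples[count - 1] = 1.0;
        assert(samples[0] == 0.0);
        free(samples);
}
```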

Data abstraction

Once you know some basic C programming and understand the above, you can abstract various kinds of data.

Why do we abstract data? The introduction to Chapter 2 of Structure and Interpretation of Computer Programs gives a good answer: many programs are designed to model complex phenomena, and to model the many aspects of real-world phenomena we often need to construct computational objects that are compound structures made of several parts.

Let’s simulate any link of the bicycle chain:

struct chain_node {
        struct chain_node *prev;
        struct chain_node *next;
        void *shape;
};

Then we can make three links and join them into the shortest bicycle chain in the world:

struct chain_node a, b, c;

a.next = &b;
b.prev = &a;

b.next = &c;
c.prev = &b;

c.next = &a;
a.prev = &c;

If you make a few more links, you can get a bigger chain, and you can make polygons of various shapes, but for that it’s best to use unnamed memory space. The following code creates a chain with 1000 links:

struct chain_node *head = malloc(sizeof(struct chain_node));
struct chain_node *tail = head;
/* head is the first link, so only 999 more are needed */
for (int i = 1; i < 1000; i++) {
        struct chain_node *new_tail = malloc(sizeof(struct chain_node));
        tail->next = new_tail;
        new_tail->prev = tail;
        tail = new_tail;
}
/* close the ring */
tail->next = head;
head->prev = tail;

If we treat a, b, and c in the previous example as the three vertices of a triangle, the chain we created becomes a triangle. Similarly, the chain of 1000 links created above becomes a polygon with 1000 sides joined end to end. If you have studied topology, you can see that any structure homeomorphic to a ring can be modeled with a data structure like struct chain_node, and all we have to do is wrap three pointers in one structure.
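
To check that such a structure really closes into a ring, one can walk it once around. The following is a sketch only; chain_length is my own helper, and it assumes the next pointers form a closed loop that leads back to the starting node.

```c
#include <assert.h>
#include <stddef.h>

struct chain_node {
        struct chain_node *prev;
        struct chain_node *next;
        void *shape;
};

/* Count the links of a closed ring by following next until we are
 * back at the starting node. A NULL ring has length 0. */
size_t chain_length(const struct chain_node *head)
{
        if (head == NULL)
                return 0;
        size_t count = 1;
        for (const struct chain_node *node = head->next; node != head; node = node->next)
                count++;
        return count;
}

void demo_triangle(void)
{
        struct chain_node a, b, c;
        a.next = &b; b.prev = &a;
        b.next = &c; c.prev = &b;
        c.next = &a; a.prev = &c;
        a.shape = b.shape = c.shape = NULL;
        assert(chain_length(&a) == 3);   /* the triangle has three sides */
        assert(chain_length(&b) == 3);   /* and it does not matter where we start */
}
```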

In fact, the third member of struct chain_node, void *shape, has not been used yet. It is a void * pointer, a favorite of programmers who like to play with abstraction in C code, because it can refer to the base address of any type of data in memory. This means that struct chain_node can be extended in any direction through the shape pointer.
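
The price of void * is that the compiler no longer knows what lives at the address, so the reader of the data must convert the pointer back to the right type. A minimal sketch, in which shape_as_double is a hypothetical helper and the shape happens to be a single double:

```c
#include <assert.h>
#include <stddef.h>

struct chain_node {
        struct chain_node *prev;
        struct chain_node *next;
        void *shape;
};

/* Interpret the space that shape refers to as a double.
 * Assumption: the caller knows it really holds a double;
 * the compiler cannot check this for a void * pointer. */
double shape_as_double(const struct chain_node *node)
{
        assert(node != NULL && node->shape != NULL);
        return *(const double *)node->shape;
}

void demo_void_pointer(void)
{
        double radius = 0.5;
        struct chain_node node = { NULL, NULL, &radius };  /* double * converts to void * implicitly */
        assert(shape_as_double(&node) == 0.5);
}
```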

Now, I’m going to make a very simple chain link, which is just a small rectangular piece of iron with two small round holes punched in it. I designed its data structure as:

struct point {
        double x;
        double y;
};

struct rectangle {
        double width;
        double height;
};

struct circle {
        struct point *center;
        double radius;
};

struct chain_node_shape {
        struct rectangle *body;
        struct circle *holes[2];
};

Based on these data structures, I was able to write a function specifically for making small rectangular pieces of iron:

struct chain_node_shape *
create_chain_node_shape(struct circle *c1,
                        struct circle *c2,
                        struct rectangle *rect) 
{
        struct chain_node_shape *ret = malloc(sizeof(struct chain_node_shape));
        ret->body = rect;
        ret->holes[0] = c1;
        ret->holes[1] = c2;
        return ret;
}

Then write the corresponding constructors for the two structure types that create_chain_node_shape accepts as arguments:

struct circle *
create_circle(struct point *center, double radius)
{
        struct circle *ret = malloc(sizeof(struct circle));
        ret->center = center;
        ret->radius = radius;
        return ret;
}

struct rectangle *
create_rectangle(double w, double h)
{
        struct rectangle *ret = malloc(sizeof(struct rectangle));
        ret->width = w;
        ret->height = h;
        return ret;
}

To make create_circle easier to use, create a struct point constructor:

struct point *
create_point(double x, double y)
{
        struct point *ret = malloc(sizeof(struct point));
        ret->x = x;
        ret->y = y;
        return ret;
}

With all the necessary components in place, it is now possible to start production of a specific type of link, namely:

struct chain_node *
create_chain_node(void)
{
        double radius = 0.5;
        double left_x = 1.0;
        double left_y = 1.0;
        struct point *left_center = create_point(left_x, left_y);
        struct circle *left_hole = create_circle(left_center, radius);

        double right_x = 9.0;
        double right_y = 1.0;
        struct point *right_center = create_point(right_x, right_y);
        struct circle *right_hole = create_circle(right_center, radius);

        struct rectangle *body = create_rectangle(10.0, 2.0);

        struct chain_node *ret = malloc(sizeof(struct chain_node));
        ret->prev = NULL;
        ret->next = NULL;
        ret->shape = create_chain_node_shape(left_hole, right_hole, body);

        return ret;
}

Finally, the chain-building code needs only a slight change:

struct chain_node *head = create_chain_node();
struct chain_node *tail = head;
/* head is the first link, so only 999 more are needed */
for (int i = 1; i < 1000; i++) {
        struct chain_node *new_tail = create_chain_node();
        tail->next = new_tail;
        new_tail->prev = tail;
        tail = new_tail;
}
/* close the ring */
tail->next = head;
head->prev = tail;

Now the chain we’re simulating looks a little more like a chain in real life. The above code is a bit verbose and will be refactored later, but first I’ll summarize how pointers are used in it.

Look closely at the structures defined in the code above and you will see one common characteristic: every data type that is not built into C is a structure type, and whenever such a type is used as a member of another structure, the member is declared as a pointer. Why? If you really want to ask, take a look at the five create_xxx functions above. You will see that wherever a structure appears among the parameters or return values of these create functions, it also appears as a pointer. Putting these observations together, the following conclusions can be drawn:

  1. Using structure Pointers as function parameters and return values can avoid excessive memory copying during function calls.

  2. When a structure type is a member type of another structure, declaring the member as a pointer avoids cumbersome dereferencing in the latter’s create function.

  3. A void * pointer can refer to the base address of any type of data storage space. For example, in create_chain_node, we assign a struct chain_node_shape pointer to a void * shape pointer.

These three conclusions are the common uses of pointers in data abstraction. They bear not only on the design of data structures but also on the design of the functions that construct and destroy them. (For the sake of convenience, the code above does not define destruction functions for the data structures.)
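
For completeness, here is one way a destruction function for the ring could look. This is a sketch under my own assumptions rather than code from the program above: destroy_chain and make_ring are hypothetical names, the ring must be closed with correct prev pointers, and the shape data is deliberately left alone because the sketch does not decide who owns it.

```c
#include <assert.h>
#include <stdlib.h>

struct chain_node {
        struct chain_node *prev;
        struct chain_node *next;
        void *shape;
};

/* Build a closed ring of count nodes (count >= 1) with shape set to NULL. */
struct chain_node *make_ring(size_t count)
{
        assert(count >= 1);
        struct chain_node *head = malloc(sizeof *head);
        assert(head != NULL);   /* sketch only: real code should handle failure */
        head->shape = NULL;
        struct chain_node *tail = head;
        for (size_t i = 1; i < count; i++) {
                struct chain_node *new_tail = malloc(sizeof *new_tail);
                assert(new_tail != NULL);
                new_tail->shape = NULL;
                tail->next = new_tail;
                new_tail->prev = tail;
                tail = new_tail;
        }
        tail->next = head;
        head->prev = tail;
        return head;
}

/* Free every node of a closed ring and return how many were freed.
 * The shape pointers are not freed here. */
size_t destroy_chain(struct chain_node *head)
{
        if (head == NULL)
                return 0;
        head->prev->next = NULL;        /* open the ring so the walk can end */
        size_t freed = 0;
        while (head != NULL) {
                struct chain_node *next = head->next;   /* read before freeing */
                free(head);
                freed++;
                head = next;
        }
        return freed;
}
```

Opening the ring first means the loop never has to compare against a pointer that has already been freed.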

Further data abstraction

The code in the previous section was a bit verbose, so let’s try to simplify it. Start with the following four structures and their create functions:

struct point {
        double x;
        double y;
};

struct rectangle {
        double width;
        double height;
};

struct circle {
        struct point *center;
        double radius;
};

struct chain_node_shape {
        struct rectangle *body;
        struct circle *holes[2];
};

struct point *
create_point(double x, double y)
{
        struct point *ret = malloc(sizeof(struct point));
        ret->x = x;
        ret->y = y;
        return ret;
}

struct circle *
create_circle(struct point *center, double radius)
{
        struct circle *ret = malloc(sizeof(struct circle));
        ret->center = center;
        ret->radius = radius;
        return ret;
}

struct rectangle *
create_rectangle(double w, double h)
{
        struct rectangle *ret = malloc(sizeof(struct rectangle));
        ret->width = w;
        ret->height = h;
        return ret;
}

struct chain_node_shape *
create_chain_node_shape(struct circle *c1,
                        struct circle *c2,
                        struct rectangle *rect) 
{
        struct chain_node_shape *ret = malloc(sizeof(struct chain_node_shape));
        ret->body = rect;
        ret->holes[0] = c1;
        ret->holes[1] = c2;
        return ret;
}

Obviously, the code looks a lot alike! Those four structures all store two members, and each corresponding create function does nothing more than store its parameters into the members of the structure. Is there a way to represent them all with very little code? There is!

Since each structure holds two members, let’s delete the above code and define a pair structure:

struct pair {
        void *first;
        void *second;
};

In the pair structure we use two void * pointers, so we can be confident that a pair can store two pieces of data of any type. Next, just change the definition of the create_chain_node function:

struct chain_node *
create_chain_node(void)
{
        double *left_x = malloc(sizeof(double));
        double *left_y = malloc(sizeof(double));
        *left_x = 1.0;
        *left_y = 1.0;
        struct pair *left_center = malloc(sizeof(struct pair));
        left_center->first = left_x;
        left_center->second = left_y;
        double *left_radius = malloc(sizeof(double));
        *left_radius = 0.5;
        struct pair *left_hole = malloc(sizeof(struct pair));
        left_hole->first = left_center;
        left_hole->second = left_radius;

        double *right_x = malloc(sizeof(double));
        double *right_y = malloc(sizeof(double));
        *right_x = 9.0;
        *right_y = 1.0;
        struct pair *right_center = malloc(sizeof(struct pair));
        right_center->first = right_x;
        right_center->second = right_y;
        double *right_radius = malloc(sizeof(double));
        *right_radius = 0.5;
        struct pair *right_hole = malloc(sizeof(struct pair));
        right_hole->first = right_center;
        right_hole->second = right_radius;

        struct pair *holes = malloc(sizeof(struct pair));
        holes->first = left_hole;
        holes->second = right_hole;

        struct pair *body = malloc(sizeof(struct pair));
        double *width = malloc(sizeof(double));
        *width = 10.0;
        double *height = malloc(sizeof(double));
        *height = 2.0;
        body->first = width;
        body->second = height;

        struct pair *shape = malloc(sizeof(struct pair));
        shape->first = body;
        shape->second = holes;

        struct chain_node *ret = malloc(sizeof(struct chain_node));
        ret->prev = NULL;
        ret->next = NULL;
        ret->shape = shape;

        return ret;
}

I’ll be brave enough to admit that the create_chain_node function based on struct pair is ugly, but we’ve eliminated a lot of structs and constructors and reduced the overall code by about 1/6.

Looking closely at the above code, it becomes apparent that there is a high degree of duplication in the following three pieces of code:

double *left_x = malloc(sizeof(double));
double *left_y = malloc(sizeof(double));
*left_x = 1.0;
*left_y = 1.0;
struct pair *left_center = malloc(sizeof(struct pair));
left_center->first = left_x;
left_center->second = left_y;

double *right_x = malloc(sizeof(double));
double *right_y = malloc(sizeof(double));
*right_x = 9.0;
*right_y = 1.0;
struct pair *right_center = malloc(sizeof(struct pair));
right_center->first = right_x;
right_center->second = right_y;

struct pair *body = malloc(sizeof(struct pair));
double *width = malloc(sizeof(double));
*width = 10.0;
double *height = malloc(sizeof(double));
*height = 2.0;
body->first = width;
body->second = height;

All three pieces of code store two double * values into the pair structure. In this case, we can write a function that generates a pair structure for double *, that is:

struct pair *
pair_for_double_type(double x, double y)
{
        struct pair *ret = malloc(sizeof(struct pair));
        double *first = malloc(sizeof(double));
        double *second = malloc(sizeof(double));
        *first = x;
        *second = y;
        ret->first = first;
        ret->second = second;
        return ret;
}

Then refactor the create_chain_node function again:

struct chain_node *
create_chain_node(void)
{
        struct pair *left_center = pair_for_double_type(1.0, 1.0);
        double *left_radius = malloc(sizeof(double));
        *left_radius = 0.5;
        struct pair *left_hole = malloc(sizeof(struct pair));
        left_hole->first = left_center;
        left_hole->second = left_radius;

        struct pair *right_center = pair_for_double_type(9.0, 1.0);
        double *right_radius = malloc(sizeof(double));
        *right_radius = 0.5;
        struct pair *right_hole = malloc(sizeof(struct pair));
        right_hole->first = right_center;
        right_hole->second = right_radius;

        struct pair *holes = malloc(sizeof(struct pair));
        holes->first = left_hole;
        holes->second = right_hole;

        struct pair *body = pair_for_double_type(10.0, 2.0);

        struct pair *shape = malloc(sizeof(struct pair));
        shape->first = body;
        shape->second = holes;

        struct chain_node *ret = malloc(sizeof(struct chain_node));
        ret->prev = NULL;
        ret->next = NULL;
        ret->shape = shape;

        return ret;
}

Mountains and rivers, and seemingly no way through

create_chain_node looks better after refactoring, but there are still two pieces of code that are highly repetitive:

struct pair *left_center = pair_for_double_type(1.0, 1.0);
double *left_radius = malloc(sizeof(double));
*left_radius = 0.5;
struct pair *left_hole = malloc(sizeof(struct pair));
left_hole->first = left_center;
left_hole->second = left_radius;

struct pair *right_center = pair_for_double_type(9.0, 1.0);
double *right_radius = malloc(sizeof(double));
*right_radius = 0.5;
struct pair *right_hole = malloc(sizeof(struct pair));
right_hole->first = right_center;
right_hole->second = right_radius;

But these two pieces of code can’t be simplified at the level of the pair structure alone, and I really don’t want to write an auxiliary function like the following:

struct pair *
create_hole(struct pair *center, double radius)
{
        struct pair *ret = malloc(sizeof(struct pair));
        double *r = malloc(sizeof(double));
        *r = radius;
        ret->first = center;
        ret->second = r;
        return ret;
}

Although create_hole can simplify the above two duplicated pieces of code into:

struct pair *left_center = pair_for_double_type(1.0, 1.0);
struct pair *left_hole = create_hole(left_center, 0.5);

struct pair *right_center = pair_for_double_type(9.0, 1.0);
struct pair *right_hole = create_hole(right_center, 0.5);

But create_hole is very narrow in scope compared to pair_for_double_type. Since pair_for_double_type stores two double values in a pair structure, it can be used to create the points and rectangles in our example, and also polar coordinates, complex numbers, and conic equations in scientific computing. create_hole, however, is only useful for building bicycle chains. In other words, it is the success of pair_for_double_type that makes create_hole look like poor taste. We should consider whether there is another way to eliminate the duplication described above.
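
To illustrate how far a pair of doubles can travel, here is a sketch that reuses it as a complex number. The functions squared_modulus and free_double_pair are my own illustrative helpers, and this version of pair_for_double_type adds failure handling that the article’s version does not have.

```c
#include <assert.h>
#include <stdlib.h>

struct pair {
        void *first;
        void *second;
};

/* Store two doubles in a pair; returns NULL if any allocation fails. */
struct pair *pair_for_double_type(double x, double y)
{
        struct pair *ret = malloc(sizeof *ret);
        double *first = malloc(sizeof *first);
        double *second = malloc(sizeof *second);
        if (ret == NULL || first == NULL || second == NULL) {
                free(ret);              /* free(NULL) is a harmless no-op */
                free(first);
                free(second);
                return NULL;
        }
        *first = x;
        *second = y;
        ret->first = first;
        ret->second = second;
        return ret;
}

/* Read the pair as a complex number x + yi and return x*x + y*y. */
double squared_modulus(const struct pair *z)
{
        assert(z != NULL);
        const double *re = z->first;
        const double *im = z->second;
        return (*re) * (*re) + (*im) * (*im);
}

/* Release a pair created by pair_for_double_type. */
void free_double_pair(struct pair *p)
{
        if (p == NULL)
                return;
        free(p->first);
        free(p->second);
        free(p);
}
```

The same pair could just as well be read as a point, a rectangle, or a polar coordinate; only the function that interprets it changes.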

After careful analysis of how left_hole and right_hole are constructed, it is not hard to see that the mismatch between the data types of the hole’s center and its radius is the main reason the repeated code above is difficult to simplify. create_hole can shrink the duplication significantly only because it constructs a special pair structure tailored to our problem; let’s call it X. X is special in that its first pointer stores a homogeneous pair of double * values, while its second pointer stores the base address of a single double. Because X is so special, create_hole is narrowly applicable: only circles fit this structure.

X is a heterogeneous pair. We already have pair_for_double_type, which creates a pair that stores two values of type double, and its result can be stored directly in a heterogeneous pair. All we’re missing now is a function that converts a double value into something that can be stored directly in a heterogeneous pair:

double *
malloc_double(double x)
{
        double *ret = malloc(sizeof(double));
        *ret = x;
        return ret;
}

With this function, we can continue to simplify create_chain_node:

struct chain_node *
create_chain_node(void)
{
        struct pair *left_hole = malloc(sizeof(struct pair));
        left_hole->first = pair_for_double_type(1.0, 1.0);
        left_hole->second = malloc_double(0.5);

        struct pair *right_hole = malloc(sizeof(struct pair));
        right_hole->first = pair_for_double_type(9.0, 1.0);
        right_hole->second = malloc_double(0.5);

        struct pair *holes = malloc(sizeof(struct pair));
        holes->first = left_hole;
        holes->second = right_hole;

        struct pair *body = pair_for_double_type(10.0, 2.0);

        struct pair *shape = malloc(sizeof(struct pair));
        shape->first = body;
        shape->second = holes;

        struct chain_node *ret = malloc(sizeof(struct chain_node));
        ret->prev = NULL;
        ret->next = NULL;
        ret->shape = shape;

        return ret;
}

In addition, the pair_for_double_type function can be simplified based on malloc_double:

struct pair *
pair_for_double_type(double x, double y)
{
        struct pair *ret = malloc(sizeof(struct pair));
        ret->first = malloc_double(x);
        ret->second = malloc_double(y);
        return ret;
}

In fact, if we had another function like this:

struct pair *
pair(void *x, void *y)
{
        struct pair *ret = malloc(sizeof(struct pair));
        ret->first = x;
        ret->second = y;
        return ret;
}

create_chain_node can be simplified one step further:

struct chain_node *
create_chain_node(void)
{
        struct pair *left_hole = pair(pair_for_double_type(1.0, 1.0), malloc_double(0.5));
        struct pair *right_hole = pair(pair_for_double_type(9.0, 1.0), malloc_double(0.5));
        struct pair *holes = pair(left_hole, right_hole);
        struct pair *body = pair_for_double_type(10.0, 2.0);
        struct pair *shape = pair(body, holes);

        struct chain_node *ret = malloc(sizeof(struct chain_node));
        ret->prev = NULL;
        ret->next = NULL;
        ret->shape = shape;

        return ret;
}

You see, a lot of seemingly irreducible code can be simplified with a small change of perspective. The whole simplification was carried out with the help of pointers, but once your attention is focused on how to simplify the code, using pointers becomes instinctive: you don’t feel you are borrowing any outside power, and it seems that your own logic alone is guiding your actions. In this process, facing objects or facing templates would hardly have saved you from the long code…
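
One thing the compact version hides is how the data is read back: every step through a pair needs a conversion from void *. The sketch below shows this for a single hole. The accessors hole_radius and hole_center_x are hypothetical names of mine, and the constructors use assert on allocation failure only to keep the sketch short.

```c
#include <assert.h>
#include <stdlib.h>

struct pair {
        void *first;
        void *second;
};

double *malloc_double(double x)
{
        double *ret = malloc(sizeof *ret);
        assert(ret != NULL);    /* sketch only: real code should handle failure */
        *ret = x;
        return ret;
}

struct pair *pair(void *x, void *y)
{
        struct pair *ret = malloc(sizeof *ret);
        assert(ret != NULL);
        ret->first = x;
        ret->second = y;
        return ret;
}

struct pair *pair_for_double_type(double x, double y)
{
        return pair(malloc_double(x), malloc_double(y));
}

/* A hole is pair(center, radius), where center is itself a pair of doubles. */
double hole_radius(const struct pair *hole)
{
        return *(const double *)hole->second;
}

double hole_center_x(const struct pair *hole)
{
        const struct pair *center = hole->first;   /* void * converts implicitly in C */
        return *(const double *)center->first;
}

void demo_hole(void)
{
        struct pair *hole = pair(pair_for_double_type(1.0, 1.0), malloc_double(0.5));
        assert(hole_center_x(hole) == 1.0);
        assert(hole_radius(hole) == 0.5);

        /* release the pieces from the inside out */
        struct pair *center = hole->first;
        free(center->first);
        free(center->second);
        free(center);
        free(hole->second);
        free(hole);
}
```

Nothing stops a caller from passing a body pair to hole_radius; with void * that mistake compiles and reads the wrong bytes, which is the trade-off for the shorter construction code.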

Face one thing, and you may lose what you are not facing

In the bicycle chain program above, I started out writing in an object-oriented way, so I created five structures describing two-dimensional points, rectangles, circles, link shapes, and the links themselves, and ended up with a lot of tedious code. The idea behind object-oriented programming is very simple: we simulate what exists in reality. But if you think about it, many things have something in common, and if you simulate them one by one and ignore what they share, you end up with very bloated code.

Of course, object-oriented programming also advocates extracting the commonalities of the things you model and then using inheritance to simplify your code. But once you believe in classes and inheritance, the best abstraction you can manage is to abstract one class of things: you can abstract cars, but you can’t find an abstraction that covers both airplanes and cars. Yet planes and cars obviously have something in common. For example, they both carry passengers, have dashboards, have windows, have seats, and have attendants…

When I realized that all of my object-oriented structures had a common feature, namely that each contained two members, it was natural to think that I should build a pair that holds two values of any type and then use that pair to hold the data I needed. When the object-oriented paradigm is ingrained in your mind, this simple observation is easy to overlook, especially if you’re content with programs that already work.

Then, when I tried to replace two-dimensional points, rectangles, circles, joint shapes, etc., with pairs, I started down the path of generics. There are no C++ templates available in C, so I have to rely on void *. In order to simplify the conversion of double data to void *, I define:

double *
malloc_double(double x)
{
        double *ret = malloc(sizeof(double));
        *ret = x;
        return ret;
}

struct pair *
pair_for_double_type(double x, double y)
{
        struct pair *ret = malloc(sizeof(struct pair));
        ret->first = malloc_double(x);
        ret->second = malloc_double(y);
        return ret;
}

If you know anything about generic programming in C++, you will see that pair_for_double_type is essentially a specialization of pair. I wanted pair to store data of arbitrary types, but since I frequently need it to store a pair of doubles, that case deserves a dedicated constructor.

When I found that I frequently needed to create a pair instance and store the base addresses of some data in its first and second pointers, I abstracted that commonality as:

struct pair *
pair(void *x, void *y)
{
        struct pair *ret = malloc(sizeof(struct pair));
        ret->first = x;
        ret->second = y;
        return ret;
}

As a result, the create_chain_node function is defined cleanly:

struct chain_node *
create_chain_node(void)
{
        struct pair *left_hole = pair(pair_for_double_type(1.0, 1.0), malloc_double(0.5));
        struct pair *right_hole = pair(pair_for_double_type(9.0, 1.0), malloc_double(0.5));
        struct pair *holes = pair(left_hole, right_hole);
        struct pair *body = pair_for_double_type(10.0, 1.0);
        struct pair *shape = pair(body, holes);

        struct chain_node *ret = malloc(sizeof(struct chain_node));
        ret->prev = NULL;
        ret->next = NULL;
        ret->shape = shape;
        return ret;
}

The object-oriented version took me 104 lines of code; the generic version takes 75. Can I conclude, then, that generic programming has saved object orientation? Of course not! The program is not finished yet, and we still need objects.

The return of objects

Look again at create_chain_node:

struct chain_node *
create_chain_node(void)
{
        struct pair *left_hole = pair(pair_for_double_type(1.0, 1.0), malloc_double(0.5));
        struct pair *right_hole = pair(pair_for_double_type(9.0, 1.0), malloc_double(0.5));
        struct pair *holes = pair(left_hole, right_hole);
        struct pair *body = pair_for_double_type(10.0, 1.0);
        struct pair *shape = pair(body, holes);

        struct chain_node *ret = malloc(sizeof(struct chain_node));
        ret->prev = NULL;
        ret->next = NULL;
        ret->shape = shape;
        return ret;
}

The create_chain_node function creates a chain link. It uses the abstract pair structure to pack several kinds of data into a chain_node structure. But how do we get those data back out of a chain_node, so that they once again represent the real thing they simulate?

For example, how do we get information about a left_hole from a chain_node structure? Obviously, the following code

struct chain_node *t = create_chain_node();
struct pair *shape = t->shape;
struct pair *holes = shape->second;
struct pair *left_hole = holes->first;

doesn’t solve our problem, because left_hole holds only two void * pointers, whereas what we need to know is the center and radius of left_hole. So let’s move on:

struct pair *center = left_hole->first;
double radius = *((double *)(left_hole->second));

This still doesn’t solve our problem, because we want the coordinates of left_hole’s center, not a center holding two void * pointers, so we need to continue:

double center_x = *((double *)(center->first));
double center_y = *((double *)(center->second));

We end up with three doubles, center_x, center_y, and radius, so it looks like we’re done. But how do you write that as a function, get_left_hole? A C function can have only one return value. get_left_hole can still be written if the values are returned through the function’s arguments, for example:

void get_left_hole(struct chain_node *t, double *x, double *y, double *r)
{
        struct pair *shape = t->shape;
        struct pair *holes = shape->second;
        struct pair *left_hole = holes->first;
        struct pair *center = left_hole->first;
        *x = *((double *)(center->first));
        *y = *((double *)(center->second));
        *r = *((double *)(left_hole->second));
}

But if you do, it just goes to show that no good programming language can save your taste.

We should continue to explore pointer functionality, and it would be better to define get_left_hole as follows:

struct point {
        double *x;
        double *y;
};
struct hole {
        struct point *center;
        double *radius;
};

struct hole *
get_left_hole(struct chain_node *t)
{
        struct pair *shape = t->shape;
        struct pair *holes = shape->second;
        return holes->first;
}

What’s good about it? It takes full advantage of the C compiler’s implicit conversion of pointer types, which is essentially a computation the compiler does at compile time. This keeps code such as *((double *)(…)) out of the program. A void * pointer is always converted automatically to the type of the lvalue it is assigned to; all you have to guarantee is that the lvalue’s type is the original type of the data that the void * points to. This is one of the commandments of C, and no good programming language can save a programmer who fails to follow it.

C++ is a traitor, so no matter how powerful it is, it can’t save programmers who can’t guarantee that an lvalue’s type is the original type behind a void *. A C++ compiler forces

struct pair *shape = t->shape;
struct pair *holes = shape->second;

to be written as:

struct pair *shape = (struct pair *)(t->shape);
struct pair *holes = (struct pair *)(shape->second);

Otherwise the code will not compile. Besides making the code more cluttered, this still doesn’t save programmers who can’t guarantee that an lvalue’s type is the original type behind a void *. It only makes them afraid of bare pointers and type conversions, and leads them down the path of type-safe metaphysics. C++11 brings smart pointers and rvalue references, so let’s hope they find their salvation in the new C++ style.

Once we’ve implemented get_left_hole with an object-oriented approach, we can use it like this:

struct chain_node *t = create_chain_node();
struct hole *left_hole = get_left_hole(t);
printf("%lf, %lf, %lf\n", *(left_hole->center->x), *(left_hole->center->y), *(left_hole->radius));

Everything is built on the pointer, except that the pointer needs to be dereferenced with * to output data.

A feature of the code above is that left_hole copies no data; it is merely another reference to the memory that t refers to. One might worry that letting left_hole access the memory referenced by t directly is dangerous… What’s the danger? You only need to know that left_hole is a reference to memory that lives elsewhere, and that has been intuitive ever since you started using pointers. If you want to modify the data in the memory that left_hole refers to, you can. If you don’t want to, then don’t. And if you do not intend to modify that data but worry that you or someone else might do so by mistake… write those concerns in the comment on get_left_hole!

It’s overkill to avoid something from the syntactic level of a programming language that could have been largely avoided with a little attention. If we’re programming with implicit conversion of void * Pointers 99% of the time, why change the programming language for 1% of errors, filling it with all sorts of clever tricks and making the code even more obscure?

The author of C Traps and Pitfalls gives a good analogy. Have you ever cut your hand with a kitchen knife while cooking? How would you improve the knife to make it safer? Would you want to use the improved knife? The author’s answer: it is easy to think of ways to make a tool safer, at the cost of making a once-simple tool more complicated. Food processors usually have interlocks to protect the user’s fingers. A kitchen knife is different: fit such a simple, flexible tool with a finger guard and it loses its simplicity and flexibility. In practice, the result may be closer to a food processor than to a kitchen knife.

I have managed to drift away from this section’s title and onto pointers. Let’s go back and talk about objects. get_left_hole returns a typed pointer in place of a generic one. By viewing the void * pointers in a pair through a typed pointer, we avoid the verbose process of casting each void * pointer taken out of the pair.

Turn functions into data

Consider the create_chain_node function, which has been greatly simplified:

struct chain_node *
create_chain_node(void)
{
        struct pair *left_hole = pair(pair_for_double_type(1.0, 1.0), malloc_double(0.5));
        struct pair *right_hole = pair(pair_for_double_type(9.0, 1.0), malloc_double(0.5));
        struct pair *holes = pair(left_hole, right_hole);
        struct pair *body = pair_for_double_type(10.0, 1.0);
        struct pair *shape = pair(body, holes);

        struct chain_node *ret = malloc(sizeof(struct chain_node));
        ret->prev = NULL;
        ret->next = NULL;
        ret->shape = shape;
        return ret;
}

This function is fine for our example, but it only produces links of one particular shape, which is clearly not general enough. If we wanted to change the shape of the link, for example replacing the rectangular iron plate with two holes by an oval iron plate with two holes, we would have to write another function, create_elliptic_chain_node. Doing so, we quickly see that create_elliptic_chain_node also needs the following code:

        struct chain_node *ret = malloc(sizeof(struct chain_node));
        ret->prev = NULL;
        ret->next = NULL;
        ret->shape = shape;
        return ret;

If we were to produce 100 shapes of links, the code above would be repeated in 100 different link constructors. That is not acceptable, since it adds up to 500 lines of repeated code. Too much duplicated code is the ultimate humiliation for a programmer.

An object-oriented programmer might imagine making a base class for chain_node, wrapping the common code above in the base class’s constructor, and then building the differently shaped links in the constructors of the classes derived from chain_node… Before you complicate things, I recommend looking at code like this:

void *
rectangle_shape(void)
{
        struct pair *left_hole = pair(pair_for_double_type(1.0, 1.0), malloc_double(0.5));
        struct pair *right_hole = pair(pair_for_double_type(9.0, 1.0), malloc_double(0.5));
        struct pair *holes = pair(left_hole, right_hole);
        struct pair *body = pair_for_double_type(10.0, 1.0);
        return pair(body, holes);
}

struct chain_node *
create_chain_node(void *(*fp)(void))
{
        struct chain_node *ret = malloc(sizeof(struct chain_node));
        ret->prev = NULL;
        ret->next = NULL;
        ret->shape = fp();
        return ret;
}

See, I pulled the code that defines the rectangular shape out of create_chain_node, packaged it into a rectangle_shape function, and made create_chain_node take a function pointer. Thus, when we need to create a rectangular link with two small holes, we just write:

struct chain_node *rect_chain_node = create_chain_node(rectangle_shape);

If we want to create an elliptical link with two small holes, we can define an elliptic_shape function and pass it to create_chain_node as an argument:

struct chain_node *elliptic_chain_node = create_chain_node(elliptic_shape);

Wouldn’t that be cleaner and more efficient than a bunch of classes and inherited code?

In C, a function name is also a pointer: used as a value, it refers to the base address of the memory where the function’s code resides. So we can pass rectangle_shape as an argument to create_chain_node and then call rectangle_shape from inside it.

Since the shape pointer in the chain_node structure is defined as a void * pointer, it is fine for the function accepted by create_chain_node to return void *. More importantly, void *(*fp)(void) is an abstraction over all functions that take no arguments and return a pointer. This means that for any link shape, no matter how unusual, we can always define a function that takes no arguments and returns a pointer to produce that shape, so create_chain_node can be extended without limit.

If Archimedes were still alive, perhaps he would boldly say, give me a pointer to a function and a void star, and I will describe the universe!

Basic principles of code simplification

When you code with the worldview that everything is an object, finding the data abstraction shared between classes often means creating a generic data container, and then combining that container with concrete types of data to eliminate those classes.

When you want to take data out of a generic container so that it once again visibly models something in the real world, this often means creating data structures through which the data stored in the container can be viewed as typed, named data. These structures act like observers or parsers that we use to interpret or modify the data in the generic container.

When part of a function f’s code is specific to a particular problem and part of it is not, making f extensible means extracting the problem-specific code into separate functions and then passing those functions to f.

There is a cost to avoiding C Pointers

In C, the pointer is the simplest and sharpest tool for carrying out these basic principles, like a master chef’s knife. In statically typed languages, any attempt to avoid pointers is bound to complicate the language’s syntax or weaken its expressiveness.

References, which are essentially weakened pointers, were invented in C++ to avoid pointers. As a result, C++ beginners often ask “when should I use pointers and when references?”. The generic containers of the STL cannot store references, so before smart pointers arrived, raw pointers were often stored in containers to avoid excessive copying of objects. When a function creates a large object internally and wants to pass that object on without pointers, it returns the object by value, which can cause the object’s data to be copied more than once. Creating the large object inside the function and returning a pointer to it, on the other hand, contradicts the C++ ideal of hiding pointers from the user… To solve this, C++11 finally introduced the complicated and twisted rvalue reference, which solves the problem by quietly using pointers inside a class’s move constructor, where the user of the class never sees them…

Java’s pointer-avoidance strategy is a bit more sophisticated than C++’s. Java has no pointer syntax and no C++-style references. Whenever an instance of a class (an object) is passed to a function as an argument, returned from a function, or assigned to another variable of the same class, what is passed is an address, not a copy of the object. That is, Java treats every class instance as a potential pointer, and only values of primitive types are passed by value. This explicit distinction between kinds of data is admirable, but when Java wants to pass one function (method) to another, the code gets twisted and is nowhere near as straightforward as passing a function pointer in C.

C# seems to handle pointers much better than C++ or Java, but classifying code that uses pointers as unsafe is discriminatory. It is like being told “Hey you, come and eat!”. A person of integrity does not accept food handed out in contempt.

In dynamically typed languages such as Python, everything is said to be a reference, which is fine. You can also pass a function directly to another function as an argument or, even better, return a function from a function. Dynamically typed languages largely outdo statically typed languages like C, C++, and Java in syntax, abstraction, type safety, and resource management, but programs written in them tend to run about twice as slow as programs written in statically typed languages.

There is no perfect pointer, and there is no perfect programming language, all because we program on machines, not in our heads.

The C code ape’s pointer creed

The Rifleman’s Creed, tampered with for my own amusement.

This is my pointer. There are many like it, but this one is mine. My pointer is my best friend. It is my life. I will master it as I master my life. Without me, my pointer is useless. Without my pointer, I am useless. I will use my pointer accurately. I will use it better than my enemy, who is trying to outpace me. I will outpace him before he outpaces me. I will.

My pointer and I know that what counts in programming is not how elegant the language is, how powerful the standard library is, or how powerful the programming paradigm is. Only solving the problem counts. We will solve it.

My pointer is human, like me, because it is my life. So I will know it like a brother. I will learn its weaknesses, its strengths, its composition, what it points to, and what points to it. I will keep building perfect knowledge and skill with my pointer, so that it is as ready as I am. We will become part of each other.

Before God I swear this creed. My pointer and I are the guardians of the computer. We are the masters of the problem. We are the saviors of my program. So be it, until there is no more programming and no more problems, only rest.