One, foreword

1. Why I wrote this article

On Saturday I published an article on my public account: "C language pointers: from underlying principles to fancy tricks, explained thoroughly with diagrams and code". It used plain language and clear diagrams to explain the underlying logic of pointers. A reader who was testing the code ran into an odd problem and sent me the test code. I verified it and pondered for a long time, but could not explain why it printed such a strange result.

To clear my head, I went out on the balcony for a smoke. The wind was strong that night: I smoked half of the cigarette and the wind smoked the other half. Perhaps the wind has its own troubles. Then I thought: I bought these cigarettes, so why let the wind smoke them? So I went back to my room and kept digging into the code. I refused to believe that such a simple printf statement could not be figured out!

Hence this article.

2. What you will get out of it

  1. The function parameter passing mechanism;
  2. How variable arguments (va_list) are implemented;
  3. How the printf function is implemented;
  4. A way of thinking when facing a problem.

Friendly reminder: most of the first half of this article records how I thought about the problem and worked toward a solution. If you are not interested in that process, jump straight to part four, where diagrams explain clearly how variable arguments are implemented; read it once and you will remember it.

3. My test environment

3.1 Operating System

Everyone's environment is different: operating system, compiler, compiler version. Any small difference may lead to strange behavior. That said, most people use either the Visual Studio IDE on Windows or GCC on the Linux command line.

I generally test my code on 64-bit Ubuntu 16.04, and all the code in this article was tested there. If you use the VC compiler in Visual Studio, some details of your results may differ from mine, but that is not a serious problem. If you run into differences, analyze them; after all, solving problems is the fastest way to improve.

3.2 Compiler

The compiler I used is the GCC on Ubuntu 16.04 (64-bit), and its version looks like this:

Because I installed a 64-bit system, I added the -m32 option to the compile command in order to build a 32-bit executable:

gcc -m32 main.c -o main

Use the command "file main" to inspect the compiled executable:

So, when you test, if the output differs slightly from what you expect, check the compiler first. The C language is essentially a standard, and every compiler is an implementation of that standard. As long as the result conforms to the standard, compilers are free to differ in how they implement it and in how efficient the generated code is.

Two, problem introduction

1. The code a reader asked for help with

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct 
{
    int age;
    char name[8];
} Student;

int main()
{
    Student s[3] = {{1, "a"}, {2, "b"}, {3, "c"}};
    Student *p = &(s[0]);
    printf("%d, %d \n", *s, *p);
}

2. Expected result

From the discussion in the previous article, we know that:

  1. s is an array of three elements, each of type Student;
  2. p is a pointer that holds the address of the first element of s, which is the same address the array name s stands for.

Since s also stands for an address, namely the address of the first element of the array, and that element is a structure whose first member is an int, the location s refers to holds an int: the number 1 in the example code. The printf statement is therefore meant to read the data at that address as an int, and the expected result is: 1, 1.

There seems to be nothing wrong with this analysis.

3. The actual printed result

Compile the program, and the compiler prints a warning:

The warning says that printf expects an int but is being passed a Student struct. Let's ignore the warning for now, because we are deliberately trying to read the data at that address through the pointer.

Run the program and the actual output is: 1, 97. Sadly, that does not match our expectation!

Three, analyzing the problem

1. Print the memory model

As the output shows, the first number is 1, as expected. The second number, 97, is clearly the ASCII value of the character 'a'. But how could p end up referring to the address of the name member?

First, confirm three things:

  1. How much memory does the Student structure occupy?
  2. What does the memory of array s look like?
  3. Are the values of s and the pointer variable p correct?

Change the code to the following:

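Student s[3] = {{1, "a"}, {2, "b"}, {3, "c"}};
Student *p = s;

// sizeof yields a size_t, so print it with %zu
printf("sizeof Student = %zu \n\n", sizeof(Student));

printf("print each byte in s: ");
// view the array as raw bytes
unsigned char *pTmp = (unsigned char *)p;
for (size_t i = 0; i < 3 * sizeof(Student); i++)
{
   if (0 == i % sizeof(Student))
        printf("\n");
   printf("%x ", pTmp[i]);
}
printf("\n\n");

printf("print value of s and p \n\n");
// pointers are printed with %p after converting to void *
printf("s = %p, p = %p \n\n", (void *)s, (void *)p);

// deliberately wrong: passes a struct where %d expects an int
printf("%d, %d \n", *s, *p);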

Let's draw the expected memory model of array s as follows:

Compile and test, and print the following results:

From the printed result:

  1. The Student structure takes up 12 bytes, as expected.
  2. The memory layout of array s is also as expected, occupying 36 bytes in total.
  3. s and p represent the same address, as expected.

This is baffling: since s and p represent the same memory address, why does *p produce the value of the character 'a' when an int is read?

2. Print the information separately

Since the first value (*s) is printed correctly, let's print the two values in separate statements. The test code is as follows:

Student s[3] = {{1, "a"}, {2, "b"}, {3, "c"}};
Student *p = s;

printf("%d \n", *s);
printf("%d \n", *p);

Compile and test, and print the following results:

The output is as expected! The int at the target address is read correctly when two printf statements are used, but not when both values are printed in one statement!

At this point we can guess that the cause lies in the printf statement itself: perhaps the value of p is affected once the first number has been printed. But it is not clear how, and printf is a library function provided by the system.

Googling "glibc printf bug" turns up plenty of material, but after looking around I found nothing resembling our test code, so I had to keep thinking.

3. Step by step, analyze the root cause of the problem

3.1 Print the simplest string

Since the problem is caused by printing two pieces of data in the printf statement, I will simplify the problem and use the simplest possible string to test it, as follows:

char aa[] = "abcd";
char *pc = aa;
printf("%d, %d \n", *pc, *pc);

Compile and run: the output is "97, 97", exactly as expected! This shows that the printf statement does not change the address the pointer variable points to.

3.2 Printing a structure variable

Since the string test shows no problem, the problem must lie with the structure type. The original test used an array of structures, so let's rule out the array and test with a single structure variable:

Student s = {1, "a"};
  
printf("%d \n", s); 
printf("%d, %d \n", s, s); 

Note that s is now a single variable, not an array, so it is passed to printf without the * operator. Compile, execute, output:

The output shows the same error as before, so we can conclude that the problem at least has nothing to do with arrays!

The structure under test has two members, age and name. Let's simplify further and keep only int members, which makes the problem easier to analyze.

3.3 Test simpler structure variables

The test code is as follows:

typedef struct _A
{
   int a;
   int b;
   int c;
}A;

int main()
{
    A a = {10, 20, 30};
    printf("%d %d %d \n", a, a, a);
}

Compile and run: the output is 10 20 30, the values of the three member variables. Too weird! It looks as if printf starts at the first member in memory and then steps forward automatically, fetching one int each time.

So I removed the last two arguments and tested the following code:

A a = {10, 20, 30};
printf("%d %d %d \n", a);

Compile and run: it prints the same 10 20 30! By this point I was going crazy, mainly because it was so late and I don't like staying up late.

So my brain started to slack off and I turned to Google again, and actually found this page: https://stackoverflow.com/questions/26525394/use-printfs-to-print-a-struct-the-structs-first-variable-type-is-char. Open it and browse if you are interested.

The conclusion there: using printf to print a struct variable is undefined behavior! Undefined behavior means that anything can happen, depending on how the compiler implements it.

So it seems I had found the cause: I simply was not knowledgeable enough to know that printing a struct variable this way is undefined behavior.

A couple of side notes:

  1. When we write programs, most of what we know is correct, so most of the code we write is correct, and we do not write weird code on purpose. For example, the normal way to print a structure is to print each member variable with the format that matches its data type.
  2. Occasionally, however, we make a silly mistake like the one in this case: printing a structure variable directly. Because of that mistake, we learned that printing a structure variable directly is undefined behavior. That, too, is a way to gain knowledge.

This looks like the end of the story. But I was still not satisfied. If this is undefined behavior, why is the wrong output so consistent? And since the result is determined by the compiler's implementation, how does the version of GCC I am using actually handle a structure variable passed to printf?

So I kept looking…

3.4 Continue to print structure variables

The members of structure A are all ints, and each int occupies 4 bytes in memory, so the data printed just now stepped through memory exactly 4 bytes at a time. If the member is a string, does printing also step 4 bytes at a time? Change the test code to the following:

typedef struct _B
{
   int a;
   char b[12];
}B;

int main()
{
    B  b = {10, "abcdefgh"};
    printf("%d %c %c \n", b);
}

Compile, execute, and print the following results:

Sure enough, the character 'a' lies 4 bytes past the number 10, and the character 'e' lies 4 bytes past 'a'. This suggests that printf steps through memory in int-sized (4-byte) units, and for each percent sign (%) in the format string it reads the data at the next such address.

To verify this idea, test the following code:

Student s = {1, "aaa"};

// dump the struct byte by byte
char *pTmp = (char *)&s;
for (size_t i = 0; i < sizeof(Student); i++)
{
    printf("%x ", *(pTmp + i));
}
printf("\n");

// deliberately wrong: one struct argument for two conversions
printf("%d, %x \n", s);

Compile, execute and print the result as follows:

The output matches that idea exactly: the memory after the number 1 holds three 'a' characters, and the second conversion is %x, so 4 bytes are read as an integer, giving the hexadecimal number 616161.

So the question becomes how GCC actually behaves in this undefined-behavior case. When printf is called, the number and types of the arguments passed in are not fixed, which leads us to variable arguments in C.

Four, variable parameters in C

Variable arguments in C are implemented with the following data type and functions (actually macros):

  1. va_list
  2. va_start
  3. va_arg
  4. va_end

Variable arguments are handled in the following four steps:

  1. Define a variable: va_list arg;
  2. Call va_start to initialize arg. The second argument passed to va_start is the named parameter just before the variable arguments (the three dots);
  3. Use va_arg to extract the variable arguments: loop, taking one value from arg each time, with the last argument of va_arg specifying the data type to extract. For example, if the format is %d, an int is extracted from the variable arguments. If the format is %c, an int is extracted as well and then converted to char, because a char is passed as an int in a variable argument list.
  4. After processing is complete, call va_end to release arg.

The text may seem a little abstract and complex, but it is easy to understand if you look at the three examples below and then come back to these four steps.

1. Three example functions that use variable arguments

Example 1: The parameter type is int, but the number of parameters is not fixed
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>

void my_printf_int(int num,...)
{
    int i, val;
    va_list arg;
    va_start(arg, num);
    for(i = 0; i < num; i++)
    {
        val = va_arg(arg, int);
        printf("%d ", val);
    }
    va_end(arg);
    printf("\n");
}

int main()
{
    int a = 1, b = 2, c = 3;
    my_printf_int(3, a, b, c);
}

Compile and run: the output is 1 2 3.

Example 2: The parameter type is float, but the number of parameters is not fixed
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>

void my_printf_float(int n, ...)
{
    int i;
    double val;
    va_list vl;
    va_start(vl, n);
    for (i = 0; i < n; i++)
    {
        // float arguments are promoted to double, so read them as double
        val = va_arg(vl, double);
        printf("%.2f ", val);
    }
    va_end(vl);
    printf("\n");
}

int main()
{
    float f1 = 3.14159, f2 = 2.71828, f3 = 1.41421;
    my_printf_float(3, f1, f2, f3);
}

Compile and run: the output is 3.14 2.72 1.41.

Example 3: The parameter type is char*, but the number of parameters is not fixed
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>

void my_printf_string(char *first, ...)
{
    char *str = first;
    va_list arg;
    va_start(arg, first);
    do
    {
        printf("%s ", str);
        str = va_arg(arg, char*);
    } while (str != NULL);
    va_end(arg);
    printf("\n");
}

int main()
{
    char *a = "aaa", *b = "bbb", *c = "ccc";
    // the cast makes sure the end marker is passed as a pointer
    my_printf_string(a, b, c, (char *)NULL);
}

Compile and run: the output is aaa bbb ccc.

Note: in all three examples the number of arguments is not fixed, but within each function all the variable arguments have the same type!

In addition, the function must have some way to know how many arguments were passed. The int and float functions learn the count from their first argument, while the char* function stops at the last variable argument, NULL.

2. Principle of variable parameters

2.1 Several macro definitions of variable parameters
typedef char *    va_list;

#define va_start  _crt_va_start
#define va_arg    _crt_va_arg  
#define va_end    _crt_va_end  

#define _crt_va_start(ap,v)  ( ap = (va_list)_ADDRESSOF(v) + _INTSIZEOF(v) )  
#define _crt_va_arg(ap,t)    ( *(t *)((ap += _INTSIZEOF(t)) - _INTSIZEOF(t)) )  
#define _crt_va_end(ap)      ( ap = (va_list)0 )  

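Note that va_list here is simply a char* pointer. These definitions are in the style of the 32-bit x86 Visual C++ headers; GCC implements the same macros with compiler built-ins, but the idea is the same.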

2.2 Processing process of variable parameters

Using the my_printf_int function as an example, let’s repaste it:

void my_printf_int(int num, ...) // step1
{
    int i, val;
    va_list arg;
    va_start(arg, num);         // step2
    for(i = 0; i < num; i++)
    {
        val = va_arg(arg, int); // step3
        printf("%d ", val);
    }
    va_end(arg);                // step4
    printf("\n");
}

int main()
{
    int a = 1, b = 2, c = 3;
    my_printf_int(3, a, b, c);
}

Step1: when the function is called

When a function is called in C (with the default calling convention on 32-bit x86), the arguments are pushed onto the stack one by one from right to left, so on entry to the body of my_printf_int the stack layout looks like this:

Step2: va_start execution

va_start(arg, num);

Replace the above statement with the following macro definition:

#define _crt_va_start(ap,v)  ( ap = (va_list)_ADDRESSOF(v) + _INTSIZEOF(v) ) 

After macro expansion, we get:

arg = (char *)&num + sizeof(num);

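Consider the following diagram. First the address of num is taken and cast to char*; then the number of bytes num occupies (4) is added, which gives the address 0x01020304; finally that address is assigned to arg. So the arg pointer points to the address of the number 1 on the stack, which is the first variable argument, as shown below: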

Step3: va_arg execution

val = va_arg(arg, int); 

Replace the above statement with the following macro definition:

#define _crt_va_arg(ap,t)    ( *(t *)((ap += _INTSIZEOF(t)) - _INTSIZEOF(t)) )  

After macro expansion, we get:

val = ( *(int *)((arg += _INTSIZEOF(int)) - _INTSIZEOF(int)) )  

First arg is advanced by 4 bytes, so arg = 0x01020308. Then 4 bytes are subtracted from that address (0x01020308), the resulting address (0x01020304) is cast to int*, and the int stored there is assigned to val, as shown in the following figure:

In short: fetch the int that arg points to, then move arg to the next argument at the higher address.

va_arg can be called repeatedly until all the arguments that the caller placed on the stack have been retrieved.

Step4: va_end execution

 va_end(arg); 

Replace the above statement with the following macro definition:

#define _crt_va_end(ap)      ( ap = (va_list)0 )  

After macro expansion, we get:

arg = (char *)0;

This one is easy to understand: the pointer arg is set to null. After all the variable arguments on the stack have been extracted, arg holds 0x01020310 (the address just past the last argument). If it were not reset, any later use of it would give unpredictable results, so it is set to NULL to prevent misuse.

3. Printf prints information using variable parameters

Now that you understand the processing mechanism of variable parameters in C, it’s easy to think about the implementation mechanism of printf statements.

3.1 The printf code in GNU glibc
int
__printf (const char *format, ...)
{
   va_list arg;
   int done;

   va_start (arg, format);
   done = vfprintf (stdout, format, arg);
   va_end (arg);

   return done;
}

The vfprintf function eventually calls sys_write to output the data to the stdout device (the display). The code of vfprintf looks rather complicated, but a little analysis gives a rough idea of how it works:

  1. Walk through the format string character by character;
  2. If the character is an ordinary character, print it directly;
  3. If it is a format specifier, read a value of the specified data type from the variable arguments and print it.

This three-step outline is only a very rough idea; the real implementation is much more complex and has to handle all kinds of details. Here are two simple examples:

#include <stdio.h>
#include <stdarg.h>

void my_printf_format_v1(char *fmt, ...)
{
    va_list arg;
    int d;
    char c, *s;

    va_start(arg, fmt);
    while (*fmt)
    {
        switch (*fmt)
        {
            case 's':
                s = va_arg(arg, char *);
                printf("%s", s);
                break;
            case 'd':
                d = va_arg(arg, int);
                printf("%d", d);
                break;
            case 'c':
                c = (char) va_arg(arg, int);
                printf(" %c", c);
                break;
            default:
                // skip a '%' that introduces one of the three format characters
                if ('%' != *fmt || ('s' != *(fmt + 1) && 'd' != *(fmt + 1) && 'c' != *(fmt + 1)))
                    printf("%c", *fmt);
                break;
        }
        fmt++;
    }
    va_end(arg);
}

int main()
{
    my_printf_format_v1("age = %d, name = %s, num = %d \n", 20, "zhangsan", 98);
}

Compile and run: the output is age = 20, name = zhangsan, num = 98.

Perfect! But now test the following code (the word num at the end of the format string is changed to score):

my_printf_format_v1("age = %d, name = %s, score = %d \n", 
        20, "zhangsan", 98);

Compile, execute, output:

An error occurs, because the character 's' in the ordinary word score is caught by the first case. A slight improvement:

#include <stdio.h>
#include <stdarg.h>

void my_printf_format_v2(char *fmt, ...)
{
    va_list arg;
    int d;
    char c, lastC = '\0', *s;

    va_start(arg, fmt);
    while (*fmt)
    {
        switch (*fmt)
        {
            case 's':
                // only a format character if the previous character was '%'
                if ('%' == lastC)
                {
                    s = va_arg(arg, char *);
                    printf("%s", s);
                }
                else
                {
                    printf("%c", *fmt);
                }
                break;
            case 'd':
                if ('%' == lastC)
                {
                    d = va_arg(arg, int);
                    printf("%d", d);
                }
                else
                {
                    printf("%c", *fmt);
                }
                break;
            case 'c':
                if ('%' == lastC)
                {
                    c = (char) va_arg(arg, int);
                    printf(" %c", c);
                }
                else
                {
                    printf("%c", *fmt);
                }
                break;
            default:
                if ('%' != *fmt || ('s' != *(fmt + 1) && 'd' != *(fmt + 1) && 'c' != *(fmt + 1)))
                    printf("%c", *fmt);
                break;
        }
        lastC = *fmt;
        fmt++;
    }
    va_end(arg);
}

int main()
{
    my_printf_format_v2("age = %d, name = %s, score = %d \n", 20, "zhangsan", 98);
}

Compile and run: the output is age = 20, name = zhangsan, score = 98.

Five, the summary

Let's look back over the analysis: it started with code that was meant to test pointers and ended with an analysis of variable arguments in C. Analyzing a problem, locating it and solving it is one continuous line of thinking, and after going through that process the understanding is much deeper.

I have another feeling: if I did not run the public account, I would not have written this article; and if I had not written this article, I would not have studied the problem so seriously. Somewhere along the way I might have been lazy and told myself, "This level of understanding is enough, no need to go any further." So writing down your thinking process in the form of an article is very beneficial, and I strongly recommend that you try it.

Moreover, if you find these thought processes worthwhile, I will be more motivated to keep summarizing and writing. So if this summary helps you in any way, please forward it and share it with your technical friends. Thank you very much!

In addition, if there are any mistakes in the article, you are welcome to discuss them in the comment area, or add me on WeChat so that we can communicate more quickly and efficiently.

Good luck!



Author: Doggo (public id: IOT Town) Zhihu: Doggo B station: Doggo share Nuggets: Doggo share CSDN: Doggo share
















