Compilation and linking

First, let’s look at an example:

#include <stdlib.h>
#include <stdio.h>
#include <assert.h>

int main() {
    void* mem = malloc(400); assert(mem != NULL);
    printf("Yay! hi");
    free(mem);
    return 0;
}

If you compile this with GCC, two files are generated: an object file (main.o) and an executable (a.out). The output names can be customized.

During linking, the various .o files and the standard library files are combined to produce an executable file (a.out by default).

Now let’s make some changes to the current code to see what happens to compilation and linking.

If we comment out the #include <stdio.h> line, most people would expect the compiler to report an error, because without the stdio.h header the preprocessed file contains no declaration of the printf function. That is true for some compilers, but GCC, as described in the lecture, only issues a warning, not an error.

This is because, when GCC sees a statement that looks like a function call, it infers a function prototype from the call itself. For printf, GCC finds no declaration, issues a warning, and assumes a function that takes a single string argument and returns int (an undeclared function is always assumed to return int). The call is accepted as long as it matches that inferred prototype. If printf is called again later in the same file, it must also be passed a single string argument, because the inferred prototype is narrower than the real one.

If we comment out the stdlib.h header, the compiler issues three warnings. The line void* mem = malloc(400); produces two of them: a missing function declaration and an assignment of int to void*. The call to free produces the third: a missing function declaration.

This works because all the header file does is tell the compiler what function prototypes there are, not where the code for those functions is. The link stage is responsible for finding the code in the standard library.
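As a minimal sketch of that point, the snippet below writes the prototypes by hand instead of including stdlib.h and string.h; the linker still finds the real definitions in the standard library. The helper name alloc_and_measure is hypothetical and exists only for this illustration, and stddef.h is included solely to get size_t.

```c
#include <stddef.h>  /* only for size_t */

/* Hand-written prototypes standing in for <stdlib.h> and <string.h>.
   They tell the compiler what the functions look like; the code itself
   is found in the standard library at link time. */
void *malloc(size_t size);
void free(void *ptr);
size_t strlen(const char *s);

/* Hypothetical helper: allocates a buffer as in the article's example,
   then returns the length of text (0 if the allocation fails). */
size_t alloc_and_measure(const char *text) {
    void *mem = malloc(400);
    if (mem == NULL)
        return 0;
    size_t length = strlen(text);
    free(mem);
    return length;
}
```

In real code, including the headers is still the better choice, since a hand-written prototype that disagrees with the library's definition goes unnoticed by the linker.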

If we comment out #include <assert.h>, the compiler will assume that assert(mem != NULL) is a call to a function that takes a Boolean and returns an int, so only warnings are raised at compile time. During linking, however, an error is reported because no such function exists in the standard library.

The reason for having a prototype is to let the caller and the callee agree on the layout of the activation record above the saved PC (that is, to make the arguments passed in a call match the parameter types the function expects). The prototype only concerns the parameters above the saved PC in the activation record; the part below the saved PC is the responsibility of the callee. When we call printf, the caller and the callee must agree on the format of that upper part of the record: the caller places the address of a string constant there as the argument and transfers control, and printf then processes it according to its prototype.

int main()
{
    int num = 65;
    int length = strlen((char*)&num, num); 
    printf("length = %d\n", length);
    return 0;
}

You might expect linking to fail because strlen takes only one argument. But that’s not the case. The .o file keeps no record of how many arguments a call passes. When linking, GCC looks only at symbol names, not parameter types. So the call does not match the function’s signature, but the link stage does not care; all it does is look up the definition of strlen. When execution jumps to strlen, the function expects a single char* argument, so its code reads only that part of the activation record and ignores the extra argument.

For the code above, the lecturer said that only a warning is produced, but in my own testing it was reported as an error. To get around this, declare the function explicitly, as in the following code.

int strlen(char *s, int len);

int main() {
    int num = 65;
    int length = strlen((char*)&num, num);
    printf("length = %d\n", length);
    return 0;
}

On a big-endian system, length is 0: num (0x00000041) is stored as the bytes 00 00 00 41, so the first byte read is already 0. On a little-endian system, length is 1: num is stored as the bytes 41 00 00 00, so the first byte is nonzero and the second byte read is 0.
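The byte-order argument can be checked with a small sketch. The helper names bytes_before_zero and is_little_endian are hypothetical; the first one mimics what strlen((char*)&num) sees, but stops at sizeof num so it never reads past the int.

```c
#include <stddef.h>

/* Hypothetical helper: counts the bytes of num, in memory order, that come
   before the first zero byte. Bounded by sizeof num, unlike a raw strlen. */
size_t bytes_before_zero(int num) {
    const unsigned char *p = (const unsigned char *)&num;
    size_t n = 0;
    while (n < sizeof num && p[n] != 0)
        n++;
    return n;
}

/* Hypothetical helper: returns 1 if the lowest-addressed byte of an int
   holds its least significant bits (little-endian), 0 otherwise. */
int is_little_endian(void) {
    int probe = 1;
    return *(const unsigned char *)&probe == 1;
}
```

Calling bytes_before_zero(65) gives 1 on a little-endian machine and 0 on a big-endian one, matching the two cases described above.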

int memcmp(void* v1);

int main() {
    int n = 17;
    int m = memcmp(&n);  /* only one of the three expected arguments is passed */
    return 0;
}

The code above compiles without errors, but it may fail in unpredictable ways at run time, because the function is actually defined as follows:

int memcmp(void *v1, void *v2, size_t num) { ... }

On a 32-bit system, the function treats the 12 bytes above the saved PC as its arguments, even though the caller supplied only 4 of them.
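For contrast, here is a minimal sketch of a call that supplies all three arguments, using the prototype from string.h rather than a hand-written one. The helper name same_int is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: compares the raw bytes of two ints.
   All three memcmp arguments are supplied, so the callee reads
   exactly what the caller placed in the activation record. */
int same_int(int a, int b) {
    return memcmp(&a, &b, sizeof a) == 0;
}
```

With the header included, a call such as memcmp(&n) is rejected at compile time instead of misbehaving at run time.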

As you can see, GCC’s permissiveness gives us more freedom, but it also makes errors more likely.

Also, function overloading is possible in C++, but not in C. This is because C uses the function name directly as the symbol at compile time, whereas C++ generates the symbol from the function name together with its parameter types.

C: CALL <memcmp>
C++: CALL <memcmp-void-p-void-p-int>

From the above we can see that in C++, when the parameter types in a call do not match the definition, the resulting symbol is different as well, so the mismatch shows up at link time. C++ is safer in this respect.

Common mistakes

Segmentation fault: occurs when an invalid pointer is dereferenced.

Bus errors

For bus errors, let’s look at an example:

void* vp = ...;
*(short*)vp = 7;

Bus errors occur when vp is an odd address. This is because the hardware will expect the starting address of the short type to be an even address.

*(int*)vp = 4;

Bus errors also occur if the address held in vp is not a multiple of 4.
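One common way to avoid this class of error is to copy the bytes with memcpy instead of casting the pointer, since memcpy makes no alignment assumption. The sketch below is an illustration under that approach, not something from the lecture; the helper names store_int, load_int and roundtrip_at_offset are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Stores value at vp without assuming vp is aligned for int. */
void store_int(void *vp, int value) {
    memcpy(vp, &value, sizeof value);
}

/* Loads an int from vp without assuming vp is aligned for int. */
int load_int(const void *vp) {
    int value;
    memcpy(&value, vp, sizeof value);
    return value;
}

/* Hypothetical demo: writes value at a byte offset (0..7) inside a local
   buffer, which may be an odd address, and reads it back. */
int roundtrip_at_offset(size_t offset, int value) {
    unsigned char buffer[8 + sizeof(int)];
    store_int(buffer + offset, value);
    return load_int(buffer + offset);
}
```

For example, roundtrip_at_offset(1, 4) returns 4 even though buffer + 1 is generally not a multiple of 4.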

Buffer overflow

int main() {
    int i;
    int array[4];
    for(i = 0; i <= 4; i++)
        array[i] = 0;
    return 0;
}

When array[4] = 0 is executed, i is reset to 0 because, in this memory model, i sits directly above array and array[4] refers to the same memory as i. The loop therefore never terminates.
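A corrected version of the loop, as a minimal sketch: the bound is derived from the array itself and the comparison is < rather than <=, so index 4 is never written. The function names zero_fill and zero_fill_demo are hypothetical.

```c
#include <stddef.h>

/* Zeroes the first length elements of array and returns how many
   elements were written. */
size_t zero_fill(int *array, size_t length) {
    size_t written = 0;
    for (size_t i = 0; i < length; i++) {  /* note: <, not <= */
        array[i] = 0;
        written++;
    }
    return written;
}

/* Hypothetical demo: fills a 4-element array, computing the element
   count from the array instead of repeating the literal 4. */
size_t zero_fill_demo(void) {
    int array[4];
    return zero_fill(array, sizeof array / sizeof array[0]);
}
```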

int main() {
    int i;
    short array[4];
    for(i = 0; i <= 4; i++)
        array[i] = 0;
    return 0;
}

On a big-endian system, this works fine, but on a little-endian system you get stuck in an infinite loop. Hint: array[4] overlaps only the two lowest-addressed bytes of i. You can try to work through the analysis yourself.
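The effect of that stray write can be reproduced without actually overflowing anything. The sketch below assumes a short is smaller than an int (typically 2 and 4 bytes) and overwrites only the lowest-addressed bytes of a copy of i, which is what array[4] = 0 does in the lecture's memory model. The function name zero_low_address_half is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: overwrites the lowest-addressed sizeof(short) bytes
   of i with zero and returns the resulting value.
   Assumes sizeof(short) < sizeof(int). */
int zero_low_address_half(int i) {
    short zero = 0;
    memcpy(&i, &zero, sizeof zero);
    return i;
}
```

On a little-endian machine, zero_low_address_half(4) returns 0, because the low-order bytes are stored first, so the loop counter would restart. On a big-endian machine the same call returns 4, because only the high-order bytes (already zero) are overwritten.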

int main() {
    int array[4];
    int i;

    for (i = 0; i <= 4; i++)
        array[i] -= 4;

    return 0;
}

In this program, array[4] refers to the saved PC, which holds the address of the instruction following the CALL to this function. Subtracting 4 makes the saved PC point back at the CALL instruction itself. As a result, the function is called again every time it returns, and the program is stuck in an endless sequence of function calls.

Note: The memory model used here is the simplified one defined by the lecturer.