There is no doubt that learning the basics of a programming language is not as much fun as writing programs. However, not knowing the basics of the language can make writing programs less fun.

2.1 Environments

In any implementation of ANSI C, there are two distinct environments. The first is the translation environment, in which source code is converted into executable machine instructions. The second is the execution environment, in which the code is actually run. The standard makes it clear that the two environments need not be on the same machine. A cross compiler, for example, runs on one machine but produces executable code that runs on a different type of machine. The same is true of operating systems. The standard also discusses freestanding environments, which are environments with no operating system. You may encounter this type of environment in embedded systems such as microwave oven controllers.

2.1.1 Translation

The translation phase consists of several steps. First, each of the (possibly many) source files that make up a program is converted into object code by the compilation process. The object files are then tied together by the linker to form a single, complete executable program. The linker also brings in any functions from the standard C library that the program uses, and it can search the programmer's personal libraries for functions the program needs. Figure 2.1 illustrates this process.

 

Figure 2.1 Compilation process

The compilation process itself consists of several stages, beginning with preprocessing. At this stage, the preprocessor performs textual operations on the source code. For example, it replaces symbols defined by the #define directive with their actual values and reads in the contents of files named by the #include directive.

The source code is then parsed to determine the meaning of its statements. This second stage is where most errors and warnings are produced. Object code is then generated. Object code is a preliminary form of the machine instructions that implement the statements of the program. If an optimization option is given on the compiler's command line, the optimizer processes the object code further to make it more efficient. Optimization takes extra time, so it is usually not done until the program has been debugged and is ready for production. Whether the object code is produced directly, or first exists as assembly language statements that are assembled into object files in a separate phase, does not matter to us.

File name conventions

Although the standard has no rules for naming files, most environments have file name conventions that you must follow. C source code is usually stored in files with a .c extension. Files included into C source code by the #include directive are called header files and usually have a .h extension.

Object file names follow different conventions in different environments. For example, they have a .o extension on UNIX systems but a .obj extension on MS-DOS systems.

Compiling and linking

The specific commands used to compile and link C programs vary from system to system, but many are similar to the two described here. On most UNIX systems, the C compiler is called cc, and it can be invoked in a number of different ways.

1. Compile and link a C program contained entirely in one source file:

cc program.c

This command produces an executable program called a.out. An object file named program.o is produced along the way, but it is deleted once linking is complete.

2. Compile and link several C source files:

cc main.c sort.c lookup.c

When more than one source file is compiled, the object files are not deleted. This allows you to change the program and recompile only the source files that changed, as shown in the next command.

3. Compile one C source file and link it with existing object files:

cc main.o lookup.o sort.c

4. Compile a single C source file and produce an object file (program.o in this case) to be linked later:

cc -c program.c

5. Compile several C source files and produce an object file for each:

cc -c main.c sort.c lookup.c

6. Link several object files:

cc main.o sort.o lookup.o

Any of the commands above that produce an executable may be given the “-o name” option, which causes the linker to store the executable in the file “name” rather than “a.out”. By default, the linker searches the standard C library. If the “-lname” flag is given, the linker also searches the library called “name”. This option should appear last on the command line. The compile and link commands have many other options; consult your system's documentation.

Borland C/C++ 5.0 for MS-DOS and Windows has two user interfaces that you can choose from. The Windows Integrated Development Environment is a complete stand-alone programming tool that includes a source code editor, debugger, and compiler. Its specific use is beyond the scope of this book. The MS-DOS command line interface is not much different from UNIX compilers, except for the following:

1. Its name is BCC.

2. Object files are named file.obj.

3. When a single source file is compiled and linked, the compiler does not delete the object file.

4. By default, the executable file is named after the first source or object file on the command line, although you can use the “-ename” option to name the executable file “name.exe”.

2.1.2 Execution

The execution of a program also goes through several phases. First, the program must be loaded into memory. In a hosted environment (that is, one with an operating system), this task is performed by the operating system. Initialized variables that are not stored on the stack are given their initial values at this time. In a freestanding environment, program loading must be arranged manually, perhaps by placing the executable code in read-only memory (ROM).

Execution of the program then begins. In a hosted environment, a small startup routine is usually linked with the program. It performs various housekeeping chores, such as gathering the command-line arguments so that the program can access them. Then the main function is called.

Now the program's code is executed. On most machines, the program uses a run-time stack, which holds the local variables and return addresses of its functions. Programs can also use static memory. Variables stored in static memory retain their values throughout the execution of the program.

The final phase of program execution is termination, which can have several different causes. “Normal” termination occurs when main returns [1]. Some execution environments allow the program to return a code indicating why it stopped executing. In a hosted environment, the startup routine regains control and may perform various housekeeping tasks, such as closing any files that the program used but did not explicitly close. A program may also terminate because the user pressed the break key or hung up a telephone connection, or it may terminate on its own because of an error that occurred during execution.

2.2 Lexical rules

Lexical rules, like spelling rules in English, determine how you form the individual pieces of a source program, which are known as tokens.

An ANSI C program consists of declarations and functions. A function defines the work that needs to be done, while a declaration describes the function and/or the data type (and sometimes the data itself) that the function will operate on. Comments can be scattered throughout the source file.

2.2.1 Characters

The standard does not require any particular character set in a C environment, but it does specify that the character set must include all the upper and lower case letters of English, the digits 0 through 9, and the following symbols:

! " # % & ' ( ) * + , - . / : ; < > = ? [ ] \ ^ _ { } | ~

A newline character marks the end of each line of source code and, when the executing program reads character input, the end of each input line. If the run-time environment requires it, a newline may be a sequence of characters, but the sequence is treated as a single character. The character set must also include the space, horizontal tab, vertical tab, and form feed characters. These characters, together with newline, are often called whitespace characters because they appear as blank space rather than marks on the page when printed.

The standard also defines several trigraphs. A trigraph is a sequence of three characters that together represent another character. Trigraphs allow C environments to be implemented on character sets that lack some of the required characters. Here is a list of the trigraphs and the characters they represent.

??(  [      ??<  {      ??=  #
??)  ]      ??>  }      ??/  \
??!  |      ??'  ^      ??-  ~

Two question marks followed by another character do not normally occur in any other context, so representing trigraphs this way rarely causes confusion.

Warning:

Although trigraphs are useful in some environments, they are a nuisance for those who do not use them. The ?? sequence was chosen to begin every trigraph because it rarely occurs naturally, but therein lies the danger. Trigraphs are so rare that you never think about them, so sooner or later you write one by accident, like this:

 

printf("Delete file (are you really sure??): " );

You will no doubt be surprised to see a ] character in the output.

When you write C source code, there are contexts in which you cannot use a particular character directly because it has a special meaning there. For example, double quotes delimit string literals. How, then, do you include a double quote inside a string literal? K&R C defined several escape sequences, or character escapes, to overcome this problem, and ANSI C added several more. An escape sequence consists of a backslash \ followed by one or more other characters. Each of the escape sequences listed below represents the character that follows the backslash, but without the special meaning that character would otherwise have.

\? is used when writing consecutive question marks, to prevent them from being interpreted as trigraphs.

\" is used to include a double quote inside a string literal.

\' is used to write the character constant for a single quote.

\\ is used when a backslash is wanted, to prevent it from being interpreted as the start of an escape sequence.

Many characters never appear in source code but are useful when formatting output or manipulating a terminal display. C provides escapes for some of these as well, so that you can include them in your programs. The letters used in these escapes were chosen to be mnemonic for the characters they represent.

K&R C:

Some of the following escapes are marked with the “†” symbol, indicating that they are new to ANSI C and not implemented in K&R C.

 

\a† Alert character. It rings the terminal bell or produces some other audible or visible signal.

\b Backspace character.

\f Form feed character.

\n Newline character.

\r Carriage return character.

\t Horizontal tab character.

\v† Vertical tab character.

\ddd Here ddd stands for one to three octal digits. This escape represents the character whose value is the given octal number.

\xddd† This is similar to the escape above, except that the value is given in hexadecimal rather than octal.

Note that any number of hexadecimal digits may be included in a \xddd sequence, but if the resulting value is too large to be represented in a character, the result is undefined.

2.2.2 Comments

C comments begin with the characters /* and end with the characters */, and may contain anything except */ in between. A comment may span multiple lines of source code, but it cannot be nested within another comment. Note that /* and */ do not act as comment delimiters when they occur inside a string literal.

All comments are removed by the preprocessor and replaced with a space. Therefore, comments can appear anywhere whitespace can appear.

Warning:

A comment begins at /* and ends at the first */ that follows, and everything in between belongs to the comment. This rule may seem obvious, but it was not obvious to the student who wrote the innocent-looking code below. Can you see why only the first variable is initialized?

x1 = 0;   /***********************
x2 = 0;   ** Initialize the     **
x3 = 0;   ** counter variables. **
x4 = 0;   ***********************/

Warning:

Notice that a comment is terminated by */ and not by *?. If you type too quickly or hold down the Shift key too long, you can easily type the latter by mistake. The error is obvious once it is pointed out, but it is hard to spot in a real program.

2.2.3 Free form source code

C is a free-form language: there are no rules about where statements may be written, how many statements may appear on a line, or where and how much whitespace should be used [2]. The only rule is that one or more whitespace characters (or a comment) must appear between adjacent tokens that would otherwise be interpreted as a single long token. Therefore, the following statements are equivalent:

y=x+1;
y = x + 1;
y = x
+
1;

Of the following statements, the first three are equivalent, but the fourth is illegal:

int x;
int      x;
int/*comment*/x;
intx;

This extreme freedom to write code has its pros and cons. Soon you’ll hear some soapbox philosophy on this topic.

2.2.4 Identifiers

Identifiers are the names of variables, functions, types, and so forth. They are made up of upper and lower case letters, digits, and underscores, but they may not begin with a digit. C is a case-sensitive language, so abc, Abc, aBc, and ABC are four different identifiers. There is no limit on the length of an identifier, but the standard allows a compiler to ignore characters after the 31st. The standard also allows an implementation to restrict identifiers used as external names (that is, names manipulated by the linker) to six significant characters without regard to case.

The following C keywords are reserved and cannot be used as identifiers:

auto        do        goto        signed     unsigned
break       double    if          sizeof     void
case        else      int         static     volatile
char        enum      long        struct     while
const       extern    register    switch
continue    float     return      typedef
default     for       short       union

2.2.5 Program form

A C program may be stored in one or more source files. Although a source file may contain more than one function, each function must be contained entirely in a single source file [3]. The standard does not require it, but a reasonable way to organize a C program is for each source file to contain a group of related functions. This approach has the added advantage of making it possible to implement abstract data types.

2.3 Program Style

A few comments on programming style are in order. Free-form languages like C make it easy to produce sloppy programs that are quick to write but hard to read and understand later. People read by visual cues, so keeping your source code orderly helps whoever reads it later (quite possibly yourself). Program 2.1 is an extreme example, but it illustrates the problem. It is a working program that performs a marginally useful function. The question is, can you understand what it does [4]? Worse, if you had to change the program, where would you start? Given time, an experienced programmer could work out what it means, but few would bother. It would be easier and faster to throw it away and write a new one from scratch.

#include <stdio.h>
main(t,_,a)
char *a;
{return!0<t?t<3?main(-79,-13,a+main(-87,1-_,
main(-86, 0, a+1 )+a)):1,t<_?main(t+1, _, a ):3,main ( -94, -27+t, a
)&&t == 2 ?_<13 ?main ( 2, _+1, "%s %d %d\n" ):9:16:t<0?t<-72?main(_,
t,"@n'+,#'/*{}w+/w#cdnr/+,{}r/*de}+,/*{*+,/w{%+,/w#q#n+,/#{l,+,/n{n+\
,/+#n+,/#;#q#n+,/+k#;*+,/'r :'d*'3,}{w+K w'K:'+}e#';dq#'l q#'+d'K#!/\
+k#;q#'r}eKK#}w'r}eKK{nl]'/#;#q#n'){)#}w'){){nl]'/+#n';d}rw' i;# ){n\
l]!/n{n#'; r{#w'r nc{nl]'/#{l,+'K {rw' iK{;[{nl]'/w#q#\
n'wk nw' iwk{KK{nl]!/w{%'l##w#' i; :{nl]'/*{q#'ld;r'}{nlwb!/*de}'c \
;;{nl'-{}rw]'/+,}##'*}#nc,',#nw]'/+kd'+e}+;\
#'rdq#w! nr'/ ') }+}{rl#'{n' ')# }'+}##(!!/")
:t<-50?_==*a ?putchar(a[31]):main(-65,_,a+1):main((*a == '/')+t,_,a\
+1 ):0<t?main ( 2, 2 , "%s"):*a=='/'||main(0,main(-61,*a, "!ek;dc \
i@bK'(q)-[w]*%n+r3#l,{}:\nuwloca-O;m .vpbks,fxntdCeghiry"),a+1);}

Program 2.1 Mystery program

mystery.c

Tip:

Poor style and poor documentation are two important reasons why software is expensive to produce and maintain. A good programming style greatly improves the readability of programs. The direct result is that programs are easier to get working correctly; the indirect result is that they are easier to maintain, which saves a great deal of money.

The example programs in this book use a style that emphasizes structure through the proper use of whitespace. I list a few characteristics of this style below and explain why they are used.

1. Blank lines separate sections of code that do logically different things. This way, the reader can see at a glance where a logical section ends, rather than having to read every line to find it.

2. The parentheses of the if and related statements are part of those statements, not the expression they test. So, I leave a space between the parentheses and the expression to make the expression stand out. The same is true for function prototypes.

3. Most operators are surrounded by spaces, which makes expressions more readable. Sometimes, in complex expressions, I omit the spaces to help show how subexpressions are grouped.

4. Statements nested within other statements are indented to show the hierarchy between them. Using the Tab key rather than the space bar makes it easy to line up related statements neatly. When a whole page is filled with program code, the indentation must be large enough for the eye to match up the parts of the program that belong together; two or three spaces are not enough.

Some people avoid the Tab key because they think it indents statements too much. In complex functions, nesting is often very deep, and large tab indents leave less room to write statements on a single line. However, if a function is that complex, it is probably better to break it into several functions and move the statements that are too deeply nested into functions of their own.

5. Most comments appear in blocks so that they stand out visually from the code. Readers can find and skip them more easily.

6. In a function definition, the return type appears on a separate line, with the function name at the beginning of the next line. This way, when looking for a function definition, you can find the name of the function at the beginning of the line.

As you study the code examples, you will notice other characteristics as well. Other programmers are free to choose their own personal styles. Which style you choose matters much less than using one sound style consistently. If your style is consistent, any reasonably competent reader will find your code easier to read.

2.4 Summary

The source code of a C program is stored in one or more source files, but each function must be contained entirely in a single source file. Keeping related functions in the same file is a good strategy. Each source file is compiled separately to produce a corresponding object file. The object files are then linked together to form an executable program. The machine that compiles the program and the machine that ultimately runs it may or may not be the same.

A program must be loaded into memory before it can execute. In a hosted environment, the operating system does this. In a freestanding environment, programs are often stored permanently in ROM. Initialized static variables are given their values before the program begins to execute. Execution of your program starts with the main function. Most environments use a stack to store local variables and other data.

The character set used by a C compiler must include certain characters. If some of them are missing from your character set, you can use trigraphs instead. Escape sequences make it possible to represent characters that cannot be printed, such as certain whitespace characters, in a program.

Comments begin with /* and end with */, and they cannot be nested. Comments are removed by the preprocessor. An identifier consists of letters, digits, and underscores, but it cannot begin with a digit. Upper and lower case letters are distinct in identifiers. Keywords are reserved and cannot be used as identifiers. C is a free-form language, but writing programs in a clear style makes them easier to read and maintain.

2.5 Summary of warnings

1. Characters in string literals being mistakenly interpreted as trigraphs.

2. A badly written comment unintentionally commenting out statements.

3. Improper closing of a comment.

2.6 Summary of programming hints

Good program style and documentation will make programs easier to read and maintain.

This article is excerpted from Pointers on C.

 
