At the end of “M4 says Lambda, Lambda”, I pointed out that anonymous functions simulated with M4 cannot be used as closures, and asserted that closures cannot be implemented within the C standard. In fact, that conclusion was hasty, because I hadn’t considered the utility of global variables in C.
The story of qsort
The C library provides a qsort function that can sort an array of values of any type. It is declared as follows:
void qsort(void *base,
           size_t nmemb,
           size_t size,
           int (*compar)(const void *, const void *));
It requires the user to pass in a comparison function that determines the relative order of any two elements in the array, for example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* qsort passes pointers to the array elements, i.e. pointers to char *. */
static int cmpstring(const void *p1, const void *p2) {
    return strcmp(*(char * const *) p1, *(char * const *) p2);
}

int main(void) {
    char *str_array[5] = {"fetch", "foo", "foobar", "sin", "atan"};
    qsort(str_array, 5, sizeof(char *), cmpstring);
    for (int i = 0; i < 5; i++) {
        printf("%s\n", str_array[i]);
    }
    exit(EXIT_SUCCESS);
}
Here cmpstring is a user-provided function that determines the relative order of any two strings in an array of strings.
Theoretically, cmpstring is already a closure at this point, because it has the ability to access some variables in its environment, but those variables need to be passed to it explicitly. That is, for cmpstring to work with more variables, you must pass them to it as arguments. To illustrate this, let's make the string sorting program a little more complicated: for any two strings s1 and s2 in str_array, let d1 and d2 be their distances from the string “foo”, and order s1 and s2 by comparing d1 and d2.
Regardless of how the “distance” between two strings should be defined, suppose we have a dist function that computes it. The definition of cmpstring then needs to be changed to:
static int cmpstring(const void *p1, const void *p2, const char *foo) {
    int d1 = dist(*(char * const *) p1, foo);
    int d2 = dist(*(char * const *) p2, foo);
    if (d1 < d2) {
        return -1;
    } else if (d1 == d2) {
        return 0;
    } else {
        return 1;
    }
}
Then we discover that qsort does not accept such a cmpstring, because it expects the user-supplied function to take two arguments, not three!
To solve this problem, the GNU C library adds a function called qsort_r, which is declared as follows:
void qsort_r(void *base, size_t nmemb, size_t size,
             int (*compar)(const void *, const void *, void *),
             void *arg);
It is easy to see why GNU does this. However, although qsort_r accepts a comparison function with three arguments, it is not a standard C library function, and to use it you have to define the _GNU_SOURCE macro before including the headers.
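As a minimal sketch of the qsort_r route, assuming glibc: the code below passes the reference string through the extra arg parameter. The dist function here is only a hypothetical stand-in (the absolute difference of the string lengths), and cmp_by_dist and sort_by_dist are names invented for this illustration.

```c
#define _GNU_SOURCE /* must come before the includes to expose glibc's qsort_r */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the article's dist():
 * the absolute difference between the two string lengths. */
static int dist(const char *a, const char *b) {
    long d = (long) strlen(a) - (long) strlen(b);
    return (int) (d < 0 ? -d : d);
}

/* Three-argument comparator in the shape glibc's qsort_r expects;
 * arg carries the reference string. */
static int cmp_by_dist(const void *p1, const void *p2, void *arg) {
    const char *ref = arg;
    int d1 = dist(*(char * const *) p1, ref);
    int d2 = dist(*(char * const *) p2, ref);
    return (d1 > d2) - (d1 < d2);
}

/* Sorts strs[0..n-1] by ascending distance from ref. */
void sort_by_dist(char **strs, size_t n, const char *ref) {
    assert(strs != NULL && ref != NULL);
    qsort_r(strs, n, sizeof *strs, cmp_by_dist, (void *) ref);
}
```

Called on the article's array with the reference string "foo", this places "atan", "fetch" and "foobar" last, in that order. "foo" and "sin" tie at distance 0 and may come out in either order, since the sort is not guaranteed to be stable. BSD-derived systems ship a qsort_r with a different argument order, so this sketch is specific to glibc.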
If you stick to qsort instead of qsort_r, the only way to solve the distance-based sorting problem above is to map str_array to a dist_array, sort dist_array, and finally map dist_array back to str_array. Along the way you must introduce an extra array that records which element of str_array each element of dist_array came from. The result is cumbersome, bloated code; try it if you are curious. An alternative is to implement a function like qsort_r yourself.
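One possible shape for the mapping approach, as a sketch only: instead of two parallel arrays, it keeps each distance next to its string in a temporary array of records, so the pairing survives the sort. The struct keyed_str type, the function names, and the length-based dist are all assumptions made for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the article's dist():
 * the absolute difference between the two string lengths. */
static int dist(const char *a, const char *b) {
    long d = (long) strlen(a) - (long) strlen(b);
    return (int) (d < 0 ? -d : d);
}

/* One record per string: the precomputed distance plus the string itself. */
struct keyed_str {
    int   key;
    char *str;
};

/* Ordinary two-argument comparator: it only looks at the stored keys. */
static int cmp_keyed(const void *p1, const void *p2) {
    const struct keyed_str *a = p1, *b = p2;
    return (a->key > b->key) - (a->key < b->key);
}

/* Returns 0 on success, -1 if the temporary array cannot be allocated. */
int sort_by_dist_mapped(char **strs, size_t n, const char *ref) {
    assert(strs != NULL && ref != NULL);
    if (n == 0) {
        return 0;
    }
    struct keyed_str *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL) {
        return -1;
    }
    for (size_t i = 0; i < n; i++) {        /* map: string -> (distance, string) */
        tmp[i].key = dist(strs[i], ref);
        tmp[i].str = strs[i];
    }
    qsort(tmp, n, sizeof *tmp, cmp_keyed);  /* standard qsort is enough here */
    for (size_t i = 0; i < n; i++) {        /* map back into the original array */
        strs[i] = tmp[i].str;
    }
    free(tmp);
    return 0;
}
```

Even in this compact form, the extra type, the allocation and the two copy loops show why the article calls this route cumbersome compared with a comparator that can simply see foo.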
Utility of global variables
The problem in the previous section could be solved elegantly if C supported nested function definitions. For example:
int main(void) {
    char *str_array[5] = {"fetch", "foo", "foobar", "sin", "atan"};
    char *foo = "foo";

    /* Nested definition: cmpstring can see the local variable foo. */
    int cmpstring(const void *p1, const void *p2) {
        int d1 = dist(*(char * const *) p1, foo);
        int d2 = dist(*(char * const *) p2, foo);
        if (d1 < d2) {
            return -1;
        } else if (d1 == d2) {
            return 0;
        } else {
            return 1;
        }
    }

    qsort(str_array, 5, sizeof(char *), cmpstring);
    for (int i = 0; i < 5; i++) {
        printf("%s\n", str_array[i]);
    }
    exit(EXIT_SUCCESS);
}
Such a cmpstring function is a true closure.
Unfortunately, the C standard so far does not support nested definitions of functions. Although some C compilers can support nested function definitions through extensions, it is generally not desirable to design your code to rely on a C compiler feature unless you are certain that your code will always be compiled with that C compiler. The Linux kernel code, for example, relies on certain features of GCC.
However, if we change the code to:
static char *foo = "foo";

static int cmpstring(const void *p1, const void *p2) {
    int d1 = dist(*(char * const *) p1, foo);
    int d2 = dist(*(char * const *) p2, foo);
    if (d1 < d2) {
        return -1;
    } else if (d1 == d2) {
        return 0;
    } else {
        return 1;
    }
}

int main(void) {
    char *str_array[5] = {"fetch", "foo", "foobar", "sin", "atan"};
    qsort(str_array, 5, sizeof(char *), cmpstring);
    for (int i = 0; i < 5; i++) {
        printf("%s\n", str_array[i]);
    }
    exit(EXIT_SUCCESS);
}
At least this is C99 compliant. This way there is no need to call the non-standard qsort_r, no need to go through the contortions of array mapping, and no need to implement a qsort_r-like function yourself. But can cmpstring still be called a closure in this case? Theoretically no, because a closure may access only its own local variables and non-global variables from its enclosing scope, whereas this cmpstring accesses a global variable, foo.
Programming experts tell us to be careful with global variables. Fortunately, they say use with caution, not never. It is not good to abuse global variables, especially when they are used to represent the running state of a program, and this state is changed from time to time by the program itself.
However, for the purposes of this article, even though foo is a global variable, if we can be sure that it is used by only one function, then it can form a real closure with cmpstring! A global variable that is used by only one function is a local variable in disguise: it is neither truly global nor truly local, and when the function accesses it, the two can in theory form a closure. In fact, in C programming practice, as long as we are willing to treat a .c file as a package, all computation in that file is closed over that package.
To reduce the global nature of a global variable, you should first add the static modifier to it, telling the compiler that the variable is visible only to functions in the file in which it resides. The second is to control its name so that it causes as little pollution as possible to the C namespace. For example, in the example above, we can name foo var_foo_in_cmpstring_func. The longer and more unusual the name, the less pollution it causes to the C namespace.
Basic paradigms of C closures
We now have the basic principle of C closure simulation: reduce the global nature of a global variable. Following this principle, several basic closure simulation paradigms can be summarized. The point of summarizing paradigms is that they make it possible to write code generators that emulate closures in C.
Variable transfer paradigm
If a local variable x and a function f are to form a closure, we must create a static global variable var_x_just_in_f and assign the value of x to var_x_just_in_f before f is applied. For example:
static int var_x_just_in_f;

static int f(int y) {
    return var_x_just_in_f + y;
}

...

int foo(int y) {
    int x = 1;
    var_x_just_in_f = x;
    return f(y);
}
For the qsort example above, if you want to sort the elements of the string array str_array by their distance from the string foo, the variable transfer paradigm looks like this:
static char *var_foo_just_in_cmpstring;

static int cmpstring(const void *p1, const void *p2) {
    ...
}

int main(void) {
    char *str_array[5] = {"fetch", "foo", "foobar", "sin", "atan"};
    char *foo = "foo";
    var_foo_just_in_cmpstring = foo;
    qsort(str_array, 5, sizeof(char *), cmpstring);
    ...
    exit(EXIT_SUCCESS);
}
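To make the outline above concrete, here is a complete sketch of the variable transfer paradigm with the elided parts filled in under stated assumptions: dist is a hypothetical stand-in (the absolute difference of the string lengths), and sort_by_foo is a helper name invented for this illustration that plays the role of the code in main.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the article's dist():
 * the absolute difference between the two string lengths. */
static int dist(const char *a, const char *b) {
    long d = (long) strlen(a) - (long) strlen(b);
    return (int) (d < 0 ? -d : d);
}

/* File-scope slot standing in for the captured local variable.
 * By convention only cmpstring reads it and only sort_by_foo writes it. */
static const char *var_foo_just_in_cmpstring;

/* Two-argument comparator, as standard qsort requires. */
static int cmpstring(const void *p1, const void *p2) {
    int d1 = dist(*(char * const *) p1, var_foo_just_in_cmpstring);
    int d2 = dist(*(char * const *) p2, var_foo_just_in_cmpstring);
    return (d1 > d2) - (d1 < d2);
}

/* Transfers the local value into the slot, then applies the function. */
void sort_by_foo(char **strs, size_t n, const char *foo) {
    assert(strs != NULL && foo != NULL);
    var_foo_just_in_cmpstring = foo;
    qsort(strs, n, sizeof *strs, cmpstring);
}
```

Because every call writes to the same slot, this sketch should not be called from several threads at once.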
Anonymous function paradigm
Closures are typically composed of an anonymous function and its external variables (variables that, from the anonymous function's point of view, are neither global nor local). Although C does not support anonymous functions, a code generator can automatically generate a set of functions with names in a special format, _LAMBDA_N, for example _LAMBDA_0, _LAMBDA_1, and so on. We treat these functions as C's anonymous functions and forbid any non-anonymous function from being named in this format.
For the qsort example, using an anonymous function, we could write:
static char *var_foo_just_in_LAMBDA_0;

static int _LAMBDA_0(const void *p1, const void *p2) {
    ...
}

int main(void) {
    char *str_array[5] = {"fetch", "foo", "foobar", "sin", "atan"};
    char *foo = "foo";
    var_foo_just_in_LAMBDA_0 = foo;
    qsort(str_array, 5, sizeof(char *), _LAMBDA_0);
    ...
    exit(EXIT_SUCCESS);
}
For how to use GNU M4 to generate such functions and call them in the appropriate places, see the article “M4 says Lambda, Lambda”.
Closure maker paradigm
Global variables are dangerous. Closure makers must abide by a contract that global variables used to build closures are only accessible by the closure being built.
Forbidding multiple closures from sharing a global variable does increase the memory the program occupies at run time, since global variables live in the program's static storage area. But compared with the risks of shared global variables, sacrificing some memory is worth it, provided the closure technique is not abused in practice. In fact, languages that support closures natively also pay a memory cost for them.
Implementation in M4
Building on the anonymous functions simulated in “M4 says Lambda, Lambda”, C closures can be implemented by additionally simulating the variable transfer paradigm described above. However, in the interest of macro safety, I have made some changes to the M4 code for anonymous functions, which make it less intuitive. I present the code here and will cover the details later.
divert(-1)
changeword(`[_a-zA-Z@&][_a-zA-Z0-9]*')
define(`_C_CLOSURE', `divert(1)')
define(`_C_CORE', `divert(2)')
define(`_LAMBDA_SET_VAR', `undefine(`$1')define(`$1', `$2')')
_LAMBDA_SET_VAR(`?N', 0)
define(`_LAMBDA',
`_C_CLOSURE`'static $2 _LAMBDA_`'defn(`?N')`('$1`)'{$3; }
_C_CORE`'_LAMBDA_`'defn(`?N')`'dnl
_LAMBDA_SET_VAR(`?N', incr(defn(`?N')))`'dnl
')
define(`_VAR_IN_L_N', `var_$1_just_in_LAMBDA_`'defn(`?N')')
define(`@',
`_C_CLOSURE`'static $1 _VAR_IN_L_N($2);
_C_CORE`'$1 $2 = $3; _VAR_IN_L_N($2) = $2`'')
define(`&', `_VAR_IN_L_N($1)')
divert(0)dnl
Assuming the above code is stored in the file c-closure.m4, an example of a simulated anonymous closure is as follows:
include(`c-closure.m4')dnl
#include <stdio.h>
_C_CORE
int main(void) {
    @(`int', `x', `1');
    if (_LAMBDA(`int y', `int', `return &(`x') > y')(2)) {
        printf("False!\n");
    } else {
        printf("True!\n");
    }
}
Here @ and & are defined as GNU M4 macros. GNU M4 provides a changeword macro that allows special characters to be used in macro names. According to the official GNU M4 documentation, M4 2.0 may provide a better mechanism to replace changeword, but given that M4 has only just reached 1.4.17, 2.0 is still a long way off, so let's leave it at that.
The @ macro takes three arguments: the type of the local variable, its name, and its value. It declares the local variable and turns it into a closure variable according to the variable transfer paradigm.
The & macro accepts only one argument, the name of a local variable declared with @.
After the above code is expanded by GNU M4, the following results are obtained:
#include <stdio.h>

static int var_x_just_in_LAMBDA_0;
static int _LAMBDA_0(int y) {
    return var_x_just_in_LAMBDA_0 > y;
}

int main(void) {
    int x = 1;
    var_x_just_in_LAMBDA_0 = x;
    if (_LAMBDA_0(2)) {
        printf("False!\n");
    } else {
        printf("True!\n");
    }
}
Using function pointers, it is also possible to return a closure from a function, for example:
include(`c-closure.m4')dnl
#include <stdio.h>
typedef int (*Func)(int);
_C_CORE
Func test(void) {
    @(`int', `x', `1');
    return _LAMBDA(`int y', `int', `return &(`x') > y');
}

int main(void) {
    if (test()(2)) {
        printf("False!\n");
    } else {
        printf("True!\n");
    }
}
The expansion result is:
#include <stdio.h>
typedef int (*Func)(int);

static int var_x_just_in_LAMBDA_0;
static int _LAMBDA_0(int y) {
    return var_x_just_in_LAMBDA_0 > y;
}

Func test(void) {
    int x = 1;
    var_x_just_in_LAMBDA_0 = x;
    return _LAMBDA_0;
}

int main(void) {
    if (test()(2)) {
        printf("False!\n");
    } else {
        printf("True!\n");
    }
}
Discussion
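A hand-written variant of the expansion above makes one property of this technique easy to see: the returned pointer always refers to the same function and the same static slot, so the most recent call to the maker decides the value that every previously returned pointer sees. The names make_greater_than, lambda_0, var_x_just_in_lambda_0 and demo are hypothetical, chosen for this sketch; the maker takes x as a parameter so the effect can be observed.

```c
#include <assert.h>

typedef int (*Func)(int);

/* The single static slot that backs the "captured" x. */
static int var_x_just_in_lambda_0;

/* The generated function: reads the slot at call time. */
static int lambda_0(int y) {
    return var_x_just_in_lambda_0 > y;
}

/* Closure maker: stores x in the slot and returns the function. */
Func make_greater_than(int x) {
    var_x_just_in_lambda_0 = x;
    return lambda_0;
}

/* Usage: f and g are the same pointer and share the same slot. */
void demo(void) {
    Func f = make_greater_than(1);
    assert(f(2) == 0);   /* 1 > 2 is false */
    Func g = make_greater_than(5);
    assert(g(2) == 1);   /* 5 > 2 is true */
    assert(f == g);      /* both are lambda_0 */
    assert(f(2) == 1);   /* f now sees x == 5 as well */
}
```

In other words, a closure returned this way behaves like a single instance rather than a fresh closure per call, which is consistent with the contract that each slot belongs to exactly one closure.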
If you are worried about the memory consumed by the flood of global variables behind C “anonymous” functions, consider building a global list of the external variables of anonymous functions, whose entries can be freed before the anonymous function ends. I’ll try it sometime.
As @nareix pointed out, M4 simulated closures do not support nesting, i.e. an anonymous function definition cannot be nested within another anonymous function definition. This is due to the limitation that M4 macros can only be expanded once. I need to think again about how to solve the problem.
P.S. qsort no longer has stories
include(`c-closure.m4')dnl
#include <stdio.h>
#include <stdlib.h>
_C_CORE
int main(void) {
    char *str_array[5] = {"fetch", "foo", "foobar", "sin", "atan"};
    @(`char *', `foo', `"foo"');
    qsort(str_array, 5, sizeof(char *),
          _LAMBDA(`const void *p1, const void *p2', `int',
                  `int d1 = dist(*(char * const *) p1, &(`foo'));
                   int d2 = dist(*(char * const *) p2, &(`foo'));
                   if (d1 < d2) {
                       return -1;
                   } else if (d1 == d2) {
                       return 0;
                   } else {
                       return 1;
                   }
                  '));
    for (int i = 0; i < 5; i++) {
        printf("%s\n", str_array[i]);
    }
    exit(EXIT_SUCCESS);
}