We usually think of an lvalue as an expression that can appear on the left side of an assignment operator, that is, something that can be assigned to. This definition is not exact, and different languages define lvalues differently. For the pre-increment (and pre-decrement) operator to yield an lvalue, it has to return the variable itself, or a reference to it, rather than a copy of its new value.

1. Analyzing the problem

When I encountered this problem in PHP, I initially wrote code like this:

```php
<?php
function func01(&$a) {
    echo $a . PHP_EOL;
    $a += 10;
}

$n = 0;
func01(++$n);
echo $n . PHP_EOL;
```

Based on my C++ experience, I expected this code to print 1 and 11, but PHP prints 1 and 1. To find out why, I used the zendump_opcodes() function from the zendump extension to dump the opcodes of the script:

```
[root@c962bf018141 php-7.2.2]# func_arg_pre_inc_by_ref.php
1
1
op_array("") refcount(1) addr(0x7f7445c812a0) vars(1) T(5) filename(/root/php/func_arg_pre_inc_by_ref.php) line(1,12)
OPCODE                OP1        OP2                RESULT    EXTENDED
ZEND_NOP
ZEND_ASSIGN           $n         0
ZEND_INIT_FCALL       128        "func01"                     1
ZEND_PRE_INC          $n                            #var1
ZEND_SEND_VAR_NO_REF  #var1      1
ZEND_DO_UCALL
ZEND_CONCAT           $n         "\n"               #tmp3
ZEND_ECHO             #tmp3
ZEND_INIT_FCALL       80         "zendump_opcodes"            0
ZEND_DO_ICALL
ZEND_RETURN           1
```

The opcodes show that the problem lies in the ZEND_PRE_INC instruction: its result is #var1 rather than $n. The opcodes and the layout of variables on the VM stack are fixed at compile time. This means the Zend compiler does not use $n itself as the value of the expression. The implementation of the ZEND_PRE_INC and ZEND_PRE_DEC handlers in zend_vm_def.h confirms this: at run time, #var1 is not a reference to $n but a copy of its value, made with the ZVAL_COPY_VALUE or ZVAL_COPY macro. ZEND_SEND_VAR_NO_REF then passes this temporary to func01(). The function adds 10 to the copy, and $n stays at 1.

There is a bug report about this from 2012: bugs.php.net/bug.php?id=… It appears to have been filed by a developer with a C++ background. It was reported against PHP 5.4 and is still open, so the maintainers apparently do not intend to fix it; perhaps they do not consider it a bug. PHP has no standards committee or authoritative language specification, so it is hard to say whether this is a bug at all. For the same reason, it is often difficult to find authoritative information when you run into unexpected behavior, and the only option left is to study the implementation.

2. Trying a modification

I modified the ZEND_PRE_INC and ZEND_PRE_DEC handlers in the PHP 7.2.2 source. The main idea: if the operand is not already a reference, convert it into one (the ZVAL_MAKE_REF macro performs that check). Then make the instruction's result refer to the operand:

```
[root@c962bf018141 Zend]# diff zend_vm_def.h zend_vm_def.h.bak
1211c1211
< zval *var_ptr, *varptr;
---
> zval *var_ptr;
1213c1213
< varptr = var_ptr = GET_OP1_ZVAL_PTR_PTR_UNDEF(BP_VAR_RW);
---
> var_ptr = GET_OP1_ZVAL_PTR_PTR_UNDEF(BP_VAR_RW);
1218,1219c1218
< ZVAL_MAKE_REF(varptr);
< ZVAL_COPY(EX_VAR(opline->result.var), varptr);
---
> ZVAL_COPY_VALUE(EX_VAR(opline->result.var), var_ptr);
1241,1242c1240
< ZVAL_MAKE_REF(varptr);
< ZVAL_COPY(EX_VAR(opline->result.var), varptr);
---
> ZVAL_COPY(EX_VAR(opline->result.var), var_ptr);
1253c1251
< zval *var_ptr, *varptr;
---
> zval *var_ptr;
1255c1253
< varptr = var_ptr = GET_OP1_ZVAL_PTR_PTR_UNDEF(BP_VAR_RW);
---
> var_ptr = GET_OP1_ZVAL_PTR_PTR_UNDEF(BP_VAR_RW);
1260,1261c1258
< ZVAL_MAKE_REF(varptr);
< ZVAL_COPY(EX_VAR(opline->result.var), varptr);
---
> ZVAL_COPY_VALUE(EX_VAR(opline->result.var), var_ptr);
1283,1284c1280
< ZVAL_MAKE_REF(varptr);
< ZVAL_COPY(EX_VAR(opline->result.var), varptr);
---
> ZVAL_COPY(EX_VAR(opline->result.var), var_ptr);
```

This change was inspired by the way the Zend engine implements global and static variables. It is much like overloading the ++ operator for a C++ class so that it returns a reference to itself. I also tried using an INDIRECT type pointer instead, but that caused a core dump. The INDIRECT type seems to be used only in specific places inside the Zend engine and is not supported as widely as the reference type.

After modifying zend_vm_def.h, I regenerated the VM executor with zend_vm_gen.php, rebuilt PHP, and ran the test script again:

```
[root@c962bf018141 php-7.2.2]# func_arg_pre_inc_by_ref.php
1
11
vars(1): {
  $n -> zval(0x7fd182c1d080) -> reference(1) addr(0x7fd182c5f078) zval(0x7fd182c5f080) : long(11)
}
```

The script now prints 1 and 11. Printing the local variables with the zendump_vars() function from the zendump extension shows that $n has indeed been converted to a reference.

3. Verifying the modification

The remaining question is whether this change introduces new bugs, in particular whether any PHP feature relies on pre-increment not returning an lvalue. I ran make test on both the modified and the unmodified source trees and compared the results. Two tests failed:

```
Check key execution order with new. [tests/lang/engine_assignExecutionOrder_007.phpt]
Execution ordering with comparison operators. [tests/lang/engine_assignExecutionOrder_009.phpt]
```

Looking at the failing tests, both use several pre-increment operators within a single statement, for example (test 007):

```php
<?php
$a[2][3] = 'stdClass';

$a[$i=0][++$i] = new $a[++$i][++$i];
print_r($a);

$o = new stdClass;
$o->a = new $a[$i=2][++$i];
$o->a->b = new $a[$i=2][++$i];
print_r($o);
```

Dumping the opcodes again with zendump_opcodes() gives:

```
[root@c962bf018141 php-7.2.2]# sapi/cli/php -f php/testcase007.php
op_array("") refcount(1) addr(0x7fba8347f2a0) vars(3) T(36) filename(/root/php/testcase007.php) line(1,13)
OPCODE                OP1        OP2                RESULT    EXTENDED
ZEND_INIT_FCALL       80         "zendump_opcodes"            0
ZEND_DO_ICALL
ZEND_FETCH_DIM_W      $a         2                  #var1
ZEND_ASSIGN_DIM       #var1      3
ZEND_OP_DATA          "stdClass"
ZEND_ASSIGN           $i         0                  #var3
ZEND_PRE_INC          $i                            #var5
ZEND_PRE_INC          $i                            #var7
ZEND_PRE_INC          $i                            #var9
ZEND_FETCH_DIM_R      $a         #var7              #var8
ZEND_FETCH_DIM_R      #var8      #var9              #var10
ZEND_FETCH_CLASS                 #var10             #var11
ZEND_NEW              #var11                        #var12    0
ZEND_DO_FCALL
ZEND_FETCH_DIM_W      $a         #var3              #var4
ZEND_ASSIGN_DIM       #var4      #var5
ZEND_OP_DATA          #var12
ZEND_INIT_FCALL       96         "print_r"                    1
ZEND_SEND_VAR         $a         1
ZEND_DO_ICALL
ZEND_NEW              "stdClass"                    #var15    0
ZEND_DO_FCALL
ZEND_ASSIGN           $o         #var15
ZEND_ASSIGN           $i         2                  #var19
ZEND_PRE_INC          $i                            #var21
ZEND_FETCH_DIM_R      $a         #var19             #var20
ZEND_FETCH_DIM_R      #var20     #var21             #var22
ZEND_FETCH_CLASS                 #var22             #var23
ZEND_NEW              #var23                        #var24    0
ZEND_DO_FCALL
ZEND_ASSIGN_OBJ       $o         "a"
ZEND_OP_DATA          #var24
ZEND_ASSIGN           $i         2                  #var28
ZEND_PRE_INC          $i                            #var30
ZEND_FETCH_DIM_R      $a         #var28             #var29
ZEND_FETCH_DIM_R      #var29     #var30             #var31
ZEND_FETCH_CLASS                 #var31             #var32
ZEND_NEW              #var32                        #var33    0
ZEND_DO_FCALL
ZEND_FETCH_OBJ_W      $o         "a"                #var26
ZEND_ASSIGN_OBJ       #var26     "b"
ZEND_OP_DATA          #var33
ZEND_INIT_FCALL       96         "print_r"                    1
ZEND_SEND_VAR         $o         1
ZEND_DO_ICALL
ZEND_RETURN           1
```

It is enough to look at the first ZEND_ASSIGN $i instruction and the three ZEND_PRE_INC instructions immediately following it. The code the Zend compiler generates evaluates all the bracketed array subscripts from left to right first, and only then evaluates the outer expressions. As a result, the three pre-increments run back to back before any of their results (#var5, #var7, #var9) is used. With the stock engine these results are copies holding 1, 2 and 3, and the statement amounts to $a[0][1] = new $a[2][3]. If pre-increment returned a reference to $i, all three results would refer to the same variable and would all read its final value, 3. The subsequent logic would then no longer be correct. I don't know why the Zend engine is implemented this way; my guess is that it makes the parser easier to implement.

Conclusion

For the pre-increment and pre-decrement operators to return references to variables while the features above keep working, the Zend engine's compiler would have to emit instructions in a suitable order for cases like this. Changing the compiler, however, is far more involved, and it is hard to predict how many problems it would cause. So the story ends here; at least we now know a bit more about the Zend engine.

Even if the pre-increment and pre-decrement operators did return a reference to the variable, the situations where that helps are very limited. For example, the following statements do not compile in PHP at all. Without changing the compiler, the syntactic benefits of returning a reference cannot be fully realized. One could also argue that such an uncommon construct does not justify the extra complexity.

```php
$b = &++$a;
++$a += 10;
++(++$b);
```