In the last section we looked at the Solidity assembly language, which can also be used without Solidity. This assembly language can also be embedded in Solidity source code as inline assembly. We start by looking at how inline assembly is used and how it differs from standalone assembly, and then describe the assembly language itself.
Solidity Assembly
Inline assembly
Inline assembly is mostly used when writing libraries that enhance the language and need fine-grained control. It lets you interleave Solidity statements with code in a language close to that of the EVM itself. Because the EVM is a stack machine, it is often hard to address the correct stack slot and to supply arguments to opcodes at the right position on the stack. Solidity inline assembly provides the following features to address these and other problems that arise when writing low-level code by hand:
- Functional-style opcodes: mul(1, add(2, 3)) is equivalent to push1 3 push1 2 add push1 1 mul
- Assembly-local variables: let x := add(2, 3) let y := mload(0x40) x := add(x, y)
- Access to external variables: function f(uint x) { assembly { x := sub(x, 1) } }
- Labels: let x := 10 repeat: x := sub(x, 1) jumpi(repeat, eq(x, 0))
- Loops: for { let i := 0 } lt(i, x) { i := add(i, 1) } { y := mul(2, y) }
- Switch statements: switch x case 0 { y := mul(x, 2) } default { y := 0 }
- Function calls: function f(x) -> y { switch x case 0 { y := 1 } default { y := mul(x, f(sub(x, 1))) } }
The inline assembly language is covered in more detail below.
It is important to note that inline assembly is a very low-level way to access the EVM. It bypasses several of the safety mechanisms that Solidity provides.
Example
The following example provides library code that reads the code of another contract and loads it into a bytes variable. This cannot be done in plain Solidity; the idea is that assembly libraries are used to enhance the language in this way.
pragma solidity ^0.4.0;

library GetCode {
    function at(address _addr) returns (bytes o_code) {
        assembly {
            // retrieve the size of the code, this needs assembly
            let size := extcodesize(_addr)
            // allocate output byte array - this could also be done without assembly
            // by using o_code = new bytes(size)
            o_code := mload(0x40)
            // new "memory end" including padding
            mstore(0x40, add(o_code, and(add(add(size, 0x20), 0x1f), not(0x1f))))
            // store length in memory
            mstore(o_code, size)
            // actually retrieve the code, this needs assembly
            extcodecopy(_addr, add(o_code, 0x20), 0, size)
        }
    }
}
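The "memory end" arithmetic in the example rounds the allocation up to a multiple of 32 bytes. As a rough illustration, here is a small Python model (not Solidity) of the expression and(add(add(size, 0x20), 0x1f), not(0x1f)). The function name next_free_pointer and the starting pointer 0x80 are assumptions made for this sketch.

```python
WORD_MASK = (1 << 256) - 1  # EVM words are 256 bits wide


def next_free_pointer(o_code: int, size: int) -> int:
    """Model of add(o_code, and(add(add(size, 0x20), 0x1f), not(0x1f)))."""
    not_0x1f = WORD_MASK ^ 0x1F  # not(0x1f): every bit set except the low five
    # 0x20 bytes for the length word plus the data, rounded up to a multiple of 32
    padded = (size + 0x20 + 0x1F) & not_0x1f
    return (o_code + padded) & WORD_MASK


# 5 bytes of code need one length word plus one padded data word (64 bytes).
print(hex(next_free_pointer(0x80, 5)))  # 0xc0
```

Adding 0x1f before masking is what turns truncation into rounding up, so a size that is already a multiple of 32 gets no extra padding.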
Inline assembly is also useful when the optimizer fails to produce efficient code. Keep in mind, however, that assembly is much harder to write because the compiler performs no checks, so you should use it only for complex things and only when you know what you are doing.
pragma solidity ^0.4.0;

library VectorSum {
    // This function is less efficient because the optimizer currently fails to
    // remove the bounds checks in array access.
    function sumSolidity(uint[] _data) returns (uint o_sum) {
        for (uint i = 0; i < _data.length; ++i)
            o_sum += _data[i];
    }

    // We know that we only access the array in bounds, so we can avoid the check.
    // 0x20 needs to be added to an array because the first slot contains the
    // array length.
    function sumAsm(uint[] _data) returns (uint o_sum) {
        for (uint i = 0; i < _data.length; ++i) {
            assembly {
                // accumulate into o_sum; a plain assignment would keep only the last element
                o_sum := add(o_sum, mload(add(add(_data, 0x20), mul(i, 0x20))))
            }
        }
    }
}
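The address arithmetic in sumAsm can be pictured with a small Python model (not Solidity). It assumes memory is a mapping from word-aligned addresses to 256-bit values; the helper names and the sample addresses are made up for this sketch.

```python
def element_address(data_ptr: int, i: int) -> int:
    """Model of add(add(_data, 0x20), mul(i, 0x20))."""
    return data_ptr + 0x20 + i * 0x20


def sum_memory_array(memory: dict, data_ptr: int) -> int:
    length = memory[data_ptr]  # the first word holds the array length
    total = 0
    for i in range(length):
        # EVM addition wraps around at 2**256
        total = (total + memory[element_address(data_ptr, i)]) % 2**256
    return total


# A three-element array stored at 0x80: length word, then the elements.
memory = {0x80: 3, 0xA0: 10, 0xC0: 20, 0xE0: 30}
print(sum_memory_array(memory, 0x80))  # 60
```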
Syntax
Inline assembly parses comments, literals and identifiers exactly as Solidity does, so you can use the usual // and /* */ comments. In Solidity, inline assembly is wrapped in assembly { ... }. Inside the braces the following can be used (see the later sections for details):
- Literals, such as 0x123, 42 or "abc" (strings of up to 32 characters)
- Opcodes (in instruction style), such as mload sload dup1 sstore; the supported instructions are listed below
- Opcodes in functional style, such as add(1, mload(0))
- Labels, such as name:
- Variable declarations, such as let x := 7 or let x := add(y, 3)
- Identifiers (labels, assembly-local variables, or external identifiers when used as inline assembly), such as jump(name), 3 x add
- Assignments in instruction style, such as 3 =: x
- Assignments in functional style, such as x := add(y, 3)
- Blocks in which local variables are scoped, such as { let x := 3 { let y := add(x, 1) } }
Opcodes
This document is not intended to be a complete description of the EVM, but the following list can serve as a reference for its opcodes.
If an opcode takes arguments (always from the top of the stack), they are given in parentheses. Note that the order of arguments appears reversed in non-functional style (explained below). Opcodes marked with - do not push an item onto the stack, those marked with * are special, and all others push exactly one item onto the stack. In the following, mem[a...b] denotes the bytes of memory starting at position a up to (excluding) position b, and storage[p] denotes the contents of storage at position p.
The opcodes pushi and jumpdest cannot be used directly.
In the grammar, opcodes are represented as predefined identifiers.
Literals
You can use integer constants by writing them directly in decimal or hexadecimal notation; an appropriate pushi instruction is generated automatically. For example: assembly { 2 3 add "abc" and }
This adds 2 and 3 to get 5 and then computes the bitwise and of that result with the string "abc". Strings are stored left-aligned and cannot be longer than 32 bytes.
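To see what "left-aligned" means for the and operation, here is a small Python model (not Solidity) of how a short string literal occupies a 32-byte word. The helper name string_literal_word is an assumption of this sketch.

```python
def string_literal_word(s: str) -> int:
    """Return the 256-bit word for a string literal: bytes first, zero padding after."""
    raw = s.encode("ascii")
    if len(raw) > 32:
        raise ValueError("string literals cannot exceed 32 bytes")
    return int.from_bytes(raw.ljust(32, b"\x00"), "big")


word = string_literal_word("abc")
print(hex(word >> 232))  # 0x616263, the three characters sit in the top bytes
print((2 + 3) & word)    # 0, because the low bytes of the word are all zero
```

Under this model the and in the example yields zero: the small number 5 only has low-order bits set, while "abc" only has high-order bits set.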
Functional style
You can write opcode after opcode in the same order in which they will end up in the bytecode. For example: 3 0x80 mload add 0x80 mstore
This adds 3 to the contents of memory at position 0x80 and stores the result back at that position.
Since it is often hard to see what the actual arguments of a given opcode are, Solidity inline assembly also provides a functional-style notation. The code above is equivalent to: mstore(0x80, add(mload(0x80), 3))
Functional-style expressions cannot use instruction style internally: 1 2 mstore(0x80, add) is not valid and must be written as mstore(0x80, add(2, 1)). For opcodes that take no arguments, the parentheses can be omitted.
Note that the order of arguments in functional style is the reverse of the order in instruction style. In functional style, the first argument ends up on the top of the stack.
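The reversed argument order is easiest to see by running the instruction-style sequence on a toy stack machine. The Python sketch below (not Solidity) supports only the three opcodes used in the example and models memory as a dictionary of words; the function name run and the initial memory contents are assumptions of this sketch.

```python
def run(tokens, memory):
    """Execute a tiny instruction-style program; returns the final stack."""
    stack = []
    for tok in tokens:
        if tok == "mload":
            stack.append(memory.get(stack.pop(), 0))
        elif tok == "add":
            stack.append((stack.pop() + stack.pop()) % 2**256)
        elif tok == "mstore":
            addr = stack.pop()   # top of stack: the first functional-style argument
            value = stack.pop()  # next item: the second functional-style argument
            memory[addr] = value
        else:
            stack.append(int(tok, 0))  # decimal or 0x-prefixed literal
    return stack


memory = {0x80: 39}
run("3 0x80 mload add 0x80 mstore".split(), memory)
print(memory[0x80])  # 42, same as mstore(0x80, add(mload(0x80), 3))
```

The address 0x80 is pushed last in instruction style but written first in functional style, which is the reversal described above.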
Access to external functions and variables
Solidity variables and other identifiers can be accessed simply by using their name. For memory variables, this pushes the address onto the stack, not the value. Storage variables are different: a value in storage may not occupy a full storage slot, so its "address" is composed of a slot and a byte offset inside that slot. To retrieve the slot that the variable x points to, use x_slot; to retrieve the byte offset within that slot, use x_offset.
In assignments (see below), you can even assign directly to Solidity local variables.
Functions external to inline assembly can also be accessed: the assembly pushes their entry label (with virtual function resolution applied). The calling semantics in Solidity are:
- The caller pushes the return label, arg1, arg2, ..., argn
- The call returns with ret1, ret2, ..., retm
This feature is still a bit cumbersome to use, because the stack offset essentially changes during the call, so references to local variables will be wrong.
pragma solidity ^0.4.11;

contract C {
    uint b;

    function f(uint x) returns (uint r) {
        assembly {
            r := mul(x, sload(b_slot)) // ignore the offset, we know it is zero
        }
    }
}
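When the offset is not zero, the value has to be shifted and masked out of the slot. The Python sketch below (not Solidity) models that step. It assumes the usual Solidity packing, in which the byte offset counts from the low-order end of the 32-byte slot; the helper name read_packed and the sample layout are made up for illustration.

```python
def read_packed(slot_word: int, offset: int, size: int) -> int:
    """Extract a size-byte value located offset bytes from the low-order end of a slot."""
    mask = (1 << (8 * size)) - 1
    return (slot_word >> (8 * offset)) & mask


# Hypothetical layout: a 16-byte value at offset 0 and an 8-byte value at offset 16.
slot_word = (0xBEEF << (8 * 16)) | 0x1234
print(hex(read_packed(slot_word, 0, 16)))  # 0x1234
print(hex(read_packed(slot_word, 16, 8)))  # 0xbeef
```

In the contract above, b is a full-width uint and the only state variable, so its offset is zero and sload(b_slot) already returns the whole value.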
Labels
Another problem with EVM assembly is that jump and jumpi use absolute addresses, which can change easily. Solidity inline assembly provides labels to make jumps easier to use. Note that labels are a low-level feature; wherever possible, use assembly functions, loops and switch statements instead. Here is an example that computes an element of the Fibonacci sequence:
{
    let n := calldataload(4)
    let a := 1
    let b := a
loop:
    jumpi(loopend, eq(n, 0))
    a add swap1
    n := sub(n, 1)
    jump(loop)
loopend:
    mstore(0, a)
    return(0, 0x20)
}
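The instruction-style line a add swap1 is the least obvious part of the loop: it pushes a copy of a, adds it to b, and swaps the sum into the slot of a while the old a ends up in the slot of b. A Python sketch (not Solidity) of the same loop, with the function name fib_loop chosen for illustration:

```python
def fib_loop(n: int) -> int:
    a = 1
    b = a
    while n != 0:  # jumpi(loopend, eq(n, 0))
        # effect of `a add swap1` on the variable slots: a becomes a + b, b becomes the old a
        a, b = (a + b) % 2**256, a
        n -= 1     # n := sub(n, 1)
    return a       # value written by mstore(0, a)


print([fib_loop(n) for n in range(6)])  # [1, 2, 3, 5, 8, 13]
```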
Note that automatic access to stack variables only works if the assembler knows the current stack height. This fails if the source and the target of a jump have different stack heights. You can still use such jumps, but in that case you should not access any stack variables (not even assembly-local ones).
In addition, the stack height analyser goes through the code opcode by opcode (not according to control flow), so in the following case the assembler gets a wrong impression of the stack height at label two:
{
    let x := 8
    jump(two)
one:
    // Here the stack height is 2 (because we pushed x and 7),
    // but the assembler thinks it is 1 because it reads
    // from top to bottom.
    // Accessing the stack variable x here will lead to errors.
    x := 9
    jump(three)
two:
    7 // push something onto the stack
    jump(one)
three:
}
This problem can be fixed by manually adjusting the stack height for the assembler: you can place code that changes the assembler's stack height count just before the label. Note that you do not have to care about any of this if you only use loops and assembly-level functions.
The following example shows how this can be done in extreme cases:
{
    let x := 8
    jump(two)
    0 // This code is unreachable but will adjust the stack height correctly
one:
    x := 9 // Now x can be accessed properly.
    jump(three)
    pop // Similar negative correction.
two:
    7 // push something onto the stack
    jump(one)
three:
    pop // We have to pop the manually pushed value here again.
}
Declaring assembly-local variables
You can use the let keyword to declare variables that are visible only in inline assembly, and only within the current {...} block. Internally, the let instruction creates a new stack slot that is reserved for the variable, and the slot is automatically removed again when the end of the block is reached. You need to provide an initial value for the variable; this can be just 0, but it can also be a complex functional-style expression:
pragma solidity ^0.4.0;

contract C {
    function f(uint x) returns (uint b) {
        assembly {
            let v := add(x, 1)
            mstore(0x80, v)
            {
                let y := add(sload(v), 1)
                b := y
            } // y is "deallocated" here
            b := add(b, v)
        } // v is "deallocated" here
    }
}
Assignments
You can assign to assembly-local variables and to function-local variables. Note that when you assign to a variable that points to memory or storage, you only change the pointer, not the data.
There are two kinds of assignment: functional style and instruction style. In functional style (variable := value), you provide a value as a functional-style expression that results in exactly one stack value. In instruction style (=: variable), the value is simply taken from the top of the stack. In both forms, the colon points to the name of the variable. The assignment is performed by replacing the variable's value on the stack with the new value.
assembly {
    let v := 0 // functional-style assignment as part of variable declaration
    let g := add(v, 2)
    sload(10)
    =: v // instruction style assignment, puts the result of sload(10) into v
}
Switch
You can use a switch statement as a very basic version of if/else. It takes the value of an expression and compares it with several constants; the branch belonging to the matching constant is taken. Contrary to the error-prone behaviour of some programming languages, control flow does not fall through from one case to the next. There can also be a fallback case called default.
assembly {
    let x := 0
    switch calldataload(4)
    case 0 {
        x := calldataload(0x24)
    }
    default {
        x := calldataload(0x44)
    }
    sstore(0, div(x, 2))
}
The list of cases does not need to be wrapped in braces, but the body of each case does.
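The switch example reads three words from the call data: position 4 skips the 4-byte function selector, and 0x24 and 0x44 are the two following 32-byte words. A Python sketch (not Solidity) of the same control flow follows; the helpers calldataload, encode and run_switch are written for this illustration, and the selector bytes are left as zeros.

```python
def calldataload(calldata: bytes, pos: int) -> int:
    """Read 32 bytes starting at pos as a big-endian integer, zero-padded past the end."""
    return int.from_bytes(calldata[pos:pos + 32].ljust(32, b"\x00"), "big")


def encode(*args: int) -> bytes:
    """Build call data: a 4-byte selector placeholder followed by 32-byte arguments."""
    return b"\x00" * 4 + b"".join(a.to_bytes(32, "big") for a in args)


def run_switch(calldata: bytes) -> int:
    if calldataload(calldata, 4) == 0:    # case 0
        x = calldataload(calldata, 0x24)
    else:                                 # default, no fall-through between branches
        x = calldataload(calldata, 0x44)
    return x // 2                         # the value passed to sstore(0, ...)


print(run_switch(encode(0, 10, 20)))  # 5
print(run_switch(encode(1, 10, 20)))  # 10
```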
Loops
Inline assembly supports a simple for-style loop. The header of a for-style loop has three parts: an initializing part, a condition, and a post-iteration part. The condition must be a functional-style expression, while the other two parts are blocks wrapped in braces. If the initializing part declares any variables, the scope of those variables is extended into the body of the loop (including the condition and the post-iteration part). This is a special case, since variables are otherwise scoped to the block in which they are declared.
assembly {
    let x := 0
    for { let i := 0 } lt(i, 0x100) { i := add(i, 0x20) } {
        x := add(x, mload(i))
    }
}
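The three header parts map directly onto a while loop. The Python sketch below (not Solidity) mirrors the example, which sums the eight 32-byte words in memory from 0 up to 0x100. The mload helper and the sample memory contents are assumptions of this sketch.

```python
def mload(memory: bytes, pos: int) -> int:
    """Read the 32-byte word at byte position pos as a big-endian integer."""
    return int.from_bytes(memory[pos:pos + 32].ljust(32, b"\x00"), "big")


def sum_first_words(memory: bytes) -> int:
    x = 0
    i = 0                                    # initializing part: let i := 0
    while i < 0x100:                         # condition: lt(i, 0x100)
        x = (x + mload(memory, i)) % 2**256  # body: x := add(x, mload(i))
        i += 0x20                            # post-iteration part: i := add(i, 0x20)
    return x


# Sample memory holding the words 1, 2, ..., 8.
memory = b"".join(n.to_bytes(32, "big") for n in range(1, 9))
print(sum_first_words(memory))  # 36
```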
Functions
Assembly allows you to define low-level functions. These take their arguments (and a return address) from the stack and also put their results onto the stack. Calling a function looks the same as executing a functional-style opcode.
Functions can be defined anywhere and are visible within the block in which they are declared. Inside a function, you cannot access local variables defined outside that function. There is no explicit return statement.
If you call a function that returns multiple values, you have to assign them to a tuple using a, b := f(x) or let a, b := f(x).
The following example implements the power function by square-and-multiply.
assembly {
    function power(base, exponent) -> result {
        switch exponent
        case 0 { result := 1 }
        case 1 { result := base }
        default {
            result := power(mul(base, base), div(exponent, 2))
            switch mod(exponent, 2)
            case 1 { result := mul(base, result) }
        }
    }
}
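For readers who want to trace the recursion, here is a direct Python port (not Solidity) of the power function. It assumes EVM semantics in which mul wraps around modulo 2**256; the constant name MOD is chosen for this sketch.

```python
MOD = 2**256  # EVM arithmetic wraps around at 256 bits


def power(base: int, exponent: int) -> int:
    if exponent == 0:
        return 1
    if exponent == 1:
        return base % MOD
    # square the base and halve the exponent
    result = power(base * base % MOD, exponent // 2)
    if exponent % 2 == 1:
        # an odd exponent needs one extra factor of the original base
        result = base * result % MOD
    return result


print(power(2, 10))   # 1024
print(power(3, 5))    # 243
print(power(2, 256))  # 0, the result overflows and wraps around
```

The recursion depth grows with the logarithm of the exponent, which is the point of square-and-multiply compared with multiplying exponent times.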
Things to watch out for in inline assembly
Inline assembly may have a fairly high-level look, but it is actually extremely low-level. Function calls, loops and switches are converted by simple rewriting rules. After that, the only things the assembler does for you are rearranging functional-style opcodes, managing jump labels, counting the stack height for variable access, and removing the stack slots of assembly-local variables when the end of their block is reached. The last two need special attention: the assembler only counts the stack height from top to bottom of the source, not necessarily following control flow. In addition, operations such as swap only swap the contents of the stack, not the locations of variables.
Conventions in Solidity
Unlike EVM assembly, Solidity knows types that are narrower than 256 bits, for example uint24. To make them more efficient, most arithmetic operations simply treat them as 256-bit numbers, and the higher-order bits are cleaned only when necessary, such as just before the value is written to memory or before a comparison. This means that if you access such a variable from inline assembly, you may have to clean the higher-order bits manually first.
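Cleaning the higher-order bits amounts to masking the word down to the width of the type. A minimal Python sketch (not Solidity) for uint24, with a made-up "dirty" value:

```python
UINT24_MASK = (1 << 24) - 1  # 0xffffff


def clean_uint24(word: int) -> int:
    """Keep only the low 24 bits, as and(word, 0xffffff) would in assembly."""
    return word & UINT24_MASK


dirty = (0xAB << 24) | 0x123456  # stray bits above the 24-bit value
print(hex(clean_uint24(dirty)))  # 0x123456
```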
Solidity manages memory in a very simple way: there is a "free memory pointer" stored at position 0x40 in memory. If you want to allocate memory, use the memory starting at the position it points to and update the pointer accordingly.
Elements of memory arrays in Solidity always occupy multiples of 32 bytes (this is even true for byte[], but not for bytes and string). Multi-dimensional memory arrays are pointers to memory arrays. The length of a dynamic array is stored in the first slot of the array, followed by the array elements.
Fixed-length memory arrays do not have a length field, but one will be added soon to allow better conversion between fixed-length and variable-length arrays, so do not rely on its absence.
Reference: https://open.juzix.net/doc