SIL: the Swift compiler’s intermediate language

This post is also published on my personal blog: Swift compiler intermediate code SIL

Why SIL was designed

The figure above shows the traditional LLVM-based compilation flow used for languages such as C, C++, and Objective-C. Source-level analysis is based mainly on the CFG (control flow graph), which is built from the AST in the Clang layer, but this approach has many disadvantages.

Disadvantages:

The wide abstraction gap between source code and LLVM IR makes LLVM IR unsuitable for source-level analysis
The CFG lacks fidelity
The CFG is not on the hot path, the code path that is exercised constantly. As the figure shows, it is not on the main path of the compiler flow
A lot of work is duplicated between the CFG and IR lowering

Swift is a high-level language with some advanced features, such as protocol-based generics. It is also a safe language: it guarantees that variables are initialized before they are used and detects unreachable code. The SIL layer was added to the Swift compiler to support this.

SIL

Fully preserves the semantics of the program
Designed for both code generation and analysis
Sits on the hot path of the compiler flow
Bridges the large abstraction gap between source code and LLVM IR

Introduction to SIL

The Swift compiler has an intermediate representation between the AST and LLVM IR called SIL. SIL is generated by walking the AST using the Visitor pattern. It is used for high-level semantic analysis and optimization of Swift code. Like LLVM IR, it has structures such as Module, Function, and BasicBlock. Unlike LLVM IR, it has a richer type system, it retains information about loops and error handling, and it keeps virtual function tables and type information in a structured form. It is designed to preserve the meaning of Swift code for powerful error detection, memory management, and high-level optimization.

Also like LLVM IR, SIL is in static single assignment (SSA) form, so a value can never be redefined. When an instruction refers to a value, that value is either an argument of a basic block or the result of exactly one instruction. Note that, unlike a “real” programming language, SIL is “flat”: there is no syntactic nesting. Each instruction references values produced by other instructions and performs one logical operation on them to produce new values.

Related code: /lib/SILGen /lib/SIL

SSA

SSA stands for static single assignment. An IR (intermediate representation) in SSA form guarantees that each variable is assigned exactly once, which simplifies the compiler’s optimization algorithms.

x = 0;
x = 1;
y = 2 * x;


In the code above, the assignment x = 0 is never actually used. Without SSA, the compiler needs reaching-definition analysis to determine whether y should use the value 0 or 1 for x. In SSA form, every assignment creates a new version, or “generation”, of the variable.

x1 = 0;
x2 = 1;
y = 2 * x2;

Now there is no ambiguity about which value y uses. The advantage of SSA is that independent uses of the same variable are expressed as different “generations”, which makes many compiler optimization algorithms easier to implement.
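The renaming step can be sketched in a few lines of Swift. This is only a toy for straight-line code, and the names Stmt and toSSA are hypothetical, not taken from any compiler. It gives every assignment a fresh version number and rewrites each use to the latest version. Code with branches would also need phi nodes (block arguments in SIL), which are left out here.

```swift
// A toy straight-line "program": each statement assigns an expression to a variable.
// All names here (Stmt, toSSA) are hypothetical and only for illustration.
struct Stmt {
    let target: String
    let operands: [String]   // variable names or literals used on the right-hand side
    let op: String           // textual operator such as "*", or "" for a plain copy
}

/// Renames variables so that every assignment defines a fresh version (x1, x2, ...).
/// Handles straight-line code only: no branches, so no phi nodes or block arguments.
func toSSA(_ program: [Stmt]) -> [String] {
    var version: [String: Int] = [:]   // latest version number per variable
    var output: [String] = []
    for stmt in program {
        // Uses refer to the latest version of each variable; literals stay unchanged.
        let uses = stmt.operands.map { name -> String in
            if let v = version[name] { return "\(name)\(v)" }
            return name
        }
        // The definition gets a new version.
        let next = (version[stmt.target] ?? 0) + 1
        version[stmt.target] = next
        let rhs = stmt.op.isEmpty ? uses.joined() : uses.joined(separator: " \(stmt.op) ")
        output.append("\(stmt.target)\(next) = \(rhs)")
    }
    return output
}

let program = [
    Stmt(target: "x", operands: ["0"], op: ""),
    Stmt(target: "x", operands: ["1"], op: ""),
    Stmt(target: "y", operands: ["2", "x"], op: "*"),
]
for line in toSSA(program) { print(line) }
// prints:
// x1 = 0
// x2 = 1
// y1 = 2 * x2
```

Unlike the listing above, the sketch also versions y (as y1), because it treats every assignment the same way.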

In summary, SSA provides four benefits:

Because SSA makes each variable uniquely defined, data flow analysis and optimization algorithms can be simpler.
The space needed for use-definition chains drops to linear growth. If a variable has N uses and M definitions, there can be M×N use-definition relationships without SSA, whereas in SSA form each use has exactly one definition.
Because use-definition relationships are more precise in SSA form, the algorithm for constructing the interference graph can be simplified.
Several unrelated uses of the same variable in the source program become uses of different variables in SSA form, which eliminates many unnecessary dependencies.

With precise use-definition relationships, many optimizations that rely on them become more precise, more thorough, and more efficient, for example:

Constant propagation
Dead code elimination
Global value numbering
Partial redundancy elimination
Strength reduction
Register allocation

I won’t go further into SSA here. I have not studied it in depth, and I may write a separate article about it later. For further reading, I recommend the Static Single Assignment Book.

SIL language features

One instruction per line
Strongly typed
Functions are decomposed into a sequence of basic blocks (bb)
Contains ARC instructions
Designed for code distribution
An unlimited number of virtual registers, numbered upward from %0, %1, %2

SIL structure

A SIL program is a collection of named functions, each consisting of one or more basic blocks. A basic block is a linear sequence of instructions, and the last instruction in each block either transfers control to another basic block or returns from the function.

Module

A module represents an entire SIL source file and consists of SILFunction and SILGlobalVariable objects. Iterators obtained from begin() and end() make it easy to walk through all the functions in the module.

using iterator = FunctionListType::iterator;
iterator begin() { return functions.begin(); }
iterator end() { return functions.end(); }

The full interface can be viewed at the following address:

include/swift/SIL/SILModule.h

Function

A SILFunction contains everything related to a function definition or declaration. Each SILFunction consists of SILBasicBlock and SILArgument objects and can be regarded as a direct translation of a Swift function. Use isDefinition() to check whether the function is defined in this module or only declared. In both cases there is a parameter list, which can be obtained with getArgument() or getArguments().

SILArgument *getArgument(unsigned i) 
  
ArrayRef<SILArgument *> getArguments() const

The full interface can be viewed at: include/swift/SIL/SILFunction.h

Basic Block

A SILBasicBlock consists of SILInstruction objects arranged as a linear sequence. The last instruction in each SILBasicBlock transfers control to another SILBasicBlock or returns from the function. The instructions can be iterated with begin()/end(), and the last instruction can be accessed directly with getTerminator().

TermInst *getTerminator()

The full interface can be viewed at: include/swift/SIL/SILBasicBlock.h

Instruction

An instruction is the basic unit of SIL: it either operates on values or calls a function.
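To make the containment hierarchy concrete, here is a toy model in Swift. The types below are hypothetical stand-ins that I made up for illustration. They are not the compiler’s C++ classes, and they model only a handful of instructions. The sketch shows a module holding functions, functions holding basic blocks, and each block ending in a terminator.

```swift
// Toy model of the SIL containment hierarchy. These types are hypothetical
// stand-ins, not the compiler's SILModule/SILFunction/SILBasicBlock classes.
enum Instruction {
    case integerLiteral(result: Int, value: Int)
    case condBr(condition: Int, trueBlock: String, falseBlock: String)
    case br(target: String)
    case ret(value: Int)          // `value` is a register number

    /// Terminators end a block by transferring control or returning.
    var isTerminator: Bool {
        switch self {
        case .condBr, .br, .ret: return true
        case .integerLiteral: return false
        }
    }
}

struct BasicBlock {
    let name: String
    let instructions: [Instruction]

    /// Mirrors the idea of getTerminator(): the last instruction of a well-formed block.
    var terminator: Instruction? {
        guard let last = instructions.last, last.isTerminator else { return nil }
        return last
    }
}

struct Function {
    let name: String
    let blocks: [BasicBlock]
    /// In this toy model, a function without blocks is only a declaration.
    var isDefinition: Bool { !blocks.isEmpty }
}

struct Module {
    let functions: [Function]
}

let entry = BasicBlock(name: "bb0", instructions: [
    .integerLiteral(result: 0, value: 42),   // %0 = integer_literal 42
    .ret(value: 0),                          // return %0
])
let module = Module(functions: [
    Function(name: "answer", blocks: [entry]),
    Function(name: "external", blocks: []),
])

for function in module.functions {
    print(function.name, function.isDefinition ? "definition" : "declaration")
}
```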

SILValue and SILType

SILValue

SILValue defines use_begin() and use_end() for iterating over the users of a value, and getUses() returns the range of all uses. To ignore debug-info instructions, use getNonDebugUses() instead. Together these methods make it easy to walk a value’s def-use chain.

  inline use_range getUses() const;
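The idea behind a def-use chain can be shown with a small Swift sketch. Everything here (Inst, uses(of:in:includeDebug:)) is a hypothetical miniature and not the SILValue API. It only illustrates what "all users of a value" and "all non-debug users" mean, using a few instructions that resemble the ones in the example later in this article.

```swift
// Hypothetical miniature of a def-use chain; not the compiler's SILValue API.
struct Inst {
    let result: Int?      // register defined by this instruction, if any
    let operands: [Int]   // registers this instruction reads
    let text: String
    let isDebug: Bool
}

/// Returns every instruction that uses `register`, optionally skipping debug instructions.
func uses(of register: Int, in block: [Inst], includeDebug: Bool = true) -> [Inst] {
    return block.filter { $0.operands.contains(register) && (includeDebug || !$0.isDebug) }
}

let block = [
    Inst(result: nil, operands: [0], text: "debug_value %0", isDebug: true),
    Inst(result: 6, operands: [5, 3, 4], text: "%6 = apply %5(%3, %4)", isDebug: false),
    Inst(result: 8, operands: [7, 0, 6, 2], text: "%8 = apply %7(%0, %6, %2)", isDebug: false),
]

// All users of %0, then only the non-debug users.
print(uses(of: 0, in: block).map { $0.text })
print(uses(of: 0, in: block, includeDebug: false).map { $0.text })
```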

SILType

Every SIL value has a SIL type; see the source in SILType.h. SIL types fall into two main categories, object types and address types. “Object” here is different from an object in traditional object-oriented programming: object types include integers, class instances, struct values, and functions. An address is a value that holds a pointer to an object type. Use isAddress() and isObject() to tell the two categories apart.

/// True if the type is an address type.
bool isAddress() const { return getCategory() == SILValueCategory::Address; }

/// True if the type is an object type.
bool isObject() const { return getCategory() == SILValueCategory::Object; }

Metatype types

SIL has many kinds of types, and metatype types are one of them. A concrete metatype type in SIL must describe its representation:

@thin means the exact concrete type is known, so the metatype requires no storage.
@thick means the metatype stores a reference to a type or, for a class, to one of its subclasses.
@objc means the metatype stores a reference to a class using an Objective-C class object rather than the native Swift representation.

Type lowering

The formal type system is the one in which Swift code is written. Swift’s formal type system intentionally abstracts over many representational issues, such as ownership-transfer conventions and the directness of parameters. SIL aims to represent most of these implementation details, and the differences should be reflected in the SIL type system, so SIL types are much richer. The conversion from a formal type to a SIL type is called type lowering, and SIL types are also known as lowered types.

Since SIL is an intermediate language, SIL values roughly correspond to the registers of an abstract machine with an unlimited number of registers. Address-only types are essentially those that are “too complex” to be held in registers. Types that are not address-only are called loadable types, meaning they can be loaded into registers.

It is legal for an address type to point to a loadable type, but it is not legal for an object type to be an address-only type.
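The compiler decides which Swift types are loadable and which are address-only. As a rough intuition, and this is my assumption rather than an exact rule, a fixed-layout struct of simple fields is loadable, while a value whose size is unknown at compile time, such as an unspecialized generic parameter, has to be handled through memory. The types below are hypothetical examples. Compiling the file with swiftc -emit-silgen lets you check how each one actually appears in SIL.

```swift
// Hypothetical types for illustration only.
struct Point {            // fixed layout with two Int fields: expected to be loadable
    var x: Int
    var y: Int
}

protocol Shape {          // a value of protocol type can wrap concrete values of any size
    var area: Double { get }
}

struct Square: Shape {
    var side: Double
    var area: Double { side * side }
}

// Inside this unspecialized generic function the size of T is unknown at compile time,
// so in SIL the parameter is typically passed indirectly, by address.
func identity<T>(_ value: T) -> T {
    return value
}

print(MemoryLayout<Point>.size)          // the size of Point is known statically
print(identity(Point(x: 1, y: 2)).x)
let shape: Shape = Square(side: 2)
print(shape.area)
```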

The implementation details are in lib/SIL/IR/TypeLowering.cpp. The main method is getLoweredType(), which returns the SIL type for a formal type.

Builtin

For background, see the article Swift’s mysterious Builtin module.

In Swift, Int is actually a struct, and + is a function overloaded for Int. Strictly speaking, Int and + are not part of the Swift language itself; they are part of the Swift standard library. Does that mean operating on Int with + carries extra overhead and makes Swift slower? Of course not, because we have Builtin.

Builtin exposes LLVM IR types and operations directly to the Swift standard library, so there is no additional runtime overhead when we use Int and +.

Int, for example, is a struct in the standard library that defines a property value of type Builtin.Int64. We can convert between Int and Builtin.Int64 using unsafeBitCast. Int also overloads init, so an Int can be constructed directly from a Builtin.Int64. These are all efficient operations with no performance penalty.
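Ordinary code cannot import Builtin, so the following is only an analogy. MyInt is a hypothetical type that wraps an Int64 the way the standard library’s Int wraps a Builtin.Int64. It shows that a "primitive" integer and its + operator can be written as a plain struct and a plain function.

```swift
// `MyInt` is a hypothetical stand-in for the standard library's Int.
// The real Int wraps a Builtin.Int64, which ordinary code cannot name,
// so this sketch wraps Int64 instead.
struct MyInt: Equatable {
    var value: Int64

    init(_ value: Int64) { self.value = value }

    // `+` is an ordinary function defined on the type, not a language primitive.
    static func + (lhs: MyInt, rhs: MyInt) -> MyInt {
        // &+ is wrapping addition; the real Int.+ traps on overflow instead.
        return MyInt(lhs.value &+ rhs.value)
    }
}

let sum = MyInt(2) + MyInt(3)
print(sum.value)  // prints 5
```

Because the wrapper is a single-field struct and the operator is a small function, the optimizer can usually reduce such code to the underlying machine operation.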

Raw SIL and Canonical SIL

SIL comes in two forms, raw SIL and canonical SIL. The unoptimized SIL that comes straight out of SILGen is called raw SIL.

Swift source code can be converted to raw SIL with swiftc’s -emit-silgen option.

swiftc -emit-silgen Source.swift -o Source.sil

The SIL produced by the SIL optimizer’s mandatory passes (guaranteed optimizations and diagnostics) is called canonical SIL. Raw SIL can be converted to canonical SIL with swiftc’s -emit-sil option.

swiftc Source.sil -emit-sil  > Source-canonical.sil

You can also convert the Swift source code directly to Canonical SIL.

swiftc Source.swift -emit-sil  > Source-canonical.sil

Example

Let’s look at the simplest possible example:

func test(number: Int) -> Bool {
    if number > 0 {
        return true
    } else {
        return false
    }
}

Convert to raw SIL using swiftc:

swiftc -emit-silgen Source.swift -o Source.sil


sil_stage raw

import Builtin
import Swift
import SwiftShims

func test(number: Int) -> Bool

// main
sil [ossa] @main : $@convention(c) (Int32, UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>) -> Int32 {
bb0(%0 : $Int32, %1 : $UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>):
  %2 = integer_literal $Builtin.Int32, 0          // user: %3
  %3 = struct $Int32 (%2 : $Builtin.Int32)        // user: %4
  return %3 : $Int32                              // id: %4
} // end sil function 'main'

// test(number:)
sil hidden [ossa] @$s6Source4test6numberSbSi_tF : $@convention(thin) (Int) -> Bool {
// %0                                             // users: %8, %1
bb0(%0 : $Int):
  debug_value %0 : $Int, let, name "number", argno 1 // id: %1
  %2 = metatype $@thin Int.Type                   // user: %8
  %3 = integer_literal $Builtin.IntLiteral, 0     // user: %6
  %4 = metatype $@thin Int.Type                   // user: %6
  // function_ref Int.init(_builtinIntegerLiteral:)
  %5 = function_ref @$sSi22_builtinIntegerLiteralSiBI_tcfC : $@convention(method) (Builtin.IntLiteral, @thin Int.Type) -> Int // user: %6
  %6 = apply %5(%3, %4) : $@convention(method) (Builtin.IntLiteral, @thin Int.Type) -> Int // user: %8
  // function_ref static Int.> infix(_:_:)
  %7 = function_ref @$sSi1goiySbSi_SitFZ : $@convention(method) (Int, Int, @thin Int.Type) -> Bool // user: %8
  %8 = apply %7(%0, %6, %2) : $@convention(method) (Int, Int, @thin Int.Type) -> Bool // user: %9
  %9 = struct_extract %8 : $Bool.#Bool._value // user: %10
  cond_br %9, bb1, bb2                            // id: %10

bb1:                                              // Preds: bb0
  %11 = integer_literal $Builtin.Int1, -1         // user: %14
  %12 = metatype $@thin Bool.Type                 // user: %14
  // function_ref Bool.init(_builtinBooleanLiteral:)
  %13 = function_ref @$sSb22_builtinBooleanLiteralSbBi1__tcfC : $@convention(method) (Builtin.Int1, @thin Bool.Type) -> Bool // user: %14
  %14 = apply %13(%11, %12) : $@convention(method) (Builtin.Int1, @thin Bool.Type) -> Bool // user: %15
  br bb3(%14 : $Bool)                             // id: %15

bb2:                                              // Preds: bb0
  %16 = integer_literal $Builtin.Int1, 0          // user: %19
  %17 = metatype $@thin Bool.Type                 // user: %19
  // function_ref Bool.init(_builtinBooleanLiteral:)
  %18 = function_ref @$sSb22_builtinBooleanLiteralSbBi1__tcfC : $@convention(method) (Builtin.Int1, @thin Bool.Type) -> Bool // user: %19
  %19 = apply %18(%16, %17) : $@convention(method) (Builtin.Int1, @thin Bool.Type) -> Bool // user: %20
  br bb3(%19 : $Bool)                             // id: %20

// %21                                            // user: %22
bb3(%21 : $Bool):                                 // Preds: bb2 bb1
  return %21 : $Bool                              // id: %22
} // end sil function '$s6Source4test6numberSbSi_tF'

// Int.init(_builtinIntegerLiteral:)
sil [transparent] [serialized] @$sSi22_builtinIntegerLiteralSiBI_tcfC : $@convention(method) (Builtin.IntLiteral, @thin Int.Type) -> Int

// static Int.> infix(_:_:)
sil [transparent] [serialized] @$sSi1goiySbSi_SitFZ : $@convention(method) (Int, Int, @thin Int.Type) -> Bool

// Bool.init(_builtinBooleanLiteral:)
sil [transparent] [serialized] @$sSb22_builtinBooleanLiteralSbBi1__tcfC : $@convention(method) (Builtin.Int1, @thin Bool.Type) -> Bool


The first function is main, which I won’t discuss here. Let’s start with test(number:).

sil hidden [ossa] @$s6Source4test6numberSbSi_tF : $@convention(thin) (Int) -> Bool {
   .....
}

This is a SILFunction containing four SILBasicBlocks: bb0, bb1, bb2, and bb3.
The function name $s6Source4test6numberSbSi_tF is the mangled form of test(number:). Identifier names in SIL begin with the @ symbol.
$@convention(thin) (Int) -> Bool is the function’s type. @convention(thin) denotes the Swift calling convention for a function that needs no context or self parameter. The parameter is an Int and the return value is a Bool.

bb0(%0 : $Int):
  debug_value %0 : $Int, let, name "number", argno 1 // id: %1
  %2 = metatype $@thin Int.Type                   // user: %8
  %3 = integer_literal $Builtin.IntLiteral, 0     // user: %6
  %4 = metatype $@thin Int.Type                   // user: %6
  // function_ref Int.init(_builtinIntegerLiteral:)
  %5 = function_ref @$sSi22_builtinIntegerLiteralSiBI_tcfC : $@convention(method) (Builtin.IntLiteral, @thin Int.Type) -> Int // user: %6
  %6 = apply %5(%3, %4) : $@convention(method) (Builtin.IntLiteral, @thin Int.Type) -> Int // user: %8
  // function_ref static Int.> infix(_:_:)
  %7 = function_ref @$sSi1goiySbSi_SitFZ : $@convention(method) (Int, Int, @thin Int.Type) -> Bool // user: %8
  %8 = apply %7(%0, %6, %2) : $@convention(method) (Int, Int, @thin Int.Type) -> Bool // user: %9
  %9 = struct_extract %8 : $Bool.#Bool._value // user: %10
  cond_br %9, bb1, bb2 

This simple function is split into four blocks: bb0 evaluates number > 0, bb1 is the if branch, bb2 is the else branch, and bb3 receives a Bool as a block argument and returns it. We will only analyze bb0 here.

bb0 is a basic block.
bb0(%0 : $Int): the % symbol denotes a register. The first argument has type Int and is stored in %0.
debug_value %0 : $Int, let, name "number", argno 1 // id: %1: records debug information stating that %0 holds the let constant number, which is argument 1. The instruction itself has id %1.
%2 = metatype $@thin Int.Type // user: %8: creates a metatype object for the type Int. @thin means the metatype needs no storage because the exact type is known. The user is %8.
%3 = integer_literal $Builtin.IntLiteral, 0 // user: %6: creates an integer literal value of type Builtin.IntLiteral, which must be a builtin integer type. The literal is written in Swift’s integer literal syntax and has the value 0. The user is %6.
%4 = metatype $@thin Int.Type // user: %6: creates another metatype object for the type Int, again @thin. The user is %6.
%5 = function_ref @$sSi22_builtinIntegerLiteralSiBI_tcfC : $@convention(method) (Builtin.IntLiteral, @thin Int.Type) -> Int // user: %6: function_ref creates a reference to a function, here one that takes a Builtin.IntLiteral and an @thin Int.Type and returns an Int. This function converts the integer literal 0 into the Int value 0.
%6 = apply %5(%3, %4) : $@convention(method) (Builtin.IntLiteral, @thin Int.Type) -> Int // user: %8: apply calls a function. The function is %5, the arguments are registers %3 and %4, and the result is stored in register %6.
%7 = function_ref @$sSi1goiySbSi_SitFZ : $@convention(method) (Int, Int, @thin Int.Type) -> Bool // user: %8: creates a reference to the function Int.>, whose parameters are Int, Int, and @thin Int.Type. The user is %8.
%8 = apply %7(%0, %6, %2) : $@convention(method) (Int, Int, @thin Int.Type) -> Bool // user: %9: calls the function in %7 with registers %0, %6, and %2 as arguments. The result is stored in register %8.
%9 = struct_extract %8 : $Bool.#Bool._value // user: %10: extracts the _value field from the Bool struct in %8. The user is %10.
cond_br %9, bb1, bb2: cond_br is a conditional branch, one of the terminator instructions of a SILBasicBlock. If %9 is true, control jumps to bb1, otherwise to bb2. This is the conditional jump for if number > 0.
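To tie the blocks back to the source, here is a hand-written Swift approximation of the same control flow. The function testLowered and its local names are hypothetical. Real SIL uses numbered registers and passes the result to bb3 as a block argument, which the constant result only imitates.

```swift
// Hand-written approximation of the control flow in the raw SIL above.
// The local names are hypothetical; real SIL uses numbered registers and block arguments.
func testLowered(number: Int) -> Bool {
    // bb0: build the literal 0, compare, then branch on the result.
    let zero = Int(0)                 // %3 to %6: integer literal converted to Int
    let isPositive = number > zero    // %7, %8: call to static Int.>(_:_:)
    let result: Bool                  // plays the role of bb3's argument %21
    if isPositive {                   // %9, cond_br
        result = true                 // bb1: construct `true`, branch to bb3
    } else {
        result = false                // bb2: construct `false`, branch to bb3
    }
    return result                     // bb3: return the block argument
}

print(testLowered(number: 5))    // prints "true"
print(testLowered(number: -5))   // prints "false"
```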

Potential uses of SIL

Swift hot update: Rollout works by adding a prefix to each method at the SIL layer.

func add(a:Int, b:Int) -> Int {
    if Rollout_shouldPatch(ROLLOUT_a79ee6d5a41da8daaa2fef82124dcf74) {
        let resultRollout : Int = Rollout_invokeReturn(Rollout_tweakData!, target:self, arguments:[a, b], origClosure: { args in return self.add(a:args[0], b:args[1]); });
        return resultRollout;
    }
    // ... the original method body follows
}

In the code above, Rollout_invokeReturn is responsible for executing a JavaScript function downloaded from the Rollout cloud. This function can call back to the original method if needed.

Mutation testing changes one small spot in the code and checks whether the tests notice that the program now follows the wrong logic. In fact, this can be done in the compiler backend with LLVM’s JIT, but that is very difficult.
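A minimal sketch of the idea, written by hand at source level (a SIL-based tool would apply the change to the IR instead): the function testMutant and the helper suitePasses are hypothetical names, and test(number:) is a simplified version of the earlier example.

```swift
// Simplified version of the example function from earlier in the article.
func test(number: Int) -> Bool {
    return number > 0
}

// A hypothetical mutant: the comparison `>` has been replaced with `>=`.
func testMutant(number: Int) -> Bool {
    return number >= 0
}

/// Runs a small test suite against an implementation; returns true if all checks pass.
func suitePasses(_ implementation: (Int) -> Bool) -> Bool {
    let cases: [(input: Int, expected: Bool)] = [(5, true), (-5, false), (0, false)]
    return cases.allSatisfy { implementation($0.input) == $0.expected }
}

print(suitePasses(test))        // prints "true"
print(suitePasses(testMutant))  // prints "false": the boundary case 0 kills the mutant
```

If the suite lacked the boundary case 0, the mutant would survive, which is exactly the kind of gap mutation testing is meant to reveal.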

References

Swift Intermediate Language (SIL)
2015 LLVM Developers’ Meeting: Joseph Groff & Chris Lattner, A Case Study…
benng.me/2017/08/27/…
How to talk to your kids about SIL type use
Cocoaheads KRK #29 Swift Intermediate Language – Bartosz Polaczyk
Swift’s mysterious Builtin module
The secret life of types in Swift
What is SSA and the role of SSA