In a previous blog post, I wrote a simple tool to export function call diagrams in iOS code using the library LibTooling provided by Clang. However, this implementation had some obvious drawbacks:
- When analyzing a single code file in a project, classes or methods defined in other files cannot be known, resulting in the loss of generated syntax tree nodes, which has a great impact on the final result.
- Clang will be preprocessed during parsing, so the resulting output may include some functions from external system libraries, which is useless information for us (of course, this should be my posture problem).
- Swift cannot be supported. The Front end of the Swift compiler is not Clang, and the tool is based on the Clang library, so there is no possibility of swift support.
Due to these shortcomings (mainly the third point, because SWIFT is still the main part in daily work), it has not been continued to use and improve. Until recently, DUE to my work schedule, I needed to maintain a relatively old code. Faced with thousands of lines of code files, I still needed a relatively handy tool to assist in reading. Some time ago, just in time for the National Day holiday, I took time to rewrite a tool with Swift: DrAfter, which, as the name suggests, is designed to generate sketches describing code.
What is the Drafter
- Drafter is a command line tool for analyzing iOS engineering code with support for Objective-C and Swift.
- Automatically parses the code and generates method call diagrams.
- Automatically parses the code and generates class inheritance diagrams.
Installation and use
The full code is here: github.com/L-Zephyr/Dr…
Here is a quick setup script that executes commands in the shell:
curl "https://raw.githubusercontent.com/L-Zephyr/Drafter/master/install.sh" | /bin/sh
Copy the code
The drafter program is automatically installed in the /usr/local/bin directory and can be used directly on the terminal.
Please refer to the usage guide for details
Realize the principle of
In the previous approach, all the parsing of the source code is left to Clang, and only the generated AST is processed. This is actually a lazy approach, which has no control over the final generated result, and also breaks the possibility of supporting Swift. To get better output and support Swift and OC, you have to do the source code parsing yourself. Fortunately, we only need to parse the class, method definition, and method call pieces, and the actual work is not very complicated.
lexing
Lexical parsing is the first step in program compilation. Lexical parsing is the division of code into a series of lexical units. A lexical unit is a tag with special meaning and is the smallest unit of a parser when dealing with source code. For example, a simple assignment expression int I = 3 is processed into a series of lexical units: int, I, =, 3 after lexical analysis.
struct Token {
var type: TokenType
var text: String
}
enum TokenType {
case endOfFile // End of file
case name / / variable names
case colon / / the colon
case comma / / a comma. }Copy the code
A structure named Token is first defined to represent the lexical unit, where the enumeration value type is used to represent the type of the lexical unit, and text holds the original data of the lexical unit. For example, for a variable N, its type is.name and text is n after parsing into Token. Since our goal is to parse only classes and methods, we define only the lexical unit types that will be used in class and method definitions and ignore those we don’t care about.
The lexical parser parses any incoming source code into a lexical unit stream, which for upper-level users acts as an iterator to traverse the lexical unit until the end of the file, so here we can define a basic lexical parser type with just one calculated attribute, nextToken, to fetch the next lexical unit:
protocol Lexer {
var nextToken: Token { get}}Copy the code
Syntax parsing
After splitting the source code into typed lexical units through the first step of lexical analysis, it is time for parsing. To analyze a program, such as the expression 1 + 2, we cannot treat it literally. We have to convert it into some intermediate form that can be processed. That’s what parsing does. The syntax parser scans the lexical unit stream according to the grammar rules of the language and generates an intermediate representation (IR), typically an abstract syntax tree (AST), which is then analyzed in the semantic analysis phase. Drafter only goes as far as parsing the syntax, parsing only the classes, method definitions, and method calls in the code, and the resulting data structures are relatively simple.
A grammatical description of a language
Programs are composed of multiple valid expressions, and what we need to do is to identify those expressions that conform to specific rules. Language-specific grammar rules are called the grammar of the language, and these rules can be described by a DSL (BNF paradigm).
For example (from Programming Language Implementation Patterns), for a list declaration that can contain any letter such as [a, b, c], its grammar rules are described as follows:
list = '[' elements ']'; Elements = elemenet (',' element)*; / / * said zero or more element = NAME | list; / / | said or elements may be another list NAME = (' a '.. 'z' | 'A'.. 'Z')+; // + means one or moreCopy the code
This is a description of a list declaration grammar, and here I make a distinction between lexical rules and grammar rules, the names of the grammar rules are lowercase, and the names of the lexical rules are uppercase. Each rule contains one or more analytic options, multiple analytic options separated by | symbol. The NAME rule matches a lexical unit containing at least one letter. The list rule says that the list must be surrounded by brackets and contain at least one element, separated by commas, that can be a variable or another list declaration.
With a clear definition of grammar rules we can write a grammar parser, which I refer to here for the objective-C grammar.
Recursive descent analysis
After defining the structure of the grammar and the related lexical units, it is only necessary to identify the corresponding formulas during parsing. In short, the job of the parser is to perform certain operations when encountering certain structures. In terms of implementation, we provide a special matching function for each grammar rule, and match function is used uniformly for lexical rules:
@discardableResult
func match(_ t: TokenType) throws -> Token // Matches the lexical unit of the specified type. The lexical unit is returned on success
Copy the code
For the above list example, we could write the following function for identification:
func list(a) throws
func elements(a) throws
func element(a) throws
Copy the code
Each function recognizes a particular substructure and may call other recognition functions or recursively call itself. In recognition from the beginning of the lexical unit, top-down derivation. So this method of analysis is also called recursive descent analysis, and parsers written in this way are called LL parsers. The first L indicates that parsed content is entered from left to right, and the second L indicates that parsed content is also derived from left to right (leftmost derivation).
For the element rule above, it may match either a variable name or another list, which needs to be checked before entering the Element function. Luckily, the list rule always starts with a [symbol, and the variable rule always starts with a letter. Just check the current lexical unit type to determine:
func element(a) throws {
if currentToken.type == .leftSquare {
try list()
} else {
try match(.name)
}
}
Copy the code
A grammar like this is called an LL(1) grammar. The corresponding parser is called an LL(1) parser. 1 means that the parser can only look at one lexical unit forward from the parsing position. This lexical unit is often called a lookahead.
LL (k) parser
The LL(1) parser is very simple, but not very powerful. For example, in the list syntax example above, adding an assignment to a list element: [a, b = c, d], the element rule becomes:
element = NAME
| NAME '=' NAME
| list
Copy the code
There are two parsing options in Element’s grammar that start with the lexical unit NAME. Looking at only one lexical unit does not determine this, so we need to look forward to more lexical units when parsing, which means that the syntax is no longer LL(1).
The actual parsing is much more complicated than this, and you may need to look ahead to multiple lexical units to determine the parsing strategy, so you need to build a parser that can view as many symbols as you need, the LL(k) parser. There are currently tools in use that can automatically generate parsers based on specific DSLS, such as Antlr, but considering that the code generated over DSLS is not particularly debugging, and Drafter only does some very simple parsing, he wrote a simple LL(k) parser himself. A basic parser like this is provided in Drafter:
class BacktrackParser: Parser {
init(lexer: Lexer) {
self.input = lexer
}
func token(at index: Int = 0) -> Token{... }... }Copy the code
Taking a Lexer as an initialization parameter, the token() method provides the ability to look forward from the current location to any location lexical unit, while parsing of specific grammar rules is done through the various subclassed parsers. Objective-c and Swift code is processed by different parsers, which output the same data structures, such as nodes representing types:
class ClassNode: Node {
var superCls: ClassNode? = nil / / parent class
var className: String = "" / / the name of the class
var protocols: [String] = [] // Implement the protocol
}
Copy the code
After all the syntax node information of interest has been parsed out, all that remains is to process and present the information. Drafter provides options for filtering and searching syntax nodes, filtering out information of interest through the parameters provided, and finally passing this data to the DotGenerator class, which generates code for Dot language (a graph-describing language) based on node information. Pass it to Graphviz to generate the image.
Method call resolution
To discuss the parsing of method calls separately, first define a syntax node type for the method call:
enum MethodInvoker {
case name(String) // Common variables
case method(MethodInvokeNode) // Another method call
}
class MethodInvokeNode: Node {
var isSwift: Bool = false
var invoker: MethodInvoker = .name("") / / the caller
var params: [String] = [] / / parameter name
var methodName: String = ""
}
Copy the code
The caller of one method may be a variable or the return value of another method call (a chained call), so invoker is defined as an enumerated value.
The OC method is called by ObjcMessageSendParser, and the Swift method is called by SwiftInvokeParser. Take OC for example, for a simple call like this:
[self.view insertSubview:subview atIndex:0];
Copy the code
The matching result is :[self.view insertSubview: atIndex:], ignoring the parameter. For chained method calls:
[[self objectAtIndex: 1] doSomethingWith: param];
Copy the code
The parse results in just a chained call representation: [[self objectAtIndex:] doSomethingWith:], not objectAtIndex: and doSomethingWith:.
For more complex forms, such as the definition of a Block, other methods are called in the Block, such as:
[Post globalTimelinePostsWithBlock:^(NSArray *posts, NSError *error) { if (! error) { self.posts = posts; [self.tableView reloadData]; } }];Copy the code
Let’s start with a simple definition of the OC method call grammar:
message_send = '[' receiver param_list ']' receiver = message_send | NAME param_list = NAME | (NAME ':' param)+ param = .Copy the code
The specific parameters of a method call are parsed by the rule Param, which needs to know whether it is currently in another closure or other substructure so that it can end the match at the right time. This can be determined by counting the number of parentheses. Param enters the message_send rule when it encounters another method call statement and adds the result to the final match with the following pseudocode:
func param(a) throws {
whileFile not finished {ifNot in substructure && parameter matching end {return
}
if isMessageSend() {
try messageSend() // Match the method callSave to the final matching resultcontinue
}
consume()
}
}
Copy the code
Afterword.
The above is the basic idea of Drafter implementation, and the three problems mentioned at the beginning have been basically solved. Drafter helped me a lot during this period of work, at least when I was faced with one such code file
Exporting the flow of calls to specified methods makes it easier to sort out the code’s logic more quickly:
We’ll add more features, parsing capabilities, etc. to Drafter if needed, hopefully this gadget will make reading your code a little bit easier 😁.