Hello, I am the smiling snail, 🐌.

In the last article, we talked about HTML parsing and implemented a small HTML parser. If you have not read it yet, follow the link below.

  • I heard you wanted to write a rendering engine – HTML parsing

Today, we’ll focus on CSS parsing, and we’ll also implement a simple CSS parser that outputs stylesheets.

CSS rules

CSS rules are fairly complex. Besides the basic universal, element, class, and ID selectors, there are also group selectors, combinator selectors, and so on.

  • Universal selector: * is a wildcard that matches any element.
* {
	width: 100px;
}
  • Element selector: defines the style for a tag.
/* Any div element matches this style */
div {
	width: 100px;
}
  • ID selector: starts with # and is matched against the id attribute of an element.
/* All elements whose id is test match */
#test {
	text-align: center;
}

<!-- Set the id -->
<span id="test"></span>
<h1 id="test"></h1>

It can also be combined with an element selector, so that both conditions must match.

/* Matches only when the tag is h1 and the id is test */
h1#test {
	text-align: center;
	color: #ffffff;
}

<h1 id="test"></h1>
  • Class selector: starts with . and is matched against the class attribute of an element.
.test {
	height: 200px;
}

<!-- Matches -->
<div class="test"></div>
<p class="test"></p>

Again, it can be combined with an element selector so that both must match. A node matches only if its tag is the same and its class attribute contains every class named in the rule.

div.test.test1 {
	height: 200px;
}

<!-- Matches -->
<div class="test test1"></div>
<div class="test test1 test2"></div>

<!-- Does not match -->
<div class="test test2"></div>
  • Group selector: a list of selectors separated by commas. A node matches the style if it satisfies any one of them.
div.test, #main {
	height: 200px;
}
  • Combinator selectors: selectors can be combined in a variety of ways, but I will not expand on them here.

Implementation goals

For the sake of simplicity, we implement only some of the selectors mentioned above: universal, element, class, ID, and group selectors.

Selectors also have a priority. From highest to lowest:

ID selector > Class selector > Element selector

Attribute values can be represented in various ways, such as:

  • Keyword: a plain string that follows certain rules, for example text-align: center;
  • Length: a number plus a unit, as in height: 200px; many units exist, such as em and px. There is also a percentage form, such as height: 90%;
  • Color: either hexadecimal, as in color: #ffffff; or a color name, as in color: white;
  • ...

Here, only the most basic forms are supported:

  • Keywords.
  • Lengths: a number with the unit fixed to px.
  • Colors: hexadecimal only, in either RGB or RGBA form.

Data structure definition

A stylesheet consists of a list of CSS rules, and it is the end product of CSS parsing.

So how do we define data structures to represent CSS rules?

From the CSS syntax above, we know:

CSS rule = list of selectors + list of declarations

A simple selector takes three basic forms: element selector, class selector, and ID selector. Put simply, it can contain a tag, an ID, and any number of classes.

So, the selector structure can be defined as follows:

struct SimpleSelector {
    // Tag name
    var tagName: String?
    
    // id
    var id: String?
    
    // Classes
    var classes: [String]
}

// Can be extended later, e.g. with combinator selectors; only simple selectors are supported for now
enum CSSSelector {
    case Simple(SimpleSelector)
}

The declaration structure is easier to define: a property name plus a property value.

struct Declaration {
    let name: String
    let value: Value
}

As mentioned above, there are three types of property values:

  • Keyword
  • Color value
  • Length, in px units only

Therefore, the attribute value structure is defined as follows:

enum Value {
    // Keyword
    case Keyword(String)
    
    // RGBA color
    case Color(UInt8, UInt8, UInt8, UInt8)
    
    // Length
    case Length(Float, Unit)
}

// Unit
enum Unit {
    case Px
}

With the above structure, you can define the structure of the CSS rules.

// CSS rule structure definition
struct Rule {
    // Selectors
    let selectors: [CSSSelector]
    
    // Declarations
    let declarations: [Declaration]
}

Also, the structure of the stylesheet can be defined.

// Style sheet, the final product
struct StyleSheet {
    let rules: [Rule]
}

The overall data structure is shown in the figure below:

Selector priorities are distinguished by a triple.

// It is used to sort the selectors. The priority is id, class, tag
typealias Specifity = (Int, Int, Int)

Sorting is based on whether an id is present, the number of classes, and whether a tag is present.

extension CSSSelector {
    public func specificity() -> Specifity {
        if case CSSSelector.Simple(let simple) = self {
            // id
            let a = simple.id == nil ? 0 : 1
            
            // Number of classes
            let b = simple.classes.count
            
            // Tag
            let c = simple.tagName == nil ? 0 : 1
            
            return Specifity(a, b, c)
        }
        
        return Specifity(0, 0, 0)
    }
}
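
To see why a triple is enough for ordering, here is a minimal sketch. It uses plain tuples instead of the CSSSelector type, and the constant names are made up for illustration. Swift compares tuples element by element from left to right, so a single id outranks any number of classes.

```swift
// (id, class count, tag) triples for a few illustrative selectors
let tagOnly: (Int, Int, Int) = (0, 0, 1)       // div
let twoClasses: (Int, Int, Int) = (0, 2, 0)    // .test.test1
let idOnly: (Int, Int, Int) = (1, 0, 0)        // #main
let tagAndClass: (Int, Int, Int) = (0, 1, 1)   // div.test

// Tuples of Comparable elements are compared lexicographically,
// so sorting in descending order puts the highest priority first
let ordered = [tagOnly, twoClasses, idOnly, tagAndClass].sorted { $0 > $1 }

for triple in ordered {
    print(triple)
}
```

The id selector comes first and the bare tag comes last, which is the order the style tree stage needs.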

Selector parsing

Group selectors are supported, so a rule may start with a set of selectors separated by commas, such as:

div.test.test2, #main {
}

We only need to focus on parsing a single selector here, because parsing a group selector just means parsing single selectors in a loop.

Single selector parsing

A few obvious rules distinguish the selector types:

  • * is the wildcard
  • A leading . starts a class
  • A leading # starts an ID

Anything outside these rules is handled as follows:

  • Characters that form a valid identifier are treated as an element (tag) name
  • Everything else is considered invalid

Now, let’s analyze them one by one.

  • For the wildcard *, nothing needs to be filled in: the selector's id, tag, and classes all stay empty, so it can match any element.

  • A leading . marks a class, so the class name is parsed next.

The class name must be a combination of digits, letters, underscores, and hyphens, for example test-2_A. We call this a valid string. Note: this rule is used in many places below.

// A valid identifier character: digit, letter, "_" or "-"
func valideIdentifierChar(c: Character) -> Bool {
    if c.isNumber || c.isLetter || c == "-" || c == "_" {
        return true
    }
    
    return false
}

// Parse an identifier
mutating func parseIdentifier() -> String {
    // Digits, letters, "-" and "_"
    return self.sourceHelper.consumeWhile(test: valideIdentifierChar)
}
  • A leading # marks an ID selector. The ID name is parsed with the same valid-string rule.

  • Otherwise, if the text is a valid string, it is treated as an element name.

  • Anything else is an invalid character, and parsing of the selector stops.
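
The snippets in this article call methods on sourceHelper, which comes from the previous article and is not shown here. The following is only a guess at a minimal version, so that the calls are easier to follow. The method names mirror the calls used in this article, but the bodies, and the choice to return "\0" at the end of input, are assumptions rather than the repository's code.

```swift
// Hypothetical cursor over the input text (an assumption, not the repository's code)
struct SourceHelper {
    var chars: [Character] = []
    var pos = 0

    // Replace the input and rewind to the start
    mutating func updateInput(input: String) {
        chars = Array(input)
        pos = 0
    }

    // True once every character has been consumed
    func eof() -> Bool {
        return pos >= chars.count
    }

    // Peek at the next character without consuming it;
    // returns "\0" at the end of input (an assumption made for this sketch)
    func nextCharacter() -> Character {
        return eof() ? "\0" : chars[pos]
    }

    // Consume and return one character
    mutating func consumeCharacter() -> Character {
        let c = nextCharacter()
        if !eof() {
            pos += 1
        }
        return c
    }

    // Consume characters for as long as the predicate holds
    mutating func consumeWhile(test: (Character) -> Bool) -> String {
        var result = ""
        while !eof() && test(nextCharacter()) {
            result.append(consumeCharacter())
        }
        return result
    }

    // Consume up to `count` characters
    mutating func consumeNCharacter(count: Int) -> String {
        var result = ""
        var taken = 0
        while taken < count && !eof() {
            result.append(consumeCharacter())
            taken += 1
        }
        return result
    }

    // Skip spaces, tabs, and newlines
    mutating func consumeWhitespace() {
        _ = consumeWhile { $0.isWhitespace }
    }
}

// Usage: read an identifier, then skip to the opening brace
var helper = SourceHelper()
helper.updateInput(input: "test-2_A { }")
let name = helper.consumeWhile { $0.isLetter || $0.isNumber || $0 == "-" || $0 == "_" }
helper.consumeWhitespace()
let brace = helper.consumeCharacter()
```

Keeping all cursor movement inside one small type means each parse function only has to decide what to consume, not how.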

The entire parsing process is as follows:

// Parse the selector
// tag#id.class1.class2
mutating func parseSimpleSelector() -> SimpleSelector {
    var selector = SimpleSelector(tagName: nil, id: nil, classes: [])
    
    outerLoop: while !self.sourceHelper.eof() {
        switch self.sourceHelper.nextCharacter() {
        // id
        case "#":
            _ = self.sourceHelper.consumeCharacter()
            selector.id = self.parseIdentifier()
            break
            
        // class
        case ".":
            _ = self.sourceHelper.consumeCharacter()
            let cls = parseIdentifier()
            selector.classes.append(cls)
            break
            
        // Wildcard, no data is needed in selector, can be matched arbitrarily
        case "*":
            _ = self.sourceHelper.consumeCharacter()
            break
            
        // tag
        case let c where valideIdentifierChar(c: c):
            selector.tagName = parseIdentifier()
            break
            
        case _:
            break outerLoop
        }
    }
    
    return selector
}

Group selector parsing

Parsing a group selector calls the procedure above in a loop. Mind the exit condition: when { is encountered, the declaration list begins and the loop can stop.

Once the list of selectors has been collected, it is sorted from highest to lowest priority, in preparation for building the style tree in the next stage.

// Sort the selectors from highest priority to lowest
selectors.sort { (s1, s2) -> Bool in
    s1.specificity() > s2.specificity()
}
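
As a rough sketch of the loop described above, the following self-contained function reads a selector list up to {, then sorts it by priority. It works on a character array instead of sourceHelper. The names parseSelectorList, its parameters, and the flattened SimpleSelector with a specificity property are hypothetical and only for illustration; this is not the parseSelectors() from the repository. Whitespace is simply skipped, which is acceptable only because combinators are out of scope.

```swift
// Simplified selector for this sketch (default values added for convenience)
struct SimpleSelector: Equatable {
    var tagName: String?
    var id: String?
    var classes: [String] = []

    // (id, class count, tag) triple used for ordering
    var specificity: (Int, Int, Int) {
        return (id == nil ? 0 : 1, classes.count, tagName == nil ? 0 : 1)
    }
}

func isIdentifierChar(_ c: Character) -> Bool {
    return c.isNumber || c.isLetter || c == "-" || c == "_"
}

// Hypothetical stand-in for parseSelectors(): reads selectors until "{"
func parseSelectorList(_ chars: [Character], from start: Int = 0) -> (selectors: [SimpleSelector], next: Int) {
    var pos = start
    var selectors: [SimpleSelector] = []
    var current = SimpleSelector()

    // Read a run of valid identifier characters
    func identifier() -> String {
        var s = ""
        while pos < chars.count && isIdentifierChar(chars[pos]) {
            s.append(chars[pos])
            pos += 1
        }
        return s
    }

    while pos < chars.count {
        let c = chars[pos]
        if c == "{" {
            break                          // the declaration list starts here
        } else if c == "," {
            selectors.append(current)      // one selector of the group is done
            current = SimpleSelector()
            pos += 1
        } else if c == "#" {
            pos += 1
            current.id = identifier()
        } else if c == "." {
            pos += 1
            current.classes.append(identifier())
        } else if c == "*" || c.isWhitespace {
            pos += 1                       // wildcard and whitespace add no data
        } else if isIdentifierChar(c) {
            current.tagName = identifier()
        } else {
            break                          // invalid character: stop parsing
        }
    }
    selectors.append(current)

    // Highest priority first
    selectors.sort { $0.specificity > $1.specificity }
    return (selectors, pos)
}

let input = Array("div.test.test2, #main {")
let result = parseSelectorList(input)
```

After the call, result.next points at the { character, so the declaration parser can take over from there.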

Property parsing

The syntax of a declaration is straightforward: the property name and value are separated by : and the declaration ends with ;.

property-name: property-value;
margin-top: 10px;

As usual, start with parsing a single declaration.

  • Parse the property name, again following the valid-character rule above.
  • Make sure the : separator is present.
  • Parse the property value.
  • Make sure it ends with ;.
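
The four steps above can be sketched as follows. This is a simplified, self-contained version: Cursor, RawDeclaration, and parseRawDeclaration are hypothetical names, the value is kept as raw text instead of being parsed into a Value, and failures return nil rather than asserting.

```swift
// Simplified stand-in for Declaration: the value is kept as raw text
struct RawDeclaration: Equatable {
    let name: String
    let value: String
}

// Hypothetical minimal cursor over the input
struct Cursor {
    var chars: [Character]
    var pos = 0

    init(_ text: String) {
        chars = Array(text)
    }

    mutating func skipWhitespace() {
        while pos < chars.count && chars[pos].isWhitespace {
            pos += 1
        }
    }

    mutating func takeWhile(_ test: (Character) -> Bool) -> String {
        var s = ""
        while pos < chars.count && test(chars[pos]) {
            s.append(chars[pos])
            pos += 1
        }
        return s
    }

    // Consume `c` if it is the next non-whitespace character; report whether it was there
    mutating func expect(_ c: Character) -> Bool {
        skipWhitespace()
        guard pos < chars.count, chars[pos] == c else { return false }
        pos += 1
        return true
    }
}

func isIdentifierChar(_ c: Character) -> Bool {
    return c.isNumber || c.isLetter || c == "-" || c == "_"
}

func parseRawDeclaration(_ cursor: inout Cursor) -> RawDeclaration? {
    cursor.skipWhitespace()

    // 1. Property name
    let name = cursor.takeWhile(isIdentifierChar)
    guard !name.isEmpty else { return nil }

    // 2. The ":" separator must be present
    guard cursor.expect(":") else { return nil }
    cursor.skipWhitespace()

    // 3. Property value: a single token (a full parser would parse a Value here)
    let value = cursor.takeWhile { $0 != ";" && $0 != "}" && !$0.isWhitespace }

    // 4. The declaration must end with ";"
    guard cursor.expect(";") else { return nil }

    return RawDeclaration(name: name, value: value)
}

var cursor = Cursor("margin-top: 10px; color: #ffffff;")
let first = parseRawDeclaration(&cursor)
let second = parseRawDeclaration(&cursor)
```

Returning an optional makes the failure cases explicit, at the cost of a guard at each step.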

Property value parsing

This is a little more complicated, because a property value can take three forms.

1. Color value parsing

A color value starts with #, which makes it easy to recognize. Next comes the RGBA value, written as 8 hexadecimal digits.

However, the alpha component is usually omitted, so the 6-digit form must also be accepted, with alpha defaulting to fully opaque (255 in the code below).

The idea is straightforward: take two characters at a time and convert them from hexadecimal to an integer.

  • Take two characters and convert them to an integer.
mutating func parseHexPair() -> UInt8 {
    // Take two characters
    let s = self.sourceHelper.consumeNCharacter(count: 2)
    
    // Convert the hex pair to an integer; fall back to 0 if it is not valid hex
    let value = UInt8(s, radix: 16) ?? 0
    
    return value
}
  • Extract RGB one by one. If alpha is present, parse.
// Parse a color value; only the hexadecimal format starting with # is supported, e.g. #897722
mutating func parseColor() -> Value {
    // Consume outside assert(): assert conditions are not evaluated in optimized builds
    let hash = self.sourceHelper.consumeCharacter()
    assert(hash == "#")
    
    let r = parseHexPair()
    let g = parseHexPair()
    let b = parseHexPair()

    // Alpha defaults to fully opaque
    var a: UInt8 = 255
    
    // If there is an alpha component
    if self.sourceHelper.nextCharacter() != ";" {
        a = parseHexPair()
    }
    
    return Value.Color(r, g, b, a)
}

2. Length value parsing

width: 10px;

Here the property value = floating-point number + unit.

  • First, parse the floating-point number. This is handled loosely: any run of digits and dots is accepted, without strict validation.
// Parse floating point numbers
mutating func parseFloat() -> Float {
    let s = self.sourceHelper.consumeWhile { (c) -> Bool in
        c.isNumber || c == "."
    }
    
    // NSString is provided by Foundation, so the file must import Foundation
    let floatValue = (s as NSString).floatValue
    return floatValue
}
  • Then parse the unit. Only px is supported.
// Parse the unit
mutating func parseUnit() -> Unit {
    let unit = parseIdentifier()
    if unit == "px" {
        return Unit.Px
    }
    
    // fatalError returns Never, so no return value is needed on this path
    fatalError("Unexpected unit")
}

3. Keywords, that is, ordinary strings

Keywords are extracted according to the rules of valid characters.
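
Which of the three parsers to call can be decided from the first character of the value. The sketch below shows one possible dispatch on an already extracted value string. Here classify is a hypothetical helper, the Value and Unit enums are repeated (with Equatable added) so that the block stands alone, and unsupported input returns nil instead of triggering an assertion.

```swift
enum Unit: Equatable {
    case Px
}

enum Value: Equatable {
    case Keyword(String)
    case Color(UInt8, UInt8, UInt8, UInt8)
    case Length(Float, Unit)
}

// Hypothetical dispatcher: picks the value type from the first character
func classify(_ raw: String) -> Value? {
    guard let first = raw.first else { return nil }

    if first == "#" {
        // Color: 6 (RGB) or 8 (RGBA) hex digits after "#"
        let hex = Array(raw.dropFirst())
        guard hex.count == 6 || hex.count == 8 else { return nil }

        var parts: [UInt8] = []
        for i in stride(from: 0, to: hex.count, by: 2) {
            guard let byte = UInt8(String(hex[i...i + 1]), radix: 16) else { return nil }
            parts.append(byte)
        }

        // Alpha defaults to fully opaque when omitted
        let alpha: UInt8 = parts.count == 4 ? parts[3] : 255
        return Value.Color(parts[0], parts[1], parts[2], alpha)
    }

    if first.isNumber {
        // Length: digits and dots, followed by the px unit
        let number = raw.prefix(while: { $0.isNumber || $0 == "." })
        let unit = raw.dropFirst(number.count)
        guard unit == "px", let amount = Float(String(number)) else { return nil }
        return Value.Length(amount, Unit.Px)
    }

    // Anything else is treated as a keyword
    return Value.Keyword(raw)
}

let keyword = classify("center")
let length = classify("200px")
let opaque = classify("#ffffff")
let translucent = classify("#ff908912")
let unsupported = classify("10em")
```

Checking # first and digits second keeps the keyword case as the fallback, which matches the order in which the three forms are described above.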

Property list parsing

Once a single declaration can be parsed, the declaration list is simple: the same routine, called in a loop.

  • Make sure the list starts with {.
  • When } is encountered, the declaration list is complete.

The process is as follows:

// Parse the declared property list
/** { margin-top: 10px; margin-bottom: 10px } */
mutating func parseDeclarations() -> [Declaration] {
    var declarations: [Declaration] = []
    
    // Must start with {; consume outside assert(), whose condition is skipped in optimized builds
    let brace = self.sourceHelper.consumeCharacter()
    assert(brace == "{")
    
    while true {
        self.sourceHelper.consumeWhitespace()
        
        // If } is encountered, the declaration list is complete
        if self.sourceHelper.nextCharacter() == "}" {
            _ = self.sourceHelper.consumeCharacter()
            break
        }
        
        // Parse a single attribute
        let declaration = parseDeclaration()
        declarations.append(declaration)
    }
    
    return declarations
}

Parsing rules

A rule consists of a list of selectors plus a list of declarations, and both parsers are already in place. To parse a rule, just combine the two.

mutating func parseRule() -> Rule {
    // Parse the selectors
    let selectors = parseSelectors()

    // Parse the declarations
    let declaration = parseDeclarations()
    
    return Rule(selectors: selectors, declarations: declaration)
}

Parsing the entire rule list is a loop that calls the parsing of a single rule.

// Parse CSS rules
mutating func parseRules() -> [Rule] {
    var rules:[Rule] = []
    
    // Loop parsing rules
    while true {
        self.sourceHelper.consumeWhitespace()
        
        if self.sourceHelper.eof() {
            break
        }
        
        // Parse a single rule
        let rule = parseRule()
        rules.append(rule)
    }
    
    return rules
}

Generate style sheets

The stylesheet is made up of a list of rules, and you can simply wrap the list of rules parsed in the previous step into the stylesheet.

// Public entry point: parses the source and returns the stylesheet
mutating public func parse(source: String) -> StyleSheet {
    self.sourceHelper.updateInput(input: source)
    
    let rules: [Rule] = parseRules()
    
    return StyleSheet(rules: rules)
}

Test code

let css = """
.test {
    padding: 0px;
    margin: 10px;
    position: absolute;
}

p {
    font-size: 10px;
    color: #ff908912;
}
"""

// Parse the CSS
var cssParser = CSSParser()
let styleSheet = cssParser.parse(source: css)
print(styleSheet)

You can test the above code to see the output.

The full code can be viewed at: github.com/silan-liu/t… .

Conclusion

In this article, we focused on how to parse individual selectors, individual declarations, and individual rules, and how to combine them to parse the whole input and produce a stylesheet.

These parts share a common way of thinking: from the whole to the parts, and from the parts back to the whole.

Breaking the overall parsing task into individual goals makes the problem smaller. Focus on parsing one unit correctly, then call that parser in a loop to reach the overall goal.

The next article will cover style tree generation. Stay tuned ~

References

  • CSS rule: developer.mozilla.org/zh-CN/docs/…
  • GitHub: github.com/silan-liu/t…