Hello, I am the smiling snail, 🐌.
In the last article, we talked about HTML parsing and implemented a little HTML parser. For those of you who haven’t seen it, click the link below and go back to it.
- I heard you wanted to write a rendering engine – HTML parsing
Today, we’ll focus on CSS parsing, and we’ll also implement a simple CSS parser that outputs stylesheets.
CSS rules
CSS rules are somewhat complex. In addition to the basic universal, element, class, and ID selectors, there are group selectors, combinator selectors, and so on.
- Universal selector: * is a wildcard that matches any element.
* {
width: 100px;
}
- Element selector: defines the style for a tag.
/* Any div element matches this style */
div {
width: 100px;
}
- ID selector: starts with # and matches the element whose id attribute has the given value.
/* Matches any element whose id is test */
#test {
    text-align: center;
}
<!-- Set the id -->
<span id="test"></span>
<h1 id="test"></h1>
It can also be combined with an element selector, in which case both conditions must match.
/* Matches only when the tag is h1 and the id is test */
h1#test {
    text-align: center;
    color: #ffffff;
}
<h1 id="test"></h1>
- Class selector: starts with . and matches elements whose class attribute contains the given class.
.test {
    height: 200px;
}
<!-- Match -->
<div class="test"></div>
<p class="test"></p>
Again, it can be combined with an element selector so that both must match. A rule like this matches only if the tag is the same and the element’s class attribute contains all the classes specified in the rule.
div.test.test1 {
    height: 200px;
}
<!-- Match -->
<div class="test test1"></div>
<div class="test test1 test2"></div>
<!-- No match -->
<div class="test test2"></div>
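The matching rule described above can be sketched in Swift, the language used for the parser later in this article. This is only an illustration: the SelectorSketch and ElementSketch types and the matches function are hypothetical names introduced here, not part of the article's code.

```swift
// Hypothetical types, for illustration only.
struct SelectorSketch {
    var tagName: String?
    var id: String?
    var classes: [String]
}

struct ElementSketch {
    let tagName: String
    let id: String?
    let classes: Set<String>
}

// A selector matches when every part it specifies agrees with the element.
func matches(_ selector: SelectorSketch, _ element: ElementSketch) -> Bool {
    if let tag = selector.tagName, tag != element.tagName { return false }
    if let id = selector.id, id != element.id { return false }
    // Every class named in the selector must be present on the element.
    return selector.classes.allSatisfy { element.classes.contains($0) }
}

// div.test.test1
let selector = SelectorSketch(tagName: "div", id: nil, classes: ["test", "test1"])
let a = ElementSketch(tagName: "div", id: nil, classes: ["test", "test1", "test2"])
let b = ElementSketch(tagName: "div", id: nil, classes: ["test", "test2"])
print(matches(selector, a)) // true
print(matches(selector, b)) // false
```

Parts the selector leaves as nil (or an empty class list) place no constraint on the element, which is also why an empty selector behaves like the wildcard.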
- Group selector: a set of selectors separated by commas (,). A node matches the style if it satisfies any one of the selectors.
div.test, #main {
height: 200px;
}
- Combinator selectors: selectors can be combined in many other ways, which I won’t go into here.
Implementation goals
For the sake of simplicity, we implement only the selectors mentioned above: universal, element, class, ID, and group selectors.
In addition, selectors have a priority. From highest to lowest:
ID selector > Class selector > Element selector
Property values can be written in various forms, such as:
- Keyword: a plain string that follows certain rules, for example text-align: center;
- Length: a number plus a unit, as in height: 200px; many units exist, such as em and px. There is also a percentage form, such as height: 90%;
- Color value: can be hexadecimal, as in color: #ffffff; or a color name, as in color: white;
- And so on.
Here, only the most basic forms are supported:
- Keyword.
- Length: a numeric value with the unit fixed to px.
- Color value: hexadecimal only, in rgb or rgba form.
Data structure definition
A stylesheet consists of a list of CSS rules and is the end product of CSS parsing.
So how do you define data structures to represent CSS rules?
From the CSS notation above, we know:
CSS rule = list of selectors + list of property declarations
A selector takes three forms: element selector, class selector, and ID selector. Put simply, a single selector can contain a tag, an ID, and one or more classes.
So, the selector structure can be defined as follows:
struct SimpleSelector {
    // Tag name
    var tagName: String?
    // ID
    var id: String?
    // Class names
    var classes: [String]
}

// Can be extended later, e.g. with combinator selectors; for now only simple selectors are supported
enum CSSSelector {
    case Simple(SimpleSelector)
}
The declaration structure is easier to define: property name + property value.
struct Declaration {
    let name: String
    let value: Value
}
As mentioned above, there are three types of property values:
- Keyword
- Color value
- Length, in px units only
Therefore, the property value structure is defined as follows:
enum Value {
    // Keyword
    case Keyword(String)
    // RGBA color
    case Color(UInt8, UInt8, UInt8, UInt8)
    // Length
    case Length(Float, Unit)
}

// Unit
enum Unit {
    case Px
}
With the structures above, we can define the structure of a CSS rule.
// CSS rule structure
struct Rule {
    // Selectors
    let selectors: [CSSSelector]
    // Property declarations
    let declarations: [Declaration]
}
The structure of the stylesheet can then be defined as well.
// Stylesheet, the final product
struct StyleSheet {
    let rules: [Rule]
}
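To make the nesting concrete, here is a sketch that builds by hand the stylesheet for a single rule. The type definitions are repeated from above so the snippet is self-contained; the rule itself (div.test, #main with two declarations) is an example made up for illustration.

```swift
struct SimpleSelector {
    var tagName: String?
    var id: String?
    var classes: [String]
}
enum CSSSelector { case Simple(SimpleSelector) }
enum Unit { case Px }
enum Value {
    case Keyword(String)
    case Color(UInt8, UInt8, UInt8, UInt8)
    case Length(Float, Unit)
}
struct Declaration { let name: String; let value: Value }
struct Rule { let selectors: [CSSSelector]; let declarations: [Declaration] }
struct StyleSheet { let rules: [Rule] }

// Hand-built equivalent of:
// div.test, #main { height: 200px; color: #ffffff; }
let rule = Rule(
    selectors: [
        .Simple(SimpleSelector(tagName: "div", id: nil, classes: ["test"])),
        .Simple(SimpleSelector(tagName: nil, id: "main", classes: [])),
    ],
    declarations: [
        Declaration(name: "height", value: .Length(200, .Px)),
        Declaration(name: "color", value: .Color(255, 255, 255, 255)),
    ]
)
let styleSheet = StyleSheet(rules: [rule])
print(styleSheet.rules.count)                 // 1
print(styleSheet.rules[0].selectors.count)    // 2
print(styleSheet.rules[0].declarations.count) // 2
```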
The overall data structure is shown in the figure below:
Selector priority is represented by a triple.
// Used to sort selectors. The order of precedence is id, class, tag
typealias Specificity = (Int, Int, Int)
The triple records whether there is an id, the number of classes, and whether there is a tag.
extension CSSSelector {
    public func specificity() -> Specificity {
        if case CSSSelector.Simple(let simple) = self {
            // id
            let a = simple.id == nil ? 0 : 1
            // Number of classes
            let b = simple.classes.count
            // tag
            let c = simple.tagName == nil ? 0 : 1
            return (a, b, c)
        }
        return (0, 0, 0)
    }
}
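The triple works as a sort key because the Swift standard library compares tuples of comparable elements lexicographically: the first components are compared first, and later components only break ties. A minimal sketch, using a simplified stand-in type (SelectorSketch is a hypothetical name, not the article's CSSSelector):

```swift
typealias Specificity = (Int, Int, Int)

// Simplified stand-in for the selector type, for illustration only.
struct SelectorSketch {
    var tagName: String?
    var id: String?
    var classes: [String]

    func specificity() -> Specificity {
        return (id == nil ? 0 : 1, classes.count, tagName == nil ? 0 : 1)
    }
}

var selectors = [
    SelectorSketch(tagName: "div", id: nil, classes: []),                // (0, 0, 1)
    SelectorSketch(tagName: nil, id: "main", classes: []),               // (1, 0, 0)
    SelectorSketch(tagName: "div", id: nil, classes: ["test", "test2"]), // (0, 2, 1)
]

// Highest specificity first; tuples are compared element by element.
selectors.sort { $0.specificity() > $1.specificity() }

let order = selectors.map { $0.id ?? $0.classes.first ?? $0.tagName ?? "*" }
print(order) // ["main", "test", "div"]
```

An ID outranks any number of classes here because the first tuple component is compared before the second.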
Selector parsing
Since we support group selectors, the input is a set of selectors separated by commas, such as:
div.test.test2, #main {
}
We only need to focus on parsing a single selector here, because parsing a group selector is just parsing single selectors in a loop.
Single selector parsing
There are some obvious rules for telling the different selectors apart:
- * is the wildcard.
- A leading . indicates a class.
- A leading # indicates an ID.
For anything outside these rules, we do the following:
- If the characters form a valid identifier, they are treated as an element (tag) name.
- Everything else is considered invalid.
Now, let’s analyze them one by one.
- For the wildcard *, no data needs to be filled in; the id, tag, and classes in the selector all stay empty, so that it can match any element.
- A token beginning with . is a class. Parse the class name.
The class name must be a combination of digits, letters, underscores, and hyphens, for example test-2_A. We call this a valid identifier. Note: this rule is used in many places below.
// A valid identifier character: digits, letters, _ and -
func valideIdentifierChar(c: Character) -> Bool {
    if c.isNumber || c.isLetter || c == "-" || c == "_" {
        return true
    }
    return false
}
// Parse an identifier
mutating func parseIdentifier() -> String {
    // Letters, digits, - and _
    return self.sourceHelper.consumeWhile(test: valideIdentifierChar)
}
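The parser code in this article relies on a sourceHelper whose implementation is not shown here. The following is a minimal sketch of what such a helper might look like, inferred only from the method names used in the snippets (updateInput, eof, nextCharacter, consumeCharacter, consumeWhile, consumeNCharacter, consumeWhitespace). The internals, including returning "\0" at the end of input, are assumptions and may differ from the actual helper in the repository.

```swift
// Hypothetical cursor over the input characters; internals are assumed.
struct SourceHelper {
    private var input: [Character] = []
    private var position = 0

    // Replace the input and rewind to the start.
    mutating func updateInput(input: String) {
        self.input = Array(input)
        position = 0
    }

    // True once every character has been consumed.
    func eof() -> Bool {
        return position >= input.count
    }

    // Peek at the current character without consuming it ("\0" at the end).
    func nextCharacter() -> Character {
        return eof() ? "\0" : input[position]
    }

    // Return the current character and advance past it.
    mutating func consumeCharacter() -> Character {
        let c = nextCharacter()
        if !eof() { position += 1 }
        return c
    }

    // Consume characters for as long as the test passes.
    mutating func consumeWhile(test: (Character) -> Bool) -> String {
        var result = ""
        while !eof() && test(nextCharacter()) {
            result.append(consumeCharacter())
        }
        return result
    }

    // Consume up to count characters.
    mutating func consumeNCharacter(count: Int) -> String {
        var result = ""
        while result.count < count && !eof() {
            result.append(consumeCharacter())
        }
        return result
    }

    mutating func consumeWhitespace() {
        _ = consumeWhile { $0.isWhitespace }
    }
}

var helper = SourceHelper()
helper.updateInput(input: "  div.test {")
helper.consumeWhitespace()
let tag = helper.consumeWhile { $0.isLetter }
print(tag)                    // div
print(helper.nextCharacter()) // .
```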
- A token starting with # is an ID selector. The ID name is parsed using the same valid-identifier rule.
- Otherwise, if the characters form a valid identifier, they are treated as an element name.
- Anything else is an invalid character, and the parsing loop exits.
The entire parsing process is as follows:
// Parse a simple selector
// tag#id.class1.class2
mutating func parseSimpleSelector() -> SimpleSelector {
    var selector = SimpleSelector(tagName: nil, id: nil, classes: [])
    outerLoop: while !self.sourceHelper.eof() {
        switch self.sourceHelper.nextCharacter() {
        // id
        case "#":
            _ = self.sourceHelper.consumeCharacter()
            selector.id = self.parseIdentifier()
            break
        // class
        case ".":
            _ = self.sourceHelper.consumeCharacter()
            let cls = parseIdentifier()
            selector.classes.append(cls)
            break
        // Wildcard: the selector needs no data and matches anything
        case "*":
            _ = self.sourceHelper.consumeCharacter()
            break
        // tag
        case let c where valideIdentifierChar(c: c):
            selector.tagName = parseIdentifier()
            break
        // Anything else ends the selector
        case _:
            break outerLoop
        }
    }
    return selector
}
Group selector parsing
Parsing a group selector calls the procedure above in a loop; pay attention to the exit condition. When { is encountered, the declaration list begins and the loop can exit.
In addition, once the list of selectors is obtained, it needs to be sorted by priority from high to low, in preparation for generating the style tree in the next stage.
// Sort the selectors from highest priority to lowest
selectors.sort { (s1, s2) -> Bool in
    s1.specificity() > s2.specificity()
}
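The loop itself is not shown above, so here is a self-contained sketch of how it might look. It is an illustration under assumptions, not the article's implementation: SelectorListParser and its private helpers are hypothetical names, it works on SimpleSelector directly rather than CSSSelector, and it leaves out the sorting step.

```swift
struct SimpleSelector: Equatable {
    var tagName: String?
    var id: String?
    var classes: [String] = []
}

// Hypothetical parser for a comma-separated selector list.
struct SelectorListParser {
    private let chars: [Character]
    private var pos = 0

    init(_ source: String) { chars = Array(source) }

    private func peek() -> Character? {
        return pos < chars.count ? chars[pos] : nil
    }

    private func isIdentifierChar(_ c: Character) -> Bool {
        return c.isLetter || c.isNumber || c == "-" || c == "_"
    }

    private mutating func identifier() -> String {
        var s = ""
        while let c = peek(), isIdentifierChar(c) {
            s.append(c)
            pos += 1
        }
        return s
    }

    private mutating func skipWhitespace() {
        while let c = peek(), c.isWhitespace { pos += 1 }
    }

    // Same shape as parseSimpleSelector: tag#id.class1.class2
    private mutating func simpleSelector() -> SimpleSelector {
        var selector = SimpleSelector()
        loop: while let c = peek() {
            switch c {
            case "#": pos += 1; selector.id = identifier()
            case ".": pos += 1; selector.classes.append(identifier())
            case "*": pos += 1
            case _ where isIdentifierChar(c): selector.tagName = identifier()
            default: break loop
            }
        }
        return selector
    }

    // Parse selectors until something other than "," follows one of them.
    mutating func parseSelectors() -> [SimpleSelector] {
        var selectors: [SimpleSelector] = []
        while true {
            skipWhitespace()
            selectors.append(simpleSelector())
            skipWhitespace()
            guard let c = peek(), c == "," else { break } // "{" or end of input
            pos += 1 // skip the comma and parse the next selector
        }
        return selectors
    }
}

var parser = SelectorListParser("div.test.test2, #main {")
let selectors = parser.parseSelectors()
print(selectors.count)          // 2
print(selectors[0].classes)     // ["test", "test2"]
print(selectors[1].id ?? "nil") // main
```

The loop stops at the first character after a selector that is not a comma, which covers both the { that opens the declaration list and unexpected input.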
Property parsing
The syntax of a property declaration is fairly straightforward: the property name and property value are separated by : and the declaration ends with ;
property-name: property-value;
margin-top: 10px;
As usual, let’s first look at parsing a single declaration.
- Parse the property name, again following the valid-identifier rule above.
- Make sure the : separator is present.
- Parse the property value.
- Make sure the declaration ends with ;
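The four steps above can be sketched as a small standalone function. This is an illustration only: parseRawDeclaration, RawDeclaration, and DeclarationError are hypothetical names, the value is kept as raw text rather than converted into a Value, and errors are thrown where the article's code uses assert.

```swift
import Foundation

// Hypothetical result type: the value is left as raw text here.
struct RawDeclaration: Equatable {
    let name: String
    let value: String
}

enum DeclarationError: Error {
    case missingColon
    case missingSemicolon
}

func parseRawDeclaration(_ source: String) throws -> RawDeclaration {
    var rest = Substring(source)
    func skipWhitespace() {
        rest = rest.drop(while: { $0.isWhitespace })
    }

    skipWhitespace()
    // 1. Property name: a run of valid identifier characters.
    let name = rest.prefix(while: { $0.isLetter || $0.isNumber || $0 == "-" || $0 == "_" })
    rest = rest.dropFirst(name.count)
    skipWhitespace()

    // 2. The ":" separator must be present.
    guard rest.first == ":" else { throw DeclarationError.missingColon }
    rest = rest.dropFirst()
    skipWhitespace()

    // 3. Property value: everything up to the terminator.
    let value = rest.prefix(while: { $0 != ";" && $0 != "}" })
    rest = rest.dropFirst(value.count)

    // 4. The declaration must end with ";".
    guard rest.first == ";" else { throw DeclarationError.missingSemicolon }

    return RawDeclaration(
        name: String(name),
        value: value.trimmingCharacters(in: .whitespaces)
    )
}

let declaration = try? parseRawDeclaration("margin-top: 10px;")
print(declaration?.name ?? "nil")  // margin-top
print(declaration?.value ?? "nil") // 10px
```

Throwing instead of asserting keeps the checks active in optimized builds; whether that is preferable to the article's approach is a design choice.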
Property value parsing
This is a little more complicated, because the property value covers three cases.
1. Color value parsing
The color value starts with #, which is easy to distinguish. Next comes the RGBA value, written as 8 hexadecimal digits.
However, alpha is usually omitted, so the 6-digit form must also be supported, with alpha defaulting to fully opaque (255 in the code).
The idea is very intuitive: take two characters at a time and convert them from hexadecimal to a number.
- Take two characters and convert them to an integer.
mutating func parseHexPair() -> UInt8 {
    // Take 2 characters
    let s = self.sourceHelper.consumeNCharacter(count: 2)
    // Convert from hexadecimal to an integer; fall back to 0 if invalid
    let value = UInt8(s, radix: 16) ?? 0
    return value
}
- Extract R, G, and B one by one. If alpha is present, parse it too.
// Parse a color value; only the hexadecimal format starting with # is supported, e.g. #897722
mutating func parseColor() -> Value {
    // Consume outside of assert: assert conditions are not evaluated in optimized builds
    let hash = self.sourceHelper.consumeCharacter()
    assert(hash == "#")
    let r = parseHexPair()
    let g = parseHexPair()
    let b = parseHexPair()
    var a: UInt8 = 255
    // If there is alpha
    if self.sourceHelper.nextCharacter() != ";" {
        a = parseHexPair()
    }
    return Value.Color(r, g, b, a)
}
2. Length value parsing
width: 10px;
In this case, the property value = floating-point value + unit.
- First, parse the floating-point value. This is handled simply: any run of digits and dots is accepted, without strict validation.
// Parse a floating-point number
mutating func parseFloat() -> Float {
    let s = self.sourceHelper.consumeWhile { (c) -> Bool in
        c.isNumber || c == "."
    }
    // NSString requires Foundation; floatValue yields 0 for invalid input
    let floatValue = (s as NSString).floatValue
    return floatValue
}
- Then, parse the unit. Only px is supported.
// Parse the unit
mutating func parseUnit() -> Unit {
    let unit = parseIdentifier()
    if unit == "px" {
        return Unit.Px
    }
    // fatalError never returns, so no return value is needed on this path
    fatalError("Unexpected unit")
}
3. Keywords, that is, plain strings
Keywords are extracted using the valid-identifier rule.
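Choosing between the three cases comes down to looking at the first character of the value. The sketch below shows one way to do that dispatch on an already-extracted value string. It is an illustration under assumptions: parseValue is a hypothetical function, it returns nil for unsupported input instead of asserting, and Equatable conformance is added only so the results can be compared.

```swift
enum Unit: Equatable { case Px }

enum Value: Equatable {
    case Keyword(String)
    case Color(UInt8, UInt8, UInt8, UInt8)
    case Length(Float, Unit)
}

// Hypothetical dispatcher: picks the value kind from the first character.
func parseValue(_ text: String) -> Value? {
    guard let first = text.first else { return nil }

    // 1. Color: "#" followed by 6 (rgb) or 8 (rgba) hexadecimal digits.
    if first == "#" {
        let hex = Array(text.dropFirst())
        guard hex.count == 6 || hex.count == 8 else { return nil }
        var parts: [UInt8] = []
        for i in stride(from: 0, to: hex.count, by: 2) {
            guard let byte = UInt8(String(hex[i...i + 1]), radix: 16) else { return nil }
            parts.append(byte)
        }
        // Alpha defaults to fully opaque when omitted.
        return .Color(parts[0], parts[1], parts[2], parts.count == 4 ? parts[3] : 255)
    }

    // 2. Length: a number followed by the only supported unit, px.
    if first.isNumber {
        let number = text.prefix(while: { $0.isNumber || $0 == "." })
        let unit = String(text.dropFirst(number.count))
        guard let value = Float(String(number)), unit == "px" else { return nil }
        return .Length(value, .Px)
    }

    // 3. Anything else is treated as a keyword.
    return .Keyword(text)
}

print(parseValue("#ff908912") == .Color(0xff, 0x90, 0x89, 0x12)) // true
print(parseValue("10px") == .Length(10, .Px))                    // true
print(parseValue("center") == .Keyword("center"))                // true
print(parseValue("10em") == nil)                                 // true
```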
Declaration list parsing
Once a single declaration can be parsed, the declaration list is simple: the same routine, in a loop.
- Make sure the list begins with {
- When } is encountered, the declaration list is complete.
The process is as follows:
// Parse the declaration list
/** { margin-top: 10px; margin-bottom: 10px } */
mutating func parseDeclarations() -> [Declaration] {
    var declarations: [Declaration] = []
    // Must start with {
    // Consume outside of assert: assert conditions are not evaluated in optimized builds
    let open = self.sourceHelper.consumeCharacter()
    assert(open == "{")
    while true {
        self.sourceHelper.consumeWhitespace()
        // If } is encountered, the declaration list is complete
        if self.sourceHelper.nextCharacter() == "}" {
            _ = self.sourceHelper.consumeCharacter()
            break
        }
        // Parse a single declaration
        let declaration = parseDeclaration()
        declarations.append(declaration)
    }
    return declarations
}
Parsing rules
A single rule consists of a list of selectors plus a list of declarations, and parsing for both has been covered above. So to get a rule, we just combine the two.
mutating func parseRule() -> Rule {
    // Parse the selectors
    let selectors = parseSelectors()
    // Parse the declarations
    let declarations = parseDeclarations()
    return Rule(selectors: selectors, declarations: declarations)
}
Parsing the entire rule list is a loop that calls the parsing of a single rule.
// Parse CSS rules
mutating func parseRules() -> [Rule] {
    var rules: [Rule] = []
    // Parse rules in a loop
    while true {
        self.sourceHelper.consumeWhitespace()
        if self.sourceHelper.eof() {
            break
        }
        // Parse a single rule
        let rule = parseRule()
        rules.append(rule)
    }
    return rules
}
Generate style sheets
The stylesheet is made up of a list of rules, and you can simply wrap the list of rules parsed in the previous step into the stylesheet.
// Public entry point for parsing; returns the stylesheet
mutating public func parse(source: String) -> StyleSheet {
    self.sourceHelper.updateInput(input: source)
    let rules: [Rule] = parseRules()
    return StyleSheet(rules: rules)
}
Test code
let css = """
.test {
    padding: 0px;
    margin: 10px;
    position: absolute;
}
p {
    font-size: 10px;
    color: #ff908912;
}
"""
// Parse the CSS
var cssParser = CSSParser()
let styleSheet = cssParser.parse(source: css)
print(styleSheet)
You can test the above code to see the output.
The full code can be viewed at: github.com/silan-liu/t… .
Conclusion
In this tutorial, we focused on how to parse individual selectors, individual declarations, and individual rules, and how to combine them to parse the whole input and eventually generate a stylesheet.
All of these parts share a common way of thinking: from the whole to the part, and from the part back to the whole.
Breaking the overall parsing task down into individual goals makes the problem smaller. Focus on parsing a single item, then call that single-item parser in a loop to achieve the overall goal.
The next article will cover style tree generation. Stay tuned ~
References
- CSS rules: developer.mozilla.org/zh-CN/docs/…
- GitHub: github.com/silan-liu/t…