Throughout human history people have needed to communicate, so language and writing came into being; paper and printing were later invented so that people’s thoughts could be recorded and spread.
In iOS development, text handling is one of the most basic and common tasks. System controls such as UILabel, UITextField, and UITextView do a lot of work for us, so displaying a piece of text is easy. But when we want more deeply customized presentation, these controls may not meet the requirements, and we need to understand how the system handles text presentation.
What follows are notes I took while exploring how iOS handles text. They are mostly conceptual, intended to give you a general understanding of text handling on iOS and to show where to look when a specific need comes up. Topics include Unicode, UIFont, TextKit, CoreText, the Unicode bidirectional algorithm, and more.
NSString and Unicode
(1) History
A computer can’t process text directly; it only deals with numbers. To represent text numerically, a mapping from characters to numbers has to be specified. This mapping is called an encoding. The earliest widely used mapping was ASCII, but the number of characters it could represent was limited, so Unicode was later adopted.
(2) Unicode overview
Unicode can be seen as a unification of existing encoding systems, but it is not fully universal: some very old encoding systems are incompatible with it.
(3) Unicode features
- Unicode represents a character in an abstract way, without specifying how the character is rendered.
- Combining character sequences: some characters can be represented either by a single code point or by a sequence of several code points. The two forms look the same and mean the same, but they are not equal code point for code point; Unicode defines them as canonically equivalent.
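(4) Unicode transformation formats
UTF (Unicode Transformation Format) defines how code points are stored as code units. The common forms are UTF-8, UTF-16, and UTF-32.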
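To make the difference between the transformation formats concrete, here is a minimal sketch that asks NSString how many bytes the same text needs in each form. The helper name EncodedLengthSummary and the sample string are my own choices for illustration, not part of any API.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: summarizes how many bytes `s` occupies in each UTF form.
static NSString *EncodedLengthSummary(NSString *s) {
    NSUInteger utf8  = [s lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
    NSUInteger utf16 = [s lengthOfBytesUsingEncoding:NSUTF16StringEncoding];
    NSUInteger utf32 = [s lengthOfBytesUsingEncoding:NSUTF32StringEncoding];
    return [NSString stringWithFormat:@"UTF-8: %lu, UTF-16: %lu, UTF-32: %lu",
            (unsigned long)utf8, (unsigned long)utf16, (unsigned long)utf32];
}
```

For example, EncodedLengthSummary(@"A\u00E9\U0001F30D") covers three code points: an ASCII letter, an accented letter, and an emoji outside the Basic Multilingual Plane. UTF-8 uses 1, 2, and 4 bytes for them, UTF-16 uses 2, 2, and 4, and UTF-32 always uses 4.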
(5) NSString
The most important thing to remember about NSString is that it represents text as UTF-16, and its length, indexes, and ranges are all based on UTF-16 code units. This leads to some pitfalls if you’re not careful:
- Length
We often use the NSString length method to get the length of a string. In most cases this works fine, but when a string contains emoji, the returned value does not match the number of visible characters. Here’s an example:
NSString *s = @"\U0001F30D"; // earth globe emoji 🌍
NSLog(@"The length of %@ is %lu", s, (unsigned long)[s length]);
// => The length of 🌍 is 2
The following code gets the number of Unicode code points, which for this string is the actual character count:
NSUInteger realLength =
    [s lengthOfBytesUsingEncoding:NSUTF32StringEncoding] / 4;
NSLog(@"The real length of %@ is %lu", s, (unsigned long)realLength);
// => The real length of 🌍 is 1
- Random access
Accessing a unichar directly by index with the characterAtIndex: method has the same problem. Use rangeOfComposedCharacterSequenceAtIndex: to find out whether the unichar at a given index is part of a sequence that represents a single character (possibly made up of several code points). Do this whenever you pass a range of a string with unknown contents to another method, so that Unicode characters are not split down the middle.
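A minimal sketch of that advice follows. The two helper functions are hypothetical names of my own; the NSString methods they call are rangeOfComposedCharacterSequenceAtIndex: and its range-based sibling rangeOfComposedCharacterSequencesForRange:.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the whole composed character sequence that
// contains the UTF-16 code unit at `index`. `index` must be less than s.length.
static NSString *ComposedCharacterAtIndex(NSString *s, NSUInteger index) {
    NSRange range = [s rangeOfComposedCharacterSequenceAtIndex:index];
    return [s substringWithRange:range];
}

// Hypothetical helper: widens `range` so that it never cuts through a
// surrogate pair or a base character plus combining marks, then extracts it.
static NSString *SafeSubstring(NSString *s, NSRange range) {
    NSRange safeRange = [s rangeOfComposedCharacterSequencesForRange:range];
    return [s substringWithRange:safeRange];
}
```

With s = @"a\U0001F30Db", index 2 points at the second half of the emoji’s surrogate pair, and ComposedCharacterAtIndex(s, 2) still returns the whole emoji. SafeSubstring(s, NSMakeRange(0, 2)) returns the letter followed by the complete emoji rather than a dangling surrogate.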
- Enumeration
With rangeOfComposedCharacterSequenceAtIndex: you can write a routine that correctly walks through all the characters in a string, but doing so every time you need to traverse a string is inconvenient. Fortunately, NSString offers a better way: the enumerateSubstringsInRange:options:usingBlock: method. It abstracts away the Unicode details and lets you easily loop over the composed character sequences, words, lines, sentences, or paragraphs in a string. You can even add the NSStringEnumerationLocalized option so that the user’s locale is taken into account when determining word and sentence boundaries. To enumerate individual characters, pass NSStringEnumerationByComposedCharacterSequences as the option:
NSString *s = @"The weather on \U0001F30D is \U0001F31E today.";
// The weather on 🌍 is 🌞 today.
NSRange fullRange = NSMakeRange(0, [s length]);
[s enumerateSubstringsInRange:fullRange
                      options:NSStringEnumerationByComposedCharacterSequences
                   usingBlock:^(NSString *substring, NSRange substringRange,
                                NSRange enclosingRange, BOOL *stop) {
    NSLog(@"%@ %@", substring, NSStringFromRange(substringRange));
}];
- Comparison
Some characters can consist of a single code point or of several code points. Although the two forms have the same appearance and meaning, they are not equal as sequences of code points. Both isEqual: and isEqualToString: compare strings literally, byte by byte. If you want the composed and decomposed forms of a string to match, you have to normalize them yourself:
NSString *s = @"\u00E9"; // é
NSString *t = @"e\u0301"; // e + ´
BOOL isEqual = [s isEqualToString:t];
NSLog(@"%@ is %@ to %@", s, isEqual ? @"equal" : @"not equal", t);
// => é is not equal to é

// Normalizing to form C
NSString *sNorm = [s precomposedStringWithCanonicalMapping];
NSString *tNorm = [t precomposedStringWithCanonicalMapping];
BOOL isEqualNorm = [sNorm isEqualToString:tNorm];
NSLog(@"%@ is %@ to %@", sNorm, isEqualNorm ? @"equal" : @"not equal", tNorm);
// => é is equal to é

// Second example (variables renamed so both snippets compile in one scope)
NSString *plain = @"ff"; // ff
NSString *ligature = @"\uFB00"; // ﬀ ligature
NSComparisonResult result = [plain localizedCompare:ligature];
NSLog(@"%@ is %@ to %@", plain, result == NSOrderedSame ? @"equal" : @"not equal", ligature);
// => ff is equal to ﬀ
TextKit
TextKit was introduced in iOS 7 to give developers more control over text customization.
Text Kit is a set of classes and protocols that provide high-quality typographical services which enable apps to store, lay out, and display text with all the characteristics of fine typesetting, such as kerning, ligatures, line breaking, and justification.
There are several key components in Text Kit, as shown below.
NSLayoutManager lays out the content stored in an NSTextStorage into text views (UITextView), within the areas defined by NSTextContainer objects.
In MVC terms, NSTextStorage and NSTextContainer correspond to the model, the text view to the view, and NSLayoutManager to the controller.
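The relationship between these components can be sketched by wiring the stack together by hand. This is only an illustration under my own assumptions: the function name MakeTextView is hypothetical, and the caller is assumed to keep a strong reference to the text storage for as long as the view is in use.

```objc
#import <UIKit/UIKit.h>

// Hypothetical helper: builds storage -> layout manager -> container -> view.
// The caller keeps ownership of `storage`.
static UITextView *MakeTextView(NSTextStorage *storage, CGRect frame) {
    // The layout manager turns characters from the storage into laid-out glyphs.
    NSLayoutManager *layoutManager = [[NSLayoutManager alloc] init];
    [storage addLayoutManager:layoutManager];

    // The container describes the area the text may occupy.
    NSTextContainer *container = [[NSTextContainer alloc] initWithSize:frame.size];
    [layoutManager addTextContainer:container];

    // The text view displays what the layout manager laid out in the container.
    return [[UITextView alloc] initWithFrame:frame textContainer:container];
}
```

A typical call would create the storage with [[NSTextStorage alloc] initWithString:@"Hello, Text Kit"] and pass it in together with the desired frame. Changing the storage later is then reflected in the view through the layout manager.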
An NSLayoutManager object orchestrates the operation of the other text handling objects. It intercedes in operations that convert the data in an NSTextStorage object to rendered text in a view’s display area. It maps Unicode character codes to glyphs and oversees the layout of the glyphs within the areas defined by NSTextContainer objects.
In short, NSLayoutManager converts Unicode characters into glyphs and lays them out within the areas defined by NSTextContainer objects.
The layout manager performs the following actions:
- Controls text storage and text container objects
- Generates glyphs from characters
- Computes glyph locations and stores the information
- Manages ranges of glyphs and characters
- Draws glyphs in text views when requested by the view
- Computes bounding box rectangles for lines of text
- Controls hyphenation
- Manipulates character attributes and glyph properties
Text Kit handles three kinds of text attributes: character attributes, paragraph attributes, and document attributes.
- Character attributes include traits such as font, color, and subscript, which can be associated with an individual character or a range of characters.
- Paragraph attributes are traits such as indentation, tabs, and line spacing.
- Document attributes include document-wide traits such as paper size, margins, and view zoom percentage.
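As a small illustration of character and paragraph attributes, the sketch below builds an attributed string that could be handed to a text view or text storage. The function name MakeStyledText and the concrete values (font size, colors, indentation) are arbitrary choices of mine; document attributes are not covered here.

```objc
#import <UIKit/UIKit.h>

// Hypothetical helper: applies paragraph attributes to the whole text and
// an extra character attribute to its first character.
static NSAttributedString *MakeStyledText(NSString *text) {
    // Paragraph attributes: indentation and line spacing.
    NSMutableParagraphStyle *paragraph = [[NSMutableParagraphStyle alloc] init];
    paragraph.firstLineHeadIndent = 20.0;
    paragraph.lineSpacing = 4.0;

    // Character attributes (font, color) plus the paragraph style.
    NSDictionary *attributes = @{
        NSFontAttributeName: [UIFont systemFontOfSize:15.0],
        NSForegroundColorAttributeName: [UIColor darkGrayColor],
        NSParagraphStyleAttributeName: paragraph,
    };
    NSMutableAttributedString *result =
        [[NSMutableAttributedString alloc] initWithString:text attributes:attributes];

    // A character attribute can also cover just a sub-range, here the first
    // composed character so that an emoji or accented letter is not split.
    if (text.length > 0) {
        NSRange first = [text rangeOfComposedCharacterSequenceAtIndex:0];
        [result addAttribute:NSForegroundColorAttributeName
                       value:[UIColor redColor]
                       range:first];
    }
    return result;
}
```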
Here are some common uses of Text Kit,
UIFont
We can change the rendering style of characters on the page by setting different fonts. UIFont has some metrics, as shown below.
All of this information can be read from UIFont. Their correspondence is as follows.
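Each metric is exposed as a read-only property on UIFont, so one way to see how they relate is to read them off a concrete font. The helper name FontMetrics is hypothetical; the properties it reads are UIFont’s own.

```objc
#import <UIKit/UIKit.h>

// Hypothetical helper: collects the metrics of `font` into a dictionary.
static NSDictionary<NSString *, NSNumber *> *FontMetrics(UIFont *font) {
    return @{
        @"pointSize":  @(font.pointSize),   // nominal size of the font
        @"ascender":   @(font.ascender),    // highest point above the baseline
        @"descender":  @(font.descender),   // lowest point below the baseline (negative)
        @"capHeight":  @(font.capHeight),   // height of capital letters
        @"xHeight":    @(font.xHeight),     // height of a lowercase x
        @"lineHeight": @(font.lineHeight),  // height of one line of text
        @"leading":    @(font.leading),     // extra spacing between lines
    };
}
```

Logging FontMetrics([UIFont systemFontOfSize:13]) with NSLog prints the values for the system font at that size.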
UIFont metrics have concrete uses. For example, suppose a field should display at most 6 lines of text. With UILabel we can simply set the numberOfLines property; without UILabel, we can use the lineHeight property of UIFont, as follows:
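+ (CGFloat)calculateContentHeight:(NSString *)content {
    if (content.length == 0) {
        return 0;
    }
    UIFont *font = [UIFont systemFontOfSize:13];
    // SCREEN_WIDTH is assumed to be a project-level macro for the screen width;
    // 30 is the total horizontal padding.
    CGFloat maxWidth = SCREEN_WIDTH - 30;
    // Cap the height at 6 lines of text.
    CGFloat maxHeight = ceil(font.lineHeight) * 6;
    // sizeWithFont:constrainedToSize:lineBreakMode: is deprecated since iOS 7,
    // so measure with boundingRectWithSize:options:attributes:context: instead.
    CGRect rect = [content boundingRectWithSize:CGSizeMake(maxWidth, CGFLOAT_MAX)
                                        options:NSStringDrawingUsesLineFragmentOrigin
                                     attributes:@{NSFontAttributeName: font}
                                        context:nil];
    return MIN(maxHeight, ceil(rect.size.height));
}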
CoreText
CTFramesetter generates CTFrames, and each CTFrame represents a paragraph. A CTFrame may consist of a single long CTLine or of several CTLines, where each CTLine represents one line of text.
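The framesetter, frame, and line relationship can be sketched as follows. The function name LinesOfText is hypothetical, and the sketch assumes the whole text fits into the given rectangle; text that does not fit is simply left out of the frame.

```objc
#import <Foundation/Foundation.h>
#import <CoreText/CoreText.h>

// Hypothetical helper: lays `text` out in a rectangle of `size` and returns
// the substring covered by each CTLine, in order.
static NSArray<NSString *> *LinesOfText(NSAttributedString *text, CGSize size) {
    CTFramesetterRef framesetter =
        CTFramesetterCreateWithAttributedString((__bridge CFAttributedStringRef)text);
    CGPathRef path = CGPathCreateWithRect(CGRectMake(0, 0, size.width, size.height), NULL);

    // A range of length 0 asks the framesetter to lay out as much text as fits.
    CTFrameRef frame = CTFramesetterCreateFrame(framesetter, CFRangeMake(0, 0), path, NULL);

    NSMutableArray<NSString *> *result = [NSMutableArray array];
    CFArrayRef lines = CTFrameGetLines(frame);
    CFIndex count = CFArrayGetCount(lines);
    for (CFIndex i = 0; i < count; i++) {
        CTLineRef line = (CTLineRef)CFArrayGetValueAtIndex(lines, i);
        CFRange range = CTLineGetStringRange(line);
        NSRange nsRange = NSMakeRange((NSUInteger)range.location, (NSUInteger)range.length);
        [result addObject:[text.string substringWithRange:nsRange]];
    }

    // Core Foundation objects created here are not managed by ARC.
    CFRelease(frame);
    CGPathRelease(path);
    CFRelease(framesetter);
    return result;
}
```

Drawing is a separate step (CTFrameDraw into a graphics context); this sketch only inspects how the text was broken into lines.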
Unicode bidirectional algorithm
Bidirectional text is text that contains both writing directions, that is, left-to-right and right-to-left text at the same time. By default the overall direction is determined by the Unicode directional property of the first strongly directional character.
Writing direction is a property of the script, not of the language. A language may be written in more than one script; for example, Latin script, as used for English, runs left to right, while Arabic script runs right to left.
Unicode also defines directional control characters. When added to text they are invisible on screen and take up no display space; they only influence how bidirectional text is displayed.
These control characters can be divided into two categories.
The first category is the implicit bidirectional control characters:
- U+200E: LEFT-TO-RIGHT MARK (LRM)
- U+200F: RIGHT-TO-LEFT MARK (RLM)
In simple terms, you can think of these control characters as invisible strong characters: LRM is a strong left-to-right character and RLM is a strong right-to-left character.
The second category is the explicit bidirectional control characters:
- U+202A: LEFT-TO-RIGHT EMBEDDING (LRE)
- U+202B: RIGHT-TO-LEFT EMBEDDING (RLE)
- U+202D: LEFT-TO-RIGHT OVERRIDE (LRO)
- U+202E: RIGHT-TO-LEFT OVERRIDE (RLO)
- U+202C: POP DIRECTIONAL FORMATTING (PDF)
These control characters must be used in pairs: the first four characters in the list are start characters, and the last one is the end character. When the bidirectional algorithm encounters LRE, the text fragment that follows is treated as a left-to-right embedding; when it encounters RLE, the fragment is treated as a right-to-left embedding. When it encounters LRO, it treats all the following text as strong left-to-right characters; when it encounters RLO, it treats all the following text as strong right-to-left characters. When a PDF character is encountered, the bidirectional state reverts to what it was before the last LRE, RLE, LRO, or RLO.
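In code, these control characters are ordinary characters that can be placed in an NSString. The sketch below shows one possible way to pair a start character with PDF and to append an implicit mark; the function names are hypothetical and only the code points come from the lists above.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: wraps `fragment` in RLE (U+202B) ... PDF (U+202C),
// so the fragment is laid out as a right-to-left embedding.
static NSString *EmbedRightToLeft(NSString *fragment) {
    return [NSString stringWithFormat:@"\u202B%@\u202C", fragment];
}

// Hypothetical helper: wraps `fragment` in LRO (U+202D) ... PDF (U+202C),
// forcing every character inside to be treated as strong left-to-right.
static NSString *OverrideLeftToRight(NSString *fragment) {
    return [NSString stringWithFormat:@"\u202D%@\u202C", fragment];
}

// Hypothetical helper: appends an invisible LRM (U+200E), which acts as a
// strong left-to-right character after `fragment`.
static NSString *AppendLeftToRightMark(NSString *fragment) {
    return [fragment stringByAppendingString:@"\u200E"];
}
```

The control characters count toward length like any other UTF-16 code unit, so EmbedRightToLeft(@"abc") has a length of 5 even though only three characters are visible. Keep this in mind when computing ranges on such strings.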
More resources
www.ibm.com/developerwo… www.iamcal.com/understandi…
Conclusion
The above gives an overview of the concepts involved in how iOS handles text. No part is explained in depth; the aim is to give you an overall picture. At the end of each part I have included the reference materials I found most useful while researching, so that you can dig deeper.
If you found this article helpful, please click “like” to support me.
Please credit the source when reprinting. If you have any questions, feel free to contact me to discuss. WeChat ID: Xieguobihaha