My blog: mengkang.net/1039.html. My live stream: segmentfault.com/ls/1…

Run the following code:



$tag = "Internet products,";
$text = rtrim($tag, "、");
print_r($text);Copy the code

What we might have expected was an Internet product, but it turned out to be an Internet product. Why is that?

Background

PHP's multi-byte string functions carry the mb_ prefix: php.net/manual/zh/r…

For example:



$str = "abcd";
print_r(strlen($str)."\n"); // 4
print_r(mb_strlen($str)."\n"); // 4

$str = "周梦康";
print_r(strlen($str)."\n"); // 9
print_r(mb_strlen($str)."\n"); // 3

Functions in the mb_ family work at the granularity of whole characters, each of which may be made up of several bytes. Functions without the mb_ prefix work on the raw bytes.
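The same difference shows up when cutting a string. The following is a minimal sketch of my own, assuming the script is saved as UTF-8 and the mbstring extension is loaded; bin2hex is only there to make the bytes visible:

```php
<?php
// Assumption: this file is encoded as UTF-8 and mbstring is available.
$str = "周梦康";

// substr counts bytes: it returns only the first byte of 周 (e5 91 a8).
echo bin2hex(substr($str, 0, 1)), "\n";             // e5

// mb_substr counts characters: it returns all three bytes of 周.
echo bin2hex(mb_substr($str, 0, 1, "UTF-8")), "\n"; // e591a8
```

A single byte of a three-byte character is not valid UTF-8 on its own, which is why byte-oriented functions can produce garbled output.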

How it works

The trim function is documented as:



string trim ( string $str [, string $character_mask = " \t\n\r\0\x0B"])

The function works byte by byte: $character_mask is treated as a list of bytes, and every leading or trailing byte of $str that appears in that list is removed. For example:



echo ltrim("bcdf", "abc"); // df

Using the string_print_char function from the demo below, we can see that 、 consists of the three bytes 0xe3 0x80 0x81, and 品 consists of the three bytes 0xe5 0x93 0x81. Because rtrim compares byte by byte, the trailing 0x81 of 品 is also in the mask and gets removed. What remains is the incomplete sequence 0xe5 0x93, which displays as a garbled character.
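The same byte values can also be checked from PHP itself. This is a minimal sketch, assuming the script is saved as UTF-8; the shortened string "品、" is my own choice to keep the output readable:

```php
<?php
// Assumption: this file is encoded as UTF-8.
echo bin2hex("、"), "\n"; // e38081
echo bin2hex("品"), "\n"; // e59381

// rtrim removes the bytes e3, 80 and 81 from the right, so it also eats
// the last byte of 品 and leaves the incomplete sequence e5 93 behind.
echo bin2hex(rtrim("品、", "、")), "\n"; // e593
```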

Exploring the source code

After reading the PHP 7 source code, I extracted the following small demo so that we can study it together. Learning the PHP source is really not that hard; just make a little progress every day.



//
// main.c
// trim
//
// Created by Zhou Mengkang on 2017/10/18
// Copyright © 2017 Zhou Mengkang. All rights reserved.
//

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void string_print_char(char *str);
void php_charmask(unsigned char *input, size_t len, char *mask);
char *ltrim(char *str,char *character_mask);
char *rtrim(char *str,char *character_mask);


int main(int argc, char const *argv[])
{
    printf("%s\n",ltrim("bcdf","abc"));
    
    string_print_char("品"); // e5 93 81
    string_print_char("、"); // e3 80 81
    
    printf("%s\n",rtrim("互联网产品、","、"));
    
    
    return 0;
}

char *ltrim(char *str,char *character_mask)
{
    char *res;
    char mask[256];
    register size_t i;
    int trimmed = 0;
    
    size_t len = strlen(str);
    
    php_charmask((unsigned char*)character_mask, strlen(character_mask), mask);
    
    for (i = 0; i < len; i++) {
        if (mask[(unsigned char)str[i]]) {
            trimmed++;
        } else {
            break;
        }
    }
    
    len -= trimmed;
    str += trimmed;
    
    res = (char *) malloc(sizeof(char) * (len+1));
    memcpy(res,str,len);
    res[len] = '\0'; // terminate the copy so printf("%s") is safe
    
    return res;
}

char *rtrim(char *str,char *character_mask)
{
    char *res;
    char mask[256];
    register size_t i;
    
    size_t len = strlen(str);
    
    php_charmask((unsigned char*)character_mask, strlen(character_mask), mask);
    
    if (len > 0) {
        i = len - 1;
        do {
            if (mask[(unsigned char)str[i]]) {
                len--;
            } else {
                break;
            }
        } while (i-- != 0);
    }
    
    res = (char *) malloc(sizeof(char) * (len+1));
    memcpy(res,str,len);
    res[len] = '\0'; // terminate the copy so printf("%s") is safe
    
    return res;
}

void string_print_char(char *str)
{
    unsigned long l = strlen(str);
    
    for (int i=0; i < l; i++) {
        printf("%02hhx\t",str[i]);
    }
    
    printf("\n");
}

void php_charmask(unsigned char *input, size_t len, char *mask)
{
    unsigned char *end;
    unsigned char c;
    
    memset(mask, 0, 256);
    
    for (end = input+len; input < end; input++) {
        c = *input;
        mask[c] = 1;
    }
}

If the demo is not clear enough, copy it and run it yourself. If your C is a little rusty, don't worry: I plan to write a series of introductory articles on learning C for PHP developers.

The solution

So let's implement the same logic with PHP's own multi-byte functions:



function mb_rtrim($string, $trim, $encoding)
{

    $mask = [];
    $trimLength = mb_strlen($trim, $encoding);
    for ($i = 0; $i < $trimLength; $i++) {
        $item = mb_substr($trim, $i, 1, $encoding);
        $mask[] = $item;
    }

    $len = mb_strlen($string, $encoding);
    if ($len > 0) {
        $i = $len - 1;
        do {
            $item = mb_substr($string, $i, 1, $encoding);
            if (in_array($item, $mask)) {
                $len--;
            } else {
                break;
            }
        } while ($i-- != 0);
    }

    return mb_substr($string, 0, $len, $encoding);
}

mb_internal_encoding("UTF-8");
$tag = "互联网产品、";
$encoding = mb_internal_encoding();
print_r(mb_rtrim($tag, "、", $encoding)); // 互联网产品

Of course, you can also do this with a regular expression. Having worked through the functions above, do you now see the difference between single-byte and multi-byte functions?
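One way the regular-expression approach could look is sketched below. The helper name rtrim_utf8 is hypothetical (it is not part of PHP or of the code above), and the sketch assumes both strings are valid UTF-8:

```php
<?php
// Hypothetical helper: removes every trailing character of $string that
// appears in $chars, treating both arguments as UTF-8 text.
function rtrim_utf8(string $string, string $chars): string
{
    if ($chars === '') {
        return $string; // nothing to trim, and "[]" would be an invalid class
    }
    // preg_quote escapes regex metacharacters; '/' is our delimiter.
    $class = preg_quote($chars, '/');
    // The u modifier makes PCRE match whole UTF-8 characters, not bytes;
    // \z anchors the match at the very end of the string.
    $result = preg_replace('/[' . $class . ']+\z/u', '', $string);
    // preg_replace returns null on failure (e.g. invalid UTF-8 input).
    return $result ?? $string;
}

echo rtrim_utf8("互联网产品、", "、"), "\n"; // 互联网产品
```

Because the pattern is compiled with the u modifier, the character class holds the single character 、 rather than its three bytes, so the trailing 0x81 of 品 is never touched.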

Related PHP 7 source code



PHP_FUNCTION(trim)
{
    php_do_trim(INTERNAL_FUNCTION_PARAM_PASSTHRU, 3);
}
PHP_FUNCTION(rtrim)
{
    php_do_trim(INTERNAL_FUNCTION_PARAM_PASSTHRU, 2);
}
PHP_FUNCTION(ltrim)
{
    php_do_trim(INTERNAL_FUNCTION_PARAM_PASSTHRU, 1);
}


static void php_do_trim(INTERNAL_FUNCTION_PARAMETERS, int mode)
{
    zend_string *str;
    zend_string *what = NULL;

    ZEND_PARSE_PARAMETERS_START(1, 2)
        Z_PARAM_STR(str)
        Z_PARAM_OPTIONAL
        Z_PARAM_STR(what)
    ZEND_PARSE_PARAMETERS_END();

    ZVAL_STR(return_value, php_trim(str, (what ? ZSTR_VAL(what) : NULL), (what ? ZSTR_LEN(what) : 0), mode));
}


PHPAPI zend_string *php_trim(zend_string *str, char *what, size_t what_len, int mode)
{
    const char *c = ZSTR_VAL(str);
    size_t len = ZSTR_LEN(str);
    register size_t i;
    size_t trimmed = 0;
    char mask[256];

    if (what) {
        if (what_len == 1) {
            char p = *what;
            if (mode & 1) {
                for (i = 0; i < len; i++) {
                    if (c[i] == p) {
                        trimmed++;
                    } else {
                        break;
                    }
                }
                len -= trimmed;
                c += trimmed;
            }
            if (mode & 2) {
                if (len > 0) {
                    i = len - 1;
                    do {
                        if (c[i] == p) {
                            len--;
                        } else {
                            break;
                        }
                    } while (i-- != 0);
                }
            }
        } else {
            php_charmask((unsigned char*)what, what_len, mask);

            if (mode & 1) {
                for (i = 0; i < len; i++) {
                    if (mask[(unsigned char)c[i]]) {
                        trimmed++;
                    } else {
                        break;
                    }
                }
                len -= trimmed;
                c += trimmed;
            }
            if (mode & 2) {
                if (len > 0) {
                    i = len - 1;
                    do {
                        if (mask[(unsigned char)c[i]]) {
                            len--;
                        } else {
                            break;
                        }
                    } while (i-- != 0);
                }
            }
        }
    } else {
        if (mode & 1) {
            for (i = 0; i < len; i++) {
                if ((unsigned char)c[i] <= ' ' &&
                    (c[i] == ' ' || c[i] == '\n' || c[i] == '\r' || c[i] == '\t' || c[i] == '\v' || c[i] == '\0')) {
                    trimmed++;
                } else {
                    break;
                }
            }
            len -= trimmed;
            c += trimmed;
        }
        if (mode & 2) {
            if (len > 0) {
                i = len - 1;
                do {
                    if ((unsigned char)c[i] <= ' ' &&
                        (c[i] == ' ' || c[i] == '\n' || c[i] == '\r' || c[i] == '\t' || c[i] == '\v' || c[i] == '\0')) {
                        len--;
                    } else {
                        break;
                    }
                } while (i-- != 0);
            }
        }
    }

    if (ZSTR_LEN(str) == len) {
        return zend_string_copy(str);
    } else {
        return zend_string_init(c, len, 0);
    }
}


/* {{{ php_charmask
 * Fills a 256-byte bytemask with input. You can specify a range like 'a..z',
 * it needs to be incrementing.
 * Returns: FAILURE/SUCCESS whether the input was correct (i.e. no range errors)
 */
static inline int php_charmask(unsigned char *input, size_t len, char *mask)
{
    unsigned char *end;
    unsigned char c;
    int result = SUCCESS;

    memset(mask, 0, 256);
    for (end = input+len; input < end; input++) {
        c=*input;
        if ((input+3 < end) && input[1] == '.' && input[2] == '.' && input[3] >= c) {
            memset(mask+c, 1, input[3] - c + 1);
            input+=3;
        } else if ((input+1 < end) && input[0] == '.' && input[1] == '.') {
            /* Error, try to be as helpful as possible:
               (a range ending/starting with '.' won't be captured here) */
            if (end-len >= input) { /* there was no 'left' char */
                php_error_docref(NULL, E_WARNING, "Invalid '..'-range, no character to the left of '..'");
                result = FAILURE;
                continue;
            }
            if (input+2 >= end) { /* there is no 'right' char */
                php_error_docref(NULL, E_WARNING, "Invalid '..'-range, no character to the right of '..'");
                result = FAILURE;
                continue;
            }
            if (input[-1] > input[2]) { /* wrong order */
                php_error_docref(NULL, E_WARNING, "Invalid '..'-range, '..'-range needs to be incrementing");
                result = FAILURE;
                continue;
            }
            /* FIXME: better error (a..b..c is the only left possibility?) */
            php_error_docref(NULL, E_WARNING, "Invalid '..'-range");
            result = FAILURE;
            continue;
        } else {
            mask[c]=1;
        }
    }
    return result;
}
/* }}} */
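The comment on php_charmask mentions ranges such as 'a..z'. Here is a small sketch of what that means from the PHP side; the sample strings are my own:

```php
<?php
// "0..9" is expanded by php_charmask into the ten bytes '0' through '9'.
echo rtrim("abc123", "0..9"), "\n";   // abc

// "x..z" covers the bytes 'x', 'y' and 'z'; 'H' is outside the range.
echo ltrim("xyzHello", "x..z"), "\n"; // Hello
```

Note that these ranges are also byte ranges, so they do not help with multi-byte characters such as 、.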