To build a String containing dynamically changing integers, we usually reach for the String.format method — but used carelessly, it may not produce the digit characters you expect.

It started with an online crash. The crash stack showed that one of my SQL statements had a syntax error and failed when executed. The SQL is generated roughly like this:

int i = 0;
String querySql = String.format("select * from table1 where id = %d", i);

There is no syntax problem here, and it runs locally without a hitch. But the log told a different story: in the formatted SQL, %d should have been replaced by the decimal string for i, yet it came out as garbage characters — hence the syntax error. Checking string formatting elsewhere showed that only %d conversions were broken; plain strings and %s conversions were fine. So String.format was doing something unreliable when converting numbers.
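The symptom can be reproduced by forcing the locale instead of relying on the device default. This is a minimal sketch: whether the Arabic locale actually yields Arabic-Indic digits depends on the JDK's locale data, so only the Locale.US output is guaranteed.

```java
import java.util.Locale;

public class FormatBugDemo {
    public static void main(String[] args) {
        int i = 0;
        // Pinned to Locale.US, the %d digits are always ASCII:
        String us = String.format(Locale.US, "select * from table1 where id = %d", i);
        System.out.println(us); // select * from table1 where id = 0

        // Under an Arabic locale, the %d digits may come out as
        // Arabic-Indic characters (depends on the JDK's locale data):
        String ar = String.format(new Locale("ar"), "select * from table1 where id = %d", i);
        System.out.println(ar);
    }
}
```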

If a JDK method were truly unreliable, it would have been used millions of times before and someone would surely have filed an issue — so I searched StackOverflow, and found nothing. My next guess was that some user devices had a character set incompatible with ASCII that mapped digits to other characters. Colleagues in the group quickly shot that down: no character set standard in the world is silly enough to be incompatible with ASCII.

Okay, stop guessing, “Read the fuck source code.”

The source of String#format is about as simple as you can imagine. It splits the pattern string into an array whose elements are either plain strings or format specifiers beginning with '%', then iterates over the array, replaces each specifier with its target value, and concatenates everything back into a string. Since only the integer conversion failed, I focused on the integer path, where one piece of code looked a little odd:

char c = value[j];
sb.append((char) ((c - '0') + zero));

value is the array of ASCII digit characters of the integer — e.g. [50, 49] for the integer 21, i.e. the characters '2' and '1'. Appending that array to the StringBuilder directly would have been fine, but instead there is this (char) ((c - '0') + zero) conversion: take the target character c, subtract the character '0', and add the character zero. This step looked like the culprit. Let's look at zero.
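The effect of that shift can be reproduced in isolation. In this sketch the Arabic-Indic zero (U+0660) is hard-coded for the demo rather than taken from locale data:

```java
public class DigitShiftDemo {
    public static void main(String[] args) {
        char[] value = {'2', '1'};      // ASCII digits of 21, i.e. [50, 49]
        char asciiZero = '0';           // the zero digit for Locale.US
        char arabicZero = '\u0660';     // ARABIC-INDIC DIGIT ZERO, hard-coded here

        StringBuilder us = new StringBuilder();
        StringBuilder ar = new StringBuilder();
        for (char c : value) {
            // the JDK's conversion: shift each digit into the locale's digit
            // block by swapping the ASCII base '0' for the locale's zero
            us.append((char) ((c - '0') + asciiZero));
            ar.append((char) ((c - '0') + arabicZero));
        }
        System.out.println(us);         // 21
        System.out.println(ar);         // U+0662 U+0661 — "21" in Arabic-Indic digits
    }
}
```

Because Unicode lays out each script's digits zero-through-nine contiguously, this single addition is enough to localize any decimal digit.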

char zero = getZero(l); // we called format without specifying a locale, so l = Locale.getDefault()

Now look at the getZero method:

private char getZero(Locale l) {
    if ((l != null) && !l.equals(locale())) {
        DecimalFormatSymbols dfs = DecimalFormatSymbols.getInstance(l);
        return dfs.getZeroDigit();
    }
    return zero;
}

Because the locale() method returns exactly the locale we passed in, the if branch is not taken and the method simply returns the class field zero. That field is initialized in the constructor by calling a static method:

private static char getZero(Locale l) {
    if ((l != null) && !l.equals(Locale.US)) {
        DecimalFormatSymbols dfs = DecimalFormatSymbols.getInstance(l);
        return dfs.getZeroDigit();
    } else {
        return '0';
    }
}

So unless the locale is US, we cannot escape the DecimalFormatSymbols in that if block. Instantiating this class is very simple: it initializes a set of fixed values based on the locale passed in — the decimal separator, the grouping separator, the percent sign, and, most importantly, zeroDigit:
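You can inspect those per-locale symbols directly. The sketch below prints the zero digit and separators for a few locales; the exact values for the Arabic and Bengali locales depend on the JDK's locale data (CLDR vs the legacy JRE tables), so they are not hard-coded here.

```java
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class SymbolsDemo {
    public static void main(String[] args) {
        // Exact output for "ar" and "bn" depends on the JDK's locale data
        for (Locale l : new Locale[] {Locale.US, new Locale("ar"), new Locale("bn")}) {
            DecimalFormatSymbols dfs = DecimalFormatSymbols.getInstance(l);
            System.out.printf("%s: zero=U+%04X decimalSep='%c' groupingSep='%c'%n",
                    l, (int) dfs.getZeroDigit(),
                    dfs.getDecimalSeparator(), dfs.getGroupingSeparator());
        }
    }
}
```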

/**
 * Gets the character used for zero. Different for Arabic, etc.
 */
public char getZeroDigit() {
    return zeroDigit;
}

/**
 * Sets the character used for zero. Different for Arabic, etc.
 */
public void setZeroDigit(char zeroDigit) {
    this.zeroDigit = zeroDigit;
    cachedIcuDFS = null;
}

The method bodies don't matter; the important clue is in the comments: the character used for zero is "different for Arabic, etc." Sure enough, on the affected device, String.format("%d", 0).toCharArray() yields a single char with code 1632, and String.valueOf((char) 1632) renders as a thick dot-like character '٠' — which is zero in Arabic-Indic digits. Google it, and sure enough:
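You can confirm what code point 1632 is straight from the JDK, without any locale involved:

```java
public class ZeroDigitCheck {
    public static void main(String[] args) {
        char c = (char) 1632;                        // 1632 == 0x0660
        System.out.println(Integer.toHexString(c));  // 660
        System.out.println(Character.getName(c));    // ARABIC-INDIC DIGIT ZERO
        System.out.println(Character.digit(c, 10));  // 0 — the JDK still parses it as zero
    }
}
```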


So, (char) ((c - '0') + zero) is a very simple shift: String.format converts the number not into the ASCII digits '0' and '1', but into the locale's own zero and one — under Arabic locales, '٠' and '١'. Fortunately, the Chinese locale uses the Western digits 123, so %d does not come out as 一, 二, 三.

It's that simple. The fix is either to pass an explicit locale such as Locale.US when calling format, or to stop using %d for integers and use %s instead, which goes through the number's toString and always yields ASCII digits.
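Both fixes side by side — here using Locale.US; Locale.ROOT would work equally well for the first one:

```java
import java.util.Locale;

public class FormatFix {
    public static void main(String[] args) {
        int i = 21;
        // Fix 1: pin the locale so %d always emits ASCII digits
        String sql1 = String.format(Locale.US, "select * from table1 where id = %d", i);
        // Fix 2: use %s — it formats via Integer.toString, which is locale-independent
        String sql2 = String.format("select * from table1 where id = %s", i);
        System.out.println(sql1);   // select * from table1 where id = 21
        System.out.println(sql2);   // select * from table1 where id = 21
    }
}
```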

PS: Bengali has the same problem — the Bengali zero is code point 2534 (U+09E6).