Mandatory string operations

We classify string operations into the following six categories:

  1. Basic operations
  2. Null checks and comparison
  3. Substring extraction and splitting
  4. Search and replace
  5. Converting strings to other data types
  6. Concatenation and formatting

Today we’ll cover part 3.2: splitting strings.

Native method for string splitting

Native operation for string splitting

// Native string split method:
public String[] split(String regex)

Problem: how do you split a string on the "|" separator?

Is calling str.split("|") directly OK? It compiles and runs, but is the result what you want?

String str = "a.|b.";
// *** Result of the native String.split("|") ***
// [a, ., |, b, .]

Answer: if you only want to split on "|", the correct call is str.split("\\|"), because the argument is a regular expression and "|" is one of those annoying regex metacharacters.

String str = "a.|b."; // split on "|"
System.out.println("*** native String.split method *****");
if (StringUtils.isNotEmpty(str)) {
    String[] s1 = str.split("\\|");
    System.out.println(Arrays.toString(s1));
}
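To make the difference concrete, here is a minimal JDK-only sketch that compares the unescaped call, the escaped call, and Pattern.quote, which is another standard way to make a separator literal. The class name PipeSplitDemo and its helper methods are made up for illustration; the printed results assume Java 8 or later.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; only String.split and Pattern.quote are JDK APIs.
class PipeSplitDemo {

    // "|" is the regex alternation operator, so this pattern matches the empty
    // string between characters instead of the literal pipe.
    static String[] unescaped(String s) {
        return s.split("|");
    }

    // The Java literal "\\|" is the regex \| and matches a literal pipe.
    static String[] escaped(String s) {
        return s.split("\\|");
    }

    // Pattern.quote makes every character of the separator literal.
    static String[] quoted(String s) {
        return s.split(Pattern.quote("|"));
    }

    public static void main(String[] args) {
        String str = "a.|b.";
        System.out.println(Arrays.toString(unescaped(str))); // [a, ., |, b, .]
        System.out.println(Arrays.toString(escaped(str)));   // [a., b.]
        System.out.println(Arrays.toString(quoted(str)));    // [a., b.]
    }
}
```

Pattern.quote is handy when the separator comes from a variable and you cannot escape it by hand.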

Split source analysis

// Key parts of the source code

/**
 * Splits this string around matches of the given regular expression.
 * This method works as if by invoking the two-argument split method with the
 * given expression and a limit argument of zero. Trailing empty strings are
 * therefore not included in the resulting array.
 *
 * @param regex the delimiting regular expression
 * @return the array of strings computed by splitting this string around
 *         matches of the given regular expression
 * @throws PatternSyntaxException if the regular expression's syntax is invalid
 */
public String[] split(String regex) {
    return split(regex, 0);
}

/**
 * Splits this string around matches of the given regular expression.
 * An invocation of this method of the form str.split(regex, n) yields the
 * same result as the expression Pattern.compile(regex).split(str, n).
 */
public String[] split(String regex, int limit) {
    /* The fast path is used when the regex is
       (1) a single character that is not one of the regex metacharacters
           ".$|()[{^?*+\\", or
       (2) two characters, where the first is a backslash and the second is
           not an ASCII digit or ASCII letter [0-9a-zA-Z]. */
    char ch = 0;
    if (((regex.value.length == 1 &&
          ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
         (regex.length() == 2 &&
          regex.charAt(0) == '\\' &&
          (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
          ((ch-'a')|('z'-ch)) < 0 &&
          ((ch-'A')|('Z'-ch)) < 0)) &&
        (ch < Character.MIN_HIGH_SURROGATE ||
         ch > Character.MAX_LOW_SURROGATE)) {
        // ... substring-based split omitted ...
        String[] result = new String[resultSize];
        return list.subList(0, resultSize).toArray(result);
    }
    // All other cases split with a regular expression
    return Pattern.compile(regex).split(this, limit);
}

The source code is not complicated. The core logic is:

(1) If the regex is a single character that is not one of the regex metacharacters ".$|()[{^?*+\", or it is a backslash followed by a character that is not in [0-9a-zA-Z], a fast split path is used.

(2) In all other cases, the regex is compiled and regular expression matching is used to split.
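The two cases can be restated as a small predicate. This is only a sketch that mirrors the condition in the JDK 8 source quoted above; FastPathCheck and usesFastPath are hypothetical names, not JDK APIs, and other JDK versions may use a different condition.

```java
// Hypothetical helper that mirrors the fast-path condition quoted from the
// JDK 8 source. It is not a JDK API.
class FastPathCheck {

    private static final String META_CHARS = ".$|()[{^?*+\\";

    static boolean isAsciiLetterOrDigit(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    static boolean usesFastPath(String regex) {
        char ch;
        if (regex.length() == 1 && META_CHARS.indexOf(regex.charAt(0)) == -1) {
            ch = regex.charAt(0);   // case (1): one character that is not a metacharacter
        } else if (regex.length() == 2 && regex.charAt(0) == '\\'
                && !isAsciiLetterOrDigit(regex.charAt(1))) {
            ch = regex.charAt(1);   // case (2): backslash plus a non-alphanumeric character
        } else {
            return false;           // everything else is compiled as a regex
        }
        // Surrogate characters are excluded from the fast path.
        return ch < Character.MIN_HIGH_SURROGATE || ch > Character.MAX_LOW_SURROGATE;
    }

    public static void main(String[] args) {
        System.out.println(usesFastPath(","));    // true
        System.out.println(usesFastPath("|"));    // false
        System.out.println(usesFastPath("\\|"));  // true
        System.out.println(usesFastPath("\\d"));  // false
        System.out.println(usesFastPath(", "));   // false
    }
}
```

Note that the escaped pipe "\\|" from the earlier example falls under case (1)/(2), so it is split without compiling a pattern.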

There is another topic covered here, regular expressions. If you are interested in regular expressions, refer to other posts.

Of course, you may not be interested. Many programmers dislike regular expression syntax and only look it up when they need it.

For (1), if you are proficient in regular expressions and can quickly tell which characters are regex metacharacters, you can use the native method, which is easy to use and splits quickly.

For (2), the regular expression has to be compiled and then matched, which takes time, so using a regex here is not recommended. As for why regular expressions are slower: which is quicker, finding someone you have met only once, or finding someone whose name and address you know?

To sum up, the native method has the following problems:

  • You must null-check the string before calling it
  • You must watch out for regex metacharacters in the separator
  • Performance is poor when the regex path is taken
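If you have to stay with the JDK, the three problems above can be softened with a small wrapper. The following is a sketch under my own assumptions: LiteralSplit, split, and splitOnDotPipe are hypothetical names, and returning an empty array for null or empty input is a design choice, not JDK behavior.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the JDK or of Apache Commons.
class LiteralSplit {

    private static final String[] EMPTY = new String[0];

    // A multi-character separator always goes through the regex engine,
    // so the pattern is compiled once here and reused on every call.
    private static final Pattern DOT_PIPE = Pattern.compile(Pattern.quote(".|"));

    // Splits on a literal separator (must be non-null): null or empty input
    // gives an empty array, and Pattern.quote removes the need for escaping.
    static String[] split(String str, String separator) {
        if (str == null || str.isEmpty()) {
            return EMPTY;
        }
        return str.split(Pattern.quote(separator));
    }

    // Splits on the literal two-character separator ".|".
    static String[] splitOnDotPipe(String str) {
        if (str == null || str.isEmpty()) {
            return EMPTY;
        }
        return DOT_PIPE.split(str);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split(null, "|")));        // []
        System.out.println(Arrays.toString(split("a.|b.", "|")));     // [a., b.]
        System.out.println(Arrays.toString(splitOnDotPipe("a.|b.c"))); // [a, b.c]
    }
}
```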

Why StringTokenizer is not recommended

There is also a class in the JDK for splitting strings: StringTokenizer, which is not based on regular expressions and should have high performance. But I don’t recommend using this method for string splitting. Let’s look at the source code:

/**
 * StringTokenizer is a legacy class that is retained for compatibility
 * reasons although its use is discouraged in new code. It is recommended
 * that anyone seeking this functionality use the split method of String
 * or the java.util.regex package instead.
 *
 * @since JDK1.0
 */
public class StringTokenizer implements Enumeration<Object> {
    // ...
}

"Its use is discouraged" means not recommended. Knowing that it is a legacy class kept only for compatibility, and that the JDK itself does not recommend it, is enough.

Another reason not to recommend it: a simple split should not need a while loop to collect the results.
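The extra loop is easy to see side by side. This is a small JDK-only sketch; TokenizerVsSplit and its two methods are hypothetical names used only for the comparison.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical comparison class; StringTokenizer and String.split are JDK APIs.
class TokenizerVsSplit {

    // StringTokenizer hands out one token at a time, so collecting them
    // takes an explicit loop.
    static List<String> withTokenizer(String s, String delimiters) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(s, delimiters);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    // String.split returns the whole array in one call.
    static List<String> withSplit(String s) {
        return Arrays.asList(s.split("\\|"));
    }

    public static void main(String[] args) {
        System.out.println(withTokenizer("a.|b.c", "|")); // [a., b.c]
        System.out.println(withSplit("a.|b.c"));          // [a., b.c]
    }
}
```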

Recommended way to split strings

Prefer StringUtils.splitXX()

The StringUtils.splitXX() methods of the Apache Commons Lang utility class are recommended.

Advantages: no need to worry about null strings, and the method names are self-explanatory.

Recommended common methods:

// Splits the string, treating every character in separatorChars as a separator
public static String[] split(String str, String separatorChars)
// Splits the string on the separator taken as a whole
public static String[] splitByWholeSeparator(String str, String separator)
// Splits the string, keeping the empty tokens between adjacent separators (rarely used)
public static String[] splitPreserveAllTokens(String str, String separatorChars)

// Gets the substring before the first occurrence of the separator
public static String substringBefore(String str, String separator)
// Gets the substring after the last occurrence of the separator
public static String substringAfterLast(String str, String separator)
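If Commons Lang is not on your classpath, the last two methods can be approximated with indexOf and lastIndexOf. The sketch below is my own approximation for non-null arguments, not the library's implementation; SubstringSketch, before, and afterLast are hypothetical names.

```java
// Hypothetical JDK-only approximation of substringBefore / substringAfterLast.
// It assumes both arguments are non-null; the real StringUtils methods also
// handle null input.
class SubstringSketch {

    // Text before the first occurrence of separator; the whole string if absent.
    static String before(String str, String separator) {
        int pos = str.indexOf(separator);
        return pos == -1 ? str : str.substring(0, pos);
    }

    // Text after the last occurrence of separator; an empty string if absent.
    static String afterLast(String str, String separator) {
        int pos = str.lastIndexOf(separator);
        return pos == -1 ? "" : str.substring(pos + separator.length());
    }

    public static void main(String[] args) {
        System.out.println(before("a.|b.|c", "|"));    // a.
        System.out.println(afterLast("a.|b.|c", "|")); // c
        System.out.println(before("abc", "d"));        // abc
    }
}
```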

For the problem above, splitting on "|" takes one line of code.

String str = "a.|b.";
String[] s2 = StringUtils.split(str, "|");
/* Result: [a., b.] */

In day-to-day development, a common requirement is to split on characters such as ".", "|", "*" or ":". If you’re not familiar with regex metacharacters, it’s easy to write the split incorrectly.

Therefore, it is advisable to use the StringUtils utility class directly to split, reducing the cost of thinking when writing code.

The StringUtils.split pitfall

Take the example above again: if the string "a.|b.c" has to be split on ".|", is StringUtils.split OK?

// Problem: split on ".|"
String str = "a.|b.c";
String[] s2 = StringUtils.split(str, ".|");
/* Result: [a, b, c], not the expected [a, b.c] */

The answer is no. The result is [a, b, c], not the expected [a, b.c], because StringUtils.split treats each character of a multi-character separator as a separator on its own and returns the parts between them.

To split on a multi-character separator as a whole, use StringUtils.splitByWholeSeparator().

It is a small detail, but I hope it helps you avoid this pitfall.
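The two behaviors can be sketched with plain JDK code. This is only an illustration of the difference for non-null input, not how Commons Lang implements it; SeparatorSemantics, splitByAnyChar, and splitByWhole are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of "any character" versus "whole separator" splitting.
class SeparatorSemantics {

    // Every character in separatorChars is a separator on its own.
    static String[] splitByAnyChar(String str, String separatorChars) {
        List<String> parts = new ArrayList<>();
        StringBuilder token = new StringBuilder();
        for (char c : str.toCharArray()) {
            if (separatorChars.indexOf(c) >= 0) {
                if (token.length() > 0) {      // skip empty tokens
                    parts.add(token.toString());
                    token.setLength(0);
                }
            } else {
                token.append(c);
            }
        }
        if (token.length() > 0) {
            parts.add(token.toString());
        }
        return parts.toArray(new String[0]);
    }

    // The separator only matches as a complete string.
    static String[] splitByWhole(String str, String separator) {
        if (separator.isEmpty()) {
            throw new IllegalArgumentException("separator must not be empty");
        }
        List<String> parts = new ArrayList<>();
        int start = 0;
        int pos;
        while ((pos = str.indexOf(separator, start)) >= 0) {
            if (pos > start) {                 // skip empty tokens
                parts.add(str.substring(start, pos));
            }
            start = pos + separator.length();
        }
        if (start < str.length()) {
            parts.add(str.substring(start));
        }
        return parts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitByAnyChar("a.|b.c", ".|"))); // [a, b, c]
        System.out.println(Arrays.toString(splitByWhole("a.|b.c", ".|")));   // [a, b.c]
    }
}
```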

Demo

import org.apache.commons.lang3.StringUtils;

import java.util.Arrays;
import java.util.StringTokenizer;

/**
 * JDK version: jdk1.8.0_66
 *
 * @author Pandas
 * @date 2021/10/30
 */
public class StringSplitDemo {

    /**
     * Which method is best for splitting strings?
     */
    public static void main(String[] args) {
        // Problem: split on "|"
        String str = "a.|b.c";
        System.out.println("*** native String.split method *****");
        if (StringUtils.isNotEmpty(str)) {
            String[] s1 = str.split("\\|");
            System.out.println(Arrays.toString(s1));
        }
        String[] s11 = StringUtils.split(str, "|");
        System.out.println(Arrays.toString(s11));

        System.out.println("*** native StringTokenizer method *****");
        StringTokenizer tokenizer = new StringTokenizer(str, "|");
        while (tokenizer.hasMoreTokens()) {
            System.out.println(tokenizer.nextToken());
        }

        // Problem: split on ".|"
        str = "a.|b.c";
        System.out.println("*** StringUtils method *****");
        String[] s2 = StringUtils.split(str, ".|");
        System.out.println(Arrays.toString(s2));

        String[] s3 = StringUtils.splitByWholeSeparator(str, ".|");
        System.out.println(Arrays.toString(s3));

        str = "a.|b...";
        String[] s4 = StringUtils.splitPreserveAllTokens(str, ".");
        System.out.println(Arrays.toString(s4));
    }
}

/*
 * Output:
 * *** native String.split method *****
 * [a., b.c]
 * [a., b.c]
 * *** native StringTokenizer method *****
 * a.
 * b.c
 * *** StringUtils method *****
 * [a, b, c]
 * [a, b.c]
 * [a, |b, , , ]
 */
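The last demo line keeps empty tokens with splitPreserveAllTokens. The native method has a related knob: as the Javadoc quoted earlier says, the one-argument split drops trailing empty strings, while a negative limit keeps them. A minimal sketch, with TrailingEmptyTokens and its methods as hypothetical names:

```java
import java.util.Arrays;

// Hypothetical demo class; String.split(regex) and String.split(regex, limit)
// are the JDK APIs being compared.
class TrailingEmptyTokens {

    // Limit 0 (the default): trailing empty strings are removed.
    static String[] dropTrailing(String s) {
        return s.split("\\.");
    }

    // Negative limit: the pattern is applied as often as possible and
    // trailing empty strings are kept.
    static String[] keepTrailing(String s) {
        return s.split("\\.", -1);
    }

    public static void main(String[] args) {
        String str = "a.|b...";
        System.out.println(Arrays.toString(dropTrailing(str))); // [a, |b]
        System.out.println(Arrays.toString(keepTrailing(str))); // [a, |b, , , ]
    }
}
```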

Conclusion

  • Using StringUtils directly is the least trouble (mind the pitfall described above).
  • For splits on non-special characters, such as letters or digits, you can use the native split method.
  • Don’t use StringTokenizer.

Thanks for reading this, I hope it was helpful to you as a newcomer.

If you want more, there is a companion to string splitting, so check it out.

Previous content:

  • I decided to write a Java practical technology, practical features! Practical! Or practical!
  • Nullation of mandatory string operations
  • There are several cold facts about integers in Java, but there is always one you don’t know
  • 【Java Utility technology 】 string interception with what method?

I’m Pandas, and I’m dedicated to sharing practical Java programming techniques.

If you find this post useful, don’t forget to like it and follow it!