You only get one shot, do not miss your chance to blow. 


You only have one bullet. Don’t miss your chance to blow it up.

The log

Counts the number of occurrences of a specified character in a string

Sensitive word filtering system based on AC automata

The introduction

Learning is not limited to realization, but more important is to learn to think for yourself and draw inferences by analogy. It’s about thinking, how to make it your own.

Trie tree is also called “dictionary tree”. Keyword prompt function is very common in daily life, usually just need to output prefix, it will give the corresponding prompt. How do you do that? This article mainly shares a simple trie tree-based search tips and common application scenarios of trie trees. All source code has been uploaded to Github: link

Ps: The essence of a Trie tree is to combine duplicate prefixes using common prefixes between strings.

Simulate search keyword prompt function

This implementation could also be modified to store user habits (input) as a trie tree

Take how, hi, her, hello, so, see

Declare a tire class

Here I stole a little lazy and made an inner class.

Public class TrieNode {/** * char data; /** * children */ TrieNode[] children; /** * Boolean isEndingChar; TrieNode(char data) { children = new TrieNode[26]; isEndingChar =false; this.data = data; }}Copy the code

Initialize the

Usually the root node does not store any information and acts as a placeholder

/** * private TrieNode root; /** * private int count; /** * private List<String> List; /** * input value */ private String pattern; /** * stores a meaningless character */ privateTrieTree() {
        root = new TrieNode('/');
        count = 0;
        list = new ArrayList<>();
    }Copy the code

insert

This is ASCII code, which is relatively easy to save.

    private void insert(char[] txt) {
        TrieNode p = root;
        for(char c: TXT) {// The current character of the ASCII code -'a'ASCII code int index = c -'a';
            if (null == p.children[index]) {
                TrieNode node = new TrieNode(c);
                p.children[index] = node;
            }
            p = p.children[index];
        }
        ++count;
        p.isEndingChar = true;
    }Copy the code

The query

    private boolean contains(String pattern) {
        char[] patChars = pattern.toCharArray();
        TrieNode p = root;
        for (char patChar : patChars) {
            int index = patChar - 'a';
            if (null == p.children[index])
                return false;
            p = p.children[index];
        }
        return p.isEndingChar;
    }Copy the code

Fuzzy cue matching

    private void match() {
        char[] patChars = pattern.toCharArray();
        TrieNode p = root;
        for (char patChar : patChars) {
            int index = patChar - 'a';
            if (null == p.children[index])
                return; p = p.children[index]; } // Start traversal of p and pass all matched characters into STRS traversal(p,"");
    }Copy the code

Recursively traverses the node

    private void traversal(TrieNode trieNode, String str) {
        if(null ! = trieNode) { str += trieNode.data;if (trieNode.isEndingChar) {
                String curStr = pattern.length() == 1 ? 
		str : pattern + str.substring(pattern.length() - 1);
                if(! list.contains(curStr)) list.add(curStr);return;
            }
            for(int i = 0; i < trieNode.children.length; i++) { traversal(trieNode.children[i], str); }}}Copy the code

The test code

Construct a tire tree artificially

Ps: The storage here will cause the tree to be very tall, such as L, L, o, and you can actually synthesize LLO, which is a shrink optimization. I’m not going to implement that for now.

    private void initTries() {// how, hi, her, hello, so, see // / // h s // e I o o e // l w e // l // o char[] how ="how".toCharArray();
        insert(how);
        char[] hi = "hi".toCharArray();
        insert(hi);
        char[] her = "her".toCharArray();
        insert(her);
        char[] hello = "hello".toCharArray();
        insert(hello);
        char[] so = "so".toCharArray();
        insert(so);
        char[] see = "see".toCharArray();
        insert(see);
    }Copy the code

The test code

    public static void main(String[] args) {
        TrieTree trieTree = new TrieTree();
        trieTree.initTries();
        String str = "hello";
        boolean res = trieTree.contains(str);
        System.out.println("Does the trie tree contain?" + str + "Return result :" + res);

        trieTree.pattern = "h";
        trieTree.match();
        System.out.println("Single character fuzzy matching" + trieTree.pattern + ":");
        trieTree.printAll();

        trieTree.list.clear();
        trieTree.pattern = "he";
        trieTree.match();
        System.out.println("Multi-character fuzzy matching" + trieTree.pattern + ":");
        trieTree.printAll();
    }Copy the code

The test results



Counts the number of occurrences of a specified character in a string

Again, the premise is 26 letters. Dictionary tree is favored by search engine because of its quick search. As long as you have space (which is memory intensive), you can do whatever you want (fast).

thinking

The main idea here is to share an idea of how to take existing code and adapt it to meet the requirements. Sometimes you don’t need to reinvent the wheel, but you need to be able to use the wheel when it matters.

Transform TireNode class

A frequency attribute is added to count high-frequency words. And changing children from an array to a map makes it easier to store and, in other words, saves space.

Private class TrieNode {/** * char */ public char data; /** ** int frequency; boolean isEndingChar; Map<Character, TrieNode> children; TrieNode(char data) { this.data = data; children = new HashMap<>(); isEndingChar =false; }}Copy the code

Initialize the

/** * private TrieNode root; /** * count */ private int count; /** * no argument constructor */ privateTrieTreeAlgo() {
        root = new TrieNode('/');
        count = 0;
    }Copy the code

Modified insertion method

  • Frequency is used to count the frequency of a character
  • IsEndingChar is used, as before, to determine if it is the end of the word

    private void insert(String txt) {
        TrieNode p = root;
        char[] txtChar = txt.toCharArray();
        for (Character c : txtChar) {
            if(! p.children.containsKey(c)) { TrieNode trieNode = new TrieNode(c); p.children.put(c, trieNode); } p = p.children.get(c); ++p.frequency; } ++count; p.isEndingChar =true;
    }Copy the code

Statistical methods

Add a statistical method to count the frequency of a word, if isEndingChar==true, the word has been matched, and the end is reached, then the frequency of the character minus the number of children

    private int frequency(String pattern) {
        char[] patChars = pattern.toCharArray();
        TrieNode p = root;
        for (char patChar : patChars) {
            if(p.children.containsKey(patChar)) { p = p.children.get(patChar); }}if (p.isEndingChar) return p.frequency - p.children.size();
        return- 1; }Copy the code

The test code

Initialize the word to be inserted into the dictionary tree (this can be extended to insert an article, insert user-typed words, etc.).

    private void initTries() {
        String txt = "he her hello home so see say just so so hello world";
        String[] strs = txt.split("");
        for(String str : strs) { insert(str); }}Copy the code

The test code

  1. So is a very common word
  2. He is a common word, and there are other words that contain it, such as her, hello
  3. Hel is a word that doesn’t exist

    public static void main(String[] args) {
        TrieTreeAlgo trieTreeAlgo = new TrieTreeAlgo();
        trieTreeAlgo.initTries();
        System.out.println("Total" + trieTreeAlgo.count + "One word.");
        String so = "so";
        int soCount = trieTreeAlgo.frequency(so);
        System.out.println(so + "The number of occurrences is:" + (soCount > 0 ? soCount : 0));
        String he = "he";
        int heCount = trieTreeAlgo.frequency(he);
        System.out.println(he + "The number of occurrences is:" + (heCount > 0 ? heCount : 0));
        String hel = "hel";
        int helCount = trieTreeAlgo.frequency(hel);
        System.out.println(hel + "The number of occurrences is:" + (helCount > 0 ? helCount : 0));
    }Copy the code

The test results



Sensitive word filtering system based on AC automata

With keyword matching, then the corresponding, also should have naturally sensitive word filtration, as the Internet is becoming more and more developed, the user’s quality is uneven, ready to call names, if the display on a web site, sure it is not good, so this phenomenon, the sensitive word filtration system based on AC automata was born.

Ps: I’ll let you in on a secret: it’s a castrated version of a sensitive word filtering system

thinking

AC automata is essentially a KMP-like next array on top of the Trie tree (except that the next array is built on the Trie tree). It still needs to be reformed. A fail pointer is added on the basis of the TRIe tree. When there is no match, it can slide on the tree as much as possible.

Ps this is a suffix string matching algorithm

transform

On the original basis, a fail pointer is added, and the jump of AC automata is realized by the FAIL pointer.

Private class AcNode {/** * public char data; /** * children */ Map<Character, AcNode> children; /** * Boolean isEndingChar; /** * fail pointer */ AcNode fail; AcNode(char data) { this.data = data; children = new HashMap<>(); isEndingChar =false; }}Copy the code

Initialize the

/** * root */ private AcNode root; privateAhoCorasick() {
        root = new AcNode('/');
    }Copy the code

insert

    private void insert(String txt) {
        AcNode p = root;
        char[] txtChar = txt.toCharArray();
        for (Character c : txtChar) {
            if(! p.children.containsKey(c)) { AcNode trieNode = new AcNode(c); p.children.put(c, trieNode); } p = p.children.get(c); } p.isEndingChar =true;
    }Copy the code

Build failure pointer

This method is key.

    private void buildFailurePointer() {
        Queue<AcNode> queue = new LinkedList<>();
        root.fail = null;
        queue.offer(root);
        while(! queue.isEmpty()) { AcNode p = queue.poll();for (char c : p.children.keySet()) {
                AcNode pChild = p.children.get(c);
                if (null == pChild) continue;
                if (root == p) {
                    pChild.fail = root;
                } else {
                    AcNode q = p.fail;
                    while(null ! = q) { AcNode qChild = q.children.get(p.data);if(null ! = qChild) { pChild.fail = qChild;break;
                        }
                        q = q.fail;
                    }
                    if(null == q) { pChild.fail = root; } } queue.offer(pChild); }}}Copy the code

matching

    private boolean match(String txt) {
        char[] txtChars = txt.toCharArray();
        AcNode p = root;
        for (char c : txtChars) {
            while(p ! = root && null == p.children.get(c)) { p = p.fail; } p = p.children.get(c); // If there is no match, start again from rootif (null == p) p = root;
            AcNode temp = p;
            while(temp ! = root) {if (temp.isEndingChar) {
                    return true; } temp = temp.fail; }}return false;
    }Copy the code

Construct Trie tree of sensitive words

    private void generate() {
        String[] strs = new String[]{"so"."hel"."oh"."llo"};
        for(int i = 0; i < strs.length; i++) { insert(strs[i]); }}Copy the code

The test code

A Map is added here for caching. If a match has been made, it can be replaced directly to improve efficiency. The value of mapCache is the number of occurrences of the key.

    public static void main(String[] args) {
        AhoCorasick ac = new AhoCorasick();
        ac.generate();
        ac.buildFailurePointer();
        String txt = "he her hello home so see say just so so hello world";
        System.out.println("The main string");
        System.out.println("[" + txt + "]");
        System.out.println("Sensitive words:);
        System.out.println("so,hel,oh,llo");
        String[] strs = txt.split("");
        Map<String, Integer> mapCache = new HashMap<>();
        for (int i = 0; i < strs.length; i++) {
            if (mapCache.containsKey(strs[i])) {
                int index = mapCache.get(strs[i]);
                mapCache.put(strs[i], ++index);
                strs[i] = "* * * *";
            } else{ boolean res = ac.match(strs[i]); // If a match is found, replace it with ****if (res) {
                    mapCache.put(strs[i], 1);
                    strs[i] = "* * * *";
                }
            }
        }
        System.out.println("Filtered through the sensitive word system...");
        System.out.println(Arrays.toString(strs));
        for (String str:mapCache.keySet()){
            System.out.println(str + "The number of occurrences is+ mapCache.get(str)); }}Copy the code

The test results



end



Your likes and attention are my biggest support, thank you!