You only get one shot, do not miss your chance to blow.
You only have one bullet. Don’t miss your chance to blow it up.
The log
Counts the number of occurrences of a specified character in a string
Sensitive word filtering system based on AC automata
The introduction
Learning is not limited to realization, but more important is to learn to think for yourself and draw inferences by analogy. It’s about thinking, how to make it your own.
Trie tree is also called “dictionary tree”. Keyword prompt function is very common in daily life, usually just need to output prefix, it will give the corresponding prompt. How do you do that? This article mainly shares a simple trie tree-based search tips and common application scenarios of trie trees. All source code has been uploaded to Github: link
Ps: The essence of a Trie tree is to combine duplicate prefixes using common prefixes between strings.
Simulate search keyword prompt function
This implementation could also be modified to store user habits (input) as a trie tree
Take how, hi, her, hello, so, see
Declare a tire class
Here I stole a little lazy and made an inner class.
Public class TrieNode {/** * char data; /** * children */ TrieNode[] children; /** * Boolean isEndingChar; TrieNode(char data) { children = new TrieNode[26]; isEndingChar =false; this.data = data; }}Copy the code
Initialize the
Usually the root node does not store any information and acts as a placeholder
/** * private TrieNode root; /** * private int count; /** * private List<String> List; /** * input value */ private String pattern; /** * stores a meaningless character */ privateTrieTree() {
root = new TrieNode('/');
count = 0;
list = new ArrayList<>();
}Copy the code
insert
This is ASCII code, which is relatively easy to save.
private void insert(char[] txt) {
TrieNode p = root;
for(char c: TXT) {// The current character of the ASCII code -'a'ASCII code int index = c -'a';
if (null == p.children[index]) {
TrieNode node = new TrieNode(c);
p.children[index] = node;
}
p = p.children[index];
}
++count;
p.isEndingChar = true;
}Copy the code
The query
private boolean contains(String pattern) {
char[] patChars = pattern.toCharArray();
TrieNode p = root;
for (char patChar : patChars) {
int index = patChar - 'a';
if (null == p.children[index])
return false;
p = p.children[index];
}
return p.isEndingChar;
}Copy the code
Fuzzy cue matching
private void match() {
char[] patChars = pattern.toCharArray();
TrieNode p = root;
for (char patChar : patChars) {
int index = patChar - 'a';
if (null == p.children[index])
return; p = p.children[index]; } // Start traversal of p and pass all matched characters into STRS traversal(p,"");
}Copy the code
Recursively traverses the node
private void traversal(TrieNode trieNode, String str) {
if(null ! = trieNode) { str += trieNode.data;if (trieNode.isEndingChar) {
String curStr = pattern.length() == 1 ?
str : pattern + str.substring(pattern.length() - 1);
if(! list.contains(curStr)) list.add(curStr);return;
}
for(int i = 0; i < trieNode.children.length; i++) { traversal(trieNode.children[i], str); }}}Copy the code
The test code
Construct a tire tree artificially
Ps: The storage here will cause the tree to be very tall, such as L, L, o, and you can actually synthesize LLO, which is a shrink optimization. I’m not going to implement that for now.
private void initTries() {// how, hi, her, hello, so, see // / // h s // e I o o e // l w e // l // o char[] how ="how".toCharArray();
insert(how);
char[] hi = "hi".toCharArray();
insert(hi);
char[] her = "her".toCharArray();
insert(her);
char[] hello = "hello".toCharArray();
insert(hello);
char[] so = "so".toCharArray();
insert(so);
char[] see = "see".toCharArray();
insert(see);
}Copy the code
The test code
public static void main(String[] args) {
TrieTree trieTree = new TrieTree();
trieTree.initTries();
String str = "hello";
boolean res = trieTree.contains(str);
System.out.println("Does the trie tree contain?" + str + "Return result :" + res);
trieTree.pattern = "h";
trieTree.match();
System.out.println("Single character fuzzy matching" + trieTree.pattern + ":");
trieTree.printAll();
trieTree.list.clear();
trieTree.pattern = "he";
trieTree.match();
System.out.println("Multi-character fuzzy matching" + trieTree.pattern + ":");
trieTree.printAll();
}Copy the code
The test results
Counts the number of occurrences of a specified character in a string
Again, the premise is 26 letters. Dictionary tree is favored by search engine because of its quick search. As long as you have space (which is memory intensive), you can do whatever you want (fast).
thinking
The main idea here is to share an idea of how to take existing code and adapt it to meet the requirements. Sometimes you don’t need to reinvent the wheel, but you need to be able to use the wheel when it matters.
Transform TireNode class
A frequency attribute is added to count high-frequency words. And changing children from an array to a map makes it easier to store and, in other words, saves space.
Private class TrieNode {/** * char */ public char data; /** ** int frequency; boolean isEndingChar; Map<Character, TrieNode> children; TrieNode(char data) { this.data = data; children = new HashMap<>(); isEndingChar =false; }}Copy the code
Initialize the
/** * private TrieNode root; /** * count */ private int count; /** * no argument constructor */ privateTrieTreeAlgo() {
root = new TrieNode('/');
count = 0;
}Copy the code
Modified insertion method
- Frequency is used to count the frequency of a character
- IsEndingChar is used, as before, to determine if it is the end of the word
private void insert(String txt) {
TrieNode p = root;
char[] txtChar = txt.toCharArray();
for (Character c : txtChar) {
if(! p.children.containsKey(c)) { TrieNode trieNode = new TrieNode(c); p.children.put(c, trieNode); } p = p.children.get(c); ++p.frequency; } ++count; p.isEndingChar =true;
}Copy the code
Statistical methods
Add a statistical method to count the frequency of a word, if isEndingChar==true, the word has been matched, and the end is reached, then the frequency of the character minus the number of children
private int frequency(String pattern) {
char[] patChars = pattern.toCharArray();
TrieNode p = root;
for (char patChar : patChars) {
if(p.children.containsKey(patChar)) { p = p.children.get(patChar); }}if (p.isEndingChar) return p.frequency - p.children.size();
return- 1; }Copy the code
The test code
Initialize the word to be inserted into the dictionary tree (this can be extended to insert an article, insert user-typed words, etc.).
private void initTries() {
String txt = "he her hello home so see say just so so hello world";
String[] strs = txt.split("");
for(String str : strs) { insert(str); }}Copy the code
The test code
- So is a very common word
- He is a common word, and there are other words that contain it, such as her, hello
- Hel is a word that doesn’t exist
public static void main(String[] args) {
TrieTreeAlgo trieTreeAlgo = new TrieTreeAlgo();
trieTreeAlgo.initTries();
System.out.println("Total" + trieTreeAlgo.count + "One word.");
String so = "so";
int soCount = trieTreeAlgo.frequency(so);
System.out.println(so + "The number of occurrences is:" + (soCount > 0 ? soCount : 0));
String he = "he";
int heCount = trieTreeAlgo.frequency(he);
System.out.println(he + "The number of occurrences is:" + (heCount > 0 ? heCount : 0));
String hel = "hel";
int helCount = trieTreeAlgo.frequency(hel);
System.out.println(hel + "The number of occurrences is:" + (helCount > 0 ? helCount : 0));
}Copy the code
The test results
Sensitive word filtering system based on AC automata
With keyword matching, then the corresponding, also should have naturally sensitive word filtration, as the Internet is becoming more and more developed, the user’s quality is uneven, ready to call names, if the display on a web site, sure it is not good, so this phenomenon, the sensitive word filtration system based on AC automata was born.
Ps: I’ll let you in on a secret: it’s a castrated version of a sensitive word filtering system
thinking
AC automata is essentially a KMP-like next array on top of the Trie tree (except that the next array is built on the Trie tree). It still needs to be reformed. A fail pointer is added on the basis of the TRIe tree. When there is no match, it can slide on the tree as much as possible.
Ps this is a suffix string matching algorithm
transform
On the original basis, a fail pointer is added, and the jump of AC automata is realized by the FAIL pointer.
Private class AcNode {/** * public char data; /** * children */ Map<Character, AcNode> children; /** * Boolean isEndingChar; /** * fail pointer */ AcNode fail; AcNode(char data) { this.data = data; children = new HashMap<>(); isEndingChar =false; }}Copy the code
Initialize the
/** * root */ private AcNode root; privateAhoCorasick() {
root = new AcNode('/');
}Copy the code
insert
private void insert(String txt) {
AcNode p = root;
char[] txtChar = txt.toCharArray();
for (Character c : txtChar) {
if(! p.children.containsKey(c)) { AcNode trieNode = new AcNode(c); p.children.put(c, trieNode); } p = p.children.get(c); } p.isEndingChar =true;
}Copy the code
Build failure pointer
This method is key.
private void buildFailurePointer() {
Queue<AcNode> queue = new LinkedList<>();
root.fail = null;
queue.offer(root);
while(! queue.isEmpty()) { AcNode p = queue.poll();for (char c : p.children.keySet()) {
AcNode pChild = p.children.get(c);
if (null == pChild) continue;
if (root == p) {
pChild.fail = root;
} else {
AcNode q = p.fail;
while(null ! = q) { AcNode qChild = q.children.get(p.data);if(null ! = qChild) { pChild.fail = qChild;break;
}
q = q.fail;
}
if(null == q) { pChild.fail = root; } } queue.offer(pChild); }}}Copy the code
matching
private boolean match(String txt) {
char[] txtChars = txt.toCharArray();
AcNode p = root;
for (char c : txtChars) {
while(p ! = root && null == p.children.get(c)) { p = p.fail; } p = p.children.get(c); // If there is no match, start again from rootif (null == p) p = root;
AcNode temp = p;
while(temp ! = root) {if (temp.isEndingChar) {
return true; } temp = temp.fail; }}return false;
}Copy the code
Construct Trie tree of sensitive words
private void generate() {
String[] strs = new String[]{"so"."hel"."oh"."llo"};
for(int i = 0; i < strs.length; i++) { insert(strs[i]); }}Copy the code
The test code
A Map is added here for caching. If a match has been made, it can be replaced directly to improve efficiency. The value of mapCache is the number of occurrences of the key.
public static void main(String[] args) {
AhoCorasick ac = new AhoCorasick();
ac.generate();
ac.buildFailurePointer();
String txt = "he her hello home so see say just so so hello world";
System.out.println("The main string");
System.out.println("[" + txt + "]");
System.out.println("Sensitive words:);
System.out.println("so,hel,oh,llo");
String[] strs = txt.split("");
Map<String, Integer> mapCache = new HashMap<>();
for (int i = 0; i < strs.length; i++) {
if (mapCache.containsKey(strs[i])) {
int index = mapCache.get(strs[i]);
mapCache.put(strs[i], ++index);
strs[i] = "* * * *";
} else{ boolean res = ac.match(strs[i]); // If a match is found, replace it with ****if (res) {
mapCache.put(strs[i], 1);
strs[i] = "* * * *";
}
}
}
System.out.println("Filtered through the sensitive word system...");
System.out.println(Arrays.toString(strs));
for (String str:mapCache.keySet()){
System.out.println(str + "The number of occurrences is+ mapCache.get(str)); }}Copy the code
The test results
end
Your likes and attention are my biggest support, thank you!