[Algorithm of the Day] Solve the search space explosion problem with "bidirectional BFS" (with the heuristic search AStar algorithm) | Python topic month

This article is participating in Python Theme Month. See the link to the event for more details

Topic describes

This is the 127. Word solitaire on LeetCode, and the difficulty is difficult.

Tag: “bidirectional BFS”

The conversion sequence from beginWord and endWord in the dictionary wordList is a sequence formed according to the following specifications:

The first word in the sequence is beginWord.
The last word in the sequence is endWord.
Only one letter can be changed per conversion.
The intermediate word in the transformation must be a word in the dictionary wordList.

Given the two words beginWord and endWord and a dictionary wordList, find the number of words in the shortest conversion sequence from beginWord to endWord. If no such sequence of transformations exists, return 0.

Example 1:

Enter: beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log","cog"] A minimum conversion sequence is "hit" - > "hot" - > "dot" - > "dog" - > "cog", return it the length of 5.Copy the code

Example 2:

Input: beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log"]Copy the code

Tip:

1 <= beginWord.length <= 10
endWord.length == beginWord.length
1 <= wordList.length <= 5000
wordList[i].length == beginWord.length
BeginWord, endWord, and wordList[I] are composed of lower-case English letters
beginWord ! = endWord
All strings in wordList are different from each other

Fundamental analysis

According to the meaning of the question, only one character can be replaced at a time, and each new word must have appeared in the wordList.

A naive way to do this is to use BFS.

Start from beginWord, enumerate all schemes to replace a character, if the scheme exists in wordList, then join the queue, so that there are all words with 111 replacement times in the queue. The element is then removed from the queue and the process continues until either endWord is encountered or the queue is empty.

At the same time, in order to “prevent repeated enumeration to an intermediate result” and “record how many times each intermediate result is transformed”, we need to create a “hash table” to record.

The KV of the hash table is of the form {word: how many conversions to get}.

When enumerating the new word STR, check whether it already exists in the “hash table”. If not, update the “hash table” and put the new word into the queue.

This practice ensures that “enumerates to all bybeginWord 到 endWord“, and bybeginWord 到 endWordThe “shortest transformation path” of is necessarily the first to be enumerated.

Two-way BFS

1 <= beginWord. Length <= 10

Naive BFS could lead to a “search space explosion”.

Imagine if our wordList was rich enough to include all the words, For an beginWord replacement of 101010 characters that can produce 10∗2510 * 2510∗25 new words (each substitution point can replace another 252,525 lowercase letters), the first layer will produce 250,250,250 words; The second layer produces more than 6∗1046 * 10^46∗104 new words…

As the number of layers increases, the number increases faster. This is the “search space explosion” problem.

In naive BFS implementations, the bottleneck of space is largely determined by the maximum width in the search space.

So is there a way that we can not use such a wide search space and still find the desired results?

“Bidirectional BFS” is a good solution to this problem:

The search starts in both directions at the same time, and once the same value is found, it means that a shortest path connecting the starting point and the end point is found.

The basic implementation of bidirectional BFS is as follows:

Create “two queues” for searching in both directions;
Create “two hash tables” to “resolve repeated searches of the same node” and “record the number of conversions”;
In order to make the two search directions “average” as much as possible, we first determine which queue has the smallest capacity each time we expand the value from the queue.
If the searched node is found during the search, the shortest path is found.

The pseudocode corresponding to the basic idea of “bidirectional BFS” is roughly as follows:

D1 and D2 are queues in two directions, M1 and M2 are hash tables in two directions, and the distance between each node and the starting point is recorded// If both queues are not empty, it is necessary to continue the search
// If one of the queues is empty, the target node in a certain direction cannot be found in the end
while(! d1.isEmpty() && ! d2.isEmpty()) {if (d1.size() < d2.size()) {
        update(d1, m1, m2);
    } else{ update(d2, m2, m1); }}// Update the logic of fetching an element from queue D for "one full extension"
void update(Deque d, Map cur, Map other) {}
Copy the code

Back to the problem, let’s see how to use “bidirectional BFS” to solve the problem.

This is probably the first time for many students to come into contact with “two-way BFS”, so I have written a lot of notes this time.

It is recommended that you read with a basic understanding of “bidirectional BFS”.

Code:

class Solution {
    String s, e;
    Set<String> set = new HashSet<>();
    public int ladderLength(String _s, String _e, List<String> ws) {
        s = _s;
        e = _e;
        If the target word is not in the set, there is no solution
        for (String w : ws) set.add(w);
        if(! set.contains(e))return 0;
        int ans = bfs();
        return ans == -1 ? 0 : ans + 1;
    }

    int bfs(a) {
        // start with beginWord
        // d2 means search from end endWord (reverse)
        Deque<String> d1 = new ArrayDeque<>(), d2 = new ArrayDeque(); 
        
        /* * m1 and m2 respectively record how many times the words appear in the two directions have been transformed * e.g. * m1 = {" ABC ":1} represents ABC replaced by beginWord for 1 time * m1 = {"xyz":3} represents xyz by EndWord replaces the character */ three times
        Map<String, Integer> m1 = new HashMap<>(), m2 = new HashMap<>();
        d1.add(s);
        m1.put(s, 0);
        d2.add(e);
        m2.put(e, 0);
        
        /* * If one of the queues is empty, it is necessary to continue the search * * if one of the queues is empty, it is necessary to continue the search * * if one of the queues is empty, it is necessary to continue the search * * if one of the queues is empty, it is necessary to continue the search * * if one of the queues is empty, it is necessary to continue the search * * if one of the queues is empty, it is necessary to continue the search * * Reverse search is also unnecessary */
        while(! d1.isEmpty() && ! d2.isEmpty()) {int t = -1;
            // In order to make the search in both directions as even as possible, the direction with fewer elements in the queue is extended first
            if (d1.size() <= d2.size()) {
                t = update(d1, m1, m2);
            } else {
                t = update(d2, m2, m1);
            }
            if(t ! = -1) return t;
        }
        return -1;
    }

    // Update means to take a word from the deque and extend it,
    // cur for the current direction of distance dictionary; Other is a distance dictionary in another direction
    int update(Deque<String> deque, Map<String, Integer> cur, Map<String, Integer> other) {
        // Get the original string that you currently want to extend
        String poll = deque.pollFirst();
        int n = poll.length();

        // Enumerates which character I replaces the original string
        for (int i = 0; i < n; i++) {
            // Enumerates which lowercase letter to replace I with
            for (int j = 0; j < 26; j++) {
                // The replaced string
                String sub = poll.substring(0, i) + String.valueOf((char) ('a' + j)) + poll.substring(i + 1);
                if (set.contains(sub)) {
                    // If the string has been recorded (extended) in the current direction, skip it
                    if (cur.containsKey(sub)) continue;

                    // If the string appears in the "other direction", the shortest path connecting the two directions has been found
                    if (other.containsKey(sub)) {
                        return cur.get(poll) + 1 + other.get(sub);
                    } else {
                        // Otherwise join the deque queue
                        deque.addLast(sub);
                        cur.put(sub, cur.get(poll) + 1); }}}}return -1; }}Copy the code

Time complexity: letwordListLength of $n$ .beginWordThe length of the string is $m$ . Because all search results have to be inwordListYes, so it’s at most $n + 1$ Node, in the worst case, all nodes are connected, and the complexity of searching the whole graph is $O(n^2)$ ; frombeginWordStart character replacement, and perform character by character check during replacement. The complexity is $O(m)$ . The overall complexity is $O(m * n^2)$
Space complexity: Same space size. O (m ∗ (n2) to O (m * n ^ 2) O (m ∗ n2)

conclusion

This is essentially a “all edge weights are zero1“Most short circuit problem: willbeginWordAnd all thewordListAny string that occurs is considered a point. Each transformation operation is considered to produce an edge weight of1The edge. Problem asks tobeginWordIs the source point, withendWordIs the shortest path of the sink.

With the help of this problem, I introduced to you “bidirectional BFS”, “bidirectional BFS” can effectively solve the “search space explosion” problem.

For those search problems where the search nodes grow multiple or exponentially as the number of layers increases, “bidirectional BFS” can be used to solve them.

[Supplement] Heuristic search AStar

The “heuristic function” of A* can be designed directly according to the rule in question.

For example, for two strings A and B, I think it’s appropriate to use the number of different characters in them as the estimated distance.

Because the difference between the number of different characters ensures that the real distance (a theoretical minimum number of substituts) is not exceeded.

Note: The data in this question is weak, and we use A* to pass it, but usually we need to “make sure there is A solution” for the heuristic search of A* to be of real value. In this case, unless endWord itself is not in the wordList, there is no good way to judge “whether there is a solution” in advance. In this case, A* will not bring “search space optimization” as well as “bidirectional BFS”.

Java code:

class Solution {
    class Node {
        String str;
        int val;
        Node (String _str, int _val) {
            str = _str;
            val = _val;
        }
    }
    String s, e;
    int INF = 0x3f3f3f3f;
    Set<String> set = new HashSet<>();
    public int ladderLength(String _s, String _e, List<String> ws) {
        s = _s;
        e = _e;
        for (String w : ws) set.add(w);
        if(! set.contains(e))return 0;
        int ans = astar();
        return ans == -1 ? 0 : ans + 1;
    }
    int astar(a) {
        PriorityQueue<Node> q = new PriorityQueue<>((a,b)->a.val-b.val);
        Map<String, Integer> dist = new HashMap<>();
        dist.put(s, 0);
        q.add(new Node(s, f(s)));
        
        while(! q.isEmpty()) { Node poll = q.poll(); String str = poll.str;int distance = dist.get(str);
            if (str.equals(e)) {
                break;
            }
            int n = str.length();
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < 26; j++) {
                    String sub = str.substring(0, i) + String.valueOf((char) ('a' + j)) + str.substring(i + 1);
                    if(! set.contains(sub))continue;
                    if(! dist.containsKey(sub) || dist.get(sub) > distance +1) {
                        dist.put(sub, distance + 1);
                        q.add(newNode(sub, dist.get(sub) + f(sub))); }}}}return dist.containsKey(e) ? dist.get(e) : -1;
    }
    int f(String str) {
        if(str.length() ! = e.length())return INF;
        int n = str.length();
        int ans = 0;
        for (int i = 0; i < n; i++) {
            ans += str.charAt(i) == e.charAt(i) ? 0 : 1;
        }
        returnans; }}Copy the code

Python 3 code:

class Solution:
    def ladderLength(self, beginWord: str, endWord: str, wordList: List[str]) - >int:
        class PriorityQueue:
            def __init__(self) :
                self.queue = []
                self.explored = set(a)def __contains__(self, item) :
                return item in self.explored

            def __add__(self, other) :
                heapq.heappush(self.queue, other)
                self.explored.add(other[2])

            def __len__(self) :
                return len(self.queue)

            def pop(self) :
                return heapq.heappop(self.queue)

        def heuristic(curr, target) :
            return sum(1 for a, b in zip(curr, target) ifa ! = b) wordList =set(wordList)
        if endWord not in wordList:
            return 0
        front = PriorityQueue()
        front.__add__((1.0, beginWord))
        while front:
            l, _, s = front.pop()
            if s == endWord:
                return l
            for i in range(len(s)):
                for c in string.ascii_lowercase:
                    t = s[:i] + c + s[i + 1:]
                    if t in wordList and t not in front:
                        front.__add__((l + 1, heuristic(t, endWord), t))
        return 0
Copy the code

The last

This is No.127 in our “Brush through LeetCode” series, which began on 2021/01/01. As of the start date, there are 1916 questions on LeetCode, some with locks, and we will first brush through all the questions without locks.

In this series of articles, in addition to explaining how to solve the problem, I’ll give you the most concise code possible. If the general solution is involved, the corresponding code template will be provided.

In order to facilitate the students to debug and submit the code on the computer, I set up a related warehouse: github.com/SharingSour… .

In the repository address, you can see the solution to the series, the corresponding code to the series, the LeetCode link, and other preferred solutions to the series.

mo4tech.com (Moment For Technology) is a global community with thousands techies from across the global hang out!Passionate technologists, be it gadget freaks, tech enthusiasts, coders, technopreneurs, or CIOs, you would find them all here.

[Algorithm of the Day] Solve the search space explosion problem with “bidirectional BFS” (with the heuristic search AStar algorithm) | Python topic month

Topic describes

Fundamental analysis

Two-way BFS

conclusion

[Supplement] Heuristic search AStar

The last

[Algorithm of the Day] Solve the search space explosion problem with “bidirectional BFS” (with the heuristic search AStar algorithm) | Python topic month

Topic describes

Fundamental analysis

Two-way BFS

conclusion

[Supplement] Heuristic search AStar

The last

Related Posts

How did I learn to write an operating system (VI) : Process scheduling

How to build the perfect Python project

The plain English “link tracking” really smells good