background

Give an array of two, one array is A [1,2,3,4], and the other array is B [5,6,7,8]. Let you find a large array by combining two arrays:

  • The maximum
  • The minimum value
  • The sum of the

Isn’t that easy? We can easily solve this at O(m+n)O(m+n)O(m+n) O(m+n), where m and n are the sizes of arrays A and B, respectively.

What if I could change some of the values of A and B, and I asked for Max, min, and sum multiple times?

The naive idea is to modify the array in place and then recalculate the time of O(m+n)O(m+n)O(m+n). Obviously, this does not take advantage of the calculated results, which is inefficient. Is there a more efficient way to do that?

There are! Line segment trees will do it.

What is a line segment tree?

A segment tree is essentially a tree. More precisely, it is a binary tree, and it is a balanced binary tree. We’ll talk about why it’s a balanced binary tree, but here’s the idea.

Although it is a binary tree, we usually use arrays to simulate the tree structure, rather than the traditional definition of TreeNode.

This is partly because it is easy to implement, and partly because the segment tree is actually a complete binary tree, so direct simulation using arrays is efficient. The reason for this has already been explained in the binary heap implementation in the heap topic I wrote earlier. You can reply to the heap on my public account “Likejijiajia”.

What problem to solve

As its name suggests, a line segment tree has to do with line segments. Each node of a line segment tree actually stores information about an interval (segment). Then, if the information of these intervals satisfies certain properties, the line segment tree can be used to improve performance.

The:

  1. What is the nature of it?
  2. How to improve performance?

What is the nature of it?

For example, the maxima, minima and summation that we mentioned earlier satisfy this certain property. That is, I can derive some index of the union of subsets from several subsets (in this case, two).

For the example above, we can think of arrays A and B as two sets. Then: the maximum value of set A and set B are known, we can directly obtain the maximum value of the union of set A and set B by Max (Amax, Bmax). Where Amax and Bmax are the maximum values of sets A and B respectively. Minima and summation are the same and will not be repeated. So if the statistics satisfy this property, then we can use a line segment tree. But whether or not to use the line tree depends on whether or not it can improve performance.

How to improve performance?

As for improving performance, I’ll keep it in suspense and talk about it later when we talk about implementation.

implementation

Take the summation at the beginning of this article.

We can take interval A and interval B as the left and right nodes of A tree respectively, and store the interval of A and the interval sum of B in the left and right child nodes respectively.

Next, the interval of A is divided into left and right parts, and B is also divided into left and right parts. Continue this process until you can no longer continue.

To sum up, the interval is continuously divided into two parts, and the interval information is stored separately to the left and right nodes. If it’s a sum, then the interval information is the sum of the intervals. The line segment tree would look something like this:

The sum of the intervals in blue.

Note that there are n leaf nodes in the tree (n is the length of the original array), and each one corresponds to a value of the original array.

It’s also easy to translate into code. You can solve this problem by simply using subsequent traversal. This is because we need to know the statistics of the left and right nodes in order to calculate the statistics of the current node.

After not familiar with the sequence traversal can look at my previous tree topic, the public number force button plus reply tree can be obtained

In the same way as the binary heap, we can represent the tree as an array, using 2 * I + 1 and 2 * 1 + 2 to represent the index of the left and right nodes, where I is the index corresponding to the current node on the tree.

Tree is an array used to build a line segment tree, similar to a binary heap. Tree [I] currently stores interval information.

The way I described building a tree above is clearly recursive, so we can recursively build a tree. Specifically, you can define a build(tree_index, L, R) method to build the tree. Where L and R are the left and right endpoints of the corresponding interval, so that L and R can uniquely determine an interval. Tree_index indicates where the current interval information should be updated in the tree array.

We store interval information on tree, so we can finally use tree[tree_index] =…. Let’s update the interval information.

Core code:

def build(self, tree_index:int, l:int, r:int) :
    Tree_index: the position of the segment tree node in the array L, r: the left and right boundaries of the interval represented by this node.
    if l == r:
        self.tree[tree_index] = self.data[l]
        return
    mid = (l+r) // 2 The midpoint of the interval corresponds to the end of the left child interval and the beginning of the right child interval
    left, right = 2 * tree_index + 1.2 * tree_index + 2 # tree_index Specifies the left and right subtree index
    self.build(left, l, mid)
    self.build(right, mid+1, r)
    Typical post-order traversal
    # Add interval sum, if not interval sum, change the following line of code
    self.tree[tree_index] = self.tree[left] + self.tree[right]

Copy the code

The array self.tree[I] in the above code is actually used to store the sum of intervals like the blue font shown above. Each interval stores a location on the tree, storing its interval and.

Complexity analysis

  • Time complexity: the recursive relation T(n) = 2*T(n/2) + 1, so the time complexity is O(n)O(n)O(n) O(n)

I don’t know how to get O(n)O(n)O(n)? Take a look at the first chapter of my Algorithmic Way to Beat The Game. leetcode-solution.cn/book-intro

  • Space complexity: The size of tree is of the same order as n, so the space complexity is O(n)O(n)O(n)

Finally finished the tree, but knew it was not efficient at all. What we want to do is efficiently handle interval queries with frequent updates.

So based on this line tree method, how to update and query interval information?

Range queries

Let’s start with a simple question about how interval queries work.

If you query information about an interval. This is also a post order traversal ok. Let’s say I want to find the sum of the interval [l,r].

So if the current left node:

  • Fall completely inside [L,r]. For example, [2,3] falls completely inside [1,4]. We directly take out the interval and of the left node of the tree for standby, let’s say lsum.
  • Partially within [L, R]. For example, the [1,3] part falls on [2,4]. At this point we continue to recurse until we are completely in the interval (as in the case above), at which point we directly extract the interval sum of the left node in the tree
  • Add up all the spare values you’ve taken out and you get the answer

The same is true for the right node.

Complexity analysis

  • Time complexity: The query does not need to process two leaf nodes at any one time, and actually processes roughly the same number of times as the height of the tree. And the tree is balanced, so it’s order logn, order logn, order logn.

Or T(n) = T(n/2) + 1, so the time is order logn order logn order logn

I don’t know how to get order logn, order logn, order logn? Take a look at the first chapter of my Algorithmic Way to Beat The Game. leetcode-solution.cn/book-intro

You can understand the complexity in the following code.

Interval changes

So what if I change A[1] to 1?

If you do not modify the tree, then it is obvious that the query must be wrong as long as the interval contains A[1], for example, the sum of the interval [1,3] will get the wrong answer. So we need to modify the tree at the same time as we modify A[1].

The question is which tree values do we want to change, and by how much?

To answer the first question, which trees are changed?

We know that the leaf nodes of the line segment tree are all values of the original array, which means that the n leaf nodes of the line segment tree correspond to the original array. So we first need to find the leaf node corresponding to the position we modify and change its value.

Is that it?

There is no end. In fact, we need to change all of the parent and, if any, grandfather nodes of the leaf node we are modifying. That is, we need to bubble from this leaf node to the root node, and modify the interval information along the way

This process is similar to the browser’s event model

Now to answer the last question, how much?

For the sum, we need to first change the leaf node to the modified value, and add delta to the interval sum of all points along the path from the leaf node to the root node, where delta is the difference before and after the change.

How to update the maximum and minimum values? Think about it for yourself.

The problem of which nodes to change, and how many to change, is solved, and the code implementation is easy.

Complexity analysis

  • Time complexity: Modifications do not require processing of two leaf nodes at every moment, and are actually processed roughly as many times as the height of the tree. And the tree is balanced, so it’s order logn, order logn, order logn.

Or T(n) = T(n/2) + 1, so the time is order logn order logn order logn

I don’t know how to get order logn, order logn, order logn? Take a look at the first chapter of my Algorithmic Way to Beat The Game. leetcode-solution.cn/book-intro

You can understand the complexity in the following code.

code

The line tree code has been put on the brush plug-in, which can be obtained by the public account “Likou Plus Plus” reply plug-in.


class SegmentTree:
    def __init__(self, data:List[int]) :
        Data: Array passed in.
        self.data = data
        self.n = len(data)
        # Apply for space 4 times the length of data to store line segment tree nodes
        self.tree = [None] * (4 * self.n) The left child of index I is indexed 2i+1 and the right child is indexed 2i+2
        if self.n:
            self.build(0.0, self.n-1)
    # Is essentially a bottom-up updating process
    So you can use a back-order traversal, which updates the parent node when the function returns.
    def update(self, tree_index, l, r, index) :
        Tree_index: a root node index l, r: this root node represents the left and right boundaries of the interval index: the index of the updated value
        if l == r==index :
            self.tree[tree_index] = self.data[index]
            return
        mid = (l+r)//2
        left, right = 2 * tree_index + 1.2 * tree_index + 2
        if index > mid:
            The interval to be updated is in the right subtree
            self.update(right, mid+1, r, index)
        else:
            Index <=mid
            self.update(left, l, mid, index)
        Part of the interval is in the left subtree and part in the right subtree
        # Add interval sum, if not interval sum, change the following line of code
        self.tree[tree_index] = self.tree[left] + self.tree[right]

    def updateSum(self,index:int,value:int) :
        self.data[index] = value
        self.update(0.0, self.n-1, index)
    def query(self, tree_index:int, l:int, r:int, ql:int, qr:int) - >int:
        Tree_index: the index of a root node l, r: the left and right boundaries of the interval ql, QR: the left and right boundaries of the interval to be queried
        if l == ql and r == qr:
            return self.tree[tree_index]

        The midpoint of the interval corresponds to the end of the left child interval and the beginning of the right child interval
        mid = (l+r) // 2
        left, right = tree_index * 2 + 1, tree_index * 2 + 2
        if qr <= mid:
            Query interval all in the left subtree
            return self.query(left, l, mid, ql, qr)
        elif ql > mid:
            Query interval all in the right subtree
            return self.query(right, mid+1, r, ql, qr)

        Part of the interval is in the left subtree and part in the right subtree
        # Add interval sum, if not interval sum, change the following line of code
        return self.query(left, l, mid, ql, mid) + self.query(right, mid+1, r, mid+1, qr)

    def querySum(self, ql:int, qr:int) - >int:
        Return the sum of the interval [ql,..,qr].
        return self.query(0.0, self.n-1, ql, qr)

    def build(self, tree_index:int, l:int, r:int) :
        Tree_index: the position of the segment tree node in the array L, r: the left and right boundaries of the interval represented by this node.
        if l == r:
            self.tree[tree_index] = self.data[l]
            return
        mid = (l+r) // 2 The midpoint of the interval corresponds to the end of the left child interval and the beginning of the right child interval
        left, right = 2 * tree_index + 1.2 * tree_index + 2 # tree_index Specifies the left and right subtree index
        self.build(left, l, mid)
        self.build(right, mid+1, r)
        # Add interval sum, if not interval sum, change the following line of code
        self.tree[tree_index] = self.tree[left] + self.tree[right]

Copy the code

Related to the project

  • The heap

You can take a look at the binary heap implementation that I wrote earlier on heaps. Then contrast study, incidentally also learned pile, is not beautiful zai?

  • Tree array

Tree arrays are similar to line trees, but slightly more difficult than line trees. I have a chance to write an article on tree arrays.

  • immutablejs

Do you know immutable? Immutablejs is a well-known tool library that implements IMmutable mutable. Lucifer ren/blog/2020/0 | | | | | | | | | | | | | | | | | | | | | |

Answer the previous question

Why a balanced binary tree?

The previous time complexity is also based on the premise that the tree is a balanced binary tree. Is the line segment tree necessarily a balanced binary tree? Yes. This is because line segment trees are perfect binomial trees, and perfect binomial trees are balanced.

Of course, there is another premise, that is, the total number of nodes in the tree is the same order as the length of the original array, which is on the order of n. It’s also easy to figure out why it’s the same order, just by using the recursive formula.

Why line segment trees improve performance?

This is easy to understand once you understand the time complexity of my implementation. Because the time complexity of both modification and query is lognlognlogn, and the complexity of query without brute force of line tree is up to O(n)O(n)O(n). The corresponding cost is the space of O(n)O(n)O(n) O(n) of the tree, so the line segment tree is also a typical space time algorithm.

Let’s do the last one. Can interval algorithm problems be solved by line segment tree? In fact, this has been answered in the article, which depends on whether two points are met:

  1. Whether an index of the union of subsets can be derived from several (here, two) subsets.
  2. Whether it improves performance (compared to the naive brute force solution). The segment tree can be used to optimize the modified query time consumption in frequently modified scenarios.