Author | Md. Mubasir | Source | Towards Data Science

Before 1957, Earth had only one natural satellite: the Moon. On October 4, 1957, the Soviet Union launched Sputnik 1, the world’s first artificial satellite. Since then, about 8,900 satellites have been launched by more than 40 countries.

These satellites help us with surveillance, communications, navigation, and so on. Countries also use satellites to monitor another country’s land and movements, and to estimate its economy and power. However, all countries withhold such information from each other.

Likewise, the global oil market is not entirely transparent. Almost all oil producers try to hide their total production, consumption and reserves. Countries do this to indirectly hide their real economies from the outside world and to enhance the capabilities of their defense systems. That could pose a threat to other countries.

For this reason, startups such as Planet and Orbital Insight keep an eye on such activity in various countries through satellite imagery. They collect satellite images of the tanks and estimate the reserves.

But the question is: how do you estimate the size of a tank from satellite images alone? Well, you can only do so if the tank has a floating roof. This particular type of tank is specifically designed to store large quantities of petroleum products, such as crude oil or condensate. Its roof sits directly on top of the oil, rising and falling as the amount of oil in the tank increases or decreases, and forms two shadows around it. As shown below, the shadow outside the tank (the exterior shadow) corresponds to the total height of the tank, while the shadow inside the tank (the interior shadow) indicates the depth of the floating roof. The occupied volume is then estimated as 1 − (interior shadow area / exterior shadow area).
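To make the formula concrete, here is a minimal sketch with hypothetical shadow areas measured in pixels:

# Hypothetical pixel counts for the two shadow regions of one tank.
interior_shadow_area = 1200   # shadow inside the tank (floating roof depth)
exterior_shadow_area = 4000   # shadow outside the tank (total height)

occupancy = 1 - interior_shadow_area / exterior_shadow_area
print(f'Estimated tank occupancy: {occupancy:.0%}')  # 70%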

In this blog, we will use the TensorFlow 2.x framework to build a complete model from scratch in Python to estimate tank occupancy from satellite imagery.

Code repository

This entire article and all of the code can be found in the GitHub repository

Github.com/mdmub0587/O…

Below is the table of contents for this blog. We’ll explore each item one by one.

Table of contents

  1. Problem statement, dataset, and evaluation metrics

  2. Existing methods

  3. Related research work

  4. Useful blogs and research papers

  5. Our contribution

  6. Exploratory Data Analysis (EDA)

  7. Data augmentation

  8. Data preprocessing, augmentation, and TFRecords

  9. Object detection with YOLOv3

  10. Volume estimation

  11. Results

  12. Conclusion

  13. Future work

  14. References


1. Problem statement, dataset, and evaluation metrics

Problem statement:

Detect floating head tanks in satellite imagery and estimate the volume of oil they hold. The predictions on the image patches are then reassembled into the full image, annotated with the estimated oil storage.

Dataset:

Dataset link: www.kaggle.com/towardsentr…

The dataset contains satellite images, taken from Google Earth, of industrial zones around the world, along with annotated bounding boxes. There are 2 folders and 3 files in the dataset. Let’s look at each of them.

  • large_images: A folder containing 100 raw satellite images, each 4800×4800 pixels. All images are named in the format id_large.jpg.
  • image_patches: This directory contains 512×512 patches generated from the large images. Each large image is split into 100 patches of 512×512, with 37 pixels of overlap between neighboring patches on both axes. Each generated patch is named in the format id_row_column.jpg.
  • labels.json: Contains the labels for all images. Labels are stored as a list of dictionaries, one per image. Images that do not contain any floating head tanks are marked “Skip”. The bounding box label format is the (x, y) coordinates of the four corners of the box.
  • labels_coco.json: Contains the same labels as the previous file, converted to COCO label format. Here the bounding box format is (x_min, y_min, width, height).
  • large_image_data.csv: Contains metadata about the large image files, including the center coordinates and altitude of each image.

Evaluation metrics:

For tank detection, we will use Average Precision (AP) for each class of tank and mAP (Mean Average Precision) over all classes. There is no standard metric for evaluating the estimated volume of a floating head tank.

mAP is the standard evaluation metric for object detection models. A detailed explanation of mAP can be found in the YouTube playlist below; a small sketch of the AP computation follows the link.

www.youtube.com/watch?list=…
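As a quick illustration of the metric, here is a minimal sketch of Pascal-VOC-style AP (the area under an interpolated precision-recall curve), using made-up precision/recall points; mAP is then just the mean of the per-class APs:

import numpy as np

def average_precision(recalls, precisions):
    """Area under an interpolated precision-recall curve (all-points,
    Pascal-VOC style). A sketch with hypothetical PR points."""
    # Append sentinel values and make precision monotonically decreasing.
    r = np.concatenate(([0.0], recalls, [1.0]))
    p = np.concatenate(([0.0], precisions, [0.0]))
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangle areas where recall changes.
    idx = np.where(r[1:] != r[:-1])[0]
    return np.sum((r[idx + 1] - r[idx]) * p[idx + 1])

# Example with hypothetical precision/recall points:
print(average_precision(np.array([0.2, 0.4, 0.8]),
                        np.array([1.0, 0.8, 0.6])))  # 0.6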

2. Existing methods

Karl Heyer [1] uses RetinaNet in his repository for the tank detection task. He builds the model from scratch and applies the generated anchor boxes to the dataset. This achieved an Average Precision (AP) of 76.3% for floating head tanks. He then applied shadow enhancement and pixel thresholding to calculate the volume.

As far as I know, this is the only method available on the Internet.

3. Related research work

Estimating the Volume of Oil Tanks Based on High-Resolution Remote Sensing Images [2]:

This paper presents a method for estimating tank capacity/volume from satellite images. To calculate the total volume of a storage tank, the authors need the height and radius of the tank. To calculate the height, they use its geometric relationship with the length of the projected shadow. But measuring the shadow length is not easy. To highlight shadows, they use the HSV (hue, saturation, value) color space, because shadows are distinctly more saturated in HSV. A median method based on sub-pixel subdivision localization is then used to calculate the shadow length. Finally, the radius of the tank is obtained with the Hough transform algorithm.

The paper also proposes a method for calculating building heights from satellite images.

4. Useful blogs and research papers

A Beginner’s Guide To Storage Tank Occupancy With Help Of Satellite Imagery [3]:

This blog is written by TankerTrackers.com, a service that uses satellite imagery to track oil storage at several geographic points of interest.

In this blog post, they describe in detail how the exterior and interior shadows of a tank help us estimate how much oil is inside. They also compare satellite images taken at a specific time and one month later, showing how the tanks changed over the course of the month. This blog gives an intuitive feel for how to estimate the occupied volume.

A Gentle Introduction to Object Recognition With Deep Learning [4]:

This article clears up some of the concepts that beginners in object detection find most confusing. First, it describes the differences among object classification, object localization, object recognition, and object detection. Then it discusses some deep learning algorithms for the object recognition task.

Object classification refers to assigning a label to an image that contains a single object. Object localization refers to drawing a bounding box around one or more objects in an image. Object detection combines the two, which makes it the more challenging/complex task: it first draws a bounding box around each object of interest (OI) via localization, then assigns a label to each OI via classification. Object recognition is simply the collection of all of the above tasks (i.e., classification, localization, and detection).

Finally, two major object detection algorithm/model families are discussed: Region-based Convolutional Neural Networks (R-CNN) and You Only Look Once (YOLO).

Selective Search for Object Recognition [5]:

In the object detection task, the most critical part is object localization, because object classification builds on it: classification relies on the proposed regions of interest (region proposals) produced by localization. Better localization leads to better object detection. Selective search is one such algorithm, used for object localization in object recognition models such as R-CNN and Fast R-CNN.

The algorithm first uses an efficient graph-based image segmentation method to generate sub-segments of the input image, then uses a greedy algorithm to merge smaller similar regions into larger ones. Segment similarity is based on four attributes: color, texture, size, and fill. A short usage sketch follows.
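For reference, OpenCV’s contrib module ships an implementation of selective search (it requires the opencv-contrib-python package; the patch path below is hypothetical):

import cv2

img = cv2.imread('data/image_patches/01_2_3.jpg')   # hypothetical patch path

ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(img)
ss.switchToSelectiveSearchFast()      # trade proposal quality for speed
rects = ss.process()                  # (x, y, w, h) region proposals
print(f'{len(rects)} region proposals generated')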

Region Proposal Network — A Detailed View [6]:

The RPN (Region Proposal Network) is widely used for object localization because it is faster than the traditional selective search algorithm. It learns the best locations of objects from the feature map, just as a CNN learns classification from the feature map.

It is responsible for three main tasks: first generating anchor boxes (nine anchor boxes of different shapes at each feature-map point), then classifying each anchor box as foreground or background (that is, whether it contains an object or not), and finally learning shape offsets for the anchor boxes so that they fit the objects. A sketch of the anchor generation follows.
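A minimal sketch of the nine-anchor construction (3 aspect ratios × 3 scales) at one feature-map point; the base size, ratios, and scales here are the common Faster R-CNN defaults, not the values of any specific implementation:

import numpy as np

def generate_anchors(base=16, ratios=(0.5, 1, 2), scales=(8, 16, 32)):
    """Nine anchor boxes centred on one feature-map point,
    returned as (x1, y1, x2, y2) offsets from the centre."""
    anchors = []
    for s in scales:
        for r in ratios:
            h = base * s * np.sqrt(r)   # taller boxes for larger ratios
            w = base * s / np.sqrt(r)   # wider boxes for smaller ratios
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors)

print(generate_anchors().shape)  # (9, 4)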

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks [7]:

The Faster R-CNN model solves the problems of the two earlier related models (R-CNN and Fast R-CNN) by using an RPN as the region proposal generator. Its architecture is exactly the same as Fast R-CNN, except that it uses the RPN instead of selective search, which makes it 34 times faster than Fast R-CNN.

Real-time Object Detection with YOLO, YOLOv2, and Now YOLOv3 [8]:

Before introducing the YOLO model, let’s take a look at a TED talk by its lead researcher, Joseph Redmon.

youtu.be/Cgxsv1riJhI

This model is at the top of the list of object detection models for a number of reasons, but the main one is its speed. Its inference time is very short, which is why it can easily keep up with normal video frame rates (i.e., 25 fps) and be applied to real-time data.

Unlike other object detection models, the YOLO model has the following characteristics:

  • A single neural network model (that is, classification and localization are performed by the same model): it takes a photo as input and directly predicts the bounding boxes and a class label for each box, meaning it only looks at the image once.

  • Because it performs convolution over the whole image rather than a part of it, it produces very few background errors.

  • YOLO learns generalized representations of objects. When trained on natural images and tested on artwork, YOLO outperforms top detection methods such as DPM and R-CNN by a wide margin. Because YOLO is highly generalizable, it is less likely to break down when applied to new domains or unexpected inputs.

What makes YOLOv3 better than YOLOv2?

  • If you look closely at the title of the YOLOv2 paper, it is “YOLO9000: Better, Faster, Stronger”. Is YOLOv3 much better than YOLOv2? Well, the answer is yes, it is better, but not faster or stronger, because the complexity of the architecture has increased.

  • YOLOv2 uses a 19-layer DarkNet architecture without any residual blocks, skip connections, or upsampling, which makes it weak at detecting small objects. In YOLOv3, these features were added, and a 53-layer DarkNet network pretrained on ImageNet is used. On top of it, another 53 convolutional layers are stacked, giving a 106-layer fully convolutional architecture.

  • YOLOv3 makes predictions at three different scales: a 13×13 grid for large objects, a 26×26 grid for medium objects, and a 52×52 grid for small objects.

  • YOLOv3 uses a total of 9 anchor boxes, 3 per scale. The anchor boxes are selected by k-means clustering (a sketch follows this list).

  • YOLOv3 performs multi-label classification of the objects detected in the image. Object confidence and class predictions are produced by logistic regression.
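A minimal sketch of the k-means anchor selection, using 1 − IoU as the distance measure, the trick from the YOLOv2 paper; this is an illustration, not the authors’ exact script:

import numpy as np

def kmeans_anchors(wh, k=9, iters=100):
    """Cluster (width, height) pairs of ground-truth boxes into k anchors.
    `wh` is an (N, 2) array of box sizes."""
    anchors = wh[np.random.choice(len(wh), k, replace=False)]
    for _ in range(iters):
        # IoU between every box and every anchor, assuming shared corners.
        inter = np.minimum(wh[:, None, 0], anchors[None, :, 0]) * \
                np.minimum(wh[:, None, 1], anchors[None, :, 1])
        union = wh[:, None].prod(-1) + anchors[None].prod(-1) - inter
        assign = np.argmax(inter / union, axis=1)   # best-IoU cluster
        anchors = np.array([wh[assign == i].mean(axis=0) if (assign == i).any()
                            else anchors[i] for i in range(k)])
    return anchors[np.argsort(anchors.prod(-1))]    # sorted by area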

5. Our contribution

Our problem statement consists of two tasks: the first is detecting floating head tanks, the second is extracting shadows and estimating the volume of each detected tank. The first task is an object detection problem, while the second is based on computer vision techniques. Let’s describe the approach to each task.

Storage tank detection:

Our goal is to estimate the volume of floating head tanks. We could build a one-class object detection model, but to reduce confusion between floating head tanks and other types of storage tanks, and to make the model robust, we build a three-class object detection model. YOLOv3 with transfer learning is used for detection, because it is easier to train on a single machine. In addition, data augmentation is used to improve the metric scores.

Shadow extraction and volume estimation:

Shadow extraction involves many computer vision techniques. Since the RGB color space is not sensitive to shadows, the image must first be converted to the HSV and LAB color spaces. We use the ratio image (L1 + L3)/(V + 1), where L1 is the first channel of the LAB color space, to enhance the shaded areas.

The enhanced image is then thresholded at 0.5 × T1 + 0.4 × T2, where T1 is the minimum pixel value and T2 is the mean pixel value. The thresholded image is then cleaned up with morphological operations (noise removal, contour cleanup, etc.).

Finally, the contours of the two shadows are extracted, and the occupied volume is estimated with the formula above. These ideas are taken from the notebook below.

www.kaggle.com/towardsentr…
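Putting those steps together, here is a minimal, hedged sketch of the pipeline for one cropped tank image; details such as the threshold polarity and contour handling follow my reading of the formulas above and may differ from the notebook:

import cv2
import numpy as np

def estimate_occupancy(tank_bgr):
    """Shadow-based occupancy estimate for one cropped tank image (BGR, uint8)."""
    lab = cv2.cvtColor(tank_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    hsv = cv2.cvtColor(tank_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    l1, l3 = lab[..., 0], lab[..., 2]
    v = hsv[..., 2]
    enhanced = (l1 + l3) / (v + 1)                       # ratio image highlighting shadows

    t1, t2 = enhanced.min(), enhanced.mean()
    thresh = 0.5 * t1 + 0.4 * t2                         # threshold from the formula above
    shadow = (enhanced > thresh).astype(np.uint8) * 255  # flip to < if polarity differs

    # Morphological cleanup (noise removal, contour cleanup)
    kernel = np.ones((3, 3), np.uint8)
    shadow = cv2.morphologyEx(shadow, cv2.MORPH_OPEN, kernel)

    # Treat the two largest contours as the exterior and interior shadows.
    contours, _ = cv2.findContours(shadow, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    contours = sorted(contours, key=cv2.contourArea, reverse=True)[:2]
    if len(contours) < 2:
        return None
    outer, inner = (cv2.contourArea(c) for c in contours)
    return 1 - inner / outer                             # occupancy estimate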

The entire process followed to solve this case study is shown below.

Let’s start with exploratory data analysis (EDA) of the dataset!

6. Exploratory Data Analysis (EDA)

Explore the labels.json file:

import json
import os

json_labels = json.load(open(os.path.join('data', 'labels.json')))
print('Number of Images: ', len(json_labels))
json_labels[25:30]

All labels are stored in a list of dictionaries. There are 10,000 images in total. Images that do not contain any tanks are labeled Skip, while images that contain tanks are labeled Tank, Tank Cluster, or Floating Head Tank. Each tank object comes with the bounding box coordinates of its four corners in dictionary format.

Counts:

Out of the 10K images, 8,187 have no labels (that is, they do not contain any tank objects). In addition, 81 images contain at least one Tank Cluster object, and 1,595 images contain at least one Floating Head Tank.

In the bar chart, it can be observed that of the 1,595 images containing floating head tanks, 26.45% contain only one floating head tank object. The largest number of floating head tank objects in a single image is 34. A sketch of how these counts can be reproduced follows.
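A small counting sketch over the loaded json_labels. The exact schema should be checked against the dataset; here we assume each entry’s 'label' field is either the string 'Skip' or a dict mapping class name to a list of boxes:

from collections import Counter

class_counts = Counter()
fht_counts = Counter()
for entry in json_labels:
    label = entry['label']
    if label == 'Skip':
        class_counts['Skip'] += 1
        continue
    for cls, boxes in label.items():
        class_counts[cls] += 1
        if cls == 'Floating Head Tank':
            fht_counts[len(boxes)] += 1   # number of FHTs in this image

print(class_counts)                  # images per class
print(fht_counts.most_common(5))     # most frequent FHT counts per image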

Explore the labels_coco.json file:

json_labels_coco = json.load(open(os.path.join('data', 'labels_coco.json')))
print('Number of Floating tanks: ', len(json_labels_coco['annotations']))

no_unique_img_id = set()
for ann in json_labels_coco['annotations']:
  no_unique_img_id.add(ann['image_id'])
print('Number of Images that contain Floating head tank: ', len(no_unique_img_id))

json_labels_coco['annotations'][:8]

This file contains only the floating head tank bounding boxes, stored with their image_id as a list of dictionaries.

Plotting the bounding boxes:

There are three types of storage tanks:

  1. Tank (T)

  2. Tank Cluster (TC)

  3. Floating Head Tank (FHT)

7. Data augmentation

In the EDA, we observed that 8,171 of the 10,000 images are of no use, because they do not contain any objects, and only 1,595 images contain at least one floating head tank object. It is well known that deep learning models require large amounts of data; too little data leads to poor performance.

Therefore, we first perform data augmentation, and then feed the augmented data into the YOLOv3 object detection model.

8. Data preprocessing, augmentation, and TFRecords

Data preprocessing:

The annotations for each object are given in JSON format as four corner points. First, the upper-left and lower-right points are extracted from these corners. Then all annotations belonging to a single image, together with their corresponding labels, are saved as one row of a CSV file.

Code to extract the upper-left and lower-right points from the corner points:

def conv_bbox(box_dict):
  """Input: list of the four corner points of a box
     Output: tuple(ymin, xmin, ymax, xmax)"""
  xs = np.array(list(set([i['x'] for i in box_dict])))
  ys = np.array(list(set([i['y'] for i in box_dict])))
  x_min = xs.min()
  x_max = xs.max()
  y_min = ys.min()
  y_max = ys.max()
  return y_min, x_min, y_max, x_max

The CSV file will look like this

To evaluate the model, we will keep 10% of the images as the test set.

from sklearn import model_selection

# Train/test split
df_train, df_test = model_selection.train_test_split(
  df,  # dataframe of the CSV annotations
  test_size=0.1,
  random_state=42,
  shuffle=True,
)
df_train.shape, df_test.shape

Data augmentation:

We know that object detection needs a lot of data, but we only have 1,645 images for training, which is very few. To get more data, we must perform data augmentation, in which new images are generated by flipping and rotating the original ones. We borrow the augmentation code from the GitHub blog below.

Blog.paperspace.com/data-augmen…

Seven new images are generated from each original image by applying the following transformations (a minimal rotation sketch follows this list):

  1. Horizontal flip

  2. Rotation by 90 degrees

  3. Rotation by 180 degrees

  4. Rotation by 270 degrees

  5. Horizontal flip and 90-degree rotation

  6. Horizontal flip and 180-degree rotation

  7. Horizontal flip and 270-degree rotation
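To illustrate how a rotation must also transform the bounding boxes, here is a minimal numpy sketch for the 90-degree case, assuming boxes in the (ymin, xmin, ymax, xmax) format produced by conv_bbox above; the borrowed library handles the general cases for us:

import numpy as np

def rotate90(image, boxes):
    """Rotate an image and its (ymin, xmin, ymax, xmax) boxes
    by 90 degrees counter-clockwise."""
    h, w = image.shape[:2]
    rotated = np.rot90(image)
    ymin, xmin, ymax, xmax = boxes.T
    # After a 90-degree CCW rotation, the x axis becomes the new
    # (flipped) y axis, and the y axis becomes the new x axis.
    new_boxes = np.stack([w - xmax, ymin, w - xmin, ymax], axis=1)
    return rotated, new_boxes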

The following is an example

TFRecords:

TFRecords is TensorFlow’s own binary storage format. It is useful when the dataset is too large to fit in memory. Storing the data in binary form can noticeably speed up training: binary data takes less time to read and copy, and only one batch is loaded at a time during training. You can find a detailed description of it in the blog below.

Medium.com/mostly-ai/t…

You can also view the Tensorflow documentation below.

www.tensorflow.org/tutorials/l…

We converted our dataset to TFRecords format. Strictly speaking this is unnecessary, since our dataset is not very large, but we include it for learning purposes. If you’re interested, you can find the code in my GitHub repository. A small serialization sketch follows.
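A minimal sketch of how one image and its boxes can be serialized into a tf.train.Example; the feature names here are illustrative, not the repository’s exact schema:

import numpy as np
import tensorflow as tf

def serialize_example(image_bytes, boxes, labels):
    """Pack one image (raw bytes), its boxes, and class ids into an Example."""
    feature = {
        'image': tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes])),
        'boxes': tf.train.Feature(float_list=tf.train.FloatList(value=boxes.flatten())),
        'labels': tf.train.Feature(int64_list=tf.train.Int64List(value=labels)),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature)).SerializeToString()

# Example with dummy data: one box of class 2 (Floating Head Tank)
dummy = serialize_example(b'raw-jpeg-bytes',
                          np.array([[10.0, 20.0, 110.0, 130.0]]), [2])
with tf.io.TFRecordWriter('train.tfrecord') as writer:
    writer.write(dummy)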

9. Object detection with YOLOv3

Training:

To train the YOLOv3 model, transfer learning is used. The first step is loading the pretrained DarkNet backbone weights and freezing them during training, so that they stay unchanged.

def create_model():
    tf.keras.backend.clear_session()

    # Load a full YOLOv3 model pretrained on COCO (80 classes)
    pret_model = YoloV3(size, channels, classes=80)
    load_darknet_weights(pret_model, 'Pretrained_Model/yolov3.weights')
    print('\nPretrained Weight Loaded')

    # Build our 3-class model and copy over the DarkNet backbone weights
    model = YoloV3(size, channels, classes=3)
    model.get_layer('yolo_darknet').set_weights(
        pret_model.get_layer('yolo_darknet').get_weights())
    print('Yolo DarkNet weight loaded')

    # Freeze the backbone so its weights stay fixed during training
    freeze_all(model.get_layer('yolo_darknet'))
    print('Frozen DarkNet layers')
    return model

model = create_model()
model.summary()

We train the model with the Adam optimizer (initial learning rate = 0.001) and apply cosine decay to reduce the learning rate over the epochs. A model checkpoint saves the best weights during training, and the last weights are saved after training.

import os
import datetime
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

tf.keras.backend.clear_session()
epochs = 100
learning_rate = 1e-3

optimizer = get_optimizer(
    optim_type='adam',
    learning_rate=1e-3,
    decay_type='cosine',
    decay_steps=10*600
)
loss = [YoloLoss(yolo_anchors[mask], classes=3) for mask in yolo_anchor_masks]


model = create_model()
model.compile(optimizer=optimizer, loss=loss)

# TensorBoard
! rm -rf ./logs/
logdir = os.path.join("logs", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
%tensorboard --logdir $logdir
tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)

callbacks = [
    EarlyStopping(monitor='val_loss', min_delta=0, patience=15, verbose=1),
    ModelCheckpoint('Weights/Best_weight.hdf5', verbose=1, save_best_only=True),
    tensorboard_callback,
]

history = model.fit(train_dataset,
                    epochs=epochs,
                    callbacks=callbacks,
                    validation_data=valid_dataset)
model.save('Weights/Last_weight.hdf5')

Loss function:

YOLO loss function:

The loss function used to train the YOLOv3 model is quite complex. YOLO calculates three different losses at the three different scales and sums them for backpropagation (as you can see in the code cell above, the final loss is a list of three different losses). Each loss computes localization and classification losses through four sub-terms:

  1. MSE loss of the box center (x, y)
  2. Mean squared error (MSE) of the bounding box width and height
  3. Binary cross-entropy of the objectness and no-objectness scores of the bounding box
  4. Binary cross-entropy or sparse categorical cross-entropy for the bounding box multi-class prediction

Let’s look at the loss formula used in YOLOv2.
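The original post shows this loss as an image. For reference, the sum-squared loss from the original YOLO paper, which YOLOv2 builds on, can be written in LaTeX as:

\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[(x_i-\hat{x}_i)^2 + (y_i-\hat{y}_i)^2\right] \\
&+ \lambda_{\text{coord}} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right] \\
&+ \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left(C_i-\hat{C}_i\right)^2 + \lambda_{\text{noobj}}\sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} \left(C_i-\hat{C}_i\right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c\in\text{classes}}\left(p_i(c)-\hat{p}_i(c)\right)^2
\end{aligned}

The first two terms are the localization loss, the next two are the object/no-object confidence loss, and the last is the classification loss.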

The last three terms in YOLOv2 are squared errors, whereas in YOLOv3 they have been replaced by cross-entropy error terms. In other words, object confidence and class predictions in YOLOv3 are now produced by logistic regression.

Let’s look at the implementation of the YOLOv3 loss function:

def YoloLoss(anchors, classes=3, ignore_thresh=0.5):
    def yolo_loss(y_true, y_pred):
        # 1. Transform all predicted outputs
        # y_pred: (batch_size, grid, grid, anchors, (x, y, w, h, obj, ...cls))
        pred_box, pred_obj, pred_class, pred_xywh = yolo_boxes(
            y_pred, anchors, classes)
        # predicted (tx, ty, tw, th)
        pred_xy = pred_xywh[..., 0:2]  # x, y of the last channel
        pred_wh = pred_xywh[..., 2:4]  # w, h of the last channel

        # 2. Transform all ground-truth outputs
        # y_true: (batch_size, grid, grid, anchors, (x1, y1, x2, y2, obj, cls))
        true_box, true_obj, true_class_idx = tf.split(
            y_true, (4, 1, 1), axis=-1)

        # Convert x1, y1, x2, y2 to x, y, w, h
        # x, y = (x1 + x2)/2, (y1 + y2)/2
        # w, h = (x2 - x1), (y2 - y1)
        true_xy = (true_box[..., 0:2] + true_box[..., 2:4]) / 2
        true_wh = true_box[..., 2:4] - true_box[..., 0:2]

        # Give smaller boxes a higher weight
        # shape -> (batch_size, grid, grid, anchors)
        box_loss_scale = 2 - true_wh[..., 0] * true_wh[..., 1]


        # 3. Invert the pred-box equations
        # Change (bx, by, bw, bh) to (tx, ty, tw, th)
        grid_size = tf.shape(y_true)[1]
        grid = tf.meshgrid(tf.range(grid_size), tf.range(grid_size))
        grid = tf.expand_dims(tf.stack(grid, axis=-1), axis=2)
        true_xy = true_xy * tf.cast(grid_size, tf.float32) - tf.cast(grid, tf.float32)
        true_wh = tf.math.log(true_wh / anchors)
        # Some cells have a true_wh of 0; dividing by the anchors may produce inf or nan
        true_wh = tf.where(tf.logical_or(tf.math.is_inf(true_wh),
                                         tf.math.is_nan(true_wh)),
                           tf.zeros_like(true_wh), true_wh)

        # 4. Calculate all masks
        # Remove dimensions of size 1 from the shape of the tensor.
        # obj_mask: (batch_size, grid, grid, anchors)
        obj_mask = tf.squeeze(true_obj, -1)
        # Ignore false positives when the IoU exceeds the threshold
        # best_iou: (batch_size, grid, grid, anchors)
        best_iou = tf.map_fn(
            lambda x: tf.reduce_max(broadcast_iou(x[0], tf.boolean_mask(
                x[1], tf.cast(x[2], tf.bool))), axis=-1),
            (pred_box, true_box, obj_mask),
            tf.float32)
        ignore_mask = tf.cast(best_iou < ignore_thresh, tf.float32)

        # 5. Calculate all losses
        xy_loss = obj_mask * box_loss_scale * \
            tf.reduce_sum(tf.square(true_xy - pred_xy), axis=-1)
        wh_loss = obj_mask * box_loss_scale * \
            tf.reduce_sum(tf.square(true_wh - pred_wh), axis=-1)
        obj_loss = binary_crossentropy(true_obj, pred_obj)
        obj_loss = obj_mask * obj_loss + \
            (1 - obj_mask) * ignore_mask * obj_loss
        # TODO: use binary_crossentropy instead
        class_loss = obj_mask * sparse_categorical_crossentropy(
            true_class_idx, pred_class)

        # 6. Sum over (batch, gridx, gridy, anchors) => (batch, 1)
        xy_loss = tf.reduce_sum(xy_loss, axis=(1, 2, 3))
        wh_loss = tf.reduce_sum(wh_loss, axis=(1, 2, 3))
        obj_loss = tf.reduce_sum(obj_loss, axis=(1, 2, 3))
        class_loss = tf.reduce_sum(class_loss, axis=(1, 2, 3))

        return xy_loss + wh_loss + obj_loss + class_loss
    return yolo_loss

Scores:

To evaluate our model, we use AP and mAP on the training and test data.

Test set score

get_mAP(model, 'data/test.csv')

Training set score

get_mAP(model, 'data/train.csv')

Inference:

Let’s see how the model performs:

10. Volume estimation

Volume estimation is the final outcome of this case study. There is no metric for evaluating the estimated volume. However, we try to find the optimal threshold pixel value for the images, so that the shaded regions can be detected as completely as possible (by counting pixels).

We take a large 4800×4800 satellite image and divide it into 100 patches of 512×512, with 37 pixels of overlap between neighboring patches on both axes. Each image patch is named in the format id_row_column.jpg. A splitting sketch follows.
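One way to produce such a 10×10 grid is to space the patch origins evenly, which yields roughly the stated 37-pixel overlap between neighbors; the dataset’s exact grid may differ slightly:

import numpy as np

def split_large_image(img, patch=512, grid=10):
    """Split a 4800x4800 image into a 10x10 grid of 512x512 patches,
    keyed by (row, column). Evenly spaced origins give ~37 px overlap."""
    H, W = img.shape[:2]
    ys = np.linspace(0, H - patch, grid).astype(int)
    xs = np.linspace(0, W - patch, grid).astype(int)
    return {(r, c): img[y:y + patch, x:x + patch]
            for r, y in enumerate(ys) for c, x in enumerate(xs)}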

The predictions for each generated patch are stored in a CSV file. Next, the volume of each floating head tank is estimated (the code and explanation are provided in notebook form in my GitHub repository).

Finally, all image patches and their bounding boxes, labeled with the estimated volumes, are merged back into one large image. You can check out the following example:

11. Results

The AP score for floating head tanks is 0.874 on the test set and 0.942 on the training set.

12. Conclusion

  • Fairly good results can be obtained with a limited number of images.

  • The data augmentation worked well.

  • Compared to the existing RetinaNet-based approach, YOLOv3 performs well in this case.

13. Future work

  • The floating head tank class achieves a high AP of 87.4%, but we can try to improve the score further.

  • We will try to generate more data for training the model.

  • We will try to train other, more accurate models, such as YOLOv4 or YOLOv5 (unofficial).

14. References

[1] Oil-tank-volume-estimation by Karl Heyer, Nov 2019. (github.com/kheyer/Oil-…

[2] Estimating the Volume of Oil Tanks Based on High-Resolution Remote Sensing Images by Tong Wang, Ying Li, Shengtao Yu, and Yu Liu, April 2019. (www.researchgate.net/publication…

[3] A Beginner’s Guide To Storage Tank Occupancy With Help Of Satellite Imagery by TankerTrackers.com, Sep 2017. (medium.com/planet-stor…

[4] A Gentle Introduction to Object Recognition With Deep Learning by machinelearningmastery.com, May 2019. (machinelearningmastery.com/object-reco…

[5] Selective Search for Object Recognition by J.R.R. Uijlings et al., 2012. (www.huppelen.nl/publication…

[6] Region Proposal Network — A Detailed View by Sambasivarao K., Dec 2019. (towardsdatascience.com/region-prop…

[7] Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks by Ross Girshick et al., Jan 2016. (arxiv.org/abs/1506.01…

[8] Real-time Object Detection with YOLO, YOLOv2 and now YOLOv3 by Joseph Redmon, 2015–2018. (arxiv.org/abs/1506.02…

Original article: towardsdatascience.com/oil-storage…
