There are many evaluation indexes for image segmentation. Common ones in papers include Pixel Accuracy (PA), intersection-over-union, IOU), Dice Coeffcient, Hausdorff Distance (HD95), relative Volume error (RVE).

At the end of the article, there is a one-stop shop for cancer slackers, who can achieve all their metrics by changing their address and saving the results in Excel.

Next, one by one for you to answer. Content is too much, suggest collection, need to use when to study again.

All cases in this paper are binary, and there are only 0 and 1 in label

[toc]

1 pixel accuracy

1.1 theory

Definition: It is the percentage of pixels in an image that are properly classified. That is, the proportion of correctly classified pixels to total pixels. The formula can be expressed as:


P A = i = 0 n p i i i = 0 n j = 0 n p i j = T P + T N T P + T N + F P + F N PA =\frac{\sum_{i=0}^{n} p_{ii}}{\sum_{i=0}^{n} \sum_{j=0}^{n}p_{ij}} = \frac{TP + TN}{TP +TN + FP + FN}

Among them:

  • N: Total number of categories, n+1 including backgrounds
  • Piip_ {II} PII: The total number of pixels of real category III that are predicted to be of category III is the total number of pairs of pixels of real category III.
  • Pijp_ {ij} PIJ: Real pixels of category III are predicted to be the total number of category JJJ, in other words, how many pixels of category III are misclassified into category JJJ.
  • TP: the number of true positives, which is positive in label and positive in predicted value
  • TN: number of true negatives, which is negative in label and negative in predicted value
  • FP: the number of false positives, which is negative in label and positive in predicted value
  • FN: number of false negatives, which is positive in label and negative in predicted value

TP+TN+FP+FN= Total number of pixels TP+TN= number of correctly classified pixels

Therefore, PA can be calculated in two ways.

Here is a simple example using 3 * 3: TP=3,TN=4, FN=2, FP=0


P A = 3 + 4 3 + 4 + 2 = 77.8 % PA = \frac{3 +4}{3+4+2}=77.8\%

That is, the number of correctly classified pixels in the figure is 7 and the total number of pixels is 9.

1.2 How to express it in code

def binary_pa(s, g) :
    """ calculate the pixel accuracy of two N-d volumes. s: the segmentation volume of numpy array g: the ground truth volume of numpy array """
    pa = ((s  g).sum()) / g.size
    return pa
g = np.array([1.1.1.1.1.0.0.0.0])
s = np.array([0.1.0.1.1.0.0.0.0])
pa = binary_pa(s, g)
Copy the code

Is it very simple ~~~~

That’s not the end of it. Let’s continue ~~~~~

If you think about it, PA is simple, but it is by no means the best indicator.In this case, even though the model segmented nothing, his PA = 95%, what??

Well. Is there something wrong with our calculation? Not at all. Exactly. It’s just that the background class is 95% of the original image. Therefore, if the model classifies all pixels into this class, 95% of pixels are accurately classified, while the other 5% are not.

So even though you’re 95 percent accurate, your model returns completely useless predictions. This is to show that high pixel accuracy does not always mean superior segmentation ability.

This problem is called category imbalance. When our classes are extremely lopsided, it means that one or some classes dominate the image and some other classes are only a small part of the image. Unfortunately, class imbalances are common in many real-world data sets and therefore cannot be ignored.

As a result, this indicator is of little guidance.

2 cross over IoU

Intersection-over-union (IoU) is one of the most commonly used indexes in semantic segmentation… And for good reason. The IoU is a very simple metric, very effective.

2.1 theory

Simply put, the IoU is the overlap region between the prediction split and the label divided by the union region between the prediction split and the label (intersection of the two/union of the two), as shown in figure. The index ranges from 0 to 1 (0-100%), where 0 means no overlap and 1 means complete overlap and segmentation.

For binary classification, its calculation formula is:


I o U = A B A B = T P T P + F P + F N IoU = \frac{|A \bigcap B|}{|A \bigcup B|} = \frac{TP}{TP + FP + FN}

So let’s figure out the IoU of the 3 by 3 example


I o U = intersection = 3 And set = 5 = T P = 3 T P + F P + F N = 5 = 60 % Ious = \ frac intersection = {3} {and set = 5} = \ frac {TP = 3} {TP + FP + FN = 5} = 60 \ %

2.2. How to express it in code

# IOU evaluation
def binary_iou(s, g) :
    assert (len(s.shape)  len(g.shape))
    The part of the product whose value is 1 is the intersection
    intersecion = np.multiply(s, g)
    # If you add the two, the part with a value greater than 0 is the intersection
    union = np.asarray(s + g > 0, np.float32)
    iou = intersecion.sum() / (union.sum() + 1e-10)
    return iou
g = np.array([1.1.1.1.1.0.0.0.0])
s = np.array([0.1.0.1.1.0.0.0.0])
iou = binary_iou(s, g)
print(iou)
Copy the code

If you understand the principles, the code is still simple.

Tips: Add 1e-10 to the denominator in case the denominator is 0.

3 the coefficient of Dice

3.1 the principle

Definition: The Dice coefficient is defined as the intersection of two times divided by pixel sum, also known as F1 score. The Dice coefficient is very similar to IoU, they are positively correlated. This means that if one person says model A is better at segmentation than model B, the other person will say the same.

Like IoU, they range from 0 to 1, where 1 represents the maximum similarity between prediction and reality.

The formula is:


D i c e = 2 A B A + B = 2 T P 2 T P + F P + F N Dice = \frac{2 |A \bigcap B|}{|A| + |B|} = \frac{2 TP}{2TP + FP + FN}

You can see that the Dice coefficient corresponds to the IoU, and the TP in both the numerator and denominator is doubled

Again with the 3 by 3 example above, let’s calculate its Dice:


D i c e = 2 3 5 + 3 = 2 T P = 6 2 T P + F P + F N = 8 = 75 % Dice=\frac{2 * 3 }{5+ 3} = \frac{2TP=6}{2TP+FP+FN=8} = 75\%

3.2 How is it implemented in the code

def binary_dice(s, g) :
    """ calculate the Dice score of two N-d volumes. s: the segmentation volume of numpy array g: the ground truth volume of numpy array """
    assert (len(s.shape)  len(g.shape))
    prod = np.multiply(s, g)
    s0 = prod.sum()
    dice = (2.0 * s0 + 1e-10) / (s.sum() + g.sum() + 1e-10)
    return dice
g = np.array([1.1.1.1.1.0.0.0.0])
s = np.array([0.1.0.1.1.0.0.0.0])
dice = binary_dice(s, g)
print(dice)
Copy the code

The meaning of the above indicators can be focused on mastering, the following is a little difficult, do not understand the direct pull to the end, copy homework is good ~~

4 calculation of surface distance

When we evaluate the quality of image segmentation and model performance, we often use the calculation of various surface distances. Such as

  • Average surface distance
  • Hausdorff distance Hausdorff distance
  • Surface overlap
  • Surface DICE Surface DICE value
  • Volumetric dice Indicates the 3D DICE value

4.1 Hausdorff distance

Hausdorff Distance and HD are used as segmentation indicators, mainly to measure the accuracy of boundary segmentation

HD is A measure describing the degree of similarity between two sets of points. It is A defined form of the distance between two sets of points: Suppose there are two sets of sets A={A1,… , the ap}, B = {b1,… ,bq}, then HD between the two point sets is defined as:


H ( A . B ) = m a x ( h ( A . B ) . h ( B . A ) ) . . . . . . ( 1 ) H(A, B) = max(h(A, B), h(B,A)) …… (1)

  • Formula (1) is called bidirectional Hausdorff distance, which is the most basic form of Hausdorff distance.
  • H (A, B) and h(B, A) in Equation (2) are called one-way Hausdorff distances from A set to B and from B to A respectively. That is, h(A,B) actually first sorts the distance between each point AI in point A and the nearest point BJ in point B, ‖ai-bj‖, and then takes the maximum value of that distance as the value of h(A,B); H of B,A, same thing.
  • According to Formula (1), the bidirectional Hausdorff distance H(A, B) is the larger of the one-way distance H(A, B) and H(B,A), which measures the maximum mismatch degree between two point sets.

I just said so much, is it not very clear, just look at the formula is really not fun, then I use a commonly used picture on the Internet to illustrate, and a relatively simple and clear calculation process. Given the set of two points A{a0, a1… } and B{b0, b1, b2… }

  • Take A point A0 in set A and calculate the distance from A0 to all points in set B, keeping the shortest distance d0
  • You go through all the points in A, so there’s two points a0 and A1, and you compute d0 and d1
  • Compare all distances {d0, d1} and select the longest distance d1
  • The longest distance is h, which is the one-way Hausdorff distance from A to B, called h of A, B
  • For any point A in the set A, we can be certain that A circle with A center and h radius must have A member of the set B
  • Change the roles of set A and set B, calculate the one-way Hausdorff distance H (B, A) of B→A, and select the longest distance between H (A,B) and H (B, A), which is the bidirectional Hausdorff distance of set A and B

See links in this section

In the actual calculation, we do not choose the maximum distance, but rank the distance from the largest to the smallest, and take the ranking distance of 5%. The purpose of this is to eliminate the unreasonable distance caused by some outliers and maintain the stability of the overall value. So also called HD95HD_{95}HD95.

4.2 How to implement the code
H D 95 HD_{95}
And ASD

# Hausdorff and ASSD evaluation
def get_edge_points(img) :
    """ get edge points of a binary segmentation result """
    dim = len(img.shape)
    if (dim  2):
        strt = ndimage.generate_binary_structure(2.1)
    else:
        strt = ndimage.generate_binary_structure(3.1)  A three-dimensional structure element is a neighborhood 1 pixel away from the center point
    ero = ndimage.morphology.binary_erosion(img, strt)
    edge = np.asarray(img, np.uint8) - np.asarray(ero, np.uint8)
    return edge

def binary_hausdorff95(s, g, spacing=None) :
    """ get the hausdorff distance between a binary segmentation and the ground truth inputs: s: a 3D or 2D binary image for segmentation g: a 2D or 2D binary image for ground truth spacing: a list for image spacing, length should be 3 or 2 """
    s_edge = get_edge_points(s)
    g_edge = get_edge_points(g)
    image_dim = len(s.shape)
    assert (image_dim  len(g.shape))
    if (spacing  None):
        spacing = [1.0] * image_dim
    else:
        assert (image_dim  len(spacing))
    img = np.zeros_like(s)
    if (image_dim  2):
        s_dis = GeodisTK.geodesic2d_raster_scan(img, s_edge, 0.0.2)
        g_dis = GeodisTK.geodesic2d_raster_scan(img, g_edge, 0.0.2)
    elif (image_dim  3):
        s_dis = GeodisTK.geodesic3d_raster_scan(img, s_edge, spacing, 0.0.2)
        g_dis = GeodisTK.geodesic3d_raster_scan(img, g_edge, spacing, 0.0.2)

    dist_list1 = s_dis[g_edge > 0]
    dist_list1 = sorted(dist_list1)
    dist1 = dist_list1[int(len(dist_list1) * 0.95)]
    dist_list2 = g_dis[s_edge > 0]
    dist_list2 = sorted(dist_list2)
    dist2 = dist_list2[int(len(dist_list2) * 0.95)]
    return max(dist1, dist2)


# Mean surface distance
def binary_assd(s, g, spacing=None) :
    """ get the average symetric surface distance between a binary segmentation and the ground truth inputs: s: a 3D or 2D binary image for segmentation g: a 2D or 2D binary image for ground truth spacing: a list for image spacing, length should be 3 or 2 """
    s_edge = get_edge_points(s)
    g_edge = get_edge_points(g)
    image_dim = len(s.shape)
    assert (image_dim  len(g.shape))
    if (spacing  None):
        spacing = [1.0] * image_dim
    else:
        assert (image_dim  len(spacing))
    img = np.zeros_like(s)
    if (image_dim  2):
        s_dis = GeodisTK.geodesic2d_raster_scan(img, s_edge, 0.0.2)
        g_dis = GeodisTK.geodesic2d_raster_scan(img, g_edge, 0.0.2)
    elif (image_dim  3):
        s_dis = GeodisTK.geodesic3d_raster_scan(img, s_edge, spacing, 0.0.2)
        g_dis = GeodisTK.geodesic3d_raster_scan(img, g_edge, spacing, 0.0.2)

    ns = s_edge.sum()
    ng = g_edge.sum()
    s_dis_g_edge = s_dis * g_edge
    g_dis_s_edge = g_dis * s_edge
    assd = (s_dis_g_edge.sum() + g_dis_s_edge.sum()) / (ns + ng)
    return assd
Copy the code

This part of the calculation is relatively difficult, in addition, for the calculation of surface distance there are public libraries available, such as the surface distance calculation library, can achieve the above distance algorithm.

5 Correlation volume error RVE

Relative Volume Error, RVE, has not found a good explanation, please refer to: RVE reference link

Formula is:


R V E ( R a . R b ) = a b s ( R a R b ) R b RVE(R_a, R_b) = \frac{abs(|R_a|-|R_b|)}{|R_b|}
  • R_a: segmentation results
  • R_b: ground truth

The implementation in the code is:

def binary_relative_volume_error(s_volume, g_volume) :
    s_v = float(s_volume.sum())
    g_v = float(g_volume.sum())
    assert (g_v > 0)
    rve = abs(s_v - g_v) / g_v
    return rve
Copy the code

Pack for cancer slackers

Do you want to click get the same style, copy the code below, just change the name can get all the index results. Suitable for 3D data, data format niI.gz.. If the data is two-dimensional, the data reading method needs to be given.

# Calculate various indexes in three dimensions
from __future__ import absolute_import, print_function

import pandas as pd
import GeodisTK
import numpy as np
from scipy import ndimage


# pixel accuracy
def binary_pa(s, g) :
    """ calculate the pixel accuracy of two N-d volumes. s: the segmentation volume of numpy array g: the ground truth volume of numpy array """
    pa = ((s  g).sum()) / g.size
    return pa


# Dice evaluation
def binary_dice(s, g) :
    """ calculate the Dice score of two N-d volumes. s: the segmentation volume of numpy array g: the ground truth volume of numpy array """
    assert (len(s.shape)  len(g.shape))
    prod = np.multiply(s, g)
    s0 = prod.sum()
    dice = (2.0 * s0 + 1e-10) / (s.sum() + g.sum() + 1e-10)
    return dice


# IOU evaluation
def binary_iou(s, g) :
    assert (len(s.shape)  len(g.shape))
    The part of the product whose value is 1 is the intersection
    intersecion = np.multiply(s, g)
    # If you add the two, the part with a value greater than 0 is the intersection
    union = np.asarray(s + g > 0, np.float32)
    iou = intersecion.sum() / (union.sum() + 1e-10)
    return iou


# Hausdorff and ASSD evaluation
def get_edge_points(img) :
    """ get edge points of a binary segmentation result """
    dim = len(img.shape)
    if (dim  2):
        strt = ndimage.generate_binary_structure(2.1)
    else:
        strt = ndimage.generate_binary_structure(3.1)  A three-dimensional structure element is a neighborhood 1 pixel away from the center point
    ero = ndimage.morphology.binary_erosion(img, strt)
    edge = np.asarray(img, np.uint8) - np.asarray(ero, np.uint8)
    return edge


def binary_hausdorff95(s, g, spacing=None) :
    """ get the hausdorff distance between a binary segmentation and the ground truth inputs: s: a 3D or 2D binary image for segmentation g: a 2D or 2D binary image for ground truth spacing: a list for image spacing, length should be 3 or 2 """
    s_edge = get_edge_points(s)
    g_edge = get_edge_points(g)
    image_dim = len(s.shape)
    assert (image_dim  len(g.shape))
    if (spacing  None):
        spacing = [1.0] * image_dim
    else:
        assert (image_dim  len(spacing))
    img = np.zeros_like(s)
    if (image_dim  2):
        s_dis = GeodisTK.geodesic2d_raster_scan(img, s_edge, 0.0.2)
        g_dis = GeodisTK.geodesic2d_raster_scan(img, g_edge, 0.0.2)
    elif (image_dim  3):
        s_dis = GeodisTK.geodesic3d_raster_scan(img, s_edge, spacing, 0.0.2)
        g_dis = GeodisTK.geodesic3d_raster_scan(img, g_edge, spacing, 0.0.2)

    dist_list1 = s_dis[g_edge > 0]
    dist_list1 = sorted(dist_list1)
    dist1 = dist_list1[int(len(dist_list1) * 0.95)]
    dist_list2 = g_dis[s_edge > 0]
    dist_list2 = sorted(dist_list2)
    dist2 = dist_list2[int(len(dist_list2) * 0.95)]
    return max(dist1, dist2)


# Mean surface distance
def binary_assd(s, g, spacing=None) :
    """ get the average symetric surface distance between a binary segmentation and the ground truth inputs: s: a 3D or 2D binary image for segmentation g: a 2D or 2D binary image for ground truth spacing: a list for image spacing, length should be 3 or 2 """
    s_edge = get_edge_points(s)
    g_edge = get_edge_points(g)
    image_dim = len(s.shape)
    assert (image_dim  len(g.shape))
    if (spacing  None):
        spacing = [1.0] * image_dim
    else:
        assert (image_dim  len(spacing))
    img = np.zeros_like(s)
    if (image_dim  2):
        s_dis = GeodisTK.geodesic2d_raster_scan(img, s_edge, 0.0.2)
        g_dis = GeodisTK.geodesic2d_raster_scan(img, g_edge, 0.0.2)
    elif (image_dim  3):
        s_dis = GeodisTK.geodesic3d_raster_scan(img, s_edge, spacing, 0.0.2)
        g_dis = GeodisTK.geodesic3d_raster_scan(img, g_edge, spacing, 0.0.2)

    ns = s_edge.sum()
    ng = g_edge.sum()
    s_dis_g_edge = s_dis * g_edge
    g_dis_s_edge = g_dis * s_edge
    assd = (s_dis_g_edge.sum() + g_dis_s_edge.sum()) / (ns + ng)
    return assd


# relative volume error evaluation
def binary_relative_volume_error(s_volume, g_volume) :
    s_v = float(s_volume.sum())
    g_v = float(g_volume.sum())
    assert (g_v > 0)
    rve = abs(s_v - g_v) / g_v
    return rve


def compute_class_sens_spec(pred, label) :
    """ Compute sensitivity and specificity for a particular example for a given class for binary. Args: pred (np.array): binary arrary of predictions, shape is (height, width, depth). label (np.array): binary array of labels, shape is (height, width, depth). Returns: sensitivity (float): precision for given class_num. specificity (float): recall for given class_num """
    tp = np.sum((pred  1) & (label  1))
    tn = np.sum((pred  0) & (label  0))
    fp = np.sum((pred  1) & (label  0))
    fn = np.sum((pred  0) & (label  1))

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)

    return sensitivity, specificity


def get_evaluation_score(s_volume, g_volume, spacing, metric) :
    if (len(s_volume.shape)  4) :assert (s_volume.shape[0]  1 and g_volume.shape[0]  1)
        s_volume = np.reshape(s_volume, s_volume.shape[1:])
        g_volume = np.reshape(g_volume, g_volume.shape[1:)if (s_volume.shape[0]  1):
        s_volume = np.reshape(s_volume, s_volume.shape[1:])
        g_volume = np.reshape(g_volume, g_volume.shape[1:])
    metric_lower = metric.lower()

    if (metric_lower  "dice"):
        score = binary_dice(s_volume, g_volume)

    elif (metric_lower  "iou"):
        score = binary_iou(s_volume, g_volume)

    elif (metric_lower  'assd'):
        score = binary_assd(s_volume, g_volume, spacing)

    elif (metric_lower  "hausdorff95"):
        score = binary_hausdorff95(s_volume, g_volume, spacing)

    elif (metric_lower  "rve"):
        score = binary_relative_volume_error(s_volume, g_volume)

    elif (metric_lower  "volume"):
        voxel_size = 1.0
        for dim in range(len(spacing)):
            voxel_size = voxel_size * spacing[dim]
        score = g_volume.sum() * voxel_size
    else:
        raise ValueError("unsupported evaluation metric: {0:}".format(metric))

    return score


if __name__  '__main__':
	import os
    import nibabel as nib
        seg_path = 'Your split result folder'
    gd_path = 'Your Label folder'
    save_dir = 'Excel Folder'
    seg = sorted(os.listdir(seg_path))

    dices = []
    hds = []
    rves = []
    case_name = []
    senss = []
    specs = []
    for name in seg:
        if not name.startswith('. ') and name.endswith('nii.gz') :# Load label and segmentation image
            seg_ = nib.load(os.path.join(seg_path, name))
            seg_arr = seg_.get_fdata().astype('float32')
            gd_ = nib.load(os.path.join(gd_path, name))
            gd_arr = gd_.get_fdata().astype('float32')
            case_name.append(name)

            Find hausdorff95 distance
            hd_score = get_evaluation_score(seg_arr, gd_arr, spacing=None, metric='hausdorff95')
            hds.append(hd_score)

            Find the volume correlation error
            rve = get_evaluation_score(seg_arr, gd_arr, spacing=None, metric='rve')
            rves.append(rve)

            # o dice
            dice = get_evaluation_score(seg_.get_fdata(), gd_.get_fdata(), spacing=None, metric='dice')
            dices.append(dice)

            Sensitivity, specificity
            sens, spec = compute_class_sens_spec(seg_.get_fdata(), gd_.get_fdata())
            senss.append(sens)
            specs.append(spec)
    # in the pandas
    data = {'dice': dices, 'RVE': rves, 'Sens': senss, 'Spec': specs, 'HD95': hds}
    df = pd.DataFrame(data=data, columns=['dice'.'RVE'.'Sens'.'Spec'.'HD95'], index=case_name)
    df.to_csv(os.path.join(save_dir, 'metrics.csv'))
Copy the code

The article continues to be updated, you can follow the wechat public account [Medical image Artificial Intelligence Field Camp] to get the latest developments, a public account focusing on the frontier technology in the field of medical image processing. Adhere to the practice of the main, hand in hand to take you to do projects, play games, write papers. All original articles provide theoretical explanation, experimental code, experimental data. Only practice can grow faster, pay attention to us, learn and progress together ~

I’m Tina and I’ll see you next time

Finally, get likes, comments, favorites. Or three with one button