Superpixels from scratch
Image segmentation using K-means clustering
A typical image may contain millions of pixels, which makes operation on images and computer vision in general pretty expensive computationally. However, most of the image is actually composed of relatively large regions of similar color. Thus, if we can identify these regions we can efficiently reduce the size of the image while keeping most of the relevant information, i.e. color and edges. Enter superpixels. "Superpixels group pixels similar in color and other low-level properties" (Stutz et al.). They can be used to improve the computational efficiency for problems such as image segmentation, tracking, or 3-D reconstruction.
According to Stutz et al., superpixels must have the following properties: "
- Partition. Superpixels should define a partition of the image, i.e. superpixels should be disjoint and assign a label to every pixel.
- Connectivity. Superpixels are expected to represent connected sets of pixels.
- Boundary Adherence. Superpixels should preserve image boundaries. Here, the appropriate definition of image boundaries may depend on the application.
- Compactness, Regularity and Smoothness. In the absence of image boundaries, superpixels should be compact, placed regularly and exhibit smooth boundaries.
- Efficiency. Superpixels should be generated efficiently.
- Controllable Number of Superpixels. The number of generated superpixels should be controllable."
In recent years, SLIC has emerged as a popular algorithm for image segmentation into superpixel. Fundamentally, SLIC works by performing K-means clustering on a 5D feature space composed of color and space (xy). The CIELAB color space is prefered over RGB because it is more visually homogeneous.
In this blogpost I will implement an algorithm for superpixel segmentation that is fundamentally similar to SLIC, i.e. clustering over color and spatial feature space. There a two important parameters in the algorithm: the relative weight given to color and spatial features (called "compactness" in SLIC), and the number of superpixels. SLIC uses an initial cluster centroid positioning and partial neighborhood calculation to speed up the computation. For the purpose of clarity and to allow for some more exploration, I ignore these aspects. Because of these different choices, I also employ a different method to ensure connectivy. My implementation is based on skimage, sklearn and numpy, with matplotlib for visualization.
# imports
import skimage as ski
import matplotlib.pyplot as plt
import numpy as np
from im_func import show_image, show_multi_image, timer
import seaborn as sns
from skimage import transform
from skimage import io
from skimage import segmentation
from sklearn.cluster import MiniBatchKMeans
Then, we import an image and we compute a SLIC segmentation using the skimage implementation for reference. Here, we used a compactness of 20 and 100 superpixels. We reconstruct an image for the segmentation using skimage's label2rgb
function.
title = 'coffee'
image = getattr(ski.data, title)()
# Resize the image
image = ski.transform.rescale(image,1/1,multichannel=True)
segments = segmentation.slic(image, compactness=20, n_segments=100, start_label=1)
image_SLIC = ski.color.label2rgb(segments, image, kind='avg', bg_label=0)
_ = show_multi_image(((image, title),
(image_SLIC, 'Reference: SLIC superpixel')),
figsize=[20,20])
Our method
We create a function segment_image
that performs three steps.
1. Prepare data
We modify the original image in two ways: we apply a gaussian filter to reduce the noise and smooth high frequency features, and we convert the image from RGB to CIELAB color space, and scale it in the range 0-1. For space, we create two matrices to store the x and y position of every pixel. The x coordinate goes 0 to W
, and the y coordinate from 0 to W*width/height
. Then, we merge color and space into a n*5
matrix, where n
is the number of pixels.
W
is a parameter that weights the importance of space compared to color, and thus serves the same purpose as compactness in SLIC. If W
=1 space and color have equal importance; if W
>>1, the clustering is effectively only affected by space; and if W
~0 only color matters.
2. Clustering
Finally, we apply the K-means clustering algorithm. Here, we use the MiniBatchKMeans
function of sklearn to reduce computation time. We store the labels and cluster_centers for later use.
3. Enforce connectivity
Following the definition given in the introduction, superpixels should represent connected sets of pixels. But our initial segmentation may not respect this constraint. If W
>>1, then the the clustering is based only on spatial features, and connectivity is expected. But in the extreme case where W
=0, pixels are grouped only by color, and therefore say, two black pixels in different corners of the image would be grouped together. To ensure the connectivity of superpixels for any values of W
we do the following.
To verify connectivity we apply a flooding algorithm starting from one pixel in each cluster to flag pixels (like the paint bucket tool in drawing softwares). Once we have looped through unflagged pixels correspond to pixels belonging a part of one disconnected cluster. We loop apply flooding to remaining pixels, each time assigning a new cluster number until there are no unflagged pixel remaining.
Sometimes connectivity is not desired or required (e.g. for a filtering effect). Furthermore, when W
<<1, the clustering may result in clusters with small disconnected regions. In that case enforce connectivity is expensive. Therefore connecitivity enforcement can be switched off.
4. Function
The function segment_image
takes five arguments:
- an image
- K: number of clusters
- W: relative importance of space compared to color
- sigma: intensity of gaussian filtering
- enforce_connectivity: (boolean) specifies whether or not to enforce connectivity
def segment_image(image, K=100, W=3.0, sigma=2.0, enforce_connectivity=True, output_type='image'):
"""Segments an image into superpixels by using K-Means clustering and connectivity enforcement
Parameters
----------
image: array
Input image in RGB format.
K: int, default=100
Number of superpixels (i.e. number of clusters).
W: float, default=3.0
Relative importance of space of color during clustering.
sigma: float, default=2.0
Standard deviation for Gaussian kernel.
enforce_connectivity: bool, default=True
Whether to enforce the connectivity of clusters.
output_type: str, default='image'
whether to output 'image' or 'labels'. In case of 'image', an image is computed
from the labels using the average color in the cluster.
Returns
----------
if output_type=='image':
segmented_image: array
an RGB image computed from the labels
if output_type=='labels':
labels: array
an array containing the label (i.e. cluster) of each pixel
"""
# Prepare data
# ======================================
# Apply gaussian filter
image_filtered = ski.filters.gaussian(image, sigma=sigma, multichannel=True)
# Convert the image to CIELAB
image_filtered = ski.color.rgb2lab(image_filtered)/100.0
# Define space
X,Y = np.meshgrid(np.linspace(0,1.0*W,image.shape[0]),np.linspace(0,1.0*W*image.shape[1]/image.shape[0],image.shape[1]))
X = X.T; Y = Y.T
# Merge space and color matrices
data = np.concatenate([(image_filtered), np.stack([X,Y],axis=2)],axis=2)
data = data.reshape(image.shape[0]*image.shape[1],-1)
# Cluster
# ======================================
clustering = MiniBatchKMeans(n_clusters=K, random_state=12).fit(data)
labels = clustering.labels_.reshape(image.shape[:-1])
cluster_centers = clustering.cluster_centers_
cluster_centers[:,-2] *= image.shape[0]/W
cluster_centers[:,-1] *= np.max(Y)/W
# Enforce connectivity
# ======================================
if enforce_connectivity:
image_flooded = labels.copy()
ic = 0
for i,j in cluster_centers[:,-2:].astype(int):
if image_flooded[i,j] == ic:
seed = (i,j)
else:
# if the cluster is disconnected or has a twisty shape, the cluster_center's pixel may be outside the actual cluster
# then find one point in the cluster
seed = tuple(np.argwhere(image_flooded==ic)[0])
ski.morphology.flood_fill(image_flooded, seed, -1, in_place=True)
ic+=1
nseed = np.max(labels)
disconnected = np.argwhere(image_flooded>=0)
while len(disconnected>0):
seed = tuple(disconnected[0])
nseed += 1
ski.morphology.flood_fill(labels, seed, nseed, in_place=True)
ski.morphology.flood_fill(image_flooded, seed, -1, in_place=True)
disconnected = np.argwhere(image_flooded>=0)
# Return
# ======================================
if output_type.lower()=='labels':
return labels
elif output_type.lower()=='image':
return ski.color.label2rgb(labels, image, kind='avg', bg_label=-1)
else:
raise ValueError(f"Unkown output_type {output_type}. Accepted values are 'image' or 'labels'.")
Results
The figure below illustrates the influence of the parameters K
(number of superpixels) and W
(weight of space over color). When W
=0, the partition is based only on color and the image always remain recognizable, even when the number K is small (e.g. 10). However, these clusters may be disconnected, and once if enforce connectivity the actual number of cluster may increase sharply. On the other hand, when W=100, the segmentation is based effectively on space only. A hexagonal pattern which corresponds to the voronoi diagram appears. Since this pattern doesn't conserve edges, a large number of superpixel, K
, is necessary to recognize the picture. Overall, for the purpose of superpixel segmentation W
=3 and K
=100-500 yields good results.
image_list = []
W_list = [0.01, 0.3, 1.0, 3.0, 100]
K_list = [10, 100, 500]
for K in K_list:
for W in W_list:
# Add a print statement that overwritten at each step
image_list.append( (segment_image(image,W=W, K=K, enforce_connectivity=False),
f'W={W:.1f}, K={K:.1f}') )
_ = show_multi_image(tuple(image_list),
row_col=(len(K_list),len(W_list)),
figsize=[13,6],
tight_layout=True)
# Get edges and superpose them to the original picture
labels = segment_image(image,W=4.0, K=150, sigma=2.5, enforce_connectivity=True, output_type='labels')
image_seg = ski.color.label2rgb(labels, image, kind='avg', bg_label=-1)
im_bound = ski.segmentation.find_boundaries(labels)
im_out = ski.morphology.dilation(im_bound,selem=ski.morphology.square(2))
im_bound_thick = ski.segmentation.find_boundaries(labels)
# im_out = image_sobel
image_seg_contour = image_seg.copy()
image_contour = image.copy()
for i in range(3):
image_seg_contour[:,:,i] *= 1.0-im_out
image_contour[:,:,i] *= 1.0-im_out
_ = show_multi_image(((im_bound,'contours'),
(im_out,'thick contours'),
(image_seg_contour,'seg+contour'),
(image_contour,'original+contour')
),figsize=[14,10],row_col=(2,2),tight_layout=True)
_ = show_multi_image([(fac*image + (1.-fac)*image_seg, f'fac: {fac:.2f}') for fac in [1., .66, .33, .0]],
figsize=[14,10],
tight_layout=True)
Finally, we gradually blend (i.e. crossfade) the original, segmented and countoured image together.
# Generate a gradually blended image
ny = image.shape[1]
x = np.zeros((1,image.shape[1]))
lim = 2*ny/3
pad = 40
def crossfade_images(image1, image2, lim=ny/2, pad=5):
# image1 and image2 should be the same size
for i in range(x.shape[1]):
if i < lim-pad:
x[0,i] = 1.
elif i > lim+pad:
x[0,i] = 0.
else:
x[0,i] = .5-(i-lim)/(2*pad)
X = np.ones((image.shape[0],1))@x
return (image1.T*X.T).T.reshape(image1.shape) + (image2.T*(1.-X.T)).T.reshape(image2.shape)
new_image = crossfade_images(image, image_seg,lim=ny*.45, pad=100)
new_image = crossfade_images(new_image, image_seg_contour,lim=ny*.85, pad=100)
_ = show_image(new_image,figsize=[10,10])
Conclusion
I created a superpixel algorithm that includes most ingredients according to the definition of Stutz et al. The algorithm is based on K-means clustering with an additional step to enforce connectivity. The algorithm partitions the image into small connected regions of similar color. The algorithm offers the flexibility to choose the number of superpixel (parameter K
) and how to weigh space vs color (parameter W
).
Enforcing connectivity is the most expensive step. For the purpose of superpixel segmentation the ideal W is 1-10. In this range the algorithm is quite efficient. Enforcing connectivity can, however, be expensive in the case where W~0 because the initial partition is only based on color. But, that case is not ideal for superpixel.
Compared to the popular SLIC algorithm, our algorithm results in a segmentation that is more hexagonal, whereas SLIC creates rather square regions. In addition, our algorithm can create relatively small regions. Both effects are due to a different choice of initial cluster positions.
I also showed how to contour and blend images which can be useful for specific technical or artistic purposes.