Auto Binary Thresholding Algorithm
The thresholding algorithm I used is rather simple. It is based on a few simple assumptions about the image.
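For illustration, here is a minimal C sketch of one common automatic threshold rule, the iterative mean split; this shows the general idea of a simple auto-threshold, not necessarily the exact rule used here.

```c
#include <stdint.h>
#include <stddef.h>

/* Iterative mean-split threshold: start from the global mean, split the
 * pixels into two groups, and move the threshold to the midpoint of the
 * two group means until it stops changing. A sketch, not the exact rule. */
int auto_threshold(const uint8_t *gray, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) sum += gray[i];
    int t = (int)(sum / (long)n);          /* initial guess: global mean */

    for (;;) {
        long lo_sum = 0, hi_sum = 0;
        size_t lo_n = 0, hi_n = 0;
        for (size_t i = 0; i < n; i++) {
            if (gray[i] < t) { lo_sum += gray[i]; lo_n++; }
            else             { hi_sum += gray[i]; hi_n++; }
        }
        if (lo_n == 0 || hi_n == 0) break; /* degenerate: one group empty */
        int next = (int)((lo_sum / (long)lo_n + hi_sum / (long)hi_n) / 2);
        if (next == t) break;              /* converged */
        t = next;
    }
    return t;   /* pixels >= t become foreground, the rest background */
}
```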
Additional Image Processing Before Binarizing the Image
I applied a 7x7 Gaussian filter before binarizing the image in the hope of smoothing out jagged edges. In some cases the result is slightly better, but not significantly so. The filtered color image tends to show spotting, and the filter also significantly lowers the background intensity. I found that the Gaussian filter works much better on grayscale images. Here is an example of the filter applied to a grayscale image I found on the web.
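For reference, here is a minimal C sketch of this kind of 7x7 Gaussian smoothing on a grayscale image; the sigma parameter and the clamped border handling are assumptions of the sketch.

```c
#include <math.h>
#include <stdint.h>

/* Smooth a grayscale image with a 7x7 Gaussian kernel built from sigma.
 * Border pixels are handled by clamping coordinates to the image edge. */
void gaussian7x7(const uint8_t *src, uint8_t *dst, int w, int h, double sigma)
{
    double k[7][7], ksum = 0.0;
    for (int y = 0; y < 7; y++)
        for (int x = 0; x < 7; x++) {
            double dx = x - 3, dy = y - 3;
            k[y][x] = exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma));
            ksum += k[y][x];
        }

    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            double acc = 0.0;
            for (int j = -3; j <= 3; j++)
                for (int i = -3; i <= 3; i++) {
                    int sx = x + i, sy = y + j;
                    if (sx < 0) sx = 0;          /* clamp to the border */
                    if (sx >= w) sx = w - 1;
                    if (sy < 0) sy = 0;
                    if (sy >= h) sy = h - 1;
                    acc += k[j + 3][i + 3] * src[sy * w + sx];
                }
            dst[y * w + x] = (uint8_t)(acc / ksum + 0.5);  /* normalize */
        }
}
```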
2-Pass Segmentation Algorithm
The 2-pass segmentation algorithm makes only two passes through the image, so each pixel is visited exactly twice. During the first pass, the region ID is initialized to 0. At each foreground pixel, the algorithm looks at its up and back (left) neighbors. If exactly one neighbor is connected, the pixel takes that neighbor's region ID. If both neighbors are connected and share the same region ID, the pixel takes that ID as well; if their IDs differ, an entry for the two IDs is added to the equivalency table and the pixel takes the lower of the two. If the pixel is connected to neither neighbor, the region ID is incremented and assigned to the pixel. This is repeated once for every pixel. During the second pass, region IDs are relabeled according to the entries in the equivalency table, as sketched below.
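Here is a minimal C sketch of the two-pass scheme, using a union-find structure as the equivalency table; the flat image layout and the fixed label capacity are assumptions of the sketch.

```c
#include <stdint.h>

#define MAX_LABELS 10000   /* assumes fewer provisional labels than this */

static int parent[MAX_LABELS];   /* equivalency table as union-find */

static int find_root(int a)
{
    while (parent[a] != a) a = parent[a] = parent[parent[a]];
    return a;
}

static void note_equivalence(int a, int b)
{
    a = find_root(a); b = find_root(b);
    if (a < b) parent[b] = a;        /* keep the lower ID as the root */
    else if (b < a) parent[a] = b;
}

/* Two-pass 4-connected labeling of a binary image (0 = background).
 * labels[] receives a region ID per pixel; returns the label count. */
int label_regions(const uint8_t *bin, int *labels, int w, int h)
{
    int next = 1;
    parent[0] = 0;

    /* Pass 1: look only at the "up" and "back" (left) neighbors. */
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            int i = y * w + x;
            if (!bin[i]) { labels[i] = 0; continue; }
            int up   = (y > 0 && bin[i - w]) ? labels[i - w] : 0;
            int back = (x > 0 && bin[i - 1]) ? labels[i - 1] : 0;
            if (up && back) {
                labels[i] = up < back ? up : back;   /* take the lower ID */
                if (up != back) note_equivalence(up, back);
            } else if (up || back) {
                labels[i] = up ? up : back;          /* single connection */
            } else {
                labels[i] = next;                    /* new region */
                parent[next] = next;
                next++;
            }
        }

    /* Pass 2: relabel each pixel with its equivalence-class root. */
    for (int i = 0; i < w * h; i++)
        labels[i] = find_root(labels[i]);
    return next - 1;
}
```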
In the following image, the pliers are especially difficult to segment because of the high specularity of the metal part. When binarizing the original image, the metal's intensity was very close to the background intensity. This proved extremely difficult in combination images 6 and 8. By manually selecting the threshold and its bandwidth through trial and error, I was able to segment the pliers.
Feature Extraction
Here is the list of features that I decided to store in my object database:
The average RGB and intensity values were calculated by summing over each pixel within the bounding box of the object and dividing by the object size.
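A minimal C sketch of that averaging step, assuming a packed 8-bit RGB layout and a per-pixel label map produced by the segmentation step:

```c
#include <stdint.h>

/* Average R, G, B and intensity over one object's pixels, scanning only
 * its bounding box [x0..x1] x [y0..y1] and dividing by the object's
 * pixel count. The packed-RGB layout is an assumption of the sketch. */
void average_color(const uint8_t *rgb, const int *labels, int w,
                   int x0, int y0, int x1, int y1, int id,
                   double out[4] /* R, G, B, intensity */)
{
    long r = 0, g = 0, b = 0, n = 0;
    for (int y = y0; y <= y1; y++)
        for (int x = x0; x <= x1; x++)
            if (labels[y * w + x] == id) {       /* object pixels only */
                const uint8_t *p = rgb + 3 * (y * w + x);
                r += p[0]; g += p[1]; b += p[2]; n++;
            }
    if (n == 0) { out[0] = out[1] = out[2] = out[3] = 0.0; return; }
    out[0] = (double)r / n;
    out[1] = (double)g / n;
    out[2] = (double)b / n;
    out[3] = (out[0] + out[1] + out[2]) / 3.0;   /* simple mean intensity */
}
```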
The centroidal axes were found by first calculating the 2x2 covariance matrix of each object's pixel coordinates. The major centroidal axis corresponds to the eigenvector with the largest eigenvalue. The orientation of the object along its centroidal axes is the inverse tangent of (x / y) of that eigenvector.
After the centroidal axes are found, the second moment about an axis is the summation, over all pixels of the object, of the squared perpendicular distance from the pixel to that axis: E = sum of d_i^2, where d_i is the perpendicular distance from pixel i to the axis.
I found the ratio of the moments about the long and short axes to be one of the most informative features because it is rotation-, translation-, and scale-invariant. Here are the feature lists for each combination image.
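Here is a minimal C sketch of these axis features. It uses the closed form theta = 0.5 * atan2(2*Cxy, Cxx - Cyy) for the major-axis angle, which for a symmetric 2x2 covariance matrix is equivalent to taking the eigenvector with the largest eigenvalue; the pixel-coordinate-array interface is an assumption of the sketch.

```c
#include <math.h>

typedef struct { double theta, moment_ratio; } AxisFeatures;

/* Centroid, major-axis orientation, and the ratio of the second moments
 * about the two centroidal axes, from one object's pixel coordinates. */
AxisFeatures axis_features(const int *xs, const int *ys, int n)
{
    double mx = 0, my = 0;
    for (int i = 0; i < n; i++) { mx += xs[i]; my += ys[i]; }
    mx /= n; my /= n;

    /* 2x2 covariance (second central moments) of the pixel coordinates. */
    double cxx = 0, cyy = 0, cxy = 0;
    for (int i = 0; i < n; i++) {
        double dx = xs[i] - mx, dy = ys[i] - my;
        cxx += dx * dx; cyy += dy * dy; cxy += dx * dy;
    }

    AxisFeatures f;
    f.theta = 0.5 * atan2(2.0 * cxy, cxx - cyy);  /* major-axis orientation */

    /* Second moment about an axis = sum of squared perpendicular
     * distances from each pixel to that axis. */
    double c = cos(f.theta), s = sin(f.theta);
    double m_major = 0, m_minor = 0;
    for (int i = 0; i < n; i++) {
        double dx = xs[i] - mx, dy = ys[i] - my;
        double along = dx * c + dy * s;     /* distance along major axis  */
        double perp  = -dx * s + dy * c;    /* distance to major axis     */
        m_major += perp * perp;             /* moment about the long axis  */
        m_minor += along * along;           /* moment about the short axis */
    }
    f.moment_ratio = (m_minor > 0) ? m_major / m_minor : 0.0;
    return f;
}
```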
Building Object Models
Object models are stored as an array of values for the features above. This information is then written to a text file for the next step, object recognition. To build a database of object models, the image used for building models is segmented and its features are computed. The user is then prompted to identify each object given its location in the reference image. The program (build_db.c) does this and allows the user to build a new database or append the new models to an existing database.
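A minimal sketch of the append step, assuming a one-model-per-line text format; the actual field order and formatting in build_db.c may differ.

```c
#include <stdio.h>

/* Append one object model (name plus feature vector) to a plain-text
 * database, one model per line. Opening in "a" mode creates the file
 * if it does not exist, so the same call serves new and existing DBs. */
int append_model(const char *path, const char *name,
                 const double *features, int nfeatures)
{
    FILE *fp = fopen(path, "a");
    if (!fp) return -1;
    fprintf(fp, "%s", name);
    for (int i = 0; i < nfeatures; i++)
        fprintf(fp, " %.6f", features[i]);
    fputc('\n', fp);
    return fclose(fp);
}
```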
Object Matching
My object matching scheme is a simple decision tree of features (a sketch in C follows the diagram):
intensity -> average R value -> average G value -> average B value -> % centroid-centered bounding box filled -> image size -> identify object
object orientation -> % bounding box filled -> image size -> identify object
centroidal axes ratio -> image size -> identify object
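Here is a minimal C sketch of this kind of threshold-based matching. It collapses the tree above into sequential tolerance tests against each stored model, with looser tolerances for the coarse features near the root; the field names and tolerance values are illustrative assumptions, not the ones actually used.

```c
#include <math.h>

/* One stored model; the fields mirror the features described above. */
typedef struct {
    char   name[32];
    double intensity, avg_r, avg_g, avg_b;
    double bbox_fill, axes_ratio, size;
} Model;

static int within(double a, double b, double tol) { return fabs(a - b) <= tol; }

/* Return the name of the first model whose features all fall within
 * tolerance of the query object's, or "NO MATCH" if none does. */
const char *match_object(const Model *obj, const Model *db, int n)
{
    for (int i = 0; i < n; i++) {
        const Model *m = &db[i];
        if (!within(obj->intensity,  m->intensity,  30.0)) continue; /* loose */
        if (!within(obj->avg_r,      m->avg_r,      20.0)) continue;
        if (!within(obj->avg_g,      m->avg_g,      20.0)) continue;
        if (!within(obj->avg_b,      m->avg_b,      20.0)) continue;
        if (!within(obj->bbox_fill,  m->bbox_fill,  0.15)) continue;
        if (!within(obj->axes_ratio, m->axes_ratio, 0.5))  continue;
        if (!within(obj->size, m->size, 0.25 * m->size))   continue; /* tight */
        return m->name;
    }
    return "NO MATCH";
}
```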
I chose looser thresholds at the top of the decision tree to account for the significant variance in lighting at the time each image was taken.
The Matching Process
Each example shows the pipeline stages: original image, binarized image, segmented image, bounded image, and aligned image.
Recognition Results as a Confusion Matrix
Object models were built from combination image 9. The results come from comparison against the other 8 combination images. Actual objects are on the vertical axis; labels from the recognition system are on the horizontal:
| Actual \ Label | PLIERS | CONE | BLACK-DISK | ORANGE-DISK | STOP-SIGN | DISK-BOX | VELVET | ENVELOPE | OVAL-OBJECT | CLIP | PEN | NO MATCH |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PLIERS | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| CONE | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| BLACK-DISK | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ORANGE-DISK | 0 | 0 | 0 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| STOP-SIGN | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| DISK-BOX | 0 | 0 | 0 | 0 | 0 | 6 | 0 | 0 | 0 | 0 | 0 | 0 |
| VELVET | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 |
| ENVELOPE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 |
| OVAL-OBJECT | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 1 | 0 |
| CLIP | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 |
| PEN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 |
| OBJECT | % CORRECTLY IDENTIFIED |
| --- | --- |
PLIERS | 100 |
CONE | 100 |
BLACK-DISK | 100 |
ORANGE-DISK | 100 |
STOP-SIGN | 100 |
DISK-BOX | 100 |
VELVET | 100 |
ENVELOPE | 100 |
OVAL-OBJECT | 40 |
CLIP | 0 |
PEN | 100 |