Auto Binary Thresholding Algorithm
The thresholding algorithm I used is rather simple. It is based on a few simple assumptions about the image.
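For illustration, here is a minimal C sketch of one common automatic threshold rule, the iterative mean split; this shows the general idea of a simple auto-threshold, not necessarily the exact rule used here.

```c
#include <stdint.h>
#include <stddef.h>

/* Iterative mean-split threshold: start from the global mean, split the
 * pixels into two groups, and move the threshold to the midpoint of the
 * two group means until it stops changing. A sketch, not the exact rule. */
int auto_threshold(const uint8_t *gray, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) sum += gray[i];
    int t = (int)(sum / (long)n);          /* initial guess: global mean */

    for (;;) {
        long lo_sum = 0, hi_sum = 0;
        size_t lo_n = 0, hi_n = 0;
        for (size_t i = 0; i < n; i++) {
            if (gray[i] < t) { lo_sum += gray[i]; lo_n++; }
            else             { hi_sum += gray[i]; hi_n++; }
        }
        if (lo_n == 0 || hi_n == 0) break; /* degenerate: one group empty */
        int next = (int)((lo_sum / (long)lo_n + hi_sum / (long)hi_n) / 2);
        if (next == t) break;              /* converged */
        t = next;
    }
    return t;   /* pixels >= t become foreground, the rest background */
}
```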
Additional Image Processing Before Binarizing the Image
I applied a 7x7 Gaussian filter before binarizing the image in the hope of smoothing out jagged edges. In some cases the result is slightly better, but not significantly so. The filtered color image tends to show spotting, and the filter also significantly lowers the background intensity. I found that the Gaussian filter works much better on grayscale images. Here is an example of the filter applied to a grayscale image I found on the web.
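For reference, here is a minimal C sketch of this kind of 7x7 Gaussian smoothing on a grayscale image; the sigma parameter and the clamped border handling are assumptions of the sketch.

```c
#include <math.h>
#include <stdint.h>

/* Smooth a grayscale image with a 7x7 Gaussian kernel built from sigma.
 * Border pixels are handled by clamping coordinates to the image edge. */
void gaussian7x7(const uint8_t *src, uint8_t *dst, int w, int h, double sigma)
{
    double k[7][7], ksum = 0.0;
    for (int y = 0; y < 7; y++)
        for (int x = 0; x < 7; x++) {
            double dx = x - 3, dy = y - 3;
            k[y][x] = exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma));
            ksum += k[y][x];
        }

    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            double acc = 0.0;
            for (int j = -3; j <= 3; j++)
                for (int i = -3; i <= 3; i++) {
                    int sx = x + i, sy = y + j;
                    if (sx < 0) sx = 0;          /* clamp to the border */
                    if (sx >= w) sx = w - 1;
                    if (sy < 0) sy = 0;
                    if (sy >= h) sy = h - 1;
                    acc += k[j + 3][i + 3] * src[sy * w + sx];
                }
            dst[y * w + x] = (uint8_t)(acc / ksum + 0.5);  /* normalize */
        }
}
```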
2-Pass Segmentation Algorithm
The 2-pass segmentation algorithm makes only two passes through the image, so each pixel is visited exactly twice. During the first pass, the region ID is initialized to 0. At each foreground pixel, the algorithm looks at its up and back (left) neighbors. If exactly one neighbor is connected, the pixel takes that neighbor's region ID. If both neighbors are connected and share the same region ID, the pixel takes that ID as well; if their IDs differ, an entry for the two IDs is added to the equivalency table and the pixel takes the lower of the two. If the pixel is connected to neither neighbor, the region ID is incremented and assigned to the pixel. This is repeated once for every pixel. During the second pass, region IDs are relabeled according to the entries in the equivalency table, as sketched below.
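Here is a minimal C sketch of the two-pass scheme, using a union-find structure as the equivalency table; the flat image layout and the fixed label capacity are assumptions of the sketch.

```c
#include <stdint.h>

#define MAX_LABELS 10000   /* assumes fewer provisional labels than this */

static int parent[MAX_LABELS];   /* equivalency table as union-find */

static int find_root(int a)
{
    while (parent[a] != a) a = parent[a] = parent[parent[a]];
    return a;
}

static void note_equivalence(int a, int b)
{
    a = find_root(a); b = find_root(b);
    if (a < b) parent[b] = a;        /* keep the lower ID as the root */
    else if (b < a) parent[a] = b;
}

/* Two-pass 4-connected labeling of a binary image (0 = background).
 * labels[] receives a region ID per pixel; returns the label count. */
int label_regions(const uint8_t *bin, int *labels, int w, int h)
{
    int next = 1;
    parent[0] = 0;

    /* Pass 1: look only at the "up" and "back" (left) neighbors. */
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            int i = y * w + x;
            if (!bin[i]) { labels[i] = 0; continue; }
            int up   = (y > 0 && bin[i - w]) ? labels[i - w] : 0;
            int back = (x > 0 && bin[i - 1]) ? labels[i - 1] : 0;
            if (up && back) {
                labels[i] = up < back ? up : back;   /* take the lower ID */
                if (up != back) note_equivalence(up, back);
            } else if (up || back) {
                labels[i] = up ? up : back;          /* single connection */
            } else {
                labels[i] = next;                    /* new region */
                parent[next] = next;
                next++;
            }
        }

    /* Pass 2: relabel each pixel with its equivalence-class root. */
    for (int i = 0; i < w * h; i++)
        labels[i] = find_root(labels[i]);
    return next - 1;
}
```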
In the following image, the pliers are especially difficult to segment because of the high specularity of the metal part. When binarizing the original image, the metal's intensity was very close to the background intensity. This proved extremely difficult in combination images 6 and 8. By manually selecting the threshold and its bandwidth through trial and error, I was able to segment the pliers.
Feature Extraction
Here is the list of features that I decided to store in my object database:
The average RGB and intensity values were calculated by summing over each pixel within the bounding box of the object and dividing by the object size.
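A minimal C sketch of that averaging step, assuming a packed 8-bit RGB layout and a per-pixel label map produced by the segmentation step:

```c
#include <stdint.h>

/* Average R, G, B and intensity over one object's pixels, scanning only
 * its bounding box [x0..x1] x [y0..y1] and dividing by the object's
 * pixel count. The packed-RGB layout is an assumption of the sketch. */
void average_color(const uint8_t *rgb, const int *labels, int w,
                   int x0, int y0, int x1, int y1, int id,
                   double out[4] /* R, G, B, intensity */)
{
    long r = 0, g = 0, b = 0, n = 0;
    for (int y = y0; y <= y1; y++)
        for (int x = x0; x <= x1; x++)
            if (labels[y * w + x] == id) {       /* object pixels only */
                const uint8_t *p = rgb + 3 * (y * w + x);
                r += p[0]; g += p[1]; b += p[2]; n++;
            }
    if (n == 0) { out[0] = out[1] = out[2] = out[3] = 0.0; return; }
    out[0] = (double)r / n;
    out[1] = (double)g / n;
    out[2] = (double)b / n;
    out[3] = (out[0] + out[1] + out[2]) / 3.0;   /* simple mean intensity */
}
```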
The centroidal axes were found by first calculating the 2x2 covariance matrix of each object's pixel coordinates. The major centroidal axis corresponds to the eigenvector with the largest eigenvalue. The orientation of the object along its centroidal axes is the inverse tangent of (x / y) of that eigenvector.
After the centroidal axes are found, the second moment about an axis is the summation, over all pixels of the object, of the squared perpendicular distance from the pixel to that axis: E = sum of d_i^2, where d_i is the perpendicular distance from pixel i to the axis.
I found the ratio of the moments about the long and short axes to be one of the most informative features because it is rotation-, translation-, and scale-invariant. Here are the feature lists for each combination image.
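Here is a minimal C sketch of these axis features. It uses the closed form theta = 0.5 * atan2(2*Cxy, Cxx - Cyy) for the major-axis angle, which for a symmetric 2x2 covariance matrix is equivalent to taking the eigenvector with the largest eigenvalue; the pixel-coordinate-array interface is an assumption of the sketch.

```c
#include <math.h>

typedef struct { double theta, moment_ratio; } AxisFeatures;

/* Centroid, major-axis orientation, and the ratio of the second moments
 * about the two centroidal axes, from one object's pixel coordinates. */
AxisFeatures axis_features(const int *xs, const int *ys, int n)
{
    double mx = 0, my = 0;
    for (int i = 0; i < n; i++) { mx += xs[i]; my += ys[i]; }
    mx /= n; my /= n;

    /* 2x2 covariance (second central moments) of the pixel coordinates. */
    double cxx = 0, cyy = 0, cxy = 0;
    for (int i = 0; i < n; i++) {
        double dx = xs[i] - mx, dy = ys[i] - my;
        cxx += dx * dx; cyy += dy * dy; cxy += dx * dy;
    }

    AxisFeatures f;
    f.theta = 0.5 * atan2(2.0 * cxy, cxx - cyy);  /* major-axis orientation */

    /* Second moment about an axis = sum of squared perpendicular
     * distances from each pixel to that axis. */
    double c = cos(f.theta), s = sin(f.theta);
    double m_major = 0, m_minor = 0;
    for (int i = 0; i < n; i++) {
        double dx = xs[i] - mx, dy = ys[i] - my;
        double along = dx * c + dy * s;     /* distance along major axis  */
        double perp  = -dx * s + dy * c;    /* distance to major axis     */
        m_major += perp * perp;             /* moment about the long axis  */
        m_minor += along * along;           /* moment about the short axis */
    }
    f.moment_ratio = (m_minor > 0) ? m_major / m_minor : 0.0;
    return f;
}
```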
Building Object Models
Object models are stored as an array of values for the features above. This information is then written to a text file for the next step, object recognition. To build a database of object models, the image used for building models is segmented and its features are computed. The user is then prompted to identify each object given its location in the reference image. The program (build_db.c) does this and allows the user to build a new database or append the new models to an existing database.
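A minimal sketch of the append step, assuming a one-model-per-line text format; the actual field order and formatting in build_db.c may differ.

```c
#include <stdio.h>

/* Append one object model (name plus feature vector) to a plain-text
 * database, one model per line. Opening in "a" mode creates the file
 * if it does not exist, so the same call serves new and existing DBs. */
int append_model(const char *path, const char *name,
                 const double *features, int nfeatures)
{
    FILE *fp = fopen(path, "a");
    if (!fp) return -1;
    fprintf(fp, "%s", name);
    for (int i = 0; i < nfeatures; i++)
        fprintf(fp, " %.6f", features[i]);
    fputc('\n', fp);
    return fclose(fp);
}
```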
Object Matching
My object matching scheme is a simple decision tree of features (a sketch in C follows the diagram):
intensity -> average R value -> average G value -> average B value -> % centroid-centered bounding box filled -> image size -> identify object
object orientation -> % bounding box filled -> image size -> identify object
centroidal axes ratio -> image size -> identify object
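Here is a minimal C sketch of this kind of threshold-based matching. It collapses the tree above into sequential tolerance tests against each stored model, with looser tolerances for the coarse features near the root; the field names and tolerance values are illustrative assumptions, not the ones actually used.

```c
#include <math.h>

/* One stored model; the fields mirror the features described above. */
typedef struct {
    char   name[32];
    double intensity, avg_r, avg_g, avg_b;
    double bbox_fill, axes_ratio, size;
} Model;

static int within(double a, double b, double tol) { return fabs(a - b) <= tol; }

/* Return the name of the first model whose features all fall within
 * tolerance of the query object's, or "NO MATCH" if none does. */
const char *match_object(const Model *obj, const Model *db, int n)
{
    for (int i = 0; i < n; i++) {
        const Model *m = &db[i];
        if (!within(obj->intensity,  m->intensity,  30.0)) continue; /* loose */
        if (!within(obj->avg_r,      m->avg_r,      20.0)) continue;
        if (!within(obj->avg_g,      m->avg_g,      20.0)) continue;
        if (!within(obj->avg_b,      m->avg_b,      20.0)) continue;
        if (!within(obj->bbox_fill,  m->bbox_fill,  0.15)) continue;
        if (!within(obj->axes_ratio, m->axes_ratio, 0.5))  continue;
        if (!within(obj->size, m->size, 0.25 * m->size))   continue; /* tight */
        return m->name;
    }
    return "NO MATCH";
}
```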
I chose looser thresholds at the top of the decision tree to account for the significant variance in lighting at the time each image was taken.
The Matching Process
Each example shows the pipeline stages: original image, binarized image, segmented image, bounded image, and aligned image.
Recognition Results as a Confusion Matrix
Object models were built from combination image 9. The results come from comparison against the other 8 combination images. Actual objects are on the vertical axis; labels from the recognition system are on the horizontal:
| Actual \ Label | PLIERS | CONE | BLACK-DISK | ORANGE-DISK | STOP-SIGN | DISK-BOX | VELVET | ENVELOPE | OVAL-OBJECT | CLIP | PEN | NO MATCH |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PLIERS | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| CONE | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| BLACK-DISK | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ORANGE-DISK | 0 | 0 | 0 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| STOP-SIGN | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| DISK-BOX | 0 | 0 | 0 | 0 | 0 | 6 | 0 | 0 | 0 | 0 | 0 | 0 |
| VELVET | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 |
| ENVELOPE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 |
| OVAL-OBJECT | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 1 | 0 |
| CLIP | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 |
| PEN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 |
| OBJECT | % CORRECTLY IDENTIFIED |
| --- | --- |
PLIERS | 100 |
CONE | 100 |
BLACK-DISK | 100 |
ORANGE-DISK | 100 |
STOP-SIGN | 100 |
DISK-BOX | 100 |
VELVET | 100 |
ENVELOPE | 100 |
OVAL-OBJECT | 40 |
CLIP | 0 |
PEN | 100 |