Detecting movement
Movement detection was achieved by comparing pixel intensities between a reference background frame and
the current input frame. If the absolute intensity difference at a pixel is less than some threshold,
that pixel is flagged as background. The threshold I used here is eight intensity levels. To filter out
some of the flickering due to artificial lighting, the current frame is passed through one more time, in
which each pixel looks at its right and bottom neighbors. If both of these neighbors are part of the
background, then the current pixel is set to background as well.
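The two passes above can be sketched as follows. The threshold of 8 and the right/bottom-neighbor rule come from the text; representing a grayscale frame as a list of lists of ints, and the function name `detect_movement`, are my own illustrative choices.

```python
# Frame differencing against a reference background, then a second pass
# that suppresses single-pixel flicker from artificial lighting.

THRESHOLD = 8  # intensity-difference threshold from the text

def detect_movement(background, frame):
    """Return a boolean mask: True = movement (foreground), False = background."""
    rows, cols = len(frame), len(frame[0])
    # Pass 1: pixels whose difference from the reference is below the
    # threshold are background (False); the rest are foreground (True).
    mask = [[abs(frame[y][x] - background[y][x]) >= THRESHOLD
             for x in range(cols)] for y in range(rows)]
    # Pass 2: if a foreground pixel's right AND bottom neighbors are both
    # background, relabel it as background (filters isolated flicker).
    for y in range(rows - 1):
        for x in range(cols - 1):
            if mask[y][x] and not mask[y][x + 1] and not mask[y + 1][x]:
                mask[y][x] = False
    return mask
```

Note that a solid blob of movement survives the second pass, since its interior pixels have foreground neighbors; only isolated flickering pixels are removed.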
Reference background.
Background flickers due to lighting.
After filtering flickers.
Showing only movement when someone walks in.
Updating reference background
The background reference image is updated after every twenty consecutive similar frames. Two consecutive frames are
considered similar if the number of pixels that are similar in intensity (absolute intensity difference
less than 8) is greater than or equal to 85% of the total number of pixels in a frame.
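A minimal sketch of this update rule, using the 8-level similarity threshold, the 85% fraction, and the 20-frame streak from the text. The `BackgroundUpdater` class and its method names are illustrative, not the project's actual code.

```python
SIM_THRESHOLD = 8     # per-pixel intensity-difference bound
SIM_FRACTION = 0.85   # fraction of pixels that must be similar
FRAMES_NEEDED = 20    # consecutive similar frames before updating

def frames_similar(a, b):
    """True if >= 85% of pixels differ by less than 8 intensity levels."""
    total = len(a) * len(a[0])
    alike = sum(1 for ra, rb in zip(a, b)
                for pa, pb in zip(ra, rb)
                if abs(pa - pb) < SIM_THRESHOLD)
    return alike >= SIM_FRACTION * total

class BackgroundUpdater:
    def __init__(self, reference):
        self.reference = reference
        self.prev = None
        self.streak = 0

    def feed(self, frame):
        """Feed one frame; adopt it as the new reference after a long
        enough run of mutually similar frames."""
        if self.prev is not None and frames_similar(self.prev, frame):
            self.streak += 1
        else:
            self.streak = 0
        if self.streak >= FRAMES_NEEDED:
            self.reference = frame
            self.streak = 0
        self.prev = frame
        return self.reference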
Detecting skin tone
Skin tone detection was done in chromaticity space. I took sample values from people of various skin tones
and calculated the chromaticity of the R and G bands. These values show little variation between
people of different skin tones. The processing stage first checks the background flag of the current pixel.
If it is not background, the pixel is thresholded by chromaticity. The thresholds I used for R are (0.35 <= R <= 0.55), and
for G are (0.28 <= G <= 0.35). Pixels outside of these thresholds are set to background.
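The thresholding step can be sketched as below. The bounds are the ones given above; the normalization r = R/(R+G+B), g = G/(R+G+B) is my assumption about how the chromaticity was computed, and `is_skin` is an illustrative name.

```python
def is_skin(pixel):
    """Classify an (R, G, B) pixel as skin by chromaticity thresholds."""
    r8, g8, b8 = pixel
    total = r8 + g8 + b8
    if total == 0:
        return False  # avoid division by zero on pure black
    r = r8 / total  # R chromaticity (assumed normalization)
    g = g8 / total  # G chromaticity
    return 0.35 <= r <= 0.55 and 0.28 <= g <= 0.35
```

Working in chromaticity rather than raw RGB factors out overall brightness, which is why the same bounds hold across different skin tones.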
Before skin tone detection.
After skin tone detection.
Convert to binary image and segment
After the skin tone detection process, information about each frame is stored
in a binary image where background pixels are set to 0 and skin tone pixels are set to 255.
Using the 2-pass segmentation algorithm from lab 1 (2D object recognition), bounding boxes are
found for each cluster of skin tone pixels. The minimum number of pixels in a cluster to register
a bounding box is 300.
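A sketch of a 2-pass connected-component labeling with bounding boxes and the 300-pixel minimum from the text. The 4-connectivity and the union-find equivalence bookkeeping are my assumptions about the lab 1 algorithm's details.

```python
def segment(binary):
    """binary: 2D list of 0 (background) / 255 (skin). Returns a list of
    (min_x, min_y, max_x, max_y) boxes for clusters of >= 300 pixels."""
    rows, cols = len(binary), len(binary[0])
    labels = [[0] * cols for _ in range(rows)]
    parent = [0]  # union-find forest over provisional labels; index 0 unused

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    next_label = 1
    # Pass 1: assign provisional labels from left/up neighbors,
    # recording label equivalences when both neighbors are labeled.
    for y in range(rows):
        for x in range(cols):
            if binary[y][x] == 0:
                continue
            left = labels[y][x - 1] if x > 0 else 0
            up = labels[y - 1][x] if y > 0 else 0
            if left == 0 and up == 0:
                parent.append(next_label)
                labels[y][x] = next_label
                next_label += 1
            elif left and up:
                labels[y][x] = min(left, up)
                parent[find(max(left, up))] = find(min(left, up))
            else:
                labels[y][x] = left or up
    # Pass 2: resolve each label to its root and accumulate
    # bounding boxes and pixel counts per component.
    boxes = {}
    for y in range(rows):
        for x in range(cols):
            if labels[y][x]:
                root = find(labels[y][x])
                mnx, mny, mxx, mxy, n = boxes.get(root, (x, y, x, y, 0))
                boxes[root] = (min(mnx, x), min(mny, y),
                               max(mxx, x), max(mxy, y), n + 1)
    return [(a, b, c, d) for a, b, c, d, n in boxes.values() if n >= 300]
```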
Convert to binary image.
Found bounding boxes.
Multi-person detection (a.k.a. getting rid of everything except people's heads)
My approach to solving this problem is straightforward. For each bounding box resulting from
segmentation, I first threshold the percentage of the bounding box that is filled. A head should fill
most of its bounding box, so I set the threshold to be greater than 65%. If that condition is met,
I then threshold the width/height ratio of the bounding box. Arms placed alongside the body produce
tall, narrow boxes with a fairly small ratio. When elbows are bent at any angle, however, the enlarged
bounding box lowers the percentage of the box that is filled, so these cases generally fail the first
condition.
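The two tests can be sketched as follows. The 65% fill threshold is from the text; the text does not give the exact width/height bound, so `MIN_ASPECT` below is an assumed placeholder (the idea being that a tall, narrow arm box has a much smaller width/height ratio than a roughly square head box).

```python
FILL_THRESHOLD = 0.65  # fraction of box filled, from the text
MIN_ASPECT = 0.6       # assumed width/height floor; rejects narrow arm boxes

def is_head(box, pixel_count):
    """box = (min_x, min_y, max_x, max_y); pixel_count = skin pixels inside.
    Returns True if the box passes both the fill and aspect-ratio tests."""
    width = box[2] - box[0] + 1
    height = box[3] - box[1] + 1
    fill = pixel_count / (width * height)
    return fill > FILL_THRESHOLD and width / height >= MIN_ASPECT
```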
A few problems encountered with this approach stem from an assumption I made: people are either
wearing short sleeves or long sleeves, so that the skin tone clusters of the hands are too small to be
identified. Sleeve lengths in between narrow the gap between heads and arms on the two conditions mentioned.
Secondly, the approach increases false negatives, in which no bounding box meets the two conditions.
Results are mixed and difficult to quantify.
More screen shots
Screen shots with skin tone detection.
Filter out everything except head.
False negatives.
Screen shots with head detection.