Real-Time People Detection
Xiang Lan Zhuo

Abstract

The real-time people detection system is based on movement detection and skin tone segmentation. The input video sequence runs at about 30 frames per second. Movement in each frame is detected by comparing the current frame with a recorded background. The next step identifies possible skin tones within the moving regions of that frame, and a flag is set for each pixel identified as skin tone. The current frame is then converted to a binary image, and the 2-pass segmentation algorithm is used to locate a bounding box for each cluster of skin tone pixels. These algorithms reduce the frame rate to about 15 frames per second, which I found to be within tolerance for the real-time component of this lab.

Code:   util.h    util.c    display.c   

Detecting movement
Movement detection was achieved through intensity comparison between a reference background frame and the current input frame. If the absolute value of the intensity difference is less than some threshold, the flag for that pixel is set to denote background. The threshold I used here is eight intensity levels. To filter out some of the flickering due to artificial lighting, the current frame is passed through one more time, in which each pixel looks at its right and bottom neighbors. If these neighbors are part of the background, the current pixel is set to background as well.
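The two passes described above can be sketched roughly as follows. This is only an illustration, not the code from util.c: it assumes 8-bit grayscale intensity buffers stored row by row, the names mark_background and filter_flicker are hypothetical, and it assumes that both the right and the bottom neighbor must be background before a pixel is reset.

```c
#include <stdlib.h>

#define DIFF_THRESHOLD 8  /* intensity difference below this counts as background */

/* Pass 1: flag every pixel whose intensity is close to the reference background.
   is_bg[i] is 1 for background and 0 for movement. */
void mark_background(const unsigned char *bg, const unsigned char *cur,
                     unsigned char *is_bg, int w, int h)
{
    for (int i = 0; i < w * h; i++)
        is_bg[i] = abs((int)cur[i] - (int)bg[i]) < DIFF_THRESHOLD;
}

/* Pass 2: suppress flicker. A pixel becomes background when its right and
   bottom neighbors are background (assumption: both are required).
   Scanning in raster order means the neighbors read here have not been
   modified yet, so the pass can run in place. */
void filter_flicker(unsigned char *is_bg, int w, int h)
{
    for (int y = 0; y < h - 1; y++)
        for (int x = 0; x < w - 1; x++)
            if (is_bg[y * w + x + 1] && is_bg[(y + 1) * w + x])
                is_bg[y * w + x] = 1;
}
```

Note that under this reading the filter also removes the bottom-right corner pixel of a genuine moving region, which is harmless for regions as large as a person.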



Reference background.

Background flickers due to lighting.

After filtering flickers.

Showing only movement when someone walks in.


Updating reference background
The background reference image is updated after every twenty consecutive similar frames. Two consecutive frames are considered similar if the number of pixels that are similar in intensity (absolute intensity difference less than 8) is greater than or equal to 85% of the total number of pixels in a frame.
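A minimal sketch of this update rule, under some assumptions: frames are 8-bit grayscale buffers of n pixels, "twenty consecutive similar frames" is counted as twenty similar frame-to-frame comparisons in a row, and the most recent frame becomes the new reference. The names frames_similar and update_background are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define DIFF_THRESHOLD  8   /* per-pixel intensity tolerance */
#define SIMILAR_PERCENT 85  /* share of pixels that must be similar */
#define SIMILAR_RUN     20  /* consecutive similar frames before an update */

/* Returns 1 when at least 85% of the pixels differ by less than the threshold. */
int frames_similar(const unsigned char *a, const unsigned char *b, int n)
{
    long similar = 0;
    for (int i = 0; i < n; i++)
        if (abs((int)a[i] - (int)b[i]) < DIFF_THRESHOLD)
            similar++;
    return similar * 100 >= (long)n * SIMILAR_PERCENT;
}

/* Call once per frame with the previous and current frames. *run carries the
   length of the current streak between calls. Returns 1 when the reference
   background was replaced by the current frame. */
int update_background(unsigned char *bg, const unsigned char *prev,
                      const unsigned char *cur, int n, int *run)
{
    if (frames_similar(prev, cur, n))
        (*run)++;
    else
        *run = 0;               /* any dissimilar pair breaks the streak */

    if (*run >= SIMILAR_RUN) {
        memcpy(bg, cur, (size_t)n);
        *run = 0;
        return 1;
    }
    return 0;
}
```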


Detecting skin tone
Skin tone detection was done in chromaticity space. I took sample values from people of various skin tones and calculated the chromaticity of the R and G bands. These values seem to vary little between people of different skin tones. The processing stage first checks the background flag of the current pixel. If the pixel is not background, it is thresholded by chromaticity. The thresholds I used for R are (0.35 <= R <= 0.55), and for G are (0.28 <= G <= 0.35). Pixels outside these thresholds are set to background.
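The thresholding step might look something like the sketch below. It assumes the usual normalized chromaticity, r = R/(R+G+B) and g = G/(R+G+B), which the report does not spell out, and a hypothetical Rgb pixel struct. The function names are illustrative and are not taken from util.c.

```c
/* Hypothetical 8-bit RGB pixel layout. */
typedef struct { unsigned char r, g, b; } Rgb;

/* Returns 1 when the pixel's chromaticity falls inside the skin tone
   thresholds from the report: 0.35 <= r <= 0.55 and 0.28 <= g <= 0.35. */
int is_skin_tone(Rgb p)
{
    int sum = p.r + p.g + p.b;
    if (sum == 0)
        return 0;               /* black pixel: chromaticity is undefined */
    double r = (double)p.r / sum;
    double g = (double)p.g / sum;
    return r >= 0.35 && r <= 0.55 && g >= 0.28 && g <= 0.35;
}

/* Only pixels not already flagged as background are tested; those that
   fail the chromaticity test are set to background. */
void mark_skin(const Rgb *frame, unsigned char *is_bg, int n)
{
    for (int i = 0; i < n; i++)
        if (!is_bg[i] && !is_skin_tone(frame[i]))
            is_bg[i] = 1;
}
```

Dividing by the channel sum removes most of the dependence on brightness, which would be consistent with the observation that the values vary little across skin tones.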



Before skin tone detection.

After skin tone detection.


Convert to binary image and segment
After the skin tone detection process, information about each frame is stored in a binary image where background pixels are set to 0 and skin tone pixels are set to 255. Using the 2-pass segmentation algorithm from lab 1 (2D object recognition), bounding boxes are found for each cluster of skin tone pixels. The minimum number of pixels in a cluster to register a bounding box is 300.
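The lab 1 code is not reproduced here, so the following is only a generic sketch of two-pass connected-component labeling with bounding boxes. It assumes 4-connectivity and a small union-find table for label equivalences. The Box struct and the name find_boxes are hypothetical.

```c
#include <stdlib.h>

#define MIN_CLUSTER_PIXELS 300  /* smaller clusters get no bounding box */

typedef struct { int x0, y0, x1, y1, count; } Box;

/* Follow parent links to the representative label, shortening the path. */
static int find_root(int *parent, int i)
{
    while (parent[i] != i) {
        parent[i] = parent[parent[i]];
        i = parent[i];
    }
    return i;
}

/* img: binary image, 0 = background, 255 = skin tone.
   Writes up to max_boxes boxes to out and returns how many were written,
   or -1 if memory could not be allocated. */
int find_boxes(const unsigned char *img, int w, int h, Box *out, int max_boxes)
{
    if (w <= 0 || h <= 0)
        return 0;
    int n = w * h, next = 1, found = 0;
    int *label = calloc((size_t)n, sizeof *label);
    int *parent = malloc(((size_t)n + 1) * sizeof *parent);
    if (!label || !parent) { free(label); free(parent); return -1; }

    /* Pass 1: assign provisional labels and record equivalences. */
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            int i = y * w + x;
            if (!img[i]) continue;
            int left = x > 0 ? label[i - 1] : 0;
            int up = y > 0 ? label[i - w] : 0;
            if (!left && !up) {
                parent[next] = next;
                label[i] = next++;
            } else if (left && up) {
                int a = find_root(parent, left), b = find_root(parent, up);
                if (a < b) parent[b] = a; else parent[a] = b;
                label[i] = a < b ? a : b;
            } else {
                label[i] = left ? left : up;
            }
        }

    /* Pass 2: resolve each label to its root and grow that root's box. */
    Box *acc = calloc((size_t)next, sizeof *acc);
    if (!acc) { free(label); free(parent); return -1; }
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            int i = y * w + x;
            if (!label[i]) continue;
            Box *b = &acc[find_root(parent, label[i])];
            if (b->count == 0) { b->x0 = b->x1 = x; b->y0 = b->y1 = y; }
            if (x < b->x0) b->x0 = x;
            if (x > b->x1) b->x1 = x;
            if (y > b->y1) b->y1 = y;
            b->count++;
        }

    /* Keep only clusters large enough to register a bounding box. */
    for (int k = 1; k < next && found < max_boxes; k++)
        if (acc[k].count >= MIN_CLUSTER_PIXELS)
            out[found++] = acc[k];

    free(acc); free(parent); free(label);
    return found;
}
```

The pixel count is kept alongside each box because a later filtering step can use it to compute how much of the box is filled.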



Convert to binary image.

Found bounding boxes.


Multi-person detection (a.k.a. getting rid of everything except people's heads)
My approach to solving this problem is very straightforward. For each bounding box produced by segmentation, I first threshold the percentage of the bounding box that is filled. A head should fill most of its bounding box, so I require the filled fraction to be greater than 65%. If that condition is met, I then threshold the width/height ratio of the bounding box. Arms held alongside the body should have a fairly small ratio. When the elbows are bent at any angle, however, the bounding box grows and the percentage of the box filled drops, so these cases will generally fail the first condition.
The few problems I encountered with this approach stem from an assumption I made: people are wearing either short sleeves, or sleeves long enough that the skin tone cluster of the hand is too small to be identified. Anything in between narrows the difference between heads and arms under the two conditions mentioned. Secondly, the approach increases false negatives, in which none of the bounding boxes meets both conditions. Results are mixed and difficult to quantify.
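The two conditions could be expressed as a small predicate like the one below. This is a sketch under assumptions: the Box struct is hypothetical, and because the report does not state the width/height cutoff, it is left as a caller-supplied parameter (min_aspect) instead of a guessed constant.

```c
/* Hypothetical bounding box: inclusive corners plus the number of
   skin tone pixels inside the cluster. */
typedef struct { int x0, y0, x1, y1, count; } Box;

#define MIN_FILL 0.65  /* a head should fill more than 65% of its box */

/* Returns 1 when the box passes both conditions. min_aspect is the
   smallest width/height ratio accepted as a head; its value is not
   given in the report, so the caller has to choose it. */
int looks_like_head(Box b, double min_aspect)
{
    int bw = b.x1 - b.x0 + 1;
    int bh = b.y1 - b.y0 + 1;
    if (bw <= 0 || bh <= 0)
        return 0;

    /* Condition 1: fraction of the box covered by skin tone pixels. */
    double fill = (double)b.count / ((double)bw * bh);
    if (fill <= MIN_FILL)
        return 0;

    /* Condition 2: tall, narrow boxes (arms along the body) are rejected. */
    return (double)bw / bh >= min_aspect;
}
```

For example, with min_aspect set to 0.5 as an arbitrary illustration, a 20x25 box holding 400 skin pixels passes both tests, while a 10x60 box holding 500 pixels passes the fill test but fails on its ratio.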

More screen shots
Screen shots with skin tone detection.




Filter out everything except head.

False negatives.
Screen shots with head detection.