May 20, 2008

3D Web-Cam Motion Tracking with Papervision

There has been a smattering of small internet projects over the years using Flash to track two-dimensional movement via a web camera. What I am attempting here is tracking not only the X & Y axes but also the Z axis of an object via a web-cam, thus giving any user three dimensions of input.

When I recently discovered Papervision3D, I, like many others, was blown away by it. It’s a whole new world, a gold vein of Flash development just waiting to be tapped. It’s inspiring stuff... quite literally in this case, because it inspired me into thinking: what this cool 3D environment needs is 3D control. My thought was simply this: can a web-cam be used as a 3D input device?

This is the record of my first foray into making this idea a reality, or if not a reality, a proof of concept at the very least. I am glad to say that it does actually work, and that it is quite robust, working in a wide range of light conditions, colour variations and camera image qualities. It works, but it is by no means perfect, though I believe it is a good start.


So how does it work? Let’s say we are creating a tennis-style game. We want our user to be able to move the on-screen virtual tennis racket up, down, left & right, but also forward & back, which lets the user control the speed of the return hit.

The user would hold their own racket-like object that the tracking software is programmed to recognise. For this technique to work on as many computers around the world as possible, there needs to be a consistent method of input that eliminates as many variables as possible. This is where the paddle comes in: a paper disc with one half coloured green. The user prints it out on their home printer and holds it up to their web-cam; the software is optimised to identify this colour.

The game is poised to start: the user has their paddle in hand and the software begins tracking the paddle…

The tracking is done by examining the web-cam’s images, looking for the border edge & green area of our paddle. Obviously there can be other green objects in view of the web-cam (plants, painted walls, T-shirts), so part of the process of tracking is identifying and eliminating objects that are not the paddle. To do this we look for certain markers that differentiate our paddle from other green objects.

When a starting point for the paddle is identified, the size of the paddle then needs to be determined; this is the key to extracting a Z axis. The software looks for the perimeter of the green area (see the yellow squiggles in the picture right), literally looking one pixel at a time for the green pixels that make up the boundary and stopping when it finds the first pixel again. When this is completed we know the height, width, X & Y parameters of the target.
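The perimeter walk described above can be sketched roughly as follows. This is a Python illustration rather than the original Flex code, and it swaps in a textbook technique (Moore-neighbour boundary tracing) because the post does not spell out the exact stepping rule. The names find_start, trace_boundary and bounding_box are hypothetical, and the mask is assumed to be a grid of booleans marking the green pixels.

```python
# Neighbour offsets (dx, dy) in clockwise order, starting from west.
# Image coordinates: x grows to the right, y grows downwards.
OFFSETS = [(-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1), (0, 1), (-1, 1)]


def find_start(mask):
    """Return the first set pixel in raster order as (x, y), or None."""
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                return (x, y)
    return None


def trace_boundary(mask, start):
    """Walk the outline of the blob containing `start`, one pixel at a time.

    Stops when the walk arrives back at the first pixel. That simple stop
    rule is an assumption; it can end early on shapes with thin spurs.
    """
    height, width = len(mask), len(mask[0])

    def is_set(x, y):
        return 0 <= x < width and 0 <= y < height and bool(mask[y][x])

    boundary = [start]
    current = start
    # The pixel west of a raster-scan start is always background.
    backtrack = (start[0] - 1, start[1])
    for _ in range(4 * width * height):  # safety cap against endless loops
        k = OFFSETS.index((backtrack[0] - current[0], backtrack[1] - current[1]))
        for i in range(1, 9):
            dx, dy = OFFSETS[(k + i) % 8]
            candidate = (current[0] + dx, current[1] + dy)
            if is_set(*candidate):
                break
            backtrack = candidate  # remember the last background pixel seen
        else:
            return boundary  # isolated single pixel, nothing to walk
        current = candidate
        if current == start:
            return boundary
        boundary.append(current)
    return boundary


def bounding_box(boundary):
    """Return (x, y, width, height) of the traced outline."""
    xs = [p[0] for p in boundary]
    ys = [p[1] for p in boundary]
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)
```

Tracing only the outline keeps the per-frame work proportional to the paddle's perimeter rather than its area, which is presumably part of the appeal of the approach.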

The process is repeated continuously. As the user moves the paddle around, towards the web-cam and away, the paddle’s image scales in size, and that scaling is easily converted to a Z axis for use with Papervision. In the case of our tennis game we could hit the virtual ball harder by moving the paddle closer to the web-cam, which in turn moves the virtual tennis racket forward.
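One way the size-to-depth conversion might look is sketched below in Python. It assumes a simple pinhole-camera relationship (apparent size is inversely proportional to distance) and a one-off calibration measurement. The function name, the calibration values and the choice to use the larger of width and height are all assumptions for illustration, not taken from the original code.

```python
def estimate_z(width_px, height_px, reference_size_px, reference_z=1.0):
    """Estimate relative distance from the apparent size of the paddle.

    reference_size_px is the paddle's apparent size (in pixels) measured
    once at a known distance reference_z. Both are hypothetical
    calibration values.
    """
    # Use the larger dimension so a tilted paddle distorts the estimate less.
    apparent = max(width_px, height_px)
    if apparent <= 0:
        raise ValueError("paddle size must be positive")
    return reference_z * reference_size_px / apparent
```

A paddle that appears twice as large as it did at calibration comes out at half the reference distance, so the value can be scaled directly onto a Z coordinate in the 3D scene.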


Colour & brightness variation

The very first hurdle was how to handle the variety of lighting & colour conditions that would affect the web-cam’s image output. As well as the natural & room lighting, some web-cams come with software that automatically adjusts the colour & brightness settings of the web-cam images. Essentially, the image received in Flex from the camera could have a brightness anywhere from dark to washed-out light, and any colour shade or tint.

Example: I was wearing a red t-shirt one day when developing this program and for some reason the tracker was going nuts, picking up green objects everywhere. The problem was that the web-cam’s OEM software detected that the web-cam image contained too much red and overcompensated by upping the green. As a result everything in the web-cam image took on a shade of green, sending my motion tracking software crazy!

I got around these colour and brightness issues by coding a way to find what I call ‘truly green pixels’. These are pixels that are made up mostly of green, not just tinted green. As we all know, pixel colour is made up of red, green & blue values. For a pixel to be considered truly green (in this application) its green value must always be greater than both its red and blue values. As a happy coincidence this ‘true green’ idea also happened to be the best method for working with variable light conditions, because the test compares the RGB values to each other rather than to a fixed threshold: however bright or dark the pixel, green still comes out on top.

For example: a very washed-out, over-bright web-cam image may show the paddle as a green colour like this:

Light Green
With colour values of Red: 224, Green: 245 & Blue: 227.

A very dark, poorly lit web-cam image may have a green paddle colour like this:
Dark Green
With colour values of Red: 13, Green: 39 & Blue: 23.

The software can easily recognise that they are acceptable shades of green because both have green values that are greater than their red and blue values.

This is the general idea; the actual code uses a few more tricks to refine the process. As a result the tracking software can find the real shades of green it is looking for regardless of colour variation and light condition. In fact it will operate with a dimly lit web-cam image just as well as it operates with an almost washed-out, over-bright image. You could even flick a light switch on & off rapidly and the tracking will not be deterred.
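The basic rule, without the extra refinements, can be written down in a few lines. The sketch below is Python rather than Flex. The function names are made up for illustration, and the margin parameter is an assumed extra knob that the description above does not mention (its default of 0 gives the rule exactly as stated). Pixels are assumed to arrive as packed 0xRRGGBB integers, which is how Flash's BitmapData.getPixel reports them.

```python
def split_rgb(pixel):
    """Unpack a 0xRRGGBB integer into (red, green, blue)."""
    return (pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF


def is_truly_green(red, green, blue, margin=0):
    """True when green beats both red and blue by more than `margin`.

    margin is a hypothetical tuning value; 0 matches the rule in the text.
    """
    return green > red + margin and green > blue + margin


# The two example paddle colours from above, plus a neutral grey.
washed_out = (224, 245, 227)
poorly_lit = (13, 39, 23)
grey = (128, 128, 128)
```

Both example colours pass the test even though their brightness is wildly different, while the grey fails because its green value does not exceed the others.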

Other green objects

How can a computer tell the difference between the game paddle and a plant or a green t-shirt? A lot of thinking went into this one, let me assure you! The key was to examine the texture of these objects: plants are quite detailed & layered, shirts have wrinkles. The paddle has one property that makes it unlike the others: it has a flat, consistent surface & colour.

This means the paddle is likely to be the only flat area of colour in the web-cam image (unless the user has vivid green wall paint, let’s just hope they don’t!). So the solution to this hurdle is a process of eliminating areas of the camera image that don’t match the profile of our paddle.

We do this by looking for lines of unbroken colour in the entire web-cam image. Start from the top left, pixel 1 is compared to pixel 2 which is compared to pixel 3, and so on. If each one is very similar in colour to the last then it is likely to be our paddle. Conversely if the colour varies too much then the area is clearly not part of our paddle.

If a short column of pixels (8 – 12) is found to be consistent in colour and brightness then the area is temporarily stored to (perhaps) be used for the next step of finding the paddle’s perimeter.

Note: Within the code you will see these lines are referred to as blobRoots.
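A rough sketch of the consistency scan is shown below, in Python rather than Flex. It walks one column of pixels from top to bottom and records every run where each pixel is close in colour to the one before it. The function names, the way colour difference is measured (largest per-channel difference) and the tolerance of 12 are assumptions for illustration. Only the minimum run length of 8 pixels comes from the description above.

```python
def channel_distance(p, q):
    """Largest per-channel difference between two (r, g, b) pixels."""
    return max(abs(a - b) for a, b in zip(p, q))


def find_blob_roots(column, min_len=8, tolerance=12):
    """Return (start_index, length) for each run of consistent colour.

    column is a list of (r, g, b) tuples read top to bottom.
    tolerance is a hypothetical threshold for 'very similar in colour'.
    """
    roots = []
    run_start = 0
    for i in range(1, len(column) + 1):
        # A run ends at the end of the column or when the colour jumps.
        run_broken = (
            i == len(column)
            or channel_distance(column[i - 1], column[i]) > tolerance
        )
        if run_broken:
            length = i - run_start
            if length >= min_len:
                roots.append((run_start, length))
            run_start = i
    return roots
```

In practice this check would presumably be combined with the truly-green test, so that only flat runs of green are kept as candidates.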


Like the human brain at 3.30pm on any given work-day, some things just have their limitations. My web-cam 3D tracking project encountered some limitations too:

Camera Refresh Rate

My cheap (read: commonly available) camera only operates at 15 frames per second, whereas television runs at 25-30fps. I’m sure some cameras use a higher frame rate but I bet most don’t. All the programming for the software is designed for the most common or worst case scenario of 15fps.


The lag time is the time it takes for the camera to produce the image, have the code process it and render the result on screen. It is only about 1/15th of a second behind, but that is still enough to make the output feel out of sync & ‘washy’.


When the paddle is moved rapidly the web-cam image of the paddle is blurred, making it nigh on impossible to collect the boundary line of the paddle, and therefore we cannot retrieve the height and width.

What does this mean? It means fast-paced games like our tennis example might be impractical and require a great deal of interpolation to make the game play enjoyable. However, slower, more precise games are in: games that can make the most of an immersive technique of object manipulation. What about a virtual version of the classic board game Operation, using this system to guide virtual tweezers to pick up pieces of skeleton?
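One simple form the interpolation mentioned above could take is a linear blend between the two most recent tracked positions, so that a display running faster than the 15fps camera has in-between positions to draw. The Python sketch below is only an illustration of that idea; the function name and the (x, y, z) tuple layout are assumptions, not part of the original code.

```python
def lerp_position(previous, current, t):
    """Blend two (x, y, z) positions; t=0 gives previous, t=1 gives current."""
    return tuple(a + (b - a) * t for a, b in zip(previous, current))


# Two consecutive tracked samples, 1/15th of a second apart.
earlier = (0.0, 0.0, 0.0)
later = (10.0, 20.0, 30.0)

# Position to draw on a render frame that falls halfway between them.
halfway = lerp_position(earlier, later, 0.5)
```

Blending between the last two samples smooths the motion but adds up to one camera frame of extra delay on top of the existing lag, which is the trade-off to weigh for faster games.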

The future

The next step for me is to revisit the code used to find the perimeter of the paddle. It is a little too hit-n-miss and not very elegant, relying on many if statements & for loops. My next move may be to use the ‘lines of consistent colour’ (blobRoots) that identify the paddle texture and tie them together to find the mass of the paddle object, not just the boundary of it.

After I manage to do the impossible and make the tracking perfectly flawless (I will probably settle for ‘acceptably robust’) I will move on to the next level of craziness for this idea: detecting the rotation and pitch of the paddle! I should be able to find the rotation of the paddle by identifying the only straight line of green on the paddle (the main reason the green area is a half-moon shape). If that line can be discovered then I can measure the true height and width of the paddle, and comparing the two should yield information about the pitch.
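As a rough illustration of how comparing height and width could yield pitch: a flat shape tilted away from the camera appears foreshortened along one axis by the cosine of the tilt angle. The Python sketch below assumes that simplified model (no perspective distortion) and a known face-on height-to-width ratio for the green area. The function name and both of those assumptions are hypothetical, since none of this has been built yet.

```python
import math


def estimate_pitch_degrees(true_width, true_height, face_on_ratio):
    """Estimate tilt from how much the height-to-width ratio has shrunk.

    face_on_ratio is the height/width ratio seen when the paddle faces the
    camera squarely (a hypothetical calibration value). The result cannot
    tell a forward tilt from a backward one.
    """
    if true_width <= 0 or face_on_ratio <= 0:
        raise ValueError("width and face-on ratio must be positive")
    shrink = (true_height / true_width) / face_on_ratio
    # Clamp to the valid range so measurement noise cannot break acos.
    shrink = max(0.0, min(1.0, shrink))
    return math.degrees(math.acos(shrink))
```

For a green half-disc measured along its straight edge, the face-on ratio would be about 0.5, and a measured ratio of 0.25 would then suggest a tilt of roughly 60 degrees.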

So a game of 3D Tennis is still best left to the Wii console, but maybe you will find a use for this code in your own way. If you do, please be sure to let me know; I would love to see it.
