Benefits and challenges of using Kinect

Kinect is a motion sensing input device produced by Microsoft for Xbox and Windows PCs. It is a really simple device with a few interesting sensors.


Kinect has a 3D depth sensor, an RGB camera and a multi-array microphone. The depth resolution is 512×424, which is a really odd aspect ratio – I'll get to the challenges and limitations it caused a little bit later. The RGB resolution is 1920×1080 (16:9) at 30 FPS, so it works pretty well.

What we did with Kinect

We used Kinect on one of our largest projects – a set of interactive installations for the Sheikh Abdullah Al Salem Cultural Center in Kuwait, one of the largest museum complexes in the world. We were tasked with creating 33 different interactive applications, each connected to a different piece of interactive hardware: RFID readers, joysticks, Kinect devices, large multi-touch screens and more.

In the applications we developed for the museum, Kinect was used to detect people and the positions of their hands and feet. Three different applications had to use Kinect sensors for interaction, and they were all completely different, so we had to use completely different strategies to implement the interaction elements and the overall gameplay of each application.

Microsoft created a great SDK and good libraries for Kinect, and when Kinect is standing on a table and facing a person head on, it gives you an enormous amount of useful information, like the full body skeleton and hand locations in 3D space.

So it was just plug and play?

Well, not really. Our applications had some specific requirements, so we couldn't rely on the information from Kinect's API. You see, there is an interesting detail about the Microsoft Kinect API – it simply doesn't work if the Kinect device is hanging from the ceiling and facing the floor. To give you relevant info, the device must be positioned exactly as the manual describes – on a flat surface, facing a person. If the sensor angle is wrong, the API won't give you any data on a person's skeleton, hand or feet positions.

Due to these limitations, we had to implement our own hand/feet recognition algorithms.


The process of implementation

We used the depth sensor, with its resolution of 512×424, together with a projector with a resolution of 1920×1200 – both hung from the ceiling and facing the floor. Because of Kinect's odd depth frame resolution, it was challenging to work out which part of Kinect's depth projection contained the projector's image. You see, the Kinect was attached to the middle of the projector, but since the depth sensor is not located in the middle of the Kinect itself, the projected image wasn't in the middle of Kinect's depth projection on the floor. Because of this, we had to create a sort of configuration middleware which would detect where a person (or their hands/feet) was located within the interactive application.
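To make the mapping concrete, here is a minimal sketch of what such a configuration middleware might look like, assuming the projection area inside the depth frame is measured once during on-site calibration. All region values below are illustrative, not the project's actual numbers:

```python
# Hypothetical calibration: the projector's image occupies a sub-rectangle
# of the 512x424 depth frame, measured once during on-site setup.
DEPTH_W, DEPTH_H = 512, 424
PROJ_W, PROJ_H = 1920, 1200

# (x, y, width, height) of the projection area inside the depth frame --
# example values; in practice these come from calibration.
PROJ_REGION_IN_DEPTH = (60, 40, 400, 330)

def depth_to_projector(px, py, region=PROJ_REGION_IN_DEPTH):
    """Map a depth-frame pixel to projector coordinates, or None if the
    pixel falls outside the projected area."""
    rx, ry, rw, rh = region
    if not (rx <= px < rx + rw and ry <= py < ry + rh):
        return None  # outside the projection on the floor
    u = (px - rx) / rw          # normalized position inside the region
    v = (py - ry) / rh
    return (int(u * PROJ_W), int(v * PROJ_H))
```

With a mapping like this in place, a person detected anywhere in the depth frame can be translated straight into the projector's coordinate space – or ignored if they are standing outside the projection.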

Another challenging detail of using Kinect as an interactive device was the color and material of the floor. If the floor was too reflective, the depth sensor didn't work at all, so we had to keep that in mind when developing our applications and choosing an appropriate place for each of them in the museum.

Working on site

We worked on three different applications that used Kinect as an input device, so it made sense to create one hand/feet detection algorithm and reuse it in all three. Well, hand/feet detection is not the easiest thing to do when the applications differ this much, so we needed to find a way to implement hand/feet detection without literally detecting hands or feet.

Our first application was called Forest Floor, and for it we had to implement a feet detection algorithm. The idea behind the application was that when a person's foot touched the projection of leaves on the ground, those leaves would animate and move away from that exact spot. Since Kinect was on the ceiling facing the floor, detecting feet directly was very challenging. To solve that, we created a custom algorithm as a workaround: it detected the body and approximated the location of the feet.
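A minimal sketch of that idea, assuming a top-down depth frame in millimetres – anything noticeably closer to the sensor than the floor is treated as a body, and its silhouette centroid stands in for the feet. The distances are illustrative, and the real algorithm was certainly more involved:

```python
import numpy as np

FLOOR_MM = 3000      # hypothetical ceiling-to-floor distance
MIN_HEIGHT_MM = 200  # anything taller than 20 cm counts as a person

def approximate_feet(depth_frame):
    """Detect body pixels (closer to the sensor than the floor) and return
    the silhouette centroid as the approximate standing position."""
    body = depth_frame < (FLOOR_MM - MIN_HEIGHT_MM)
    if not body.any():
        return None  # nobody on the floor
    ys, xs = np.nonzero(body)
    return (int(xs.mean()), int(ys.mean()))
```

Viewed from directly above, the feet sit roughly under the body's silhouette, so the centroid is close enough to decide which leaves should scatter.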

The second application was called Precious Oil. It was projected onto a round table about 1 meter in height, and users interacted with it through hand gestures. Holding a hand over an oil drop grabbed it and made it possible to move it around by simply moving the hand. We simplified the problem with an algorithm that detects anything positioned above the table. Once that was done, the only thing left to do was to find the furthest point in that active matrix and tag it as the hand position.
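A sketch of that hover-and-furthest-point idea, assuming the sensor looks straight down at the table and the frame is in millimetres. The arm enters from an edge of the frame, so the active pixel that reaches furthest from the border is a reasonable stand-in for the hand; all distances below are hypothetical:

```python
import numpy as np

TABLE_MM = 2000  # hypothetical sensor-to-tabletop distance
HOVER_MM = 80    # anything 8+ cm above the table counts as "active"

def hand_position(depth_frame):
    """Mask everything above the tabletop, then take the active pixel that
    reaches furthest from the frame border as the hand position."""
    active = depth_frame < (TABLE_MM - HOVER_MM)
    if not active.any():
        return None  # nothing over the table
    h, w = depth_frame.shape
    ys, xs = np.nonzero(active)
    # Distance of each active pixel from the nearest frame edge; the arm
    # enters from an edge, so the extreme point approximates the hand.
    reach = np.minimum(np.minimum(xs, w - 1 - xs),
                       np.minimum(ys, h - 1 - ys))
    i = int(reach.argmax())
    return (int(xs[i]), int(ys[i]))
```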

The third application we created was called Sustainable Infrastructure – it consisted of large consoles for two players. Each console had a few buttons on it, and since the button positions were fixed, we could simplify the algorithm so it reacted to more than just hands. The idea was to save the button positions and match them against Kinect's depth frame matrix. Once we did that, anything that crossed over a button and stayed there for a short period of time triggered a click. The algorithm was very simple and worked like a charm.
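The dwell-to-click logic can be sketched roughly like this, assuming a baseline depth frame of the empty consoles and fixed button regions. The button names, positions and thresholds are all made up for illustration:

```python
import numpy as np

# Hypothetical button regions: name -> (x, y, radius) in depth-frame pixels.
BUTTONS = {"pump": (100, 150, 20), "valve": (300, 150, 20)}
PRESS_MM = 120     # depth change that counts as something over the button
DWELL_FRAMES = 15  # ~0.5 s at 30 FPS before a click fires

class ButtonWatcher:
    def __init__(self, baseline):
        self.baseline = baseline  # depth frame of the empty consoles
        self.dwell = {name: 0 for name in BUTTONS}

    def update(self, frame):
        """Feed one depth frame; return names of buttons clicked."""
        clicks = []
        for name, (x, y, r) in BUTTONS.items():
            patch = frame[y - r:y + r, x - r:x + r]
            base = self.baseline[y - r:y + r, x - r:x + r]
            # "Covered" if enough of the region is markedly closer
            # to the sensor than the empty-console baseline.
            covered = (base - patch > PRESS_MM).mean() > 0.3
            if covered:
                self.dwell[name] += 1
                if self.dwell[name] == DWELL_FRAMES:
                    clicks.append(name)  # fire once per dwell
            else:
                self.dwell[name] = 0
        return clicks
```

The dwell counter is what makes this robust: a sleeve brushing past a button for a frame or two resets to zero, while a hand held in place long enough fires exactly one click.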

Developing and debugging

Kinect devices work on Windows only, and we used Macs to develop those applications. Now, we could have easily switched to Windows, but there was another problem: the computers in the museum were hidden away, and we couldn't reach them or easily work on them. To get around that, we created a simple Node.js socket server, installed it on the Kinect computer and streamed the depth sensor data to the local network. Now we could develop the applications on any OS and in any language we wanted!
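The article doesn't show the actual protocol, but a minimal sketch of the wire format such a bridge might use – a small header followed by raw 16-bit depth values – could look like this (the header layout is an assumption, and since the clients could be written in any language, Python stands in here for the receiving side):

```python
import struct
import numpy as np

# Hypothetical framing: 12-byte little-endian header (width, height,
# payload length in bytes), then the raw uint16 depth values.
HEADER = struct.Struct("<III")

def encode_frame(depth):
    """Serialize one depth frame for streaming over a TCP socket."""
    h, w = depth.shape
    payload = depth.astype("<u2").tobytes()
    return HEADER.pack(w, h, len(payload)) + payload

def decode_frame(blob):
    """Reconstruct a depth frame from one received message."""
    w, h, n = HEADER.unpack_from(blob)
    depth = np.frombuffer(blob, dtype="<u2",
                          offset=HEADER.size, count=n // 2)
    return depth.reshape(h, w)
```

An explicit length field like this lets any client know exactly how many bytes to read per frame, regardless of how TCP chunks the stream.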

But how do we debug our code? Well, at first we just walked and waved our hands below the installed Kinect device every time we changed something in our code. Of course, we created as many unit tests as possible to avoid constantly losing time walking to and from the Kinect, but some changes had to be tested for real. After some time, we found out that Kinect Studio can record movement and play it back in a loop, and a recorded clip streams data exactly as if it were happening in real time. This simple solution enabled us to test all the changes, refactorings and upgrades from the comfort of our chairs.

Any bits of advice?

Now that we've wrapped it all up and all of the applications are live in the museum, here are a few bits of advice that can make implementing Kinect much easier.

  1. Use OpenCV
    • It’s a great open source library for detecting shapes like hands, feet or anything else
  2. Please use Kinect Studio when debugging or try to find something even better
    • Simplify debugging as much as possible with unit tests, integration tests, and other automated tests
  3. Don’t be afraid of thinking outside of the box
    • Sometimes a simpler solution can work much better – the idea is always to solve a specific problem, not to develop an almighty monster