Basic image capturing with Kinect

The Microsoft Kinect sensor is a powerful device that provides image capture, distance measurement, and body posture and facial expression recognition, which makes it suitable for a wide range of applications. In this introductory article I will show how to use it to capture different types of images.

Although Kinect was born as a sensor for the Xbox game console, there is a version with an adapter for Windows, so you can develop all kinds of Kinect-based applications on that platform. To do so, in addition to the sensor and the adapter, you need to download the Kinect SDK for Windows. Several versions of the SDK exist; the one I will use here is the latest at the time of writing, Kinect SDK version 2.0.

The code examples are written in C#, although you can also use C++ if you prefer. Fairly complete tutorials are available, for example this tutorial for Windows Store apps. The samples included in the Kinect SDK are WPF applications; in this article, for a change, the example is a Windows Forms application. From this link you can download the source code of the KinectImage solution, written in C# with Visual Studio 2015.

When creating a Kinect-based .NET project, always add a reference to the Microsoft.Kinect assembly, which you will find in the Extensions tab of the Add Reference dialog, and a using directive for the Microsoft.Kinect namespace.

The class that represents the sensor is KinectSensor, and the whole process starts with it. First, get an instance of it:

private KinectSensor _kSensor = null;
…
_kSensor = KinectSensor.GetDefault();

It is possible to connect several Kinect sensors to the same computer; GetDefault returns only the default one, but for now we will not complicate things. Using this instance, you can subscribe to the IsAvailableChanged event so that the sensor notifies you whenever its availability status changes:

_kSensor.IsAvailableChanged += new EventHandler<IsAvailableChangedEventArgs>(Sensor_IsAvailableChanged);

The next step is to start the sensor:

_kSensor.Open();

The sensor can provide various types of frames: Color, for color images; Infrared, for images captured by the infrared sensor, which let us see in the dark; Depth, which tells us how far away the objects are; and some others that I will leave for future articles. To read each of these frame types we can obtain a specific reader object, or we can obtain a multi-source reader that delivers synchronized frames of several types, which is what I use in my example:

private MultiSourceFrameReader _fReader = null;
…
_fReader = _kSensor.OpenMultiSourceFrameReader(
FrameSourceTypes.Color |
FrameSourceTypes.Depth |
FrameSourceTypes.Infrared);

In the call we specify the types of frames we want to receive. Then we must decide how to obtain them: we can request them whenever it suits us or, more commonly, subscribe to the MultiSourceFrameArrived event to be notified whenever new data is available:

_fReader.MultiSourceFrameArrived += new EventHandler<MultiSourceFrameArrivedEventArgs>(Image_FrameArrived);
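The reader is opened with several frame source types joined by the bitwise OR operator, which is how flags enumerations are combined in C#. As a minimal sketch of how such a combination is built and queried, the following code uses a hypothetical DemoSourceTypes enum; its values are illustrative and are not taken from the Kinect SDK.

```csharp
using System;

// Hypothetical stand-in for a flags enum such as FrameSourceTypes.
// The numeric values are illustrative only, not the SDK's values.
[Flags]
public enum DemoSourceTypes
{
    None = 0,
    Color = 1,
    Infrared = 2,
    Depth = 8,
}

public static class SourceFlags
{
    // Returns true when every flag in 'wanted' is also set in 'requested'.
    public static bool Includes(DemoSourceTypes requested, DemoSourceTypes wanted)
    {
        return (requested & wanted) == wanted;
    }
}
```

Each member occupies its own bit, so OR-ing them never loses information, and Enum.HasFlag performs the same test as Includes.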

In the event handler we extract the different types of images. Keep in mind that you will not always receive frames of every type, so you have to handle null values:

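private void Image_FrameArrived(object sender, MultiSourceFrameArrivedEventArgs e)
{
    // Skip this frame if the previous one is still being processed
    if (!_bProcessing)
    {
        try
        {
            _bProcessing = true;
            // AcquireFrame can return null if the frame is no longer available
            MultiSourceFrame frame = e.FrameReference.AcquireFrame();
            if (frame == null)
            {
                return;
            }
            // Each frame type may be missing (null); the Draw methods check for it
            using (ColorFrame cframe = frame.ColorFrameReference.AcquireFrame())
            {
                DrawColorImage(cframe);
            }
            using (InfraredFrame iframe = frame.InfraredFrameReference.AcquireFrame())
            {
                DrawInfraredImage(iframe);
            }
            using (DepthFrame dframe = frame.DepthFrameReference.AcquireFrame())
            {
                DrawDepthImage(dframe);
            }
        }
        catch
        {
            // Errors are ignored so that a single bad frame does not stop the capture
        }
        finally
        {
            // Repaint and process pending messages before accepting a new frame
            Refresh();
            Application.DoEvents();
            _bProcessing = false;
        }
    }
}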
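The handler wraps each AcquireFrame call in a using statement even though the result may be null. This is safe because C# only calls Dispose on a non-null resource; the drawing code still has to check for null itself. A small self-contained sketch, in which the FakeFrame type and the Acquire method are hypothetical stand-ins for the Kinect frame classes and AcquireFrame:

```csharp
using System;

// Hypothetical disposable type standing in for a frame object such as ColorFrame.
public sealed class FakeFrame : IDisposable
{
    public static int DisposeCount;

    public void Dispose()
    {
        DisposeCount++;
    }
}

public static class NullUsingDemo
{
    // Hypothetical stand-in for AcquireFrame(): returns null when no frame is available.
    public static FakeFrame Acquire(bool available)
    {
        return available ? new FakeFrame() : null;
    }

    // Returns true if a frame was "drawn". The using statement skips Dispose
    // for a null resource, so only the explicit null check is needed here.
    public static bool Process(bool available)
    {
        using (FakeFrame frame = Acquire(available))
        {
            if (frame == null)
            {
                return false;
            }
            return true;
        }
    }
}
```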

I use the _bProcessing Boolean variable to prevent calls from piling up, which can make the application appear to hang: a new frame is not processed until the previous one has been displayed.

The easiest image to show is the color image: simply copy the bytes of the image into a Bitmap and display it on screen:
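Here is a minimal sketch of the same skip-while-busy idea, detached from Windows Forms and the Kinect SDK; the FrameGate class and its members are hypothetical, not part of the sample.

```csharp
using System;

// Hypothetical helper reproducing the _bProcessing guard from Image_FrameArrived.
public sealed class FrameGate
{
    private bool _busy;

    public int Processed { get; private set; }
    public int Skipped { get; private set; }

    // Runs 'work' unless a previous call is still in progress,
    // in which case the frame is dropped instead of queued.
    public void OnFrame(Action work)
    {
        if (_busy)
        {
            Skipped++;
            return;
        }
        try
        {
            _busy = true;
            work();
            Processed++;
        }
        finally
        {
            _busy = false;
        }
    }
}
```

Calling gate.OnFrame(() => gate.OnFrame(() => { })) processes the outer call and counts the inner one as skipped. The inner call plays the role of a frame event that may be delivered while Application.DoEvents pumps messages inside the handler, before _bProcessing is reset.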

private void DrawColorImage(ColorFrame cframe)
{
    if (cframe != null)
    {
        Bitmap cbmp = null;
        BitmapData bd = null;
        try
        {
            int width = cframe.FrameDescription.Width;
            int height = cframe.FrameDescription.Height;
            // Wider picture box for 1920-pixel (16:9) color frames
            pbColor.Width = width == 1920 ? 853 : 640;
            cbmp = new Bitmap(width, height);
            bd = cbmp.LockBits(new Rectangle(0, 0, width, height),
                ImageLockMode.ReadWrite, PixelFormat.Format32bppArgb);
            // 4 bytes (B, G, R, A) per pixel
            byte[] image = new byte[4 * width * height];
            if (cframe.RawColorImageFormat == ColorImageFormat.Bgra)
            {
                cframe.CopyRawFrameDataToArray(image);
            }
            else
            {
                cframe.CopyConvertedFrameDataToArray(image, ColorImageFormat.Bgra);
            }
            Marshal.Copy(image, 0, bd.Scan0, image.Length);
            cbmp.UnlockBits(bd);
            bd = null;
            // Dispose of the previous bitmap so that old frames do not accumulate in memory
            Image old = pbColor.Image;
            pbColor.Image = cbmp;
            if (old != null)
            {
                old.Dispose();
            }
        }
        catch { }
        finally
        {
            if ((cbmp != null) && (bd != null))
            {
                cbmp.UnlockBits(bd);
            }
        }
    }
}

The appropriate format for these images is ColorImageFormat.Bgra. If the frame already arrives in that format, CopyRawFrameDataToArray makes a faster copy because it does not convert the data; otherwise CopyConvertedFrameDataToArray converts it to Bgra. Do not try to use the variants of these methods that copy the data directly to an IntPtr such as the Scan0 member of BitmapData, because an exception will be thrown (you would have to enable unsafe code, which is unnecessary here).
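Bgra is the right choice because a Format32bppArgb bitmap stores each pixel in memory as the bytes B, G, R, A on little-endian machines, which is the same order the Bgra buffer uses, so Marshal.Copy can move the bytes without reordering them. The following sketch (the BgraLayout class is hypothetical) computes the buffer size and shows how four BGRA bytes correspond to one 0xAARRGGBB integer:

```csharp
using System;

public static class BgraLayout
{
    // Bytes needed for a 32-bit frame: 4 bytes (B, G, R, A) per pixel.
    public static int BufferSize(int width, int height)
    {
        return 4 * width * height;
    }

    // Packs one pixel stored as B, G, R, A bytes into the 0xAARRGGBB
    // integer layout used by Format32bppArgb and Color.ToArgb().
    public static int ToArgb(byte b, byte g, byte r, byte a)
    {
        return (a << 24) | (r << 16) | (g << 8) | b;
    }
}
```

For a 1920x1080 color frame, BufferSize returns 8294400 bytes, and on a little-endian machine BitConverter.ToInt32 applied to the four bytes of a pixel gives the same value as ToArgb.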

The infrared image contains, for each pixel, a value between 0 and 65535, which we have to convert pixel by pixel into a color, in this case a shade of gray. I use the first frame only to find the minimum and maximum pixel values, so that the conversion spans the whole gray range and the image is as bright as possible:

private void DrawInfraredImage(InfraredFrame iframe)
{
    if (iframe != null)
    {
        Bitmap ibmp = null;
        BitmapData bd = null;
        try
        {
            int width = iframe.FrameDescription.Width;
            int height = iframe.FrameDescription.Height;
            ushort[] idata = new ushort[width * height];
            iframe.CopyFrameDataToArray(idata);
            // _maxIR, _minIR and _scIR are int fields (declarations not shown);
            // _maxIR must start negative and _minIR at 65535 or more
            if (_maxIR < 0)
            {
                // First frame: only compute the range of values, nothing is drawn
                for (int ix = 0; ix < idata.Length; ix++)
                {
                    _maxIR = Math.Max(_maxIR, idata[ix]);
                    _minIR = Math.Min(_minIR, idata[ix]);
                }
                // Avoid a division by zero if the first frame is completely flat
                _scIR = Math.Max(1, _maxIR - _minIR);
            }
            else
            {
                ibmp = new Bitmap(width, height);
                bd = ibmp.LockBits(new Rectangle(0, 0, width, height),
                    ImageLockMode.ReadWrite, PixelFormat.Format32bppArgb);
                int[] ipixels = new int[width * height];
                for (int ix = 0; ix < idata.Length; ix++)
                {
                    // Later frames can fall outside the calibrated range, and
                    // Color.FromArgb throws for values outside 0-255, so clamp
                    int pxint = (255 * (idata[ix] - _minIR)) / _scIR;
                    pxint = Math.Max(0, Math.Min(255, pxint));
                    ipixels[ix] = Color.FromArgb(pxint, pxint, pxint).ToArgb();
                }
                Marshal.Copy(ipixels, 0, bd.Scan0, ipixels.Length);
                ibmp.UnlockBits(bd);
                bd = null;
                // Dispose of the previous bitmap so that old frames do not accumulate in memory
                Image old = pbInfrared.Image;
                pbInfrared.Image = ibmp;
                if (old != null)
                {
                    old.Dispose();
                }
            }
        }
        catch { }
        finally
        {
            if ((ibmp != null) && (bd != null))
            {
                ibmp.UnlockBits(bd);
            }
        }
    }
}
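The calibration and scaling steps can be separated from the Bitmap handling and tried on their own. This sketch (the InfraredScaling class is hypothetical, not part of the sample) finds the range of one frame and maps raw values to 0-255, clamping values outside the range measured on the first frame; without the clamp, Color.FromArgb would throw and the empty catch would silently drop the whole frame.

```csharp
using System;

public static class InfraredScaling
{
    // Calibration step: minimum and maximum raw values of one frame.
    public static void FindRange(ushort[] data, out int min, out int max)
    {
        min = int.MaxValue;
        max = int.MinValue;
        foreach (ushort v in data)
        {
            if (v < min) min = v;
            if (v > max) max = v;
        }
    }

    // Maps a raw 16-bit value linearly to a 0-255 gray level, clamping
    // values that fall outside the calibrated [min, max] range.
    public static int ToGray(ushort value, int min, int max)
    {
        int range = Math.Max(1, max - min); // guard against a flat frame
        int gray = 255 * (value - min) / range;
        return Math.Max(0, Math.Min(255, gray));
    }
}
```

With a calibration frame whose values span 1000 to 5000, a raw value of 3000 maps to gray level 127 and anything above 5000 saturates at 255.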

The depth image contains, for each pixel, the distance to the sensor in millimeters. We can convert these distances into shades of gray, again pixel by pixel, and build a Bitmap with the result. In this case, the minimum and maximum distances come from the DepthMinReliableDistance and DepthMaxReliableDistance properties of the DepthFrame object; pixels outside that range are drawn in black:

private void DrawDepthImage(DepthFrame dframe)
{
    if (dframe != null)
    {
        Bitmap dbmp = null;
        BitmapData bd = null;
        try
        {
            int width = dframe.FrameDescription.Width;
            int height = dframe.FrameDescription.Height;
            int[] bdepth = new int[width * height];
            ushort[] bw = new ushort[width * height];
            dframe.CopyFrameDataToArray(bw);
            ushort dmax = dframe.DepthMaxReliableDistance;
            ushort dmin = dframe.DepthMinReliableDistance;
            int rec = dmax - dmin;
            for (int ix = 0; ix < bw.Length; ix++)
            {
                // Out-of-range distances are black; the factor is 255 (not 256) so that
                // dmax maps to 255, the largest value Color.FromArgb accepts
                int dpixel = (bw[ix] >= dmin && bw[ix] <= dmax) ? (255 * (bw[ix] - dmin)) / rec : 0;
                bdepth[ix] = Color.FromArgb(dpixel, dpixel, dpixel).ToArgb();
            }
            dbmp = new Bitmap(width, height);
            bd = dbmp.LockBits(new Rectangle(0, 0, width, height),
                ImageLockMode.ReadWrite, PixelFormat.Format32bppArgb);
            Marshal.Copy(bdepth, 0, bd.Scan0, bdepth.Length);
            dbmp.UnlockBits(bd);
            bd = null;
            // Dispose of the previous bitmap so that old frames do not accumulate in memory
            Image old = pbDepth.Image;
            pbDepth.Image = dbmp;
            if (old != null)
            {
                old.Dispose();
            }
        }
        catch { }
        finally
        {
            if ((dbmp != null) && (bd != null))
            {
                dbmp.UnlockBits(bd);
            }
        }
    }
}
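The per-pixel depth mapping can be checked in isolation. This is a minimal sketch; the DepthShading class is hypothetical, and the reliable range used in the example below (500 to 4500 mm) is only an illustrative pair of values, not read from a sensor.

```csharp
using System;

public static class DepthShading
{
    // Maps a distance in millimeters to a gray level: near objects are dark,
    // far objects bright, and distances outside the reliable range are black.
    public static int ToGray(ushort depthMm, ushort minReliable, ushort maxReliable)
    {
        if (depthMm < minReliable || depthMm > maxReliable)
        {
            return 0;
        }
        int range = Math.Max(1, maxReliable - minReliable);
        return 255 * (depthMm - minReliable) / range;
    }
}
```

With a reliable range of 500 to 4500 mm, a pixel at 2500 mm gets gray level 127, a pixel at 4500 mm gets 255, and a reading of 0 (no data) is drawn black.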

And that's all you need to build a simple Kinect-based application. In this example you can see all three images simultaneously, and the performance is quite acceptable; keep in mind that the sensor needs a USB 3.0 port. In future articles I will show something much more useful that can be achieved with this sensor: the recognition of gestures, postures and movements.