A simple video capture application using DirectShow

In the previous article in this series, I summarized the basic components of DirectShow: filters, filter graphs, and the pins that connect the elements together, and showed how to identify them with the GraphEdit tool of the SDK. In this article I will show how to build a "simple" application to capture and play back video using the interfaces provided by DirectShow, which is built on COM, the component object model from Microsoft.

As a starting point, I have based this demo on the open source project Touchless SDK, which I have simplified as much as possible to focus only on video capture and playback. In this link you can download the DirectShowDemo project, written with Visual Studio 2013.

The project consists of a Windows Forms application written in C# and a library of functions written in C++, which is a modification of the WebCamLib.dll library of the Touchless project that allows a greater number of video sources, including file playback.

COM objects are uniquely identified by a CLSID (class identifier), and they implement a set of interfaces, each uniquely identified by an IID (interface identifier). To instantiate a particular object you call the CoCreateInstance function, providing the class identifier and the identifier of the interface through which you want to access the object.

All COM interfaces derive from the generic IUnknown interface, whose QueryInterface function lets you obtain a different interface of the same object. Through IUnknown you can treat any COM object as a generic object, similar to the object type in .NET. A moniker, accessed through the IMoniker interface, is a COM object that identifies another object, such as a capture device, and can be bound to it later.
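To make the idea of one object seen through several interfaces more concrete, here is a simplified, portable model of it. This is not the real Windows declaration: the real IUnknown uses GUIDs and HRESULT values, while this sketch uses strings and plain integers, and every name ending in Model is invented for the illustration.

```cpp
#include <cstring>

// Simplified model of IUnknown. The real interface takes a REFIID and
// returns an HRESULT; strings and int keep this sketch standalone.
struct UnknownModel {
    virtual int QueryInterface(const char* iid, void** out) = 0;
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
protected:
    virtual ~UnknownModel() = default;
};

// Two hypothetical interfaces, both derived from the generic one.
struct GraphBuilderModel : UnknownModel {
    virtual int FilterCount() const = 0;
};

struct MediaControlModel : UnknownModel {
    virtual bool IsRunning() const = 0;
};

// One object, two interfaces, one shared reference count.
class FilterGraphModel final : public GraphBuilderModel, public MediaControlModel {
public:
    int QueryInterface(const char* iid, void** out) override {
        *out = nullptr;
        if (std::strcmp(iid, "IGraphBuilder") == 0)
            *out = static_cast<GraphBuilderModel*>(this);
        else if (std::strcmp(iid, "IMediaControl") == 0)
            *out = static_cast<MediaControlModel*>(this);
        else
            return -1;   // models E_NOINTERFACE
        AddRef();        // every interface pointer handed out holds a reference
        return 0;        // models S_OK
    }
    unsigned long AddRef() override { return ++refs_; }
    unsigned long Release() override {
        unsigned long left = --refs_;
        if (left == 0) delete this;   // the last Release destroys the object
        return left;
    }
    int FilterCount() const override { return 0; }
    bool IsRunning() const override { return false; }
private:
    ~FilterGraphModel() override = default;
    unsigned long refs_ = 1;          // the creator owns the first reference
};
```

In this model, a caller that holds a GraphBuilderModel pointer asks for "IMediaControl" and receives a second pointer to the same object. Each pointer must be released once.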

From this point of view, each filter is an object with its own class identifier that implements the IBaseFilter interface. The filter graph is another object, of the FilterGraph class, that implements the IGraphBuilder interface. You add filters to it with the AddFilter function, as I did in the previous article with the GraphEdit tool.

COM functions return an HRESULT code that indicates whether the call succeeded or, if it failed, which error occurred. You can check it with the SUCCEEDED and FAILED macros.
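As a minimal, portable sketch of what these macros test: an HRESULT is a 32-bit signed value whose sign bit marks failure. The real definitions live in the Windows SDK; the names HResult, Succeeded, Failed and the k-prefixed constants below are stand-ins so that the example compiles anywhere.

```cpp
#include <cstdint>

// Stand-in for the Windows HRESULT type: a 32-bit signed integer.
using HResult = std::int32_t;

// Values taken from the Windows SDK; the k-prefixed names are illustrative.
constexpr HResult kS_OK    = 0;                                  // S_OK
constexpr HResult kS_FALSE = 1;                                  // S_FALSE
constexpr HResult kE_FAIL  = static_cast<HResult>(0x80004005u);  // E_FAIL

// Equivalent of SUCCEEDED(hr): any non-negative value is a success code.
constexpr bool Succeeded(HResult hr) { return hr >= 0; }

// Equivalent of FAILED(hr): the sign bit is set.
constexpr bool Failed(HResult hr) { return hr < 0; }
```

This is why comparing the result with S_OK alone is not a reliable test: S_FALSE is also a success code, and only the macros classify it correctly.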

Get the list of cameras present in the system

Let's see first how to list the different video capture devices installed on the system. In the VideoCamLib library there is the RefreshCameraList function that performs this job.

First, we use CoCreateInstance to instantiate an object of the SystemDeviceEnum class with the ICreateDevEnum interface:

HRESULT hr = CoCreateInstance(CLSID_SystemDeviceEnum, NULL, CLSCTX_INPROC, IID_ICreateDevEnum, (LPVOID*)&pdevEnum);

Then we use this object to create a video sources enumerator, by using the CreateClassEnumerator function:

hr = pdevEnum->CreateClassEnumerator(CLSID_VideoInputDeviceCategory, &pclassEnum, 0);

The enumeration of video sources is performed with the Next function of this interface:

pclassEnum->Next(1, apIMoniker, &ulCount);

This function returns an IMoniker object that represents the device, and we store it in an array for later use:

g_aCameraInfo[*nCount].pMoniker = apIMoniker[0];

To read the properties of the device, which are exposed through the IPropertyBag interface, we use the BindToStorage function:

hr = apIMoniker[0]->BindToStorage(0, 0, IID_IPropertyBag, (void **)&pPropBag);

Finally, we obtain the FriendlyName property with the name of the device using the Read function:

hr = pPropBag->Read(L"FriendlyName", &varName, 0);

Do not forget to free the COM objects with the Release function once you have finished using them. Otherwise they remain in memory and cause memory leaks.
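One common way to avoid forgetting a Release call is to let a small holder object make it when it goes out of scope. On Windows you would normally use an existing smart pointer such as CComPtr (ATL) or Microsoft::WRL::ComPtr. The sketch below only illustrates the principle: ReleaseGuard and FakeComObject are hypothetical names, and the fake object merely counts calls so that the example runs without COM.

```cpp
// Minimal RAII holder: calls Release() on the held pointer when the
// holder is destroyed, so an early return cannot skip the call.
template <typename T>
class ReleaseGuard {
public:
    explicit ReleaseGuard(T* p = nullptr) : p_(p) {}
    ~ReleaseGuard() { if (p_) p_->Release(); }

    // Copying would lead to a double Release, so it is disabled.
    ReleaseGuard(const ReleaseGuard&) = delete;
    ReleaseGuard& operator=(const ReleaseGuard&) = delete;

    T* get() const { return p_; }
    T* operator->() const { return p_; }

private:
    T* p_;
};

// Stand-in for a COM object: it only records how often Release was called.
struct FakeComObject {
    int* releases;
    unsigned long Release() { ++*releases; return 0; }
};
```

With a holder like this, each interface pointer obtained while listing the cameras would be released automatically at the end of the block that declares it.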

Capturing video from a device

To capture video from any of the devices we have obtained, we create a filter graph with three filters. It starts with the device, passes through a SampleGrabber filter, which provides each frame to us through a callback function, and ends in a NullRenderer filter. The NullRenderer is simply an endpoint that performs no action, because the images are processed and displayed by the program.
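The role of each stage can be pictured with a small conceptual model. It uses no DirectShow code, and the types Frame, FrameCallback, GrabberModel and NullSinkModel are invented for the illustration. The grabber hands every frame to the application callback and then forwards it, and the final stage simply discards it.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

using Frame = std::vector<std::uint8_t>;
using FrameCallback = std::function<void(const Frame&)>;

// Plays the NullRenderer role: terminates the chain and discards every frame.
// The counter exists only so that the sketch can be checked.
class NullSinkModel {
public:
    void Receive(const Frame&) { ++received_; }
    std::size_t received() const { return received_; }
private:
    std::size_t received_ = 0;
};

// Plays the SampleGrabber role: hands each frame to the application's
// callback, then forwards it unchanged to the next stage.
class GrabberModel {
public:
    GrabberModel(FrameCallback onFrame, NullSinkModel& next)
        : onFrame_(std::move(onFrame)), next_(next) {}

    void Receive(const Frame& frame) {
        if (onFrame_) onFrame_(frame);
        next_.Receive(frame);
    }

private:
    FrameCallback onFrame_;
    NullSinkModel& next_;
};
```

In the real graph the camera filter plays the part of whoever calls Receive, and DirectShow moves the samples between the pins.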

To start the capture, we will use the StartCamera function, whose parameters are the device, as a generic IUnknown interface, and a callback function to pass the video frames to the caller.

We obtain the IMoniker interface of the device with the QueryInterface function of IUnknown:

hr = pUnk->QueryInterface(IID_IMoniker, (LPVOID*)&pMoniker);

By means of the CoCreateInstance function, we create an object of the FilterGraph generic class with the IGraphBuilder interface to build the filter graph:

hr = CoCreateInstance(CLSID_FilterGraph, NULL, CLSCTX_INPROC, IID_IGraphBuilder, (LPVOID*)&g_pGraphBuilder);

From this object we get the IMediaControl interface, that will allow us to start playback:

hr = g_pGraphBuilder->QueryInterface(IID_IMediaControl, (LPVOID*)&g_pMediaControl);

We must also create an object specialized in building capture filter graphs, which helps us link the pins of the different video filters properly:

hr = CoCreateInstance(CLSID_CaptureGraphBuilder2, NULL, CLSCTX_INPROC, IID_ICaptureGraphBuilder2, (LPVOID*)&g_pCaptureGraphBuilder);

And we associate it with the generic IGraphBuilder, where we will add different filters, using the SetFiltergraph function:

hr = g_pCaptureGraphBuilder->SetFiltergraph(g_pGraphBuilder);

As the IMoniker object only identifies the camera, the first thing to do is to bind it to the filter it represents, obtaining its IBaseFilter interface with the BindToObject function:

hr = pMoniker->BindToObject(NULL, NULL, IID_IBaseFilter, (LPVOID*)&g_pIBaseFilterCam);

Now we can add it to the filter graph with the AddFilter function:

hr = g_pGraphBuilder->AddFilter(g_pIBaseFilterCam, L"VideoCam");

We will create a filter of the SampleGrabber class using again the CoCreateInstance function:

hr = CoCreateInstance(CLSID_SampleGrabber, NULL, CLSCTX_INPROC_SERVER, IID_IBaseFilter, (void**)&g_pIBaseFilterSampleGrabber);

With the ConfigureSampleGrabber function of the library, we configure this filter to deliver the frames in a particular video format and link it to the callback function that passes us the image data:

hr = ConfigureSampleGrabber(g_pIBaseFilterSampleGrabber);

Then we add the filter to the graph with the AddFilter function, as before. The last filter in our graph is of the NullRenderer class. Although this filter performs no action, it is needed because the graph must end in a renderer filter to be complete. We use the CoCreateInstance function again:

hr = CoCreateInstance(CLSID_NullRenderer, NULL, CLSCTX_INPROC_SERVER, IID_IBaseFilter, (void**)&g_pIBaseFilterNullRenderer);

And we add it to the graph with AddFilter. Now let's configure the output pin of the camera with the image size we want, chosen from those the camera allows. To do that, we get the IAMStreamConfig interface of the capture pin by calling the FindInterface function of the CaptureGraphBuilder:

hr = g_pCaptureGraphBuilder->FindInterface( &PIN_CATEGORY_CAPTURE, 0, g_pIBaseFilterCam, IID_IAMStreamConfig, (void**)&pConfig);

The characteristics of the data stream are returned in a structure of type AM_MEDIA_TYPE. We get the number of available formats by calling the GetNumberOfCapabilities function:

hr = pConfig->GetNumberOfCapabilities(&iCount, &iSize);

Then we select the one that best fits our needs. We retrieve each item with the GetStreamCaps function:

hr = pConfig->GetStreamCaps(iFormat, &pmtConfig, (BYTE*)&scc);

This structure contains a structure of VIDEOINFOHEADER type in the pbFormat member, which in turn contains a bmiHeader structure where we can find the width and height of the frames in its biWidth and biHeight members. When we have found the best fit, we will use the function SetFormat to select it:

pConfig->SetFormat(pmtConfig);
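What counts as the "best fit" is up to the application. As one possible criterion, the sketch below picks the size whose width and height are closest to the requested ones, measured as the sum of the absolute differences. This is an assumption for illustration, not necessarily what the library does. FrameSize and PickClosestSize are hypothetical names; in the real code the sizes would come from the biWidth and biHeight members obtained for each format index.

```cpp
#include <cstdlib>
#include <vector>

// Width and height of one format offered by the device.
struct FrameSize {
    int width;
    int height;
};

// Returns the index of the size closest to the requested one,
// or -1 if the list is empty. On a tie, the first candidate wins.
int PickClosestSize(const std::vector<FrameSize>& sizes, int wantWidth, int wantHeight) {
    int best = -1;
    long bestDist = 0;
    for (int i = 0; i < static_cast<int>(sizes.size()); ++i) {
        long dist = std::labs(static_cast<long>(sizes[i].width) - wantWidth)
                  + std::labs(static_cast<long>(sizes[i].height) - wantHeight);
        if (best < 0 || dist < bestDist) {
            best = i;
            bestDist = dist;
        }
    }
    return best;
}
```

The returned index corresponds to the iFormat value whose media type would then be passed to SetFormat.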

Now the only thing left to complete the graph is to connect the input and output pins of the filters we have added. We could do this by obtaining the output pins of each filter and the input pins of the next, and then selecting the most appropriate ones to connect. There is a much simpler way, however: let DirectShow perform this task with the RenderStream function of the CaptureGraphBuilder:

hr = g_pCaptureGraphBuilder->RenderStream(&PIN_CATEGORY_CAPTURE, &MEDIATYPE_Video, g_pIBaseFilterCam, g_pIBaseFilterSampleGrabber, g_pIBaseFilterNullRenderer);

We have to pass to this function the three filters that make up the graph, the capture filter, the intermediate one and the final renderer filter. If needed, DirectShow will add and connect additional filters in a fully transparent way.

The final image size is returned to the calling program in the output parameters pnWidth and pnHeight. It can be obtained from the SampleGrabber filter, through the ISampleGrabber interface, in a VIDEOINFOHEADER structure:

hr = g_pIBaseFilterSampleGrabber->QueryInterface(IID_ISampleGrabber, (LPVOID*)&pGrabber);

Finally, we can start the capture by calling the Run function of the IMediaControl interface:

hr = g_pMediaControl->Run();

Playing a video file

If you want to play a video file, the only thing that changes is the filter used as the video source: instead of a device, it is a file. We create it by calling the AddSourceFilter function of the FilterGraph object, passing it the file path:

hr = g_pGraphBuilder->AddSourceFilter(wFileName, L"VideoFile", &g_pIBaseFilterVideoFile);

In this case, instead of using the RenderStream function to build the graph, we connect the pins manually. First we get the output pin of the file filter with the GetPin function:

IPin* pOutFile = GetPin(g_pIBaseFilterVideoFile, PINDIR_OUTPUT, 1);

And the input pin of the SampleGrabber filter in the same way:

IPin* pInGrabber = GetPin(g_pIBaseFilterSampleGrabber, PINDIR_INPUT, 1);

And we connect both pins using the Connect function of the FilterGraph object:

hr = g_pGraphBuilder->Connect(pOutFile, pInGrabber);

We connect the output of the SampleGrabber filter to the input of the NullRenderer filter in the same way. The rest of the process is the same as for a camera device.

The DirectShowDemo application

Finally, let's see how to use this function library from an application. In the original project, there is a class library that encapsulates the calls to these functions and can be used from any application. This is the right thing to do, but I have extracted the basic code of this library and I have included it within the application to simplify as much as possible the project.

As the function library is not a .NET assembly, we cannot add it directly as a project reference, so I copy it to the output directory of the project using a post-build event. The NativeMethods class imports the functions of this library so they can be called from a .NET assembly. These are the basic functions:

VideoCamRefreshCameraList: Builds a list of video capture devices connected to the system.
VideoCamGetCameraDetails: With this function we obtain the IMoniker object representing a device and its name.
VideoCamInitialize: Performs the necessary initialization work (in this case no action is performed).
VideoCamDisplayCameraPropertiesDialog: Displays the dialog box with the selected device configuration options.
VideoCamStartCamera: Starts capture and playback of a device.
VideoCamStartVideoFile: Starts capture and playback of a video file.
VideoCamStopCamera: Stops the current playback and releases the allocated resources.
VideoCamCleanup: Releases all resources allocated before ending.

The class that encapsulates the camera is Camera, and for video files there is the VideoFile class, derived from it. Each camera object contains an object of the CameraMethods class, which is responsible for making the calls to the library functions. The CameraService class implements the list of available cameras. Finally, the IFrameSource interface encapsulates the methods to start and stop playback and the callback function that is called whenever a new frame is available.

As for the main program, to keep this article short I will cover only the most important part: the handling of the captured data in the OnReadImage callback function. The data is passed in the data parameter, and the dimensions of the image are passed as a Size structure in the sz parameter.

We simply create a Bitmap and get a pointer to the bytes that make up the image. In the Bitmap each pixel is represented by 32 bits: 8 for each of the red, green and blue channels plus 8 for the alpha (transparency) channel. The data delivered by the video stream has no alpha channel, only the 24 bits of color, so we have to add the alpha byte to each pixel to complete the Bitmap data. Finally, we put the image in a PictureBox control to display it.
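The application does this conversion in C#, but the idea is the same in any language. Here is a minimal C++ sketch of it: ExpandTo32Bits is a hypothetical helper, and the srcStride parameter (the number of bytes per source row, including any padding) is an assumption of the example. The three color bytes of each pixel are copied in the order they arrive and an opaque alpha byte is appended.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Expands 24-bit pixels (3 bytes each) into 32-bit pixels (4 bytes each)
// by appending a fully opaque alpha byte. The order of the color bytes
// is preserved; the output rows are tightly packed.
std::vector<std::uint8_t> ExpandTo32Bits(const std::uint8_t* src,
                                         int width, int height,
                                         std::size_t srcStride) {
    assert(width >= 0 && height >= 0);
    assert(srcStride >= static_cast<std::size_t>(width) * 3);

    std::vector<std::uint8_t> dst(static_cast<std::size_t>(width) * height * 4);
    std::size_t o = 0;
    for (int y = 0; y < height; ++y) {
        const std::uint8_t* row = src + y * srcStride;  // start of source row y
        for (int x = 0; x < width; ++x) {
            dst[o++] = row[x * 3 + 0];
            dst[o++] = row[x * 3 + 1];
            dst[o++] = row[x * 3 + 2];
            dst[o++] = 0xFF;                            // alpha: opaque
        }
    }
    return dst;
}
```

The resulting buffer holds width * height * 4 bytes, which is the amount of data a 32-bit bitmap of the same dimensions expects when its rows carry no extra padding.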

The use of unsafe code is necessary, since we are working with pointers, so it is necessary to compile the project with the allow unsafe code option.

As in the previous article, I leave you the reference to a couple of books where you can find a more detailed reference of the DirectShow programming: