Playing audio and video using DirectShow

20 Mar 2009
This project demonstrates the basics of DirectShow and related concepts.

Introduction

The project demonstrates a simple audio/video player which uses DirectShow APIs to play audio and video files. It does not support all file formats, but does work with common formats like MP3, AVI, and others supported by DirectShow. The project is a useful guide for those who are getting started with DirectShow and want to learn the basics and get a feel of things related to COM and DirectShow.

Background

To give a brief background, DirectX is an advanced multimedia framework provided by Microsoft which makes all tasks related to multimedia easy to perform. From the simple need of playing an audio file to playing a video streamed over the internet, DirectX has it all.

DirectX is a huge collection of libraries, and includes components like DirectSound, Direct3D, DirectAnimation, DirectDraw, DirectShow, and others. Each component specializes in its own area of multimedia. DirectShow, however, has been in and out of the core DirectX package that Microsoft ships with Windows. As of this writing, DirectX 10 is the latest version of DirectX, and it ships with Vista.

The entire DirectX library is based on COM (the Component Object Model), so a bit of COM understanding would be helpful when working with DirectX.

DirectShow

DirectShow, as you might have guessed, is a COM-based multimedia framework that makes the task of capturing, playing back, and manipulating media streams easier. It was formerly known as ActiveMovie. DirectShow might internally use DirectSound and DirectDraw when playing media, provided the hardware supports them; otherwise, it might fall back to the traditional Wave and GDI APIs.

DirectShow concepts

Filters: Technically, Filters are just COM objects (typically implemented as C++ classes) which take input in some form, process the data, and produce output in the same or a different form. A Filter represents one stage in the processing of a media stream.

Pins: A Pin is an interface in itself, and every Filter must implement at least one Pin. Pins are the way Filters talk to other Filters. Typically, a Filter has an input pin and an output pin. During the processing of the media, a Filter takes input through its input pin, processes the data, and passes the output to another filter through its output pin. The other filter, in turn, receives the processed data through its own input pin. So, basically, the output pin of one filter is connected to the input pin of another filter, and that is how they talk.

Filter Graph and Filter Graph Manager: A Filter Graph is a sequence of connected filters. Processing any type of media involves a Filter Graph with a specific set of filters suited to that media type. The Filter Graph Manager is the divine power that lets you create these Filter Graphs, add Filters to a Filter Graph, connect the Pins of the appropriate Filters, and finally run the graph, which accomplishes our objective: playing the media. Although the above might sound intimidating, it is quite simple in reality, and it takes very few API calls. There is also an option for fine-grained control, if you like to micro-manage and get every detail right, but most things can be achieved by letting the Graph Manager do the dirty work :).

The picture below shows a sample filter graph for an MP3 media:

[Figure: DS_mp3_Fitler_Graph2.gif (sample filter graph for MP3 playback)]

(Sorry for the picture quality, I messed up.)

A Source Filter is the first filter in a filter graph. This filter takes its input from a file, a URL, or a capture device, and turns the data into a form which can be consumed by other filters. A characteristic of a source filter is that it has no input pins and usually a single output pin. Note: pins exist only for filters to talk to each other.

A Renderer Filter is the last filter in a filter graph. This filter takes the input from the previous filter, and renders it to a sound device or a display device, or to a file in the case of capture. Remember, it might use DirectSound or DirectDraw internally. A characteristic of a renderer filter is that it has one input pin and no output pins.

Everything in between a source filter and a renderer filter is called a Transform Filter. A transform filter might have one or more input pins and one or more output pins.
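
To make the source, transform, and renderer roles concrete, here is a toy model in plain C++. It is only a sketch: ToyFilterGraph, Sample, and RunToyMp3Graph are hypothetical names, the stage labels are illustrative, and none of this is the DirectShow API. It just mirrors the idea that data enters at a source, passes through each transform in order, and ends at a renderer.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Toy model only: these types are hypothetical and are not DirectShow interfaces.
using Sample = std::string;  // stands in for a chunk of media data

struct ToyFilterGraph {
    std::function<Sample()> source;                                // no input, one output
    std::vector<std::function<Sample(const Sample&)>> transforms;  // one input, one output each
    std::function<void(const Sample&)> renderer;                   // one input, no output

    // Pushes one sample from the source, through every transform, to the renderer.
    void Run() const {
        assert(source && renderer);  // a graph needs both ends
        Sample sample = source();
        for (const auto& transform : transforms) {
            sample = transform(sample);
        }
        renderer(sample);
    }
};

// Builds an MP3-like chain and returns what the renderer received.
// The stage names are assumptions for illustration, not the filters DirectShow picks.
std::string RunToyMp3Graph() {
    std::string rendered;
    ToyFilterGraph graph;
    graph.source = [] { return Sample("file"); };
    graph.transforms.push_back([](const Sample& in) { return in + ">split"; });
    graph.transforms.push_back([](const Sample& in) { return in + ">decode"; });
    graph.renderer = [&rendered](const Sample& in) { rendered = in + ">speaker"; };
    graph.Run();
    return rendered;
}
```

Calling RunToyMp3Graph() returns the path the sample took through the chain, which is the same shape as the filter graph in the picture: one source, some transforms, one renderer.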

More details about the architecture and concepts related to DirectShow can be found in this MSDN documentation: http://msdn.microsoft.com/en-us/library/ms783323(VS.85).aspx.

COM stuff that you need to know

Before your application can use COM, the COM library has to be initialized. This can be done by using the CoInitialize or CoInitializeEx APIs. The CoCreateInstance API can be used to get an interface pointer to an instance of a particular COM class. If that sounds confusing, here:

C++
//code snippet I
IGraphBuilder *m_pGraph;
HRESULT hr;
hr = CoCreateInstance(CLSID_FilterGraph, 
                      NULL, 
                      CLSCTX_INPROC_SERVER,
                      IID_IGraphBuilder,
                     (void **)&m_pGraph);
if (SUCCEEDED(hr))
{
    //celebrate
    ...
}

The first parameter specifies that an instance of the filter graph class should be created. CLSID_FilterGraph is the class ID which identifies that class uniquely; every COM class has a unique ID associated with it. The second-to-last parameter specifies that you are looking for the Graph Builder interface (remember our Filter Graph Manager? This guy is the one). IID_IGraphBuilder is the interface ID that identifies the graph builder interface uniquely. And finally, the last parameter receives the result: on success, m_pGraph will point to the graph builder interface.

Every COM interface derives from the IUnknown interface, which contains the all-important QueryInterface method. We call the QueryInterface method on an interface to get a pointer to another interface which the same object might have implemented. For example (continuing from the previous code snippet):

C++
//code snippet II
IMediaControl *m_pMediaControl;  // must be a pointer; QueryInterface fills it in
hr = m_pGraph->QueryInterface(IID_IMediaControl,
                             (void **)&m_pMediaControl);

We are calling the QueryInterface method on m_pGraph, which we had obtained previously. The first parameter specifies that we are looking for the IMediaControl interface whose interface ID is IID_IMediaControl. And, if the class (CLSID_FilterGraph) does implement the Media Control interface, the m_pMediaControl variable will point to/hold the IMediaControl interface.

And finally, don’t forget to un-initialize the COM library when your application exits. The CoUninitialize API is used to do just that.

This is by no means the entire info related to COM. I have only mentioned the basic points that any application has to use. There is more to COM than you can imagine, and thankfully, Google has a pretty decent algorithm :).

What applications must do

Any application that wants to use DirectShow must follow these steps:

  1. Create the Filter Graph Manager
  2. Create the Filter Graph using the Filter Graph Manager
  3. Run the Filter Graph (to play the media)
  4. Release the resources
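
The four steps above can be sketched as one small function. This is a minimal sketch rather than the project's code: PlayFileBlocking is a hypothetical name, and it waits for the end of playback with IMediaEvent::WaitForCompletion instead of the window notification the player in this article uses. On Windows, it needs to be linked with strmiids.lib and ole32.lib. The non-Windows branch is only a stub so that the sketch still compiles elsewhere.

```cpp
#include <cassert>

#ifdef _WIN32
#include <dshow.h>  // DirectShow interfaces; link with strmiids.lib and ole32.lib

// Plays one file to completion. Returns true if the graph ran to the end.
bool PlayFileBlocking(const wchar_t* path) {
    assert(path != NULL);
    if (FAILED(CoInitialize(NULL))) return false;

    IGraphBuilder* graph = NULL;
    IMediaControl* control = NULL;
    IMediaEvent* events = NULL;
    bool ok = false;

    // Step 1: create the Filter Graph Manager and get the interfaces we need.
    HRESULT hr = CoCreateInstance(CLSID_FilterGraph, NULL, CLSCTX_INPROC_SERVER,
                                  IID_IGraphBuilder, (void**)&graph);
    if (SUCCEEDED(hr)) hr = graph->QueryInterface(IID_IMediaControl, (void**)&control);
    if (SUCCEEDED(hr)) hr = graph->QueryInterface(IID_IMediaEvent, (void**)&events);

    // Step 2: let the Filter Graph Manager build the graph for this file.
    if (SUCCEEDED(hr)) hr = graph->RenderFile(path, NULL);

    // Step 3: run the graph and block until playback completes.
    if (SUCCEEDED(hr)) hr = control->Run();
    if (SUCCEEDED(hr)) {
        long eventCode = 0;
        hr = events->WaitForCompletion(INFINITE, &eventCode);
        ok = SUCCEEDED(hr);
    }

    // Step 4: release every interface, then un-initialize COM.
    if (events) events->Release();
    if (control) control->Release();
    if (graph) graph->Release();
    CoUninitialize();
    return ok;
}
#else
// Stub for non-Windows builds: DirectShow is not available, so nothing is played.
bool PlayFileBlocking(const wchar_t* path) {
    assert(path != nullptr);
    return false;
}
#endif
```

A real player would not block like this, because a window has to keep pumping messages; that is why the project asks for notifications through IMediaEventEx instead.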

Creating the filter graph manager

The filter graph manager can be created by using the CoCreateInstance API. See code snippet I.

Creating the filter graph

There are three ways in which you can create a filter graph: Automatic Graph Creation, Manual Graph Creation, and Semi-Automatic Graph Creation.

Automatic Graph Creation: This is the easiest of all methods, but doesn’t give your application much flexibility in choosing filters. Here, you just call the RenderFile method on the IGraphBuilder interface, passing in the filename or any other source. The graph manager will take all the pains to create an appropriate graph for you. The graph builder has its own algorithm to select various filters and then connect their pins to complete the graph. For example, if you pass the filename “love.mp3”, the graph builder will identify the file format and then choose the appropriate filters to build a graph suitable for MP3 media.

Manual Graph Creation: This method is the most painful, but provides you with super flexibility in choosing the filters. I won’t go into the details of this, but in the most abstract terms, first, you have to create the instances of the individual filters that you need. Then, you add each of these filters to the graph. At this point, the filters just lie in the graph like beans on a table. The next step is to connect the pins of these filters; this is again painful as it involves enumerating the pins of the filters and selecting the output pin of one and the input pin of another and then connecting the two together. Once you have connected all the pins, the graph is complete and ready to be run, assuming you haven’t messed up the pins :).

Semi-Automatic Graph Creation: This method provides you the flexibility while also trying to reduce the amount of pain. Here, you just choose a few priority filters (ones that you definitely want present in the graph) and add them to the graph. These filters lie in the graph just like beans on a table. Then, you ask the graph manager to complete the graph. The graph manager will add the missing pieces (filters) and build a graph for you. One thing to note here is that the graph manager will try its best to include your priority filters in the final graph, so this way, you get to override some of the default filters which the graph manager would otherwise have chosen.

Running the filter graph

Once the filter graph is ready, the media can be played by calling the Run method of the IMediaControl interface. This will start playing the audio or video file on the default audio or video device, respectively.

Releasing the resources

Before your application quits, make sure all the acquired resources/pointers are freed.
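
A common way to do this is a small helper that releases an interface pointer and then sets it to null, so the same pointer cannot be released twice by accident. The sketch below assumes only that the pointed-to type has a Release method, as every COM interface does. FakeInterface is a hypothetical stand-in so that the helper can be tried without COM, and nullptr needs a newer compiler than VC++ 6.0 (use NULL there).

```cpp
#include <cassert>

// Releases a COM-style interface pointer and nulls it.
// Safe to call on a pointer that is already null.
template <class T>
void SafeRelease(T*& pointer) {
    if (pointer != nullptr) {
        pointer->Release();
        pointer = nullptr;
    }
}

// Hypothetical stand-in for a COM object, used only to demonstrate the helper.
struct FakeInterface {
    int refCount = 1;
    unsigned long Release() {
        assert(refCount > 0);  // releasing more often than acquired is a bug
        return static_cast<unsigned long>(--refCount);
    }
};
```

With real DirectShow pointers, the calls would look like SafeRelease(pMediaControl), and they belong before CoUninitialize, not after it.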

Using the code

The project was built using VC++ 6.0. One thing I want to mention is that the project defines the STRSAFE_NO_DEPRECATE preprocessor symbol in its settings. This is because I was getting an error on sprintf (and presumably would on all string-related functions) saying “error C2065: 'sprintf_instead_use_StringCbPrintfA_or_StringCchPrintfA' : undeclared identifier”, and a bit of Googling led me to this solution.

To build the project on your machine, you will need to install the “Platform SDK” and “DirectX SDK”. The details can be found by Googling around. But, do let me know if it’s a pain, I will provide the links and a brief on how to go about setting up the environment.

The UI of the player looks like this:

[Figure: PlayerUI.JPG (the player's UI)]

Just open the file using the “Play File…” button, and the media will start playing. You can pause/play or stop the media. The total time and elapsed time, along with the file name, are also displayed. The progress bar is there only because I wanted to experiment with it. I know a slider control would have been much better; I will work on replacing it with a slider control, and maybe also add a playlist alongside.

  • //DSPlayer.cpp: This file contains the main function, creates the main dialog, and has the message loop. The main function initializes the COM library using CoInitialize(NULL). The only thing worth mentioning about the file is that it has a g_PlayerObject global variable of type PlayerClass. The PlayerClass class contains the real meat. The main function in this file creates a global object of type PlayerClass and calls Initialise on the object to initialize the member variables. After that, in the message loop, every triggered event (like pressing a button on the UI) calls a respective member function of the g_PlayerObject object; this is pretty self-explanatory.
  • //PlayerClass.cpp: This file contains the actual implementation of the player class. The member variables and the member functions of the class are self-explanatory.

The DirectShow related member variables are IGraphBuilder, IMediaControl, IMediaSeeking, and IMediaEventEx.

  • IGraphBuilder is the graph manager and is responsible for creating the graph.
  • IMediaControl enables us to run the graph once it is ready.
  • IMediaSeeking allows us to perform a seek operation on the media.
  • IMediaEventEx provides a notification to the application regarding the state of the media. For example, it notifies the app that the media has finished playing.

In the Initialise member function, we create the graph builder and get the other interfaces too.

C++
// Create the Filter Graph Manager; keep the HRESULT so a failure can be checked.
hr = CoCreateInstance(CLSID_FilterGraph,
                      NULL, 
                      CLSCTX_INPROC_SERVER, 
                      IID_IGraphBuilder, 
                      (void**)&pGraphBuilder);

// Ask the graph manager for the other interfaces the player uses.
hr = pGraphBuilder->QueryInterface(IID_IMediaControl,
                                   (void **)&pMediaControl);
hr = pGraphBuilder->QueryInterface(IID_IMediaSeeking,
                                   (void**)&pMediaSeeking);
hr = pGraphBuilder->QueryInterface(IID_IMediaEventEx,
                                   (void**)&pMediaEventEx);

We set the notification window, so our app knows when the media ends:

C++
pMediaEventEx->SetNotifyWindow((OAHWND)hOwner,
                                WM_GRAPHNOTIFY, 
                                0); 

The OpenFileDialog member function handles the file browser interface, and stores the filename selected by the user in the szFileName member variable. It then calls the StartPlayingFile private member function to begin playing the media.

The StartPlayingFile function creates the filter graph and gets the duration of the media:

C++
pGraphBuilder->RenderFile(wFileName, NULL);
pMediaSeeking->GetDuration(&lDuration100NanoSecs);

The function also creates a one second timer so that we can track the elapsed time, and then it plays the file using the Run method.

C++
SetTimer(hOwner,
         MY_TIMEREVENT, 
         1000, 
        (TIMERPROC)NULL);

pMediaControl->Run();

Other APIs include:

C++
pMediaControl->Pause(); // pauses the media
pMediaControl->Run();   // resumes the media if it was previously paused
pMediaControl->Stop();  // stops the media

As simple as that.

To perform a seek operation on the media, we use the SetPositions function of the IMediaSeeking interface. For example, I chose to seek the media back to its starting (zero) position once it has finished playing. The IMediaEventEx interface tells us when the media has finished, and then we seek to the beginning (rt holds the target position, which is zero in this case):

C++
pMediaSeeking->SetPositions(&rt, 
                            AM_SEEKING_AbsolutePositioning,
                            NULL,
                            AM_SEEKING_NoPositioning);

The DoTimerStuff member function handles the timer event which triggers every second, and displays the elapsed time on the label.
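
DirectShow reports positions and durations in units of 100 nanoseconds (hence the name lDuration100NanoSecs above), so they have to be converted before they can be shown on a label. The helper below is a sketch of one way to do that. FormatElapsed is a hypothetical name, and the mm:ss layout is an assumption, not necessarily what the project displays.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// One second is 10,000,000 units of 100 nanoseconds.
const std::int64_t kUnitsPerSecond = 10000000;

// Formats a position given in 100 ns units as "mm:ss".
// Negative inputs are treated as zero; fractions of a second are dropped.
std::string FormatElapsed(std::int64_t position100ns) {
    if (position100ns < 0) position100ns = 0;
    const long long totalSeconds = position100ns / kUnitsPerSecond;

    char buffer[32];
    const int written = std::snprintf(buffer, sizeof(buffer), "%02lld:%02lld",
                                      totalSeconds / 60, totalSeconds % 60);
    assert(written > 0 && written < static_cast<int>(sizeof(buffer)));
    return std::string(buffer);
}
```

For example, a duration of 3,000,000,000 units is 300 seconds, which this helper formats as 05:00.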

Points of interest

I was initially annoyed by the progress bar for some time. The easiest way to use the progress bar, I thought, was to set its range from zero to the duration of the media in seconds. Then, whenever the one second timer fired, I would just increment the value of the progress bar by one, and that should have taken care of everything. But I ran into problems when using this method. For some reason, the progress never stepped in correct increments, and it would always stay out of sync with the media. What I mean is that when the media had only just started playing, the progress would already have reached the half-way point.

So, I decided to try it the other way around, that is, I would set the range of the progress bar from zero to 100 (one hundred) and then normalize the elapsed time over this scale. For example, if a media is 300 seconds long and 30 seconds have elapsed, then (30/300) * 100 = 10% of the media has finished playing. So, quite simply, I would set the progress bar to the value 10. This did work smoothly for me, although the implementation part is a little galling. But hey, whatever works :).
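
The normalization can be sketched as a small function. ProgressPercent is a hypothetical name, not the function used in the project. The one detail worth noticing is the order of operations: with integers, 30/300 is 0, so the multiplication by 100 has to happen before the division.

```cpp
#include <cassert>
#include <cstdint>

// Maps elapsed time onto a 0..100 progress bar range.
// Both arguments must use the same unit (seconds, or 100 ns units).
int ProgressPercent(std::int64_t elapsed, std::int64_t duration) {
    if (duration <= 0) return 0;                 // unknown or empty media
    if (elapsed < 0) elapsed = 0;                // clamp to the start
    if (elapsed > duration) elapsed = duration;  // clamp to the end

    // Multiply first: integer division would turn 30/300 into 0.
    const int percent = static_cast<int>(elapsed * 100 / duration);
    assert(percent >= 0 && percent <= 100);
    return percent;
}
```

With the example above, ProgressPercent(30, 300) gives 10, and the result can be passed straight to the progress bar as its new position.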

History

  • Updated: 21 Jun, 2008.
  • Updated: 2 Jul, 2008.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)