OpenCV Tutorial by R. Laganiere
The CFileDialog class is the one to use to create a file dialog. The dialog will show up once the following code is added to the OnOpen member function:

    void CCvisionDlg::OnOpen()
    {
      CFileDialog dlg(TRUE, _T("*.bmp"), "",
        OFN_FILEMUSTEXIST|OFN_PATHMUSTEXIST|OFN_HIDEREADONLY,
        "image files (*.bmp; *.jpg) |*.bmp;*.jpg|"
        "AVI files (*.avi) |*.avi|All Files (*.*)|*.*||",NULL);

      char title[]= {"Open Image"};
      dlg.m_ofn.lpstrTitle= title;

      if (dlg.DoModal() == IDOK) {
        CString path= dlg.GetPathName(); // contains the selected filename
      }
    }

Note how the extensions of interest (here .bmp, .jpg and .avi) for the files to be opened are specified using the fifth argument (the filter string) of the CFileDialog constructor. Now, by clicking on the
http://www.site.uottawa.ca/~laganier/tutorial/opencv+directshow/cvision.htm (2 of 38)5/29/2007 1:11:06 AM
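The filter string passed to the CFileDialog constructor above is a sequence of description/pattern pairs separated by the | character and terminated by ||. The following standard C++ sketch only illustrates that layout; parseFilter is a hypothetical helper written for this illustration and is not part of MFC.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not an MFC function): splits a file-dialog filter
// string into (description, pattern) pairs. Fields are separated by '|'
// and the whole string is terminated by "||".
std::vector<std::pair<std::string, std::string> >
parseFilter(const std::string& filter) {
    // First cut the string at every '|'
    std::vector<std::string> fields;
    std::string current;
    for (std::size_t k = 0; k < filter.size(); ++k) {
        if (filter[k] == '|') {
            fields.push_back(current);
            current.clear();
        } else {
            current += filter[k];
        }
    }
    // Then group the fields two by two; an empty description
    // (produced by the final "||") ends the list
    std::vector<std::pair<std::string, std::string> > pairs;
    for (std::size_t k = 0; k + 1 < fields.size(); k += 2) {
        if (fields[k].empty()) break;
        pairs.push_back(std::make_pair(fields[k], fields[k + 1]));
    }
    return pairs;
}
```

Applied to the filter used in OnOpen, this yields three pairs: one for the image files, one for the AVI files and one for all files. Several patterns can share one description when they are separated by semicolons, as in *.bmp;*.jpg.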
Note also that we have included the DirectX directory information (which is, in our case, C:\DXSDK\Lib) that we will use in later examples. This one should always be the first in the list to avoid incompatibilities with other libraries. With these global settings, only the names of the library modules need to be specified when a
Now add the following header file to the project, here called cvapp.h:

    #if !defined IMAGEPROCESSOR
    #define IMAGEPROCESSOR

    #include <stdio.h>
    #include <math.h>
    #include <string.h>
    #include "cv.h"      // include core library interface
    #include "highgui.h" // include GUI library interface

    class ImageProcessor {

        IplImage* img; // Declare IPL/OpenCV image pointer

      public:

        ImageProcessor(CString filename, bool display=true) {

          img = cvvLoadImage( filename ); // load image

          if (display) {
            // create a window
            cvvNamedWindow( "Original Image", 1 );
            // display the image on window
            cvvShowImage( "Original Image", img );
          }
        }
    };

    #endif

The function names starting with cvv are HighGui functions. To use the ImageProcessor class in the application, just include the header in the dialog source file. Once a file is open, an ImageProcessor instance can be created as follows:

    void CCvisionDlg::OnOpen()
    {
      CFileDialog dlg(TRUE, _T("*.bmp"), "",
        OFN_FILEMUSTEXIST|OFN_PATHMUSTEXIST|OFN_HIDEREADONLY,
        "BMP files (*.bmp) |*.bmp|AVI files (*.avi) |*.avi|"
        "All Files (*.*)|*.*||",NULL);

      char title[]= {"Open Image"};
      dlg.m_ofn.lpstrTitle= title;

      if (dlg.DoModal() == IDOK) {
        CString path= dlg.GetPathName();
        ImageProcessor ip(path); // load, create and display
      }
    }

Then, when you select an image, this window should appear:
3. Processing an image
Now let's try calling one of the OpenCV functions. We rewrite the header as follows:

    #if !defined IMAGEPROCESSOR
    #define IMAGEPROCESSOR

    #include <stdio.h>
    #include <math.h>
    #include <string.h>
    #include "cv.h"      // include core library interface
    #include "highgui.h" // include GUI library interface

    class ImageProcessor {

        IplImage* img; // Declare IPL/OpenCV image pointer

      public:

        ImageProcessor(CString filename, bool display=true) {

          img = cvvLoadImage( filename ); // load image

          if (display) {
            cvvNamedWindow( "Original Image", 1 );
            cvvShowImage( "Original Image", img );
          }
        }

        void display() {
          cvvNamedWindow( "Resulting Image", 1 );
          cvvShowImage( "Resulting Image", img );
        }

        void execute();

        ~ImageProcessor() {
          cvReleaseImage( &img );
        }
    };

    extern ImageProcessor *proc;

    #endif

and we add a C++ source file, here named cvapp.cpp, that contains the function that does the processing.

    #include "stdafx.h"
    #include "cvapp.h"

    // A global variable
    ImageProcessor *proc = 0;

    // the function that processes the image
    void process(void* img) {
      IplImage* image = reinterpret_cast<IplImage*>(img);
      cvErode( image, image, 0, 2 );
    }

    void ImageProcessor::execute() {
      process(img);
    }

The process function is the one that calls the OpenCV function that does the processing. In this example, the processing consists of a simple morphological erosion (cvErode). Obviously, all the processing could have been done directly inside the execute member function. Also, there is no justification, at this point, for using a void pointer as the parameter of the process function. This has been done just for consistency with the examples to follow, where the process function will become a callback function in the processing of a sequence. Note that, for simplicity, we have added a global variable that points to the ImageProcessor instance that this application uses. Let's now modify our dialog by adding another button, i.e.:
The member functions now become:

    void CCvisionDlg::OnOpen()
    {
      CFileDialog dlg(TRUE, _T("*.bmp"), "",
        OFN_FILEMUSTEXIST|OFN_PATHMUSTEXIST|OFN_HIDEREADONLY,
        "image files (*.bmp; *.jpg) |*.bmp;*.jpg|"
        "AVI files (*.avi) |*.avi|All Files (*.*)|*.*||",NULL);

      char title[]= {"Open Image"};
      dlg.m_ofn.lpstrTitle= title;

      if (dlg.DoModal() == IDOK) {
        CString path= dlg.GetPathName();
        if (proc != 0)
          delete proc;
        proc= new ImageProcessor(path);
      }
    }

    void CCvisionDlg::OnProcess()
    {
      if (proc != 0) {
        // process and display
        proc->execute();
        proc->display();
      }
    }

If you open an image and push the process button, then the result is:
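The void pointer in the signature of process only pays off once the function is handed to some other component as a callback. The sketch below shows the general pattern in plain C++, without OpenCV; Frame, FrameCallback, invert and processSequence are hypothetical names introduced for this illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Signature shared by all processing functions: an untyped pointer
// to the data of the current frame
typedef void (*FrameCallback)(void*);

// Hypothetical stand-in for an image (the tutorial uses IplImage)
struct Frame {
    std::vector<unsigned char> pixels;
};

// Example callback: recovers the real type from the void pointer,
// then inverts every pixel value
void invert(void* data) {
    Frame* frame = static_cast<Frame*>(data);
    for (std::size_t k = 0; k < frame->pixels.size(); ++k)
        frame->pixels[k] = static_cast<unsigned char>(255 - frame->pixels[k]);
}

// Hypothetical driver: calls the registered callback once per frame,
// without knowing anything about the frame type
void processSequence(std::vector<Frame>& frames, FrameCallback callback) {
    for (std::size_t k = 0; k < frames.size(); ++k)
        callback(&frames[k]);
}
```

The driver never needs to know what a frame contains, which is why the same process function can later be reused unchanged when a sequence is processed frame by frame.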
    iplAllocateImage(gray, 1, 0);

    // Creating a color image
    IplImage* color = iplCreateImageHeader(3, 0, IPL_DEPTH_8U,
                          "RGB", "BGR", IPL_DATA_ORDER_PIXEL,
                          IPL_ORIGIN_TL, IPL_ALIGN_QWORD,
                          width, height,
                          NULL, NULL, NULL, NULL);
    iplAllocateImage(color, 1, 0);

The first parameter specifies the number of channels and the second is 0 if there is no alpha channel in the image (which is most often the case in computer vision). The third parameter defines the pixel type. An unsigned 8-bit pixel (IPL_DEPTH_8U) is the common choice, but 2-byte signed integer (IPL_DEPTH_16S) and 4-byte float (IPL_DEPTH_32F) are also very useful. The next parameters specify the color model (basically "GRAY" or "RGB") and the channel sequence (in the case of a color image). The data order parameter specifies how the different color channels are ordered. Under IPL the choices are pixel-oriented, i.e. RGBRGBRGB, or plane-oriented, i.e. RRRRGGGGBBBB. The origin is normally at the top left corner (IPL_ORIGIN_TL). For an efficient use of the MMX capabilities of the processor, the line length of an image should be a multiple of 8 bytes. This is guaranteed by choosing the quad-word alignment, each line being padded with dummy pixels if necessary. Finally, the width (number of columns) and the height (number of lines) of the image are specified. The last four parameters are usually NULL.

Once the header is created, memory must be allocated. This is the role of the iplAllocateImage function. An initial value for the pixel data can be specified; this is the last parameter. The middle parameter of this function must be set to 0 if no initialization is required. Do not forget to deallocate the images at the end of the process by calling iplDeallocate(image, IPL_IMAGE_ALL). Note that for floating point images, iplAllocateImageFP and iplDeallocateImageFP must be used instead.

An alternative way to create and allocate an image is to use the OpenCV equivalent function. Here only the size, the pixel depth and the number of channels need to be specified, e.g.:

    IplImage* color = cvCreateImage( cvSize(width,height), IPL_DEPTH_8U, 3);

To deallocate, you can then call cvReleaseImage(&image).

When manipulating images, it is common to sequentially access all pixels of an image. To this end, iplPutPixel and iplGetPixel can be used. You just specify the pixel coordinates and an array containing the values, as follows:

    unsigned char values[3]; // 3 is for color image
    iplGetPixel(image, x, y, values);

But for a more efficient loop, it is possible to directly access the buffer containing the pixels. Caution must however be taken, because the way this loop must be executed depends on the exact image format. This is illustrated by the following process function, where an 8-bit RGB image with pixel-oriented data order is scanned.

    void process(void* img) {

      IplImage* image = reinterpret_cast<IplImage*>(img);

      int nl= image->height;
      int nc= image->width * image->nChannels;
      int step= image->widthStep; // because of alignment

      // because imageData is a signed char*
      unsigned char *data=
        reinterpret_cast<unsigned char *>(image->imageData);

      for (int i=0; i<nl; i++) {
        for (int j=0; j<nc; j+= image->nChannels) {

          // 3 channels per pixel
          if (data[j+1] > data[j] && data[j+1] > data[j+2]) {
            data[j]= 0xFF; // 255
            data[j+1]= 0xFF;
            data[j+2]= 0xFF;
          }
        }
        data+= step; // next line
      }
    }

The result is:
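The role of the line padding in such a loop can be checked without IPL or OpenCV. The following sketch works on a plain byte buffer; alignedStep and whitenDominantGreen are hypothetical helpers written for this illustration, and the 8-byte alignment simply mirrors the quad-word alignment described earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of bytes in one image line once it is padded
// to a multiple of `align` bytes (8 for quad-word alignment)
int alignedStep(int width, int channels, int align) {
    assert(width > 0 && channels > 0 && align > 0);
    int rowBytes = width * channels;
    return ((rowBytes + align - 1) / align) * align;
}

// Same scan as the process function, on a raw buffer made of `height`
// lines of `step` bytes, each holding `width` pixels of 3 interleaved
// 8-bit channels. A pixel whose middle channel dominates the two others
// is set to white. Returns the number of modified pixels.
int whitenDominantGreen(std::vector<unsigned char>& data,
                        int width, int height, int step) {
    assert(step >= width * 3);
    assert(data.size() >= static_cast<std::size_t>(step) * height);
    if (data.empty()) return 0;

    int changed = 0;
    unsigned char* line = &data[0];
    for (int i = 0; i < height; ++i) {
        // only the first width*3 bytes of a line are pixels,
        // the remaining bytes are padding and must be skipped
        for (int j = 0; j < width * 3; j += 3) {
            if (line[j + 1] > line[j] && line[j + 1] > line[j + 2]) {
                line[j] = line[j + 1] = line[j + 2] = 0xFF;
                ++changed;
            }
        }
        line += step; // next line, padding included
    }
    return changed;
}
```

For a color image that is 3 pixels wide, a line holds 9 bytes of pixel data but occupies 16 bytes in memory. Advancing by width times the number of channels instead of step would therefore drift by 7 bytes on every line.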
Although this is the most efficient way to scan an image, this process can be error prone. In order to simplify this frequent task, an image iterator can be introduced. The role of this iterator template is to take care of the pointer manipulation involved in the processing of an image. The template is as follows:

    template <class PEL>
    class IplImageIterator {

      int i, i0, j;
      PEL* data;
      PEL* pix;   // allocated in the constructor, not used in this listing
      int step;
      int nl, nc;
      int nch;

     public:

      /* constructor */
      IplImageIterator(IplImage* image,
                       int x=0, int y=0, int dx= 0, int dy=0) :
           i(x), i0(0), j(y) {

        data= reinterpret_cast<PEL*>(image->imageData);
        step= image->widthStep / sizeof(PEL);

        nl= image->height;
        if ((y+dy)>0 && (y+dy)<nl) nl= y+dy;
        if (y<0) j=0;
        data+= step*j;

        nc= image->width;
        if ((x+dx)>0 && (x+dx)<nc) nc= x+dx;
        nc*= image->nChannels;
        if (x>0) i0= x*image->nChannels;
        i= i0;

        nch= image->nChannels;
        pix= new PEL[nch];
      }

      /* has next ? */
      bool operator!() const { return j < nl; }

      /* next pixel */
      IplImageIterator& operator++() {
        i++;
        if (i >= nc) { i=i0; j++; data+= step; }
        return *this;
      }
      IplImageIterator& operator+=(int s) {
        i+=s;
        if (i >= nc) { i=i0; j++; data+= step; }
        return *this;
      }

      /* pixel access */
      PEL& operator*() { return data[i]; }
      const PEL operator*() const { return data[i]; }
      const PEL neighbor(int dx, int dy) const
        { return *(data+dy*step+i+dx); }
      PEL* operator&() const { return data+i; }

      /* current pixel coordinates */
      int column() const { return i/nch; }
      int line() const { return j; }
    };

An iterator of this type can be declared by specifying the type of the pixels in the image and by giving a pointer to the IplImage as argument to the iterator constructor, e.g.:

    IplImageIterator<unsigned char> it(image);

Once the iterator is constructed, two operators can be used to iterate over an image. First, the ! operator tells whether some elements remain to be visited (it returns true until the end of the image is reached). Second, the * operator gives access to the current element. A typical loop will therefore look like this:

    while (!it) {
      if (*it < 10) {
        *it= 0xFF; // 255
      }
      ++it;
    }

Note that if the image contains more than one channel, each iteration will give access to one of the channels of a pixel. This means that in the case of a color pixel, you have to iterate three times for each pixel. In order to access all components of a pixel, the operator & can be used. This one returns an array that contains the current pixel channel values. For example, the previous example will look like this (note how the iterator is incremented this time to make sure that we go from one pixel to another):

    void process(void* img) {

      IplImage* image = reinterpret_cast<IplImage*>(img);
      IplImageIterator<unsigned char> it(image);
      unsigned char* pixel;

      while (!it) {
        pixel= &it;
        if (pixel[1]>pixel[0] && pixel[1]>pixel[2]) {
          pixel[0]= 0xFF; // 255
          pixel[1]= 0xFF; // 255
          pixel[2]= 0xFF; // 255
        }
        it+= 3;
      }
    }

The use of image iterators is as efficient as directly looping with pointers. This is true as long as you set the compiler to optimize for speed, i.e.:
When the processing involves more than one image, more than one iterator can be used. This is illustrated in the following example:

    void process(void* img) {

      IplImage* image = reinterpret_cast<IplImage*>(img);
      IplImage* tmp= cvCloneImage(image);

      IplImageIterator<unsigned char>
        src(tmp,1,1,tmp->width-2,tmp->height-2);
      IplImageIterator<unsigned char>
        res(image,1,1,image->width-2,image->height-2);

      while (!src) {
        *res= abs(*src - src.neighbor(-1,-1)
                       + src.neighbor(-1,0)
                       - src.neighbor(0,-1));
        ++src;
        ++res;
      }

      cvReleaseImage(&tmp);
    }

Here the clone of the source image is used as input while the source image is modified inside the loop. Two iterators are therefore defined. Since the processing also involves the neighboring pixels, the neighbor method defined by the iterator is used. Also, in this case, a window is specified when creating the iterator (here it defines a 1-pixel strip around the image where no processing is undertaken). The resulting image is:
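Why a copy of the input is needed can be seen on a small single-channel buffer. The sketch below does not reproduce the operator used above; it swaps in a simpler one (the absolute difference with the left neighbor) so that the effect is easy to follow. horizontalDifference is a hypothetical function, and the image is assumed to be stored line by line without padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Replaces each interior pixel of a single-channel 8-bit image by the
// absolute difference between itself and its left neighbor. A 1-pixel
// border is left untouched, as with the windowed iterators above.
void horizontalDifference(std::vector<unsigned char>& image,
                          int width, int height) {
    assert(width >= 0 && height >= 0);
    assert(image.size() == static_cast<std::size_t>(width) * height);

    // untouched copy used as input; plays the role of the cloned image
    const std::vector<unsigned char> src(image);

    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            std::size_t k = static_cast<std::size_t>(y) * width + x;
            int diff = std::abs(static_cast<int>(src[k]) -
                                static_cast<int>(src[k - 1]));
            image[k] = static_cast<unsigned char>(diff);
        }
    }
}
```

If the loop read its neighbors from image instead of src, the left neighbor would already have been overwritten by the previous iteration and the result would depend on the scanning order.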
are three types of filters: source filters that output video and/or audio signals, transform filters that process an input signal and produce one (or several) outputs, and finally rendering filters that display or save a media signal. The processing of a sequence is therefore done using a series of filters connected together, the output of one filter becoming the input of the next one (you can also have filters with multiple outputs). The first filter is usually a decompressor that reads a file stream and the last filter could be a renderer that displays the sequence in a window. In the DirectShow terminology, a series of filters is called a filter graph.

We will first try to process an AVI sequence. Let's first see if DirectX is working fine. To do so, just use the GraphEdit application. This is a very useful application included in the DirectX SDK that makes it easy to build filter graphs. It can be started from the Start|Programs|Microsoft DirectX 8.1 SDK|DirectX Utilities menu. The GraphEdit application window will pop up.

Our objective is now to visualize the building blocks required to obtain an AVI renderer. Select Graph|Insert Filters. A window will display the list of available filters. Choose the DirectShow Filters tree and select the File Source (Async.) filter.
You will be asked to select an AVI file. The filter will appear in the GraphEdit window in the form of a box. Right-click on the output pin and select the Render Pin option. This is an intelligent option that will determine what filters are required to render the selected source file and will automatically assemble them together as shown here:
For an AVI sequence, the video renderer should be composed of 3 filters. The first one is the splitter that separates the video and audio components; this filter normally has two outputs (video and audio), but note that in the case of the selected sequence, no audio component was available. The second one is the appropriate decompressor that decodes the video sequence. Finally, the third filter is the renderer itself, which creates the window and displays the frame sequence in it. Just push the play button to execute the graph and the selected AVI sequence should be displayed in a window.

We can build the same filter graph using Visual C++. You first need to add the following include path to your project settings:

    C:\DXSDK\samples\Multimedia\DirectShow\BaseClasses

And the following library path:

    C:\DXSDK\lib

Finally add the following library:

    STRMBASE.LIB

DirectX is implemented using the Microsoft COM technology. This means that when you want to do something, you do it by using a given COM interface. In order to initialize the COM layer, you must call:

    CoInitialize(NULL);

And similarly, when you are done with COM, you need to uninitialize it:

    CoUninitialize();

A COM interface is an abstract class containing pure virtual functions (forming together the interface). Using a COM interface is the only way to communicate with a COM object. Interfaces are obtained by calling the appropriate API function. These functions return a value of type HRESULT representing an error code. The simplest way to verify whether a COM call failed or succeeded is to check the return value using the FAILED macro.

All COM interfaces derive from the IUnknown interface. A very important rule when you use an interface is to never forget to release it once you have finished using it; otherwise, resource leaks will result. This is done by calling the Release method of the IUnknown interface, which decrements the object's reference count by 1; when the count reaches 0, the object is deallocated. The safest way to call the Release method is to use the macro SAFE_RELEASE that can be found in dxutil.h, located in C:\DXSDK\samples\Multimedia\Common\include. This macro is simply defined as:

    #define SAFE_RELEASE(p) { if(p){(p)->Release();(p)=NULL;}}

To use a component of DirectX, you must first obtain its top-level interface. Each component is identified by a CLSID and each interface is identified by an IID. For example, to create a DirectShow filter graph (used to build a series of filters) you call:

    IGraphBuilder *pGraph;
    CoCreateInstance(CLSID_FilterGraph,  // class identifier
                     NULL, CLSCTX_INPROC,
                     IID_IGraphBuilder,  // interface identifier
                     (void **)&pGraph);  // pointer to the interface
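The reference counting rule and the behavior of SAFE_RELEASE can be tried out without COM at all. In the sketch below, CountedObject is a hypothetical stand-in that only models the AddRef/Release part of IUnknown; it is not a real COM object, and the destroyed flag exists only so that the destruction can be observed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a COM object: only the reference
// counting of IUnknown is modelled here
class CountedObject {
    long refs;
    bool* destroyed;   // set to true when the object deletes itself

    // private: the object can only be destroyed through Release()
    ~CountedObject() { if (destroyed) *destroyed = true; }

  public:
    // a newly created object starts with one reference
    explicit CountedObject(bool* flag = 0) : refs(1), destroyed(flag) {}

    long AddRef() { return ++refs; }

    long Release() {
        long remaining = --refs;
        if (remaining == 0) delete this; // last reference gone
        return remaining;
    }
};

// Same definition as the macro found in dxutil.h
#define SAFE_RELEASE(p) { if(p){(p)->Release();(p)=NULL;}}
```

Because the macro resets the pointer to NULL, calling SAFE_RELEASE twice on the same pointer is harmless, whereas calling Release twice directly would decrement the count once too often.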
To request the other interfaces of this object, you use the QueryInterface method. For example:

    pGraph->QueryInterface(
        IID_IMediaControl,          // interface identifier
        (void **)&pMediaControl);   // pointer to the interface

Once the filter graph is created, it becomes easy to create all the filters required to render an AVI file. This is done by calling:

    pGraph->RenderFile(MediaFile, NULL);

This call does what the Render Pin option does in the GraphEdit application. To play the video, two more interfaces are required: IMediaControl, which is used to start the playback, and IMediaEvent, which is used to detect when the stream rendering has completed. Here is the complete class:

    class SequenceProcessor {

        IplImage* img; // Declare IPL/OpenCV image pointer
        IGraphBuilder *pGraph;
        IMediaControl *pMediaControl;
        IMediaEvent *pEvent;

      public:

        SequenceProcessor(CString filename, bool display=true) {

          CoInitialize(NULL);

          // so that the destructor can safely release them
          // even if the creation of the graph fails
          pGraph= 0;
          pMediaControl= 0;
          pEvent= 0;

          // Create the filter graph
          if (!FAILED(
                CoCreateInstance(CLSID_FilterGraph,
                                 NULL, CLSCTX_INPROC,
                                 IID_IGraphBuilder,
                                 (void **)&pGraph))) {

            // The two control interfaces
            pGraph->QueryInterface(IID_IMediaControl,
                                   (void **)&pMediaControl);
            pGraph->QueryInterface(IID_IMediaEvent,
                                   (void **)&pEvent);

            // Convert CString into WCHAR*
            WCHAR *MediaFile= new WCHAR[filename.GetLength()+1];
            MultiByteToWideChar(CP_ACP, 0, filename, -1,
                                MediaFile, filename.GetLength()+1);

            // Create the filters
            pGraph->RenderFile(MediaFile, NULL);
            delete[] MediaFile; // the converted name is no longer needed

            if (display) {

              // Execute the filter
              pMediaControl->Run();

              // Wait for completion.
              long evCode;
              pEvent->WaitForCompletion(INFINITE, &evCode);
            }
          }
        }

        ~SequenceProcessor() {

          // Do not forget to release after use
          SAFE_RELEASE(pMediaControl);
          SAFE_RELEASE(pEvent);
          SAFE_RELEASE(pGraph);

          CoUninitialize();
        }
    };

When an AVI file is selected, a rendering filter is created and the sequence is displayed. To get an idea of what filters have been created, we can enumerate them by adding the following member function to our class:

    std::vector<CString> enumFilters() {

      IEnumFilters *pEnum = NULL;
      IBaseFilter *pFilter;
      ULONG cFetched;
      std::vector<CString> names;

      pGraph->EnumFilters(&pEnum);

      while(pEnum->Next(1, &pFilter, &cFetched) == S_OK) {

        FILTER_INFO FilterInfo;
        char szName[256];
        CString fname;

        pFilter->QueryFilterInfo(&FilterInfo);
        WideCharToMultiByte(CP_ACP, 0, FilterInfo.achName,
                            -1, szName, 256, 0, 0);
        fname= szName;
        names.push_back(fname);

        SAFE_RELEASE(FilterInfo.pGraph);
        SAFE_RELEASE(pFilter);
      }

      SAFE_RELEASE(pEnum);

      return names;
    }

This method simply creates a vector of strings (you have to include <vector>) containing the names of the filters associated with the generated filter graph. Each name is obtained by reading the FILTER_INFO structure. The enumeration is obtained by calling the EnumFilters method of the filter graph. Note how all interfaces are released, including the one indirectly obtained through FILTER_INFO, which also contains a pointer to the associated filter graph.

To display the filter names, we add a CListBox to the dialog. Do not forget to add a control member variable for this list. This can be done using the Class Wizard of the View menu. Select the Member Variables tab and then select the control ID that corresponds to the CListBox (the name should be IDC_LIST1). Click on the Add Variable button and call the variable m_list; its Category must be Control. The m_list variable is now available as a member variable of the dialog class. The filter names are added to this list by changing the OnOpen method as follows:

    void CCvisionDlg::OnOpen()
    {
      CFileDialog dlg(TRUE, _T("*.bmp"), "",
        OFN_FILEMUSTEXIST|OFN_PATHMUSTEXIST|OFN_HIDEREADONLY,
        "image files (*.bmp; *.jpg) |*.bmp;*.jpg|"
        "AVI files (*.avi) |*.avi|All Files (*.*)|*.*||",NULL);

      char title[]= {"Open Image"};
      dlg.m_ofn.lpstrTitle= title;

      if (dlg.DoModal() == IDOK) {

        CString path= dlg.GetPathName();
        CString ext= dlg.GetFileExt();

        // delete the previous processors and reset the pointers
        // so that they are not deleted twice on the next call
        if (proc != 0) { delete proc; proc= 0; }
        if (procseq != 0) { delete procseq; procseq= 0; }

        // Compare returns 0 when the two strings are identical
        if (ext.Compare("avi")) {

          proc= new ImageProcessor(path);

        } else {

          procseq= new SequenceProcessor(path);

          // Obtaining the list of filters
          std::vector<CString> names= procseq->enumFilters();
          m_list.ResetContent();
          for (int i=0; i<names.size(); i++)
            m_list.AddString(names[i]);
        }
      }
    }

and now if you open an AVI file, you can see the filter list:
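One detail of the test on the file extension deserves attention: CString::Compare is case sensitive, so a file named clip.AVI would be handled as a still image. In MFC, CString::CompareNoCase avoids this. The same idea is sketched below in standard C++; lowerExtension and isAviFile are hypothetical helpers written for this illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Returns the extension of `path` in lower case and without the dot,
// or an empty string if the file name has no extension
std::string lowerExtension(const std::string& path) {
    std::string::size_type dot = path.find_last_of('.');
    std::string::size_type sep = path.find_last_of("/\\");
    if (dot == std::string::npos) return "";
    // a dot located in a directory name does not count
    if (sep != std::string::npos && dot < sep) return "";

    std::string ext = path.substr(dot + 1);
    for (std::size_t k = 0; k < ext.size(); ++k)
        ext[k] = static_cast<char>(
            std::tolower(static_cast<unsigned char>(ext[k])));
    return ext;
}

// True if the file should be handled as an AVI sequence
bool isAviFile(const std::string& path) {
    return lowerExtension(path) == "avi";
}
```

With such a test, both clip.avi and clip.AVI would be routed to the SequenceProcessor branch.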
        pPin->QueryDirection(&PinDirThis);
        if (bFound = (PinDir == PinDirThis))
          break;
        pPin->Release();
      }
      pEnum->Release();
      return (bFound ? pPin : 0);
    }

The PIN_DIRECTION can be PINDIR_OUTPUT or PINDIR_INPUT. For example, to obtain a source filter and its output pin ready to be connected, we can do:

    IBaseFilter* pSource= NULL;

    // Add a source filter to the current graph
    pGraph->AddSourceFilter(mediaFile,0,&pSource);

    // Obtain the output pin
    IPin* pSourceOut= GetPin(pSource, PINDIR_OUTPUT);

To add a filter (it must first be created) to the filter graph, we use the AddFilter method:

    // Add the pFilter to the current graph
    pGraph->AddFilter( pFilter, L"Name of the Filter");

The second argument is a name for the filter that must identify it uniquely in the filter graph (if you set it to NULL, the graph manager will generate one for you). To connect two pins together, we simply use the Connect method:

    // Connect pIn to pOut
    pGraph->Connect(pOut, pIn);

What filters do we need to display an AVI sequence? We know the answer from the results displayed in the filter list box or in the GraphEdit application:

1. a Source filter that reads the file
2. an AVI splitter that reads the stream and splits it into a video and an audio channel (we ignore the latter here).
3. an AVI video decompressor that decodes the video stream
4. a Video renderer that plays the video sequence in a window.
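The four filters listed above form a chain in which each output pin feeds the input pin of the next filter, which only works when the two pins carry the same kind of data. The toy model below illustrates that idea in plain C++. It is not DirectShow code: ToyFilter and connectChain are hypothetical, and the media type labels are made-up strings rather than real DirectShow media types.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy model of a filter: a name, plus the kind of data
// it accepts on its input pin and produces on its output pin
struct ToyFilter {
    std::string name;
    std::string inputType;   // empty for a source filter
    std::string outputType;  // empty for a renderer
};

// Connects the filters in order, output pin to next input pin.
// Stops at the first incompatible pair and returns the number
// of connections that were made.
std::size_t connectChain(const std::vector<ToyFilter>& chain) {
    std::size_t connections = 0;
    for (std::size_t k = 0; k + 1 < chain.size(); ++k) {
        const ToyFilter& from = chain[k];
        const ToyFilter& to = chain[k + 1];
        if (from.outputType.empty() || from.outputType != to.inputType)
            break; // pins of incompatible types cannot be connected
        ++connections;
    }
    return connections;
}
```

With a source, a splitter, a decompressor and a renderer whose types match pairwise, three connections are made. If the decompressor is left out, the chain stops after the splitter because compressed video cannot feed the renderer directly.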
Note that for some filters, the pins are created dynamically. This is the case of the AVI splitter, which will create the required output pins (video and/or audio) only when the source is connected to its input. This makes sense since the format of the output of this kind of filter is known only when the type of its input is known. It must also be obvious that, to be connected together, the respective output and input pins of two filters must be of compatible types. The properties of a given pin (such as major type and subtype) can be obtained as follows:

    AM_MEDIA_TYPE amt;
    pPin->ConnectionMediaType(&amt);

The following member function will now create the complete filter graph. The procedure is simple: we first create the filter using CoCreateInstance (finding the right CLSID identifier is the key to obtaining the filter we want), add it to the filter graph, obtain its input pin and connect it to the output pin of the previous filter.

    bool createFilterGraph(CString filename) {

      WCHAR *mediaFile= new WCHAR[filename.GetLength()+1];
      MultiByteToWideChar(CP_ACP, 0, filename, -1,
                          mediaFile, filename.GetLength()+1);

      // Create a source filter specified by filename
      IBaseFilter* pSource= NULL;
      HRESULT hr= pGraph->AddSourceFilter(mediaFile,0,&pSource);
      delete[] mediaFile; // the converted name is no longer needed
      if(FAILED(hr)) {
        ::MessageBox( NULL, "Unable to create source filter",
                      "Error", MB_OK | MB_ICONINFORMATION );
        return 0;
      }

      IPin* pSourceOut= GetPin(pSource, PINDIR_OUTPUT);
      if (!pSourceOut) {
        ::MessageBox( NULL, "Unable to obtain source pin",
                      "Error", MB_OK | MB_ICONINFORMATION );
        return 0;
      }

      // Create an AVI splitter filter
      IBaseFilter* pAVISplitter = NULL;
      if(FAILED(CoCreateInstance(CLSID_AviSplitter, NULL,
                                 CLSCTX_INPROC_SERVER,
                                 IID_IBaseFilter,
                                 (void**)&pAVISplitter)) ||
         !pAVISplitter) {
        ::MessageBox( NULL, "Unable to create AVI splitter",
                      "Error", MB_OK | MB_ICONINFORMATION );
        return 0;
      }

      IPin* pAVIsIn= GetPin(pAVISplitter, PINDIR_INPUT);
      if (!pAVIsIn) {
        ::MessageBox( NULL, "Unable to obtain input splitter pin",
                      "Error", MB_OK | MB_ICONINFORMATION );
        return 0;
      }

      // Connect the source and the splitter
      if(FAILED(pGraph->AddFilter( pAVISplitter, L"Splitter")) ||
         FAILED(pGraph->Connect(pSourceOut, pAVIsIn)) ) {
        ::MessageBox( NULL, "Unable to connect AVI splitter filter",
                      "Error", MB_OK | MB_ICONINFORMATION );
        return 0;
      }

      // Create an AVI decoder filter
      IBaseFilter* pAVIDec = NULL;
      if(FAILED(CoCreateInstance(CLSID_AVIDec, NULL,
                                 CLSCTX_INPROC_SERVER,
                                 IID_IBaseFilter,
                                 (void**)&pAVIDec)) ||
         !pAVIDec) {
        ::MessageBox( NULL, "Unable to create AVI decoder",
                      "Error", MB_OK | MB_ICONINFORMATION);
        return 0;
      }

      IPin* pAVIsOut= GetPin(pAVISplitter, PINDIR_OUTPUT);
      if (!pAVIsOut) {
        ::MessageBox( NULL, "Unable to obtain output splitter pin",
                      "Error", MB_OK | MB_ICONINFORMATION );
        return 0;
      }

      IPin* pAVIDecIn= GetPin(pAVIDec, PINDIR_INPUT);
      if (!pAVIDecIn) {
        ::MessageBox( NULL, "Unable to obtain decoder input pin",
                      "Error", MB_OK | MB_ICONINFORMATION );
        return 0;
      }

      // Connect the splitter and the decoder
      if(FAILED(pGraph->AddFilter( pAVIDec, L"Decoder")) ||
         FAILED(pGraph->Connect(pAVIsOut, pAVIDecIn)) ) {
        ::MessageBox( NULL, "Unable to connect AVI decoder filter",
                      "Error", MB_OK | MB_ICONINFORMATION );
        return 0;
      }

      IPin* pAVIDecOut= GetPin(pAVIDec, PINDIR_OUTPUT);
      if (!pAVIDecOut) {
        ::MessageBox( NULL, "Unable to obtain decoder output pin",
                      "Error", MB_OK | MB_ICONINFORMATION );
        return 0;
      }

      // Render from the decoder
      if(FAILED(pGraph->Render( pAVIDecOut ))) {
        ::MessageBox( NULL, "Unable to connect to renderer",
                      "Error", MB_OK | MB_ICONINFORMATION );
        return 0;
      }

      SAFE_RELEASE(pAVIDecIn);
      SAFE_RELEASE(pAVIDecOut);
      SAFE_RELEASE(pAVIDec);
      SAFE_RELEASE(pAVIsOut);
      SAFE_RELEASE(pAVIsIn);
      SAFE_RELEASE(pAVISplitter);
      SAFE_RELEASE(pSourceOut);
      SAFE_RELEASE(pSource);

      return 1;
    }

By executing this manually built filter graph, the result is the same as previously.
We will see later how the ProxyTrans filter can be used to process the sequence. But since we want to transform the original sequence through some process, it might be useful to be able to save the processed sequence. Let's run some tests using the GraphEdit application again. Delete the Video Renderer filter; we will replace it by a chain that will compress the sequence again and save it to a file. We therefore need a Video Compressor, an AVI multiplexor and a File Writer. You can easily find all these filters in the list of available filters when you click on the Insert Filter button. Note that when you select the File Writer filter, you will be asked to specify a name for the output file. The resulting graph should be as follows:

Obviously, if you play this graph, the resulting file will be the same as the original because our ProxyTrans filter, which is supposed to do the processing, does not do anything for now. However, the size of the output sequence might be different from the size of the original sequence, because the compressor used in the graph might use different parameters to compress the sequence. You probably also noted that when you play the graph, no sequence is displayed, simply because we removed the Renderer. It is quite easy to add an extra path to the filter graph in order to allow the simultaneous display and saving of the sequence. The Smart Tee is the filter you need. Add it and create the following graph:

Note that the Smart Tee filter has two output pins. The capture pin controls the sequence flow; the preview pin will receive frames only if extra computational resources are available. When processing a sequence, you could also use two Smart Tee filters, one to display the original sequence, the other to display the processed one; that is what we will do now when manually building our filter graph.

As you can see in the figure above, the creation of a video processing filter graph requires connecting several filters together. Many lines have to be added to our createFilterGraph method, and the probability of making an error then becomes quite high. However, a closer look at this method reveals that the same sequence is repeated several times, suggesting that some generic function could be introduced to help the programmer. Following this idea, we can write an addFilter utility function. This one will be called each time a new filter needs to be created and connected to some filter of a graph. This function has the following signature:

    bool addFilter(REFCLSID filterCLSID,
The first parameter is the CLSID identifier that specifies which filter will be created. The second parameter is the name that will be given to this filter in the current graph. The third parameter is a pointer to the filter graph. The outputPin parameter is both an input and an output parameter. As an input, it contains a pointer to the output pin to which the filter to be created must be connected. When the function returns, this parameter will contain a pointer to the output pin(s) of the filter thus created; the number of output pins that need to be obtained is given by the last parameter of this function. The function returns true if the filter has been successfully created and connected to the filter graph.

This function can be written in a straightforward manner. First the filter is created using CoCreateInstance, then the input pin is obtained and is connected to the specified output pin. Once this is done, the last step consists of obtaining the required number of output pins. The function is then as follows:

    bool addFilter(REFCLSID filterCLSID, WCHAR* filtername,
                   IGraphBuilder *pGraph,
                   IPin **outputPin, int numberOfOutput) {

      // Create the filter.
      IBaseFilter* baseFilter = NULL;
      char tmp[100];

      if(FAILED(CoCreateInstance(
                  filterCLSID, NULL, CLSCTX_INPROC_SERVER,
                  IID_IBaseFilter, (void**)&baseFilter)) ||
         !baseFilter) {
        sprintf(tmp,"Unable to create %ls filter", filtername);
        ::MessageBox( NULL, tmp, "Error",
                      MB_OK|MB_ICONINFORMATION );
        return 0;
      }

      // Obtain the input pin.
      IPin* inputPin= GetPin(baseFilter, PINDIR_INPUT);
      if (!inputPin) {
        sprintf(tmp, "Unable to obtain %ls input pin", filtername);
        ::MessageBox( NULL, tmp, "Error",
                      MB_OK | MB_ICONINFORMATION );
        return 0;
      }

      // Connect the filter to the output pin.
      if(FAILED(pGraph->AddFilter( baseFilter, filtername)) ||
         FAILED(pGraph->Connect(*outputPin, inputPin)) ) {
        sprintf(tmp, "Unable to connect %ls filter", filtername);
        ::MessageBox( NULL, tmp, "Error",
                      MB_OK | MB_ICONINFORMATION );
        return 0;
      }

      SAFE_RELEASE(inputPin);
      SAFE_RELEASE(*outputPin);

      // Obtain the output pin(s).
      for (int i=0; i<numberOfOutput; i++) {

        outputPin[i]= 0;
        outputPin[i]= GetPin(baseFilter, PINDIR_OUTPUT, i+1);

        if (!outputPin[i]) {
          // %ls because filtername is a wide-character string
          sprintf(tmp, "Unable to obtain %ls output pin (%d)",
                  filtername, i);
          ::MessageBox( NULL, tmp, "Error",
                        MB_OK | MB_ICONINFORMATION );
          return 0;
        }
      }

      SAFE_RELEASE(baseFilter);

      return 1;
    }

Using this function, it becomes easy to create a complex filter graph. The one we will build now will include the ProxyTrans filter (note that the header file initguid.h must be included to be able to use this filter). To be useful, this filter must do something. In fact, the objective of this filter is to give the programmer access to each frame of the sequence so that it can be processed. This is realized through a callback function that is automatically called for each frame of the sequence. This callback function receives as argument a pointer to the current image; the user is then free to analyze and modify this image. Here is an example of a valid callback function that can be used with the ProxyTrans filter.

    void process(void* img) {
      IplImage* image = reinterpret_cast<IplImage*>(img);
      cvErode( image, image, 0, 2 );
    }

In order to have this function called, it must be registered with the ProxyTrans filter. This is simply done by calling this method of the IProxyTransform interface:

    pProxyTrans->set_transform(process, 0);

Here is now the function that creates the filter graph that processes an input sequence and saves the result in a file. Two preview windows are displayed, one for the original sequence, the other one for the output sequence.

    bool createFilterGraph() {

      IPin* pSourceOut[2];
      pSourceOut[0]= pSourceOut[1]= NULL;

      // Video source
      addSource(ifilename, pGraph, pSourceOut);

      // Add the decoding filters
      addFilter(CLSID_AviSplitter, L"Splitter", pGraph, pSourceOut);
      addFilter(CLSID_AVIDec, L"Decoder", pGraph, pSourceOut);

      // Insert the first Smart Tee
      addFilter(CLSID_SmartTee, L"SmartTee(1)", pGraph, pSourceOut,2);

      // Add the ProxyTrans filter
      addFilter(CLSID_ProxyTransform, L"ProxyTrans",
                pGraph, pSourceOut);

      // Set the ProxyTrans callback
      IBaseFilter* pProxyFilter = NULL;
      IProxyTransform* pProxyTrans = NULL;
      pGraph->FindFilterByName(L"ProxyTrans",&pProxyFilter);
      pProxyFilter->QueryInterface(IID_IProxyTransform,
                                   (void**)&pProxyTrans);
      pProxyTrans->set_transform(process, 0);
      SAFE_RELEASE(pProxyTrans);
      SAFE_RELEASE(pProxyFilter);

      // Render the original (decoded) sequence
      // using 2nd SmartTee(1) output pin
      addRenderer(L"Renderer(1)", pGraph, pSourceOut+1);

      // Insert the second Smart Tee
      addFilter(CLSID_SmartTee, L"SmartTee(2)", pGraph, pSourceOut,2);

      // Encode the processed sequence
      addFilter(CLSID_AviDest, L"AVImux", pGraph, pSourceOut);
      addFileWriter(ofilename, pGraph, pSourceOut);
// Render the transformed sequence // using 2nd SmartTee(2) output pin addRenderer(L"Renderer(2)", pGraph, pSourceOut+1); return 1; }
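To make the effect of the process callback more concrete, the following portable sketch shows what an erosion with a 3x3 square neighbourhood does to a single-channel 8-bit image: every pixel is replaced by the minimum of its neighbourhood, and the pass is repeated for the requested number of iterations. This is not OpenCV's implementation. The GrayImage type and the erode functions are hypothetical stand-ins for IplImage and cvErode, and ignoring neighbours that fall outside the image is an assumption made to keep the example short.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a single-channel 8-bit image (not IplImage).
struct GrayImage {
    int width;
    int height;
    std::vector<std::uint8_t> pixels; // row-major, width * height values
};

// One erosion pass with a 3x3 square neighbourhood: each output pixel is the
// minimum of the input pixels around it. Neighbours outside the image are
// ignored (an assumption of this sketch).
GrayImage erodeOnce(const GrayImage& src)
{
    GrayImage dst = src;
    for (int y = 0; y < src.height; ++y) {
        for (int x = 0; x < src.width; ++x) {
            std::uint8_t smallest = 255;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    const int yy = y + dy;
                    const int xx = x + dx;
                    if (yy < 0 || yy >= src.height ||
                        xx < 0 || xx >= src.width) {
                        continue;
                    }
                    smallest = std::min(smallest,
                                        src.pixels[yy * src.width + xx]);
                }
            }
            dst.pixels[y * src.width + x] = smallest;
        }
    }
    return dst;
}

// Repeats the erosion pass, mirroring the iteration count passed to cvErode.
GrayImage erode(GrayImage img, int iterations)
{
    for (int i = 0; i < iterations; ++i) {
        img = erodeOnce(img);
    }
    return img;
}
```

With two iterations, as in the callback, a single dark pixel grows into a 5x5 dark square, which is why bright details thinner than a few pixels disappear from the processed sequence.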
Check point #4: source code of the above example. You will note that the output file produced by this program is quite big, simply because no compressor is used when the sequence is saved. A compression filter was left out because such filters can only be obtained through enumeration, which is discussed in the next section.
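A rough calculation shows why an uncompressed sequence grows so quickly. The sketch below simply multiplies frame size, bytes per pixel, frame rate and duration; the function name and the example figures (320x240, 24-bit colour, 30 frames per second) are illustrative assumptions, and container overhead is ignored.

```cpp
#include <cstdint>

// Size in bytes of an uncompressed video stream, ignoring file headers.
std::uint64_t uncompressedVideoBytes(int width, int height,
                                     int bytesPerPixel,
                                     int framesPerSecond, int seconds)
{
    return static_cast<std::uint64_t>(width) * height * bytesPerPixel *
           framesPerSecond * seconds;
}
```

For example, uncompressedVideoBytes(320, 240, 3, 30, 10) gives 69,120,000 bytes, that is about 69 MB for ten seconds of video.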
    IMoniker *pMoniker;
    ULONG cFetched;
    while (pEnumCat->Next(1,          // number of elements requested
                          &pMoniker,  // pointer to the moniker
                          &cFetched)  // number of elements returned
           == S_OK)

The enumerated filters are identified by the IMoniker interface, an interface used to uniquely identify a COM object. A moniker is similar to a path in a file system and it can be used to obtain information about a given filter:

    IPropertyBag *pPropBag;
    pMoniker->BindToStorage(0, 0, IID_IPropertyBag, (void **)&pPropBag);

Properties of a filter are obtained using the IPropertyBag interface. This generic interface is used to read and write properties using text. A moniker can also be used to create a filter:

    IBaseFilter* baseFilter;
    pMoniker->BindToObject(NULL, NULL, IID_IBaseFilter, (void**)&baseFilter);

This latter approach must be used to create an enumerated filter instead of using the CoCreateInstance function. The function presented below can be used to obtain the available filters of a category. It returns the friendly name and the CLSID identifier of each filter. Either can be used afterwards to create a given filter.

    void enumFilters(REFCLSID CLSIDcategory,
                     std::vector<CString>& names,
                     std::vector<CLSID>& clsidFilters)
    {
        // Create the System Device Enumerator.
        HRESULT hr;
        ICreateDevEnum *pSysDevEnum = NULL;
        hr = CoCreateInstance(CLSID_SystemDeviceEnum, NULL,
                              CLSCTX_INPROC_SERVER, IID_ICreateDevEnum,
                              (void **)&pSysDevEnum);
        if (FAILED(hr))
            return;

        // Obtain a class enumerator for the specified category.
        IEnumMoniker *pEnumCat = NULL;
        hr = pSysDevEnum->CreateClassEnumerator(CLSIDcategory, &pEnumCat, 0);
        if (hr == S_OK) {
            // Enumerate the monikers.
            IMoniker *pMoniker;
            ULONG cFetched;
            while (pEnumCat->Next(1, &pMoniker, &cFetched) == S_OK) {
                IPropertyBag *pPropBag = NULL;
                hr = pMoniker->BindToStorage(0, 0, IID_IPropertyBag,
                                             (void **)&pPropBag);
                if (FAILED(hr)) {
                    pMoniker->Release();
                    continue;   // skip entries without a property bag
                }

                // To retrieve the friendly name of the filter
                VARIANT varName;
                VariantInit(&varName);
                hr = pPropBag->Read(L"FriendlyName", &varName, 0);
                if (SUCCEEDED(hr)) {
                    CString str(varName.bstrVal);
                    names.push_back(str);
                }
                // VariantClear frees the string; it must not also be
                // released with SysFreeString.
                VariantClear(&varName);

                VARIANT varFilterClsid;
                varFilterClsid.vt = VT_BSTR;
                // Read CLSID string from property bag
                hr = pPropBag->Read(L"CLSID", &varFilterClsid, 0);
                if (SUCCEEDED(hr)) {
                    CLSID clsidFilter;
                    // Save filter CLSID
                    if (CLSIDFromString(varFilterClsid.bstrVal,
                                        &clsidFilter) == S_OK) {
                        clsidFilters.push_back(clsidFilter);
                    }
                    SysFreeString(varFilterClsid.bstrVal);
                }

                // Clean up.
                pPropBag->Release();
                pMoniker->Release();
            }
            pEnumCat->Release();
        }
        pSysDevEnum->Release();
    }

This function is used to select a compression filter to be used in our sequence processing application. The list of compression filters is displayed in a list box (obtained after the output sequence is selected):

    void CCvisionDlg::OnSave()
    {
        // Select output file
        // Obtain and display compressors
        std::vector<CString> fname;
        std::vector<CLSID> fclsid;
        enumFilters(CLSID_VideoCompressorCategory, fname, fclsid);
        m_list.ResetContent();
        for (size_t i = 0; i < fname.size(); i++)
            m_list.AddString(fname[i]);
    }

The compression filter is selected by clicking on the corresponding item before pushing the process button. The sequence will then be saved, compressed according to the default control parameters of the chosen compressor. What if you are not satisfied with the resulting compression rate? You can obviously try to select another compression filter; however, it is also possible to use different control parameter values for the chosen filter. This can be done through a special interface called IAMVideoCompression. This interface is normally supported by the output pin of a compression filter. You can obtain the interface by calling the QueryInterface method of the pin:

    IAMVideoCompression *pCompress;
    pPin->QueryInterface(IID_IAMVideoCompression, (void**)&pCompress);

Once obtained, the interface can be used to set the compression properties, namely: the key frame rate (a long integer), the number of predicted frames per key frame (also a long integer), and the relative compression quality (a double between 0.0 and 1.0). It is then easy to set these values using the appropriate methods.

    long keyFrames, pFrames;   // assign the desired values before use
    double quality;
    hr = pCompress->put_KeyFrameRate(keyFrames);
    hr = pCompress->put_PFramesPerKeyFrame(pFrames);
    hr = pCompress->put_Quality(quality);
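Since the three values are supplied by the user, it can help to bring them into range before they are passed to the put_ methods. The following portable sketch shows one way to do this. The CompressionSettings structure and the sanitize function are hypothetical helpers, not part of DirectShow, and the sketch assumes the convention described in the IAMVideoCompression documentation that a negative value asks the filter to use its own default.

```cpp
#include <algorithm>

// Hypothetical container for the three values discussed above.
struct CompressionSettings {
    long keyFrameRate;        // frames per key frame, negative = default
    long pFramesPerKeyFrame;  // predicted frames, negative = default
    double quality;           // 0.0 (worst) to 1.0 (best), negative = default
};

// Normalises every "use the default" request to -1 and clamps the quality
// to the upper bound of 1.0. Values already in range are left untouched.
CompressionSettings sanitize(CompressionSettings s)
{
    if (s.keyFrameRate < 0) {
        s.keyFrameRate = -1;
    }
    if (s.pFramesPerKeyFrame < 0) {
        s.pFramesPerKeyFrame = -1;
    }
    if (s.quality < 0.0) {
        s.quality = -1.0;
    } else {
        s.quality = std::min(s.quality, 1.0);
    }
    return s;
}
```

The sanitized fields would then be passed to put_KeyFrameRate, put_PFramesPerKeyFrame and put_Quality respectively. A compressor may still reject a value it does not support, so the returned HRESULT should be checked in any case.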
Check point #5: source code of the above example.

The same strategy can be used to select a video capture device (e.g. a USB camera). The only difference is that these devices obviously do not have input pins. However, they normally have two output pins (one for capture and one for preview). The basic steps to build a camera-based video processing filter graph are as follows. First add the video capture device through enumeration:

    CString cameraName= ?;
    IPin* pSourceOut[2];
    pSourceOut[0]= pSourceOut[1]= NULL;
    addFilterByEnum(CLSID_VideoInputDeviceCategory,
                    cameraName, pGraph, pSourceOut, 2);
Second, add the ProxyTrans filter:

    addFilter(CLSID_ProxyTransform, L"ProxyTrans", pGraph, pSourceOut);

Then you should add a renderer to the preview pin:

    addRenderer(L"Renderer(1)", pGraph, pSourceOut+1);

And finally, you add the required filters to save the resulting sequence to a file:

    addFilter(CLSID_AviDest, L"AVImux", pGraph, pSourceOut);
    addFileWriter(ofilename, pGraph, pSourceOut);

The complete application that includes the camera selection is given here.
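The camera name is left open in the first step, and the addFilterByEnum helper is not listed in this section. Whatever its actual implementation, selecting a device by name requires matching the requested name against the friendly names found during enumeration. The portable sketch below illustrates only that matching step. The EnumeratedFilter record and the findByName function are hypothetical, std::string stands in for CString, and the device names used in the example are invented.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record pairing a friendly name with the position of the
// device in the enumeration. Keeping both in one record avoids two parallel
// lists drifting out of step when a property cannot be read.
struct EnumeratedFilter {
    std::string friendlyName;
    std::size_t enumIndex;
};

// Returns a pointer to the first entry whose friendly name matches exactly,
// or nullptr when no such entry exists.
const EnumeratedFilter* findByName(const std::vector<EnumeratedFilter>& filters,
                                   const std::string& wanted)
{
    for (const EnumeratedFilter& f : filters) {
        if (f.friendlyName == wanted) {
            return &f;
        }
    }
    return nullptr;
}
```

Given a list containing entries such as {"USB Camera", 0} and {"TV Tuner", 1}, findByName(list, "TV Tuner") returns the second record, while an unknown name yields nullptr, which the caller should treat as "device not found" rather than attempting to build the graph.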
Now, to be able to change the camera settings (such as resolution or frame rate), you must access the facilities offered by the driver of the camera. The easiest way to do so is to use the old Video for Windows technology (an ancestor of DirectShow). If the camera you use has a driver compatible with this technology, then it is possible to obtain dialog boxes to control the camera settings. This is done through the IAMVfwCaptureDialogs interface of the camera filter. The first thing to do is to check whether the camera supports this interface and, if so, to check which dialogs are available. The three standard dialogs are designated by an enumerated type: VfwCaptureDialog_Source, VfwCaptureDialog_Format, VfwCaptureDialog_Display. The procedure to obtain one of these dialogs is quite straightforward:

    IAMVfwCaptureDialogs *pVfw = 0;
    // pCap is a pointer to the camera base filter
    if (SUCCEEDED(pCap->QueryInterface(IID_IAMVfwCaptureDialogs,
                                       (void**)&pVfw))) {
        // Check if the device supports this dialog box.
        if (S_OK == pVfw->HasDialog(VfwCaptureDialog_Format)) {
            // Show the dialog box.
            pVfw->ShowDialog(VfwCaptureDialog_Format,
                             hwndParent);   // parent window
        }
        pVfw->Release();   // release the interface once done
    }

A dialog like the following should then appear: