The Problem
I am writing an image-processing application that identifies single-colored circular objects in an image, using OpenCV 4.7.0 for C++. The pipeline consists of the following operations:
Thresholding: filtering out the objects by color, in three steps:
- Splitting the image into its R, G and B channels
- Applying a threshold to each channel
- Combining the thresholded channels with a bitwise AND
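As a rough illustration of what the three thresholding steps compute per pixel, here is a plain C++ sketch without OpenCV. It assumes an 8-bit interleaved buffer in OpenCV's default BGR order (so `channels[0]` is blue and `channels[2]` is red) and mirrors the thresholds used in the code further down: blue and green must be above 127, red must be at most 127. The helper name `colorMask` is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: builds a binary mask from an interleaved 8-bit BGR
// buffer, mimicking split + per-channel threshold + bitwise AND.
//   THRESH_BINARY:     dst = (src > thresh) ? 255 : 0
//   THRESH_BINARY_INV: dst = (src > thresh) ? 0 : 255
std::vector<std::uint8_t> colorMask(const std::vector<std::uint8_t>& bgr,
                                    std::uint8_t thresh = 127) {
    const std::size_t pixels = bgr.size() / 3;
    std::vector<std::uint8_t> mask(pixels, 0);
    for (std::size_t i = 0; i < pixels; ++i) {
        const std::uint8_t b = bgr[3 * i + 0];
        const std::uint8_t g = bgr[3 * i + 1];
        const std::uint8_t r = bgr[3 * i + 2];
        // Per-channel thresholds: B and G use BINARY, R uses BINARY_INV.
        const std::uint8_t bBin = (b > thresh) ? 255 : 0;
        const std::uint8_t gBin = (g > thresh) ? 255 : 0;
        const std::uint8_t rBin = (r > thresh) ? 0 : 255;
        // Bitwise AND of the three binary channels.
        mask[i] = bBin & gBin & rBin;
    }
    return mask;
}
```

With these threshold types the selected color is roughly cyan (high blue and green, low red); a different target color would need a different combination of BINARY and BINARY_INV per channel.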
Gaussian Blurring: applying a Gaussian blur filter with ksize = 9x9 and sigma = 1.0
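For reference, the 1-D weights behind a 9x9, sigma = 1.0 blur can be computed with the formula documented for `cv::getGaussianKernel`; the 2-D kernel is the outer product of this vector with itself, so the blur can be applied as a row pass followed by a column pass. The sketch below is a hypothetical helper that only computes the weights; it does not reproduce either OpenCV implementation's arithmetic.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper: 1-D Gaussian weights following the formula documented
// for cv::getGaussianKernel:
//   G_i = alpha * exp(-(i - (ksize - 1) / 2)^2 / (2 * sigma^2)),
// where alpha normalizes the weights so that they sum to 1.
std::vector<double> gaussianKernel1D(int ksize, double sigma) {
    std::vector<double> k(ksize);
    const double center = (ksize - 1) / 2.0;
    double sum = 0.0;
    for (int i = 0; i < ksize; ++i) {
        const double x = i - center;
        k[i] = std::exp(-(x * x) / (2.0 * sigma * sigma));
        sum += k[i];
    }
    for (double& w : k) w /= sum;  // normalize so the weights sum to 1
    return k;
}
```

With sigma = 1.0 the center weight is about 0.399 and the outermost taps (i = 0 and i = 8) are only about 1.3e-4 each, so most of the smoothing comes from the central few taps.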
Hough Circles: applying the Hough Circle Transform to the blurred image, with the following parameters (found after a lot of trial and error):
- dp = 1
- minDist = 20
- cannyThreshold = 50
- votesThreshold = 30
- minRadius = 1
- maxRadius = 75
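To make the role of the votes threshold a bit more concrete, here is a deliberately simplified sketch of classic Hough circle voting for a single, known radius. It is not OpenCV's HOUGH_GRADIENT method (which works on Canny edges and gradient directions and uses dp, minDist and a radius range); the function name, the structs and the 1-degree angle sampling are assumptions made for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Pt { int x, y; };
struct Vote { int cx, cy, votes; };

// Simplified classic Hough voting for circles of ONE known radius: each edge
// point votes once for every center cell at distance `radius` from it, and
// the cell with the most votes is returned.
Vote bestCenterFixedRadius(const std::vector<Pt>& edges, int width, int height, int radius) {
    const double kPi = 3.14159265358979323846;
    const std::size_t w = static_cast<std::size_t>(width);
    std::vector<int> acc(w * static_cast<std::size_t>(height), 0);
    // lastVoter[c] remembers which edge point last voted for cell c, so the
    // dense angle sampling cannot give one edge point several votes.
    std::vector<int> lastVoter(acc.size(), -1);
    for (int e = 0; e < static_cast<int>(edges.size()); ++e) {
        for (int deg = 0; deg < 360; ++deg) {
            const double t = deg * kPi / 180.0;
            const long cx = std::lround(edges[e].x - radius * std::cos(t));
            const long cy = std::lround(edges[e].y - radius * std::sin(t));
            if (cx < 0 || cy < 0 || cx >= width || cy >= height) continue;
            const std::size_t c = static_cast<std::size_t>(cy) * w + static_cast<std::size_t>(cx);
            if (lastVoter[c] != e) {
                lastVoter[c] = e;
                ++acc[c];
            }
        }
    }
    // Keep only the global maximum; a real detector would keep every local
    // maximum whose vote count exceeds a threshold (cf. votesThreshold).
    Vote best{0, 0, -1};
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            const int v = acc[static_cast<std::size_t>(y) * w + static_cast<std::size_t>(x)];
            if (v > best.votes) best = {x, y, v};
        }
    return best;
}
```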
After getting satisfactory results with the CPU code, I decided to migrate it to the GPU. Why the GPU? The application is part of a game, which needs to run at a high frame rate.
However, after migrating, I noticed that the GPU results differ from the CPU results: with the same parameters, the GPU code returns noticeably worse detections.
Why is this happening?
The Setup
- OpenCV 4.7.0
- CUDA v12.0
- Intel i5-10300H
- NVIDIA GeForce GTX 1650Ti
The Code
I wrote two separate functions that perform the same pipeline, one on the CPU and one on the GPU:
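/*----------------------------------------CPU Code----------------------------------------*/
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <vector>

// Detects the target circles; each detected circle is stored as (x, y, radius).
cv::Mat extractTargetsCPU(cv::Mat frame) {
    // Split the BGR frame into its three 8-bit channels.
    std::vector<cv::Mat> channels;
    cv::split(frame, channels);
    // Keep pixels with B > 127, G > 127 and R <= 127.
    cv::threshold(channels[0], channels[0], 127, 255, cv::THRESH_BINARY);
    cv::threshold(channels[1], channels[1], 127, 255, cv::THRESH_BINARY);
    cv::threshold(channels[2], channels[2], 127, 255, cv::THRESH_BINARY_INV);
    // Combine the three binary masks into one (stored in channels[0]).
    cv::bitwise_and(channels[1], channels[2], channels[1]);
    cv::bitwise_and(channels[0], channels[1], channels[0]);
    cv::Mat targets;
    // Smooth the mask before circle detection (9x9 kernel, sigma = 1.0).
    cv::GaussianBlur(channels[0], channels[0], cv::Size(9, 9), 1.0f);
    // dp = 1, minDist = 20, Canny threshold = 50, votes threshold = 30, radius 1..75.
    cv::HoughCircles(channels[0], targets, cv::HOUGH_GRADIENT, 1, 20, 50, 30, 1, 75);
    return targets;
}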
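/*----------------------------------------GPU Code----------------------------------------*/
#include <opencv2/core/cuda.hpp>
#include <opencv2/cudaarithm.hpp>   // cv::cuda::split, threshold, bitwise_and
#include <opencv2/cudafilters.hpp>  // cv::cuda::createGaussianFilter
#include <opencv2/cudaimgproc.hpp>  // cv::cuda::createHoughCirclesDetector
#include <vector>

// The filter and the detector are created once and reused for every frame.
cv::Ptr<cv::cuda::Filter> gaussianFilter = cv::cuda::createGaussianFilter(CV_8UC1, CV_8UC1, cv::Size(9, 9), 1.0f);
// Arguments: dp, minDist, cannyThreshold, votesThreshold, minRadius, maxRadius.
cv::Ptr<cv::cuda::HoughCirclesDetector> houghCircles = cv::cuda::createHoughCirclesDetector(1, 20, 50, 30, 1, 75);

// Detects the target circles; the result (x, y, radius per circle) stays in device memory.
cv::cuda::GpuMat extractTargetsGPU(cv::cuda::GpuMat frame) {
    std::vector<cv::cuda::GpuMat> channels;
    cv::cuda::split(frame, channels);
    // Threshold the three channels concurrently on separate streams.
    cv::cuda::Stream bThresholdStream, gThresholdStream, rThresholdStream;
    cv::cuda::threshold(channels[0], channels[0], 127, 255, cv::THRESH_BINARY, bThresholdStream);
    cv::cuda::threshold(channels[1], channels[1], 127, 255, cv::THRESH_BINARY, gThresholdStream);
    cv::cuda::threshold(channels[2], channels[2], 127, 255, cv::THRESH_BINARY_INV, rThresholdStream);
    // Wait for all three thresholds before combining the masks.
    bThresholdStream.waitForCompletion();
    gThresholdStream.waitForCompletion();
    rThresholdStream.waitForCompletion();
    cv::cuda::bitwise_and(channels[1], channels[2], channels[1]);
    cv::cuda::bitwise_and(channels[0], channels[1], channels[0]);
    // Blur the combined mask, then run circle detection on it.
    gaussianFilter->apply(channels[0], channels[0]);
    cv::cuda::GpuMat targets;
    houghCircles->detect(channels[0], targets);
    return targets;
}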
Observations
I tested both functions on five test images, and the GPU outputs are undesirable.
By undesirable, I mean that the GPU code detects a single circle as multiple concentric circles.
To debug this, I exported and analyzed the images at the thresholding and Gaussian blurring stages. It turns out that the CUDA Gaussian blur (cv::cuda::createGaussianFilter) does not produce exactly the same output as the CPU implementation (cv::GaussianBlur).
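The pixel-mismatch column in the table below can be computed by comparing the two exported blurred images pixel by pixel. A minimal sketch on raw 8-bit buffers follows; with OpenCV the count alone is typically obtained with `cv::countNonZero(cpuBlur != gpuBlurHost)` after copying the GPU result to the host with `GpuMat::download`. The helper and struct names are hypothetical, and the maximum difference is an extra statistic not reported in the original table.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <stdexcept>
#include <vector>

// Hypothetical result type: how many pixels differ, and by how much at most.
struct MismatchStats {
    std::size_t count = 0;
    int maxAbsDiff = 0;
};

// Compares two single-channel 8-bit images stored as flat buffers of equal size.
MismatchStats compareImages(const std::vector<std::uint8_t>& a,
                            const std::vector<std::uint8_t>& b) {
    if (a.size() != b.size()) throw std::invalid_argument("image sizes differ");
    MismatchStats s;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const int d = std::abs(static_cast<int>(a[i]) - static_cast<int>(b[i]));
        if (d != 0) {
            ++s.count;
            if (d > s.maxAbsDiff) s.maxAbsDiff = d;
        }
    }
    return s;
}
```

Reporting the maximum difference alongside the count helps distinguish off-by-one rounding differences from a genuinely different filter.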
To check HoughCircles in isolation, I read back the exported blurred images (from both the CPU and the GPU) and fed each of them to the Hough transform on both the CPU and the GPU. The undesirable behavior was still observed with the GPU function, which made me question the accuracy of cv::cuda::createHoughCirclesDetector.
+---------+-------------------+----------------+
| | Circles found by | Gaussian Blur |
| Image # +-------------------+ Pixel Mismatch |
| | CPU | GPU | in GPU vs CPU |
+---------+---------+---------+----------------+
| 1 | 1 | 3 | 324 |
| 2 | 3 | 4 | 223 |
| 3 | 3 | 4 | 213 |
| 4 | 3 | 4 | 197 |
| 5 | 1 | 2 | 179 |
+---------+---------+---------+----------------+
All this experimentation raised the following questions:
- Is the CUDA implementation of these standard functions accurate?
- Could this be caused by the GPU architecture rather than by the implementations?
- Am I missing something about GPU memory / parallelization in my implementation?
- How do I fix this, or what is a possible workaround?
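One possible host-side workaround for the concentric duplicates, if no GPU-side fix is found, is to post-process the detected circles: circles whose centers lie within a small tolerance of each other are treated as one object. The sketch assumes the circles have already been downloaded from the `GpuMat` (the CUDA detector encodes each circle as three floats: x, y, radius) and copied into a plain struct; the helper `mergeConcentric`, the tolerance and the choice to average the radii are all assumptions, not OpenCV behavior.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Circle { float x, y, r; };

// Hypothetical post-processing: greedily groups circles whose centers are
// within `centerTol` pixels of a group's first circle, and replaces each
// group by one circle with the group's mean center and mean radius.
std::vector<Circle> mergeConcentric(const std::vector<Circle>& in, float centerTol) {
    std::vector<Circle> out;
    std::vector<bool> used(in.size(), false);
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (used[i]) continue;
        float sx = 0.f, sy = 0.f, sr = 0.f;
        int n = 0;
        for (std::size_t j = i; j < in.size(); ++j) {
            if (used[j]) continue;
            const float dx = in[j].x - in[i].x;
            const float dy = in[j].y - in[i].y;
            if (std::sqrt(dx * dx + dy * dy) <= centerTol) {
                used[j] = true;
                sx += in[j].x;
                sy += in[j].y;
                sr += in[j].r;
                ++n;
            }
        }
        // n >= 1 here, because circle i always matches itself.
        out.push_back({sx / n, sy / n, sr / n});
    }
    return out;
}
```

This only hides the symptom; it does not explain why the CUDA detector returns several circles around the same center.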
HoughCircles step. I analyzed the GaussianBlur output, and the difference was limited to a 1-2 pixel contour around each circle. It is the HoughCircles step that is going significantly wrong.
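One plausible (unconfirmed) source of a 1-2 pixel contour of blur differences is arithmetic rather than a bug: if one implementation rounds a floating-point weighted sum while the other uses fixed-point weights, the results can differ by one gray level wherever the exact value lies very close to a .5 boundary, which on a blurred binary mask happens only along the edges. The sketch below assumes 16-bit fixed-point (Q16) weights purely for illustration and does not reproduce either OpenCV implementation; the single weight in the first test case is contrived so that the exact value lands just below .5.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Weighted sum of 8-bit samples computed in double precision, then rounded.
int weightedSumFloat(const std::vector<int>& v, const std::vector<double>& w) {
    double acc = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) acc += v[i] * w[i];
    return static_cast<int>(std::lround(acc));
}

// Same sum with each weight quantized to Q16 fixed point (weight * 65536,
// rounded), accumulated in integers and rounded by adding half an LSB
// before the shift (assumes non-negative samples and weights).
int weightedSumFixedQ16(const std::vector<int>& v, const std::vector<double>& w) {
    std::int64_t acc = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        const std::int64_t q = std::llround(w[i] * 65536.0);
        acc += q * v[i];
    }
    return static_cast<int>((acc + 32768) >> 16);
}
```

For a 9-tap kernel on 8-bit data the Q16 quantization error is far below one gray level, so the two results can differ by at most 1; a quick way to test this hypothesis is to check whether the maximum absolute difference between the CPU and GPU blurred images is 1.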