SURF/SIFT is overkill in this case; you certainly don't need it.

If you want real-time speed (about 20+ fps on an 800x600 image), I recommend using [CUDA][1] to implement edge detection with a standard filter such as [Sobel][2], then binarizing the result and applying a morphological [closing][3] so that the edges of each circle are not broken into separate pieces.
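As a reference for what the GPU kernels would compute, here is a minimal pure-Python sketch of that pipeline (Sobel, threshold, closing). It is not CUDA and will not run in real time; it only shows the per-pixel math. Each output pixel depends only on its 3x3 neighbourhood, which is why this maps well to one CUDA thread per pixel. The function names and the `|Gx| + |Gy|` magnitude approximation are my own choices, not part of any library.

```python
def sobel_magnitude(img):
    """Approximate gradient magnitude |Gx| + |Gy| using the 3x3 Sobel kernels.
    `img` is a list of rows of grey values; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        up, mid, down = img[y - 1], img[y], img[y + 1]
        for x in range(1, w - 1):
            # Horizontal derivative: right column minus left column, weights 1-2-1.
            gx = (up[x + 1] + 2 * mid[x + 1] + down[x + 1]) - (up[x - 1] + 2 * mid[x - 1] + down[x - 1])
            # Vertical derivative: bottom row minus top row, weights 1-2-1.
            gy = (down[x - 1] + 2 * down[x] + down[x + 1]) - (up[x - 1] + 2 * up[x] + up[x + 1])
            out[y][x] = abs(gx) + abs(gy)
    return out


def binarize(values, threshold):
    """1 where the value reaches the threshold, else 0."""
    return [[1 if v >= threshold else 0 for v in row] for row in values]


def _morph(binary, use_any):
    """3x3 square structuring element; only in-bounds neighbours are considered."""
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [binary[ny][nx]
                      for ny in range(max(0, y - 1), min(h, y + 2))
                      for nx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = int(any(window)) if use_any else int(all(window))
    return out


def dilate(binary):
    return _morph(binary, use_any=True)


def erode(binary):
    return _morph(binary, use_any=False)


def closing(binary):
    """Morphological closing = dilation followed by erosion; bridges 1-pixel gaps."""
    return erode(dilate(binary))
```

Usage would be something like `edges = closing(binarize(sobel_magnitude(gray), threshold=100))`, where the threshold of 100 is a placeholder you would tune for your images.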

The hardest part will be fitting the circles. This assumes you have already extracted the edges and used closing (morphology) to make sure they are connected. At this point I would proceed as follows:

1.  Run [blob analysis/connected components][4] to segment out the circles that do _not_ touch. If circles can touch, the next step will be trickier.
2.  For each connected component/blob, fit a circle or rectangle using [RANSAC][5], which can run in real time (as opposed to the Hough transform, which I believe is _very_ hard to run in real time).
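Step 1 could look like the following breadth-first flood-fill labeling sketch, assuming the closed edge map is a list of rows of 0/1 values. The function name and the default of 8-connectivity are my assumptions; in practice you would more likely call a library routine such as OpenCV's `connectedComponents`.

```python
from collections import deque


def connected_components(binary, connectivity=8):
    """Group foreground (1) pixels into blobs; return a list of blobs,
    each blob being a list of (row, col) coordinates."""
    h, w = len(binary), len(binary[0])
    if connectivity == 8:
        offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
    else:  # 4-connectivity: only horizontal and vertical neighbours
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not seen[y][x]:
                # Breadth-first flood fill from this unvisited foreground pixel.
                blob = []
                queue = deque([(y, x)])
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    blob.append((cy, cx))
                    for dy, dx in offsets:
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                blobs.append(blob)
    return blobs
```

With 8-connectivity, diagonal neighbours join the same blob, which is usually what you want for thin, slanted circle edges.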
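For the circle case of step 2, a RANSAC fit might be sketched like this: repeatedly pick three random edge pixels from a blob, compute the unique circle through them, and keep the circle that the most pixels lie close to. The `iterations` and `tolerance` defaults are illustrative, not tuned values, and a least-squares refit on the final inliers (a common extra step) is not shown. The code does not care whether points are `(x, y)` or `(row, col)`; the center comes back in the same order.

```python
import math
import random


def circle_from_3_points(p1, p2, p3, eps=1e-9):
    """Circumscribed circle (cx, cy, r) through three points, or None if collinear."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    d = 2 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if abs(d) < eps:
        return None  # collinear points define no circle
    s1, s2, s3 = x1 * x1 + y1 * y1, x2 * x2 + y2 * y2, x3 * x3 + y3 * y3
    cx = (s1 * (y2 - y3) + s2 * (y3 - y1) + s3 * (y1 - y2)) / d
    cy = (s1 * (x3 - x2) + s2 * (x1 - x3) + s3 * (x2 - x1)) / d
    return cx, cy, math.hypot(x1 - cx, y1 - cy)


def ransac_circle(points, iterations=200, tolerance=1.0, rng=None):
    """Return ((cx, cy, r), inlier_count) for the best circle found, or (None, 0)."""
    rng = rng or random.Random()
    points = list(points)
    if len(points) < 3:
        return None, 0
    best, best_count = None, 0
    for _ in range(iterations):
        circle = circle_from_3_points(*rng.sample(points, 3))
        if circle is None:
            continue
        cx, cy, r = circle
        # Inliers: points whose distance to the center is within `tolerance` of r.
        count = sum(1 for (x, y) in points
                    if abs(math.hypot(x - cx, y - cy) - r) <= tolerance)
        if count > best_count:
            best, best_count = circle, count
    return best, best_count
```

Each iteration costs one pass over the blob's pixels, so the run time is bounded by `iterations` times the blob size, unlike a Hough accumulator over every possible center and radius.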

Step 2 will be _much_ harder if you cannot segment the connected components that form the circles separately, so some additional thought should go into how to guarantee that condition.

Good luck.


  [1]: http://en.wikipedia.org/wiki/CUDA
  [2]: http://en.wikipedia.org/wiki/Sobel_operator
  [3]: http://homepages.inf.ed.ac.uk/rbf/HIPR2/close.htm
  [4]: http://en.wikipedia.org/wiki/Connected_Component_Labeling
  [5]: http://en.wikipedia.org/wiki/RANSAC