
I'm trying to understand a proof involving radix sort, but to no avail.

I'll first write down a brief summary of the proof and then ask a few questions which I hope can be clarified.

Suppose you have an array of $n$ integers in the range $\{0,1,2,\ldots, n \log n\}$. Give an efficient worst-case algorithm that determines whether the array contains two elements with the same value. Note: you may use extra memory of size $O(n)$.

I don't really care about what the algorithm looks like. The idea of the proof is to radix-sort in $O(n)$ and then, since the array is sorted, scan it for two adjacent elements with the same value.

The proof outline:

Let's suppose we are examining a larger domain, $\{0, 1, 2, \ldots, n^2 - 1\}$. Now we'll treat each number according to its binary representation, using radix sort bit by bit.

Right after this comes the part I can't understand at all:

As we know, the running time of radix sort is $\Theta(d(n+k))$, and therefore all we have to decide is which $a$ (the number of bits read per pass) to choose so that the running time is $O(n)$,

using the formula $\frac{2\log n}{a} (n + 2^a)$.

After this step, you just choose $a = \log n$ and you are done.
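
To sanity-check this choice (my own substitution, not part of the quoted proof; assuming $\log$ means $\log_2$), plugging $a = \log n$ into the formula gives

$$\frac{2\log n}{\log n}\left(n + 2^{\log n}\right) = 2(n + n) = 4n = O(n).$$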

My questions are:

  1. How did the writer arrive at the following formula: $\frac{2\log n}{a} (n + 2^a)$?

  2. Also, why does the writer prefer to work over the domain $\{0,1,2,\ldots, n^2 - 1\}$ instead of the one given in the question?

1 Answer

  1. In the formula $\Theta(d(n+k))$, $d$ is the number of digits (iterations of the sort) and $k$ is the number of possible digit values, i.e. the base you are working in (and the number of buckets). So, if you were sorting the numbers 0 through 999 with 10 buckets, there would be $d=3$ iterations (since the numbers have length 3 in base 10), and we would have $k=10$, the base.

    If we leave the base $2^a$ to be determined later (I guess that by using a power of two we can read $a$ bits at a time and let those bits determine the bucket), there are $\log_{2^a}(n^2) = \frac{\log_2 n^2}{\log_2(2^a)} = \frac{2 \log n}{a}$ iterations, and $k = 2^a$ is just the base. So $d(n+k) = \frac{2 \log n}{a}(n + 2^a)$. (A runnable sketch of this idea appears after this list.)

  2. The expanded range is to make the math prettier. Above we took $\log n^2$, which worked out nicely to give a constant factor of 2 in our asymptotics. If we instead had $\log (n \log n)$, things would not be as nice. It should still work (possibly for a different choice of $a$), and if you're comfortable with big-O manipulation you can do it as an exercise, but the proof is much cleaner when you don't have to argue that $O(\log n + \log \log n) = O(\log n)$.
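
The derivation in point 1 maps directly to code. Below is a minimal Python sketch of the whole approach (my own illustration under the answer's assumptions, not the original author's algorithm; the function name `has_duplicate` is made up): radix-sort with $a$-bit digits, i.e. base $2^a \approx n$, then scan adjacent pairs.

```python
def has_duplicate(arr):
    """Return True if arr contains two equal values.

    Radix-sorts the array using a-bit digits (base 2**a), then scans
    adjacent pairs. With a ~ log2(n), the base is about n, so there
    are O(1) passes of O(n) work each. Illustrative sketch only.
    """
    n = len(arr)
    if n < 2:
        return False
    a = max(1, n.bit_length() - 1)   # a ~ floor(log2 n)
    base = 1 << a                    # 2**a buckets: O(n) extra memory
    mask = base - 1
    bits = max(1, max(arr).bit_length())
    passes = -(-bits // a)           # ceil(bits / a); about 2 when values < n^2
    for p in range(passes):
        shift = p * a
        # One stable bucket pass on digit p: O(n + 2**a) time.
        buckets = [[] for _ in range(base)]
        for x in arr:
            buckets[(x >> shift) & mask].append(x)
        arr = [x for bucket in buckets for x in bucket]
    # After sorting, equal values are adjacent.
    return any(arr[i] == arr[i + 1] for i in range(n - 1))
```

For example, `has_duplicate([3, 1, 4, 1, 5])` returns `True`, while `has_duplicate([3, 1, 4, 2, 5])` returns `False`. Using one list per bucket keeps each pass stable and uses $O(n + 2^a) = O(n)$ extra memory, matching the allowance in the question.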

  • What's to argue? $\lg n$ dominates $\lg \lg n$ and is thus a lower-order term which can be dropped when determining the big-O value. (May 24, 2013)
  • My point is that both $\log n$ and $\log(n \log n)$ are in $O(\log n)$, but the first is trivial, and therefore easier for people just learning big-O notation, while the second requires a bit of justification and understanding along the lines of what you're suggesting. (May 24, 2013)
