Min and max value of index of coincidence

Question

I am pretty new to cryptography and trying to understand some mathematical aspects. Studying different types of cipher, I came across the definition of index of coincidence which is as follows:

Index of coincidence for a ciphertext of length $n$, in which the letter $A$ occurs $f_A$ times, $B$ occurs $f_B$ times, etc., is given by the formula $\frac{f_A(f_A−1)+f_B(f_B−1)+...+f_Z(f_Z−1)}{n(n−1)}$

The definition seems to be understandable for me, but I have a problem with one of the task connected with this topic:

Let $n$ be a natural number. Determine the largest and smallest possible value of index of coincidence for a string of length $n$. Give an example of a string of length $n$ for which the value of index of coincidence is the largest/the smallest.

Determining the maximum value of index of coincidence is quite obvious - it is $1$ and occurs when all letters in the ciphertext are the same (then we have $\frac{n(n-1)} {n(n-1)} = 1$).

Regarding the minimum value, I thought about the case when $n \le 26$ (the length of ciphertext is less than or equal to the number of letters in English alphabet). Then, the minimum value of index of coincidence is $0$ and it is when each letter in ciphertext occurs at most once.

But what if $n > 26$? Then it is not possible that each letter occurs at most once and some of them have to be repeated. I have no idea how to determine the minimum value of index of coincidence then. Can you help me/give me some tips?

As a guess, I'd look at the case where each letter occurs the same number of times (within $1$, to handle the case where $26\nmid n$. Seems the most natural "opposite" for the constant string. And I'd suggest handling the binary case first, as in that case it's a lot easier to write out all the possibilities. — lulu, Commented Mar 22 at 20:06
I've already did it for strings of length $\le26$ (I analysed cases when each letter occurs $0$ or $1$ time) and in this case the minimum value of index of coincidence is $0$. But I don't know how to determine its value for strings of greater length, because then some of the letters MUST be repeated. So for example, let's assume that we have a string of length 27, then the minimum value of index of coincidence is $\frac{2(2-1)}{27\cdot26}$, so only one letter occurs two times, and the others occurs only one time. But how to determine it for ANY n? — xterg, Commented Mar 22 at 23:54
I told you a natural guess (which is in fact correct). I suggest: try to prove that the guess works for a binary alphabet (not too bad). Then think about a general alphabet. — lulu, Commented Mar 22 at 23:56
By "binary alphabet" you mean the alphabet where only two letters are available instead of 26 as in general? — xterg, Commented Mar 22 at 23:59
Yes. And by "general alphabet" I mean an alphabet with $k$ letters for any $k$. Really, though, the binary case has all the information you need. Suppose that in the minimal case you have $f_0≥f_1+2$. Then define a new configuration with $f_0'=f_0-1$ and $f_1'=f_1+1$. Prove that the new configuration is lower, contradicting the assumption of minimality. — lulu, Commented Mar 23 at 0:04

lulu · Accepted Answer · 2024-03-23 13:34:44Z

1

To summarize the discussion in the comments:

For any alphabet and any $n$, the minimal solution is realized when, given any two distinct letters $A,B$, we have $|f_A-f_B|≤1$. If the alphabet has $k$ letters and the string is $n$ letters long, then either a letter occurs $\big \lfloor \frac nk \big \rfloor$ times, or one more than that.

Proof: as there are only finitely many possible words of a given length, the minimum configuration must exist (it might not be unique, of course). Pick a minimal configuration. Suppose, to the contrary of our claim, that in this minimal configuration we can find two letters with $f_A≥f_B+2$. We need to derive a contradiction. To do so, consider the same configuration only let $f_A'=f_A-1$ and $f_B'=f_B+1$. That is to say, make a new configuration by replacing one of the $A's$ with a $B$. Now, the new numerator differs from the old only in the terms corresponding to $A,B$. We note that $$f_A'(f_A'-1)+f_B'(f_B'-1)=(f_A-1)(f_A-2)+(f_B+1)f_B=$$ $$=f_A(f_A-1) +f_B(f_B-1)-2(f_A-1)+2f_B$$

And now we just have to remark that $f_A≥f_B+2\implies 2(f_A-1)≥2f_B+2$ so this switch lowers the numerator. This contradicts our assumption of minimality, so we are done.

To describe the minimal configuration precisely, write $n=qk+r$ using the usual division algorithm. Thus, $0≤r<k$ and $q=\big \lfloor \frac nk \big \rfloor$. Now use $q$ of each letter in your word (in whatever locations, that's not important), this will leave $r$ blanks. Fill those blanks with $r$ distinct letter, doesn't matter which. In this way, we see that, for $r$ letters, $f_*=\big \lfloor \frac nk \big \rfloor+1$ while for the other $k-r$ letters $f_*=\big \lfloor \frac nk \big \rfloor$

edited Mar 23 at 13:34

answered Mar 23 at 1:33

lulu

75.3k6 gold badges84 silver badges136 bronze badges

$\begingroup$ I'm not sure if I undserstand your proof correctly: first we assume that some minimal configuration exists and it satisfies the constraint $|f_A-f_B|\le1$. Then to the contrary, we suppose that in this configuration we can find two letters for which the number of their occurences differs by at least $2$ and shows that if it were so, the initial configuration wouldn't be minimal. But this is only our assumption that its is minimal, so, how does this confirm that the configuration which satisfies constraint $|f_A-f_B|\le1$ is really the minimal one? $\endgroup$
– xterg
Commented Mar 23 at 12:52
$\begingroup$ The claim is that a minimal configuration satisfies that inequality. To prove it, we work by contradiction. Suppose you had a minimal configuration that didn't satisfy the inequality. That would mean that, for some pair of letters, we had $f_A≥f_B+2$. From that, we derive a contradiction (of minimality), thereby proving the claim. $\endgroup$
– lulu
Commented Mar 23 at 13:01
$\begingroup$ But I don't understand this: point: $f_A \ge f_B + 2$ lowers the numerator. So the final result is lower than the result which satisfies $|f_A - f_B| \le 1$. Thus, it is minimal. $\endgroup$
– xterg
Commented Mar 23 at 13:18
$\begingroup$ Not following. Again: I suppose you gave me a configuration which violated the inequality. My construction shows that it is not minimal (as I use your configuration to construct one that's "smaller"). Therefore , any minimal configuration must satisfy the inequality. $\endgroup$
– lulu
Commented Mar 23 at 13:20
$\begingroup$ Remember, any theorem is equivalent to its contrapositive. Thus the statement "any minimal configuration satisfies the inequality" is equivalent to "any configuration which does not satisfy the inequality is not minimal" and my argument proves the second version. $\endgroup$
– lulu
Commented Mar 23 at 13:21

| Show 2 more comments

Stack Exchange Network

Min and max value of index of coincidence

1 Answer 1

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged
cryptography
.

Hot Network Questions

Min and max value of index of coincidence

1 Answer 1

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged cryptography.

Related

Hot Network Questions

Not the answer you're looking for? Browse other questions tagged
cryptography
.