21
$\begingroup$

Puzzle: there are $n$ computers most of which are good; the others may be bad ("most" in the strict sense: there are strictly more good computers than bad ones). You may ask any computer $A$ about the good/bad status of another computer $B$. if $A$ is good it will correctly indicate $B$'s status, but otherwise it may answer whatever.

Your goal is to locate a good computer using the minimum number of questions in the worst case. In other words, devise an algorithm that requires no more than $N$ questions regardless of the outcome and is guaranteed to pinpoint a good computer, and make $N$ as small as possible.

The original puzzle asks for the optimal $N$ when $n=100$.

Warning: spoilers below. Stop here if you wish to think about this fun puzzle.

. . . . .

I, and everyone else I know who solved this, can do the $n=100$ case with $97$ questions in the worst case. I'm pretty sure this is optimal but I do really miserably on lower bounds. The simplest case where I can't match the bounds is $n=7$ (at most $3$ bad computers): this is doable with $5$ questions and I can rule out $3$ but I can't rule out $4$.

More generally, if the number of bad computers is at most $k$ (so $n=2k+1$ or $n=2k+2$), I can show that at least $k+1$ questions are needed while $2k-1$ questions suffice. Can anyone narrow that gap?

EDIT: starting a bounty, looking for improvements to the lower bound (or the upper bound, though I'd be surprised if the latter is possible) for general $n$. A transparent argument for why 7 computers require 5 questions is also good, but a computer-assisted case-by-case enumeration is not.

$\endgroup$
9
  • $\begingroup$ The only questions allowed are asking about the correctness of another computer? $\endgroup$
    – TMM
    Commented Mar 3, 2012 at 0:50
  • $\begingroup$ Yes, that's right. $\endgroup$
    – Alon Amit
    Commented Mar 3, 2012 at 1:20
  • 3
    $\begingroup$ 4 questions is enough for 7 computers, using the same method as I presume you used to get 97 for 100. (Pair them up; in each pair, ask one about the other; keep the ones that are claimed good and discard everybody else; if there was one unpaired, keep it or discard it as needed so that you keep an odd number; iterate until 2 or fewer are left.) Sloane A011371 is close to what this method gives; the references there might have good methods for a lower bound. (The Saks & Werman paper seems particularly promising, after a quick glance.) $\endgroup$
    – user21467
    Commented Mar 5, 2012 at 5:18
  • $\begingroup$ Interesting - this definitely isn't the method I used for 97. Let me think about it! $\endgroup$
    – Alon Amit
    Commented Mar 5, 2012 at 6:44
  • 1
    $\begingroup$ Lovely question. Steven's answer and Blecher's paper mentioned below are explicitly used in a recent paper of Robert Cowen, see MR3754094. Cowen, Robert. Adaptive fault diagnosis using self-referential reasoning. In Raymond Smullyan on self reference, 39–46, Outst. Contrib. Log., 14, Springer, Cham, 2017. $\endgroup$ Commented Nov 8, 2018 at 21:25

5 Answers 5

9
$\begingroup$

This is a very fun problem, which I have encountered in the literature as the knights and spies problem. Good computers are called knights, because they always tell the truth, while bad computers are called spies, because they say whatever they want. (Traditionally, a liar would be called a knave.) I will use this terminology because I feel it is more consistent with other liar/truth-teller puzzles.

Let me briefly discuss an alternative objective: figure out everyone's identity, not just a single knight's. I encourage you to think about this problem before reading further. This is an interesting problem, in part because it's exciting how many different bounds I have seen people come up with:

  • $n^2$ or $n(n-1)$ by asking everyone about everyone
  • $\Theta(n\sqrt{n})$ using a standard "square-root" trick or $\Theta(n\log{n})$ by being smarter about the recursion
  • $5n + O(1)$, $3n + O(1)$, and $2n + O(1)$ by increasingly efficient methods related to pairing up people
  • $3n/2 + O(1)$ optimally!

The 2009 paper "Knights, spies, games and ballot sequences" by Mark Wildon proves that the final bound is optimal, computing the exact value of the $O(1)$ for every $n$. Mark Wildon has a webpage with additional information about the problem which notes that this solution was previously published by Pavel M. Blecher in the 1983 paper "On a logical problem". The strategy used in these papers, which Wildon calls the "Spider Interrogation Strategy" is fantastic.

Your problem of identifying a single knight is more challenging! I'll go straight to the punch by saying that the optimal bound is $n-1 - H(n-1)$ where $H(k)$ is the number of 1 bits in the binary representation of $k$. I haven't carefully read the other answer to this question, but as far as I can tell, this bound can easily be achieved by a trivial modification of that strategy.

The upper bound isn't too bad, but the lower bound is much trickier. When my friends and I thought about (and solved!) this problem, we used a reduction to the following problem:

There are two players, the Questioner and God, and a collection of (nonnegative) numbers. The Questioner's aim is to achieve a violation of the (strict) triangle inequality, that is, have a single number be greater than or equal to the sum of the rest. God's aim is to not have this happen for as long as possible. Each move consists of the Questioner asking about two numbers, to which God replies "sum" or "difference"; the two numbers $a$ and $b$ are then replaced by either $a+b$ or $\lvert a-b\rvert$ accordingly. Since the number of numbers decreases by one each turn, eventually there will be two numbers and the (strict) triangle inequality is necessarily violated. If God manages to not lose before then, we say that "God wins".

Here are some fun facts:

  • Consider a starting position of $n$ ones. Then God wins if and only if $n$ is 1 more than a power of 2.

  • Suppose God has a winning position. Then his answer to any question the Questioner can ask is forced; that is, it is always the case that one of the two answers is losing. We call this "Uniqueness". The way we think about it is that God somehow has to be very careful in answering every single question, and indeed, if you look at optimal play in "endgame" positions, it's hard to predict what the correct answer for God is.

  • Suppose the starting position is $n=2^k+1$ ones. By the above two points, this is a winning position and God must carefully answer every question so as not to lose. Nonetheless, the answer to the first $(n-3)/2$ questions asked will necessarily be "sum".

So there's this phenomenon that in a certain class of positions, God answers blindly for the first half of the time, and then has to be very careful afterwards. It's very weird.

We finally classified winning positions. It turns out that the first bullet point above is true because $\binom{2n}{n}/2$ is odd only when $n$ is a power of 2.

Some more quick observations:

  • With regards to the reduction: this sum/difference game is just what you get if the spies are knaves. In that case, the knights and spies are just two groups of people which support themselves but accuse the others.

  • The lower bound $n-1-H(n-1)$ holds even if you're trying to find a knight or a spy.

Let me finish with some references to the literature. This result has been published many times! The term used in the literature for this game is "the majority game".

You have a group of $n$ items, each with one of $k$ labels. One of the labels is shared by a majority of the items. You are allowed to ask if two items have the same label, and the goal is to minimize the number of questions necessary to identify a majority element.

The best studied case is when $k=2$. The problem is often presented as played by a questioner and an answerer. This is not exactly the same game, because in principle spies could tell the truth. However, the lower bound reduction above is to the case when the spies are knaves, and then this is the same game.

Here's a summary of the relevant literature:

  • Michael J. Fischer and Steven L. Salzberg.
    Finding a majority among $n$ votes.
    J. Algorithms 3 (1982) 375--379.

    They consider the problem when $k$ is unknown, and a majority may or may not exist. They prove that the optimal bound is then $\lceil 3n/2 - 2\rceil$ questions. You may recognize this as the bound from Mark Wildon's paper. Now, I am a little confused because it doesn't seem to me that the problems map onto each other exactly, but I find it hard to imagine that it's an accident.

    Actually, the algorithm they present seems strikingly similar to Wildon's "spider interrogation strategy". Because Fischer and Salzberg are in a slightly different model (in particular one that is symmetric w.r.t. asking $x$ about $y$ and $y$ about $x$), you have to change some of the details. I don't think they don't exactly map onto each other (in terms of the questions asked), but they're similar.

  • Saks, Michael E. and Werman, Michael.
    On computing majority by comparisons.
    Combinatorica 11 (1991), no. 4, 383--387.

    They show that if $n$ is odd, the optimal bound is $n-H(n)$, where $H(n)$ is the number of 1-bits in $n$.

    Their analysis proceeds by first reformulating the game as between a selector and an assigner. A position in the game is a multiset of integers, and the rules are as in the sum-difference game, e.g. "the game ends when the largest number in the multiset is at least half the total." They then have a slick proof of the upper bound! (In their notation it's a lower bound.) I mention this because even though the upper bound isn't hard, some analyses gets mired in minor details.

    They then define a 2-adic valuation invariant which my friends and I also discovered. Their proof uses generating functions which ours did not, but I believe the actual invariant is the same as ours. Finally, they apply the invariant result to the position consisting of $2h+1$ ones by computing the valuation of $\binom{2h}{h}$.

  • Alonso, Laurent; Reingold, Edward M.; and Schott, René
    Determining the majority.
    Inform. Process. Lett. 47 (1993), no. 5, 253--255.

    They present a short proof of the previous paper's results. The upper bound uses the same neat trick. They then prove the lower bound with essentially the same 2-adic invariant technique, except with a different exposition. Admittedly, it's cleaner and doesn't use generating functions.

  • Wiener, Gábor
    Search for a majority element.
    International Conference (of the Forum for Interdisciplinary Mathematics) on Combinatorics, Information Theory and Statistics (Portland, ME, 1997). J. Statist. Plann. Inference 100 (2002), no. 2, 313--318.

    He proves some related results, including the following:

    If asking about $x$ and $y$ is an optimal question, then there exists an optimal question that doesn't ask about $x$. It follows that the optimal number of questions is monotonic, in the sense that adding another 1 into the position does not decrease the optimal number of questions.

$\endgroup$
1
  • $\begingroup$ Thank you for posting such an informative answer. I would be very interested to see an outline proof or a reference for the statement that God always has a unique winning move in the game with $2^r + 1$ balls. Similarly for the second statement about the first $(n-3)/2$ answers. $\endgroup$ Commented Mar 30, 2013 at 20:22
5
+200
$\begingroup$

Here's a more detailed explanation of the method I sketched in a comment. This method lowers the upper bound in some cases.

We'll perform an operation which (1) reduces the number of computers under consideration (by about half), and (2) preserves the fact that more than half of the computers under consideration are good. Repeating this operation eventually reduces the number of computers to 1, at which point the more-than-half condition tells us that the computer is good, and we're done.

The operation is this:

  1. Pair up all the computers arbitrarily. (Maybe one computer is left out; see step 3.)
  2. In each pair, ask one computer about the other. If the answer is "bad", discard both computers; if the answer is "good", discard just the testing computer and keep the tested one.
  3. If there was one computer left out of the pairing in step 1, then either keep or discard it so that you keep in total an odd number of computers.

Proof that after this operation, more than half of the computers are good: Let $G$ be the number of good computers to start with, and $B$ the number of bad computers. Write $(GG)$ for the number of pairs produced in step 1 with both computers good; write $(GB)$ for the number of pairs with the testing computer good and the tested computer bad; write $(BG)$ and $(BB)$ similarly. Let $G_2$ and $B_2$ denote respectively the number of good and bad computers that are kept after step 2, and let $G_3$ and $B_3$ denote the corresponding number for after step 3. We know that $G > B$ and want to show that $G_3 > B_3$.

In step 2, every $(GG)$ pair answers "good", so you keep the second good computer; this shows $G_2 \ge (GG)$. On the other hand, the only way for a bad computer to survive step 2 is if a bad computer vouches for it; this shows $B_2 \le (BB)$. Now consider three cases.

Case 1: There were an even number of computers to start with. Then $G = 2(GG) + (GB) + (BG)$ and $B = 2(BB) + (GB) + (BG)$. By hypothesis, $G > B$, so these identities imply $(GG) > (BB)$, which gives $G_2 > B_2$. Since there is no unpaired computer, $G_3 = G_2$ and $B_3 = B_2$, so we're done.

Case 2: There were an odd number of computers to start with, and the unpaired computer was good. Then $G = 2(GG) + (GB) + (BG) + 1$ and $B = 2(BB) + (GB) + (BG)$, which since $G > B$ implies $(GG) \ge (BB)$, so $G_2 \ge B_2$. If $G_2 > B_2$ then $G_3 > B_3$ whether or not we keep the unpaired good computer. If $G_2 = B_2$ then the total number of computers kept after step 2 is even, so in step 3 we keep the unpaired good computer, obtaining $G_3 = G_2+1 > B_2 = B_3$.

Case 3: There were an odd number of computers to start with, and the unpaired computer was bad. Then $G = 2(GG) + (GB) + (BG)$ and $B = 2(BB) + (GB) + (BG) + 1$, which since $G > B$ implies $(GG) > BB$. As in case 1 this yields $G_2 > B_2$. If $G_2 = B_2 + 1$ then the total number of computers kept after step 2 is odd, so in step 3 we discard the unpaired bad computer, obtaining $G_3 = G_2 > B_2 = B_3$. If $G_2 > B_2 + 1$ then $G_3 > B_3$ whether or not we keep the unpaired bad computer.

So in all cases, at the end of step 3 we have kept more good computers than bad computers, as desired.


As Alon said in comments, in the worst case (when all tests say "good") this method uses $Q(n) = n - h(n)$ questions, where $h(n)$ is the number of bits in the binary representation of $n$. (This can be easily proved by (strong) induction.) Some specific values:

  • $Q(100) = 97$, just as in the question.
  • $Q(2^m) = 2^m-1$, which is slightly worse than the estimate in the question, which in this case is $2^m-3$ (for $m\ge 2$, I presume).
  • In particular, $Q(2) = 1$, which is suboptimal, since the hypothesis that more than half the computers are good here tells us that all of them are good, so there is no need to ask questions. But I guess adding this special case to the algorithm saves us at most one question.
  • $Q(2^m-1) = 2^m - m$, which is asymptotically better than the estimate in the question (it's $n-\log n$ instead of $n-c$).
  • In particular, $Q(7) = 4$, slightly improving the estimate in the question.

Here's another way to look at this puzzle. Consider the following game. There is a (directed) graph with $2n$ vertices, labelled $x_1,\dotsc,x_n$ and $\neg x_1,\dotsc,\neg x_n$. At the beginning of the game, the graph has no edges. Every round, player A chooses two numbers $i$ and $j$; player B draws either the directed edge $x_i\to x_j$ or the directed edge $x_i\to\neg x_j$. At the end of the round, player A may claim victory by choosing some number $k$ and drawing $x_k\to\neg x_k$; player A then wins if, interpreting the graph as specifying a boolean formula (where each directed edge represents an implication, which is a 2-clause), it is not possible to satisfy that boolean formula with more than half of the variables set to "true". Player A's goal is to win as quickly as possible; player B's goal is to delay player A's victory as long as possible.

In short, the players jointly construct an instance of MAX 2-SAT, with player A trying to make it unsatisfiable and player B trying to keep it satisfiable.

(See, when player A chooses $k$, adds $x_k\to\neg x_k$, and shows that the resulting MAX 2-SAT instance is unsatisfiable, that amounts to a proof (by contraposition) that, if more than half the computers are good, the answers so far prove that computer $k$ is good.)

I have no insights from this way of thinking about the puzzle, except that it makes me suspect the problem is hard.

$\endgroup$
3
  • $\begingroup$ Thanks a lot, Steven. Let me just point out that for even values of $n$ you can actually do better than $n-h(n)$, simply by ignoring one of the computers. Thus since $Q(99) = 95$ we also have $Q(100)=95$, better than my original solution of $97$. This is because 100 computers allow at most 49 bad ones, just like 99 computers do. $\endgroup$
    – Alon Amit
    Commented Mar 11, 2012 at 4:43
  • $\begingroup$ @Alon Ah, you said this before and I didn't stop to think about it. Yes, nice point. (It's also important for this that subtracting 1 from an even number increases the number of 1s in the binary representation, which takes a bit of thinking about how borrowing works.) $\endgroup$
    – user21467
    Commented Mar 11, 2012 at 15:08
  • $\begingroup$ Right. It's easier to add 1 to an odd number and reason about how carrying works :) $\endgroup$
    – Alon Amit
    Commented Mar 11, 2012 at 15:26
3
$\begingroup$

Several related problems are discussed in papers of Martin Aigner (and his coauthors). See for example M. Aigner: Two Colors and More, In: Csiszár, Katona, Tardos (eds.): Entropy, Search, Complexity, Springer 2007, p. 9-26.

This paper mentions the Bleher paper in which the Knights and Spies problem (that Aigner call the liar problem) is solved. The version where we only have to find one knight was posed as an open problem in the 2001 COSSAC meeting in Ischia (http://www.di.unisa.it/COSSAC/Program.html) and was solved independently in 2002 by Aigner (see the paper above) and myself (G. Wiener: A Solution to the Liar Problem, In: Proceedings of the 3rd Japanese-Hungarian Workshop on Discrete Mathematics and its Applications. Tokyo, Japan, 21/01/2003-24/01/2003. pp. 232-239.). Aigner solved a somewhat generalized version of the problem, where it is known that there are at least k>n/2 knights and proved that the number of questions needed is exactly 2(n-k)-H(n-k). He also proved a similar result for the original problem, which is just sketched in the Bleher paper: if there are at least k>n/2 knights, then exactly 2n-k-1 questions are needed.

$\endgroup$
2
$\begingroup$

This problem reminds me a bit of the Byzantine Generals Problem. Briefly, you have N generals, each commanding an army, surrounding an enemy city. The generals need to arrange to attack at the same time. Pairs of generals can communicate by sending messages to each other.

Some of the generals are traitors who want to prevent the loyal generals from attacking at a common time. The traitors can send messages with false information, and they can forge the identities of the other generals.

There's a paper by Leslie Lamport, Robert Shostak, and Marshall Please called "The Byzantine Generals Problem", published in ACM Transactions on Programming Languages and Systems, Vol 4, No. 3, July 1982, that shows if 2/3 of the generals are loyal, the loyal generals can successfully agree on a common attack time despite the the efforts of the traitors. If generals can send unforgeable messages, then the loyal generals can agree on an attack time no matter how many traitors are trying to confound them.

$\endgroup$
1
$\begingroup$

An alternative method, which uses exactly $n-1$ queries each time, is just to pick a random computer then ask all the others if it's good. If a majority say "good", it's good. If a majority say "bad", it's bad. If there's an even split, it's good (because this case can only happen if we've queried an equal number of good and bad computers and all the bad computers lied. But since we know that there are more good than bad, it must be the case that the remaining computer is good.)

If we know the number of bad computers $k$, we can use exactly $2k$ queries using exactly the same method as above.

By my reckoning, this is the same number of queries as the worst case with Steven's method.

$\endgroup$

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .