ACS 2018 Khemani Original June2018

Download as pdf or txt
Download as pdf or txt
You are on page 1of 16

Advances in Cognitive Systems X (20XX) 1-6 Submitted X/20XX; published X/20XX

Contract Bridge: Multi-agent Adversarial Planning in an Uncertain


Environment

Deepak Khemani KHEMANI @ IITM . AC . IN


Shikha Singh CS 16D008@ SMAIL . IITM . AC . IN
Department of Computer Science and Engineering, IIT Madras, Chennai 600036, India

Abstract
One by one over the last twenty odd years computer programs have become better than humans
at many games. And yet the citadel of Contract Bridge remains. The authors argue that bridge is
significantly and qualitatively different. So far, in the games where AI has done well, it was one
on one, man versus machine. But bridge is a team game, and with incomplete information to boot.
Partners in bridge need to communicate, and where there is communication there is interception,
and where there is interception there is the possibility of deception. A bridge player has to be an
epistemic agent that needs to reason with knowledge, belief, plans, and probabilities. We argue
that bridge is the final frontier for human level AI, survey the cognitive landscape, and chart out
strategies for human like performance.

1. Introduction: Why Bridge?


Games, like Checkers, Chess, Backgammon, Scrabble, Go, Poker and Bridge, that have fascinated
humankind have long been considered as test beds for Artificial Intelligence programs. This is
not surprising. On the one hand these games have always been associated with intelligence, and
pose problems that are inherently complex. On the other hand, games provide platforms where the
interface with the external world is intrinsically digital and poses no challenges of perception and
action, and they are domains where success or failure is simple to evaluate.
Of the games mentioned here, all but Contract Bridge has seen machines outperform humans.
The last major citadel to fall was the game of Go, when the world champion Lee Sedol was beaten
in 2016 by the program AlphaGo from DeepMind. In this paper we look at the game of Contract
Bridge.
Checkers, Chess and Go are two-player complete-information zero-sum games, as is the smaller
game of Tic-tac-toe. Conceptually they are all simple and similar, and can be represented by a
game tree with alternate layers representing the choices for the two players. The difficulty, or the
complexity arises from the size of the game tree. Chess has an estimated average game tree with
10120 leaf nodes, and it is not surprising that the minimax value of the game is not known, as it is
for Tic-tac-toe (the game ends in a draw) and even Checkers (Schaeffer et al., 2007). Go has a much
larger game tree estimated to be upwards of 10360 (Allis, 1994). Even though Chess and Go have
not been solved, they have computer programs that beat the best amongst humans. IBM’s Deep Blue
beat Garry Kasparov in 1997 (Campbell et al., 2002) and AlphaGo beat Lee Sedol in 2016 (Silver


c 20XX Cognitive Systems Foundation. All rights reserved.
D. K HEMANI AND S. S INGH

et al., 2016). These programs were able to do this, not by searching the entire game tree, but, by
searching it selectively aided in the process by an evaluation function. The evaluation function itself
is a product of human expert knowledge, encoded by humans and fine tuned by machine learning
algorithms. Then in 2017 the program AlphaGo Zero (Silver et al., 2017b) was able to learn Chess
and Go entirely from scratch without the aid of human knowledge, simply by repeatedly playing
against itself, and the self learnt versions are much stronger than their predecessors (Silver et al.,
2017a).
Scrabble and Backgammon are also board games, but they have an element of chance. There is
uncertainty about the options available for the opponent. In Scrabble each player draws tiles from a
fixed set of alphabet tiles, and the tiles drawn are hidden from the opponent. In Backgammon a pair
of dice determines what moves a player can make. And yet computer algorithms have been devised
that are much better than the best humans (Sheppard, 2002; Tesauro, 1995).
In 2017, the program Libratus from CMU (Brown & Sandholm, 2017) beat four leading players
at a two player limited version of Poker. Poker is an incomplete information game, and among other
things one is required to bluff effectively, and also call opponent’s bluffs. Libratus employs a notion
of card abstraction that is cognitively similar to the way bridge players think about cards. Given
that, cognitive activity is limited only to evaluate your own hand in the context of the bets made by
other players, and decide whether to continue betting or not.
On the table, Contract Bridge is a four-person game with opposing teams of two each. When an
individual player is dealt a hand, the holdings of the other three players are unknown to her. The first
task is to communicate information of her hand to partner, and in turn receive information from the
partner, to try and jointly estimate the combined worth of the two hands, and bid for an appropriate
contract. The opponents are also trying to do the same at the same time. In the process some amount
of information about the four hands becomes common knowledge amongst the four players. After
a contract has been won, the task is to fulfill it, and the opponents have the goal of defeating it.
The battle that ensues involves many forms of reasoning. In one sense the task undertaken by each
player is to cut through the fog of uncertainty so that the one knows the other players’ cards as much
as is needed to succeed. In another sense the task is to plan for all possible worlds that remain in
contention, so that one succeeds in as many of them as possible. In the midst of all this activity an
opponent may try to obfuscate matters by creating the possibility of imaginary worlds to lead you
astray. Above all bridge is a game of imagination, as much as it involves the traditional AI tools
of deploying search, knowledge and memory. An early article (Khemani & Ramakrishna, 1989)
espoused bridge as a game to be targeted by AI.
In the following sections we describe the game with its complexities, and chart out how different
kinds of reasoning have to be combined to play it well.

2. Contract Bridge
We treat Contract Bridge as a concrete instance of a multi-agent game scenario where the players
attempt to maximize their payoffs in resource bounded incomplete information epistemic situations.
Such games can also be models for protracted negotiations in different scenarios, like financial
contracts, and for adversarial scenarios like troop movements during a war. The salient points

2
C ONTRACT B RIDGE : M ULTI - AGENT A DVERSARIAL P LANNING IN AN U NCERTAIN E NVIRONMENT

that make Contract Bridge a manageable domain to model many real world complexities are given
below.
Contract bridge is a bounded horizon game played by multiple players with limited private
resources. A team in a bridge game is composed either of two players, in a pairs event, or four,
in a team-of-four-duplicate event. In a tournament many teams participate, and the goal is to win
the tournament. In all cases the final payoff for the players depends upon a series of matches, and
the score for each match depends upon the score of each deal played on the table. On each deal on
the table the general strategy is generally to maximize the payoff. This is not as straightforward as
winning a chess game, because while there is the notion of a par score for each deal, the name of the
game is to try and consistently beat the par. This is possible because of the incomplete information
of the game. As a result the payoff is determined not just what happens on that table, but also on
what happens on other table (or tables in a Pairs format). A bridge player, unlike in other games, is
not competing with opponents on the table, but other players at other tables holding the same cards.
Furthermore, since the final payoff is a non-linear function of all matches (except when it is knock
out format), and the match score a non-linear function of the score on each deal, players have to
often formulate deal level goals in a context dependent manner, based on their estimate of how the
tournament has gone so far. This makes tournament play complex enough. Playing a single deal
has its own unique complexities as we observe below.
In this paper we will focus on the reasoning needed on one table, assuming that it is context
independent (of what may happen on the other table(s)), or what may have happened in the past.
This is a reasonable delimitation of scope because there is no active communication or interaction
with the other table(s), and the past cannot be altered anyway.
On the table, the one that is in our scope, a team of two players, which we will call South (S)
and North (N), competes against the opposing team whose players we call West (W) and East (E).
The game proceeds in a sequential manner in the clockwise direction. The game is played with a
standard pack of 52 cards, which the dealer deals clockwise to the 4 players, 13 cards each. The
4 players become dealer turn by turn. The pack is made up of 4 suits, each of which is an ordered
set. The Ace is the highest card with rank 1, followed by the King, Queen, Jack, and the rest in
decreasing numeric value, the 2 being the lowest.
The play constitutes of 13 rounds, in each of which the players play a card on their turn. The
first player can play any card, and the rest three have to follow suit, playing a card of the same suit,
unless they do not have one in which case they can play any card. Each round is called a trick. The
card with the highest rank wins the trick, and the player who played it gets to start the next round.
The general objective is for each team/side to win as many tricks as possible, since the payoff
is proportional. What makes the game complex and interesting is the notion of a contract. The
contract is a declaration by one side to make a certain number of tricks. It is decided by an auction
described below. The contract may notify a suit as the trump suit, in which case even the smallest
card of the trump suit becomes higher than any card of any other suit. Further complexity is added
by tailoring the payoff to the final contract bid. Certain landmark contracts - a game, a small slam,
and a grand slam, with increasing number of tricks contracted - have additional bonus payoffs.
This makes them attractive goals to bid. But the payoffs are received only when the contract is
fulfilled, which is generally dependent on the number of high cards a side has. If the contract fails,

3
D. K HEMANI AND S. S INGH

the opposing side, called defenders, get a (different) payoff determined by the number of tricks the
contract fails by.
The contract is decided by an auction or bidding phase of the game that precedes the play. Thus
the game is a two-stage game. In the first stage, the auction, each player makes a bid in turn, and
bidding ends when three players pass consecutively. This is followed by the play phase, in which
one of the declaring side players exposes her cards, thus making more information available to each
of the other three players to reason with.
The auction begins with the dealer making the first bid. The minimum bid that one can make
is for 7 tricks and the maximum is for 13. Given the 5 denominations made up of the 4 suits and
a No Trump bid, there are 35 different possible contracts to bid for. This number is doubled to 70
by having it challenged by a bid called Double, and further doubled to 140 by a counter challenge
called a Redouble. In addition a player can make a Pass bid. Since 2 Pass bids may precede each of
the 140 bids, the length of the auction can be about 420 bids in the longest case (and 4 bids in the
shortest case). This space allows considerable opportunity for conveying useful information, as we
shall describe below.
This brief description of the general structure of the contract bridge game can give us some hints
on the different kinds of reasoning that may be involved.

3. Related Work
As the game of bridge proceeds in two stages- bidding and playing, different AI techniques have
been applied to solve the two parts individually. Earlier work shows that the latter is relatively
easy for the computers to solve than the former. This is because more information is available
to all players during the play phase, since the cards of the dummy are now open. Most bidding
programs use a rule-based approach (Carley, 1962; Lindelof, 1983; Wasserman, 1970) in which
a certain bidding decision is made if the condition of the corresponding bidding rule match with
auction patterns and constraints on the possible hands of the other players. A combined approach of
rule-based system and lookahead search was used (Gambäck et al., 1993; Ginsberg, 2001) in order
to make decision in states that matched either two or more rules or no rules at all.
One of the earliest attempts at declarer play employed knowledge of specialized play techniques,
called Thematic Acts, were stored in the form of rules and a planner attempted to retrieve relevant
‘plays’ and string them together (Khemani, 1989, 1994). Bridge Baron used expert rules to decide
on play actions. Smith et. al (Smith et al., 1996, 1998a) used Hierarchial Task Network (HTN) plan-
ning techniques to approach declarer play by generating and evaluating game trees whose branches
represent the number of tactical schemes that an agent can use, and not the number of moves that
agent can make as in converntional game trees, in the current state of play. Bridge Baron augmented
with Smith et. al’s HTN techniques won the 1997 World Computer Bridge Championships (Smith
et al., 1998b), nonetheless, its performance was worse than that of an amateur bridge player.
Ginsberg’s Intelligent Bridge player (GIB) (Ginsberg, 1999) used search based techniques-
Monte-Carlo simulations in card playing and Borel simulations (Monte-Carlo like simulations) in
bidding. Multiple instances of bridge’s perfect information variant, Double Dummy, consistent
with bidding and previous card plays are analysed to suggest next play (first introduced by Levy

4
C ONTRACT B RIDGE : M ULTI - AGENT A DVERSARIAL P LANNING IN AN U NCERTAIN E NVIRONMENT

(1989). Search reduction techniques such as alpha-beta pruning, transposition tables and move or-
dering heuristic brought down the branching factor of search trees from 4 to 1.3, and partition search
(Ginsberg, 1996) further reduced the search space to approx 18,000 nodes per deal. GIB also used
Iterative broadening (Ginsberg & Harvey, 1992) to return a low width answer if a high width search
fails to terminate in time.
However, Monte-Carlo approaches, as shown by Frank and Basin (Frank & Basin, 1998), do
not encourage information gathering actions such as discovery play and tend to defer decisions
until next round of play. To solve the problem of strategy-fusion and non-locality in sampling
algorithms like MC simulations, (Frank et al., 1998) formalised vector minimaxing and payoff-
reduction minimaxing algorithms. Frank demonstrated that prm algorithm outperforms the other
two on finding optimal strategies.
A slightly differnt approach for decision making and cooperation in bridge bidding was pro-
posed (Amit & Markovitch, 2006) which employed model-based Monte Carlo sampling method to
model partner(s)/opponents(s) information states and presented a learning framework that allows
co-training of partners on a training set of various classes of states with conflicting actions. (De-
Looze & Downey, 2007) used a combination of two self-organizing maps (SOMs) to find an optimal
bidding strategy for no trump bridge hands.
All the above work exploits human designed features for human bidding system but a recent
attempt (Yeh & Lin, 2016) to automatically learn the bidding system directly from raw data was
made using deep reinforcement learning. The proposed framework, based on Q-learning along with
UCB algrothms, enables learning complex rules of bidding by nonlinear functions on raw data to
avoid ambiguity of the bids. The work demonstrates that the proposed model exhibits comparable
performance with that of champion-winning programs.
The currently successful bridge playing programs are written using a customization of the Monte
Carlo and double-dummy techniques proposed by Ginsberg (1999) and most of them are regular
entrants in the WCBCs.
Since 1996 the American Contract Bridge League has been organising the official World Com-
puter Bridge Championships anually, held along with international bridge events like ACBL NABCs,
World Bridge Federation World Championships, and the first European Bridge Federation Open
Championship. The 2017 event, the 21st World Computer-Bridge Championship, was held, August
19th - 24th, at the 43rd World Team Championship, Lyon, France and witnessed the participation of
Wbridge5 1 , Shark Bridge 2 , Micro Bridge 3 , Q-Plus Bridge 4 , Bridge Baron 5 , RoboBridge 6 and
Synrey Bridge 7 .

1. wbridge5.com
2. sharkbridge.info
3. www.osk.3web.ne.jp/ mcbridge/
4. www.q-plus.com
5. www.bridgebaron.com
6. www.robobridge.com
7. www.xinruibridge.com

5
D. K HEMANI AND S. S INGH

4. The Possible Worlds


In board games like Chess and Go there is no uncertainty and there is only one possible world,
visible to all. In Poker against one player the opponent could be holding any 2 or 4 cards from the
remaining pack, which can be done in 45 C2 or 43 C4 ways. At various stages 3 or 4 or 5 cards are
made visible to the players. In Backgammon the uncertainty arises due to the throw of the dice, and
in Scrabble due to the draw of letter tiles. In both there is a board visible to all.
Unlike Chess and Go, in card games the initial position is different for each game when the
cards are dealt. In Bridge, 13 cards are dealt initially to each player. One could construct possible
worlds in which for every card there is a possible world in which each of the players holds it. But
there are constraints on what possible worlds can be possible together − each player should have
exactly 13 cards, and each card should be with only one player. Instead we can construct clean
possible world models, or Kripke structures, in the style of the 3-card example used in Dynamic
Epistemic Logic (van Ditmarsch et al., 2007) and specify in each possible world exactly which 13
cards are held by each of the four players. Then, the number of possible initial position, or deals is
52 C 49 C 26 C 30
13 × 13 × 13 . This number, approximately 10 , is obviously too large to be handled
with model checking.
However, our interest is to confront the possible worlds from the perspective of one player, who
can see her own cards. The number of possible worlds for this player is 49 C13 × 26 C13 × 13 C13
in the bidding phase, still a dauntingly large number. During play, each player can also see the 13
cards of the dummy, and this number reduces to 26 C13 × 13 C13 which is about 107 . Remember
that the planner for declarer play has to work with a search tree of her own cards and that of the
dummy, which has its own complexity. The current approaches employing Monte Carlo methods
sample these possible worlds to create double dummy problems, in which there is only one possible
world, and then solve each sample with game tree searching methods. The sampling can be aided
by additional constraints derived from information inferred during play, but this approach is still
in it’s infancy. While this does work, it does so only in some of the deals. It is also not cogni-
tively appealing, and there is no possibility of generating explanations and justifying the program’s
actions.
Instead of working with Kripke structures or their variations, we propose an alternative, and
perhaps a more cognitive approach, closer to one employed by humans. In this approach the hidden
information is represented by patterns of cards that are used by humans to visualize bridge hands
and reason with them. Our current work deals with the play of the hand, but we also propose how to
address the bidding phase. The basic idea is to create a representation in which what is known about
the different hands is explicit. To deal with complexity we often represent insignificant cards, and
what is unknown, abstraction of variables, where a ‘x’ stands for a small and insignificant cards, as
is done in bridge literature; or by creating a few clusters of possible worlds to reason with that are
determined in a teleological manner during the planning process. The entire reasoning process is
controlled by game theoretic reasoning about payoffs and meta level goals of the game described
above.
Building a human like bridge playing program will demand a complex problem solving archi-
tecture where different kinds of reasoning will need to be integrated. Some of the processes needed
are - planning and plan recognition, encoding and decoding of information, planning for communi-

6
C ONTRACT B RIDGE : M ULTI - AGENT A DVERSARIAL P LANNING IN AN U NCERTAIN E NVIRONMENT

cation, interception of communication along with deliberate disinformation and deception, reason-
ing with probabilities, counter planning, abduction and reasoning about intentions, reasoning about
knowledge, opportunistic planning, and recognizing Pareto like optimal scenarios - all in the face
of incomplete information.
The score of a deal is a function only of (a) the contract reached and (b) the number of tricks
made during play. Of this the first is decided in the bidding phase, while the second is decided
during play. Both stages of play happen with incomplete information, and hence the par score may
not always be achieved. In any case the payoff depends upon what happens on the other table or
tables.

5. Bidding
The epistemic battle begins with bidding. The goal of each side in bidding is to try and discover
the playing strength of the given deal, and try and arrive at the best contract from each side’s point
of view. As bidding proceeds, the level of the contract becomes higher and higher. One side has
to eventually let the other win the final contract, because bidding higher would entail losses greater
than the payoff the other side is likely to get.
The side with better cards is expected to get a positive score. A side may get a positive score
either by bidding and making a contract or by defeating the opponents’ contract wherein they extract
a penalty. The magnitude is determined by the rules of the game. Often a player has to decide which
one, the bonus or penalty is better, and decide to bid on or double the opponents’ bid. Contrariwise
the side with the weaker hands has to decide whether to let the opponents play in what they have
bid, or to make a sacrifice bid (bidding higher even when likely to fail) yielding a penalty lower
than what the opponents would otherwise get. Thus at the top level a bidding program should be
controlled by such reasoning - whether to bid for a contract or to double opponents’ contract. The
former is the default decision and is the subject of the bidding approaches described below. The
latter decision is often taken opportunistically, since in the first place the other (weaker) side has to
bid.

5.1 The Vocabulary of Bids


The bidding process is as follows. A bid is one of the following,
• Pass. The player has nothing to say.
• nY where n is an integer in the range [1..7] and Y is an element from the ordered list (C, D, H,
S, N) which stand for (Clubs, Diamonds, Hearts, Spades, No-trump). The contractual meaning
of the bid nY is that the bidding side promises to win n + 6 of the 13 tricks in the deal with Y
as trumps.
• Double is a bid that can be made after an opponent has made a bid nY. Its contractual meaning
is that it doubles the value of the contract. Very often it is also used to convey some specific
information. A doubled contract is denoted by nY ∗ .
• Redouble is a counter challenge to a Double. Contractually it doubles the (already doubled)
value of the deal. A redoubled contract is denoted by nY ∗∗ .

7
D. K HEMANI AND S. S INGH

The nY type bids are ordered as 1C, 1D, 1H, 1S, 1N, 2C, 2D, ..., 7H, 7S, 7N, with 1C being the
lowest and 7N the highest. Given a bid nY, only higher bids can be made in the future. A Pass bid
can me made anytime, and Double and Redouble only as defined above. Bidding ends when three
players pass in sequence.
The scoring system defines the payoff. For a side successfully bidding and making a contract
there are increasing bonuses for part scores, games, small slams (12 tricks) and grand slams (13
tricks). The primary goal of the bidders is to first identify the band in which the deal falls, and then
the actual contract (suit and denomination) that is the goal during play.

5.2 Bidding: Visualizing the Hidden Hands


The rules of the game define the contractual meaning of a bid. To bid for the best contract each
player communicates some information of her hand to the partner. This communication is done
via an encoded meaning of the bid. A bidding system encodes information into bids. One must
remember that the contractual obligation is only for the final bid contract. This means that bids
early in the bidding phase can be encoded with information describing the player’s holding. It must
be noted that this encoding is not private between the two players (that would be cheating) but it
must be explained on demand. The bidding system has to be explained to the opponents and this
is usually done via the convention card that each pair has to fill. Usually the players familiarize
themselves with opponents’ bidding codes (system) in advance.
There are 35 possible bids starting from 1C and the last being 7N. The encoded meaning of a
bid is however not context free. Rather it depends upon the context defined by the bids made so far,
including those by the opponents. There are many bidding systems used by bridge players around
the world. A commonly used system is the Standard American Yellow Card (SAYC) 8 . For example
the bidding sequence 1NS , PW , 2CN , PE , 2DS , PW , 3NN , PE , PS , PW has the following exchange of
information in SAYC. South1N : I have a balanced had with 15-17 high cards points (see below).
North2C : I have a supporting hand. Do you have 4 cards in Hearts or in Spades? South2D : No, I
don’t have 4 cards in either. North3N : In that case let us bid for the 3N game contract.
As suggested in the example above, the players convey information about their hands in terms
of features, or the bid may be asking for some specific information. Two common features are high
card points (HCP) in which high ranked cards are assigned points (usually 4 for ace, 3 for king, 2
for queen and 1 for jack), and lengths of suits (the number of cards held in a suit).
Bidding is essentially an epistemic activity in which each player reasons with what she knows
about the two hands, with what she knows her partner knows (the information she has herself con-
veyed), and what the opponents have bid, or even when they have been quiet. The bids made by all
players encode information that can be used to describe their hands in terms of features encoded in
bids. For this, the bids made must be consistent with the bidding system. Unless there is a deliber-
ate attempt to deceive everyone, including partner, by making a bid that conveys information that is
false. Each bidder must,
• create a picture of partner’s hand by using the information conveyed by partner in the encoded
bids, and also any inferences that one can make from all bids made by everyone so far.

8. http : //www.acbl.org/tournaments _page/general − inf ormation/convention − cards/

8
C ONTRACT B RIDGE : M ULTI - AGENT A DVERSARIAL P LANNING IN AN U NCERTAIN E NVIRONMENT

• create a picture of their combined hands. This follows as a corollary.


• create a picture of opponent hands. This is similar to creating pictures of partner’s hands.
• create a picture of her own hand as conveyed by her bids. This is crucial because it determines
what else needs to be communicated.
Bridge players categorize bids by the functional role the meaning plays during communication. A
descriptive bid conveys information about the hand. A player should compare her hand with the
hand her bids have shown, and try to encode the difference into the next bid. Very often careful
planning is required where future bids (or the rebid) are already planned. A sign-off bid says that
the player thinks that it is the final contract. A preemptive bid attempts to consume bidding space
so that the opponents cannot exchange information. It is usually made when a player thinks that the
opponents have better cards. An invitational bid is made by a responder suggesting (and inviting)
the first bidder to bid ahead if partner has a better hand. A forcing bid is a bid that tells partner
that she should bid further. It may or may not convey information. An asking bid is a special kind
of forcing bid that asks for some specific information for example the number of aces held. Other
forcing bids include a transfer bid asking partner to bid a specific suit, a take out double, and a cue
bid which explores a slam by showing controls (aces or kings). A lead directing bid (could be a
double) is a bid indicating to partner what suit to lead during play.
One must also keep in mind that opponents are listening, even when they are not competing
(because they do not have high cards). Any information revealed is also available to them, and
they may profit from it by make better choices during play. A bridge player tries to reveal only the
minimum information needed to decide upon the contract, and no more.
Some of the work in bidding is aimed at encoding a bidding system into a program (Amit &
Markovitch, 2006; DeLooze & Downey, 2007; Ginsberg, 1999), while others try to learn a bidding
system from labeled examples (Yeh & Lin, 2016). We advocate a more cognitive approach with
plans to exchange information about the two hands and at the same time try and identify the best
contract, all the time keeping the bids within a safe zone since eventually a bid will become a con-
tract, and a positive payoff is desirable. The different bidding systems are like different languages
whose semantics is defined in terms of the content the player wants to communicate. In our view
the program for bidding should be concerned only with the semantics or content generation. The
actual bids made, which depend on the bidding system, are the akin to surface level realization in
natural language generation. Ideally a program should be able to read a convention card and plan the
bidding as allowed by the system. An added advantage of this approach would be that the program
would be able to understand the bids when an opponent uses a different bidding system.
The high level algorithm for making a bid must take the following information into account and
then make a bid that will either convey some encoded meaning or decide the contract, or both. The
bidding must end only when the players feel that the desired contract has been bid. Many a player
has been left red faced on the table when the bidding ends before that!
− What is known about the two hands?
− Do we know the range (game/slam/part score)?
− Are we within the bidding space?
− Has partner made a forcing bid?
− Has partner responded to the forcing bid?

9
D. K HEMANI AND S. S INGH

− Has opponent interfered (giving partner another chance to bid)?


− Has opponent offered a penalty opportunity?
Thus the bidding phase is a veritable warfare situation. One has to create a picture of the combined
hands to bid the best contract, but the language available is a double edged one, committing the side
to higher and higher contracts even as we exchange more and more information. Sometimes the
bids made are not only for information exchange, they can also be for disinformation, in a tactical
attempt to swindle the opponents from achieving a payoff, or deceiving them into making wrong
choices later into play.
It is commonly accepted amongst expert bridge players that the bidding phase is by the harder
of the two in game.
Thus the bidding process involves information in different forms. One can describe ones hand,
ask about the partner’s hand, respond to questions, make a bid to help decision making later in
play, consume bidding space, decide when to hide information and when to use disinformation, to
encourage or discourage further communication. A bidding program has to embody a high level
strategy to choose amongst the above bid making goals, use the strategy to make appropriate bids
by interpreting a given bidding system (and it should be able to use different bidding systems), and
actively create a picture of the other three hands which in term will feed into the decisions needed
for future bids during bidding. At all time the goal is to try and reach the best contract, one that will
maximize ones payoff. The strategy often involves imagining the play of the hand, and attempts to
determine whether there is feasible line of play for the contract that one is considering bidding.

6. A Planning Approach to Declarer Play


We advocate a planning approach akin to that followed by human bridge players in which a declarer
takes stock of the situation when the opening lead is made by the left hand opponent and the dummy
appears. The basic unit of play is to play an eligible card. This is the action that is used by Monte
Carlo methods after creating a double dummy sample. A bridge player, on the other hand, thinks in
terms of combinations of cards played. The plan for the entire hand is composed of plans for indi-
vidual suits, which may be intertwined requiring careful scheduling. Bridge players have evolved
a vocabulary to talk about patterns and associated sequence of cards to be played. For example
cashing a trick means playing a winner card from one hand, and (usually) an insignificant card from
the other hand. Ducking refers to playing small cards from both hands with the intention of letting
opponents win the trick (often a battle has to be lost for a war to be won). A common pattern used
in planning the play of a hand is a tenace, illustrated in Figure 1(a). The figure illustrates the layout
in a suit, say Spades. North has the rank 1 card (Ace) and the rank 3 card (Queen). The rank 2 card
(King) is either with East or with West.
Associated with a tenace is a Thematic Act (Khemani, 1989) called a finesse, described in Figure
1(b). The attempt is to win a trick with the rank 3 card before the opponent can win a trick with the
rank 2 card, which is said to be finessed. In the given example it works as follows. South plays a
small card (say 5) and waits to see what West plays. If West plays the King then North plays the
Ace, else North plays the Queen. We have finessed the King! That is, won a trick with the Queen, a
card lower in rank. In bridge literature one would refer to the play in short as “finesse the queen”.

10
C ONTRACT B RIDGE : M ULTI - AGENT A DVERSARIAL P LANNING IN AN U NCERTAIN E NVIRONMENT

Figure 1. (a) A tenace is a pattern made up of rank 1 card and a rank 3 card, with the rank 2 card hidden with
one of the opponents. (b) A finesse is a card combination play associated with a tenace

A finesse can be played whenever a tenace exists. The play succeeds only if West has the
king. One can compute the probability of success of this play, and use the value to compare with
competing plans. With no other information the probability that either player has the king is 50%.
But as one gets more information this probability can be revised aposteriori. For example if West
has indicated that she has many HCPs then it is more that she holds the king. Likewise if East is
known to have more spades than West, then the king is more likely to be one of those cards.
If the high level planner requires 5 tricks from the given card combination in Figure 1(a), then
the finesse would be the right play. But in addition the suit must break 2-2. This requirement will
reduce the probability of success. If on the other hand only 4 tricks are required then a safety play is
best, in which first the ace must be cashed, and then one must play small to the queen. This succeeds
when the king is with West, and it also works when East has the king singleton and West has J 10
9. Planning in bridge is replete with such context dependent choices, balancing risk and reward.
A bridge player has many such thematic plays in her repository, and the more expert a player
is the larger the repository. A specialized class of plays called squeezes has been implemented in
a logic program called Python (Sterling & Nygate, 1990). The high level algorithm is to evaluate
the situation; investigate different thematic plays; construct a set of feasible plans with associated
probabilities of success; select the best plan and execute it; keep monitoring the plan to spot an
opportunistic situation or signs of failure. Some of features used to evaluate the situation are as
follows:
• Count of winners. Each hand has a certain number T of top tricks that can be cashed uncondi-
tionally. If N is the number of contracted tricks, then (N-T) is the deficit, the number of tricks
that need to be developed.
• Count of losers. How many top tricks do opponents have? How many do they need to develop?
• Control. Who is winning the first trick. Remember the winner gets to choose the suit to play in
the next trick.
• Tempo. Related to Control. If both sides are trying to develop tricks, how often do they have to
relinquish control.
Assuming that the top tricks are not sufficient to fulfill the contract, or defeat it from the opponents
perspective, then both sides try and develop tricks to overcome the deficit, before the other one does.
This involves complex reasoning involving Control and Tempo, because the side which develops

11
D. K HEMANI AND S. S INGH

tricks faster, and can regain control, will succeed. The repertoire of thematic plays is a large one.
Some of the names of such plays found in literature are finesse, simple squeeze, double squeeze,
criss-cross squeeze, dummy reversal, cross-ruff, end-play, elopement, and so on. A TA is like a high
level or a macro action that needs a pattern of cards, and some preconditions. The preconditions are
about the hidden hands. They have probabilities associated with them. However these probabilities
vary dynamically as more information becomes available. The following section illustrates the plan
generation process from the program reported in (Khemani, 1989). The program was written in the
forward chaining rule-based language OPS5.

A sample deal
The deal given to the program is given below. The suits are listed top to down in the order Spades,
Hearts, Diamonds and Clubs. Following that is a trace of the program, generated by canned text
snippets associated with the decision-making rules. We underline some words and phrases to high-
light the terms used by bridge players for patterns and plays.

North South
a42 65
a93 k j 10 6 5
aj73 k 10 2
k75 a42

the contract is for 9 tricks in no trumps... ...let’s look at the hand ... ... k - 10 of diamonds ..a
tenace over east ... a - j of diamonds ..a tenace over west ...there is a two way tenace in diamonds
...there is a major tenace in hearts over west ... k - 10 of hearts ..a tenace over east ...there is a two
way tenace in hearts ... a - 9 of hearts ..a tenace over west ... k - j of hearts ..a tenace over east
..can afford to lose 4 more tricks.. ..can stand 5 - 3 break in spades .. ..there is danger if spades
break 6 and 2 .. ..can stand 6 - 1 break in clubs .. ..there is danger if clubs break 7 and 0..
...okay lets look at the lead.. ... west has led the k of spades .. ..an honour lead.. west should
have the q and j as well.. ..in that case spades should be safe.. ..can win this one. lets plan a little..
...starting plan 1010 ... ..we have 7 tricks.. need 2 more .. tempo available is 0 ...losers not more
than 4.. ... no direct approach ... ... no direct approach ... ..may lose 1 tricks in hearts ..assuming
spades break 4 and 4 ...will lose only 3 tricks in spades ... so...
...starting plan 1012 ... ..we have 7 tricks.. need 2 more .. tempo available is 2 ...losers not more
than 1... ...can get 2 tricks from hearts if they break 5 - 0 ...losing 1 ... ..may need 3 entries to cash
long cards.. ..actually only 2 since there are only 1 outstanding cards... ..an entry in hearts to be
used last... ..an entry in diamonds .. ...since west is the dangerous opponent in spades ..ducking 2
round and finessing the q of hearts against him will cut him off .. if he has more than 4 spades.. ..
in that case there are 2 less losers.. and as much more leeway.. ...run the j .. ...small to 9 .. ...check
entries.. ..an entry in clubs .. ..and it looks okay... lets hope west has the q ...one less tempo.. ..so
..that plan looks workable... ....and the detailed plan is...

trick 1 : play the 5 of spades to the 2 ... assuming opponents play spades next ...
trick 2 : play the 6 of spades to the 4 ... assuming opponents play spades next ...

12
C ONTRACT B RIDGE : M ULTI - AGENT A DVERSARIAL P LANNING IN AN U NCERTAIN E NVIRONMENT

trick 3 : cash the a of spades discarding from south


trick 4 : play the 3 of diamonds to the k
trick 5 : run the j of hearts following with the 3
trick 6 : play the 5 of hearts and finesse the 9
trick 7 : cash the a of hearts following with 6
trick 8 : cash the k of clubs following with 2
trick 9 : play the 5 of clubs to the a
trick 10 : cash the k of hearts discarding from north
trick 11 : play the 2 of diamonds to the k

7. Epistemic Reasoning and Deception


Communication is an essential component of bridge. The bidding phase is all about communication.
In defense, the opponents trying to defeat the contract also largely relies on communication. This is
because both the defenders are active, and neither knows the combined strength of their two hands.
The defenders employ signals, like high-card-encouraging, high-low, conventional leads like the
fourth-best, and suit-preference signals. We have not yet addressed the defensive play in this paper,
but suffice it to say the defenders have the means of communicating their own plans and intentions
to partner, and this information is also available for the declarer. So while each of the three active
players cannot see two hidden hands, there is plenty of information inferred from the communication
that is happening. Combined with the fact that bridge players are (generally) conversant with the
set of TAs being deployed, there is also the possibility of recognizing the opponent’s plan, and aim
to scuttle it. Another weapon to scuttle opponent’s plan is deception; and there is no better arena for
skullduggery than the bridge table. An imaginative defender can advertise a fake plan to lead the
declarer up the garden path. We illustrate this with a real life example from, aptly, the World War I.
Maurice Gray (1989 - 1918) was a dispatch rider with the British Army and a keen bridge player.
He was sitting West on the following hand (Truscott & Truscott, 2004) which we analyse from the
declarer’s perspective sitting South. The declarer was in a 3NT contract (to make 9 tricks) and could
see the hand as depicted in Figure 2. South could count 5 top tricks (Spade A, Heart A K and Q,
Club A) and needed to develop 4 more.
The best option was to develop the required tricks from diamonds. South had only 1 control is
spades. Consequently the tempo was 0 since once he took the Spade Ace, East would have many
spade winners. East, then, was the dangerous opponent and had to be kept out. The standard plan
in this situation would be to duck 2 rounds of spades to make sure West does not have any more,
and hope that West has the Ace of Diamonds. The play would be a small diamond to the king. It
would work in following possible worlds - (a) W:AQxx - E:nil, (b) W:AQx - E:x, (c) W:Axx - E:Q,
(d) W:AQ - E:xx, and (e) W:Ax - E:Qx. In (a) and (b) West gets 2 tricks but East is kept at bay. In
(d) West gets one trick only, as also in (c) and (e) where the play ensures that Q falls below the A or
the K.
Consequently South embarked upon this plan. The play went as follows. On trick 1 South
ducked and allowed East to win the Jack of Spades. On trick 2 likewise East was allowed to win the

13
D. K HEMANI AND S. S INGH

Figure 2. A 3 no trump contract after East has made a bid showing long spades and West has dutifully led the
spade 9.

King of Spade. South won trick 3 with the Ace of Spades and was surprised to see West jettisoning
the Ace of Diamonds!
South now made the following epistemic inferences - “West has understood South’s plan, and is
trying to counteract it by creating an entry for East. He does this by jettisoning the Ace of Diamonds
so that in the layout (e) South cannot develop diamond tricks without letting East win a trick with
the queen, who would then cash all his spades”. Consequently, inferring that it must be layout (e),
he abandoned the plan for developing diamond tricks, discarded the Diamond 6, and attempted to
try the alternate plan of developing club tricks.
When the smoke cleared he discovered that the layout was actually (b)! Sitting West Maurice
Gray had spun a web of deception leading South astray from the original plan that would have
worked. Gray imagined the layout (e) and played as if that was the true layout, and succeeded in
deceiving South.

8. Concluding Remarks
When a bridge playing program can execute a play like the one above we would surely have to
concede that it is intelligent. The key to such plays is the ability to imagine. The complexity arises
because an agent has to imagine another agent thinking about possible worlds, and reason with
imaginary worlds. The epistemic logic community has started investigating lying and deception, and
contract bridge should be an ideal testing ground for doing so. An added challenge is to somehow
circumvent the complexities of model checking with humongous Kripke structures, and alternate
ways of representations have to be found. One human like approach is to abstract away insignificant
cards into variables called ‘x’ as is done in the analysis above. This would compact the possible
worlds for reasoning, even when we do not construct explicit Kripke structures.
But this has to be done carefully, because the ranks of cards change dynamically. If the A, K, Q
and J have been played on trick 1, then the 10 would become the new ‘ace’! Consequently smaller
cards also gain in stature.
Apart from that there are situations when small cards have importance due to their low rank.
One such situation is when one is trying to execute an end-play called a throw-in. To make it work
the declarer may have have to carefully preserve small cards so that an opponent is compelled to

14
C ONTRACT B RIDGE : M ULTI - AGENT A DVERSARIAL P LANNING IN AN U NCERTAIN E NVIRONMENT

win when the suit is played. Thus they cannot be abstracted away. Thus considerable work needs
to be done in knowledge representation, but unless we do so, there is little hope of reasoning with
models of the agent’s environment in an informed manner.
Contract bridge, then, is a multiplayer incomplete information game that relies heavily on prob-
abilistic reasoning and contingent planning employing knowledge of how to tackle card combina-
tions. An important aspect of the game is communication between partners trying to achieve a
shared goal. This communication is not secret, but visible to opponents, who try to make use of the
intercepted information, thus opening up the possibility of deception. The game rises to a new level
when players players start reasoning about the intentions and plans of other players. This makes it
the secret goal of AI.

Acknowledgements
Our understanding of the Monte Carlo methods and current approaches to bidding greatly benefitted
from discussions with Arun Bahulkar and his team at TRDDC Pune. We also thank R Ramanujam
and Hans Ditmarsch of IMSc Chennai for insights into dynamic epistemic logic and deception.

References
Allis, L. (1994). Searching for solutions in games and artificial intelligence. Doctoral dissertation,
Maastricht University, Limburg, Netherlands.
Amit, A., & Markovitch, S. (2006). Learning to bid in bridge. Machine Learning, 63, 287–327.
Brown, N., & Sandholm, T. (2017). Superhuman AI for heads-up no-limit poker: Libratus beats top
professionals. Science, (p. eaao1733).
Campbell, M., Hoane Jr, A. J., & Hsu, F.-h. (2002). Deep Blue. Artificial intelligence, 134, 57–83.
Carley, G. L. (1962). A program to play contract bridge. Doctoral dissertation, Massachusetts
Institute of Technology, Department of Electrical Engineering.
DeLooze, L. L., & Downey, J. (2007). Bridge bidding with imperfect information. IEEE Symposium
on Computational Intelligence and Games (pp. 368–373). IEEE.
van Ditmarsch, H., van Der Hoek, W., & Kooi, B. (2007). Dynamic Epistemic Logic, volume 337.
Springer Science & Business Media.
Frank, I., & Basin, D. (1998). Search in games with incomplete information: A case study using
bridge card play. Artificial Intelligence, 100, 87–123.
Frank, I., Basin, D. A., & Matsubara, H. (1998). Finding optimal strategies for imperfect informa-
tion games. AAAI/IAAI (pp. 500–507).
Gambäck, B., Rayner, M., Yard, M., & Pell, B. (1993). Pragmatic reasoning in bridge. Technical
report, SRI International, Cambridge Computer Science Research Center, Cambridge.
Ginsberg, M. L. (1996). Partition search. Proceedings of AAAI-96.
Ginsberg, M. L. (1999). GIB: Steps toward an expert-level bridge-playing program. IJCAI (pp.
584–593).

15
D. K HEMANI AND S. S INGH

Ginsberg, M. L. (2001). GIB: Imperfect information in a computationally challenging game. Jour-


nal of Artificial Intelligence Research, 14, 303–358.
Ginsberg, M. L., & Harvey, W. D. (1992). Iterative broadening. Artificial Intelligence, 55, 367–383.
Khemani, D. (1989). Theme based planning in an uncertain environment. Doctoral dissertation,
Department of Computer Science and Engineering, IIT Bombay, Mumbai, India.
Khemani, D. (1994). Planning with thematic actions. AIPS (pp. 287–292). AAAI Press.
Khemani, D., & Ramakrishna, R. (1989). Bridge: A benchmark for knowledge based planning. J.
Artif. Intell., Cogn. Sci. Appl. Epistemol, 6, 137–151.
Levy, D. N. (1989). The million pound bridge program. Heuristic Programming in Artificial
Intelligence– The First Computer Olympiad, (pp. 95–103).
Lindelof, T. (1983). COBRA: the computer-designed bidding system. London: Victor Gollancz.
Schaeffer, J., Burch, N., Björnsson, Y., Kishimoto, A., Müller, M., Lake, R., Lu, P., & Sutphen, S.
(2007). Checkers is solved. Science, 317, 1518–1522.
Sheppard, B. (2002). World-championship-caliber scrabble. Artificial Intelligence, 134, 241–275.
Silver, D., et al. (2016). Mastering the game of Go with deep neural networks and tree search.
Nature, 529, 484–489.
Silver, D., et al. (2017a). Mastering chess and shogi by self-play with a general reinforcement
learning algorithm. arXiv preprint arXiv:1712.01815.
Silver, D., et al. (2017b). Mastering the game of Go without human knowledge. Nature, 550, 354.
Smith, S. J., Nau, D., & Throop, T. (1998a). Computer bridge: A big win for AI planning. AI
Magazine, 19, 93.
Smith, S. J., Nau, D. S., & Throop, T. A. (1996). A planning approach to declarer play in contract
bridge. Computational Intelligence, 12, 106–130.
Smith, S. J., Nau, D. S., & Throop, T. A. (1998b). Success in spades: Using AI planning techniques
to win the world championship of computer bridge. AAAI/IAAI, 98, 1079–1086.
Sterling, L., & Nygate, Y. (1990). Python: an expert squeezer. The Journal of Logic Programming,
8, 21–39.
Tesauro, G. (1995). Temporal difference learning and TD-gammon. Communications of the ACM,
38, 58–68.
Truscott, A., & Truscott, D. (2004). The new york times bridge book: An anecdotal history of the
development, personalities, and strategies of the world’s most popular card game. Macmillan.
Wasserman, A. I. (1970). Realization of a skillful bridge bidding program. Proceedings of the
November 17-19, 1970, fall joint computer conference (pp. 433–444). ACM.
Yeh, C.-K., & Lin, H.-T. (2016). Automatic bridge bidding using deep reinforcement learning.
arXiv preprint arXiv:1607.03290.

16

You might also like