Exploring Tractability in Finitely-Valued SAT Solving⋆
Nika Pona
Vienna University of Technology
[email protected]
Abstract. In this paper I describe the progress, preliminary results and
future work directions of a project to implement a many-valued SAT
solver based on a generalization of the algorithms used in modern Boolean
SAT solvers. Mimicking Boolean SAT solvers minimizes the algorithm-design
and implementation challenges of such a task, since many ideas can be
easily adapted to the many-valued setting. Experimental results show that
even at an early stage of development a many-valued solver can perform
better on some problems than modern Boolean SAT solvers.
1
Introduction and Motivation
The starting idea of the project was to see whether the current many-valued
solvers could be improved using theoretical results on the complexity of
finitely-valued logics [5]. A survey of the state of the art in the field showed
that there are no complete many-valued SAT solvers available, and that the most
common approach to solving problems modelled as many-valued formulae is
to reduce them to Boolean SAT¹ or Satisfiability Modulo Theories (SAT with
Linear Arithmetic Theory) [2]². Some complete many-valued solvers were
implemented previously and the results seemed promising [7] [8], but the projects
were discontinued and the software is no longer available. Thus the first task
became to implement a many-valued SAT solver.
1.1
Why Many-Valued SAT?
SAT solving has enjoyed a lot of success in the last two decades due to an
organized effort of the growing community of researchers. Since any
finitely-valued logic formula can be efficiently mapped to an equisatisfiable
Boolean logic formula by encoding the information about the many-valued domain
with additional constraints (cf. [3]), it may seem that investing time into a
separate many-valued solver is superfluous. There are two reasons to think that
such an implementation effort can be interesting:
⋆ This project is supported by the Austrian Science Fund (FWF): I836-N23.
¹ For the description of the most common encodings and their properties see
references here: http://bach.istc.kobe-u.ac.jp/sugar/
² There is a solver available online that uses this approach:
http://www.iiia.csic.es/~amanda/files/2012/NiBLoS.zip.
Many-valued SAT as generalized SAT. Investigating many-valued logics
proved to be useful in complexity and proof theory, where Boolean logic is seen
as a special case with two truth values. One can expect that similar results can
be achieved with respect to algorithms for the SAT problem. The conflict-driven
DPLL algorithms that are the basis of all modern SAT solvers generalize easily to
the many-valued setting: the literal watching scheme, Unique Implication Point
learning method and counter-based decision heuristics can be implemented in
basically the same way as in a SAT Solver (more on this below). This means
that the effort required for designing and implementing a many-valued solver
is relatively small; at the same time, looking at the Boolean SAT algorithms as
special cases of a more general scheme can provide some useful insights into SAT
solving.
CSP and many-valued SAT. Another reason to look into many-valued SAT
is that it can be seen as an intermediate language between Constraint
Satisfaction Problems and SAT, or even as a better alternative to SAT when it
comes to CSP solving. We know that CSP can be efficiently translated into SAT,
and this fact has been used in the CSP community to develop solvers. For
instance, the CSP solver Sugar, used by a Scala constraint language
(http://bach.istc.kobe-u.ac.jp/sugar/, http://bach.istc.kobe-u.ac.jp/copris/),
is based on a translation to SAT, and several solvers of the MiniZinc CSP
Challenge 2015 are based on translations to SAT as well (see
http://www.minizinc.org/). This is the easiest, but not necessarily the most
effective approach, since the encodings can become quite big and, most
importantly, the structure of the formula is lost and many unnecessary
propagations are made. One can translate a CSP into a many-valued CNF
formula by representing the no-goods of every constraint as clauses. Such a
translation preserves the structure (domains) of the original problem, thus it
may be more efficient to use a many-valued SAT solver as the back-end of a
generic CSP solver. Below I provide an example that supports this claim.
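The no-good translation mentioned above can be illustrated with a minimal
sketch. The function name nogood_clauses, the representation of a literal as a
triple (variable, value, is_positive) and the example constraint are assumptions
made for this illustration only; they are not taken from mvl-solver.

```python
from itertools import product


def nogood_clauses(scope, domains, allowed):
    """Return one many-valued clause per no-good of a single constraint.

    scope   -- tuple of the variables the constraint talks about
    domains -- dict mapping each variable to its finite domain
    allowed -- predicate telling whether a tuple of values is permitted

    Each forbidden tuple (a no-good) becomes the clause stating that at
    least one variable differs from its forbidden value.
    """
    clauses = []
    for values in product(*(domains[v] for v in scope)):
        if not allowed(*values):
            clauses.append([(v, x, False) for v, x in zip(scope, values)])
    return clauses


# Hypothetical example: two adjacent vertices of a graph coloring problem
# with three colors must receive different colors.
domains = {"a": [1, 2, 3], "b": [1, 2, 3]}
clauses = nogood_clauses(("a", "b"), domains, lambda x, y: x != y)
for clause in clauses:
    print(" v ".join(f"{v}{'=' if pos else '!='}{x}" for v, x, pos in clause))
```

In this example the constraint yields three binary clauses over the two
original variables, and the domains remain part of the problem statement
instead of being encoded by additional clauses.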
1.2
Overview
The main point of this presentation is to show that creating a competitive solver
for many-valued logic is not as challenging as it may seem, and that, given the
advantages of many-valued modelling, it is a potentially fruitful direction of
research. To the paper I attach the core solver, which implements several
versions of a basic conflict-driven algorithm with a resolution-based learning
procedure (https://github.com/akinanop/mvl-solver)³.
After the basic definitions of Section 2, I first provide empirical results from
testing the implemented solver and some theoretical remarks on the advantages
of many-valued solving (Section 3), since they provide the motivation for the
implementation task undertaken. In particular, I give an example where modelling
a problem as a many-valued formula and solving it directly with a many-valued
solver is significantly (one to two orders of magnitude in solving time alone)
more efficient than formalizing it as a Boolean formula and using a SAT solver,
even a competitive one. Then in Section 4 I describe the general idea of the
implementation, and finally in Section 5 I point to further directions of
development of the project.
³ For more details on the actual implementation, see the readme file and the wiki
pages of the project.
2
Definitions
Definition 1 (Many-valued SAT). A many-valued SAT problem P = (V, D, C)
is specified by a finite set V of variables, a collection D of sets (domains)
and a set C of clauses. Each variable v ∈ V has an associated finite domain
dom(v) ∈ D. To solve the many-valued SAT problem P is to determine whether
there is an interpretation that satisfies all clauses in C.
Definition 2 (Literal, clause). A clause is a finite set of literals. A literal is
an expression of the form v = x or v ≠ x, where v ∈ V and x ∈ dom(v). A
literal of the form v = x is called positive; a negative literal is of the form v ≠ x.
Alternatively, one could consider many-valued literals of the form v ∈ A with
A ⊆ dom(v). The former representation is closer to Boolean SAT, thus permits
easier adaptation of the Boolean SAT algorithms. In particular, the input to
the many-valued SAT Solver can be given in a format similar to DIMACS in
Boolean SAT⁴. Although the second formulation can give some advantages to
many-valued SAT, it departs from Boolean SAT and may provide additional
implementation challenges, thus I leave its exploration for later. See, for instance
[8] for such a formulation.
Definition 3 (Interpretation, model). An interpretation is a function mapping
each variable v ∈ V to a value from dom(v). An interpretation I satisfies
a positive literal v = x if I(v) = x, and satisfies a negative literal v ≠ x if
I(v) ≠ x. An interpretation satisfies a clause if it satisfies at least one of the
literals from the clause.
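The definitions above translate almost literally into code. The sketch below is
only an illustration: the representation of a literal as a triple
(variable, value, is_positive), the function names and the tiny example problem
are assumptions, not the data structures of mvl-solver.

```python
def satisfies_literal(interpretation, literal):
    """I satisfies v = x iff I(v) = x, and v != x iff I(v) != x."""
    variable, value, is_positive = literal
    return (interpretation[variable] == value) == is_positive


def satisfies_clause(interpretation, clause):
    """A clause is satisfied if at least one of its literals is."""
    return any(satisfies_literal(interpretation, lit) for lit in clause)


def is_model(interpretation, clauses):
    """An interpretation is a model if it satisfies every clause."""
    return all(satisfies_clause(interpretation, c) for c in clauses)


# Hypothetical problem with dom(v) = {1, 2, 3} and dom(w) = {1, 2}.
clauses = [
    [("v", 1, True), ("w", 2, False)],   # v = 1  or  w != 2
    [("v", 1, False)],                   # v != 1
]
print(is_model({"v": 2, "w": 1}, clauses))   # True
print(is_model({"v": 2, "w": 2}, clauses))   # False
```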
3
Modelling Advantage of Many-valued SAT
I use the developed solver to show that solving some problems directly as
many-valued problems can have a significant advantage. The authors of the
previous attempts to create a complete many-valued solver argued that overall
their solvers performed better than the Boolean SAT solvers [7] [8]. However,
these projects date back 6 and 13 years respectively, thus it is possible that
the progress in SAT solving of the last decade has made these results obsolete.
Below I show an example of what can be called an intrinsic advantage of the
many-valued formulation of a problem: in this case, despite the implementation
advances in Boolean SAT, the many-valued solver still performs better. In
particular, I compare the developed many-valued solver to minisat and some
competitive solvers on the pigeonhole problem and the n-queens problem.
Moreover, I show that encoding a problem into Boolean SAT via a many-valued
formulation already gives an advantage in the search.
⁴ For the exact specification see here:
https://github.com/akinanop/mvl-solver/wiki/Extended-DIMACS-format
3.1
Pigeonhole problem
The pigeonhole problem (PHP) is a famous unsatisfiable problem, since despite its
easy formulation, “it is impossible to fit n pigeons into n−1 holes such that each
hole contains at most one pigeon”, its unsatisfiability is known to be difficult to
prove via automatic means⁵. I consider the following encodings of the PHP:
SAT. The Boolean SAT PHP is usually formulated as a CNF formula with
variables x_{ij} for each pair i ∈ [n] and j ∈ [n − 1] and with two types of
clauses:
1. ⋁_{m ∈ [n−1]} x_{im} for each i ∈ [n];
2. ¬x_{km} ∨ ¬x_{lm} for all k ≠ l ∈ [n] and m ∈ [n − 1].
MVL. The many-valued SAT PHP consists of n variables with domain size n − 1.
The domain declarations express the condition that each pigeon should be placed
in some hole, and the clauses k ≠ j ∨ l ≠ j for k ≠ l ∈ [n] and j ∈ [n − 1] express
the condition that no two pigeons should be placed in the same hole.
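As a sanity check on the sizes of the SAT and MVL encodings, the following
sketch generates both for a given n. It reads the type 1 clauses as "each pigeon
sits in some hole", one clause per pigeon; this reading, the function names and
the literal representations are assumptions of the sketch. Under these
assumptions the counts it prints agree with the MVL and SAT columns of Table 1.

```python
from itertools import combinations


def php_mvl(n):
    """Many-valued PHP: n variables (pigeons), each with domain 1..n-1."""
    domains = {p: range(1, n) for p in range(1, n + 1)}
    clauses = [[(k, j, False), (l, j, False)]        # k != j  or  l != j
               for k, l in combinations(range(1, n + 1), 2)
               for j in range(1, n)]
    return domains, clauses


def php_sat(n):
    """Boolean PHP: variable (i, j) is true iff pigeon i sits in hole j."""
    pigeons, holes = range(1, n + 1), range(1, n)
    variables = [(i, j) for i in pigeons for j in holes]
    # Type 1: every pigeon is in at least one hole.
    at_least_one = [[((i, j), True) for j in holes] for i in pigeons]
    # Type 2: no hole contains two different pigeons.
    at_most_one = [[((k, j), False), ((l, j), False)]
                   for k, l in combinations(pigeons, 2) for j in holes]
    return variables, at_least_one + at_most_one


for n in (10, 15):
    domains, mvl_clauses = php_mvl(n)
    variables, sat_clauses = php_sat(n)
    print(n, len(domains), len(mvl_clauses), len(variables), len(sat_clauses))
```

The many-valued formulation needs no clauses at all for the "some hole"
condition, because it is carried by the domain declarations.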
MVL-SAT. Additionally I consider a different Boolean SAT formulation of the
PHP, created by automatically translating the many-valued PHP into a Boolean
formula using the linear encoding described in [6]. Replace each (negative)
literal of the many-valued PHP with a (negated) Boolean variable. As in the SAT
encoding, add clauses of type 1. Furthermore, for each many-valued variable v,
introduce |dom(v)| − 1 new Boolean variables v_i, which are used to enforce the
property that at most one value is assigned to the variable. This introduces only
a linear increase in the size of the original problem, unlike the naive encoding
via binary inequalities v ≠ i ∨ v ≠ j. For i ∈ {2, . . . , |dom(v)| − 1} add:
1. ¬v_{i−1} ∨ v ≠ i;
2. v ≠ i ∨ v_i;
3. ¬v_i ∨ v_{i−1} ∨ v = i;
4. ¬v_1 ∨ v = 1.
Below are the characteristics of these encodings:
⁵ Resolution-based proofs of unsatisfiability of the pigeonhole problem have
exponential lower bounds. Pure CDPLL algorithms for SAT are not stronger than
resolution, thus this result carries over. However, this can be improved by
introducing the so-called symmetry breaking clauses [1]. For instance, one of the
winning solvers in 2015, lingeling, uses symmetry-detecting preprocessing and
thus solves PHP instances fast:
http://fmv.jku.at/papers/BiereLeBerreLoncaManthey-SAT14.pdf.
Table 1. Number of variables and clauses on PHP with n = 10 . . . 15

        MVL                  SAT                MVL-SAT
variables  clauses   variables  clauses   variables  clauses
10         405       90         415       170        725
11         550       110        561       209        946
12         726       132        738       252        1206
13         936       156        949       299        1508
14         1183      182        1197      350        1855
15         1470      210        1485      405        2250
Below you can see that for n ≥ 11 mvl-solver needs less time than minisat on
both Boolean formulations of PHP⁶ (for n = 10 the times are comparable). On
n = 15 neither minisat nor the modern 2014-2015 winning solvers glucose and
COMiniSatPS⁷ terminated within 24 hours, whereas mvl-solver finished within
10-17 hours, depending on the heuristic⁸. Since the architecture of mvl-solver
is quite basic and not yet efficient (in particular, propagation is very slow:
on big satisfiable graph coloring instances where only extensive propagation is
needed, mvl-solver is slow compared to minisat, which finishes instantly), one
can conclude that the difference lies in the modelling advantage of many-valued
SAT.
Table 2. Times (s) on PHP with n = 10 . . . 15

         minisat           mvl-solver       COMiniSatPS
n     MVL-SAT   SAT       BK      VSIDS     SAT
15    t/o       t/o       17hrs   10hrs     t/o
13    19hrs     t/o       3239    999       13hrs
12    1061      1624      414     170       450
11    49        82        44      26        24
10    3         6         4       3         3
Below I also provide other statistics on this problem: the number of conflicts
is significantly smaller for the mvl-solver, which is responsible for its better
performance, since propagation is slow due to the experimental implementation.
⁶ All tests are done on a machine with Intel Core i3-6100 CPU @ 3.70GHz × 4
processor and 7.7 GB memory.
⁷ For the results of the 2015 SAT Race see here:
http://baldur.iti.kit.edu/sat-race-2015/index.php?cat=results
⁸ BK chooses the literal that maximizes the propagation effect based on the
currently unsatisfied clauses; VSIDS chooses the literal that occurs in the most
clauses, and the counts of all the literals in the theory are divided by 2 after
a learned clause is added to the clause set.
From Table 3 one can see a second interesting result: all indicators decrease
for the MVL-SAT encoding compared to the SAT encoding, since the additional
constraints added from the MVL encoding help trim the search space considerably.
This confirms that exploiting structural information through the MVL encoding
can be beneficial on difficult but structured problems⁹.
Table 3. Other statistics on PHP with n = 10

                MVL-SAT    SAT         MVL (VSIDS)
Restarts        1023       2047        0
Conflicts       472432     1034642     1793
Decisions       522643     1243538     1793
Propagations    7077471    12935371    50778
CPU time (s)    3          6           3
3.2
N-queens
I also compared the performance of minisat and mvl-solver on the n-queens
problems for n = 4 . . . 70, which are typically not very difficult, albeit large,
satisfiable problems. See Figure 1 below for the results of the tests¹⁰. Here the
advantages are not as clear as in the case of the pigeonhole problem (overall
mvl-solver performs worse time-wise), but some observations can still be made.
The number of conflicts in minisat is quite small (less than 300 on any
instance); however, on more instances mvl-solver “got lucky” and had an even
smaller number of conflicts or no conflicts at all, whereas in some cases it got
stuck and needed up to 10-20 times more backtracks. In each case minisat
performed 2-4 restarts, which suggests that restarts could also be useful in
mvl-solver to avoid the bottlenecks. The performance could then become better
overall, since in the many-valued case it is easier to guess a solution to these
problems. Some preliminary testing showed that in the cases where mvl-solver
got stuck restarts do improve the situation; however, more work is needed to
provide a stable improvement on all instances using restarts.
⁹ The creators of glucose complain that most solvers were created with the aim of
improving propagation (in order to learn more clauses faster), but this is not so
important for difficult cases. Thus they look into the structure of the problems and
learned clauses, hence the idea of useful strong “glue” clauses [4].
¹⁰ 8 instances require more than 800 backtracks.
[Figure 1: number of conflicts (y-axis, 0 to 800) against the number of queens
(x-axis, 0 to 70) for mvl-solver (bk) and minisat.]
Fig. 1. minisat and mvl-solver on n-queens
4
Algorithms and Implementation
Currently there are no complete many-valued solvers available to the public
that are not based on translations to SAT or SMT. Thus the main task of the
project was to develop such a solver. I reused parts of the open source software
(written in C++) created by a master's student at the University of Minnesota in
2005: http://www.d.umn.edu/~lalx0004/research/. It contained some severe
algorithmic and implementation mistakes, but provided a good starting point.
Thus I kept the input/output part of the solver as well as most of the data
structures, but I implemented a different conflict analysis algorithm based on
Algorithm 7 in [6] and removed some redundancies. I also added the literal
watching scheme and the branching heuristics described in [7]¹¹. A disadvantage
of building upon this solver is that the choice of data structures was
restricted, which had an effect on the overall efficiency of the solver.
The basic structure of CDPLL algorithms for Boolean SAT and many-valued
SAT is the same. Decisions and propagations are made until a falsified clause is
found. Each decision literal increases the decision level. Every time a conflict
is reached, a so-called no-good (a clause representing an impossible assignment,
derived from the “reason clauses” that lead to the conflict) is learned using
a particular method (here resolution is used), typically aiming at the first
Unique Implication Point, the earliest propagation that causes the conflict. The
learnt clause is then added to the clause database and the backtrack level is
computed from it: upon backtracking the learned clause is unit, thus propagation
continues from the backtrack level. The learnt clause is implied by the original
clause set, thus its addition does not change the problem semantically. Moreover,
the most widely used VSIDS and counter-based heuristics from Boolean SAT are
easily adapted and already improve the search drastically.
Data: Problem in extended DIMACS format
Result: SAT / UNSAT
while checkSat() ≠ sat do
    if checkSat() = conflict then
        if level = 0 then
            return UNSAT
        end
        level = analyzeConflict();
        backtrack(level);
    else if checkUnit() then
        propagate(unitLiteral);
    else
        chooseLiteral();
        propagate(decisionLiteral);
    end
end
return SAT;
Algorithm 1: CDPLL
¹¹ Thanks to Irene Hiess for implementing the VSIDS heuristic.
The main difference between Boolean SAT CDPLL and many-valued SAT
CDPLL lies in the propagation phase: when a positive literal is chosen, one also
has to propagate the negative literals with the remaining values. This makes one
choice more powerful, as it corresponds to many Boolean propagations.
If all values of a variable except one have been ruled out, then the remaining
positive literal is propagated (entail literal). Moreover, generalized resolution
is used, since there are more contradicting combinations of literals than in
Boolean SAT.
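The propagation behaviour described above can be sketched on top of explicit
domains. The code below is a deliberately simplified illustration and not the
mvl-solver implementation: it uses plain backtracking without clause learning or
watched literals, it assumes non-empty initial domains, and all names (assign,
literal_state, propagate, solve) as well as the literal representation
(variable, value, is_positive) are hypothetical.

```python
def assign(domains, literal):
    """Apply a literal to the domains; return False if a domain becomes empty.

    v = x removes every other value of v, which corresponds to propagating
    all negative literals v != y at once; v != x removes only x.
    """
    var, value, positive = literal
    if positive:
        domains[var] &= {value}
    else:
        domains[var] -= {value}
    return bool(domains[var])


def literal_state(domains, literal):
    """True or False if the domains decide the literal, otherwise None."""
    var, value, positive = literal
    if value not in domains[var]:
        return not positive
    if len(domains[var]) == 1:      # only `value` is left: v = value is entailed
        return positive
    return None


def propagate(domains, clauses):
    """Unit propagation to a fixed point; False means a clause is falsified."""
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            states = [literal_state(domains, lit) for lit in clause]
            if True in states:
                continue
            undecided = [lit for lit, s in zip(clause, states) if s is None]
            if not undecided:
                return False                    # conflict
            if len(undecided) == 1:             # unit clause
                if not assign(domains, undecided[0]):
                    return False
                changed = True
    return True


def solve(domains, clauses):
    """Backtracking search with propagation; returns a model or None."""
    domains = {v: set(d) for v, d in domains.items()}
    if not propagate(domains, clauses):
        return None
    open_vars = [v for v, d in domains.items() if len(d) > 1]
    if not open_vars:
        return {v: next(iter(d)) for v, d in domains.items()}
    var = min(open_vars, key=lambda v: len(domains[v]))
    value = min(domains[var])
    for literal in ((var, value, True), (var, value, False)):
        trial = {v: set(d) for v, d in domains.items()}
        assign(trial, literal)
        model = solve(trial, clauses)
        if model is not None:
            return model
    return None


# Hypothetical instance: three pigeons and two holes (unsatisfiable).
pigeons = {p: {1, 2} for p in "abc"}
no_sharing = [[(k, j, False), (l, j, False)]
              for k, l in (("a", "b"), ("a", "c"), ("b", "c")) for j in (1, 2)]
print(solve(pigeons, no_sharing))
print(solve({"a": {1, 2}, "b": {1, 2}}, no_sharing[:2]))
```

Note how a single positive decision such as a = 1 immediately rules out the
other values of a, and how the entailed positive literal appears as soon as a
domain shrinks to one value.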
Structural Information. As I mentioned before, the main advantage of
many-valued SAT in comparison to Boolean SAT is that the structural information
is preserved in the many-valued formulation of the problem. When implementing a
solver, one can exploit this feature in the following way: given the
domain size of a variable and the number of appearances of the variable in a
clause, we have a threshold above which we know that the clause can still be
satisfied, thus we don’t have to visit such clauses. Here we don’t have to know
the specific value of a variable to be sure that the clauses where it appears are
conflict-free. This can improve the checkSat(), checkUnit() and
analyzeConflict() procedures¹².
Conflict Analysis. I implemented a resolution-based algorithm that computes
the learned clause based on the first Unique Implication Point. The difference
with Boolean SAT is that one also uses the entail clauses in the resolution:
clauses stating that a variable should take at least one value from its domain.
These are “lazy clauses” – they are invoked only when needed during conflict
analysis and are not part of the clause database. This improves the efficiency of
solving by making the formulation of the problem smaller.
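The text does not spell out the generalized resolution rule. One common way to
state such a rule is signed resolution, where the literals on the resolved
variable are read as a set of allowed values and the resolvent keeps only the
values allowed by both parents. The sketch below illustrates that rule and its
interaction with an entail clause; whether mvl-solver implements exactly this
rule is an assumption, and the function name and literal representation
(variable, value, is_positive) are hypothetical.

```python
def resolve(clause_a, clause_b, var, domains):
    """Signed resolution of two clauses on the variable `var`.

    The resolvent contains every literal of the parents on other variables,
    plus var = x for each value x that both parents allow for `var`.
    """
    def allowed(clause):
        values = set()
        for v, x, positive in clause:
            if v == var:
                values |= {x} if positive else set(domains[var]) - {x}
        return values

    rest = [lit for lit in clause_a + clause_b if lit[0] != var]
    common = allowed(clause_a) & allowed(clause_b)
    resolvent = rest + [(var, x, True) for x in sorted(common)]
    return list(dict.fromkeys(resolvent))    # drop duplicates, keep order


# Hypothetical example with dom(v) = dom(w) = {1, 2}.
domains = {"v": [1, 2], "w": [1, 2]}
entail_v = [("v", x, True) for x in domains["v"]]    # lazy clause: v=1 or v=2
c1 = [("v", 1, False), ("w", 1, True)]               # v != 1 or w = 1
c2 = [("v", 2, False), ("w", 1, True)]               # v != 2 or w = 1
step = resolve(entail_v, c1, "v", domains)
final = resolve(step, c2, "v", domains)
print(step)     # w = 1 or v = 2
print(final)    # w = 1
```

The entail clause is built only when the resolution step needs it, which
matches the "lazy clause" idea described above.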
Propagation. Compared to SAT, there are more variations of propagation,
simply because there are more types of choices possible. I follow the version which
is closest to SAT: either a positive or a negative literal is propagated. However,
it is also possible to choose or propagate several values at the same time. It is
still unclear which decisions are more interesting: choosing a positive literal
trims the search space considerably and leads to conflicts faster, especially in
the case of 2-SAT problems. However, the clauses learned after such decisions are
quite weak. Choosing a negative literal has less immediate effect, since it
removes only one value from the domain of a variable, but its propagation is
faster and can lead to stronger propagation later, after a clause is learned.
¹² As currently implemented, only the last effect is observable; in order to
perform this pre-check efficiently I am changing the data structures.
5
Future Work
5.1
Implementation
Data structures. In order to take real advantage of the mentioned many-valued
features, better data structures are needed. For instance, in order to efficiently
perform propagation after many-valued choices (when not only positive literals
are allowed to be chosen, but also literals with several possible values) one can
use a bitset representation of domains and then use the bitset operations, which
are very efficient. I am currently exploring this possibility. This way the watched
literal scheme can be improved as described above. See [8] for more details.
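A minimal sketch of the bitset idea, assuming domain values are numbered from 0
and using an arbitrary-precision Python integer as the bit mask. The class name
BitDomain and its methods are hypothetical and only illustrate which domain
operations reduce to single bitwise instructions.

```python
class BitDomain:
    """Domain of a many-valued variable stored as a bit mask.

    Bit i is set iff value i (counted from 0) is still possible.
    """

    def __init__(self, size):
        self.mask = (1 << size) - 1

    def assign(self, value):
        """v = value: keep only this value."""
        self.mask &= 1 << value

    def remove(self, value):
        """v != value: drop this value."""
        self.mask &= ~(1 << value)

    def restrict(self, allowed_mask):
        """v in A, where A is itself given as a bit mask."""
        self.mask &= allowed_mask

    def is_empty(self):
        return self.mask == 0

    def is_singleton(self):
        # A non-zero mask has exactly one bit set iff clearing its lowest
        # set bit yields zero.
        return self.mask != 0 and (self.mask & (self.mask - 1)) == 0

    def values(self):
        return [i for i in range(self.mask.bit_length())
                if (self.mask >> i) & 1]


d = BitDomain(5)            # values 0..4
d.remove(0)
d.restrict(0b01110)         # set-valued literal: v in {1, 2, 3}
print(d.values())           # [1, 2, 3]
d.remove(2)
print(d.is_singleton())     # False
d.assign(3)
print(d.values(), d.is_singleton())
```

With this representation a literal of the form v ∈ A costs the same single
intersection as a literal v = x, which is what makes set-valued choices cheap
to propagate.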
Graph-based learning. At present the clause learning relies on resolution;
however, there are other possibilities. In particular, there is a generalization
of the Unique Implication Point method specific to the many-valued setting that
permits learning stronger no-goods during conflict analysis using path/cut
computations on the implication graph of the problem. However, computing such
clauses is not linear, as it is in our case [7]. But it may pay off, since most
of the time is spent on propagation, and it may be more beneficial to avoid an
increase in propagation than an increase in time per conflict analysis.
Quality of learned clauses. In SAT solving a greedy learning scheme is used
and the emphasis is put on fast propagation, not on the quality of the learned
clauses. The currently winning solvers try to avoid this and concentrate on the
quality of the learned clauses. They rely on the idea of glue clauses [4]: learned
clauses that contain literals of only two levels. If such clauses are not removed
and the solver aims at learning them, the performance improves¹³.
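The quality measure behind glue clauses can be computed directly from the
decision levels of the literals in a learned clause. The sketch below is an
illustration under assumptions: the function names, the max_levels parameter and
the dictionary mapping each literal to the decision level at which it was
assigned are hypothetical and not part of mvl-solver.

```python
def glue_level(clause, decision_level):
    """Number of distinct decision levels among the literals of a clause."""
    return len({decision_level[literal] for literal in clause})


def is_glue_clause(clause, decision_level, max_levels=2):
    """A clause counts as glue if its literals span at most max_levels levels."""
    return glue_level(clause, decision_level) <= max_levels


# Hypothetical learned clause whose literals were assigned on levels 3 and 5.
learned = [("a", 1, False), ("b", 2, False), ("c", 1, True)]
levels = {learned[0]: 3, learned[1]: 3, learned[2]: 5}
print(glue_level(learned, levels))       # 2
print(is_glue_clause(learned, levels))   # True
```

A clause-deletion policy could then protect every clause for which
is_glue_clause returns True.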
Restart strategies. The experience of SAT shows that restarts are important
in order to avoid bottlenecks in the search (also known as the heavy-tailedness
phenomenon [10]). As we have seen from the n-queens example, the search even
on easy problems can go in wrong directions, thus such techniques should be
implemented.
Heuristics. Given the role of many-valued SAT as an intermediate between CSP
and Boolean SAT, one could also use CSP heuristics for selecting a branching
variable that have proved to be effective [9].
Benchmarks. To test the solver I developed some benchmarks in extended DIMACS
format¹⁴ (https://github.com/akinanop/mvl-solver/wiki/Benchmarks) and also
used some existing ones, namely some graph coloring problems
(www-users.cs.york.ac.uk/~frisch/NB/,
mat.gsia.cmu.edu/COLOR/instances.html#XXDSJ), random binary CSPs
(www.lirmm.fr/~bessiere/generator.html) and quasi-groups with holes
(www.cs.cornell.edu/gomes/gs-csgc.pdf). There are few difficult problems that
are not 2-SAT, thus it may be interesting to find more benchmarks of this type.
However, we know that 2-SAT in the many-valued setting is already NP-complete.
This fact could be used to gain more efficiency by specializing the solver’s
data structures and methods to 2-SAT problems.
¹³ Take inspiration from glucose: http://www.labri.fr/perso/lsimon/glucose/.
¹⁴ Thanks to Pavlo Myronov for implementing the graph coloring problems translator.
5.2
Theoretical investigation
One can explain why Boolean SAT solvers became efficient using the notion of
backdoor sets of variables [10]: a backdoor set is a set of variables of a
propositional formula such that fixing the truth values of the variables in the
backdoor set moves the formula into some polynomial-time decidable class. The
current best heuristics guess these sets, and then one solves polynomial
sub-problems. Intuitively, a small backdoor set explains how a backtrack search
can get “lucky” on certain runs: the backdoor variables are identified early on
in the search and point in the right direction. It may be interesting to see
whether modelling and solving in many-valued SAT makes it easier to identify
such sets earlier on some structured problems.
6
Conclusion
To summarize: in this project I developed a many-valued solver with several basic
conflict-driven algorithms generalized from Boolean SAT. My experience shows
that adapting the SAT algorithms to the many-valued setting can be worthwhile,
given that the generalizations come naturally and don’t require special
theoretical effort. Given the benefits of many-valued SAT solving mentioned
in the literature and exemplified by the case study here, it seems like a fruitful
direction of research.
References
1. Aloul, F.A., Ramani, A., Markov, I.L., Sakallah, K.A.: Solving difficult instances
of boolean satisfiability in the presence of symmetry. IEEE Trans. on CAD of
Integrated Circuits and Systems 22(9), 1117–1137 (2003),
http://dx.doi.org/10.1109/TCAD.2003.816218
2. Ansótegui, C., Bofill, M., Manyà, F., Villaret, M.: Automated theorem provers for
multiple-valued logics with satisfiability modulo theory solvers. Preprint submitted
to Fuzzy Sets and Systems (2015)
3. Ansótegui, C., Manyà, F.: Mapping problems with finite-domain variables into
problems with boolean variables. In: SAT 2004. pp. 1–15. Springer LNCS (2004)
4. Audemard, G., Simon, L.: Predicting learnt clauses quality in modern SAT solvers.
In: IJCAI 2009, Proceedings of the 21st International Joint Conference on Artificial
Intelligence, Pasadena, California, USA, July 11-17, 2009. pp. 399–404 (2009),
http://ijcai.org/papers09/Papers/IJCAI09-074.pdf
5. Chepoi, V., Creignou, N., Hermann, M., Salzer, G.: The Helly property and
satisfiability of boolean formulas defined on set families. Eur. J. Comb. 31(2),
502–516 (2010), http://dx.doi.org/10.1016/j.ejc.2009.03.022
6. Jain, A.: Watched literals in a finite domain sat solver: Master thesis, University
of Minnesota (2005), http://www.d.umn.edu/~jainx086/Thesis_Report.pdf
7. Jain, S., O’Mahony, E., Sellmann, M.: A complete multi-valued SAT solver. In:
Principles and Practice of Constraint Programming - CP 2010 - 16th International
Conference, CP 2010, St. Andrews, Scotland, UK, September 6-10, 2010. Proceedings. pp. 281–296 (2010), http://dx.doi.org/10.1007/978-3-642-15396-9_24
8. Liu, C., Kuehlmann, A., Moskewicz, M.W.: CAMA: A multi-valued satisfiability solver. In: 2003 International Conference on Computer-Aided Design, ICCAD 2003, San Jose, CA, USA, November 9-13, 2003. pp. 326–333 (2003),
http://doi.ieeecomputersociety.org/10.1109/ICCAD.2003.1257732
9. Refalo, P.: Impact-based search strategies for constraint programming. In: Principles and Practice of Constraint Programming - CP 2004, 10th International Conference, CP 2004, Toronto, Canada, September 27 - October 1, 2004, Proceedings.
pp. 557–571 (2004), http://dx.doi.org/10.1007/978-3-540-30201-8_41
10. Williams, R., Gomes, C., Selman, B.: On the connections between backdoors,
restarts, and heavy-tailedness in combinatorial search (2003),
http://www.aladdin.cs.cmu.edu/papers/pdfs/y2003/sat7.pdf