
Exploring Tractability in Finitely-Valued SAT Solving⋆

Nika Pona
Vienna University of Technology

Abstract. In this paper I describe the progress, preliminary results and future directions of a project to implement a many-valued SAT solver based on a generalization of the algorithms used in modern Boolean SAT solvers. Mimicking Boolean SAT solvers minimizes the algorithm-design and implementation challenges of such a task, since many ideas can be easily adapted to the many-valued setting. Experimental results show that even at an early stage of development a many-valued solver can perform better on some problems than modern Boolean SAT solvers.

1 Introduction and Motivation

The starting idea of the project was to see whether current many-valued solvers could be improved using theoretical results on the complexity of finitely-valued logics [5]. A survey of the state of the art showed that no complete many-valued SAT solvers are available, and that the most common approach to solving problems modelled as many-valued formulae is to reduce them to Boolean SAT¹ or to Satisfiability Modulo Theories (SAT with Linear Arithmetic Theory) [2]². Some complete many-valued solvers were implemented previously and the results seemed promising [7] [8], but the projects were discontinued and the software is no longer available. Thus the first task became to implement a many-valued SAT solver.

1.1 Why Many-Valued SAT?

SAT solving has enjoyed a lot of success in the last two decades due to the organized effort of a growing community of researchers. Since any finitely-valued logic formula can be efficiently mapped to an equisatisfiable Boolean formula by encoding the information about the many-valued domain with additional constraints (cf. [3]), it may seem that investing time in a separate many-valued solver is superfluous.
There are two reasons to think that such an implementation effort can be interesting:

Many-valued SAT as generalized SAT. Investigating many-valued logics has proved useful in complexity and proof theory, where Boolean logic is seen as a special case with two truth values. One can expect that similar results can be achieved for algorithms for the SAT problem. The conflict-driven DPLL algorithms that are the basis of all modern SAT solvers generalize easily to the many-valued setting: the literal watching scheme, the Unique Implication Point learning method and counter-based decision heuristics can be implemented in basically the same way as in a SAT solver (more on this below). This means that the effort required to design and implement a many-valued solver is relatively small; at the same time, looking at the Boolean SAT algorithms as special cases of a more general scheme can provide useful insights into SAT solving.

⋆ This project is supported by the Austrian Science Fund (FWF): I836-N23.
¹ For a description of the most common encodings and their properties, see the references here: http://bach.istc.kobe-u.ac.jp/sugar/
² There is a solver available online that uses this approach: http://www.iiia.csic.es/~amanda/files/2012/NiBLoS.zip.

CSP and many-valued SAT. Another reason to look into many-valued SAT is that it can be seen as an intermediate language between Constraint Satisfaction Problems (CSP) and SAT, or even as a better alternative to SAT when it comes to CSP solving. We know that CSP can be efficiently translated into SAT, and this fact has been used in the CSP community to develop solvers. For instance, the CSP solver Sugar is used by the Scala constraints language (http://bach.istc.kobe-u.ac.jp/sugar/, http://bach.istc.kobe-u.ac.jp/copris/); furthermore, several solvers of the MiniZinc CSP Challenge 2015 are based on translations to SAT as well (see http://www.minizinc.org/).
This is the easiest, but not necessarily the most effective approach, since the encodings can become quite big and, most importantly, the structure of the formula is lost and many unnecessary propagations are made. One can translate a CSP into a many-valued CNF formula by representing each no-good of every constraint as a clause. Such a translation preserves the structure (domains) of the original problem, thus it may be more efficient to use a many-valued SAT solver as the back-end of a generic CSP solver. Below I provide an example that supports this claim.

1.2 Overview

The main point of this presentation is to show that creating a competitive solver for many-valued logic is not as challenging as it may seem, and that given the advantages of many-valued modelling it is a potentially fruitful direction of research. To the paper I attach the core solver³, which implements several versions of a basic conflict-driven algorithm with a resolution-based learning procedure: https://github.com/akinanop/mvl-solver. After the basic definitions of Section 2, I first provide empirical results from testing the implemented solver and some theoretical remarks on the advantages of many-valued solving (Section 3), since they provide the motivation for the implementation task undertaken. In particular, I give an example where modelling a problem as a many-valued formula and solving it directly with a many-valued solver is significantly (one to two orders of magnitude in solving time alone) more efficient than formalizing it as a Boolean formula and using a SAT solver, even a competitive one. Then in Section 4 I describe the general idea of the implementation, and finally in Section 5 I point to further directions of development of the project.

³ For more details on the actual implementation, see the readme file and the wiki pages of the project.

2 Definitions

Definition 1 (Many-valued SAT).
A many-valued SAT problem P = (V, D, C) is specified by a finite set V of variables, a collection D of sets (domains) and a set C of clauses. Each variable v ∈ V has an associated finite domain dom(v) ∈ D. To solve the many-valued SAT problem P is to determine whether there is an interpretation that satisfies all clauses in C.

Definition 2 (Literal, clause). A clause is a finite set of literals. A literal is an expression of the form v = x or v ≠ x, where v ∈ V and x ∈ dom(v). A literal of the form v = x is called positive; a literal of the form v ≠ x is called negative.

Alternatively, one could consider many-valued literals of the form v ∈ A with A ⊆ dom(v). The former representation is closer to Boolean SAT and thus permits easier adaptation of the Boolean SAT algorithms. In particular, the input to the many-valued SAT solver can be given in a format similar to the DIMACS format of Boolean SAT⁴. Although the second formulation can give some advantages to many-valued SAT, it departs from Boolean SAT and may pose additional implementation challenges, so I leave its exploration for later. See, for instance, [8] for such a formulation.

Definition 3 (Interpretation, model). An interpretation is a function mapping each variable v ∈ V to a value from dom(v). An interpretation I satisfies a positive literal v = x if I(v) = x, and satisfies a negative literal v ≠ x if I(v) ≠ x. An interpretation satisfies a clause if it satisfies at least one of the literals of the clause. A model of a set of clauses is an interpretation that satisfies every clause in the set.

3 Modelling Advantage of Many-valued SAT

I use the developed solver to show that solving some problems directly as many-valued problems can have a significant advantage. The authors of the previous attempts to create a complete many-valued solver argued that overall their solvers performed better than the Boolean SAT solvers [7] [8]. However, these projects date back 6 and 13 years respectively, thus it is possible that the progress in SAT solving of the last decade has made these results obsolete.
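Before turning to the experiments, Definitions 1 to 3 can be made concrete with a small sketch. It is only an illustration under my own assumptions: the tuple representation of literals, the function names and the toy instance are hypothetical and are not taken from mvl-solver, and the exhaustive search is exponential and serves only to spell out what "satisfies" means.

```python
from itertools import product

# A literal is a triple (variable, value, positive):
# ("a", 2, True) stands for a = 2, ("a", 2, False) for a ≠ 2.
# A clause is a list of literals; domains map each variable to a finite set.

def satisfies_literal(interp, literal):
    """Definition 3: I satisfies v = x iff I(v) = x, and v ≠ x iff I(v) ≠ x."""
    var, value, positive = literal
    return (interp[var] == value) == positive

def satisfies_clause(interp, clause):
    """A clause is satisfied if at least one of its literals is satisfied."""
    return any(satisfies_literal(interp, lit) for lit in clause)

def find_model(domains, clauses):
    """Try every interpretation (illustration only, exponential time)."""
    variables = sorted(domains)
    for values in product(*(sorted(domains[v]) for v in variables)):
        interp = dict(zip(variables, values))
        if all(satisfies_clause(interp, c) for c in clauses):
            return interp
    return None  # no interpretation satisfies all clauses

# Hypothetical toy instance: two variables over the domain {0, 1, 2}.
domains = {"a": {0, 1, 2}, "b": {0, 1, 2}}
clauses = [
    [("a", 0, True), ("b", 1, True)],    # a = 0 or b = 1
    [("a", 0, False)],                   # a ≠ 0
    [("a", 1, False), ("b", 1, False)],  # a ≠ 1 or b ≠ 1
]
model = find_model(domains, clauses)     # {"a": 2, "b": 1}
```

The only model of this instance assigns a = 2 and b = 1; adding the unit clause a ≠ 2 makes it unsatisfiable.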
Below I show an example of what can be called an intrinsic advantage of the many-valued formulation of a problem: in this case, despite the implementation advances in Boolean SAT, the many-valued solver still performs better. In particular, I compare the developed many-valued solver to minisat and some competitive solvers on the pigeonhole problem and the n-queens problem. Moreover, I show that encoding a problem into Boolean SAT via a many-valued formulation already gives an advantage in the search.

⁴ For the exact specification see here: https://github.com/akinanop/mvl-solver/wiki/Extended-DIMACS-format

3.1 Pigeonhole problem

The pigeonhole problem (PHP) is a famous unsatisfiable problem: despite its easy formulation, “it is impossible to fit n pigeons into n − 1 holes such that each hole contains at most one pigeon”, its unsatisfiability is known to be difficult to prove by automatic means⁵. I consider the following encodings of the PHP:

SAT. The Boolean SAT PHP is usually formulated as a CNF formula with variables x_{ij} for each pair i ∈ [n] and j ∈ [n − 1], and with two types of clauses:
1. ⋁_{m ∈ [n−1]} x_{im} for i ∈ [n];
2. ¬x_{km} ∨ ¬x_{lm} for k ≠ l ∈ [n] and m ∈ [n − 1].

MVL. The many-valued SAT PHP consists of n variables, each with a domain of size n − 1. The domain declarations express the condition that each pigeon should be placed in some hole, and the clauses k ≠ j ∨ l ≠ j for k ≠ l ∈ [n] and j ∈ [n − 1] express the condition that no two pigeons should be placed in the same hole.

MVL-SAT. Additionally I consider a different Boolean SAT formulation of the PHP, created by automatically translating the many-valued PHP into a Boolean formula using the linear encoding described in [6]. Replace each (negative) literal of the many-valued PHP with a (negated) Boolean variable. As in the SAT encoding, add clauses of type 1.
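As a cross-check of the SAT and MVL encodings described so far, the following sketch generates both and counts their variables and clauses. The function names, the tuple form of many-valued literals and the DIMACS-style integer numbering of the Boolean variables are my own assumptions for illustration; the linear MVL-SAT encoding is not covered.

```python
from itertools import combinations

def php_mvl(n):
    """Many-valued PHP: one variable per pigeon, domain = holes 1..n-1."""
    domains = {pigeon: set(range(1, n)) for pigeon in range(1, n + 1)}
    # For pigeons k < l and every hole j: k ≠ j or l ≠ j.
    # A literal is (variable, value, positive); False marks a negative literal.
    clauses = [[(k, j, False), (l, j, False)]
               for k, l in combinations(range(1, n + 1), 2)
               for j in range(1, n)]
    return domains, clauses

def php_sat(n):
    """Boolean PHP: x(i, m) is true when pigeon i sits in hole m."""
    def x(i, m):
        return (i - 1) * (n - 1) + m  # variables numbered 1..n(n-1)
    # Type 1: every pigeon is placed in some hole.
    clauses = [[x(i, m) for m in range(1, n)] for i in range(1, n + 1)]
    # Type 2: no two pigeons share a hole.
    clauses += [[-x(k, m), -x(l, m)]
                for k, l in combinations(range(1, n + 1), 2)
                for m in range(1, n)]
    return n * (n - 1), clauses

mvl_domains, mvl_clauses = php_mvl(10)   # 10 variables, 405 clauses
sat_vars, sat_clauses = php_sat(10)      # 90 variables, 415 clauses
```

For n = 10 the counts agree with the MVL and SAT columns of Table 1: the Boolean encoding needs the n extra clauses of type 1 because it has no domain declarations.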
Furthermore, for each many-valued variable v, introduce |dom(v)| − 1 new Boolean variables v_i, which are used to enforce the property that at most one value is assigned to the variable. This introduces only a linear increase in the size of the original problem, unlike the naive encoding via binary inequalities v ≠ i ∨ v ≠ j. For i ∈ {2, . . . , |dom(v)| − 1} add:
1. ¬v_{i−1} ∨ v ≠ i;
2. v ≠ i ∨ v_i;
3. ¬v_i ∨ v_{i−1} ∨ v = i;
4. ¬v_1 ∨ v = 1.

Below are the characteristics of these encodings:

Table 1. Number of variables and clauses on PHP with n = 10 . . . 15 (the number of MVL variables equals n)

         MVL                  SAT                MVL-SAT
variables  clauses    variables  clauses    variables  clauses
    10        405         90        415        170        725
    11        550        110        561        209        946
    12        726        132        738        252       1206
    13        936        156        949        299       1508
    14       1183        182       1197        350       1855
    15       1470        210       1485        405       2250

⁵ Resolution-based proofs of unsatisfiability of the pigeonhole problem have exponential lower bounds. Pure CDPLL algorithms for SAT are not stronger than resolution, thus this result carries over. However, this can be improved by introducing so-called symmetry breaking clauses [1]. For instance, lingeling, one of the winning solvers in 2015, uses symmetry-detecting preprocessing and thus solves PHP instances fast: http://fmv.jku.at/papers/BiereLeBerreLoncaManthey-SAT14.pdf.

Below you can see that mvl-solver needs less time than minisat on both Boolean formulations of PHP⁶. On n = 15 neither minisat nor the modern 2014-2015 winning solvers glucose and COMiniSatPS⁷ terminated within 24 hours, whereas mvl-solver finished within 10-17 hours with both heuristics⁸. Since the architecture of mvl-solver is quite basic and not very efficient yet (in particular, propagation is very slow: on big satisfiable graph coloring instances where only extensive propagation is needed, mvl-solver performs slowly compared to minisat, which finishes instantly), one can conclude that the difference lies in the modelling advantage of many-valued SAT.

Table 2. Times (s) on PHP with n = 10 . . . 15 (t/o: timeout)

          minisat          mvl-solver     COMiniSatPS
 n    MVL-SAT     SAT      BK    VSIDS        SAT
15       t/o      t/o    17hrs   10hrs        t/o
13     19hrs      t/o     3239     999      13hrs
12      1061     1624      414     170        450
11        49       82       44      26         24
10         3        6        4       3          3

Below I also provide other statistics on this problem: the number of conflicts is significantly smaller for mvl-solver, which is responsible for its better performance, since propagation is slow due to the experimental implementation.

⁶ All tests were done on a machine with an Intel Core i3-6100 CPU @ 3.70GHz × 4 processor and 7.7 GB memory.
⁷ For the results of the 2015 SAT Race see here: http://baldur.iti.kit.edu/sat-race-2015/index.php?cat=results
⁸ BK chooses the literal that maximizes the propagation effect based on the currently unsatisfied clauses; VSIDS chooses the literal that occurs in more clauses, and the counts for all the literals in the theory are divided by 2 after a learned clause is added to the clause set.

Table 3. Other statistics on PHP with n = 10

                 MVL-SAT         SAT    MVL (VSIDS)
Restarts            1023        2047              0
Conflicts         472432     1034642           1793
Decisions         522643     1243538           1793
Propagations     7077471    12935371          50778
CPU time (s)           3           6              3

From this table one can see the second interesting result: the decrease in all indicators for the MVL-SAT encoding compared to the SAT encoding. The additional constraints added from the MVL encoding help trim the search space considerably. This confirms that exploiting structural information through the MVL encoding can be beneficial on difficult but structured problems⁹.

3.2 N-queens

I also compared the performance of minisat and mvl-solver on the n-queens problems for n = 4 . . . 70, which are typically not very difficult, albeit large, satisfiable problems. See Figure 1 below for the results of the tests¹⁰. Here the advantages are not as clear as in the case of the pigeonhole problem (overall the solver performs worse time-wise), but some observations can nevertheless be made.
The number of conflicts in minisat is quite small (fewer than 300 on any instance); however, on many instances mvl-solver “got lucky” and had an even smaller number of conflicts or no conflicts at all. On the other hand, in some cases it got stuck and needed up to 10-20 times more backtracks. In each case minisat performed 2-4 restarts, which suggests that restarts could also be useful in mvl-solver to avoid the bottlenecks. The performance could then become better overall, since in the many-valued case it is easier to guess a solution to these problems. Some preliminary testing showed that in cases where mvl-solver got stuck, restarts do improve the situation; however, more work is needed to provide a stable improvement on all instances using restarts.

[Figure: number of conflicts (y-axis, 0 to 800) against number of queens (x-axis, 0 to 70) for mvl-solver (bk) and minisat.]
Fig. 1. minisat and mvl-solver on n-queens

⁹ The creators of glucose complain that most solvers were created with the aim of improving propagation (in order to learn more clauses faster), but this is not so important for difficult cases. Thus they look into the structure of the problems and learned clauses, hence the idea of useful strong “glue” clauses [4].
¹⁰ 8 instances require more than 800 backtracks.

4 Algorithms and Implementation

Currently there are no complete many-valued solvers available to the public that are not based on translations to SAT or SMT. Thus the main task of the project was to develop such a solver. I reused parts of the open source software (written in C++) created by a Master's student at the University of Minnesota in 2005: http://www.d.umn.edu/~lalx0004/research/. It contained some severe algorithmic and implementation mistakes, but provided a good starting point. Thus I kept the input/output part of the solver as well as most of the data structures, but I implemented a different conflict analysis algorithm based on Algorithm 7 in [6] and removed some redundancies.
I also added the literal watching scheme and the branching heuristics described in [7]¹¹. A disadvantage of building upon this solver is that the choice of data structures was restricted, which affected the overall efficiency of the solver.

The basic structure of CDPLL algorithms for Boolean SAT and many-valued SAT is the same. Decisions and propagations are made until a falsified clause is found. Each decision literal increases the decision level. Every time a conflict is reached, a so-called no-good (a clause representing an impossible assignment, derived from the “reason clauses” that led to the conflict) is learned using a particular method (here resolution is used), typically aiming at the first Unique Implication Point, that is, the Unique Implication Point closest to the conflict. The learnt clause is then added to the clause database and the backtrack level is computed from it: upon backtracking the learned clause is unit, thus the propagation continues from the backtrack level. The learnt clause is implied by the original clause set, thus the addition doesn’t change the problem semantically. Moreover, the most used VSIDS and counter-based heuristics from Boolean SAT are easily adapted and already improve the search drastically.

    Data: Problem in extended DIMACS format
    Result: SAT / UNSAT
    while checkSat() ≠ sat do
        if checkSat() = conflict then
            if level = 0 then
                return UNSAT
            end
            level = analyzeConflict();
            backtrack(level);
        end
        else if checkUnit() then
            propagate(unitLiteral);
        end
        else
            chooseLiteral();
            propagate(decisionLiteral);
        end
    end
    return SAT;
Algorithm 1: CDPLL

¹¹ Thanks to Irene Hiess for implementing the VSIDS heuristic.

The main difference between Boolean SAT CDPLL and many-valued SAT CDPLL lies in the propagation phase: when a positive literal is chosen, one also has to propagate the negative literals with the remaining values. This makes one choice more powerful, as it corresponds to many Boolean propagations.
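The propagation rule for a chosen positive literal can be sketched on explicit domains as follows. The sketch also handles negative literals, including the case where removing a value leaves a single candidate that is then forced. It is a minimal illustration under my own assumptions: the function name, the set-based domains and the return convention are hypothetical and do not reflect the data structures of mvl-solver.

```python
def propagate_literal(domains, var, value, positive):
    """Make the literal var = value (positive) or var ≠ value true.

    domains maps each variable to the set of values still possible.
    Returns the literals that became true, or None on a conflict.
    """
    remaining = domains[var]
    if positive:
        if value not in remaining:
            return None  # conflict: the value was already excluded
        # Choosing var = value also propagates var ≠ x for every other x.
        implied = [(var, value, True)]
        implied += [(var, other, False) for other in sorted(remaining - {value})]
        domains[var] = {value}
        return implied
    if value not in remaining:
        return []  # var ≠ value already holds, nothing new
    remaining = remaining - {value}
    if not remaining:
        return None  # conflict: no value left for var
    domains[var] = remaining
    implied = [(var, value, False)]
    if len(remaining) == 1:
        # Only one value is left, so the positive literal is forced.
        (last,) = remaining
        implied.append((var, last, True))
    return implied

# One positive choice on a domain of size 3 yields three true literals.
domains = {"p": {1, 2, 3}}
implied = propagate_literal(domains, "p", 2, True)
# implied == [("p", 2, True), ("p", 1, False), ("p", 3, False)]
```

A single positive decision on a variable with k remaining values makes k literals true at once, which is the sense in which one many-valued choice corresponds to several Boolean propagations.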
If all values of a variable except one are excluded, then the remaining positive literal is propagated (entail literal). Moreover, generalized resolution is used, since there are more contradicting combinations of literals than in Boolean SAT.

Structural Information. As I mentioned before, the main advantage of many-valued SAT in comparison to Boolean SAT is that the structural information is preserved in the many-valued formulation of the problem. When implementing a solver, one can exploit this feature in the following way: given the domain size of a variable and the number of appearances of the variable in a clause, we have a threshold above which we know that the clause can still be satisfied, thus we don’t have to visit such clauses. Here we don’t have to know the specific value of a variable to be sure that the clauses where it appears are conflict-free. This can improve the checkSat(), checkUnit() and analyzeConflict() procedures¹².

Conflict Analysis. I implemented a resolution-based algorithm that computes the learned clause based on the first Unique Implication Point. The difference from Boolean SAT is that one also uses the entail clauses in the resolution: clauses stating that a variable should take at least one value from its domain. These are “lazy clauses”: they are invoked only when needed during conflict analysis and are not part of the clause database. This improves the efficiency of solving by making the formulation of the problem smaller.

Propagation. Compared to SAT, there are more variations of propagation, simply because there are more types of choices possible. I follow the version which is closest to SAT: either a positive or a negative literal is propagated. However, it is also possible to choose or propagate several values at the same time. It is still unclear which decisions are more interesting: choosing a positive literal trims the search space considerably and leads to conflicts faster, especially in the case of 2-SAT problems.
However, the learned clauses after such decisions are quite weak. Choosing a negative literal has a less immediate effect, since it removes only one value from the domain of a variable, but propagating it is faster and can lead to stronger propagation later, after a clause is learned.

¹² As currently implemented only the last effect is observable; in order to perform this pre-check efficiently I am changing the data structures.

5 Future Work

5.1 Implementation

Data structures. In order to take real advantage of the mentioned many-valued features, better data structures are needed. For instance, in order to efficiently perform propagation after many-valued choices (when not only positive literals are allowed to be chosen, but also literals with several possible values), one can use a bitset representation of domains and then use bitset operations, which are very efficient. I am currently exploring this possibility. This way the watched literal scheme can be improved as described above. See [8] for more details.

Graph-based learning. Currently the clause learning relies on resolution; however, there are other possibilities. In particular, there is a generalization of the Unique Implication Point method specific to the many-valued setting that makes it possible to learn stronger no-goods during conflict analysis using path/cut computations on the implication graph of the problem. However, computing such clauses is not linear as in our case [7]. But it may pay off, since most of the time is spent on propagation, and it may be more beneficial to avoid an increase in propagation than an increase in time per conflict analysis.

Quality of learned clauses. In SAT solving a greedy learning scheme is used and the emphasis is put on fast propagation, not on the quality of learned clauses. The currently winning solvers try to avoid this and concentrate on the quality of the learned clauses. They rely on the idea of glue clauses [4]: learned clauses that contain literals of only two levels.
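The glue-clause criterion just stated can be sketched as a count of distinct decision levels in a learned clause (the measure that glucose calls the literal block distance [4]). The function names, the tuple form of literals and the `level_of` map are hypothetical and chosen only for this illustration.

```python
def decision_levels(clause, level_of):
    """Set of decision levels at which the variables of a clause were assigned."""
    return {level_of[var] for var, _value, _positive in clause}

def is_glue(clause, level_of):
    """A glue clause contains literals from exactly two decision levels."""
    return len(decision_levels(clause, level_of)) == 2

# Hypothetical assignment levels for four variables.
level_of = {"a": 3, "b": 3, "c": 1, "d": 2}
learned = [("a", 1, False), ("b", 2, False), ("c", 0, False)]
glue = is_glue(learned, level_of)  # True: only levels 3 and 1 occur
```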
If such clauses are not removed and the solver aims at learning them, the performance improves¹³.

Restart strategies. The experience of SAT shows that restarts are important in order to avoid bottlenecks in the search (also known as the heavy-tailedness phenomenon [10]). As we have seen from the n-queens example, the search can go in wrong directions even on easy problems, thus such techniques should be implemented.

Heuristics. Given the role of many-valued SAT as an intermediate between CSP and Boolean SAT, one could also use CSP heuristics for selecting a branching variable that have proved to be effective [9].

Benchmarks. To test the solver I developed some benchmarks in extended DIMACS format¹⁴ (https://github.com/akinanop/mvl-solver/wiki/Benchmarks), and also used some existing ones. Namely, some graph coloring problems: www-users.cs.york.ac.uk/~frisch/NB/, mat.gsia.cmu.edu/COLOR/instances.html#XXDSJ; random binary CSP: www.lirmm.fr/~bessiere/generator.html; quasi-groups with holes: www.cs.cornell.edu/gomes/gs-csgc.pdf. There are few difficult problems that are not 2-SAT, thus it may be interesting to find more benchmarks of this type. However, we know that 2-SAT in the many-valued setting is already NP-complete. This fact could be used to gain more efficiency by specializing the solver’s data structures and methods to 2-SAT problems.

¹³ Take inspiration from glucose: http://www.labri.fr/perso/lsimon/glucose/.
¹⁴ Thanks to Pavlo Myronov for implementing the graph coloring problems translator.

5.2 Theoretical investigation

One can explain why Boolean SAT solvers became efficient using the notion of backdoor sets of variables [10]: a backdoor set is a set of variables of a propositional formula such that fixing the truth values of the variables in the backdoor set moves the formula into some polynomial-time decidable class. The current best heuristics guess these sets, and then one solves polynomial sub-problems.
Intuitively, a small backdoor set explains how a backtrack search can get “lucky” on certain runs: the backdoor variables are identified early on in the search and point in the right direction. It may be interesting to see whether modelling and solving in many-valued SAT makes it easier to identify such sets earlier on some structured problems.

6 Conclusion

To summarize: in this project I developed a many-valued solver with several basic conflict-driven algorithms generalized from Boolean SAT. My experience shows that adapting the SAT algorithms to the many-valued setting can be worthwhile, given that the generalizations come naturally and don’t require special theoretical effort. Given the benefits of many-valued SAT solving mentioned in the literature and exemplified by the case study here, it seems like a fruitful direction of research.

References

1. Aloul, F.A., Ramani, A., Markov, I.L., Sakallah, K.A.: Solving difficult instances of Boolean satisfiability in the presence of symmetry. IEEE Trans. on CAD of Integrated Circuits and Systems 22(9), 1117–1137 (2003), http://dx.doi.org/10.1109/TCAD.2003.816218
2. Ansótegui, C., Bofill, M., Manyà, F., Villaret, M.: Automated theorem provers for multiple-valued logics with satisfiability modulo theory solvers. Preprint submitted to Fuzzy Sets and Systems (2015)
3. Ansótegui, C., Manyà, F.: Mapping problems with finite-domain variables into problems with Boolean variables. In: SAT 2004. pp. 1–15. Springer LNCS (2004)
4. Audemard, G., Simon, L.: Predicting learnt clauses quality in modern SAT solvers. In: IJCAI 2009, Proceedings of the 21st International Joint Conference on Artificial Intelligence, Pasadena, California, USA, July 11-17, 2009. pp. 399–404 (2009), http://ijcai.org/papers09/Papers/IJCAI09-074.pdf
5. Chepoi, V., Creignou, N., Hermann, M., Salzer, G.: The Helly property and satisfiability of Boolean formulas defined on set families. Eur. J. Comb. 31(2), 502–516 (2010), http://dx.doi.org/10.1016/j.ejc.2009.03.022
6. Jain, A.: Watched literals in a finite domain SAT solver. Master's thesis, University of Minnesota (2005), http://www.d.umn.edu/~jainx086/Thesis_Report.pdf
7. Jain, S., O’Mahony, E., Sellmann, M.: A complete multi-valued SAT solver. In: Principles and Practice of Constraint Programming - CP 2010 - 16th International Conference, CP 2010, St. Andrews, Scotland, UK, September 6-10, 2010. Proceedings. pp. 281–296 (2010), http://dx.doi.org/10.1007/978-3-642-15396-9_24
8. Liu, C., Kuehlmann, A., Moskewicz, M.W.: CAMA: A multi-valued satisfiability solver. In: 2003 International Conference on Computer-Aided Design, ICCAD 2003, San Jose, CA, USA, November 9-13, 2003. pp. 326–333 (2003), http://doi.ieeecomputersociety.org/10.1109/ICCAD.2003.1257732
9. Refalo, P.: Impact-based search strategies for constraint programming. In: Principles and Practice of Constraint Programming - CP 2004, 10th International Conference, CP 2004, Toronto, Canada, September 27 - October 1, 2004, Proceedings. pp. 557–571 (2004), http://dx.doi.org/10.1007/978-3-540-30201-8_41
10. Williams, R., Gomes, C.P., Selman, B.: On the connections between backdoors, restarts, and heavy-tailedness in combinatorial search (2003), http://www.aladdin.cs.cmu.edu/papers/pdfs/y2003/sat7.pdf