Details of the SAT solver

There are many resources which show that the problem of solving a Sudoku can be transformed into a Boolean Satisfiability Problem.

If a block has R rows and C columns then each row, column and block has D = R*C cells with numbers from 1 to D. There also are D rows, D columns and D blocks in the Sudoku puzzle.

The SAT solver will be invoked only after the puzzle is simplified as far as possible by standard methods like hidden singles, naked singles, block-line interaction, hidden pairs, naked pairs, basic fish etc. This preprocessing usually heavily reduces the number of variables actually needed in the SAT solver - eventually an "easy" Sudoku is even already completely solved. After the simplification the information to set up the SAT solver is contained in two arrays:

  • the array rc_n of length D2 of bitvectors of size D.
    rc_n[D*r + c] holds in its bitvector the candidates which still are possible in the cell in row r and column c.
  • the array rc_set of length D2 of integers.
    rc_set[D*r + c] holds the number n in the cell in row r and column c if it is occupied and 0 if it still empty.

A SAT solver has as input a formula in Conjunctive normal form (CNF) - usually coded in the DIMACS format. We have a boolean variable for each candidate v (1<=v<=D) in each row r (0<=r<D) and column c (0<=c<D), so there are D3 variables in total - but if the candidates could be deleted by the simplification process they are not used. In the DIMACS format the variables are expressed by natural numbers. I use following function to compute the variable name:

function varname(r, c, v: Integer): String;
begin
    result := IntToStr(r * D * D + c * D + v);
end;

The formula expressed in CNF must reflect the fact that there is exactly one number for a given row/column pair, exactly one column for a given row/number pair, exactly one row for a given column/number pair and exactly one block position for a given block/number pair in an valid Sudoku.

Suppose you have three boolean variables x1, x2 and x3 and exactly one of the variables should be true. You express this by the 4 clauses

x1 v x2 v x3 (true if at least one variable is true) and

¬ x1 v ¬ x2 (true if not both x1 and x2 are true) and

¬ x1 v ¬ x3 (true if not both x1 and x3 are true) and

¬ x2 v ¬ x3 (true if not both x2 and x3 are true).

So if you have k boolean variables and exactly one of the should be true you need k*(k - 1)/2 + 1 clauses if you generalize the method above.

Have a look at the following 4x4-Sudoku (D=4). Im principle it can be completely solved in the simplicfication phase with the standard methods but instead we completely solve it here with the SAT solver to demonstrate the method.

Two examples for cell clauses

In row 0, column 1 the number 4 is set. Hence the corresponding variable with name varname(0,1,4)="8" is true, In the DIMACS format, where each line has be terminated by "0" this is coded by the line

8 0

In the cell in row 1, column 2 the three candidates 1, 3 and 4 are possible. Exactly one of them is in the solution. The corresponding variable names are "25", "27" and "28". This leads to the clauses

25 27 28 0 (DIMACS format for "25" v "27" v "28")
-25 -27 0    (DIMACS format for ¬"25" v ¬"27")
-25 -28 0    (DIMACS format for ¬"25" v ¬"28")
-27 -28 0    (DIMACS format for ¬"27" v ¬"28")

As you can see in the DIMACS format the negation of a variable is represented by a "-" sign.

One example for row clauses

In row 1 the number 1 still is candidate in all columns 0, 1, 2 and 3. The corresponding variable names are "17", "21", "25" and "29". Exactly one of the variables is true. So we get

17 21 25 29 0
-17 -21 0
-17 -25 0
-17 -29 0
-21 -25 0
-21 -29 0
-25 -29 0

The same principle holds for the column clauses and the block clauses. The resulting CNF-formula has 64 (43) variables and 130 clauses. In the DIMACS format this has to be specified at the beginning of the file:

p cnf 64 130

Here is the complete cnf.txt file for this example.

And this for example is the general code to generate the clauses for the columns.


rcQ(row, col, num - 1) returns true iff the cell in row r and column c contains the candidate num.
n_clauses is a global variable which counts the number of clauses in the CNF-formula.

procedure column_clauses(clauses: TStringList);
var
   row, row2, col, num: Integer;
   s: String;
begin
   for col := 0 to D - 1 do
     for num := 1 to D do
     begin
         // each column contains each number at least once, dim^2 clauses
         s := '';
         for row := 0 to D - 1 do // iterate over rows of column
             if rcQ(row, col, num - 1) then
                 s := s + varname(row, col, num) + ' ';
         if s <> '' then
         begin
            clauses.Add(s + '0');
            Inc(n_clauses);
         end;
          // never two same numbers in one column, D^2*Binomial(D,2) clauses
         for row := 0 to D - 2 do
             if rcQ(row, col, num - 1) or (rc_set[D * row + col] = num) then
                  for row2 := row + 1 to D - 1 do
                      if rcQ(row2, col, num - 1) or (rc_set[D * row2 + col] = num) then
                      begin
                          clauses.Add('-' + varname(row, col, num) + ' -' + varname(row2,
                          col, num) + ' 0');
                          Inc(n_clauses);
                     end;
     end;//for num
end;

Once the cnf.txt file has been created the sat4j SAT solver is invoked with

java.exe -server -jar org.sat4j.core.jar cnf.txt

from the application.

The raw output from the SAT solver contains the two lines

s SATISFIABLE
v 2 8 -9 11 13 17 -19 -21 23 -25 -27 28 -29 30 -32 -33 36 37 42 47 -49 51 -52 -53 54 -55 57 -60 -61 64 0'

The first line indicates that there are values for the variables such that all clauses are satisfied. In the second line these values are displayed. Interesting for the solution are those variables which are true. This are are those 16 variables without "-" sign: 2, 8, 11, 13, 17, 23, 28, 30, 36, 37, 42, 47, 51, 54, 57 and 64.

For 28 the inverse of the varname function gives for example r = 1, c = 2, v = 4. The complete solution is

With a SAT solver it is also very easy to check if the found solution is unique. Just add one more clause which is false exactly for the found solution. In the example above this is the clause

-2 -8 -11 -13 -17 -23 -28 -30 -36 -37 -42 -47 -51 -54 -57 -64 0

Now the SAT solver returns UNSATISFIABLE which shows that the found solution was unique.

< Home > < Program usage >

© 2018  Herbert Kociemba