
This is not a question; it is a curious discovery I wanted to share.

The most common hashing algorithm in chess programming is, I believe, Zobrist hashing. However, I may have found a different way of hashing. Beware that I'm just an amateur, so this could be entirely wrong. I'm confident this method can be considered hashing, but I don't know how to prove it. I also don't know whether it is new, but I couldn't find anything about it on the internet.

The algorithm is actually pretty simple to conceptualize: for a given position, you count the number of attackers of each square, for White and for Black, and that is basically it. If we say the maximum number of attackers per square is 16, we can store each count in four bits, so the white and black attacker counts for all 64 squares fit in only 64 bytes.

Why would the maximum be only 16? Because in any legal position each side has at most 16 pieces, so the number of attackers of a square can be at most 16. As I said, this only holds for legal positions. If you allow any position, the maximum number of attackers of a square would be:

  • 8 (for each square where a knight could be attacking) +
  • 14 (for each square a vertical/horizontal moving piece could attack us from) +
  • 13 (for each square a diagonal moving piece could attack us from),

for a total of 35. From a computer-science standpoint, this number is not as manageable as the first, which is why I consider only legal positions.

The way I conceptualize it, the first 32 bytes are for White and the rest for Black, with the first byte holding squares one and two, a1 and b1 respectively, and so on for every byte.
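To make this concrete, here is a minimal Python sketch of the scheme as I read it. Attacks are counted as if the board were empty (see the first edit below), squares are indexed a1 = 0 through h8 = 63, and a position is a dict mapping a square to a (colour, piece) pair; the indexing and input format are my own assumptions.

    # Step offsets as (file, rank) pairs.
    KNIGHT_STEPS = [(1, 2), (2, 1), (2, -1), (1, -2),
                    (-1, -2), (-2, -1), (-2, 1), (-1, 2)]
    KING_STEPS = [(1, 0), (1, 1), (0, 1), (-1, 1),
                  (-1, 0), (-1, -1), (0, -1), (1, -1)]
    ROOK_RAYS = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    BISHOP_RAYS = [(1, 1), (1, -1), (-1, 1), (-1, -1)]

    def attacked_squares(square, color, piece):
        """Squares attacked by `piece` from `square` on an otherwise empty board."""
        f, r = square % 8, square // 8
        if piece in 'RBQ':  # sliders: walk each ray to the board edge
            rays = {'R': ROOK_RAYS, 'B': BISHOP_RAYS,
                    'Q': ROOK_RAYS + BISHOP_RAYS}[piece]
            targets = []
            for df, dr in rays:
                nf, nr = f + df, r + dr
                while 0 <= nf < 8 and 0 <= nr < 8:
                    targets.append(nr * 8 + nf)
                    nf, nr = nf + df, nr + dr
            return targets
        if piece == 'P':  # pawns attack diagonally forward only
            dr = 1 if color == 'w' else -1
            steps = [(1, dr), (-1, dr)]
        else:
            steps = KNIGHT_STEPS if piece == 'N' else KING_STEPS
        return [(r + dr) * 8 + (f + df) for df, dr in steps
                if 0 <= f + df < 8 and 0 <= r + dr < 8]

    def attack_hash(position):
        """64-byte hash: bytes 0-31 white, 32-63 black; two squares per byte,
        low nibble for the even square (a1, c1, ...), high nibble for the odd
        one. Note a count of 16 would overflow a nibble; the post assumes
        4 bits are enough."""
        counts = {'w': [0] * 64, 'b': [0] * 64}
        for square, (color, piece) in position.items():
            for target in attacked_squares(square, color, piece):
                counts[color][target] += 1
        out = bytearray(64)
        for half, color in enumerate('wb'):
            for sq in range(64):
                out[half * 32 + sq // 2] |= (counts[color][sq] & 0x0F) << (4 * (sq % 2))
        return bytes(out)

For example, attack_hash({4: ('w', 'K'), 60: ('b', 'K')}) hashes a bare-kings position.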

A hashing algorithm must not yield the same result for two different inputs and must be difficult to work your way backwards.

This is the case for this algorithm: it is easy to compute the number of attackers per square, but hard to go the other way around. I also believe, although as I said I have no real proof of it, that every hash is unique to one position. Each hash is 64 bytes, that is, 512 bits, so there are in theory 2^512 different hashes, even though some of them (or maybe most) are impossible to reach.

A chess board has around 5 * 10^44 legal positions, while 2^512 is about 1.3 * 10^154, a number far larger than the number of legal chess positions, so the chance of a hash repeating is really low. If you can prove, or show me how to prove, that the hashes don't (or do) repeat, please do it.

A really nice property of this hash is that if you sum all the white attacker counts and subtract all the black attacker counts, you get an evaluation of the position that is somewhat close to Stockfish's. Here is why:

A pawn attacks 2 squares when it is not on the edge of the board, so it contributes 2 to the eval. A knight attacks 8 squares when it is in the middle, but fewer when it is on the edge or in a corner, so the eval naturally prefers knights in the middle over knights on the side, which is good. A similar thing happens with all the other pieces, except for the rook, which attacks 14 squares from any square it is on (on an empty board).
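In code, this eval is just the signed sum of all nibbles of the hash (a sketch, reusing attack_hash from the snippet earlier):

    def activity_eval(hash64):
        """Sum of white nibbles minus sum of black nibbles of the 64-byte hash."""
        total = 0
        for half, sign in ((0, +1), (1, -1)):
            for byte in hash64[half * 32:(half + 1) * 32]:
                total += sign * ((byte & 0x0F) + (byte >> 4))
        return total

For the starting position activity_eval(attack_hash(position)) is 0, since the position is symmetric.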

If we compute the average number of attacked squares for each piece, we get the following:

Pawn: 1.75
Knight: 5.25
Bishop: 8.75
Rook: 14.00
Queen: 22.75
King: 6.56

If we normalize it so a pawn has a value of 1 we get:

Pawn: 1.00
Knight: 3.00
Bishop: 5.00
Rook: 8.00
Queen: 13.00
King: 3.75
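These averages are easy to reproduce with the attacked_squares helper from the sketch above. Here pawns are averaged over the ranks they can legally occupy (2 through 7) and every other piece over all 64 squares, which is my reading of the convention behind the table:

    pawn_squares = [sq for sq in range(64) if 1 <= sq // 8 <= 6]  # ranks 2-7

    for piece, squares in [('P', pawn_squares), ('N', range(64)),
                           ('B', range(64)), ('R', range(64)),
                           ('Q', range(64)), ('K', range(64))]:
        counts = [len(attacked_squares(sq, 'w', piece)) for sq in squares]
        print(piece, round(sum(counts) / len(counts), 2))
        # Prints: P 1.75, N 5.25, B 8.75, R 14.0, Q 22.75, K 6.56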

These piece values are not that far from the ones used on chess websites. In relative terms everything is roughly the same, with two rooks being better than a queen and, with these numbers, a rook and a bishop being exactly as valuable as a queen. The bishop also comes out more valuable than the knight, which is something many players take into account.

It is curious how such a simple algorithm can achieve almost the same behavior as some less generalized techniques like piece-square tables.

Lastly, an advantage of this hash is that it is useful for neural networks: we can use 64 input neurons to encode the whole position. Advanced chess neural networks could pair this technique with other data that is important for evaluating a position, like which side is to move. With this we could create neural-network evaluation functions that at only depth 1 are on par with Stockfish at depth 20, 30 or more; it all comes down to the database.
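One hypothetical way to feed the hash to such a network (my own illustration, not something from the post) is to normalize each byte into [0, 1]:

    def nn_inputs(position):
        """64 floats in [0, 1], one per byte of the hash, for an input layer."""
        return [b / 255.0 for b in attack_hash(position)]

A real network would likely want the two nibbles split apart and extra inputs such as side to move.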

If you found this interesting and want to build on it, you are free to do so, as long as you give me credit. I'm open to suggestions and questions.

Edit:
To clear some things up: when I refer to the number of attackers, I mean the number of attackers without taking any blockers into account.

This hash is meant as a starting point for a more generalised evaluation function. Its biggest problem is that calculating the number of attackers per square is slow, but that can be sped up by storing bitboards of the attacks of each piece from each square. I believe a good programmer could make it faster, maybe by adding up the attack bitboards of all pieces of a given side. In terms of speed it would be like piece-square tables, but instead of adding one value per piece we would add counts for each square: while PST is 2 additions per piece, here it would be 64, so 32 times more. Since we only need 4 bits per square, we could probably use bit operations instead of adding ints like we would in PST. If we wanted to get an eval from this, it would be 128 additions if we don't use any weights. In the end it would still be really slow, but maybe a little more useful?
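One possible shortcut (my observation, not from the post): for the unweighted eval the per-square counts never need to be built at all, because summing attacker counts over squares gives the same total as summing each piece's attacked-square count, which with precomputed attack bitboards is one table lookup and one popcount per piece. Per-square weights would still need the counts, which is where the 4-bit packing and SWAR-style nibble additions could come in. A sketch, reusing attacked_squares from above:

    # Empty-board attack bitboards, precomputed once.
    ATTACK_BB = {
        (color, piece): [sum(1 << t for t in attacked_squares(sq, color, piece))
                         for sq in range(64)]
        for color in 'wb' for piece in 'PNBRQK'
    }

    def fast_activity_eval(position):
        score = 0
        for square, (color, piece) in position.items():
            n = bin(ATTACK_BB[color, piece][square]).count('1')  # popcount
            score += n if color == 'w' else -n
        return score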

Edit 2:
You are talking a lot about trade-offs in size. Any ideas on how to make this hash smaller while keeping its characteristics (or at least most of them; the no-collisions part is not a must, it's just a funny byproduct of this algorithm)?

  • What is the question?
    – qwr
    Commented Sep 30, 2023 at 21:15
  • "A hashing algorithm must not yield the same result for two different inputs and must be difficult to work your way backwards." Only for a perfect hash. Practical hash functions have a low chance of collision.
    – qwr
    Commented Sep 30, 2023 at 21:18
  • How low does that chance need to be? Commented Oct 1, 2023 at 18:03
  • From chessprogramming.org/Zobrist_Hashing it says 64 bits is a good trade-off size. This is conveniently the word size on 64-bit processors and is 1/8 the length of your proposed 64 bytes.
    – qwr
    Commented Oct 1, 2023 at 23:15
  • Stack Exchange sites are better suited for Q&A than discussion. The question from your post appears to be "has anyone done this before and how efficient is it?"
    – qwr
    Commented Oct 1, 2023 at 23:16

1 Answer


Let me start out by elaborating on Hauke's comment: add friendly pieces/pawns on b5, c6, e6, f5, f3 and e2. Then an Na1 is hashed the same as an Nd4. So, as presented, this function is not collision-free.

And some more general remarks:

In chess you usually use (Zobrist) hashes for transposition tables. These are supposed to be very small - e.g. 64 bits - to ensure that the transposition table entries themselves can be very small. Since there are far more than 2^64 chess positions, there will necessarily be many collisions; however, the way Zobrist hashes are built seems in practice to lead to few enough collisions that this is not a big problem. (It also helps that in an alpha-beta search, the occasional wrong evaluation usually turns out not to affect the search result.)
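For reference, Zobrist hashing in its simplest form looks like this (a sketch over the same toy position format as the snippets in the question; a real engine would also hash side to move, castling rights and the en-passant file):

    import random

    random.seed(0)  # fixed seed so the keys are reproducible
    ZOBRIST = {(color, piece, sq): random.getrandbits(64)
               for color in 'wb' for piece in 'PNBRQK' for sq in range(64)}

    def zobrist_hash(position):
        h = 0
        for sq, (color, piece) in position.items():
            h ^= ZOBRIST[color, piece, sq]
        return h

Its key property is incremental updating: making a move is a couple of XORs, one to remove the piece from its old square and one to add it on the new one.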

If you want a function that has no collisions, there are some simple ways to get one. For example, you can take a bitboard of occupied squares, which uses 64 bits and has at most 32 ones set. For those up-to-32 occupied squares you then append 32 bits denoting the colours, plus 3 bits per square for the piece type, for a total of only 192 bits or 24 bytes. For an even more compact representation, you can probably use something like this to calculate the index of the position in the list of all possible chess positions: https://github.com/tromp/ChessPositionRanking
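A sketch of that 192-bit encoding, using the same toy position format as above (side to move, castling and en passant are again ignored):

    PIECE_CODE = {p: i for i, p in enumerate('PNBRQK')}  # 3 bits per piece type

    def exact_encoding(position):
        """Occupancy bitboard (64 bits), then one colour bit and one 3-bit
        piece code per occupied square, in ascending square order: at most
        192 bits, and decodable, hence collision-free."""
        occupancy = colours = pieces = 0
        for i, sq in enumerate(sorted(position)):
            color, piece = position[sq]
            occupancy |= 1 << sq
            colours |= int(color == 'b') << i
            pieces |= PIECE_CODE[piece] << (3 * i)
        return occupancy | (colours << 64) | (pieces << 96)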

This is not really done in chess engines, however, since Zobrist hashing is super fast (usually a couple of XORs, which can be done in nanoseconds), yields a small result and is generally "good enough". But if you need something reliably collision-free, you may use the above.

You also mention that you want it to be difficult to compute a preimage for the hash. This matters for cryptographic hash functions; however, I cannot think of many practical chess applications where one would want or need this property. In any case, I am doubtful that such a simple hash can be both collision-free and (cryptographically) preimage-resistant.

Finally, you mention that it approximates some evaluation functions, and that is indeed where I think this is useful. In particular, a map of attackers per side and square can be useful in evaluation both directly (add everything up and you get a pure "activity eval", which, while not super strong, can play some reasonable-ish chess) and as input to further evaluation terms; for example, you can count the number of attacks on the squares adjacent to each side's king to get a measure of how endangered the respective kings are. Nowadays many engines use NNs directly as their evaluation function, but at least back in the day that was a useful evaluation term to have. Furthermore, I think giving an NN something similar to this in addition to the position has been done in Shogi (?) back in the day. I do not know whether it is still used and/or whether it was at some point successful in chess.
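A sketch of that king-danger term, reusing attacked_squares from the question's snippet (and, like the hash there, ignoring blockers):

    def king_danger(position, king_color):
        """Number of enemy attacks on the squares next to king_color's king."""
        king_sq = next(sq for sq, (c, p) in position.items()
                       if c == king_color and p == 'K')
        zone = set(attacked_squares(king_sq, king_color, 'K'))  # the neighbours
        enemy = 'b' if king_color == 'w' else 'w'
        return sum(t in zone
                   for sq, (c, p) in position.items() if c == enemy
                   for t in attacked_squares(sq, c, p))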

(Also, this post was written while sick, so please excuse (and point out) any mistakes I may have made.)

  • Here is my sha512 hash of a FEN that makes any player ELO 4000. c6dfaf29166623733a3ef8ee8254066096249da87d6034c9e7cfe1843aa8f40531bd5a9004a1075dcaf1261acc076a22f847124dcb602c58f8b9d5bad6009407 Commented Oct 3, 2023 at 13:13
  • Thanks, but I wasn't saying that I needed it to be difficult to compute a preimage for the hash; I said that that is a property of hashes. I don't know, but was it clear that what I meant by number of attackers is the number of attackers without taking blockers into account? Because that makes the algorithm better and a lot faster at evaluating. I took part in the Sebastian Lague chess coding challenge using this method, multiplying each square by a weight; while it was slow (but probably could be optimised), with enough training it was on par with a simple PeSTO. Commented Oct 3, 2023 at 15:09
  • I unintentionally skipped the first part of your answer. I tested it and no, it doesn't have the same hash, because the algorithm doesn't take into account any blockers or occupied squares; the attacks are as if the pieces were on an empty board. Commented Oct 3, 2023 at 15:32
  • Probably worthwhile to open a chatroom for this, how did this work again? Trying: chat.stackexchange.com/rooms/148905/attack-table-hashing
    – koedem
    Commented Oct 3, 2023 at 15:45
  • @koedem: I interpreted the hashing as "a friendly piece also counts the 'attacks'", so e.g. Pf5 has +1 with Nd4 but 0 with Na1. I tried to make up a counterexample even for that case but failed so far. After second thought: Pa7+Pc7+(Bb8 or Pb6) should collide in any interpretation. Commented Oct 3, 2023 at 21:07
