This is not a question; it is a curious discovery I wanted to share.
The most common hashing algorithm in chess programming is, I believe, Zobrist hashing. However, I may have found a different way of hashing. Beware that I'm just an amateur, so this could be entirely wrong. I'm confident this method can be considered hashing, but I don't know how to prove it; I also don't know whether it is new, but I couldn't find anything about it on the internet.
So, the algorithm is actually pretty simple to conceptualize: for a given position, you count the number of attackers of each square, for black and for white, and that is basically it. If we say the maximum number of attackers per square is 16, we can store the white count and the black count for a square together in a single byte, so the whole board takes only 64 bytes (strictly, a count of 16 doesn't quite fit in 4 bits, so in practice you would cap each count at 15).
Why would the maximum be only 16? Because in any legal position each side has at most 16 pieces, so a square can have at most 16 attackers of one colour. As I said, this is only valid for legal positions. If you allow any position, the maximum number of attackers would be:
- 8 (for each square where a knight could be attacking) +
- 14 (for each square a vertical/horizontal moving piece could attack us from) +
- 13 (for each square a diagonal moving piece could attack us from),
so 35 maximum. From a computer science standpoint a count of up to 35 needs 6 bits instead of 4, which is not as manageable as the first bound; that is why I chose to consider only legal positions.
The layout I conceptualize is the first 32 bytes for white and the rest for black, with the first byte holding squares one and two, a1 and b1 respectively, and so on for every byte.
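For concreteness, here is a minimal sketch of how such a hash could be computed. The position format (a dict from square indices 0..63 to (colour, piece) pairs) and all the names are just my illustrative choices, and, per the edit at the bottom, attacks are counted on an empty board, ignoring blockers:

```python
# Minimal sketch of the attack-count hash (my own naming and position
# format; nothing here is an established library API).

def piece_attacks(colour, piece, sq):
    """Squares attacked from `sq` on an otherwise empty board (blockers ignored)."""
    f, r = sq % 8, sq // 8
    out = []

    def walk(df, dr, sliding):
        x, y = f + df, r + dr
        while 0 <= x < 8 and 0 <= y < 8:
            out.append(y * 8 + x)
            if not sliding:
                break
            x, y = x + df, y + dr

    if piece == 'P':                      # pawns attack diagonally forward
        for df in (-1, 1):
            walk(df, 1 if colour == 'w' else -1, False)
    elif piece == 'N':
        for df, dr in ((1, 2), (2, 1), (2, -1), (1, -2),
                       (-1, -2), (-2, -1), (-2, 1), (-1, 2)):
            walk(df, dr, False)
    elif piece == 'K':
        for df in (-1, 0, 1):
            for dr in (-1, 0, 1):
                if df or dr:
                    walk(df, dr, False)
    else:                                 # sliders: B, R, Q
        dirs = []
        if piece in 'RQ':
            dirs += [(1, 0), (-1, 0), (0, 1), (0, -1)]
        if piece in 'BQ':
            dirs += [(1, 1), (1, -1), (-1, 1), (-1, -1)]
        for df, dr in dirs:
            walk(df, dr, True)
    return out

def attack_hash(position):
    """64-byte hash: 32 bytes of white counts then 32 of black,
    two 4-bit counts per byte (a1 and b1 share the first byte, etc.)."""
    counts = {'w': [0] * 64, 'b': [0] * 64}
    for sq, (colour, piece) in position.items():
        for target in piece_attacks(colour, piece, sq):
            counts[colour][target] += 1
    def pack(side):  # cap at 15 so each count fits in its nibble
        return [min(counts[side][2 * i], 15) | (min(counts[side][2 * i + 1], 15) << 4)
                for i in range(32)]
    return bytes(pack('w') + pack('b'))
```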
A good hash function should (almost) never yield the same result for two different inputs and should be difficult to work backwards from.
This is the case for this algorithm: it is easy to compute the number of attackers per square, but hard to go the other way round, and I also believe, although as I said I have no real proof of it, that every hash is unique to one position. Each hash is 64 bytes, i.e. 512 bits, so there are in theory 2^512 different hashes, despite some of them (or maybe most) being impossible to reach.
A chess board has around 5 * 10^44 legal positions, while 2^512 is about 1.3 * 10^154, a vastly bigger number, so the chance of a hash repeating is really low. If you can prove, or show me how to prove, that the hashes don't (or do) repeat, please do it.
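The size comparison is easy to check, for instance in a Python shell:

```python
>>> 2 ** 512 > 5 * 10 ** 44      # far more hash values than legal positions
True
>>> len(str(2 ** 512))           # 2**512 is a 155-digit number, about 1.3e154
155
```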
A really nice property of this hash is that if you sum all the white attacker counts and subtract all the black attacker counts, you get a somewhat Stockfish-like evaluation of the position. This is why:
A pawn attacks 2 squares when it is not on the edge of the board, so it contributes 2 to the eval. A knight attacks 8 squares from the middle of the board, but fewer from the edge or a corner, so the eval naturally prefers knights in the centre over the rim, which is good. A similar thing happens with all the other pieces, except for the rook, which attacks 14 squares from any square it stands on.
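Given the 64-byte layout sketched earlier, the eval is just a nibble sum over the two halves; `attack_eval` here is my own hypothetical helper:

```python
def attack_eval(h):
    """Total white attacks minus total black attacks, read off the 64-byte hash."""
    def nibble_sum(bs):
        return sum((b & 15) + (b >> 4) for b in bs)
    return nibble_sum(h[:32]) - nibble_sum(h[32:])
```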
If we compute the average number of attacked squares for each piece, averaging pawns over their 48 legal squares and every other piece over all 64 squares, we get the following (the short script after the second table below reproduces both columns):
Pawn: 1.75
Knight: 5.25
Bishop: 8.75
Rook: 14.00
Queen: 22.75
King: 6.56
If we normalize so that a pawn is worth 1, we get:
Pawn: 1.00
Knight: 3.00
Bishop: 5.00
Rook: 8.00
Queen: 13.00
King: 3.75
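Here is the promised check, reusing `piece_attacks` from the first sketch; it prints both the raw averages and the pawn-normalized values:

```python
# Reproduces both tables above: pawns are averaged over their 48 legal
# squares (ranks 2-7), every other piece over all 64 squares.
def average_mobility(piece):
    squares = range(8, 56) if piece == 'P' else range(64)
    return sum(len(piece_attacks('w', piece, s)) for s in squares) / len(squares)

pawn = average_mobility('P')
for piece in 'PNBRQK':
    avg = average_mobility(piece)
    print(f"{piece}: {avg:.2f}  normalized: {avg / pawn:.2f}")
```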
These piece values are not that far from the ones used on chess websites. In relative terms everything lines up roughly the same, with two rooks being better than a queen and, in this case, a rook plus a bishop being exactly as valuable as a queen. The bishop also comes out more valuable than the knight, which is something many players take into account.
It is curious how such a simple algorithm can achieve almost the same behavior as some less generalized techniques like piece square tables.
Lastly, an advantage of this hash is that it is useful for neural networks: we can use 64 input neurons to encode the whole position. An advanced chess network could pair this encoding with other data that is important for evaluating a position, like which side is to move. With this we could try to build neural-network evaluation functions that, at depth 1, match what Stockfish sees at depth 20, 30 or more; in the end it all comes down to the training data.
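As a sketch of what that input layer could look like, assuming the `attack_hash` bytes from above (the scaling to [0, 1] and the extra side-to-move flag are just illustrative choices on my part):

```python
def nn_inputs(h, white_to_move):
    """64 inputs, one per hash byte, plus a side-to-move flag as suggested above."""
    return [b / 255.0 for b in h] + [1.0 if white_to_move else 0.0]
```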
If you found this interesting and want to build on it, you are free to do so, as long as you give me credit. I'm open to suggestions and questions.
Edit:
To clear some things up: when I refer to the number of attackers, I mean the number of attackers without taking any blockers into account.
This hash is meant as a starting point for a more generalized evaluation function. Its biggest problem is that computing the number of attackers per square is slow, but that can be sped up by precomputing, for each piece type on each square, a bitboard of the squares it attacks. I believe a good programmer could make it faster, maybe by summing the attack bitboards of every piece of a given side. In terms of speed it would be like piece-square tables, but instead of adding one value per piece we would be doing a sum for each square: where PST is 2 additions per piece, here it would be 64, so 32 times more. Since we only need 4 bits per square, some bit operations could probably replace the plain integer additions we would use in PST. If we then wanted an eval out of it, that would be another 128 additions if we don't use any weights. In the end it would still be really slow, but maybe a little more useful?
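One hedged way to picture that bit-operation idea, using Python's big integers for brevity (a real engine would use a handful of machine words): precompute one wide "attack word" per (colour, piece, square), holding that piece's attack map as an 8-bit counter per target square. I widen the 4 bits to 8 here so a counter can never overflow into its neighbour:

```python
# Precomputed attack words, built from piece_attacks in the first sketch:
# a 512-bit integer per (colour, piece, square) with a 1 in the byte of
# every attacked square.
ATTACK_WORDS = {
    (c, p, sq): sum(1 << (8 * t) for t in piece_attacks(c, p, sq))
    for c in 'wb' for p in 'PNBRQK' for sq in range(64)
}

def attack_counts_word(position, side):
    """One big-int addition per piece instead of one addition per attacked square."""
    word = 0
    for sq, (colour, piece) in position.items():
        if colour == side:
            # Each addend contributes at most 1 per byte and there are at most
            # 16 pieces per side, so no byte ever carries into the next one.
            word += ATTACK_WORDS[(colour, piece, sq)]
    return word  # byte i of the result = number of `side` attackers of square i
```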
Edit 2:
You are talking a lot about size tradeoffs: any ideas on how to make this hash smaller while keeping its characteristics (or at least most of them; the no-collisions part is not a must, it's just a funny byproduct of this algorithm)?