I found this code in an older program from Angus Johnson:

const
  table:  ARRAY[0..255] OF DWORD =
 ($00000000, $77073096, $EE0E612C, $990951BA,
  $076DC419, $706AF48F, $E963A535, $9E6495A3,
  $0EDB8832, $79DCB8A4, $E0D5E91E, $97D2D988,
  $09B64C2B, $7EB17CBD, $E7B82D07, $90BF1D91,
  $1DB71064, $6AB020F2, $F3B97148, $84BE41DE,
  $1ADAD47D, $6DDDE4EB, $F4D4B551, $83D385C7,
  $136C9856, $646BA8C0, $FD62F97A, $8A65C9EC,
  $14015C4F, $63066CD9, $FA0F3D63, $8D080DF5,
  $3B6E20C8, $4C69105E, $D56041E4, $A2677172,
  $3C03E4D1, $4B04D447, $D20D85FD, $A50AB56B,
  $35B5A8FA, $42B2986C, $DBBBC9D6, $ACBCF940,
  $32D86CE3, $45DF5C75, $DCD60DCF, $ABD13D59,
  $26D930AC, $51DE003A, $C8D75180, $BFD06116,
  $21B4F4B5, $56B3C423, $CFBA9599, $B8BDA50F,
  $2802B89E, $5F058808, $C60CD9B2, $B10BE924,
  $2F6F7C87, $58684C11, $C1611DAB, $B6662D3D,

  $76DC4190, $01DB7106, $98D220BC, $EFD5102A,
  $71B18589, $06B6B51F, $9FBFE4A5, $E8B8D433,
  $7807C9A2, $0F00F934, $9609A88E, $E10E9818,
  $7F6A0DBB, $086D3D2D, $91646C97, $E6635C01,
  $6B6B51F4, $1C6C6162, $856530D8, $F262004E,
  $6C0695ED, $1B01A57B, $8208F4C1, $F50FC457,
  $65B0D9C6, $12B7E950, $8BBEB8EA, $FCB9887C,
  $62DD1DDF, $15DA2D49, $8CD37CF3, $FBD44C65,
  $4DB26158, $3AB551CE, $A3BC0074, $D4BB30E2,
  $4ADFA541, $3DD895D7, $A4D1C46D, $D3D6F4FB,
  $4369E96A, $346ED9FC, $AD678846, $DA60B8D0,
  $44042D73, $33031DE5, $AA0A4C5F, $DD0D7CC9,
  $5005713C, $270241AA, $BE0B1010, $C90C2086,
  $5768B525, $206F85B3, $B966D409, $CE61E49F,
  $5EDEF90E, $29D9C998, $B0D09822, $C7D7A8B4,
  $59B33D17, $2EB40D81, $B7BD5C3B, $C0BA6CAD,

  $EDB88320, $9ABFB3B6, $03B6E20C, $74B1D29A,
  $EAD54739, $9DD277AF, $04DB2615, $73DC1683,
  $E3630B12, $94643B84, $0D6D6A3E, $7A6A5AA8,
  $E40ECF0B, $9309FF9D, $0A00AE27, $7D079EB1,
  $F00F9344, $8708A3D2, $1E01F268, $6906C2FE,
  $F762575D, $806567CB, $196C3671, $6E6B06E7,
  $FED41B76, $89D32BE0, $10DA7A5A, $67DD4ACC,
  $F9B9DF6F, $8EBEEFF9, $17B7BE43, $60B08ED5,
  $D6D6A3E8, $A1D1937E, $38D8C2C4, $4FDFF252,
  $D1BB67F1, $A6BC5767, $3FB506DD, $48B2364B,
  $D80D2BDA, $AF0A1B4C, $36034AF6, $41047A60,
  $DF60EFC3, $A867DF55, $316E8EEF, $4669BE79,
  $CB61B38C, $BC66831A, $256FD2A0, $5268E236,
  $CC0C7795, $BB0B4703, $220216B9, $5505262F,
  $C5BA3BBE, $B2BD0B28, $2BB45A92, $5CB36A04,
  $C2D7FFA7, $B5D0CF31, $2CD99E8B, $5BDEAE1D,

  $9B64C2B0, $EC63F226, $756AA39C, $026D930A,
  $9C0906A9, $EB0E363F, $72076785, $05005713,
  $95BF4A82, $E2B87A14, $7BB12BAE, $0CB61B38,
  $92D28E9B, $E5D5BE0D, $7CDCEFB7, $0BDBDF21,
  $86D3D2D4, $F1D4E242, $68DDB3F8, $1FDA836E,
  $81BE16CD, $F6B9265B, $6FB077E1, $18B74777,
  $88085AE6, $FF0F6A70, $66063BCA, $11010B5C,
  $8F659EFF, $F862AE69, $616BFFD3, $166CCF45,
  $A00AE278, $D70DD2EE, $4E048354, $3903B3C2,
  $A7672661, $D06016F7, $4969474D, $3E6E77DB,
  $AED16A4A, $D9D65ADC, $40DF0B66, $37D83BF0,
  $A9BCAE53, $DEBB9EC5, $47B2CF7F, $30B5FFE9,
  $BDBDF21C, $CABAC28A, $53B39330, $24B4A3A6,
  $BAD03605, $CDD70693, $54DE5729, $23D967BF,
  $B3667A2E, $C4614AB8, $5D681B02, $2A6F2B94,
  $B40BBE37, $C30C8EA1, $5A05DF1B, $2D02EF8D);

//CRC algorithm courtesy of Earl F. Glynn ...
//(http://www.efg2.com/Lab/Mathematics/CRC.htm)
function CalcCRC32(p: pchar; length: integer): dword;
var
  i: integer;
begin
  result := $FFFFFFFF;
  for i := 0 to length-1 do
  begin
    result := (result shr 8) xor table[ pbyte(p)^ xor (result and $000000ff) ];
    inc(p);
  end;
  result := not result;
end;

The CalcCRC32 function returns incorrect results when the code is compiled as a 64-bit program.

How could this function be changed to make it work in a 64-bit program in Delphi 10.1 Berlin?

The code has been taken from: TextDiff\BasicDemo2\HashUnit.pas on http://www.angusj.com/delphi/textdiff.html

I have used these two texts to test TextDiff:

Text 1:

CompanyName=Igor Pavlov
FileDescription=7-Zip Standalone Console
FileVersion=17.01 beta
InternalName=7za
LegalCopyright=Copyright (c) 1999-2017 Igor Pavlov
OriginalFilename=7za.exe
ProductName=7-Zip
ProductVersion=17.01 beta

Text 2:

CompanyName=Igor Pavlov
FileDescription=7-Zip Standalone Console
FileVersion=4.61 beta
InternalName=7za
LegalCopyright=Copyright (c) 1999-2008 Igor Pavlov
OriginalFilename=7za.exe
ProductName=7-Zip
ProductVersion=4.61 beta

Here is how I changed the code according to the solution:

function CalcCRC32(p: PByte; length: NativeUInt): dword;
var
  i: integer;
begin
  result := $FFFFFFFF;
  for i := 0 to length-1 do
  begin
    result := (result shr 8) xor table[ pbyte(p)^ xor (result and $000000ff) ];
    inc(p);
  end;
  result := not result;
end;

function HashLine(const line: string; IgnoreCase, IgnoreBlanks: boolean): pointer;
var
  i, j, len: integer;
  s: String;
begin
  s := line;
  if IgnoreBlanks then
  begin
    i := 1;
    j := 1;
    len := length(line);
    while i <= len do
    begin
      if not (line[i] in [#9,#32]) then
      begin
        s[j] := line[i];
        inc(j);
      end;
      inc(i);
    end;
    setlength(s,j-1);
  end;
  if IgnoreCase then s := AnsiLowerCase(s);
  //return result as a pointer to save typecasting later...
  result := pointer(CalcCRC32(PByte(s), length(s)));
end;
  • It's not going to be a 32/64 bit issue. It's going to be an ANSI/Unicode issue. Commented Jan 25, 2019 at 21:02
  • You don't show in code how you use your examples.
    – LU RD
    Commented Jan 25, 2019 at 21:10
  • Look in the edited question. Commented Jan 25, 2019 at 21:22
  • What is the definition of s? Commented Jan 25, 2019 at 21:26
  • @user1580348: If s := 'Hello' then length(s) = 5, but it takes 10 bytes and contains the following values: $48, $00, $65, $00, $6C, $00, $6C, $00, $6F, $00. That is 2 bytes for each character. I tested your CalcCRC32(p: PByte; length: NativeUInt): Dword on Berlin 10.1 Update 2, and my Win32 and Win64 console apps return the same value. Commented Jan 25, 2019 at 21:56

3 Answers

3

In general, this code should work in 64bit, provided length does not exceed 2GB. That is not your issue.

The p parameter needs to be changed from PChar to PByte (or even just Pointer) since PChar is PWideChar in D2009+ but the code is expecting PChar to be PAnsiChar instead.

Also, you should probably change length from Integer to Native(U)Int so you can take better advantage of 64bit memory sizes greater than 2GB.

Now, with that said, if you want to get the CRC of a string, be aware that string is a UTF-16 encoded UnicodeString in D2009+, but CRC operates on bytes rather than characters. So, when computing the CRC of a string, you have to decide which byte encoding it should be converted to first. And when comparing the CRCs of multiple strings, make sure they are converted to the same byte encoding first.
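To illustrate why the byte encoding has to be chosen first, here is a minimal console sketch. It is not taken from TextDiff: the names BuildTable and Crc32OfBytes are hypothetical, and the table is generated at run time from the reflected polynomial $EDB88320 (which yields the same 256 values as the constant table in the question) only to keep the example short.

```pascal
program CrcEncodingDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  Table: array[0..255] of Cardinal;

// Hypothetical helper: builds the CRC-32 lookup table for the reflected
// polynomial $EDB88320 instead of spelling out all 256 constants.
procedure BuildTable;
var
  i, bit: Integer;
  c: Cardinal;
begin
  for i := 0 to 255 do
  begin
    c := Cardinal(i);
    for bit := 1 to 8 do
      if (c and 1) <> 0 then
        c := (c shr 1) xor $EDB88320
      else
        c := c shr 1;
    Table[i] := c;
  end;
end;

// Hypothetical helper: CRC-32 over raw bytes. No string type is involved
// here, so the result does not depend on how Delphi stores strings.
function Crc32OfBytes(const data: TBytes): Cardinal;
var
  i: Integer;
begin
  Result := $FFFFFFFF;
  for i := 0 to High(data) do
    Result := (Result shr 8) xor Table[(Result xor data[i]) and $FF];
  Result := not Result;
end;

begin
  BuildTable;
  // Same text, two byte encodings: 9 bytes as UTF-8, 18 bytes as UTF-16.
  // The UTF-8 line prints CBF43926, the standard CRC-32 check value.
  Writeln(IntToHex(Int64(Crc32OfBytes(TEncoding.UTF8.GetBytes('123456789'))), 8));
  Writeln(IntToHex(Int64(Crc32OfBytes(TEncoding.Unicode.GetBytes('123456789'))), 8));
end.
```

The two lines print different checksums for the same text, which is why hashes are only comparable when every string is converted with the same encoding.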

  • The code still does not work. It has been taken from Angus Johnson's TextDiff\BasicDemo2: angusj.com/delphi/textdiff.html Commented Jan 25, 2019 at 20:52
  • @user1580348 The code works fine. You are just not taking Unicode into account correctly. The original code was written for a Delphi version that predated Unicode. Commented Jan 25, 2019 at 22:27
  • Thank you Remy, I understand this now. How could Angus Johnson's code be made Unicode-aware? Commented Jan 25, 2019 at 22:40
  • Don't pass text to the function. Pass bytes, binary data. Start with text, encode using your chosen encoding, e.g. UTF8, then pass those bytes to the CRC function Commented Jan 26, 2019 at 7:18
1

You may read the following to better understand how strings work in Delphi.

Here is the interface section of a Unicode-aware HashLine function; there is no reason to use Pointer as the result type.

uses System.Types;

function HashLine(const line: string; IgnoreCase, IgnoreBlanks: boolean): dword;

Here is the implementation part.

uses
  System.SysUtils, System.StrUtils, System.Character, System.Classes;


const
  table:  ARRAY[0..255] OF DWORD =
 ($00000000, $77073096, $EE0E612C, $990951BA,
  ...
  $B40BBE37, $C30C8EA1, $5A05DF1B, $2D02EF8D);

function CalcCRC32(p: PByte; length: NativeUInt): DWORD;
begin
  result := $FFFFFFFF;
  // Count down instead of looping to length-1: length is unsigned,
  // so length-1 would wrap around to a huge value when length is 0.
  while length > 0 do
  begin
    result := (result shr 8) xor table[ p^ xor (result and $000000ff) ];
    inc(p);
    dec(length);
  end;
  result := not result;
end;

function HashLine(const line: string; IgnoreCase, IgnoreBlanks: boolean): DWORD;
var
  i, j: integer;
  s: string;
  b: TBytes;
begin
  if IgnoreBlanks then
  begin
    j := low(string);
    setlength(s, length(line));
    for i := low(line) to high(line) do
    begin
      // if not (line[i] in [#9,#32]) then
      if not line[i].IsWhiteSpace() then
      begin
        s[j] := line[i];
        inc(j);
      end;
    end;
    // j started at low(string), so the number of characters kept is j - low(string)
    setlength(s, j - low(string));
  end else begin
    s := line;
  end;

  if IgnoreCase then
    s := s.ToLower();

  b := TEncoding.UTF8.GetBytes(s);

  // PByte(b) is nil for an empty array, whereas @b[0] would index out of range
  result := CalcCRC32(PByte(b), length(b));
end;

The call HashLine('HEllo, World!', false, false) returns F47B1828, which is equal to the result here

  • This looks very interesting and promising. I will test it tomorrow after having put my body to rest. Commented Jan 26, 2019 at 0:05
  • Sorry for being late, I caught a heavy cold, still being affected. Understood your idea. But how would you integrate this into the whole TextDiff project? Since procedure TForm1.BuildHashList in Unit1.pas for example expects a Pointer parameter and now gets a DWORD. Commented Jan 27, 2019 at 19:12
  • @user1580348: That may be a new question, but after Angus Johnson last comment ...and much of the code was (and still is) a mess... at your question, you may take another approach at all. Commented Jan 28, 2019 at 7:56
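On the integration question raised in the comments above: one possible approach, shown here only as a sketch, is to keep the DWORD result and convert it at the point where a Pointer is still expected. The helper names HashToPointer and PointerToHash are hypothetical, and the TList merely stands in for whatever container the calling code uses. Going through NativeUInt keeps the cast the same size as a pointer on both Win32 and Win64.

```pascal
program HashAsPointerDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

// Hypothetical helper: widens the 32-bit hash to pointer size first,
// so the final cast never changes the size of the value.
function HashToPointer(hash: Cardinal): Pointer;
begin
  Result := Pointer(NativeUInt(hash));
end;

// Hypothetical helper: the reverse conversion. Only the low 32 bits
// carry the hash, so narrowing to Cardinal loses nothing.
function PointerToHash(p: Pointer): Cardinal;
begin
  Result := Cardinal(NativeUInt(p));
end;

var
  list: TList;
begin
  list := TList.Create;
  try
    // $F47B1828 is just a sample hash value for the round trip.
    list.Add(HashToPointer($F47B1828));
    Writeln(IntToHex(Int64(PointerToHash(list[0])), 8)); // prints F47B1828
  finally
    list.Free;
  end;
end.
```

The stored values are never dereferenced; the list only uses the pointer slot as an integer-sized container.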
0

Just change result := not result; to result := result xor $FFFFFFFF;.
