How do I check if a file is *mostly* identical to another?

I need to use PowerShell to check whether two files are the same, with the following restriction: eight specific bytes in the first 2K are allowed to differ (if you're interested, they are certain timestamp bytes in the superblock of an ext4 image).

The code I found on Stack Overflow (obviously) for doing full checks is as follows:

$md5 = New-Object -TypeName System.Security.Cryptography.MD5CryptoServiceProvider
$hash = [System.BitConverter]::ToString(
            $md5.ComputeHash([System.IO.File]::ReadAllBytes("fspec.bin")))

This gives me the hash of the entire file but what I really need is:

  • the first 2K of the file as a byte array so I can check specifics; and
  • the checksum of the remainder of the file to check equality.

The System.IO.File class has ReadAllBytes but does not appear to have the capacity to read a section of the file, nor seek to a specific place.

I have attempted to read in the byte array and use array slicing to get the parts as follows:

$restOfFile = [System.IO.File]::ReadAllBytes("fspec")
$firstTwoK = $restOfFile[0..2047]
$restOfFile = $restOfFile[2048..$restOfFile.Length]
# Then:
#    1. Check bytes in firstTwoK.
#    2. Check MD5 of all bytes in restOfFile.

Unfortunately, the fact that it's a 750M file is causing problems:

Array dimensions exceeded supported range.
At C:\testprog\testprog.ps1:42 char:1
+ ${devBytes} = ${devBytes}[2048..${devBytes}.Length]
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : OperationStopped: (:) [], OutOfMemoryException
    + FullyQualifiedErrorId : System.OutOfMemoryException

Is there a functional way to do what I need?

1 answer

  • answered 2020-01-14 04:00 Bender the Greatest

    Use one of the derived types of System.Security.Cryptography.HashAlgorithm and call the ComputeHash overload that takes an offset and a count. For checking whether two files are identical, MD5 is still fine to use, though you can choose a stronger algorithm if you prefer:

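    # Read the whole file into memory (the class is System.IO.File).
    $fileBytes = [System.IO.File]::ReadAllBytes("C:\path\to\file.ext")
    # MD5.Create() returns the platform's default MD5 implementation.
    $md5 = [System.Security.Cryptography.MD5]::Create()
    # Hash everything after the first 2KB without copying the array.
    $fileHashAfterOffset = $md5.ComputeHash( $fileBytes, 2KB, $fileBytes.Length - 2KB )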

    The first argument of ComputeHash is the file contents as a Byte[]. The second argument is the offset (i.e. the first x bytes are skipped when generating the hash), and the third argument is how many bytes to hash starting from that offset. In this case we want the rest of the file, so we take the total number of bytes in the $fileBytes array and subtract the offset from it.
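
    As a quick sanity check of the offset and count arguments, the sketch below hashes the tail of a small made-up byte array two ways: once through the three-argument overload and once from an explicit copy of the same bytes. The sample data and variable names are illustrative only.

    ```powershell
    # Hypothetical sample data: 16 bytes with values 0..15.
    [byte[]]$sample = 0..15
    $md5 = [System.Security.Cryptography.MD5]::Create()

    # Hash bytes 4..15 in place using the offset/count overload.
    $viaOffset = $md5.ComputeHash($sample, 4, $sample.Length - 4)

    # Hash the same bytes from an explicit copy of the tail.
    [byte[]]$tail = $sample[4..15]
    $viaCopy = $md5.ComputeHash($tail)

    # Both digests should match, since they cover the same bytes.
    $same = [System.BitConverter]::ToString($viaOffset) -eq
            [System.BitConverter]::ToString($viaCopy)
    $md5.Dispose()
    ```

    The overload avoids the copy, which is what matters for a file of several hundred megabytes.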

    2KB is a PowerShell numeric literal with a size suffix; it evaluates to 2048, the number of bytes in 2 kilobytes.
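
    If holding the whole 750M file in memory is a concern, a stream-based variant may help. The following is a minimal sketch, not a tested solution for ext4 images: it reads the first 2KB into a byte array and lets ComputeHash(Stream) hash whatever remains from the current stream position. The function names Get-HeaderAndTailHash and Test-MostlyIdentical, and the -IgnoredOffsets parameter, are hypothetical helpers introduced here for illustration.

    ```powershell
    # Sketch: return the first $HeaderSize bytes of a file plus an MD5 of the rest.
    function Get-HeaderAndTailHash {
        param(
            [Parameter(Mandatory)] [string] $Path,
            [int] $HeaderSize = 2KB
        )
        # .NET resolves relative paths against the process directory, not the
        # PowerShell location, so resolve the path first.
        $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
        $stream = [System.IO.File]::OpenRead($fullPath)
        $md5 = [System.Security.Cryptography.MD5]::Create()
        try {
            $header = New-Object byte[] $HeaderSize
            $read = 0
            # Read may return fewer bytes than requested, so loop until full.
            while ($read -lt $HeaderSize) {
                $n = $stream.Read($header, $read, $HeaderSize - $read)
                if ($n -eq 0) { throw "File is shorter than $HeaderSize bytes: $Path" }
                $read += $n
            }
            # ComputeHash(Stream) hashes from the current position to the end.
            $tailHash = [System.BitConverter]::ToString($md5.ComputeHash($stream))
            [pscustomobject]@{ Header = $header; TailHash = $tailHash }
        }
        finally {
            $md5.Dispose()
            $stream.Dispose()
        }
    }

    # Sketch: files match if the tails hash equally and the headers differ
    # only at the offsets listed in $IgnoredOffsets (hypothetical parameter).
    function Test-MostlyIdentical {
        param(
            [Parameter(Mandatory)] [string] $PathA,
            [Parameter(Mandatory)] [string] $PathB,
            [int[]] $IgnoredOffsets = @()
        )
        $a = Get-HeaderAndTailHash -Path $PathA
        $b = Get-HeaderAndTailHash -Path $PathB
        if ($a.TailHash -ne $b.TailHash) { return $false }
        for ($i = 0; $i -lt $a.Header.Length; $i++) {
            if ($IgnoredOffsets -contains $i) { continue }
            if ($a.Header[$i] -ne $b.Header[$i]) { return $false }
        }
        return $true
    }
    ```

    A call might look like `Test-MostlyIdentical -PathA old.img -PathB new.img -IgnoredOffsets 100, 101, 102, 103, 104, 105, 106, 107`, where the file names and offsets are placeholders; substitute the eight byte positions that are actually allowed to differ.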