An introduction to Checksums

Hi everyone

This is a quick introduction into checksums, and practically how to use them, at the request of someone through the Ask Johno section of the blog.

I know what a checksum is, but I’m not sure how to implement or use one (in PHP) to achieve what I need.

Anonymous Asker

They go on to explain what they are trying to achieve, which is essentially to verify a file hasn’t changed (in this case it’s the HTTP document sent by an API) before doing something.

Firstly, a checksum is a very small data snippet (datum) which represents something larger.

In my experience there are 2 main reasons to use one, these are;

  1. To verify a change in something larger
  2. To verify that a piece of data matches from that which was sent by its originator (checksum on an API payload)

I will very briefly cover both of these with PHP examples.

Note; I am using MD5 for simplicity, but depending on your requirements this likely won’t be the best option for your use case.

Edit: Please see Further Reading at the bottom of this article for more information about hashing

Using a Checksum to Detect Changes

Now I don’t know what we’ve got, but whatever it is; we are going to need a string representation, 2 main ways of getting this:

$stringRepresentation = serialize($myThingToCheck);

or

$stringRepresentation = json_encode($myThingToCheck);

Personally, I would advise PHP’s serialize because that will work with, and instantiate objects. However, the choice is yours depending on your use case, JSON is smaller than a PHP serialization.

Now that we have a string, we need to make it small and easy to check. Something like this will work fine;

$checksum = md5($stringRepresentation);

Now to check for changes, we just need the last checksum that we stored that something happened on.

if($checksum != $oldChecksum){
echo 'Something has changed';
}

So that covers how to use a checksum to detect changes in pieces of data, which is useful if you’re having to poll for changes.

Using a Checksum to Verify Validity

This is something I’ve noticed a couple of times, particularly when working in the financial industry, and around certain payment gateways.

It usually looks something like this; but it does change per API integration so be aware of that and follow their own documentation.

$secret = 'my_secret_api_key';

$payload = [
'foo' => 'Bar',
'another' => 'Thing',
'datetime' => '2018-12-25 00:00:00'
];

$jsonPayload = json_encode($payload);

$checksum = md5($jsonPayload . $secret);

$payload['checksum'] = $checksum;

// Do the rest of your stuff here, including sending the payload etc.

One advantage of this approach is that you never expose the API key in plain text.

If you were to want to verify on the API so that you’re the provider, rather than integration, you would simply do these steps in the opposite order, so it would look something like this

// Assuming you've done everything you need, and now have the $payload array/object back

$secret = 'the_secret_youre_expecting';

$checksumProvided = $payload['checksum'];
unset($payload['checksum'];
$checksum = md5(json_encode($payload) . $secret);

if($checksum != $checksumProvided){
// If the checksums don't match, in theory the secret key provided was incorrect
}

I think that about covers this topic, as a brief introduction to checksums, and how they’re often used in PHP.

Further Reading

PHP: hash() for more information about the best ways to hash data

I have deliberately not gotten into the discussion over hashing algorithms, as it really is out of the scope of this article, and indeed a whole book could be written on the topic alone.

Thanks to u/artemix-org and u/BradChesney79 on Reddit for suggesting this edit.