Making a secure data hash

string sha1 ( string source [, bool raw_output])

string password_hash ( string password, int algorithm [, array options])

bool password_verify ( string password, string hash)

SHA stands for the "Secure Hash Algorithm", and sha1() is a way of converting a string of any size into a 160-bit number, usually written as 40 hexadecimal characters, that can be used for verification. If you have not met hashes before, they are like unidirectional (one-way) encryption designed to check the accuracy of input. By unidirectional I mean that you cannot run $hash = sha1($somestring), then somehow decrypt $hash to get $somestring - it is just not possible, because a hash does not contain its original text. What, then, are hashes good for?

Well, imagine you have users enter a password. How do you check the password is correct?

    if ($password == "Frosties") {
        // ........
    }

While that solution certainly works, it means that whoever reads your source code gets your password. Similarly, if you store all your users' passwords in your database and someone cracks it, you are going to look pretty dumb. If you hash the passwords of people in your database, or in your files, then malicious users will not be able to retrieve the original password. It's not ideal - see the "Hashing passwords" section below - but it is quite common.

A downside of hashing passwords is that authorised users will not be able to get at the passwords either - whether or not that is a good thing varies from case to case, but usually having hashed passwords is worthwhile, and people who forget their password must simply reset it to a new password as opposed to retrieving it.

Hashing is most commonly used to check whether files have downloaded properly - if your hash is equal to the correct hash value, then you have downloaded the file without problem.
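As a minimal sketch of that download check, PHP's sha1_file() function hashes a file's contents directly. Everything else here is a stand-in: a temporary file plays the role of the download, and the "published" hash is computed locally rather than taken from a real download page.

```php
<?php
// Stand-in for a downloaded file: write known contents to a temporary file.
$contents = "pretend this is a large download";
$path = tempnam(sys_get_temp_dir(), "dl_");
file_put_contents($path, $contents);

// Stand-in for the hash a download page would publish alongside the file.
$expectedHash = sha1($contents);

// sha1_file() reads the file and returns the SHA1 hash of its contents.
$actualHash = sha1_file($path);
$downloadOk = ($actualHash === $expectedHash);

print $downloadOk ? "Download verified\n" : "Download corrupt\n";

// Clean up the temporary file.
unlink($path);
```

If even one byte of the file had changed in transit, the two hashes would no longer match.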

The process of data hashing involves taking a value and converting it into a semi-meaningless string of letters and numbers of a fixed length. There is no way - no way whatsoever - to "decrypt" a hash to obtain the original value. The only general way to attack a hash is to guess: hash one possible input after another until one of them produces the same value. Given that the input for the hash can be as long as you want, that can take an awfully long time.

Consider this script:

    print sha1("hello") . "\n";
    print sha1("Hello") . "\n";
    print sha1("hello") . "\n";
    print sha1("This is a very, very, very, very, very, very, very long test");

Here is the output I get:


There are three key things to notice there: firstly, all the output is exactly 40 characters in length, and always will be. Secondly, the difference between the hash of "hello" and the hash of "Hello" is gigantic despite the only difference being the case of a single letter. Finally, notice that there is no way to distinguish between long strings and short strings - because the hash is not reversible (that is, you cannot extract the original input from the hash), you can create a hash of strings of millions of characters in just 40 characters.
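Those three properties can also be checked in code rather than by eye. This is only a sketch; the inputs, including the long repeated string, are arbitrary choices made for illustration.

```php
<?php
$inputs = array(
    "hello",
    "Hello",
    str_repeat("very long input ", 100000), // 1,600,000 characters
);

// Hash every input and report how long the input and the hash are.
$hashes = array();
foreach ($inputs as $input) {
    $hash = sha1($input);
    $hashes[] = $hash;
    print strlen($input) . " characters in, " . strlen($hash) . " characters out\n";
}

// Count how many of the 40 positions differ between "hello" and "Hello".
$differing = 0;
for ($i = 0; $i < 40; $i++) {
    if ($hashes[0][$i] !== $hashes[1][$i]) {
        $differing++;
    }
}
print "Positions that differ between hello and Hello: $differing\n";
```

Every line of the first loop reports 40 characters out, no matter how large the input was.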

If you have stored your users' passwords hashed in your database, then you need to hash the password they provide before you compare it against the value in your database. One thing that is key to remember is that sha1() will always give the same output for a given input.
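Here is a rough sketch of that comparison. The check_password_sha1() helper is hypothetical, not a built-in, and $storedHash stands in for a value you would normally read from your database. It also assumes PHP 5.6 or later for hash_equals(), which compares two strings in constant time.

```php
<?php
// Hypothetical: this value would normally come from the user's database row.
$storedHash = sha1("Frosties");

// Hypothetical helper: hash the submitted password, then compare the hashes.
function check_password_sha1($submitted, $storedHash)
{
    // hash_equals() avoids leaking timing information during the comparison.
    return hash_equals($storedHash, sha1($submitted));
}

var_dump(check_password_sha1("Frosties", $storedHash)); // bool(true)
var_dump(check_password_sha1("frosties", $storedHash)); // bool(false)
```

This works only because sha1() is deterministic. The "Hashing passwords" section explains why password_hash() is the better choice for real passwords.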

Author's Note: If you set the optional second parameter to true, the SHA1 hash is returned in raw binary format and will have a length of 20.
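A short sketch of the difference between the two output formats, using bin2hex() to show that they carry the same information:

```php
<?php
$hex = sha1("hello");       // 40 hexadecimal characters
$raw = sha1("hello", true); // 20 raw bytes

print strlen($hex) . "\n"; // 40
print strlen($raw) . "\n"; // 20

// bin2hex() converts the raw bytes back into the familiar hexadecimal form.
var_dump(bin2hex($raw) === $hex); // bool(true)
```

The raw form is half the size, which can matter if you store many hashes, but it is not safe to print or to place in text fields without encoding it first.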

Hashing passwords

Hashing data using sha1() is a great way to generate non-critical hashes, and for a long time it was also the most popular way to hash passwords. But as cybercrime increases in complexity, plain old sha1() hasn't really kept up with the times, so as of PHP 5.5 there's a smarter way: password_hash().

This new function has a few advantages over sha1(). First, it generates a different hash for the same string if you run it again and again, which means the hash for the password "Frosties" will be different every time. This might sound like it breaks the very point of hashes, but password_hash() is being clever: it uses a different random value, known as a salt, each time it runs, then places that salt inside the hash it generates. This allows outputs to be different so that rainbow tables (huge lookups of precalculated hashes) can't be used to crack passwords, while also ensuring the hash can be verified.

A second advantage is that password_hash() takes a second parameter that lets you specify the algorithm, but you can specify "PASSWORD_DEFAULT" to have it automatically use the recommended algorithm. This is an advantage because the algorithm can change over time to be stronger and stronger as needed, without you needing to change your code. And don't worry about backwards compatibility: password_hash() also saves the algorithm name into its hashes, so it can verify hashes even if the algorithm is changed.

Author's Note: Because the hashing algorithm can change in the future, you should ensure you allocate enough space in your database, and not allocate just enough to return today's hashes. The PHP reference guide suggests being able to hold 255 characters.
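To see the stored algorithm information for yourself, PHP also provides password_get_info() and password_needs_rehash(). The sketch below is illustrative only: the algorithm name and hash length it prints depend on what PASSWORD_DEFAULT means in your PHP version.

```php
<?php
$hash = password_hash("frosties", PASSWORD_DEFAULT);

// password_get_info() reads the algorithm and its options back out of the hash.
$info = password_get_info($hash);
print "Algorithm: " . $info["algoName"] . "\n";
print "Length: " . strlen($hash) . " characters\n";

// password_needs_rehash() reports whether a stored hash was created with a
// different algorithm or different options from the ones passed in.
$needsRehash = password_needs_rehash($hash, PASSWORD_DEFAULT);
var_dump($needsRehash); // bool(false) for a hash created a moment ago
```

If a future PHP version changes the default algorithm, password_needs_rehash() would return true for older hashes, and you could generate a fresh hash the next time that user logs in successfully.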

Let's take a look at a basic password hashing example:

    echo password_hash("frosties", PASSWORD_DEFAULT), "\n";
    echo password_hash("frosties", PASSWORD_DEFAULT), "\n";
    echo password_hash("frosties", PASSWORD_DEFAULT), "\n";

As you can see, that code hashes the same password three times. Running that code, here's the output I get:

    $2y$10$qR5hC3OpCiz/fPeP4/04O.lJ0tesCNoL6ieqD9v6bnWbWkv4FcqVe
    $2y$10$OefiTIclfHSE4TcdAlM/1.h7ckUxWjDlg3w8xFVM/nk53cL3jAEkO
    $2y$10$fj3g2tKkGo6BSKXXEVNtxOwMD8DrXMsS0mINoUP.eX4cA34vHKN/K

Each output is different, which means you can't verify a user's password just by doing a basic string compare like you could with sha1(). Instead, you need to use a different function called password_verify(), which takes a user's plain-text password as its first parameter and a hash to compare as its second parameter. It then hashes the plain-text password using the same random salt that was stored in the hash, and returns true if they match. Here's an example:

    $hash = password_hash("frosties", PASSWORD_DEFAULT);
    if (password_verify("frosties", $hash)) {
        echo "Password match!\n";
    }

It's worth noting that both password_hash() and password_verify() are significantly slower than sha1(). When you're working with passwords, this extra cost is insignificant compared to the increase in your system security, but for other data that is not security-sensitive sha1() is a better choice.
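How slow the hashing is can be adjusted through the optional third parameter. As a sketch, assuming the bcrypt algorithm is requested explicitly with PASSWORD_BCRYPT, the options array accepts a "cost" value, and each increase of one roughly doubles the work. The cost of 12 used here is an arbitrary choice for illustration.

```php
<?php
// Assumption: bcrypt is requested explicitly so that the "cost" option applies.
$options = array("cost" => 12);
$hash = password_hash("frosties", PASSWORD_BCRYPT, $options);

// The cost is recorded in the hash, right after the algorithm identifier.
print substr($hash, 0, 7) . "\n"; // $2y$12$

// password_verify() reads the cost from the hash, so it needs no options.
var_dump(password_verify("frosties", $hash)); // bool(true)
var_dump(password_verify("Frosties", $hash)); // bool(false)
```

A higher cost makes each guess more expensive for an attacker, at the price of a slower login for your real users, so it is worth timing a few values on your own server before settling on one.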



Copyright ©2015 Paul Hudson. Follow me: @twostraws.