In this post, we’re going to discuss an important element in cybersecurity: cryptographic hash functions.
Before going in depth, consider a real-world example. When you log into a social network or your bank, hash functions help keep your password secure so that no one else can read it. They also help verify that data hasn’t been altered, and they are a building block of digital signatures.
These functions are everywhere in computer security with different goals, such as:
- checking data integrity
- keeping passwords safe
- making sure communication stays secure.
We’ve also seen how to use them to identify malware and retrieve some information from VirusTotal here.
In this article, we’ll look at how to use cryptographic hash functions in Python and why they’re so important.
If you’re curious about hash functions in Python, you’re in the right place. We’ll cover what a hash function is, how to use it in Python, and why it’s such an essential tool for keeping your data safe.
What Are Cryptographic Hash Functions?
What is a Hash Function with an Example?
A hash function takes an input and turns it into a fixed-size string of characters: the hash value. Imagine that you want to calculate the hash of “StackZero”. You pass that input to a hash function, which turns it into a code like “a592a…”.
A cryptographic hash function is a special type of hash function with additional security properties. Its hash value acts as a fingerprint of the input data: unique in practice, even though collisions are not mathematically impossible.
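To make the "fixed-size" part concrete, here is a minimal sketch. It assumes SHA-256 from Python's hashlib as the hash function (the paragraph above does not name one), and the sample inputs are arbitrary.

```python
import hashlib

# Inputs of very different lengths, chosen only for illustration.
inputs = ["a", "StackZero", "x" * 10_000]

digest_lengths = []
for text in inputs:
    digest = hashlib.sha256(text.encode()).hexdigest()
    digest_lengths.append(len(digest))
    # The hex digest has the same length whatever the input size.
    print(f"{len(text):>6} chars in -> {len(digest)} hex chars out")
```

Each line of output should report 64 hex characters, which is 256 bits.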
Collision resistance is one of the main and essential features we want in secure cryptographic hash functions.
This property means that the probability of finding two different inputs with the same hash value is negligible. It matters because, if two different inputs could produce the same hash, attackers could trick the system by swapping in bad data for legitimate data. For example, in digital signatures, if two different documents have the same hash, an attacker could replace a valid document with a fake one, and the system wouldn’t be able to tell the difference. You can think of it like fingerprints: just as no two people should have the same fingerprint, no two different pieces of data should end up with the same hash. This helps keep attackers from sneaking harmful data past the system.
Why Is It So Important?
As introduced above, cryptographic hash functions are used to ensure data integrity, store passwords safely, and create digital signatures. For example, if you hash a file and share its hash value, someone else can apply the same function to the file they received. If the two hashes match, it is extremely unlikely that the file has been altered. This helps detect whether sensitive information was modified during a transfer.
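The file check described above could be sketched as follows. The helper name `sha256_of_file`, the chunk size, and the temporary demo file are assumptions made for this illustration, not part of any specific tool.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Reading in chunks keeps memory use low for large files.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Create a small throwaway file to play the role of the shared file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"important data")
    demo_path = tmp.name

published_hash = sha256_of_file(demo_path)  # what the sender would share

# The receiver recomputes the hash and compares it.
unchanged = sha256_of_file(demo_path) == published_hash

# Simulate tampering by appending a single byte.
with open(demo_path, "ab") as f:
    f.write(b"!")
tampering_detected = sha256_of_file(demo_path) != published_hash

os.remove(demo_path)
print(f"Unchanged file matches: {unchanged}")
print(f"Tampering detected:     {tampering_detected}")
```

Note that the published hash only helps if it reaches the receiver through a channel the attacker cannot also modify.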
These hash functions are also a standard tool for keeping user data private and adding another layer of security to our applications. One of their most practical uses is password storage: because they turn sensitive information into a random-looking sequence that can’t easily be reversed, even if someone breaks into the database it is extremely hard to recover the original passwords.
What are the 4 Properties of a Cryptographic Hash Function?
- Deterministic: The same input will always create the same hash value. In this way, we can check if the data is corrupted or has been changed.
- Fast Computation: It should be quick to calculate a hash so systems can run smoothly. Since we sometimes need to hash large volumes of data, a slow function would slow the system down too much.
- Irreversible: You shouldn’t be able to figure out the original input just from the hash value. This makes it much harder for someone to steal sensitive information.
- Collision Resistant: The output should be practically unique, so that finding two different inputs with the same output is extremely hard. This makes an attacker’s life difficult, because it is not easy to create a malicious piece of data with the same hash as the original and trick the system.
You have probably heard of cryptographic hash functions like MD5, SHA-1, and SHA-256. Even though they are still in use on many systems, MD5 and SHA-1 are no longer considered safe, because creating collisions for them has become practical.
What Is N Hash in Cryptography?
In cryptography, N hash means hashing data multiple times. This is often used for password protection, because it makes things slower and more expensive for hackers trying to guess passwords, making their attacks less effective.
Imagine a brute-force attack in which a malicious actor has to try millions of passwords and compute the hash many times for each attempt.
As a practical example, Python’s hashlib library has a function called pbkdf2_hmac (Password-Based Key Derivation Function 2 with HMAC) that does exactly this. The more times you hash the value, the more expensive it is for attackers to guess the original, especially when a salt is also used.
Algorithms like Bcrypt and Scrypt rely on the same idea of deliberately expensive hashing to make passwords harder to crack.
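To get a feel for the cost of repeated hashing, here is a naive sketch that simply feeds each SHA-256 digest back into SHA-256. The function `iterated_sha256` is a hypothetical helper written for this illustration only. It is not how PBKDF2 works internally, and real password storage should use a dedicated function such as pbkdf2_hmac.

```python
import hashlib
import time

def iterated_sha256(data: bytes, rounds: int) -> bytes:
    """Hash `data`, then hash the result again, `rounds` times in total."""
    digest = data
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest

# Time a few round counts to see how the cost grows.
for rounds in (1, 1_000, 100_000):
    start = time.perf_counter()
    iterated_sha256(b"StackZero", rounds)
    elapsed = time.perf_counter() - start
    print(f"{rounds:>7} rounds: {elapsed:.4f} s")
```

The time per guess grows roughly linearly with the number of rounds. A legitimate user pays that cost once per login, while an attacker pays it for every guessed password.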
Example: How to Use PBKDF2 in Python for N Hash
The PBKDF2 function is commonly used to hash data multiple times to add an extra layer of security. Here is a code example that uses Python’s hashlib library to create a secure hash with PBKDF2:
import hashlib
import os
# Example: Creating a PBKDF2 hash
input_string = "StackZero"
salt = os.urandom(16)
hash_object = hashlib.pbkdf2_hmac('sha256', input_string.encode(), salt, 100000)
hash_hex = hash_object.hex()
print(f"PBKDF2 Hash: {hash_hex}")
The output changes on every run, because a new random salt is generated each time.
In this example, the password is hashed 100,000 times, making it much harder for attackers to brute-force the original value.
Hash Functions vs. Cryptographic Hash Functions
The only goal of a regular hash function is to produce a fixed-size value from input data, which makes it good for quick lookups such as finding duplicates. Python’s built-in hash() function is one example. These functions are not designed to be secure, and their output can be easy to reverse or to collide on purpose.
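A short sketch of the built-in hash() in action. The behaviour shown for small integers is specific to CPython, and the list of items is made up for the example.

```python
# In CPython, a small non-negative integer hashes to itself,
# so the "hash" reveals the input directly.
small_number = 42
print(hash(small_number) == small_number)

# Equal values hash equally within one process, which is what dicts and
# sets rely on. String hashes are randomized per process by default,
# so the actual number differs between runs.
print(hash("StackZero") == hash("StackZero"))

# Typical non-security use: finding duplicates quickly with a set.
items = ["a", "b", "a", "c", "b"]
seen, duplicates = set(), set()
for item in items:
    if item in seen:
        duplicates.add(item)
    seen.add(item)
print(sorted(duplicates))  # ['a', 'b']
```

This is fine for lookups, but nothing here is meant to resist an attacker.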
On the other hand, cryptographic hash functions are designed to be secure thanks to the properties we mentioned before.
Remember also that, in this kind of function, even the smallest change in the input causes a huge change in the hash value. This is called the “avalanche effect”, and it makes it almost impossible to figure out any relationship between the input and the output.
Cryptographic hash functions also protect against different types of attacks, like pre-image attacks (reversing the hash) and birthday attacks (finding two inputs with the same hash).
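To see why output size matters for birthday attacks, the sketch below deliberately weakens SHA-256 by keeping only its first 4 hex characters (16 bits). `tiny_hash` and `find_collision` are hypothetical helpers for this demo. Full SHA-256 is not affected by a search like this.

```python
import hashlib
from itertools import count

def tiny_hash(data: bytes, hex_chars: int = 4) -> str:
    """Deliberately weak hash: only the first `hex_chars` of SHA-256."""
    return hashlib.sha256(data).hexdigest()[:hex_chars]

def find_collision(hex_chars: int = 4):
    """Try distinct messages until two of them share the same tiny hash."""
    seen = {}
    for i in count():
        msg = f"message-{i}".encode()
        h = tiny_hash(msg, hex_chars)
        if h in seen:
            return seen[h], msg, h, i + 1
        seen[h] = msg

first, second, shared, attempts = find_collision()
print(f"{first!r} and {second!r} both hash to {shared} "
      f"(found after {attempts} attempts)")
```

With only 65,536 possible outputs, a collision usually turns up after a few hundred attempts. With 256 bits of output, the same search is computationally out of reach.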
Why Are Hash Functions Used in Encryption?
What is SHA in Cryptography?
SHA is the acronym for Secure Hash Algorithm, which indicates a family of cryptographic hash functions. Some popular algorithms that belong to this family are:
- SHA-1
- SHA-256
- SHA-512
Hash functions are great for making sure data hasn’t been changed. They work with other security tools to create safe communication, digital signatures, and password storage.
For example, when storing passwords, it’s much better to hash the password before saving it to a database. This gives us another level of protection in case of unauthorized access to the database, because the attacker cannot read the passwords in plaintext. Hash functions are also used to derive the encryption keys that protect data in transit.
Another popular use is in digital certificates, which let us confirm that websites and software are genuine and safe to use. The hash creates a unique fingerprint for the website or software, making it clear if something has been tampered with.
Cryptographic Hash Function of Bitcoin
What is SHA-256 Cryptographic Hash Function?
SHA-256 is a well-known cryptographic hash function belonging to the SHA-2 family. Its name derives from its output, which is a 256-bit hash that we usually see as a 64-character hexadecimal string.
SHA-256 is considered secure and is widely used in fields with high security requirements, such as blockchain, password storage, and secure communication.
For example, Bitcoin makes heavy use of this cryptographic hash function. So-called miners solve a computationally hard puzzle by hashing data, and in doing so add new transactions to the blockchain. Each block contains the hash of the previous one, which links all of them together. This ensures data consistency: if someone wanted to change any part of the blockchain, they would need to redo every block that came after it, which would require enormous computing power.
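The linking of blocks through hashes can be illustrated with a toy chain. This is only a sketch: the block layout, the helper names, and the sample records are invented for the example and do not reflect Bitcoin's real block format.

```python
import hashlib

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(prev_hash: str, data: str) -> str:
    """Hash a block's contents together with the previous block's hash."""
    return hashlib.sha256((prev_hash + data).encode()).hexdigest()

def build_chain(records):
    chain, prev = [], GENESIS_PREV
    for data in records:
        h = block_hash(prev, data)
        chain.append({"prev_hash": prev, "data": data, "hash": h})
        prev = h
    return chain

def is_valid(chain) -> bool:
    """Recompute every hash and check that each block points to the previous one."""
    prev = GENESIS_PREV
    for block in chain:
        if block["prev_hash"] != prev:
            return False
        if block["hash"] != block_hash(prev, block["data"]):
            return False
        prev = block["hash"]
    return True

chain = build_chain(["Alice pays Bob 1", "Bob pays Carol 2", "Carol pays Dave 3"])
print(is_valid(chain))  # True

# Tamper with the first block without recomputing anything.
chain[0]["data"] = "Alice pays Bob 100"
print(is_valid(chain))  # False
```

To hide the change, an attacker would have to recompute the hash of the modified block and of every block after it.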
Mining involves finding a hash that meets certain rules, a scheme called proof of work. Miners must solve a very hard problem that takes a lot of computational power, which gives us confidence that nobody can change the blockchain without redoing all of that work.
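A toy version of proof of work might look like the sketch below. The rule used here (the hex digest must start with a given number of zeros) and the `mine` helper are simplifying assumptions. Bitcoin's real rule compares the hash against a numeric target.

```python
import hashlib
from itertools import count

def mine(data: str, difficulty: int = 4):
    """Search for a nonce so that SHA-256(data + nonce) starts with
    `difficulty` zero hex digits."""
    target = "0" * difficulty
    for nonce in count():
        digest = hashlib.sha256(f"{data}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce, digest

block_data = "block with some transactions"  # made-up payload
nonce, digest = mine(block_data, difficulty=4)
print(f"nonce={nonce} hash={digest}")

# Finding the nonce took many attempts; checking it takes a single hash.
check = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
print(check == digest)
```

Each extra zero digit multiplies the expected number of attempts by 16, which is how the difficulty can be tuned.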
This creates another desirable property: transparency. Everyone can use SHA-256 to hash a block and compare it with the expected value to verify its integrity.
Using Hash Functions in Python
Example: Creating a Hash with SHA-256
Python’s hashlib library makes it simple to create cryptographic hashes. Here’s how to create a SHA-256 hash:
import hashlib
input_string = "StackZero"
hash_object = hashlib.sha256(input_string.encode())
hash_hex = hash_object.hexdigest()
print(f"SHA-256 Hash: {hash_hex}")
In the above example, the string “StackZero” is hashed using SHA-256 to create a unique hash value. And this is the output when we run it:
SHA-256 Hash: 8a02bb5b0a6bf4bfdcb17ab8c6403a85e5d4591a97edf3b7cc49c684f01b4dd8
Even a small change in the input will result in a completely different hash, illustrating the avalanche effect. We can test it by adding, for example, an exclamation mark to our string, which becomes “StackZero!”. Now the output will be:
SHA-256 Hash: e361c50cfcd075726095f375520a9b76021ff4ebc06153780ba37d078528be5a
By comparing those two outputs, you can easily see what the avalanche effect is.
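The avalanche effect can also be measured rather than eyeballed. The sketch below counts how many of the 256 output bits differ between the hashes of “StackZero” and “StackZero!”. The helper `differing_bits` is introduced only for this illustration.

```python
import hashlib

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bits that differ between SHA-256(a) and SHA-256(b)."""
    ha = int.from_bytes(hashlib.sha256(a).digest(), "big")
    hb = int.from_bytes(hashlib.sha256(b).digest(), "big")
    # XOR leaves a 1 wherever the two digests disagree.
    return bin(ha ^ hb).count("1")

diff = differing_bits(b"StackZero", b"StackZero!")
print(f"{diff} of 256 bits differ")
```

For a good hash function, you would expect roughly half of the bits to flip, even for a one-character change.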
Common Hash Functions in Python
Now that we have a clearer idea of what cryptographic hash functions are, we can see how to use hashlib to hash our data in Python.
Examples of Common Hash Functions
Here are some examples of common hash functions supported by Python’s hashlib library:
- MD5: This hash function is no longer secure but can still be used for non-critical purposes like checksums.
hash_md5 = hashlib.md5(input_string.encode()).hexdigest()
print(f"MD5 Hash: {hash_md5}")
- SHA-1: More secure than MD5 but still has known vulnerabilities and should not be used for sensitive applications.
hash_sha1 = hashlib.sha1(input_string.encode()).hexdigest()
print(f"SHA-1 Hash: {hash_sha1}")
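- SHA-256: Strong and secure, recommended for critical applications such as integrity checks and digital signatures. For password storage, use it inside a key-derivation function like PBKDF2 rather than on its own.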
hash_sha256 = hashlib.sha256(input_string.encode()).hexdigest()
print(f"SHA-256 Hash: {hash_sha256}")
- SHA-512: Produces a longer, 512-bit hash than SHA-256, providing an additional security margin.
hash_sha512 = hashlib.sha512(input_string.encode()).hexdigest()
print(f"SHA-512 Hash: {hash_sha512}")
These examples show how to generate different hash values for the same input using various algorithms.
If you need even more security, you can use SHA-512, which creates a longer hash. Depending on how much security you need, you can pick one of these hash functions.
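To compare the algorithms side by side, the following sketch uses hashlib.new() to select each one by name and prints its output size. It assumes all four algorithms are available in your Python build, which is normally the case, although some restricted builds disable MD5.

```python
import hashlib

input_string = "StackZero"
sizes = {}

for name in ("md5", "sha1", "sha256", "sha512"):
    # hashlib.new() picks the algorithm by name at runtime.
    h = hashlib.new(name, input_string.encode())
    sizes[name] = h.digest_size * 8  # digest_size is in bytes
    print(f"{name:<7} {sizes[name]:>3} bits  {h.hexdigest()[:16]}...")
```

The output sizes are 128, 160, 256, and 512 bits respectively. A longer output makes collisions harder to find by brute force, but it does not fix design weaknesses such as those in MD5 and SHA-1.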
Practical Use Cases
Cryptographic hash functions are used in many real-world applications and across a large number of industries to maintain the integrity and authenticity of data. Here are some examples:
Password Storage: Passwords should be hashed before they are saved, and adding a random salt makes the hashes even harder to crack. When a user creates an account, their password is hashed and stored. During login, the submitted password is hashed again and compared to the stored hash.
Data Integrity: Hashing helps make sure files haven’t been tampered with. This guarantees the integrity of sensitive information and helps maintain trust in data accuracy. When you download software, you can hash the file and compare it to the hash from the source to make sure it hasn’t been changed by a hacker.
Digital Signatures: Digital signatures help verify that a document or message is authentic and has not been modified. They combine hashing with asymmetric encryption to guarantee authenticity as well: the hash of the message is signed with the sender’s private key, and the recipient uses the sender’s public key to verify the signature.
Blockchain Technology: In blockchains, hashes link blocks together, making it almost impossible to change past transactions because you would have to change all the later blocks too. As a result, the blockchain is very hard to tamper with unnoticed, providing transparency and trust within the network.
Checksums for File Transfer: Hash functions are used to create checksums that verify files were transferred without errors or tampering. To do that, you only need to hash the file before and after the transfer and compare the two hashes.
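The password storage flow from the list above (hash at sign-up, hash again and compare at login) could be sketched like this. The function names are hypothetical, the iteration count simply mirrors the one used in this post's examples, and a real application would also need to persist the salt and hash in its database.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # same count as the other examples in this post

def hash_password(password: str):
    """Sign-up: generate a fresh salt and derive the hash to store."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Login: re-derive the hash with the stored salt and compare."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(candidate, stored)

salt, stored = hash_password("my_secure_password")
print(verify_password("my_secure_password", salt, stored))  # True
print(verify_password("wrong_password", salt, stored))      # False
```

The plaintext password is never stored. Only the salt and the derived hash need to be kept.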
Salting Hashes for Added Security
Example: How to Use Salt in Hashing for Extra Security
Adding a salt to hashes makes them more secure, especially for passwords. Here is a Python example that uses a salt to create a more secure hash:
import os
import hashlib
# Generate a random salt
salt = os.urandom(16)
input_string = "my_secure_password"
# Hash the input string with the salt using PBKDF2
hash_object = hashlib.pbkdf2_hmac('sha256', input_string.encode(), salt, 100000)
hash_hex = hash_object.hex()
print(f"Salted Hash: {hash_hex}")
In this example, the salt is generated using os.urandom(), and then the password is hashed along with the salt using PBKDF2 with 100,000 iterations. This process makes the hash significantly harder to crack.
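To see what the salt actually changes, the sketch below hashes the same password with two different random salts and then again with the first salt. The `derive` helper is just a shorthand introduced for this example.

```python
import hashlib
import os

def derive(password: bytes, salt: bytes) -> str:
    """PBKDF2-HMAC-SHA256 with 100,000 iterations, returned as hex."""
    return hashlib.pbkdf2_hmac("sha256", password, salt, 100000).hex()

password = b"my_secure_password"
salt_a = os.urandom(16)
salt_b = os.urandom(16)

hash_a = derive(password, salt_a)
hash_b = derive(password, salt_b)

# Same password, different salts: the stored hashes do not match.
print(hash_a != hash_b)

# Same password, same salt: the hash is reproducible, so login checks still work.
print(hash_a == derive(password, salt_a))
```

Two users who choose the same password therefore end up with different stored hashes, as long as each one gets their own salt.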
The salt can be stored in the clear. Its purpose is to prevent the pre-computation of hashes: even weak passwords won’t be found in rainbow tables, because combining them with the salt makes their hash completely different (avalanche effect). We discuss a related topic in more depth in the next section.
Why Can’t a Hash be Reversed?
Hashes are designed to be one-way, so you cannot reverse them. This is because the hashing process involves complex functions that transform an input of any size into a fixed-size string, which loses information along the way.
Consider a simple example of a hash function: taking the modulo of a number. If we apply modulo 7 to a given input, say 12, the result is 5.
Now, if we input 19, the result is also 5.
However, once we have just the output 5, it’s impossible to determine whether the original input was 12, 19, or any other number that would give the same remainder when divided by 7.
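The modulo example above translates directly into a few lines of Python. `toy_hash` is of course not a cryptographic hash. It only illustrates how information is lost.

```python
def toy_hash(n: int) -> int:
    """A toy 'hash': the remainder of n divided by 7."""
    return n % 7

print(toy_hash(12), toy_hash(19))  # 5 5

# Every input below 50 that produces the output 5.
preimages = [n for n in range(50) if toy_hash(n) == 5]
print(preimages)  # [5, 12, 19, 26, 33, 40, 47]
```

Given only the output 5, there is no way to tell which of these inputs (or the infinitely many larger ones) was the original.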
Conclusion
We have explored the importance of cryptographic hash functions for keeping data safe, making sure it hasn’t been changed, and protecting passwords. Then we tried out Python’s hashlib with some examples.
We saw why we must use strong hash functions like SHA-256, and why we should remember to add a salt when storing passwords.
Now that you’ve learned the basics, I suggest you try using hash functions in your projects as we did in our examples. Before that, experiment with different inputs and salts in the example code and see how the hash changes. After that, you will be ready to start implementing this security mechanism in your next applications. Happy coding and keep following StackZero!