Understanding Different Types of Hash Functions in Python

1. Division Method

The division method involves taking the key, dividing it by a prime number, and using the remainder as the hash value. The prime number reduces collisions.

Formula:
h(k) = k mod m
Where k is the key and m is a prime number.

Python Example:

def division_hash(key, m):
    return key % m

keys = [123, 456, 789, 1011, 1213]
m = 7  # A prime number
hash_values = [division_hash(key, m) for key in keys]
print(hash_values)

2. Multiplication Method

The multiplication method involves multiplying the key by a constant A (0 < A < 1), extracting the fractional part, and then multiplying by m (size of the hash table) to get the hash value.

Formula:
h(k) = ⌊ m(kA mod 1 ⌋
Where A is a constant (often chosen as the inverse of the golden ratio ≈ 0.6180339887.

Python Example:

def multiplication_hash(key, m, A=0.6180339887):
    return int(m * ((key * A) % 1))

keys = [123, 456, 789, 1011, 1213]
m = 10  # Size of the hash table
hash_values = [multiplication_hash(key, m) for key in keys]
print(hash_values)

3. Mid-Square Method

The mid-square method involves squaring the key, extracting a portion from the middle of the resulting number, and using that as the hash value.

Steps:

Square the key.
Extract the middle digits.

Python Example:

def mid_square_hash(key, m):
    square = key * key
    square_str = str(square)
    mid_start = len(square_str) // 2 - len(str(m)) // 2
    mid_end = mid_start + len(str(m))
    mid_square = int(square_str[mid_start:mid_end])
    return mid_square % m

keys = [123, 456, 789, 1011, 1213]
m = 100  # Size of the hash table
hash_values = [mid_square_hash(key, m) for key in keys]
print(hash_values)

4. Folding Method

The folding method involves dividing the key into equal-sized parts, summing these parts, and using the sum modulo the size of the hash table as the hash value.

Steps:

Divide the key into parts.
Sum the parts.
Take the sum modulo the size of the hash table.

Python Example:

def folding_hash(key, m, part_size=2):
    key_str = str(key)
    parts = [int(key_str[i:i+part_size]) for i in range(0, len(key_str), part_size)]
    return sum(parts) % m

keys = [123456, 789101, 112131, 415161, 718192]
m = 100  # Size of the hash table
hash_values = [folding_hash(key, m) for key in keys]
print(hash_values)

5. Cryptographic Hash Functions

Cryptographic hash functions are special types of hash functions that are designed to be secure. They take an input (or "message") and return a fixed-size string of bytes. The output is typically a "digest" that represents the data uniquely. Some common cryptographic hash functions include MD5, SHA-1, and SHA-256.

Key Properties of Cryptographic Hash Functions

Deterministic: The same input will always produce the same output.
Quick computation: The hash value is computed quickly.
Preimage resistance: It should be infeasible to generate the original input given the hash output.
Small changes in Input Produce large changes in output: Even a tiny change in the input should produce a completely different hash.
Collision resistance: It should be infeasible to find two different inputs that produce the same hash output.

We'll look at some Python examples using the hashlib library, which provides cryptographic hash functions.

Python Examples Using `hashlib`

MD5

MD5 produces a 128-bit hash value. It is widely used but is considered broken and unsuitable for further use due to vulnerabilities.

import hashlib

def md5_hash(data):
    hash_object = hashlib.md5(data.encode())
    return hash_object.hexdigest()

data = "Hello, World!"
md5_digest = md5_hash(data)
print(f"MD5: {md5_digest}")

SHA-1

SHA-1 produces a 160-bit hash value. It is more secure than MD5 but has known vulnerabilities and is not recommended for security-critical applications.

import hashlib

def sha1_hash(data):
    hash_object = hashlib.sha1(data.encode())
    return hash_object.hexdigest()

data = "Hello, World!"
sha1_digest = sha1_hash(data)
print(f"SHA-1: {sha1_digest}")

SHA-256

SHA-256 is part of the SHA-2 family and produces a 256-bit hash value. It is currently considered secure.

import hashlib

def sha256_hash(data):
    hash_object = hashlib.sha256(data.encode())
    return hash_object.hexdigest()

data = "Hello, World!"
sha256_digest = sha256_hash(data)
print(f"SHA-256: {sha256_digest}")

Example of Comparing Hashes

Let's look at how cryptographic hash functions work by computing and comparing the hashes of different strings.

import hashlib

def hash_data(data, algorithm='sha256'):
    if algorithm == 'md5':
        hash_object = hashlib.md5(data.encode())
    elif algorithm == 'sha1':
        hash_object = hashlib.sha1(data.encode())
    elif algorithm == 'sha256':
        hash_object = hashlib.sha256(data.encode())
    else:
        raise ValueError("Unsupported algorithm")
    return hash_object.hexdigest()

# Example usage
data1 = "Hello, World!"
data2 = "Hello, world!"  # Only 'w' is lowercase

algorithms = ['md5', 'sha1', 'sha256']
for algo in algorithms:
    hash1 = hash_data(data1, algo)
    hash2 = hash_data(data2, algo)
    print(f"{algo.upper()} of '{data1}': {hash1}")
    print(f"{algo.upper()} of '{data2}': {hash2}")
    print(f"Are the hashes equal? {'Yes' if hash1 == hash2 else 'No'}\n")

Example of Comparing Hashes

MD5 of 'Hello, World!': ed076287532e86365e841e92bfc50d8c
MD5 of 'Hello, world!': fc3ff98e8c6a0d3087d515c0473f8677
Are the hashes equal? No

SHA-1 of 'Hello, World!': 2ef7bde608ce5404e97d5f042f95f89f1c232871
SHA-1 of 'Hello, world!': 7b502c3a1f48c8609ae212cdfb639dee39673f5e
Are the hashes equal? No

SHA-256 of 'Hello, World!': c0535e4be2b79ffd93291305436bf889314e4a3faec05ecffcbb7df31e82e06b
SHA-256 of 'Hello, world!': 64ec88ca00b268e5ba1a35678a1b5316d212f4f36631e9b76e961ac73d52b620
Are the hashes equal? No

Even a small change in the input text ("Hello, World!" vs. "Hello, world!") results in a different hash value, showing how sensitive cryptographic hash functions are to input changes.

6. Universal hashing

Universal hashing is a method for creating hash functions that reduce the change of collisions. It works by randomly selecting a hash function from a set of functions. This makes the functions harder to predict, which improves security and performance in many applications.

Properties of Universal Hashing

Random selection: A hash function is chosen randomly from a family of hash functions.
Low chance of a collision: The chance of two different keys colliding (i.e., hash to the same value) is low.
Hash Function Family: The family must be large and diverse to ensure good distribution and low collisions.

Example: Carter-Wegman Universal Hashing

One of the simplest forms of universal hashing was introduced by Carter and Wegman. It uses a prime number and two randomly chosen constants.

Formula:
h_a,b(x) = ((ax + b) mod p mod m)

Where:

p is a prime number larger than the maximum possible key value.
a and b are randomly chosen integers such that 1 ≤ a < p and 0 ≤ b < p.
m is the size of the hash table.

Python Implementation

Let's implement a universal hashing function and use it to hash some keys.

import random

class UniversalHash:
    def __init__(self, p, m):
        self.p = p  # A prime number
        self.m = m  # Size of the hash table
        self.a = random.randint(1, p - 1)
        self.b = random.randint(0, p - 1)

    def hash(self, key):
        return ((self.a * key + self.b) % self.p) % self.m

keys = [123, 456, 789, 1011, 1213]
p = 104729  # A large prime number
m = 10  # Size of the hash table

universal_hash = UniversalHash(p, m)
hash_values = [universal_hash.hash(key) for key in keys]
print("Keys:", keys)
print("Hash values:", hash_values)

Example Output

Keys: [123, 456, 789, 1011, 1213]
Hash values: [2, 7, 4, 6, 9]  # Note: The values will vary because of the random a and b

Explanation

Initialization: The UniversalHash class is initialized with a prime number p and the hash table size m. It randomly chooses a and b.
Hash Function: The hash method computes the hash value using the formula: h_a,b(x) = ((ax + b) mod p) mod m.

Advantages of Universal Hashing

Reduced collisions: By randomly choosing a and b, the probability of collisions is minimized.
Security: It is hard for an adversary to predict the hash function, making it more secure against certain types of attacks.
Flexibility: Can be tailored to different applications by choosing appropriate values for p, a, and b.

Applications of Universal Hashing

Hash tables: Provides a more uniform distribution of keys.
Cryptography: Used in schemes where the unpredictability of the hash function is critical.
Randomized algorithms: Provides performance guarantees by reducing worst-case behavior.

Universal hashing is a powerful technique that uses randomness to achieve better performance and security, making it widely used in computer science and cryptography.

Salt

In the context of cryptography, a "salt" is a random value added to the input of a hash function to ensure that the output (hash value) is unique, even for identical inputs. Salting is primarily used to enhance security, especially when dealing with passwords, by preventing attacks such as rainbow table attacks.

Why Use Salt?

Unique hashes: Adding a unique salt to each input ensures that identical inputs produce different hash outputs.
Rainbow table protection: Rainbow tables are pre-computed tables for reversing cryptographic hash functions. Salts make these tables impractical by requiring a unique table for each possible salt value.
Mitigation of precomputed attacks: Without a salt, attackers can use precomputed hash values to quickly crack passwords. Salting forces attackers to compute the hash individually for each password attempt.

How Salt is Used?

When hashing a password, the salt is generated and then concatenated with the password before hashing. The salt must be stored alongside the hash so it can be used during the verification process.

Example in Python

Here is an example of how to use salt in password hashing using the hashlib library:

import hashlib
import os

def hash_password(password, salt=None):
    # Generate a new salt if not provided
    if salt is None:
        salt = os.urandom(16)  # 16 bytes of random data
    
    # Concatenate the salt and the password
    salted_password = salt + password.encode()
    
    # Hash the concatenated salt and password
    hash_object = hashlib.sha256(salted_password)
    hash_value = hash_object.hexdigest()
    
    return salt, hash_value

def verify_password(stored_hash, stored_salt, password_to_check):
    # Rehash the provided password with the stored salt
    _, hash_value = hash_password(password_to_check, stored_salt)
    
    # Compare the newly computed hash with the stored hash
    return stored_hash == hash_value

password = "my_secure_password"
stored_salt, stored_hash = hash_password(password)

print("Salt:", stored_salt.hex())
print("Hash:", stored_hash)

# Verify the password
password_check = "my_secure_password"
is_valid = verify_password(stored_hash, stored_salt, password_check)
print("Is the password valid?", is_valid)

Explanation

Salt generation: The os.urandom function generates a random salt.
Concatenation: The salt is concatenated with the password before hashing.
Hashing: The concatenated string is hashed using SHA-256.
Verification: During verification, the stored salt is reused to hash the provided password and the resulting hash is compared to the stored hash.

Benefits of Using Salt

Prevents identical hashes: Even if two users have the same password, their hashes will be different due to different salts.
Improves security: Makes pre-computed attacks impossible.
Password hashing best practices: Salted hashes are a fundamental aspect of secure password storage.

Using salt is an essential practice in cryptography to ensure the security and uniqueness of hash values, especially for sensitive data like passwords.

14 Jul 2024 NimoDB Data Structures & Algorithms

Hash functions Python hash functions Cryptographic hash functions Division method hashing Multiplication method hashing Mid-square hashing Folding method hashing Universal hashing Perfect hashing Salt in cryptography Password security Hashing algorithms SHA-256 hashing MD5 hashing Two-level hashing Cryptography techniques Random hashing techniques Secure hashing methods

NS

Nima Saadati

Python Back-End Developer

Recent Project

Posts