3 minutes
Leetcode July: Hamming Distance
We have a great, if simple, problem to work on today. The problem presents us with two decimal integers and asks us to return the Hamming distance between them, which to put simply, is a count of comparative difference in binary representation of the two numbers. Hamming codes and Hamming distance are essential concepts in computer science, specifically in error detection encoding, and this problem is a great introduction to thinking about data in terms of binary encoding and manipulation. Here’s an example:
$x = 100$, $y = 200$
$x = [01100100]$
$y = [11001000]$
$d = [10101100]$
Hamming distance = 4.
Pretty clear, right? I have two goto algorithms for this problem. The first is a naive algorithm that doesn’t require much consideration of the inputs’ binary representation other than understanding its relation to factors of 2. Each iteration compares the rightmost bit of each integer, using the modulo operator, and then right shifting each number, by dividing by 2.
int hammingDistance(int x, int y) {
int ham = 0;
while(x || y) {
if(x%2 != y%2) ham++;
x /= 2;
y /= 2;
}
return ham;
}
That one’s easy to remember, but fails to take full advantage of the underlying binary representation of the inputs. The classic algorithm to take advantage of the binary represenation for this problem is Kernighan’s bit counting algorithm. This algorithm works by taking the xor–exclusive or–of the two inputs and then selectively counting the rightmost 1-bit of the resulting xor. This process of selection and elimination is done by a clever bit of bit manipulation wherein 1 is subtracted from the xor, turning all 0-bits to the right of the rightmost 1-bit to 1-bits, and then discarding those 1-bits by taking the AND product with the original xor. It’s a little easier to visualize:
$x = 100$, $y = 200$
$x = [01100100]$
$y = [11001000]$
$xyXor = [10101100]$
$xyXor - 1 = [10101011]$
$xyXor\; \& \;(xyXor - 1) = [10101000]$
$distance = 1$
$xyXor - 1 = [10100111]$
$xyXor\; \& \;(xyXor - 1) = [10100000]$
$distance = 2$
$xyXor - 1 = [10011111]$
$xyXor\; \& \;(xyXor - 1) = [10000000]$
$distance = 3$
$xyXor - 1 = [01111111]$
$xyXor\; \& \;(xyXor - 1) = [00000000]$
$distance = 4$
Hamming distance = 4.
And here’s the corresponding code:
int hammingDistance(int x, int y) {
int ham = 0;
int xyXor = x xor y;
while(xyXor) {
xyXor &= (xyXor - 1);
ham++;
}
return ham;
}
What a cool algorithm! While both of these algorithms have a constant asymptotic complexity, bound by the size of the input–here a 32-bit integer–not to be confused with the input value, the Kernighan algorithm will always have an as good or better average runtime because it only counts the 1-bits. That was a lot of fun. Check back tomorrow to see what interesting problem we get to work on!