This is the mail archive of the
mailing list for the GDB project.
Re: (patch) hpjyg09: bcache optimizations
>>>>> "Jim" == Jim Blandy <firstname.lastname@example.org> writes:
Jim> But this situation still sounds weird. The bcache is a very
Jim> simple thing --- bcache.h and bcache.c total 289 lines of code.
Jim> You give it a string of bytes, and it either adds a copy of your
Jim> string to its hash table, or finds an identical copy already
Jim> present. Either way, it hands you back a pointer to its stashed
Jim> copy. So, a bcache is helpful if you expect a lot of duplicates.
This brings back memories.
Many years ago, I discovered CVS had terrible hash behavior. Date and
Revision strings were used as keys, and each character (of which there
were only digits, '.', and '/') was weighted equally. I can't recall
exactly, but I think the original hash function resulted in ~10% bucket
utilization --- the remaining ~90% would not be used by any valid keys.
And even among the remaning ~10%, the distribution was non-uniform.
I changed it to one of the functions compared in the Red Dragon book's
discussion of hash functions and their suitability for use for hashing
identifiers (which tend to have a lot of duplication/redundency). This
change resulted in a uniform distribution across all buckets and a
cooresponding performance improvement.
For the longest time, I used to keep a histogram plot of the original
and new hash functions in a folder, and would show them to anyone who
would listen to the tale: switching from a nieve to a complex data
structure will not automatically result in a performance improvement.
So I agree with Jim, fixing bcache to use a better hash table is a
much better approach than microoptimizing the existing hash function
by making it an inline function or a macro. Many thanks for spending
the time and tracking down the root cause.
For reference, here is hash function I used:
hashpjw(const char* x)
unsigned int h = 0;
unsigned int g;
while (*x != 0)
h = (h << 4) + *x++;
if ((g = h & 0xf0000000) != 0)
h = (h ^ (g >> 24)) ^ g;
It will have to be modified slightly for use in bcache, but it may
result in acceptable behavior in this case as well.