I came across this piece of code in a C++ book (Sams - C++ Footprint and Performance Optimization).
It suggests that string comparison can be made faster by treating strings as integers.
I am a little confused about the reasoning behind it. Is it that doing so reduces the number of machine instructions needed, because each character no longer has to be converted first to an integer and then to bits, or is it something else?
Can anybody please elaborate?
inline int int_strcmp(char *s1, char *s2, int len)
{
    if ((len & 0x03) != 0)
        return(macro_strcmp(s1, s2));
    for (int i =0; *(int*)&s1[i] == *(int*)&s2[i];)
    {
        i+=4;
        if (i >= len)
            return(true);
    }
    return(false);
}
Article (for reference)
Another way to speed up string comparison is by treating the character arrays as integer arrays. On most systems an integer is four times larger than a character, and so four characters can be compared at the same time. When the strings have a length that is a multiple of the length of an integer, doing integer comparisons can greatly speed up your string comparison. Listing 10.1 shows what an integer string compare function can look like.
Listing 10.1 Integer String Comparison
inline int int_strcmp(char *s1, char *s2, int len)
{
    if ((len & 0x03) != 0)
        return(macro_strcmp(s1, s2));
    for (int i =0; *(int*)&s1[i] == *(int*)&s2[i];)
    {
        i+=4;
        if (i >= len)
            return(true); // match
    }
    return(false);
}
The int_strcmp function quickly checks whether the given string length is a multiple of four. If this is not the case, the previously discussed macro_strcmp function is called. For strings with a correct length, characters are compared in groups of four by casting the character pointers to integer pointers and thus reading integers from the string. The longer the compared strings are—and the more they look alike—the more benefits this int_strcmp function reaps compared to the previous implementations given in this chapter. For our example of finding Small Table articles in the dbase.txt file, this integer implementation is again faster (up to 50%). This is reflected in the results of the 10Source01.cpp program.
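To make sure I understand the four-bytes-at-a-time idea, here is a minimal sketch of how I read it. This is not the book's listing: the name `equal_by_words` is hypothetical, and I am assuming that copying each group into a `std::uint32_t` with `std::memcpy` is an acceptable substitute for the pointer casts, since that sidesteps any alignment or aliasing concerns. It also assumes `len` is a multiple of four, as the article requires.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper, not from the book.
// Returns true when the first len bytes of s1 and s2 are identical.
// Assumption: len is a multiple of 4.
inline bool equal_by_words(const char* s1, const char* s2, std::size_t len)
{
    for (std::size_t i = 0; i < len; i += 4) {
        std::uint32_t a, b;
        std::memcpy(&a, s1 + i, sizeof a);  // load four characters as one integer
        std::memcpy(&b, s2 + i, sizeof b);
        if (a != b)
            return false;                   // some byte in this group differs
    }
    return true;                            // every group matched
}
```

Each loop iteration performs one integer comparison where a character-by-character loop would perform four, which I take to be the saving the article is describing.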
The reason why the string lengths have to be a multiple of the integer size for this function to work is that any other size will cause this function to compare beyond the length of a string. A string with a length of six, for instance, will be compared in two loop iterations. The first iteration compares the first four bytes, which is no problem. The second iteration compares the second four bytes, of which only two are part of the actual string. The third byte will be a null indicating the end of the string. The fourth byte, however, is not part of the string. Two strings of six characters that are identical could be found to be different just because of the value of these fourth bytes. The int_strcmp function can of course be altered to check for different string sizes but it is not very likely that it will still be faster in most string compare cases.
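For the case the article mentions in its last sentence, I imagine an altered version could look roughly like the sketch below. Again, this is only my guess and not something from the book: `equal_any_len` is a hypothetical name, and it assumes the leftover one to three bytes are simply compared one at a time so that nothing past `len` is ever read. I have not measured whether it is any faster.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical variant, not from the book.
// Returns true when the first len bytes of s1 and s2 are identical,
// for any len, without reading past len.
inline bool equal_any_len(const char* s1, const char* s2, std::size_t len)
{
    std::size_t i = 0;

    // Compare the complete four-byte groups as integers.
    for (; i + 4 <= len; i += 4) {
        std::uint32_t a, b;
        std::memcpy(&a, s1 + i, sizeof a);
        std::memcpy(&b, s2 + i, sizeof b);
        if (a != b)
            return false;
    }

    // Compare the remaining 0 to 3 bytes individually.
    for (; i < len; ++i) {
        if (s1[i] != s2[i])
            return false;
    }
    return true;
}
```

With a length of six, the first loop handles bytes 0 to 3 and the second loop handles bytes 4 and 5, so the bytes after the string never take part in the comparison.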