Click here to Skip to main content
15,867,308 members
Articles / Programming Languages / ASM
Article

Optimizing String-Manipulation Functions

Rate me:
Please Sign up or sign in to vote.
3.54/5 (8 votes)
26 Dec 20045 min read 119.7K   404   15   33
Optimizing string-manipulation functions.

Abstract

VC++, GCC and other compilers for x86 use inefficient implementation of strncmp(). If this function call is part of an inner loop, your application may slow down dramatically. A set of preprocessor macros replaces this lame function, thus making your code smaller and faster.

What's the Problem?

Text search applications and interpreters typically have strncmp() in their inner loops. This sample code interprets "PRINT" and "GOTO" commands (of course, it's just a primitive example):

char buf[] = "PRINT xyz";
if(0 == strncmp(buf, "PRINT", 5)) {
    DoPrint(buf+5);
}
else if(0 == strncmp(buf, "GOTO", 4)) {
    DoGoto(buf+4);
}

Let's see how VC++ 6.0 implements this function (use your favorite disassembler to view the code):

ASM
strncmp proc ; int strncmp(const char *s1, const char *s2, size_t len)
    ; function prolog
    push ebp
    mov ebp, esp
    push edi
    push esi
    push ebx
    ; if (0 == len) goto exit
    mov ecx, dword ptr [ebp+10]
    jcxz exit

    ; search for terminating zero in s1
    mov ebx, ecx
    mov edi, dword ptr [ebp+08]
    mov esi, edi
    xor eax, eax
    repnz scasb
    ;ecx = the smaller of two numbers: len and strlen(s1)
    neg ecx
    add ecx, ebx

    ;compare ecx bytes from s1 and s2
    mov edi, esi
    mov esi, dword ptr [ebp+0C]
    repz cmpsb
    ;compare the first different character in both strings
    mov al, byte ptr [esi-01]
    xor ecx, ecx
    cmp al, byte ptr [edi-01]

    ja smaller_s1 ; s1 is less than s2, should return -1
    je exit ; s1 == s2, should return zero
    ; ecx = -2
    dec ecx
    dec ecx
smaller_s1:
    ; if s1 was less than s2, return ~0 = -1
    ; if s1 was greater than s2, return ~(-2) = 1
    not ecx

exit:
    ; epilog: return the value
    mov eax, ecx
    pop ebx
    pop esi
    pop edi
    leave
    ret
end proc


; Using strncmp
; if(0 == strncmp(buf, "PRINT", 5)) {
    push 5
    lea    ecx, DWORD PTR [esp+4] ; buf
    push OFFSET PRINT
    push ecx
    call _strncmp
    add    esp, 12
    test eax, eax
    jne    SHORT skip
; DoPrint(buf+5)
    ;some printing stuff here
skip:

Umm, tons of machine code from a simple line of your C program... Is there any easier way to compare strings? Sure, there is:

ASM
     cmp DWORD PTR [buf], 'NIRP'
     jnz skip
     cmp BYTE PTR [buf+4], 'T'
     jnz skip
; DoPrint
     ;printing
skip:

These four instructions do the same comparison as the long code above, but they work much faster and take up less space. We start by comparing the first four characters of the string with 'PRIN'. 32-bit processors such as Pentium or Athlon can read and compare 4 bytes very quickly. Just put one cmp instruction to do this.

But why 'NIRP' instead of 'PRIN'? Why are the character reversed? Remember, x86 is a little-endian processor: it reverses byte order when reading DWORDs from memory. Thus we need to reverse our string too. The processor will read reversed DWORD, compare it with reversed 'PRIN', and say if they are equal. If so, it will then compare fifth character with 'T'. If all conditions are met, DoPrint will be executed. In other cases, the processor will jump to the skip label.

You don't need to use assembler for this trick. The same can be done with C/C++:

if(('P' | ('R' << 8) | ('I' << 16) | ('N' << 24)) == *(long*)buf &&
    'T' == buf[4]) {
    DoPrint(buf+5);
}

Exactly the same nice and fast machine code will be generated. But such tricky code is hard to read and maintain. Let's make our work easier and write some macros.

Macros and Their Usage

#ifdef UNICODE
#define cmp2ch(s,a,b) (*(long*)(s) == \
    ((unsigned short)(a) | (unsigned short)(b) << 16))
#define cmp4ch(s,a,b,c,d) (*(long*)(s) == \
    ((unsigned short)(a) | (unsigned short)(b) << 16) && \
    *(long*)(s+2) == \
    ((unsigned short)(c) | (unsigned short)(d) << 16))
#define set2ch(s,a,b) (*(long*)(s) = \
    ((unsigned short)(a) | (unsigned short)(b) << 16));
#define set4ch(s,a,b,c,d) (*(long*)(s) = \
    ((unsigned short)(a) | (unsigned short)(b) << 16), \
    *(long*)(s+2) = \
    ((unsigned short)(c) | (unsigned short)(d) << 16));
#else
#define cmp2ch(s,a,b) (*(short*)(s) == \
    ((unsigned char)(a) | (unsigned char)(b) << 8))
#define cmp4ch(s,a,b,c,d) (*(long*)(s) == \
     ((unsigned char)(a) | (unsigned char)(b) << 8 |   \
     (unsigned char)(c) << 16 | (unsigned char)(d) << 24))
#define set2ch(s,a,b) (*(short*)(s) = \
    ((unsigned char)(a) | (unsigned char)(b) << 8));
#define set4ch(s,a,b,c,d) (*(long*)(s) = \
     ((unsigned char)(a) | (unsigned char)(b) << 8 |   \
     (unsigned char)(c) << 16 | (unsigned char)(d) << 24));
#endif

cmp2ch and cmp4ch macros compare 2 and 4 characters of a string. The following example shows how to use them:

if(cmp4ch(buf, 'P','R','I','N') && buf[4] == 'T')
    DoPrint(buf+5);

Another example:

if(cmp4ch(buf, 'Y','E','S','\0'))
    DoYes();

If the string is equal to "YES", DoYes() function is invoked.

Although macros seem to be quite complicated, they are compiled to very simple machine code. Note that all shifts and ORs are evaluated during compilation, so the resulting code consists of just a fast comparison (CMP and JNZ instructions).

Finally, let's talk about set2ch and set4ch. As their names say, these macros do fast assignment of 2 or 4 characters:

C++
set4ch(buf, 'P','R','I','N'); // Equivalent of strcpy(buf, "PRINT"),
set2ch(buf+4, 'T','\0');      // but much shorter and faster

The same technique can be applied to other functions, e.g., to replace strcat():

TCHAR filename[MAX_PATH];
// Get path from control
int len = SendMessage(hwnd, WM_GETTEXT, 0, (LPARAM)filename);
if(*(p + len) != '\\')
   *(p + len) = '\\', len++;
// Append file name
set4ch(p + len, 'f', 'i', 'l', 'e');
set4ch(p + len + 4, '.', 't', 'x', 't');
*(p + len + 8) = '\0';
// Open the file
HANDLE hFile = CreateFile(filename, GENERIC_READ, FILE_SHARE_READ, ...);

Portability and Localization

These macros were tested on various platforms and compilers including MS VC++ 6.0, LCC 3.8, MinGW and GCC 3.2 under Linux. The tests compiled and run successfully, producing perfect and fast code. I have been using cmpXch macros in my freeware applications for one year and had no problems with them.

When compiled with #define UNICODE, the macros perform comparison of wide characters. So there should be no difficulties in maintaining Unicode and ANSI builds of your application. Macros also can be easily ported to Win64. (If anybody can test them on 64-bit processor, please post a message below.)

Localization seems to be much harder than porting. MS VC++ can properly compile this line with Russian characters in Unicode:

set4ch(str, TEXT('Ш'), TEXT('и'), TEXT('л'), TEXT('о'));

But other compilers generate a wrong code! LCC 3.8 can't properly convert a literal ASCII character to Unicode (not to mention it was unable to optimize shifts). MinGW also failed to select right Russian character when UNICODE was defined.

However, macros work fine with English language (in both Unicode and ASCII modes) and with any language using 8-bit ASCII. If you are building BASIC interpreter or parsing HTML tags, this should be enough for you. When you need Unicode support for non-English characters, you have to use MS VC++ compiler.

Case-insensitive search

How to perform case-insensitive search using this technique? One of the possible solutions is to compare all possible combinations of 2 first characters (e.g., PR, pr, Pr, pR for "PRINT"). If this comparison returns true, let's use strcmpi, lstrcmpi or CharUpper to check out the rest of our string:

if(cmp2ch(buf, 'P', 'R') || cmp2ch(buf, 'P', 'r') ||
   cmp2ch(buf, 'p', 'R') || cmp2ch(buf, 'p', 'r')) {
        char tmp[5];
        *(long*)tmp = *(long*)buf; // Fast copy 4 chars from buf to tmp
        tmp[4] = buf[4];           // or: memcpy(tmp, buf, 5 * sizeof(TCHAR));
        CharUpperBuff(tmp, 5); // Convert 5 chars to uppercase
        if(cmp4ch(tmp, 'P', 'R', 'I', 'N') && tmp[4] == 'T')
            DoPrint(buf+5);    // Compare converted characters
}

Another variation (ugly, but faster):

if((cmp2ch(buf, 'P', 'R') || cmp2ch(buf, 'P', 'r') ||
    cmp2ch(buf, 'p', 'R') || cmp2ch(buf, 'p', 'r')) &&
   (cmp2ch(buf+2, 'I', 'N') || cmp2ch(buf+2, 'I', 'n') ||
    cmp2ch(buf+2, 'i', 'N') || cmp2ch(buf+2, 'i', 'n'))
    && (buf[4] == 'T' || buf[4] == 't'))
        DoPrint(buf+5);

If you need to compare only English letters, you can use a very fast trick. Note that each capital letter has the fifth bit cleared, and corresponding small letters have the same bit set:

0100 0001 — A0100 0010 — B0100 0011 — C0100 0100 — D
0110 0001 — a0110 0010 — b0110 0011 — c0110 0100 — d

Let's clear this bit by ANDing letters with 1101 1111 = 0xDF. Thus we get only capital letters and can compare them with a constant:

if(('P' | ('R' << 8) | ('I' << 16) | ('N' << 24)) == (*(long*)buf & 0xDFDFDFDF)
    && (buf[4] & 0xDF) == 'T') {
        DoPrint(buf+5);
    }

Although this code is extremely fast and small, it works only for English letters. Adding digits or brackets will introduce bugs. For example, "{" will be converted to "[" and this code won't operate correctly:

if(('[' | ('A' << 8)) == (*(short*)buf & 0xDFDF)) {
        ReportError(); // Will call ReportError() for "[a", "{a", "[A", "{A"
    }

This can be rewritten as follows:

if(('[' | ('A' << 8)) == (*(short*)buf & 0xDFFF)) {
        ReportError(buf+5); // Will call ReportError() for both "[a" and "[A"
    }

This code converts the second letter by ANDing it with DF, but leave the first letter unchanged.

Buffer Length Warning

You should always check that string comparison doesn't exceed available buffer length. Avoid such kind of code:

char buf[256], *p = buf;
GetBuf(&buf, 256); // Read 256 characters from anywhere
do {
    if(cmp2ch(p, '\r', '\n'))
        DoSomething(p);
} while(*p++);

Remember, you are comparing the current character and the next character by using cmp2ch. If there are only 256 characters in string, the last iteration will try to compare 256 and 257 characters with "\r\n". So, you should add one extra character when allocating the buffer:

char buf[257], *p = buf;
GetBuf(&buf, 256); // Read 256 characters from anywhere
...

Or, if you know the string length, make only (length - 1) iterations:

char *buf = (char*)malloc(len), *p = buf;
GetBuf(&buf, len); // Read len characters from anywhere
len--; // Don't compare the last character
while(len > 0) {
    if(cmp2ch(p, '\r', '\n'))
        DoSomething(p);
    len--, p++;
}
free(buf);

Future

Of course, these macros can't be considered an elegant way to compare strings. But C standard library and CString-like classes are even worse. strcmp() and other string manipulation routines proved to be very large and non-effective, and they can significantly slow down your inner loops.

The problem resides in C language itself. There is no built-in string type, so the compiler can't optimize string-manipulation code. I hope some compiler writer will read this article and will think of a language with fast string processing. It would be great if any compiler could apply these tricks automatically.

History of changes

12-24-2004:

  • The first version.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here


Written By
Software Developer
Czech Republic Czech Republic
Peter is the developer of Aba Search and Replace, a tool for replacing text in multiple files. He likes to program in C with a bit of C++, also in x86 assembly language, Python, and PHP.

Comments and Discussions

 
GeneralIn many times there are much simpler solutions Pin
MichaelYork8-Feb-10 4:45
MichaelYork8-Feb-10 4:45 
GeneralRe: In many times there are much simpler solutions Pin
Peter Kankowski8-Feb-10 15:13
Peter Kankowski8-Feb-10 15:13 
Generalok, the strncmp is slow... Pin
necropower12325-Aug-07 8:02
necropower12325-Aug-07 8:02 
GeneralRe: ok, the strncmp is slow... Pin
Peter Kankowski27-Aug-07 12:55
Peter Kankowski27-Aug-07 12:55 
GeneralAlignment penalty Pin
anton_petrov31-Oct-06 7:27
anton_petrov31-Oct-06 7:27 
GeneralNice Optimzation Pin
basementman5-Jan-05 4:10
basementman5-Jan-05 4:10 
GeneralRe: Nice Optimzation Pin
Gernot Frisch5-Jan-05 4:36
Gernot Frisch5-Jan-05 4:36 
Generalforget it Pin
Gernot Frisch5-Jan-05 5:23
Gernot Frisch5-Jan-05 5:23 
GeneralRe: forget it Pin
Peter Kankowski5-Jan-05 18:21
Peter Kankowski5-Jan-05 18:21 
GeneralRe: forget it Pin
Gernot Frisch6-Jan-05 10:26
Gernot Frisch6-Jan-05 10:26 
GeneralRe: forget it Pin
Gernot Frisch6-Jan-05 22:23
Gernot Frisch6-Jan-05 22:23 
GeneralRe: Nice Optimzation Pin
Peter Kankowski5-Jan-05 18:23
Peter Kankowski5-Jan-05 18:23 
Hello, Gernot,

try this code:
#include <windows.h>
#include <string.h>
#include <stdio.h>

// returns 0 if equal, 1 if unequal
// strncmp for Unicode
inline int qstrncmp(const WCHAR* p1, const WCHAR* p2, size_t i) {
	while(i >= 2) {
		if(*(long*)p1 != *(long*)p2)
			return 1;
		p1 += 2, p2 += 2, i -= 2;
	}
	if(i && *p1 != *p2)
		return 1;
	return 0;
}

// strncmp for ANSI
inline int qstrncmp(const char* p1, const char* p2, size_t i) {
	while(i >= 4) {
		if(*(long*)p1 != *(long*)p2)
			return 1;
		p1 += 4, p2 += 4, i -= 4;
	}
	if(i >= 2) {
		if(*(short*)p1 != *(short*)p2)
			return 1;
		p1 += 2, p2 += 2, i -= 2;
	}
	if(i && *p1 != *p2)
		return 1;
	return 0;
}

const int REP = 10000000;
int main(int, char**) {
	char c1[] = "'Twas brillig, and the slithy toves did gyre and gimble in the wabe";
	char c2[] = "'Twas brillig";
// Check if qstrncmp returns valid results. Try to replace "brillig" in c2
// to "brilliant" or "trillion" and see what's happens
	for(size_t i = 0; i <= strlen(c2); ++i)
		printf("%2d chars: %2d %2d\n", i, strncmp(c1, c2, i), qstrncmp(c1, c2, i));

// Speed test
    DWORD z = GetTickCount();
	for(i=0; i<REP; ++i)
		strncmp(c1, c2, 10);
	z = GetTickCount() - z;
	printf("strncmp: %d ticks\n", z);

    z = GetTickCount();
	for(i=0; i<REP; ++i)
		qstrncmp(c1, c2, 10);
	z = GetTickCount() - z;
	printf("qstrncmp: %d ticks\n", z);

    z = GetTickCount();
	int d=0;
	for(i=0; i<REP; ++i)
		if(cmp4ch(c1, '\'','T','w','a') && cmp4ch(c1+4, 's',' ','b','r') &&
		   cmp4ch(c1+8, 'i','l','l','i') && *(c1+12) == 'g')
			d++; // Prevent optimization
	z = GetTickCount() - z;
	printf("cmpXch: %d ticks, %d\n", z, d);
	return 0;
}

qstrncmp is slower than cmpXch, but it's faster
than strncmp (I tested these functions under VC++6.0).
If you need to compare the string with a constant,
you'd better use cmpXch macros. But if the second
string is not a constant, qstrncmp will be faster
than strncmp.

See also message "Re: Crazy Idea" on this page
about comparison code generation at runtime.

Peter Kankowski
GeneralRe: Nice Optimzation Pin
Peter Kankowski5-Jan-05 18:20
Peter Kankowski5-Jan-05 18:20 
GeneralMaking it useable Pin
Gernot Frisch5-Jan-05 4:02
Gernot Frisch5-Jan-05 4:02 
GeneralRe: Making it useable Pin
necropower12324-Aug-07 17:03
necropower12324-Aug-07 17:03 
GeneralRe: Making it useable Pin
Gernot Frisch24-Aug-07 21:25
Gernot Frisch24-Aug-07 21:25 
GeneralRe: Making it useable Pin
necropower12325-Aug-07 6:34
necropower12325-Aug-07 6:34 
GeneralCrazy idea Pin
Jörgen Sigvardsson28-Dec-04 8:28
Jörgen Sigvardsson28-Dec-04 8:28 
GeneralRe: Crazy idea Pin
Peter Kankowski29-Dec-04 15:00
Peter Kankowski29-Dec-04 15:00 
GeneralInteresting Pin
John R. Shaw27-Dec-04 7:54
John R. Shaw27-Dec-04 7:54 
GeneralRe: Interesting Pin
Peter Kankowski27-Dec-04 15:29
Peter Kankowski27-Dec-04 15:29 
Question? Pin
Goran Mitrovic26-Dec-04 20:37
Goran Mitrovic26-Dec-04 20:37 
AnswerWell... Pin
peterchen26-Dec-04 22:00
peterchen26-Dec-04 22:00 
GeneralRe: Well... Pin
Goran Mitrovic26-Dec-04 22:23
Goran Mitrovic26-Dec-04 22:23 
GeneralRe: Well... Pin
Peter Kankowski27-Dec-04 0:02
Peter Kankowski27-Dec-04 0:02 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.