
StringBuilder vs. String / Fast String Operations with .NET 2.0

30 Mar 2007, CPOL, 7 min read
Comparison of String/StringBuilder functions. Efficient String handling.

Introduction

Strings are so heavily used in all programming languages that we do not think about them very much. We simply use them and hope to do the right thing. Normally all goes well, but sometimes we need more performance, so we switch to StringBuilder, which is more efficient because it contains a mutable string buffer. .NET Strings are immutable, which is why a new string object is created every time we alter one (insert, append, remove, etc.).

That sounds reasonable, so why do we still use the .NET String class functions and not the faster StringBuilder? Because optimal performance is a tricky thing, and the first rule of the performance club is to measure it for yourself. Do not believe anybody (including me!) who tells you that this or that is faster in every case. It is very difficult to predict the performance of some code in advance because so many variables influence the outcome. Looking at the generated MSIL code still does NOT tell you how fast the code will perform. If you want to see why your function is so slow or fast, you have to look at the JIT-compiled x86 assembler code to get the full picture.

Greg Young wrote some very nice posts about what the JIT compiler makes of your MSIL code on your CPU. In the following article I will show you the numbers for StringBuilder vs. String, which I measured with .NET 2.0 on a P4 3.0 GHz with 1 GB RAM. Every test was performed 5 million times to get a stable value.
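To make "measure it for yourself" concrete, here is a minimal timing sketch. It is not the harness from the article's download: the `TimingSketch` class, the `StringAction` delegate and the `Measure` helper are hypothetical names chosen for illustration, and the only things taken from the article are the test sentence and the 5 million repetitions.

```csharp
using System;
using System.Diagnostics;
using System.Text;

static class TimingSketch
{
    // Hypothetical delegate type for the code under test (an assumption, not from the article).
    public delegate string StringAction();

    // Calls the action once so that JIT compilation is not part of the measurement,
    // then times 'iterations' further calls with a Stopwatch.
    public static TimeSpan Measure(StringAction action, int iterations)
    {
        action();
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            action();
        watch.Stop();
        return watch.Elapsed;
    }

    public static string InsertWithString()
    {
        return "fox jumps over the lazy dog".Insert(0, "The quick brown ");
    }

    public static string InsertWithStringBuilder()
    {
        StringBuilder sb = new StringBuilder("fox jumps over the lazy dog");
        sb.Insert(0, "The quick brown ");
        return sb.ToString();
    }

    static void Main()
    {
        const int Iterations = 5000000; // repeat count used in the article
        Console.WriteLine("String.Insert:        {0}", Measure(InsertWithString, Iterations));
        Console.WriteLine("StringBuilder.Insert: {0}", Measure(InsertWithStringBuilder, Iterations));
    }
}
```

The absolute numbers depend on the machine, the runtime version and the build configuration, so only compare results taken in the same run.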

Insert a String / Remove a character from one

I inserted the missing words at the beginning of the sentence "The quick brown fox jumps over the lazy dog" to find the break-even point between String.Insert and StringBuilder.Insert. To see how the removal of characters performs, I removed one character at a time from the beginning of our test sentence in a for loop. The results are shown in the diagram below.

Screenshot - StringInsert.JPG
C#
// Test functions used for this chart
string StringRemove(string str, int Count)  
{
    for(int i=0;i<Count;i++)
        str = str.Remove(0, 1);

    return str;
}

string StringBuilderRemove(string str, int Count)
{
    StringBuilder sb = new StringBuilder(str);
    for(int i=0;i<Count;i++)    
        sb.Remove(0, 1);

    return sb.ToString();        
} 

string StringInsert(string str, string [] inserts)
{
    foreach (string insert in inserts)
        str = str.Insert(0, insert);


    return str;        
}

string StringBuilderInsert(string str, string [] inserts)
{          
    StringBuilder sb = new StringBuilder(str);
    foreach (string insert in inserts)
        sb.Insert(0, insert);

    return sb.ToString();      
}

We see here that StringBuilder is clearly the better choice if we have to alter the string. Insert and Remove operations are nearly always faster with StringBuilder. The removal of characters is especially fast with StringBuilder, where we gain nearly a factor of two.

Replace one String with another String

Things become more interesting when we replace anywhere from one to five words of our fox test sentence.

Screenshot - StringReplace.JPG
C#
// Test functions used for this chart
string StringReplace(string str, 
                     List<KeyValuePair<string,string>> searchReplace)
{
    foreach (KeyValuePair<string, string> sreplace in searchReplace)
        str = str.Replace(sreplace.Key, sreplace.Value);

    return str;
}

string StringBuilderReplace(string str, 
                            List<KeyValuePair<string,string>> searchReplace)
{
    StringBuilder sb = new StringBuilder(str);
    foreach (KeyValuePair<string, string> sreplace in searchReplace)
        sb.Replace(sreplace.Key, sreplace.Value);

    return sb.ToString();
}

This is somewhat surprising. StringBuilder does not beat String.Replace even if we do many replaces. Our data shows a constant overhead of about 1 s that we pay if we use StringBuilder. The overhead is quite significant (30%) when we have only a few replacements to do.

String.Format

I checked when StringBuilder.AppendFormat is better than String.Format whose results are appended with the "+" operator.

Screenshot - StringFormat.JPG
C#
// Functions used for this chart
string StringFormat(string format, int Count, params object[] para)
{
    string str = String.Empty;
    for(int i=0;i<Count;i++)
        str += String.Format(format, para);

    return str;
}

string StringBuilderFormat(string format, int Count, params object[] para )
{ 
    StringBuilder sb = new StringBuilder(); 
    for(int i=0;i<Count;i++) 
        sb.AppendFormat(format, para); 

    return sb.ToString(); 
}

StringBuilder is better when you have to format and concatenate a string more than five times. You can shift the break-even point even further if you recycle the StringBuilder instance.
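Recycling the StringBuilder instance could look roughly like the sketch below. The class name `RecycledFormatter`, the method `FormatRepeated` and the initial capacity of 256 are assumptions made for illustration. Because a shared builder is not thread-safe, the sketch keeps one instance per thread via `[ThreadStatic]`; whether recycling pays off at all is something to measure for your own workload.

```csharp
using System;
using System.Text;

static class RecycledFormatter
{
    // One builder per thread, created on first use (hypothetical design, not from the article).
    [ThreadStatic]
    private static StringBuilder cached;

    public static string FormatRepeated(string format, int count, params object[] args)
    {
        if (cached == null)
            cached = new StringBuilder(256); // 256 is an arbitrary guess at a useful capacity

        StringBuilder sb = cached;
        sb.Length = 0; // empty the builder so it can be reused for this call
        for (int i = 0; i < count; i++)
            sb.AppendFormat(format, args);

        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(FormatRepeated("{0};", 3, "fox"));
        Console.WriteLine(FormatRepeated("{0}-{1} ", 2, "lazy", "dog"));
    }
}
```

Resetting `Length` to 0 is the same trick used in the comment thread at the end of this page.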

String Concatenation

This is the most interesting test because we have several options here. We can concatenate strings with +, String.Concat, String.Join and StringBuilder.Append.

Screenshot - StringConcat.JPG
C#
string Add(params string[] strings) // Used Test functions for this chart
{
    string ret = String.Empty;    
    foreach (string str in strings)
        ret += str;

    return ret;
}

string Concat(params string[] strings)
{
    return String.Concat(strings);
}

string StringBuilderAppend(params string[] strings)
{
    StringBuilder sb = new StringBuilder();
    foreach (string str in strings)
        sb.Append(str);

    return sb.ToString();
}

string Join(params string[] strings)
{
    return String.Join(String.Empty, strings);
}

And the winner for string concatenation is ... not StringBuilder but String.Join! After taking a deep look with Reflector, I found that String.Join has the most efficient algorithm: it computes the final buffer size in a first pass, allocates that buffer once, and then copies (memcpy-style) each string into it. This is simply unbeatable. StringBuilder becomes better than the + operator above 7 strings, but this is not really code one would see very often.
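The two-pass idea described above can be sketched in managed code as follows. This is an illustration of the technique, not the actual String.Join implementation: the class `TwoPassConcat` and the method `ConcatAll` are hypothetical, and the sketch pays for one extra copy in `new string(buffer)` that a native implementation writing directly into the target string would avoid.

```csharp
using System;

static class TwoPassConcat
{
    public static string ConcatAll(params string[] parts)
    {
        // Pass 1: add up the lengths so the buffer can be allocated exactly once.
        int total = 0;
        foreach (string part in parts)
        {
            if (part != null)
                total += part.Length;
        }

        // Pass 2: copy every part to its final position in the buffer.
        char[] buffer = new char[total];
        int offset = 0;
        foreach (string part in parts)
        {
            if (part == null)
                continue; // null parts are treated as empty strings
            part.CopyTo(0, buffer, offset, part.Length);
            offset += part.Length;
        }

        return new string(buffer);
    }

    static void Main()
    {
        Console.WriteLine(ConcatAll("The ", "quick ", "brown ", "fox"));
    }
}
```

The point of the design is that no intermediate string and no buffer reallocation is needed, regardless of how many parts are joined.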

Comparing Strings

An often underestimated topic is string comparison. To compare Unicode strings, your current locale settings have to be taken into account. Unicode characters with values greater than 65535 do not fit into the .NET Char type, which is 16 bits wide. Such characters are quite common especially in Asian countries, which complicates the matter even more (case-insensitive comparisons). The culture-aware comparison function of .NET 2.0 (I guess this is true for .NET 1.x also) is implemented in native code, which costs you a managed-to-unmanaged transition and back.

Screenshot - StringCompare.JPG
C#
// Test functions used for this chart
int StringCompare(string str1, string str2) 
{
    return String.Compare(str1, str2, StringComparison.InvariantCulture);
}


int StringCompareOrdinal(string str1, string str2)
{
    return String.CompareOrdinal(str1, str2);
}

It is good that we compared the string comparison functions. A factor of 3 is really impressive and shows that localization comes with a cost that is not always negligible. Even the innocent-looking mode StringComparison.InvariantCulture goes into the same slow native function, which explains this big difference. When strings are interned, the comparison is much faster (by more than a factor of 30) because the CLR makes a check for reference equality.

To tell the truth, I was surprised by this result too, and for a long time I did not know the use of this strange CompareOrdinal function. String.CompareOrdinal does nothing more than compare the strings char by char (16 bits each, remember), which is done 100% in managed code. That allows the JIT compiler to flex its optimizing muscles, as you can see. If somebody asks you what CompareOrdinal is good for, you now know why. You can (and should) use this function on strings that are not visible to the outside world (users) and are therefore never localized. Only then is it safe to use this function. Remember: making a program fast but incorrect is easy; making it both correct and fast is hard. If you mainly deal with UI code, it is a good bet that you should forget this function very quickly.
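A small sketch of the distinction between internal and user-visible strings follows. The class `CompareSketch` and its two helper methods are hypothetical names; the comparison calls themselves are the ones discussed above. Ordinal comparison looks only at the 16-bit char values, so "a" (0x61) sorts after "B" (0x42), whereas a culture-aware comparison typically sorts "a" before "B".

```csharp
using System;

static class CompareSketch
{
    // Internal identifiers (keys, tags, protocol tokens) are never localized,
    // so a plain char-by-char comparison is safe for them.
    public static bool InternalKeysEqual(string a, string b)
    {
        return String.CompareOrdinal(a, b) == 0;
    }

    // Text that users see should be ordered by linguistic rules.
    public static int OrderForDisplay(string a, string b)
    {
        return String.Compare(a, b, StringComparison.CurrentCulture);
    }

    static void Main()
    {
        // Ordinal result depends only on char values: 'a' (0x61) > 'B' (0x42).
        Console.WriteLine(String.CompareOrdinal("a", "B") > 0);

        // Culture-aware result depends on the culture data of the machine.
        Console.WriteLine(OrderForDisplay("a", "B"));

        Console.WriteLine(InternalKeysEqual("customerId", "customerId"));
    }
}
```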

Conclusions

The following recommendations are valid for our small test strings (~30 chars) but should be applicable to bigger strings (100-500 chars) as well (measure for yourself!). I have seen many synthetic performance measurements that demonstrate the power of StringBuilder with strings that are 10 KB and bigger. This is the 1% case in real-world programs. Most strings will be significantly shorter. When you optimize a function and you can "feel" the construction costs of an additional object, you have to look very carefully at whether you can afford the additional initialization costs of StringBuilder.

String Operation | Most Efficient
Insert           | StringBuilder.Insert for > 2 insertion strings; String.Insert otherwise
Remove           | StringBuilder is faster for > 2 characters to remove
Replace          | String.Replace always
Format           | String.Format for < 5 Append + Format operations; StringBuilder.AppendFormat for > 5 calls
Concatenation    | + for 2 strings; String.Join for > 2 strings to concatenate

The shiny performance saving StringBuilder does not help in all cases and is, in some cases, slower than other functions. When you want to have good string concatenation performance I recommend strongly that you use String.Join which does an incredible job.

Points of Interest

  • I did not tell you more about the String.Intern function. You need to know more about string interning only if you need to save memory at the expense of processing power.
  • If you want to see a good example of how you can speed up string formatting 14 times for fixed-length strings, have a look at my blog.
  • Did you notice that there is no String.Reverse in .NET? You would rarely need that function anyway. Greg did put up a little contest to find the fastest String.Reverse function. The functions presented there are fast but do not work correctly with Unicode surrogates (characters with a value > 65535). Making it fast and correct is not easy.
  • The test results obtained here are .NET Framework, machine and string length specific. Please do not simply look at the numbers and use this or that function without being certain that the results obtained here are applicable to your concrete problem.
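The surrogate problem mentioned in the String.Reverse point above can be illustrated with a short sketch. It is not one of the contest entries: the class `ReverseSketch` and the method `ReverseKeepingSurrogatePairs` are hypothetical, and the sketch only keeps surrogate pairs together. It makes no attempt to handle combining characters, so it is still not a fully correct reversal of arbitrary Unicode text.

```csharp
using System;

static class ReverseSketch
{
    public static string ReverseKeepingSurrogatePairs(string input)
    {
        char[] result = new char[input.Length];
        int write = input.Length; // fill the result from the end towards the start

        for (int read = 0; read < input.Length; read++)
        {
            bool isPair = read + 1 < input.Length
                          && Char.IsSurrogatePair(input[read], input[read + 1]);
            if (isPair)
            {
                // Copy high and low surrogate in their original order.
                write -= 2;
                result[write] = input[read];
                result[write + 1] = input[read + 1];
                read++; // skip the low surrogate that was just copied
            }
            else
            {
                write--;
                result[write] = input[read];
            }
        }

        return new string(result);
    }

    static void Main()
    {
        Console.WriteLine(ReverseKeepingSurrogatePairs("The quick brown fox"));
    }
}
```

A naive char-by-char reversal would swap the two halves of each pair and produce an invalid UTF-16 sequence, which is the correctness problem the bullet refers to.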

History

  • 28.7.2006 Fixed Download/Fine tuning the coloring of the charts to make it more readable.
  • 27.7.2006 Updated String Comparison graph. Interned string comparison is much faster.
  • 27.7.2006 Fixed bug in String.Concat diagram. The numbers below 3 string concats were wrong. Thanks Greg for pointing this out.
  • 27.7.2006 Changed String.Format diagram to show the full picture up to the point where StringBuilder outperforms String.Format and Concat.
  • 26.7.2006 Released v1.0 on CodeProject

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Systems Engineer Siemens
Germany
He works for a multinational company that is a hardware and software vendor of medical equipment. Currently he is located in Germany and enjoys living in general. Although he finds pretty much everything interesting, he pays special attention to .NET software development, software architecture and nuclear physics. To complete the picture, he likes hiking in the mountains and collecting crystals.

Comments and Discussions

 
Re: Wrong Testing Method
Alois Kraus, 3-Aug-06 13:19
I have tried to do it your way:

        StringBuilder RepSb = new StringBuilder(100);
        string StringBuilderReplace(string str, List<KeyValuePair<string, string>> searchReplace)
        {
            RepSb.Length = 0;
            RepSb.Append(str);
            foreach (KeyValuePair<string, string> sreplace in searchReplace)
            {
                RepSb.Replace(sreplace.Key, sreplace.Value);
            }

            return RepSb.ToString();
        }


Action String.Replace quick -> slow did consume 3,32 s
Action StringBuilder.Replace quick -> slow did consume 4,18 s
Action String.Replace quick -> slow, lazy -> busy did consume 6,92 s
Action StringBuilder.Replace quick -> slow, lazy -> busy did consume 7,36 s
Action String.Replace quick -> slow, lazy -> busy, dog -> cat did consume 9,67 s
Action StringBuilder.Replace quick -> slow, lazy -> busy, dog -> cat did consume 10,15 s
Action String.Replace quick -> slow, lazy -> busy, dog -> cat, The -> A, jumps -> crawls did consume 16,26 s
Action StringBuilder.Replace quick -> slow, lazy -> busy, dog -> cat, The -> A, jumps -> crawls did consume 16,16 s

        string StringBuilderReplace(string str, List<KeyValuePair<string, string>> searchReplace)
        {
            StringBuilder sb = new StringBuilder(str);
            foreach (KeyValuePair<string, string> sreplace in searchReplace)
            {
                sb.Replace(sreplace.Key, sreplace.Value);
            }

            return sb.ToString();
        }

Action String.Replace quick -> slow did consume 3,30 s
Action StringBuilder.Replace quick -> slow did consume 4,34 s
Action String.Replace quick -> slow, lazy -> busy did consume 6,94 s
Action StringBuilder.Replace quick -> slow, lazy -> busy did consume 7,50 s
Action String.Replace quick -> slow, lazy -> busy, dog -> cat did consume 10,08 s
Action StringBuilder.Replace quick -> slow, lazy -> busy, dog -> cat did consume 10,35 s
Action String.Replace quick -> slow, lazy -> busy, dog -> cat, The -> A, jumps -> crawls did consume 15,97 s
Action StringBuilder.Replace quick -> slow, lazy -> busy, dog -> cat, The -> A, jumps -> crawls did consume 16,50 s

As you can see the numbers do not differ that much. String.Replace does still win.

String.Replace calls
public extern string Replace(string oldValue, string newValue);
StringBuilder.Replace calls
public extern StringBuilder Replace(string oldValue, string newValue, int startIndex, int count);

As you can see, StringBuilder.Replace returns a StringBuilder, so it looks as if a new StringBuilder object is created inside StringBuilder every time you do a replace (although the documentation states that the returned reference is the same instance). I take this to be the reason why StringBuilder is so "bad" in this test. Measure, measure, measure, and check whether your assumptions about performance are correct. In this case it is not a fundamental flaw in my test that I did not reuse StringBuilder. Reuse does not explain the 0,86 s difference in the first replace test at all. The creation overhead of StringBuilder contributes only 20% to this difference.
Now you can argue that I
a) Used the wrong capacity in the first place.
b) Should have used Append instead of Insert.
c) ....

I did not want to get into this in my article since it would
a) Complicate the code
b) Make it harder to understand
c) Make it harder to use the information gained here in your own applications, because the test routine would then have many more constraints (initial StringBuilder capacity, shared instance, ...)




Have you read the comments to this article, such as
http://www.codeproject.com/useritems/StringBuilder_vs_String.asp?msg=1595637#xx1595637xx

There a StringBuilder instance is used only once, and you are still better off than if you used StringBuilder directly. You could also suggest that String.Format should have a shared preallocated StringBuilder instance. Obviously there were arguments against doing it this way. In this case thread safety is a very good reason NOT to reuse a shared StringBuilder instance. The performance costs of locking and the scalability issues it introduces, such as thread thrashing, will prevent you from doing such an optimization. A clever choice of buffer size serves you much better when you can make a good guess at how big the resulting string will get.
The object creation overhead you mention does not play as big a role as you suggest if the object is created and destroyed in the same function (Generation 0 object collections are very fast). I encourage you to do your own tests by altering mine and comparing them. I can tell you the outcome already: a constant overhead will be removed on the t-axis. This offset delta_t will depend on the history of the reused StringBuilder instance (after n reallocations the buffer will be so big that no further buffer reallocations occur). What will this test prove? It will show that you are fastest if there are no buffer reallocations at all, that is, if you make the buffer big enough (the StringBuilder object creation is not the most expensive thing here).

Yours,
Alois Kraus
