In SQL, whether two strings compare as equal depends on the collation.
With Latin1_General_CI_AS, fussballmanager and fußballmanager are considered equal.

They are different with SQL_Latin1_General_CP1_CI_AS.

I am trying to replicate the same behaviour in C++. I understand that it depends on the locale, and I would like some code that lets the user decide which locale to use.

What I have tried:

STL


#include <locale>
#include <string>

std::locale loc1("german");

// Wide strings need wide (L"...") literals
std::wstring a(L"fussballmanager");
std::wstring b(L"fußballmanager");

const std::collate<wchar_t>& coll = std::use_facet<std::collate<wchar_t> >(loc1);
int nRes = coll.compare(a.data(), a.data() + a.size(),
                        b.data(), b.data() + b.size());
// => nRes is 1, not 0


BOOST


#include <boost/locale.hpp>
#include <string>

boost::locale::generator gen;
std::locale loc = gen("en-GB");
std::locale::global(loc);

// Wide strings need wide (L"...") literals
std::wstring a(L"fussballmanager");
std::wstring b(L"fußballmanager");
int nRes = std::use_facet<boost::locale::collator<wchar_t> >(loc).compare(boost::locale::collator_base::primary, a, b);
// Crashes here (it works if I leave the locale string empty, i.e. replace "en-GB" with "")
Updated 25-Nov-16 2:48am
Comments
BadJerry 25-Nov-16 10:46am    
OK, if you write the following (adding UTF-8), there is no crash in Boost:
std::locale loc = gen("en-GB.UTF-8");

Jochen's solution probably means that others will not need Boost (I already had it).

1 solution

There is a really good answer about string comparison at this SO thread:
c++ - How to compare a "basic_string" using an arbitary locale - Stack Overflow.
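
As a rough illustration of comparing under a caller-chosen locale with only the standard library, here is a minimal sketch. The helper names compare_in_locale and locale_from_name are hypothetical and are not taken from the linked thread. Note that std::collate performs a full collation comparison, so, as the result in the question shows, it is not expected to treat "ss" and "ß" as equal the way a SQL collation might.

```cpp
#include <cassert>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: three-way comparison of two wide strings using the
// std::collate facet of the locale supplied by the caller.
// Returns a value <0, 0 or >0, like strcmp.
int compare_in_locale(const std::wstring& a, const std::wstring& b,
                      const std::locale& loc)
{
    const auto& coll = std::use_facet<std::collate<wchar_t>>(loc);
    return coll.compare(a.data(), a.data() + a.size(),
                        b.data(), b.data() + b.size());
}

// Hypothetical helper: builds a locale from a user-supplied name and falls
// back to the classic "C" locale when the platform does not know the name.
// Valid names are platform-specific (an assumption the caller must handle).
std::locale locale_from_name(const std::string& name)
{
    try {
        return std::locale(name.c_str());
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}

// Usage sketch with the classic "C" locale, which every platform provides.
void check_classic_ordering()
{
    const std::locale& c = std::locale::classic();
    assert(compare_in_locale(L"abc", L"abc", c) == 0);
    assert(compare_in_locale(L"abc", L"abd", c) < 0);
}
```

A std::locale object can also be passed directly as the comparator to std::sort, because its operator() compares two strings with the same collate facet.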

Regarding your SQL examples, both should return false for the example strings when using AS (accent sensitive), although I am not quite sure about the special case of the Eszett (ß). The only difference between the two collations is CP1, which selects code page 1252, while the other uses a default code page (which might be 1252 too).

I have not used the Boost collation and comparison functions so far, so I cannot give a solution using them.

If you are on Windows, you can use the CompareStringEx function.
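
A minimal sketch of how a call to CompareStringEx might be wrapped so that the user chooses the locale by name. The helper compare_windows_locale and its parameters are hypothetical. The non-Windows branch is only a stub so that the snippet compiles elsewhere. Whether L"fussballmanager" and L"fußballmanager" compare as equal depends on the locale name and flags passed, so that is worth testing directly rather than assuming.

```cpp
#include <cassert>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: compares a and b under the named Windows locale
// (for example L"de-DE"). On success it stores <0, 0 or >0 in result and
// returns true. It returns false if the call fails or on non-Windows builds.
bool compare_windows_locale(const std::wstring& a, const std::wstring& b,
                            const wchar_t* localeName, bool ignoreCase,
                            int& result)
{
#ifdef _WIN32
    const DWORD flags = ignoreCase ? LINGUISTIC_IGNORECASE : 0;
    const int rc = CompareStringEx(localeName, flags,
                                   a.c_str(), static_cast<int>(a.size()),
                                   b.c_str(), static_cast<int>(b.size()),
                                   nullptr, nullptr, 0);
    if (rc == 0) {
        return false;  // call failed; GetLastError() has the reason
    }
    // CSTR_LESS_THAN is 1, CSTR_EQUAL is 2, CSTR_GREATER_THAN is 3, so
    // subtracting CSTR_EQUAL gives the usual -1 / 0 / +1 convention.
    result = rc - CSTR_EQUAL;
    return true;
#else
    (void)a; (void)b; (void)localeName; (void)ignoreCase; (void)result;
    return false;  // CompareStringEx is a Windows-only API
#endif
}

// Usage sketch: on Windows, identical strings must compare as equal.
void check_identical_strings()
{
    int r = 0;
    const bool ok = compare_windows_locale(L"abc", L"abc", L"en-US", false, r);
#ifdef _WIN32
    assert(ok && r == 0);
#else
    assert(!ok);  // stub path: the API is unavailable here
#endif
}
```

The flag used here only controls case sensitivity; the CompareStringEx documentation lists further flags that can be combined to ignore other differences.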

There is also the ICU (International Components for Unicode) project, which provides a library that supports collated comparison (see the Collation section of the ICU User Guide).
Comments
BadJerry 25-Nov-16 9:46am    
Very useful - I did not know about CompareStringEx, which I will try now! I was also aware of ICU but did not want to add another dependency to my project.
Thank you very much!

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)