
Extended Number to Numeral (Number Spelling) Converter

25 May 2016, MIT license, 11 min read
Numbers (positive AND negative, integral AND fractional) to English / Russian words

Introduction

The subject is pretty self-descriptive. The most important thing about the module is that it uses a generalized strategy to process `numbers`. Currently it can convert numbers to two languages: Russian (Cyrillic) AND English (Latin), with support for two English dialects, American AND British. In the future, this construction allows the module to be extended to support even more Cyrillic AND/OR Latin-script languages.

Background

Here I will list some helpful links to `online conversion tools`, which you can use to check spelling (including the module output).

English:

English AND russian:

By the way, I found some bugs in these tools, so part of their output might be incorrect.

Also, information about `numerals` in different languages:

Russian:

Scales:

Using the code

The `LocaleSettings` struct is used to configure the conversion:

C++
```// Enables some language very specific rules for numbers spelling
//  (like pronouncing four-digit numbers in US & UK Eng.)
bool verySpecific = false;
bool positiveSign = false; // add positive sign [for positive nums]
// If the integral part is zero, it may be left unread: 0.75 (.75) - point seventy five
bool shortFormat  = false; // skip mention zero int. / fract. part
bool foldFraction = false; // try find repeated pattern & treat it
ELocale locale = ELocale::L_EN_GB;
size_t precison = size_t(LDBL_DIG); // max. digits count (<= 'LDBL_DIG')```

Flags:

`1) verySpecific`

For ENG GB, ENG US:

- replaces zero / nought with the 'o' letter (1.02 = "one point o two")

- enables specific handling of four-digit numbers with non-zero hundreds: they are often named using multiples of "hundred" AND combined with tens AND/OR ones ("one thousand one", "eleven hundred three", "twelve hundred twenty-five", "four thousand forty-two", OR "ninety-nine hundred ninety-nine" etc)

* for ENG GB this style is common for multiples of 100 between 1,000 AND 2,000 (e.g. 1,500 as "fifteen hundred") BUT NOT for higher numbers.
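The rule above can be sketched as a small predicate. This is only an illustration of the rule as it is worded here, NOT the module's own code: the function name `useHundredsStyle` AND its parameters are hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: decides if a number should be spelled in the
// "eleven hundred three" style, following the rule described in the text.
// 'british' selects the stricter ENG GB variant.
inline bool useHundredsStyle(const unsigned num, const bool british) {
  if (num < 1000U || num > 9999U) return false;  // four-digit numbers ONLY
  if (0U == (num / 100U) % 10U) return false;    // zero hundreds: "one thousand one"
  if (british) {                                 // ONLY multiples of 100 below 2,000
    return num < 2000U && 0U == num % 100U;
  }
  return true;
}
```

With this sketch, 1103 AND 9999 qualify for the US style, 1001 AND 4042 do not (their hundreds digit is zero), AND for the GB variant 1500 qualifies while 2500 does not.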

`2) positiveSign`: enables adding an explicit 'positive' / 'plus' / 'плюс' sign word for numbers > 0

Examples:

1.3 = "plus one point three" [EN GB]

1.181818181818 = "плюс одна целая и восемнадцать в периоде" [RU + `foldFraction`]

`3) shortFormat`: skips mentioning a nonexistent (zero) integral OR fractional part of the number

Examples:

0.0 = "zero" [EN US]

0.01 = "point zero one" [EN US]

999000.0 = "nine hundred and ninety-nine thousand" [EN GB]

`4) foldFraction`: [ONLY for fractions] enables the mechanism of finding a repeated digit pattern in the fractional part of a number AND (if found) shortening it to the first occurrence, with a periodic marker added.

Examples:

EN GB + `verySpecific`:

-7289.120912091209 = "minus seven thousand two hundred and eighty-nine point one two o nine repeating"

EN US + `positiveSign`:

28364768.07310731 = "positive twenty-eight million three hundred sixty-four thousand seven hundred sixty-eight point zero seven three one to infinity"

Options:

`1) precison`: maximum count of digits (in the fractional part) to process. The resulting number representation is rounded to the last digit. Can be zero. Limited to the LDBL_DIG value. Trailing zeroes in the resulting number are ignored.

`2) locale`: the selected language OR language dialect. The value is taken from the `ELocale` enumeration (an old-style C++ enum, NOT a C++11 enum class) AND can be one of the following:

C++
```L_RU_RU, // Russian Federation Russian
L_EN_US, // United States English
L_EN_GB, // United Kingdom English```

The flags AND options can be combined in ANY combination, BUT some flags (OR options) can be ignored OR reinterpreted in some cases.

Example: `verySpecific + positiveSign + shortFormat + foldFraction`

0.0034013401 = "plus o point o o three four o one repeating" [EN GB]

As you can see, although the `shortFormat` flag is set, the zero integral part is NOT ignored.

Function call interface + short description:

C++
```// 'ReserveBeforeAdding' can be used to DISABLE possible 'trade-space-for-time' optimization
template<class TStrType, const bool ReserveBeforeAdding = true>
// "Number to the numeric format string" (321 -> "three hundred twenty-one")
// Accepts negative numbers AND fractions
// Complexity: linear in the number's digit count
static bool numToNumFormatStr(long double num, TStrType& str,
LocaleSettings& localeSettings =
LocaleSettings::DEFAULT_LOCALE_SETTINGS,
const char** const errMsg = nullptr) {```

The `errMsg` pointer can be used to get an error message (as a static const. POD C str.) explaining what exactly happened if something goes wrong.

As you can see, different container types are supported here; all of them, however, must meet the following requirements:

C++
`'TStrType' SHOULD support operator '+=', 'empty' AND 'size' methods`

The function appends the numeral text to the existing content of `str`, separating it with a delimiter if the container is not empty when the function starts.
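The append behavior can be illustrated with a minimal sketch. The helper name `appendDelimited` AND the single-space default delimiter are assumptions made for this example; the module itself uses its own `DEFAULT_DELIMITER`.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (NOT part of the module): appends 'text' to 'str',
// inserting 'delimiter' first if 'str' already holds something.
// 'TStrType' needs ONLY 'operator+=' AND 'empty()' here.
template <class TStrType>
void appendDelimited(TStrType& str, const char* const text,
                     const char* const delimiter = " ") {
  assert(text && delimiter);
  if (!str.empty()) str += delimiter;  // separate from the existing content
  str += text;
}
```

Calling it twice on an empty `std::string` with "minus" AND then "one" leaves "minus one" in the string; any container with the same two operations would work the same way.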

Conversion stages description

There are a total of four main steps.

`1)` checking the incoming value & treating its sign

```auto negativeNum = false;
if (num < 0.0L) {
negativeNum = true;
num = -num; // revert
}
//// Check borders
static const auto VAL_UP_LIMIT_ = 1e100L; // see 'getOrderStr'
if (num >= VAL_UP_LIMIT_) {
if (errMsg) *errMsg = "too big value";
return false;
}
if (ELocale::L_RU_RU == localeSettings.locale) { // for rus. lang. ONLY
static const auto VAL_LOW_LIMIT_RU_ = 10.0L / VAL_UP_LIMIT_;
if (num && num < VAL_LOW_LIMIT_RU_) {
if (errMsg) *errMsg = "too low value";
return false;
}
}
//// Treat sign
const auto delimiter = DEFAULT_DELIMITER;
auto getSignStr = [](const ELocale locale, const bool positive) throw() -> const char* {
switch (locale) {
case ELocale::L_EN_US: return positive ? "positive" : "negative";
case ELocale::L_EN_GB: return positive ? "plus" : "minus";
case ELocale::L_RU_RU: return positive ? "плюс" : "минус";
}
assert(false); // locale error
// Design / implementation error, NOT runtime error!
return "<locale error [" MAKE_STR_(__LINE__) "]>"; // works OK in GCC
};
if (negativeNum || (localeSettings.positiveSign && num)) { // add sign
if (!str.empty()) str += delimiter; // if needed
str += getSignStr(localeSettings.locale, !negativeNum);
}
if (truncated::ExecIfPresent(str)) { // check if truncated
if (errMsg) *errMsg = "too short buffer"; return false;
}```

`VAL_UP_LIMIT_` is involved here because of the limitations of the `getOrderStr` language-specific morphological lambda for the Russian language. This lambda (AND the others) will be presented later in this article.

`truncated::ExecIfPresent` is a special conditional optimization for StaticallyBufferedString-like classes (if one is provided as the storage). It uses the Exec-If-Present idiom.
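One possible way to implement such an "execute if present" check is expression SFINAE. The sketch below is NOT the module's `truncated::ExecIfPresent`; the namespace `sketch`, the function names AND the `FixedBuf` type are all hypothetical, AND the sketch only assumes that the storage may expose a `truncated()` method.

```cpp
#include <cassert>
#include <string>

namespace sketch {

// Preferred overload: takes part in overload resolution ONLY if
// 'obj.truncated()' is a valid expression for the given type
template <class T>
auto truncatedImpl(const T& obj, int) -> decltype(static_cast<bool>(obj.truncated())) {
  return static_cast<bool>(obj.truncated());
}

// Fallback overload: selected when the type has NO 'truncated()' method
template <class T>
bool truncatedImpl(const T&, long) { return false; }

// Calls 'truncated()' if the type provides it, otherwise reports 'false'
template <class T>
bool truncatedIfPresent(const T& obj) { return truncatedImpl(obj, 0); }

// Toy fixed-size storage which can report that its content was cut
struct FixedBuf {
  bool overflowed;
  bool truncated() const { return overflowed; }
};

} // namespace sketch
```

For a `std::string` the check compiles down to a constant `false`, so ordinary containers pay nothing for it, while a statically buffered type can report an overflow early.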

`2)` getting the number representation as a char. array & analysing it

C++
```static const size_t MAX_DIGIT_COUNT_ = size_t(LDBL_DIG);
// Normalized form (mantissa is a 1 digit ONLY):
//  first digit (one of 'MAX_DIGIT_COUNT_') + '.' + [max. digits AFTER '.' - 1] + 'e+000'
//   [https://en.wikipedia.org/wiki/Scientific_notation#Normalized_notation]
static const size_t MAX_STR_LEN_ = 6U + MAX_DIGIT_COUNT_;

// +24 to be on the safe side in case of a NOT normalized form (unlikely to happen) + for str. terminator
static const size_t BUF_SIZE_ = AUTO_ADJUST_MEM(MAX_STR_LEN_ + 24U, 8U);
char strBuf[BUF_SIZE_];
// 21 digits is max. for 'long double' [https://msdn.microsoft.com/ru-ru/library/4hwaceh6.aspx]
//  (20 of them can be AFTER decimal point in the normalized scientific notation)
if (localeSettings.precison > MAX_DIGIT_COUNT_) localeSettings.precison = MAX_DIGIT_COUNT_;
const ptrdiff_t len = sprintf(strBuf, "%.*Le", localeSettings.precison, num); // scientific format
// On failure, a negative number is returned
if (len < static_cast<decltype(len)>(localeSettings.precison)) {
if (errMsg) *errMsg = "number to string conversion failed";
return false;
}```

`sprintf` is used here because, compared to the naive conversion (applying a series of simple arithmetic operations like *, / AND %), it gives no (OR almost no) precision penalty, at the cost of some extra performance overhead. The function assumes that the representation received from `sprintf` is in the normalized form of scientific notation, but the code was designed (though NOT tested) to work even if the output is not normalized.

The analysis consists of gathering information about the number representation (like the exponent value in the scientific notation) AND splitting the char. array into parts (by adjusting specific pointers, like `fractPartEnd`).
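To see what the printed representation looks like, here is a minimal standalone sketch of the same idea: print with `%.*Le`, then cut the string at the `e`. The helper name `splitScientific` AND its parameters are hypothetical, AND `snprintf` is used instead of `sprintf` just to keep the example bounded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: prints 'num' in scientific notation with 'precision'
// digits after the point, copies the mantissa text ("1.2300") into 'mantissa'
// AND returns the decimal exponent (3 for 1230.0)
inline long splitScientific(const long double num, const int precision,
                            char* const mantissa, const std::size_t mantissaSize) {
  char buf[64];
  const int len = std::snprintf(buf, sizeof(buf), "%.*Le", precision, num);
  assert(len > 0 && static_cast<std::size_t>(len) < sizeof(buf));
  (void)len;
  char* const expPtr = std::strchr(buf, 'e');  // exponent marker
  assert(expPtr);
  *expPtr = '\0';                              // break str.: "1.2300" + "+03"
  assert(std::strlen(buf) < mantissaSize);
  std::strcpy(mantissa, buf);
  return std::strtol(expPtr + 1, nullptr, 10); // exponent digit count is platform specific
}
```

The exponent is parsed rather than compared as text because the number of exponent digits printed ("e+03" OR "e+003") differs between runtime libraries.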

C++
```char* currSymbPtr;    // ptr. used to iterate over the numeric str.
char* fractPartStart; // in the original scientific representation
char* fractPartEnd;   // past the end [will point to the str. terminator, replacing the exp. sign]
long int expVal;      // 3 for '1.0e3'
auto fractPartLen = ptrdiff_t();
size_t intPartLen; // real len.
size_t intPartBonusOrder; // of the current digit
size_t fractPartLeadingZeroesCount; // extra zeroes count BEFORE first meaning digit
static const auto DECIMAL_DELIM_ = '.'; // [decimal separator / decimal mark] to use
auto analyzeScientificNotationRepresentation = [&]() throw() {
currSymbPtr = strBuf + len - size_t(1U); // from the end to start (<-)
//// Get exp.
static const auto EXP_SYMB_ = 'e';
while (EXP_SYMB_ != *currSymbPtr) {
--currSymbPtr; // rewind to the exp. start
assert(currSymbPtr > strBuf);
}
fractPartEnd = currSymbPtr;
*currSymbPtr = '\0'; // break str.: 2.22044604925031310000e+016 -> 2.22044604925031310000 +016
const char* errMsg;
const auto result = strToL(expVal, currSymbPtr + size_t(1U), errMsg);
assert(result);
//// Get int. part len.
fractPartStart = currSymbPtr - localeSettings.precison;
intPartLen = fractPartStart - strBuf;
assert(intPartLen);
if (localeSettings.precison) --intPartLen; // treat zero fract. precison ('1e0')
assert((currSymbPtr - strBuf - int(localeSettings.precison) - 1) >= 0);
assert(localeSettings.precison ? DECIMAL_DELIM_ == *(strBuf + intPartLen) : true);
//// Finishing analyse (partition the number): get int. part real len.
if (expVal < 0L) { // negative exp.
if (static_cast<size_t>(-expVal) >= intPartLen) { // NO int. part
fractPartLeadingZeroesCount = -(expVal + static_cast<long int>(intPartLen));
intPartLen = size_t(); // skip processing int. part
} else { // reduce int. part
intPartLen += expVal; // decr. len.
}
intPartBonusOrder = size_t();
if (localeSettings.precison) // if fract. part exists [in the scientific represent.]
--fractPartLen; // move delim. into the fract part., so reduce it length
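} else { // non-negative exp.: incr. len.
// Digits moved from the fract. part into the int. part (limited by the digits actually printed)
const auto movedDigits =
std::min<decltype(localeSettings.precison)>(expVal, localeSettings.precison);
intPartLen += movedDigits;
intPartBonusOrder = expVal - movedDigits; // 9 for 1.2e10 (precision 1), 0 for 1.23e1
}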
};
analyzeScientificNotationRepresentation();
// Rewind to the fract. start [BEFORE getting fract. part real len.]
currSymbPtr = strBuf + intPartLen +
(expVal > decltype(expVal)() ? size_t(1U) : size_t()); // 1.23e1 = 12.3e0 [move right +1]```

After the main analysis is finished, the fractional part of the number (if it exists) is inspected more closely to determine whether it has meaningless trailing zeros AND (if requested) whether it consists of some repeated pattern.

C++
```auto fractPartTrailingZeroesCount = size_t(), fractPartAddedCount = size_t();
char* fractPartRealStart;
auto folded = false; // true if a repeated pattern was found
auto calcFractPartRealLen = [&]() throw() {
if (DECIMAL_DELIM_ == *currSymbPtr) ++currSymbPtr; // skip delimiter when it separates ('1.1e0')
assert(fractPartEnd >= currSymbPtr); // 'currSymbPtr' SHOULD now be a real fract. part start
fractPartRealStart = currSymbPtr;
fractPartLen += fractPartEnd - currSymbPtr; // 'fractPartLen' CAN be negative BEFORE addition
assert(fractPartLen >= ptrdiff_t()); // SHOULD NOT be negative now
if (!fractPartLen) return; // NO fract. part
//// Skip trailing zeroes
auto fractPartCurrEnd = fractPartEnd - size_t(1U); // will point to the last non-zero digit symb.
while (fractPartCurrEnd >= currSymbPtr && '0' == *fractPartCurrEnd) --fractPartCurrEnd;
assert(fractPartCurrEnd >= strBuf); // SHOULD NOT go out of the buf.
fractPartTrailingZeroesCount = fractPartEnd - fractPartCurrEnd - size_t(1U);
assert(fractPartLen >= static_cast<ptrdiff_t>(fractPartTrailingZeroesCount));
fractPartLen -= fractPartTrailingZeroesCount;
//// Fraction folding (if needed)
if (fractPartLen > size_t(1U) && localeSettings.foldFraction) {
//// Remove delim. (if needed)
assert(fractPartStart && fractPartStart > strBuf); // SHOULD be set (delim. found)
if (fractPartRealStart < fractPartStart) { // move: "12.1e-1" -> "1 21e-1"
currSymbPtr = fractPartStart - size_t(1U);
assert(*currSymbPtr == DECIMAL_DELIM_);
while (currSymbPtr > fractPartRealStart) { // reversed move
*currSymbPtr = *(currSymbPtr - size_t(1U)); // read first, then step back (NO unsequenced access)
--currSymbPtr;
}
*currSymbPtr = '\0';
fractPartRealStart = currSymbPtr + size_t(1U); // update, now SHOULD point to the new real start
assert(fractPartLen);
}
//// Actual folding (if needed)
if (fractPartLen > size_t(1U)) {
const auto patternLen = tryFindPattern(fractPartRealStart, fractPartLen);
if (patternLen) {
fractPartLen = patternLen; // actual folding (reduce fract. part len. to the pattern. len)
folded = true;
}
}
}
};
// We are NOT using 'modfl' to get part values trying to optimize by skipping zero parts
calcFractPartRealLen(); // update len.
assert(fractPartLen ? localeSettings.precison : true);
const auto fractPartWillBeMentioned = fractPartLen || !localeSettings.shortFormat;
currSymbPtr = strBuf; // start from the beginning, left-to-right (->)```

Recognition of a repeated pattern (which may be present in the fractional part) is performed by step-by-step sequential scanning.

C++
```// Returns nullptr if a pattern of such a len. EXISTS (otherwise returns the first NOT matched occurrence)
auto testPattern = [](const char* const str, const char* const strEnd,
const size_t patternSize) throw() {
assert(str); // SHOULD NOT be nullptr
auto nextOccurance = str + patternSize;
while (true) {
if (memcmp(str, nextOccurance, patternSize)) return nextOccurance; // NOT matched
nextOccurance += patternSize;
if (nextOccurance >= strEnd) return decltype(nextOccurance)(); // ALL matched, return nullptr
}
};

// Returns pattern size if a pattern exists, 0 otherwise
// TO DO: add support for advanced folding: 1.25871871 [find repeated pattern NOT ONLY from start]
//  [in cycle: str+1, str+2, ...; get pattern start, pattern len. etc in 'tryFindPatternEx']
//   ['сто двадцать целых двадцать пять до периода и шестьдесят семь в периоде']
//    [controlled by 'enableAdvancedFolding' new option]
auto tryFindPattern = [&](const char* const str, const size_t totalLen) throw() {
const size_t maxPatternLen = totalLen / size_t(2U);
auto const strEnd = str + totalLen; // past the end
for (auto patternSize = size_t(1U); patternSize <= maxPatternLen; ++patternSize) {
if (totalLen % patternSize) continue; // skip invalid dividers [OPTIMIZATION]
if (!testPattern(str, strEnd, patternSize)) return patternSize;
}
return size_t();
};```

For example, given the number 1.23452345, first we test whether the fractional part consists only of repeated 2 (no), then only of repeated 23 (wrong again); a pattern of length 3 (234) is skipped, because 3 does not divide the total of 8 digits, AND finally 2345 hits the spot. This inspection is performed only if a fractional part exists AND only at the explicit request of the user (it is disabled by default).
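The scan described above can be tried in isolation with the following sketch. It follows the same idea as `tryFindPattern` / `testPattern` but is rewritten as one free function; the name `findRepeatedPattern` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Returns the length of the shortest block whose repetition forms the whole
// 'digits' sequence of 'totalLen' chars, OR 0 if there is NO such block
inline std::size_t findRepeatedPattern(const char* const digits, const std::size_t totalLen) {
  assert(digits);
  for (std::size_t patternLen = 1U; patternLen <= totalLen / 2U; ++patternLen) {
    if (totalLen % patternLen) continue;  // can NOT tile the sequence evenly
    bool matched = true;
    // Compare the first block with every following block
    for (std::size_t pos = patternLen; pos < totalLen && matched; pos += patternLen) {
      matched = (0 == std::memcmp(digits, digits + pos, patternLen));
    }
    if (matched) return patternLen;
  }
  return std::size_t();
}
```

For the digits "23452345" the function returns 4, for "181818" it returns 2, AND for "1234" it returns 0 (nothing to fold).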

`3)` processing integral part of the number

This is the first step where all preparation is finished AND the real processing starts.

C++
```processDigitsPart(intPartLen, getIntSubPartSize(), intPartBonusOrder, false);
if (truncated::ExecIfPresent(str)) { // check if truncated
if (errMsg) *errMsg = "too short buffer"; return false;
}
if (intPartLen) { // if int. part exist
assert(currSymbPtr > strBuf);
intPartLastDigit = *(currSymbPtr - ptrdiff_t(1)) - '0';
assert(intPartLastDigit > ptrdiff_t(-1) && intPartLastDigit < ptrdiff_t(10));
if (intPartLen > size_t(1U)) { // there is also prelast digit
auto intPartPreLastDigitPtr = currSymbPtr - ptrdiff_t(2);
if (DECIMAL_DELIM_ == *intPartPreLastDigitPtr) --intPartPreLastDigitPtr; // skip delim.: 2.3e1
assert(intPartPreLastDigitPtr >= strBuf); // check borders
intPartPreLastDigit = *intPartPreLastDigitPtr - '0';
assert(intPartPreLastDigit > ptrdiff_t(-1) && intPartPreLastDigit < ptrdiff_t(10));
}
}
strLenWithoutFractPart = str.size(); // remember (for future use)```

Both integral AND fractional parts are processed by the `processDigitsPart` generic processing lambda. This unified processing strategy will be presented later in this article.

After the main processing, two additional internal parameters, `intPartLastDigit` AND `intPartPreLastDigit`, are also determined. They are required for Russian language processing, to choose an appropriate ending for the int. part AND for the fraction delimiter:

5.1 = "пять целых одна десятая"

1.5 = "одна целая пять десятых"

1 = "один" [`shortFormat`]
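The choice between "целая" AND "целых" in these examples depends only on the last two digits of the integral part. A minimal sketch of that choice, assuming the digits are already extracted (the function name `ruFractionDelimiter` is hypothetical; a missing tens digit is passed as 0 here):

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: picks the Russian fraction delimiter by the last two
// digits of the integral part: "одна целая", but "одиннадцать целых"
inline const char* ruFractionDelimiter(const int preLastDigit, const int lastDigit) {
  assert(preLastDigit >= 0 && preLastDigit < 10);
  assert(lastDigit >= 0 && lastDigit < 10);
  // ONLY numbers ending with 1 (except those ending with 11) take the singular form
  return (1 == lastDigit && 1 != preLastDigit) ? "целая" : "целых";
}
```

So 1.5 AND 21.5 get "целая", while 5.1 AND 11.5 get "целых".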

`4)` processing fractional part of the number

C++
```if (fractPartLen) {
currSymbPtr = fractPartRealStart; // might be required if folded [in SOME cases]
}
processDigitsPart(fractPartLen, getFractSubPartSize(localeSettings), size_t(), true);
//// Add specific ending (if needed, like 'десятимиллионная')
assert(fractPartLen >= decltype(fractPartLen)());
size_t fractPartLastDigitOrderExt = fractPartLeadingZeroesCount + fractPartLen;
if (!fractPartLastDigitOrderExt) fractPartLastDigitOrderExt = size_t(1U); // at least one
}
assert(totalAddedCount); // SHOULD NOT be zero
if (truncated::ExecIfPresent(str)) { // check if truncated
if (errMsg) *errMsg = "too short buffer"; return false;
} return true;```

`addFractionDelimiter` is another generic processing lambda, while `addFractionPrefix` is a language-specific processing lambda (these types of lambdas will be described in more detail soon).

`addFractionDelimiter` is obviously used to add fraction separator.

`addFractionPrefix` is used to add some language-specific content before the actual processing of the fractional part starts. For example, for the English language it is the leading zeros: in the scientific notation they might NOT be present in the processed char. array. 0.0037 would be represented as "3.7e-3" (normalized form), so those zeros would NOT be processed during the main processing cycle AND have to be added elsewhere.

There are three groups of lambdas which weren't described yet AND which are used during the conversion process:
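How many zeros are hidden this way follows directly from the exponent of the normalized form. The sketch below shows the arithmetic; both function names are hypothetical AND the sketch is NOT the module's `addFractionPrefix`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count of zeros between the decimal point AND the first significant digit,
// given the exponent of a NORMALIZED mantissa (d.ddd): 2 for 3.7e-3 (0.0037)
inline std::size_t leadingFractionZeroes(const long expVal) {
  return expVal < -1L ? static_cast<std::size_t>(-(expVal + 1L)) : std::size_t();
}

// Builds the words for those zeros ("zero zero" for 3.7e-3)
inline std::string zeroPrefix(const long expVal, const char* const zeroWord) {
  assert(zeroWord);
  std::string prefix;
  for (std::size_t left = leadingFractionZeroes(expVal); left; --left) {
    if (!prefix.empty()) prefix += ' ';
    prefix += zeroWord;
  }
  return prefix;
}
```

For 0.0037 ("3.7e-3") the prefix is "zero zero", which, followed by "three seven", gives the expected "point zero zero three seven".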

`1) language-specific lambdas`: their run time behavior is heavily based on the selected language

`a) morphological lambdas`: provide morphemes of the selected language

`b) processing lambdas`: used to configure `generic processing lambdas` based on the language

`2) generic processing lambdas`: their internal logic is totally independent from the selected language; however, their execution is configured by the `language-specific processing lambdas`

Now we'll talk about all those functions.

Language-specific morphological lambdas

In fact, these functions represent the language itself. They provide the morphemes used to construct the resulting numeral.

Each word can have up to 3 morphemes (affixes) in addition to the root:

`1)` prefix: placed before the stem of a word
`2)` infix: inserted inside a word stem
OR
interfix: [linkage] placed in between two morphemes AND does NOT have a semantic meaning
`3)` postfix: (suffix OR ending) placed after the stem of a word

Word = [prefix]<root>[infix / interfix][postfix (suffix, ending)]

Each function returns the root AND can optionally provide an infix AND/OR a postfix.

Do NOT consider, however, the returned values to be the root / the postfix etc in the exact linguistic meaning (as morphemes obtained from a correct AND proper morphological analysis). Consider them to be a "root" / a "postfix" specific to the current project.
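The way a caller could glue such parts into a word can be sketched as follows. The helper `joinMorphemes` is hypothetical AND serves only to show the [root][infix][postfix] order; the sample parts are taken from the tables shown in this article.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds a word from the project-specific "morphemes"
// in the order root + infix + postfix (absent parts are empty strings)
inline std::string joinMorphemes(const char* const root, const char* const infix = "",
                                 const char* const postfix = "") {
  assert(root && infix && postfix);
  std::string word(root);
  word += infix;
  word += postfix;
  return word;
}
```

For example, "th" + "" + "ree" gives "three", "fi" + "f" + "teen" gives "fifteen", AND "дв" + "адцат" + "ь" gives "двадцать".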

`1) getZeroOrderNumberStr`

Returns numerals for numbers 0 - 9 (step 1) in the form of root + postfix.

Examples: "th" + "ree" (3), "вос" + "емь" (8)

C++
```auto getZeroOrderNumberStr = [&](const size_t currDigit, const size_t order, const char*& postfix,
const LocaleSettings& localeSettings) throw() -> const char* {
static const char* const EN_TABLE[] = // roots
{"", "one", "tw", "th", "fo", "fi", "six", "seven", "eigh", "nine"};
static const char* const EN_POSTFIXES[] = // endings
{"", "", "o", "ree", "ur", "ve", "", "", "t", ""};
static const char* const RU_TABLE[] =
{"нол", "од", "дв", "тр", "четыр", "пят", "шест", "сем", "вос", "девят"};
static const char* const RU_POSTFIXES[] = // восЕМЬ восЬМИ восЕМЬЮ
// одИН одНОГО одНОМУ одНИМ; двА двУХ двУМ двУМЯ; трИ трЕМЯ; четырЕ четырЬМЯ четырЁХ
{"ь", "ин", "а", "и", "е", "ь", "ь", "ь", "емь", "ь"};
// НолЬ нолЯ нолЮ; пятЬ пятЬЮ пятЕРЫХ; шестЬ шестЬЮ шестИ; семЬ семИ семЬЮ; девятЬ девятЬЮ девятИ
static_assert(sizeof(EN_TABLE) == sizeof(RU_TABLE) && sizeof(EN_TABLE) == sizeof(EN_POSTFIXES) &&
sizeof(RU_TABLE) == sizeof(RU_POSTFIXES) &&
size_t(10U) == std::extent<decltype(EN_TABLE)>::value,
"Tables SHOULD have the same size (10)");
assert(currDigit < std::extent<decltype(EN_TABLE)>::value); // is valid digit?
switch (localeSettings.locale) {
case ELocale::L_EN_US: case ELocale::L_EN_GB:
postfix = EN_POSTFIXES[currDigit];
if (!currDigit) { // en.wikipedia.org/wiki/Names_for_the_number_0_in_English
// American English:
//  zero:       number by itself, decimals, percentages, phone numbers, some fixed expressions
//  o (letter): years, addresses, times and temperatures
//  nil:        sports scores
if (localeSettings.verySpecific) return "o"; // 'oh'
return localeSettings.locale == ELocale::L_EN_US ? "zero" : "nought";
}
return EN_TABLE[currDigit];
case ELocale::L_RU_RU:
postfix = "";
switch (order) {
case size_t(0U): // last digit ['двадцать две целых ноль десятых']
// Один | одНА целая ноль десятых | одна целая одНА десятая
if (!fractPartWillBeMentioned) break;
case size_t(3U): // тысяч[?]
switch (currDigit) {
case size_t(1U): postfix = "на"; break; // 'ста двадцать одНА тысяча'
case size_t(2U): postfix = "е"; break; // 'ста двадцать двЕ тысячи' []
}
break;
}
if (!*postfix) postfix = RU_POSTFIXES[currDigit]; // if NOT set yet
return RU_TABLE[currDigit];
}
assert(false); // locale error
return "<locale error [" MAKE_STR_(__LINE__) "]>";
};```

`2) getFirstOrderNumberStr`

Returns numerals for numbers 10 - 19 (step 1) AND 20 - 90 (step 10) in the form of root + infix + postfix.

Example: "дв" + "адцат" + "ь" (20)

C++
```auto getFirstOrderNumberStr = [&](const size_t currDigit, const size_t prevDigit,
const char*& infix, const char*& postfix,
const LocaleSettings& localeSettings) throw() -> const char* {
//// Sub. tables: 10 - 19 [1]; Main tables: 20 - 90 [10]

static const char* const EN_SUB_TABLE[] = {"ten", "eleven"}; // exceptions [NO infixes / postfixes]
static const char* const EN_SUB_INFIXES[] = // th+ir+teen; fo+ur+teen; fi+f+teen
{"", "", "", "ir", "ur", "f", "", "", "", ""};
#define ESP_ "teen" // EN_SUB_POSTFIX
static const char* const EN_SUB_POSTFIXES[] = // tw+elve ["a dozen"]; +teen ALL others
{"", "", "elve", ESP_, ESP_, ESP_, ESP_, ESP_, ESP_, ESP_}; // +teen of ALL above 2U (twelve)
static const char* const EN_MAIN_INFIXES[] = // tw+en+ty ["a score"]; th+ir+ty; fo+r+ty; fi+f+ty
{"", "", "en", "ir", "r", "f", "", "", "", ""}; // +ty ALL

#define R23I_ "дцат" // RU_20_30_INFIX [+ь]
#define RT1I_ "на" R23I_ // RU_TO_19_INFIX [на+дцат+ь]
static const char* const RU_SUB_INFIXES[] = // +ь; одиннадцатЬ одиннадцатИ одиннадцатЬЮ
// ДесятЬ десятИ десятЬЮ; од и надцат ь / тр и надцат ь; дв е надцат ь; вос ем надцат ь
{"", "ин" RT1I_, "е" RT1I_, "и" RT1I_, RT1I_, RT1I_, RT1I_, RT1I_, "ем" RT1I_, RT1I_};

// ДвадцатЬ двадцатЬЮ двадцатЫЙ двадцатОМУ двадцатИ; семьдесят BUT семидесяти!
#define R5T8I_ "ьдесят" // RU_50_TO_80_INFIX [NO postfix]
static const char* const RU_MAIN_INFIXES[] = // дв а дцат ь; тр и дцат ь; пят шест сем +ьдесят
{"", "", "а" R23I_, "и" R23I_, "", R5T8I_, R5T8I_, R5T8I_, "ем" R5T8I_, ""}; // вос ем +ьдесят
static const char* const RU_MAIN_POSTFIXES[] = // дв а дцат ь; тр и дцат ь; пят шест сем +ьдесят
{"", "", "ь", "ь", "", "", "", "", "", "о"}; // сорок; вос ем +ьдесят; девяност о девяност а

static_assert(sizeof(EN_SUB_INFIXES) == sizeof(EN_MAIN_INFIXES) &&
sizeof(EN_SUB_POSTFIXES) == sizeof(RU_MAIN_POSTFIXES) &&
sizeof(RU_SUB_INFIXES) == sizeof(RU_MAIN_INFIXES), "Tables SHOULD have the same size");
assert(prevDigit < std::extent<decltype(EN_SUB_POSTFIXES)>::value); // is valid digits?
assert(currDigit < std::extent<decltype(EN_SUB_POSTFIXES)>::value);
switch (localeSettings.locale) {
case ELocale::L_EN_US: case ELocale::L_EN_GB:
switch (prevDigit) {
case size_t(1U): // ten - nineteen
infix = EN_SUB_INFIXES[currDigit], postfix = EN_SUB_POSTFIXES[currDigit];
if (currDigit < size_t(2U)) return EN_SUB_TABLE[currDigit]; // exceptions
break;
default: // twenty - ninety
assert(!prevDigit && currDigit > size_t(1U));
infix = EN_MAIN_INFIXES[currDigit], postfix = "ty"; // +ty for ALL
break;
}
break;
case ELocale::L_RU_RU:
switch (prevDigit) {
case size_t(1U): // десять - девятнадцать
infix = RU_SUB_INFIXES[currDigit], postfix = "ь"; // +ь for ALL
if (!currDigit) return "десят";
break;
default: // двадцать - девяносто
assert(currDigit > size_t(1U));
infix = RU_MAIN_INFIXES[currDigit], postfix = RU_MAIN_POSTFIXES[currDigit];
switch (currDigit) {
case size_t(4U): return "сорок"; // сорокА
case size_t(9U): return "девяност"; // девяностО девяностЫХ девяностЫМ
}
break;
}
break;
default: assert(false); // locale error
return "<locale error [" MAKE_STR_(__LINE__) "]>";
} // END switch (locale)
const char* tempPtr;
return getZeroOrderNumberStr(currDigit, size_t(), tempPtr, localeSettings);
};```

`3) getSecondOrderNumberStr`

Returns numerals for numbers 100 - 900 (step 100) in the form of root + infix + postfix.

Examples: "fi" + "ve" + " hundred" (500), "дв" + "е" + "сти" (200)

C++
```// 100 - 900 [100]
auto getSecondOrderNumberStr = [&](const size_t currDigit, const char*& infix, const char*& postfix,
const LocaleSettings& localeSettings) throw() -> const char* {
static const char* const RU_POSTFIXES[] =
{"", "", "сти", "ста", "ста", "сот", "сот", "сот", "сот", "сот"};
static_assert(size_t(10U) == std::extent<decltype(RU_POSTFIXES)>::value,
"Table SHOULD have the size of 10");
assert(currDigit && currDigit < std::extent<decltype(RU_POSTFIXES)>::value);
switch (localeSettings.locale) {
case ELocale::L_EN_US: case ELocale::L_EN_GB:
postfix = " hundred";
return getZeroOrderNumberStr(currDigit, size_t(), infix, localeSettings);
case ELocale::L_RU_RU:
postfix = RU_POSTFIXES[currDigit];
switch (currDigit) {
case size_t(1U): infix = ""; return "сто"; break;
case size_t(2U): {
const char* temp;
infix = "е"; //ALWAYS 'е'
return getZeroOrderNumberStr(currDigit, size_t(), temp, localeSettings); // дв е сти
}
}
return getZeroOrderNumberStr(currDigit, size_t(), infix, localeSettings);
} // END switch (locale)
assert(false); // locale error
return "<locale error [" MAKE_STR_(__LINE__) "]>";
};```

`4) getOrderStr`: returns name of the large number based on its order

Uses the short scale for the English language (both American AND British).

C++
```// Up to 10^99 [duotrigintillions]
auto getOrderStr = [](size_t order, const size_t preLastDigit, const size_t lastDigit,
const char*& postfix, const LocaleSettings& localeSettings)
throw() -> const char* {
// https://en.wikipedia.org/wiki/Names_of_large_numbers
static const char* const EN_TABLE[] = // uses short scale (U.S., part of Canada, modern British)
{"", "thousand", "million", "billion", "trillion", "quadrillion", "quintillion", "sextillion",
"septillion", "octillion", "nonillion", "decillion", "undecillion", "duodecillion" /*10^39*/,
"tredecillion", "quattuordecillion", "quindecillion", "sedecillion", "septendecillion",
"octodecillion", "novemdecillion", "vigintillion", "unvigintillion", "duovigintillion",
"tresvigintillion", "quattuorvigintillion", "quinquavigintillion", "sesvigintillion",
"septemvigintillion", "octovigintillion", "novemvigintillion", "trigintillion" /*10^93*/,
"untrigintillion", "duotrigintillion"};
// https://ru.wikipedia.org/wiki/Именные_названия_степеней_тысячи
static const char* const RU_TABLE[] = // SS: short scale, LS: long scale
{"", "тысяч", "миллион", "миллиард" /*SS: биллион*/, "триллион" /*LS: биллион*/,
"квадриллион" /*LS: биллиард*/, "квинтиллион" /*LS: триллион*/,
"секстиллион" /*LS: триллиард*/, "септиллион" /*LS: квадриллион*/, "октиллион", "нониллион",
"дециллион", "ундециллион", "додециллион", "тредециллион", "кваттуордециллион" /*10^45*/,
"квиндециллион", "седециллион", "септдециллион", "октодециллион", "новемдециллион",
"вигинтиллион", "анвигинтиллион", "дуовигинтиллион", "тревигинтиллион", "кватторвигинтиллион",
"квинвигинтиллион", "сексвигинтиллион", "септемвигинтиллион", "октовигинтиллион" /*10^87*/,
"новемвигинтиллион", "тригинтиллион", "антригинтиллион", "дуотригинтиллион"}; // 10^99
static_assert(sizeof(EN_TABLE) == sizeof(RU_TABLE), "Tables SHOULD have the same size");
static const size_t MAX_ORDER_ =
(std::extent<decltype(EN_TABLE)>::value - size_t(1U)) * size_t(3U); // first empty

static const char* const RU_THOUSAND_POSTFIXES[] = // десять двадцать сто двести тысяч
// Одна тысячА | две три четыре тысячИ | пять шесть семь восемь девять тысяч
{"", "а", "и", "и", "и", "", "", "", "", ""};
static const char* const RU_MILLIONS_AND_BIGGER_POSTFIXES[] = // один миллион; два - четыре миллионА
// Пять шесть семь восемь девять миллионОВ [миллиардОВ триллионОВ etc]
// Десять двадцать сто двести миллионОВ миллиардОВ etc
{"ов", "", "а", "а", "а", "ов", "ов", "ов", "ов", "ов"};
static_assert(size_t(10U) == std::extent<decltype(RU_THOUSAND_POSTFIXES)>::value &&
size_t(10U) == std::extent<decltype(RU_MILLIONS_AND_BIGGER_POSTFIXES)>::value,
"Tables SHOULD have the size of 10");
switch (localeSettings.locale) {
case ELocale::L_EN_US: case ELocale::L_EN_GB:
postfix = "";
if (size_t(2U) == order) return "hundred"; // 0U: ones, 1U: tens
order /= 3U; // 0 - 1: empty, 3 - 5: thousands, 6 - 8: millions, 9 - 11: billions etc
assert(order < std::extent<decltype(EN_TABLE)>::value);
return EN_TABLE[order]; // [0, 33]
case ELocale::L_RU_RU:
assert(preLastDigit < size_t(10U) && lastDigit < size_t(10U));
if (size_t(3U) == order) { // determine actual postfix first
if (size_t(1U) != preLastDigit) {
postfix = RU_THOUSAND_POSTFIXES[lastDigit];
} else postfix = ""; // 'тринадцать тысяч'
} else if (order > size_t(3U)) { // != 3U
if (size_t(1U) == preLastDigit) { // десять одиннадцать+ миллионОВ миллиардОВ etc
postfix = "ов";
} else postfix = RU_MILLIONS_AND_BIGGER_POSTFIXES[lastDigit];
}
order /= 3U; // 6 - 8: миллионы, 9 - 11: миллиарды etc
assert(order < std::extent<decltype(RU_TABLE)>::value);
return RU_TABLE[order]; // [0, 33]
}
assert(false); // locale error
return "<locale error [" MAKE_STR_(__LINE__) "]>";
};```
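The core of the lookup is just an integer division of the decimal order by 3. A reduced sketch for the English short scale, with a table cut down to five entries (the function name `shortScaleName` is hypothetical, AND the special "hundred" case of the real lambda is left out):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical, reduced version of the lookup: maps the decimal order of a
// digit (3 for thousands, 6 for millions) to the short scale name
inline const char* shortScaleName(const std::size_t order) {
  static const char* const TABLE[] = {"", "thousand", "million", "billion", "trillion"};
  const std::size_t idx = order / 3U;  // 0 - 2: empty, 3 - 5: thousands, 6 - 8: millions etc
  assert(idx < sizeof(TABLE) / sizeof(TABLE[0]));
  return TABLE[idx];
}
```

Orders 3, 4 AND 5 all map to "thousand" (as in "one thousand", "ten thousand", "one hundred thousand"), order 6 maps to "million", AND so on.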

`5) getFractionDelimiter`

Returns a POD C str. which represents the fractional separator used in the selected language.

C++
```// 'intPartPreLastDigit' AND 'intPartLastDigit' CAN be negative (in case of NO int. part)
auto getFractionDelimiter = [](const ptrdiff_t intPartPreLastDigit, const ptrdiff_t intPartLastDigit,
const char*& postfix, const bool folded,
const LocaleSettings& localeSettings) throw() -> const char* {
assert(intPartPreLastDigit < ptrdiff_t(10) && intPartLastDigit < ptrdiff_t(10));
postfix = "";
switch (localeSettings.locale) {
case ELocale::L_EN_US: case ELocale::L_EN_GB: return "point"; // also 'decimal'
case ELocale::L_RU_RU: // "целые" НЕ употребляются в учебниках!
if (intPartLastDigit < ptrdiff_t() && localeSettings.shortFormat) return ""; // NO int. part
if (folded) postfix = "и";
return ptrdiff_t(1) == intPartLastDigit ?
(ptrdiff_t(1) == intPartPreLastDigit ? "целых" : "целая") : // одиннадцать целЫХ | одна целАЯ
"целых"; // ноль, пять - девять целЫХ; две - четыре целЫХ; десять цел ых
}
assert(false); // locale error
return "<locale error [" MAKE_STR_(__LINE__) "]>";
};```
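
The Russian choice between "целая" AND "целых" depends only on the last two digits of the integral part. Here is a small sketch of the same decision on a plain integer. The name `ruFractionDelimiter` is hypothetical, AND the sketch ignores the `shortFormat` AND `folded` handling of the real lambda.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: picks the Russian word that separates int. AND fract. parts
std::string ruFractionDelimiter(const long long intPart) {
  assert(intPart >= 0); // the sign is NOT considered here
  const long long lastTwo = intPart % 100;
  // 'целая' ONLY for 1, 21, 31 etc; 11 is an exception ('одиннадцать целых')
  return (1 == lastTwo % 10 && 11 != lastTwo) ? "целая" : "целых";
}
```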

`6) getFoldedFractionEnding`

If the fractional part of the number contained a repeated pattern that was folded, this ending is appended to the resulting string to indicate that the pattern recurs.

C++
```auto getFoldedFractionEnding = [](const LocaleSettings& localeSettings) throw() {
// Also possibly 'continuous', 'recurring'; 'reoccurring' (Australian)
switch (localeSettings.locale) {
case ELocale::L_EN_US: return "to infinity"; // also 'into infinity', 'to the infinitive'
case ELocale::L_EN_GB: return "repeating"; // also 'repeated'
case ELocale::L_RU_RU: return "в периоде";
}
assert(false); // locale error
return "<locale error [" MAKE_STR_(__LINE__) "]>";
};```
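
The folding itself is not listed in this section, so the following is only a sketch of one possible way to detect a repeated pattern in the fractional digits. It is an assumption AND not necessarily the detection the module performs. The hypothetical `findRepeatedPattern` looks for the shortest prefix that, repeated a whole number of times (at least twice), gives the whole digit string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the shortest repeated pattern that makes up the whole
//  digit string at least twice, OR an empty str. if there is NO such pattern
std::string findRepeatedPattern(const std::string& fractDigits) {
  assert(fractDigits.find_first_not_of("0123456789") == std::string::npos);
  const std::size_t len = fractDigits.size();
  for (std::size_t patternLen = 1U; patternLen * 2U <= len; ++patternLen) {
    if (len % patternLen) continue; // pattern SHOULD fit a whole number of times
    bool repeated = true;
    for (std::size_t idx = patternLen; idx < len && repeated; ++idx)
      repeated = fractDigits[idx] == fractDigits[idx - patternLen];
    if (repeated) return fractDigits.substr(0U, patternLen);
  }
  return std::string();
}
```

With this sketch, "25672567" folds to "2567", which corresponds to the "point two five six seven repeating" output shown in the tests section.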

Generic processing lambdas

As I have already said, these are language-independent AND are used to process both the `integral` AND `fractional` parts of the number (one at a time).

`1) processDigitsPart`: main processing cycle

C++
```size_t intPartAddedCount, strLenWithoutFractPart;
// Strategy used to process both integral AND fractional parts of the number
// 'digitsPartSize' is a total part. len. in digits (i. e. 1 for 4, 3 for 123, 6 for 984532 etc)
//  [CAN be zero in some cases]
// 'partBonusOrder' will be 3 for 124e3, 9 for 1.2e10, 0 for 87654e0 etc
// 'fractPart' flag SHOULD be true if processing fraction part
auto processDigitsPart = [&](size_t digitsPartSize, const size_t digitsSubPartSize,
size_t partBonusOrder, const bool fractPart) {
currDigit = size_t(), prevDigit = size_t(); // reset
if (digitsPartSize) {
assert(digitsSubPartSize); // SHOULD be NOT zero
size_t currDigitsSubPartSize =
(digitsPartSize + partBonusOrder) % digitsSubPartSize; // 2 for 12561, 1 for 9 etc
if (!currDigitsSubPartSize) currDigitsSubPartSize = digitsSubPartSize; // if zero remainder
// Will be 2 for '12.34e4' ('1234e2' = '123 400' - two last unpresented zeroes); 1 for 1e1
auto subPartOrderExt = size_t(); // used ONLY for a last subpart

// OPTIMIZATION HINT: redesign to preallocate for the whole str., NOT for the different parts?
if (ReserveBeforeAdding) // optimization [CAN acquire more / less space than really required]
str.reserve(str.length() + estimatePossibleLength(digitsPartSize, fractPart, localeSettings));
do {
if (currDigitsSubPartSize > digitsPartSize) { // if last AND unnormal [due to the '%']
subPartOrderExt = currDigitsSubPartSize - digitsPartSize;
partBonusOrder -= subPartOrderExt;
currDigitsSubPartSize = digitsPartSize; // correct
}
digitsPartSize -= currDigitsSubPartSize;
processDigitsSubPart(currDigitsSubPartSize, digitsSubPartSize,
digitsPartSize + partBonusOrder, subPartOrderExt, fractPart);
currDigitsSubPartSize = digitsSubPartSize; // set default [restore]
} while (digitsPartSize);
}
auto mentionZeroPart = [&]() {
if (!str.empty()) str += delimiter;
const char* postfix;
str += getZeroOrderNumberStr(size_t(), size_t(), postfix, localeSettings);
str += postfix;
};
if (!addedCount) { // NO part
if (!localeSettings.shortFormat || folded) { // NOT skip mention zero parts
if (fractPart) {
addFractionDelimiter();
} else intPartLastDigit = ptrdiff_t(); // now. IS int. part
mentionZeroPart();
} else if (fractPart) { // short format AND now processing fraction part
assert(!folded); // NO fract. part - SHOULD NOT be folded
assert(strLenWithoutFractPart <= str.size()); // SHOULD NOT incr. len.
if (!intPartAddedCount) { // NO int. part [zero point zero -> zero] <EXCEPTION>
mentionZeroPart(); // do NOT incr. 'addedCount'!!
}
}
}
};```

This function takes a part of the number, for example 1278 from 1278.45, AND processes it by subparts of the specified size (currently 3, 2 OR 1). With `digitsSubPartSize` = 2 there will be two such subparts: 12 AND 78. Each subpart is processed by the other generic processing lambda, `processDigitsSubPart` (see below).

In fact, `processDigitsPart` performs a series of calls to the `processDigitsSubPart` function, correctly separating the part into subparts until no more subparts remain. It also performs a special action at the end if nothing was actually added (in order to correctly process numbers like 0.0 with the `shortFormat` flag turned ON, AND some other specific cases).

This function also uses the `estimatePossibleLength` language-specific processing lambda (described later) AND the `addFractionDelimiter` generic processing lambda (already mentioned, described in detail later).
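
To illustrate how a part is separated into subparts, here is a simplified sketch that works on a plain digit string. The name `splitIntoSubParts` is hypothetical, AND the sketch deliberately ignores `partBonusOrder` (the unpresented trailing zeroes of the scientific format), so it shows only the basic idea: subparts are counted from the right, AND only the first one can be shorter.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits a digit str. into subparts of 'subPartSize' digits,
//  counting from the right, so ONLY the first subpart CAN be shorter
std::vector<std::string> splitIntoSubParts(const std::string& digits,
                                           const std::size_t subPartSize) {
  assert(subPartSize); // SHOULD be NOT zero
  std::vector<std::string> subParts;
  std::size_t pos = 0U;
  std::size_t currSize = digits.size() % subPartSize; // 2 for 12561 by triads
  if (!currSize) currSize = subPartSize; // if zero remainder
  while (pos < digits.size()) {
    subParts.push_back(digits.substr(pos, currSize));
    pos += currSize;
    currSize = subPartSize; // set default [restore]
  }
  return subParts;
}
```

So "1278" with a subpart size of 2 gives "12" AND "78", while "12561" by triads gives "12" AND "561".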

`2) processDigitsSubPart`: subprocessing cycle

Processes a subpart received from the parent cycle (`processDigitsPart`). Both of these functions are closures that do not process the number itself: they process the `strBuf` char. array, which was previously filled by the `sprintf` function during stage 1 of the conversion (see the 'Conversion stages description' section above).

C++
```auto addedCount = size_t(); // during processing curr. part
auto emptySubPartsCount = size_t();
// Part order is an order of the last digit of the part (zero for 654, 3 for 456 of the 456654 etc)
// Part (integral OR fractional) of the number consists of the subparts of specified size
//  (usually 3 OR 1; for ENG.: 3 for int. part., 1 for fract. part)
// 'subPartOrderExt' SHOULD exist ONLY for a LAST subpart
auto processDigitsSubPart = [&](const size_t currDigitsSubPartSize,
const size_t normalDigitsSubPartSize,
const size_t order, size_t subPartOrderExt, const bool fractPart) {
assert(currDigitsSubPartSize && currDigitsSubPartSize <= size_t(3U));
auto currAddedCount = size_t(); // reset
auto emptySubPart = true; // true if ALL prev. digits of the subpart is zero
prevDigit = std::decay<decltype(prevDigit)>::type(); // reset
for (size_t subOrder = currDigitsSubPartSize - size_t(1U);;) {
if (DECIMAL_DELIM_ != *currSymbPtr) { // skip decimal delim.
currDigit = *currSymbPtr - '0'; // assuming ANSI ASCII
PPOCESS_DIGIT_:
assert(*currSymbPtr >= '0' && currDigit < size_t(10U));
emptySubPart &= !currDigit;
normalDigitsSubPartSize, fractPart);
if (subPartOrderExt) { // treat unpresented digits [special service]
--subPartOrderExt;
prevDigit = currDigit;
currDigit = std::decay<decltype(currDigit)>::type(); // remove ref. from type
goto PPOCESS_DIGIT_; // don't like 'goto'? take a nyan cat here: =^^=
}
if (!subOrder) { // zero order digit
++currSymbPtr; // shift to the symb. after the last in an int. part
break;
}
--subOrder, prevDigit = currDigit;
}
++currSymbPtr;
}
if (emptySubPart) ++emptySubPartsCount; // update stats
// Add order str. AFTER part (if exist)
const char* postfix;
auto const orderStr = getOrderStr(order, prevDigit, currDigit, postfix, localeSettings);
assert(orderStr && postfix);
if (*orderStr) { // if NOT empty (CAN be empty for zero order [EN, RU])
assert(str.size()); // NOT zero
str += delimiter, str += orderStr, str += postfix;
}
}
};```

This function calls the `processDigitOfATriad` language-specific processing lambda for each digit in the processed subpart.

As the name AND the listing of the function suggest, it is usually used to process subparts of size 3. Actually, it can process subparts of size 1, 2 OR 3 (AND all those sizes are really required at some point).

When all digits of the subpart are processed, the function appends an order string (like "thousand") if it is needed. This happens only if we process subparts of at least `minDigitsSubPartSizeToAddOrder` size, which is set by a call to the `getMinDigitsSubPartSizeToAddOrder` language-specific processing lambda (presented in the next section of the article).

`3) addFractionDelimiter`

A very simple function, used to correctly separate `integral` AND `fractional` parts of the number.

C++
```auto intPartPreLastDigit = ptrdiff_t(-1), intPartLastDigit = ptrdiff_t(-1); // NO part by default
auto addFractionDelimiter = [&]() {
const char* postfix;
auto const fractionDelim =
getFractionDelimiter(intPartPreLastDigit, intPartLastDigit, postfix, folded, localeSettings);
if (*fractionDelim) { // if NOT empty
if (!str.empty()) str += delimiter;
str += fractionDelim;
}
if (*postfix) {
if (*fractionDelim) str += delimiter;
str += postfix;
}
};```

Language-specific processing lambdas

Final pack of lambdas, used during the processing.

The following ones are used to configure the conversion strategy, based on the selected language.

`1) getMinDigitsSubPartSizeToAddOrder`

Returns the minimal subpart size, for which an order string (like "hundred" OR "thousand" for english) should be appended during the conversion.

For example, for english again, when processing 1256 by subparts of size = 2, we would append "hundred" after 12, while processing the same number by subparts of size = 1, we would append nothing.

C++
```auto getMinDigitsSubPartSizeToAddOrder = [](const LocaleSettings& localeSettings) throw() {
switch (localeSettings.locale) {
case ELocale::L_EN_US: case ELocale::L_EN_GB: return size_t(2U); // hundreds
case ELocale::L_RU_RU: return size_t(3U); // тысячи
}
assert(false); // locale error
return size_t();
};```

`2) getSpecificCaseSubPartSize`

Returns the subpart size, when there is some specific processing required. You can see the samples of such specific cases in the function's listing.

C++
```// Returns zero (NOT set, undefined) if NOT spec. case
auto getSpecificCaseSubPartSize = [](const long double& num,
const LocaleSettings& localeSettings) throw() {
switch (localeSettings.locale) {
/*
In American usage, four-digit numbers with non-zero hundreds
are often named using multiples of "hundred"
AND combined with tens AND/OR ones:
"One thousand one", "Eleven hundred three", "Twelve hundred twenty-five",
"Four thousand forty-two", or "Ninety-nine hundred ninety-nine"
*/
case ELocale::L_EN_US:
if (num < 10000.0L) {
bool zeroTensAndOnes;
const auto hundreds =
MathUtils::getDigitOfOrder(size_t(2U), static_cast<long long int>(num), zeroTensAndOnes);
if (hundreds && !zeroTensAndOnes) return size_t(2U); // if none-zero hundreds
}
break;
// In British usage, this style is common for multiples of 100 between 1,000 and 2,000
//  (e.g. 1,500 as "fifteen hundred") BUT NOT for higher numbers
case ELocale::L_EN_GB:
if (num >= 1000.0L && num < 2001.0L) {
// If ALL digits of order below 2U [0, 1] is zero
if (!(static_cast<size_t>(num) % size_t(100U))) return size_t(2U); // if is multiples of 100
}
break;
}
return size_t();
};```
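
The decision made above can be tried in isolation with the sketch below. It mirrors the conditions as they appear in the listing, under the assumption that `zeroTensAndOnes` means "both the tens AND the ones digits are zero". The `Dialect` enum AND the function name `specificCaseSubPartSize` are hypothetical stand-ins, AND the sketch takes an already truncated integral part instead of a `long double`.

```cpp
#include <cassert>
#include <cstddef>

enum class Dialect { US, GB }; // hypothetical stand-in for 'ELocale'

// Returns 2 if the int. part SHOULD be read by pairs ('twelve hundred ...'), zero otherwise
std::size_t specificCaseSubPartSize(const unsigned long long intPart, const Dialect dialect) {
  const unsigned long long hundreds = intPart / 100U % 10U;
  const unsigned long long tensAndOnes = intPart % 100U;
  switch (dialect) {
    case Dialect::US: // below 10000, non-zero hundreds AND non-zero tens / ones
      if (intPart < 10000U && hundreds && tensAndOnes) return 2U;
      break;
    case Dialect::GB: // ONLY multiples of 100 within [1000, 2000]
      if (intPart >= 1000U && intPart <= 2000U && !tensAndOnes) return 2U;
      break;
    default: assert(false); // dialect error
  }
  return 0U; // NOT a specific case
}
```

For instance, 1103 in the US dialect returns 2 ("eleven hundred three"), while 4042 returns zero because its hundreds digit is zero ("four thousand forty-two").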

`3) getIntSubPartSize`

Returns the subpart size, when processing an integral part of the number.

C++
```auto getIntSubPartSize = [&]() throw() {
auto subPartSize = size_t();
if (localeSettings.verySpecific)
subPartSize = getSpecificCaseSubPartSize(num, localeSettings); // CAN alter digits subpart size
if (!subPartSize) { // NOT set previously
switch (localeSettings.locale) { // triads by default
// For eng. numbers step = 1 can be ALSO used: 64.705 — 'six four point seven nought five'
case ELocale::L_EN_US: case ELocale::L_EN_GB: case ELocale::L_RU_RU: subPartSize = size_t(3U);
}
}
return subPartSize;
};```

`4) getFractSubPartSize`

Returns the subpart size, when processing a fractional part of the number.

C++
```auto getFractSubPartSize = [](const LocaleSettings& localeSettings) throw() {
switch (localeSettings.locale) {
case ELocale::L_EN_US: case ELocale::L_EN_GB:
// Step = 2 OR 3 can be ALSO used: 14.65 - 'one four point sixty-five'
return size_t(1U); // point one two seven
case ELocale::L_RU_RU: return size_t(3U); // сто двадцать семь сотых
}
assert(false); // locale error
return size_t();
};```
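
A subpart size of 1 means that the English fractional part is simply read digit by digit. A minimal sketch of that reading follows. The name `spellDigitByDigit` AND the `zeroName` parameter are assumptions made for illustration; the module itself gets the digit names from `getZeroOrderNumberStr`.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: spells a fract. part digit by digit (subpart size = 1)
//  'zeroName' lets the caller choose 'zero', 'nought' OR 'o'
std::string spellDigitByDigit(const std::string& fractDigits, const std::string& zeroName) {
  static const char* const NAMES[] = {"", "one", "two", "three", "four",
                                      "five", "six", "seven", "eight", "nine"};
  std::string str;
  for (const char symb : fractDigits) {
    assert(symb >= '0' && symb <= '9'); // ONLY decimal digits expected
    if (!str.empty()) str += ' ';
    str += ('0' == symb) ? zeroName : std::string(NAMES[symb - '0']);
  }
  return str;
}
```

Here `spellDigitByDigit("4272", "zero")` produces "four two seven two", the same tail as in the first test result of the tests section.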

`5) estimatePossibleLength`

A heuristic function used to predict the possible length of the string that would represent the targeted part of the number. It is used to optionally preallocate memory for the provided storage before the actual processing begins, in order to reduce the overall execution time (optimization).

C++
```// Currently there is NO specific handling for 'short format' AND 'very specific' options
auto estimatePossibleLength = [](const size_t digitsPartSize, const bool fractPart,
const LocaleSettings& localeSettings) throw() {
// If processing by the one digit per time; EN GB uses 'nought' instead of 'zero'
static const auto EN_US_AVG_CHAR_PER_DIGIT_NAME_ = size_t(4U); // 40 / 10 ['zero' - 'nine']
static size_t AVG_SYMB_PER_DIGIT_[ELocale::COUNT]; // for ALL langs; if processing by triads

struct ArrayIniter { // 'AVG_SYMB_PER_DIGIT_' initer
ArrayIniter() throw() {
//// All this value is a result of the statistical analysis
AVG_SYMB_PER_DIGIT_[ELocale::L_EN_GB] = size_t(10U); // 'one hundred and twenty two thousand'
AVG_SYMB_PER_DIGIT_[ELocale::L_EN_US] = size_t(9U);  // 'one hundred twenty two thousand'
AVG_SYMB_PER_DIGIT_[ELocale::L_RU_RU] = size_t(8U);  // 'сто двадцать две тысячи'
}
}; static const ArrayIniter INITER_; // static init. is a thread safe in C++11

static const auto RU_DELIM_LEN_ = size_t(5U); // "целых" / "целая"
// Frequent postfixes (up to trillions: 'десятитриллионных')
static const auto RU_MAX_FREQ_FRACT_POSTFIX_LEN_ = size_t(17U);

switch (localeSettings.locale) {
case ELocale::L_EN_US: case ELocale::L_EN_GB:
if (!fractPart) return AVG_SYMB_PER_DIGIT_[localeSettings.locale] * digitsPartSize;
// For the fract part [+1 for the spacer]
return (EN_US_AVG_CHAR_PER_DIGIT_NAME_ + size_t(1U)) * digitsPartSize;
case ELocale::L_RU_RU: // RU RU processes fract. part by the triads (like an int. part)
{
size_t len_ = AVG_SYMB_PER_DIGIT_[ELocale::L_RU_RU] * digitsPartSize;
if (fractPart && digitsPartSize) len_ += RU_DELIM_LEN_ + RU_MAX_FREQ_FRACT_POSTFIX_LEN_;
return len_;
}
}
assert(false); // locale error
return size_t();
};```
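
The way such an estimate is meant to be used can be shown with a short sketch: compute a rough length, then call `std::string::reserve` once before appending. The names `estimateEnglishLength` AND `reserveBeforeAdding` are hypothetical, AND only the English branch of the heuristic is reproduced; the average values are passed in by the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: rough length of the English spelling of 'digitsCount' digits
std::size_t estimateEnglishLength(const std::size_t digitsCount, const bool fractPart,
                                  const std::size_t avgCharsPerDigit) {
  static const std::size_t AVG_DIGIT_NAME_LEN = 4U; // 40 / 10 ['zero' - 'nine']
  if (fractPart) return (AVG_DIGIT_NAME_LEN + 1U) * digitsCount; // +1 for the spacer
  return avgCharsPerDigit * digitsCount; // int. part is processed by triads
}

// Grows the capacity once, BEFORE appending, so the following '+=' calls rarely reallocate
void reserveBeforeAdding(std::string& str, const std::size_t digitsCount,
                         const bool fractPart, const std::size_t avgCharsPerDigit) {
  assert(avgCharsPerDigit); // SHOULD be NOT zero
  str.reserve(str.length() + estimateEnglishLength(digitsCount, fractPart, avgCharsPerDigit));
}
```

The estimate can be too big OR too small; `reserve` only changes the capacity, so a wrong guess costs some memory OR an extra reallocation, but never changes the resulting string.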

The next ones perform some language-specific actions.

`6) addFractionPrefix`

Used for a fractional part preprocessing.

For the English language it adds leading zeroes, which could otherwise be missed due to the format (scientific representation) of the data in the basic char. array. Does nothing for the Russian language.

C++
```auto addFractionPrefix = [&]() {
switch (localeSettings.locale) {
case ELocale::L_EN_US: case ELocale::L_EN_GB: // 'nought nought nought' for 1.0003
{
const char* postfix;
assert(str.size()); // NOT empty
str += delimiter;
str += getZeroOrderNumberStr(size_t(), leadingZeroIdx, postfix, localeSettings);
str += postfix;
}
return;
}
case ELocale::L_RU_RU: return; // NO specific prefix
}
assert(false); // locale error
};```

`7) addFractionEnding`

Used to do a fraction postprocessing.

For the Russian language it appends a specific ending (like "десятимиллионная") based on the order (of magnitude) of the fractional part (AND on some other params, like the last two digits). Does nothing for the English language.

C++
```size_t currDigit, prevDigit;
// 'order' is an order of the last digit of a fractional part + 1 (1 based idx.)
//  [1 for the first, 2 for the second etc]
auto addFractionEnding = [&](const size_t orderExt) {
if (folded) { // add postfix for the folded fraction
auto const ending = getFoldedFractionEnding(localeSettings);
if (*ending) { // if NOT empty
str += delimiter;
str += ending;
}
return;
}
switch (localeSettings.locale) {
case ELocale::L_EN_US: case ELocale::L_EN_GB: break; // NO specific ending currently
case ELocale::L_RU_RU: {
assert(orderExt); // SHOULD NOT be zero
const size_t subOrder = orderExt % size_t(3U);
switch (subOrder) { // zero suborder - empty prefix
case size_t(1U): // ДЕСЯТ ая(ых) | ДЕСЯТ И тысячная(ых) ДЕСЯТ И миллиардная(ых)
toAdd = orderExt < size_t(3U) ? "десят" : "десяти"; break;
case size_t(2U): // СОТ ая(ых) | СТО тысячная(ых) СТО миллиардная(ых)
toAdd = orderExt < size_t(3U) ? "сот" : "сто"; break;
}
str += delimiter;
}
//// Add root (if NOT yet) + part of the postfix (if needed)
if (orderExt > size_t(2U)) { // from 'тысяч н ая ых'
const char* temp;
str += getOrderStr(orderExt, size_t(), size_t(), temp, localeSettings);
str += "н"; // 'десят И тысяч Н ая ых', 'сто тысяч Н ая ых'
}
assert(prevDigit < size_t(10U) && currDigit < size_t(10U));
if (size_t(1U) == prevDigit) { // одиннадцать двенадцать девятнадцать сотЫХ десятитысячнЫХ
toAdd = "ых";
} else { // NOT 1U prev. digit
if (size_t(1U) == currDigit) {
toAdd = "ая"; // одна двадцать одна десятАЯ, тридцать одна стотысячнАЯ
} else toAdd = "ых"; // ноль десятых; двадцать две тридцать пять девяносто девять тысячнЫХ
}
}
break;
default: // locale NOT present
assert(false); // locale error
str += "<locale error [" MAKE_STR_(__LINE__) "]>";
}
};```
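
The composition of the Russian ending (prefix + root + "н" + case ending) is easier to follow on a standalone sketch. The function `ruFractionEnding` AND its small `ROOTS` table are hypothetical: the table is an illustrative subset AND not the module's `RU_TABLE`, AND the folded case is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: builds the Russian name of a fractional order
//  'orderExt' is a 1 based idx. of the last fract. digit (1: tenths, 2: hundredths etc)
std::string ruFractionEnding(const std::size_t orderExt, const std::size_t prevDigit,
                             const std::size_t currDigit) {
  // Roots for 10^3, 10^6, 10^9, 10^12 (illustrative subset)
  static const char* const ROOTS[] = {"", "тысяч", "миллион", "миллиард", "триллион"};
  assert(orderExt && orderExt / 3U < sizeof(ROOTS) / sizeof(*ROOTS));
  assert(prevDigit < 10U && currDigit < 10U);
  std::string str;
  switch (orderExt % 3U) { // zero suborder - empty prefix
    case 1U: str += orderExt < 3U ? "десят" : "десяти"; break;
    case 2U: str += orderExt < 3U ? "сот" : "сто"; break;
  }
  if (orderExt > 2U) { // from 'тысяч н ая ых'
    str += ROOTS[orderExt / 3U];
    str += "н";
  }
  // 'ая' ONLY for 1, 21, 31 etc (NOT for 11)
  str += (1U != prevDigit && 1U == currDigit) ? "ая" : "ых";
  return str;
}
```

For a fractional part of ten digits ending in 01, the sketch gives "десятимиллиардная", which matches the third test result in the tests section.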

`8) processDigitOfATriad`

This is one of the three main processing functions (along with `processDigitsPart` AND `processDigitsSubPart`). It is used to process individual digits of a subpart of size up to 3 (a triad), so `subOrder` is the digit index within the subpart, in the range [0, 2]: zero for 9 in 639, 2 for 6 in the same subpart. `order` is the actual order of magnitude of the current digit (3 for 8 in 208417).

C++
```// Also for 'and' in EN GB
// ONLY up to 3 digits
const size_t normalDigitsSubPartSize, const bool fractPart) {
char delim_;
switch (localeSettings.locale) { // choose delim.
case ELocale::L_EN_US: case ELocale::L_EN_GB: delim_ = '-'; break; // 'thirty-four'
case ELocale::L_RU_RU: default: delim_ = delimiter; break; // 'тридцать четыре'
}
str += delim_;
};
auto addDelim = [&](const char delim) {
if (ELocale::L_EN_GB == localeSettings.locale) {
// In AMERICAN English, many students are taught NOT to use the word "and"
//  anywhere in the whole part of a number
str += delim;
str += ENG_GB_VERBAL_DELIMITER;
}
}
str += delim;
};
assert(subOrder < size_t(3U) && prevDigit < size_t(10U) && currDigit < size_t(10U));
const char* infix, *postfix;
switch (subOrder) {
case size_t(): // ones ('three' / 'три') AND numbers like 'ten' / 'twelve'
if (size_t(1U) == prevDigit) { // 'ten', 'twelve' etc
if (!str.empty()) addDelim(delimiter); // if needed
str += getFirstOrderNumberStr(currDigit, prevDigit, infix, postfix, localeSettings);
str += infix, str += postfix;
} else if (currDigit || size_t(1U) == normalDigitsSubPartSize) { // prev. digit is NOT 1
//// Simple digits like 'one'
if (prevDigit) { // NOT zero
assert(prevDigit > size_t(1U));
} else if (!str.empty()) addDelim(delimiter); // prev. digit IS zero
str += getZeroOrderNumberStr(currDigit, order, postfix, localeSettings);
str += postfix;
}
break;

case size_t(1U): // tens ['twenty' / 'двадцать']
if (currDigit > size_t(1U)) { // numbers like ten / twelve would be processed later
if (!str.empty()) addDelim(delimiter); // if needed
str += getFirstOrderNumberStr(currDigit, size_t(), infix, postfix, localeSettings);
str += infix, str += postfix;
} // if 'currDigit' is '1U' - skip (would be processed later)
break;

case size_t(2U): // hundred(s?)
if (!currDigit) break; // zero = empty
if (!str.empty()) str += delimiter; // if needed
switch (localeSettings.locale) {
case ELocale::L_EN_US: case ELocale::L_EN_GB: // 'three hundred'
str += getZeroOrderNumberStr(currDigit, order, postfix, localeSettings);
str += postfix;
str += delimiter;
{
const char* postfix_; // NO postfix expected, just a placeholder var.
str += getOrderStr(size_t(2U), size_t(0U), currDigit, postfix_, localeSettings);
assert(postfix_ && !*postfix_);
}
break;
case ELocale::L_RU_RU: // 'триста'
str += getSecondOrderNumberStr(currDigit, infix, postfix, localeSettings);
str += infix, str += postfix;
break;
}
break;
} // 'switch (subOrder)' END
};```
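
As a compact illustration of what the three `subOrder` branches produce together, here is a sketch that spells a whole English triad at once. It is a simplification AND not the module's code: `spellTriad` is a hypothetical name, the digit names come from local tables instead of the `get...NumberStr` lambdas, AND the British "and" is controlled by a plain flag.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: spells a triad [0, 999]; an empty str. for zero
//  'useAnd' adds the British 'and' between hundreds AND the rest
std::string spellTriad(const std::size_t triad, const bool useAnd) {
  static const char* const ONES[] = {"", "one", "two", "three", "four", "five", "six",
    "seven", "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
    "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"};
  static const char* const TENS[] = {"", "", "twenty", "thirty", "forty", "fifty",
    "sixty", "seventy", "eighty", "ninety"};
  assert(triad < 1000U);
  const std::size_t hundreds = triad / 100U, rest = triad % 100U;
  std::string str;
  if (hundreds) { str += ONES[hundreds]; str += " hundred"; } // 'three hundred'
  if (rest) {
    if (hundreds) str += useAnd ? " and " : " ";
    if (rest < 20U) str += ONES[rest]; // ones AND 'ten' - 'nineteen'
    else {
      str += TENS[rest / 10U];
      if (rest % 10U) { str += '-'; str += ONES[rest % 10U]; } // 'thirty-four'
    }
  }
  return str;
}
```

So 437 is spelled "four hundred thirty-seven" without the flag AND "four hundred and thirty-seven" with it.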

Tests

There are over 4k lines of tests (over 380 test cases) in the `ConvertionUtilsTests` module (see "TESTS" folder).

C++
```...

#include <iostream>
#include <string>

int main() {
std::string str;
ConvertionUtils::LocaleSettings localeSettings;
auto errMsg = "";
std::cout.precision(LDBL_DIG);

auto num = 6437268689.4272L;
localeSettings.locale = ConvertionUtils::ELocale::L_EN_US;
ConvertionUtils::numToNumFormatStr(num, str, localeSettings, &errMsg);
std::cout << num << " =>\n " << str << std::endl << std::endl;

num = 1200.25672567L;
str.clear();
localeSettings.locale = ConvertionUtils::ELocale::L_EN_GB;
localeSettings.foldFraction = true;
localeSettings.verySpecific = true;
ConvertionUtils::numToNumFormatStr(num, str, localeSettings, &errMsg);
std::cout << num << " =>\n " << str << std::endl << std::endl;

num = 1.0000300501L;
str.clear();
localeSettings.locale = ConvertionUtils::ELocale::L_RU_RU;
ConvertionUtils::numToNumFormatStr(num, str, localeSettings, &errMsg);
std::cout << num << " =>\n " << str << std::endl << std::endl;

num = 9432654671318.0e45L;
str.clear();
localeSettings.shortFormat = true;
localeSettings.locale = ConvertionUtils::ELocale::L_RU_RU;
ConvertionUtils::numToNumFormatStr(num, str, localeSettings, &errMsg);
std::cout << num << " =>\n " << str;

return 0;
}```

Result:

```6437268689.4272 =>
six billion four hundred thirty-seven million two hundred sixty-eight thousand six hundred eighty-nine point four two seven two

1200.25672567 =>
twelve hundred point two five six seven repeating

1.0000300501 =>
одна целая триста тысяч пятьсот одна десятимиллиардная

9.432654671318e+57 =>
девять октодециллионов четыреста тридцать два септдециллиона шестьсот пятьдесят четыре седециллиона шестьсот семьдесят один квиндециллион триста восемнадцать кваттуордециллионов```

Points of Interest

The developed strategy allows the module to be extended to support other languages, like `spanish`, for example: 0.333333333333 = "cero coma treinta y tres periodico".

The class uses the FuncUtils, MathUtils, MacroUtils AND MemUtils modules.

This module [`ConvertionUtils`] is just a small part of a library that uses C++11 features AND that I am currently working on; I decided to make it `public` property.

If you see ANY errors in the processing, please notify me here in the comments AND/OR on `GitHub`.

History

Written By
Software Developer (Senior) https://www.simbirsoft.com
Russian Federation
C/C++ developer (MS VS/WinAPI)
