Interesting problem...
This looks like a word-based document "compression" process.
But I'm a little confused as to why you would replace every entry in Vocabulary with its corresponding number instead of keeping the word strings.
Vocabulary is the only record of the mapping between the words and the numbers. Without it, you will have
no way to reverse the mapping and recreate the original string.
So, essentially, the output can be almost arbitrary since there's no way to reconstruct anything useful!
If it is really required to replace the vocabulary values then:
Vocabulary = Enumerable.Range(0, Vocabulary.Count).Select(n => n.ToString()).ToList();
For doing the replacements in document, I'd probably use a Dictionary<string, int> to hold the word-to-number mapping instead of needing to scan the Vocabulary List at every word.
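A minimal sketch of that dictionary approach follows. The question's actual tokenization isn't shown here, so this assumes that document is a single string, that words are separated by spaces, that matching is case-sensitive, and that words missing from Vocabulary are left untouched. WordEncoder and Encode are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordEncoder
{
    // Hypothetical helper: replaces each space-separated word in `document`
    // with its index in `vocabulary`. Unknown words are kept as-is (an assumption).
    public static string Encode(string document, List<string> vocabulary)
    {
        // Build the word -> index lookup once, so each word costs a single
        // hash lookup instead of a linear scan of the list.
        var index = new Dictionary<string, int>(vocabulary.Count);
        for (int i = 0; i < vocabulary.Count; i++)
        {
            // Keep the first index if the vocabulary contains duplicates.
            if (!index.ContainsKey(vocabulary[i]))
            {
                index[vocabulary[i]] = i;
            }
        }

        string[] words = document.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", words.Select(w =>
            index.TryGetValue(w, out int n) ? n.ToString() : w));
    }
}
```

For example, with the vocabulary { "the", "cat", "sat", "on", "mat" }, encoding "the cat sat on the mat" yields "0 1 2 3 0 4". Building the dictionary is one pass over Vocabulary, and the encoding itself is one pass over the words of document.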
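Another option would be to iterate through the Vocabulary list and, for each word, apply a Regex.Replace across the whole document to substitute the corresponding number. This will almost certainly misbehave if document can contain numbers that are the same as any of the word replacement values, because a later pass can no longer tell an original number from one that an earlier pass inserted. Also, every vocabulary word triggers a full scan of document, so the cost is O(V·N) for V vocabulary words and a document of length N, which approaches O(N²) when the vocabulary grows with the document.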
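Here is a rough sketch of the Regex.Replace variant, mainly to show how it goes wrong. It assumes document is a single string and that every vocabulary entry consists of word characters, so that \b word boundaries behave as expected. RegexEncoder and Encode are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexEncoder
{
    // Hypothetical helper: one full Regex.Replace pass over the document
    // per vocabulary word, replacing whole-word matches with the word's index.
    public static string Encode(string document, List<string> vocabulary)
    {
        for (int i = 0; i < vocabulary.Count; i++)
        {
            // \b limits the match to whole words; Regex.Escape protects
            // against regex metacharacters inside a vocabulary entry.
            string pattern = @"\b" + Regex.Escape(vocabulary[i]) + @"\b";
            document = Regex.Replace(document, pattern, i.ToString());
        }
        return document;
    }
}
```

With the vocabulary { "the", "cat", "sat" }, encoding "the cat sat" gives "0 1 2" as intended. With the vocabulary { "cat", "0" }, however, encoding "cat 0" gives "1 1" rather than "0 1": the first pass turns "cat" into "0", and the second pass then rewrites both zeros. The dictionary approach avoids this because each word of the original document is looked up exactly once.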