This is what typically happens to non-Unicode text: it is very likely one of the obsolete Cyrillic-specific encodings, such as Windows-1251 or KOI8-R (see the Wikipedia articles "Windows-1251" and "KOI8-R").
Unfortunately, there are many more encodings that people used for Cyrillic in the past.
One simple way to fix it is to read the file in that obsolete encoding and write a new file in UTF (preferably UTF-8). You can open the file with a StreamReader using the constructor StreamReader(String, Encoding) (System.IO); see also the Encoding class (System.Text).
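The whole transcoding can be sketched in a few lines. This is a minimal example, not production code; the file names are hypothetical, and note that on .NET Core/.NET 5+ the legacy code pages require registering `CodePagesEncodingProvider` (from the System.Text.Encoding.CodePages package), while on .NET Framework they are available out of the box:

```csharp
using System.IO;
using System.Text;

class Transcode
{
    public static void Main()
    {
        // Required on .NET Core/.NET 5+ for legacy code pages such as 1251;
        // a no-op concern on .NET Framework, where they are built in.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Hypothetical file names -- substitute your own.
        Encoding source = Encoding.GetEncoding("windows-1251");
        using (var reader = new StreamReader("legacy.txt", source))
        using (var writer = new StreamWriter("utf8.txt", false, Encoding.UTF8))
        {
            writer.Write(reader.ReadToEnd());
        }
    }
}
```

After this runs, `utf8.txt` contains the same text as `legacy.txt`, but stored as UTF-8.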
[EDIT]
After I already wrote my answer with links, I remembered that you did not indicate your platform and language. I provided the answer for .NET (or any other CLR implementation). If you really need something else, please clarify.
[END EDIT]
You can make a guess and construct one of the supported encodings from its "code page"; see the Encoding constructor taking an Int32 code page (System.Text). You can refer to the table of code pages to find what you need: Appendix H, "Code Pages".
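In application code you normally obtain such an encoding through the static `Encoding.GetEncoding(int codePage)` rather than the protected constructor. A quick way to test a guess is to decode the same sample bytes under several candidate Cyrillic code pages and see which one produces readable text. A sketch (the sample bytes here are "Привет" in Windows-1251; the candidate list is just a common selection, not exhaustive):

```csharp
using System;
using System.Text;

class GuessEncoding
{
    public static void Main()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Bytes from the mystery file (here: "Привет" encoded in Windows-1251).
        byte[] sample = { 0xCF, 0xF0, 0xE8, 0xE2, 0xE5, 0xF2 };

        // Common Cyrillic code pages: Windows-1251, KOI8-R, DOS-866, ISO-8859-5.
        int[] candidates = { 1251, 20866, 866, 28595 };
        foreach (int codePage in candidates)
        {
            Encoding e = Encoding.GetEncoding(codePage);
            Console.WriteLine($"{codePage} ({e.WebName}): {e.GetString(sample)}");
        }
    }
}
```

Only one of the decodings will look like real Russian; the rest come out as the characteristic garbage you started with.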
But how do you quickly find out which encoding you have? If I cannot see it immediately, I use the following trick: rename the file to *.html and open it in a browser. Modern browsers have a "text encoding" feature with auto-detect by language. You can quickly try out different encodings, and auto-detect is very likely to show the correct result right away.
In the worst case, you find out what the encoding is, but it is not supported by default. Then you can use the reference documentation on that encoding and write your own transcoding from its codes to Unicode code points. All you need to do is create a transcoding table and use it.
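For a single-byte encoding, the transcoding table is just an array of 256 characters indexed by the byte value. A minimal sketch, assuming a hypothetical unsupported encoding: the two mapping entries shown are invented for illustration; a real table would copy all 256 entries from the encoding's reference documentation:

```csharp
using System.Text;

class CustomTranscoder
{
    // Transcoding table for a hypothetical single-byte encoding.
    static readonly char[] Table = BuildTable();

    static char[] BuildTable()
    {
        var table = new char[256];
        for (int i = 0; i < 128; i++)
            table[i] = (char)i; // ASCII range usually passes through unchanged

        // Upper half comes from the encoding's documented mapping.
        // These two entries are made up for this example:
        table[0xC1] = 'А'; // 0xC1 -> U+0410 CYRILLIC CAPITAL LETTER A
        table[0xC2] = 'Б'; // 0xC2 -> U+0411 CYRILLIC CAPITAL LETTER BE
        // ... remaining 126 entries from the reference ...
        return table;
    }

    public static string Decode(byte[] bytes)
    {
        var sb = new StringBuilder(bytes.Length);
        foreach (byte b in bytes)
            sb.Append(Table[b]);
        return sb.ToString();
    }
}
```

Once decoded to a .NET string (which is UTF-16 internally), you can write it out in any Unicode encoding with a StreamWriter.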
And of course, you should not write non-Unicode software any longer, even if you only use American English, except in some rare cases. Stay out of trouble.
—SA