Hello.

I am trying to paste some text from the clipboard into a specific field of an application. The application runs on Windows 7, and the language for non-Unicode programs is set to a Cyrillic language.

I have copied Cyrillic characters to the clipboard, something like "Петър". When I paste them, the text field displays this: "??????"

Then I tested typing some characters into that field from the keyboard, copying them, and pasting them back via mouse or keyboard, and that works.

Can it be that the application was made with ASCII compatibility only?

Comments
Sergey Alexandrovich Kryukov 5-Feb-16 14:01pm    
Language? Platform?
—SA
Zhivko Kabaivanov 8-Feb-16 4:25am    
The language is C#

1 solution

This is exactly what should be expected with non-Unicode text. Very likely the application uses one of the obsolete Cyrillic-specific encodings. It could be Windows-1251 or KOI8-R:
Windows-1251 (Wikipedia, the free encyclopedia),
KOI8-R (Wikipedia, the free encyclopedia).

Unfortunately, many more encodings were used for Cyrillic in the past.

One simple way to find out is to read a file in that obsolete encoding and write a new file in a Unicode encoding (preferably UTF-8). You can open the file with a StreamReader using this constructor: StreamReader Constructor (String, Encoding) (System.IO);
see also: Encoding Class (System.Text).
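The read-then-rewrite step could look roughly like the sketch below. The class and method names are hypothetical, and code page 1251 is only a guess at the source encoding; substitute whichever code page you are testing.

```csharp
using System.IO;
using System.Text;

static class LegacyTextConverter
{
    // Reads a text file stored in a legacy code page and rewrites it as UTF-8.
    // The default of 1251 (Windows-1251) is an assumption, not a detected value.
    public static void ConvertToUtf8(string inputPath, string outputPath, int codePage = 1251)
    {
        // Needed on .NET Core / .NET 5+, where code-page encodings are not
        // registered by default. On the classic .NET Framework they are
        // built in and this line can be removed.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding legacy = Encoding.GetEncoding(codePage);

        using (var reader = new StreamReader(inputPath, legacy))
        using (var writer = new StreamWriter(outputPath, false, new UTF8Encoding(false)))
        {
            // Decoding happens in the reader, encoding in the writer.
            writer.Write(reader.ReadToEnd());
        }
    }
}
```

If the output file opens correctly in a UTF-8 editor, the guessed code page was right; if it shows the wrong letters, try another one.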

[EDIT]

After I already wrote my answer with links, I remembered that you did not indicate your platform and language. I provided the answer for .NET (or any other CLR implementation). If you really need something else, please clarify.

[END EDIT]

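You can make a guess and obtain one of the supported encodings by its "code page" number: Encoding Constructor (Int32) (System.Text). Note that this constructor is protected, so in application code the static method Encoding.GetEncoding(Int32) is what you actually call.

You can refer to the table of code pages to find what you need: Appendix H Code Pages.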
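As a rough sketch of guessing by code page, the snippet below decodes the same bytes with several candidates via Encoding.GetEncoding(Int32). The sample bytes and the candidate list (1251 for Windows-1251, 20866 for KOI8-R, 866 for the DOS Cyrillic code page) are assumptions chosen for illustration; the helper names are hypothetical.

```csharp
using System;
using System.Text;

static class CodePageGuess
{
    // Decodes raw bytes using the encoding registered for the given code page.
    public static string Decode(byte[] data, int codePage)
    {
        // Needed on .NET Core / .NET 5+; redundant on the classic .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(codePage).GetString(data);
    }

    static void Main()
    {
        // "Петър" as it would be stored in Windows-1251.
        byte[] sample = { 0xCF, 0xE5, 0xF2, 0xFA, 0xF0 };

        // Print every candidate and pick the one that reads as real text.
        foreach (int codePage in new[] { 1251, 20866, 866 })
        {
            Console.WriteLine($"{codePage}: {Decode(sample, codePage)}");
        }
    }
}
```

Only one of the candidates should produce a readable word; the others produce valid but meaningless Cyrillic or box-drawing characters, which is what makes visual inspection a workable test.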

But how do you quickly find out which encoding you have? If I cannot see it immediately, I use the following trick: rename the file to *.html and open it in a browser. Modern browsers have a "text encoding" feature with auto-detection by language. You can quickly try out different encodings, and auto-detect is very likely to show the correct result right away.

In the worst case, you find out what the encoding is but it is not supported by default. Then you can use a reference on this encoding and write your own transcoding from its codes to Unicode code points. All you need to do is create a transcoding table and use it.
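A table-driven transcoder can be very small. The sketch below is only an illustration: the names are hypothetical, and the table is filled with Windows-1251 values as a stand-in (bytes 0xC0..0xFF map to U+0410..U+044F, plus Ё and ё), since for a truly unsupported encoding you would fill it from that encoding's reference instead.

```csharp
using System.Text;

static class TableDecoder
{
    // Builds a 256-entry byte-to-char table.
    // Bytes without a known mapping decode to U+FFFD (replacement character).
    public static char[] BuildTable()
    {
        var table = new char[256];
        for (int b = 0; b < 256; b++)
        {
            table[b] = b < 0x80 ? (char)b : '\uFFFD'; // ASCII half maps to itself
        }

        // Stand-in entries taken from Windows-1251: the basic Cyrillic
        // alphabet occupies one contiguous block in both code spaces.
        for (int b = 0xC0; b <= 0xFF; b++)
        {
            table[b] = (char)(0x0410 + (b - 0xC0));
        }
        table[0xA8] = '\u0401'; // Ё
        table[0xB8] = '\u0451'; // ё
        return table;
    }

    // Translates each input byte through the table.
    public static string Decode(byte[] data, char[] table)
    {
        var result = new StringBuilder(data.Length);
        foreach (byte b in data)
        {
            result.Append(table[b]);
        }
        return result.ToString();
    }
}
```

For example, TableDecoder.Decode(new byte[] { 0xCF, 0xE5, 0xF2, 0xFA, 0xF0 }, TableDecoder.BuildTable()) yields "Петър". A single-byte table like this only works for single-byte encodings.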

And of course, you should no longer write non-Unicode software, even if you use only American English, except in some rare cases. Stay out of trouble.

—SA
 
Comments
Sascha Lefèvre 5-Feb-16 14:22pm    
+5. UDE might be worth mentioning: https://github.com/errepi/ude
Sergey Alexandrovich Kryukov 5-Feb-16 14:23pm    
Thank you, Sascha.
This link is worth a separate answer.
—SA
Zhivko Kabaivanov 8-Feb-16 4:35am    
Okay, let me explain a little more. From my C# desktop application, I first put the Cyrillic value on the clipboard with Clipboard.SetText(), and then, using user32.dll, I paste it by simulating a Ctrl+V key press. When it is pasted this way, that specific application displays my Cyrillic string value as ????.
The question is: why does that specific application behave this way?
Sergey Alexandrovich Kryukov 8-Feb-16 10:21am    
This is the correct behavior of Unicode software: every time you encode a character to an encoding that does not support its code point, ASCII '?' is produced on output. Naturally, everything ends up as Unicode again, but too late: the information on what it was originally is already lost.
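The loss is easy to reproduce in .NET. The minimal sketch below (hypothetical names) uses Encoding.ASCII as a stand-in for "an encoding that does not support the code point"; which encoding the target application really uses is not known.

```csharp
using System;
using System.Text;

static class LossyRoundTrip
{
    // Encodes the text to ASCII bytes and decodes it back to a string.
    public static string ThroughAscii(string text)
    {
        // Characters outside ASCII are replaced with byte 0x3F ('?') here.
        byte[] bytes = Encoding.ASCII.GetBytes(text);

        // Decoding back cannot recover them: only '?' bytes remain.
        return Encoding.ASCII.GetString(bytes);
    }

    static void Main()
    {
        Console.WriteLine(ThroughAscii("Петър")); // prints "?????"
    }
}
```

The string that comes back is perfectly valid Unicode, which is why the receiving side cannot tell that anything went wrong.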

If you have all the source code, you can easily find this point with the debugger.

This is how it looks with user32 (or any other Windows API): functions that take text usually come in two versions, "ANSI" and wide (Unicode), for example SetWindowTextA and SetWindowTextW. To avoid problems, you should always use the 'W' version. The unsuffixed name can be mapped to either one, that is, your code has only SetWindowText, but actually SetWindowTextW is called; it depends on language and technology.
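In C#, one way to make sure the wide version is called is to say so explicitly in the P/Invoke declaration. The sketch below uses the real user32 function SetWindowTextW purely as an example of such a pair; the wrapper class names are hypothetical, and it is not implied that this particular call solves the paste problem in another application. It is Windows-only.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

static class NativeText
{
    // EntryPoint and ExactSpelling pin the call to the wide entry point;
    // CharSet.Unicode makes the marshaller pass the string as UTF-16.
    [DllImport("user32.dll", EntryPoint = "SetWindowTextW",
               CharSet = CharSet.Unicode, ExactSpelling = true, SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    public static extern bool SetWindowText(IntPtr hWnd, string text);
}

static class Demo
{
    static void Main()
    {
        // Uses this process's own main window, if it has one.
        IntPtr hwnd = Process.GetCurrentProcess().MainWindowHandle;
        if (hwnd != IntPtr.Zero)
        {
            NativeText.SetWindowText(hwnd, "Петър");
        }
    }
}
```

Without CharSet.Unicode, the default marshalling for DllImport is ANSI, which is exactly the kind of silent conversion that turns unsupported characters into '?'.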

—SA

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


