I am currently working on a translation project, one part of which is converting a certain amount of RTF data. I extract the text from the RTF into an array that I send through an AWS translator. The target language is Bengali.
154 = Customer Code
LID = the Language Identifier for AWS
Text2Translate = the Rtf.Text split on the line feed.

foreach ( DataRow CTDV in calcTranslationsDV.Rows )
{ // Look up the AWS language identifier for this row.
  string LID = getLID(CTDV["Det_User_Field_29"].ToString());
  foreach ( DataRow FT in FText.ToTable().Rows )
  { // Load the stored RTF so the plain text can be extracted line by line.
    RichTextBox FrText = new RichTextBox() { Rtf = FT["Free_Text"].ToString() };
    string[] Text2Translate = StripTabsandOtherText(FrText.Text.Split('\n'));
    XmlDocument Translation = AWSAPIWrapper.getTranslation(154, LID, Text2Translate);

    // Swap each source phrase in the raw RTF for its translation.
    foreach ( XmlElement xlate in Translation.DocumentElement )
    { string myString = FrText.Rtf;
      FrText.Rtf = myString.Replace(xlate["TextToTranslate"].InnerText, xlate["TranslatedText"].InnerText);
    }
  }
}


This code works well and returns all the translations in the Translation XmlDocument. The only problem is that myString.Replace() places ????????????? in the Rtf for the resulting string value.

xlate["TranslatedText"].InnerText Contains মান অতিক্রম

What I have tried:

Before Replace FrText.Rtf Contains this top row of the word table in the Rtf field:

{\rtf1\ansi\ansicpg1252\deff0\deflang1033\deflangfe1033{\fonttbl{\f0\fnil\fcharset0 Calibri;}}
\viewkind4\uc1\trowd\trgaph108\trleft-180\trbrdrl\brdrs\brdrw10 \clbrdrb\brdrw15\brdrs\clbrdrr\brdrw15\brdrs \cellx354\clbrdrl\brdrw15\brdrs\clbrdrb\brdrw15\brdrs \cellx5609\pard\intbl\sl252\slmult1\f0\fs17 4\cell EXCEEDING STANDARDS\cell\row

After Replace FrText.Rtf Contains this top row of the word table in the Rtf field:
{\rtf1\ansi\ansicpg1252\deff0\deflang1033\deflangfe1033{\fonttbl{\f0\fnil\fcharset0 Calibri;}}
\viewkind4\uc1\trowd\trgaph108\trleft-180\trbrdrl\brdrs\brdrw10 \clbrdrb\brdrw15\brdrs\clbrdrr\brdrw15\brdrs \cellx354\clbrdrl\brdrw15\brdrs\clbrdrb\brdrw15\brdrs \cellx5609\pard\intbl\sl252\slmult1\f0\fs17 4\cell ??? ???????\cell\row

Note: For brevity I did not include the entire Rtf or the entire first table row. I included enough to show the initial state and the resulting state after the Replace.

Further, the Replace statement works in the debug window but not in code. So, I attempted to add some font control on the Rtf object being used to no avail:
FrText.Font = new Font("Calibri", (float)8, FontStyle.Regular, (GraphicsUnit)3);
Updated 2-Jan-21 13:45pm

1 solution

Quote:
The only problem is that myString.Replace() places ????????????? in the Rtf for the resulting string value.
I would want to check if you are using the proper encoding for the characters "throughout" the application.

Question marks usually appear when characters cannot be mapped to a glyph or to the target encoding: the data may have been lost in a conversion, or it may be intact but unprintable because the font family has no glyphs for those Unicode code points. I would write the data to the console or to a file in a Unicode encoding (UTF-8, for example) and check whether it comes out correctly there.
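A minimal sketch of that check follows. The class and method names (UnicodeCheck, DumpCodeUnits, WriteUtf8) are made up for illustration and are not part of the original code. The idea is to dump the UTF-16 code units of the string at each stage: if the translated text shows values in the Bengali block (U+0980 to U+09FF) the data is intact, while U+003F means it has already been replaced by literal question marks.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical helper, not part of the original project.
static class UnicodeCheck
{
    // Formats every UTF-16 code unit of the string as "U+XXXX".
    public static string DumpCodeUnits(string text)
    {
        return string.Join(" ", text.Select(c => "U+" + ((int)c).ToString("X4")));
    }

    // Writes the text as UTF-8 (with a byte order mark) so it can be
    // inspected in an editor that understands Unicode.
    public static void WriteUtf8(string path, string text)
    {
        File.WriteAllText(path, text, new UTF8Encoding(true));
    }
}
```

Calling UnicodeCheck.DumpCodeUnits on xlate["TranslatedText"].InnerText before the Replace, and on the relevant part of FrText.Rtf after it, should show at which step the code points change. The file is likely to be more reliable than the console, since the console's own encoding and font can also produce question marks.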

Try giving this article of mine a look, where I explore reading and writing Unicode and how to work around some basic problems in .NET (C#): Reading and writing Unicode data in .NET



Unless you translate question marks, which result in question marks being returned.
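One direction worth trying, offered as an assumption and not a confirmed diagnosis: an RTF stream is 7-bit ASCII, and the header in the question declares \ansicpg1252, so raw Bengali characters placed directly into the Rtf string may be converted to ? when the control reloads it. RTF represents such characters as \uN escapes, where N is the UTF-16 code unit as a signed 16-bit decimal, followed by one fallback character when \uc1 is in effect (as it is in the posted header). The sketch below uses a hypothetical helper, RtfUnicode.Escape, that is not part of the original code.

```csharp
using System.Text;

// Hypothetical helper, not part of the original project.
static class RtfUnicode
{
    // Escapes plain text so it can be spliced into the body of an RTF document.
    public static string Escape(string text)
    {
        var sb = new StringBuilder(text.Length * 4);
        foreach (char c in text)
        {
            if (c == '\\' || c == '{' || c == '}')
            {
                // RTF syntax characters must be backslash-escaped.
                sb.Append('\\').Append(c);
            }
            else if (c < 0x80)
            {
                // Plain ASCII passes through unchanged.
                sb.Append(c);
            }
            else
            {
                // \uN takes a signed 16-bit decimal value; the trailing '?'
                // is the single fallback character expected under \uc1.
                sb.Append("\\u").Append(unchecked((short)c)).Append('?');
            }
        }
        return sb.ToString();
    }
}
```

With this in place, the replacement in the question would become myString.Replace(xlate["TextToTranslate"].InnerText, RtfUnicode.Escape(xlate["TranslatedText"].InnerText)). Whether the Bengali text then renders also depends on the font and on the language settings in the RTF, which this sketch does not address.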
 
 
Comments
Member 13735228 2-Jan-21 20:26pm    
Interesting article. I understand Unicode to a degree, but I don't always understand where best to apply it so that it affects the outcome. Is this a property on the RichTextBox that I failed to set? Or is it more of a "culture" issue?
Afzaal Ahmad Zeeshan 2-Jan-21 20:32pm    
I am not an expert on the RichTextBox type, but I think that if you debug the process you will see where the characters are lost.
Member 13735228 2-Jan-21 20:45pm    
Ok thanks! This does give me a little direction of where to look! Appreciate the help!
Member 13735228 2-Jan-21 20:46pm    
Further, there's the whole question of using Regex.Replace vs. String.Replace.
Member 13735228 3-Jan-21 17:23pm    
The problem is actually the language identifier in the RTF. I'm having to take a different route for now due to time constraints on this project. Later, in the springtime, I will probably get back to this.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


