I have a piece of RTF data that is being sent to my application shown below:

XML
{\rtf1\sstecf22000\ansi\deflang2057\ftnbj\uc1\deff0
{\fonttbl{\f0 \fnil \fcharset0 Microsoft Sans Serif;}{\f1 \fswiss Tahoma;}}
{\colortbl ;\red0\green0\blue0 ;\red255\green255\blue255 ;}
{\stylesheet{\f1\fs18 Normal;}{\cs1 Default Paragraph Font;}}
{\*\revtbl{Unknown;}{JOE BLOGS;}}
{\info{\doccomm TEST1 TEST1}}\paperw12240\paperh15840\margl1800\margr1800\margt1440\margb1440\headery720\footery720\nogrowautofit\deftab720\formshade\fet4\aendnotes\aftnnrlc\pgbrdrhead\pgbrdrfoot\revisions
\sectd\pgwsxn12240\pghsxn15840\guttersxn0\marglsxn1800\margrsxn1800\margtsxn1440\margbsxn1440\headery720\footery720\sbkpage\pgncont\pgndec
\plain\plain\f1\fs18\ql\plain\f1\fs18\plain\f0\fs17\lang2057\hich\f0\dbch\f0\loch\f0\fs17
\deleted\revauthdel1\revdttmdel1196190643 \{\\Rtf1\\Ansi\\Deff0\{\\Fonttbl\{\\F0\\Fnil\\Fcharset0 Microsoft Sans Serif;\}\}\par \\Viewkind4\\Uc1\\Pard\\Lang2057\\F0\\Fs17 Cup....\\Par\par \}\par\plain\f0\fs17\lang2057\hich\f0\dbch\f0\loch\f0\fs17
\revised\revauth1\revdttm1196190643 hello world \plain\f1\fs18\par
}


When I convert it to plain text, some RTF data is still displayed:
XML
\{\\Rtf1\\Ansi\\Deff0\{\\Fonttbl\{\\F0\\Fnil\\Fcharset0 Microsoft Sans Serif;\}\}\par \\Viewkind4\\Uc1\\Pard\\Lang2057\\F0\\Fs17 Cup....\\Par\par \}


So how would I detect and remove this unnecessary RTF data?
XML
\{\\Rtf1\\Ansi\\Deff0\{\\Fonttbl\{\\F0\\Fnil\\Fcharset0 Microsoft Sans Serif;\}\}\par \\Viewkind4\\Uc1\\Pard\\Lang2057\\F0\\Fs17 Cup....\\Par\par \}


I tried to use a regex, but it matches everything in the RTF block that I have,
e.g.
XML
({\\)(.+?)(})|(\\)(.+?)(\b)|}$


I want to remove only the unnecessary RTF data.

The entire RTF block of data is the one shown at the top of this question.

What I have tried:

I tried the following code to see if I can remove the unnecessary RTF data, but I think specifying the string like this is wrong.

C#
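string result = rtfString;

// The literal must match the incoming data character for character.
// In the sample above the embedded text is escaped (\{ and \\) and uses
// Viewkind4, so this literal (unescaped, Viewkind3, trailing "text 3")
// will not be found in it.
const string toLookFor = "{\\Rtf1\\Ansi\\Deff0{\\Fonttbl{\\F0\\Fnil\\Fcharset0 Microsoft Sans Serif;}}\n\\Viewkind3\\Uc1\\Pard\\Lang2057\\F0\\Fs17 Cup....\\Par\n}\ntext 3";

// C# method names are case-sensitive: Contains and Replace, not contains and replace.
if (IsRichText(rtfString) && rtfString.Contains(toLookFor))
{
    // Strings are immutable, so store the replaced copy in result,
    // which is the value that gets returned.
    result = rtfString.Replace(toLookFor, "");
}

return result;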
Updated 9-Jan-17 23:27pm

1 solution

On Windows you can use a rich edit control to convert RTF to plain text: create it in memory without displaying it, set its RTF content, and read back the plain text. In .NET this is the RichTextBox class (its Rtf and Text properties).

C# example: How to: Convert RTF to Plain Text (C# Programming Guide).

On Linux you can use the unrtf(1) tool (see its Linux man page) or check its source code.
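
In the RTF from the question, the leftover text sits in a run that starts with the \deleted control word (a tracked deletion) and ends at the next \plain. One option, before converting to plain text, is to strip such runs first. The following is only a minimal sketch under that assumption: the class and method names (RtfRevisionCleaner, StripDeletedRuns) are made up for illustration, it relies on every deleted run being followed by \plain as in the sample, and a regex is not a full RTF parser (for example, it does not handle a control word preceded by an escaped backslash).

```csharp
using System;
using System.Text.RegularExpressions;

public static class RtfRevisionCleaner
{
    // Matches from an unescaped \deleted control word up to, but not
    // including, the next unescaped \plain control word.
    // (?<!\\) skips escaped text such as \\deleted, which is literal text.
    // Singleline lets '.' span line breaks inside the run.
    private static readonly Regex DeletedRun = new Regex(
        @"(?<!\\)\\deleted\b.*?(?=(?<!\\)\\plain\b)",
        RegexOptions.Singleline);

    // Hypothetical helper: returns the RTF with tracked-deletion runs removed.
    // Assumption: each deleted run is terminated by \plain, as in the sample;
    // a run without a following \plain is left untouched.
    public static string StripDeletedRuns(string rtf)
    {
        if (rtf == null) throw new ArgumentNullException(nameof(rtf));
        return DeletedRun.Replace(rtf, string.Empty);
    }
}
```

The cleaned string can then be passed to whichever RTF-to-text conversion you use, e.g. `string cleaned = RtfRevisionCleaner.StripDeletedRuns(rtfString);`.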
 
 
Comments
Eagle32 13-Jan-17 8:28am    
Thanks for sharing that. My application does not have a UI, so I will refer to the link you shared.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


