I am trying to read integration symbols and some other symbols, such as ∫ and ∑, from a .doc file using C#. The symbols do not come through correctly in my C# variable. I am developing a question paper generator for chemistry. What should I do? I will also have to save these values into a SQL Server database. The code is below:

public void ReadMsWord()
{
   // variable to store file path
   string filePath = null;
   // open dialog box to select file 
   OpenFileDialog file = new OpenFileDialog();
   // dialog box title
   file.Title = "Word File";
   // set initial directory of computer system
   file.InitialDirectory = @"C:\Users\shri\Desktop\EXTRA\question\TnA\demo_swarit.doc";
   // set restore directory 
   file.RestoreDirectory = true;

   // execute if block when dialog result box click ok button
   if (file.ShowDialog() == DialogResult.OK)
   {
       // store selected file path
       filePath = file.FileName.ToString();
   }

   try
   {
      // create word application
     Microsoft.Office.Interop.Word.Application word = new Microsoft.Office.Interop.Word.ApplicationClass();
     // create object of missing value
     object miss = System.Reflection.Missing.Value;
     // create object of selected file path 
     object path = filePath;
     // set file path mode 
     object readOnly = false;
     // open document                 
     Microsoft.Office.Interop.Word.Document docs = word.Documents.Open(ref path, ref miss, ref readOnly, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss);

     // select whole data from active window document
     docs.ActiveWindow.Selection.WholeStory();

     // hand the data over to the clipboard
     docs.ActiveWindow.Selection.Copy();

     // clipboard create reference of idataobject interface which transfer the data
     IDataObject data = Clipboard.GetDataObject();

     //set data into richtextbox control in text format
     richTextBox1.Text = data.GetData(DataFormats.Text).ToString();

     // read bitmap image from clipboard with the help of the IDataObject interface
     Image img = (Image)data.GetData(DataFormats.Bitmap);

     // close the document 
     docs.Close(ref miss, ref miss, ref miss);

    }
    catch (Exception ex) { MessageBox.Show(ex.ToString()); }
}
Updated 13-Dec-12 7:07am
Comments
Casey Sheridan 12-Dec-12 20:52pm    
Can you show the code you're using to read the .doc and what is being returned when you run across a subscript?

Hi,

You need to decode your input with the right encoding to read the characters correctly. If you have the raw bytes, you can decode them as UTF-8 to get the text as it should be.

You can use:
C#
var result = Encoding.UTF8.GetString(bytes); // bytes: your byte[] holding UTF-8 encoded text

I hope this works for you.
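
To illustrate the point about encodings, here is a minimal sketch, not taken from the original post. `EncodingDemo` and `RoundTrip` are hypothetical names. It shows that `Encoding.UTF8.GetString` expects a `byte[]`, that ∫ and ∑ survive a UTF-8 round trip, and that an ASCII-only encoding replaces them with `?`. Note that text obtained from the clipboard is already a .NET string, so this kind of decoding only applies when you start from raw bytes.

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Hypothetical helper: encodes text to UTF-8 bytes and decodes it back.
    public static string RoundTrip(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.UTF8.GetString(bytes);
    }

    public static void Main()
    {
        string symbols = "\u222B \u2211"; // the characters ∫ and ∑

        // UTF-8 can represent both symbols, so the round trip is lossless.
        Console.WriteLine(RoundTrip(symbols) == symbols); // True

        // ASCII cannot represent them; each one is replaced by '?'.
        string lossy = Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(symbols));
        Console.WriteLine(lossy); // ? ?
    }
}
```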
 
You can get a Unicode version of the text using:
C#
data.GetData(DataFormats.UnicodeText).ToString()

But if you used a different font in your Word document (such as Symbol), you may still not get the right string.

In that case, you could try getting the RTF data using DataFormats.Rtf. I think it should be easy to put the RTF data into the RichTextBox, for example through its Rtf property.
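
The suggestion above could be sketched roughly as follows. This is an assumption-laden illustration, not the poster's code: `ClipboardHelper` and `PasteWordContent` are hypothetical names, and the method is meant to be called on the WinForms UI thread (the clipboard requires an STA thread) after the Word selection has been copied. It prefers RTF when the clipboard offers it and falls back to Unicode text otherwise.

```csharp
using System.Windows.Forms;

public static class ClipboardHelper
{
    // Hypothetical helper, not part of the original post.
    // Call on an STA thread (for example the WinForms UI thread)
    // after docs.ActiveWindow.Selection.Copy() has run.
    public static void PasteWordContent(RichTextBox target)
    {
        IDataObject data = Clipboard.GetDataObject();
        if (data == null)
        {
            return; // nothing on the clipboard
        }

        if (data.GetDataPresent(DataFormats.Rtf))
        {
            // RTF carries font information, so characters typed in a
            // symbol font have a chance of rendering as they do in Word.
            target.Rtf = (string)data.GetData(DataFormats.Rtf);
        }
        else if (data.GetDataPresent(DataFormats.UnicodeText))
        {
            // Fallback: plain text, but with Unicode characters preserved.
            target.Text = (string)data.GetData(DataFormats.UnicodeText);
        }
    }
}
```

In the question's code this would replace the `richTextBox1.Text = data.GetData(DataFormats.Text).ToString();` line with a call such as `ClipboardHelper.PasteWordContent(richTextBox1);`.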
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


