using System.Collections;
using System.IO;
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

string[] words;
private void ExportPDFToExcel(string fileName)
{
    StringBuilder text = new StringBuilder();
    PdfReader pdfReader = new PdfReader(fileName);
    try
    {
        for (int page = 1; page <= pdfReader.NumberOfPages; page++)
        {
            // Use a fresh strategy for each page so earlier pages are not carried over.
            ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
            string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
            words = currentText.Split('\n');
            foreach (string line in words)
            {
                text.AppendLine(line);
            }
        }
    }
    finally
    {
        // Close the reader only after every page has been read,
        // not inside the loop as in the first version.
        pdfReader.Close();
    }

    // Write the extracted text once, replacing any previous file contents.
    File.WriteAllText(@"D:\Yourfile.txt", text.ToString());

    // Read the lines back and pass them to callExcel (defined elsewhere).
    ArrayList arrText = new ArrayList(File.ReadAllLines(@"D:\Yourfile.txt"));
    callExcel(arrText, false);
}

C#
private void button1_Click(object sender, EventArgs e)
       {
           string file = Path.GetFullPath(@"C:\Users\karthi\Desktop\ast_sci_data_tables_sample.pdf");
           this.ExportPDFToExcel(file);
       }

1 solution

Sorry, but PDF has no table format: it has no concept of cells, rows, columns, headers, or footers.

Most of the tables you see are made of blocks of text that are "printed" at the right positions to look like cells in a table. The best you can do is extract the text and its metadata (font, position, ...) from your particular PDF and use heuristics to recreate the table structure. This is NOT a generic solution and has to be reviewed for every table you want to extract.
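To make the heuristic idea concrete, here is a minimal sketch of one possible approach: group text chunks into rows by their vertical position, then sort each row left to right. It assumes you have already collected each chunk's text and coordinates from the PDF (how to get them out of iTextSharp is not shown here). `TextChunk`, `TableHeuristics`, `GroupIntoRows`, and `yTolerance` are hypothetical names made up for this example, and the tolerance value will almost certainly need tuning per document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical container for one positioned piece of text taken from a PDF page.
public sealed class TextChunk
{
    public TextChunk(float x, float y, string text) { X = x; Y = y; Text = text; }
    public float X { get; }
    public float Y { get; }
    public string Text { get; }
}

public static class TableHeuristics
{
    // Groups chunks into rows: a chunk joins the current row when its Y differs
    // from the row's first chunk by at most yTolerance (an assumed threshold).
    // Rows come out top to bottom (PDF Y grows upward), cells left to right.
    public static List<List<string>> GroupIntoRows(IEnumerable<TextChunk> chunks, float yTolerance)
    {
        var rows = new List<List<TextChunk>>();
        foreach (var chunk in chunks.OrderByDescending(c => c.Y))
        {
            var last = rows.Count > 0 ? rows[rows.Count - 1] : null;
            if (last != null && Math.Abs(last[0].Y - chunk.Y) <= yTolerance)
                last.Add(chunk);
            else
                rows.Add(new List<TextChunk> { chunk });
        }
        return rows
            .Select(r => r.OrderBy(c => c.X).Select(c => c.Text).ToList())
            .ToList();
    }
}
```

Called with four chunks such as ("Planet", x=50, y=700.4), ("Mass", x=200, y=700), ("Earth", x=50, y=680) and ("5.97", x=200, y=680.2) and a tolerance of 2, this yields two rows, each ordered left to right. Detecting columns, merged cells, or wrapped text would need further rules specific to your table.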
 
Comments
Hari prakash R 24-Sep-15 10:39am    
OK, fine. I have now extracted the content as HTML, and I have that extraction code working. How do I convert that HTML to an Excel file using C# in a Windows application?

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


