The .NET Framework classes on their own cannot OCR PDF files or convert them to Word, so you will need an SDK or library for that. One option is described in this CodeProject article we posted a while back.
The LEADTOOLS SDK has improved significantly since that article was published, but the .NET code is still simple to write and understand. It looks like this:
// Create the LEAD OCR engine and start it with a RasterCodecs instance for image I/O
IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD, false);
RasterCodecs rasterCodecs = new RasterCodecs();
rasterCodecs.Options.Load.AllPages = true;
ocrEngine.Startup(rasterCodecs, null, null, null);

string fileName = @"inputFile.pdf";
IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument();

// Add each page of the PDF to the OCR document (page numbers are 1-based)
CodecsImageInfo fileinfo = rasterCodecs.GetInformation(fileName, true);
for (int pagenumber = 1; pagenumber <= fileinfo.TotalPages; pagenumber++)
{
    ocrDocument.Pages.AddPage(rasterCodecs.Load(fileName, 0, CodecsLoadByteOrder.Bgr, pagenumber, pagenumber), null);
}

// Recognize the text on all pages, then save the result as a Word document
// (the output name here is inputFile.pdf.docx)
ocrDocument.Pages.Recognize(null);
ocrDocument.Save(fileName + ".docx", DocumentFormat.Docx, null);
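The snippet above never shuts the engine down or releases its objects. The following is a minimal sketch of the same conversion wrapped in a method with deterministic cleanup; it is an illustration, not official sample code. The `PdfToWord` class and the `ConvertPdfToDocx` and `BuildOutputPath` names are hypothetical. The `using` directives assume the namespaces of recent LEADTOOLS versions (`Leadtools.Ocr`, `Leadtools.Document.Writer`); older versions may use `Leadtools.Forms.Ocr` and `Leadtools.Forms.DocumentWriters` instead. The sketch also assumes the LEADTOOLS license has already been set elsewhere in the application.

```csharp
using System.IO;
using Leadtools;
using Leadtools.Codecs;
using Leadtools.Ocr;              // assumption: Leadtools.Forms.Ocr in older versions
using Leadtools.Document.Writer;  // assumption: Leadtools.Forms.DocumentWriters in older versions

public static class PdfToWord     // hypothetical helper class
{
    // Swap the input extension for .docx (inputFile.pdf -> inputFile.docx)
    public static string BuildOutputPath(string inputPath)
    {
        return Path.ChangeExtension(inputPath, ".docx");
    }

    public static void ConvertPdfToDocx(string inputPath)
    {
        // The using blocks dispose the engine first, then the codecs it depends on
        using (RasterCodecs rasterCodecs = new RasterCodecs())
        using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD, false))
        {
            rasterCodecs.Options.Load.AllPages = true;
            ocrEngine.Startup(rasterCodecs, null, null, null);
            try
            {
                using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument())
                {
                    // Load and add one page at a time (page numbers are 1-based)
                    CodecsImageInfo info = rasterCodecs.GetInformation(inputPath, true);
                    for (int page = 1; page <= info.TotalPages; page++)
                    {
                        ocrDocument.Pages.AddPage(
                            rasterCodecs.Load(inputPath, 0, CodecsLoadByteOrder.Bgr, page, page), null);
                    }

                    ocrDocument.Pages.Recognize(null);
                    ocrDocument.Save(BuildOutputPath(inputPath), DocumentFormat.Docx, null);
                }
            }
            finally
            {
                // Shut the engine down even if recognition or saving throws
                ocrEngine.Shutdown();
            }
        }
    }
}
```

A call such as `PdfToWord.ConvertPdfToDocx(@"inputFile.pdf");` would then write `inputFile.docx` in the same folder as the input.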
If you would like to try it, you can download the free evaluation of the main LEADTOOLS setup from this page.