Hi everyone. I am a beginner in R.
I have the following problem: I have many PDF files containing technical reports written in Portuguese by many different authors, all in natural language. How can I build an intelligent system that takes a small set of keywords as input and identifies the author(s) whose work most closely matches those keywords?

For example, I know that I can read the text into R with the following lines of code, where yyyyyyyyyyyyyy is the URL or drive path of the PDF file (for example, XXX.pdf):

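install.packages("pdftools")  # one-time install
library(pdftools)
# mode = "wb" keeps the PDF intact as a binary file when downloading on Windows
download.file("yyyyyyyyyyyyyy/XXX.pdf", "./XXX.pdf", mode = "wb")
# pdf_text() returns a character vector with one element per page
text <- pdf_text("./XXX.pdf")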

I know that I will need to apply NLP (natural language processing) from here, but what is the best way to do this? Will I need to use an ontology?
Once the text has been processed into a structured form, how can I build the system that identifies the author(s) from a small set of keywords that closely match their work?
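
To make the question concrete, here is a minimal sketch of one possible direction: TF-IDF weighting with cosine similarity between the keywords and each author's text, in base R only. This is an assumption about what might work, not a confirmed solution. The toy corpus, the author names, and the helper names tokenize and rank_authors are all hypothetical; in practice each text would come from pdf_text(), and accented Portuguese characters would need locale-aware handling.

```r
# Hypothetical toy corpus: one text per author (invented for illustration).
# Real input would be something like paste(pdf_text(path), collapse = " ").
docs <- c(
  "Ana Silva"   = "analise de vibracao em turbinas eolicas e manutencao preditiva",
  "Bruno Costa" = "tratamento de agua e controle de qualidade em estacoes de esgoto",
  "Carla Souza" = "manutencao preditiva de motores eletricos com analise de vibracao"
)

# Lowercase, replace non-letters with spaces, split into words,
# and drop words of 1-2 characters as a crude stop-word filter.
tokenize <- function(x) {
  x <- gsub("[^[:alpha:]]+", " ", tolower(x))
  tokens <- strsplit(trimws(x), " ", fixed = TRUE)[[1]]
  tokens[nchar(tokens) > 2]
}

tokens <- lapply(docs, tokenize)
vocab  <- sort(unique(unlist(tokens)))

# Term-frequency matrix: one row per author, one column per vocabulary term.
tf <- t(vapply(
  tokens,
  function(tk) tabulate(match(tk, vocab), nbins = length(vocab)),
  integer(length(vocab))
))
colnames(tf) <- vocab

# Inverse document frequency: terms found in fewer texts get more weight.
idf   <- log(nrow(tf) / colSums(tf > 0))
tfidf <- sweep(tf, 2, idf, `*`)

# Score every author against the keywords by cosine similarity.
rank_authors <- function(keywords) {
  q      <- tokenize(paste(keywords, collapse = " "))
  qvec   <- as.numeric(vocab %in% q) * idf
  scores <- as.vector(tfidf %*% qvec)
  denom  <- sqrt(rowSums(tfidf^2)) * sqrt(sum(qvec^2))
  scores <- ifelse(denom > 0, scores / denom, 0)  # no overlap gives score 0
  names(scores) <- rownames(tfidf)
  sort(scores, decreasing = TRUE)
}

print(rank_authors(c("vibracao", "turbinas")))
```

With this toy data the first-ranked author for the keywords "vibracao" and "turbinas" is Ana Silva, and Bruno Costa scores zero because none of the keywords appear in that text. Whether an ontology would improve on plain keyword weighting like this is still the open part of my question.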

Thanks for any help

What I have tried:

I tried reading the natural-language text from a PDF report and the result looks fine, but I don't know how to proceed from there.
Comments
[no name] 5-Mar-19 13:32pm    
If the PDF is encrypted, none of this does you any good. And R, in this case, looks like a sledgehammer to kill a flea. You haven't even figured out "what" identifies an "author". Once you do that, a simple "text reader" will probably do. NLP?!

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


