I'm working on a project to extract all images, PDFs, and Word documents from a web page and store them in a database.
To do that, I need to convert the web page to XHTML using JTidy, convert that XHTML to XML using XSLT, and then extract all those files using the XML.
I don't know how to convert XHTML to XML using XSLT or how to extract data from the result.
Can somebody help me with this?

1 solution

No. Your idea sounds interesting, but it does things you do not want and probably do not even need.

Read this: http://java.sun.com/developer/technicalArticles/ThirdParty/WebCrawler/

and use another approach.
 
Comments
kalai91 15-Mar-12 6:11am    
Thanks for your solution. I read that article, but I'm still required to do it that way. What should I do?
TorstenH. 15-Mar-12 10:14am    
Start here: XSL @ w3schools

Also take a look here: Converting XHTML files to XSL and XML files @ ibm.com

An old-fashioned book is also worth having. It works like an additional display, and searching it works much better than anything online.
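
For the XSLT step asked about in the question, here is a minimal sketch using only the `javax.xml.transform` classes that ship with the JDK. It assumes JTidy has already produced well-formed XHTML in the `http://www.w3.org/1999/xhtml` namespace. The class name `XhtmlFileLister`, the `<files>`/`<file>` output format, and the extension list are hypothetical choices for illustration, not part of any library.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class XhtmlFileLister {
    // Hypothetical stylesheet: turns XHTML into <files><file type="...">url</file>...</files>.
    // The "h" prefix is bound to the XHTML namespace so the XPath expressions match.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:h='http://www.w3.org/1999/xhtml'"
      + " exclude-result-prefixes='h'>"
      + "<xsl:template match='/'>"
      + "<files>"
      + "<xsl:for-each select='//h:img/@src'>"
      + "<file type='image'><xsl:value-of select='.'/></file>"
      + "</xsl:for-each>"
      + "<xsl:for-each select='//h:a/@href'>"
      + "<file type='link'><xsl:value-of select='.'/></file>"
      + "</xsl:for-each>"
      + "</files>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Returns every image URL plus every link that ends in .pdf, .doc or .docx.
    static List<String> listFiles(String xhtml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            DOMResult result = new DOMResult();
            transformer.transform(new StreamSource(new StringReader(xhtml)), result);

            // Walk the intermediate XML produced by the stylesheet.
            NodeList files = ((Document) result.getNode()).getElementsByTagName("file");
            List<String> urls = new ArrayList<>();
            for (int i = 0; i < files.getLength(); i++) {
                Element file = (Element) files.item(i);
                String url = file.getTextContent();
                String lower = url.toLowerCase();
                boolean isImage = "image".equals(file.getAttribute("type"));
                boolean isDocument = lower.endsWith(".pdf")
                    || lower.endsWith(".doc")
                    || lower.endsWith(".docx");
                if (isImage || isDocument) {
                    urls.add(url);
                }
            }
            return urls;
        } catch (Exception e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xhtml =
            "<html xmlns='http://www.w3.org/1999/xhtml'><body>"
          + "<img src='logo.png' alt=''/>"
          + "<a href='report.pdf'>Report</a>"
          + "<a href='about.html'>About</a>"
          + "<a href='notes.docx'>Notes</a>"
          + "</body></html>";
        System.out.println(listFiles(xhtml)); // prints [logo.png, report.pdf, notes.docx]
    }
}
```

The returned URLs are exactly as written in the page, so relative ones still have to be resolved against the page address before downloading. If the JTidy output carries a DOCTYPE, the XML parser may try to fetch the DTD over the network, so that may need to be stripped or handled with an entity resolver.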
kalai91 19-Mar-12 6:48am    
Thank you...

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


