Hi, I am writing a web crawler that crawls websites and selectively parses different sections of each site.

I am a .NET developer, so the obvious choice was to build it in .NET, but it was very slow, both at downloading and at parsing the HTML pages.

I then tried only downloading the contents, first with .NET and then the same domains with Python, and Python was far faster at downloading. I have the downloading working in Python, but the later part (the parsing) is not as easy to code in Python, and I obviously don't want to do it there.
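A minimal sketch of a download-only timing run like the one described, assuming Python 3's standard `urllib.request`; the question does not say which Python library was used, and the name `download_all` is hypothetical. It fetches each URL sequentially, keeps the raw bytes, and does no parsing, so the measured time covers the download step alone.

```python
import time
import urllib.request


def download_all(urls, timeout=30):
    """Hypothetical helper: download each URL in turn, without parsing.

    Returns (pages, elapsed_seconds), where pages maps each URL to its raw
    response body as bytes. Assumes Python 3's standard-library urllib.
    """
    pages = {}
    start = time.perf_counter()
    for url in urls:
        # The with block closes the connection as soon as the body is read.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            pages[url] = response.read()
    elapsed = time.perf_counter() - start
    return pages, elapsed


# Usage (needs network access):
#   pages, seconds = download_all(["http://www.regexhacks.com/"])
#   print(f"downloaded {len(pages)} pages in {seconds:.1f} s")
```

Running the same URL list through this and through the .NET download loop on the same machine keeps the comparison limited to downloading, as in the question.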

The same batch of domains that took 100 seconds in Python took 20 minutes in the .NET-based crawler.

I also tried downloading http://www.regexhacks.com/: it took 10 seconds in Python and 2 minutes in the .NET crawler.
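The two measurements work out to the same ratio. The short calculation below (the helper name `slowdown` is only for illustration) converts both timings to seconds: 20 minutes is 1200 s against 100 s, and 2 minutes is 120 s against 10 s, so .NET took 12 times as long in each case. Because the extra .NET time (1100 s vs. 110 s) scales with the Python time, a fixed one-off cost such as application startup would not explain the gap on its own, at least for these two data points.

```python
def slowdown(python_seconds, dotnet_seconds):
    """Return how many times longer the .NET run took than the Python run."""
    return dotnet_seconds / python_seconds


# Timings reported in the question, converted to seconds: (Python, .NET).
measurements = {
    "batch of domains": (100, 20 * 60),
    "regexhacks.com": (10, 2 * 60),
}

for name, (py_s, net_s) in measurements.items():
    # Prints "12x slower" for both rows.
    print(f"{name}: {slowdown(py_s, net_s):.0f}x slower, {net_s - py_s} s extra")
```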

Does anyone have any idea why this is slow in .NET but fast in Python?
Updated 11-Feb-11 19:41pm
Comments
Sergey Alexandrovich Kryukov 12-Feb-11 1:18am    
We need to see your code. First of all, did you run the .NET crawler and the Python code on the same system? I mean, you can use Python on the server side (the WSGI module, highly recommended), but you did not tag the question ASP.NET.
--SA
eqlit 12-Feb-11 1:25am    
Sorry, I can't publish the code because of company policy, you know :)
But yes, I ran the downloads on the same system. The code did nothing except HTTP downloading and queue management, and I wrote it as a console application for the purpose.
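The comment above describes a console program that does only HTTP downloading and queue management. A sketch of that shape in Python follows, using the standard `queue` and `threading` modules. The function name `crawl`, the `fetch` parameter, and the default of eight worker threads are assumptions for illustration; the question does not say whether either crawler downloads pages concurrently.

```python
import queue
import threading


def crawl(urls, fetch, workers=8):
    """Hypothetical sketch: download every URL with a fixed pool of threads.

    `fetch` is any callable that takes a URL and returns the page body; it is
    passed in so the queue logic can be exercised without network access.
    Returns a dict mapping each URL to its body, or to the exception raised.
    """
    jobs = queue.Queue()
    for url in urls:
        jobs.put(url)  # all work is queued before any worker starts

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                url = jobs.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker is finished
            try:
                body = fetch(url)
            except Exception as exc:  # record the failure and keep crawling
                body = exc
            with lock:
                results[url] = body

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For real downloads, `fetch` can be any function that returns a page body, for example one that calls `urllib.request.urlopen(url, timeout=30).read()` inside a `with` block. Storing exceptions instead of raising them lets one unreachable site fail without stopping the rest of the batch.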

1 solution

Have you tried using IronPython to embed the Python code in your .NET application? That should let it download pages at the speed you saw in Python. In my opinion, Python is faster because it downloads pages as raw byte strings, whereas .NET may be converting the HTML into text as it downloads.
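For reference, assuming Python 3's standard `urllib.request` (the question does not say which library the Python downloader used), `urlopen(url).read()` returns the response body as a single `bytes` object, and turning it into a `str` is a separate, explicit decode step. The helpers `fetch_bytes` and `fetch_text` below are hypothetical and only illustrate that difference; whether skipping the decode accounts for the timing gap is not established here.

```python
import urllib.request


def fetch_bytes(url, timeout=30):
    """Hypothetical helper: return the raw body; read() yields bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_text(url, timeout=30, default_charset="utf-8"):
    """Hypothetical helper: return the body decoded to str.

    Uses the charset declared in the Content-Type header, falling back to
    `default_charset` (an assumption) when none is declared.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or default_charset
        # Decoding is the extra step that fetch_bytes skips.
        return response.read().decode(charset, errors="replace")
```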
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


