
Web crawling with C# (part one)

7 Apr 2016, CPOL, 32 min read
Designing a web crawler using C#
This article is a starting point of ideas for coders getting started in web crawling. Many of the concepts discussed are geared towards a robust, large-scale architecture. It looks at the best approach to managing links, which is to create a list or queue that you push links onto for crawling, and at policies and rules such as what priority a link should get and whether a link or website is one we want to visit at all. It also discusses the need to be careful and gentle when crawling servers, what can go wrong, and applicable languages, frameworks, and platforms.
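As a rough illustration of the "queue plus policies" idea, here is a minimal sketch of a crawl frontier. It is not the article's implementation: the class name `CrawlFrontier`, its members, and the host allow-list policy are all hypothetical, and it assumes .NET 6 or later for `PriorityQueue<TElement, TPriority>`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical crawl frontier (illustrative names, not from the article):
// a priority queue of links, a "seen" set, and a simple visit policy.
public sealed class CrawlFrontier
{
    // PriorityQueue is a min-heap: the LOWEST priority value is dequeued first.
    private readonly PriorityQueue<Uri, int> _queue = new();
    private readonly HashSet<string> _seen = new();
    private readonly HashSet<string> _allowedHosts;

    public CrawlFrontier(IEnumerable<string> allowedHosts)
    {
        // Host names are case-insensitive, so compare them that way.
        _allowedHosts = new HashSet<string>(allowedHosts, StringComparer.OrdinalIgnoreCase);
    }

    public int Count => _queue.Count;

    // Policy: "is this a link/website we want to go to?"
    // Assumed rule for this sketch: absolute http(s) links on an allowed host.
    public bool ShouldCrawl(Uri link) =>
        link.IsAbsoluteUri
        && (link.Scheme == Uri.UriSchemeHttp || link.Scheme == Uri.UriSchemeHttps)
        && _allowedHosts.Contains(link.Host);

    // Pushes a link onto the queue unless policy rejects it or it was already seen.
    public bool TryEnqueue(Uri link, int priority)
    {
        if (!ShouldCrawl(link)) return false;

        // Drop the #fragment so the same page is not queued twice.
        string key = link.GetLeftPart(UriPartial.Query);
        if (!_seen.Add(key)) return false;

        _queue.Enqueue(link, priority);
        return true;
    }

    // Pops the next link to crawl; returns false when the queue is empty.
    public bool TryDequeue(out Uri link) => _queue.TryDequeue(out link, out _);
}
```

A caller would create the frontier with the hosts it is allowed to visit, call `TryEnqueue` for each discovered link with a priority of its choosing, and loop on `TryDequeue` to fetch pages. Politeness rules such as per-host delays are deliberately left out of this sketch.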

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Chief Technology Officer SocialVoice.AI
Ireland
Allen is CTO of SocialVoice (https://www.socialvoice.ai), where his company analyses video data at scale and gives Global Brands Knowledge, Insights and Actions never seen before! Allen is a chartered engineer, a Fellow of the British Computer Society, a Microsoft MVP and Regional Director, and C-Sharp Corner Community Adviser and MVP. His core technology interests are BigData, IoT and Machine Learning.

When not chained to his desk he can be found fixing broken things, playing music very badly or trying to shape things out of wood. He is currently completing a PhD in AI and is also a ball throwing slave for his dogs.
