Hi.
I need a program/algorithm which will search the web for a specified phrase and save the URLs of the sites containing it to a text file.

I know C# and a little bit of Java (still learning it).
Maybe you can tell me what I need to do.

What I have tried:

I tried some pre-made programs, but they didn't work. I also tried to build something similar to searching within a document, but across the web, and that failed too. I've got a few ideas but don't know how to express them as code.
Posted
Updated 27-Mar-16 6:32am
Comments
Sergey Alexandrovich Kryukov 27-Mar-16 12:19pm    
You did not actually show what you have tried; just mentioning it is not helpful. We have no idea what we can help with, because you did not share any particular difficulties.
—SA

The easiest way (and for a beginner pretty much the only one) would be to use a WebBrowser control[^], get it to query Google, then scrape the results for the URLs and store them.
But even that is going to be fairly messy. Some of these may help: site scraping c#[^]
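As a rough sketch of that approach, assuming a .NET Framework project referencing System.Windows.Forms: the Google search URL format, the "starts with http" filter, and the output file name are all assumptions, and a real results page may need more careful filtering.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Windows.Forms;

class SearchScraper
{
    [STAThread] // WebBrowser requires a single-threaded apartment
    static void Main()
    {
        string phrase = "my search phrase"; // placeholder phrase
        var browser = new WebBrowser { ScriptErrorsSuppressed = true };

        browser.DocumentCompleted += (s, e) =>
        {
            // Collect the href of every anchor on the results page.
            var urls = browser.Document.Links
                .Cast<HtmlElement>()
                .Select(a => a.GetAttribute("href"))
                .Where(h => h != null && h.StartsWith("http")); // crude filter

            File.WriteAllLines("results.txt", urls); // save URLs to a text file
            Application.ExitThread();
        };

        browser.Navigate("https://www.google.com/search?q=" +
                         Uri.EscapeDataString(phrase));
        Application.Run(); // pump messages until DocumentCompleted fires
    }
}
```

Note that search engines actively discourage this kind of scraping, so treat it as a learning exercise rather than a production technique.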
 
Share this answer
 
You're asking how to write a search engine? That is a massive task spanning numerous complicated technologies and theories, and it would constitute hundreds of thousands of lines of code. It is not a trivial task that only requires a few lines.

Learn how to write a web crawler, and look into technologies like Lucene for the searching side.
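The crawler half can be sketched very simply; this is a minimal breadth-first crawl loop, assuming a seed URL, a page limit, and a simplistic href regex (all placeholders), with the indexing step left as a comment:

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

class Crawler
{
    static void Main()
    {
        var queue = new Queue<string>();
        var seen = new HashSet<string>();
        queue.Enqueue("http://example.com/"); // assumed seed page

        while (queue.Count > 0 && seen.Count < 100) // crude stop condition
        {
            string url = queue.Dequeue();
            if (!seen.Add(url)) continue; // skip already-visited pages

            string html;
            using (var client = new WebClient())
            {
                try { html = client.DownloadString(url); }
                catch (WebException) { continue; } // skip unreachable pages
            }

            // Here you would feed `html` to an indexer such as Lucene.NET,
            // or simply test it for your phrase and record the URL.

            // Extract further links with a (simplistic) regex:
            foreach (Match m in Regex.Matches(html, "href=\"(http[^\"]+)\""))
                queue.Enqueue(m.Groups[1].Value);
        }
    }
}
```

A real crawler would also respect robots.txt, normalize URLs, and parse HTML properly instead of using a regex.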
First of all, please see my comment to the question.

Now, here are the components you really need. First, you need to be able to download each page you want to search. This can be done using the HttpWebRequest class:
HttpWebRequest Class (System.Net)[^]
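As a minimal sketch of that download step (the target URL is a placeholder):

```csharp
using System;
using System.IO;
using System.Net;

class PageDownloader
{
    static void Main()
    {
        var request = (HttpWebRequest)WebRequest.Create("http://example.com/");
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            string html = reader.ReadToEnd(); // the raw page markup
            Console.WriteLine(html.Length);   // prove we got something back
        }
    }
}
```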

For the starting point, you can look at the source code of my application I shared in full here: how to download a file from internet[^].

This application is very small (only one code file) and clear, so it's not hard to see how it works. See also my other past answers:
FTP: Download Files[^],
how to download a file from the server in Asp.net 2.0[^],
get specific data from web page[^],
Performing a some kind of Web Request and getting result[^],
How to get particular data from a url using c#[^].

Now, when you have the content from the Web, it is typically HTML, which you need to parse in order to perform your search and, importantly, to find further URLs to visit. I would recommend HTML Agility Pack, an open-source product under the Microsoft Public License: Html Agility Pack — Home[^].
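For example, assuming the HtmlAgilityPack NuGet package is referenced, and with the input markup and phrase as placeholders, the search-and-extract step looks roughly like this (note that SelectNodes returns null when nothing matches, hence the null check):

```csharp
using System;
using HtmlAgilityPack;

class PhraseSearch
{
    static void Main()
    {
        string html = "<html><body><p>hello world</p>" +
                      "<a href=\"http://example.com\">link</a></body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Does the visible text of the page contain the phrase?
        bool found = doc.DocumentNode.InnerText.Contains("hello world");
        Console.WriteLine(found);

        // Collect the URLs to follow next.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors != null)
            foreach (var a in anchors)
                Console.WriteLine(a.GetAttributeValue("href", ""));
    }
}
```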

Anyway, review this list of parsers: Comparison of HTML parsers — Wikipedia, the free encyclopedia[^].

Generally, what you need is close to Web scraping: Web scraping — Wikipedia, the free encyclopedia[^].

—SA

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


