How can I grab the URLs of online audio files in C#? For example, I want to crawl Google Music for the URLs of audio files in MP3 or WMA format. How can I achieve this?
First, I want to know how to search for songs from my program.
Second, once I have the search results page, how can I get the real address of the song?
Third, how do I get the web page's original source code correctly?
Thank you.
Updated 31-May-11 6:00am

1 solution

For crawling, you need to fetch HTML files using the class System.Net.HttpWebRequest. Your compile-time type should be System.Net.WebRequest, though, because the instances of the derived classes are created by the factory method Create, depending on the URI scheme. See the code sample here: http://msdn.microsoft.com/en-us/library/system.net.webrequest.aspx. You can fetch the resource using the HTTP request method "GET".
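A minimal sketch of such a fetch could look like the following. The class and method names (PageFetcher, FetchHtml) are made up for illustration, and the code assumes the page is UTF-8 encoded, which is what StreamReader uses by default; a real crawler should check the charset in the response headers. Note that on recent .NET versions the WebRequest family is marked obsolete in favour of HttpClient, although it still works.

```csharp
using System;
using System.IO;
using System.Net;

static class PageFetcher
{
    // Downloads the resource at the given URL and returns its body as text.
    // Assumption: the body is UTF-8 (the StreamReader default).
    public static string FetchHtml(string url)
    {
        // The factory method picks the concrete class from the URI scheme,
        // e.g. HttpWebRequest for "http://" and FileWebRequest for "file://".
        WebRequest request = WebRequest.Create(url);
        request.Method = "GET";

        using (WebResponse response = request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }
}
```

You would call it as PageFetcher.FetchHtml("http://example.com/"), where the URL is only a placeholder.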

The problem is knowing the URL. In many cases, the server generates HTML pages on the fly, in response to a search request, for example. Since such requests can be posted in many different ways, this is application-specific. In these cases, you will need to learn how to simulate the "POST" request for every Web application separately.
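As a rough illustration of simulating a form submission, here is a sketch that sends URL-encoded fields with "POST". Everything application-specific here is an assumption: the target URL, the field names, and even whether the site accepts the application/x-www-form-urlencoded content type. You have to find these out by inspecting the real page or its traffic. FormPoster, BuildFormBody and PostForm are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;

static class FormPoster
{
    // Encodes the fields as "name=value&name2=value2" with percent-escaping.
    public static string BuildFormBody(IDictionary<string, string> fields)
    {
        var pairs = new List<string>();
        foreach (KeyValuePair<string, string> field in fields)
        {
            pairs.Add(Uri.EscapeDataString(field.Key) + "=" +
                      Uri.EscapeDataString(field.Value));
        }
        return string.Join("&", pairs.ToArray());
    }

    // Posts the fields to the given URL and returns the response body as text.
    // Assumption: the server expects a URL-encoded form and answers in UTF-8.
    public static string PostForm(string url, IDictionary<string, string> fields)
    {
        byte[] body = Encoding.UTF8.GetBytes(BuildFormBody(fields));

        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.ContentLength = body.Length;

        // The request body must be written before asking for the response.
        using (Stream requestStream = request.GetRequestStream())
        {
            requestStream.Write(body, 0, body.Length);
        }

        using (WebResponse response = request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

For a search form with a single text field, a call might look like PostForm(searchUrl, new Dictionary<string, string> { { "q", "song title" } }), where both searchUrl and the field name "q" are placeholders.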

Now, you need to parse the HTML file and find your audio file references. If you are interested only in audio files and not in any specific structure, it can be as simple as using a regular expression. Your search criteria can include "http://" at the beginning and the file extensions you're interested in at the end. See the class System.Text.RegularExpressions.Regex, http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.aspx. Your Regex patterns should be easy enough to design. If you face a problem here, ask a follow-up question.
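One possible pattern along these lines is sketched below. It is only a starting point: the names AudioLinkFinder and FindAudioUrls are hypothetical, the pattern also accepts "https://" (my addition), and it deliberately ignores relative links and URLs that hide the extension behind a query string.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class AudioLinkFinder
{
    // "http://" or "https://", then any run of characters that cannot appear
    // in a quoted attribute value, ending in ".mp3" or ".wma" (any letter case).
    static readonly Regex AudioUrl = new Regex(
        @"https?://[^\s""'<>]+?\.(?:mp3|wma)\b",
        RegexOptions.IgnoreCase);

    // Returns the distinct audio URLs found in the HTML, in order of appearance.
    public static List<string> FindAudioUrls(string html)
    {
        var urls = new List<string>();
        foreach (Match match in AudioUrl.Matches(html))
        {
            if (!urls.Contains(match.Value))
            {
                urls.Add(match.Value);
            }
        }
        return urls;
    }
}
```

The input to FindAudioUrls is simply the text of the page you fetched earlier.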

These techniques will solve your problem only when the HTML statically presents the URLs of the audio records; you can scrape only the URLs that are literally present in the HTML document.

There are more cryptic scenarios where the user does not click an anchor with a static URL of the resource you need. Instead, some JavaScript is called; it forms an HTTP request, which is sent to the server using Ajax to get the file downloaded. In some cases, the user needs to answer a question to confirm that the request is not made by a robot. Such scenarios are crackable in principle. At the same time, it is not theoretically possible to devise a universal algorithm that could analyze the JavaScript involved and make your crawler mimic the required user actions. In some individual cases you will be able to crack such a scenario by using your own brain, but the result cannot be guaranteed in all cases.

—SA
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


