For crawling, you need to fetch HTML files using the class System.Net.HttpWebRequest. Your compile-time type should be System.Net.WebRequest, though, as instances of the derived classes are created by the factory method Create, depending on the URI scheme. See the code sample here: http://msdn.microsoft.com/en-us/library/system.net.webrequest.aspx. You can fetch the resource using the HTTP request method GET.
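The GET fetch described above can be sketched as follows; this is a minimal illustration, and the URL used here is just a reachable placeholder:

```csharp
using System;
using System.IO;
using System.Net;

public class Fetcher
{
    public static string FetchHtml(string url)
    {
        // Compile-time type is the abstract WebRequest; for an
        // "http:" or "https:" URI, Create returns an HttpWebRequest.
        WebRequest request = WebRequest.Create(url);
        request.Method = "GET";

        // Read the whole response body as text.
        using (WebResponse response = request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            return reader.ReadToEnd();
    }

    public static void Main()
    {
        string html = FetchHtml("http://example.com/");
        Console.WriteLine(html.Length);
    }
}
```

In a real crawler you would add a timeout and error handling around GetResponse, since remote servers fail routinely.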
The problem is knowing the URL. In many cases, servers generate HTML pages on the fly, in response to some search request, for example. As such requests can be posted in many different ways, this part is application-specific: you will need to learn how to simulate the appropriate POST request for each Web application separately.
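Simulating such a POST could look like the sketch below. The form field name "query" is purely hypothetical; the actual field names and the target URL must be discovered for each Web application, for example by inspecting its search form:

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public class PostExample
{
    // URL-encode key/value pairs as application/x-www-form-urlencoded.
    // Arguments alternate: key1, value1, key2, value2, ...
    public static string EncodeForm(params string[] pairs)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i + 1 < pairs.Length; i += 2)
        {
            if (sb.Length > 0) sb.Append('&');
            sb.Append(Uri.EscapeDataString(pairs[i]))
              .Append('=')
              .Append(Uri.EscapeDataString(pairs[i + 1]));
        }
        return sb.ToString();
    }

    public static string Post(string url, string formData)
    {
        WebRequest request = WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";

        // Write the encoded form data into the request body.
        byte[] body = Encoding.UTF8.GetBytes(formData);
        request.ContentLength = body.Length;
        using (Stream stream = request.GetRequestStream())
            stream.Write(body, 0, body.Length);

        using (WebResponse response = request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            return reader.ReadToEnd();
    }
}
```

Usage would be something like Post(searchUrl, EncodeForm("query", "some audio")), with both arguments taken from the specific application you are crawling.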
Now, you need to parse the HTML file and find your audio file references. If you are interested only in audio files and not in any specific document structure, it can be as simple as using a regular expression. Your search criteria can include "http://" at the beginning and the file extensions you're interested in at the end. See the class System.Text.RegularExpressions.Regex, http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.aspx. Your Regex patterns should be easy enough to design. If you face a problem here, ask a follow-up question.
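A minimal sketch of such a Regex-based extraction, assuming a few common audio extensions (the extension list is only an example; extend it as needed):

```csharp
using System;
using System.Text.RegularExpressions;

public class AudioLinkFinder
{
    // Absolute http:// URLs ending in an audio extension.
    // The character class stops at whitespace, quotes and angle
    // brackets, so URLs inside href="..." attributes match cleanly.
    static readonly Regex AudioUrl = new Regex(
        @"http://[^\s""'<>]+\.(?:mp3|wav|ogg|wma)",
        RegexOptions.IgnoreCase);

    public static MatchCollection FindAudioUrls(string html)
    {
        return AudioUrl.Matches(html);
    }

    public static void Main()
    {
        string html = "<a href=\"http://example.com/song.mp3\">song</a>";
        foreach (Match m in FindAudioUrls(html))
            Console.WriteLine(m.Value); // prints http://example.com/song.mp3
    }
}
```

Note this only finds absolute URLs; relative links ("audio/song.mp3") would need to be resolved against the page's base URI, for example with the System.Uri class.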
These techniques will solve your problem only when the HTML statically presents the URLs of the audio records. There are more cryptic scenarios where the user does not click on an anchor link with a static URL of the resource you need. Instead, some JavaScript is called; it forms an HTTP request to the server, which is sent using Ajax to get the file downloaded. In some cases, the user needs to answer a question to confirm the request is not made by a robot. Such scenarios are crackable in principle, but it is not theoretically possible to devise a universal algorithm that could analyze the JavaScript involved and mimic the required user actions in your crawler. In some individual cases you will be able to crack such a scenario using your own brain, but the result cannot be guaranteed in all cases.
—SA