Extracting URLs from surrounding code

Question

Hi,
I have the following code that reads a web page:

C#

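// Needs System.IO and System.Net; request is assumed to be an existing WebRequest.
using (WebResponse response = request.GetResponse())
using (Stream stream = response.GetResponseStream())
using (StreamReader sr = new StreamReader(stream))
{
    // Read the whole response body into a string; the using blocks close everything.
    htmlpage = sr.ReadToEnd();
}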

Once I have the page, I try to extract my website's URLs to make sure they are being forwarded correctly.
My problem is that some URLs come out fine, while others have extra code in front of and after the URL, for example:

(Javascrip:xyw('http://www.mysite.com/xyzpage.html')
I am trying to strip everything before and after the URL so that I end up with:
http://www.mysite.com/xyzpage.html
I tried the following, which doesn't work at all:

C#

string value = Regex.Match(str, @"\((\w+)\)").Groups[1].Value;

Any idea how to write that regex? I am not good at regular expressions.

Posted 31-Jul-12 16:55pm

EricThe

Updated 31-Jul-12 17:05pm

JF2015

v2

1 solution

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Christian Graus · Answer 1 · 2012-07-31T17:33:00

Solution 1

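If you know that the URLs must start with http://, why not make that part of your required match? Your current regex is very vague: it only asks for word characters between parentheses, so nothing in it describes a URL, and it cannot match the colon, slashes, dots, or quotes in your example.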
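
As a rough illustration of that suggestion, here is a minimal sketch that anchors the match on the scheme and stops at the first character that is unlikely to belong to the URL. The class name UrlExtractor, the method name ExtractUrls, and the choice of terminating characters (whitespace, quotes, angle brackets, parentheses) are assumptions made for this example, not something from the question; real pages may need a different set, since parentheses are technically allowed in URLs.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class UrlExtractor
{
    // "http://" or "https://", then every following character that is not
    // whitespace, a single or double quote, an angle bracket, or a parenthesis.
    // The stop characters are an assumption; adjust them to your own markup.
    private static readonly Regex UrlPattern =
        new Regex(@"https?://[^\s'""<>()]+", RegexOptions.IgnoreCase);

    // Returns every URL-looking substring found in the given text, in order.
    public static List<string> ExtractUrls(string html)
    {
        var urls = new List<string>();
        foreach (Match match in UrlPattern.Matches(html))
        {
            urls.Add(match.Value);
        }
        return urls;
    }
}
```

With the sample from the question, UrlExtractor.ExtractUrls("(Javascrip:xyw('http://www.mysite.com/xyzpage.html')") should return a single entry, http://www.mysite.com/xyzpage.html, because the match starts at http:// and ends just before the closing quote. If the pages are full HTML documents, an HTML parser may be more robust than a regex, but that goes beyond what was asked here.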

Posted 31-Jul-12 17:33pm

Christian Graus