I need a regex that validates the following forms of URL:

http://www.site.com
https://www.site.com
http://site.com
https://site.com
http://domain.site.com
https://domain.site.com
http://www.domain.site.com
https://www.domain.site.com
site.com 
domain.site.com
http://www.site.com/path/to/dir/
https://www.site.com/path/to/dir/
http://site.com/path/to/dir/
https://site.com/path/to/dir/
http://domain.site.com/path/to/dir/
https://domain.site.com/path/to/dir/
http://www.domain.site.com/path/to/dir/
https://www.domain.domain.site.com/path/to/dir/
site.com/path/to/dir/
domain.site.com/path/to/dir/
http://www.site.com/path/to/file.html
https://www.site.com/path/to/file.html
http://site.com/path/to/file.html
https://site.com/path/to/file.html
http://domain.site.com/path/to/file.html
https://domain.site.com/path/to/file.html
http://www.domain.site.com/path/to/file.html
https://www.domain.domain.site.com/path/to/file.html
site.com/path/to/file.html
domain.site.com/path/to/file.html

And relative paths
./path/to/file.html
./path/to/dir/
./path/to/dir
path/to/file.html
path/to/dir/
path/to/dir

(ftp:// NOT permitted)

The file extension may be html, php, gif, jpg, png.

With my knowledge of regex this would take me a year to accomplish (if not longer). It took me more than an hour yesterday to write a regex for relative URLs, and that didn't turn out the way I wanted! I feel like I need to apologize for not knowing regex! :(

Just to note, it's not a problem if the URL doesn't really point anywhere; my main concern is the format. I need the format to match the examples shown (it's an exhaustive list). Those are the only forms I will be using, but if the regex also validates a different format, that's OK, as long as the scheme is only http or https. ftp, ftps, or anything else is NOT permitted.
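One way to make these requirements checkable is a small table-driven harness. The sketch below uses Python's `re` module purely for illustration (the thread itself is about PHP); the names `SHOULD_MATCH`, `SHOULD_REJECT` and `check` are hypothetical, and the reject list is an assumption derived from the "ftp:// NOT permitted" rule.

```python
import re

# A few samples copied from the lists in the question.
SHOULD_MATCH = [
    "http://www.site.com",
    "https://domain.site.com/path/to/dir/",
    "site.com/path/to/file.html",
    "./path/to/dir",
    "path/to/file.html",
]

# Assumed negative samples, based on the "only http / https" requirement.
SHOULD_REJECT = [
    "ftp://site.com",
    "ftps://site.com/path/to/file.html",
]

def check(pattern):
    """Return (missed, wrongly_accepted) for a candidate pattern."""
    rx = re.compile(pattern, re.IGNORECASE)
    missed = [s for s in SHOULD_MATCH if not rx.fullmatch(s)]
    wrongly_accepted = [s for s in SHOULD_REJECT if rx.fullmatch(s)]
    return missed, wrongly_accepted
```

Any candidate regex from the answers can then be passed to `check`; two empty lists mean it behaves as required on these samples.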
Updated 7-Sep-21 4:50am
Comments
Maarten Kools 15-Feb-14 16:41pm    
RegExr has a community library, which also contains plenty of URL validation expressions, that should give you a quick start. From there you'd have to tweak the expressions a bit to get the result you want.
EZW 15-Feb-14 16:56pm    
lol that's actually where I am right now, trying to get it working while I wait for suggestions (thanks so much for your comment though). I either break it or don't get any new results at all.
Vedat Ozan Oner 15-Feb-14 16:57pm    
Thank you for the link. It is great :)
Maarten Kools 15-Feb-14 17:20pm    
You're welcome, happy to help :)
Vedat Ozan Oner 15-Feb-14 17:07pm    
((http|https)://)?[a-zA-Z]\w*(\.\w+)+(/\w*(\.\w+)*)*(\?.+)* is good for URL :)

Try this:
(^(http[s]?://)?([w]{3}[.])?([a-z0-9]+[.])+com(((/[a-z0-9]+)*(/[a-z0-9]+/))*([a-z0-9]+[.](html|php|gif|png))?)$)|(^([.]/)?((([a-z0-9]+)/?)+|(([a-z0-9]+)/)+([a-z0-9]+[.](html|php|gif|png)))?$)
 
Comments
EZW 15-Feb-14 23:23pm    
With a little modification, I got it to work!!! Thank you very much sir!!!!!!!

((^(http[s]?:\/\/)?([w]{3}[.])?(([a-z0-9\.]+)+(com|php))(((\/[a-z0-9]+)*(\/[a-z0-9]+\/?))*([a-z0-9]+[.](html|php|gif|png|jpg))?)$)|((^([.]\/)?((([a-z0-9]+)\/?)+|(([a-z0-9]+)\/)+([a-z0-9]+[.](html|php|gif|png|jpg))))$))
Peter Leow 15-Feb-14 23:36pm    
I see that you added the jpg extension and escaped the slashes for PHP. I have tested it, and it works for site.com too. Accept this as the answer?
EZW 15-Feb-14 23:41pm    
It does work for everything :D I believed an online regex tester that showed different results than my Apache... now I know not to trust them lol. Accepted! Thanks a lot
You might try this (slightly shorter than solution #1):
PHP
^((https?:[/][/])?\w+[.])+com|((https?:[/][/])?\w+[.])+com[/]|[.][/])?\w+([/]\w+)*([/]|[.]html|[.]php|[.]gif|[.]jpg|[.]png)?)$

[EDIT1]
The corrected pattern is:
txt
^((https?:[/][/])?(\w+[.])+com|((https?:[/][/])?(\w+[.])+com[/]|[.][/])?\w+([/]\w+)*([/]|[.]html|[.]php|[.]gif|[.]jpg|[.]png)?)$
The original had mismatched parentheses.
[/EDIT1]

This decomposes into the following grammar (each <name> placeholder must be replaced by its respective pattern):
txt
<valid>        = <prefix>|(<prefix>[/]|[.][/])?<path>
<prefix>       = (https?:[/][/])?<host>
<host>         = \w+([.]\w+)*[.]com
<path>         = \w+([/]\w+)*([/]|[.]html|[.]php|[.]gif|[.]jpg|[.]png)?
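
The decomposition above can be assembled mechanically. The following is a minimal sketch in Python (chosen only for illustration; the answer itself targets .NET), assuming non-capturing groups are acceptable. The names `HOST`, `PREFIX`, `PATH`, `VALID` and `is_valid` are hypothetical and simply mirror the grammar.

```python
import re

# Fragments mirror <host>, <prefix>, <path> and <valid> from the grammar.
# (?:...) groups are used so that nothing is captured.
HOST = r"\w+(?:[.]\w+)*[.]com"
PREFIX = r"(?:https?:[/][/])?" + HOST
PATH = r"\w+(?:[/]\w+)*(?:[/]|[.]html|[.]php|[.]gif|[.]jpg|[.]png)?"
VALID = r"^(?:" + PREFIX + r"|(?:" + PREFIX + r"[/]|[.][/])?" + PATH + r")$"

def is_valid(url):
    """True if url has one of the shapes described by <valid>."""
    return re.match(VALID, url) is not None
```

Because each fragment is a separate string, `HOST` or `PATH` can also be tried on its own before the full pattern is assembled.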

[EDIT2]
The query comes after the path or, if the path is absent, after the prefix. No query is allowed for a path without a prefix.
[/EDIT2]

[EDIT3]
To manage complexity, split the pattern into separate variables and concatenate them into the full pattern. This lets you test each part of the full pattern on its own.

E.g.
PHP
// query: ?name=value(&name=value)*
$rx_qpart = '\\w+=[^&]*';
$rx_qhead = '[?]'.$rx_qpart;
$rx_qnext = '[&]'.$rx_qpart;
$rx_qtail = '('.$rx_qnext.')*';
$rx_query = '('.$rx_qhead.$rx_qtail.')?'; // *** to be used in the main pattern
// path: segment(/segment)* with an optional directory or file ending
$rx_ppart = '\\w+';
$rx_phead = $rx_ppart;
$rx_pnext = '[/]'.$rx_ppart;
$rx_ptail = '('.$rx_pnext.')*';
$rx_pdend = '[/]';                                  // directory ending
$rx_pfend = '[.]html|[.]php|[.]gif|[.]jpg|[.]png';  // file ending
$rx_pend  = '('.$rx_pdend.'|'.$rx_pfend.')?';       // fixed: statement ended with ':' instead of ';'
$rx_rpath = $rx_phead.$rx_ptail.$rx_pend;           // *** to be used in the main pattern
// fixed: use $rx_pend so that a trailing slash is still accepted after a prefix
$rx_qpath = $rx_phead.$rx_ptail.$rx_pend.$rx_query; // *** to be used in the main pattern
// host
$rx_hpart = '\\w+';
$rx_hhead = $rx_hpart;
$rx_hnext = '[.]'.$rx_hpart;
$rx_htail = '('.$rx_hnext.')*';
$rx_top   = '[.]com'; // I suggest replacing this by $rx_top = $rx_hnext;
$rx_host  = $rx_hhead.$rx_htail.$rx_top; // *** to be used in the main pattern
// protocol
$rx_protocol = '(https?:[/][/])?'; // *** to be used in the main pattern
// prefix
$rx_prefix = $rx_protocol.$rx_host;
// **** full pattern ****
// fixed: no ';' after the first line, otherwise the concatenation below is a syntax error
$rx_url = '^('.$rx_prefix.'[/]?'
          .'|'.$rx_prefix.'[/]'.$rx_qpath
          .'|'.$rx_prefix.$rx_query
          .'|'.$rx_rpath
          .'|'.'[.][/]'.$rx_rpath
          .')$';
// preg_match() expects delimiters around the pattern; '~' does not occur in it, e.g.
// $is_valid = preg_match('~'.$rx_url.'~', $url) === 1;
Note: use single-quoted strings so that PHP does not interpret characters such as $ or backslash escape sequences inside the pattern.
[/EDIT3]


Cheers
Andi
 
Comments
EZW 16-Feb-14 20:36pm    
That is smaller, but I get an unknown modifier '?' error... if I put '/' at either end, the error changes to unknown modifier ']'. Thanks for the help :D
EZW 11-Apr-14 2:07am    
This works now, though slightly modified since the parentheses were mismatched:

(^((https?:[/][/])?\w+[.])+com|(((https?:[/][/])?\w+[.])+com[/]|[.][/])?\w+([/]\w+)*([/]|[.]html|[.]php|[.]gif|[.]jpg|[.]png)?)$

Now I have another problem (I didn't foresee it then :/ ): I need the regex to allow a query string.
Andreas Gieriet 11-Apr-14 3:07am    
Any list of examples?
Adding a query string has to be done in the prefix, after the host. E.g. <query> = ([?]\w+=\w*(&\w+=\w*)*)?
See my updated solution above.
Cheers
Andi
PS: I've corrected my pattern. It did indeed have a problem with the parentheses.
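
The `<query>` fragment from this comment can be tried in isolation before it is spliced into the full pattern. This is a small sketch in Python (for illustration only, not the PHP or .NET code from the thread); `QUERY` and `is_query` are hypothetical names, and the outer group is made non-capturing.

```python
import re

# <query> = ([?]\w+=\w*(&\w+=\w*)*)?  from the comment, with non-capturing groups.
QUERY = r"(?:[?]\w+=\w*(?:&\w+=\w*)*)?"

def is_query(text):
    """True if text is empty or a well-formed ?name=value(&name=value)* string."""
    return re.fullmatch(QUERY, text) is not None
```

Note that `\w*` limits values to letters, digits and underscores, so a value containing `-` or `%20` would be rejected by this fragment as written.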
EZW 11-Apr-14 22:39pm    
I get:

Warning: preg_match(): Unknown modifier ']'

:(
Andreas Gieriet 12-Apr-14 10:40am    
Somehow I missed that you want it for PHP. I did wonder why it worked for me but not for you. My solution is for .NET (e.g. C#), not for PHP. I assume it is similar, but it might differ in details.
Cheers
Andi
You might simply try this:

C#
Uri.IsWellFormedUriString(YourURLString, UriKind.RelativeOrAbsolute)


See MSDN
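
A parser-based check like this would still need to enforce the http/https-only rule from the question. The sketch below is a rough Python analogue using `urllib.parse.urlsplit`; it is not equivalent to `Uri.IsWellFormedUriString` and only checks the scheme. The function name `has_allowed_scheme` is hypothetical.

```python
from urllib.parse import urlsplit

def has_allowed_scheme(url):
    """True if url has no scheme (relative or bare host) or uses http/https."""
    return urlsplit(url).scheme in ("", "http", "https")
```

This covers only the scheme restriction; the path and file-extension rules from the question would need a separate check.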
 
Comments
CHill60 1-May-19 6:04am    
Except URI and URL are not the same thing - the latter is only a sub-set of the former
I tried to post an answer several times, but the preview kept making me think the posted text was being modified.
I deleted those attempts to try again and then got blocked, so this is almost what I was trying to post:
^((https?:\/\/)?([a-z0-9]+\.?)*[a-z0-9]+\.(com|php)(\/([a-z0-9]+\/)+)?|(\.\/)?([a-z0-9]+\/)+[a-z0-9]*$|(\.\/)?([a-z0-9]+\/)+)([a-z0-9]+\.(html?|php|png|jpg|gif))?$


If it is not displaying or working properly, go to https://regex101.com/r/jOvcz0/1 instead.
It shows the unedited expression, with notes at the bottom on my guesses about what should not match.
If anybody has advice on the best pre tags to use with regex, it would be very much appreciated.

I don't know much about .html or query strings, but if samples are posted I'm certain the experts can help modify the pattern.
It is unfortunate that getting a truthful preview was about 1000x harder than trying to answer the question.
I have finally learned not to look at that last preview after clicking "Submit your solution".
 
Comments
Richard Deeming 7-Sep-21 11:01am    
How does that pattern differ from Peter Leow's solution (solution 1, posted February 2014)?

If you're going to post a new solution to such an old question, you need to make sure you're adding something to the discussion, and you need to clearly explain why your solution is better than the existing ones.
[no name] 7-Sep-21 12:18pm    
First and foremost, I never saw the date, nor did I say it was better.
How is it different?

1: Forward slashes are escaped for PHP.
2: No superfluous www matching, no redundant file-extension matching.
3: It won't match simple text strings like "abcdefg" or empty lines.
4: Blah, blah, no point. I'm not one to criticize, just trying to help.

My apologies for trying to help another human being; it won't happen again.
Not on this site. I would delete it, but maybe it can help someone else.
Don't worry, this account will be deleted as soon as I'm done thanking someone.
Richard Deeming 7-Sep-21 12:31pm    
If you update your solution to explain how it differs from the existing solutions, and why someone might choose yours over any of the others, then it could be a valid solution, regardless of the age of the question.

Just dumping another regex without explanation, particularly when accompanied by an off-topic rant about the functionality of the site, is not a good solution.

And responding to constructive criticism with a threat to "rage-quit" is not a good approach to life.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


