My code:

Python
import os
import re

desktop = os.path.join(os.path.join(os.environ['USERPROFILE']), 'Desktop') 

pattern = re.compile(r'((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:@\-_=#]+\.([a-zA-Z]){2,6}([a-zA-Z0-9\.\&\/\?\:@\-_=#])*')
urlsFile = open(desktop + "/random/in.txt", "r")
outFile = open(desktop + "/random/out.txt", "w")
urlsDict = {}

for linein in urlsFile.readlines():
    match = pattern.search(linein)
    url = match.groups()
    domain = url[3]
    urlsDict[domain] = linein

outFile.write("".join(urlsDict.values()))

urlsFile.close()
outFile.close()


When I run this code, it gives me this error:
Terminal
line 13, in <module>
    url = match.groups()
AttributeError: 'NoneType' object has no attribute 'groups'


What I have tried:

Python
url = match.string()


But it didn't work. I searched many sites but couldn't find a solution.
Please help.
Posted
Updated 9-Aug-22 20:49pm

1 solution

If the match fails, search returns None: see the re module page ("Regular expression operations") in the Python 3.10.6 documentation.
Your code never checks the return value, so when a line contains no match, match is None and the call to match.groups() fails, which is the correct behaviour.
Try:
Python
match = pattern.search(linein)
if match:
    url = match.groups()
    domain = url[3]
    urlsDict[domain] = linein
else:
    # No URL was found on this line: skip it, or log it for inspection.
    ...
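
One more thing to watch for: match.groups() returns the four capture groups of your pattern, so url[3] is the last character captured by the trailing repetition, not the domain. A minimal sketch of one way around that is below, assuming that you want one line per host name. It swaps in urllib.parse.urlsplit from the standard library to pull out the host. The names domain_of and dedupe_by_domain are made up for this example and are not part of your code.

```python
import re
from urllib.parse import urlsplit

# Same pattern as in the question.
pattern = re.compile(r'((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:@\-_=#]+\.([a-zA-Z]){2,6}([a-zA-Z0-9\.\&\/\?\:@\-_=#])*')


def domain_of(url_text):
    """Hypothetical helper: return the host part of a URL-like string."""
    # urlsplit only recognises the host when a scheme or a leading "//" is present.
    if "://" not in url_text:
        url_text = "//" + url_text
    return urlsplit(url_text).hostname


def dedupe_by_domain(lines):
    """Hypothetical helper: keep the last line seen for each host name."""
    kept = {}
    for line in lines:
        match = pattern.search(line)
        if match is None:
            continue  # no URL on this line, so there is nothing to key on
        domain = domain_of(match.group(0))  # group(0) is the whole matched text
        if domain:
            kept[domain] = line
    return kept


sample = ["http://example.com/a\n", "\n", "no url here\n", "example.com/b\n"]
result = dedupe_by_domain(sample)
print("".join(result.values()))
```

The blank line and the line without a URL are skipped instead of raising AttributeError, and both example.com lines collapse into one entry because they share a host.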
 
 
Comments
Jon-Nut 10-Aug-22 3:11am    
I think the problem is that my pattern is not matching some of the URLs.
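
To check that, it may help to list exactly which lines the pattern fails on. The following is only a sketch: report_unmatched is a hypothetical helper name and the sample lines are invented for illustration. Blank lines are left out of the report, although they would also trigger the original error.

```python
import re

# Same pattern as in the question.
pattern = re.compile(r'((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:@\-_=#]+\.([a-zA-Z]){2,6}([a-zA-Z0-9\.\&\/\?\:@\-_=#])*')


def report_unmatched(lines):
    """Hypothetical helper: return (line_number, text) for non-blank lines with no match."""
    misses = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if text and pattern.search(text) is None:
            misses.append((number, text))
    return misses


# Invented sample input; in practice pass the lines read from in.txt.
sample = ["http://example.com/a\n", "\n", "localhost:8000/admin\n", "example.org\n"]
misses = report_unmatched(sample)
for number, text in misses:
    print(f"line {number}: no match for {text!r}")
```

In this sample only the localhost line is reported, because the pattern requires a dot followed by two to six letters.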

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


