I'm trying to scrape data into a CSV file from a website that lists contact information for people in my industry. My code works well until I get to a page where one of the entries doesn't have a specific item.
For example, I'm trying to collect three fields for each entry: Name, Phone, and Profile URL.
If an entry has no phone number listed, the page doesn't include the phone tag for it at all (not even an empty one), and my code fails with:
IndexError: list index out of range
The code I pasted below is what works for me across a few websites (with xpaths/urls changed obviously) as long as all fields exist on the page with relevant tags. But if one of the //div[contains(@class, "agent-phone")] tags isn't on one of the listings, it errors out.
from selenium import webdriver

driver = webdriver.Firefox()
MAX_PAGE_NUM = 23
MAX_PAGE_DIG = 2

# Start the CSV with a header row
with open('results.csv', 'w') as f:
    f.write("Name, Number, URL \n")

for i in range(1, MAX_PAGE_NUM + 1):
    # Zero-pad the page number to MAX_PAGE_DIG digits, e.g. 1 -> "01"
    page_num = (MAX_PAGE_DIG - len(str(i))) * "0" + str(i)
    website = "https://www.website.com/area/pg-" + page_num
    driver.get(website)

    # Three page-wide lists; Number is shorter than Name when a listing has no phone div
    Name = driver.find_elements_by_xpath('//div[contains(@class, "agent-name")]/a')
    Number = driver.find_elements_by_xpath('//div[contains(@class, "agent-phone")]')
    URL = driver.find_elements_by_xpath('//div[contains(@class, "agent-name")]/a')

    num_page_items = len(Name)
    for i in range(num_page_items):
        # Number[i] raises IndexError here once Number runs out
        print(Name[i].text.replace(",", ".") + "," + Number[i].text + "," + URL[i].get_attribute('href') + "\n")

    with open('results.csv', 'a') as f:
        for i in range(num_page_items):
            f.write(Name[i].text.replace(",", ".") + "," + Number[i].text + "," + URL[i].get_attribute('href') + "\n")

driver.close()
What I have tried:
I tried catching the IndexError and using continue. The problem is that the error doesn't occur until the end of the page, so the listing without a phone number just picks up the phone number from the next listing, and every phone number after it ends up on the wrong row.
num_page_items = len(Name)
with open('results.csv', 'a') as f:
    for i in range(num_page_items):
        try:
            f.write(Name[i].text.replace(",", ".") + "," + Number[i].text + "," + URL[i].get_attribute('href') + "\n")
            print(Name[i].text.replace(",", ".") + "," + Number[i].text + "," + URL[i].get_attribute('href') + "\n")
        except IndexError:
            f.write("Nothing, Nothing, Nothing \n")
            print("No element found")
            continue
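To make the misalignment concrete, here is a minimal sketch with plain Python lists standing in for the Selenium results. The names and numbers are made up, and no browser is involved; it only mirrors the try/except logic above.

```python
# Stand-ins for the Selenium results; all values here are made up.
names = ["Ann", "Bob", "Cy"]
numbers = ["555-0101", "555-0103"]  # Bob has no phone element, so this list is one short

rows = []
for i in range(len(names)):
    try:
        rows.append((names[i], numbers[i]))
    except IndexError:
        # Only reached at the last index, after the lists are already out of step
        rows.append((names[i], "Nothing"))

for row in rows:
    print(row)
```

In this sketch Bob ends up paired with Cy's number, and the placeholder lands on Cy, who actually has one.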
I'm trying to figure out how to check whether all of the elements exist for each entry. If one or more is missing, I'd like to either skip that entire entry or just put "Empty" in that cell of the CSV. I've tried various things with NoSuchElementException, but I can't get the exception to fire.
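For what it's worth, here is a rough sketch of the per-entry check I have in mind, not something I've verified against the site. It assumes each listing sits inside its own wrapper element; the "agent-card" class in the usage comment is a guess, and listing_row is just a name I made up. As far as I can tell, find_elements_by_xpath returns an empty list instead of raising when nothing matches, which may be why NoSuchElementException never fires for me.

```python
def listing_row(listing, missing="Empty"):
    """Build one CSV line from a single listing container (hypothetical helper).

    `listing` is anything with find_elements_by_xpath, e.g. a Selenium 3 WebElement.
    The leading "." in each XPath keeps the search inside this one listing.
    """
    def first(xpath):
        # find_elements_* gives back a list, which is empty when nothing matches
        found = listing.find_elements_by_xpath(xpath)
        return found[0] if found else None

    link = first('.//div[contains(@class, "agent-name")]/a')
    phone = first('.//div[contains(@class, "agent-phone")]')

    name = link.text.replace(",", ".") if link is not None else missing
    url = link.get_attribute("href") if link is not None else missing
    number = phone.text if phone is not None else missing
    return name + "," + number + "," + url + "\n"

# Hypothetical usage; "agent-card" is a guessed class name for the per-listing wrapper:
# for listing in driver.find_elements_by_xpath('//div[contains(@class, "agent-card")]'):
#     f.write(listing_row(listing))
```

Because every field is looked up inside one listing at a time, a missing phone number should only affect that listing's row instead of shifting the rest of the column.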
I'm fairly new to all this. Thanks in advance for any help.