I am using Selenium and Scrapy to navigate to a table of data, and I would like to extract the links (the href values) to a CSV file. So far nothing I have tried works, and I'm unsure what to try next or how to go about getting the links.

Here's the important part of the table I am trying to extract the links/href from:

HTML
<tr class="even">

<td class="paddingColumnValue"> </td>

<td class="nameColumnValue"><a href="/m/app?service=external/sdata_details&sp=12812" class="sdata" title="Click here for additional details.">click</a></td>

<td class="amountColumnValue">$600,000.00</td>

<td class="myListColumnValue"><a href="" onclick="doMyListButton(this.firstChild.getAttribute('src'),this.name);myListHandler(this.name);return false;" onmouseover="return true" name="12812"><img src="/m/images/add.gif" border="0" title="Click to add this to your list" name="A12812"></a></td>


</tr>
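In this row the link sits on the anchor with class "sdata" inside td.nameColumnValue, and its href is a relative path. The other anchor in the row (the "add to list" button) has an empty href and no class. A minimal sketch that pulls only the sdata hrefs out of a row like this, using Python's standard-library html.parser rather than Scrapy or Selenium (the class and function names are hypothetical):

```python
from html.parser import HTMLParser


class SdataLinkParser(HTMLParser):
    """Collect the href of every <a> whose class list contains "sdata"."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # class may hold several space-separated names; match "sdata" exactly.
        classes = (attributes.get("class") or "").split()
        href = attributes.get("href")
        if "sdata" in classes and href:
            self.links.append(href)


def extract_sdata_links(html):
    parser = SdataLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Trimmed copy of the row from the question.
row = """
<tr class="even">
<td class="nameColumnValue"><a href="/m/app?service=external/sdata_details&sp=12812" class="sdata" title="Click here for additional details.">click</a></td>
<td class="amountColumnValue">$600,000.00</td>
<td class="myListColumnValue"><a href="" name="12812"><img src="/m/images/add.gif" name="A12812"></a></td>
</tr>
"""

print(extract_sdata_links(row))
# ['/m/app?service=external/sdata_details&sp=12812']
```

In Scrapy the same selection is usually written as a CSS selector; since Selenium is rendering the page here, one option is Selector(text=self.driver.page_source).css("td.nameColumnValue a.sdata::attr(href)").getall(), which returns the raw (relative) href strings.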


The closest I've gotten to actually getting data is with this code (note: the table id is search_results):
import time
from scrapy.item import Item, Field
from selenium import webdriver
from scrapy.spider import BaseSpider
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector

class ElyseAvenueItem(Item):
    link = Field()

class ElyseAvenueSpider(BaseSpider):
    name = "elyse"
    allowed_domains = ["domain.com"]
    start_urls = ['http://www.domain.com']
    
    def __init__(self):
        self.driver = webdriver.Firefox()

    def parse(self, response):
        self.driver.get(response.url)
        el1 = self.driver.find_element_by_xpath("//*[@id='headerRelatedLinks']/ul/li[5]/a")
        el1.click()
        time.sleep(2)
        el2 = self.driver.find_element_by_xpath("/html/body/form/table/tbody/tr[2]/td[2]/table/tbody/tr/td[3]/p[3]/a[1]")
        if el2:
            el2.click()
            time.sleep(2)
        el3 = self.driver.find_element_by_xpath("/html/body/form/table/tbody/tr[2]/td[2]/table[1]/tbody/tr/td[3]/a")
        if el3:
            el3.click()
            time.sleep(20)

            titles = self.driver.find_elements_by_class_name("sdata")
            items = []
            for title in titles:
                item = ElyseAvenueItem()
                item["link"] = title.find_element_by_xpath("//*[@id='search_results']/tbody/tr[2]/td[2]/a")
                items.append(item)
                return item


Output written to the CSV:
<selenium.webdriver.remote.webelement.WebElement object at 0x03f16e90>
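That line is the text representation of a Selenium WebElement: the item's link field holds the element object returned by find_element_by_xpath, not a string, so the exporter writes the object itself. Two other details in the loop matter as well. An XPath that starts with // searches the whole document even when it is called on an element (a path relative to the element has to start with .//), and return item inside the for loop ends parse after the first row. A minimal sketch of the loop that reads the href attribute and yields one item per link, assuming driver is a Selenium WebDriver; it uses find_elements(by, value), since the find_elements_by_* helpers were deprecated and later removed in Selenium 4. The function name collect_links is hypothetical and a plain dict stands in for ElyseAvenueItem:

```python
def collect_links(driver):
    """Yield one {"link": href} dict per <a class="sdata"> on the current page.

    Assumption: driver is a Selenium WebDriver (or any object with the same
    find_elements(by, value) and get_attribute(name) methods). A plain dict
    stands in for ElyseAvenueItem; the function name is hypothetical.
    """
    # "class name" is the string value of selenium's By.CLASS_NAME.
    for anchor in driver.find_elements("class name", "sdata"):
        # Store the attribute string, not the element: a WebElement in the
        # item is what ends up in the CSV as an object repr.
        href = anchor.get_attribute("href")
        if href:
            # yield (instead of return) keeps the loop going for every row.
            yield {"link": href}
```

Inside parse, the items list and the for loop would then become: for item in collect_links(self.driver): yield item. Note that Selenium's get_attribute("href") normally returns the resolved absolute URL; get_dom_attribute("href") returns the raw relative value from the markup.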

Thank you for the help. I can post more of my attempts and their output if that would help. As I said, what I need is the href.
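If the hrefs are collected as the raw relative paths seen in the HTML (for example /m/app?service=external/sdata_details&sp=12812), they can be made absolute with urllib.parse.urljoin before being written out. A minimal standard-library sketch, assuming http://www.domain.com (the placeholder from start_urls) is the base URL and a single "link" column; write_links_csv is a hypothetical helper, not part of Scrapy, whose own feed exports (scrapy crawl elyse -o links.csv) are the usual way to get a CSV from yielded items:

```python
import csv
import io
from urllib.parse import urljoin

# Assumption: placeholder domain taken from start_urls in the question.
BASE_URL = "http://www.domain.com/"


def write_links_csv(hrefs, out, base_url=BASE_URL):
    """Write a "link" header, then one absolute URL per non-empty href."""
    writer = csv.writer(out)
    writer.writerow(["link"])
    for href in hrefs:
        if href:  # skip empty hrefs such as the "add to list" anchor
            writer.writerow([urljoin(base_url, href)])


buffer = io.StringIO()
write_links_csv(["/m/app?service=external/sdata_details&sp=12812", ""], buffer)
print(buffer.getvalue())
```

For a real file, open it with newline="" so the csv module controls line endings: with open("links.csv", "w", newline="") as f: write_links_csv(hrefs, f).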
Posted; updated 25-Jul-13 10:29am (v2)

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900