Click here to Skip to main content
15,886,791 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
Hello guys,

I have a problem with the following:

I testing Python with selenium web driver.

I need download data of different web page, the schema is the same in all pages, the difference is in the URL, since the last value is variable, it can be a number between 1 and 100. These URLs are found in the a directory within a text file.

So, is there any way to review all those URLs and extract the data from each of them?

NOTE: The web pages is dynamic and are updated every five minutes with js and json.

But I get the following:

C:\Users\JDani\AppData\Local\Programs\Python\Python37-32\python.exe C:/Users/JDani/.PyCharmCE2019.1/config/scratches/scratch_7.py 
1 None , None 2 None , None 3 None , None 4 None , None 5 None


Thanks in advance

What I have tried:

''' Try with following code '''
baseurl = requests.get('http://mi.dominio.net/Report?server=xxx.xxx.xxx.xx&PC=')
valid_url = '1,2,3,4,5'

for n in range(len(valid_url)):
    url = f'{baseurl}{valid_url[n]}'
    driver.get(url)

    print(url)          
    print(pages.title)

''' after save data in text file '''


This is what I have, could you help me or any suggest?

<pre lang="Python"><pre>from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import os
from pandas import ExcelWriter
import pandas as pd

mipath = "C:/test"

desired_capabilities = DesiredCapabilities.CHROME
desired_capabilities["pageLoadStrategy"] = "none"
driver = webdriver.Chrome('/Users/JDan/Documents/Proyect/chromedriver/chromedriver.exe')
wait = WebDriverWait(driver, 20)

''' Try with following code '''
baseurl = requests.get('http://mi.dominio.net/Report?server=xxx.xxx.xxx.xx&PC=')
valid_url = '1,2,3,4,5'

for n in range(len(valid_url)):
    url = f'{baseurl}{valid_url[n]}'
    driver.get(url)

    print(url)          
    print(pages.title)

''' after save data in text file '''
try:
    wait.until(EC.presence_of_all_elements_located((By.CLASS_NAME, "CutterValue")))
    '''os.stat(mipath)'''
except TimeoutException:
    print('Nope')
else:
    '''os.mkdir(mipath)'''
    driver.execute_script("window.stop();")
    content = driver.find_elements_by_class_name("CutterValue")
    codes = [element.text for element in content]
    '''Save data in Text file'''
    file = open(mipath + "/mytext.txt", "a")
    file.write('\n' + str(codes))
    file.close()

    '''print(codes)'''
    driver.close()

'''Save data in Excel file'''

df = pd.DataFrame(codigos)
writer = ExcelWriter('./a.xlsx')

df.to_excel(writer,'Sheet1')
writer.save()
Posted
Updated 6-Jun-19 8:03am
v3

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900