Hello guys,
I have a problem with the following:
I am testing Python with Selenium WebDriver.
I need to download data from several web pages. The schema is the same on every page; the only difference is the URL, whose last value is variable and can be a number between 1 and 100. These URLs are stored in a text file inside a directory.
So, is there any way to go through all those URLs and extract the data from each of them?
NOTE: The web pages are dynamic and are updated every five minutes with JS and JSON.
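To make the goal more concrete, here is a minimal sketch of the URL handling I have in mind. The helper names `build_urls` and `read_urls`, and the one-URL-per-line layout of the text file, are only assumptions for illustration; the Selenium part is left out on purpose.

```python
from pathlib import Path

# Base address of my report pages; only the trailing PC value changes.
BASE_URL = "http://mi.dominio.net/Report?server=xxx.xxx.xxx.xx&PC="


def build_urls(base_url, first=1, last=100):
    """Hypothetical helper: one URL per PC number in the inclusive range first..last."""
    return [f"{base_url}{pc}" for pc in range(first, last + 1)]


def read_urls(path):
    """Hypothetical helper: read URLs from a text file, one per line, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


if __name__ == "__main__":
    for url in build_urls(BASE_URL, 1, 5):
        # driver.get(url) and the data extraction would go here
        print(url)
```

Either helper would give me a plain list of URLs that I can then visit one by one with the driver.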
When I run my script, I get the following output:
C:\Users\JDani\AppData\Local\Programs\Python\Python37-32\python.exe C:/Users/JDani/.PyCharmCE2019.1/config/scratches/scratch_7.py
1 None , None 2 None , None 3 None , None 4 None , None 5 None
Thanks in advance
What I have tried:
# Tried with the following code
baseurl = 'http://mi.dominio.net/Report?server=xxx.xxx.xxx.xx&PC='
valid_url = '1,2,3,4,5'
for pc in valid_url.split(','):
    url = f'{baseurl}{pc}'
    driver.get(url)
    print(url)
    print(driver.title)
# afterwards, save the data in a text file
This is what I have so far. Could you help me, or do you have any suggestions?
<pre lang="Python">from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import os
from pandas import ExcelWriter
import pandas as pd

mipath = "C:/test"
desired_capabilities = DesiredCapabilities.CHROME
desired_capabilities["pageLoadStrategy"] = "none"
driver = webdriver.Chrome('/Users/JDan/Documents/Proyect/chromedriver/chromedriver.exe')
wait = WebDriverWait(driver, 20)

# The base URL is a plain string; only the trailing PC value changes
baseurl = 'http://mi.dominio.net/Report?server=xxx.xxx.xxx.xx&PC='
valid_url = '1,2,3,4,5'
all_codes = []  # one list of values per page

for pc in valid_url.split(','):
    url = f'{baseurl}{pc}'
    driver.get(url)
    print(url)
    print(driver.title)
    try:
        # Wait until the dynamic content has been rendered
        wait.until(EC.presence_of_all_elements_located((By.CLASS_NAME, "CutterValue")))
        # os.stat(mipath)
    except TimeoutException:
        print('Nope')
    else:
        # os.mkdir(mipath)
        driver.execute_script("window.stop();")
        content = driver.find_elements(By.CLASS_NAME, "CutterValue")
        codes = [element.text for element in content]
        all_codes.append(codes)
        # Save the data in a text file
        with open(mipath + "/mytext.txt", "a") as file:
            file.write('\n' + str(codes))
        # print(codes)

driver.close()

# Save the data in an Excel file
df = pd.DataFrame(all_codes)
with ExcelWriter('./a.xlsx') as writer:
    df.to_excel(writer, sheet_name='Sheet1')
</pre>