I have not found any Python-based solution for scraping JavaScript-rendered pages, which I find odd given how widely Python is used for web scraping at scale.

So far, every solution I have found either relies on a headless browser, which performs poorly, or is no longer maintained.

What I have tried:

I've tried requests-html, which is buggy, didn't work for me, and also uses headless Chromium.

My current approach is to use PyQt5, but the data I want to scrape is behind a login, and I don't know how to pass a session from requests to the PyQt5 QtWebEngine page.
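import sys

from PyQt5.QtCore import QUrl
from PyQt5.QtWebEngineWidgets import QWebEnginePage
from PyQt5.QtWidgets import QApplication


class Client(QWebEnginePage):
    def __init__(self, url):
        # A QApplication must exist before any QtWebEngine object is created
        self.app = QApplication(sys.argv)
        QWebEnginePage.__init__(self)
        self.html = ""
        self.loadFinished.connect(self.on_load_finished)
        self.load(QUrl(url))
        # Block here until on_html_ready() stops the event loop
        self.app.exec_()

    def on_load_finished(self, ok):
        # toHtml() is asynchronous: it returns nothing and hands the HTML to the callback
        self.toHtml(self.on_html_ready)
        print("Load Finished")

    def on_html_ready(self, data):
        self.html = data
        self.app.quit()


url = "https://www.highcharts.com/demo/line-basic"
client_response = Client(url)
print(client_response.html)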
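One possible way to hand a `requests` session to QtWebEngine, assuming the site keeps the login state in cookies (the post does not confirm this): copy each cookie from the `requests.Session` into the cookie store of QtWebEngine's default profile before loading the page. `QWebEngineProfile.defaultProfile().cookieStore().setCookie()` and `QNetworkCookie` are existing PyQt5 APIs; the helper names `cookie_fields` and `copy_cookies_to_webengine` are hypothetical. The Qt imports sit inside the helper so the plain conversion step can be checked without PyQt5 or a display.

```python
from http.cookiejar import CookieJar
from typing import List, Tuple

# (name, value, domain, path, secure) for a single cookie
CookieFields = Tuple[str, str, str, str, bool]


def cookie_fields(jar: CookieJar) -> List[CookieFields]:
    """Flatten a cookie jar into plain tuples.

    requests.Session().cookies is a RequestsCookieJar, which subclasses
    http.cookiejar.CookieJar, so it can be passed in directly.
    """
    return [
        (c.name, c.value or "", c.domain, c.path or "/", bool(c.secure))
        for c in jar
    ]


def copy_cookies_to_webengine(jar: CookieJar) -> None:
    """Hypothetical helper: add every cookie to QtWebEngine's default profile.

    Call it after the QApplication exists and before page.load().
    """
    # Imported here so cookie_fields() can be used without PyQt5 installed
    from PyQt5.QtCore import QUrl
    from PyQt5.QtNetwork import QNetworkCookie
    from PyQt5.QtWebEngineWidgets import QWebEngineProfile

    store = QWebEngineProfile.defaultProfile().cookieStore()
    for name, value, domain, path, secure in cookie_fields(jar):
        cookie = QNetworkCookie(name.encode(), value.encode())
        cookie.setDomain(domain)
        cookie.setPath(path)
        cookie.setSecure(secure)
        # The origin URL tells the store which site the cookie belongs to
        scheme = "https" if secure else "http"
        store.setCookie(cookie, QUrl(f"{scheme}://{domain.lstrip('.')}"))
```

Usage, under the same assumptions: log in with `session.post(...)` against the site's (hypothetical here) login URL, then inside `Client.__init__` call `copy_cookies_to_webengine(session.cookies)` right after `QWebEnginePage.__init__(self)` and before `self.load(...)`. A page created without an explicit profile uses the default profile, so it should see these cookies. Expiry and HttpOnly flags are not copied in this sketch.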
Comments
Afzaal Ahmad Zeeshan 2-Oct-22 10:58am    
Maybe you can fill in the username and password inside the page (if you can execute JavaScript there) and create a new session in the browser itself. That way, you do not need to pass the session around on each run.

But that might be considered spam by the server.
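The comment's suggestion could look roughly like this, assuming the login form can be found with the CSS selectors `input[name=username]`, `input[name=password]` and `form` (all hypothetical placeholders; inspect the real page to find them). `QWebEnginePage.runJavaScript()` is the PyQt5 call that executes a script in the loaded page; `json.dumps` quotes the values so a password containing quotes cannot break the generated script. The names `build_login_script` and `log_in` are illustrative only.

```python
import json


def build_login_script(username: str, password: str,
                       user_sel: str = "input[name=username]",
                       pass_sel: str = "input[name=password]",
                       form_sel: str = "form") -> str:
    """Return JavaScript that fills in and submits a login form.

    The default selectors are hypothetical placeholders.
    """
    return (
        f"document.querySelector({json.dumps(user_sel)}).value = {json.dumps(username)};"
        f"document.querySelector({json.dumps(pass_sel)}).value = {json.dumps(password)};"
        f"document.querySelector({json.dumps(form_sel)}).submit();"
    )


def log_in(page, username: str, password: str) -> None:
    """Run the login script in an already-loaded QWebEnginePage.

    If the site stores the session in a cookie, it stays in the page's
    profile, so later page.load() calls should be authenticated.
    """
    page.runJavaScript(build_login_script(username, password))
```

Submitting the form starts a new navigation, so `loadFinished` fires a second time; the `Client` class in the question would need a flag to tell the login page apart from the page loaded after logging in.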

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


