I'm trying to create a little script that will grab photos from a specific site ("ilike-photo.com") and save them to Google Drive. But this site uses a technique called lazy loading: more images are dynamically loaded as the user scrolls down. One way to work around this is to call window.scroll() repeatedly until no more content is loaded, and only then grab the images, but this technique is slow and ugly because the user actually sees the page being scrolled. Is there a way to force the dynamic content to load while keeping the scrollbar (and the page) at the top?

All I can think of is faking a scroll event somehow, but I'm not sure that's possible.
Maybe there is a way to find the function that listens for the scroll event and call it manually?
The last option I can think of is running a headless browser on a server and having it serve my requests, but the problem is that I don't have a server. :)
Any other suggestions?
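For what it's worth, the "fake a scroll event" idea above is not impossible in principle: in the browser, window is an EventTarget, so a synthetic "scroll" event can be dispatched programmatically without moving the scrollbar. A minimal sketch (a plain EventTarget stands in for window so it runs outside a browser; the lazy-loading handler is hypothetical, and note that real lazy loaders often also check the actual scroll position inside the handler, so a bare synthetic event may not be enough):

```javascript
// A stand-in for window; in a real page you would use window itself.
const target = new EventTarget();
let loadedBatches = 0;

// Hypothetical lazy-loading handler registered by the page.
target.addEventListener("scroll", () => { loadedBatches += 1; });

// Fire the event programmatically -- no visible scrolling occurs.
target.dispatchEvent(new Event("scroll"));
target.dispatchEvent(new Event("scroll"));

console.log(loadedBatches); // 2 -- the handler ran twice
```

If the page's handler reads window.scrollY (or similar) to decide whether to load more, the synthetic event alone would do nothing; that is why inspecting the actual HTTP traffic, as the answer below suggests, is usually the more robust route.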

1 solution

The problem is not so simple, and I don't think it can be solved without looking at the site you are trying to scrape. I have no idea how exactly the lazy-loading technique you describe is implemented, but I'm sure it can be implemented in several different ways, and those differences would require different scraping approaches. Only one aspect of the difference matters: in all cases, scrolling causes additional HTTP requests, and the data related to the scroll event (say, the scroll position, the page number, or something like that) can be passed in the HTTP request in different ways: HTTP parameters, URL parameters, etc.

So, you need to study this and act accordingly. How? Here is the approach I would use:

Use some existing HTTP spy software and then try to reach the full content manually, by loading the page and scrolling. Such HTTP spying tools are often available as plug-ins for Web browsers. I, for example, use HttpFox, a plug-in for Mozilla browsers. With tracking turned on, it lists all the HTTP requests and responses passing through the browser, with all the detail needed to understand how to do the scraping.
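Once the spy tool reveals the endpoint the page calls when it scrolls, the same requests can be issued directly, batch by batch, with no visible scrolling at all. A sketch of the idea, with an entirely hypothetical endpoint and parameter name (substitute whatever the spy tool actually shows for this site, and note the response format is assumed to be JSON):

```javascript
// Hypothetical lazy-load endpoint discovered with an HTTP spy tool.
function lazyLoadUrl(page) {
  const url = new URL("http://ilike-photo.com/ajax/photos"); // made up
  url.searchParams.set("page", String(page));                // made up
  return url.toString();
}

// Fetch each batch until the server returns an empty one.
// (fetch() is available in modern browsers and in Node 18+.)
async function fetchAllPhotos() {
  const photos = [];
  for (let page = 1; ; page++) {
    const resp = await fetch(lazyLoadUrl(page));
    const batch = await resp.json(); // assumed JSON array of photo records
    if (batch.length === 0) break;
    photos.push(...batch);
  }
  return photos;
}

console.log(lazyLoadUrl(3)); // http://ilike-photo.com/ajax/photos?page=3
```

The actual request may instead carry an offset, a timestamp, or a token copied from the page; the spy log is the only reliable way to find out which.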

—SA
Comments
yno1 14-Sep-14 16:39pm    
Thank you very much!
I will try HttpFox, play with it, see what I get, and post an update soon!
Sergey Alexandrovich Kryukov 14-Sep-14 19:07pm    
You can do this research on a particular site using any spy software, whichever is convenient for you.
This approach helped me in nearly all cases.

Please try it and consider accepting the answer formally (green "Accept" button).
—SA
yno1 16-Sep-14 11:08am    
I was playing around with HttpFox and I was able to find the GET request the browser uses to load the images, but how can I find out what prepared the request?
It seems like the request is dynamically prepared... Is there a tool for finding out things like that?
Sergey Alexandrovich Kryukov 16-Sep-14 11:15am    
I don't know such tools, but you may not need them...

Now, theoretically speaking, an explicitly calculated HTTP request could be anything, even random and hence unpredictable in principle. In connection with that, I have already explained in some of my previous answers that not only may the problem of "scraping the whole Web site" be unsolvable, it may not even make sense. Example: a game implemented on the server side...

—SA
yno1 16-Sep-14 11:46am    
How does HttpFox intercept the GET request? Don't you see how absurd it is? HttpFox can give me exactly the string I need, with all the image IDs... If HttpFox can do this, why can't I?

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)