Is there a difference between the following two bs4 objects?

Python
from urllib2 import urlopen, Request
from bs4 import BeautifulSoup

req1 = Request("https://stackoverflow.com/")  # HTTPS
html1 = urlopen(req1).read()

req2 = Request("http://stackoverflow.com/")  # HTTP
html2 = urlopen(req2).read()

bsObj1 = BeautifulSoup(html1, "html.parser")
bsObj2 = BeautifulSoup(html2, "html.parser")

Do you really need to specify the protocol (HTTP vs. HTTPS) in the URL?

What I have tried:

I tried comparing the two objects, but that didn't seem to work.
Updated 15-Mar-18 9:49am
Comments
Richard MacCutchan 14-Mar-18 9:56am    
"didn't seem to work"
What exactly does that mean?

1 solution

The first one will make a secure request.

The second one will make an insecure request, which should automatically be redirected to the secure version, but only because StackOverflow have configured their site to do that.

Using the first version will avoid the extra request, and any possibility of data being sent over an insecure connection. It will also remove any question of whether your code library handles the redirection transparently.
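To see whether the redirect actually happened, you could compare the URL you requested with the URL the response finally came from. The following is only a rough sketch, written for Python 3 (where `urllib.request` replaces the `urllib2` module used in the question); the helper names `fetch` and `scheme_changed` are made up for illustration and are not part of any library.

```python
# Python 3 sketch; urllib.request replaces the urllib2 module from the question.
from urllib.parse import urlsplit
from urllib.request import Request, urlopen


def fetch(url):
    """Hypothetical helper: return (final_url, body) once urllib has followed any redirects."""
    response = urlopen(Request(url))
    try:
        # geturl() reports the URL the response was actually retrieved from.
        return response.geturl(), response.read()
    finally:
        response.close()


def scheme_changed(requested_url, final_url):
    """Hypothetical helper: True if the request ended on a different scheme than it started with."""
    return urlsplit(requested_url).scheme != urlsplit(final_url).scheme
```

For example, `final_url, html = fetch("http://stackoverflow.com/")` followed by `scheme_changed("http://stackoverflow.com/", final_url)` should tell you whether the server sent you from HTTP to HTTPS, assuming the site still redirects that way.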

In other words: specifying the HTTP protocol (rather than HTTPS) saves you typing one character, at the expense of introducing a potential security problem into your application.
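If the URLs come from somewhere you don't control, one possible approach is to rewrite `http://` to `https://` before making the request. This is a minimal sketch using only the Python 3 standard library; `force_https` is a hypothetical name, and it assumes the target site really does serve the same content over HTTPS.

```python
from urllib.parse import urlsplit, urlunsplit


def force_https(url):
    """Hypothetical helper: return the URL with an http scheme upgraded to https."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        # Only the scheme is swapped; host, path, query and fragment are kept.
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)
```

You would then pass `force_https(url)` to `Request` instead of the raw URL, so the first request already goes over the secure connection.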
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


