I have this exercise: identify which fields must be imported from the extracted URLs and what type of data must be saved for each of them, then filter the imported data by the appropriate categories. Write code that performs such an import. Below is my code, but it doesn't print anything. I'm a beginner.
from bs4 import BeautifulSoup
import requests

url = "https://www.senado.gov.co/index.php/el-senado/noticias"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")

# Get every news item on the page
noticias = soup.find_all("div", class_="articulo-contenido")
print(f"Found {len(noticias)} items")  # if this prints 0, the class name does not match the page's HTML

# Walk each item and pull out the fields of interest
for noticia in noticias:
    titulo = noticia.find("h2").text.strip()
    fecha = noticia.find("div", class_="fecha").text.strip()
    categoria = noticia.find("span", class_="categoria").text.strip()
    enlace = noticia.find("a")["href"]
    # Keep only news whose link points at an individual item page
    if enlace.startswith("https://www.senado.gov.co/index.php/el-senado/noticias/item/"):
        print(titulo, fecha, categoria, enlace)
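For the "what type of data must be saved" and "filter by categories" parts of the exercise, here is a minimal stdlib-only sketch of one way to do it: give each field an explicit type (the date parsed into a `datetime` so it can be sorted and compared, the rest as strings) and then filter the typed records by category. The sample rows and the category names are hypothetical stand-ins for whatever the scraping loop above actually yields from the site.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Noticia:
    titulo: str       # headline text
    fecha: datetime   # parsed date, so it can be sorted/filtered chronologically
    categoria: str    # category label used for filtering
    enlace: str       # absolute URL of the item

# Hypothetical scraped rows standing in for what the loop above would produce.
raw = [
    ("Titular de ejemplo 1", "2023-03-01", "Plenaria",
     "https://www.senado.gov.co/index.php/el-senado/noticias/item/1"),
    ("Titular de ejemplo 2", "2023-03-02", "Comisiones",
     "https://www.senado.gov.co/index.php/el-senado/noticias/item/2"),
]

noticias = [
    Noticia(t, datetime.strptime(f, "%Y-%m-%d"), c, e)
    for t, f, c, e in raw
]

# Filter the imported data by the category of interest.
plenaria = [n for n in noticias if n.categoria == "Plenaria"]
for n in plenaria:
    print(n.fecha.date(), n.titulo, n.enlace)
```

The same typed records can then be written to CSV or a database; parsing the date at import time (rather than keeping it as a raw string) is what makes later filtering and sorting reliable.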