I'm not very familiar with Python. I'm scraping a school site for each school's address, phone number, email, and link. I scraped the JSON data, and one of the keys holds all of these values, as shown below.
This is the output I receive under the key "address":
<br>title='École privée'/>
17 rue Jean Gallart <br>49650 ALLONNES #This is the address
<br>Téléphone : <a href="tel:0241528201">0241528201</a> #Phone no
<br>Adresse de courriel : <a href="mailto:ce.0491164Z@ac-nantes.fr">ce.0491164Z@ac-nantes.fr</a> # Email
<br><br><a href="./etablissement/Allonnes/ECOLE-PRIMAIRE-PRIVEE-SAINT-DOUCELIN/0491164Z.html"> #Link for school
Everything was on a single line; I formatted it for clarity and removed unnecessary font tags.
I want to extract these items in the following format:
Address : 17 rue Jean Gallart 49650 ALLONNES
Telephone: 0241528201
Email: ce.0491164Z@ac-nantes.fr
Link for school = https://etablissement/Allonnes/ECOLE-PRIMAIRE-PRIVEE-SAINT-DOUCELIN/0491164Z.html
What I have tried:
I tried extracting the email using a regex:
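import re
# re.IGNORECASE so that uppercase letters, such as the "Z" before the @, are matched
emails = re.findall(r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+", address, flags=re.IGNORECASE)
print(emails)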
I tried extracting the href attributes:
from bs4 import BeautifulSoup

soup = BeautifulSoup(address, 'lxml')
for anchor in soup.find_all("a"):
    print(anchor.attrs)
I got output that is close:
{'href': 'tel:0241528201'}
{'href': 'mailto:ce.0491164Z@ac-nantes.fr'}
{'href': './etablissement/Allonnes/ECOLE-PRIMAIRE-PRIVEE-SAINT-DOUCELIN/0491164Z.html'}
How can I extract these items one by one into separate variables so that I can easily save them to a CSV file?
Thanks in advance
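For reference, here is a rough sketch of one possible direction, not a definitive solution. It uses only the standard library (html.parser instead of BeautifulSoup) and sorts each href by its prefix: "tel:" for the phone, "mailto:" for the email, anything else for the school link. Several things here are assumptions and not taken from the real site: the names AddressParser and extract_fields, the placeholder BASE_URL, and the cleaned-up sample string (the stray title='École privée'/> fragment is left out). It also assumes the address is the text that comes before the "Téléphone" label.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: placeholder site root; replace it with the real base URL.
BASE_URL = "https://example.invalid/"


class AddressParser(HTMLParser):
    """Collect every <a href> plus the text that precedes the first link."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.leading_text = []
        self._seen_link = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._seen_link = True
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

    def handle_data(self, data):
        # Only keep text that appears before the first <a> tag.
        if not self._seen_link:
            self.leading_text.append(data)


def extract_fields(html, base_url=BASE_URL):
    """Return a dict with address, phone, email and link (hypothetical helper)."""
    parser = AddressParser()
    parser.feed(html)
    parser.close()

    record = {"address": "", "phone": "", "email": "", "link": ""}

    # The text before the first link still ends with the "Téléphone :" label,
    # so cut there and collapse the whitespace left behind by the <br> tags.
    text = " ".join(parser.leading_text).split("Téléphone")[0]
    record["address"] = " ".join(text.split())

    for href in parser.hrefs:
        if href.startswith("tel:"):
            record["phone"] = href[len("tel:"):]
        elif href.startswith("mailto:"):
            record["email"] = href[len("mailto:"):]
        else:
            # Relative link: resolve it against the (assumed) base URL.
            record["link"] = urljoin(base_url, href)
    return record


# Assumption: a cleaned-up version of the "address" value from the question.
sample = (
    '17 rue Jean Gallart <br>49650 ALLONNES'
    '<br>Téléphone : <a href="tel:0241528201">0241528201</a>'
    '<br>Adresse de courriel : '
    '<a href="mailto:ce.0491164Z@ac-nantes.fr">ce.0491164Z@ac-nantes.fr</a>'
    '<br><br><a href="./etablissement/Allonnes/'
    'ECOLE-PRIMAIRE-PRIVEE-SAINT-DOUCELIN/0491164Z.html">'
)

record = extract_fields(sample)
print(record)

# Write to an in-memory buffer here; a file opened with
# open(path, "w", newline="", encoding="utf-8") should work the same way.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(record))
writer.writeheader()
writer.writerow(record)
print(buffer.getvalue())
```

Each value ends up under its own key (record["address"], record["phone"], and so on), so with many schools the dicts could be collected in a list and passed to writer.writerows. If the real markup differs from the sample, the split on "Téléphone" is the part most likely to need adjusting.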