Hi Team

I am struggling to scrape multiple web pages and translate them from English to Hindi; 100 pages in total need to be translated. The application runs without errors, but I can't get the scraped pages translated into Hindi.

What I have tried:

// Scraping the main page using Python
# Author: Gcobani Mkontwana
# Date: 27/02/2023
# Script scrapes the website using the requests and BeautifulSoup libraries

from google_translate import browser
from google_translate import selenium
import requests
from bs4 import BeautifulSoup
URL = "https://www.classcentral.com/?"
headers = {'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246"}
# Here the user agent is for Edge browser on windows 10. You can find your browser user agent from the above given link.
r = requests.get(url=URL, headers=headers)
print(r.content)
# Parsing the HTML
soup = BeautifulSoup(r.content, 'html.parser')
# find all the anchor tags with "href"
for link in soup.find_all('a'):
    print(link.get('href'))


// Translating the page text with Google Translate
# Author: Gcobani Mkontwana
# Date: 27/02/2023
# Script translates text into Hindi by driving the Google Translate website with Selenium

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
import selenium
# Give Language code in which you want to translate the text:=>
lang_code = 'hi '

# Provide text that you want to translate:=>
input1 = " Find your next course.Class Central aggregates courses from many providers to help you find the best courses on almost any subject, wherever they exist"

# launch browser with selenium:=>
browser = webdriver.Chrome() #browser = webdriver.Chrome('path of chromedriver.exe file') if the chromedriver.exe is in different folder

# copy google Translator link here:=>
browser.get("https://translate.google.co.in/?sl=auto&tl="+lang_code+"&text="+input1+"&op=translate")

# just wait for some time for translating input text:=>
time.sleep(6)

# Given below x path contains the translated output that we are storing in output variable:=>
output1 = browser.find_element(By.CLASS_NAME,'HwtZe').text

# Display the output:=>
print("Translated Paragraph:=> " + output1)
Updated 27-Feb-23 0:33am

1 solution

Your imports are incorrect. Rather than running the steps inline, create a function and call it. I wrote the following function and have reused it in a few projects. Adjust it to your needs, keeping in mind that you might want the function again in another project.

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def translate_english_to_hindi(text_to_translate):
    driver = None
    try:
        # Launch a Chrome browser window and open Google Translate.
        # The sl/tl query parameters select English as source and Hindi as target.
        driver = webdriver.Chrome()
        driver.get('https://translate.google.com/?sl=en&tl=hi&op=translate')

        # Find the source text box and the result box.
        # Note: these XPath selectors depend on Google Translate's page markup
        # and may need updating if the page layout has changed.
        source_box = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//textarea[@id="source"]')))
        target_box = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//div[@id="gt-res-dir-ctr"]//textarea[@id="gt-res-dir-ctr"]')))

        # Enter the text to translate (for example, your input1) into the source box and submit it
        source_box.send_keys(text_to_translate)
        source_box.send_keys(Keys.RETURN)

        # Wait up to 10 seconds for the translation to appear, then extract the Hindi text
        translation_box = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//span[@class="tlid-translation translation"]')))
        hindi_translation = translation_box.text

        return hindi_translation

    except TimeoutException:
        print("Error: the translation was not found within 10 seconds")
        return None

    except Exception as e:
        print("Error:", e)
        return None

    finally:
        # Always close the browser window, even if an error occurred
        if driver is not None:
            driver.quit()
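
A side note on the URL-based approach in the question: that script builds the translate URL by joining raw strings, so spaces and characters such as & in the text go into the URL unescaped, and the language code 'hi ' carries a trailing space. The following is only a sketch of how the same URL could be built with the standard library's urlencode. The base URL and the parameter names (sl, tl, text, op) are copied from the question; the function name build_translate_url is made up for illustration.

```python
from urllib.parse import urlencode

# Base URL and parameter names (sl, tl, text, op) are taken from the question's script.
TRANSLATE_BASE_URL = "https://translate.google.co.in/"


def build_translate_url(text, target_lang="hi", source_lang="auto"):
    """Return a translate URL with every parameter value URL-encoded.

    build_translate_url is a hypothetical helper, not part of any library.
    """
    params = {
        "sl": source_lang.strip(),
        "tl": target_lang.strip(),  # strip() guards against values such as 'hi '
        "text": text.strip(),
        "op": "translate",
    }
    # urlencode escapes spaces, '&', '?' and non-ASCII characters in the values.
    return TRANSLATE_BASE_URL + "?" + urlencode(params)


url = build_translate_url(" Find your next course. ", target_lang="hi ")
print(url)
```

The resulting string can be passed to browser.get(...) in place of the concatenated URL.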


To call the function translate_english_to_hindi, use the following:
input1 = "Find your next course. Class Central aggregates courses from many providers to help you find the best courses on almost any subject, wherever they exist"
hindi_translation = translate_english_to_hindi(input1)

if hindi_translation is not None:
    print(hindi_translation)
else:
    print("Translation failed")
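
Since the question is about translating up to 100 scraped pages rather than one sentence, here is a rough sketch of how the page URLs could be collected and how a translate function could be applied to a batch of texts. It is an illustration under assumptions, not a tested scraper: it uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra packages, and the names LinkCollector, collect_page_urls, translate_all and MAX_PAGES, as well as the sample paths, are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag

MAX_PAGES = 100  # the question mentions 100 pages in total


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def collect_page_urls(html, base_url, limit=MAX_PAGES):
    """Return up to `limit` unique absolute http(s) URLs found in `html`."""
    parser = LinkCollector()
    parser.feed(html)
    urls = []
    for href in parser.hrefs:
        if len(urls) >= limit:
            break
        # Resolve relative links against the page URL and drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute.startswith(("http://", "https://")) and absolute not in urls:
            urls.append(absolute)
    return urls


def translate_all(texts, translate, limit=MAX_PAGES):
    """Apply `translate` to at most `limit` texts and return the results in order."""
    return [translate(text) for text in texts[:limit]]


# Sample HTML with made-up paths, standing in for r.content from the question.
sample_html = '<a href="/page-1">One</a> <a href="/page-1">Again</a> <a href="/page-2">Two</a>'
print(collect_page_urls(sample_html, "https://www.classcentral.com/"))

# str.upper is only a stand-in here; in real use pass translate_english_to_hindi.
print(translate_all(["first text", "second text"], translate=str.upper))
```

Keep in mind that translate_english_to_hindi starts a new browser for every call, so for 100 pages it may be worth opening the browser once and reusing it.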
 
Comments
Gcobani Mkontwana 27-Feb-23 11:00am    
@Andre Oosthuizen thanks for correcting me. One last question: where should I make the call shown in this last piece of code? Should it go before the exception handlers or before the return of the translation? Please advise so I can test and report back.
Andre Oosthuizen 28-Feb-23 4:05am    
Pleasure. Please read all of my solution; the call is made here: hindi_translation = translate_english_to_hindi(input1)

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


