Here is my dilemma:

I want to dub an English video into French with an AI-generated voice. I find an English video on YouTube, download it, and extract its subtitles. I then translate the subtitles from English to French and run the resulting SRT file through an AI text-to-speech service. So far, everything works.

Next, I import the original video into my video editing software, mute its audio, and add the French audio. However, the generated French speech runs longer than the original English audio, so it no longer lines up with the video. This is where my problem lies.
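To see how large the mismatch is for each subtitle, the slot length from the SRT timestamps can be compared with the length of the French speech for that cue. The following is a minimal sketch using only the standard library. It assumes each cue is synthesized as its own clip whose duration is known; the function names and the sample numbers are hypothetical.

```python
import re

# Matches SRT timestamps such as "00:01:02,500" (hours:minutes:seconds,millis).
TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def srt_time_to_ms(stamp: str) -> int:
    """Convert an SRT timestamp to milliseconds, including the hours field."""
    match = TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def overrun_ms(start: str, end: str, clip_ms: int) -> int:
    """Milliseconds by which a clip exceeds its subtitle slot (negative if it fits)."""
    slot_ms = srt_time_to_ms(end) - srt_time_to_ms(start)
    return clip_ms - slot_ms


# Hypothetical cue: a 2.5 s slot and a French clip that lasts 3.2 s.
print(overrun_ms("01:00:01,000", "01:00:03,500", 3200))  # 700
```

A positive result means the clip has to be shortened (or allowed to spill into the following silence) to stay in sync.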

One solution I am considering is "audio stretching": changing the duration of each phrase so that it fits its time slot. However, the videos are long, and applying this by hand to every phrase in the video editor would be far too time-consuming.

Since I know some Python, I would like to write a program that applies audio stretching automatically to synchronize the translated French audio with the original audio, using the translated French SRT file as a reference. I have tried to write this code, but I consistently run into errors. I have also tried ChatGPT, but it has not solved the problem.

In short, I want to apply audio stretching to the French audio file generated by the text-to-speech AI so that its timing matches the original audio, using the translated French SRT file as a reference. All of this should be done in Python; I use Visual Studio Code as my development environment. Could you provide Python code that does this? If that is not possible, could you suggest an alternative solution?
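One possible way to automate the stretching is ffmpeg's `atempo` audio filter, which changes tempo without changing pitch. The sketch below only builds the command line and does not run it. It assumes ffmpeg is installed and that each cue has its own clip file; the file names and function names are hypothetical. Each `atempo` instance is kept between 0.5 and 2.0 and chained for larger factors, which is a conservative choice rather than a requirement of every ffmpeg version.

```python
import shlex


def tempo_factor(clip_ms: int, slot_ms: int) -> float:
    """Playback-speed factor that makes a clip last slot_ms (> 1.0 speeds it up)."""
    if clip_ms <= 0 or slot_ms <= 0:
        raise ValueError("durations must be positive")
    return clip_ms / slot_ms


def atempo_chain(factor: float) -> str:
    """Express a factor as chained atempo filters, each within 0.5..2.0."""
    parts = []
    while factor > 2.0:
        parts.append(2.0)
        factor /= 2.0
    while factor < 0.5:
        parts.append(0.5)
        factor /= 0.5
    parts.append(factor)
    return ",".join(f"atempo={p:.4f}" for p in parts)


def ffmpeg_command(src: str, dst: str, clip_ms: int, slot_ms: int) -> str:
    """Build (but do not run) an ffmpeg command that fits src into slot_ms."""
    chain = atempo_chain(tempo_factor(clip_ms, slot_ms))
    return shlex.join(["ffmpeg", "-y", "-i", src, "-filter:a", chain, dst])


# Hypothetical clip: 3.2 s of French speech that must fit a 2.5 s slot.
print(ffmpeg_command("cue_0001.mp3", "cue_0001_fit.mp3", 3200, 2500))
```

The resulting string could be executed with `subprocess.run(shlex.split(command), check=True)`. Large factors tend to sound unnatural, so it may be worth capping the speed-up and shortening the translated text instead when a cue needs more than a modest change.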

What I have tried:

Here is the code I tried, which did not work:
from pydub import AudioSegment
import pysrt

# Load the audio files
print("Loading audio files...")
audio_fr = AudioSegment.from_file("1.mp3", format="mp3")
audio_en = AudioSegment.from_file("He Refused To Become A Hero, So The Gods Cursed Him And He Became A Villain - Manhwa Recap (online-video-cutter.com).mp3")

# Load the subtitle file
print("Loading subtitle file...")
subtitles = pysrt.open("11.srt")

# Adjust the audio according to the subtitles
print("Adjusting audio according to the subtitles...")
adjusted_audio_fr = AudioSegment.silent(duration=0)  # Start with empty audio

for i, subtitle in enumerate(subtitles):
    # Slice the audio at the subtitle's start and end times.
    # SubRipTime.ordinal is the timestamp in milliseconds, including hours.
    start_ms = subtitle.start.ordinal
    end_ms = subtitle.end.ordinal
    segment = audio_fr[start_ms:end_ms]
    adjusted_audio_fr += segment

    # Report progress
    print(f"Step {i + 1}/{len(subtitles)} - Subtitle: {subtitle.text}")

# Save the modified audio file
print("Saving the modified audio file...")
adjusted_audio_fr.export("chemin/vers/audio_fr_adjusted.mp3", format="mp3")

print("Done!")
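The loop above cuts the French audio at the subtitle timestamps and concatenates the pieces, so it trims audio rather than stretching it, and the silences between cues are lost. A different approach is to place each per-cue clip at its cue start on the timeline and speed it up only when it would collide with the next cue. The sketch below computes such a plan in plain Python. It assumes one French clip per cue with a known duration; `Placement`, `plan_timeline`, and the sample values are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Placement(NamedTuple):
    position_ms: int  # where the clip starts on the output timeline
    tempo: float      # 1.0 plays the clip unchanged; > 1.0 speeds it up


def plan_timeline(cues: List[Tuple[int, int]], clip_ms: List[int]) -> List[Placement]:
    """Place each clip at its cue start, speeding up only to avoid the next cue."""
    if len(cues) != len(clip_ms):
        raise ValueError("need one clip duration per cue")
    plan = []
    for i, ((start, end), length) in enumerate(zip(cues, clip_ms)):
        if i + 1 < len(cues):
            # The clip may use the silence between this cue and the next one.
            limit = cues[i + 1][0]
        else:
            # The last clip is allowed to run past its cue end unchanged.
            limit = max(end, start + length)
        window = limit - start
        if window <= 0:
            raise ValueError(f"cue {i} leaves no room before the next cue")
        plan.append(Placement(start, max(1.0, length / window)))
    return plan


# Hypothetical cues as (start_ms, end_ms) and measured French clip lengths.
cues = [(0, 2000), (3000, 5000), (5000, 7000)]
clips = [2600, 2500, 1800]
for placement in plan_timeline(cues, clips):
    print(placement)
```

With such a plan, the clips could then be laid onto a silent track of the video's length, for example with pydub's `AudioSegment.silent(duration=...)` and `overlay(clip, position=...)`, after each clip has been tempo-adjusted by its factor.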
Updated 7-Jun-23 12:38pm
Comments
[no name] 7-Jun-23 19:41pm    
Use SSML and the "Prosody" settings.

https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-synthesis-markup-structure
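As an illustration of the comment above, the speaking rate can be requested at synthesis time by wrapping each cue's text in an SSML `prosody` element, so the speech comes out shorter and needs little or no stretching afterwards. The sketch below only builds the SSML string. The helper name, the voice name `fr-FR-DeniseNeural`, and the percentage form of the `rate` value are assumptions that should be checked against the linked documentation for the service in use.

```python
from xml.sax.saxutils import escape


def ssml_for_cue(text: str, rate_percent: int, voice: str = "fr-FR-DeniseNeural") -> str:
    """Wrap one subtitle's text in SSML that asks for a relative speaking rate."""
    rate = f"{rate_percent:+d}%"  # for example "+25%" or "-10%"
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="fr-FR">'
        f'<voice name="{voice}">'
        f'<prosody rate="{rate}">{escape(text)}</prosody>'
        "</voice>"
        "</speak>"
    )


# Hypothetical cue that needs to be spoken about 25% faster to fit its slot.
print(ssml_for_cue("Bonjour & bienvenue", 25))
```

The rate for each cue could be derived from how much the default-speed clip overruns its subtitle slot, then the cue re-synthesized with that rate.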



