SRT to speech: audio stretching problem

Question

Here is my dilemma:

I want to translate a video from English to French using an AI-generated voice. I find an English video on YouTube, download it, and extract its subtitles. I then translate the subtitles into French and use a text-to-speech AI to synthesize audio from the translated SRT file. So far, everything works.

Next, I open my video editing software, import the original video, and mute its audio. I then add the translated French audio. However, the generated French audio runs longer than the original English audio, so it no longer lines up with the video. This is where my problem lies.

My first idea was to use "audio stretching." However, the videos are long, and applying it by hand to every phrase would be too time-consuming.

Since I have some knowledge of Python, I am trying to write a program that applies audio stretching to synchronize the translated French audio with the original audio, using the translated French SRT file as the timing reference. My attempts so far consistently produce errors. I have also tried ChatGPT, but I have not been able to solve the problem.

In short, I want to stretch the French audio produced by the text-to-speech AI so that it matches the original audio, using the translated French SRT file as the reference, entirely in Python. I am using Visual Studio Code as my development environment. Could you provide Python code that does this, or, if that is not possible, suggest an alternative solution?
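The timing side of this idea can be sketched with the standard library alone. The sketch below assumes (this is not stated in the question) that the text-to-speech step can produce one clip per subtitle cue and that each clip's length in milliseconds is known. The names `cue_slots` and `tempo_factors` are hypothetical helpers. A factor above 1 means the clip has to be sped up to fit its cue; a factor below 1 means it could be slowed down or padded with silence.

```python
import re

# Matches an SRT timing line such as "00:01:02,500 --> 00:01:05,000".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def to_ms(h, m, s, ms):
    """Convert SRT time fields to milliseconds, hours included."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def cue_slots(srt_text):
    """Return a list of (start_ms, end_ms), one per cue found in the SRT text."""
    slots = []
    for match in TIMING.finditer(srt_text):
        fields = match.groups()
        slots.append((to_ms(*fields[:4]), to_ms(*fields[4:])))
    return slots


def tempo_factors(slots, clip_lengths_ms):
    """Return one tempo factor per cue: clip length divided by slot length."""
    factors = []
    for (start, end), clip_ms in zip(slots, clip_lengths_ms):
        slot_ms = end - start
        # Guard against zero-length cues; leave such clips unchanged.
        factors.append(clip_ms / slot_ms if slot_ms > 0 else 1.0)
    return factors


example = """1
00:00:01,000 --> 00:00:03,000
Bonjour

2
01:00:00,000 --> 01:00:04,000
Au revoir
"""

slots = cue_slots(example)
# Hypothetical clip lengths: 3000 ms for cue 1, 2000 ms for cue 2.
print(tempo_factors(slots, [3000, 2000]))
```

Computing one factor per cue, rather than a single factor for the whole file, keeps each phrase aligned with its own subtitle even when the French and English lengths differ unevenly across the video.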

What I have tried:

Here is the code I wrote, which did not work:

from pydub import AudioSegment
import pysrt

# Load the audio files
print("Loading audio files...")
audio_fr = AudioSegment.from_file("1.mp3", format="mp3")
audio_en = AudioSegment.from_file("He Refused To Become A Hero, So The Gods Cursed Him And He Became A Villain - Manhwa Recap (online-video-cutter.com).mp3")

# Load the subtitle file
print("Loading subtitle file...")
subtitles = pysrt.open("11.srt")

# Adjust the audio according to the subtitles
print("Adjusting audio according to the subtitles...")
adjusted_audio_fr = AudioSegment.silent(duration=0)  # Start with empty audio

for i, subtitle in enumerate(subtitles):
    # SubRipTime.ordinal is the timestamp in milliseconds, hours included
    # (the earlier minute/second arithmetic dropped the hours component)
    start_ms = subtitle.start.ordinal
    end_ms = subtitle.end.ordinal

    # Slice the audio at the subtitle boundaries (this cuts, it does not stretch)
    segment = audio_fr[start_ms:end_ms]
    adjusted_audio_fr += segment

    # Report progress
    print(f"Step {i + 1}/{len(subtitles)} - Subtitle: {subtitle.text}")

# Save the modified audio file
print("Saving the modified audio file...")
adjusted_audio_fr.export("chemin/vers/audio_fr_adjusted.mp3", format="mp3")

print("Done!")
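Note that the loop above only cuts the French audio at the subtitle timestamps and concatenates the pieces, so nothing is actually stretched. One possible way to change tempo without changing pitch is ffmpeg's `atempo` audio filter, called from Python. The sketch below is an illustration, not a tested solution: it assumes ffmpeg is installed and on the PATH, the file names are placeholders, and `atempo_chain`, `build_command`, and `stretch_clip` are hypothetical helpers. Older ffmpeg builds limit a single `atempo` to the range 0.5 to 2.0, so the sketch chains several filters to stay inside that range.

```python
import subprocess


def atempo_chain(factor):
    """Build an ffmpeg filter string whose atempo steps multiply to `factor`."""
    if factor <= 0:
        raise ValueError("tempo factor must be positive")
    steps = []
    # Split large speed-ups into steps of at most 2.0.
    while factor > 2.0:
        steps.append(2.0)
        factor /= 2.0
    # Split large slow-downs into steps of at least 0.5.
    while factor < 0.5:
        steps.append(0.5)
        factor /= 0.5
    steps.append(factor)
    return ",".join(f"atempo={step:.4f}" for step in steps)


def build_command(src, dst, factor):
    """Return the ffmpeg argument list that writes `src` to `dst` at the new tempo."""
    return ["ffmpeg", "-y", "-i", src, "-filter:a", atempo_chain(factor), dst]


def stretch_clip(src, dst, factor):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_command(src, dst, factor), check=True)


# Placeholder file names; printing the command avoids needing ffmpeg here.
print(build_command("clip_001.mp3", "fit_001.mp3", 1.5))
```

Calling `stretch_clip` once per cue, with a factor equal to the clip length divided by the cue length, would give clips that fit their subtitle slots. Large factors will sound unnatural, so it may be worth capping the factor and shortening the translated text where the cap is hit.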

Posted 7-Jun-23 11:58am

Anime Insolite

Updated 7-Jun-23 12:38pm

Comments

[no name] 7-Jun-23 19:41pm

Use SSML and the "Prosody" settings.

https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-synthesis-markup-structure
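As a rough illustration of this suggestion, the speaking rate can be set at synthesis time by wrapping each subtitle's text in an SSML `prosody` element, so the clip comes out closer to the right length and needs little or no stretching afterwards. The sketch below only builds the SSML string. The helper name `prosody_ssml` is hypothetical, the voice name is a placeholder to replace with whatever the chosen service offers, and the accepted range for `rate` depends on the service.

```python
from xml.sax.saxutils import escape


def prosody_ssml(text, factor, voice="fr-FR-DeniseNeural", lang="fr-FR"):
    """Wrap `text` in SSML that asks for a speaking rate of `factor` times normal."""
    # SSML accepts a relative rate such as "+20%" (faster) or "-10%" (slower).
    percent = (factor - 1.0) * 100.0
    rate = f"{percent:+.0f}%"
    return (
        f'<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang="{lang}">'
        f'<voice name="{voice}">'
        f'<prosody rate="{rate}">{escape(text)}</prosody>'
        "</voice></speak>"
    )


# Ask for speech about 20% faster than the default rate.
print(prosody_ssml("Bonjour & bienvenue", 1.2))
```

The text is passed through `escape` because characters such as `&` and `<` in a subtitle would otherwise produce invalid XML.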

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)