

Essentially, the VAD-based approach works by detecting continuous sections of speech using Silero VAD, and then (for performance reasons) merging sections into chunks of up to 30 seconds when they are 5 seconds or less apart. Next, I try to split each chunk so that it includes about 1 second of padding before and after, to ensure that Whisper is able to detect words at the beginning and end of each chunk. I also pass the previously detected text as a prompt if that text is close enough (the prompt window is up to 3 seconds by default). Finally, Whisper is run on each chunk, and the output is automatically merged into one single transcript.

The downside is that Whisper might be less accurate when transitioning between chunks, but for Japanese this is certainly more than worth the trade-off, since by default Whisper cannot handle more than a couple of minutes of audio before its text and timings start to degrade.

This is enough to almost completely fix the issues with Japanese text, and I've even been able to run Whisper on 7+ hour videos with no major issues, for instance a 07:21:20 video by Korone on YouTube. You can view this transcript directly on YouTube using the Substital add-on. There are still a few repeated lines, but these are hallucinations that occur during silent periods. Other than that, the transcript is actually usable, as opposed to the result of running Whisper on the whole audio at once.
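The chunking steps described above (merge VAD speech sections that are at most 5 seconds apart into chunks of up to 30 seconds, pad each chunk by about 1 second, and reuse nearby earlier text as a prompt) can be sketched in plain Python. This is a minimal illustration under assumptions, not the WebUI's actual code: speech sections are assumed to arrive as sorted, non-overlapping time ranges in seconds (in practice they would come from Silero VAD), and the names `Span`, `merge_sections`, `pad_chunks` and `select_prompt` are hypothetical. The resulting prompt string could then be handed to Whisper through its `initial_prompt` option.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Span:
    """A time range in seconds (hypothetical helper type)."""
    start: float
    end: float


def merge_sections(sections: Sequence[Span], max_gap: float = 5.0,
                   max_length: float = 30.0) -> List[Span]:
    """Greedily merge sorted, non-overlapping speech sections into chunks.

    A section joins the current chunk if it starts at most `max_gap` seconds
    after the chunk ends and the merged chunk stays within `max_length`
    seconds. A single section longer than `max_length` is kept as-is.
    """
    chunks: List[Span] = []
    for sec in sorted(sections, key=lambda s: s.start):
        if chunks:
            last = chunks[-1]
            if sec.start - last.end <= max_gap and sec.end - last.start <= max_length:
                last.end = max(last.end, sec.end)
                continue
        # Copy so the caller's input spans are never mutated.
        chunks.append(Span(sec.start, sec.end))
    return chunks


def pad_chunks(chunks: Sequence[Span], padding: float = 1.0,
               total_duration: Optional[float] = None) -> List[Span]:
    """Add up to `padding` seconds before and after each chunk.

    Padding is clamped to the audio bounds and never extends into the
    speech of a neighboring chunk.
    """
    limit = total_duration if total_duration is not None else float("inf")
    padded: List[Span] = []
    for i, chunk in enumerate(chunks):
        lower = chunks[i - 1].end if i > 0 else 0.0
        upper = chunks[i + 1].start if i + 1 < len(chunks) else limit
        padded.append(Span(max(lower, chunk.start - padding),
                           min(upper, chunk.end + padding)))
    return padded


def select_prompt(previous: Sequence[Tuple[float, str]], chunk_start: float,
                  window: float = 3.0) -> Optional[str]:
    """Join earlier segment texts that end within `window` seconds of the chunk.

    `previous` holds (end_time, text) pairs from already transcribed chunks.
    Returns None when nothing is close enough, meaning no prompt is used.
    """
    texts = [text for end, text in previous if chunk_start - end <= window]
    return " ".join(texts) if texts else None
```

For example, sections at 0 to 4 s, 6 to 10 s, 20 to 25 s and 27 to 40 s become two chunks, 0 to 10 s and 20 to 40 s, which padding (for 40 seconds of audio) turns into 0 to 11 s and 19 to 40 s. Merging only nearby sections keeps the number of Whisper calls down; 30 seconds also happens to be the length of the audio window Whisper's model processes at once.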
# English to Japanese with voice: movie
Just take a look at the transcript for the Macross Frontier movie as an example:

I've been tinkering with my WebUI since the public release of Whisper, and I think I've found a solution using Silero VAD that dramatically improves the accuracy of both the text and the timings of long Japanese transcripts. Previously, I was able to avoid some of Whisper's problems with long Japanese audio by manually splitting the original movie into 10-minute chunks, running Whisper on each chunk, and then merging the resulting transcripts into one long transcript (SRT).
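The manual workaround amounts to cutting the audio into fixed 10-minute windows, transcribing each window separately, and shifting every segment's timestamps by its window's start time before concatenating. A minimal sketch of that bookkeeping, assuming segments are held in memory as hypothetical `(start, end, text)` tuples in seconds rather than read from SRT files, and leaving out the actual audio cutting and Whisper calls:

```python
from typing import List, Sequence, Tuple

# (start, end, text), with times in seconds relative to the window start
Segment = Tuple[float, float, str]


def fixed_windows(duration: float, window: float = 600.0) -> List[Tuple[float, float]]:
    """Split [0, duration) into consecutive windows of `window` seconds."""
    windows = []
    start = 0.0
    while start < duration:
        windows.append((start, min(start + window, duration)))
        start += window
    return windows


def merge_transcripts(results: Sequence[Tuple[float, Sequence[Segment]]]) -> List[Segment]:
    """Shift each window's segments by its offset and merge them in time order."""
    merged = [(start + offset, end + offset, text)
              for offset, segments in results
              for start, end, text in segments]
    merged.sort(key=lambda seg: seg[0])
    return merged
```

Unlike the VAD-based chunks, these fixed windows ignore where speech starts and stops, so a boundary can fall in the middle of a word.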
# English to Japanese with voice: download
It also supports more accurate transcripts for languages other than English by using a VAD. You can also use the CLI version, which is identical to the Whisper CLI except that it also accepts URLs instead of file paths and lets you specify a VAD. There's also support for parallel execution on multiple GPUs using the `--auto_parallel True` option (see the README for more information). Also note that it's relatively easy to host this WebUI on Google Colab if you don't have enough GPU horsepower to run it locally. You can even download the containers directly from GitLab (see the README for more information).
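As a rough illustration of how chunk-level work could be spread over several GPUs, the sketch below assigns chunks to devices round-robin, runs one worker per device, and puts the results back in the original order so they can be merged into a single transcript. It is not the WebUI's own implementation of its parallel option: the device names and the `work` callback (which in practice would transcribe one chunk with a model loaded on the given device) are hypothetical, and threads are used only to keep the example self-contained; a real setup might well use one process per GPU.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_on_devices(items: Sequence[T], devices: Sequence[str],
                   work: Callable[[str, T], R]) -> List[R]:
    """Process `items` with one worker per device and keep the input order.

    Assumes at least one device and distinct device names. Items are
    assigned round-robin, and each device handles its own items one after
    another, so a device never runs two items at the same time.
    """
    buckets: Dict[str, List[Tuple[int, T]]] = {device: [] for device in devices}
    for index, item in enumerate(items):
        buckets[devices[index % len(devices)]].append((index, item))

    def worker(device: str) -> List[Tuple[int, R]]:
        return [(index, work(device, item)) for index, item in buckets[device]]

    by_index: Dict[int, R] = {}
    with ThreadPoolExecutor(max_workers=len(devices)) as pool:
        for pairs in pool.map(worker, devices):
            by_index.update(pairs)
    # Reassemble in the original chunk order for merging.
    return [by_index[i] for i in range(len(items))]
```

Keeping each device's items sequential matters because a single GPU usually has only enough memory for one model instance at a time.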

# English to Japanese with voice: free
I've found Whisper to be an incredible free tool for transcribing audio, so I've made my own WebUI that integrates with YT-DLP to transcribe YouTube videos directly, and makes it easy to download a transcript or an SRT/VTT subtitle file.
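Producing the downloadable subtitle files mostly comes down to formatting segment timestamps. A minimal sketch, assuming segments are hypothetical `(start, end, text)` tuples in seconds: SRT numbers each cue and writes times as `HH:MM:SS,mmm`, while WebVTT starts with a `WEBVTT` header and uses a period before the milliseconds.

```python
from typing import Iterable, Tuple

# (start, end, text), with times in seconds from the start of the audio
Segment = Tuple[float, float, str]


def format_timestamp(seconds: float, decimal_marker: str = ",") -> str:
    """Format seconds as HH:MM:SS<marker>mmm (',' for SRT, '.' for VTT)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Render numbered SRT cues separated by blank lines."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{i}\n{format_timestamp(start)} --> "
                      f"{format_timestamp(end)}\n{text.strip()}\n")
    return "\n".join(blocks)


def to_vtt(segments: Iterable[Segment]) -> str:
    """Render a WebVTT file: header, blank line, then one cue per segment."""
    lines = ["WEBVTT", ""]
    for start, end, text in segments:
        lines.append(f"{format_timestamp(start, '.')} --> {format_timestamp(end, '.')}")
        lines.append(text.strip())
        lines.append("")
    return "\n".join(lines)
```

For example, `format_timestamp(26480.5)` returns `07:21:20,500`.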
