Whisper: An open-source model for speech recognition now available in Galaxy!

What is Whisper? What does it do?

Whisper is an automatic speech recognition (ASR) system developed by OpenAI and trained on 680,000 hours of multilingual, multitask supervised data collected from the web. It is an open-source, general-purpose model that can recognize and transcribe speech in 108 languages.

Whisper can generate output in five formats: plain text, JSON, SubRip (SRT), WebVTT (VTT), and tab-separated values (TSV). It can also detect the language of the speech automatically.
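
To make the difference between the two subtitle formats concrete, here is a minimal Python sketch that renders timed transcript segments as SubRip and WebVTT. It is an illustration only, not Whisper's own writer code: the helper names (format_timestamp, to_srt, to_vtt) and the sample segments are made up for this example, and the segment layout (a list of dictionaries with "start" and "end" in seconds plus "text") is an assumption about what a transcription result looks like.

```python
from typing import Dict, List, Union

Segment = Dict[str, Union[float, str]]


def format_timestamp(seconds: float, decimal_marker: str = ".") -> str:
    """Render seconds as HH:MM:SS<marker>mmm (hypothetical helper)."""
    total_ms = round(seconds * 1000)
    hours, total_ms = divmod(total_ms, 3_600_000)
    minutes, total_ms = divmod(total_ms, 60_000)
    secs, millis = divmod(total_ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{millis:03d}"


def to_srt(segments: List[Segment]) -> str:
    """SubRip: numbered cues, comma as the millisecond separator."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(float(seg["start"]), ",")
        end = format_timestamp(float(seg["end"]), ",")
        cues.append(f"{index}\n{start} --> {end}\n{str(seg['text']).strip()}\n")
    return "\n".join(cues)


def to_vtt(segments: List[Segment]) -> str:
    """WebVTT: 'WEBVTT' header, dot as the millisecond separator."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        start = format_timestamp(float(seg["start"]))
        end = format_timestamp(float(seg["end"]))
        lines.extend([f"{start} --> {end}", str(seg["text"]).strip(), ""])
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample segments, only for demonstration.
    sample = [
        {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
        {"start": 2.5, "end": 5.0, "text": " This is a test recording."},
    ]
    print(to_srt(sample))
    print(to_vtt(sample))
```

The same segments could be written out as plain text (only the text fields), JSON (the full structure), or TSV (one row per segment), which is why one transcription run can serve all five output choices.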

How to use the tool?

Please follow these steps:

a) Upload your audio or video file to Galaxy.

b) Open the Speech to Text tool suite on Galaxy Europe.

c) Select the model to use for transcription, then either choose the language of the speech or use automatic language detection.

d) Select the output format and run the tool to transcribe your speech data.
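
For readers who also want to try Whisper outside Galaxy, the choices made in steps c) and d) correspond roughly to options of the open-source whisper command-line tool. The following Python sketch assembles such a command; it is an assumption about how the choices map, not a description of how the Galaxy tool is implemented. The function build_whisper_command and the file name interview.mp3 are hypothetical, and the sketch only builds the argument list without running anything.

```python
from typing import List, Optional

# Output formats named in this post, using the short names the whisper CLI accepts.
OUTPUT_FORMATS = {"txt", "json", "srt", "vtt", "tsv"}


def build_whisper_command(
    audio_path: str,
    model: str = "base",
    language: Optional[str] = None,
    output_format: str = "txt",
) -> List[str]:
    """Assemble a whisper CLI call (hypothetical helper, does not run it)."""
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output format: {output_format!r}")
    command = ["whisper", audio_path, "--model", model,
               "--output_format", output_format]
    # Leaving out --language lets Whisper detect the language automatically.
    if language is not None:
        command += ["--language", language]
    return command


if __name__ == "__main__":
    # Automatic language detection, subtitles in SubRip format.
    print(" ".join(build_whisper_command("interview.mp3", output_format="srt")))
    # Explicit language, WebVTT output.
    print(" ".join(build_whisper_command("interview.mp3", language="German",
                                         output_format="vtt")))
```

Passing the resulting list to subprocess.run would start the transcription, provided the whisper package is installed locally.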