[Enhancement]: Adding Transcription/Subtitle Viewing Support #1942

Open
opened 2026-02-20 10:14:14 -05:00 by deekerman · 1 comment

Originally created by @mfcar on GitHub (May 4, 2024).

Describe the feature/enhancement

Transcription/Subtitle support

Summary

Add initial support for transcriptions. Apple now supports transcriptions in podcasts.

Some audiobook files already include transcriptions, and tools based on Whisper (https://github.com/openai/whisper) can transcribe audio to text.

In fact, most Whisper-based software supports transcribing audio to text and exporting it as an SRT (https://en.wikipedia.org/wiki/SubRip) or VTT (https://en.wikipedia.org/wiki/WebVTT) file. VTT is the native subtitle format of the web, and SRT is a widely used subtitle format.
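For simple files, SRT and WebVTT differ mainly in the `WEBVTT` header and the millisecond separator (`,` vs `.`), so one option for the SRT task is to convert to VTT before display. A minimal TypeScript sketch, assuming well-formed SRT (numbered cue blocks separated by blank lines); `srtToVtt` is a hypothetical helper name, and SRT formatting tags or unusual timestamps are not handled:

```typescript
// Minimal SRT -> WebVTT conversion sketch (hypothetical helper, not project code).
// Assumes well-formed SRT: numbered cue blocks separated by blank lines,
// with timings like "00:01:02,345 --> 00:01:04,000".
function srtToVtt(srt: string): string {
  // Strip a UTF-8 BOM and normalize Windows/old Mac line endings to "\n".
  const text = srt.replace(/^\uFEFF/, "").replace(/\r\n?/g, "\n").trim();
  const out: string[] = ["WEBVTT", ""]; // WebVTT files must start with this header
  for (const block of text.split(/\n\s*\n/)) {
    if (block.trim() === "") continue;
    const lines = block.split("\n");
    // Drop the numeric SRT index line (WebVTT cue identifiers are optional).
    if (/^\d+$/.test(lines[0].trim())) lines.shift();
    if (lines.length === 0) continue;
    // SRT separates milliseconds with a comma; WebVTT requires a period.
    lines[0] = lines[0].replace(/(\d{2}:\d{2}:\d{2}),(\d{3})/g, "$1.$2");
    out.push(...lines, "");
  }
  return out.join("\n");
}
```

Converting once (for example when the file is scanned) would let the player handle a single format; parsing SRT directly is the alternative the task list leaves open.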

I'm creating this issue to discuss the best way to implement transcription support in the web player. I'm trying to implement some of these features in pull request #

Podcast transcription is supported by:

  • https://podcasting2.org/podcast-namespace/tags/transcript
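The Podcasting 2.0 `<podcast:transcript>` tag carries a required `url` and `type` (a MIME type such as `text/vtt`) plus optional `language` and `rel` attributes, so a client has to pull those out of each RSS item and choose one. The TypeScript sketch below is a naive, regex-based extractor with a hypothetical preference order (`TYPE_PRIORITY`, `pickTranscript`); it is not a real XML parser, and the SRT MIME strings listed are assumptions to check against the namespace spec:

```typescript
// Attributes of one <podcast:transcript> tag (Podcasting 2.0 namespace).
interface TranscriptRef {
  url: string;
  type: string; // MIME type, e.g. "text/vtt"
  language?: string;
  rel?: string; // e.g. "captions"
}

// Naive, regex-based extraction from one RSS <item> string. Not a real XML parser:
// no entity decoding (e.g. "&amp;" in URLs), CDATA, or alternate namespace prefixes.
function extractTranscripts(itemXml: string): TranscriptRef[] {
  const refs: TranscriptRef[] = [];
  const tagRe = /<podcast:transcript\b([^>]*?)\/?>/g;
  const attrRe = /([\w:-]+)\s*=\s*(["'])(.*?)\2/g;
  for (const tag of itemXml.matchAll(tagRe)) {
    const attrs: Record<string, string> = {};
    for (const a of tag[1].matchAll(attrRe)) attrs[a[1]] = a[3];
    // url and type are the required attributes; skip malformed tags.
    if (attrs.url && attrs.type) {
      refs.push({ url: attrs.url, type: attrs.type, language: attrs.language, rel: attrs.rel });
    }
  }
  return refs;
}

// Hypothetical preference order: timed formats first so lines can be highlighted.
// The SRT MIME strings are assumptions; check the namespace docs for the exact value.
const TYPE_PRIORITY = [
  "text/vtt",
  "application/x-subrip",
  "application/srt",
  "application/json",
  "text/html",
  "text/plain",
];

function pickTranscript(refs: TranscriptRef[], language?: string): TranscriptRef | undefined {
  const rank = (r: TranscriptRef) => {
    const i = TYPE_PRIORITY.indexOf(r.type);
    return i === -1 ? TYPE_PRIORITY.length : i;
  };
  // Prefer the requested language if any tag matches it; otherwise use all tags.
  const inLanguage = language ? refs.filter((r) => r.language === language) : [];
  const pool = inLanguage.length > 0 ? inLanguage : refs;
  return [...pool].sort((a, b) => rank(a) - rank(b))[0];
}
```

On the server side, a real XML parser (the feed is already parsed for other tags) would replace `extractTranscripts`; only the selection logic is meant as the reusable idea here.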

Possible tasks:

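  • Implement support for parsing and displaying VTT files (WebVTT: https://developer.mozilla.org/en-US/docs/Web/API/WebVTT_API). #2918
  • Implement support for parsing and displaying SRT files (convert to VTT or parse directly?).
  • Support the <podcast:transcript> tag in RSS feeds (Apple Docs: https://help.apple.com/itc/podcasts_connect/#/itcb54353390; Podcast Namespace: https://github.com/Podcastindex-org/podcast-namespace).
  • Handle CORS issues when fetching transcription files from external sources (Podcasting 2.0 CORS: https://github.com/Podcastindex-org/podcast-namespace/blob/main/podcasting2.0.md#cors).
  • Implement logic to read the transcription from a file with the same base name as the audio file. #2918
  • Define a priority order if both VTT and SRT files exist with the same name?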
  • When a podcast has a transcription tag, automatically download the transcription and store it in the file system (offline mode).
  • Add a visual indication in the UI that a podcast/audiobook has a transcription available?
  • Implement search within the transcription (ideally in a lateral panel)?
  • Implement a feature to highlight the current line of the transcription as the podcast/audiobook plays. #2918
  • Implement a feature to navigate to a specific part of the podcast/audiobook by clicking on the transcription text. Seek to the corresponding time in the audio. #2918
  • Implement a feature to toggle the display of the transcription on/off on the web player. #2918
  • Implement a button to download the transcription file? Can be useful for editing or sharing. #2918
  • Allow uploading a VTT file as a transcription?
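Several tasks above (highlighting the current line, click-to-seek) reduce to two operations: parsing cues into `{start, end, text}` records and finding the cue that contains the current playback time. A minimal TypeScript sketch, assuming cues are sorted by start time and do not overlap; `parseVtt` and `findActiveCue` are hypothetical names, and the parser skips cue settings and non-cue blocks while leaving inline tags (such as `<v Speaker>`) in the text:

```typescript
// One timed transcript line; times are in seconds.
interface Cue {
  start: number;
  end: number;
  text: string;
}

// Parses "hh:mm:ss.ttt" or "mm:ss.ttt" (both allowed in WebVTT) into seconds.
function parseTimestamp(ts: string): number {
  const parts = ts.trim().split(":").map(Number);
  const [h, m, s] = parts.length === 3 ? parts : [0, parts[0], parts[1]];
  return h * 3600 + m * 60 + s;
}

// Minimal WebVTT cue parser (hypothetical helper): skips the header and any block
// without a "-->" timing line (NOTE, STYLE, REGION), and ignores cue settings.
function parseVtt(vtt: string): Cue[] {
  const cues: Cue[] = [];
  const blocks = vtt.replace(/\r\n?/g, "\n").trim().split(/\n\s*\n/);
  for (const block of blocks) {
    const lines = block.split("\n");
    const timingIdx = lines.findIndex((line) => line.includes("-->"));
    if (timingIdx === -1) continue;
    const [startStr, rest] = lines[timingIdx].split("-->");
    const endStr = rest.trim().split(/\s+/)[0]; // drop settings like "align:start"
    cues.push({
      start: parseTimestamp(startStr),
      end: parseTimestamp(endStr),
      text: lines.slice(timingIdx + 1).join("\n"),
    });
  }
  return cues;
}

// Returns the index of the cue whose [start, end) range contains `time`, or -1.
// Binary search for the last cue starting at or before `time`;
// assumes cues are sorted by start time and do not overlap.
function findActiveCue(cues: Cue[], time: number): number {
  let lo = 0;
  let hi = cues.length - 1;
  let found = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (cues[mid].start <= time) {
      found = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found !== -1 && time < cues[found].end ? found : -1;
}
```

In the player, a `timeupdate` listener on the `<audio>` element could call `findActiveCue(cues, audio.currentTime)` to move the highlight, and clicking a line could set `audio.currentTime = cues[i].start` to seek. Browsers can also parse WebVTT natively through a `<track>` element (its `TextTrack.cues` are available when the track mode is `"hidden"`); a standalone parser avoids depending on a media element, e.g. for search or offline storage.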

Note: I think we need to define a standard for multi-language transcriptions, for example a language prefix in the file name, such as en- for English and es- for Spanish.
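One way to combine the same-name lookup, the VTT/SRT priority question, and this language-prefix idea is sketched below in TypeScript. It assumes the prefix is 2-3 lowercase letters followed by `-` (e.g. `en-Chapter01.vtt`) and that VTT is preferred over SRT; both are still open questions in this issue, and `findSidecars`/`pickSidecar` are hypothetical names that work on a plain list of sibling file names:

```typescript
type SidecarFormat = "vtt" | "srt";

// Hypothetical record for a transcript file found next to an audio file.
interface SidecarTranscript {
  fileName: string;
  format: SidecarFormat;
  language?: string; // from an optional "<lang>-" prefix; undefined if none
}

// Assumed priority: VTT (web-native) before SRT. Lower number wins.
const FORMAT_PRIORITY: Record<SidecarFormat, number> = { vtt: 0, srt: 1 };

// Splits "name.ext" into ["name", "ext"] (extension lowercased).
function splitExt(name: string): [string, string] {
  const dot = name.lastIndexOf(".");
  return dot <= 0 ? [name, ""] : [name.slice(0, dot), name.slice(dot + 1).toLowerCase()];
}

// Finds "<stem>.vtt|srt" and "<lang>-<stem>.vtt|srt" among the audio file's siblings,
// sorted by format priority (Array.prototype.sort is stable, so ties keep list order).
function findSidecars(audioFile: string, siblings: string[]): SidecarTranscript[] {
  const [stem] = splitExt(audioFile);
  const found: SidecarTranscript[] = [];
  for (const fileName of siblings) {
    const [base, ext] = splitExt(fileName);
    const format: SidecarFormat | undefined =
      ext === "vtt" ? "vtt" : ext === "srt" ? "srt" : undefined;
    if (!format) continue;
    if (base === stem) {
      // Exact match first, so "en-route.vtt" beside "en-route.mp3" is not read as a prefix.
      found.push({ fileName, format });
      continue;
    }
    // Hypothetical convention: 2-3 lowercase letters then "-", e.g. "en-" or "es-".
    const m = /^([a-z]{2,3})-(.+)$/.exec(base);
    if (m && m[2] === stem) found.push({ fileName, format, language: m[1] });
  }
  return found.sort((a, b) => FORMAT_PRIORITY[a.format] - FORMAT_PRIORITY[b.format]);
}

// Prefers a transcript in the requested language, else falls back to the top-ranked one.
function pickSidecar(found: SidecarTranscript[], language?: string): SidecarTranscript | undefined {
  return found.find((t) => t.language === language) ?? found[0];
}
```

Region subtags such as `pt-BR-` would not match this pattern; supporting full BCP 47 tags would need a stricter naming rule agreed on in this issue.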

UI Ideas on the Web Player:

What's the best way to display the transcription on the web player?

  • Above/below the audio player controls. #2918
    • Good: the audio player is present on every page, so the transcription can be displayed in a fixed position.
    • Bad: Not enough space to display multiple lines of text.
[Screenshot, 2024-05-04 20:47:33]
  • Modal.
    • Good: More space to display multiple lines of text. Can float over the UI.
    • Bad: The modal can be intrusive.
  • Lateral panel (like iTunes/Apple Music).
    • Good: More space to display multiple lines of text. Can have a search feature. Better for implementing the seek feature (click on a line to seek to the corresponding time).
    • Bad: Takes up screen space. Not good for small screens.
[Screenshot, 2024-05-02 09:09:47]
Related:

  • #1723

@zkvsky commented on GitHub (Jun 24, 2024):

IMHO, for the UI you should combine both: the big panel for browsing, and the panel below the controls showing perhaps just the current line, but larger. You also have to take accessibility into account; some folks will want it to be resizable.

Reference
starred/audiobookshelf-advplyr#1942