---
language:
- nl
pipeline_tag: automatic-speech-recognition
license: cc-by-nc-4.0
---

# Model

This repository contains the third version of our Automatic Speech Recognition and Subtitle Generation model for Flemish Dutch.
Compared to the [second version](https://huggingface.co/nelfproject/ASR_subtitles_v2), this model is fully PyTorch-based and does not depend on Kaldi features, which simplifies deployment and fine-tuning.

The model was trained on 300 hours of verbatim-annotated Flemish data from CGN (with 3-fold noise augmentation), 700 hours of Netherlands Dutch data from CGN, and 14,000 hours of weakly supervised subtitled Flemish broadcast media data.
Additionally, we enriched the training data with contextualised verbatim pseudo-labels, generated conditioned on the subtitles, to improve rare-word recognition in verbatim transcripts.

The model can generate both an exact verbatim transcription with annotation tags and a fully formatted, cleaned-up subtitle transcription. Each of the two output types is produced by a separate decoder.
The model has 180M parameters and requires 2-6 GB of GPU memory for inference.

**Version**: August 2025

# Usage

This repository only hosts the pre-trained model itself and the configuration files. To download this repository, follow the instructions provided by Hugging Face.

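The download step can also be scripted with the `huggingface_hub` Python package. The following is a minimal sketch, not part of the official instructions. The repository id `nelfproject/ASR_subtitles_v3` is an assumption extrapolated from the v2 link and should be replaced with the actual id of this repository; `download_model` and the `model` target directory are hypothetical names chosen for illustration.

```python
from pathlib import Path

# ASSUMPTION: this id is guessed from the v2 naming scheme; replace it with
# the real id of this repository before use.
REPO_ID = "nelfproject/ASR_subtitles_v3"


def download_model(repo_id: str = REPO_ID, target_dir: str = "model") -> Path:
    """Download all files of a Hugging Face model repository into target_dir.

    `download_model` is a hypothetical helper, not part of the NeLF code base.
    """
    # Imported here so that this module can be loaded without the package;
    # install it with `pip install huggingface_hub`.
    from huggingface_hub import snapshot_download

    # snapshot_download fetches every file in the repository and returns
    # the local folder path as a string.
    local_path = snapshot_download(repo_id=repo_id, local_dir=target_dir)
    return Path(local_path)
```

Calling `download_model()` requires network access and returns the local folder containing the model and configuration files, which can then be passed to the inference code.
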
Usage instructions and an example test file for inference, fully integrated with a VAD pipeline, can be found on [GitHub](https://github.com/nelfproject/NeLF_Speech2Text_Pytorch).
This pipeline incorporates [Silero VAD](https://github.com/snakers4/silero-vad), which is included in the model files.

The model is released under a Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0) license.

# Citation

If you use this model, please cite the research paper:
```bibtex
@article{poncelet2024,
  author = {Poncelet, Jakob and Van hamme, Hugo},
  title = {Leveraging broadcast media subtitle transcripts for automatic speech recognition and subtitling},
  year = {2026},
  journal = {Journal on Audio, Speech and Music Processing},
  volume = {11},
}
```

# Contact
Jakob Poncelet: jakob.poncelet@kuleuven.be