Standard
Pisets: A Robust Speech Recognition System for Lectures and Interviews. / Bondarenko, Ivan; Grebenkin, Daniil; Sedukhin, Oleg et al.
Proceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies: Long Papers, NAACL-HLT 2025. ed. / Weizhu Chen; Yi Yang; Mohammad Kachuee; Xue-Yong Fu. Vol. 3. Association for Computational Linguistics, 2025. p. 988-997.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review
Harvard
Bondarenko, I, Grebenkin, D, Sedukhin, O, Klementev, M, Derunets, R & Budneva, L 2025,
Pisets: A Robust Speech Recognition System for Lectures and Interviews. in W Chen, Y Yang, M Kachuee & X-Y Fu (eds),
Proceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies: Long Papers, NAACL-HLT 2025, vol. 3, Association for Computational Linguistics, pp. 988-997, 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics, Albuquerque, New Mexico, United States, 29.04.2025.
https://doi.org/10.18653/v1/2025.naacl-industry.74
APA
Bondarenko, I., Grebenkin, D., Sedukhin, O., Klementev, M., Derunets, R., & Budneva, L. (2025).
Pisets: A Robust Speech Recognition System for Lectures and Interviews. In W. Chen, Y. Yang, M. Kachuee, & X-Y. Fu (Eds.),
Proceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies: Long Papers, NAACL-HLT 2025 (Vol. 3, pp. 988-997). Association for Computational Linguistics.
https://doi.org/10.18653/v1/2025.naacl-industry.74
Vancouver
Bondarenko I, Grebenkin D, Sedukhin O, Klementev M, Derunets R, Budneva L.
Pisets: A Robust Speech Recognition System for Lectures and Interviews. In Chen W, Yang Y, Kachuee M, Fu X-Y, editors, Proceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies: Long Papers, NAACL-HLT 2025. Association for Computational Linguistics. 2025. p. 988-997. doi: 10.18653/v1/2025.naacl-industry.74
Author
Bondarenko, Ivan ; Grebenkin, Daniil ; Sedukhin, Oleg et al. /
Pisets: A Robust Speech Recognition System for Lectures and Interviews. Proceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies: Long Papers, NAACL-HLT 2025. editor / Weizhu Chen ; Yi Yang ; Mohammad Kachuee ; Xue-Yong Fu. Association for Computational Linguistics, 2025. pp. 988-997.
BibTeX
@inproceedings{da091e1123cc485fa1c0303daf2b8ea4,
title = "Pisets: A Robust Speech Recognition System for Lectures and Interviews",
abstract = "This work presents a speech-to-text system {"}Pisets{"} for scientists and journalists which is based on a three-component architecture aimed at improving speech recognition accuracy while minimizing errors and hallucinations associated with the Whisper model. The architecture comprises primary recognition using Wav2Vec2, false positive filtering via the Audio Spectrogram Transformer (AST), and final speech recognition through Whisper. The implementation of curriculum learning methods and the utilization of diverse Russian-language speech corpora significantly enhanced the system's effectiveness. Additionally, advanced uncertainty modeling techniques were introduced, contributing to further improvements in transcription quality. The proposed approaches ensure robust transcribing of long audio data across various acoustic conditions compared to WhisperX and the usual Whisper model. The source code of {"}Pisets{"} system is publicly available at GitHub: https://github.com/bond005/pisets.",
author = "Ivan Bondarenko and Daniil Grebenkin and Oleg Sedukhin and Mikhail Klementev and Roman Derunets and Lyudmila Budneva",
note = "Ivan Bondarenko, Daniil Grebenkin, Oleg Sedukhin, Mikhail Klementev, Roman Derunets, and Lyudmila Budneva. 2025. Pisets: A Robust Speech Recognition System for Lectures and Interviews. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track), pages 988–997, Albuquerque, New Mexico. Association for Computational Linguistics. The work is supported by the grant for the implementation of the strategic academic leadership program {"}Priority 2030{"} at Novosibirsk State University.; 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics, NAACL ; Conference date: 29-04-2025 Through 04-05-2025",
year = "2025",
doi = "10.18653/v1/2025.naacl-industry.74",
language = "English",
isbn = "9798891761940",
publisher = "Association for Computational Linguistics",
pages = "988--997",
editor = "Weizhu Chen and Yi Yang and Mohammad Kachuee and Xue-Yong Fu",
booktitle = "Proceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies: Long Papers, NAACL-HLT 2025",
address = "United States",
}
RIS
TY - CONF
T1 - Pisets: A Robust Speech Recognition System for Lectures and Interviews
AU - Bondarenko, Ivan
AU - Grebenkin, Daniil
AU - Sedukhin, Oleg
AU - Klementev, Mikhail
AU - Derunets, Roman
AU - Budneva, Lyudmila
N1 - Ivan Bondarenko, Daniil Grebenkin, Oleg Sedukhin, Mikhail Klementev, Roman Derunets, and Lyudmila Budneva. 2025. Pisets: A Robust Speech Recognition System for Lectures and Interviews. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track), pages 988–997, Albuquerque, New Mexico. Association for Computational Linguistics. The work is supported by the grant for the implementation of the strategic academic leadership program "Priority 2030" at Novosibirsk State University.
PY - 2025
Y1 - 2025
AB - This work presents a speech-to-text system "Pisets" for scientists and journalists which is based on a three-component architecture aimed at improving speech recognition accuracy while minimizing errors and hallucinations associated with the Whisper model. The architecture comprises primary recognition using Wav2Vec2, false positive filtering via the Audio Spectrogram Transformer (AST), and final speech recognition through Whisper. The implementation of curriculum learning methods and the utilization of diverse Russian-language speech corpora significantly enhanced the system's effectiveness. Additionally, advanced uncertainty modeling techniques were introduced, contributing to further improvements in transcription quality. The proposed approaches ensure robust transcribing of long audio data across various acoustic conditions compared to WhisperX and the usual Whisper model. The source code of "Pisets" system is publicly available at GitHub: https://github.com/bond005/pisets.
UR - https://www.scopus.com/pages/publications/105027153282
UR - https://www.mendeley.com/catalogue/74454c3d-06fc-34dc-a3a0-724cb42896d7/
U2 - 10.18653/v1/2025.naacl-industry.74
DO - 10.18653/v1/2025.naacl-industry.74
M3 - Conference contribution
SN - 9798891761940
SP - 988
EP - 997
BT - Proceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies: Long Papers, NAACL-HLT 2025
A2 - Chen, Weizhu
A2 - Yang, Yi
A2 - Kachuee, Mohammad
A2 - Fu, Xue-Yong
PB - Association for Computational Linguistics
T2 - 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics
Y2 - 29 April 2025 through 4 May 2025
ER -