Standard
Convolutional Variational Autoencoders for Spectrogram Compression in Automatic Speech Recognition. / Yakovenko, Olga; Bondarenko, Ivan.
Recent Trends in Analysis of Images, Social Networks and Texts - 9th International Conference, AIST 2020, Revised Supplementary Proceedings. ed. / Wil M. van der Aalst; Vladimir Batagelj; Alexey Buzmakov; Dmitry I. Ignatov; Anna Kalenkova; Michael Khachay; Olessia Koltsova; Andrey Kutuzov; Sergei O. Kuznetsov; Irina A. Lomazova; Natalia Loukachevitch; Ilya Makarov; Amedeo Napoli; Alexander Panchenko; Panos M. Pardalos; Marcello Pelillo; Andrey V. Savchenko; Elena Tutubalina. Springer Science and Business Media Deutschland GmbH, 2021. p. 115-126 (Communications in Computer and Information Science; Vol. 1357 CCIS).
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review
Harvard
Yakovenko, O & Bondarenko, I 2021, Convolutional Variational Autoencoders for Spectrogram Compression in Automatic Speech Recognition. in WM van der Aalst, V Batagelj, A Buzmakov, DI Ignatov, A Kalenkova, M Khachay, O Koltsova, A Kutuzov, SO Kuznetsov, IA Lomazova, N Loukachevitch, I Makarov, A Napoli, A Panchenko, PM Pardalos, M Pelillo, AV Savchenko & E Tutubalina (eds), Recent Trends in Analysis of Images, Social Networks and Texts - 9th International Conference, AIST 2020, Revised Supplementary Proceedings. Communications in Computer and Information Science, vol. 1357 CCIS, Springer Science and Business Media Deutschland GmbH, pp. 115-126, 9th International Conference on Analysis of Images, Social Networks, and Texts, AIST 2020, Virtual, Online, 15.10.2020.
https://doi.org/10.1007/978-3-030-71214-3_10
APA
Yakovenko, O., & Bondarenko, I. (2021). Convolutional Variational Autoencoders for Spectrogram Compression in Automatic Speech Recognition. In W. M. van der Aalst, V. Batagelj, A. Buzmakov, D. I. Ignatov, A. Kalenkova, M. Khachay, O. Koltsova, A. Kutuzov, S. O. Kuznetsov, I. A. Lomazova, N. Loukachevitch, I. Makarov, A. Napoli, A. Panchenko, P. M. Pardalos, M. Pelillo, A. V. Savchenko, & E. Tutubalina (Eds.), Recent Trends in Analysis of Images, Social Networks and Texts - 9th International Conference, AIST 2020, Revised Supplementary Proceedings (pp. 115-126). (Communications in Computer and Information Science; Vol. 1357 CCIS). Springer Science and Business Media Deutschland GmbH.
https://doi.org/10.1007/978-3-030-71214-3_10
Vancouver
Yakovenko O, Bondarenko I. Convolutional Variational Autoencoders for Spectrogram Compression in Automatic Speech Recognition. In van der Aalst WM, Batagelj V, Buzmakov A, Ignatov DI, Kalenkova A, Khachay M, Koltsova O, Kutuzov A, Kuznetsov SO, Lomazova IA, Loukachevitch N, Makarov I, Napoli A, Panchenko A, Pardalos PM, Pelillo M, Savchenko AV, Tutubalina E, editors, Recent Trends in Analysis of Images, Social Networks and Texts - 9th International Conference, AIST 2020, Revised Supplementary Proceedings. Springer Science and Business Media Deutschland GmbH. 2021. p. 115-126. (Communications in Computer and Information Science). doi: 10.1007/978-3-030-71214-3_10
Author
Yakovenko, Olga; Bondarenko, Ivan. / Convolutional Variational Autoencoders for Spectrogram Compression in Automatic Speech Recognition. Recent Trends in Analysis of Images, Social Networks and Texts - 9th International Conference, AIST 2020, Revised Supplementary Proceedings. editor / Wil M. van der Aalst ; Vladimir Batagelj ; Alexey Buzmakov ; Dmitry I. Ignatov ; Anna Kalenkova ; Michael Khachay ; Olessia Koltsova ; Andrey Kutuzov ; Sergei O. Kuznetsov ; Irina A. Lomazova ; Natalia Loukachevitch ; Ilya Makarov ; Amedeo Napoli ; Alexander Panchenko ; Panos M. Pardalos ; Marcello Pelillo ; Andrey V. Savchenko ; Elena Tutubalina. Springer Science and Business Media Deutschland GmbH, 2021. pp. 115-126 (Communications in Computer and Information Science).
BibTeX
@inproceedings{470a3888063940ea86e516ed9b9cb3ff,
title = "Convolutional Variational Autoencoders for Spectrogram Compression in Automatic Speech Recognition",
abstract = "For many Automatic Speech Recognition (ASR) tasks audio features as spectrograms show better results than Mel-frequency Cepstral Coefficients (MFCC), but in practice they are hard to use due to a complex dimensionality of a feature space. The following paper presents an alternative approach towards generating compressed spectrogram representation, based on Convolutional Variational Autoencoders (VAE). A Convolutional VAE model was trained on a subsample of the LibriSpeech dataset to reconstruct short fragments of audio spectrograms (25 ms) from a 13-dimensional embedding. The trained model for a 40-dimensional (300 ms) embedding was used to generate features for corpus of spoken commands on the GoogleSpeechCommands dataset. Using the generated features an ASR system was built and compared to the model with MFCC features.",
keywords = "Audio feature representation, Speech recognition, Variational autoencoder",
author = "Olga Yakovenko and Ivan Bondarenko",
note = "Publisher Copyright: {\textcopyright} 2021, Springer Nature Switzerland AG.; 9th International Conference on Analysis of Images, Social Networks, and Texts, AIST 2020 ; Conference date: 15-10-2020 Through 16-10-2020",
year = "2021",
doi = "10.1007/978-3-030-71214-3_10",
language = "English",
isbn = "9783030712136",
series = "Communications in Computer and Information Science",
publisher = "Springer Science and Business Media Deutschland GmbH",
pages = "115--126",
editor = "{van der Aalst}, {Wil M.} and Vladimir Batagelj and Alexey Buzmakov and Ignatov, {Dmitry I.} and Anna Kalenkova and Michael Khachay and Olessia Koltsova and Andrey Kutuzov and Kuznetsov, {Sergei O.} and Lomazova, {Irina A.} and Natalia Loukachevitch and Ilya Makarov and Amedeo Napoli and Alexander Panchenko and Pardalos, {Panos M.} and Marcello Pelillo and Savchenko, {Andrey V.} and Elena Tutubalina",
booktitle = "Recent Trends in Analysis of Images, Social Networks and Texts - 9th International Conference, AIST 2020, Revised Supplementary Proceedings",
address = "Germany",
}
RIS
TY - GEN
T1 - Convolutional Variational Autoencoders for Spectrogram Compression in Automatic Speech Recognition
AU - Yakovenko, Olga
AU - Bondarenko, Ivan
N1 - Publisher Copyright: © 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
AB - For many Automatic Speech Recognition (ASR) tasks audio features as spectrograms show better results than Mel-frequency Cepstral Coefficients (MFCC), but in practice they are hard to use due to a complex dimensionality of a feature space. The following paper presents an alternative approach towards generating compressed spectrogram representation, based on Convolutional Variational Autoencoders (VAE). A Convolutional VAE model was trained on a subsample of the LibriSpeech dataset to reconstruct short fragments of audio spectrograms (25 ms) from a 13-dimensional embedding. The trained model for a 40-dimensional (300 ms) embedding was used to generate features for corpus of spoken commands on the GoogleSpeechCommands dataset. Using the generated features an ASR system was built and compared to the model with MFCC features.
KW - Audio feature representation
KW - Speech recognition
KW - Variational autoencoder
UR - http://www.scopus.com/inward/record.url?scp=85107369094&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-71214-3_10
DO - 10.1007/978-3-030-71214-3_10
M3 - Conference contribution
AN - SCOPUS:85107369094
SN - 9783030712136
T3 - Communications in Computer and Information Science
SP - 115
EP - 126
BT - Recent Trends in Analysis of Images, Social Networks and Texts - 9th International Conference, AIST 2020, Revised Supplementary Proceedings
A2 - van der Aalst, Wil M.
A2 - Batagelj, Vladimir
A2 - Buzmakov, Alexey
A2 - Ignatov, Dmitry I.
A2 - Kalenkova, Anna
A2 - Khachay, Michael
A2 - Koltsova, Olessia
A2 - Kutuzov, Andrey
A2 - Kuznetsov, Sergei O.
A2 - Lomazova, Irina A.
A2 - Loukachevitch, Natalia
A2 - Makarov, Ilya
A2 - Napoli, Amedeo
A2 - Panchenko, Alexander
A2 - Pardalos, Panos M.
A2 - Pelillo, Marcello
A2 - Savchenko, Andrey V.
A2 - Tutubalina, Elena
PB - Springer Science and Business Media Deutschland GmbH
T2 - 9th International Conference on Analysis of Images, Social Networks, and Texts, AIST 2020
Y2 - 15 October 2020 through 16 October 2020
ER -