Standard

On Practical Approach to Uniform Quantization of Non-redundant Neural Networks. / Goncharenko, Alexander; Denisov, Andrey; Alyamkin, Sergey et al.

Artificial Neural Networks and Machine Learning – ICANN 2019: Deep Learning - 28th International Conference on Artificial Neural Networks, Proceedings. ed. / Igor V. Tetko; Pavel Karpov; Fabian Theis; Vera Kurková. Springer-Verlag GmbH and Co. KG, 2019. pp. 349-360 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11728 LNCS).

Research output: Contribution to book/report/conference proceedings › Conference article › Research › Peer-reviewed

Harvard

Goncharenko, A, Denisov, A, Alyamkin, S & Terentev, E 2019, On Practical Approach to Uniform Quantization of Non-redundant Neural Networks. in IV Tetko, P Karpov, F Theis & V Kurková (eds), Artificial Neural Networks and Machine Learning – ICANN 2019: Deep Learning - 28th International Conference on Artificial Neural Networks, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11728 LNCS, Springer-Verlag GmbH and Co. KG, pp. 349-360, 28th International Conference on Artificial Neural Networks, ICANN 2019, Munich, Germany, 17.09.2019. https://doi.org/10.1007/978-3-030-30484-3_29

APA

Goncharenko, A., Denisov, A., Alyamkin, S., & Terentev, E. (2019). On Practical Approach to Uniform Quantization of Non-redundant Neural Networks. In I. V. Tetko, P. Karpov, F. Theis, & V. Kurková (Eds.), Artificial Neural Networks and Machine Learning – ICANN 2019: Deep Learning - 28th International Conference on Artificial Neural Networks, Proceedings (pp. 349-360). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11728 LNCS). Springer-Verlag GmbH and Co. KG. https://doi.org/10.1007/978-3-030-30484-3_29

Vancouver

Goncharenko A, Denisov A, Alyamkin S, Terentev E. On Practical Approach to Uniform Quantization of Non-redundant Neural Networks. In: Tetko IV, Karpov P, Theis F, Kurková V, editors, Artificial Neural Networks and Machine Learning – ICANN 2019: Deep Learning - 28th International Conference on Artificial Neural Networks, Proceedings. Springer-Verlag GmbH and Co. KG. 2019. p. 349-360. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-030-30484-3_29

Author

Goncharenko, Alexander ; Denisov, Andrey ; Alyamkin, Sergey et al. / On Practical Approach to Uniform Quantization of Non-redundant Neural Networks. Artificial Neural Networks and Machine Learning – ICANN 2019: Deep Learning - 28th International Conference on Artificial Neural Networks, Proceedings. Editor / Igor V. Tetko ; Pavel Karpov ; Fabian Theis ; Vera Kurková. Springer-Verlag GmbH and Co. KG, 2019. pp. 349-360 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

BibTeX

@inproceedings{c0c2dda0d3a14d11b0d6ce06eaeb32b7,
title = "On Practical Approach to Uniform Quantization of Non-redundant Neural Networks",
abstract = "The neural network quantization is highly desired procedure to perform before running the neural networks on mobile devices. Quantization without fine-tuning leads to accuracy drop of the model, whereas commonly used training with quantization is done on the full set of the labeled data and therefore is both time- and resource-consuming. Real life applications require simplification and acceleration of the quantization procedure that will maintain the accuracy of full-precision neural network, especially for modern mobile neural network architectures like Mobilenet-v1, MobileNet-v2 and MNAS. Here we present two methods to significantly optimize the training with the quantization procedure. The first one is introducing the trained scale factors for discretization thresholds that are separate for each filter. The second one is based on mutual rescaling of consequent depth-wise separable convolution and convolution layers. Using the proposed techniques, we quantize the modern mobile architectures of neural networks with the set of train data of only ∼ 10% of the total ImageNet 2012 sample. Such reduction of the train dataset size and a small number of trainable parameters allow to fine-tune the network for several hours while maintaining the high accuracy of the quantized model (the accuracy drop was less than 0.5%). The ready-for-use models and code are available at: https://github.com/agoncharenko1992/FAT-fast-adjustable-threshold.",
keywords = "Distillation, Machine learning, Neural networks, Quantization",
author = "Alexander Goncharenko and Andrey Denisov and Sergey Alyamkin and Evgeny Terentev",
year = "2019",
month = jan,
day = "1",
doi = "10.1007/978-3-030-30484-3_29",
language = "English",
isbn = "9783030304836",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer-Verlag GmbH and Co. KG",
pages = "349--360",
editor = "Tetko, {Igor V.} and Pavel Karpov and Fabian Theis and Vera Kurkov{\'a}",
booktitle = "Artificial Neural Networks and Machine Learning – ICANN 2019",
address = "Germany",
note = "28th International Conference on Artificial Neural Networks, ICANN 2019 ; Conference date: 17-09-2019 Through 19-09-2019",

}
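
The first method named in the abstract above, trained per-filter scale factors for the discretization thresholds, amounts to a fake-quantization layer whose clipping thresholds are learned through a straight-through estimator. Below is a minimal sketch assuming PyTorch; the names AdjustableThresholdQuant, base_t and alpha are illustrative and do not come from the paper's released code.

import torch
import torch.nn as nn

class AdjustableThresholdQuant(nn.Module):
    """Fake-quantize activations with trained per-channel threshold scales.

    The base thresholds base_t are assumed to come from a calibration
    pass (e.g. a max-abs statistic); training then adjusts the per-channel
    scale factor alpha, so the effective threshold is alpha * base_t.
    """

    def __init__(self, num_channels, init_threshold, bits=8):
        super().__init__()
        self.register_buffer("base_t", torch.full((num_channels,), float(init_threshold)))
        self.alpha = nn.Parameter(torch.ones(num_channels))  # trained scale factors
        self.levels = 2 ** bits - 1

    def forward(self, x):  # x: (N, C, H, W)
        t = (self.alpha * self.base_t).clamp(min=1e-8).view(1, -1, 1, 1)
        step = 2 * t / self.levels                       # uniform quantization step
        x_clip = torch.minimum(torch.maximum(x, -t), t)  # clip to [-t, t]
        q = torch.round(x_clip / step) * step            # snap to the uniform grid
        # Straight-through estimator: forward pass uses q, backward pass
        # treats rounding as identity so gradients reach alpha and earlier layers.
        return x_clip + (q - x_clip).detach()

fq = AdjustableThresholdQuant(num_channels=32, init_threshold=6.0)
y = fq(torch.randn(2, 32, 8, 8))

In this reading, the thresholds would be initialized from a short calibration pass over a small amount of data, which matches the paper's goal of fine-tuning only a few parameters on a small fraction of the training set.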

RIS

TY - GEN

T1 - On Practical Approach to Uniform Quantization of Non-redundant Neural Networks

AU - Goncharenko, Alexander

AU - Denisov, Andrey

AU - Alyamkin, Sergey

AU - Terentev, Evgeny

PY - 2019/1/1

Y1 - 2019/1/1

N2 - Neural network quantization is a highly desirable procedure to perform before running neural networks on mobile devices. Quantization without fine-tuning leads to an accuracy drop in the model, whereas the commonly used training with quantization is done on the full set of labeled data and is therefore both time- and resource-consuming. Real-life applications require a simplified and accelerated quantization procedure that maintains the accuracy of the full-precision neural network, especially for modern mobile neural network architectures like MobileNet-v1, MobileNet-v2 and MNAS. Here we present two methods that significantly optimize the training-with-quantization procedure. The first introduces trained scale factors for the discretization thresholds, separate for each filter. The second is based on mutual rescaling of consecutive depth-wise separable convolution and convolution layers. Using the proposed techniques, we quantize modern mobile neural network architectures with a training set of only ∼ 10% of the total ImageNet 2012 sample. Such a reduction of the training dataset size, together with the small number of trainable parameters, allows the network to be fine-tuned in several hours while maintaining the high accuracy of the quantized model (the accuracy drop was less than 0.5%). The ready-for-use models and code are available at: https://github.com/agoncharenko1992/FAT-fast-adjustable-threshold.

AB - Neural network quantization is a highly desirable procedure to perform before running neural networks on mobile devices. Quantization without fine-tuning leads to an accuracy drop in the model, whereas the commonly used training with quantization is done on the full set of labeled data and is therefore both time- and resource-consuming. Real-life applications require a simplified and accelerated quantization procedure that maintains the accuracy of the full-precision neural network, especially for modern mobile neural network architectures like MobileNet-v1, MobileNet-v2 and MNAS. Here we present two methods that significantly optimize the training-with-quantization procedure. The first introduces trained scale factors for the discretization thresholds, separate for each filter. The second is based on mutual rescaling of consecutive depth-wise separable convolution and convolution layers. Using the proposed techniques, we quantize modern mobile neural network architectures with a training set of only ∼ 10% of the total ImageNet 2012 sample. Such a reduction of the training dataset size, together with the small number of trainable parameters, allows the network to be fine-tuned in several hours while maintaining the high accuracy of the quantized model (the accuracy drop was less than 0.5%). The ready-for-use models and code are available at: https://github.com/agoncharenko1992/FAT-fast-adjustable-threshold.

KW - Distillation

KW - Machine learning

KW - Neural networks

KW - Quantization

UR - http://www.scopus.com/inward/record.url?scp=85072865641&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-30484-3_29

DO - 10.1007/978-3-030-30484-3_29

M3 - Conference contribution

AN - SCOPUS:85072865641

SN - 9783030304836

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 349

EP - 360

BT - Artificial Neural Networks and Machine Learning – ICANN 2019

A2 - Tetko, Igor V.

A2 - Karpov, Pavel

A2 - Theis, Fabian

A2 - Kurková, Vera

PB - Springer-Verlag GmbH and Co. KG

T2 - 28th International Conference on Artificial Neural Networks, ICANN 2019

Y2 - 17 September 2019 through 19 September 2019

ER -

ID: 21793125
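
The second method, mutual rescaling of a depth-wise separable convolution and the convolution that follows it, exploits the fact that a positive per-channel factor can be moved between the two weight tensors without changing the composed function (exactly so across a ReLU, since ReLU is positively homogeneous), while equalizing the per-channel weight ranges ahead of uniform quantization. A minimal sketch assuming PyTorch; rescale_pair is a hypothetical helper, not an API from the paper's repository.

import torch
import torch.nn as nn

@torch.no_grad()
def rescale_pair(dw: nn.Conv2d, pw: nn.Conv2d) -> None:
    """Equalize per-channel weight ranges of a depthwise conv and the
    pointwise conv that consumes its output, preserving the composition."""
    # dw.weight: (C, 1, k, k); pw.weight: (C_out, C, 1, 1)
    r_dw = dw.weight.abs().amax(dim=(1, 2, 3))  # range per depthwise output channel
    r_pw = pw.weight.abs().amax(dim=(0, 2, 3))  # range per pointwise input channel
    s = torch.sqrt(r_dw / r_pw.clamp(min=1e-8)).clamp(min=1e-8)
    dw.weight.div_(s.view(-1, 1, 1, 1))         # shrink the wide channels ...
    if dw.bias is not None:
        dw.bias.div_(s)
    pw.weight.mul_(s.view(1, -1, 1, 1))         # ... and compensate downstream

# Function preserved: pw(relu(dw(x))) is unchanged, since relu(z / s) = relu(z) / s for s > 0.
dw = nn.Conv2d(32, 32, kernel_size=3, groups=32, padding=1)
pw = nn.Conv2d(32, 64, kernel_size=1)
x = torch.randn(1, 32, 8, 8)
before = pw(torch.relu(dw(x)))
rescale_pair(dw, pw)
after = pw(torch.relu(dw(x)))
print(torch.allclose(before, after, atol=1e-5))  # True up to float error

Equalizing the ranges this way means a single uniform quantization grid fits both layers' weights better, which is why such rescaling helps the low-bit accuracy of depth-wise separable architectures like the MobileNets discussed in the abstract.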