QPP: Real-Time Quantization Parameter Prediction for Deep Neural Networks

Standard

QPP: Real-Time Quantization Parameter Prediction for Deep Neural Networks. / Kryzhanovskiy, Vladimir; Balitskiy, Gleb; Kozyrskiy, Nikolay et al.

Proceedings - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021. IEEE Computer Society, 2021. p. 10679-10687 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review

Harvard

Kryzhanovskiy, V, Balitskiy, G, Kozyrskiy, N & Zuruev, A 2021, QPP: Real-Time Quantization Parameter Prediction for Deep Neural Networks. in Proceedings - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE Computer Society, pp. 10679-10687, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, Online, United States, 19.06.2021. https://doi.org/10.1109/CVPR46437.2021.01054

APA

Kryzhanovskiy, V., Balitskiy, G., Kozyrskiy, N., & Zuruev, A. (2021). QPP: Real-Time Quantization Parameter Prediction for Deep Neural Networks. In Proceedings - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021 (pp. 10679-10687). (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition). IEEE Computer Society. https://doi.org/10.1109/CVPR46437.2021.01054

Vancouver

Kryzhanovskiy V, Balitskiy G, Kozyrskiy N, Zuruev A. QPP: Real-Time Quantization Parameter Prediction for Deep Neural Networks. In Proceedings - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021. IEEE Computer Society. 2021. p. 10679-10687. (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition). doi: 10.1109/CVPR46437.2021.01054

Author

Kryzhanovskiy, Vladimir ; Balitskiy, Gleb ; Kozyrskiy, Nikolay et al. / QPP: Real-Time Quantization Parameter Prediction for Deep Neural Networks. Proceedings - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021. IEEE Computer Society, 2021. pp. 10679-10687 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition).

BibTeX

@inproceedings{423a3a42cb6a49acad5dafd770bb930b,

title = "QPP: Real-Time Quantization Parameter Prediction for Deep Neural Networks",

abstract = "Modern deep neural networks (DNNs) cannot be effectively used in mobile and embedded devices due to strict requirements for computational complexity, memory, and power consumption. The quantization of weights and feature maps (activations) is a popular approach to solve this problem. Training-aware quantization often shows excellent results but requires a full dataset, which is not always available. Post-training quantization methods, in turn, are applied without fine-tuning but still work well for many classes of tasks like classification, segmentation, and so on. However, they either imply a big overhead for quantization parameters (QPs) calculation at runtime (dynamic methods) or lead to an accuracy drop if pre-computed static QPs are used (static methods). Moreover, most inference frameworks don't support dynamic quantization. Thus we propose a novel quantization approach called QPP: quantization parameter prediction. With a small subset of a training dataset or unlabeled data from the same domain, we find the predictor that can accurately estimate QPs of activations given only the NN's input data. Such a predictor allows us to avoid complex calculation of precise values of QPs while maintaining the quality of the model. To illustrate our method's efficiency, we added QPP into two dynamic approaches: 1) Dense+Sparse quantization, where the predetermined percentage of activations are not quantized, 2) standard quantization with equal quantization steps. We provide experiments on a wide set of tasks including super-resolution, facial landmark, segmentation, and classification.",

author = "Vladimir Kryzhanovskiy and Gleb Balitskiy and Nikolay Kozyrskiy and Aleksandr Zuruev",

note = "Funding Information: We acknowledge support of this work by the project “Computational Sciences and Technologies for Data, Content and Interaction” (MIS 5002437) which is implemented under the Action “Reinforcement of the Research and Innovation Infrastructure”, funded by the Operational Programme “Competitiveness, Entrepreneurship and Innovation” (NSRF 2014-2020) and co-financed by Greece and the European Union (European Regional Development Fund). Publisher Copyright: {\textcopyright} 2021 IEEE; 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition; Conference date: 19-06-2021 through 25-06-2021",

year = "2021",

doi = "10.1109/CVPR46437.2021.01054",

language = "English",

isbn = "9781665445092",

series = "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition",

publisher = "IEEE Computer Society",

pages = "10679--10687",

booktitle = "Proceedings - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021",

address = "United States",

}

RIS

TY - GEN

T1 - QPP: Real-Time Quantization Parameter Prediction for Deep Neural Networks

AU - Kryzhanovskiy, Vladimir

AU - Balitskiy, Gleb

AU - Kozyrskiy, Nikolay

AU - Zuruev, Aleksandr

N1 - Conference code: CVPR 2021

PY - 2021

Y1 - 2021


AB - Modern deep neural networks (DNNs) cannot be effectively used in mobile and embedded devices due to strict requirements for computational complexity, memory, and power consumption. The quantization of weights and feature maps (activations) is a popular approach to solve this problem. Training-aware quantization often shows excellent results but requires a full dataset, which is not always available. Post-training quantization methods, in turn, are applied without fine-tuning but still work well for many classes of tasks like classification, segmentation, and so on. However, they either imply a big overhead for quantization parameters (QPs) calculation at runtime (dynamic methods) or lead to an accuracy drop if pre-computed static QPs are used (static methods). Moreover, most inference frameworks don't support dynamic quantization. Thus we propose a novel quantization approach called QPP: quantization parameter prediction. With a small subset of a training dataset or unlabeled data from the same domain, we find the predictor that can accurately estimate QPs of activations given only the NN's input data. Such a predictor allows us to avoid complex calculation of precise values of QPs while maintaining the quality of the model. To illustrate our method's efficiency, we added QPP into two dynamic approaches: 1) Dense+Sparse quantization, where the predetermined percentage of activations are not quantized, 2) standard quantization with equal quantization steps. We provide experiments on a wide set of tasks including super-resolution, facial landmark, segmentation, and classification.

UR - http://www.scopus.com/inward/record.url?scp=85115671004&partnerID=8YFLogxK

UR - https://www.mendeley.com/catalogue/b873af9e-1509-34a3-a13f-ac3918c26a33/

U2 - 10.1109/CVPR46437.2021.01054

DO - 10.1109/CVPR46437.2021.01054

M3 - Conference contribution

AN - SCOPUS:85115671004

SN - 9781665445092

T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

SP - 10679

EP - 10687

BT - Proceedings - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021

PB - IEEE Computer Society

T2 - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition

Y2 - 19 June 2021 through 25 June 2021

ER -
