Research output: Contribution to journal › Article › peer-review
Probabilistic Clustering for Data Aggregation in Air Pollution Monitoring System. / Shakhov, Vladimir; Sokolova, Olga.
In: Sensors, Vol. 25, No. 23, 7285, 29.11.2025.
TY - JOUR
T1 - Probabilistic Clustering for Data Aggregation in Air Pollution Monitoring System
AU - Shakhov, Vladimir
AU - Sokolova, Olga
N1 - This work was supported by a grant for research centers, provided by the Ministry of Economic Development of the Russian Federation in accordance with the subsidy agreement with the Novosibirsk State University dated 17 April 2025 No. 139-15-2025-006: IGK 000000C313925P3S0002.
PY - 2025/11/29
Y1 - 2025/11/29
AB - Air pollution monitoring systems use distributed sensors that record dynamic environmental conditions, often producing large volumes of heterogeneous and stochastic data. Efficient aggregation of this data is essential for reducing communication overhead while maintaining the quality of information for decision making. In this paper, we propose an unsupervised learning approach for soft clustering of sensors in air pollution monitoring systems. Our method uses the Expectation–Maximization (EM) algorithm, a probabilistic technique, to cluster sensors into distinct sets corresponding to normal and polluted zones. This clustering is driven by the need for a dynamic data transmission policy: sensors in polluted zones must intensify their operation for detailed monitoring, while sensors in clean zones can reduce reporting rates and transmit condensed data summaries to alleviate network load and conserve energy. The cluster membership probability enables a tunable trade-off between data redundancy and monitoring accuracy. Simulation results validate the efficiency of the proposed AI-based clustering: under common pollution scenarios and with adequate sample sizes, the EM algorithm exhibits a relative error below 5%. The presented approach provides a foundation for a wide range of intelligent and adaptive data aggregation protocols.
KW - air quality monitoring
KW - artificial intelligence
KW - expectation–maximization algorithm
KW - mobile sensor networks
KW - smart clustering
KW - unsupervised learning
UR - https://www.scopus.com/pages/publications/105024619239
UR - https://www.mendeley.com/catalogue/b675a693-6b7a-324a-b916-b7c2a566712e/
U2 - 10.3390/s25237285
DO - 10.3390/s25237285
M3 - Article
C2 - 41374659
VL - 25
JO - Sensors
JF - Sensors
SN - 1424-3210
IS - 23
M1 - 7285
ER -
ID: 72827367
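
The abstract describes soft clustering of sensors into normal and polluted zones with the Expectation–Maximization algorithm, and a transmission policy driven by the cluster membership probability. The sketch below is only an illustration of that general idea, not the authors' implementation: it assumes each sensor contributes a single scalar reading and that the readings follow a two-component one-dimensional Gaussian mixture. The function names (`em_two_zones`, `reporting_policy`), the initialisation at the extreme readings, and the `threshold` parameter are all hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_zones(readings, iterations=100, min_var=1e-6):
    """Fit a two-component 1-D Gaussian mixture with EM (illustrative assumption).

    Returns (weights, means, variances, resp), where resp[i] is the estimated
    probability that readings[i] belongs to component 1, which is initialised
    at the largest reading and is treated here as the "polluted" zone.
    """
    n = len(readings)
    overall_mean = sum(readings) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in readings) / n, min_var)

    means = [min(readings), max(readings)]   # crude initialisation (assumption)
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    resp = [0.5] * n

    for _ in range(iterations):
        # E-step: posterior probability of the "polluted" component per reading.
        resp = []
        for x in readings:
            p0 = weights[0] * normal_pdf(x, means[0], variances[0])
            p1 = weights[1] * normal_pdf(x, means[1], variances[1])
            total = p0 + p1
            resp.append(p1 / total if total > 0.0 else 0.5)

        # M-step: re-estimate weights, means and variances from the posteriors.
        n1 = sum(resp)
        n0 = n - n1
        if n0 < 1e-9 or n1 < 1e-9:
            break  # one component has collapsed; keep the last estimates
        means = [
            sum((1.0 - r) * x for r, x in zip(resp, readings)) / n0,
            sum(r * x for r, x in zip(resp, readings)) / n1,
        ]
        variances = [
            max(sum((1.0 - r) * (x - means[0]) ** 2 for r, x in zip(resp, readings)) / n0, min_var),
            max(sum(r * (x - means[1]) ** 2 for r, x in zip(resp, readings)) / n1, min_var),
        ]
        weights = [n0 / n, n1 / n]

    return weights, means, variances, resp


def reporting_policy(p_polluted, threshold=0.5):
    """Map a membership probability to a hypothetical transmission mode."""
    return "intensive" if p_polluted >= threshold else "summary"
```

In this sketch, lowering `threshold` moves more sensors into intensive reporting (more redundancy, closer monitoring), while raising it moves more sensors into summary reporting (less network load), which is one simple way to picture the tunable trade-off the abstract mentions.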