Research output: Contribution to journal › Article › peer-review
Exact Algorithms of Search for a Cluster of the Largest Size in Two Integer 2-Clustering Problems. / Kel′manov, A. V.; Panasenko, A. V.; Khandeev, V. I.
In: Numerical Analysis and Applications, Vol. 12, No. 2, 01.04.2019, p. 105-115.Research output: Contribution to journal › Article › peer-review
}
TY - JOUR
T1 - Exact Algorithms of Search for a Cluster of the Largest Size in Two Integer 2-Clustering Problems
AU - Kel′manov, A. V.
AU - Panasenko, A. V.
AU - Khandeev, V. I.
N1 - Publisher Copyright: © 2019, Pleiades Publishing, Ltd.
PY - 2019/4/1
Y1 - 2019/4/1
N2 - We consider two related discrete optimization problems of searching for a subset in a finite set of points in Euclidean space. Both problems are induced by versions of a fundamental problem in data analysis, namely, that of selecting a subset of similar elements in a set of objects. In each problem, given an input set and a positive real number, it is required to find a cluster (i.e., a subset) of the largest size under constraints on a quadratic clusterization function. The points in the input set, which are outside the sought-for subset, are treated as a second (complementary) cluster. In the first problem, the function under the constraint is the sum over both clusters of the intracluster sums of the squared distances between the elements of the clusters and their centers. The center of the first (i.e., the sought-for) cluster is unknown and determined as a centroid, while the center of the second one is fixed at a given point in Euclidean space (without loss of generality, at the origin of coordinates). In the second problem, the function under the constraint is the sum over both clusters of the weighted intracluster sums of the squared distances between the elements of the clusters and their centers. As in the first problem, the center of the first cluster is unknown and determined as a centroid, while the center of the second one is fixed at the origin of coordinates. In this paper, we show that both problems are strongly NP-hard. Also, we present exact algorithms for the problems in which the input points have integer components. If the space dimension is bounded by some constant, the algorithms are pseudopolynomial.
AB - We consider two related discrete optimization problems of searching for a subset in a finite set of points in Euclidean space. Both problems are induced by versions of a fundamental problem in data analysis, namely, that of selecting a subset of similar elements in a set of objects. In each problem, given an input set and a positive real number, it is required to find a cluster (i.e., a subset) of the largest size under constraints on a quadratic clusterization function. The points in the input set, which are outside the sought-for subset, are treated as a second (complementary) cluster. In the first problem, the function under the constraint is the sum over both clusters of the intracluster sums of the squared distances between the elements of the clusters and their centers. The center of the first (i.e., the sought-for) cluster is unknown and determined as a centroid, while the center of the second one is fixed at a given point in Euclidean space (without loss of generality, at the origin of coordinates). In the second problem, the function under the constraint is the sum over both clusters of the weighted intracluster sums of the squared distances between the elements of the clusters and their centers. As in the first problem, the center of the first cluster is unknown and determined as a centroid, while the center of the second one is fixed at the origin of coordinates. In this paper, we show that both problems are strongly NP-hard. Also, we present exact algorithms for the problems in which the input points have integer components. If the space dimension is bounded by some constant, the algorithms are pseudopolynomial.
KW - 2-clustering
KW - Euclidean space
KW - exact algorithm
KW - largest subset
KW - NP-hardness
KW - pseudopolynomial-time solvability
UR - http://www.scopus.com/inward/record.url?scp=85066927865&partnerID=8YFLogxK
U2 - 10.1134/S1995423919020010
DO - 10.1134/S1995423919020010
M3 - Article
AN - SCOPUS:85066927865
VL - 12
SP - 105
EP - 115
JO - Numerical Analysis and Applications
JF - Numerical Analysis and Applications
SN - 1995-4239
IS - 2
ER -
ID: 20537513