این متن باید زود ترجمه بشه ، خیلی سرعت ترجمه ام کمه نمی گم بد ترجمه می کنم شاید خیلی وسواس به خرج می دم .
و دایم از گوگل ترنسلیت پدربیامرز کمک میگیرم .
اینم التماس دعای ما : البته قسمتی از ابتدای مقاله است
1.1 The k-means clustering objective
One of the most widely-cited clustering objectives for
data in Euclidean space is the k-means objective. For
a finite set, S, of n points in Rd, and a fixed positive
integer, k, the k-means objective is to choose a set of
k cluster centers, C in Rd, to minimize:
اینجا فرمول داشت
which we refer to as the “k-means cost” of C on X.
This objective formalizes an intuitive measure of goodness
for a clustering of points in Euclidean space. Optimizing
the k-means objective is known to be NP-hard,
even for k = 2 [4]. Therefore the goal is to design approximation
algorithms.
Definition 1. A b-approximate k-means clustering
algorithm, for a fixed constant b, on any input data
set X, returns a clustering C such that: ! X(C) < b·OPTX , where OPTX is the optimum of the k-means
objective on data set X.
Definition 2. An (a, b)-approximate k-means clustering
algorithm, is a b-approximation algorithm that
returns at most a · k centers.
Surprisingly few algorithms have approximation guarantees
with respect to k-means, even in the batch setting.
Even the algorithm known as “k-means” does
not have an approximation guarantee .
Our contribution is a family of online clustering algorithms,
with regret bounds, and approximation guarantees
with respect to the k-means objective, of a novel
form for the online clustering setting. Notably, our
approximation bounds are with respect to the optimal
k-means cost on the entire data stream seen so far,
even though the algorithm is online.
We extend algorithms from [19] and [25] to the unsupervised
learning setting and introduce a flexible
framework in which our algorithms take a set of candidate
clustering algorithms, as experts, and track the
performance of the “best” expert, or best sequence
of experts, for the data. Instead of computing prediction
errors in order to re-weight the experts, the
algorithms compute an approximation to the current
value of the k-means objective obtained by each expert.
Our approach lends itself to settings in which
the user is unsure of which clustering algorithm to use
for a given data stream, and exploits the performance
advantages of any batch clustering algorithms used as
experts. Our algorithms vary in their models of the
time-varying nature of the data; we demonstrate encouraging
performance on a variety of data sets.