
\section{Cluster Analysis}

Clustering is a method of dividing a dataset into segments such that each segment contains points with similar characteristics. It relies on statistical data analysis and has a wide range of applications in machine learning, data mining, etc.

\begin{figure}[h!]
  \centering
  \includegraphics[width=30mm]{figs/clustering.png}
  \caption{Stages of clustering}
  \label{fig:fig38}
\end{figure}

As shown in Figure~\ref{fig:fig38}, source data is provided for clustering, then an appropriate clustering algorithm (e.g., K-means, hierarchical clustering, etc.) is selected according to one of the approaches discussed below, and finally the clustered output is generated.

Different approaches can be applied during the clustering process: \textbf{exclusive clustering} and \textbf{non-exclusive clustering}. In \emph{exclusive clustering}, the source data is partitioned so that each data point belongs to exactly one cluster. In \emph{non-exclusive clustering}, on the other hand, a data point may belong to more than one cluster.

Clustering is an unsupervised machine learning technique. It is commonly used in information retrieval, data compression, machine learning, etc.

\begin{figure}[h!]
  \centering
  \includegraphics[width=80mm]{figs/Before_Clustering.png}
  \caption{Sample Data before Clustering}
  \label{fig:fig16}
\end{figure}

The similarity on which clustering methods rely is expressed through a distance function.

\section{K-means Clustering}

K-means clustering is an iterative algorithm that refines the clusters in each iteration and converges to a local optimum, that is, a local minimum of the within-cluster variance.

Example: given observations $x_1, x_2, x_3, \ldots, x_n$, each of which is a real vector, K-means clustering splits the $n$ observations into $k \leq n$ sets $S = \{S_1, S_2, \ldots, S_k\}$ so as to minimize the within-cluster sum of squares:
\begin{equation}
  \operatorname*{argmin}_S \sum^k_{i=1} \sum_{x \in S_i} \|x - \mu_i\|^2
  = \operatorname*{argmin}_S \sum^k_{i=1} |S_i| \operatorname{Var} S_i
\end{equation}
where $\mu_i$ is the mean of the points in $S_i$. This is equivalent to minimizing the pairwise squared deviations of points in the same cluster:
\begin{equation}
  \operatorname*{argmin}_S \sum^k_{i=1} \frac{1}{2|S_i|} \sum_{x, y \in S_i} \|x - y\|^2
\end{equation}
The equivalence can be deduced from the identity
\begin{equation}
  \sum_{x \in S_i} \|x - \mu_i\|^2 = \sum_{x \neq y \in S_i} (x - \mu_i)^{T} (\mu_i - y)
\end{equation}
Since the total variance is constant, this is also equivalent to maximizing the squared deviations between points in different clusters.

\begin{figure}[h!]
  \centering
  \includegraphics[width=80mm]{figs/K-Means_Clustering.png}
  \caption{Data set After Clustering}
  \label{fig:fig17}
\end{figure}

Figure~\ref{fig:fig16} shows a data set of longitude and latitude values before clustering, and Figure~\ref{fig:fig17} shows the same data set after applying K-means clustering. As the latter figure shows, K-means repeatedly updates the centroids so as to minimize the variance within each group; each centroid is indicated by a marker. Clustering helps us analyze the data: accurate data points lie close to a cluster centroid, whereas anomalies lie far away from the clusters.
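
The K-means procedure described in this section can be illustrated with a minimal sketch in plain Python. It is not the implementation used to produce the figures. The function names (\texttt{kmeans}, \texttt{squared\_distance}, \texttt{mean\_point}, \texttt{members\_of}), the coordinates, and the explicitly supplied initial centroids are all assumptions made for illustration. The sketch alternates the assignment and update steps until the assignments stop changing, checks numerically that the within-cluster sum of squares agrees with the pairwise form of the objective, and reports the point farthest from its centroid as a candidate anomaly.

```python
import math


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def members_of(points, labels, i):
    """All points currently assigned to cluster i."""
    return [p for p, lab in zip(points, labels) if lab == i]


def kmeans(points, initial_centroids, max_iter=100):
    """Alternate assignment and update steps; return (centroids, labels)."""
    centroids = list(initial_centroids)
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so a local optimum was reached
        labels = new_labels
        # Update step: move each centroid to the mean of its cluster.
        for i in range(k):
            members = members_of(points, labels, i)
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = mean_point(members)
    return centroids, labels


# Made-up (longitude, latitude) pairs: two groups plus one distant point.
points = [
    (30.05, -1.95), (30.06, -1.94), (30.04, -1.96), (30.07, -1.95),
    (29.70, -2.60), (29.72, -2.58), (29.71, -2.61), (29.69, -2.59),
    (30.50, -1.50),
]

# Assumption: one starting centroid is taken from each visible group.
centroids, labels = kmeans(points, initial_centroids=[points[0], points[4]])

# Within-cluster sum of squares (the objective in the first equation).
wcss = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))

# Pairwise form of the same objective (the second equation).
pairwise = 0.0
for i in range(len(centroids)):
    members = members_of(points, labels, i)
    if members:
        pair_sum = sum(squared_distance(x, y) for x in members for y in members)
        pairwise += pair_sum / (2 * len(members))

# The point farthest from its own centroid is a candidate anomaly.
distances = [
    math.sqrt(squared_distance(p, centroids[lab])) for p, lab in zip(points, labels)
]
anomaly = points[distances.index(max(distances))]

print("centroids:", centroids)
print("labels:", labels)
print("objective (centroid form):", wcss)
print("objective (pairwise form):", pairwise)
print("candidate anomaly:", anomaly)
```

The result depends on the starting centroids: a different choice can converge to a different local optimum, for example one in which the distant point forms a cluster of its own. In practice the algorithm is therefore usually run from several initializations and the result with the lowest objective is kept.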