Week 9 – Day 4

Today, I got some output.

I used K-Means to cluster my vectors.

To find anomalies, I use bar charts with the user ID on the x-axis and the distance to the centroid on the y-axis. For each cluster of users, I plot the distance from every member of the cluster to the cluster's centroid. An abnormal user is one that lies very far from its centroid. The figures below show the bar charts obtained when running the algorithm with 2 and 4 centroids, respectively.

Distance to centroids using 2 clusters
Distance to centroids using 4 clusters
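The distance computation behind these charts can be sketched as follows. This is only an illustration, not the code that produced the figures: the user IDs and 2-D vectors are made up, the `kmeans` helper is a bare-bones Lloyd's algorithm that naively seeds the centroids with the first k points, and the distance is assumed to be Euclidean. A real run would more likely rely on a library implementation of K-Means.

```python
import math

def kmeans(points, k, max_iter=100):
    """Bare-bones Lloyd's algorithm. Returns (centroids, labels)."""
    # Assumption: seed the centroids with the first k points (a naive choice).
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return centroids, labels

def distances_per_cluster(user_ids, points, centroids, labels):
    """Map each cluster label to a list of (user_id, distance_to_centroid)."""
    per_cluster = {}
    for uid, p, lab in zip(user_ids, points, labels):
        per_cluster.setdefault(lab, []).append((uid, math.dist(p, centroids[lab])))
    return per_cluster

# Hypothetical data: two tight groups of users plus one user (u9) that sits apart.
users = {
    "u1": (0, 0), "u2": (10, 10), "u3": (0, 1), "u4": (10, 11),
    "u5": (1, 0), "u6": (11, 10), "u7": (1, 1), "u8": (11, 11),
    "u9": (14, 14),
}
user_ids = list(users)
points = list(users.values())

centroids, labels = kmeans(points, k=2)
per_cluster = distances_per_cluster(user_ids, points, centroids, labels)

# Each (user, distance) pair is one bar in that cluster's chart.
for lab, bars in sorted(per_cluster.items()):
    for uid, dist in bars:
        print(f"cluster {lab}  {uid}  {dist:.2f}")
```

In this toy data set, u9 ends up in the same cluster as u2, u4, u6 and u8, but its bar is clearly the tallest in that cluster's chart.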

Users lying very far from their centroid are potential abnormal users. With this in place, I now need to decide on a final number of centroids. I intend to determine that number by analysing (visualizing) my initial vectors.
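One way to make "very far" concrete is to compare each user's distance with the typical distance inside the same cluster. The sketch below is an assumption, not a rule taken from this project: it flags a user when the distance exceeds a multiple (`factor`, a hypothetical parameter) of the cluster's median distance. The input distances are made-up values.

```python
from statistics import median

def flag_far_users(per_cluster, factor=2.0):
    """Return (cluster, user_id, distance) for users far from their centroid.

    Assumed rule: a user is flagged when its distance is greater than
    `factor` times the median distance within its own cluster.
    """
    flagged = []
    for lab, bars in sorted(per_cluster.items()):
        threshold = factor * median(d for _, d in bars)
        for uid, d in bars:
            if d > threshold:
                flagged.append((lab, uid, d))
    return flagged

# Hypothetical distances to the centroid, grouped by cluster label.
per_cluster = {
    0: [("u1", 0.7), ("u3", 0.7), ("u5", 0.7), ("u7", 0.7)],
    1: [("u2", 1.7), ("u4", 1.2), ("u6", 1.2), ("u8", 0.3), ("u9", 4.0)],
}

suspects = flag_far_users(per_cluster)
print(suspects)
```

The median is used here instead of the mean so that a single extreme user does not inflate the threshold of its own cluster. The right `factor` depends on the data and would need tuning.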
