WORLD MAP | K-MEANS CLUSTERING | DATA VISUALIZATION
read the dataset
df <- read.csv("C:/Users/Asus/Desktop/VERİ BİLİMİ YÜKSEK LİSANS/R -Data Visualisation/blog/blog-9/worldcities.csv")
head(df)
> str(df)
'data.frame': 26569 obs. of 11 variables:
$ city : chr "Tokyo" "Jakarta" "Delhi" "Mumbai" ...
$ city_ascii: chr "Tokyo" "Jakarta" "Delhi" "Mumbai" ...
$ lat : num 35.69 -6.21 28.66 18.97 14.6 ...
$ lng : num 139.7 106.8 77.2 72.8 121 ...
$ country : chr "Japan" "Indonesia" "India" "India" ...
$ iso2 : chr "JP" "ID" "IN" "IN" ...
$ iso3 : chr "JPN" "IDN" "IND" "IND" ...
$ admin_name: chr "Tōkyō" "Jakarta" "Delhi" "Mahārāshtra" ...
$ capital : chr "primary" "primary" "admin" "admin" ...
$ population: num 37977000 34540000 29617000 23355000 23088000 ...
$ id : int 1392685764 1360771077 1356872604 1356226629 1608618140 1156073548 1076532519 1410836482 1484247881 1156237133 ...
> summary(df)
install required packages and libraries
install.packages("factoextra") # to plot k-means clusters
library(factoextra)
visualize the cities of the world
# plot() draws on the graphics device and returns NULL, so its result is not assigned
plot(x = df$lng, y = df$lat,
     xlab = "Longitude",
     ylab = "Latitude",
     main = "Locations of all cities")
dataset <- subset(df, select = c(lat, lng)) # keep latitude and longitude, in that order
head(dataset)
lat lng
1 35.6897 139.6922
2 -6.2146 106.8451
3 28.6600 77.2300
4 18.9667 72.8333
5 14.5958 120.9772
6 31.1667 121.4667
visualize the continents
The code below groups the cities into 7 clusters with the k-means algorithm. Setting nstart = 25 runs the algorithm from 25 random starting configurations and keeps the best result. The size element of the kmeans() output gives the number of observations in each cluster.
df_continent <- kmeans(dataset, 7, nstart = 25)
df_continent$size # gives the size of each of the 7 clusters
8102 3086 2692 2785 1410 6841 1653
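To see what nstart and size do in isolation, here is a minimal sketch on synthetic data. The three point clouds, the seed value, and the names toy and toy_km are assumptions made for illustration; they are not part of the worldcities data.

```r
# Synthetic stand-in for the (lat, lng) table: three well-separated point clouds.
set.seed(42)  # k-means starts from random centers; a fixed seed makes runs repeatable
toy <- data.frame(
  lat = c(rnorm(50, mean = 50, sd = 2),
          rnorm(30, mean = 0, sd = 2),
          rnorm(20, mean = -30, sd = 2)),
  lng = c(rnorm(50, mean = 10, sd = 2),
          rnorm(30, mean = 20, sd = 2),
          rnorm(20, mean = 140, sd = 2))
)

# nstart = 25 runs the algorithm from 25 random initialisations and keeps
# the solution with the lowest total within-cluster sum of squares.
toy_km <- kmeans(toy, centers = 3, nstart = 25)

toy_km$size       # number of points in each cluster
sum(toy_km$size)  # always equals nrow(toy)
toy_km$centers    # one (lat, lng) row per cluster
```

Cluster numbers are arbitrary labels and can change from run to run, so sizes are better compared as a set than by position.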
The fviz_cluster() function comes from the factoextra package, which visualizes clustering results. Because dataset lists latitude before longitude, coord_flip() swaps the axes so that longitude runs horizontally and the plot reads like a map.
fviz_cluster(df_continent, data = dataset, label=NA)+coord_flip()
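For comparison, a similar map-like picture can be drawn with base graphics by putting longitude on the horizontal axis and colouring the points by cluster. This is only a sketch: the synthetic points and the names pts and km are assumptions for illustration, and with the real data dataset and df_continent would take their place.

```r
# Two synthetic point clouds standing in for the city coordinates.
set.seed(1)
pts <- data.frame(
  lat = c(rnorm(40, mean = 45, sd = 3), rnorm(40, mean = -20, sd = 3)),
  lng = c(rnorm(40, mean = 5, sd = 3), rnorm(40, mean = 130, sd = 3))
)
km <- kmeans(pts, centers = 2, nstart = 25)

# Longitude on x and latitude on y, one colour per cluster.
plot(pts$lng, pts$lat,
     col = km$cluster, pch = 19,
     xlab = "Longitude", ylab = "Latitude",
     main = "Clusters drawn with base graphics")

# Mark the cluster centers; the centers matrix keeps the data's column names.
points(km$centers[, "lng"], km$centers[, "lat"], pch = 4, cex = 2, lwd = 2)
```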
The latitudes and longitudes of the continents (latitude, longitude)
- North America: 58.152158,-118.701860
- South America: -17.911852,-57.167183
- Africa: 6.664615,21.818767
- Europe: 51.002693,7.028192
- Asia: 48.365006,95.363121
- Australia: -31.293584,141.955051
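One way to use these reference points is to label each k-means center with the nearest continent. The sketch below assumes that reading. The helper nearest_continent() and the example_centers matrix are hypothetical, and plain Euclidean distance in degrees is a rough stand-in for true great-circle distance. In practice the centers would come from df_continent$centers.

```r
# Reference points copied from the list above (latitude, longitude).
continents <- data.frame(
  name = c("North America", "South America", "Africa",
           "Europe", "Asia", "Australia"),
  lat  = c(58.152158, -17.911852, 6.664615,
           51.002693, 48.365006, -31.293584),
  lng  = c(-118.701860, -57.167183, 21.818767,
           7.028192, 95.363121, 141.955051),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each row of `centers` (columns lat, lng), return the
# name of the closest reference point by Euclidean distance in degrees.
nearest_continent <- function(centers, ref = continents) {
  apply(centers, 1, function(ctr) {
    d <- sqrt((ref$lat - ctr["lat"])^2 + (ref$lng - ctr["lng"])^2)
    ref$name[which.min(d)]
  })
}

# Made-up centers for illustration only; replace with df_continent$centers.
example_centers <- matrix(
  c( 40, -95,
    -15, -55,
     45,  10),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("lat", "lng"))
)

nearest_continent(example_centers)
```

Since the list has six reference points and the model has seven clusters, at least two clusters will share a label under this scheme.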