WORLD MAP | K-MEANS CLUSTERING | DATA VISUALIZATION

read the dataset

df <- read.csv("C:/Users/Asus/Desktop/VERİ BİLİMİ YÜKSEK LİSANS/R -Data Visualisation/blog/blog-9/worldcities.csv")

head(df)


> str(df)

'data.frame': 26569 obs. of  11 variables:

 $ city      : chr  "Tokyo" "Jakarta" "Delhi" "Mumbai" ...

 $ city_ascii: chr  "Tokyo" "Jakarta" "Delhi" "Mumbai" ...

 $ lat       : num  35.69 -6.21 28.66 18.97 14.6 ...

 $ lng       : num  139.7 106.8 77.2 72.8 121 ...

 $ country   : chr  "Japan" "Indonesia" "India" "India" ...

 $ iso2      : chr  "JP" "ID" "IN" "IN" ...

 $ iso3      : chr  "JPN" "IDN" "IND" "IND" ...

 $ admin_name: chr  "Tōkyō" "Jakarta" "Delhi" "Mahārāshtra" ...

 $ capital   : chr  "primary" "primary" "admin" "admin" ...

 $ population: num  37977000 34540000 29617000 23355000 23088000 ...

 $ id        : int  1392685764 1360771077 1356872604 1356226629 1608618140 1156073548 1076532519 1410836482 1484247881 1156237133 ...

> summary(df)



install required packages and libraries

install.packages("factoextra") # to plot k-means clusters

library(factoextra)

visualize the cities of the world

# plot() draws on the graphics device and returns NULL, so the result is not assigned
plot(x = df$lng, y = df$lat,
     xlab = "Longitude",
     ylab = "Latitude",
     main = "Locations of all cities")


dataset <- subset(df, select = c(lng, lat))

dataset <- dataset[, c(2,1)] # interchanging the columns 

head(dataset)

        lat         lng

1 35.6897 139.6922

2 -6.2146 106.8451

3 28.6600  77.2300

4 18.9667  72.8333

5 14.5958 120.9772

6 31.1667 121.4667

visualize the continents

The code below groups the cities into 7 clusters with the k-means algorithm. Setting nstart = 25 runs the algorithm from 25 random sets of initial centers and keeps the best result. The size component of the kmeans() output gives the number of observations in each cluster.

df_continent <- kmeans(dataset,7,nstart = 25)

df_continent$size # gives the size of each of the 7 clusters

8102 3086 2692 2785 1410 6841 1653
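
Because k-means starts from random centers, the cluster numbering and sizes can differ from run to run. A minimal sketch of how to make a run repeatable and look at the fitted centers is shown here. The toy data frame and the seed values are made up for illustration and stand in for the real lat/lng dataset.

```r
# Toy stand-in for the city coordinates: three well-separated blobs (made-up data)
set.seed(42)
toy <- data.frame(
  lat = c(rnorm(50, mean = 50, sd = 3), rnorm(50, mean = 0, sd = 3), rnorm(50, mean = -30, sd = 3)),
  lng = c(rnorm(50, mean = 10, sd = 3), rnorm(50, mean = 20, sd = 3), rnorm(50, mean = 140, sd = 3))
)

# Fixing the seed right before kmeans() makes the random starts repeatable
set.seed(123)
fit_a <- kmeans(toy, centers = 3, nstart = 25)
set.seed(123)
fit_b <- kmeans(toy, centers = 3, nstart = 25)

fit_a$size      # number of points in each cluster
fit_a$centers   # mean lat/lng of each cluster
identical(fit_a$cluster, fit_b$cluster)  # same seed, same assignment
```

The same pattern (a set.seed() call before kmeans()) should carry over to the full city dataset.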

The fviz_cluster() function comes from the factoextra package, which visualizes clustering results. Because the columns of dataset are ordered lat, lng, fviz_cluster() puts latitude on the horizontal axis. coord_flip() swaps the axes so that longitude runs horizontally and latitude vertically, as on a map.

fviz_cluster(df_continent, data = dataset, label=NA)+coord_flip()
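
For comparison, the same kind of picture can be sketched with base graphics by putting longitude on x and latitude on y directly, which keeps the axes in degrees (fviz_cluster() standardizes the variables by default through its stand argument). The toy data below is made up and only illustrates the idea.

```r
# Made-up points around two hypothetical locations, in the same lat/lng column order
set.seed(7)
toy <- data.frame(
  lat = c(rnorm(30, mean = 50, sd = 3), rnorm(30, mean = -20, sd = 3)),
  lng = c(rnorm(30, mean = 10, sd = 3), rnorm(30, mean = -60, sd = 3))
)
fit <- kmeans(toy, centers = 2, nstart = 25)

# Longitude on x and latitude on y gives the usual map orientation;
# col = fit$cluster colors each point by its cluster number
plot(x = toy$lng, y = toy$lat,
     col = fit$cluster, pch = 19,
     xlab = "Longitude", ylab = "Latitude",
     main = "Toy clusters in map orientation")

# Mark the fitted cluster centers with crosses
points(fit$centers[, "lng"], fit$centers[, "lat"], pch = 4, cex = 2, lwd = 2)
```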


What would it look like if we divided the world into 10 continents? Let's see.

df_continent <- kmeans(dataset,10,nstart = 25)
df_continent$size # gives the size of each of the 10 clusters
# Clusters plot
fviz_cluster(df_continent, data = dataset, label=NA)+coord_flip()
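
One way to compare different numbers of clusters, such as 7 versus 10, is to look at the total within-cluster sum of squares that kmeans() reports as tot.withinss. The sketch below does this on made-up toy data. The range of k values is an arbitrary choice for illustration and this is not a step from the original analysis.

```r
# Made-up data with three blobs, standing in for the lat/lng dataset
set.seed(2024)
toy <- data.frame(
  lat = c(rnorm(50, mean = 50, sd = 3), rnorm(50, mean = 0, sd = 3), rnorm(50, mean = -30, sd = 3)),
  lng = c(rnorm(50, mean = 10, sd = 3), rnorm(50, mean = 20, sd = 3), rnorm(50, mean = 140, sd = 3))
)

# Fit k-means for several values of k and record the total within-cluster sum of squares
k_values <- 2:6
wss <- sapply(k_values, function(k) {
  kmeans(toy, centers = k, nstart = 25)$tot.withinss
})

# Smaller values mean tighter clusters; the drop usually flattens once k is large enough
data.frame(k = k_values, tot_withinss = round(wss, 1))
```

On the real data, replacing toy with dataset and widening k_values would give the equivalent table.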


The latitudes and longitudes of the continents

  • North America: 58.152158,-118.701860
  • South America: -17.911852,-57.167183
  • Africa: 6.664615,21.818767
  • Europe: 51.002693,7.028192
  • Asia: 48.365006,95.363121
  • Australia: -31.293584,141.955051

Now let's draw a world map that uses these real continent coordinates as the initial cluster centers.

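# initial centers: one row per continent, columns in the same order as dataset (lat, lng)
con_centers <- matrix(c(58.152158,-118.701860,-17.911852,-57.167183,
                        6.664615,21.818767,51.002693,7.028192,
                        48.365006,95.363121,-31.293584,141.955051)
                      ,ncol=2, byrow=TRUE)
# k-means clustering, started from the continent coordinates
# (nstart only applies to random starts, so it is omitted when centers is a matrix)
con_real <- kmeans(dataset, centers = con_centers)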
con_real$size # size of the continents' clusters
[1] 9555 1744 1555 9640 3542  533

fviz_cluster(con_real, data = dataset, label=NA)+coord_flip()


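Notice that Asia is not separated cleanly from its neighboring continents. K-means assigns each city to the nearest cluster center in latitude/longitude space, so the boundaries follow distance rather than actual continental borders.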
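
When kmeans() is given a matrix of starting centers, cluster i in the result corresponds to row i of that matrix, so the cluster numbers can be mapped back to names. A minimal sketch with made-up points follows. The toy data and the region_names vector are illustrative assumptions, and only two of the continent coordinates listed above are used.

```r
# Made-up points scattered around two of the continent coordinates
set.seed(1)
toy <- data.frame(
  lat = c(rnorm(40, mean = 50, sd = 3), rnorm(40, mean = -20, sd = 3)),
  lng = c(rnorm(40, mean = 10, sd = 3), rnorm(40, mean = -60, sd = 3))
)

# Starting centers (lat, lng), one row per region, plus a hypothetical vector of names
start_centers <- matrix(c(51.002693, 7.028192,      # Europe
                          -17.911852, -57.167183),  # South America
                        ncol = 2, byrow = TRUE)
region_names <- c("Europe", "South America")

fit <- kmeans(toy, centers = start_centers)

# Cluster i corresponds to row i of start_centers, so index the names by cluster number
toy$region <- region_names[fit$cluster]
table(toy$region)
```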


