Artificial Intelligence-Data Science-Python-R-Deep Learning

January 27, 2023

WORLD MAP | K-MEANS CLUSTERING | DATA VISUALIZATION

Read the dataset

df <- read.csv("C:/Users/Asus/Desktop/VERİ BİLİMİ YÜKSEK LİSANS/R -Data Visualisation/blog/blog-9/worldcities.csv")

head(df)

str(df)
'data.frame': 26569 obs. of 11 variables:
 $ city      : chr "Tokyo" "Jakarta" "Delhi" "Mumbai" ...
 $ city_ascii: chr "Tokyo" "Jakarta" "Delhi" "Mumbai" ...
 $ lat       : num 35.69 -6.21 28.66 18.97 14.6 ...
 $ lng       : num 139.7 106.8 77.2 72.8 121 ...
 $ country   : chr "Japan" "Indonesia" "India" "India" ...
 $ iso2      : chr "JP" "ID" "IN" "IN" ...
 $ iso3      : chr "JPN" "IDN" "IND" "IND" ...
 $ admin_name: chr "Tōkyō" "Jakarta" "Delhi" "Mahārāshtra" ...
 $ capital   : chr "primary" "primary" "admin" "admin" ...
 $ population: num 37977000 34540000 29617000 23355000 23088000 ...
 $ id        : int 1392685764 1360771077 1356872604 1356226629 1608618140 1156073548 1076532519 1410836482 1484247881 1156237133 ...

summary(df)

Install and load the required package

install.packages("factoextra") # to plot k-means clusters

library(factoextra) # also attaches ggplot2, which provides coord_flip() used below

Plot the locations of all cities in the world

# plot() draws the scatter plot directly (it returns NULL, so there is nothing to assign)
plot(x = df$lng, y = df$lat,
     xlab = "Longitude",
     ylab = "Latitude",
     main = "Locations of all cities")

dataset <- subset(df, select = c(lat, lng)) # keep only the coordinates, latitude first

head(dataset)
      lat      lng
1 35.6897 139.6922
2 -6.2146 106.8451
3 28.6600  77.2300
4 18.9667  72.8333
5 14.5958 120.9772
6 31.1667 121.4667

Visualize the continents

The code below clusters the city coordinates into 7 groups with the k-means algorithm. Because k-means starts from randomly chosen centres, nstart = 25 runs the algorithm from 25 different random starting points and keeps the solution with the lowest total within-cluster sum of squares. The size component of the result gives the number of cities in each cluster; the exact numbers can change from run to run unless set.seed() is called first.

df_continent <- kmeans(dataset,7,nstart = 25)

df_continent$size # gives the size of each of the 7 clusters

[1] 8102 3086 2692 2785 1410 6841 1653
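To see what kmeans() returns besides size, here is a minimal sketch on synthetic toy data (three made-up blobs of (lat, lng) points, not the worldcities data). It assumes only base R; set.seed() makes the random starts reproducible, and the counts in size are simply a tally of the cluster labels.

```r
set.seed(42)  # make the random starting centres reproducible

# Synthetic toy data (NOT the worldcities data): three well-separated blobs
toy <- rbind(
  cbind(lat = rnorm(50, 50, 3), lng = rnorm(50,  10, 3)),
  cbind(lat = rnorm(50,  0, 3), lng = rnorm(50,  20, 3)),
  cbind(lat = rnorm(50, 40, 3), lng = rnorm(50, 100, 3))
)

# k = 3 clusters; keep the best of 25 random starts
km <- kmeans(toy, centers = 3, nstart = 25)

km$size          # number of points in each cluster
km$centers       # cluster centres: mean lat and lng of each cluster
km$tot.withinss  # total within-cluster sum of squares (what nstart minimises)

# size is the same as counting how many points carry each cluster label
tabulate(km$cluster, nbins = 3)
```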

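The fviz_cluster() function from the factoextra package draws the clustering result as a ggplot2 scatter plot, with points coloured by cluster. Because dataset stores latitude in its first column, fviz_cluster() puts latitude on the x axis; coord_flip() swaps the two axes so that longitude runs horizontally and latitude vertically, as on a world map. Note that fviz_cluster() standardizes the variables by default (stand = TRUE), so the axes show scaled coordinates rather than degrees.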

fviz_cluster(df_continent, data = dataset, label=NA)+coord_flip()
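If factoextra is not available, a roughly equivalent cluster map can be sketched with base R graphics. Putting longitude on the x axis directly removes the need for coord_flip(), and plot() keeps the axes in degrees. The data below are synthetic toy points (two made-up blobs), not the worldcities data.

```r
set.seed(7)

# Synthetic toy data (NOT the worldcities data): two blobs of (lat, lng) points
toy <- rbind(
  cbind(lat = rnorm(30,  50, 3), lng = rnorm(30,  10, 3)),
  cbind(lat = rnorm(30, -20, 3), lng = rnorm(30, -60, 3))
)
km <- kmeans(toy, centers = 2, nstart = 25)

# Longitude on x and latitude on y, like a map, so no axis flip is needed
plot(toy[, "lng"], toy[, "lat"], col = km$cluster, pch = 19,
     xlab = "Longitude", ylab = "Latitude", main = "k-means clusters")

# Mark each cluster centre with a cross (centres keep the lat, lng column names)
points(km$centers[, "lng"], km$centers[, "lat"], pch = 4, cex = 2, lwd = 2)
```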

What would the map look like if we divided the world into 10 "continents"? Let's see.

df_continent <- kmeans(dataset,10,nstart = 25)

df_continent$size # gives the size of each of the 10 clusters

# Clusters plot

fviz_cluster(df_continent, data = dataset, label=NA)+coord_flip()
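The number of clusters is a free choice here; nothing in the data makes 7 or 10 special. One common heuristic, not used in the original post, is the elbow method: compute the total within-cluster sum of squares for a range of k and look for the k after which it stops dropping sharply. Below is a minimal base R sketch on synthetic toy data (four made-up blobs, not the worldcities data). On the real data, factoextra's fviz_nbclust(dataset, kmeans, method = "wss") draws the same kind of curve.

```r
set.seed(1)

# Synthetic toy data (NOT the worldcities data): four blobs of (lat, lng) points
blob_centres <- cbind(lat = c(50, 0, 40, -30), lng = c(10, 20, 100, 140))
toy <- do.call(rbind, lapply(seq_len(nrow(blob_centres)), function(i)
  cbind(lat = rnorm(40, blob_centres[i, "lat"], 3),
        lng = rnorm(40, blob_centres[i, "lng"], 3))))

# Total within-cluster sum of squares for k = 1..10
ks  <- 1:10
wss <- sapply(ks, function(k) kmeans(toy, centers = k, nstart = 25)$tot.withinss)

# Look for the "elbow": here the curve should flatten after k = 4
plot(ks, wss, type = "b", xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```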

Approximate central coordinates (latitude, longitude) of six continents:

North America: 58.152158,-118.701860
South America: -17.911852,-57.167183
Africa: 6.664615,21.818767
Europe: 51.002693,7.028192
Asia: 48.365006,95.363121
Australia: -31.293584,141.955051

Next, let's use these coordinates as the initial cluster centres, so that each cluster starts at a real continent, and draw the resulting world map.

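# initial centres: one row per continent, columns in the same order as dataset (lat, lng)
continent_centers <- matrix(c( 58.152158, -118.701860,   # North America
                              -17.911852,  -57.167183,   # South America
                                6.664615,   21.818767,   # Africa
                               51.002693,    7.028192,   # Europe
                               48.365006,   95.363121,   # Asia
                              -31.293584,  141.955051),  # Australia
                            ncol = 2, byrow = TRUE)

# k-means clustering starting from the continent centres
# (nstart only applies when centers is a number, so it is omitted here)
con_real <- kmeans(dataset, centers = continent_centers)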

con_real$size # size of the continents' clusters

[1] 9555 1744 1555 9640 3542 533
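The size output above is unlabelled. When centers is a matrix, kmeans() keeps its row order, so cluster i is the one that started from row i and the continent names can be attached afterwards. A minimal sketch on synthetic toy data (two made-up blobs, not the worldcities data), starting from two of the continent centres listed above:

```r
set.seed(3)

# Synthetic toy data (NOT the worldcities data): two blobs of (lat, lng) points
toy <- rbind(
  cbind(lat = rnorm(30, 50, 3), lng = rnorm(30, 10, 3)),  # Europe-like blob
  cbind(lat = rnorm(30, 45, 3), lng = rnorm(30, 95, 3))   # Asia-like blob
)

# Initial centres as a matrix, one named row per cluster (lat, lng)
init <- matrix(c(51.002693,  7.028192,   # Europe
                 48.365006, 95.363121),  # Asia
               ncol = 2, byrow = TRUE,
               dimnames = list(c("Europe", "Asia"), c("lat", "lng")))

km <- kmeans(toy, centers = init)

# Cluster i grew from row i of init, so its row names can label the sizes
sizes <- setNames(km$size, rownames(init))
sizes
```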

fviz_cluster(con_real, data = dataset, label=NA)+coord_flip()

Notice that the Asia cluster does not line up perfectly with the geographic borders of its neighbouring continents.
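Part of the reason is the assignment step of k-means: each city goes to the centre with the smallest straight-line (Euclidean) distance in (latitude, longitude) degrees, with no notion of coastlines. The sketch below applies that rule to the first four cities from head(dataset) and the initial continent centres listed above. nearest_center() is a hypothetical helper written for illustration, and it checks the initial centres only, not the final ones kmeans() converges to.

```r
# Approximate continent centres (lat, lng), as listed in the post
continent_centers <- matrix(
  c( 58.152158, -118.701860,
    -17.911852,  -57.167183,
      6.664615,   21.818767,
     51.002693,    7.028192,
     48.365006,   95.363121,
    -31.293584,  141.955051),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("North America", "South America", "Africa",
                    "Europe", "Asia", "Australia"), c("lat", "lng")))

# First four cities from head(dataset)
cities <- matrix(
  c(35.6897, 139.6922,
    -6.2146, 106.8451,
    28.6600,  77.2300,
    18.9667,  72.8333),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("Tokyo", "Jakarta", "Delhi", "Mumbai"), c("lat", "lng")))

# Hypothetical helper: index of the nearest centre for each row of `points`,
# using squared Euclidean distance on (lat, lng) in degrees
nearest_center <- function(points, centers) {
  apply(points, 1, function(p) {
    d2 <- colSums((t(centers) - p)^2)  # squared distance to every centre
    which.min(d2)
  })
}

assigned <- rownames(continent_centers)[nearest_center(cities, continent_centers)]
names(assigned) <- rownames(cities)
assigned  # Jakarta lands closer to the Australia centre than to the Asia centre
```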
