DO YOU WANT TO VISUALIZE YOUR DATA MORE EASILY? THEN TRY THE DATAEXPLORER PACKAGE!
Cleaning and organizing data is the most time-consuming and tedious part of data science; it is often said to take up around 80% of the work. DataExplorer is one of the tools built with the express goal of reducing that 80% and making the work more enjoyable. Being extremely user-friendly is therefore a basic design principle: one function call is typically all you need.
- Provides a variety of summary statistics and visualizations for different types of data, including numerical (continuous) and categorical (discrete) data.
- Allows users to easily create plots, tables, and summary statistics for a single variable or for multiple variables.
- Offers options for customizing the appearance and formatting of plots and tables.
- Can handle large datasets and missing data.
Install and load the package
> install.packages("DataExplorer")
> library(DataExplorer)
This post uses the PimaIndiansDiabetes2 dataset from the mlbench package. You can find detailed information about the dataset in my other blog posts.
> library(mlbench)
> data(PimaIndiansDiabetes2)
> df <- PimaIndiansDiabetes2
Structural Features
> str(df)
'data.frame': 768 obs. of 9 variables:
$ pregnant: num 6 1 8 1 0 5 3 10 2 8 ...
$ glucose : num 148 85 183 89 137 116 78 115 197 125 ...
$ pressure: num 72 66 64 66 40 74 50 NA 70 96 ...
$ triceps : num 35 29 NA 23 35 NA 32 NA 45 NA ...
$ insulin : num NA NA NA 94 168 NA 88 NA 543 NA ...
$ mass : num 33.6 26.6 23.3 28.1 43.1 25.6 31 35.3 30.5 NA ...
$ pedigree: num 0.627 0.351 0.672 0.167 2.288 ...
$ age : num 50 31 32 21 33 30 26 29 53 54 ...
$ diabetes: Factor w/ 2 levels "neg","pos": 2 1 2 1 2 1 2 1 2 2 ...
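The same structural overview is also available from DataExplorer itself. The sketch below assumes DataExplorer and mlbench are installed: introduce() returns a one-row table of basic counts (rows, columns, discrete and continuous columns, missing values, complete rows, memory usage), and plot_intro() shows that overview as a bar chart.

```r
library(DataExplorer)
library(mlbench)

data(PimaIndiansDiabetes2)
df <- PimaIndiansDiabetes2

# One-row summary: rows, columns, discrete vs. continuous columns,
# missing values, complete rows and memory usage
intro <- introduce(df)
print(t(intro))

# The same overview drawn as a bar chart
plot_intro(df)
```

For this dataset you should see 768 rows, 9 columns, one discrete column (diabetes) and eight continuous ones, matching the str() output above.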
> plot_missing(df)
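plot_missing() draws the picture; if you also want the numbers behind it, DataExplorer's profile_missing() returns them as a table. A minimal sketch, assuming DataExplorer and mlbench are installed:

```r
library(DataExplorer)
library(mlbench)

data(PimaIndiansDiabetes2)
df <- PimaIndiansDiabetes2

# One row per feature: number and share of missing values
pm <- profile_missing(df)
print(pm)
```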
> plot_bar(df)
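plot_bar() draws bar charts for discrete (categorical) features only, and in this dataset diabetes is the only factor column, as the str() output shows. A minimal sketch to confirm which columns DataExplorer treats as discrete, using its split_columns() helper (assuming DataExplorer and mlbench are installed):

```r
library(DataExplorer)
library(mlbench)

data(PimaIndiansDiabetes2)
df <- PimaIndiansDiabetes2

# Separate discrete and continuous features
cols <- split_columns(df)
print(cols$num_discrete)      # number of discrete features
print(names(cols$discrete))   # the features plot_bar() can draw
```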
The plot_histogram function in the DataExplorer package draws a histogram for each numerical (continuous) feature in a data frame. In this case, the function is applied to a subset of the variables in df.
Only three features appear here because, in my run, the function gave an error for the other features, which contain missing (NA) values.
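One possible workaround, sketched here as an option rather than the code used for the plot above, is to drop incomplete rows with base R's na.omit() before plotting, so every continuous feature is drawn from the same complete rows. This assumes DataExplorer and mlbench are installed; note that it discards every row with any missing value, which is a large share of this dataset.

```r
library(DataExplorer)
library(mlbench)

data(PimaIndiansDiabetes2)
df <- PimaIndiansDiabetes2

# Keep only rows with no missing values in any column
df_complete <- na.omit(df)

# One histogram per continuous feature
plot_histogram(df_complete)
```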
> plot_boxplot(df, by = "age")
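When by is a continuous feature such as age, the DataExplorer documentation describes the boxplots as grouped into 5 equal ranges. To get a feel for those groups, here is a rough base-R approximation with cut(); it is an assumption that DataExplorer's break points match cut()'s exactly, so treat the printed ranges as indicative only.

```r
library(mlbench)

data(PimaIndiansDiabetes2)
df <- PimaIndiansDiabetes2

# Split age into 5 equal-width ranges and count the rows in each
age_bins <- cut(df$age, breaks = 5)
print(table(age_bins))
```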
> plot_boxplot(df, by = "diabetes")
plot_scatterplot is a DataExplorer function that draws a scatter plot of each feature in a data frame against the feature named in the by argument. Here by = "diabetes", so every other feature is plotted against the diabetes outcome. subset(df) returns a subset of df based on the criteria passed to subset(); because no criteria are given, it simply returns the entire data frame.
> plot_scatterplot(subset(df), by = "diabetes")
plot_correlation is a DataExplorer function that draws a heatmap of the correlation coefficients between pairs of features in a data frame. With the default type = "all", discrete features such as diabetes are first converted into dummy (0/1) columns.
> plot_correlation(df)
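plot_correlation() passes its cor_args list on to R's cor(), whose default use = "everything" returns NA for any pair involving missing values, so features with NAs such as pressure, triceps, insulin and mass can show up as NA cells. A minimal sketch (an option, not the original post's code), assuming DataExplorer and mlbench are installed, that restricts the heatmap to continuous features and computes each correlation from the rows where both values are present:

```r
library(DataExplorer)
library(mlbench)

data(PimaIndiansDiabetes2)
df <- PimaIndiansDiabetes2

# Heatmap of continuous features only, using pairwise-complete rows
plot_correlation(df, type = "continuous",
                 cor_args = list("use" = "pairwise.complete.obs"))

# The underlying correlation matrix, computed directly with base R
num_cols <- sapply(df, is.numeric)
cm <- cor(df[, num_cols], use = "pairwise.complete.obs")
print(round(cm, 2))
```

Pairwise deletion means each coefficient may be based on a different number of rows; passing na.omit(df) instead is the stricter alternative that uses the same complete rows throughout.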