DO YOU WANT TO VISUALIZE YOUR DATA MORE EASILY? 👉 THEN TRY THE DATAEXPLORER PACKAGE 👇
Cleaning and organizing data is the most time-consuming and tedious part of data science; it is often said to take up about 80% of a project's time. DataExplorer is one of the tools built with the express goal of reducing that 80% and making the work pleasant. Being extremely user-friendly is therefore a basic design principle: one function call is typically all you need.
👉Provides a variety of summary statistics and visualizations for different types of data, including numerical, categorical, and text data.
👉Allows users to easily create plots, tables, and summary statistics for a single variable or for multiple variables.
👉Offers options for customizing the appearance and formatting of plots and tables.
👉Can handle large datasets and missing data.
Install and load the package:
> install.packages("DataExplorer")
> library(DataExplorer)
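To illustrate the "one function call" idea, here is a minimal sketch that uses base R's built-in airquality data as a stand-in (my choice for the example, not part of this post's analysis). introduce() returns a one-row table of basic facts about the data, and plot_intro() draws the same facts as a bar chart.

```r
library(DataExplorer)

# airquality comes with base R; it is only a stand-in dataset here
intro <- introduce(airquality)  # rows, columns, missing values, complete rows, ...
print(intro)

plot_intro(airquality)  # the same overview as a bar chart

# create_report(airquality) would build a full HTML EDA report in one call
# (it needs the rmarkdown package to be installed)
```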
You can find detailed information about the dataset in my other blog posts.
> library(mlbench)  # PimaIndiansDiabetes2 is part of the mlbench package
> data("PimaIndiansDiabetes2")
> df = PimaIndiansDiabetes2
Structural Features
> str(df)
'data.frame': 768 obs. of 9 variables:
$ pregnant: num 6 1 8 1 0 5 3 10 2 8 ...
$ glucose : num 148 85 183 89 137 116 78 115 197 125 ...
$ pressure: num 72 66 64 66 40 74 50 NA 70 96 ...
$ triceps : num 35 29 NA 23 35 NA 32 NA 45 NA ...
$ insulin : num NA NA NA 94 168 NA 88 NA 543 NA ...
$ mass : num 33.6 26.6 23.3 28.1 43.1 25.6 31 35.3 30.5 NA ...
$ pedigree: num 0.627 0.351 0.672 0.167 2.288 ...
$ age : num 50 31 32 21 33 30 26 29 53 54 ...
$ diabetes: Factor w/ 2 levels "neg","pos": 2 1 2 1 2 1 2 1 2 2 ...
> plot_missing(df)
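If you want the numbers behind plot_missing() as a table, DataExplorer also provides profile_missing(). A minimal sketch, assuming the mlbench package is installed as above:

```r
library(DataExplorer)
library(mlbench)

data("PimaIndiansDiabetes2")
df <- PimaIndiansDiabetes2

# One row per feature: feature, num_missing, pct_missing
miss <- profile_missing(df)

# Sort so the features with the most missing values come first
miss <- miss[order(-miss$num_missing), ]
print(miss)
```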
> plot_bar(df)
👉The plot_histogram function in the DataExplorer package creates histograms of the numerical (continuous) variables in a data frame. In this case, the function is applied to a subset of the variables in the data frame df.
👉We only get 3 features here because applying the function to the other features gave an error, since they contain missing (NA) values.
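The exact subset used for the histograms is not shown, but in this dataset only pregnant, pedigree and age (plus the factor diabetes) have no missing values, which would match the three histograms. Under that assumption, here is a sketch of two common ways to work around the NA values:

```r
library(DataExplorer)
library(mlbench)

data("PimaIndiansDiabetes2")
df <- PimaIndiansDiabetes2

# Option 1: keep only the columns that have no missing values
complete_cols <- names(df)[colSums(is.na(df)) == 0]
plot_histogram(df[, complete_cols])  # continuous columns only, so 'diabetes' is ignored

# Option 2: keep every feature but drop the rows that contain any NA
df_complete <- na.omit(df)
plot_histogram(df_complete)
```

Option 2 shrinks the data from 768 to 392 rows, so the distributions it shows can differ from those of the full data.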
> plot_boxplot(df, by = "age")
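As I understand the DataExplorer documentation, when by is a continuous feature such as age, plot_boxplot() splits it into 5 equal-width ranges and draws one box per range. The sketch below uses base R's cut() to approximate those groups so you can see how many observations fall into each; it is an approximation, not DataExplorer's internal code.

```r
library(mlbench)

data("PimaIndiansDiabetes2")
df <- PimaIndiansDiabetes2

# Approximate the grouping of plot_boxplot(df, by = "age"): 5 equal-width age ranges
age_groups <- cut(df$age, breaks = 5)
print(table(age_groups))  # observations per age range
```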
👉plot_scatterplot is a function from the DataExplorer package that plots every other variable in a data frame against one chosen variable. subset(df) returns the rows of df that meet the criteria passed to subset(); because no criteria are given here, it returns the entire data frame. by = "diabetes" is the argument of plot_scatterplot that names the variable every other feature is plotted against, so each panel pairs one feature with the "diabetes" variable.
👉plot_correlation is a function from the DataExplorer package that draws a heatmap of the correlation matrix, showing the strength and direction of the relationship between each pair of variables in a data frame.
> plot_boxplot(df, by = "diabetes")
> plot_scatterplot(subset(df), by = "diabetes")
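To see what subset() does, compare a call with no condition to one with a condition. The sketch also uses the sampled_rows argument of plot_scatterplot(), which plots a random sample of rows and can help with larger data; the value 300 is an arbitrary choice for illustration.

```r
library(DataExplorer)
library(mlbench)

data("PimaIndiansDiabetes2")
df <- PimaIndiansDiabetes2

all_rows <- subset(df)                     # no condition: all 768 rows are kept
diabetic <- subset(df, diabetes == "pos")  # condition: only rows with diabetes == "pos"

# Every other feature against 'diabetes', drawn from a random sample of 300 rows
plot_scatterplot(df, by = "diabetes", sampled_rows = 300L)
```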
> plot_correlation(df)
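Because several columns contain NA values, correlations involving them may come out as NA with the default settings. A sketch, assuming you want pairwise-complete correlations: plot_correlation() passes cor_args on to cor(), and type = "continuous" restricts the heatmap to numerical features (by default, discrete features such as diabetes are one-hot encoded and included as well).

```r
library(DataExplorer)
library(mlbench)

data("PimaIndiansDiabetes2")
df <- PimaIndiansDiabetes2

# Correlation matrix of the numeric columns, each pair using its complete rows
num_cols <- sapply(df, is.numeric)
cor_mat <- cor(df[, num_cols], use = "pairwise.complete.obs")
print(round(cor_mat, 2))

# The same idea as a DataExplorer heatmap
plot_correlation(df, type = "continuous",
                 cor_args = list(use = "pairwise.complete.obs"))
```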