Data visualization occupies a special place at the heart of all data-related professions. Nothing is more satisfying for a data scientist than taking a large set of random numbers and turning it into a beautiful visual.

Most data visuals created by data scientists are built with Python and its twin visualization libraries: Matplotlib and Seaborn. Individuals and companies widely use these two libraries to create graphs that help them make sense of terabytes of data.

So, what exactly are these two libraries? Matplotlib is the king of Python data visualization libraries and makes it a breeze to explore tabular data visually. Seaborn is another Python data visualization library, built on top of Matplotlib, that introduces some features Matplotlib did not previously offer. In this tutorial, we'll use Seaborn. To follow along, you'll also need to know about Pandas, a powerful library for manipulating and analyzing tabular data.

In this blog post, we'll learn how to perform data analysis through visualizations created with Seaborn. You will be introduced to histograms, KDEs, bar charts, and more. By the end, you'll have a solid understanding of how to visualize data.

Installing the libraries and loading the data

We will start by installing the libraries and importing our data. Running the command below installs Pandas, Matplotlib, and Seaborn:

> pip install pandas matplotlib seaborn

Now, let's import the libraries under their standard aliases:

> import matplotlib.pyplot as plt
> import pandas as pd
> import seaborn as sns

The dataset contains physical measurements of almost 54,000 diamonds and their prices. You can download the original dataset as a CSV file from Kaggle, but we will take a shortcut: because the dataset is already built into Seaborn, we can load it as a pandas.DataFrame using the load_dataset function:

> diamonds = sns.load_dataset("diamonds")

Exploring the dataset

Before we dive head-first into visuals, let's make sure we have a high-level understanding of our dataset:

> diamonds.head()

We have used the handy head function of Pandas, which prints out the first five rows of the data frame. head should be the first function you use when you load a dataset into your environment for the first time.

Notice the dataset has ten variables: three categorical and seven numeric. A few of them need some explanation:

- Cut: the cut quality, with five possible values in increasing order: Fair, Good, Very Good, Premium, Ideal.
- Color: the color of a diamond, with color codes from D (the best) to J (the worst).
- Clarity: the clarity of a diamond, with eight clarity codes.
- Depth: the total depth percentage, calculated as Z / average(X, Y), where X, Y, and Z are the diamond's length, width, and depth in millimeters.
- Table: the width of the top of the diamond relative to its widest point.

Instead of counting all variables one by one, we can use the shape attribute of the data frame:

> diamonds.shape

There are 53,940 diamonds recorded, along with their ten different features.

Now, let's print a statistical summary of the dataset, which includes the five-number summary (minimum, quartiles, and maximum):

> diamonds.describe()

The describe function displays some critical metrics of each numeric variable in a data frame: the count, mean, standard deviation, minimum, 25th, 50th, and 75th percentiles, and maximum. Here are some observations from the above output:

- The cheapest diamond in the dataset costs $326, while the most expensive costs almost 60 times more, $18,823.
- The minimum weight of a diamond is 0.2 carats, while the maximum is 5.01.
- Looking at the means of the X and Y features, we see that diamonds, on average, have roughly the same length and width.

Performing univariate analysis with Seaborn

Now that we are comfortable with the features in our dataset, we can start plotting them to uncover more insights.