Box plot explanation


For numerical variables, a useful approach is the box plot. In descriptive statistics, a box plot (also known as a box-and-whisker plot) is a type of chart often used in exploratory data analysis. Box plots visually show the distribution of numerical data and its skewness by displaying the data quartiles (or percentiles) and averages. A box plot shows the five-number summary of a data set: the minimum score, the first (lower) quartile, the median, the third (upper) quartile, and the maximum score [4].
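As a minimal sketch of the five-number summary, the Python snippet below computes it for a small hypothetical sample using only the standard library. It assumes quartiles computed by linear interpolation between order statistics (`statistics.quantiles` with `method="inclusive"`); other software may use a different quartile convention and return slightly different values. It reports the raw minimum and maximum, whereas the whisker ends described below exclude outliers.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) of a numeric sample.

    Quartiles come from statistics.quantiles(method="inclusive"), i.e. linear
    interpolation between sorted values; this convention is an assumption.
    """
    values = sorted(data)
    if len(values) < 2:
        raise ValueError("need at least two observations")
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return values[0], q1, median, q3, values[-1]

sample = [2, 4, 4, 5, 6, 7, 8, 9, 17, 30]  # hypothetical toy data
print(five_number_summary(sample))  # (2, 4.25, 6.5, 8.75, 30)
```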

The box plot visualises quantiles and outliers using four fences: (1) the external lower fence, (2) the internal lower fence, (3) the internal upper fence and (4) the external upper fence. An observation that falls outside the internal fences is treated as an outlier, and one that falls outside the external fences as an extreme outlier. Under the common (Tukey) convention, the internal fences are Q1 − 1.5 × IQR and Q3 + 1.5 × IQR, and the external fences are Q1 − 3 × IQR and Q3 + 3 × IQR, where Q1 and Q3 are the lower and upper quartiles and IQR is the interquartile range defined below.

Minimum score: the lowest score, excluding outliers (shown at the end of the left whisker).

Lower quartile: twenty-five percent of scores fall below the lower quartile value (also known as the first quartile).

Median: the mid-point of the data, shown by the line that divides the box into two parts (sometimes known as the second quartile). Half the scores are greater than or equal to this value and half are less.

Upper quartile: seventy-five percent of the scores fall below the upper quartile value (also known as the third quartile), so 25% of the data lie above it.

Maximum score: the highest score, excluding outliers (shown at the end of the right whisker).

Whiskers: the upper and lower whiskers represent scores outside the middle 50% (i.e., the lower 25% and the upper 25% of scores), extending to the most extreme values that are not outliers.

Interquartile range (IQR): the box itself, which shows the middle 50% of scores (i.e., the range between the 25th and 75th percentiles).

The representation of the box plot is shown in Figure 4 below.

Figure 4: Box plot explanation
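The fences and whiskers can be sketched in Python as follows. This is an illustration under stated assumptions, not the method used to draw Figure 4: it assumes the common Tukey multipliers (1.5 × IQR for the internal fences, 3 × IQR for the external fences) and interpolated quartiles from `statistics.quantiles(method="inclusive")`. The function name `tukey_fences` and the toy sample are hypothetical.

```python
import statistics

def tukey_fences(data, inner=1.5, outer=3.0):
    """Classify observations with Tukey's fences (multipliers are assumed defaults).

    Internal fences: Q1 - inner*IQR and Q3 + inner*IQR.
    External fences: Q1 - outer*IQR and Q3 + outer*IQR.
    """
    values = sorted(data)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo_in, hi_in = q1 - inner * iqr, q3 + inner * iqr
    lo_out, hi_out = q1 - outer * iqr, q3 + outer * iqr

    # Mild outliers: between an internal fence and the matching external fence.
    mild = [x for x in values if lo_out <= x < lo_in or hi_in < x <= hi_out]
    # Extreme outliers: beyond the external fences.
    extreme = [x for x in values if x < lo_out or x > hi_out]
    # Whiskers end at the most extreme observations inside the internal fences.
    inside = [x for x in values if lo_in <= x <= hi_in]
    return {
        "iqr": iqr,
        "internal_fences": (lo_in, hi_in),
        "external_fences": (lo_out, hi_out),
        "whiskers": (inside[0], inside[-1]),
        "mild_outliers": mild,
        "extreme_outliers": extreme,
    }

result = tukey_fences([2, 4, 4, 5, 6, 7, 8, 9, 17, 30])  # hypothetical toy data
for key, value in result.items():
    print(key, value)
```

On this toy sample the quartiles are 4.25 and 8.75 (IQR 4.5), so 17 lies between the internal and external upper fences (a mild outlier), 30 lies beyond the external upper fence (an extreme outlier), and the whiskers end at 2 and 9.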

The IBM data set has 11 numerical variables, as shown in Figure 2 above. The box plots of the numerical variables are shown in Figure 5 below. Another useful analysis is the density curve. For a symmetric density curve, the mean is equal to the median. When the mean and median do not coincide, the curve is asymmetric: if the mean is greater than the median, the distribution is typically skewed to the right; if the median is greater than the mean, it is typically skewed to the left. The third sample moment can be calculated with formula 3 below, and from it the sample skewness can be defined as in formula 4.

Skewness Formula
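Formulas 3 and 4 are not reproduced in the text. Assuming they follow the usual moment-based definitions, the third central sample moment is m3 = (1/n) Σ (x_i − x̄)^3 and the sample skewness is g1 = m3 / m2^(3/2), where m2 = (1/n) Σ (x_i − x̄)^2 is the second central sample moment. The sketch below implements these definitions together with the mean-versus-median rule of thumb; the function names and sample are hypothetical, and some libraries apply a bias-corrected variant of g1 instead.

```python
import statistics

def sample_skewness(data):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (central moments divide by n)."""
    n = len(data)
    if n == 0:
        raise ValueError("need at least one observation")
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central sample moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central sample moment
    if m2 == 0:
        raise ValueError("skewness is undefined for a constant sample")
    return m3 / m2 ** 1.5

def skew_direction(data):
    """Rule of thumb from the text: compare the mean with the median."""
    mean, median = statistics.fmean(data), statistics.median(data)
    if mean > median:
        return "right-skewed"
    if mean < median:
        return "left-skewed"
    return "mean equals median"

sample = [2, 4, 4, 5, 6, 7, 8, 9, 17, 30]  # hypothetical toy data
print(round(sample_skewness(sample), 3), skew_direction(sample))
```

For this hypothetical sample the mean (9.2) exceeds the median (6.5) and g1 is positive, consistent with the rule of thumb that a mean above the median indicates right skew.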
