Exploring Categorical Data


📑 Requirements:
Basic Reading Comprehension
😝 Cool Level:
0/10
➰ Abstract:
Where is this in the framework of Data Science?
Categorical data is a foundational topic in statistics, which is itself a core part of data science.
Why is it important? How important?
Categorical data is important because it is used repeatedly in all aspects of data analysis. It's pretty important.
Limitations and Connections of this topic?
Studying categorical data will help you understand the impact of non-numerical data. This topic is connected to all of statistics and data science.
📚 Further resources?
Yale, UCLA, Codecademy
πŸ‘©β€βš•οΈ Field Experts?
Sir David Cox, C R Rao, Andrew Gelman, Xiao-li Meng, Bradley Efron

What is a Categorical Variable?

A categorical (or nominal) variable describes data that can be split into groups with no intrinsic ordering, such as eye color.

An ordinal variable is like a categorical variable, except that its categories have a clear ordering, such as small, medium, and large.
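
To make the difference concrete, here is a minimal sketch in Python. The survey values and the names `eye_colors`, `shirt_sizes`, and `SIZE_ORDER` are made up for illustration. It shows that an ordinal variable can be sorted by its stated order, while sorting a nominal variable only gives alphabetical order.

```python
# Hypothetical survey responses; the values are made up for illustration.
eye_colors = ["brown", "blue", "green", "brown"]      # nominal: no natural order
shirt_sizes = ["large", "small", "medium", "small"]   # ordinal: small < medium < large

# For an ordinal variable we can state the order explicitly and sort by it.
SIZE_ORDER = ["small", "medium", "large"]
rank = {size: position for position, size in enumerate(SIZE_ORDER)}
sorted_sizes = sorted(shirt_sizes, key=rank.__getitem__)
print(sorted_sizes)

# Sorting a nominal variable only gives alphabetical order, which carries no meaning.
sorted_colors = sorted(eye_colors)
print(sorted_colors)
```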

How do you represent categorical variables?

Bar Graphs

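A bar graph is a diagram in which the numerical values of variables are represented by the height or length of lines or rectangles of equal width. For categorical data, each bar stands for one category and its height shows how often that category occurs.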
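
As a rough sketch of the idea, the snippet below counts how often each category occurs and draws a text-only bar for it. The `pets` data and the helper `text_bar_chart` are hypothetical. A plotting library would normally draw the bars, but the counting step is the same.

```python
from collections import Counter

# Hypothetical observations of one categorical variable.
pets = ["dog", "cat", "dog", "fish", "cat", "dog"]

counts = Counter(pets)  # frequency of each category


def text_bar_chart(frequencies):
    """Return one line per category; the bar length equals the count."""
    label_width = max(len(label) for label in frequencies)
    lines = []
    for label, count in sorted(frequencies.items()):
        lines.append(f"{label:<{label_width}} | {'#' * count} {count}")
    return lines


chart = text_bar_chart(counts)
print("\n".join(chart))
```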

Two Way Frequency Tables

Two-way frequency tables are a visual representation of the possible relationships between two sets of categorical data. The categories are labeled at the top and the left side of the table, with the frequency (count) information appearing in the four (or more) interior cells of the table. The “totals” of each row appear at the right, and the “totals” of each column appear at the bottom.
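
The layout described above can be sketched in a few lines of Python. The `records` below (grade level paired with preferred pet) are invented for illustration, and the variable names are assumptions, not part of any particular library.

```python
from collections import Counter

# Hypothetical records: (grade level, preferred pet). Values are made up.
records = [
    ("junior", "cat"), ("junior", "dog"), ("junior", "dog"),
    ("senior", "cat"), ("senior", "cat"), ("senior", "dog"),
    ("senior", "dog"), ("senior", "dog"),
]

cell_counts = Counter(records)  # interior cells of the table
row_labels = sorted({row for row, _ in records})
col_labels = sorted({col for _, col in records})

# Row totals go in the right margin, column totals in the bottom margin.
row_totals = {r: sum(cell_counts[(r, c)] for c in col_labels) for r in row_labels}
col_totals = {c: sum(cell_counts[(r, c)] for r in row_labels) for c in col_labels}
grand_total = len(records)

# Print the table with column labels on top and totals in the margins.
header = ["", *col_labels, "total"]
print("".join(f"{cell:>8}" for cell in header))
for r in row_labels:
    row = [r, *(cell_counts[(r, c)] for c in col_labels), row_totals[r]]
    print("".join(f"{cell:>8}" for cell in row))
footer = ["total", *(col_totals[c] for c in col_labels), grand_total]
print("".join(f"{cell:>8}" for cell in footer))
```

The row totals and the column totals each add up to the same grand total, which is a quick check that the table was built correctly.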

Venn Diagram

A Venn diagram is an illustration that uses overlapping circles to show the relationships among things or finite groups of things.
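
The regions of a two-circle Venn diagram map directly onto set operations. A small sketch, with made-up names and group memberships:

```python
# Hypothetical group memberships; the names are made up.
plays_soccer = {"Ana", "Ben", "Cho", "Dev"}
plays_tennis = {"Cho", "Dev", "Eli"}

both = plays_soccer & plays_tennis          # overlap of the two circles
only_soccer = plays_soccer - plays_tennis   # inside the soccer circle only
only_tennis = plays_tennis - plays_soccer   # inside the tennis circle only
either = plays_soccer | plays_tennis        # everything inside either circle

print(sorted(both), sorted(only_soccer), sorted(only_tennis))
```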

Mosaic Plots

A mosaic plot is a special type of stacked bar chart. For two variables, the width of the columns is proportional to the number of observations in each level of the variable plotted on the horizontal axis. The vertical length of the bars is proportional to the number of observations in the second variable within each level of the first variable. Mosaic plots help show relationships and give a visual way to compare groups.
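
To see where the tile sizes come from, the sketch below computes the width and height of each tile as fractions of a unit square. It does not draw anything. The `records` data and the helper `mosaic_tiles` are hypothetical.

```python
from collections import Counter

# Hypothetical records: (horizontal-axis variable, second variable).
records = [
    ("junior", "cat"), ("junior", "dog"), ("junior", "dog"),
    ("senior", "cat"), ("senior", "cat"), ("senior", "dog"),
    ("senior", "dog"), ("senior", "dog"),
]


def mosaic_tiles(pairs):
    """Return {(x_level, y_level): (width, height)} as fractions of the unit square."""
    pair_counts = Counter(pairs)
    x_counts = Counter(x for x, _ in pairs)
    total = len(pairs)
    tiles = {}
    for (x, y), n in pair_counts.items():
        width = x_counts[x] / total   # share of all observations in this column
        height = n / x_counts[x]      # share of the column taken by this segment
        tiles[(x, y)] = (width, height)
    return tiles


tiles = mosaic_tiles(records)
for key, (width, height) in sorted(tiles.items()):
    print(key, round(width, 3), round(height, 3))
```

Because width times height equals the tile's share of all observations, the tile areas sum to 1.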

Segmented Bar Chart

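A segmented bar chart builds on the bar graph. A bar graph is used to portray the (grouped) frequency distribution of a variable at the nominal level of measurement. (Such variables are called category variables in ViSta.) It consists of vertical bars drawn above the categories such that the height of each bar corresponds to the frequency. The bars are separated by empty space, since the nominal level of measurement has separate, discrete categories. In a segmented bar chart, each bar is further divided into segments that show how a second categorical variable is distributed within that category.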
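
In a segmented bar chart, each bar is split into segments, one per level of a second categorical variable. Here is a text-only sketch of that idea, using invented `records` and a hypothetical `symbols` mapping that assigns one fill character to each segment:

```python
from collections import Counter

# Hypothetical records: (bar category, segment category). Values are made up.
records = [
    ("junior", "cat"), ("junior", "dog"), ("junior", "dog"),
    ("senior", "cat"), ("senior", "cat"), ("senior", "dog"),
    ("senior", "dog"), ("senior", "dog"),
]

counts = Counter(records)
groups = sorted({group for group, _ in records})
categories = sorted({category for _, category in records})
symbols = {"cat": "c", "dog": "d"}  # one fill character per segment

# Each bar is the concatenation of its segments; segment length equals the count.
bars = {}
for group in groups:
    bars[group] = "".join(symbols[c] * counts[(group, c)] for c in categories)
    print(f"{group:<8}|{bars[group]}")
```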

Analyzing categorical variables

Joint Relative Frequency: The ratio that compares the frequency of an intersection of two categories (one interior cell of a two-way table) to the total frequency.
Marginal Relative Frequency: The ratio that compares a row or column total to the total frequency.
Conditional Relative Frequency: The ratio that compares a specific joint relative frequency to the corresponding marginal relative frequency.
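
A short worked example may help here. It uses invented `records` and Python's `fractions.Fraction` so that the ratios stay exact. The choice of the "senior" and "dog" categories is arbitrary.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical records: (grade level, preferred pet). Values are made up.
records = [
    ("junior", "cat"), ("junior", "dog"), ("junior", "dog"),
    ("senior", "cat"), ("senior", "cat"), ("senior", "dog"),
    ("senior", "dog"), ("senior", "dog"),
]

counts = Counter(records)
total = len(records)

# Joint: seniors who prefer dogs, out of everyone.
joint = Fraction(counts[("senior", "dog")], total)

# Marginal: all seniors, out of everyone.
senior_count = sum(n for (grade, _), n in counts.items() if grade == "senior")
marginal = Fraction(senior_count, total)

# Conditional: among seniors, the share who prefer dogs.
conditional = joint / marginal

print(joint, marginal, conditional)
```

The conditional relative frequency is the same as dividing the cell count by the row total directly (3 out of 5 seniors here).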

This post is licensed under CC BY 4.0 by the author.
