Australia, VIC
VCE 11 General 2023

1.01 Types of data

Lesson

Introduction

In statistics, a 'variable' refers to a characteristic of data that is measurable or observable. A variable could be something like temperature, mass, height, make of car, type of animal or goals scored.

Types of data

Data variables can be defined as either numerical or categorical.

  • Numerical data is where each data point is represented by a number. Examples include: number of items sold each month, daily temperatures, heights of people, and ages of a population. The data can be further defined as either discrete (associated with counting) or continuous (associated with measuring). Numerical data is also known as quantitative data.

  • Categorical data is where each data point is represented by a word or label. Examples include: brand names, types of animals, favourite colours, and names of countries. The data can be further defined as either ordinal (it can be ordered) or nominal (un-ordered). Categorical data is also known as qualitative data.

[Figure: a chart showing the categories of data, such as numerical and categorical.]

Discrete numerical data involves data points that are distinct and separate from each other. There is a definite 'gap' separating one data point from the next. Discrete data usually, but not always, consists of whole numbers, and is often collected by some form of counting.

Examples of discrete data
  • Number of goals scored per match: 1, 3, 0, 1, 2, 0, 2, 4, 2, 0, 1, 1, 2, ...

  • Number of children per family: 2, 3, 1, 0, 1, 4, 2, 2, 0, 1, 1, 5, 3, ...

  • Number of products sold each day: 437, 410, 386, 411, 401, 397, 422, ...

In each of these cases, there are no in-between values. We cannot have 2.5 goals or 1.2 people, for example.

This doesn't mean that discrete data always consists of whole numbers. Shoe sizes, an example of discrete data, are often separated by half-sizes, for example 8,\,8.5,\,9,\,9.5. Even so, there is a definite gap between the sizes. A shoe won't ever come in size 8.145.

Continuous numerical data involves data points that can occur anywhere along a continuum. Any value is possible within a range of values. Continuous data often involves the use of decimal numbers, and is often collected using some form of measurement.

Examples of continuous data
  • Height of trees in a forest (in metres): 12.359, 14.022, 14.951, 18.276, 11.032, ...

  • Times taken to run a 10 km race (minutes): 55.34, 58.03, 57.25, 61.49, 66.11, 59.87, ...

  • Daily temperature (°C): 24.4, 23.0, 22.5, 21.6, 20.7, 20.2, 19.7, ...

In practice, continuous data will always be subject to the accuracy of the measuring device being used. So in some sense, continuous data will always appear discrete to some degree, which can blur the distinction between continuous and discrete data. To decide whether data is continuous, think about the quantity being measured rather than how it was recorded.

For instance, it makes sense that time is continuous, even though we can only measure it to the nearest second (or millisecond, and so on).
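The effect of measuring precision can be sketched in a few lines of Python. This is only an illustration: the "true" race times below are made-up values, and the stopwatch is assumed to read to the nearest 0.01 of a minute.

```python
# Hypothetical "true" race times in minutes (made-up values for illustration).
true_times = [55.3412, 58.0279, 57.2541]

# Assumed stopwatch precision: readings are rounded to 2 decimal places,
# so only multiples of 0.01 can ever be recorded.
recorded = [round(t, 2) for t in true_times]

print(recorded)
```

The recorded values have gaps of 0.01 between them, so they look discrete, but the underlying quantity (time) can still take any value in a range and is therefore continuous.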

The word 'ordinal' means 'ordered'. Ordinal categorical data involves data points, consisting of words or labels, that can be ordered or ranked in some way.

Examples of ordinal data
  • Product rating on a survey: good, satisfactory, good, excellent, excellent, good, good, ...

  • Exam grades: A, C, A, B, B, C, A, B, A, A, C, B, A, B, B, B, C, A, C, ...

  • Size of fish in a lake: medium, small, small, medium, small, large, medium, large, ...

Another example of ordinal data is level of achievement (high distinction, distinction, credit, pass, fail).

The word 'nominal' relates to the word 'name'. Nominal categorical data consists of words or labels that name individual data points and have no clear rank order.

Examples of nominal data
  • Nationalities in a team: German, Austrian, Italian, Spanish, Dutch, Italian, ...

  • Make of car driving through an intersection: Toyota, Holden, Mazda, Toyota, Ford, Toyota, Mazda, ...

  • Hair colour of students in a class: blonde, red, brown, blonde, black, brown, black, red, ...

Nominal data is often described as unordered because it can't be ordered in a way that is obviously meaningful.
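The difference between ordinal and nominal data can be illustrated with a short Python sketch. It reuses the survey ratings and car makes from the examples above. The `rank` dictionary is an assumption added for illustration: a computer does not know that 'good' sits between 'satisfactory' and 'excellent' unless we tell it.

```python
from collections import Counter

# Ordinal data: the labels can be ranked, but we must supply the ranking.
ratings = ["good", "satisfactory", "good", "excellent", "excellent", "good", "good"]
rank = {"satisfactory": 1, "good": 2, "excellent": 3}  # assumed ranking
sorted_ratings = sorted(ratings, key=rank.get)

# Nominal data: there is no meaningful order, so we can only count
# how often each label occurs.
makes = ["Toyota", "Holden", "Mazda", "Toyota", "Ford", "Toyota", "Mazda"]
counts = Counter(makes)

print(sorted_ratings)
print(counts)
```

Sorting the car makes alphabetically is possible, but that order says nothing about the cars themselves, which is why nominal data is treated as unordered.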

Examples

Example 1

Which one of the following data types is discrete?

A. Your height
B. The time it takes to swim 200 metres
C. Daily temperature
D. The number of pets in your family
Worked Solution
Create a strategy

Choose the option whose values can be counted and are distinct and separate from each other.

Apply the idea

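Height, swimming time and temperature are all measured, so they are continuous. The number of pets is counted and can only take separate whole-number values, so it is discrete. The correct answer is option D: The number of pets in your family.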

Example 2

Classify this data into its correct category: Weights of kittens

A. Quantitative Discrete
B. Qualitative Nominal
C. Quantitative Continuous
D. Qualitative Ordinal
Worked Solution
Create a strategy

Determine if the data is numerical or in categories.

Apply the idea

The weight of a kitten can be measured, so it is numerical (quantitative).

Weight is a measurement that can have any number of decimal places, so it is continuous. The correct answer is Option C.

Idea summary
  • Numerical data is where each data point is represented by a number. The data can be further defined as either discrete (associated with counting) or continuous (associated with measuring). Numerical data is also known as quantitative data.

  • Categorical data is where each data point is represented by a word or label. The data can be further defined as either ordinal (it can be ordered) or nominal (un-ordered). Categorical data is also known as qualitative data.

Level of measurement

To perform statistical analysis of data, it is important to understand what statistics can be meaningfully calculated, interpreted, and compared from a given set of data. We can apply different levels of measurement depending on the properties of data we have.

The four widely applied levels of measurement are:

  • Nominal scale - describes a variable with categories that do not have a natural order or ranking. Examples include employment status, blood type, or eye colour.

  • Ordinal scale - describes a variable with categories together with an explicit ranking. Examples include customer satisfaction rating (very low, low, average, high, very high), or exam grades (A, B, C, D, E, F).

  • Interval scale - is a numerical scale that orders the values and has equal distances between adjacent values on the scale. Equal spacing allows for meaningful interpretation of differences between values. However, the scale lacks a true zero. Examples include temperature in degrees Celsius, pH scale, or dates.

  • Ratio scale - is a numerical scale with all the properties of an interval scale plus the inclusion of a true zero. A true zero allows for meaningful interpretation of ratios of values, and using ratios we can compare the magnitude of values. Examples include temperature measured in kelvin, weight, or speed.

The significant difference between an interval scale and a ratio scale is the inclusion of a 'true zero' or 'absolute zero'. A true zero means the absence of the quantity being measured.

Interval scales such as temperature in degrees Celsius do not have a true zero. 0\degree\text{C} does not mean the absence of heat; the zero of this scale is arbitrary. This means that while we can compare the difference between two values, we cannot meaningfully compare their ratio. For example, we can say that 30\degree\text{C} is 20\degree\text{C} more than 10\degree\text{C}, but we cannot say that 30\degree\text{C} is three times as hot as 10\degree\text{C}.

Ratio scales such as temperature in kelvin, weight, or speed all contain a zero point that is absolute. For example, 0 \text{ K} does indeed mean the absence of heat and 0 \text{ kg} means weightless. The zero of the scale is not an arbitrary number. With ratio data, not only can you meaningfully measure distances between data points (i.e. add and subtract), you can also meaningfully multiply and divide. For example, 40 \text{ km/h} is indeed twice as fast as 20 \text{ km/h}.
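To see why ratios mislead on an interval scale, the two temperatures from the Celsius example can be converted to kelvin. The sketch below is illustrative only. The function name `celsius_to_kelvin` is our own, and it uses the standard conversion K = °C + 273.15.

```python
def celsius_to_kelvin(c):
    """Convert degrees Celsius to kelvin using K = °C + 273.15."""
    return c + 273.15

warm_c, cool_c = 30.0, 10.0

# Differences agree on both scales, so they are meaningful on either one.
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# The Celsius "ratio" depends on where the arbitrary zero sits;
# the kelvin ratio uses a true zero.
naive_ratio = warm_c / cool_c
true_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(diff_c, round(diff_k, 2))
print(naive_ratio, round(true_ratio, 3))
```

The difference is 20 on both scales, but the ratio drops from 3 in degrees Celsius to about 1.07 in kelvin. On the ratio scale, 30 °C is only about 7% hotter than 10 °C, not three times as hot.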

We can compare how the properties of each scale lend themselves to be used to calculate and compare statistics:

Property                                                        Nominal  Ordinal  Interval  Ratio
Categorises the values                                          Y        Y        Y         Y
Ranks values in order                                           -        Y        Y         Y
Frequency distribution                                          Y        Y        Y         Y
Mode                                                            Y        Y        Y         Y
Median                                                          -        Y        Y         Y
Mean                                                            -        -        Y         Y
Can say one value is _ units greater or smaller than the other  -        -        Y         Y
Can say one value is _ times greater or smaller than the other  -        -        -         Y

Note: we can calculate the mean of ordinal data by assigning numbers to the outcomes. For example, for a survey asking for a customer satisfaction rating of very low, low, average, high, very high, we could assign the numbers 1 to 5 to the outcomes and calculate the mean level of satisfaction reported. However, it is debatable whether this can be meaningfully interpreted, as the original categories are not separated into equal intervals.
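The coding described in the note can be sketched in Python using the standard `statistics` module. The survey responses here are made-up values for illustration, and the `codes` dictionary simply assigns 1 to 5 to the five outcomes.

```python
from statistics import mean, median, mode

# Hypothetical survey responses (made-up values for illustration).
responses = ["high", "low", "average", "high", "very high", "average", "high"]

# Assign the numbers 1 to 5 to the outcomes, as described in the note.
codes = {"very low": 1, "low": 2, "average": 3, "high": 4, "very high": 5}
scores = [codes[r] for r in responses]

# The mode needs no numbers at all; it works directly on the labels.
print(mode(responses))

# The median only relies on the order of the categories.
print(median(scores))

# The mean also assumes the categories are equally spaced, which is debatable.
print(mean(scores))
```

Here the mode is 'high' and the median score is 4 (also 'high'), while the mean is about 3.57, a value that falls between 'average' and 'high' and only makes sense if the gaps between categories are treated as equal.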

Examples

Example 3

Classify this data into its correct category: Population of your town

A. Nominal
B. Ordinal
C. Interval
D. Ratio
Worked Solution
Create a strategy

Use the table:

Property                                                        Nominal  Ordinal  Interval  Ratio
Categorises the values                                          Y        Y        Y         Y
Ranks values in order                                           -        Y        Y         Y
Frequency distribution                                          Y        Y        Y         Y
Mode                                                            Y        Y        Y         Y
Median                                                          -        Y        Y         Y
Mean                                                            -        -        Y         Y
Can say one value is _ units greater or smaller than the other  -        -        Y         Y
Can say one value is _ times greater or smaller than the other  -        -        -         Y

Apply the idea

We can compare populations through a ratio of values. For example, a population of 2000 is twice as large as a population of 1000. A population of 0 also means there are no people at all, so the scale has a true zero. The correct answer is Option D.

Idea summary
  • Nominal scale - describes a variable with categories that do not have a natural order or ranking.

  • Ordinal scale - describes a variable with categories together with an explicit ranking.

  • Interval scale - is a numerical scale that orders the values and has equal distances between adjacent values on the scale. Equal spacing allows for meaningful interpretation of differences between values. However, the scale lacks a true zero.

  • Ratio scale - is a numerical scale with all the properties of an interval scale plus the inclusion of a true zero. A true zero allows for meaningful interpretation of ratios of values, and using ratios we can compare the magnitude of values.

Outcomes

U1.AoS1.1

types of data, including categorical (nominal or ordinal) or numerical (discrete and continuous)
