Australia VIC
VCE 11 General 2023

7.02 Scatterplots

Lesson

Introduction

When given bivariate data as a table of values, a scatterplot can be created to graph the data, where the explanatory variable is shown on the horizontal axis and the response variable is shown on the vertical axis. In this way, each data point is displayed as a point in a two-dimensional coordinate system.

Correlation and scatter plots

An association between two variables is known as a correlation. A correlation may (or may not) signify a causal relationship between the two variables. To identify any correlation between the two variables, there are three things to focus on when analysing a scatterplot:

  • Direction

  • Form

  • Strength

The direction of the scatterplot refers to the pattern shown by the data points. We can describe the direction of the pattern as having positive correlation, negative correlation, or no correlation:

  • Positive correlation

    • A positive correlation occurs when the response variable (RV) increases as the explanatory variable (EV) increases.

    • From a graphical perspective, this occurs when the y-coordinate increases as the x-coordinate increases, which is similar to a line with a positive gradient.

  • Negative correlation

    • A negative correlation occurs when the response variable decreases as the explanatory variable increases.

    • From a graphical perspective, this occurs when the y-coordinate decreases as the x-coordinate increases, which is similar to a line with a negative gradient.

  • No correlation

    • No correlation describes a data set that has no relationship between the variables.

    • This can come in the form of totally unrelated data, or data that indicates no change in the response variable as the explanatory variable changes (like a horizontal straight line, which has zero gradient).

The form of a scatterplot refers to the type of relationship the two variables may appear to share. For example, if the data points lie on or close to a straight line, the scatterplot has a linear form.

Forms other than a line may be apparent in a scatterplot. If the data points lie on or close to a curve, it may be appropriate to infer a non-linear form between the variables.

The strength of a linear correlation relates to how closely the points resemble a straight line.

  • If the points lie exactly on a straight line then we can say that there is a perfect correlation.

  • If the points are scattered randomly then we can say there is no correlation.

Most scatterplots will fall somewhere in between these two extremes and will display a weak, moderate or strong correlation.
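This lesson describes strength in words only. As a hedged illustration, one common numerical summary is Pearson's correlation coefficient, which is not part of this lesson. The sketch below computes it for three small hypothetical data sets (the function name `pearson_r` and all data values are made up for illustration). Values close to 1 or -1 mean the points lie close to a straight line, and values close to 0 mean there is little linear pattern.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data sets, invented for this sketch.
xs = [1, 2, 3, 4, 5]
perfect_positive = pearson_r(xs, [2, 4, 6, 8, 10])   # points on a rising line
perfect_negative = pearson_r(xs, [10, 8, 6, 4, 2])   # points on a falling line
scattered = pearson_r(xs, [3, 1, 4, 1, 3])           # no linear pattern

print(perfect_positive, perfect_negative, scattered)
```

The sign of the result matches the direction of the correlation, and its size reflects the strength.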

A perfect positive correlation graph where the data points line up on a straight line with a positive gradient.
A perfect negative correlation graph where the data points line up on a straight line with a negative gradient.
A strong positive correlation graph where the points are close to a straight line with a positive gradient.
A strong negative correlation graph where points are close to a straight line with a negative gradient.
A weak positive correlation graph where the relationship is still positive but the points do not lie on a line.
A weak negative correlation graph where the relationship is still negative but the points do not lie on a line.
A no correlation graph where data points are randomly scattered in the graph.
A no correlation graph where data points are closely clustered and resemble a horizontal line.

Examples

Example 1

Create a scatter plot for the set of data in the table.

x    1    3    5    7    9
y    3    7    11   15   19
Worked Solution
Create a strategy

Draw the scatter plot by plotting each point from the table.

Apply the idea
A scatter plot of the five points from the table, with x on the horizontal axis and y on the vertical axis.

From the table of values, the ordered pairs to plot on the coordinate plane are (1, 3), (3, 7), (5, 11), (7, 15) and (9, 19).
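As a small sketch of the same step in code, the snippet below pairs each x-value from the table with its y-value, then computes the gradient (rise over run) between neighbouring points. The variable names are illustrative only.

```python
# Table of values from Example 1.
xs = [1, 3, 5, 7, 9]
ys = [3, 7, 11, 15, 19]

# Pair each x-value with its y-value to get the points to plot.
points = list(zip(xs, ys))

# Gradient between each pair of neighbouring points: rise over run.
gradients = [
    (y2 - y1) / (x2 - x1)
    for (x1, y1), (x2, y2) in zip(points, points[1:])
]

print(points)
print(gradients)
```

Every gradient equals 2, so these points lie exactly on a straight line with a positive gradient, which is a perfect positive correlation.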

Example 2

Identify the type of correlation in the following scatter plot.

A scatter plot with x from 1 to 11 on the horizontal axis and y from 1 to 9 on the vertical axis.
Worked Solution
Create a strategy

Consider whether the points lie approximately in a line with a positive or negative slope.

Apply the idea

As the x-values increase, the y-values increase. The points resemble a straight line with a positive slope so this is a strong positive correlation.

Example 3

Consider the two variables: eye color and IQ. Do you think there is a relationship between them?

Worked Solution
Create a strategy

Consider if a person's eye color has anything to do with their IQ.

Apply the idea

No, there is no relationship between them. A person's eye color does not affect their IQ in any way.

Example 4

The scatter plot shows the relationship between sea temperature and the amount of healthy coral.

A scatter plot showing a negative correlation between sea temperature on the x axis and coral on the y axis.
a

Describe the correlation between sea temperature and the amount of healthy coral.

Worked Solution
Create a strategy

Describe what happens to the coral (dependent variable) as the sea temperature (independent variable) increases.

Apply the idea

The sea temperature increases from left to right. We can see from the graph that the coral decreases (falls) from left to right.

So as the sea temperature increases, the amount of healthy coral decreases. There is a negative linear relationship between the variables.

b

Which variable is the dependent variable?

A
Level of healthy coral
B
Sea temperature
Worked Solution
Create a strategy

The dependent variable is placed on the vertical axis and is affected by the independent variable.

Apply the idea

The level of healthy coral is determined by sea temperature and is on the vertical axis, making it the dependent variable. So, the correct answer is A.

c

Which variable is the independent variable?

A
Level of healthy coral
B
Sea temperature
Worked Solution
Create a strategy

An independent variable is a variable that stands alone and is not changed by the other variables you are measuring.

Apply the idea

From the previous problem, we know that the level of healthy coral is the dependent variable, so this means the sea temperature is the independent variable. The correct answer is B.

Example 5

The following table shows the number of traffic accidents associated with a sample of drivers of different age groups.

Age    Accidents
20     41
25     44
30     39
35     34
40     30
45     25
50     22
55     18
60     19
65     17
a

Construct a scatter plot to represent the above data.

Worked Solution
Create a strategy

Draw the scatter plot by plotting each point from the table.

Apply the idea
A scatter plot with Age (from 20 to 70) on the horizontal axis and Accidents (from 20 to 45) on the vertical axis.

Age is the independent variable, so it goes on the horizontal axis. Accidents is the dependent variable, so it goes on the vertical axis.

So the first row from the table corresponds to the point (20,41) on the graph.

b

Is the correlation between a person's age and the number of accidents they are involved in positive or negative?

Worked Solution
Create a strategy

Check the trend of the data on the scatter plot.

Apply the idea

Based on the scatter plot, as one variable increases, the other one decreases. So the correlation between a person's age and the number of accidents is negative.
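The direction can also be checked numerically. The sketch below is one possible approach rather than the method used in this lesson: it sums the products of each point's deviations from the mean age and the mean number of accidents. A negative total means that above-average ages tend to pair with below-average accident counts. The function name `direction` is hypothetical.

```python
def direction(xs, ys):
    """Classify the direction of a correlation from the sign of the
    summed products of deviations from the means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "none"


# Data from the table in Example 5.
ages = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
accidents = [41, 44, 39, 34, 30, 25, 22, 18, 19, 17]

print(direction(ages, accidents))
```

For this data the total is negative, which agrees with the answer read from the scatter plot.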

c

Is the correlation between a person's age and the number of accidents they are involved in strong or weak?

Worked Solution
Create a strategy

Check how closely clustered the data points are.

Apply the idea

Since the points on the scatter plot closely follow a single line, the correlation is strong.

d

Which age group's data represent an outlier?

A
30-year-olds
B
None of them
C
65-year-olds
D
20-year-olds
Worked Solution
Create a strategy

Check on the scatter plot if any points are positioned away from the rest of the data.

Apply the idea

Based on the scatter plot, there is no outlying point. So, the correct answer is B.

Example 6

Consider the table of values that show four excerpts from a database comparing the income per capita of a country and the child mortality rate of the country. If a scatter plot was created from the entire database, what relationship would you expect it to have?

Income per capita    Child mortality rate
1 465                67
11 428               16
2 621                35
32 468               9
A
Strongly positive
B
No relationship
C
Strongly negative
Worked Solution
Create a strategy

Order the countries using income per capita from least to greatest.

Apply the idea

If we arrange the table in order of income we get:

Income per capita    Child mortality rate
1 465                67
2 621                35
11 428               16
32 468               9

As the income per capita increases the child mortality rate decreases, so there is a negative correlation between the two. So, the correct answer is C.
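The ordering step can be sketched in code as follows. The snippet sorts the four rows by income per capita and then checks whether the child mortality rate falls at every step. The variable names are illustrative only, and four rows are a small excerpt, so this only suggests the trend in the full database.

```python
# The four excerpts from Example 6: (income per capita, child mortality rate).
rows = [(1465, 67), (11428, 16), (2621, 35), (32468, 9)]

# Order the rows by income per capita, from least to greatest.
by_income = sorted(rows, key=lambda row: row[0])

# Read off the mortality rates in that order.
mortality = [rate for _, rate in by_income]

# True if each rate is lower than the one before it.
always_decreasing = all(a > b for a, b in zip(mortality, mortality[1:]))

print(by_income)
print(always_decreasing)
```

Here the mortality rates in income order are 67, 35, 16 and 9, so the rate decreases at every step.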

Idea summary

There are three things we focus on when analysing a scatterplot:

  • Form: linear or non-linear, what shape the data has

If it is linear:

  • Direction: positive or negative, whether a line drawn through the data has a positive or negative gradient

  • Strength: strong, moderate, weak - how tightly the points model a line

If there is no connection between the two variables we say there is no correlation.

A positive correlation is when the data appears to gather in a positive direction, similar to a straight line with a positive slope. The variables change in the same direction.

A negative correlation is when the data appears to gather in a negative direction, similar to a straight line with a negative slope. In other words, as one variable increases, the other one decreases.

When there is no relationship between the variables we say they have no correlation.

Outcomes

U2.AoS1.2

scatterplots and their use in identifying and describing the association between two numerical variables

U2.AoS1.5

use a scatterplot to describe an observed association between two numerical variables in terms of strength, direction and form
