GCSE specification fit
Scatter Graphs and Correlation is part of GCSE Maths Statistics.
Use scatter graphs to plot paired data, describe correlation, draw a line of best fit and judge whether an estimate is sensible. Questions may ask for a description in context, an interpolated value or an explanation of why correlation is not the same as causation.
What you will learn
Why this matters
Scatter graphs help you spot relationships in real data, but interpreting them needs care: a pattern in the points does not show that one variable causes the other.
Prior knowledge
You should already be comfortable with:
Clear explanation
Main idea
A scatter graph shows paired data. Positive correlation means the points tend to rise as you move left to right. Negative correlation means they tend to fall. No correlation means there is no clear pattern.
Method
Plot each pair of values as a point, reading carefully from both scales. Describe the trend, then, if an estimate is needed, draw a line of best fit through the middle of the points. Use the line mainly for interpolation, that is, for estimates inside the range of the data.
If one point is far away from the pattern, call it an outlier or unusual point and decide whether it should affect the best-fit line. Extrapolation, where you estimate outside the plotted range, can be useful for a rough prediction but is much less reliable than interpolation.
Exam tip
Say what the correlation means in the context of the question. Do not write that one variable causes the other unless the question gives extra evidence.
Worked examples
Using a line of best fit
A scatter graph compares revision time with test score. A line of best fit passes through about (2, 38) and (6, 70).
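The two points read from the line are enough to work out its gradient and to read off an estimate. The following is a minimal Python sketch of that arithmetic, not a set method: the function names `line_through` and `estimate` are illustrative, and x = 4 is a hypothetical value chosen inside the range of the two points because this example does not state which value to estimate.

```python
def line_through(p1, p2):
    """Return (gradient, intercept) of the straight line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    gradient = (y2 - y1) / (x2 - x1)   # rise divided by run
    intercept = y1 - gradient * x1     # value of y where x = 0
    return gradient, intercept


def estimate(x, gradient, intercept):
    """Read an approximate y-value off the line for a given x."""
    return gradient * x + intercept


# Points read from the line of best fit in the worked example.
gradient, intercept = line_through((2, 38), (6, 70))

# x = 4 is a hypothetical value, chosen inside the range of the two points.
score_at_4 = estimate(4, gradient, intercept)
print(f"gradient = {gradient}, intercept = {intercept}, estimate at x = 4: about {score_at_4}")
```

Here the line rises 32 marks over 4 units of revision time, so the gradient is 8 marks per unit and the estimate at x = 4 is about 54. On paper you would usually read the value straight off the graph and write "about".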
Spotting a weak estimate
Data were plotted for x-values from 4 to 16. A pupil uses the line of best fit to predict y when x = 24.
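The key check here is whether the x-value lies inside the plotted range. A minimal Python sketch of that check follows; the function name `estimate_type` is an illustrative assumption, and x = 10 is a hypothetical value added only for contrast.

```python
def estimate_type(x, x_min, x_max):
    """Classify an estimate by where x sits relative to the plotted x-values."""
    if x_min <= x <= x_max:
        return "interpolation"   # inside the plotted range: more reliable
    return "extrapolation"       # outside the plotted range: less reliable


# Plotted x-values run from 4 to 16; the pupil predicts at x = 24.
print(estimate_type(24, 4, 16))

# x = 10 is a hypothetical value inside the range, shown for contrast.
print(estimate_type(10, 4, 16))
```

Because 24 is well beyond 16, the pupil's prediction is extrapolation and should be treated as a rough guide only: the trend may not continue outside the data that were collected.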
Judging a best-fit line
A pupil draws a scatter graph with positive correlation. Their line of best fit goes through the origin, but all the plotted points lie between x = 3 and x = 9 and cluster around y = 20 to y = 55.
Quick checks
1. Points rise from left to right. What correlation is shown?
2. An estimate beyond the plotted data range is called what?
Practice questions
Question 1
A scatter graph of outdoor temperature and ice-cream sales slopes upward from left to right. Describe the correlation in context.
Answer: Positive correlation: as outdoor temperature increases, ice-cream sales tend to increase.
Marking: Name the correlation and refer to both variables in context.
Question 2
The plotted revision times range from 1 hour to 7 hours. Is using the best-fit line to estimate the score for 5 hours interpolation or extrapolation?
Answer: Interpolation.
Marking: 5 hours lies inside the plotted data range of 1 to 7 hours.
Question 3
A tutor draws a line of best fit for revision time against test score. At 3 hours, the line reads 48 marks. What estimate should you write, and why should it be approximate?
Answer: About 48 marks.
Marking: Give an approximate value, because it is read from a line of best fit on a graph rather than calculated from exact data.
Question 4
A scatter graph shows positive correlation between shoe size and reading age in a primary school. Why should you avoid saying bigger shoes cause better reading?
Answer: Correlation does not prove causation; age could explain both variables.
Marking: State that another factor may cause the pattern.
Question 5
A science class plots hours spent revising against quiz score. The line of best fit passes through about (2, 30) and (8, 66). Estimate the score when x = 5 hours.
Answer: About 48.
Marking: The line rises 36 over 6 x-units, so the gradient is about 6 per x-unit. From x = 2 to x = 5 is 3 units, so 30 + 3 × 6 = 48.
Question 6
The plotted x-values run from 10 to 40. A pupil uses the line of best fit to estimate y when x = 55. What should they say about reliability?
Answer: The estimate is extrapolation and is less reliable because 55 is outside the plotted data range.
Marking: Name extrapolation and explain that the estimate goes beyond the data shown.
Question 7
A line of best fit passes through about (4, 18) and (10, 42). Estimate y when x = 7.
Answer: About 30.
Marking: The rise is 24 over 6 x-units, so the gradient is about 4. From x = 4 to x = 7 is 3 units, so 18 + 3 × 4 = 30.
Question 8
Most points follow a downward trend, but one point is far above the rest. What should you call that point?
Answer: An outlier or unusual point.
Marking: Use the accepted term (outlier or unusual point) and say that the point does not fit the main pattern.
Question 9
A scatter graph of car age and resale value slopes downward from left to right. Describe the correlation in context.
Answer: Negative correlation: as car age increases, resale value tends to decrease.
Marking: Name negative correlation and link both variables to the real context.
Question 10
A line of best fit passes through about (10, 34) and (50, 74). Estimate y when x = 35, and say whether this is interpolation or extrapolation if the plotted x-values run from 10 to 50.
Answer: About 59; this is interpolation.
Marking: The line rises 40 over 40 x-units, so the gradient is about 1. From x = 10 to x = 35 is 25 units, so 34 + 25 = 59. The estimate is inside the plotted range.
Answers and marking guidance
The answer to each practice question is given directly under it, so try each question before reading its answer. For scatter graphs, marks usually come from accurate plotting, a clear trend description, a sensible best-fit line and contextual interpretation. Use “about” for graph readings, distinguish interpolation from extrapolation, and avoid causal claims unless the question gives evidence for them.
Common mistakes
- Forgetting the context: write “as temperature increases, sales tend to increase”, not just “positive”.
- Forcing a line through every point: a best-fit line follows the overall trend with points roughly balanced either side.
- Trusting extrapolation too much: estimates outside the data range are usually less reliable.
- Claiming causation: correlation shows association, not proof that one variable causes the other.
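The idea of points being "roughly balanced either side" of a best-fit line can be checked with a simple count. The Python sketch below is one rough way to do that, under stated assumptions: the function name `count_sides` is illustrative, the seven data points are made up for this example, and the line y = 8x + 22 is the one through (2, 38) and (6, 70) from the first worked example.

```python
def count_sides(points, gradient, intercept):
    """Count points above, below and on the line y = gradient * x + intercept."""
    above = below = on_line = 0
    for x, y in points:
        line_y = gradient * x + intercept   # height of the line at this x
        if y > line_y:
            above += 1
        elif y < line_y:
            below += 1
        else:
            on_line += 1
    return above, below, on_line


# Hypothetical (revision time, test score) readings, made up for illustration.
points = [(1, 32), (2, 36), (3, 48), (4, 52), (5, 64), (6, 68), (7, 80)]

# Line through (2, 38) and (6, 70): gradient 8, intercept 22.
above, below, on_line = count_sides(points, 8, 22)
print(f"above: {above}, below: {below}, on the line: {on_line}")
```

With these made-up points the line has four points above it and three below, and passes through none of them, yet it still follows the trend. A balanced count is only a rough check: the line also needs to follow the direction of the points along its whole length.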
Extension challenge
Create a GCSE-style question on scatter graphs and correlation, solve it, then write one sentence explaining why your method works.
Example answer: A good answer includes a correct method, a checked final answer and a short reason using the key vocabulary from this lesson.
Exam-board guidance
Scatter Graphs and Correlation appears within the statistics content of each GCSE Maths exam board listed below. The shared skill is to plot paired data accurately, describe the relationship, use a line of best fit sensibly and interpret the result in context.
AQA GCSE Maths
Plot paired data accurately, describe the type and strength of correlation, use a best-fit line for interpolation, and explain that correlation does not prove one variable causes the other. Treat estimates outside the plotted range as less reliable.
OCR GCSE Maths
Give a contextual sentence, not just “positive” or “negative”; check whether the estimate is inside the plotted data range before trusting it.
Pearson Edexcel GCSE Maths
Draw points carefully from the scales, balance the line of best fit through the trend, and avoid statements that claim causation from correlation alone.
Eduqas GCSE Maths
Expect questions that ask for a description in context, a line-of-best-fit estimate, or a comment on reliability of an estimate. Say whether an unusual point affects the trend.
WJEC Wales
Link the correlation to the real variables in the question and be cautious with extrapolation beyond the data shown.
CCEA GCSE Maths
Focus on accurate plotting, clear correlation language, strength of relationship, and whether a best-fit estimate is sensible for the given data range.
Next lesson
Next, continue with Cumulative Frequency.