What Is Standard Deviation? — From First Principles to the Unbiased Formula

The starting point

What Is Variation in Data?

Suppose someone asks: "What is the average height of high school students?" You would not measure just one student and call it done. You would measure several people and compute the average — because every individual is different. That difference is variation.

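Fig.01 — Heights of five students and their deviations (mean ≈ 169 cm)

Student	Height (cm)	Deviation (value − mean)	Deviation²
A	170	+1	1
B	165	−4	16
C	160	−9	81
D	172	+3	9
E	175	+6	36
Mean ≈ 169 cm (exact: 168.4 cm)		Sum = −3 (0 with the exact mean)	Sum = 143

Deviation = individual value − mean. The mean is rounded to 169 cm here to keep the arithmetic simple, so the deviations in the table sum to −3. Measured from the exact mean of 168.4 cm they sum to exactly zero, as deviations always do, which is why simple averaging does not work.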

The deviation for each individual is their personal value minus the overall mean. Standard deviation is something close to "the average of these deviations" — but there is a problem: if you add the deviations directly, they always cancel out to zero. A fix is needed (explained below).

Why the simple average of deviations is always zero

By definition, the mean is the balance point of all values. When you subtract it from every data point and add the results, the positives and negatives cancel exactly — the sum is always zero, for any dataset. This is a mathematical certainty, not a coincidence. It means "sum all deviations and divide by n" can never describe spread — it always yields zero.
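The cancellation can be written out in one line. This is a standard identity, shown here with the same symbols the article uses later (xᵢ for each value, x̄ for the mean, n for the number of values):

```latex
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0,
\qquad \text{since } \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i .
```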

Practical value

What Can You Do with Variation?

① Compare groups with a single number

Two classes can share the same mean height yet have completely different characters depending on their spread.

Fig.02 — Same mean, very different spread

Both classes have mean 169 cm. Class 1 is a tight cluster; Class 2 spans a wide range. The mean alone does not distinguish them — standard deviation does.
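As a rough illustration, the sketch below uses two made-up classes and Python's standard `statistics` module. The heights are hypothetical, chosen only so that both means equal 169 cm; they are not the data behind Fig.02.

```python
import statistics

# Hypothetical heights in cm; both lists are invented for illustration.
class_1 = [167, 168, 169, 170, 171]   # tight cluster
class_2 = [155, 162, 169, 176, 183]   # wide range

for name, heights in (("Class 1", class_1), ("Class 2", class_2)):
    mean = statistics.mean(heights)
    spread = statistics.pstdev(heights)  # divides by n: describes these values only
    print(f"{name}: mean = {mean} cm, standard deviation = {spread:.2f} cm")
```

Both classes report a mean of 169 cm, but the standard deviation is about 1.41 cm for the first and about 9.90 cm for the second.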

② Decide whether a difference in means is real

Class A has a mean height of 170 cm; Class B has 171 cm. Is that 1 cm gap meaningful?

If spread is small (e.g., σ = 0.5 cm) → the 1 cm gap is twice the spread. Likely a real difference.
If spread is large (e.g., σ = 10 cm) → the 1 cm gap is a tenth of the spread. Likely just noise.

Standard deviation converts that judgment from intuition into a number-backed decision.
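One simple way to put a number on that judgment is to express the gap between the means in units of σ. The sketch below does only that. The function name `gap_in_sigmas` is made up for this example, and a formal significance test would also take the sample sizes into account.

```python
def gap_in_sigmas(mean_a: float, mean_b: float, sigma: float) -> float:
    """Return the absolute gap between two means, measured in units of sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return abs(mean_a - mean_b) / sigma

# The two scenarios from the text: same 1 cm gap, different spread.
print(gap_in_sigmas(170, 171, 0.5))   # 2.0  (gap is twice the spread)
print(gap_in_sigmas(170, 171, 10))    # 0.1  (gap is a tenth of the spread)
```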

Same idea, different domain

Connection to Surface Roughness Ra

The logic of standard deviation — "collect all deviations from a mean and express them as one number" — applies to anything that has "unevenness" or "scatter." Surface texture is one such thing.

Think of a machined surface as a series of high and low points. Each point's distance from the mean plane is a deviation. Aggregate those deviations and you get a number that describes how rough the surface is.

Surface roughness Ra (JIS B 0601)

Ra — the arithmetic mean deviation of a profile — is the average of the absolute distances between each profile point and the mean line, measured over a sampling length. It is a close relative of standard deviation: the same "average deviation from a reference" concept, using absolute values instead of squares to keep the unit linear. Whenever you wonder how to turn a visual observation of "unevenness" into a number, the standard deviation idea is the place to start.
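To make the parallel concrete, here is a minimal sketch that computes an Ra-style value and, for comparison, the root-mean-square version from the same profile. The profile heights are invented, the mean line is simply taken as the arithmetic mean of the points, and the filtering and sampling-length rules that the standard prescribes are left out. Treat it as an illustration of the idea, not a conforming Ra calculation.

```python
import math

def mean_abs_deviation(profile):
    """Average absolute distance from the mean line (the idea behind Ra)."""
    mean_line = sum(profile) / len(profile)
    return sum(abs(z - mean_line) for z in profile) / len(profile)

def rms_deviation(profile):
    """Root-mean-square distance from the mean line (same form as sigma with n)."""
    mean_line = sum(profile) / len(profile)
    return math.sqrt(sum((z - mean_line) ** 2 for z in profile) / len(profile))

# Hypothetical profile heights in micrometres, made up for this example.
profile_um = [0.8, -0.4, 1.2, -1.0, 0.2, -0.8]

print(f"mean absolute deviation: {mean_abs_deviation(profile_um):.3f} um")
print(f"RMS deviation:           {rms_deviation(profile_um):.3f} um")
```

The RMS value is never smaller than the mean absolute value, because squaring gives more weight to the largest peaks and valleys.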

Step-by-step derivation

Building the Formula — Why This Equation?

"If averaging deviations would describe spread, why not just do that?" Because the sum is always zero. Here is the fix, one step at a time.

Step 1

Calculate each deviation

Subtract the mean from each individual value.

deviation = individual value − mean

Step 2

Square each deviation to eliminate negatives

Summing raw deviations gives zero because positives and negatives cancel. Squaring makes every deviation non-negative, so nothing cancels. (Taking absolute values would also work, but squaring is standard because it is mathematically tractable in later analysis.)

deviation² = (individual value − mean)²

Step 3

Sum all squared deviations → Sum of Squares

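Add up all the squared deviations. This total is called the sum of squares (SS). In the example above, with deviations taken from the rounded mean of 169 cm: 1 + 16 + 81 + 9 + 36 = 143. (Measured from the exact mean of 168.4 cm the sum would be 141.2; the steps below continue with 143.)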

SS = Σ(xᵢ − x̄)²

Step 4

Divide by n → Variance

Dividing the sum of squares by the number of data points gives the variance, the average squared deviation. For the example: 143 ÷ 5 = 28.6 cm². Variance is a statistically powerful quantity, but its unit is squared (cm², mm², etc.), so it cannot be read on the same scale as the original measurements.

variance = SS ÷ n = 28.6 cm²

Step 5

Take the square root → Standard Deviation

The square root undoes the squaring from Step 2, restoring the original unit. √28.6 ≈ 5.35 cm. This is the standard deviation.

σ = √variance = √28.6 ≈ 5.35 cm

Standard deviation formula (dividing by n)

σ = √[ Σ(xᵢ − x̄)² ÷ n ]

xᵢ : each individual data value · x̄ : mean · n : number of data points · Σ : sum over all data points

Fig.03 — The calculation chain

Five steps: deviation → square → sum of squares → variance → standard deviation.
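The five steps can be sketched in a few lines of Python. The function name `std_dev_steps` is invented for this example, and the input is the five heights from Fig.01. Because the code uses the exact mean of those heights (168.4 cm) rather than the rounded 169 cm, its results (sum of squares 141.2, variance 28.24, standard deviation about 5.31 cm) come out slightly below the worked figures in the text.

```python
import math

def std_dev_steps(values):
    """Walk through the five steps and return every intermediate result."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]        # Step 1: value minus mean
    squared = [d ** 2 for d in deviations]         # Step 2: square each deviation
    sum_of_squares = sum(squared)                  # Step 3: sum of squares (SS)
    variance = sum_of_squares / n                  # Step 4: divide by n
    sigma = math.sqrt(variance)                    # Step 5: square root
    return {"mean": mean, "deviations": deviations,
            "sum_of_squares": sum_of_squares,
            "variance": variance, "sigma": sigma}

heights_cm = [170, 165, 160, 172, 175]  # the five students in Fig.01
result = std_dev_steps(heights_cm)
for key, value in result.items():
    print(key, value)
```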

Estimating the population

Unbiased Standard Deviation (n−1)

In real data analysis, you almost never measure the entire population. You measure a sample and estimate the properties of the whole. You cannot measure every high school student in the country — you measure a few hundred and infer from them.

Fig.04 — Population vs. sample

A sample's standard deviation is mathematically biased to be smaller than the true population σ. Dividing by n−1 instead of n corrects for this underestimation.

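It is a mathematical fact that a standard deviation calculated from a sample by dividing by n tends to underestimate the true population standard deviation. The reason is that the deviations are measured from the sample's own mean, which sits at least as close to the sample values as the true population mean does, so the sum of squares comes out too small. To correct this, the denominator is changed from n to n−1. Making the denominator smaller makes the result larger, pushing the estimate closer to the true value.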
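A small simulation makes the underestimation visible. The sketch below is an illustration under assumed conditions, not part of the original analysis: it draws many samples of five values from a normal population whose true variance is 1, then averages the two versions of the variance. The population, sample size, trial count, and seed are all arbitrary choices.

```python
import random

def average_variances(n=5, trials=20000, seed=0):
    """Average the /n and /(n-1) variance over many random samples."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    total_div_n = 0.0
    total_div_n_minus_1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]   # true variance = 1
        mean = sum(sample) / n
        ss = sum((x - mean) ** 2 for x in sample)
        total_div_n += ss / n
        total_div_n_minus_1 += ss / (n - 1)
    return total_div_n / trials, total_div_n_minus_1 / trials

avg_n, avg_n_minus_1 = average_variances()
print(f"average variance dividing by n:   {avg_n:.3f}")
print(f"average variance dividing by n-1: {avg_n_minus_1:.3f}")
```

With five values per sample, the divide-by-n average should land near 0.8, that is (n−1)/n of the true variance, while the divide-by-(n−1) average should land near 1.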

Dividing by n: Standard Deviation σ

σ = √[ Σ(xᵢ−x̄)² ÷ n ]

Use when:

Describing the data you have (not estimating a broader population)
You measured the entire population (100% inspection)

Excel: STDEV.P

Dividing by n−1 (recommended): Unbiased Standard Deviation s

s = √[ Σ(xᵢ−x̄)² ÷ (n−1) ]

Use when:

Estimating the population from a sample
Virtually all real-world quality analysis

Excel: STDEV.S

Why "unbiased"?

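Dividing by n produces a variance estimate that is systematically low: it has a downward bias. Dividing by n−1 removes that bias from the variance, which is where the name unbiased (Japanese: 不偏, fuhen) comes from. Strictly speaking, the square root s still runs slightly low on average, but typically by much less than the divide-by-n version. In practice, the difference between the two divisors is small when n is large, but it matters for the small samples common in quality work.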

Practical tools

Excel Functions

Standard deviation is easy to compute in Excel. Understanding what each function actually does — after building the formula from scratch — makes the choice obvious.

Function	Formula used	When to use
STDEV.S	÷ (n−1) — unbiased	Recommended for almost all practical use. Estimates population from a sample.
STDEV.P	÷ n	Use only when your data IS the entire population (full inspection, descriptive stats only).
STDEV	÷ (n−1) — same as STDEV.S	Legacy function kept for backwards compatibility. Identical behavior to STDEV.S.
VAR.S / VAR.P	Returns variance (before √)	When you need the variance itself. Equal to STDEV.S² or STDEV.P².
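For readers who work outside Excel, Python's standard `statistics` module offers the same two conventions. The pairing in the comments reflects the divisor each function uses, and the data are the five heights from Fig.01. Because these functions use the exact mean (168.4 cm), the divide-by-n result is about 5.31 cm rather than the 5.35 cm worked out earlier from the rounded mean.

```python
import statistics

heights_cm = [170, 165, 160, 172, 175]  # the five students in Fig.01

s_sample = statistics.stdev(heights_cm)            # divides by n-1, like STDEV.S
s_population = statistics.pstdev(heights_cm)       # divides by n,   like STDEV.P
var_sample = statistics.variance(heights_cm)       # divides by n-1, like VAR.S
var_population = statistics.pvariance(heights_cm)  # divides by n,   like VAR.P

print(f"stdev  (n-1): {s_sample:.3f} cm")
print(f"pstdev (n):   {s_population:.3f} cm")
```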

When in doubt

Use STDEV.S (or the legacy STDEV). Full inspection is rare in manufacturing; almost every quality dataset is a sample drawn from a broader process or lot. STDEV.S is the right choice by default. The difference between .S and .P is negligible for large samples anyway.
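How quickly the two versions converge can be checked directly: for the same data, s divided by σ equals √(n ÷ (n−1)), which depends only on n. The short sketch below tabulates that ratio for a few sample sizes chosen for illustration; the function name `s_over_sigma` is made up for this example.

```python
import math

def s_over_sigma(n: int) -> float:
    """Ratio of the n-1 standard deviation to the n standard deviation."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return math.sqrt(n / (n - 1))

for n in (3, 5, 10, 30, 100, 1000):
    excess_percent = (s_over_sigma(n) - 1) * 100
    print(f"n = {n:4d}: STDEV.S is {excess_percent:.2f}% larger than STDEV.P")
```

At n = 5 the gap is roughly 12%; by n = 100 it has shrunk to about 0.5%.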

Key takeaways

Summary

Point 01

Deviations always sum to zero

The simple average of deviations is always zero — which is why squaring (then taking the square root) is necessary to describe spread.

Point 02

Variance → √ → Standard Deviation

Variance is the average squared deviation. Taking the square root restores the original unit. That result is standard deviation.

Point 03

Use n−1 in practice (STDEV.S)

Dividing a sample's sum of squares by n underestimates population spread. Dividing by n−1 corrects the bias in the variance. Use STDEV.S in Excel for virtually all shop floor analysis.

Point 04

Ra is a cousin of σ

Surface roughness Ra uses the same "average deviation from a reference line" idea. The standard deviation mindset applies anywhere unevenness needs a number.

Memorizing the formula is the starting point, not the destination. Understanding why we square, why we take the square root, and why n−1 corrects for sampling bias opens the door to applying this logic wherever scatter, unevenness, or roughness needs to be quantified. Next time you encounter "variation" — on a chart, on a surface, or in a dataset — you have the tools to turn it into a single, defensible number.

Jaw

Based in Shiga Prefecture, Japan. A 36-year veteran of quality control and precision measurement in the automotive parts industry, specializing in CMM measurement of cylinder blocks and crankshafts, as well as surface texture analysis. Now in a management role focused on developing the next generation of measurement engineers. This blog, Gemba no Memori, shares practical measurement and quality knowledge from the shop floor.
