Averages are everywhere: batting averages, the Dow Jones Industrial Average, grade point averages, the average temperature this time of year, the average age of death of ex-presidents, and so forth.
Average is probably the most commonly used statistical term. But like most things in statistics, sussing out the term’s true meaning can be tricky. We use it a lot, but we often use it loosely, or even get it wrong. And that can be a big problem. Here are the different kinds of averages, how they are used, and why it’s important to use the right one in the right place.
What Do You Mean by That?
When we say “average” we’re usually using shorthand for the more technical concept of “measure of central tendency,” a statistical term that designates a single value that best represents all the values in a group. Or to put it in less stuffy language, it tells you the central value or middle point that whatever you’re measuring tends to cluster around.
There are three basic measures of central tendency, and three not-so-basic. The three basic ones are the mean, the median, and the mode. (We’ll get to the not-so-basic later.) As we’ll see below, it matters which method you choose.
The mean, or more precisely, the arithmetic mean, is what we’re most commonly talking about when we say “average.” This is the number you get when you take the sum of all the values in a group and divide that sum by the number of values in the group. For example, if you want to know the average income of the families who live on Chestnut Street, you add all their incomes and divide by the number of families. Here’s the spread of incomes on Chestnut Street:
101 Chestnut — $50,000
102 Chestnut — $50,000
103 Chestnut — $15,000
104 Chestnut — $1,300,000
105 Chestnut — $45,000
Now you total up all the incomes (aren’t you glad Chestnut is a short street?), which comes to $1,460,000, and divide by five, the number of families. You get $292,000. That’s the mean income of the families on Chestnut Street. We often call the arithmetic mean the average, for short.
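If you’d rather let a computer do the adding, the arithmetic above can be sketched in a few lines of Python. The variable names here (incomes, total, mean_income) are just illustrative choices, not anything official.

```python
# Incomes from the Chestnut Street example, keyed by house number.
incomes = {
    101: 50_000,
    102: 50_000,
    103: 15_000,
    104: 1_300_000,
    105: 45_000,
}

# Arithmetic mean: the sum of the values divided by how many there are.
total = sum(incomes.values())
mean_income = total / len(incomes)

print(f"Total: ${total:,}")           # Total: $1,460,000
print(f"Mean:  ${mean_income:,.0f}")  # Mean:  $292,000
```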
Keep Off the Median
The median is the value that’s smack in the middle of a group of values that are arranged in order from lowest to highest (or highest to lowest). In the Chestnut Street example, the median income is $50,000. Finding the median is easier when there is an odd number of values. In the example above, which has five values, the incomes in order are $15,000, $45,000, $50,000, $50,000, and $1,300,000, so the third value, $50,000, is right in the middle. (Note that you sort by income, not by house number: 103 Chestnut sits in the middle of the street, but its $15,000 is the lowest income on it.) To find the median of an even number of values, take the middle two and find their arithmetic mean.
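Here is a rough sketch of that procedure in Python: sort first, then pick the middle value, or average the middle two when the count is even. The median function is a hand-rolled helper written for this illustration, and the sixth household earning $40,000 is a hypothetical addition, made up only to show the even-count case.

```python
def median(values):
    """Return the middle value of a list, averaging the middle two if the count is even."""
    if not values:
        raise ValueError("median of an empty list is undefined")
    ordered = sorted(values)  # the median depends on order, so sort first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single value sits in the middle.
        return ordered[mid]
    # Even count: take the arithmetic mean of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

chestnut = [50_000, 50_000, 15_000, 1_300_000, 45_000]
print(median(chestnut))  # 50000

# Hypothetical sixth household, added only to show the even-count case.
six_households = chestnut + [40_000]
print(median(six_households))  # 47500.0
```

With six households the two middle incomes are $45,000 and $50,000, so the median lands halfway between them.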
Numbers à la Mode
The mode is the value that occurs most frequently. On Chestnut Street the mode income is $50,000, because it comes up twice; the others appear only once each.
But sometimes there is no mode, because there is no value that appears more than once. If the family at 102 Chestnut had made just a few dollars more or less, there would be no mode on Chestnut Street. On the other hand, some data sets may contain multiple modes. (For example, across town on Elm Street four families make $50,000, four make $65,000, and four make $70,000.)
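All three situations (one mode, no mode, several modes) can be sketched with a small counting helper. The modes function below is an illustrative helper, not a standard library call, and it follows this article’s convention that a data set with no repeated value has no mode. The $50,003 figure is a made-up stand-in for “a few dollars more.”

```python
from collections import Counter

def modes(values):
    """Return every value tied for most frequent, or [] if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top < 2:
        # Every value appears exactly once, so by this convention there is no mode.
        return []
    return sorted(v for v, c in counts.items() if c == top)

chestnut = [50_000, 50_000, 15_000, 1_300_000, 45_000]
print(modes(chestnut))  # [50000]

# 102 Chestnut earns a few dollars more (hypothetical figure): no value repeats.
nudged = [50_000, 50_003, 15_000, 1_300_000, 45_000]
print(modes(nudged))  # []

# Elm Street: four families each at three different incomes.
elm = [50_000] * 4 + [65_000] * 4 + [70_000] * 4
print(modes(elm))  # [50000, 65000, 70000]
```

Conventions differ on the no-repeat case. Python’s built-in statistics.multimode, for instance, returns every value when none repeats, so check which convention your tool uses.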
So which method of calculating an average is best? That depends on what you’re trying to find out.
The mean is most useful when there aren’t a lot of outliers or extremes in the data. Because the mean uses all the values in a group, extreme values (low or high) can skew the outcome in one direction or another. Chestnut Street is a good example of this, since the mean is actually somewhat misleading. Most of the people on the street make far less than $292,000 a year. That one overachiever at 104 skews the average to make Chestnut Street look ritzier overall than it is.
The median, on the other hand, doesn’t take into account the extremes, so it’s a useful measure in data sets that contain outliers that could skew the calculation of a mean. Although the median is a better representation of income on Chestnut than the mean, the median is generally more suited to larger data sets. Let’s say you were trying to determine the value that best represented the incomes of everyone in a given state. Then you would probably have a lot of incomes near the middle of the group, and you could ignore the outliers. That’s why income statistics usually report the median rather than the mean.
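To see how much one outlier matters, here is a small sketch that computes both measures for Chestnut Street with and without the household at 104. It uses mean and median from Python’s standard statistics module. Dropping 104 is purely a thought experiment for comparison, not something you would do to real data.

```python
from statistics import mean, median

chestnut = [50_000, 50_000, 15_000, 1_300_000, 45_000]

# Thought experiment: leave out the $1,300,000 household at 104 Chestnut.
without_104 = [income for income in chestnut if income != 1_300_000]

print(f"With 104:    mean ${mean(chestnut):,.0f}, median ${median(chestnut):,.0f}")
# With 104:    mean $292,000, median $50,000

print(f"Without 104: mean ${mean(without_104):,.0f}, median ${median(without_104):,.0f}")
# Without 104: mean $40,000, median $47,500
```

One household moves the mean by more than $250,000, while the median shifts by only $2,500.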
The mode, like the median, is not affected by extreme values, so it gives a much better idea of the economic situation on Chestnut Street than the mean does. But the mode doesn’t always work well with numerical values; it’s most useful when your data set isn’t made up of numbers. A good example would be voting patterns in a set of congressional districts. Say you want to know which district has voted for a female candidate most often in the past few years. List the district for each such win, and the mode, the district that shows up most often, would give you that.
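Here is a minimal sketch of finding the mode of non-numeric data. The district names and win records are entirely made up for illustration; they don’t come from any real election data.

```python
from collections import Counter

# Made-up records: one entry per election in which a district
# elected a female candidate. Names and counts are hypothetical.
wins = [
    "District 3",
    "District 7",
    "District 3",
    "District 1",
    "District 3",
    "District 7",
]

# The mode of categorical data is simply the most frequent label.
district, count = Counter(wins).most_common(1)[0]
print(district, count)  # District 3 3
```

You can’t take a mean or a median of district names, but counting which one turns up most often works just fine.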
By All Means
Those are the basics of averages. For most people, those three (mean, median, and mode) are all you need. But for those who really get into this stuff, such as professional statisticians and their ilk, there are more. There are actually three other kinds of mean: the geometric mean, the weighted mean, and the harmonic mean. But for now, we’ll leave those to the pros.