A basic issue in research involves classifying the type of data with which we’re working. The most common way to do this was developed by a psychologist named Stanley Smith Stevens in 1946 (and if that’s not true, I blame Wikipedia). Anyway, in this post I’d like to go over his four levels of measurement (nominal, ordinal, interval, and ratio) using examples that are relevant to trading data.
Nominal data, also known as categorical data, are values which have no numerical relationship to each other (higher, lower, more, less). Things like names, classifications, or even arbitrary ID numbers are nominal data.
Let’s say we’re studying chart patterns like flags and triangles. These are just categories, so it makes no sense to ask if triangles are bigger or smaller than flags or if double tops are before or after triangles. We could sort them alphabetically of course, but this is just an ordering based on arbitrary names. It has no bearing on how the patterns might affect price movements.
Statistically, it makes sense to look at the “mode” or the most common occurrence of nominal data. So for chart patterns, we might find that flags are very common while double tops are rare. But we can’t take a median or an average of these patterns (what’s the average of flags, triangles and double tops?). We can only count their occurrences.
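Counting occurrences and picking the mode can be sketched in a few lines of Python. The list of patterns below is made up purely for illustration; it is not real market data.

```python
from collections import Counter

# Hypothetical sample of detected chart patterns (invented for illustration).
patterns = ["flag", "triangle", "flag", "double top", "flag", "triangle"]

# Counting is the only arithmetic that makes sense for nominal data.
counts = Counter(patterns)

# The mode is the category with the highest count.
mode_pattern, mode_count = counts.most_common(1)[0]

print(counts)
print(f"Most common pattern: {mode_pattern} ({mode_count} occurrences)")
```

Notice that nothing here sorts or averages the pattern names; the code only tallies them.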
Ordinal data are values which we can sort in some meaningful order: higher or lower, bigger or smaller, etc. If you can put the data in order, then it’s ordinal! Simple, right?
Say we’re looking at economic reports on a site like Forex Factory. Each report is categorized by a red, orange, or yellow icon representing high, medium and low importance. In this case, our categories (Red, Orange, and Yellow) carry more information than simple nominal level data. We can sort them in a meaningful order. Red is higher than Orange which is higher than Yellow.
Just as with nominal data, we can take the mode of ordinal data. So in a given month, we might see that Yellow reports are most common and that Red reports are rare. But with ordinal data we can also find the “median” or the middle value: line up all of the month’s reports from lowest to highest importance and take the one in the middle. If that middle report is Orange, then Orange is the median, with as many reports at or below it as at or above it.
We can’t find the mean (average) of ordinal data though. What’s the sum of Red, Orange and Yellow divided by three? We know how they’re ordered, but we can’t assign quantities to ordinal data. For that, we need to move to the next level: interval data.
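Here is a rough sketch of taking the mode and median of ordinal data in Python. The `RANK` mapping and the month of reports are assumptions made up for this example; the rank numbers exist only so the reports can be sorted, not so they can be averaged.

```python
from collections import Counter

# Assumed ordering of the importance levels (used for sorting only).
RANK = {"Yellow": 0, "Orange": 1, "Red": 2}

# Hypothetical month of economic reports (invented for illustration).
reports = ["Yellow", "Red", "Orange", "Yellow", "Orange",
           "Yellow", "Red", "Orange", "Yellow"]

# Mode: the most frequent category, exactly as with nominal data.
mode_level = Counter(reports).most_common(1)[0][0]

# Median: sort by rank and take the middle report.
# With an odd number of reports there is exactly one middle item;
# with an even number, this index picks the upper of the two middle items.
ordered = sorted(reports, key=lambda level: RANK[level])
median_level = ordered[len(ordered) // 2]

print(f"Mode: {mode_level}, median: {median_level}")
```

The median comes from the position of the middle report, so it can change from month to month depending on how many reports fall in each category.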
OK, so nominal values can be counted, and ordinal values can be both counted and sorted. Interval level values have these two characteristics plus one more. Not only can we tell that one value is larger or smaller than another, but we can measure the distance between two values. We can also compare that distance to distances between other values.
However, these measurements have no “absolute zero” point, so while we can measure and compare distances, it makes no sense to divide one distance by another. Price change is a good example of this.
A change in a stock’s price from 5 to 8 has the same interval as a change from 50 to 53. The interval is 3 in both cases. We can measure this and compare it to other price changes. For instance, a change from 60 to 67 has an interval of 7, which is larger than an interval of 3. Obvious, right? But hold the phone (what phone?). Hold the phone, I say!
Here’s what we can’t do with interval data. We can’t meaningfully divide one interval by another. So we couldn’t say that a price change of 6 is twice as much as a price change of 3. Why not? Because it depends on where you started from. A change from 50 to 56 has an interval of 6, with a percentage return of 12%. A change from 5 to 8 has an interval of 3 with a percentage return of 60%. So is the change from 50 to 56 really “twice as great” as the change from 5 to 8? No, it’s not really meaningful to say that.
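To make the arithmetic concrete, here is a small sketch that computes both the raw interval and the percentage return for the price moves used in this section. The helper names `interval` and `pct_return` are just labels chosen for this example.

```python
def interval(start, end):
    """Raw price change in points."""
    return end - start


def pct_return(start, end):
    """Price change as a percentage of the starting price."""
    return 100 * (end - start) / start


# The price moves discussed in this section.
moves = [(5, 8), (50, 53), (50, 56)]

for start, end in moves:
    print(f"{start} -> {end}: interval {interval(start, end)}, "
          f"return {pct_return(start, end):.0f}%")
```

The moves from 5 to 8 and from 50 to 53 have identical intervals but very different returns, which is why the starting point matters.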
We can find the mode and median of interval data, just like with ordinal data. But now we can also calculate the mean (average). If a stock rises 4 points in one day, then 8 points the next, then 3 points the day after that, the average daily price rise is 15/3 or 5 points per day during this period.
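The same three-day example, sketched with Python’s standard `statistics` module (the variable names are only for illustration):

```python
import statistics

# Daily price rises, in points, from the example above.
daily_rises = [4, 8, 3]

# The mean is meaningful because the distances between values are measurable.
average_rise = statistics.mean(daily_rises)

# The median still works too, just as it did for ordinal data.
median_rise = statistics.median(daily_rises)

print(f"Average rise: {average_rise} points per day")
print(f"Median rise: {median_rise} points")
```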
When we’re dealing with raw price change data, we’re dealing with interval data, so we can’t divide one interval by another. For that, we need to turn to ratio level data.
Ratio level data exists on an absolute scale where the zero point is not arbitrary. Thus it is meaningful to divide one ratio value by another, which is why this is called ratio level data of course!
We’ve seen that changes in price are interval level data, but what about price itself? Is it meaningful to say that a price of $40 per share of GE is twice as high as a price of $20 a share for GE? Yes, because now we have a non-arbitrary absolute zero point: a price of $0 means the share is worth nothing.
Another example is percentage price change. In our example from the last section, we had a price change from 5 to 8 and another from 50 to 53. In terms of percentage change, the first was 60% and the second was 6%. So is it meaningful to say that the return of the first was ten times the return of the second? Yes, we can meaningfully divide these values. So while raw price change is only interval level data, percentage price change qualifies as ratio level data.
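As a quick check of the numbers above, the sketch below divides one percentage return by the other, and one price by another. The `pct_return` helper is a name made up for this example.

```python
def pct_return(start, end):
    """Price change as a percentage of the starting price."""
    return 100 * (end - start) / start


first = pct_return(5, 8)     # the move from 5 to 8
second = pct_return(50, 53)  # the move from 50 to 53

# Dividing one ratio level value by another is meaningful.
return_ratio = first / second

# The same goes for prices themselves, e.g. $40 versus $20 per share.
price_ratio = 40 / 20

print(f"Returns: {first:.0f}% vs {second:.0f}%, ratio {return_ratio:.0f}x")
print(f"Price ratio: {price_ratio:.0f}x")
```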
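Ratio data is the highest level of measurement. Everything we could do at the lower levels (mode, median, and mean) still works, and on top of that we can use statistics that depend on dividing one value by another.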
Well, that wraps up this discussion of levels of measurement. As we go forward in our exploration of markets and money, we’ll frequently have to refer back to this little foray into the world of math.