How to make a good psychological test

Apr 8, 2006 12:16 GMT  ·  By

We are bombarded with all sorts of statistical studies showing that x is correlated with y, that such and such a percentage of the population does so and so, and so on. But it's very easy to do a messy statistical study and much more difficult to do a good one, so many statisticians are tempted to do bad statistics, and they are pleased by the public's ignorance because they think they can get away with it. And actually, they can. So I'm afraid that many people are simply fooled by these lousy statisticians.

Take the following example: A psychologist wants to create a new psychological test, measuring, say, people's social abilities. So he thinks about what "social abilities" might mean and comes up with a set of, say, 50 questions that he thinks are relevant to "social abilities". Then he takes a group of, say, 1,000 people and talks with each of them, hopefully at some length, in order to determine how social each participant is. Then he gives them the 50-question quiz. Now he can tell how the highly social individuals answer the questions, how the highly asocial individuals answer, and how the average individuals answer. Once he has established this correlation between the quiz answers on one hand and the known "social abilities" on the other, he can start using the quiz to test other people.

What's wrong with this procedure? What's wrong is that this hypothetical psychologist did not test whether the questions he came up with are really relevant to "social abilities". The point is that even if the quiz questions tested a completely different thing, his procedure would still have allowed him to find a correlation between the answers and the supposedly tested quality. With a finite sample you can always find some correlation between any two things, just by chance. And the problem is that he then uses this correlation to test other people, although he has no proof that the correlation he has found has any predictive power whatsoever.

So, what the good psychologist would do is this: He would take a second group of 1,000 individuals, he would again talk to each, hopefully at some length, and try to determine how social each of them is, and then he would give them the 50-question quiz. If the correlation found with this second group is the same as the first correlation (within statistical error), then it means the questions are truly relevant. Otherwise, he is back to the drawing board: the set of questions he came up with proved to have no predictive power, and therefore they are useless.
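The two-group check described above can be sketched as a small simulation. This is only an illustration under stated assumptions, not a real psychometric procedure: the people are simulated, the answers are Gaussian numbers, the scoring key is simply the sign of each question's correlation in the first group, and every name in the code (make_group, fit_key, two_group_check, the relevance parameter) is hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def make_group(rng, relevance, n_people=1000, n_questions=50):
    """Simulate interview ratings and quiz answers for one group.

    Assumption: relevance=0 means the answers are pure noise,
    unrelated to the interview ratings.
    """
    ratings = [rng.gauss(0, 1) for _ in range(n_people)]
    answers = [[relevance * r + rng.gauss(0, 1) for _ in range(n_questions)]
               for r in ratings]
    return ratings, answers


def fit_key(ratings, answers):
    """Build a scoring key from the first group: +1 or -1 per question,
    by the sign of that question's correlation with the ratings."""
    key = []
    for q in range(len(answers[0])):
        r = pearson([row[q] for row in answers], ratings)
        key.append(1 if r >= 0 else -1)
    return key


def quiz_scores(key, answers):
    """Total quiz score of each person under the given scoring key."""
    return [sum(k * a for k, a in zip(key, row)) for row in answers]


def two_group_check(rng, relevance):
    """Return (correlation in group 1, correlation in group 2)."""
    ratings1, answers1 = make_group(rng, relevance)
    key = fit_key(ratings1, answers1)  # fitted on the first group only
    first = pearson(quiz_scores(key, answers1), ratings1)
    ratings2, answers2 = make_group(rng, relevance)  # fresh people
    second = pearson(quiz_scores(key, answers2), ratings2)
    return first, second


rng = random.Random(2006)
print("irrelevant quiz:", two_group_check(rng, relevance=0.0))
print("relevant quiz:  ", two_group_check(rng, relevance=0.3))
```

In this sketch the irrelevant quiz still shows a clearly positive correlation in the first group, because the scoring key was fitted to that group's noise, and the correlation falls back toward zero in the second group. For the relevant quiz the two correlations should come out close to each other.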

So, you may wonder how many good psychologists are out there. To be honest, I suspect there are none. (As you may have observed, the good study is far more expensive than the faulty and meaningless one: at least twice as expensive.)

The fact that there are so few good psychologists out there is unfortunate, because good psychological quizzes could tell us a lot about ourselves. I suspect a good psychological quiz would probably end up having some curious and non-intuitive questions.

Another problem, less important but still impossible to overlook, is how the psychologist chooses the 1,000 people in the two groups in the first place. These people have to be representative of the entire population. In physics this is known as the ergodic problem, and it is much more difficult than you might think. (I have no idea what psychologists might call it.) You cannot just choose the people at random, because the human population is highly non-ergodic (that is, non-homogeneous). So, you have to partition the human population into relatively ergodic sub-populations.

The psychologist chooses the 1,000 people in the following way: He thinks about things like this: people with high incomes probably resemble each other more than they resemble people with low incomes. If he chooses 1,000 people who all have high incomes, the group will not be representative of low-income people. But he wants the quiz to be representative of the entire population, not just of high-income people. So, he looks at population statistics: how many low-income people there are (e.g. x%) and how many high-income people there are (e.g. y%). He then chooses x% of the 1,000 from people with low incomes and y% from people with high incomes.

And so on: he comes up with a series of criteria for partitioning the population into supposedly homogeneous groups. He ends up with a number of groups, and he then checks the proportion each such group has in the general population in order to establish the profile the 1,000 people must have. For example, he will have to pick at random 12 people with blue eyes, married and having heart problems, 45 with brown eyes, bachelors and having heart problems, and so on.
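The proportional selection described above can be sketched as follows. This is a minimal sketch, assuming the size of each group in the general population is known. The population counts are invented for illustration (chosen so that the quotas come out as the 12 and 45 mentioned above), the function names are hypothetical, and the largest-remainder rounding is my own choice for making the quotas add up exactly to the sample size.

```python
import random


def proportional_allocation(population_counts, sample_size):
    """Split sample_size across groups in proportion to their population."""
    total = sum(population_counts.values())
    quotas = {}
    remainders = {}
    for name, count in population_counts.items():
        # Whole number of places, plus the fractional part kept as a remainder.
        quotas[name], remainders[name] = divmod(sample_size * count, total)
    # Give the places lost to rounding to the groups with the largest remainders.
    leftover = sample_size - sum(quotas.values())
    for name in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        quotas[name] += 1
    return quotas


def draw_stratified_sample(members_by_group, quotas, rng):
    """Pick people at random inside each group, as many as its quota says."""
    sample = []
    for name, quota in quotas.items():
        sample.extend(rng.sample(members_by_group[name], quota))
    return sample


# Hypothetical population figures, for illustration only.
population_counts = {
    "blue eyes, married, heart problems": 1_200,
    "brown eyes, bachelor, heart problems": 4_500,
    "everyone else": 94_300,
}
quotas = proportional_allocation(population_counts, sample_size=1000)
print(quotas)

# Hypothetical member lists: one made-up ID per person in each group.
members_by_group = {name: [f"{name} #{i}" for i in range(count)]
                    for name, count in population_counts.items()}
sample = draw_stratified_sample(members_by_group, quotas, random.Random(2006))
print(len(sample))
```

Note that the random choice happens only inside each group; the group sizes themselves are fixed in advance by the population statistics.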

The problem, again, is that he himself comes up with the way to partition the population. So, he has to check whether the partitioning criteria are really sound. How could he check that? Well, first of all, he has to check this independently of checking the relevance of his quiz questions.

For example, is income relevant to social abilities? I would guess not, but that's just a guess. How could I check whether it's relevant or not? Hopefully, you already know the answer: It's not sufficient to take 1,000 people at random and see what the correlation between social abilities and income is. I would have to take two such groups, one for determining the correlation and another one for testing it (to see whether the correlation is real or just random).

So, the good psychologist needs two additional groups of 1,000 people for each proposed partitioning criterion. After determining which partitioning criteria are relevant to the quality he wants to test, he would finally know how to properly choose the two groups of 1,000 people on which to determine, and then test, the correlation between "social abilities" and the quiz answers.
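To get a feeling for the cost, the count above can be written down as a one-line calculation. This is just a sketch of the arithmetic in this article (two groups for the quiz itself, plus two groups per proposed criterion); the function name and the choice of four criteria in the example are hypothetical.

```python
def people_needed(n_criteria, group_size=1000):
    """Total people to interview: 2 groups for the quiz itself,
    plus 2 more groups for every proposed partitioning criterion."""
    groups = 2 + 2 * n_criteria
    return groups * group_size


# Example: income, eye color, marital status and heart problems.
print(people_needed(4))  # 10 groups of 1,000 people each: 10000
```

With no partitioning criteria at all, the same formula gives 2,000 people, which is the "at least twice as expensive" case.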

This is so complicated that it sounds ridiculous. But that's how complicated human societies are. There's no escape from that. Using bad statistical methods just because they make the studies simpler is plainly wrong: it leads to bad science and irrelevant results. But unfortunately that's what we have today in psychology. I think the least they could do is to take two groups of people in order to test whether the correlations they find are bogus or not.

Finally, the funny thing about psychological tests is that they are in fact overtly non-scientific, if you know how to read between the lines: any book of psychological tests has a note saying something like "if you really want to know yourself, you have to go to a psychologist; these tests are only informative". Imagine a physics book telling you how to compute the trajectory of a projectile and then telling you something like "if you really want to know the trajectory, you have to go to a physicist; your computations are only informative". This sounds more like what a priest might tell you: "you can read this book (e.g. the Bible), but if you really want to understand it and be saved, you have to go to church".

Cartoon Credit: Dr. Judith Grabiner