Frequently Asked Questions About Testing
Why do we use published tests like PSAT and CogAT?
Commercially published tests provide important information about pupils that a teacher-made test cannot supply. Of course, teachers learn a great deal about their pupils by observing their day-to-day work in class and by testing their progress with teacher-made tests. Most commercially published tests, however, cover a wide range of skills in one test. Perhaps the most important reason for using them is that the school can use the results to compare a pupil’s school progress with the school progress of other children throughout the country. These comparisons can be made because the tests are norm-referenced and standardized on a national population.
What does norm-referenced mean?
Knowing that a pupil got 40 questions right on a test doesn’t give you enough information by itself. How many questions were there? Were they easy or hard? Is 40 a "good," "average," or "poor" score? Often, what we really want to know is how this score compares with the scores of other pupils of the same age or in the same grade. This way of describing performance is called norm-referenced and the numbers that are used to give meaning to a pupil’s performance are called norms, or norm-referenced scores.
What does standardized mean?
The test publisher develops the norms or norm-referenced scores by a process called standardization. In order to find out what scores are high, medium, or low, the test must be given to a large number of schoolchildren across the country. Once the test has been written and the standardization group has been selected, the test publisher must make sure that the test’s directions are so clear and so specific that the test can always be presented in the same way to all pupils. A test which has been written in this way and given to a carefully selected group of pupils in a controlled manner is said to be a standardized test.
How do you get norms from standardization?
The norms are a way of summarizing how the pupils in the standardization group did on the test. In this sense, the pupils make the norms, not the test-maker. One way of doing this is by reporting, for each test, the average score in each grade. These are called grade equivalent norms. Another way is to report what percentage of the pupils in a grade scored at or below a certain score. These are called percentile rank norms. A third type of norm describes how far a pupil’s performance is above or below the average performance for that grade. These are called standard scores. (The most common standard score is a stanine.)
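As a rough illustration of how a norm summarizes the standardization group, the sketch below computes an average score per grade and then expresses one pupil's score as a distance from that average in standard-deviation units. The scores and the names norm_group, grade_averages, and standard_score are invented for this example. Published grade equivalents and standard scores are built from far larger samples and with more elaborate procedures, so treat this only as a picture of the idea.

```python
from statistics import mean, pstdev

# Hypothetical raw scores from a (tiny) standardization group, keyed by grade.
norm_group = {
    3: [18, 20, 22, 24, 26],
    4: [24, 27, 30, 33, 36],
}

def grade_averages(scores_by_grade):
    """Average raw score for each grade in the norm group."""
    return {grade: mean(scores) for grade, scores in scores_by_grade.items()}

def standard_score(score, grade_scores):
    """How far a score lies above (+) or below (-) the grade average,
    measured in standard deviations of that grade's scores."""
    return (score - mean(grade_scores)) / pstdev(grade_scores)

averages = grade_averages(norm_group)
print(averages)                               # average score per grade
print(standard_score(36, norm_group[4]))      # positive: above the grade 4 average
```

The pupils' scores alone determine every number printed here, which is the sense in which the pupils, not the test-maker, make the norms.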
What is a percentile rank?
A percentile rank tells you what percent of the pupils in the norm group got the same score or a lower score on the test. For example, if a score of 25 correct answers on a certain test for fourth graders has a percentile rank of 52, it means that 52 percent of the pupils in the norm group scored 25 or lower on the test. Since the norm group was representative of all fourth graders in the nation, it is estimated that a pupil scoring 25 on the test is performing at a level equal to or above 52% of all the fourth graders in the nation. For most standardized achievement tests, percentile ranks are developed separately for each grade and for a particular time of the year. A score of 25, for example, may have a percentile rank of 52 for a fourth grader in the fall of fourth grade and a percentile rank of 47 in the spring of fourth grade. A percentile rank is not in any sense a "percent correct." It is not the percent of questions the pupil answered correctly, but rather the percent of pupils in the norm group who scored at or below that score.
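The definition above can be sketched in a few lines of code. The norm group below is made up so that it reproduces the example of a score of 25 having a percentile rank of 52. The function name percentile_rank and the data are assumptions for illustration, not any publisher's actual procedure.

```python
def percentile_rank(score, norm_scores):
    """Percent of pupils in the norm group who scored at or below `score`."""
    at_or_below = sum(1 for s in norm_scores if s <= score)
    return 100.0 * at_or_below / len(norm_scores)

# Hypothetical norm group of 100 fourth graders:
# 30 pupils scored 20, 22 pupils scored 25, and 48 pupils scored 30.
norm_scores = [20] * 30 + [25] * 22 + [30] * 48

print(percentile_rank(25, norm_scores))  # 52.0
```

Note that the result depends only on how the other pupils scored. The number of questions on the test never enters the calculation, which is why a percentile rank is not a "percent correct."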
What is a stanine?
A stanine is a score on a nine-unit scale from 1 to 9, where a score of 5 describes average performance. The highest stanine is 9; the lowest is 1. Stanines are based on the pattern of scores in the norm group. Except for 1 and 9, which are open-ended, they divide the baseline into equal amounts of the characteristic being measured. Stanine 8 is as far above average (5) as stanine 2 is below average. Remember, stanines, like all other norms, describe comparative, not absolute, performance.
If a child’s reading is "below the norm," does that mean he is a poor reader?
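One common way to assign stanines, which the text above does not spell out and which is assumed here, is to make each stanine from 2 through 8 half a standard deviation wide, with stanine 5 centered on the average. Stanines 1 and 9 take everything beyond the ends. The sketch below applies that convention to a standard score z (the distance from the average in standard deviations). The function name stanine_from_z is hypothetical.

```python
import math

def stanine_from_z(z):
    """Map a standard score z to a stanine from 1 to 9.

    Assumes the conventional half-standard-deviation bands:
    stanine 5 covers z from -0.25 up to 0.25, stanine 6 covers
    0.25 up to 0.75, and so on. Stanines 1 and 9 are open-ended.
    """
    band = math.floor(2 * z + 5.5)   # equal-width bands centered on 5
    return max(1, min(9, band))      # fold the extremes into 1 and 9

for z in (-3.0, -1.5, 0.0, 1.5, 3.0):
    print(z, stanine_from_z(z))
```

Under this convention a pupil 1.5 standard deviations above average lands in stanine 8 and a pupil 1.5 below lands in stanine 2, matching the statement that stanine 8 is as far above 5 as stanine 2 is below it.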
Not necessarily. It probably means he is not reading as well as the average American child in his grade, assuming that the test was well standardized. But it doesn’t tell you how well the average child reads. If most of the children in the norm group read "well," the norm or average represents good reading. If most children read poorly, the norm would represent "poor" reading. Whether the norm group reads well or poorly is a judgment the test cannot make. Such decisions must be made by schools and parents.
Are national norms valid for all children?
Yes, national norms do have meaning and significance for all school systems. National norms represent one reality: the pattern of performance of all the nation’s schoolchildren. All kinds of schools in all parts of the country are represented in that total pattern.
Aren’t there other useful comparisons to be made?
Of course! And there are other kinds of norm groups besides the national norm group. The group chosen for comparison should depend on what information the school needs. It is quite possible and often advisable to compare individual pupils with pupils in a district or city, with other pupils in similar communities nearby, with all pupils in the state, and so on. These regional or local norms are developed in a way similar to that for national norms. However, they describe the pattern of performance for some more narrowly defined group.
Why don’t you have tests that tell you whether or not a pupil has learned a skill, regardless of what other pupils know?
Such tests do exist; they are called objective-referenced or criterion-referenced tests. In fact, the tests teachers use in their own classrooms are more like this kind of test than they are like norm-referenced tests. An objective-referenced or criterion-referenced test is a test which is used to determine whether or not an individual pupil has met an objective or a criterion of performance. Of course, it is not necessary to choose between these two kinds of tests or ways of interpreting test results. Each way of looking at a pupil’s performance provides useful information about what the schools are teaching and about what pupils are learning. Some tests are designed to offer both kinds of interpretation.