Understanding Imagine Galileo Benchmark and Formative test scores

Imagine Galileo Benchmark tests are pre-built, comprehensive, standardized assessments developed by the Imagine Learning Assessment Design and Research teams. These assessments are analyzed using state-of-the-art statistical procedures, including Item Response Theory (IRT) techniques. They are designed to measure student achievement, growth, and progress toward grade-level or course-based standards mastery.

Benchmarks are typically administered three times each year. Imagine Galileo Benchmark tests are fixed-form assessments, which means all students see the same set of items. Imagine Galileo Benchmarks provide multiple measures of student performance, including advanced statistical measures.

Galileo Formatives are less formal assessments designed for use in the classroom. Imagine Galileo includes pre-built Formatives, but educators can also build their own Imagine Galileo Formatives.


Download the FAQs document for more information. 

Use the table below for a description of the different types of performance data Imagine Galileo provides for Benchmark and Formative tests.

Heading Applicable Test Description
Raw Score Benchmark tests, Formative tests A raw score is the number of points a student earned for an assessment out of the total possible points. This score can also be converted to a percent correct score.
Scaled Score Benchmark tests A scaled score is a raw score that has been adjusted and converted to a standardized scale. Scaled scores allow for accurate comparisons by taking into account the difficulty of the items on an assessment. Scaled scores ensure that students who took a more difficult test are not penalized, and students who took a less difficult test are not given an unfair advantage. Scaled scores also allow for accurately measuring student growth across tests. Only scaled scores can be directly compared across tests that contain unique sets of items that may vary in difficulty. Imagine Galileo generates scaled scores using an analysis based on IRT. In Imagine Galileo, the scaled score is called a Developmental Level (DL) Score. The DL score measures student ability within a grade and subject.


DL scores take into account the difficulty of the items on the test. Since each Benchmark includes a different set of items, there can be small variations in difficulty across tests. These variations in overall difficulty result in variations in the range of possible DL scores for each Benchmark; however, these variations are typically minimal and are accounted for in students’ DL scores.

Norm-referenced score Benchmark Tests A norm-referenced score illustrates the student’s position in a norm group. Screening assessments commonly use norm-referenced scores to identify at-risk students. Imagine Galileo Benchmarks provide three different types of norm-referenced scores.
Percentile Rank: A student’s percentile rank indicates the percentage of students in a norm group who scored at or below the student’s score. For Imagine Galileo Benchmarks, the norm group represents a very large group of students across a wide variety of Imagine Galileo clients in multiple states. The norm group includes all students who took Imagine Galileo assessments in the same grade level and subject and at the same time of year. In Imagine Galileo, these scores are based on observed data from a prior year.
Standard Score: A student’s standard score indicates the student’s position in the norm group when the mean is set to zero and the standard deviation is set to one. Standard scores assume a normal distribution of scores in the norm group. In Imagine Galileo, these scores reflect expectations of student performance given the known characteristics of the items on the assessment. They are model-based values, not observed values, and do not take the time of year into account.
Normal Curve Equivalent (NCE) Score: A student’s NCE score indicates the student’s position in the norm group when the mean is set to 50 and the standard deviation is set to 21.06. NCE scores assume a normal distribution of scores in the norm group. They are model-based values, not observed values, and do not take the time of year into account.
Imagine Galileo can also provide criterion-referenced scores for districts utilizing Imagine Galileo Benchmarks to forecast state test performance. Districts must submit historical state test data to utilize this Imagine Galileo feature.
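The raw score and norm-referenced definitions above can be illustrated with a short Python sketch. This is not Imagine Galileo's implementation: the function names and the sample numbers are hypothetical, and the percentile rank here is computed from a small made-up list, not from the actual norm group. The NCE conversion simply applies the stated mean of 50 and standard deviation of 21.06 to a standard score.

```python
from bisect import bisect_right


def percent_correct(raw_points, total_points):
    """Convert a raw score to a percent correct score."""
    if total_points <= 0:
        raise ValueError("total_points must be positive")
    return 100.0 * raw_points / total_points


def percentile_rank(score, norm_scores):
    """Percentage of the norm group scoring at or below `score`."""
    if not norm_scores:
        raise ValueError("norm_scores must not be empty")
    ordered = sorted(norm_scores)
    # bisect_right counts how many norm-group scores are <= score.
    at_or_below = bisect_right(ordered, score)
    return 100.0 * at_or_below / len(ordered)


def nce_from_standard_score(standard_score):
    """Rescale a standard score (mean 0, SD 1) to the NCE scale (mean 50, SD 21.06)."""
    return 50.0 + 21.06 * standard_score


# Hypothetical values, for illustration only.
print(percent_correct(18, 24))
print(percentile_rank(500, [400, 450, 500, 550, 600]))
print(nce_from_standard_score(1.0))
```

A standard score of zero maps to an NCE of 50, which is the middle of the norm group on both scales.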
Lexile® Reading measure Benchmark Tests A Lexile® Reading measure is a scaled score within the Lexile® Framework for Reading. This framework is designed by MetaMetrics to match students with text at the appropriate reading level. Imagine Learning and MetaMetrics conducted a linking study to link the DL-scaled scores from Imagine Galileo English Language Arts Benchmarks to the Lexile® Framework for Reading. This linking study enables Imagine Galileo to report a Lexile® Reading measure for students who take the Imagine Galileo ELA Benchmarks in grades 2-12. Educators can use Lexile® Reading measures along with Lexile® measures for texts to identify appropriately challenging texts for students. Imagine Galileo also provides Lexile® measures for Imagine Galileo text passages used in assessments.
Performance Levels

Benchmark Tests

The Imagine Learning Research team defines a set of cut scores for each Imagine Galileo Benchmark. Cut scores identify the range of Developmental Level (DL) scores associated with each Performance Level. A student is classified into a Performance Level based on where their DL score falls relative to these cut scores. By default, the cut scores on the Benchmark assessments are set to correspond to percentile ranks that were identified to facilitate the classification of students for intervention or enrichment. These cut scores identify four Performance Levels:

Intervene (0-20th percentile)

Monitor (21st-50th percentile)

Support (51st-80th percentile)

Enrich (81st-99th percentile)
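The default percentile bands listed above can be sketched as a simple lookup in Python. This is an illustration only: Imagine Galileo classifies students using DL cut scores, whereas this hypothetical `performance_level` function works directly on a whole-number percentile rank and assumes the default bands are in use.

```python
def performance_level(percentile_rank):
    """Map a percentile rank (0-99) to a default Performance Level.

    Assumption: ranks are whole numbers and the default bands apply.
    """
    if not 0 <= percentile_rank <= 99:
        raise ValueError("percentile_rank must be between 0 and 99")
    if percentile_rank <= 20:
        return "Intervene"
    if percentile_rank <= 50:
        return "Monitor"
    if percentile_rank <= 80:
        return "Support"
    return "Enrich"


# Hypothetical percentile ranks, for illustration only.
for rank in (12, 35, 64, 90):
    print(rank, performance_level(rank))
```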

Risk Levels

Benchmark Tests

Imagine Galileo Risk Levels reflect student performance across all the Benchmarks administered in a grade and subject. An initial Risk Level is provided based on performance on the first Benchmark. This Risk Level is then refined as the student takes additional Benchmarks. The following chart illustrates the possible Risk Levels for a student after each Benchmark administration.

A Risk Level is based on whether a student scored at or above the “proficient” cut score for the Benchmarks taken so far. By default, the “proficient” cut score corresponds to the cut score for the Performance Level labeled Support. The estimate of the student’s risk increases if they do not score as “proficient” on multiple Benchmarks.
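The comparison behind a Risk Level can be sketched in Python as a count of Benchmarks on which a student scored below the “proficient” cut score. This is a rough illustration under stated assumptions, not Imagine Galileo's actual calculation: the function name and the DL values are hypothetical, and the mapping from this count to a named Risk Level is not reproduced here.

```python
def benchmarks_below_proficient(dl_scores, proficient_cuts):
    """Count Benchmarks where the DL score is below that Benchmark's proficient cut.

    Assumption: scores and cut scores are listed in administration order,
    one pair per Benchmark taken so far.
    """
    if len(dl_scores) != len(proficient_cuts):
        raise ValueError("need one cut score per DL score")
    # "Proficient" means at or above the cut, so only strictly lower scores count.
    return sum(1 for score, cut in zip(dl_scores, proficient_cuts) if score < cut)


# Hypothetical DL scores and cut scores for three Benchmarks.
misses = benchmarks_below_proficient([510, 495, 530], [500, 505, 520])
print(misses)
```

The more Benchmarks a student misses, the higher the estimated risk, which is why the estimate can only be refined as additional Benchmarks are administered.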


If you've only administered one or two tests, the high-risk portion of the Risk Level Summary graph won't appear in your risk level widget. High-risk data only appears after administering the third test.