Abstract

This report provides details about the Student Growth Percentile (SGP) methodology. Topics addressed include an introduction to the concept of student growth and how SGPs approach questions that parents, teachers and other stakeholders have about how students are growing academically. The technical aspects of SGP calculation are covered in detail, and then expanded to show how the concept of adequate growth can be addressed and growth targets established using the SGP methodological framework.

Introduction - Why Student Growth?

Accountability systems constructed according to federal adequate yearly progress (AYP) requirements currently rely upon annual “snap-shots” of student achievement to make judgments about school quality. Since their adoption, such status measures have been the focus of persistent criticism (Linn 2003; Linn, Baker, and Betebenner 2002). Though appropriate for making judgments about the achievement level of students at a school for a given year, they are inappropriate for judgments about educational effectiveness. In this regard, status measures are blind to the possibility of low achieving students attending effective schools. It is this possibility that has led some critics of No Child Left Behind (NCLB) to label its accountability provisions as unfair and misguided and to demand the use of growth analyses as a better means of auditing school quality.

A fundamental premise associated with using student growth for school accountability is that “good” schools bring about student growth in excess of that found at “bad” schools. Students attending such schools - commonly referred to as highly effective/ineffective schools - tend to demonstrate extraordinary growth that is causally attributed to the school or teachers instructing the students. The inherent believability of this premise is at the heart of current enthusiasm to incorporate growth into accountability systems. It is not surprising that the November 2005 announcement by Secretary of Education Spellings for the Growth Model Pilot Program (GMPP) permitting states to use growth model results as a means for compliance with NCLB achievement mandates and the Race to the Top (R2T, RTTT or RTT) competitive grants program were met with great enthusiasm by states (Spellings 2005).

Following these use cases, the primary thrust of growth analyses over the last two decades has been to determine, using sophisticated statistical techniques, the amount of student progress/growth that can be justifiably attributed to the school or teacher - that is, to disentangle current aggregate level achievement from effectiveness (Braun 2005; Rubin, Stuart, and Zanutto 2004; Ballou, Sanders, and Wright 2004; Raudenbush 2004). Such analyses, often called value-added analyses, attempt to estimate the teacher or school contribution to student achievement. This contribution, called the school or teacher effect, purports to quantify the impact on achievement that this school or teacher would have, on average, upon similar students assigned to them for instruction. Clearly, such analyses lend themselves to accountability systems that hold schools or teachers responsible for student achievement.

Despite their utility in high stakes accountability decisions, the causal claims of teacher/school effectiveness addressed by value-added models (VAM) often fail to address questions of primary interest to education stakeholders. For example, VAM analyses generally ignore a fundamental interest of stakeholders regarding student growth: How much growth did a student make? The disconnect reflects a mismatch between questions of interest and the statistical model employed to answer those questions. Along these lines, Harris (2007) distinguishes value-added for program evaluation (VAM-P) and value-added for accountability (VAM-A) - conceptualizing accountability as a difficult type of program evaluation. Indeed, the current climate of high-stakes, test-based accountability has blurred the lines between program evaluation and accountability. This, combined with the emphasis of value-added models toward causal claims regarding school and teacher effects has skewed discussions about growth models toward causal claims at the expense of description. Research (Yen 2007) and personal experience suggest stakeholders are more interested in the reverse: description first that can be used secondarily as part of causal fact finding.

In a survey conducted by Yen (2007), supported by our experience working with state departments of education to implement growth models, parents, teachers, and administrators were asked what “growth” questions were of most interest to them.

As Yen remarks, all these questions rest upon a desire to understand whether observed student progress is “reasonable or appropriate” (Yen 2007). More broadly, the questions seek a description rather than a parsing of responsibility for student growth. Ultimately, questions may turn to who/what is responsible. However, as indicated by this list of questions, they are not the starting point for most stakeholders.

In the following sections, student growth percentiles and percentile growth projections/trajectories are introduced as a means of understanding student growth in both norm-referenced and criterion-referenced ways. With these values calculated, we show how growth data can be used in both manners to inform discussion about education quality. We assert that establishing a norm-referenced basis for student growth eliminates a number of the problems of incorporating growth into accountability systems while providing needed insight to various stakeholders by addressing the basic question of how much a student has progressed (Betebenner 2008, 2009).

Student Growth Percentiles

It is a common misconception that to quantify student progress in education, the subject matter and grades over which growth is examined must be on the same scale - referred to as a vertical scale. Not only is a vertical scale unnecessary, but its existence obscures concepts necessary to fully understand student growth. Fundamentally, growth requires that change be examined for a single construct (growth in what?), such as math achievement, across time.

Consider the familiar situation from pediatrics where the interest is in measuring the height and weight of children over time. The scales on which height and weight are measured possess properties that educational assessment scales aspire to but can never attain. Educational scales are often assumed to possess properties similar to those for height and weight, but they do not. Specifically, such scales are assumed to be interval scales, where a difference of 100 points at the lower end of the scale reflects the same difference in ability/achievement as 100 points at the upper end. (See Lord 1975 and Yen 1986 for more detail on interval scaling in educational measurement.)

A male toddler is measured at 2 and 3 years of age and is shown to have grown 4 inches. The magnitude of increase - 4 inches - is a well understood quantity that any parent can grasp and measure at home using a simple yardstick. However, parents leaving their pediatrician’s office knowing only how much their child has grown would likely want more information. In this situation, parents are not interested in an absolute criterion of growth, but instead in a norm-referenced criterion locating that 4 inch increase alongside the height increases of similar children. Examining this height increase relative to the increases of similar children permits one to diagnose how (a)typical such an increase is.

Given that this norm-referenced examination of change is standard even where scales of measurement are perfect, we argue it is unreasonable to think that in education, where scales are at best quasi-interval (Lord 1975; Yen 1986), one can or should examine growth differently.

Going further, suppose scales did exist in education, similar to height/weight scales, that permitted the calculation of absolute measures of annual academic growth for students. The response to a parent’s question such as, “How much did my child progress?”, would be a number of scale score points - an answer that would leave most parents confused, wondering whether the number of points is good or bad. As in pediatrics, the search for a description of changes in achievement over time (i.e., growth) is best served by considering a norm-referenced quantification of student growth - a student growth percentile (Betebenner 2008, 2009).

A student’s growth percentile (SGP) describes how (a)typical a student’s growth is by examining their current achievement relative to their academic peers - those students beginning at the same place. That is, a student growth percentile examines the current achievement of a student relative to other students who have, in the past, “walked the same achievement path.” Heuristically, if the state assessment data set were extremely large (in fact, infinite) in size, one could open the data set, select out those students with the exact same prior scores, and compare the selected student’s current year score with the current year scores of those students - their academic peers. If the student’s current year score exceeded the scores of most of their academic peers, in a norm-referenced sense they have done well. If the student’s current year score was less than the scores of their academic peers, in a norm-referenced sense they have not done as well.
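This “infinite data set” heuristic can be sketched directly. The function name and toy cohort below are hypothetical, and operational SGPs use quantile regression precisely because exact prior-score matches are sparse in real data:

```python
import numpy as np

def empirical_sgp(prior_scores, current_score, cohort_prior, cohort_current):
    """Heuristic 'academic peers' SGP: among cohort students whose prior
    score histories match exactly, find the percentile rank of this
    student's current score.  Illustrative only."""
    cohort_prior = np.asarray(cohort_prior)
    cohort_current = np.asarray(cohort_current)
    # Select academic peers: identical prior score history.
    peers = np.all(cohort_prior == np.asarray(prior_scores), axis=1)
    peer_scores = cohort_current[peers]
    # Percentile rank of the current score among peers, bounded in 1-99.
    pct = 100.0 * np.mean(peer_scores < current_score)
    return int(np.clip(round(pct), 1, 99))

# Tiny synthetic cohort: one prior score per student, all peers of each other.
cohort_prior = [[430]] * 10
cohort_current = [420, 425, 430, 435, 440, 445, 450, 455, 460, 465]
print(empirical_sgp([430], 452, cohort_prior, cohort_current))  # 70
```

A score of 452 exceeds 7 of the 10 academic peers’ current scores, so the heuristic growth percentile is 70.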

The four panels of Figure B1 depict what a student growth percentile represents in a situation considering students having only two consecutive achievement test scores.

Figure B1 also serves to illustrate the relationship between the state’s assessment scale and student growth percentiles. The scale depicted in the panels of Figure B1 is not vertical, so the comparison or subtraction of scale scores across grades for individual students is not supported. However, were such a scale in place, the figure would not change: with or without a vertical scale, the conditional distribution can be constructed.

Figure B1: Bivariate student score conditional distribution and associated growth percentile (four panels)

In situations where a vertical scale exists, the increase/decrease in scale score points can be calculated and the growth percentile can be understood alongside this change. For example, were the scales presented in Figure B1 vertical, one could calculate that the student grew 40 points (from 460 to 500) between 2011 and 2012. This 40 points represents the absolute magnitude of change. Quantifying the magnitude of change is scale dependent: different vertical achievement scales in 2011 and 2012 would yield different annual scale score increases, and a scale score increase of 40 could be changed to an increase of 10 by a simple transformation of the vertical scale on which all the students are measured. Relative to other students, however, the student’s growth has not changed - the growth percentile is invariant to scale transformations common in educational assessment. Student growth percentiles situate achievement change in a norm-referenced fashion, bypassing questions associated with the magnitude of change and directing attention toward relative standing, which, we would assert, is what stakeholders are most interested in.
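The invariance claim is easy to demonstrate: because growth percentiles are rank-based, any order-preserving rescaling of the score scale leaves them unchanged. A minimal sketch with hypothetical peer scores:

```python
import numpy as np

# Hypothetical current-year scores of a student's academic peers.
peers = np.array([480, 500, 455, 520, 495, 470, 510])
score = 500

def pct_rank(x, cohort):
    """Percentile rank of x within a cohort of scores."""
    return 100.0 * np.mean(cohort < x)

transform = lambda s: 0.25 * s + 300    # an arbitrary monotone rescaling

print(pct_rank(score, peers))
print(pct_rank(transform(score), transform(peers)))  # identical rank
```

Both calls return the same percentile rank: the rescaling changes every score’s magnitude but no score’s ordering.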

To fully understand how many states intend to use growth percentiles to make determinations about whether a student’s growth is sufficient, the next section details specifics of how student growth percentiles are calculated. These calculations are subsequently used to calculate percentile growth projections/trajectories that are used to establish how much growth it will take for each student to reach their achievement targets.

SGP Calculation

Quantile regression is used to establish curvilinear functional relationships between the cohort’s prior scores and their current scores. Specifically, for each grade-by-subject cohort, quantile regression establishes 100 (one for each percentile) curvilinear functional relationships between the students’ prior score(s) and their current score. For example, consider 7th graders: their grade 3, grade 4, grade 5, and grade 6 prior scores are used to describe the current year grade 7 score distribution. (For the mathematical details underlying the use of quantile regression in calculating student growth percentiles, see the SGP Estimation section.) The result of these 100 separate analyses is a single coefficient matrix that can be employed as a look-up table relating prior student achievement to current achievement for each percentile. Using the coefficient matrix, one can plug any combination of grade 3, 4, 5, and 6 prior scores into the functional relationship to obtain the percentile cutpoints of the grade 7 conditional achievement distribution associated with that prior score combination. These cutpoints are the percentiles of the conditional distribution associated with the individual’s prior achievement. Consider a student with the following mathematics scores:

Table B1: Scale scores for a hypothetical student across 5 years in mathematics.

Grade 3  Grade 4  Grade 5  Grade 6  Grade 7
419      418      422      434      436

Using the coefficient matrix derived from the quantile regression analyses based upon grade 3, 4, 5, and 6 scale scores as independent variables and the grade 7 scale score as the dependent variable together with this student’s vector of grade 3, 4, 5, and 6 grade scale scores provides the scale score percentile cutpoints associated with the grade 7 conditional distribution for these prior scores.

Table B2: Percentile cutscores for grade 7 mathematics based upon the grade 3, 4, 5, and 6 mathematics scale scores given in Table B1.

1st    2nd    3rd    10th   25th   50th   51st   75th   90th   99th
404.8  414.9  419.9  425.9  430.8  435.5  436.3  468.9  487.1  509.8

The percentile cutscores for 7th grade mathematics in Table B2 are used with the student’s actual grade 7 mathematics scale score to establish their growth percentile. In this case, the student’s grade 7 scale score of 436 lies above the 50th percentile cut and below the 51st percentile cut, yielding a growth percentile of 50. Thus, the progress demonstrated by this student between grade 6 and grade 7 exceeded that of 50 percent of their academic peers - those students with the same achievement history. States can qualify student growth by defining ranges of growth percentiles. For example, some states designate growth percentiles between 35 and 65 as being typical. Using Table B2, another student with the exact same grade 3, 4, 5, and 6 prior scores but with a grade 7 scale score of 404, would have a growth percentile of 1, which is designated as low.
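The lookup step can be sketched with the published cuts from Table B2. Only ten of the 99 cuts appear in the table, so this is a coarse version of the operational lookup:

```python
import bisect

# Percentile cutscores from Table B2 (grade 7 conditional distribution).
pcts = [1, 2, 3, 10, 25, 50, 51, 75, 90, 99]
cuts = [404.8, 414.9, 419.9, 425.9, 430.8, 435.5, 436.3, 468.9, 487.1, 509.8]

def growth_percentile(score, pcts, cuts):
    """Highest listed percentile whose cutscore the observed score
    reaches or exceeds; SGPs are bounded in 1-99."""
    i = bisect.bisect_right(cuts, score)
    return pcts[i - 1] if i > 0 else 1

print(growth_percentile(436, pcts, cuts))  # 50: above 50th cut, below 51st
print(growth_percentile(404, pcts, cuts))  # 1: below every listed cut
```

The score of 436 clears the 50th cut (435.5) but not the 51st (436.3), reproducing the growth percentile of 50 described above.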

This example provides the basis for beginning to understand how growth percentiles in the SGP Methodology are used to determine whether a student’s growth is (in)adequate. Suppose that in grade 6 a one-year (i.e., 7th grade) achievement goal/target of proficiency was established for the student. Using the lowest proficient scale score for 7th grade mathematics, this target corresponds to a scale score of 500. Based upon the results of the growth percentile analysis, this one-year target corresponds to 95th percentile growth. The student’s growth percentile of 50 is less than this, so the student has not met this individualized growth standard.
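The adequacy check inverts the same table: find the smallest percentile whose conditional cutscore reaches the target. With only the ten published cuts this is coarse; the full 99-cut table pins the 500 target to the 95th percentile cited above:

```python
# Percentile cutscores from Table B2 (grade 7 conditional distribution).
pcts = [1, 2, 3, 10, 25, 50, 51, 75, 90, 99]
cuts = [404.8, 414.9, 419.9, 425.9, 430.8, 435.5, 436.3, 468.9, 487.1, 509.8]

def target_percentile(target, pcts, cuts):
    """Smallest listed percentile whose cutscore reaches the target."""
    for p, c in zip(pcts, cuts):
        if c >= target:
            return p
    return None  # unreachable even at the 99th percentile

# A 500 target falls between the 90th (487.1) and 99th (509.8) cuts.
print(target_percentile(500, pcts, cuts))  # 99 with this coarse ten-cut table
```

Comparing the student’s observed growth percentile (50) against the target percentile makes the adequacy judgment a simple comparison.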

Percentile Growth Projections/Trajectories

Building upon the example just presented involving only a one-year achievement target translated into a growth standard, this section extends this basic idea and shows how multi-year growth standards are established based upon official state achievement targets/goals. That is, by defining a future (e.g., a 2 year) achievement target for each student, we show how growth percentile analyses can be used to quantify what level of growth, expressed as a per/year growth percentile, is required by the student to reach their achievement target. Unique to the SGP Methodology is the ability to stipulate both what the growth standard is as well as how much the student actually grew in a metric that is informative to stakeholders.

Defining Adequate Growth

Establishing growth thresholds for each student that can be used to make adequacy judgments requires pre-established achievement targets, and a time frame to reach each target, against which growth can be assessed (i.e., growth-to-standard). Three years from the establishment of the target is a typical time frame many states have chosen for describing students’ growth to standard. Targets are initially established in the prior academic year, so that in the current year a student is considered to be catching up to or keeping up with proficiency. Other targets may also be considered (for example, moving up to or staying up with an advanced achievement level).

Using a three year target as an example, these adequacy categories are defined as:

  • Catch-Up: Those students currently not proficient (from the prior spring testing) are expected to be proficient within 3 years following the establishment of the achievement target or by the final grade, whichever comes sooner. (The establishment of the achievement target occurs in the prior year, so the 3-year time frame includes the current year as “year 1” - the year in which the first growth adequacy judgment can be made for the student. The targets are then projected out two years beyond the current year to give a maximum time horizon of 3 years in which to make the adequacy judgment.)
  • Keep-Up: Those students currently at or above proficient are expected to remain at or above proficient in all of the 3 years following the establishment of the achievement target or by the final grade, whichever comes sooner.
  • Move-Up: Those students currently proficient are expected to reach advanced within 3 years following the establishment of the achievement target or by the final grade, whichever comes sooner.
  • Stay-Up: Those students currently advanced are expected to remain advanced in all of the 3 years following the establishment of the achievement target or by the final grade, whichever comes sooner.
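The four adequacy categories amount to a lookup from a student’s prior-year achievement level to the growth question asked about them. A minimal sketch, with hypothetical level labels (states define their own):

```python
def adequacy_categories(level):
    """Map a prior-year achievement level to the applicable growth
    adequacy categories (hypothetical level labels)."""
    if level in ("Unsatisfactory", "Partially Proficient"):
        return ["Catch-Up"]               # must reach proficient
    if level == "Proficient":
        return ["Keep-Up", "Move-Up"]     # stay proficient / reach advanced
    if level == "Advanced":
        return ["Keep-Up", "Stay-Up"]     # remain advanced
    raise ValueError(f"unknown achievement level: {level}")

print(adequacy_categories("Partially Proficient"))  # ['Catch-Up']
print(adequacy_categories("Advanced"))              # ['Keep-Up', 'Stay-Up']
```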

The previous definitions specify “3 years following the establishment of the achievement target” as the time frame. For example, a non-proficient 3rd grader would be expected to be proficient by 6th grade. The first check of the student’s progress occurs in 4th grade, when the student’s growth over the last year is compared against targets calculated to assess their progress along a multi-year time line. The question asked following 4th grade is: did the student become proficient, and if not, are they on track to become proficient within 3 years?

Calculation of Growth Percentile Targets

As mentioned previously, the calculation of student growth percentiles across all grades and students results in the creation of numerous coefficient matrices that relate prior with current student achievement. These matrices constitute an annually updated statewide historical record of student progress. For the SGP Methodology, they are used to determine what level of percentile growth is necessary for each student to reach future achievement targets. For example, imagine that the following coefficient matrices are produced for Mathematics in a state after the annual calculation of student growth percentiles using up to three prior years of test data:

  • Grade 4 Using grade 3 prior achievement.
  • Grade 5 Using grade 4 and grades 3 & 4 prior achievement.
  • Grade 6 Using grade 5, grades 4 & 5, and grades 3, 4, & 5 prior achievement.
  • Grade 7 Using grade 6, grades 5 & 6, grades 4, 5, & 6, and grades 3, 4, 5, & 6 prior achievement.
  • Grade 8 Using grade 7, grades 6 & 7, grades 5, 6, & 7, and grades 4, 5, 6, & 7 prior achievement.

To describe how these numerous coefficient matrices are used together to produce growth targets, consider, for example, a 4th grade student in reading with 3rd and 4th grade state reading scores of 425 (Unsatisfactory) and 440 (Partially Proficient), respectively. The following are the steps that transpire over 3 years to determine whether this student is on track to reach proficient.

  • Spring Year 0 - The growth target for Year 1 is established, requiring students to reach state-defined achievement levels within 3 years or by the final grade. In this example, the student under consideration was not proficient in 3rd grade (in Year 0) and is expected to be proficient by grade 6 in Year 3.
  • Spring Year 1 - Because our example student was not proficient based on her prior year test score, her initial status for the current year is catching up. We want to see whether the growth she demonstrated in Year 1 was adequate to make her proficient, or at least put her on a trajectory toward proficiency within the next two years. Employing the coefficient matrices derived in the calculation of Year 1 student growth percentiles:
    1. The coefficient matrix relating grade 4 with grade 3 prior achievement is used to establish the percentile cuts (i.e., one-year growth percentile projections/trajectories). If the student’s actual Year 1 growth percentile exceeds the percentile cut associated with proficient, then the student’s one-year growth is enough to reach proficient. (Checking growth adequacy using one-year achievement targets is equivalent to confirming whether the student reached their one-year achievement target, since the coefficient matrices used to produce the percentile cuts are based on current data.)
    2. The 2-year growth percentile projections/trajectories are calculated, extending from Year 0 to Year 2. The student’s actual grade 3 scale score, together with the 99 hypothetical one-year growth percentile projections/trajectories derived in the previous step, are plugged into the Year 1 coefficient matrix relating grade 5 with grade 3 & 4 prior achievement. This yields the percentile cuts for the student indicating what consecutive two-year 1st through 99th percentile growth will lead to. (Two-or-more-year growth targets are estimated based upon the most recent student growth histories in the state. In this example, estimates of the growth that will be needed in the 5th and 6th grades are based on students concurrently in the 5th and 6th grades in Year 1.) The student’s Year 1 growth percentile is compared to the 2-year growth percentile cut required to reach proficiency. If the student’s growth percentile exceeds this target, then the student is deemed on track to reach proficiency by the 5th grade.
    3. Last, the 3-year growth percentile projections/trajectories are established. The student’s actual grade 3 scale score, together with the 99 hypothetical 1- and 2-year growth percentile projections/trajectories derived in the previous two steps, are plugged into the coefficient matrix relating grade 6 with prior achievement in grades 3, 4, & 5. This yields the percentile cuts for each student indicating what three consecutive years of 1st through 99th percentile growth will lead to in terms of future achievement. The student’s observed Year 1 growth percentile is again compared to the percentile cut required to reach proficiency; if it meets or exceeds that cut, her growth is deemed adequate to reach proficiency by the 6th grade.
  • Spring Year 1/Fall Year 2 - The growth target for Year 2 is now established. The student in this example has now presumably completed grade 4 and is beginning grade 5 in the fall. She was again not proficient in 4th grade (Partially Proficient) and is now expected to be proficient by grade 7 in Year 4.
  • Spring Year 2 - Employing the coefficient matrices derived in the calculation of Year 2 student growth percentiles:
    1. The coefficient matrix relating grade 5 with grade 3 & 4 prior achievement is used to establish 99 percentile cuts (i.e., one-year growth percentile projections/trajectories). If the student’s actual Year 2 growth percentile exceeds the cut associated with proficient, then the student’s one year growth was enough to reach proficient.
    2. The student’s actual scores from grades 3 & 4 together with the 99 hypothetical one-year growth percentile projections/trajectories derived in the previous step are plugged into the coefficient matrix relating grade 6 with grade 3, 4, & 5 prior achievement. This yields 99 percentile cuts (i.e., 2 year growth percentile projections/trajectories) for the student indicating what consecutive two-year 1st through 99th percentile growth will lead to in terms of future achievement. The student’s Year 2 growth percentile is compared to the 2 year growth percentile cut required to reach proficiency. If the student’s growth percentile meets or exceeds it then the student is deemed on track to reach proficient.
    3. The 3-year growth percentile projections/trajectories are established. The student’s actual grade 3 & 4 scale scores, together with the 99 hypothetical 1- and 2-year growth percentile projections/trajectories derived in the previous two steps, are plugged into the coefficient matrix relating grade 7 with prior achievement in grades 3, 4, 5, & 6. This yields the percentile cuts for each student indicating what three consecutive years of 1st through 99th percentile growth will lead to in terms of future achievement. The student’s observed Year 2 growth percentile is again compared to the percentile cut required to reach proficiency; if it exceeds that cut, her growth is deemed adequate to reach proficiency by the 7th grade.
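The recursion in these steps can be sketched as follows, with deliberately simple hypothetical coefficient matrices standing in for the estimated ones (a real matrix encodes a quantile-regression fit, not a simple average of prior scores):

```python
import numpy as np

def toy_matrix(n_prior):
    """Hypothetical coefficient matrix: the pth percentile cut is the
    average of the prior scores plus (p - 50)."""
    return np.array([[p - 50.0] + [1.0 / n_prior] * n_prior
                     for p in range(1, 100)])

def project_trajectories(score_history, matrices):
    """Recursively extend 99 hypothetical growth paths: each year the
    pth path appends the pth percentile cut produced by feeding its
    scores (actual + previously projected) into that year's matrix."""
    paths = [list(score_history) for _ in range(99)]
    for matrix in matrices:
        for p in range(99):
            x = np.concatenate(([1.0], paths[p]))  # intercept + scores
            paths[p].append((matrix @ x)[p])
    return np.array(paths)

history = [440.0]                                     # actual grade 4 score
mats = [toy_matrix(1), toy_matrix(2), toy_matrix(3)]  # years 1-3
traj = project_trajectories(history, mats)
print(traj[49])   # the 50th-percentile path
```

A student is then deemed on track if their observed growth percentile meets or exceeds the smallest p whose multi-year path endpoint reaches the proficiency cut.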

This process repeats as the student progresses from one grade to the next, year after year. The complexity of the process just described is managed by using the R software environment (R Core Team 2022) in conjunction with the open source SGP package (Betebenner et al. 2022), developed by the National Center for the Improvement of Educational Assessment in consultation with the state department of education, to calculate student growth percentiles and percentile growth projections/trajectories. Every year, following the completion of test score reconciliation, student growth percentiles and percentile growth trajectories are calculated for each student. Once calculated, these values are easily used to make the yes/no determinations about the adequacy of each student’s growth relative to their fixed achievement targets.

System-wide Growth and Achievement Charts

Operational work calculating student growth percentiles with state assessment data yields a large number of coefficient matrices derived from estimating Equation 4 (see the SGP Estimation section below). These matrices, similar to a lookup table, “encode” the relationship between prior and current achievement scores for students in the norm group (usually an entire grade cohort of students for the state) across all percentiles and can be used both to qualify a student’s current level growth as well as predict, based upon current levels of student progress, what different rates of growth (quantified in the percentile metric) will yield for students statewide.

When rates of growth necessary to reach performance standards are investigated, such calculations are often referred to as “growth-to-standard.” These analyses serve a dual purpose: they provide the growth rates necessary to reach these standards, and they shed light on the standard setting procedure as it plays out across grades. Establishing the growth percentiles necessary to reach different performance/achievement levels requires investigating which growth percentile reaches the desired performance level thresholds given the student’s achievement history.

Establishing criterion referenced growth thresholds requires consideration of multiple future growth/achievement scenarios. Instead of inferring that prior student growth is indicative of future student growth (e.g., linearly projecting student achievement into the future based upon past rates of change), predictions of future student achievement are contingent upon initial student status (where the student starts) and subsequent rates of growth (the rate at which the student grows). This avoids fatalistic statements such as, “Student \(X\) is projected to be (not) proficient in two years” and instead promotes discussions about the different rates of growth necessary to reach future achievement targets: “In order that Student \(X\) reach/maintain proficiency within two years, she will have to demonstrate \(n^{th}\) percentile growth consecutively for the next two years.” The change in phraseology is minor but significant. Stakeholder conversations turn from “where will (s)he be” to “what will it take?”

Parallel growth/achievement scenarios are more easily understood with a picture. Using the results of a statewide assessment growth percentile analysis, Figures B2 and B3 depict future growth scenarios in mathematics and reading for a student starting in third grade, tracking that student’s achievement time line based upon different rates of annual growth expressed in the growth percentile metric. The figures depict the four state achievement levels across grades 3 to 10 in shades of red to light blue (Unsatisfactory, Partially Proficient, Proficient, and Advanced) together with the 2022 achievement percentiles (innermost vertical axis) superimposed in white. Beginning with the student’s achievement starting point at grade 3, a grade 4 achievement projection is made based upon the most recent growth percentile analyses derived using prior 3rd to 4th grade student progress. More specifically, using the coefficient matrices derived in the quantile regression of grade 4 on grade 3 (see Equation 4), predictions of what consecutive growth at selected percentiles (e.g., the 10th, 35th, 50th, 65th, and 90th) leads to are calculated. Next, using these projected 4th grade scores combined with the student’s actual 3rd grade score, 5th grade achievement projections are calculated using the most recent quantile regression of grade 5 on grades 3 and 4. Similarly, using the projected 4th and 5th grade scores together with the student’s actual 3rd grade score, achievement projections to the 6th grade are calculated using the most recent quantile regression of grade 6 on grades 3, 4, and 5. The analysis extends recursively for grades 6 to 10, yielding the percentile growth trajectories in Figures B2 and B3. The figures allow stakeholders to consider what consecutive rates of growth, expressed in growth percentiles, yield for students starting at different points.

Figure B2: Growth and achievement plot: math level 1/2 cutpoint

Figure B3: Growth and achievement plot: reading level 2/3 cutpoint

Figure B2 depicts percentile growth trajectories in mathematics for a student beginning at the threshold between achievement level 1 and achievement level 2. Based upon the achievement percentiles depicted (the white contour lines), approximately 25 percent of the population of 3rd graders rate as Partially Proficient or below; moving toward grade 8, that percentage increases to near 45 percent. The dashed, colored lines in the figure represent seven different growth scenarios for the student based upon consecutive growth at a given growth percentile, denoted by the right axis. At the lower end, for example, consecutive 10th percentile growth leaves the student, unsurprisingly, mired in the Unsatisfactory category. Growth scenarios up through the 60th percentile also leave the student below Proficient, and even consecutive 65th percentile growth may not be enough to lift the student out of the Partially Proficient category. This demonstrates how difficult it is, probabilistically, based upon current rates of progress, for students to move up a performance level in math statewide. Considering a goal of reaching Proficient (the next-to-top region) by 8th grade, a student would need to demonstrate consecutive growth percentiles in excess of 65 to reach this achievement target, indicating how unlikely such an event currently is. In light of policy mandates for universal proficiency, the growth necessary for non-proficient students to reach proficiency, absent radical changes to growth rates of students statewide, is likely unattainable for a large percentage of non-proficient students.

Figure B3 depicts percentile growth trajectories in reading for a student beginning at the level 2/level 3 threshold in grade 3. In a normative sense, the performance standards in reading are more demanding than those in mathematics (particularly in the higher grades), with approximately 20-30 percent of students rating as Partially Proficient in grades 3 to 10. The dashed, colored lines in the figure represent growth scenarios for the hypothetical student based upon consecutive growth at the given growth percentile. Compared with the growth required in mathematics, more modest growth is required to maintain proficiency in reading: typical growth (50th percentile growth) appears adequate for such a student to move up into the Proficient category by the end of 10th grade.

SGP Estimation

Calculation of a student’s growth percentile is based upon the estimation of the conditional density associated with a student’s score at time \(t\) using the student’s prior scores at times \(1, 2, \ldots, t-1\) as the conditioning variables. Given the conditional density for the student’s score at time \(t\), the student’s growth percentile is defined as the percentile of the score within the time \(t\) conditional density. By examining a student’s current achievement with regard to the conditional density, the student’s growth percentile situates the student’s outcome at time \(t\) taking account of past student performance. The percentile result reflects the likelihood of such an outcome given the student’s prior achievement. In the sense that the student growth percentile translates to the probability of such an outcome occurring (i.e., rarity), it is possible to compare the progress of individuals not beginning at the same starting point. However, occurrences being equally rare does not necessarily imply that they are equally “good.” Qualifying student growth percentiles as “(in)adequate,” “good,” or as satisfying “a year’s growth” is a standard setting procedure requiring external criteria (e.g., growth relative to state performance standards) combined with the wisdom and judgments of stakeholders.
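The definition above is easy to illustrate numerically. In this toy sketch, the conditional density is approximated by an empirical sample of scores from peers with similar prior-score histories (all data fabricated); real SGP estimation derives the conditional density via quantile regression rather than from an explicit peer sample.

```python
# Toy illustration: a student's growth percentile is where the observed
# time-t score falls within the conditional distribution of time-t
# scores given prior achievement. Data are fabricated.

def empirical_percentile(score, conditional_sample):
    """Percentile rank of `score` within the conditional distribution."""
    below = sum(1 for s in conditional_sample if s <= score)
    return round(100 * below / len(conditional_sample))

# Hypothetical time-t scores of peers with similar prior-score histories:
peers = [420, 435, 440, 455, 460, 470, 480, 495, 510, 530]

sgp = empirical_percentile(470, peers)  # 6 of 10 peer scores are <= 470
```

Two students with very different starting points can both earn, say, a growth percentile of 60; the percentile states how rare each outcome is given that student's own history, which is what makes the comparison possible.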

Estimation of the conditional density is performed using quantile regression (Koenker 2005). Whereas linear regression methods model the conditional mean of a response variable \(Y\), quantile regression is more generally concerned with the estimation of the family of conditional quantiles of \(Y\). Quantile regression thus provides a more complete picture of the conditional distribution associated with the response variable(s). The techniques are ideally suited for estimation of the family of conditional quantile functions (i.e., reference percentile curves). Using quantile regression, the conditional density associated with each student's prior scores is derived and used to situate the student's most recent score. The position of the student's most recent score within this density can then be used to characterize the student's growth. Though many state assessments possess a vertical scale, such a scale is not necessary to produce student growth percentiles.

In analogous fashion to the least squares regression line representing the solution to a minimization problem involving squared deviations, quantile regression functions represent the solution to the optimization of a loss function (Koenker 2005). Formally, given a class of suitably smooth functions, \(\mathcal{G}\), one wishes to solve

\[\hspace{2pt} \text{(1)} \hspace{55pt} \mathit{arg} \;\mathit{min}_ {g \in \mathcal{G}} \sum_ {i=1}^n \rho_ {\tau} (Y(t_ i) - g(t_ i)), \]

where \(t_i\) indexes time, \(Y\) are the time dependent measurements, and \(\rho_{\tau}\) denotes the piecewise linear loss function defined by

\[\hspace{2pt} \text{(2)} \hspace{55pt} \rho_ {\tau} (u) = u \cdot (\tau - I(u < 0)) = \begin{cases} u \cdot \tau & u \geq 0 \\ u \cdot (\tau - 1) & u < 0. \end{cases} \]

The elegance of the quantile regression Expression 1 can be seen by considering the more familiar least squares estimators. For example, calculation of \(\mathit{arg} \;\mathit{min} \sum_ {i=1}^n (Y_ i - \mu)^2\) over \(\mu \in \mathbb{R}\) yields the sample mean. Similarly, if \(\mu(x) = x^{\prime} \beta\) is the conditional mean represented as a linear combination of the components of \(x\), calculation of \(\mathit{arg} \;\mathit{min} \sum_ {i=1}^n (Y_ i - x_ i^{\prime} \beta)^2\) over \(\beta \in \mathbb{R}^p\) gives the familiar least squares regression line. Analogously, when the class of candidate functions \(\mathcal{G}\) consists solely of constant functions, the estimation of Expression 1 gives the \(\tau\)th sample quantile associated with \(Y\). By conditioning on a covariate \(x\), the \(\tau\)th conditional quantile function is given by

\[\hspace{2pt} \text{(3)} \hspace{55pt} Q_y (\tau | x) = \mathit{arg} \;\mathit{min}_ {\beta \in \mathbb{R}^p} \sum_{i=1}^n \rho_{\tau} (y_i - x_i^{\prime} \beta). \]

In particular, if \(\tau=0.5\), then the estimated conditional quantile line is the median regression line. For a detailed treatment of the procedures involved in solving the optimization problem associated with Expression 1, see Koenker (2005), particularly Chapter 6.
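The constant-function special case discussed above is easy to verify numerically: minimizing the check loss of Equation 2 over constant candidates recovers the \(\tau\)th sample quantile. A minimal pure-Python sketch with fabricated data:

```python
# Numerical check: with only constant candidate functions, minimizing
# the check loss of Expression 1 yields the tau-th sample quantile.

def rho(u, tau):
    """Piecewise linear check loss of Equation 2."""
    return u * tau if u >= 0 else u * (tau - 1)

def argmin_constant(y, tau):
    """Brute-force the constant g minimizing sum of rho_tau(y_i - g).
    For this piecewise linear loss, a minimizer can always be taken
    at one of the data points, so searching over y suffices."""
    return min(y, key=lambda g: sum(rho(yi - g, tau) for yi in y))

y = [3, 1, 4, 1, 5, 9, 2, 6]
median = argmin_constant(y, 0.5)  # a sample median of y
low = argmin_constant(y, 0.1)     # near the bottom of the sample
high = argmin_constant(y, 0.9)    # near the top of the sample
```

With \(\tau = 0.5\) the loss penalizes over- and under-shooting equally and the minimizer is a median; skewing \(\tau\) toward 0 or 1 drags the minimizer toward the corresponding tail, exactly as the asymmetric weights in Equation 2 suggest.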

Following Wei and He (2006), we parameterize the conditional quantile functions as a linear combination of B-spline cubic basis functions. B-splines are employed to accommodate non-linearity, heteroscedasticity and skewness of the conditional densities associated with values of the independent variable(s). B-splines are attractive both theoretically and computationally in that they provide excellent data fit, seldom lead to estimation problems (Harrell 2001), and are simple to implement in available software.

Figure B4 gives a bivariate representation of linear and B-splines parameterization of decile growth curves. The assumption of linearity imposes conditions upon the heteroscedasticity of the conditional densities. Close examination of the linear deciles indicates slightly greater variability for higher grade 5 scale scores than for lower scores. By contrast, the B-spline based decile functions better capture the greater variability at both ends of the scale score range together with a slight, non-linear trend to the data.

Figure B4: Linear and B-spline conditional deciles for bivariate math data

Calculation of student growth percentiles is performed using R (R Core Team 2022), a language and environment for statistical computing, together with the SGP package (Betebenner et al. 2022). Other software with quantile regression capability (untested with regard to student growth percentiles) includes SAS and Stata. Estimation of cohort-referenced student growth percentiles is conducted using all available prior data, subject to certain suitability conditions. Estimation of baseline-referenced student growth percentiles typically uses a restricted number of prior years’ data (for example, some states have used a maximum of two prior years’ data). Given assessment scores for \(t\) occasions, (\(t \geq 2\)), the \(\tau\)th conditional quantile for \(Y_ t\) based upon \(Y_ {t-1}, Y_ {t-2}, \ldots, Y_1\) is given by

\[\hspace{2pt} \text{(4)} \hspace{55pt} Q_ {Y_ t} (\tau | Y_ {t-1}, \ldots, Y_ 1) = \sum_ {j=1}^{t-1} \sum_ {i=1}^{7} \phi_ {ij}(Y_ j)\beta_ {ij}(\tau), \]

where \(\phi_ {i,j}\), \(i=1, \ldots, 7\) and \(j=1, \ldots, t-1\), denote the B-spline basis functions. Currently, bases consisting of 7 cubic polynomials are used to “smooth” irregularities found in the multivariate assessment data. A bivariate rendering of this is found in Figure B4, where linear and B-spline conditional deciles are presented. The cubic polynomial B-spline basis functions model the heteroscedasticity and non-linearity of the data to a greater extent than is possible using a linear parameterization.
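A cubic B-spline basis of the kind used in Equation 4 can be evaluated with the standard Cox-de Boor recursion. The sketch below uses illustrative knot values on \([0, 1]\); with four interior knots the full clamped cubic basis has eight functions (implementations such as R's `splines::bs()` drop one column when an intercept is modeled separately, which yields seven).

```python
# Pure-Python evaluation of a clamped cubic B-spline basis via the
# Cox-de Boor recursion. Knot values are illustrative only.

def bspline_basis(x, knots, degree=3):
    """Evaluate all B-spline basis functions at x.
    `knots` is the full, clamped knot vector."""
    n_basis = len(knots) - degree - 1
    # Degree-0 bases: indicators of the half-open knot intervals.
    N = [1.0 if knots[i] <= x < knots[i + 1] else 0.0
         for i in range(len(knots) - 1)]
    if x == knots[-1]:  # close the last non-empty interval on the right
        for i in range(len(N) - 1, -1, -1):
            if knots[i] < knots[i + 1]:
                N[i] = 1.0
                break
    # Raise the degree one step at a time; N[i] uses the previous-degree
    # values of N[i] and N[i+1], so an ascending in-place update is safe.
    for d in range(1, degree + 1):
        for i in range(len(knots) - d - 1):
            left = 0.0 if knots[i + d] == knots[i] else \
                (x - knots[i]) / (knots[i + d] - knots[i]) * N[i]
            right = 0.0 if knots[i + d + 1] == knots[i + 1] else \
                (knots[i + d + 1] - x) / (knots[i + d + 1] - knots[i + 1]) * N[i + 1]
            N[i] = left + right
    return N[:n_basis]

# Clamped knot vector: repeated boundary knots plus four interior knots.
full_knots = [0.0] * 4 + [0.2, 0.4, 0.6, 0.8] + [1.0] * 4
basis = bspline_basis(0.5, full_knots)
```

At any point inside the boundary knots the basis functions are non-negative, locally supported, and sum to one, which is what makes them numerically stable regressors for Equation 4.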

The B-spline basis functions require the selection of boundary and interior knots. Boundary knots are end points outside of the scale score distribution that anchor the B-spline basis. These are generally selected by extending the range of scale scores by 10%. That is, they are defined as lying 10% below the lowest obtainable (or observed) scale score (LOSS) and 10% above the highest obtainable scale score (HOSS). The interior knots are the internal breakpoints that define the spline.

The default choice in the SGP package (Betebenner et al. 2022) is to select the 20th, 40th, 60th, and 80th quantiles of the observed scale score distribution as the interior knots. In general, the knots and boundaries are computed from a distribution compiled across several years of test data (i.e., multiple cohorts combined into a single distribution) so that irregularities in any single year are smoothed out. Subsequent annual analyses then use these same knots and boundaries.
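These choices can be sketched in a few lines. This is a hypothetical pure-Python illustration with a fabricated, uniform score distribution; the SGP package performs the equivalent computation internally in R.

```python
# Sketch of the knot/boundary choices described above: interior knots at
# the 20th/40th/60th/80th percentiles of a compiled multi-year score
# distribution, boundary knots extending the observed range by 10%.

def quantile(sorted_scores, q):
    """Simple nearest-rank empirical quantile, for illustration only."""
    idx = min(int(q * len(sorted_scores)), len(sorted_scores) - 1)
    return sorted_scores[idx]

def knots_and_boundaries(scores):
    s = sorted(scores)
    interior = [quantile(s, q) for q in (0.2, 0.4, 0.6, 0.8)]
    span = s[-1] - s[0]
    boundaries = (s[0] - 0.1 * span, s[-1] + 0.1 * span)
    return interior, boundaries

scores = list(range(100, 200))  # fabricated compiled score distribution
interior, (lo, hi) = knots_and_boundaries(scores)
```

Anchoring the boundary knots outside the observed score range keeps the basis well defined for any obtainable score, including scores in later years that fall slightly outside the compiled distribution.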

Finally, it should be noted that independent estimation of the regression functions can result in crossing of the quantile functions. This tends to occur near the extremes of the distributions and is more likely given the use of non-linear functions. Were the quantile functions allowed to cross in this manner, higher observed scale scores at the extremes could receive lower percentile estimates of growth than lower scores (all else being equal in prior scores), and vice versa. In order to remove these contradictory estimates, the quantile regression results are isotonized to prevent quantile crossing, following the methods derived by Chernozhukov, Fernandez-Val, and Galichon (2010).
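At a fixed point of the covariate space, the rearrangement of Chernozhukov, Fernandez-Val, and Galichon (2010) amounts to sorting the estimated quantile values so that they are monotone in \(\tau\). A minimal sketch with fabricated estimates:

```python
# Monotone rearrangement (isotonization) of crossed quantile estimates
# for a single student: sort the estimated values into tau order.

def isotonize(quantile_estimates):
    """Given (tau, Q_hat(tau)) pairs sorted by tau, return the pairs
    with the values rearranged to be non-decreasing in tau."""
    taus = [t for t, _ in quantile_estimates]
    values = sorted(q for _, q in quantile_estimates)
    return list(zip(taus, values))

# Hypothetical crossing near the top of the score scale: the estimated
# 90th percentile falls below the estimated 50th.
raw = [(0.10, 410.0), (0.50, 455.0), (0.90, 452.0)]
fixed = isotonize(raw)
```

The rearranged curve is always monotone and, as shown in the cited paper, is never further from the true quantile function than the original crossed estimate.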

Discussion of Model Properties

Student growth percentiles possess a number of attractive properties from both a theoretical and a practical perspective. Foremost among practical considerations is that the percentile descriptions are familiar and easily communicated to teachers and other non-technical stakeholders. Furthermore, implicit within the percentile quantification of student growth is a statement of probability. Questions of “how much growth is enough?” or “how much is a year’s growth?” ask stakeholders to establish growth percentile thresholds deemed adequate. These thresholds establish growth standards that translate to probability statements. In this manner, percentile-based growth forms a basis for discussing rigorous yet attainable growth standards for all children, supplying a norm-referenced context for Linn’s existence proof (Linn 2003) with regard to student-level growth.

In addition to practical utility, student growth percentiles possess a number of technical attributes well suited for use with assessment scores. Chief among these theoretical properties is equivariance under monotone transformation of the underlying scale: because growth percentiles depend only on the ordering of scores within the conditional distribution, a monotone rescaling of the measurement scale leaves them unchanged.

Formally, given a monotone transformation \(h\) of a random variable \(Y\),

\[\hspace{2pt} \text{(5)} \hspace{55pt} Q_ {h(Y)|X} (\tau | X) = h(Q_ {Y|X} (\tau | X)). \]

This result follows from the fact that \(\Pr (Y \leq y \mid X) = \Pr (h(Y) \leq h(y) \mid X)\) for monotone \(h\). It is important to note that equivariance under monotone transformation does not, in general, hold for least squares estimation of the conditional mean. That is, except for affine transformations \(h\), \(E(h(Y)|X) \not= h(E(Y|X))\). Thus, analyses built upon mean-based regression methods are, to an extent, scale dependent.
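The contrast between the two properties is easy to demonstrate numerically with a small fabricated sample and a monotone, non-affine transformation:

```python
# Numerical check of Equation 5: percentile ranks are unchanged under a
# monotone transformation of the scale, whereas the mean is not
# equivariant. Data are fabricated.
import math

def percentile_rank(score, sample):
    return sum(1 for s in sample if s <= score) / len(sample)

sample = [1.0, 4.0, 9.0, 16.0, 25.0]
h = math.sqrt  # monotone but not affine

# The percentile rank of a score agrees on both scales:
p_raw = percentile_rank(9.0, sample)
p_transformed = percentile_rank(h(9.0), [h(s) for s in sample])

# The mean does not commute with h: h(E[Y]) != E[h(Y)] in general.
mean_then_h = h(sum(sample) / len(sample))               # sqrt(11)
h_then_mean = sum(h(s) for s in sample) / len(sample)    # 3.0
```

Because quantiles commute with monotone transformations while means do not, growth percentiles are insensitive to the particular (possibly arbitrary) scaling chosen for the assessment, a property mean-based growth models lack.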

References

Ballou, Dale, William Sanders, and Paul Wright. 2004. “Controlling for Student Background in Value-Added Assessment for Teachers.” Journal of Educational and Behavioral Statistics 29 (1): 37–65.
Betebenner, Damian W. 2009. “Norm- and Criterion-Referenced Student Growth.” Educational Measurement: Issues and Practice 28 (4): 42–51.
Betebenner, Damian W. 2008. “Toward a Normative Understanding of Student Growth.” In The Future of Test-Based Educational Accountability, edited by Katherine E. Ryan and Lorrie A. Shepard, 155–70. New York: Taylor & Francis.
Betebenner, Damian W., Adam VanIwaarden, Ben Domingue, and Yi Shang. 2022. SGP: Student Growth Percentiles & Percentile Growth Trajectories. sgp.io.
Braun, Henry I. 2005. “Using Student Progress to Evaluate Teachers: A Primer on Value-Added Models.” Princeton, New Jersey: Educational Testing Service.
Chernozhukov, Victor, Ivan Fernandez-Val, and Alfred Galichon. 2010. “Quantile and Probability Curves Without Crossing.” Econometrica 78 (3): 1093–1125.
Harrell, F. E. 2001. Regression Modeling Strategies. New York: Springer.
Harris, Douglas N. 2007. “The Policy Uses and ‘Policy Validity’ of Value-Added and Other Teacher Quality Measures.” Princeton, NJ: Educational Testing Service.
Koenker, Roger. 2005. Quantile Regression. Cambridge: Cambridge University Press.
Linn, Robert L. 2003. “Accountability: Responsibility and Reasonable Expectations.” Los Angeles, CA: Center for the Study of Evaluation, CRESST.
Linn, Robert L., Eva L. Baker, and Damian W. Betebenner. 2002. “Accountability Systems: Implications of Requirements of the No Child Left Behind Act of 2001.” Educational Researcher 31 (6): 3–16.
Lord, Frederick M. 1975. “The ‘Ability’ Scale in Item Characteristic Curve Theory.” Psychometrika 20: 299–326.
R Core Team. 2022. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. http://www.R-project.org.
Raudenbush, Stephen W. 2004. “What Are Value-Added Models Estimating and What Does This Imply for Statistical Practice?” Journal of Educational and Behavioral Statistics 29 (1): 121–29.
Rubin, Donald B., Elizabeth A. Stuart, and Elaine L. Zanutto. 2004. “A Potential Outcomes View of Value-Added Assessment in Education.” Journal of Educational and Behavioral Statistics 29 (1): 103–16.
Singer, Judith D., and John B. Willett. 2003. Applied Longitudinal Data Analysis. New York: Oxford University Press.
Spellings, Margaret. 2005. “Secretary Spellings Announces Growth Model Pilot.” Press Release. Secretary Spellings Announces Growth Model Pilot, November.
Wei, Ying, and Xuming He. 2006. “Conditional Growth Charts.” The Annals of Statistics 34 (5): 2069–97.
Yen, Wendy M. 1986. “The Choice of Scale for Educational Measurement: An IRT Perspective.” Journal of Educational Measurement 23: 299–325.
———. 2007. “Vertical Scaling and No Child Left Behind.” In Linking and Aligning Scores and Scales, edited by Neil J. Dorans, Mary Pommerich, and Paul W. Holland, 273–83. New York: Springer.