Adequate Yearly Progress (AYP): Implementation of the No Child Left Behind Act

August 20, 2009 (RL32495)

Summary

Title I, Part A of the Elementary and Secondary Education Act (ESEA), authorizes financial aid to local educational agencies (LEAs) for the education of disadvantaged children and youth at the preschool, elementary, and secondary levels. Over the last several years, the accountability provisions of this program have been increasingly focused on achievement and other outcomes for participating pupils and schools. Since 1994, and particularly under the No Child Left Behind Act of 2001 (NCLB), a key concept embodied in these requirements is that of "adequate yearly progress (AYP)" for schools, LEAs, and states. AYP is defined primarily on the basis of aggregate scores of various groups of pupils on state assessments of academic achievement. The primary purpose of AYP requirements is to serve as the basis for identifying schools and LEAs where performance is unsatisfactory, so that inadequacies may be addressed first through provision of increased support and, ultimately, a variety of "consequences."

Under NCLB, the Title I-A requirements for state-developed standards of AYP were substantially expanded. AYP calculations must be disaggregated—determined separately and specifically for not only all pupils but also for several demographic groups of pupils within each school, LEA, and state. In addition, while AYP standards had to be applied previously only to pupils, schools, and LEAs participating in Title I-A, AYP standards under NCLB must be applied to all public schools, LEAs, and to states overall, if a state chooses to receive Title I-A grants. However, consequences for failing to meet AYP standards need be applied under federal law only to schools and LEAs participating in Title I-A. Another major break with the pre-NCLB period is that state AYP standards must incorporate concrete movement toward meeting an ultimate goal of all pupils reaching a proficient or advanced level of achievement by 2014.

The overall percentage of public schools identified as failing to make AYP for one or more years on the basis of test scores in 2007-2008 was approximately 35% of all public schools. The percentage of schools for individual states in 2007-2008 varied from 7% to 80%. Approximately 13% of public schools were in the "needs improvement" status (i.e., they had failed to meet AYP standards for 2 consecutive years or more) on the basis of AYP determinations for 2007-2008 and preceding school years.

The AYP provisions of NCLB are challenging and complex, and they have generated substantial interest and debate. Debates regarding NCLB provisions on AYP have focused on the provision for an ultimate goal, use of confidence intervals and data-averaging, population diversity effects, minimum pupil group size (n), separate focus on specific pupil groups, number of schools identified and state variations therein, the 95% participation rule, state variations in assessments and proficiency standards, and several issues specific to the use of growth models to determine AYP. The authorization for ESEA programs expired at the end of FY2008, and the 111th Congress may consider whether to amend and extend the ESEA. This report will be updated regularly to reflect major legislative developments and available information.


Background: Title I Outcome Accountability and the AYP Concept

Title I, Part A of the Elementary and Secondary Education Act (ESEA), the largest federal K-12 education program, authorizes financial aid to local educational agencies (LEAs) for the education of disadvantaged children and youth at the preschool, elementary, and secondary levels.

Since the 1988 reauthorization of the ESEA (The Augustus F. Hawkins-Robert T. Stafford Elementary and Secondary School Improvement Amendments of 1988, or "School Improvement Act," P.L. 100-297), the accountability provisions of this program have been increasingly focused on achievement and other outcomes for participating pupils and schools. Since the subsequent ESEA reauthorization in 1994 (the Improving America's Schools Act of 1994, P.L. 103-382), and particularly under the No Child Left Behind Act of 2001 (NCLB, P.L. 107-110), a key concept embodied in these outcome accountability requirements is that of "adequate yearly progress (AYP)" for schools, LEAs, and (more recently) states overall. The primary purpose of AYP requirements is to serve as the basis for identifying schools and LEAs where performance is inadequate, so that these inadequacies may be addressed, first through provision of increased support and, ultimately, through a variety of consequences.1

This report is intended to provide an overview of the AYP concept and several related issues, a description of the AYP provisions of the No Child Left Behind Act, and an analysis of the implementation of these provisions by the U.S. Department of Education (ED) and the states. The authorization for ESEA programs expired at the end of FY2008, and the 111th Congress may consider whether to amend and extend the ESEA. This report will be updated regularly to reflect major legislative developments and available information.

General Elements of AYP Provisions

ESEA Title I, Part A has included requirements for participating LEAs and states to administer assessments of academic achievement to participating pupils, and to evaluate LEA programs at least every two years, since the program was initiated in 1965. However, relatively little attention was paid to school- or LEA-wide outcome accountability until adoption of the School Improvement Act of 1988.2 Under the School Improvement Act, requirements for states and LEAs to evaluate the performance of Title I-A schools and individual participating pupils were expanded. In addition, LEAs and states were for the first time required to develop and implement improvement plans for pupils and schools whose performance was not improving. However, in comparison to current Title I-A outcome accountability provisions, these requirements were broad and vague. Under the School Improvement Act of 1988, states and LEAs were given little direction as to how they were to determine whether performance was satisfactory, or how performance was to be defined, with one partial exception.

The exception applied to schools conducting schoolwide programs under Title I-A. In schoolwide programs, Title I-A funds may be used to improve instruction for all pupils in the school, rather than being targeted on only the lowest-achieving individual pupils in the school (as under the other major Title I-A service model, targeted assistance schools). Under the 1988 version of the ESEA, schoolwide programs were limited to schools where 75% or more of the pupils were from low-income families (currently this threshold has been reduced to 40%). The School Improvement Act required schoolwide programs, in order to maintain their special authority, to demonstrate that the academic achievement of pupils in the school was higher than either of the following: (a) the average level of achievement for pupils participating in Title I-A in the LEA overall; or (b) the average level of achievement for disadvantaged pupils enrolled in that school during the three years preceding schoolwide program implementation.

The embodiment of outcome accountability in the specific concept of AYP began with the 1994 Improving America's Schools Act (IASA). Under the IASA, states participating in Title I-A were required to develop AYP standards as a basis for systematically determining whether schools and LEAs receiving Title I-A grants were performing at an acceptable level. Failure to meet the state AYP standards was to become the basis for directing technical assistance, and ultimately consequences, toward schools and LEAs where performance was consistently unacceptable.

Generic AYP Factors

Before proceeding to a description of the Title I-A AYP provisions under the IASA of 1994 and the NCLB of 2001, we outline below the general types of major provisions frequently found in AYP provisions, actual or proposed.

Primary Basis: AYP requirements are based primarily on aggregate measures of academic achievement by pupils. As long as Title I-A has contained AYP provisions, it has provided that these be based ultimately on state standards of curriculum content and pupil performance, and assessments linked to these standards. More specifically, the Title I-A requirements have been focused on the percentage of pupils scoring at the "proficient" or higher level of achievement on state assessments, not a common national standard. However, when AYP provisions were first adopted in 1994, states were given an extended period of time to adopt and implement these standards and assessments, and for a lengthy period after the 1994 amendments, various "transitional" performance standards and assessments were used to measure academic achievement.3

Ultimate Goal: AYP standards may or may not incorporate an ultimate goal, which may be relatively specific and demanding (e.g., all pupils should reach the proficient or higher level of achievement, as defined by each state, in a specified number of years), or more ambiguous and less demanding (e.g., pupil achievement levels must increase in relation to either LEA or state averages or past performance). If there is a specific ultimate goal, there may also be requirements for specific, numerical, annual objectives either for pupils in the aggregate or for each of several pupil groups. The primary purpose of such a goal is to require that levels of achievement continuously increase over time in order to be considered satisfactory.

Subject Areas: With respect to subject areas, AYP standards might focus only on reading and math achievement, or they might include additional subject areas.

Additional Indicators: In addition to pupil scores on assessments, AYP standards often include one or more supplemental indicators. Examples include high school graduation rates, attendance rates, or assessment scores in subjects other than those that are required.

Levels at Which Applied: States may be required to develop AYP standards for, and apply them to, schools, LEAs, or states overall. Further, it may be required that AYP standards be applicable to all schools and LEAs, or only to those participating in ESEA Title I-A.

Disaggregation of Pupil Groups: AYP standards might be applied simply to all pupils in a school, LEA, or state, or they might also be applied separately and specifically to a variety of demographic groups of pupils—such as economically disadvantaged pupils, pupils with disabilities, pupils in different ethnic or racial groups, or limited English proficient pupils. In a program such as Title I-A, the purpose of which is to improve education for the disadvantaged, it may be especially important to consider selected disadvantaged pupil groups separately, to identify situations where overall pupil achievement may be satisfactory but the performance of one or more disadvantaged pupil groups is not.

Basic Structure: While AYP definitions or standards may vary in a multitude of respects, their basic structure generally falls into one of three general categories. The No Child Left Behind Act statute places primary emphasis on one of these models, while incorporating a second model as an explicitly authorized alternative. In recent years, critics of current policy have increasingly focused their attention on a third model of AYP, which is now authorized through regulations and Secretarial waivers.

The three basic structural forms for AYP of schools or LEAs are the group status, successive group improvement, and individual growth models. The key characteristic of the group status model is a fixed "annual measurable objective" (AMO), or required threshold level of achievement, that is the same for all pupil groups, schools, and LEAs statewide in a given subject and grade level. Under this model, performance at a point in time is compared to a benchmark at that time, with no direct consideration of changes over a previous period.

The key characteristic of the successive group improvement model is a focus on the rate of change in achievement in a subject area from one year to the next among groups of pupils in a grade level at a school or LEA (e.g., the percentage of this year's 5th grade pupils in a school who are at a proficient or higher level in mathematics compared to the percentage of last year's 5th grade pupils who were at a proficient or higher level of achievement).

Finally, the key characteristic of the individual growth model is a focus on the past or projected rate of change in the level of achievement among the same pupils. Such models may compare current performance of specific pupils to past performance, or may project future performance of pupils based on past changes in their performance level. Growth models are longitudinal, based upon the tracking of the same pupils as they progress through their K-12 education careers. While the progress of pupils is tracked individually, results are typically aggregated when used for accountability purposes. Aggregation may be by demographic group, by school or LEA, or other relevant characteristics. In general, growth models would give credit for meeting steps along the way to proficiency in ways that a status model typically does not.4
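The logic of an individual growth model can be illustrated with a short Python sketch. The function name, the linear projection method, and the score figures below are illustrative only; actual state models approved under the pilot program used more elaborate projection methods.

```python
def on_track(scores_by_year, cutoff, target_year):
    """Individual growth model sketch: project a pupil's trajectory
    linearly from past scale scores and ask whether it reaches the
    proficiency cutoff by the target year. Illustrative only; real
    state models used more elaborate projections."""
    years = sorted(scores_by_year)
    first, last = years[0], years[-1]
    if last == first:
        return scores_by_year[last] >= cutoff
    slope = (scores_by_year[last] - scores_by_year[first]) / (last - first)
    projected = scores_by_year[last] + slope * (target_year - last)
    return projected >= cutoff

# A pupil below the cutoff but gaining ground each year can be credited
# under a growth model even though a status model would count the pupil
# as not proficient.
pupil = {2005: 320, 2006: 345, 2007: 370}
print(on_track(pupil, cutoff=400, target_year=2009))  # True
```

Aggregating such pupil-level determinations by demographic group, school, or LEA yields the group-level results used for accountability purposes.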

To help illustrate the basic differences among these three AYP models, simplified examples of basic aspects of each are described below. The reader should keep in mind that many other variations of these model types are possible.

Growth models as used for AYP determinations under NCLB should be distinguished from value-added models. Value-added models incorporate a variety of statistical controls, such as adjustments for pupil demographic characteristics or past achievement, to sharpen the focus on estimating the impact of specific teachers, schools, or LEAs on pupil achievement and to measure pupil growth against predicted growth for pupils with similar characteristics. ED has not allowed the application of such controls in growth models used for AYP determinations under ESEA Title I-A. Proponents argue that such models, with their controls for background characteristics and past learning, maximize the focus on factors that are under the control of teachers and other school staff. The Tennessee Value-Added Assessment System (TVAAS) is one specific form of growth model that uses pupil background characteristics, previous performance, and other data as statistical controls in order to focus on estimating the specific effects of particular schools, districts, teachers, or programs on pupil achievement.7

Assessment Participation Rate: It might be required that a specified minimum percentage of a school's or LEA's pupils participate in assessments in order for the school or LEA to be deemed to have met AYP standards. The primary purposes of such a requirement are to assure that assessment results are broadly representative of the achievement level of the school's pupils, and to minimize the incentives for school staff to discourage test participation by pupils deemed likely to perform poorly on assessments.

Exclusion of Certain Pupils: Beyond general participation rate requirements (see above), states may be specifically required to include, or allowed to exclude, certain groups of pupils in determining whether schools or LEAs meet AYP requirements. For example, statutory provisions might allow the exclusion of pupils who have attended a school for less than one year in determining whether a school meets AYP standards.

Special Provisions for Pupils with Particular Educational Needs: Beyond requirements that all pupils be included in assessments, with accommodations where appropriate, there may be special provisions for limited English proficient (LEP) pupils or pupils with the most significant cognitive disabilities.

Averaging or Other Statistical Manipulation of Data: Finally, there are a variety of ways in which statistical manipulation of AYP-related data or calculations might be either authorized or prohibited. Major possibilities include averaging of test score data over periods of two or more years, rather than use of the latest data in all cases; or the use of "confidence intervals" in calculating whether the aggregate performance of a school's pupils is at the level specified by the state's AYP standards. These techniques, and the implications of their use, are discussed further below. In general, their use tends to improve the reliability and validity of AYP determinations, while often reducing the number of schools or LEAs identified as failing to meet AYP standards.
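One common way states applied the confidence-interval technique can be sketched as follows. The function name, the normal-approximation formula, and the z-value of 1.96 are this sketch's assumptions; states varied in the interval width and formula used.

```python
import math

def meets_amo_with_ci(n_tested, n_proficient, amo_pct, z=1.96):
    """Confidence-interval sketch: credit a group with meeting the AMO
    if the upper bound of a normal-approximation confidence interval
    around its observed proficiency rate reaches the AMO. Illustrative
    only; states varied in the interval width (z) and formula used."""
    p = n_proficient / n_tested
    half_width = z * math.sqrt(p * (1 - p) / n_tested)
    return (p + half_width) * 100.0 >= amo_pct

# A small group at 60% observed proficiency may be credited against a
# 65% AMO once sampling uncertainty is taken into account; a large
# group at the same observed rate would not be.
print(meets_amo_with_ci(40, 24, 65.0))    # True
print(meets_amo_with_ci(4000, 2400, 65.0))  # False
```

As the example shows, the technique matters most for small groups, where a single year's scores are least reliable; this is why its use tends to reduce the number of schools identified as failing to meet AYP standards.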

AYP Provisions Under the IASA of 1994

Under the IASA, states were to develop and implement AYP standards soon after enactment. However, states were given several years (generally until the 2000-2001 school year) to develop and implement curriculum content standards, pupil performance standards, and assessments linked to these for at least three grade levels in math and reading.8 Thus, during the period between adoption of the IASA in 1994 and of NCLB in early 2002, for most states the AYP provisions were based on "transitional" assessments and pupil performance standards that varied widely in nature. AYP standards based on such "transitional" assessments were considered to be "transitional" themselves, with "final" AYP standards to be based on states' "final" assessments, when implemented. The subject areas required to be included in state AYP standards (as opposed to required assessments) were not explicitly specified in statute; ED policy guidance required states to include only math and reading achievement in determining AYP, and the inclusion in AYP standards of other measures was optional.

With respect to the ultimate goal of the state AYP standards, the IASA provided broadly that there must be continuous and substantial progress toward a goal of having all pupils meet the proficient and advanced levels of achievement. However, no timeline was specified for reaching this goal, and most states did not incorporate it into their AYP plans in any concrete way.

The IASA's AYP standards were to be applied to schools and LEAs, but not to the states overall. Further, while states were encouraged to apply the AYP standards to all public schools and LEAs, states could choose to apply them only to schools and LEAs participating in Title I-A, and most did so limit their application.

The IASA provided that all relevant pupils9 were to be included in assessments and AYP determinations, although assessments were to include results for pupils who had attended a school for less than one year only in tabulating LEA-wide results (i.e., not for individual schools). LEP pupils were to be assessed in the language that would best reflect their knowledge of subjects other than English; and accommodations were to be provided to pupils with disabilities.

Importantly, while the IASA required state assessments to ultimately (by 2000-2001) provide test results that were disaggregated by pupil demographic groups, it did not require such disaggregation of data in AYP standards and calculations. The 1994 statute provided that state AYP standards must consider all pupils, "particularly" economically disadvantaged and LEP pupils, but did not specify that the AYP definition must be based on each of these pupil groups separately. Finally, the statute was silent with respect to data-averaging or other statistical techniques, as well as the basic structure of state AYP standards (i.e., whether a "group status," "successive group improvement," or "individual growth" model must be employed).

Concerns About the AYP Provisions of the IASA

Thus, the IASA's provisions for state AYP standards broke new ground conceptually, but were comparatively broad and ambiguous. Although states were required to adopt and implement at least "transitional" AYP standards, on the basis of "transitional" state assessment results, soon after enactment of the IASA, they were not required to adopt "final" AYP standards, in conjunction with final assessments and pupil performance standards, until the 2000-2001 school year. Further, states were not allowed to implement most consequences, such as reconstituting school staff, until they adopted final assessments, so these provisions were not implemented by most states until the IASA was replaced by NCLB.

A compilation was prepared by the Consortium for Policy Research in Education (CPRE) of the "transitional" AYP standards that states were applying in administering their Title I-A programs during the 1999-2000 school year.10 Overall, according to this compilation, the state AYP definitions for 1999-2000 varied widely and were sometimes complex. General patterns in these AYP standards, outlined below, reflect state interpretation of the IASA's statutory requirements.

A report published by ED in 2004, on the basis of state AYP policies for the 2001-2002 school year, contains similar conclusions about state AYP policies in the period immediately preceding implementation of NCLB.11 There was tremendous variation among the states in the impact of their AYP policies under the IASA on the number and percentage of Title I-A schools and LEAs that were identified as failing to meet the AYP standards. In some states, a substantial majority of Title I-A schools were identified as failing to make AYP, while in others almost no schools were so identified. In July 2002, just before the initial implementation of the new AYP provisions of NCLB, ED released a compilation of the number of schools identified as failing to meet AYP standards for two or more consecutive years (and therefore identified as being in need of improvement) in 2001-2002 (for most states) or 2000-2001 (in states where 2001-2002 data were not available).12 The national total number of these schools was 8,652; the number in individual states ranged from zero in Arkansas and Wyoming to 1,513 in Michigan and 1,009 in California.13 While there are obvious differences in the size of these states, there were also wide variations in the percentage of all schools participating in Title I-A that failed to meet AYP for either one year or two or more consecutive years.

AYP Under NCLB Statute

NCLB provisions regarding AYP may be seen as an evolution of, and to a substantial degree as a reaction to perceived weaknesses in, the AYP requirements of the 1994 IASA. The latter were frequently criticized as being insufficiently specific, detailed, or challenging. Criticism often focused specifically on their failure to focus on specific disadvantaged pupil groups, failure to require continuous improvement toward an ultimate goal, and their required applicability only to schools and LEAs participating in Title I-A, not to all public schools or to states overall.

Under NCLB, the Title I-A requirements for state-developed standards of AYP were substantially expanded in scope and specificity. As under the IASA, AYP is defined primarily on the basis of aggregate scores of pupils on state assessments of academic achievement. However, under NCLB, state AYP standards must also include at least one additional academic indicator, which in the case of high schools must be the graduation rate. The additional indicators may not be employed in a way that would reduce the number of schools or LEAs identified as failing to meet AYP standards.

One of the most important differences between AYP standards under NCLB and previous requirements is that under NCLB, AYP calculations must be disaggregated; that is, they must be determined separately and specifically for not only all pupils but also for several demographic groups of pupils within each school, LEA, and state. Test scores for an individual pupil may be taken into consideration multiple times, depending on the number of designated groups of which they are a member (e.g., a pupil might be considered as part of the LEP and economically disadvantaged groups, as well as the "all pupils" group). The specified demographic groups are economically disadvantaged pupils, pupils from major racial and ethnic groups, pupils with disabilities, and limited English proficient (LEP) pupils.

However, as is discussed further below, there are three major constraints on the consideration of these pupil groups in AYP calculations. First, pupil groups need not be considered in cases where the number of pupils in the group is so small that achievement results would not be statistically significant or the identity of individual pupils might be divulged.14 The selection of the minimum number (n) of pupils in a group for the group to be considered in AYP determinations has been left largely to state discretion. State policies regarding "n" have varied widely, with important implications for the number of pupil groups actually considered in making AYP determinations for many schools and LEAs, and for the number of schools or LEAs potentially identified as failing to make AYP. Second, it has been left to the states to define the "major racial and ethnic groups" on the basis of which AYP must be calculated. And third, as under the IASA, pupils who have not attended the same school for a full year need not be considered in determining AYP for the school, although they are still to be included in LEA and state AYP determinations.

In contrast to the previous statute, under which AYP standards had to be applied only to pupils, schools, and LEAs participating in Title I-A, AYP standards under NCLB must be applied to all public schools, LEAs, and for the first time to states overall, if a state chooses to receive Title I-A grants. However, consequences for failing to meet AYP standards need only be applied under federal law to schools and LEAs participating in Title I-A.

Another major break with the past is that state AYP standards must incorporate concrete movement toward meeting an ultimate goal of all pupils reaching a proficient or advanced level of achievement by the end of the 2013-2014 school year. The steps—that is, required levels of achievement—toward meeting this goal, known as Annual Measurable Objectives (AMOs), must increase in "equal increments" over time. The first increase in the thresholds must occur after no more than two years, and the remaining increases must occur at least once every three years. As is discussed further below, several states have accommodated this requirement in ways that require much more rapid progress in the later years of the period leading up to 2013-2014 than in the earlier years.

The NCLB AYP provisions include an assessment participation rate requirement. In order for a school to meet AYP standards, at least 95% of all pupils, as well as at least 95% of each of the demographic groups of pupils considered for AYP determinations for the school or LEA, must participate in the assessments that serve as the primary basis for AYP determinations.15

The primary model of AYP under the NCLB currently is a group status model. As noted in the example above, group status models set as their AMOs threshold levels of performance, expressed specifically in terms of the percentage of pupils scoring at a proficient or higher (advanced) level on state assessments of reading and mathematics. These AMOs must be met by any school or LEA, both overall and with respect to all relevant pupil subgroups, in order to make AYP, whatever the school's or LEA's "starting point" (for the multi-year period covered by the accountability policy) or performance in the previous year. This AMO "uniform bar" is applicable to all pupil subgroups of sufficient size to be considered in AYP determinations. The threshold levels of achievement are to be set separately for reading and math, and may be set separately for each level of K-12 education (elementary, middle, and high schools). For example, it might be required that 65% or more of the pupils in any of a state's public elementary schools score at the proficient or higher level of achievement in reading in order for a school to make AYP.16
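The status-model determination described above can be sketched in a few lines of Python. The function name, the data layout, the minimum group size of 30, and the enrollment figures are all illustrative assumptions, not values from the statute (states set their own "n" and AMOs).

```python
def makes_ayp_status(groups, amo_pct, min_n=30, participation_floor=0.95):
    """Group status model sketch: every subgroup of sufficient size must
    meet the uniform AMO, and at least 95% of each group must be tested.
    `groups` maps group name -> (n_enrolled, n_tested, n_proficient).
    Illustrative only; states set their own minimum n and AMOs."""
    for name, (enrolled, tested, proficient) in groups.items():
        if enrolled < min_n:
            continue  # group too small to be considered
        if tested / enrolled < participation_floor:
            return False  # fails the 95% participation rule
        if 100.0 * proficient / tested < amo_pct:
            return False  # group falls short of the uniform bar
    return True

# Hypothetical school tested against a 65% AMO: the LEP group is below
# the minimum n, so only the other two groups are considered.
school = {
    "all": (400, 392, 270),
    "econ_disadvantaged": (120, 118, 80),
    "lep": (25, 25, 10),
}
print(makes_ayp_status(school, amo_pct=65.0))  # True
```

Note that a single subgroup falling short on either achievement or participation is sufficient for the whole school to fail to make AYP, which is why the number of subgroups considered (and hence the minimum n) matters so much in practice.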

The initial minimum starting point for the "uniform bar" was to be the greater of (a) the percentage of pupils at the proficient or advanced level of achievement for the lowest-achieving pupil subgroup in the base year (2001-2002), or (b) the percentage of pupils at the proficient or advanced level of achievement for the lowest-performing fifth (quintile)17 of schools statewide in the base year.18 The "uniform bar" must generally be raised at least once every three years, although in the initial period it must be increased after no more than two years. Such group status models attempt to emphasize the importance of meeting certain minimum levels of achievement for all pupil groups, schools, and LEAs, and arguably apply consistent expectations to all pupil groups.
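The arithmetic of an "equal increments" trajectory can be sketched as follows. The function name and the 40% starting point are illustrative; the evenly spaced annual schedule shown is only one permissible pattern, since states could also hold the bar flat for up to three years between step increases (and, as discussed below, several back-loaded their increases).

```python
def amo_schedule(start_pct, start_year=2002, goal_year=2014):
    """One permissible AMO trajectory: equal annual increments from the
    starting point to 100% proficient by 2013-2014. Illustrative only;
    states could also hold the bar flat between step increases."""
    years = goal_year - start_year
    step = (100.0 - start_pct) / years
    return {start_year + i: round(start_pct + i * step, 1)
            for i in range(years + 1)}

# A state starting at 40% proficient must reach 100% by 2014; under an
# evenly spaced schedule the bar rises 5 percentage points per year.
schedule = amo_schedule(40.0)
print(schedule[2002], schedule[2008], schedule[2014])  # 40.0 70.0 100.0
```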

The secondary model of AYP under the NCLB currently is the "safe harbor" provision, an example of a successive group improvement model. This is an alternative provision under which schools or LEAs that fail to meet the usual requirements may still be deemed to have made AYP if they meet certain other conditions. A school where aggregate achievement is below the level required under the group status model described above would still be deemed to have made AYP, through the "safe harbor" provision, if, among relevant pupil groups who did not meet the primary AYP standard, the percentage of pupils who are not at the proficient or higher level in the school declines by at least 10%,19 and those pupil groups make progress on at least one other academic indicator included in the state's AYP standards.20 For example, if the standard AMO is 65%, and a school fails to meet AYP because of the performance of one pupil group (e.g., the math performance of white pupils) for whom the percentage scoring at a proficient or higher level the previous year was 30%, then the school could still make AYP if the percentage of white pupils scoring at a proficient or higher level in math increases to at least 37% (the 30% from the previous year plus 10% of (100%-30%), or seven percentage points).
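The safe harbor arithmetic in the example above can be expressed in a few lines of Python. The function name is this sketch's own; the calculation follows the statutory rule that the non-proficient percentage must fall by at least 10% relative to the prior year.

```python
def safe_harbor_target(prev_proficient_pct):
    """Minimum percent proficient a subgroup must reach this year to
    qualify for safe harbor: the percentage of pupils who are NOT
    proficient must decline by at least 10% from the previous year."""
    not_proficient = 100.0 - prev_proficient_pct
    return prev_proficient_pct + 0.10 * not_proficient

# Example from the text: a subgroup at 30% proficient last year must
# reach at least 30 + 0.10 * (100 - 30) = 37 percent this year.
print(safe_harbor_target(30.0))  # 37.0
```

Because the required gain is proportional to the non-proficient share, safe harbor demands larger percentage-point gains from lower-performing groups (a group at 10% proficient must gain 9 points; a group at 90% need gain only 1).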

A third model of AYP, individual growth, is not explicitly authorized by the NCLB/ESEA statute. However, as discussed later in this report, it has been allowed through waivers and revised program regulations. For the sake of simplicity, in the remainder of this report we will refer to the three AYP models by the abbreviated titles of "status," "improvement," and "growth" models.

Implementation of the AYP Provisions of NCLB by ED

Initial Implementation Actions

States began determining AYP for schools, LEAs, and the states overall on the basis of NCLB provisions beginning with the 2002-2003 school year. The deadline for states to submit to ED their AYP standards based on NCLB provisions was January 31, 2003, and according to ED, all states met this deadline. On June 10, 2003, ED announced that accountability plans had been approved for all states. However, many of the approved plans required states to take additional actions following submission of their plan.21

Aspects of state AYP plans that apparently received special attention in ED's initial reviews included (1) the pace at which proficiency levels are expected to improve (e.g., equal increments of improvement over the entire period, or much more rapid improvement expected in later years than at the beginning); (2) whether schools or LEAs must fail to meet AYP with respect to the same pupil group(s), grade level(s), or subject areas to be identified as needing improvement, or whether two consecutive years of failure to meet AYP with respect to any of these categories should lead to identification;22 (3) the length of time over which pupils should be identified as being LEP; (4) the minimum size of pupil groups in a school in order for the group to be considered in AYP determinations or for reporting of scores; (5) whether to allow schools credit for raising pupil scores from below basic to basic (as well as from basic or below to proficient or above) in making AYP determinations; and (6) whether to allow use of statistical techniques such as "confidence intervals" (i.e., whether scores are below the required level to a statistically significant extent) in AYP determinations.

Developments Following Initial Implementation of the NCLB by ED

Growth Models

In November 2005, the Secretary of Education announced a growth model pilot program under which initially up to 10 states would be allowed to use growth models to make AYP determinations for the 2005-2006 or subsequent school years.23 In December 2007, the Secretary lifted the cap on the number of states that could participate in the growth model pilot, and regulations published in October 200824 incorporate this expanded policy. The models proposed by the states must meet at least the following criteria:

In addition, applicant states must have their annual assessments for each of grades 3-8 approved by ED, and these assessments must have been in place for at least one year previous to implementation of the growth models.

In January 2006, ED published peer review guidance for growth model pilot applications.25 In general, this guidance elaborates upon the requirements described above, with special emphasis on the following: (a) pupil growth targets may not consider pupils' "race/ethnicity, socioeconomic status, school AYP status, or any other non-academic" factors; (b) growth targets are to be established on the basis of achievement standards, not typical growth patterns or past achievement; and (c) the state must have a longitudinal pupil data system capable of tracking individual pupils as they move among schools and LEAs within the state.

The requirements for growth models of AYP under ED's policies are relatively restrictive. The models must be consistent with the ultimate goal of all pupils reaching a proficient or higher level by 2013-2014, a major goal of the statutory AYP provisions of NCLB. In addition, they must incorporate comparable annual assessments, at least for each of grades 3-8 plus at least one senior high school grade, and those assessments must be approved by ED and in place for at least one year before implementation of the growth model. Further, all performance expectations must be individualized, and the state must have a statewide, longitudinal database capable of tracking individual pupils. Proposed models must be structured around expectations and performance of individual pupils, not demographic groups of pupils in a school or LEA, although individual results must be aggregated for the demographic groups designated in NCLB.

Two states, North Carolina and Tennessee, were approved to use proposed growth models in making AYP determinations on the basis of assessments administered in the 2005-2006 school year.26 Thirteen additional states (Alaska, Arkansas, Arizona, Colorado, Delaware, Florida, Iowa, Michigan, Minnesota, Missouri, Ohio, Pennsylvania, and Texas) were subsequently approved to participate in the pilot program. The growth models for individual states are briefly described below.

Overall, most of the growth models approved by ED thus far supplement the number of pupils scoring at a proficient or higher level with those who are projected, or deemed to be on a trajectory, to reach a proficient level within a limited number of years. Twelve of the fifteen approved models follow this general approach. Among these states, a distinction may be made between eight states (North Carolina, Arkansas, Florida, Alaska, Arizona, Missouri, Michigan, and Texas) that combine currently proficient pupils with those not proficient who are "on track" toward proficiency, and four states (Ohio, Pennsylvania, Tennessee, and Colorado) that consider only projected proficiency levels for all pupils (i.e., currently proficient pupils who are not on track to remain proficient are counted as not proficient) when the growth model is applied. In contrast, the models used by the three remaining states (Delaware, Iowa, and Minnesota) focus on awarding credit for movement of pupils among achievement categories up to proficiency.
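The "on-track" approach described above can be illustrated with a minimal sketch. All function names, parameters, and the linear-projection rule here are hypothetical simplifications; actual state models use more elaborate projection methods approved by ED.

```python
# Hypothetical sketch of the "on-track" growth approach: a non-proficient
# pupil counts toward AYP if a simple linear projection of recent annual
# score gains reaches the proficiency cut score within a limited number of
# years. This is an illustration, not any state's actual approved model.

def on_track(current_score, annual_gain, cut_score, years_allowed=3):
    """Project the pupil's score forward and test it against the cut score."""
    projected = current_score + annual_gain * years_allowed
    return projected >= cut_score

def growth_model_proficient_count(pupils, cut_score, years_allowed=3):
    """Count currently proficient pupils plus non-proficient pupils on track."""
    count = 0
    for score, gain in pupils:  # (current score, recent annual gain)
        if score >= cut_score or on_track(score, gain, cut_score, years_allowed):
            count += 1
    return count

pupils = [(410, 0), (385, 12), (370, 5)]
print(growth_model_proficient_count(pupils, cut_score=400))  # prints 2
```

Under this sketch, the second pupil (385, gaining 12 points per year) is projected to reach the 400-point cut score within three years and so counts, while the third does not.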

A 2009 evaluation report by ED focuses on the two states approved to use a growth model for AYP determinations in the 2005-2006 school year, North Carolina and Tennessee.29 In these two states, use of the growth models had minimal impact on AYP determinations based on 2005-2006 test results—no schools in North Carolina and only seven schools in Tennessee made AYP through use of the growth model that would not have made AYP through the methods explicitly authorized in the ESEA.

October 2008 Regulations on Title I-A Assessments and Accountability

Several new final regulations affecting the Title I-A assessment, AYP, and accountability policies were published in the Federal Register on October 29, 2008 (pp. 64435-64513). Most of the regulations deal with policy areas other than AYP. Many of the regulations clarify previous regulations or codify as regulations policies that have previously been established through less formal mechanisms (such as policy guidance or peer reviewer guidance). The regulations related to AYP are briefly described below.

Group Size-Related Provisions in State AYP Policies

States must provide a more extensive rationale than previously required for their selection of minimum group sizes, use of confidence intervals, and related aspects of their AYP policies. Although no specific limits are placed on these parameters, states must explain in their Accountability Workbooks how their policies provide statistically reliable information while minimizing the exclusion of designated pupil groups from AYP determinations, especially at the school level. States must also report on the number of pupils in designated groups who are excluded from separate consideration in AYP determinations due to minimum group size policies. In addition, the regulations codify provisions for the National Technical Advisory Council, established in August 2008 to advise the Secretary on a variety of technical aspects of state standards, assessments, AYP, and accountability policies.

Under the regulations as published in October 2008, each state would have been required to submit its Accountability Workbook, modified in accordance with the regulations, to ED for a new round of technical assistance and peer review. Workbooks were to be submitted in time to implement any needed changes before making AYP determinations based on assessment results for the 2009-2010 school year. However, in a letter to chief state school officers dated April 1, 2009, the Secretary of Education stated that a new round of peer reviews of state Accountability Workbooks would not be conducted at this time.30

Assessments and Accountability Policies in General

The regulations clarify that assessments required under Title I-A may include multiple formats as well as multiple academic assessments within each subject area (reading, mathematics, and science). This does not encompass the concept of "multiple measures," as that term has been used by many to refer to proposals to expand NCLB through inclusion of a variety of indicators other than standards-based assessments in reading, mathematics, and science. Also, states are required to include results from the most recent National Assessment of Educational Progress (NAEP) assessments on their state and LEA performance report cards. Further, ED policies regarding provisions for states to request waivers allowing them to use growth models of AYP are codified in the October 2008 regulations (previously, these policies were published only in policy guidance and peer reviewer guidance documents).

Graduation Rates

Numerous changes have been made to previous policies regarding graduation rates used as the "additional indicator" in AYP determinations for high schools. Previously, states were allowed a substantial degree of flexibility in their method for calculating graduation rates and were not required to disaggregate the rates by pupil group (except for reporting purposes). Also, although states were required to determine a level of, or rate of improvement in, graduation rates that would be adequate for AYP purposes, they were not required to set an ultimate goal toward which these rates should be progressing.

Under the October 2008 regulations, states must adopt a uniform method for calculating graduation rates. This method must be used for school, LEA, and state report cards showing results of assessments administered during the 2010-2011 school year, and for purposes of determining AYP based on assessments administered during the 2011-2012 school year (states unable to meet these deadlines may request an extension). This method has been endorsed by the National Governors Association. The graduation rate is defined as the number of students who graduate from high school in four years31 divided by the number of students in the cohort for the students' class, adjusted for student transfers among schools. States may also propose using a supplementary extended-year graduation rate, in addition to the four-year rate, in order to accommodate selected groups of students (such as certain students with disabilities) who may need more than four years to graduate. These graduation rates must be disaggregated by subgroup.
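The adjusted-cohort calculation described above can be sketched directly; the function and field names are illustrative, not drawn from the regulations themselves.

```python
# Sketch of the uniform four-year graduation rate described above: the
# cohort of entering first-time 9th graders is adjusted for pupils who
# transfer in and (verified) transfers out, and the rate is the number
# graduating within four years divided by that adjusted cohort.

def four_year_graduation_rate(first_time_9th, transfers_in, transfers_out,
                              on_time_graduates):
    adjusted_cohort = first_time_9th + transfers_in - transfers_out
    return on_time_graduates / adjusted_cohort

# 500 entering 9th graders, 40 transfer in, 60 transfer out, and 420
# graduate within four years: 420 / 480 = 87.5%.
rate = four_year_graduation_rate(500, 40, 60, 420)
print(f"{rate:.1%}")  # prints 87.5%
```

A supplementary extended-year rate, as the regulations permit, would use the same adjusted cohort but count graduates over a longer window.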

States must set an ultimate goal for graduation rates that they expect all high schools to meet. No federal standard is established, but the state goal, as well as annual targets toward meeting that goal, must be approved by ED as part of the state's accountability policy.

Pupils with Disabilities

Some of the most substantial of ED's AYP policy changes following enactment of the NCLB involve pupils with disabilities.32 First, regulations addressing the application of the Title I-A standards and assessment requirements to certain pupils with disabilities were published in the Federal Register on December 9, 2003 (pp. 68698-68708). The purpose of these regulations is to clarify the application of standard, assessment, and accountability provisions to pupils "with the most significant cognitive disabilities." Under the regulations, states and LEAs may adopt alternate assessments based on alternate achievement standards—aligned with the state's academic content standards and reflecting "professional judgment of the highest achievement standards possible"—for a limited percentage of pupils with disabilities.33 The number of pupils whose proficient or higher scores on these alternate assessments may be considered as proficient or above for AYP purposes is limited to a maximum of 1.0% of all tested pupils (approximately 9% of all pupils with disabilities) at the state and LEA level (there is no limit for individual schools). SEAs may request from the U.S. Secretary of Education an exception allowing them to exceed the 1.0% cap statewide, and SEAs may grant such exceptions to LEAs within their state. According to ED staff, three states in 2003-2004 (Montana, Ohio, and Virginia), and four states in 2004-2005 (the preceding three states plus South Dakota), received waivers to go marginally above the 1.0% limit statewide. In the absence of a waiver, the number of pupils scoring at the "proficient or higher" level on alternate assessments, based on alternate achievement standards, in excess of the 1.0% limit is to be added to those scoring "below proficient" in LEA or state-level AYP determinations.

A new ED policy affecting an additional group of pupils with disabilities was announced initially in April 2005, with final regulations based on it published in the Federal Register on April 9, 2007. The new policy is divided into short-term and long-term phases. It is focused on pupils with disabilities whose ability to perform academically is assumed to be greater than that of the pupils with "the most significant cognitive disabilities" discussed in the above paragraph, and who are capable of achieving high standards, but may not reach grade level within the same time period as their peers. In ED's terminology, these pupils would be assessed using alternate assessments based on modified achievement standards.

The short-term policy may apply, with the approval of the Secretary, to states until they develop and administer alternate assessments under the long-term policy (described below).34 Under this short-term policy, in eligible states that have not yet adopted modified achievement standards, schools may add to their proficient pupil group a number of pupils with disabilities equal to 2.0% of all pupils assessed (in effect, deeming the scores of all of these pupils to be at the proficient level).35 This policy is applicable only to schools and LEAs that would otherwise fail to meet AYP standards due solely to their pupils-with-disabilities group. Alternatively, in eligible states that have adopted modified achievement standards, schools and LEAs may count proficient scores for pupils with disabilities on these assessments, subject to a 2.0% (of all assessed pupils) cap at the LEA and state levels.

The long-term policy is embodied in final regulations published in the Federal Register on April 9, 2007. These regulations affect standards, assessments, and AYP for a group of pupils with disabilities who are unlikely to achieve grade level proficiency within the current school year, but who are not among those pupils with the most significant cognitive disabilities (whose situation was addressed by an earlier set of regulations, discussed above). For this second group of pupils with disabilities, states would be authorized to develop "modified academic achievement standards" and alternate assessments linked to these. The modified achievement standards must be aligned with grade-level content standards, but may reflect reduced breadth or depth of grade-level content in comparison to the achievement standards applicable to the majority of pupils. The standards must provide access to grade-level curriculum, and not preclude affected pupils from earning a regular high school diploma.

As with the previous regulations regarding pupils with the most significant cognitive disabilities, there would be no direct limit on the number of pupils who take alternate assessments based on modified achievement standards. However, in AYP determinations, pupil scores of proficient or advanced on alternate assessments based on modified achievement standards may be counted only as long as they do not exceed a number equal to 2.0% of all pupils tested at the state or LEA level (i.e., an estimated 20% of pupils with disabilities); such scores in excess of the limit would be considered "non-proficient." As with the 1.0% cap for pupils with the most significant cognitive disabilities, this 2.0% cap does not apply to individual schools. In general, LEAs or states could exceed the 2.0% cap only if they did not reach the 1.0% limit with respect to pupils with the most significant cognitive disabilities. Thus, in general, scores of proficient or above on alternate assessments based on alternate and modified achievement standards may not exceed a total of 3.0% of all pupils tested at a state or LEA level.36 In particular, states are no longer allowed to request a waiver of the 1.0% cap regarding pupils with the most significant cognitive disabilities.
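The cap arithmetic described above can be sketched as follows. This is a simplified illustration at the LEA or state level; for clarity it omits the flexibility to apply unused room under one cap toward the other (the combined 3.0% limit), and all names are hypothetical.

```python
# Sketch of the 1.0% / 2.0% cap arithmetic: proficient-or-above scores on
# alternate assessments count as proficient in AYP determinations only up
# to 1.0% of all tested pupils (alternate achievement standards) and 2.0%
# (modified achievement standards); scores above a cap are treated as
# non-proficient. Simplified: the combined 3.0% flexibility is omitted.

def capped_proficient(n_tested, prof_alternate, prof_modified):
    """Alternate/modified proficient scores countable after the caps."""
    cap_alternate = int(0.01 * n_tested)  # 1.0% cap
    cap_modified = int(0.02 * n_tested)   # 2.0% cap
    return min(prof_alternate, cap_alternate) + min(prof_modified, cap_modified)

# An LEA testing 10,000 pupils reports 150 proficient scores on
# alternate-standards assessments (cap: 100) and 180 on modified-standards
# assessments (cap: 200): 100 + 180 = 280 count as proficient.
print(capped_proficient(10_000, 150, 180))  # prints 280
```

In this example, 50 of the alternate-standards scores exceed the 1.0% cap and would be added to the "below proficient" count in the LEA's AYP determination.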

The April 9, 2007, regulations also include provisions that are widely applicable to AYP determinations. First, states are no longer allowed to use varying minimum group sizes ("n") for different demographic groups of pupils. This prohibits the previously common practice of setting higher "n" sizes for pupils with disabilities or LEP pupils than for other pupil groups. Second, when pupils take a state assessment more than once, states and LEAs may use the highest of those scores in AYP determinations. Finally, as with LEP pupils, states and LEAs may include the test scores of former pupils with disabilities in the disability subgroup for up to two years after such pupils have exited special education.37

In summary, there are now five groups of pupils with disabilities with respect to achievement standards, assessments, and the use of scores in AYP determinations. These groups are summarized below in Table 1.

Table 1. Categories of Pupils with Disabilities with Respect to Achievement Standards, Assessments, and AYP Determinations Under ESEA Title I-A

Type of Content Standards | Type of Achievement Standards | Type of Assessment | Cap on # of Proficient or Advanced Scores That May Be Included in AYP Determinations
Grade-level content standards | Grade-level academic achievement standards | Regular (i.e., the same as that applicable to pupils generally) | None
Grade-level content standards | Grade-level academic achievement standards | Regular with accommodations (e.g., special assistance for those with sight or hearing disabilities) | None
Grade-level content standards | Grade-level academic achievement standards | Alternate assessments based on regular, grade-level achievement standards (e.g., portfolios or performance assessments) | None
Grade-level content standards | Modified academic achievement standards | Alternate assessments based on modified academic achievement standards | In general, 2.0% of all pupils assessed
Alternate content standards | Alternate academic achievement standards | Alternate assessments based on alternate achievement standards | In general, 1.0% of all pupils assessed

Participation Rates

On March 29, 2004, ED announced that schools could meet the requirement that 95% or more of pupils (all pupils as well as pupils in each designated demographic group) participate in assessments (in order for the school or LEA to make AYP) on the basis of average participation rates for the last two or three years, rather than having to post a 95% or higher participation rate each year. In other words, if a particular demographic group of pupils in a public school has a 93% test participation rate in the most recent year, but had a 97% rate the preceding year, the 95% participation rate requirement would be met. In addition, the new guidance would allow schools to exclude pupils who fail to participate in assessments due to a "significant medical emergency" from the participation rate calculations. The new guidance further emphasizes the authority for states to allow pupils who miss a primary assessment date to take make-up tests, and to establish a minimum size for demographic groups of pupils to be considered in making AYP determinations (including those related to participation rates). According to ED, in some states, as many as 20% of the schools failing to make AYP did so on the basis of assessment participation rates alone. It is not known how many of these schools would meet the new, somewhat more relaxed standard.
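The averaging flexibility described above can be sketched as a simple check; the function name and the exact windowing (current year, then two-year, then three-year averages) are an illustrative reading of the announcement.

```python
# Sketch of the 2004 participation-rate flexibility: a pupil group meets
# the 95% requirement if its current-year participation rate, or its
# average over the most recent two or three years, is at least 95%.

def meets_participation(rates, threshold=0.95):
    """`rates` lists participation rates, most recent year first."""
    if rates[0] >= threshold:
        return True
    for window in (2, 3):
        if len(rates) >= window:
            if sum(rates[:window]) / window >= threshold:
                return True
    return False

# The example from the text: 93% this year, 97% last year; the two-year
# average (95%) satisfies the requirement.
print(meets_participation([0.93, 0.97]))  # prints True
```

A school that posted, say, 90% and 92% in its two most recent years would still fail, since neither the current year nor any average reaches 95%.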

LEP Pupils

In a letter dated February 19, and proposed regulations published on June 24, 2004, ED officials announced two new policies with respect to LEP pupils.38 First, with respect to assessments, LEP pupils who have attended schools in the United States (other than Puerto Rico) for less than 10 months must participate in English language proficiency and mathematics tests. However, the participation of such pupils in reading tests (in English), as well as the inclusion of any of these pupils' test scores in AYP calculations, is optional (i.e., schools and LEAs need not consider the scores of first-year LEP pupils in determining whether schools or LEAs meet AYP standards). Such pupils are still considered in determining whether the 95% test participation requirement has been met.

Second, in AYP determinations, schools and LEAs may continue to include pupils in the LEP demographic category for up to two years after they have attained proficiency in English. However, these formerly LEP pupils need not be included when determining whether a school or LEA's count of LEP pupils meets the state's minimum size threshold for inclusion of the group in AYP calculations, and scores of formerly LEP pupils may not be included in state, LEA, or school report cards. Both these options, if exercised, should increase average test scores for pupils categorized as being part of the LEP group, and reduce the extent to which schools or LEAs fail to meet AYP on the basis of LEP pupil groups.

AYP Determinations for Targeted Assistance Schools

ED has released a February 4, 2004, letter to a state superintendent of education providing more flexibility in AYP determinations for targeted assistance schools.39 Title I-A services are provided at the school level via one of two basic models: targeted assistance schools, where services are focused on individual pupils with the lowest levels of academic achievement, or schoolwide programs, in which Title I-A funds may be used to improve academic instruction for all pupils. Currently, most Title I-A programs are in targeted assistance schools, although the number of schoolwide programs has grown rapidly in recent years, and most pupils served by Title I-A are in schoolwide programs.

This policy letter gives schools and LEAs the option of considering only pupils assisted by Title I-A for purposes of making AYP determinations for individual schools. LEA and state level AYP determinations would still have to be made on the basis of all public school pupils. The impact of this authority, and the extent of its utilization, are unclear. In schools using this authority, there would be an increased likelihood that pupil demographic groups would be below minimum size to be considered. At the same time, if Title I-A participants are indeed the lowest-performing pupils in targeted assistance schools, it seems unlikely that many schools would choose to base AYP determinations only on those pupils.

Flexibility for Areas Affected by the Gulf Coast Hurricanes

Following the damage to school systems and dispersion of pupils in the wake of Hurricanes Katrina and Rita in August and September 2005, interest was expressed by officials of states and LEAs that were damaged by the storms, or that enrolled pupils displaced by these storms, in the possibility of waiving some of NCLB's assessment, AYP, or other accountability requirements. In a series of policy letters to chief state school officers (CSSOs), the Secretary of Education emphasized forms of flexibility already available under current law and announced a number of policy revisions and potential waivers that might be granted in the future.

In a September 29, 2005, letter to all CSSOs,40 the Secretary of Education noted that they could exercise existing natural disaster provisions of NCLB [§1116(b)(7)(D) and (c)(10)(F)] to postpone the implementation of school or LEA improvement designations and consequences for schools or LEAs failing to meet AYP standards that are located in the major disaster areas in Louisiana, Alabama, Mississippi, Texas, or Florida, without a specific waiver being required. In addition, waivers of these requirements could be considered for other LEAs or schools heavily affected by enrolling large numbers of evacuee pupils. Further, all affected LEAs and schools could establish a separate subgroup for displaced students in AYP determinations on the basis of assessments administered during the 2005-2006 school year. Pupils would appear only in the evacuee subgroup, not other demographic subgroups (e.g., economically disadvantaged or LEP). Waivers could be requested in 2006 to allow schools or LEAs to meet AYP requirements if only the test scores of the evacuee subgroup would prevent them from making AYP. In any case, all such students must still be assessed and the assessment results reported to the public.41

State Revisions of Their Accountability Plans

Since the initial submission and approval of state accountability plans for AYP and related policies in 2003, many states have proposed revisions to their plans. Sometimes these revisions seem clearly intended to take advantage of new forms of flexibility announced by ED officials, such as those discussed above; in other cases, states appear to be attempting to take advantage of options or forms of flexibility that had reportedly been approved for other states previously.

The proposed changes in state accountability plans have apparently almost always been in the direction of increased flexibility for states and LEAs, with reductions anticipated in the number or percentage of schools or LEAs identified as failing to make AYP. Issues that have arisen with respect to these changes include a lack of transparency, and possibly inconsistencies (especially over time), in the types of changes that ED officials have approved; debates over whether the net effect of the changes is to make the accountability requirements more reasonable or to undesirably weaken them; concern that the changes may make an already complicated accountability system even more complex; and timing—whether decisions on proposed changes are being made in a timely manner by ED.

The major aspects of state accountability plans for which changes have been proposed and approved include the following: (a) changes to take advantage of revised federal regulations and policy guidance regarding assessment of pupils with the most significant cognitive disabilities, LEP pupils, and test participation rates; (b) limiting identification for improvement to schools that fail to meet AYP in the same subject area for two or more consecutive years, and limiting identification of LEAs for improvement to those that failed to meet AYP in the same subject area and across all three grade spans for two or more consecutive years; (c) using alternative methods to determine AYP for schools with very low enrollment; (d) initiating or expanding use of confidence intervals in AYP determinations, including "safe harbor" calculations; (e) changing (usually effectively increasing) minimum group size; and (f) changing graduation rate targets for high schools. Accountability plan changes that have frequently been requested but not approved by ED include (a) identification of schools for improvement only if they failed to meet AYP with respect to the same pupil group and subject area for two or more consecutive years, and (b) retroactive application of new forms of flexibility to recalculation of AYP for previous years.42

Data on Schools and LEAs Identified as Failing to Meet AYP

The most recent available compilations of state AYP data are discussed below in two categories: reports focusing on the number and percentage of schools failing to meet AYP standards for one or more years versus reports on the number and percentage of public schools and LEAs identified for improvement—that is, they had failed to meet AYP standards for two consecutive years or more.

Schools Failing to Meet AYP Standards for One Year

Table 2 provides the percentage of schools and LEAs failing to make adequate yearly progress, on the basis of 2007-2008 assessment results, for each state, as reported by ED based on Consolidated State Performance Reports.43 These data are based on all public schools and LEAs in each state, not just those participating in Title I-A.44 The percentage of public schools failing to make adequate yearly progress for 2007-2008 varied widely among the states, from 7% for Oklahoma and Wisconsin to 76% for Florida, 77% for the District of Columbia, and 80% for South Carolina. Nationwide, 35% of public schools failed to make AYP based on test scores for the 2007-2008 school year, an increase compared to 29% for 2006-2007 and 27% for 2005-2006.

According to the ED report "Title I Implementation—Update on Recent Evaluation Findings," published in 2009, of schools failing to make AYP in the 2005-2006 school year, 35% did so with respect to achievement in reading or math (or both) for the "all pupils" group. In contrast, 24% of schools failing to make AYP did so on the basis of achievement in reading or math (or both) for only one subgroup while making AYP with respect to the "all pupils" group, and 20% did so on the basis of achievement in reading or math (or both) for two or more subgroups while making AYP with respect to the "all pupils" group. The remaining 21% of schools failing to make AYP that year did so with respect to test participation rates only (4%), the "other academic indicator" only (6%), or other combinations of AYP criteria (11%). Among schools with enough pupils in each of the designated categories to meet the minimum group size criterion for their state, the percentage of schools failing to make AYP with respect to math or reading achievement in 2005-2006 ranged from 2% for the Asian pupil group and 3% for White pupils to 18% for pupils from low-income families, 20% for Hispanic pupils, 25% for African-American pupils, 30% for LEP pupils, and 43% for pupils with disabilities. With respect to education level, 41% of middle schools failed to make AYP in 2007-2008, compared to 34% of high schools and 19% of elementary schools.

Table 2. Reported Percentage of Public Schools and Local Educational Agencies (LEAs) Failing to Make Adequate Yearly Progress (AYP) on the Basis of 2007-2008 Assessment Results

State | Reported Percentage of Rated Schools Not Making AYP Based on 2007-2008 Test Results | Reported Percentage of LEAs Not Making AYP Based on 2007-2008 Test Results
Alabama | 16% | 1%
Alaska | 41% | 50%
Arizona | 27% | 39%
Arkansas | 42% | 16%
California | 48% | 60%
Colorado | 43% | 58%
Connecticut | 42% | 74%
Delaware | 29% | 32%
District of Columbia | 77% | 81%
Florida | 76% | 97%
Georgia | 20% | 70%
Hawaii | 58% | 100%
Idaho | 44% | 57%
Illinois | 32% | 39%
Indiana | 46% | 16%
Iowa | 31% | 10%
Kansas | 10% | 9%
Kentucky | 28% | 40%
Louisiana | 19% | NAa
Maine | 34% | 4%
Maryland | 17% | 67%
Massachusetts | 63% | 78%
Michigan | 27% | 10%
Minnesota | 49% | 58%
Mississippi | 14% | 51%
Missouri | 57% | 74%
Montana | 28% | 32%
Nebraska | 20% | 34%
Nevada | 40% | 6%
New Hampshire | 62% | 44%
New Jersey | 35% | 15%
New Mexico | 68% | 46%
New York | 16% | 7%
North Carolina | 68% | 92%
North Dakota | 37% | 39%
Ohio | 36% | 48%
Oklahoma | 7% | 7%
Oregon | 37% | 59%
Pennsylvania | 28% | 8%
Rhode Island | 27% | 37%
South Carolina | 80% | 100%
South Dakota | 16% | 11%
Tennessee | 20% | 9%
Texas | 15% | 32%
Utah | 19% | 14%
Vermont | 37% | 39%
Virginia | 25% | 57%
Washington | 62% | 72%
West Virginia | 19% | 89%
Wisconsin | 7% | 1%
Wyoming | 24% | 8%
Puerto Rico | 59% | 100%
National Average | 35% | 35%

Source: State Consolidated Performance Reports; see http://www.ed.gov/admins/lead/account/consolidated/sy07-08part1/index.html.

a. NA = Not available. Thus, the national total percentage for LEAs excludes this state.

Schools Failing to Meet AYP Standards for Two Consecutive Years (and Any Additional Years)

ED recently posted45 data from the Consolidated State Performance Reports on the number of schools identified for improvement, corrective action, or restructuring for the 2008-2009 school year, on the basis of assessment results through the 2007-2008 school year. A total of 12,599 schools were identified, constituting approximately 13% of all public schools. As with the percentage of schools failing to make AYP, the percentage of schools identified varied widely among the states.

A theme reflected in these results is a high degree of state variation in the percentage of schools identified as failing to meet AYP standards or as needing improvement. These variations appear to be based, at least in part, not only on underlying differences in achievement levels but also on differences in the degree of rigor or challenge in state pupil performance standards, and on variations in state-determined standards for the minimum size of pupil demographic groups in order for them to be considered in AYP determinations of schools or LEAs. (In general, larger minimum sizes for pupil demographic groups reduce the likelihood that many disadvantaged groups, such as LEP pupils or pupils with disabilities, will be considered in determining whether a school or LEA meets AYP.)

LEAs Failing to Meet AYP Standards

Although most attention, in both the statute and implementation activities, thus far has been focused on application of the AYP concept to schools, a limited amount of information is becoming available about LEAs that fail to meet AYP requirements, and the consequences for them. As shown in Table 2, according to the Consolidated State Performance Reports referred to above, approximately 35% of all LEAs failed to meet AYP standards on the basis of assessment results for the 2007-2008 school year.46 Among the states, there was even greater variation for LEAs than for schools. Two states—Alabama and Wisconsin—reported that 1% of their LEAs failed to make adequate yearly progress, while 100% of the LEAs in South Carolina, plus the single, statewide LEA in Hawaii, failed to meet AYP standards.

Issues in State Implementation of NCLB Provisions

Introduction

The primary challenge associated with the AYP concept is to develop and implement school, LEA, and state performance measures that (a) are challenging, (b) provide meaningful incentives to work toward continuous improvement, (c) are at least minimally consistent across LEAs and states, and (d) focus attention especially on disadvantaged pupil groups. At the same time, it is generally deemed desirable that AYP standards allow flexibility to accommodate myriad variations in state and local conditions, demographics, and policies, and avoid identifying so many schools and LEAs as failing to meet the standards that morale declines significantly systemwide and it becomes extremely difficult to target technical assistance and consequences on low-performing schools. The AYP provisions of NCLB are challenging and complex, and have generated substantial criticism from several states, LEAs, and interest groups. Many critics are especially concerned that efforts to direct resources and apply consequences to low-performing schools would likely be ineffective if resources and attention are dispersed among a relatively large proportion of public schools. Others defend NCLB's requirements as a measured response to the pre-NCLB AYP provisions, which, while much more flexible, had the several weaknesses discussed above.

The remainder of this report provides a discussion and analysis of several specific aspects of NCLB's AYP provisions that have attracted significant attention and debate. These include the provision for an ultimate goal, use of confidence intervals and data-averaging, population diversity effects, minimum pupil group size (n), separate focus on specific pupil groups, number of schools identified and state variations therein, the 95% participation rule, state variations in assessments and proficiency standards, and several issues specific to the use of growth models to determine AYP.

It should be noted that this report focuses on issues that have arisen in the implementation of NCLB provisions on AYP. As such, it generally does not focus on alternatives to the current statutory provisions of NCLB.

Ultimate Goal

The required incorporation of an ultimate goal—of all pupils at a proficient or higher level of achievement within 12 years of enactment—is one of the most significant differences between the AYP provisions of NCLB and those under previous legislation. Setting such a date is perhaps the primary mechanism requiring state AYP standards to incorporate annual increases in expected achievement levels, as opposed to the relatively static expectations embodied in most state AYP standards under the previous IASA. Without an ultimate goal of having all pupils reach the proficient level of achievement by a specific date, states might simply establish relative goals (e.g., performance must be as high as the state average) that provide no real movement toward, or incentives for, significant improvement, especially among disadvantaged pupil groups.

Nevertheless, a goal of having all pupils at a proficient or higher level of achievement, within 12 years or any other specified period of time, may be easily criticized as being "unrealistic," if one assumes that "proficiency" has been established at a challenging level. Proponents of such a demanding ultimate goal argue that schools and LEAs frequently meet the goals established for them, even rather challenging goals, if the goals are very clearly identified, defined, and established, if they are attainable, and if it is made visibly clear that they will be expected to meet them. This is in contrast to a pre-NCLB system under which performance goals were often vague, undemanding, and poorly communicated, with few, if any, consequences for failing to meet them. A demanding goal might maximize efforts toward improvement by state public school systems, even if the goal is not met. Further, if a less ambitious goal were to be adopted, what lower level of pupil performance might be acceptable, and for which pupils?

At the same time, by setting deadlines by which all pupils must achieve at the proficient or higher level, the AYP provisions of NCLB create an incentive for states to weaken their pupil performance standards to make them easier to meet. In many states, only a minority of pupils are currently achieving at the proficient or higher level on state reading and mathematics assessments. Even in states where the percentage of all pupils scoring at the proficient or higher level is substantially higher, the percentage of those in many of the pupil groups identified under NCLB's AYP provisions is substantially lower. It would be extremely difficult for such states to reach a goal of 100% of their pupils at the proficient level without reducing their performance standards.

There has thus far been some apparent movement toward lowering proficiency standards in a small number of states. Reportedly, a few states have redesignated lower standards (e.g., "basic" or "partially proficient") as constituting a "proficient" level of performance for Title I-A purposes, or established new "proficient" levels of performance that are below levels previously understood to constitute that level of performance, and other states have considered such actions.47 For example, in submitting its accountability plan (which was approved by ED), Colorado stated that it would deem students performing at both its "proficient" and "partially proficient" levels, as defined by that state, as being "proficient" for NCLB purposes.48 In its submission, the state argued that "Colorado's standards for all students remain high in comparison to most states. Colorado's basic proficiency level on CSAP is also high in comparison to most states." Similarly, Louisiana decided to identify its "basic" level of achievement as the "proficient" level for NCLB purposes, stating that "[t]hese standards have been shown to be high; for example, equipercentile equating of the standards has shown that Louisiana's 'Basic' is somewhat more rigorous than NAEP's 'Basic.' In addition, representatives from Louisiana's business community and higher education have validated the use of 'Basic' as the state's proficiency goal."49

This is an aspect of NCLB's AYP provisions on which there will likely be continuing debate. It is unlikely that any state, and few schools or LEAs of substantial size and a heterogeneous pupil population, will meet NCLB's ultimate AYP goal, unless state standards of proficient performance are significantly lowered or states aggressively pursue the use of such statistical techniques as setting high minimum group sizes and confidence intervals (described below) to substantially reduce the range of pupil groups considered in AYP determinations or effectively lower required achievement level thresholds.

Some states have addressed this situation, at least in the short run, by "backloading" their AYP standards, requiring much more rapid improvements in performance at the end of the 12-year period than at the beginning. These states have followed the letter of the statutory language that requires increases of "equal increments" in levels of performance after the first two years, and at least once every three years thereafter.50 However, they have "backloaded" this process by, for example, requiring increases only once every two to three years at the beginning, then requiring increases of the same degree every year for the final years of the period leading up to 2013-2014. For example, both Indiana and Ohio established incremental increases in the threshold level of performance for schools and LEAs that are equal in size, and that take effect in the school years beginning in 2004, 2007, 2010, 2011, 2012, and 2013. As a result, the required increases per year are three times greater during 2010-2013 than during 2004-2009. These states may be trying to postpone required increases in performance levels until NCLB provisions are reconsidered, and possibly revised, by Congress.
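The backloading arithmetic can be sketched with a short calculation. The step years are those cited above for Indiana and Ohio; the 40% starting threshold is a hypothetical figure for illustration, with six equal increments reaching 100% by 2013-2014.

```python
# Hypothetical starting threshold; the step years are those the report cites
# for Indiana and Ohio, with six equal increments reaching 100% by 2013-2014.
START = 40.0
GOAL = 100.0
STEP_YEARS = (2004, 2007, 2010, 2011, 2012, 2013)

step = (GOAL - START) / len(STEP_YEARS)  # six equal increments of 10 points

schedule = {}
threshold = START
for year in range(2002, 2014):
    if year in STEP_YEARS:
        threshold += step
    schedule[year] = threshold

# Average required gain per year: 20 points over 2004-2009 versus
# 40 points over 2010-2013, i.e., three times as fast at the end.
early_rate = (schedule[2009] - schedule[2003]) / 6
late_rate = (schedule[2013] - schedule[2009]) / 4
print(early_rate, late_rate)
```

Even with equal-sized steps, concentrating the steps in the final years triples the pace of improvement required at the end of the period.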

Confidence Intervals and Data-Averaging

Many states have used one or both of a pair of statistical techniques to attempt to improve the validity and reliability of AYP determinations. Use of these techniques also tends to have an effect, whether intentional or not, of reducing the number of schools or LEAs identified as failing to meet AYP standards.

The averaging of test score results for various pupil groups over two- or three-year periods is explicitly authorized under NCLB, and this authority is used by many states. In some cases, schools or LEAs are allowed to select whether to average test score data, and for what period (two years or three), whichever is most favorable for them. As discussed above, recent policy guidance also explicitly allows the use of averaging for participation rates.

The use of another statistical technique was not explicitly envisioned in the drafting of NCLB's AYP provisions, but its inclusion in the accountability plans of several states has been approved by ED. This is the use of "confidence intervals," usually with respect to test scores, but in a couple of states also to the determination of minimum group size (see below). This concept is based on the assumption that any test administration represents a "sample survey" of pupils' educational achievement level. As with all sample surveys, there is a degree of uncertainty regarding how well the sample results—average test scores for the pupil group—reflect pupils' actual level of achievement. As with surveys, the larger the number of pupils in the group being tested, the greater the probability that the group's average test score will represent their true level of achievement, all else being equal. Put another way, confidence intervals are used to evaluate whether achievement scores are below the required threshold to a statistically significant extent.

"Confidence intervals" may be seen as "windows" surrounding a threshold test score level (i.e., the percentage of pupils at the proficient or higher level required under the state's AYP standards).51 The size of the window varies with the number of pupils in the relevant group who are tested, and with the desired degree of probability that the group's average score represents their true level of achievement. This is analogous to the "margin of error" commonly reported along with opinion polls. Unlike opinion polls, test results are not based on a small sample of the relevant population, since the tests are administered to the full "universe" of pupils; nevertheless, the results from any particular test administration are considered to be only estimates of pupils' true level of achievement, or of the effectiveness of a school or LEA in educating specified pupil groups, and thus the "margin of error" or "confidence interval" concepts are deemed by many to be relevant to these test scores. The probability, or level of confidence, is most often set at 95%, but in some cases may be as low as 90% or as high as 99%—that is, it is 95% (or 90% or 99%) certain that the true achievement level for a group of pupils is within the relevant confidence interval of test scores above and below the average score for the group. All other relevant factors being equal, the smaller the pupil group and the higher the desired degree of probability, the larger the window surrounding the threshold percentage.

For example, consider a situation where the threshold percentage of pupils at the proficient or higher level of achievement in reading for elementary schools required under a state's AYP standards is 40%. Without applying confidence intervals, a school would simply fail to make AYP if the percentage of proficient pupils among all of its pupils, or within any relevant pupil group meeting the minimum size threshold, falls below 40%. In contrast, if confidence intervals are applied, windows are established above and below the 40% threshold, turning the threshold from a single point into a range of scores. The size of this range or window will vary depending on the size of the pupil group whose average scores are being considered, and the desired degree of probability (95% or 99%) that the average achievement levels for pupils in each group are being correctly categorized as being "truly" below the required threshold. In this case, a school would fail to make AYP with respect to a pupil group only if the average score for the group is below the lowest score in that range.52
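The mechanics described above can be sketched with a short calculation using the normal approximation for a proportion. The 40% threshold, the group sizes, and the 95% confidence level (z = 1.96) are illustrative assumptions, not any particular state's rules.

```python
import math

def makes_ayp(pct_proficient, group_size, threshold=0.40, z=1.96):
    # A group fails only if its score falls below the lower edge of the
    # window drawn around the threshold (normal approximation).
    half_width = z * math.sqrt(threshold * (1 - threshold) / group_size)
    return pct_proficient >= threshold - half_width

# A group of 30 pupils scoring 30% proficient still makes AYP
# (window: roughly 40% plus or minus 17.5 points) ...
print(makes_ayp(0.30, 30))     # True
# ... while a group of 1,000 pupils with the same score does not
# (window: roughly 40% plus or minus 3 points).
print(makes_ayp(0.30, 1000))   # False
```

The sketch shows why the technique matters most for small groups: the same score that is excused as statistical noise in a group of 30 is treated as a genuine shortfall in a group of 1,000.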

The use of confidence intervals to determine whether group test scores fall below required thresholds to a statistically significant degree improves the validity of AYP determinations, and addresses the fact that test scores for any group of pupils will vary from one test administration to another, with variations that may be especially large for relatively small groups of pupils. At the same time, the use of confidence intervals reduces the likelihood that schools or (to a lesser extent) LEAs will be identified as failing to make AYP. Also, for relatively small pupil groups and high levels of desired accuracy (especially a 99% probability), the size of confidence intervals may be relatively large. Ultimately, the use of this technique may mean that the average achievement levels of pupil groups in many schools will be well below 100% proficiency by 2013-2014, yet the schools will still meet AYP standards because the groups' scores are within the relevant confidence interval.

Population Diversity Effects

Minimum Pupil Group Size (n)

Another important technical factor in state AYP standards is the establishment of the minimum size (n) for pupil groups to be considered in AYP calculations. NCLB recognizes that in the disaggregation of pupil data for schools and LEAs, there might be pupil groups that are so small that average test scores would not be statistically reliable, or the dissemination of average scores for the group might risk violation of pupils' privacy rights.

Both the statute and ED regulations and other policy guidance have left the selection of this minimum number to state discretion. While most states have reportedly selected a minimum group size between 30 and 50 pupils, the range of selected values for "n" is rather large, varying from as few as five to as many as 200 pupils53 under certain circumstances. One state (North Dakota) has set no specific level for "n," relying only on the use of confidence intervals (see above) to establish reliability of test results. Although most states apply a single standard minimum size to all pupil groups, some states until recently established higher levels of "n" for pupils with disabilities or LEP pupils.54

In general, the higher the minimum group size, the less likely that many pupil groups will actually be separately considered in AYP determinations. (Pupils will still be considered, but only as part of the "all pupils" group, or possibly other specified groups.) This gives schools and LEAs fewer thresholds to meet, and reduces the likelihood that they will be found to have failed to meet AYP standards. In many cases, if a pupil group falls below the minimum group size at the school level, it is still considered at the LEA level (where it is more likely to meet the threshold). In addition, since minimum group sizes for reporting achievement data are typically lower than those used for AYP purposes,55 scores are often reported for pupil groups who are not separately considered in AYP calculations. At the same time, relatively high levels for "n" weaken NCLB's specific focus on a variety of pupil groups, many of them disadvantaged, such as LEP pupils, pupils with disabilities, or economically disadvantaged pupils.
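The effect of the minimum group size can be illustrated with a small sketch; the subgroup enrollment counts below are hypothetical.

```python
# Hypothetical subgroup enrollments for a single school.
school_groups = {
    "all pupils": 420,
    "economically disadvantaged": 150,
    "LEP": 28,
    "pupils with disabilities": 35,
    "Hispanic": 62,
}

def groups_counted(groups, n_min):
    # Only groups at or above the state's minimum size "n" face a
    # separate AYP threshold.
    return [name for name, size in groups.items() if size >= n_min]

print(groups_counted(school_groups, 30))  # LEP group (28 pupils) drops out
print(groups_counted(school_groups, 50))  # disabilities group drops out as well
```

Raising "n" from 30 to 50 in this example removes the pupils-with-disabilities group from separate consideration, leaving the school with fewer thresholds to meet.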

Separate Focus on Specific Pupil Groups

There are several ongoing issues regarding NCLB's requirement for disaggregation of pupil achievement results in AYP standards, namely the requirement that a variety of pupil groups be separately considered in AYP calculations. The first of these was discussed immediately above: the establishment of minimum group size, with the possible result that relatively small pupil groups will not be considered in the schools and LEAs of states that set "n" at a comparatively high level, especially in states that set a higher level for certain groups (e.g., pupils with disabilities) than others.

A second issue arises from the fact that the definition of the specified pupil groups has been left essentially to state discretion. This is noteworthy particularly with respect to two groups of pupils: LEP pupils and pupils in major racial and ethnic groups. Regarding LEP pupils, many have been concerned about the difficulty of demonstrating that these pupils are performing at a proficient level if this pupil group is defined narrowly to include only pupils unable to perform in regular English-language classroom settings. In other words, if pupils who no longer need special language services are no longer identified as being LEP, how will it be possible to bring those who are identified as LEP up to a proficient level of achievement?

In developing their AYP standards, some states addressed this concern by including pupils in the LEP category for one or more years after they no longer need special language services. As was discussed above, ED has recently published policy guidance encouraging all states to follow this approach, allowing them to continue to include pupils in the LEP group for up to two years after being mainstreamed into regular English language instruction, and further allowing the scores of LEP pupils to be excluded from AYP calculations for the first year of pupils' enrollment in United States schools. If widely adopted, these policies should reduce the extent to which schools or LEAs are identified as failing to meet AYP standards on the basis of the LEP pupil group.

Another aspect of this issue arises from the discretion given to states in defining "major racial and ethnic groups." Neither the statute nor ED has defined this term. Some states defined the term relatively comprehensively (e.g., Maryland includes American Indian, African American, Asian, White, and Hispanic pupil groups) and some more narrowly (e.g., Texas identifies only three groups—White, African American, and Hispanic). A more narrow interpretation may reduce the attention focused on excluded pupil groups. It would also reduce the number of different thresholds some schools and LEAs would have to meet in order to make AYP.

A final, overarching issue arises from the relationship between pupil diversity in schools and LEAs and the likelihood of being identified as failing to meet AYP standards. All other relevant factors being equal (especially the minimum group size criteria), the more diverse the pupil population, the more thresholds a school or LEA must meet in order to make AYP. While in a sense this was an intended result of legislation designed to focus (within limits) on all pupil groups, the impact of making it more difficult for schools and LEAs serving diverse populations to meet AYP standards may also be seen as an unintended consequence of NCLB. This issue has been analyzed in a recent study by Thomas J. Kane and Douglas O. Staiger, who concluded that such "subgroup targets cause large numbers of schools to fail ... arbitrarily single out schools with large minority subgroups for sanctions ... or statistically disadvantage diverse schools that are likely to be attended by minority students.... Moreover, while the costs of the subgroup targets are clear, the benefits are not. Although these targets are meant to encourage schools to focus more on the achievement of minority youth, we find no association between the application of subgroup targets and test score performance among minority youth."56 According to the ED report, "Title I Implementation—Update on Recent Evaluation Findings," published in 2009, the percentage of schools failing to make AYP ranged from 7% for those with only 1 subgroup to 20% for those with 2 subgroups, 37% for those with 3 subgroups, and 43-51% for those with 4-8 subgroups.
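A back-of-the-envelope calculation illustrates why more separately counted groups mean more chances to fail. The 7% per-group figure and the assumption that groups miss their targets independently are simplifications for illustration only; in practice, subgroup results within a school are correlated.

```python
def p_school_fails(p_per_group, n_groups):
    # Chance that at least one of n independently judged groups misses
    # its target, when each misses with probability p_per_group.
    return 1 - (1 - p_per_group) ** n_groups

# Failure probability rises quickly with the number of counted groups.
for k in (1, 2, 4, 8):
    print(k, round(p_school_fails(0.07, k), 3))
```

Under these stylized assumptions, a school judged on eight groups is more than six times as likely to miss at least one target as a school judged on one, even when every group performs identically.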

However, without specific requirements for achievement gains by each of the major pupil groups, it is possible that insufficient attention would be paid to the performance of the disadvantaged pupil groups among whom improvements are most needed, and for whose benefit the Title I-A program was established. Under previous law, without an explicit, specific requirement that AYP standards focus on these disadvantaged pupil groups, most state AYP definitions considered only the performance of all pupils combined. And it is theoretically possible for many schools and LEAs to demonstrate substantial improvements in achievement by their pupils overall while the achievement of their disadvantaged pupils does not improve significantly, at least until the ultimate goal of all pupils at the proficient or higher level of achievement is approached. This is especially true under a "status" model of AYP such as the one in NCLB, under which advantaged pupil groups may have achievement levels well above what is required, and an overall achievement level could easily mask achievement well below the required threshold by various groups of disadvantaged pupils.

One possible alternative to current policy would be to allow states to count each student only once, in net, in AYP calculations, with equal fractions for each relevant demographic category (e.g., a Hispanic LEP pupil from a low-income family would count as one-third of a pupil in each group).

Number of Schools Identified and State Variations Therein

As was discussed earlier, concern has been expressed by some analysts since early debates on NCLB that a relatively high proportion of schools would fail to meet AYP standards. On the basis of assessment results for 2007-2008, 35% of all public schools nationwide failed to make AYP. Further, approximately 13% of all public schools were identified as needing improvement (i.e., failed to meet AYP standards for two or more consecutive years) for 2007-2008. Future increases in performance thresholds, as the ultimate goal of all pupils at the proficient or higher level of achievement is approached, may result in higher percentages of schools failing to make AYP.

In response to these concerns, ED officials have emphasized the importance of taking action to identify and improve underperforming schools, no matter how numerous. They have also emphasized the possibilities for flexibility and variation in applying consequences to schools that fail to meet AYP standards, depending on the extent to which they fall short. It should also be re-emphasized that many of the schools reported as having failed to meet AYP standards have done so for one year only, while NCLB requires that a series of actions be taken only with respect to schools or LEAs participating in ESEA Title I-A that fail to meet AYP for two consecutive years or more.

Further, some analysts argue that a set of AYP standards that one-third or more of public schools fail to meet may accurately reflect pervasive weaknesses in public school systems, especially with respect to the performance of disadvantaged pupil groups. To these analysts, the identification of large percentages of schools is a positive sign of the rigor and challenge embodied in NCLB's AYP requirements, and is likely to provide needed motivation for significant improvement (and ultimately a reduction in the percentage of schools so identified).

Others have consistently expressed concern about the accuracy and efficacy of an accountability system under which such a high percentage of schools is identified as failing to make adequate progress, with consequent strain on financial and other resources necessary to provide technical assistance, public school choice and supplemental services options, as well as other consequences. In addition, some have expressed concern that schools might be more likely to fail to meet AYP simply because they have diverse enrollments, and therefore more groups of pupils to be separately considered in determining whether the school meets AYP standards. They also argue that the application of technical assistance and, ultimately, consequences to such a high percentage of schools will dilute available resources to such a degree that these responses to inadequate performance would be insufficient to markedly improve performance.

The proportion of public schools identified as failing to meet AYP standards is not only relatively large in the aggregate, but also varies widely among the states. As was discussed above, the percentage of public schools identified as failing to make AYP on the basis of assessment results for 2007-2008 ranged from 7% to 80% among the states. This result is somewhat ironic, given that one of the major criticisms of the pre-NCLB provisions for AYP was that they resulted in a similarly wide degree of state variation in the proportion of schools identified, and the more consistent structure required under NCLB was widely expected to lead to greater consistency among states in the proportion of schools identified.

It is likely that state variations in the percentage of schools failing to meet AYP standards are based not only on underlying differences in achievement levels, as well as a variety of technical factors in state AYP provisions, but also on differences in the degree of rigor or challenge in state pupil performance standards and assessments. Particularly now that all states receiving Title I-A grants must also participate in state-level administration of NAEP tests in 4th and 8th grade reading and math every two years, this variation can be illustrated for all states by comparing the percentage of pupils scoring at the proficient level or above on NAEP versus state assessments. Such a comparison was conducted by a private organization, Achieve, Inc., based on 8th grade reading and math assessments administered in the spring of 2003.57 For a variety of reasons, the analysis excluded several states; 29 states were included in the comparison for reading, and 32 states for math. According to this analysis, the percentage of pupils statewide who score at a proficient or higher level on state assessments, using state-specific pupil performance standards, was generally much higher than the percentage deemed to be at the proficient or higher level on the NAEP tests, and employing NAEP's pupil performance standards. Of the states considered, the percentage of pupils scoring at a proficient or higher level on the state assessment was lower than on NAEP (implying a more rigorous state standard) for five states58 (out of 32) in math and only two states (out of 29) in reading. Further, among the majority of states where the percentage of pupils at the proficient level or above was found to be higher on state assessments than on NAEP, the relationship between the size of the two groups varied widely—in some cases only marginally higher on the state assessment, and in others the percentage at the proficient level was more than twice as high on the state assessment as on NAEP.

More recently, a report by the National Center for Education Statistics mapped each state's standard for a proficient level of performance in reading and mathematics at the 4th and 8th grade levels for the 2004-2005 school year onto the equivalent NAEP scales.59 The purpose was to compare the level of performance deemed to be proficient under each state's assessment program with the proficient level of performance on the equivalent NAEP test. The report's authors concluded that in comparison to the common standard embodied in NAEP, state standards of proficiency varied widely, and in almost all cases were lower on state tests than under NAEP.60 In fact, the proficient level of performance in many states was found to be lower than the basic level of performance on NAEP.61

A second issue is whether some states might choose to lower their standards of "proficient" performance, in order to reduce the number of schools identified as failing to meet AYP and make it easier to meet the ultimate NCLB goal of all pupils at the proficient or higher level by the end of the 2013-2014 school year. In the affected states, this would increase the percentage of pupils deemed to be achieving at a "proficient" level, and reduce the number of schools failing to meet AYP standards.

It seems likely that the pre-NCLB variations in the proportion of schools failing to meet AYP reflected large differences in the nature and structure of state AYP standards, as well as major differences in the nature and rigor of state pupil performance standards and assessments. While the basic structure of AYP definitions is now substantially more consistent across states, significant variations remain with respect to the factors discussed in this section of the report (such as minimum group size or use of confidence intervals), and substantial differences in the degree of challenge embodied in state standards and assessments persist. Overall, it seems likely that the key influences determining the percentage of a state's schools that fail to make AYP include (in no particular order): (1) the degree of rigor in state content and pupil performance standards; (2) minimum pupil group size (n) in AYP determinations; (3) use of confidence intervals in AYP determinations (and whether at a 95% or 99% level of confidence); (4) extent of diversity in the pupil population; (5) extent of communication about, and understanding of, the 95% test participation rule; and (6) possible actual differences in educational quality.

95% Participation Rule

It appears that in many cases, schools or LEAs have failed to meet AYP solely because of low participation rates in assessments, meaning that fewer than 95% of all pupils, or of pupils in relevant demographic groups meeting the minimum size threshold, took the assessments. While, as discussed above, ED recently published policy guidance that relaxes the participation rate requirement somewhat—allowing use of average rates over two- to three-year periods, and excusing certain pupils for medical reasons—the high rate of assessment participation that is required in order for schools or LEAs to meet AYP standards is likely to remain an ongoing focus of debate.

Although few argue against having any participation rate requirement, it may be questioned whether it needs to be as high as 95%. In recent years, the overall percentage of enrolled pupils who attend public schools each day has been approximately 93.5%, and it is generally agreed that attendance rates are lower in schools serving relatively high proportions of disadvantaged pupils. Even though schools are explicitly allowed to administer assessments on make-up days following the primary date of test administration, and it is probable that more schools and LEAs will meet this requirement as they become more fully aware of its significance, it is likely to continue to be very difficult for many schools and LEAs to meet a 95% test participation requirement.
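The relaxed participation check described above, under which multi-year averaging is permitted, can be sketched in a few lines; the rates below are hypothetical.

```python
def meets_participation(rates, required=0.95):
    # The school passes if its (multi-year average) participation rate
    # is at least the required 95%.
    return sum(rates) / len(rates) >= required

print(meets_participation([0.93]))              # a single bad year alone fails
print(meets_participation([0.93, 0.96, 0.97]))  # three-year average ~95.3%: passes
```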

Issues Regarding Growth Model Alternatives to AYP Models in the NCLB Statute

Why is there increased interest in growth models for determining AYP under NCLB? What might be the major advantages and disadvantages of growth models of AYP, in comparison to status or improvement models? These questions are addressed in the following pages.

Are Growth Models of AYP More Fair and Accurate than Status or Improvement Models?

Many proponents of growth models for school/LEA AYP see them as being more fair—to both pupils and school staff—and accurate than status or improvement models, primarily because they can be designed to take into consideration the currently widely varying levels of achievement of different pupil groups. Growth models generally recognize the reality that different schools and pupils have very different starting points in their achievement levels and recognize progress being made at all levels (e.g., from below basic to basic, or from proficient to advanced), giving credit for all improvements over previous performance.

Growth models would likely increase the ability to attribute pupils' achievement to their current schools, as opposed to their past schools or background characteristics, especially (but not only) if controls for pupil background (and/or predicted growth elements) are included in the model. They more directly measure the effect of schools on the specific pupils they serve over a period of years, attempting to track the movement of pupils between schools and LEAs, rather than applying a single standard to all pupils in each state. They can focus on the specific effectiveness of schools and teachers with pupils whom they have actually taught for multiple years, rather than the change in performance of pupil groups among whom there has usually been a substantial amount of mobility. They can also directly (as well as indirectly) adjust for non-school influences on achievement, comparing the same students across years and reducing errors due to student mobility.

Proponents of growth models often argue that status models of AYP in particular make schools and LEAs accountable for factors over which they have little control, and that status models focus insufficiently on pupil achievement gains, especially if those gains are below the threshold for proficient performance, or gains from a proficient to an advanced level. Status models, such as the current primary model of AYP under NCLB, might even create an undesirable incentive for teachers and schools to focus their attention, at least in the short run, on pupils who are only marginally below a proficient level of achievement, in hopes of bringing them above that sole key threshold, rather than focusing on the most disadvantaged pupils whose achievement is well below the proficient level. The current status model of AYP also confers no credit for achievement increases above the proficient level, that is, bringing pupils from the proficient to the advanced level.

At the same time, growth models of AYP have the significant disadvantage of implicitly setting lower thresholds or expectations for some pupil groups and/or schools. Although any growth model deemed consistent with NCLB would likely need to incorporate that act's ultimate goal of all pupils reaching a proficient or higher level of achievement by 2013-2014 (see below), the majority of such models used currently or in the past do not include such goals, and tend to allow disadvantaged schools and pupils to remain at relatively low levels of achievement for considerable periods of time.

Growth models of AYP may be quite complicated, and may address the accountability purposes of NCLB less directly and clearly than status or (to a lesser extent) improvement models. If the primary purpose of AYP is to determine whether schools and LEAs are succeeding at raising the achievement of their current pupils to challenging levels, with those goals and expectations applied consistently to all pupil groups, then the current provisions of NCLB might more simply and directly meet that purpose than growth model alternatives.

Pupil mobility among schools and LEAs is substantial, and has important implications for all models of AYP. However, its implications are multifaceted, and do not necessarily favor a particular AYP model. Growth models have the advantage of attempting to track pupils through longitudinal data systems. But if they thereby attribute the achievement of highly mobile pupils to a variety of schools and LEAs, accountability is dispersed. At the same time, the presence of highly mobile pupils in the groups considered in determining AYP under status and improvement models may seem unfair to school staff. However, the impact of such pupils in school-level AYP determinations is limited by NCLB's provision that pupils who have attended a particular school for less than one year need not be considered in such determinations.

Do States Have Sufficient Resources to Develop and Implement Growth Models?

It is generally agreed that growth models of AYP are more demanding than status or improvement models in several respects, especially in terms of data requirements and analytical capacity. For a longitudinal data system sufficient to support a growth model, it is likely that states would need to have pupil data systems incorporating at least the following:

1. a unique statewide student identifier;

2. the ability to produce comparable results from grade to grade and from year to year (vertically scaled assessments);

3. student-level enrollment, demographic, and program participation information;

4. information on untested students;

5. student-level graduation and dropout data; and

6. a statewide audit system.62

Although the availability of information on state data systems is insufficient to enable one to determine with precision how many states could or could not currently implement such models if they chose to do so, it is very likely that growth models generally require resources and data systems that some states currently lack.

This concern is being addressed through an ED program intended to help states design, develop, and implement statewide, longitudinal data systems. An initial appropriation of $24.8 million was provided for this program, administered by ED's Institute of Education Sciences (IES),63 for FY2005. Subsequently, $24.6 million was appropriated for each of FY2006 and FY2007, $48.3 million for FY2008, and $65 million for FY2009. In addition, $250 million was appropriated for this program for FY2009 under P.L. 111-5, the American Recovery and Reinvestment Act (ARRA). Further, the establishment of longitudinal data systems for education is a priority for state participation in the State Fiscal Stabilization Fund and the "Race to the Top" discretionary grant competition under the ARRA.64

Thus far, at least 41 states have received awards through three rounds of competition.65 Under this program, aid is provided to state educational agencies (SEAs) via cooperative agreements, not grants, to allow increased federal involvement in the supported activities. According to the announcement in the April 15, 2005, Federal Register, the program is intended "to enable SEAs to design, develop, and implement statewide, longitudinal data systems to efficiently and accurately manage, analyze, disaggregate, and use individual student data.... Applications from states with the most limited ability to collect, analyze, and report individual student achievement data will have a priority.... " According to ED, the program is designed to help SEAs meet the AYP and reporting requirements of NCLB, as well as to conduct value-added or achievement growth research, including "meaningful longitudinal analyses of student academic growth within all subgroups specified by the No Child Left Behind Act of 2001." There will also be an emphasis on encouraging data sharing among states, while at the same time protecting the security and privacy of data.

Are Growth Models Consistent with NCLB's Ultimate Goal?

Most growth models used before initiation of ED's growth model pilot, or still used as part of state-specific accountability systems, have not incorporated an ultimate goal such as the one under NCLB—that all pupils reach a proficient or higher level of achievement by 2013-2014. Non-NCLB growth models have generally incorporated one of two types of growth target, the "how much improvement is enough" aspect of the model: (a) data driven/predicted growth, or (b) policy driven/required growth targets. The first type of growth target has been most common, while NCLB's ultimate goal would represent a growth target of the second variety, with separate paths (with presumably separate starting points) for each relevant pupil cohort. The models approved thus far under ED's growth model pilot arguably meet the ultimate goal requirement. However, under some of these models, pupils need only be proficient or on track toward proficiency within a limited number of years as of 2013-2014.
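As a rough illustration of the "on track toward proficiency" idea, the sketch below projects a pupil's recent annual growth forward and asks whether it reaches an assumed proficiency cut score within an assumed number of years. The cut score, the linear trajectory, and the four-year window are hypothetical assumptions, not features of any approved state model:

```python
# Hypothetical sketch of an "on track toward proficiency" test of the kind
# used under some growth-model pilot approaches. The linear trajectory,
# the four-year window, and the cut score are illustrative assumptions only.

PROFICIENT_CUT = 300   # assumed scale score for "proficient"
YEARS_ALLOWED = 4      # assumed window for reaching proficiency

def on_track(current_score, baseline_score, years_elapsed):
    """Project the pupil's observed annual growth forward and ask whether
    it reaches the cut score within the remaining years allowed."""
    if current_score >= PROFICIENT_CUT:
        return True                      # already proficient
    annual_growth = (current_score - baseline_score) / max(years_elapsed, 1)
    years_left = YEARS_ALLOWED - years_elapsed
    projected = current_score + annual_growth * years_left
    return projected >= PROFICIENT_CUT
```

Under such a rule, a pupil well below the cut score can count toward AYP so long as the projected trajectory closes the gap in time, which is precisely the feature that raises the consistency question discussed above.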

Acknowledgments

[author name scrubbed], former Specialist in Education Policy, was the original author of this report.

Footnotes

1.

These consequences, as well as possible performance-based awards, are not discussed in detail in this report. For information on them, see CRS Report RL33731, Education for the Disadvantaged: Reauthorization Issues for ESEA Title I-A Under the No Child Left Behind Act, by [author name scrubbed].

2.

For additional information on this legislation, see CRS Report 89-7, Education for Disadvantaged Children: Major Themes in the 1988 Reauthorization of Chapter 1, by [author name scrubbed] (out of print, available upon request).

3.

For additional information on the standard and assessment requirements under ESEA title I-A, see CRS Report RL31407, Educational Testing: Implementation of ESEA Title I-A Requirements Under the No Child Left Behind Act, by [author name scrubbed].

4.

There is a variant of the group status model, sometimes called an "index model," under which partial credit would be attributed to performance improvements below the proficient level—e.g., from below basic to basic.

5.

Scores are typically combined for pupils in all assessed grade levels in a school.

6.

One state, Massachusetts, has injected a partial growth element into its safe harbor provision. In that state, a school or LEA that fails to meet the standard AYP requirements still makes AYP if the number of pupils in relevant groups and subjects scoring below the proficient level declines by 10% or more from the previous year or declines sufficiently to put them on track toward proficiency by the end of the 2013-2014 school year.

7.

See, for example, Issues in the Design of Accountability Systems, by Robert L. Linn, CSE Technical Report 650, National Center for Research on Evaluation, Standards, and Student Testing, April 2005.

8.

 For more information on all aspects of the ESEA Title I-A assessment requirements, see CRS Report RL31407, Educational Testing: Implementation of ESEA Title I-A Requirements Under the No Child Left Behind Act, by [author name scrubbed].

9.

All pupils in states where AYP determinations were made for all public schools, or all pupils served by ESEA Title I-A in states where AYP determinations were made only for such schools and pupils.

10.

See http://www.cpre.org/Publications/Publications_Accountability.htm.

11.

U.S. Department of Education, Office of the Undersecretary, Policy and Program Studies Service, Evaluation of Title I Accountability Systems and School Improvement Efforts (TASSIE): First-Year Findings, 2004. Hereafter referred to as the TASSIE First-Year Report.

12.

See the U.S. Department of Education, "Paige Releases Number of Schools in School Improvement in Each State," press release, July 1, 2002 at http://www.ed.gov/news/pressreleases/2002/07/07012002a.html.

13.

Another report published by ED in 2004 (the TASSIE First-Year Report—see footnote 11) stated that 8,078 public schools had been identified as failing to meet AYP standards for two or more consecutive years in the 2001-2002 school year.

14.

Program regulations published in 2002 did not require graduation rates and other additional academic indicators to be disaggregated in determining whether schools or LEAs meet AYP standards. However, regulations published subsequently in October 2008 (discussed later in this report) require graduation rates to be disaggregated in AYP determinations.

15.

If the number of pupils in a specified demographic group is too small to meet the minimum group size requirements for consideration in AYP determinations, then the participation rate requirement does not apply.

16.

It has occasionally been said that the AYP systems approved by ED for a few states before initiation of the growth model pilot announced in November 2005 incorporate "growth" elements. However, such claims appear to be based primarily on the inclusion in the AYP systems of "pupil achievement indexes" that give partial credit for achievement gains below the proficient level, comparing this year's pupil groups with last year's. They do not meet the definition of growth model as used in this report.

17.

 This is determined by ranking all public schools (of the relevant grade level) statewide according to their percentage of pupils at the proficient or higher level of achievement (based on all pupils in each school), and setting the threshold at the point where one-fifth of the schools (weighted by enrollment) have been counted, starting with the schools at the lowest level of achievement.
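The ranking procedure described here can be sketched as follows; the data structure and figures are made up for illustration, and real determinations would use actual school-level proficiency and enrollment data:

```python
# Hypothetical sketch of the starting-point calculation described in this
# footnote: rank schools by percent proficient, then find the achievement
# level of the school at which one-fifth of statewide enrollment has been
# counted, starting from the lowest-achieving schools. Data are made up.

def starting_point(schools):
    """schools: list of (pct_proficient, enrollment) tuples for one grade span.
    Returns the percent-proficient threshold at the point where one-fifth
    of total enrollment has been counted."""
    ranked = sorted(schools)                      # ascending by pct_proficient
    total = sum(enrollment for _, enrollment in ranked)
    cumulative = 0
    for pct, enrollment in ranked:
        cumulative += enrollment
        if cumulative >= total / 5:
            return pct
    return ranked[-1][0]
```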

18.

Under program regulations [34 C.F.R. § 200.16(c)(2)], the starting point may vary by grade span (e.g., elementary, middle) and subject.

19.

As noted earlier, under the accountability policy approved for use in Massachusetts, a school or LEA also meets the safe harbor requirement if the number of pupils in relevant groups and subjects scoring below the proficient level declines sufficiently to put them on track toward proficiency by the end of the 2013-2014 school year.

20.

Under NCLB, state AYP systems must include at least one indicator, other than achievement test scores. For senior high schools, the additional indicator must be the graduation rate. A typical additional indicator for elementary and middle schools is the attendance rate.

21.

The plans, as updated over time, have been posted online by ED at http://www.ed.gov/admins/lead/account/stateplans03/index.html.

22.

ED has approved state accountability plans under which schools or LEAs would be identified as failing to meet AYP only if they failed to meet the required level of performance in the same subject for two or more consecutive years, but has not approved proposals under which a school would be identified only if it failed to meet AYP in the same subject and pupil group for two or more consecutive years.

23.

See http://www.ed.gov/news/pressreleases/2005/11/11182005.html.

24.

See the Federal Register for October 29, 2008 (pages 64435-64513).

25.

See http://www.ed.gov/policy/elsec/guid/growthmodelguidance.pdf.

26.

One other state, Massachusetts, incorporates a partial growth element into its safe harbor provision. In that state, a school or LEA that fails to meet the standard AYP requirements still makes AYP if the number of pupils in relevant groups and subjects scoring below the proficient level declines by 10% or more from the previous year or declines sufficiently to put them on track toward proficiency by the end of the 2013-2014 school year.

27.

Delaware's proposal included use of confidence intervals at an unspecified level in implementing the growth model; however, ED approved use of the model without confidence intervals.

28.

Most states use confidence intervals in their AYP determinations. However, in most cases, the confidence intervals are applied to group average percentages of students scoring proficient or above, not individual student scores.

29.

"Evaluation of the 2005-2006 Growth Model Pilot Program," January 15, 2009, U.S. Department of Education, available at http://www.ed.gov/admins/lead/account/growthmodel/index.html.

30.

See http://www.ed.gov/policy/elsec/guid/secletter/090401.html.

31.

This includes students who graduate following a summer program after their fourth year.

32.

For a more complete discussion and analysis of this topic, see CRS Report R40701, Alternate Assessments for Students with Disabilities, by Erin D. Caffrey.

33.

This limitation does not apply to the administration of alternate assessments based on the same standards applicable to all students, for other pupils with (non-cognitive or less severe cognitive) disabilities.

34.

Under current regulations, the short-term policy cannot be extended beyond the 2008-2009 school year.

35.

This would be calculated on the basis of statewide demographic data, with the resulting percentage applied to each affected school and LEA in the state. In making the AYP determination using the adjusted data, no further use may be made of confidence intervals or other statistical techniques. (The actual, not just the adjusted, percentage of pupils who are proficient must also be reported to parents and the public.)

36.

The 3.0% limit might be exceeded for LEAs, but only if—and to the extent that—the SEA waives the 1.0% cap applicable to scores on alternate assessments based on alternate achievement standards.

37.

In such cases, the former pupils with disabilities would not have to be counted in determining whether the minimum group size was met for the disability subgroup.

38.

See Federal Register, June 24, 2004, pp. 35462-35465; and http://www.ed.gov/nclb/accountability/schools/factsheet-english.html.

39.

See http://www.ed.gov/policy/elsec/guid/stateletters/asaypnc.html.

40.

See http://www.ed.gov/policy/elsec/guid/secletter/050929.html.

41.

For additional information on this topic, see CRS Report RL33236, Education-Related Hurricane Relief: Legislative Action, by [author name scrubbed] et al.

42.

See Center on Education Policy, Rule Changes Could Help More Schools Meet Test Score Targets for the No Child Left Behind Act, October 22, 2004, available at http://www.cep-dc.org/nclb/StateAccountabilityPlanAmendmentsReportOct2004.pdf; Title I Monitor, Changes in Accountability Plans Dilute Standards, Critics Say, November 2004; Council of Chief State School Officers, Revisiting Statewide Educational Accountability Under NCLB, September 2004, available at http://www.ccsso.org; and "Requests Win More Leeway Under NCLB," Education Week, July 13, 2005, p. 1.

43.

See http://www.ed.gov/admins/lead/account/consolidated/index.html.

44.

Data are also available on Title I-A-recipient schools and LEAs that fail to make adequate yearly progress. However, in the aggregate, the results are quite similar. For 2007-2008, 35% of all public schools and 36% of all Title I-A schools were reported as failing to make AYP. For LEAs, 35% of all LEAs and 38% of all Title I-A LEAs failed to make AYP.

45.

See http://www.ed.gov/programs/statestabilization/schooldata.pdf.

46.

This calculation was based on data for all states except Louisiana.

47.

See, for example, "States Revise the Meaning of 'Proficient'," Education Week, October 9, 2002.

48.

See http://www.ed.gov/admins/lead/account/stateplans03/cocsa.pdf, p. 7.

49.

See http://www.ed.gov/admins/lead/account/stateplans03/lacsa.doc, p. 12.

50.

According to Section 1111(b)(2)(H), "Each State shall establish intermediate goals for meeting the requirements, ... of this paragraph and that shall—(i) increase in equal increments over the period covered by the State's timeline.... " The program regulations also would seem to require increases in equal increments: "Each State must establish intermediate goals that increase in equal increments over the period covered by the timeline.... " (34 C.F.R. § 200.17).

51.

Alternatively, the confidence interval "window" may be applied to average test scores for each relevant pupil group, that would be compared to a fixed threshold score level to determine whether AYP has been met.

52.

The text above describes the way in which confidence intervals have been used by states for AYP determinations. The concept could be applied in a different way, requiring scores to be at or above the highest score in the "window" in order to demonstrate that a pupil group had met AYP standards to a statistically significant degree. This would reflect confidence (at the designated level of probability) that a school or LEA had met AYP standards, whereas the current usage reflects confidence that the school or LEA had failed to meet AYP standards.
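Numerically, the current usage amounts to treating a group as failing only when its observed proficiency rate falls below the target by more than the sampling margin. The sketch below illustrates this with a normal-approximation confidence interval at the 95% level; the data are made up, and actual state methods vary:

```python
# Hypothetical sketch of the confidence-interval usage described here: a
# group fails the proficiency target only if its observed rate is below the
# target even after adding the margin of error. Normal approximation; the
# 1.96 multiplier corresponds to a 95% confidence level.

import math

def meets_target_with_ci(n_proficient, n_tested, target, z=1.96):
    """Current usage: treat the group as failing only when we are confident
    (at the given level) that its true rate is below the target."""
    p = n_proficient / n_tested
    margin = z * math.sqrt(p * (1 - p) / n_tested)
    return p + margin >= target
```

Because the margin shrinks as group size grows, a small group gets more benefit of the doubt than a large one, consistent with the interaction between confidence intervals and minimum group size discussed earlier in the report.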

53.

In Texas, the minimum group size for pupil groups (other than the "all pupils" group, where the minimum is 40) is the greater of 50 students or 10% of all students in a school or LEA (up to a maximum of 200). In California, the minimum group size is the greater of 50 students or 15% of all students in the school or LEA (up to a maximum of 100).

54.

Under regulations published on April 9, 2007, this practice is no longer allowed.

55.

Minimum group sizes for AYP purposes are typically in the range of 30 to 40 pupils, while those for reporting are typically in the range of five to 20 pupils.

56.

Thomas J. Kane and Douglas O. Staiger, "Unintended Consequences of Racial Subgroup Rules," in Paul Peterson and Martin West, eds., No Child Left Behind? The Politics and Practice of School Accountability (Washington: Brookings Institution Press, 2003), pp. 152-176.

57.

Center on Education Policy, From the Capital to the Classroom, Year 2 of the No Child Left Behind Act (January 2004), p. 61.

58.

In two additional states, the percentages were essentially the same.

59.

National Center for Education Statistics, U.S. Department of Education, Mapping 2005 State Proficiency Standards Onto the NAEP Scales, Washington, DC, June 2007, http://nces.ed.gov/nationsreportcard/pdf/studies/2007482.pdf.

60.

Of the four subject and grade level comparisons, the state proficiency standard was found to be equivalent to or higher than the NAEP proficiency standard for no states in 4th or 8th grade reading, for two states in 4th grade math, and for three states in 8th grade math.

61.

Of the four subject and grade level comparisons, the state proficiency standard was found to be lower than the NAEP basic standard for 24 states in 4th grade reading, for 12 states in 8th grade reading, for six states in 4th grade math, and for eight states in 8th grade math.

62.

Aimee Guidera, director of the Data Quality Campaign, as quoted in: Commission on No Child Left Behind, Commission Staff Research Report, "Growth Models: An examination within the context of NCLB," August 2006, available at http://www.aspeninstitute.org/atf/cf/{DEB6F227-659B-4EC8-8F84-8DF23CA704F5}/Growth%20Models%20and%20NCLB%20Report.pdf, visited on September 6, 2006.

63.

This program is authorized by Section 208 of the Education Sciences Reform Act of 2002, P.L. 107-279. The authorized funding level is $80 million for FY2003 and "such sums as may be necessary" for each of the succeeding five fiscal years.

64.

See CRS Report R40151, Funding for Education in the American Recovery and Reinvestment Act of 2009 (P.L. 111-5), by [author name scrubbed] et al. for details.

65.

For additional information, see http://nces.ed.gov/Programs/SLDS/.