KMR - Kansas Music Review

We live and work in a time of extreme accountability. Many recent events have made the need for accountability apparent. Examples that come immediately to mind involve the banking industry and major college football. Faced with the repercussions from events where, in hindsight, it was apparent that accountability was lost, the general public is primed to hold those in the public eye accountable. As educators, we come under this same scrutiny.

Music educators are no strangers to accountability, but it is often not the same for them as it is for their colleagues in other subject areas. Music teachers and directors are aware of the seemingly endless days of testing that often require students to miss their music classes and rehearsals. They are aware that the results of these tests will reflect on the school and district in which they teach and specifically on the educators who teach these 'tested' subjects or grade levels. Music teachers, however, often think of accountability differently and usually in terms of music performance. Whether a third grade program or a national ensemble festival, many music teachers look to these events as a means of receiving evaluations of their work. At the secondary level, Russell and Austin (2010) report that even though music classes are given the same credit for graduation as other subjects, accountability in grading practices varies considerably. They state that assessment at this level often includes a "combination of achievement and non-achievement criteria, with non-achievement criteria receiving greater weight in determining grades" (p. 37). This practice may be a concern in the culture of accountability that surrounds public schooling.

Excessive testing has, however, adversely impacted music education. The constant push for increased test scores has narrowed the curriculum and has narrowed instructional practices to those simply aimed at increasing test scores. Additionally, testing does not recognize the individual differences among a wide variety of student populations, providing inconclusive data that are viewed as factual. In vain attempts to 'improve' our educational system, policy makers use these flawed data to develop additional systems of accountability. I share Paul Lehman's (2012) opinion as he expressed it in the most recent Music Educators Journal, where he states, "we did not really try to improve education; instead, we just opted to test kids regularly to see if education has somehow, magically improved itself" (p. 29). Yet additional accountability, particularly in the form of teacher evaluation, is coming, and for the informed music educator this could be an opportunity to make some very positive additions to the practice of the profession.

A Brief History of Teacher Supervision and Evaluation

The idea of teacher supervision and evaluation is not a new concept. Early endeavors to evaluate teachers, even when teaching was not considered a profession, were attempted with local control, most often by clergy. From this beginning, the industrial revolution impacted educational thought as the 'factory model' was imposed upon schools. Teachers took on the role of workers and students were viewed as "raw products" (Chubberly, 1916, p. 338). Using productivity as a goal, teacher assessment became dependent upon 'scientific measurement' of prescribed teacher behaviors (Marzano, Frontier, & Livingston, 2011). After World War II the focus began to change, and teachers came to be viewed as individuals. Contextual elements were considered as teachers were evaluated using more collaborative models where teachers and supervisors met and discussed issues that impacted learning in the classroom. From this sprang the concept of clinical supervision (CS), and it quickly became the pervasive model. Marzano et al. (2011) cite Bruce and Hoehn (1980), who claim that by 1980 approximately 90 percent of teachers were evaluated via a CS model. Five phases comprise the normal CS procedure: (1) a pre-observation conference, (2) the actual observation, (3) analysis by the observer, (4) a post-observation conference, and (5) supervisory analysis to ensure reliability of the process. The Hunter (1984) model emerged from the CS movement. Her seven-stage model (anticipatory set, objective and purpose, input, modeling, checking for understanding, guided practice, and independent practice) quickly became the foundation of many observation instruments (Fehr, 2001).

In a reaction to the prescriptive use of CS models, a movement toward more developmental and reflective approaches began in the 1980s. Designed around teacher input and control, many of these models featured differentiated levels of observation and evaluation based upon teacher expertise. The models were not prescriptive, but were open-ended, applying a narrative approach to the evaluation process. Teachers did not like the process, citing that feedback was vague and there were no clear expectations because the systems were not built on teacher competencies (Wise, Darling-Hammond, McLaughlin, & Bernstein, 1984).

The 21st century has seen a shift from teacher supervision to teacher evaluation. With this shift has come the call to link student achievement with teacher evaluation. Tucker and Stronge (2005, as cited in Marzano et al., 2011) forcefully support their position stating, "Given the clear and undeniable link that exists between teacher effectiveness and student learning, we support the use of student achievement information in teacher assessment" (p. 102). Recent examinations of teacher evaluation systems note that the majority are based almost entirely upon teacher credentials and most often require only a marking of either 'satisfactory' or 'unsatisfactory' (Toch & Rothman, 2008). Using the current teacher evaluation models, "The New Teacher Project of twelve districts in four states revealed that more than 99 percent of teachers in districts using binary ratings were rated satisfactory whereas 94 percent received one of the top two ratings in districts using a broader range of ratings" (Glazerman et al., 2011, p. 3). Attempts to address the use of student achievement data and lack of differentiation in the current systems appear to be driving current policy decision-making.

Current Foundations of Teacher Evaluation

Emphasis on overhauling teacher evaluation in the United States has garnered much attention in the educational practice and research communities. As a place to start, Linda Darling-Hammond (2012) suggests that, "in the process of measuring teacher effectiveness, it is important to distinguish between teacher quality and teaching quality" (p. i). Teacher quality addresses what the person brings to education as an individual. Personal backgrounds and contextual understandings are included in teacher quality. Teaching quality "refers to strong instruction that enables a wide range of students to learn" (p. i). In this view, teaching quality is a sub-set of teacher quality and not only includes the knowledge, skills, and dispositions teachers bring to a learning environment, but also how these are put to use in consideration of all the factors that can affect student learning. Darling-Hammond contends that, "policymakers must address the teaching and learning environment as well as the capacity of individual teachers" (p. i) if any attempt at teacher evaluation is to be effective. To this end, she suggests five points of a high-quality teacher evaluation system. They include:

Common statewide standards
Performance assessments, based on statewide standards, guiding state function
Local evaluation systems aligned to the same standards
Support structures (e.g., evaluator training, teacher mentoring)
Aligned professional learning opportunities. (p. ii)

Additionally, the US Department of Education lists six design principles for effective teacher evaluation. These include:

All teachers should be evaluated annually.
Evaluations should be based on clear standards of instructional excellence that prioritize student learning.
Evaluations should consider multiple measures, with emphasis on a teacher's impact on student academic growth.
Evaluations should employ four to five rating levels.
Evaluations should encourage frequent observations and constructive critical feedback.
Evaluation outcomes must matter; evaluation data should be a major factor in key employment decisions.

As state departments of education apply for their flexibility waivers from the requirements of the Elementary and Secondary Education Act (ESEA) (commonly referred to as NCLB), these are the standards that their proposals must meet to be accepted. To date, 32 states have received waivers and five more are waiting on the dispositions of their applications. It appears that teacher evaluation will undergo major reform in most states within the next two years.

Standards and Measurement

Developing, adopting, or reforming common state standards is where many initiatives have begun. Unclear in many of these initiatives is what these standards address. To help make these clearer, it may be best to consider these standards in two groups. The first group contains student-learning standards. These include collections such as common core standards, the current national standards for music education, and any number of state standards that address student-learning outcomes. This is a critical time for music learning standards. Music education, along with all the arts, has enjoyed its position as 'core curriculum' as defined by the ESEA. However, with the ability to move away from the demands of this legislation via the flexibility waivers, it is possible that this position within the curriculum may be reevaluated by individual states. Additionally, the National Coalition for Core Arts Standards is in the process of rewriting the national standards for all the arts. These are both issues to keep an eye on.

The second group of standards addresses teaching. These include frameworks developed by the National Board for Professional Teaching Standards (NBPTS), Interstate New Teacher Assessment and Support Consortium (InTASC), The Marzano Evaluation Model, The Danielson Framework for Teaching, and various other state and district designs. These frameworks address the teacher and teaching qualities previously discussed. Through multiple measures, users of these frameworks seek to evaluate the presence (or lack thereof) of research-based behaviors shared by effective teachers.

Standards measurement requires the collection of two different sets of data and two different methodological approaches to analyze these data. Teaching standards will be measured qualitatively via the frameworks described above. While similar to current practice in teacher evaluation, many of these frameworks are much more systematic than current frameworks and include measures of teacher reflection and peer assessment. In addition, most include differentiated rating scales of up to five levels (e.g., ineffective, partially effective, effective, highly effective). Student learning standards are to be measured quantitatively using students' standardized test scores or 'other academic measures.' Test score gains will be used to calculate what is called a value-added component, which "typically incorporates a variety of statistical controls for differences among teachers in the circumstances in which they teach. Such a measure is called teacher value-added because it estimates the value that individual teachers add to the academic growth of their students" (Glazerman et al., 2011). Exactly what data will be used is obviously an area of concern for all subject areas where mandatory assessment of individual student achievement is not required.

The calculation of final evaluation scores varies from state to state. Typically 50% of the final score comes from the qualitative evaluation, with the other 50% coming from the quantitative evaluation. The distribution of percentage weights given to various measures within each of these broader areas varies considerably. Teachers are advised to investigate the evaluation plan in their states, districts, or schools and make note of what measures are used in calculating each of the broader scores.
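
The typical 50/50 split described above amounts to a weighted average of the two component scores. The following is only a minimal sketch of that arithmetic. It assumes both components are reported on the same scale (0 to 100 is used here for illustration), and the function name and parameters are hypothetical rather than drawn from any state's plan.

```python
def final_evaluation_score(qualitative, quantitative, qualitative_weight=0.5):
    """Blend two component scores into one final score.

    Assumption: both components are on a shared scale (for example 0-100).
    The default weight of 0.5 mirrors the typical 50/50 split; a state that
    weights the components differently would pass another value.
    """
    if not 0.0 <= qualitative_weight <= 1.0:
        raise ValueError("qualitative_weight must be between 0 and 1")
    # Whatever weight is not given to the qualitative part goes to the
    # quantitative (student growth) part.
    quantitative_weight = 1.0 - qualitative_weight
    return qualitative_weight * qualitative + quantitative_weight * quantitative


if __name__ == "__main__":
    # Hypothetical teacher: 80 on the observation framework, 60 on growth.
    print(final_evaluation_score(80, 60))
```

With component scores of 80 and 60 and an even split, the sketch returns 70. Changing the weight shows how strongly a single component can move the final score.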

Value-Added Measures

While there are two portions to most proposed teacher evaluation models, it appears that music educators are most concerned with how the value-added component of their evaluation will be calculated. This is a valid concern and the remainder of this article will address this component of the evaluations. Perhaps the best way to address these concerns is to provide examples of what some states are proposing at this time. While every attempt was made to access the most current data possible, the reader is cautioned to confirm these before using these examples as conclusive evidence of practice. Legislation and state rules are changing at such a rapid rate that it is impossible for any static publication to be entirely up to date.

Colorado. A number of features within the Colorado system are worthy of note. First, within the plan there is recognition that teachers and students do not interact in a vacuum and that some student achievement is attributable to all educators who come in contact with a student. Writers of this plan suggest, "Schools are highly encouraged to include measures of student growth for students that are attributable to multiple teachers" (www.cde.state.co.us/EducatorEffectiveness/Partner-SCEE.asp). The model also includes a useful graphic designed to help schools and districts decide what measures may be best suited for given circumstances. This may be of particular help to those who have input on the measures that are being proposed to address the value-added component for 'non-tested' subjects like music (See Figure 1).

Figure 1 - Colorado model for measures in non-tested subjects

Delaware. A 3-part measure of student growth is proposed for use in Delaware. Part 1 comprises 30% of the total student growth (quantitative) score. Each teacher in the school will receive the same score based on either school-wide reading scores or school-wide math scores, whichever is higher. Another 20% is based upon a student cohort assessment measure. For educators in 'non-tested' areas, each teacher will identify a cohort of students, with administrative approval, who are regularly "touched" by the teacher. Mean scores from this group on the approved measures will constitute this portion of the score. The remaining 50% will be teacher specific assessment measures that will be developed by the state department of education and directly tied to each teacher's teaching assignment.
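
The three-part Delaware proposal can be read as a weighted sum. The sketch below is an illustration only, not the state's actual formula. It assumes every measure is expressed on a common scale (0 to 100 here), and the function and parameter names are hypothetical.

```python
from statistics import mean


def delaware_growth_score(schoolwide_reading, schoolwide_math,
                          cohort_scores, teacher_specific):
    """Combine the three parts of the student growth score as described.

    Assumption: all inputs share one scale (for example 0-100); the
    proposal summarized here does not specify how measures are scaled.
    """
    # Part 1 (30%): the higher of the two school-wide scores,
    # shared by every teacher in the school.
    school_part = max(schoolwide_reading, schoolwide_math)
    # Part 2 (20%): mean score of the approved student cohort.
    cohort_part = mean(cohort_scores)
    # Part 3 (50%): the teacher-specific assessment measure.
    return 0.30 * school_part + 0.20 * cohort_part + 0.50 * teacher_specific


if __name__ == "__main__":
    # Hypothetical values chosen only to show the arithmetic.
    print(delaware_growth_score(70, 80, [60, 70, 80], 90))
```

In this hypothetical case the school-wide part uses the math score of 80, the cohort mean is 70, and the result is approximately 83. Note that in this reading only half of the growth score depends on measures tied to the teacher's own assignment.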

Georgia. During the initial conference between the evaluator and teacher, student achievement goals will be set. These may be individual or common to a group of teachers. All data collected (multiple measures are recommended) are to be related to these goals alone. Regardless of teaching assignment in either state tested or non-tested curricula, all teachers are to be scored on the same continuum of improvement. This continuum is a clearly defined rubric (See Figure 2) that is made available to all teachers prior to the beginning of the evaluation process.

Figure 2 - Georgia Continuum of Improvement

Figure 2: Source - Georgia State Department of Education

Other Approaches

It appears that other states are still considering a number of different approaches to address the value-added calculation on each teacher's evaluation. Information from North Carolina states that student achievement data will only be used when the evaluator and the teacher disagree on the final rating. In Ohio, the state department of education (SDE) states that 50% of the final evaluation will include a value-added progress dimension and that for non-tested areas the board will administer an assessment "on the list." However, the list has yet to be published. In Tennessee, the value-added component will account for 35% of the student growth component. For teachers in non-tested subjects, the school-wide value-added score in literacy, math, or both will be used. The SDE is developing tests for non-tested areas. In Oklahoma, the SDE states that they are conducting more research to determine how to assess the value-added component in non-tested subject areas. Because Oklahoma is a 'Tier 1' state, its educators pilot implementation this year. The Arkansas Senate Weekly Sessions Update, dated June 28, 2012, states, "Several types of standardized tests are possible for evaluating a teacher. Any tests used must be 'external,' that is, graded by an impartial third party such as a national testing firm," bringing into question what external measures can be used to assess student achievement in music.

Final Thoughts

The need to reform teacher evaluation has been well established and is likely to move forward regardless of how the political winds blow. There are, however, valid concerns about how some parts will be conducted and what data will be used to calculate scores, particularly for music teachers. As we have seen here, for some music teachers the "value-added" components of their evaluations may have very little to do with the musical knowledge, skills and dispositions they are teaching in their classroom. In most cases, this does not appear to be an attempt to discredit music education, but without valid and reliable measures of individual student growth in music, states are left with few options. So what needs to be done? All music educators need to be part of the conversation. Contact school, district, and state leaders to offer assistance in addressing this issue. Support KMEA along with all other state MEAs as they work to gain a voice at the table as decisions are being made. All music teachers are busy and have more than enough to manage in their own classroom(s). However, this is a critical time when music teachers' voices need to be heard (in more than just song). Meaningful teacher evaluation needs to be a collaborative effort that is done with the teacher and not a process that is done to the teacher. We need to meet our obligations to be equal partners in the process.

References

Bruce, R., & Hoehn, L. (1980). Supervisory practice in Georgia and Ohio. Paper presented at the Annual Meeting of the Council of Professors of Instructional Supervision, Hollywood, FL.

Chubberly, E. (1916). Public school administration: A statement of the fundamental principles underlying the organization of administration of public education. Boston: Houghton Mifflin Co.

Darling-Hammond, L. (2012). Creating a comprehensive system for evaluating and supporting effective teaching. Stanford, CA: Stanford Center for Opportunity Policy in Education.

Fehr, S. (2001). The role of educational supervision in the United States public schools from 1970 to 2000 as reflected in the supervision literature. Unpublished doctoral dissertation, Pennsylvania State University, State College.

Glazerman, S., Goldhaber, D., Loeb, S., Raudenbush, S., Staiger, D., & Whitehurst, G. (2011). Passing muster: Evaluating teacher evaluation systems. Washington, DC: Brookings Institution.

Hunter, M. (1984). Knowing, teaching, and supervising. In P. Hosford (Ed.), Using what we know about teaching (pp. 169-192). Alexandria, VA: Association for Supervision and Curriculum Development.

Lehman, P. (2012). Reforming education: The big picture. Music Educators Journal, 98(4), 29-30.

Marzano, R., Frontier, T., & Livingston, D. (2011). Effective supervision: Supporting the art and science of teaching. Alexandria, VA: Association for Supervision and Curriculum Development.

Russell, J., & Austin, J. (2010). Assessment practices of secondary music teachers. Journal of Research in Music Education, 58(1), 37-54.

Toch, T., & Rothman, R. (2008). Rush to judgment: Teacher evaluation in public education. Washington, DC: Education Sector.

Tucker, P. D., & Stronge, J. H. (2005). Linking teacher evaluation and student learning. Alexandria, VA: Association for Supervision and Curriculum Development.

Wise, E., Darling-Hammond, L., McLaughlin, M., & Bernstein, H. (1984). Teacher evaluation: A study of effective practices. Santa Monica, CA: RAND.