We live and work in a time of extreme accountability. Many recent events
have made the need for accountability apparent. Those that come immediately to
mind involve the banking industry and major college football. Faced with the
repercussions from events where, in hindsight, it was apparent that accountability
was lost, the general public is primed to hold those in the public eye accountable. As
educators, we come under this same scrutiny.
Music educators are no strangers to accountability, but it is often not the
same for them as it is for their colleagues in other subject areas. Music teachers and
directors are aware of the seemingly endless days of testing that often require
students to miss their music classes and rehearsals. They are aware that the results
of these tests will reflect on the school and district in which they teach and specifically
on the educators who teach these 'tested' subjects or grade levels. Music teachers,
however, often think of accountability differently and usually in terms of music
performance. Whether a third grade program or a national ensemble festival, many
music teachers look to these events as a means of receiving evaluations of their
work. At the secondary level, Russell and Austin (2010) report that even though
music classes are given the same credit for graduation as other subjects,
accountability in grading practices varies considerably. They state that assessment at
this level often includes a "combination of achievement and non-achievement criteria,
with non-achievement criteria receiving greater weight in determining grades" (p.
37). This practice may be a concern in the culture of accountability that surrounds
public schooling.
Excessive testing has, however, adversely impacted music education. The
constant push for increased test scores has narrowed the curriculum and has
narrowed instructional practices to those simply aimed at increasing test scores.
Additionally, testing does not recognize the individual differences among a wide
variety of student populations, providing inconclusive data that are viewed as factual.
In vain attempts to 'improve' our educational system, policy makers use these flawed
data to develop additional systems of accountability. I share Paul Lehman's (2012)
opinion as he expressed it in the most recent Music Educators Journal, where he
states, "we did not really try to improve education; instead, we just opted to test kids
regularly to see if education has somehow, magically improved itself" (p. 29). Yet
additional accountability, particularly in the form of teacher evaluation, is coming, and
for the informed music educator this could be an opportunity to make some very
positive additions to the practice of the profession.
A Brief History of Teacher Supervision and Evaluation
Teacher supervision and evaluation is not a new concept. Early efforts to
evaluate teachers, made even before teaching was considered a profession, were
locally controlled and most often carried out by clergy. From this
beginning, the industrial revolution impacted educational thought as the 'factory
model' was imposed upon schools. Teachers took on the role of workers and
students were viewed as "raw products" (Chubberly, 1916, p. 338). Using
productivity as a goal, teacher assessment became dependent upon 'scientific
measurement' of prescribed teacher behaviors (Marzano, Frontier, & Livingston,
2011). After World War II, the focus began to shift toward viewing teachers as individuals.
Contextual elements were considered as teachers were evaluated using more
collaborative models where teachers and supervisors met and discussed issues that
impacted learning in the classroom. From this sprang the concept of clinical
supervision (CS), which quickly became the pervasive model. Marzano et al. (2011)
cite Bruce and Hoehn (1980), who claim that by 1980 approximately 90 percent of
teachers were evaluated via a CS model. Five phases comprise the normal CS
procedure: (1) a pre-observation conference, (2) the actual observation, (3) analysis
by the observer, (4) a post-observation conference, and (5) supervisory analysis to
ensure reliability of the process. The Hunter (1984) model emerged from the CS
movement. Her seven-stage model (anticipatory set, objective and purpose, input,
modeling, checking for understanding, guided practice, and independent practice)
quickly became the foundation of many observation instruments (Fehr, 2001).
In a reaction to the prescriptive use of CS models, a movement toward
more developmental and reflective approaches began in the 1980s. Designed
around teacher input and control, many of these models featured differentiated levels
of observation and evaluation based upon teacher expertise. The models were not
prescriptive but open-ended, applying a narrative approach to the evaluation
process. Teachers did not like the process, reporting that feedback was vague and there
were no clear expectations because the systems were not built on teacher
competencies (Wise, Darling-Hammond, McLaughlin, & Bernstein, 1984).
The 21st century has seen a shift from teacher supervision to teacher
evaluation. With this shift has come the call to link student achievement with teacher
evaluation. Tucker and Stronge (2005, as cited in Marzano et al., 2011) state their
position forcefully: "Given the clear and undeniable link that exists between
teacher effectiveness and student learning, we support the use of student
achievement information in teacher assessment" (p. 102). Recent examinations of
teacher evaluation systems note that the majority are based almost entirely upon
teacher credentials and most often require only a rating of either 'satisfactory' or
'unsatisfactory' (Toch & Rothman, 2008). Under the current teacher evaluation
models, "The New Teacher Project of twelve districts in four states revealed that
more than 99 percent of teachers in districts using binary ratings were rated
satisfactory whereas 94 percent received one of the top two ratings in districts using
a broader range of ratings" (Glazerman et al., 2011, p. 3). Attempts to address the
use of student achievement data and lack of differentiation in the current systems
appear to be driving current policy decision-making.
Current Foundations of Teacher Evaluation
Emphasis on overhauling teacher evaluation in the United States has
garnered much attention in the educational practice and research communities. As a
place to start, Linda Darling-Hammond (2012) suggests that, "in the process of
measuring teacher effectiveness, it is important to distinguish between teacher quality
and teaching quality" (p. i). Teacher quality addresses what the person brings to
education as an individual. Personal backgrounds and contextual understandings are
included in teacher quality. Teaching quality "refers to strong instruction that enables
a wide range of students to learn" (p. i). In this view, teaching quality is a sub-set of
teacher quality and not only includes the knowledge, skills, and dispositions teachers
bring to a learning environment, but also how these are put to use in consideration of
all the factors that can affect student learning. Darling-Hammond contends that,
"policymakers must address the teaching and learning environment as well as the
capacity of individual teachers" (p. i) if any attempt at teacher evaluation is to be
effective. To this end, she suggests five points of a high-quality teacher evaluation
system. They include:
- Common statewide standards
- Performance assessments, based on statewide standards, guiding state function
- Local evaluation systems aligned to the same standards
- Support structures (e.g., evaluator training, teacher mentoring)
- Aligned professional learning opportunities. (p. ii)
Additionally, the US Department of Education lists six design principles for effective
teacher evaluation. These include:
- All teachers should be evaluated annually.
- Evaluations should be based on clear standards of instructional excellence that prioritize student learning.
- Evaluations should consider multiple measures, with emphasis on a teacher's impact on student academic growth.
- Evaluations should employ four to five rating levels.
- Evaluations should encourage frequent observations and constructive critical feedback.
- Evaluation outcomes must matter; evaluation data should be a major factor in key employment decisions.
As state departments of education apply for their flexibility waivers from the
requirements of the Elementary and Secondary Education Act (ESEA) (commonly
referred to as NCLB), these are the standards that their proposals must meet to be
accepted. To date, 32 states have received waivers and five more are waiting on the
dispositions of their applications. It appears that teacher evaluation will undergo
major reform in most states within the next two years.
Standards and Measurement
Developing, adopting, or reforming common state standards is where many
initiatives have begun. Unclear in many of these initiatives is what these standards
address. To help make these clearer, it may be best to consider these standards in
two groups. The first group contains student-learning standards. These include
collections such as Common Core standards, the current national standards for music
education, and any number of state standards that address student-learning
outcomes. This is a critical time for music learning standards. Music education, along
with all the arts, has enjoyed its position as 'core curriculum' as defined by the ESEA.
However, with the ability to move away from the demands of this legislation via the
flexibility waivers it is possible that this position within the curriculum may be
reevaluated by individual states. Additionally, the National Coalition for Core Arts
Standards is in the process of rewriting the national standards for all the arts. Both
issues bear watching.
The second group of standards addresses teaching. These include
frameworks developed by the National Board for Professional Teaching Standards
(NBPTS), the Interstate New Teacher Assessment and Support Consortium (InTASC), the
Marzano Evaluation Model, the Danielson Framework for Teaching, and various other
state and district designs. These frameworks address the teacher and teaching
qualities previously discussed. Through multiple measures, users of these
frameworks seek to evaluate the presence (or lack thereof) of research-based
behaviors shared by effective teachers.
Standards measurement requires the collection of two different sets of data
and two different methodological approaches to analyze these data. Teaching
standards will be measured qualitatively via the frameworks described above. While
similar to current practice in teacher evaluation, many of these frameworks are much
more systematic and include measures of teacher reflection and peer assessment.
In addition, most include differentiated rating scales of up to
five levels (e.g., ineffective, partially effective, effective, highly effective). Student
learning standards are to be measured quantitatively using students' standardized
test scores or 'other academic measures.' Test score gains will be used to calculate
what is called a value-added component, which "typically incorporates a variety of
statistical controls for differences among teachers in the circumstances in which they
teach. Such a measure is called teacher value-added because it estimates the value
that individual teachers add to the academic growth of their students" (Glazerman
et al., 2011). Exactly what data will be used is an obvious concern for all subject
areas in which mandatory assessment of individual student achievement is not
required.
The calculation of final evaluation scores varies from state to state. Typically, the
qualitative evaluation makes up 50% of the final score and the quantitative evaluation
makes up the other 50%. The distribution of percentage weights given to
various measures within each of these broader areas varies considerably. Teachers
are advised to investigate the evaluation plan in their states, districts or schools and
make note of what measures are used in calculating each of the broader scores.
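As a rough illustration of the typical even split described above, the following Python sketch combines a qualitative score and a quantitative score into a single final score. The 0-100 scale, the function name, and the sample values are assumptions made for illustration only; actual state plans define their own scales, weights, and rounding rules.

```python
def final_evaluation_score(qualitative, quantitative, qualitative_weight=0.5):
    """Combine two component scores into one final score.

    Assumes both components are already on the same scale (0-100 here).
    The default weight of 0.5 mirrors the typical 50/50 split; a state
    or district plan may specify a different weight.
    """
    if not 0.0 <= qualitative_weight <= 1.0:
        raise ValueError("qualitative_weight must be between 0 and 1")
    quantitative_weight = 1.0 - qualitative_weight
    return qualitative * qualitative_weight + quantitative * quantitative_weight


# Hypothetical teacher: strong observation ratings, weaker growth score.
example_score = final_evaluation_score(qualitative=88.0, quantitative=72.0)
print(example_score)
```

Changing `qualitative_weight` shows how sensitive a final score is to the weighting a plan adopts, which is one reason to check the weights in a local plan.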
Value-Added Measures
While there are two portions to most proposed teacher evaluation models, it
appears that music educators are most concerned with how the value-added
component of their evaluation will be calculated. This is a valid concern and the
remainder of this article will address this component of the evaluations. Perhaps the
best way to address these concerns is to provide examples of what some states are
proposing at this time. While every attempt was made to access the most current
data possible, the reader is cautioned to confirm them before treating these examples
as conclusive evidence of practice. Legislation and state rules are changing at such a
rapid rate it is impossible for any static publication to be entirely up to date.
Colorado. A number of features within the Colorado system are worthy of note.
First, within the plan there is recognition that teachers and students do not interact in
a vacuum and that some student achievement is attributable to all educators who
come in contact with a student. Writers of this plan suggest, "Schools are highly
encouraged to include measures of student growth for students that are attributable
to multiple teachers" (www.cde.state.co.us/EducatorEffectiveness/Partner-SCEE.asp).
The model also includes a useful graphic designed to help schools and districts
decide what measures may be best suited for given circumstances. This may be of
particular help to those who have input on the measures that are being proposed to
address the value-added component for 'non-tested' subjects like music (See Figure
1).
Figure 1 - Colorado model for measures in non-tested subjects
Delaware. A three-part measure of student growth is proposed for use in Delaware.
Part 1 comprises 30% of the total student growth (quantitative) score. Each teacher
in the school will receive the same score based on either school-wide reading scores
or school-wide math scores, whichever is higher. Part 2, another 20%, is based upon a
student cohort assessment measure. For educators in 'non-tested' areas, each
teacher will identify a cohort of students, with administrative approval, who are
regularly "touched" by the teacher. Mean scores from this group on the approved
measures will constitute this portion of the score. Part 3, the remaining 50%, will
consist of teacher-specific assessment measures that will be developed by the state
department of education and directly tied to each teacher's teaching assignment.
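The Delaware weights described above (30%, 20%, and 50%) can be sketched as a simple weighted sum. This is only an illustration of the arithmetic: the function and parameter names, the common 0-100 scale, and the sample numbers are assumptions, and the actual state calculation may differ in scaling and detail.

```python
from statistics import mean


def delaware_growth_score(school_reading, school_math, cohort_scores, teacher_specific):
    """Illustrative weighted sum of the three proposed Delaware components.

    All inputs are assumed to be on the same 0-100 scale.
    """
    # 30%: the higher of the school-wide reading and math scores.
    school_wide = max(school_reading, school_math)
    # 20%: mean score of the teacher's approved student cohort.
    cohort_mean = mean(cohort_scores)
    # 50%: score on the teacher-specific assessment measures.
    return 0.30 * school_wide + 0.20 * cohort_mean + 0.50 * teacher_specific


# Hypothetical music teacher in a school whose math score exceeds its reading score.
growth = delaware_growth_score(
    school_reading=70,
    school_math=80,
    cohort_scores=[60, 70, 80],
    teacher_specific=90,
)
print(round(growth, 1))
```

In this sketch, half of the growth score still depends on measures outside the teacher-specific assessment, which is the point of concern raised for teachers of non-tested subjects.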
Georgia. During the initial conference between the evaluator and teacher, student
achievement goals will be set. These may be individual or common to a group of
teachers. All data collected (multiple measures are recommended) are to be related
to these goals alone. Regardless of teaching assignment in either state-tested or
non-tested curricula, all teachers are to be scored on the same continuum of
improvement. This continuum is a clearly defined rubric (See Figure 2) that is made
available to all teachers prior to the beginning of the evaluation process.
Figure 2 - Georgia Continuum of Improvement
Figure 2: Source - Georgia State Department of Education
Other Approaches
It appears that other states are still considering a number of different
approaches to address the value-added calculation on each teacher's evaluation.
Information from North Carolina states that student achievement data will only be
used when the evaluator and the teacher disagree on the final rating. In Ohio, the
state department of education (SDE) states that 50% of the final evaluation will
include a value-added progress dimension and that for non-tested areas the board
will administer an assessment "on the list." However, the list has yet to be published.
In Tennessee, the value-added component will account for 35% of the student growth
component. For teachers in non-tested subjects, the school-wide value-added score
in literacy, math, or both will be used. The SDE is developing tests for non-tested
areas. In Oklahoma, the SDE states that it is conducting more research to
determine how to assess the value-added component in non-tested subject areas.
Because Oklahoma is a 'Tier 1' state, its educators pilot implementation this year. The
Arkansas Senate Weekly Sessions Update, dated June 28, 2012, states, "Several types of
standardized tests are possible for evaluating a teacher. Any tests used must be
'external,' that is, graded by an impartial third party such as a national testing firm,"
bringing into question what external measures can be used to assess student
achievement in music.
Final Thoughts
The need to reform teacher evaluation has been well established and is
likely to move forward regardless of how the political winds blow. There are,
however, valid concerns about how some parts will be conducted and what data will
be used to calculate scores, particularly for music teachers. As we have seen here,
for some music teachers the "value-added" components of their evaluations may
have very little to do with the musical knowledge, skills and dispositions they are
teaching in their classroom. In most cases, this does not appear to be an attempt to
discredit music education, but without valid and reliable measures of individual
student growth in music, states are left with few options. So what needs to be done?
All music educators need to be part of the conversation. Contact school, district, and
state leaders to offer assistance in addressing this issue. Support KMEA along with all
other state MEAs as they work to gain a voice at the table as decisions are being
made. All music teachers are busy and have more than enough to manage in their
own classroom(s). However, this is a critical time when music teachers' voices need
to be heard (in more than just song). Meaningful teacher evaluation needs to be a
collaborative effort that is done with the teacher and not a process that is done to the
teacher. We need to meet our obligations to be equal partners in the process.
References
Bruce, R., & Hoehn, L. (1980). Supervisory practice in Georgia and Ohio. Paper
presented at the Annual Meeting of the Council of Professors of Instructional
Supervision, Hollywood, FL.
Chubberly, E. (1916). Public school administration: A statement of the fundamental
principles underlying the organization of administration of public education. Boston:
Houghton Mifflin Co.
Darling-Hammond, L. (2012). Creating a comprehensive system for evaluating and
supporting effective teaching. Stanford, CA: Stanford Center for Opportunity Policy
in Education.
Fehr, S. (2001). The role of educational supervision in the United States public
schools from 1970 to 2000 as reflected in the supervision literature. Unpublished
doctoral dissertation, Pennsylvania State University, State College.
Glazerman, S., Goldhaber, D., Loeb, S., Raudenbush, S., Staiger, D., & Whitehurst,
G. (2011). Passing muster: Evaluating teacher evaluation systems. Washington,
DC: The Brookings Institution.
Hunter, M. (1984). Knowing, teaching, and supervising. In P. Hosford (Ed.), Using
what we know about teaching (pp. 169-192). Alexandria, VA: Association for
Supervision and Curriculum Development.
Lehman, P. (2012). Reforming education: The big picture. Music Educators Journal
98(4), 29-30.
Marzano, R., Frontier, T., & Livingston, D. (2011). Effective supervision: Supporting
the art and science of teaching. Alexandria, VA: Association for Supervision and
Curriculum Development.
Russell, J., & Austin, J. (2010). Assessment practices of secondary music teachers.
Journal of Research in Music Education 58(1), 37-54.
Toch, T., & Rothman, R. (2008). Rush to judgment: Teacher evaluation in public
education. Washington, DC: Education Sector.
Tucker, P. D., & Stronge, J. H. (2005). Linking teacher evaluation and student
learning. Alexandria, VA: Association for Supervision and Curriculum Development.
Wise, E., Darling-Hammond, L., McLaughlin, M., & Bernstein, H. (1984). Teacher
evaluation: A study of effective practices. Santa Monica, CA: RAND.