Rethinking Literacy (and All) Assessment
To whatever degree I have been an effective teacher over a 33-year (and counting) career directly and indirectly connected to teaching literacy, that effectiveness has been grounded in my inclination to assess my practices constantly against my instructional goals.
Teaching is some combination of curriculum (content, the what of teaching), instruction (pedagogy, the how of teaching), and assessment (testing, the monitoring of learning). When I was in teacher education as a candidate, the world of teaching was laser-focused on instruction — our learning objectives scrutinized and driving everything.
Over the three decades of accountability grounded in standards and high-stakes testing, however, and the rise of backward design, both how students are tested (test formats) and what tests address have become the primary focus of K-12 teaching.
Accountability’s state and national impact has increased the importance of standardized testing: not only the number of tests students are required to take but also the format of in-class assessments teachers use to prepare students for those tests.
High-stakes and large-scale testing is governed in many ways by efficiency, favoring formats such as multiple choice that can be marked by computer; many K-12 teachers therefore model their assessment content and formats on what students will face in these high-stakes environments.
Over my career, then, I have watched teaching to the test move from a practice shunned by best practice to the default norm of K-12 education.
As a committed practitioner of de-grading and de-testing the classroom, I offer below some big-picture concepts that I believe every teacher should consider in order to improve the quality of grading and testing practices. The question throughout is if and how our assessments match our instructional goals, not how efficient our tests are or how well our classroom assessments prepare students for (really awful) large-scale high-stakes tests.
The principles and practices below are imperative for literacy instruction and learning, but apply equally well to all learning goals and content.
Holistic v. skills (standardized tests). Let’s imagine for a moment that you wish to learn to play the piano, and you are given lessons on scales, proper fingering, etc., using worksheets. After a unit on playing the piano, you are given a multiple-choice test on that material, scoring an A.
Having never played the piano or practiced at the piano, what do you think of that A?
To be proficient in the context of efficient skills-based tests is not the same as being proficient in holistic behaviors. While the testing industry has sold us on the idea that efficient skills-based tests (usually multiple choice) correlate strongly with the authentic goals for learning we seek, we should be far more skeptical of that claim.
Along with the problem of efficiency in standardized tests and selected-response tests in class-based assessment are the historical and current purposes of large-scale testing, for example, IQ tests and college entrance exams such as the SAT and ACT.
IQ testing has its roots in identifying low academic ability (identifying people who were expendable) and has never overcome problems with race, class, and gender bias.
College entrance exams began as a process for distinguishing among top students; therefore, test items that create spread are “good,” regardless of how well the question achieves our instructional goals.
For classroom teachers who seek assessments that support better teaching and learning, then, we should be seeking to assess in holistic ways first, and then to expose students to the formats and expectations of high-stakes testing.
One goal for rethinking assessment is to emphasize allowing and requiring students to practice whole behaviors (composing original texts, reading full texts by choice, etc.) and then to assess students’ levels of proficiency by asking them to repeat whole behaviors in testing situations.
Accomplishment v. deficit perspective. I am certain we have all experienced and many of us have practiced this standard approach to grading a student’s test: Marking with an “X” the missed items and then totaling the grade somewhere on the sheet, such as 100–35 = 65.
Let’s consider for a moment the assumptions and implications (as well as negative consequences) of this process.
First, this implies that students begin tests with 100 points — for doing nothing. Further, that creates an environment in which students are trying not to lose something they did not earn to begin with.
Now, a much more honest and healthy process for all assessments is that students begin with zero, nothing, and then the teacher evaluates the test for what the student accomplishes instead of looking for and marking errors (a practice Connie Weaver names, and rejects, as the “error hunt”).
By avoiding a deficit perspective (starting with 100 and marking errors) and embracing an accomplishment perspective (starting with zero and giving credit for achievement), we are highlighting what our students know and helping them to overcome risk aversion fostered by traditional (behavioral) practices in school.
Moving toward an accomplishment perspective is particularly vital for literacy development since taking risks is essential for growth. It is particularly powerful when giving feedback on and grading student writing (I learned this method during Advanced Placement training on scoring written responses to the exam).
Collaboration v. isolation. “[T]he knowledge we use resides in the community,” explains Gareth Cook, examining Steven Sloman and Philip Fernbach’s The Knowledge Illusion: Why We Never Think Alone, adding, “We participate in a community of knowledge. Thinking isn’t done by individuals; it is done by communities.”
However, traditional approaches to assessment are nearly always done in isolation; collaboration in testing situations is deemed cheating, in fact.
Consider for a moment your own lives as readers and writers. What do we love to do when reading a new novel? Talk with a trusted friend about the book, right? Community and collaboration fuel a better understanding of the work.
When writing, feedback is essential, another eye on our ideas, an uninvested editor to catch our mistakes.
While many of us have embraced community and collaboration in our instruction — implementing workshops or elements of workshops — we rarely allow collaboration in assessment.
See this post for an example of collaborative assessment in my introductory education course.
Feedback v. grades. One of the most frustrating aspects of practicing a de-graded classroom is that my students often identify on their opinion surveys of my courses that I do not provide adequate feedback — because they conflate grades (which I do not give throughout the semester) with actual feedback on their assignments (which I do offer, abundantly and quickly).
Most teachers, I believe, spend far too much time grading, and students then receive too little feedback of the kind that requires them to interact with and learn from that help.
One element of my concern is that when teachers provide extensive feedback on graded work, most students check the grade and do not engage at all with the feedback; this wastes the teacher’s time and does not contribute to student learning.
Ideally, we should be providing ample and manageable feedback on work that requires students to address that feedback, either in some response or through revision (see below).
For literacy instruction, fore-fronting feedback, requiring and allowing revision, and then delaying grades all support a much more effective process than traditional grading.
Revision v. summative assessment. That process above embraces revision over summative grading.
Whole literacy experiences, low-stakes environments that encourage risk, high-proficiency modeling and mentoring, and then opportunities to try again, to revise — these are the tenets of powerful and effective literacy instruction and assessment.
When students experience reading and writing as one-shot events mainly produced to be graded, they are cheated out of the awareness that literacy is cyclical and recursive: to read and then to read again, to write and then to write again.
For Paulo Freire, literacy is agency, empowerment; we must read the world and re-read the world, write and re-write the world.
At the very least, we should decrease summative assessments and grading while increasing how often we require and allow revision.
Many argue that reducing grading also removes necessary accountability for student engagement, and while I find these arguments less compelling, I do replace my use of grades with minimum requirements for credit in any class or course. And I use those minimum requirements to emphasize the aspects of learning experiences I believe are most important.
Therefore, drafting of essays and revision are required, just as conferencing is.
Ultimately, our assessment and grading policies and practices send very strong messages about what matters in our classes; we must be diligent that we are sending the messages we truly embrace.
Recalibrating grade scales (with a caveat) and no more averaging grades. Debates and policies about what numerical grades constitute each letter grade — such as whether a 90, a 93, or a 94 is the lower end of the A-range — are little more, to me, than rearranging chairs on the deck of the Titanic.
Instituting uniform grade scales in schools, districts, or entire states is unlikely to produce the results proponents claim; however, some policy moves concerning grades are both warranted and highly controversial — such as creating a floor score (such as a 50 or 62) for an F.
Low numerical summative grades and the flawed practice of averaging grades have very negative consequences for students — the worst of which is creating a statistical death penalty for students early in a course that may encourage those students to stop trying.
Creating a floor grade on F’s is instructionally and statistically sound, then, but only if combined with the minimum requirement concept discussed above. In other words, converting a zero to 50 or 62 when a student does poorly on an assignment is not the same thing as converting a zero to 50 or 62 when a student submits no work at all.
The latter must not be allowed since students can game the system by doing no work until late in the grading period and depending on averages to produce a passing grade for the course.
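The arithmetic behind the floor score can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `recorded_score` and `blanket_floor`, the floor of 50, and all of the sample scores are hypothetical, chosen to show the two cases described above rather than to represent any actual grading policy.

```python
from statistics import mean

FLOOR = 50  # hypothetical floor; the text mentions 50 or 62 as possibilities


def recorded_score(raw_score, submitted, floor=FLOOR):
    """Floor tied to a minimum requirement: only submitted work is raised."""
    if not submitted:
        return None  # no work submitted: requirement unmet, no score recorded
    return max(raw_score, floor)


def blanket_floor(raw_score, floor=FLOOR):
    """Floor applied to everything, including work that was never submitted."""
    return max(raw_score, floor)


# Hypothetical student who attempts every assignment but fails the first one.
attempted = [0, 80, 80, 80, 80]
raw_average = mean(attempted)
floored_average = mean(recorded_score(s, submitted=True) for s in attempted)

# Hypothetical student who submits nothing three times, then does well.
skipped = [0, 0, 0, 90, 90]
blanket_average = mean(blanket_floor(s) for s in skipped)

print(raw_average)      # 64
print(floored_average)  # 74
print(blanket_average)  # 66
```

In this sketch the single early zero pulls four scores of 80 down to an average of 64, while the floor brings that average to 74. The blanket floor, however, hands the student who skipped three assignments an average of 66, which is the gaming problem the minimum requirement is meant to prevent.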
Therein lies the failure of averaging grades.
Averages skew the weight of grades earned while learning instead of honoring the assessment or assessments after students have had ample time to learn, practice, and create a showcase artifact of learning.
As well, averages are not as representative of reality as modes, for example. Consider the following grades earned by a student: 10, 10, 85, 85, 85, 85, 85, 85, 100, 100.
The average for these grades is 73, but the mode is 85, and if these grades are earned in this order (the 10s early and the 100s last) on cumulative assessments, the 100 is also a potentially fair grade.
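For readers who want to check the numbers, here is a minimal Python sketch using the standard library's `statistics` module and the ten grades listed above. The variable names are my own, and treating the last grade as the "final" one assumes the grades are listed in the order they were earned.

```python
from statistics import mean, mode

# The ten grades from the example, in the order they were earned.
grades = [10, 10, 85, 85, 85, 85, 85, 85, 100, 100]

average = mean(grades)      # arithmetic mean: 730 / 10
most_common = mode(grades)  # the grade that occurs most often
latest = grades[-1]         # the most recent (cumulative) assessment

print(average)      # 73
print(most_common)  # 85
print(latest)       # 100
```

The same student could reasonably be reported at 73, 85, or 100 depending on which summary we choose, which is the point: the choice of statistic is a judgment about what the grade should represent.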
Grades and grade scales, then, are incredibly flawed in their traditional uses. If de-grading is not an option, I recommend combining a revised, equitable numerical/letter grade structure (with minimum requirements of participation included) with either modes or portfolio assessment in place of averaging.
The concepts above about rethinking assessment are effective ways to interrogate current assessment practices, and they are urgent for improving literacy instruction.
I do urge seeking ways to de-grade and de-test the classroom regardless of what is being taught, but in the real world, I recognize that goal may seem impossible.
The ways I offer above to rethink assessment, I believe, are quite practical and certainly are justifiable once we consider if and how our assessment practices do or don’t reflect our teaching and learning goals.
And thus: “A critical pedagogy asks us to reconsider grading entirely,” argues Sean Morris, “and if we can’t abandon it whole-hog, then we must revise how and why we grade.”