Dilemmas in Education: a matter of balance

The rose and the thorn, and sorrow and gladness are linked together.
Saadi

According to the theory of quantum physics, the universe is a chaotic, unpredictable place. Yet patterns of organization and order are seen throughout nature, from butterfly wings to crystals of ice. Vermont's state motto is "Freedom and Unity", while neighboring New Hampshire advertises "Live Free or Die" on its license plates. According to Lewis (1986), reality is pluralistic, shifting, and often appears to be in conflict. However, these opposing forces are no more contradictory than the pull of gravity and the push of centrifugal force. A measure of both is needed to keep the earth in orbit.

Dilemmas in education
Since schools reflect our society, they too are riddled with ambiguity and apparent contradiction on both the macro and micro levels. Some examples of contemporary conflicts include:

    (1) Equity funding versus local funding. This issue is reported upon almost daily in the Vermont media and centers on the dilemma of funding for public education. Local property taxes have been the base of financial support, and local community control the tradition. Vast differences in the tax bases of communities, and the inequitable burden of a property tax on the "land rich" and income poor, have made equity of funding a critical issue and the subject of a recent American Civil Liberties Union (A.C.L.U.) lawsuit against the state (Miller, 1995). The teachers' union fears that negotiation with a state-wide body will mean greater state control and greater homogeneity of the schools. How can we extol diversity and homogenize education? How can we expect wealthier communities to pay for education in other locales? Taxation without representation takes on a new meaning and has already been identified as the precipitator of at least one war whose shot was heard 'round the world.

    (2) Integration versus segregation by gender in science and math classes is another conflict pressing on our public school system. The lack of female representation in the math and physical science professions has led some schools to adopt all-female math and physics courses. A number of research studies demonstrate that girls get better grades in these classes and are more likely to take additional math and science courses (Gross, 1993; Grossman & Grossman, 1994). How does this affect what we believe about equity and inclusion? Can separate ever be equal? Can segregation by gender be considered an acceptable solution, or should it be considered an affirmative action to level the playing field?

    (3) Constructivism and lecture methods are antithetical models of education. Research shows that children learn by construction of information more than by transfer of knowledge (Abbott, 1995), yet lecture models of education are efficient for teaching large numbers of students. There are still lecture halls seating 300 students in many universities, but lab classes, which enable students to construct information by developing hypotheses and performing experiments, are rarely larger than 20. How does a school work with limited resources and still provide the kind of education for each child that we know works best? When I taught in New York City, it was rare that I had a class smaller than 35 students. How do schools provide authentic experiences for students with large class sizes, a bevy of state mandates, and few resources?

    (4) Computer laboratories or computers in every class? I frequently receive calls from principals telling me of a bond they managed to get passed enabling them to buy 15 to 20 computers for their school. They invariably ask whether they should put one machine in every teacher's room or place them all in one computer lab. What is the best way to dole out the limited number of computers? Some teachers use them more than others. Should those teachers get more in their classrooms? Is it fair to a child who happens to be in a class whose teacher is computer illiterate? How does this affect equity of access to a valuable educational tool?

    (5) Standardized or alternative assessment? How should we assess learning? Vermont has been very progressive about initiating changes in assessment, which has precipitated many questions and conflicts. As the university supervisor of student interns, I have had the opportunity to discuss portfolio assessment with a number of cooperating teachers. Although most schools in the state have embraced this new assessment, standardized tests that do not measure the same skills are also still given. In math, for example, do teachers continue to respond to the National Council of Teachers of Mathematics (N.C.T.M.) standards, which call for the use of manipulatives and problem-solving math that is performance based and part of portfolio assessment, or do they focus on computational skill and drill practice to bring up those standardized test scores? Which kind of assessment is better to use? Do they tell us the same things? Testing and assessment are increasingly the levers of choice for educational reform in the United States today (Darling-Hammond, 1994). They drive curriculum and instruction and are a focal point of the Clinton administration's Goals 2000 legislation. However, standardized and performance assessments are embedded in diametrically different paradigms of sociology, philosophy, and organization. For this reason I have selected this dilemma to discuss in detail.

Scope of the problem

standardized tests
Standardized assessments have met the criteria of being reliable in their consistency of subject performance as well as valid in reflecting the knowledge and skills they are intended to measure. We know how to design these tests, and we have a great deal of experience with them and their results. They are easy to arrange and oversee and are understood both by teachers and by the community at large. They are inexpensive to grade and administer. They are scientific and objective; we can run the data through a variety of statistical calculations to analyze, sort, compare, and rank it.
On the other hand, do they really test what is important? They don't measure creativity, the ability to communicate, or the ability to come up with new solutions. The fragmented questions assess breadth of information at lower cognitive levels. According to Wiggins (1993), their questions are not cued to knowledge because mastery is not achieved by the application of algorithms, but by the ability to perform knowledge assessed as we construct something in response to the particular tasks and situation at hand. Linn, Baker & Dunbar (1991) suggest that validity and reliability are not sufficient indicators of the success of an assessment. They add consequences, fairness, transfer and generalizability, cognitive complexity, content quality, content coverage, meaningfulness, and cost/efficiency as more important criteria to consider.
Additionally, many argue that what a human knows cannot be measured objectively because the evaluator and the instrument of assessment can't be separated. Human action and experience are complex and situation dependent and can only be understood within their contexts. Studies of behavior in home and laboratory environments show that parent-child interactions differ between these environments. Human action can only be understood within its own context of socially grounded rules for defining, categorizing and interpreting the meaning of our conduct (Bogdan & Biklen, 1992; Lincoln, 1985; Mishler, 1979; Weaver, 1985).

alternative assessments
Alternative assessments include performance, authentic and portfolio formats and conform to the same category of testing developments and only differ in connotation and denotation (Coutinho & Malouf, 1993). Performance assessment, according to the 1992 report on testing from the Office of Technology Assessment, is "testing that requires a student to create an answer or a product that demonstrates his or her knowledge or skills." Wiggins (1993) offers a similar definition: that of executing a task or process and bringing it to completion as a construct of performance. This may include open-ended questions, exhibits, group projects, interviews, oral presentations, demonstrations, hands-on experiments, computer simulations and portfolios (Herman, 1992; Rudner & Boston, 1994). Some consider authentic assessment to be a subcategory of performance assessment, while others view performance assessment as a subcategory of authentic assessment, with portfolios being one kind of evidence of the performance (Coutinho & Malouf, 1993). For the sake of focus, I shall talk about the similarities of these assessments under the umbrella of alternative assessments.
Alternative assessments allow for diversity and creativity. They serve multiple purposes and are magnets of instruction, providing teachers with better instructional tools with an emphasis on teaching relevant skills. They promote instruction that is deep, incorporating alternative solutions and cooperation (Abbott, 1995; Darling-Hammond, 1994). They describe how someone performs in context, solving real problems. Another benefit of alternative assessment is its potential to involve teachers directly in the assessment process. Fill-in-the-grid tests are developed by others and usually graded externally. Using performance and portfolio testing involves a great deal of teacher training in understanding rubrics and curricular outcomes and thus provides another avenue for school reform (O'Neil, 1992).
However, the problems with alternative assessment are also plentiful. In a literature search of 89 articles on performance assessment from the previous 10 years, Herman and Winters (1994) found only seven articles reporting technical data or employing accepted research methods. Most articles only explained the rationale for portfolio use, models, and stories of implementation. Reliability is difficult to establish, grading is more subjective, and the assessments are much more time consuming to devise, administer and evaluate. This translates into money, with estimated costs ranging from two to three times those of a standardized test (O'Neil, 1992).
Knowing is cued to context, but fidelity to criterion situations maximizes the complexity and ambiguity of the task requirements and maximizes the freedom to respond in a variety of ways. The very elements that define alternative testing work against standardization and reliability, our traditional ways of assessing the instruments themselves (Wiggins, 1993; Herman, 1992). A RAND Corporation report analyzing the Vermont Portfolio Assessment Program after the first year of implementation found weak rater reliability scores, suggesting that the results were meaningless. The response from the state was not to give up but to learn how to do it better (Rothman, 1992).

Analysis of the issues
Assessment selection can be viewed as a philosophy and a methodology. Assessment drives education. It determines what is taught and how it is taught, and it is an important lever of school reform. Accountability pressures teachers and administrators to focus planning and instructional effort on test content and to devote more and more time to preparing students to do well on the tests (Herman, 1992). Standardized tests focus on only a part of the curriculum, mainly knowledge questions, neglecting the higher-order thinking skills of application, analysis, synthesis and prediction. Research on traditional standardized tests has shown that they have a negative impact on program quality (Herman, 1992) because teachers focus on teaching responses and strategies for answering lower-level questions.

Sociological differences
Through the lens of a functionalist, schools can be considered an institution for transmitting the culture and trying to prepare all students for success in society based on their abilities (Goslin, 1965). Standardized tests are a meritocratic way of rewarding hard-working and deserving students and furnish a fair, objective way to accomplish this. The prize in this open contest is a good education and the chance of an elite status (Turner, 1971). Structuralists perceive schools as institutions reproducing society, offering one kind of education to the dominant class and another to the working class. Dominant groups have discreet means of preserving their privileged access to desirable positions and use education to transmit these social inequalities without violating democratic ideology (Swartz, 1990). The skills to be assessed on these tests are those the hegemony possesses, values and covets. The skills that other groups possess are not valued. Gordon (1992) reported that little attention is given to the cultural capital that Afro-American children bring with them to a learning situation. Gardner (1983) has identified at least seven distinctive areas of human intellectual competencies, and only two of these, linguistic and logical, are the basis of assessment for purposes of tracking, ranking, and sorting. "We know how to use test data to rank, rather than improve, schools, and to sort, rather than to educate, children" (Wolf, LeMahieu, & Eresh, 1992, p. 9).
Standardized tests themselves can be riddled with biases that favor particular groups. One example of this is the Scholastic Aptitude Test (SAT) and the documented gender bias found in the mathematics portion of this measure. Elements such as the timing of tests, references to familiar contexts, test atmosphere and styles of questions can favor some groups' test performances. Analysis of question content reveals a large proportion of questions in mathematics related to sports events and other male interests. Such biases are often not considered in rating a test's validity and reliability. The SAT, a filter used by most colleges for entrance and scholarship opportunities, has been found to seriously underpredict females' ability to perform in college math courses. A comparison of the SAT math scores of college women and men who had earned the same grades in the same college course found women's scores thirty-five points lower than those of their male classmates (American Association of University Women, 1992).
There is a great deal of research demonstrating that students conform to the level of expectations teachers set for them. These self-fulfilling prophecies have been referred to as the Pygmalion Effect. When students were randomly assigned IQ scores, teachers treated those with higher scores as brighter, and as a result their grades went up. Bourdieu used the term "habitus" to refer to the permanent unconscious ideas about one's chances of success that strongly relate to successful academic performance, a kind of cultural capital that determines what is taught and how it is assessed (Swartz, 1990). One blatant example of cultural capital that was evident to me appeared on the New York City Reading Test. The CLOZE format allowed children to select from a list of words to complete a paragraph with blanks. The first reading on the seventh grade exam required the word "skiff" to be inserted into a story on boats, not a word or experience my class of poor inner-city students knew.
In schools where standardized scores are lowest there is an even greater focus of time and effort on relentless preparation for these exams. Since these schools tend to be the ones serving at-risk and disadvantaged populations, these students spend a great deal of their time simply learning how to take this kind of test. Apple (1983) refers to this practice as "deskilling", a way of controlling what students learn by reducing the curriculum to a set of predetermined, predefined bits of knowledge at lower thinking levels. When I taught in New York City, we started preparing students right after Christmas for the standardized reading test taken in March. This was done in all subject areas, including science, which is what I taught. Students were expected to practice CLOZE exams in each of their subject areas on a daily basis until the exam. Worksheets and workbooks with practice tests were generously supplied for this purpose. All schools would then be "ranked" by results whose publication in the New York Times each year had a determining effect on administrative contract renewals. Since in a normed test half of the population is defined as "below normal", schools would be thrown into a frenzy of wasted energy and resources competing with each other. This serves as an example of the punitive focus this kind of assessment produces.

Philosophical differences
Much of the discourse surrounding standardized versus alternative assessments is based on perspectives of truth, reality, and how we ascertain what they are and if they even exist. If there is one known reality, then we may be able to define criteria, control variables and determine causes and effects. If there is no one definable reality, what do we compare it to? These arguments parallel those of traditional research and the emerging paradigm of qualitative research. Where we stand determines what we see, and therefore what we even choose to assess is a matter of perspective (Lincoln & Guba, 1986). At different times in the past, reality was defined in different ways (Burke, 1985). During the first half of this century the universe was depicted as structured, organized and predictable. With the onset of quantum mechanics and chaos theory, a different picture has been painted of an uncontrollable, random universe. More contemporary theories surrounding connectivity create new perceptions of the universe as a living organism with all events impacting upon each other. If there is no tangible reality, how do we assess against it? As we keep shifting to greater complexity, how can we identify each iota of knowledge to measure? This view challenges our faith in scientific inquiry and traditional constructs, even though upon close examination scientific discovery has left a trail of old beliefs replaced by new ones. James Burke writes, "Science ... is not what it appears to be. It is not objective and impartial, since every observation it makes of nature is impregnated with theory" (Burke, 1985, p. 336).
According to Mishler (1979), the problem with positivism is the belief that context-stripping to arrive at objective data is applicable to the study of social sciences. To qualitative researchers, just as to advocates of alternative assessment, events can be understood only if they are seen in their natural settings. Both tester and testee are affected by their interaction and their environment. The experience is seen as a whole instead of as fragmented variables (Ely, Anzul, Friedman, Garner & Steinmetz, 1991). Humans are too complex to analyze into specific pieces. They are multidimensional and unique and cannot be compared to "norms". These are much the same criteria used to defend alternative assessments.

Organizational differences
Concepts of standardization are characteristic of organizations with a mechanistic infrastructure, a structure that grew out of an industrialized model of society (Morgan, 1986). They are departmentalized, hierarchical and authoritarian organizations which translate into standardized assessment as fragmentation, ranking and control. There is little or no room for creativity because conformity of response is the goal. The whole is the sum of its parts and everything needs to be broken down into its smallest task. This bureaucratic approach was reflected in standardized tests which measured how fast someone could respond to small, departmentalized bits of information. They were designed to do this by selecting those people who could perform these tasks best. As we have learned from the business community, such inflexible static structures have been crumbling in response to the earthquakes of change.
In response, organizations are needed that, in Morgan's words, "must be designed as learning systems that place primary emphasis on being open to inquiry and self-criticism" (Morgan, 1986, p. 105). Morgan describes this kind of organization as "brains". They are richly connected, redundant, and engaged in a range of functions, and like authentic learning and testing systems they are complex, integrated, coherent and reflective. Success on a standardized test tells little about the kind of person needed to function in a "brain" organization.

"The times, they are a changin!"
Changes in society are also affecting how we view the world and what knowledge we value as important. Standardized tests reflect an industrialized model of education that makes round pegs fit into round holes, or sands down square pegs until they are round. Changes in society have created the need for new skills and processes. The need is for people who can learn on the job, in collaboration with other people, and use technical resources to help them (Abbott, 1995). Knowing what a "skiff" is doesn't mean you know how to solve problems, synthesize information and predict, which are the higher-order skills that our information age market requires.

This need for a different kind of workforce has precipitated school reform and this new look at assessment. The form of the change itself is not easily definable and pricks at the stability that is important to people. It rocks and shatters the stable state (Schon, 1971). We look for reasons to avoid change and tend to ignore data that would upset our current thinking. Deciding whether to adopt alternative assessment is a decision of significant change because it forces us to define what constitutes high quality work (Abruscato, 1993). This impacts upon many of our basic belief systems and methodologies, and therefore constitutes a considerable barrier to overcome.

If the process of learning becomes the goal, then how do we create assessments to measure progress? How do we test learning to learn? This is a real difference in how we do business. It's trying to hit a moving target on a moving horse and learning how to "aim, fire, aim," to constantly inform us about how well we fired.

A light in the paradox
Portfolio assessment is one kind of alternative assessment piloted and practiced in Vermont. In mathematics and language arts, children in grades 4 and 8 are asked to solve problems, explain their answers, seek alternative solutions, and place samples of these skills in a portfolio (Vermont Department of Education, 1994). This policy was implemented after discussions with teachers who wanted more information about student performance and less about percentile ranks. They said they wanted to "...capture the moments when students are working at their best" (Abruscato, 1993, p. 475). Service giver and client are the primary actors in any policy implementation (Elmore, 1980), so for new policies to succeed they have to serve the needs of both. The voices of both teacher and student are important to listen to in guiding educational reform. Even with a strong commitment to portfolio assessment, traditional methods of multiple choice testing are also used (Vermont Department of Education, 1994). Locally defined change that is creepingly slow is often the best because it is more stable. Implementation should be seen as bargaining and transforming and focusing on individuals, not institutions (McLaughlin, 1987). However, along with these changes all individuals need to be educated and kept informed. Professional development is crucial, and not just for teachers. School board members, parents, students and other educators need to experience performance assessment to understand why we cannot continue to use old measurements for criteria that are no longer valued.
Good policy is one which fosters coherence and meaningful relationships between all actors in the system and where all actors are involved in the implementation (O'Hare, 1986). Better connections between educators and the public are the only avenue for nurturing support for educational reform and the freedom of discovering new ways of learning. Focus groups, discussion via interactive television, family math nights and other community events should be opportunities to build trust and reflect upon the feedback on innovations. Homework assignments should include parents and others in solving authentic problems and evaluating success.

The world is no longer linear. We can't set plans in stone because the rules keep changing. Geller and Johnson (1990) describe it as "...like walking through a maze whose walls rearrange themselves." An implementation becomes a moving target (Pressman & Wildavsky, 1983). We have to know how to size up our situation based on previous constructs in order to devise improvised solutions. Double loop learning teaches the individual or organization how to reflect, assess and redesign. It's an evolution of continuous change through learning, with assessment a key component (Pressman & Wildavsky, 1983). All discoveries pertaining to the best use of knowledge, how to adapt it, and how to manage one's own learning must be shared by all stakeholders, not just the teachers.
We may not be ready to totally abandon standardized tests in all communities. The SAT is still the single most important test determining college admission. When colleges start demanding portfolios, as the state of Vermont does for new teacher certification, high schools will as well. It may still be valuable to know how well a child reads, but should it matter whether a child knows what a skiff is?
People perform better when their goal is clear, and they have a standard with which to compare their own performance. Students should be involved in the process of creating their own rubrics, assessing their needs, and determining their progression. The best way to learn is to teach, and the best way to learn to assess is to learn to assess oneself.

Examining the issue of what constitutes high quality work through different perspectives seems to create paradoxical interpretations. But learning to see through other lenses adds more dimensions to an issue and gives us a much better chance of hitting our target. Paradoxes in education are not isolated but are all interconnected, like spokes of a wheel pulling and pushing. Democracy is a play of discussing potential disasters and contradictions: of freedom contradicted by law, loyalty by dissent, private conscience by public representation (Hampden-Turner, 1981). Conflicts both divide and unite people (Schattschneider, 1960). These forces are not contradictions but balances. They give us coherence, engage us and energize us. They prevent us from moving too quickly and upsetting the cart, but they require patience, communication and coordination to forge ahead.

References