We are pleased to announce the confirmed invited speakers for the 2006 Language Testing and Evaluation Forum:
This presentation will provide an introduction to the concept of washback and some of the research that has been conducted to date. First, we will look at possible relationships between teaching and testing, and consider how washback can create a situation of competing goals. We will define washback and other key terms before examining the characteristics of positive and negative washback. We will then discuss three components of washback (the participants, the processes, and the products). Traditional washback theory will be updated in light of recent research. The talk will conclude with suggestions for promoting positive washback.
Too often, teachers see testing as divorced from daily classroom activities designed to foster student learning. It is viewed as a necessary—or at least customary—evil whose principal purpose is to give students grades or to determine if they pass or fail a course. While many teachers take a broader view of assessment, not as something distinct from the teaching/learning process but as an intrinsic component of it, they may have had little explicit guidance on how to design effective assessment procedures for their students, especially procedures for assessing productive language skills. Often their teacher-training programs either gave only a broad-brush view of testing or focused principally on large-scale, high-stakes assessment. This paper discusses the design of writing assessment tools in pedagogical contexts.
For teachers as for professional language testers, the process of designing effective assessment procedures should begin with careful consideration of what they want to know about students’ abilities and what use they will make of that information. This informs the design of a writing task and an associated feedback mechanism (such as a rating scale or a checksheet) to accomplish that pedagogical goal. In this presentation, I will discuss design principles for both the task and the feedback system of assessment tools, taking into account specific challenges related to assessing productive language skills. These principles will be illustrated with reference to both formative assessments, such as self-assessments and peer reviews, and to summative assessments, such as course final exams and placement tests.
Use of discourse features in spoken language depends on the context in which the language is used and varies according to whether the situation is transactional or interactional in nature (Richards, 1989). Conversational context minimally includes the role and status of the participants, the topic or situation, and the purpose of the interaction (Hymes, 1974). Within a given context, an NNS’s control over discourse features of oral language makes a significant difference in his or her intelligibility and communicative success. In practical terms, information about an NNS’s level of intelligibility in spoken English in specific contexts is becoming more important as the world becomes more globalized. In addition to English for academic purposes, fields that depend on the results of oral assessment of NNSs for hiring or professional advancement include medicine (NNS doctors and nurses), education (International Teaching Assistants and NNS lecturers), business (out-sourcing telemarketing services), travel and tourism (all levels of personnel), etc. Developing oral assessment tasks and scoring rubrics that are practical (considering testing/scoring time and expense), reliable, and valid is a major goal for test developers today. Context-based testing, in which the scope of spoken production is limited to well defined contexts, may result in better and more practical tests of oral production (Douglas, 1997).
This paper will present a case for developing more contextually focused assessment tools for testing discourse features of spoken production by outlining the predominant discourse features in different contexts, reviewing several tests of oral production for evidence of assessment of discourse features and their contexts, and reporting on a study in progress that looks at student attitudes and shifting performance outcomes during oral production testing in different contexts.
Large-scale language testing requires the collection of the maximum amount of evidence in the least possible time, in order to make important decisions about individuals. The types of information and the methods of collection need to be maximally efficient in order to keep the costs down for test takers, while ensuring that the quality is high enough for fair decisions to be made, as these may affect the lives of the individuals concerned. The processes and tools used to achieve efficiency in large-scale testing have been evolving for over a century.
As valuable as these processes and tools are, their value to classroom assessment is less clear. Writers on language testing and assessment continue to assume that classroom teachers should follow practices from large-scale testing in their daily work. However, these practices may not be relevant to the needs of learners, or useful to the pedagogic objectives of teachers.
This presentation will look at some of the assumptions, processes and practices of large-scale language testing and contrast these with the work of the teacher in the classroom. The frequent disjunction between the two contexts will be examined, and alternative solutions suggested.
The purpose of this paper is to review current reading research and its implications for language assessment. The first part of the paper will outline key features of reading ability. Following comprehension theories developed by Kintsch (1998) and Perfetti (2003), reading is described as a restricted interactive process.
The second part of the paper will focus on the importance of specific components of reading comprehension. These component abilities include word recognition fluency, vocabulary knowledge, syntactic knowledge, main idea comprehension, reading fluency, discourse structure knowledge, strategic awareness, extensive reading exposure, motivation, and background knowledge. Key research will be reviewed to show how these component skills and knowledge bases support the development of reading. These component abilities will also be used to examine differences between L1 and L2 reading.
In the final part of the paper, the research on component skills will be a resource for a discussion of L2 reading assessment. At issue is the way that the construct of reading informs assessment practices. After a review of effective assessment practices, the paper will conclude with issues for assessment and possible future directions.
This paper reports on a small-scale exploratory study to evaluate the effectiveness of explicit instruction of reading strategies to adult Greek learners of English participating in an examination preparation course for the University of Michigan Examination for the Certificate of Competency in English (ECCE) at the Hellenic American Union, Athens. A further aim of this research is to explore the development of a model of reading strategy instruction that could be integrated into pre- and in-service teacher education programs applicable to a wider context.
A recent review of research in language learning strategies highlights the need for research in specific language contexts in order to realize its potential for enhancing second language acquisition and instruction (Chamot, 2005). The context of this particular study is unique, taking into consideration the extent to which international language examinations are used in Greece in comparison to other European countries and countries worldwide. Insights into effective reading strategies for specific test tasks in the ECCE may have a wider impact, enabling students to read efficiently not only in examination conditions but also in the real world. As Alderson (2000) points out, assessment tasks may have potential for promoting learning.
Increasingly, language tests in Europe are being linked to the Common European Framework of Reference (CEFR) developed by the Council of Europe. The framework distinguishes levels of language competence in terms of descriptors or can-do statements. In the area of assessment, the idea is that this will help learners, teachers, testers, employers and publishers of educational materials, among others, to have an idea how a candidate is actually able to perform when he or she has passed a particular examination.
The training for language certificates would ideally focus on the relevant skills and competences required for passing the examination, although in practice the training may be much more focussed on the required performance on specific tasks of a specific format in a specific examination. Through the links with the CEFR, however, students and teachers may be shown what one can do, in generally understood terms of language behaviour, when one has passed an examination.
Although the CEFR is based on descriptions of language behaviour, until recently stakeholders sometimes had difficulty in understanding what exactly was meant by a particular descriptor or what the difference was between a descriptor at one level and one at a higher level. Descriptors have sometimes been phrased in linguistic jargon not accessible to learners. Also, descriptors may differ in the detail necessary to understand how well a language user has to perform to be at a particular level.
There have been various initiatives to try and illustrate the descriptors in the CEFR. The “Dutch Grid” project has produced a framework to specify texts and tasks for reading and listening. A European item bank of reading and listening items is now being developed for items that have been classified on the basis of the Dutch Grid.
In my presentation I should like to focus on international activities to benchmark speaking and writing performances based on the descriptors in the CEFR. First there is the issue of the level and content of the task: we need to be sure that the task taps the skill expressed in the descriptor at a given level. Then we need to gather student performances that will illustrate the descriptor. We need to benchmark these performances: experts need to agree that a particular performance is at the intended level and that other performances on the same task are below or possibly above this level.
With benchmarked performances of speaking and writing we can help teachers, students and other stakeholders to understand how they will need to perform not just in the context of the examination but in everyday life.
Three decades of teaching and testing in Greece have taught me (at least) that testing is the engine that drives the whole ELT process, at least in the private sector, and perhaps to a lesser degree and in a different form in the state sector. Opportunities for learning are framed and circumscribed by the overt and covert target of passing examinations and collecting paper qualifications. Apart from any putative benefits of extrinsic motivation, the Great Greek Paper Chase has, on the whole, a negative backwash effect on classroom practice. Methodologies, whether communicative, task-based or learner-based to any degree, must pass through the filter of negative backwash and reach the student, if at all, in a truncated, impoverished form.
This session details the difference between a ‘testing’ mode of teaching EFL in Greece and a ‘teaching’ mode. Teachers, it will be demonstrated, engage in both overt and covert testing practices; I focus on covert testing, which can be defined as the often unconscious adoption of ‘testing’ procedures in activities which would normally be classified as ‘teaching/learning’ activities. ‘Covert testing’ is a realisation of negative backwash in action in the classroom, and identifying it is the first step towards transforming it into positive backwash.
The session welcomes trends which may potentially reverse this state of affairs (e.g. the European Portfolio and more culture-sensitive local examinations) and suggests techniques which can transform testing procedures, task-types and specifications into opportunities for learning and at the same time provide an opportunity for furthering teacher development and enhancing ELT methodology in the Greek ELT scene.
The concept of washback is now widely accepted as the influence of tests on teaching and learning. What is less clear is the nature of test impact more generally. This paper proposes a model for investigating the impact of language assessment within national educational contexts. It sees the investigation of impact not as a discrete or one-off activity, but as an essential component in establishing the overall validity of an assessment system in terms of its fitness for specific purposes and contexts of use. The proposed model locates the study of test impact as one of a set of research and development tools within an iterative approach to validation.
A number of impact studies carried out by the speaker over the past 10 years have informed the proposed model and these will be briefly summarized. The first is a world-wide survey of the impact of an international English language testing system (IELTS). The second is a study of the impact of language tests within a national reform project in Europe. The third, focusing on washback as one aspect of impact, looks at a specific teaching/learning environment within a single language school in Italy. The case study data from each of the contexts have been used as meta-data and the analysis has led to a more comprehensive model of impact for application in other educational contexts.
The final part of the talk looks at current impact research based on the emerging model, and in particular its use in the context of Asset Languages, a British government-funded assessment project covering 26 languages. Will this implementation of the Languages Ladder recognition system deliver the intended impacts as a key element of the UK’s National Languages Strategy?
Teacher supervision and assessment are inextricably linked at many levels. What makes teacher supervision and assessment so challenging is, in part, that teachers typically react defensively, even hostilely, toward both activities (and sometimes toward the individuals responsible for them). These adversarial attitudes stem from traditional supervisor-supervisee relationships and the unsystematic and subjective nature of traditional classroom visits, which are often unannounced, supervisor-centered, directive, and judgmental. Teacher supervision and assessment, however, need not be viewed so negatively.
The goal of teacher trainers, teacher supervisors, and language program administrators should be to turn these negative attitudes around with models of supervision that lend themselves to more productive supervisor/supervisee interactions and to more positive outcomes in the form of professional development and improved instruction. In most instructional settings, traditional attitudes toward supervision and assessment can be reversed by adopting approaches that are more interactive than directive, more democratic than authoritarian, more teacher-centered than supervisor-centered, more concrete than vague, more objective than subjective, and more focused than unsystematic. In this presentation, I will outline a number of approaches to teacher supervision and assessment that break the traditional mold. Emphasis will be placed on “clinical supervision” (Acheson & Gall, 1995), an approach that can radically change the dynamics of teacher supervision and assessment. By means of this approach, we can provide objective feedback on instruction, diagnose and solve instructional problems, assist teachers in developing strategies to promote more effective instruction, and help teachers develop a more positive attitude toward continuous professional development.
It has long been noted that high-stakes language exams exert a powerful influence on language learners and teachers, a phenomenon known within the language testing literature as the ‘washback effect’ (Alderson and Wall, 1993). However, despite the plethora of assertions that the introduction of high-stakes language examinations into the Greek educational system created a shift towards exam-oriented syllabuses and methodologies, little empirical research has been carried out in the present context that can show what is actually happening under their influence.
The study that I am going to present is aimed at investigating the washback effect of a high-stakes language test (the First Certificate in English - Cambridge ESOL) on the teaching and learning that takes place in Greek private language schools (also known as ‘frontistiria’) based on the analysis of teachers’ interviews, exam-based textbooks and students’ diaries.
During the presentation I will report the results from each round of data collection and demonstrate how the mechanism of washback of high-stakes language tests operates in the present context. The discussion will round off with recommendations on how to promote positive washback and suggestions for future researchers in the area.
Nigel Downey, Anne Nebel
Center for Applied Linguistics and Language Studies (CALLS)
Hellenic American Union
The CEFR description of overall listening comprehension at C1 level includes the statement, “Can recognize a wide range of idiomatic expressions and colloquialisms.” Recent studies highlight an important role for colloquialisms, idiomatic expressions and other formulaic sequences (FS) in fluency, facilitation of language processing and promotion of self (Wray, 2002) and call for greater attention to input and increased interaction toward the acquisition of a repertoire of formulaic language by language learners (Wood, 2002). Despite the acknowledged importance of FS in fluent language use and an acceptance of its place in CEFR, review of the existing literature points to a need for further discussion/research in three important areas: 1) the use of FS in English as an International Language; 2) the role of FS in language testing; and 3) washback of FS in language instruction.
This work-in-progress presentation is part of a larger project of mapping the listening section of one language test to the CEFR. In this presentation the presenters will outline their investigation into the use, testing and washback of FS, report on findings and outline future areas of inquiry.