Scott | Testing Speaking

Testing Speaking in an EFL Classroom

Introduction

Assessing spoken language has long been a headache for English teachers, and speaking has therefore traditionally been left out of tests. Even many well-established tests have no oral component; TOEFL is the most obvious example. As an English teacher, I have been frustrated by this problem as well. During my four years of teaching in Taiwan, testing speaking has been the weakest part of classroom assessment, which in turn leads both teachers and students to neglect speaking skills. However, the trend in language teaching is to give communication a pre-eminent place. More and more teachers, myself included, want to enable their students to communicate information naturally and effectively by developing their speaking skills. To achieve this, a teacher needs to assess students in order to see how they perform and progress, and to identify the weak points in their performance that require more attention. Accordingly, the assessment of spoken production becomes a necessary issue.

The Characteristics of a Good Test

Before discussing the issue of testing speaking in an EFL classroom, it is worth reviewing the characteristics of a good test so that these qualities can be referred to in the later discussion. According to David P. Harris in his book “Testing English as a Second Language”, all good tests should possess three qualities: validity, reliability, and practicality. That is to say, whenever a teacher is conducting a test, he should take these three elements into consideration and apply them to the test. In practice, the qualities of a test are affected by a number of factors, and when it comes to testing speaking, the problems seem to be especially awkward.

Types of Oral Production Tests and the Problems

1. Paper-and-Pencil Tests

Although, as noted above, speaking has usually been ignored in traditional tests, some tests do contain oral components, such as paper-and-pencil tests of pronunciation, word stress, and sentence intonation. Admittedly, this kind of test is convenient and can be scored with complete accuracy; unfortunately, it seems artificial and of little use in terms of daily communication. Communication is always set in context, whereas this type of sound test tends to focus only on discrete elements of the language. How students actually communicate thus cannot be assessed.

2. Unstructured Interviews

Another commonly used type of test is the rated interview, which is the longest-established testing technique. In a rated interview, one or more raters interview each candidate separately for a fixed period of time and assess his speaking competence. A scale with scoring criteria is typically used as a rating sheet. Generally speaking, it is easy to place a candidate at a certain level based on a general standard of communicative competence. Unfortunately, when the interview is judged against the characteristics of a good test, low reliability appears to be its great weakness. As Harris said, “The great weakness of oral ratings is their tendency to have rather low reliability.” He mentions several problems in his book: (1) no two interviews are conducted in exactly the same way, even by the same interviewer; (2) no rater can maintain exactly the same rating standard throughout all the interviews, especially when there are a large number of candidates; (3) when the candidates are not all assessed by the same rater, the reliability will be even lower. In short, “the most general feature of this type of assessment is that it is carried out as the student speaks and is dependent on the examiners maintaining a constant set of assessment criteria from the first student interviewed to the fifty-first, and beyond.” The rated interview does prove useful in bringing real communication into assessment; however, it has some deficiencies which require careful attention.

3. Structured Speech Samples

Generally speaking, these tests are designed to minimize the weaknesses of the unstructured interviews we have discussed above. According to Harris, the following item types are included in this category:

1. Sentence repetition

2. Reading passage

3. Sentence conversion

4. Sentence construction

5. Response to pictorial stimuli

However, there are further types which should be added to this list. Let us look at the following test formats proposed by Cyril Weir.

1. Indirect

Sentence repetition

Mini-situation on tape

Information transfer: narrative on a series of pictures

2. Interaction Student with Student

Information gap exercise

3. Interaction Student with Examiner or Interlocutor

The free interview/conversation

The controlled interview

Role play

Information gap

Of the above formats, “information gap exercise” and “role play” can also be categorized as Harris’ “structured speech samples”. In addition, the controlled interview is specially designed to minimize the weaknesses of the unstructured interview discussed above, so it can be placed in this category as well.

Structured test types do make assessment more reliable because the scoring procedure is based on a more objective standard. As Harris said, “Oral-production tests comprising the above, or similar, types of highly structured speech tasks offer considerable promise as replacements for the unstructured interview, for they greatly increase both test and scorer consistency.” However, he reminds us that the scoring is still carried out by human beings, and that high reliability can be achieved only if the testers are well trained. This means teachers, as testers, have the responsibility to prepare themselves well for the rating procedure. Moreover, compared to paper-and-pencil tests and unstructured interviews, structured tests require more time and care to prepare. This may be an extra burden on the testers.

The Problems I Encountered When Testing Speaking

During my past teaching years, the format I adopted most often was the paper-and-pencil test of pronunciation, which school teachers also consider the most convenient and acceptable. However, as discussed previously, its greatest deficiency is the lack of communicativeness. Once they get used to this kind of test, students tend to place emphasis only on discrete linguistic elements, such as pronunciation differences between words, word stress, or sentence intonation. It is often found that students who receive high scores on such a test do not show the oral competence their scores would suggest. In this situation, testing speaking is more like testing knowledge about the language. This is a serious problem in the English education system in Taiwan. Once I understood the problem, I sometimes changed my assessment strategy. Below are some other formats that I have adopted.

1. Reading passages

2. Describing visual stimuli

3. Role-playing conversations between two students

Among these tasks, reading passages is obviously the easiest for the examiner to prepare. Description often takes a great deal of time because the stimuli have to be prepared. Role play is easy to prepare on the teacher’s side, but difficult to prepare for on the students’ side. The reason is the students’ limited oral proficiency; that is, their language competence. Even though my students have learned English for more than three years, most of them have only a small vocabulary, a weak grasp of grammar, and plenty of pidgin elements in their speech. Some students tend to lose confidence while preparing for this kind of task because they fear making mistakes. Moreover, performance also depends on students’ personal characteristics. It is understandable and predictable that some introverted students have difficulty expressing themselves, especially when the assessment is conducted in class. Several times, particular students could not say even a word in front of the whole class. This also happened in the description tasks. In addition to students’ language competence and personality, there are two overall problems. The first is the control of classroom order. Since it was impossible to find suitable time outside class for oral assessment of all my classes, I always integrated the assessment activities into class time. When students took turns performing, the classroom easily got out of control. The second problem is the lack of reliability when the assessment is judged against the characteristics of a good test. Since I was both the sole designer of the scoring sheet and the sole examiner, the degree of reliability was open to doubt.

Discussion

I agree with Harris that highly structured speech tests “greatly increase both test and scorer consistency.” Moreover, in my situation, this kind of test functions better when applied to my students. The reason is that they are given more guidelines and information during the test. With limited language proficiency, it is easier for them to use given information than to create everything by themselves. The three test formats that I applied, listed above, can all be categorized as highly structured speech tests. Though there are problems with them, my experience does show them to be useful in assessing speaking, at least in my teaching context.

Adjusted Solutions

Below are the adjusted solutions, combining others’ opinions with my own:

1. Students’ language proficiency limit

This problem can be minimized by simplifying the task material so that it closely matches the teaching points and class activities in the course, giving students easy access to the required vocabulary. This is supported by the suggestion of S. Kathleen Kitao and Kenji Kitao:

“it is necessary that they be prepared for that kind of test, that is, that the course material cover the sorts of tasks they are being asked to perform.”

Also, adopting the information gap test can ease the testees’ vocabulary problem because the test sheet can be designed to supply the required vocabulary and information.

2. Different personal characteristics

This is a more awkward problem. Extra-linguistic factors such as personality (shy, talkative, etc.), nervousness, and lack of self-confidence should be taken into account, but they should not interfere with the assessment. In my opinion, separating the testees from the other students may be one way to ease the fear of speaking in public, but it will make the control of classroom order more difficult.

3. Classroom order

What strategies can be adopted to improve classroom discipline? First, let us look at David Carless’ suggestions in his discussion of discipline in task-based learning. He suggests that the teacher’s expectations of students’ behavior in class should be indicated clearly both before and while an activity is carried out. This can be adopted to reduce indiscipline in “task-based testing”. However, I suggest that two more points be added. The first, which seems more passive, is to include discipline in the rating criteria. That is to say, students’ behavior while listening to others’ performances is scored. This will reinforce students’ awareness of what is expected of them. The second, a more constructive one, is to have students fill in performance check forms for all the other students as the presentations proceed. This idea is extended from David Progosh’s “Presentation Check Activity”, which I think can be adapted for controlling classroom order.

4. Lack of reliability

Generally speaking, this problem can be mitigated by using more raters and by carefully designing the scoring criteria. However, in an ordinary classroom test, it is rarely possible for several teachers to work together, so establishing a set of appropriate criteria seems more practicable. Here, one essential factor should be kept in mind when a teacher is establishing assessment criteria. That is “the relationship between a task and the criteria that can be applied to its product” when “taking decisions on what to include in a test of spoken or written production.” This means the rating criteria must be relevant to the task. Thus, there may be plenty of suggested scoring criteria, but there is no ideal scoring sheet which suits all tests. The decision on what to include falls entirely on the tester. Therefore, in a classroom, it is the teacher’s responsibility to take this into consideration.

Conclusion

It is obvious that each assessment method has its drawbacks. It seems impossible to work out a perfect way of testing speaking. Although I found structured speech tests more practicable for my students, they are not necessarily suitable for others, not to mention the problems I encountered. Nevertheless, a teacher needs to recognize that efforts can be made to minimize the difficulties. It is possible for each teacher to find the most appropriate, though not perfect, way of assessing their students. Testing speaking has its own backwash effect. If students can be stimulated to speak more, this can only have a positive effect on their language learning.
