Humanising Testing
Marina Marinova was born in 1966 in Varna, Bulgaria, and has lived in and around Hamburg, Germany, for over 25 years. A graduate of Sofia University “Kliment Ochridski” and of Hamburg University in the fields of English Studies, Ancient Indian Studies and Modern Indian Studies, she has been an English teacher and teacher trainer for over 20 years. She has worked, and still works, for various institutes and organisations, among others vocational schools, the Chamber of Commerce and Industry in Lüneburg, Germany, the Hamburg Employment and Professional Development Agency, Pilgrims, Canterbury, and Hamburg University. She is currently involved in a range of research projects on simultaneous bilingualism, multilingualism, and third language acquisition, with a particular focus on big Indian cities. She is a HELTA member and has been a presenter and invited speaker for a number of English teacher associations throughout Europe.
What is a Humanistic Approach?
Sigmund Freud’s theory of psychoanalysis and B.F. Skinner’s behaviourism revolutionised the practice of psychoanalysis and psychotherapy in their time. Different as they may be in many respects, these two cornerstones in the development of modern psychology have something in common: a certain disregard, up to a full negation, of individuality and free will.
The Humanistic Approach, which emerged in psychology in the middle of the 20th century, came to rectify this. It places the individual human in the focus of research and therapy, as a unique being with a unique past, unique problems, and unique potential. It strives to avoid overgeneralisation and statistical evidence and to focus upon the human as a whole, who is far more complex than any individual part, be it dreams, family relationships or behavioural patterns.
The Humanistic Approach in psychology has a long history, rich in research and publications, beginning with Abraham Maslow and Carl Rogers.
As they became more popular, principles of the Humanistic Approach were incorporated into a number of cognitive approaches to teaching, first of all, language teaching.
There was a precedent for this. Skinner’s behaviourism theory builds the scientific foundation of the audio-lingual approach.
Thus, one may argue that we cannot speak of a humanistic approach to language teaching, but rather of humanistic approaches, each with its own scientific foundation and research history, each backed by its own empirical evidence.
A “measuring unit” for their association with the humanistic approach is the degree to which they focus on students and their needs, rather than on imparting “objective” knowledge.
“Teach students, not subjects” has been the motto of Pilgrims’ founders and a leading working principle of all its teacher trainers. “Broken down” to day-by-day implementation, this means:
- Look into what your students need, rather than what a course plan requires.
- Be prepared to change and adapt your lesson, responding to students’ immediate needs.
- Course books were written for a certain “student average”, not for your students. Change and adapt them to accommodate your class; discard them if this is impossible.
- Creating a positive environment for learning is more valuable than “teaching the right facts”.
- Respect your students. Their dignity is independent of their knowledge and abilities in the subject you teach.
- Students’ out-of-the-classroom lives and problems need to be taken into consideration. Just by entering your classroom, they don’t turn into language learning machines with no other emotions or worries.
- Learning can and should be enjoyable and motivational. Quality of learning is measured by real-life performance, not by effort and long learning hours.
- Create chances for students to show and do their best.
- Focus on students’ success, not on their failures. Positive reinforcement is a tool to encourage learning, punishment is not.
Can testing agree with the principles of a humanistic approach to teaching?
Your spontaneous, unreflected answer to this question may well be “no”. There are certain qualities of tests that seem irreconcilable with a humanistic approach to teaching.
- Tests focus on a certain level of knowledge or skills, expected as a minimum/average at a certain point, not on individual progress.
- Failing to present proof for that knowledge/skills is penalised.
- The reasons for the failure are of little to no consequence. Whether it’s a real lack of knowledge, or a momentary lapse of concentration due to, for example, toothache, failing to comply with requirements means failing a test or at least a task.
- Tests are created in advance; they cannot be changed or adapted to respond to students’ immediate state of being.
- Tests are, more often than not, a stressful experience. This is particularly the case with high-stakes tests, where underperformance may have long-term, even life-long, negative consequences.
Why bother then?
There is, indeed, a certain tradition of “test-damnation” among propagators of a humanistic approach and there are very good, valid reasons for that.
On the other hand, the teaching reality in most educational systems dictates the necessity of dealing with testing and assessment. Turning our backs on it won’t be of service to our students, who will be expected to get a certain grade or pass their finals. Telling them how counter-productive testing is to real knowledge will be of little help if the university they want to study at requires a B2 language certificate for enrolment.
You may argue that good quality teaching should be enough to enable students to pass such tests with flying colours. While good teaching is, of course, a basic requirement, I argue that it may not be sufficient. You cannot have good testing with bad teaching, but you can have good teaching with bad testing.
Does this mean that we stop being humanistic teachers on test days?
I would like to outline the main reasons why the answer to this question can be “no”.
Once I accept the necessity to deal with tests and refuse to betray the principles of humanistic teaching, I have one option: looking for ways to humanise testing.
A helpful way to do this is to examine the “official” requirements of good testing and look for “common denominators” between these requirements and the humanistic principles.
I have found a range of such common denominators, especially in the context of the “holy trinity of a good test”: validity, reliability and positive backwash.
Validity
The statement “A valid test is a humanistic test” may seem somewhat far-fetched. With a slight shift of focus, however, we can find a large portion of truth in it.
Current “testing gurus” seem to have entered a “coming up with new validities“ competition. Moreover, there’s a certain disagreement on the definition of each validity.
I call this phenomenon in science the “potato salad syndrome”. If you live in Germany, you will soon find out that every household has its own recipe for potato salad and it’s always “the best and the ultimate one”. If you are not as passionate about potato salad, you will easily detect common ingredients (most prominently boiled potatoes) and realise that the differences (fried onion rings or finely chopped shallots) aren’t nearly as important to the final result.
Following this approach, I am going to examine the central types of validity and their importance for a good = humanistic test.
Content validity
Broadly defined, content validity states “Only test what you have taught” or “Don’t test what you haven’t taught”.
When discussing the topic with teachers who write their own tests, a universal reaction is, “But that’s self-evident.” Responses become considerably more differentiated when we compare definitions of “taught”. These begin with “present, do a few exercises and assign homework”. In extreme cases, teachers have admitted to quickly teaching a language phenomenon (usually a tense) because it had to be tested the next lesson! This is often the case in test-driven educational systems with rigid “when to test what” schemes. Clearly, there is no way these can be reconciled with the principles of humanistic teaching. We need to be equally clear, however, that such tests violate the claim of content validity.
A humanistic definition of content validity, therefore, would be, “Only test what students have had enough opportunities and sufficient time to learn.” There is a pitfall here that teachers are all too willing to fall into. When writing a test, have you caught yourself thinking, “I’m not going to test this, they all know it”? A humanistic approach requires us to look into our own motivation. Try to answer the following questions as honestly as possible:
- Are you testing to give students an opportunity to reliably demonstrate what they have learned or to “catch out” and penalise them for what they haven’t (yet) learned?
- Do you feel the test was too easy if the majority of your class did well?
- What is your initial reaction, when the majority did poorly, “My test was unfair.” or “That’ll teach them, they should’ve worked harder.”?
“So we can only write easy-peasy tests then?” This is a question defiant teachers will often ask.
Obviously, the skills and knowledge we test should correspond to the general test-takers’ level. Testing a B2 class with closed questions in the present simple will probably produce 100% top results, unless students lose confidence and start looking for possible ‘traps’ in a test that seems too easy. Such a test would be unfair, as it severely underestimates students’ abilities and is equally unsuitable as an opportunity to demonstrate them.
In general, however, I have found that teachers are more likely to err on the “too difficult” side when writing tests.
Format validity
The principal claim of this validity is “Use only task types students are familiar with.” Again, we need to examine the notion of “being familiar with”. In the context of humanistic teaching, format validity means: “Use only task types students are comfortable with. Provide students with reliable task-solving strategies. If you must use a task type, make sure they feel comfortable with it. Don’t assume they’ll work it out on their own.”
The expectation that students will transfer skills is a major pitfall and the most common violation of this validity. Here are a few examples of such violations from my own practice as a teacher and teacher trainer:
- After reading a text about an extreme holiday experience and “covering” new vocabulary, students (A2.1) focus on grammar. They practise writing sentences in the past simple related to their own experiences. In the test, students are asked to write a story about their last holiday.
Why is this a violation of format validity? Writing a story requires a set of skills which goes well beyond producing grammatically correct sentences in the past simple. Being exposed to one text is insufficient for students to acquire these skills, even if proper discourse analysis has been done. In this particular case, the teacher focused on grammar. She expected students to notice discourse features, learn markers and linkers as ‘new words’, and ‘put them together’ to produce an adequate text. When discussing the matter, the teacher justified the test saying, “But they have already written many such texts in German. They know how it goes. Besides, I have no time for writing long texts in the classroom.” While this is often the sad truth, we cannot assume that our students will transfer skills from their L1 into the target language. If there is no time for producing a single text in the classroom and giving adequate feedback, we should reconsider using this task type in a test. In any case, we need to at least make students aware of this specific task’s requirements and provide them with workable strategies. One such strategy is providing a checklist for editing and training students to edit their own texts. You will find numerous such checklists on the Internet to use and adapt, or you can create your own. Here is an example of a checklist that would have been of great help to that particular class:
- Is my text grouped into paragraphs? Are my paragraphs visibly divided from each other? Are my paragraphs in a logical order?
- Is there any unnecessary information? Is any necessary information missing?
- Is there a lot of repetition? Can I change this and be certain that my text is still correct?
- Is there too much repetition of linkers like and, but, then etc.? Can I change this and be certain that my text is still correct?
- Have I used articles (the, a, an) correctly?
- Have I used the correct verb forms? What about regular and irregular verbs?
- Are my questions and negative sentences correct? Have I used auxiliary verbs where necessary?
- Is the punctuation correct?
- Have I spelt all words correctly?
An additional advantage of such a checklist is that it will help students focus upon aspects of language that are particularly important for successfully passing the test. (See Face Validity)
- Students have explored the differences between “will” and “going to” at some length. They have been exposed to texts with the two forms, been encouraged to notice them and analyse their use, and have produced sentences/texts of their own. The test is a set of MCQs, where students have to tick the right form. The choices for each question are: present simple, present continuous, will, and going to. They have never practised MCQs in an English class.
- Students listen to a 1 to 3 minute recording twice. After that, they answer a range of MCQs.
Cognitive validity
A young teacher in a bilingual class in Germany prepared a test for her graduate students. The result comprised 20% of their final grade; the stakes were extremely high.
The test was so difficult that the highest grade was a C, with more than 30% of the class failing. Confronted with the bad results, the teacher argued that these were her criteria and her expectations. The centralised final exam in Germany (Abitur) produced entirely different results: most students actually had top scores. The first test was still reflected in their final grade and created the false impression that these students’ English was worse than that of students from other classes and other schools. Grades are important when applying to universities or for jobs. Young people cannot say, “Yes, my grade is worse than XYZ’s, but my English is, in fact, much better. I just had a much more demanding teacher.”
This is a genuine example that actually resulted in legal proceedings. Independent experts declared the test unjustifiably difficult and the teacher’s assessment “hostile”. Nevertheless, it marred what should have been a happy moment in these young people’s lives – their graduation.
However knowledgeable the teacher may have been (and she produced some very impressive university grades), her tests consistently violated cognitive validity. An important reason for this was focusing upon her own demands and expectations instead of her students’. Particularly with high-stakes tests, I find it vitally important to look into the “larger picture” of how the results will affect our students’ lives. Of course, this doesn’t mean we should be unduly lenient, but it may be useful to consider what can generally be expected of students of a particular age, under particular circumstances, and adjust our own assessment strategies accordingly.
International exams and the CEFR can offer a good starting point when developing our own criteria.
Reliability and assessment criteria
Simply put, the assessment of a reliable test will produce the same result if:
- The same person assesses it under different circumstances (morning, evening, fresh, tired …)
- Two people assess the same paper.
The second scenario is easy to check. In fact, having more than one assessor is common practice with high-stakes international exams. In everyday practice, however, one teacher will write a test, administer it, and assess it. Ensuring consistency and reliability can be challenging.
The best tool we can use is a comprehensive set of criteria. In fact, some theories equate reliability with criteria.
Designing criteria can be a daunting enterprise. Once again, international exams and CEFR can be of great help.
As a humanistic teacher, I will make sure my students are familiar with these criteria and with my assessment strategies. I will pay attention to demands and expectations they feel uncomfortable with and do my best to provide sufficient practice/tools to deal with problems. Why not allow students to question criteria? This gives us a valuable opportunity to explain their importance (or to question them ourselves). Why not experiment with criteria? Have your students write an essay where spelling mistakes will not be penalised, or where they can use dictionaries/notes/media for reference. Have them agree on a criterion that will not be valid for a particular test and a criterion that will be of double importance. Their choice may give you valuable information about their strengths and insecurities.
Reducing the stress to get everything right off the top of their heads may produce some unexpected positive results.
As mentioned earlier, students can use such criteria for self-checking and editing their work.
Why is this important? Tests are those moments in a classroom when students feel (and are) entirely at our mercy. This can lead to resentment and rejection of the results, even to a complete loss of motivation. After all, it is much easier and more tempting to attribute bad results to “the teacher doesn’t like me” than to accept one’s own shortcomings or the need to work harder. Such resentment can hit even teachers who make conscious efforts to be as objective and as reliable as possible.
Of course, we must never violate the principle of defining the assessment criteria before writing the test itself, and we must not change them afterwards, particularly not after the test has been administered.
Backwash
Backwash/washback is the effect a test has on future learning. This is already a humanistic concept, as it focuses upon students’ feelings and motivation.
A reliable test with clear and consistent assessment strategies has a good chance to result in positive backwash. This is even more the case when students are acquainted with the assessment criteria and accept them.
In reality, we can assess a test’s backwash only retrospectively. Our best intentions may end in disaster; a test we didn’t put that much effort into may cause a surge in students’ motivation.
Reliable, objective, and consistent assessment is often seen as the holy grail of positive backwash, but it may be quite discouraging for students who feel they worked hard for a test (subjectively perceived), know your assessment is objective, and still get worse results than they hoped for. A possible way to prevent this (though not a guarantee) is to give feedback focusing on a student’s progress. Of course, we need to be aware of this progress ourselves if we want to remain credible.
In my experience, an even worse effect on backwash comes from deliberate “marking down” to make sure students “won’t get complacent”, “will know they still have a lot to learn”, or “will be kept on their toes”. Good as a teacher’s intentions may be, failing to see and recognise students’ efforts and achievements is unlikely to result in enhanced motivation. “Generous marking” on final tests may come as a nice surprise, but won’t make up for the previous disappointment.
Facility value
Calculating facility value (FV) is a good way to assess our own tasks in terms of difficulty. The principle is quite simple: you divide the number of correct solutions by the total number of students.
For example: 24 out of 32 students managed a task. FV = 24/32 = 0.75
A good FV for a low-stakes test is expected to range between 0.25 and 0.75; for a high-stakes test, between 0.5 and 1, as students are likely to prepare more thoroughly for those.
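The calculation and the expected ranges above can be sketched in a few lines of Python. This is a minimal illustration; the function names and the encoding of the thresholds are my own shorthand for the ranges given in the text.

```python
def facility_value(correct: int, total: int) -> float:
    """Facility value: correct solutions divided by the number of students."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total

def in_expected_range(fv: float, high_stakes: bool = False) -> bool:
    """Check an FV against the expected range for the test type:
    0.25-0.75 for a low-stakes test, 0.5-1 for a high-stakes one."""
    lower, upper = (0.5, 1.0) if high_stakes else (0.25, 0.75)
    return lower <= fv <= upper

# The example from the text: 24 out of 32 students managed the task.
fv = facility_value(24, 32)
print(fv)                     # 0.75
print(in_expected_range(fv))  # True for a low-stakes test
```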
We can examine a task in further detail if we divide the class into two groups according to general performance and calculate FV for each group. (FV1 and FV2)
The following outcomes are possible:
- FV1 > FV2, both within the range for the type of test. The task was well chosen and appropriate for the group in terms of difficulty.
- FV1 > FV2, but FV2 lies below the lowest score expected for the type of test (below 0.25 for a low-stakes test, or 0.5 for a high-stakes one). This usually indicates that the class hasn’t had enough time to work on the language tested; it was just enough for the high-performers. As a humanistic teacher, I will give more practice opportunities and, if possible, administer a new test, invalidating the results of the first one.
- FV1 > FV2, both below the lowest score expected for the type of test. This is a clear indicator that the task was too difficult. Students need more time and practice. Again, if possible, I will give them an opportunity for that and administer a new test.
- FV1 > FV2, both above the highest score expected for a low-stakes test. The task is obviously too easy. The consequences for future teaching and testing will depend on circumstances.
- FV1 < FV2. This unusual result may have two causes:
- Badly designed task
- Malpractice
Future actions will depend on circumstances.
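The FV1/FV2 comparison can likewise be sketched in Python. A minimal sketch, assuming the class has already been split into a stronger and a weaker half; the interpretation strings paraphrase the outcomes listed above, using the low-stakes thresholds by default.

```python
def group_fv(results: list) -> float:
    """FV for one group: results is a list of booleans,
    one per student, indicating whether they solved the task."""
    return sum(results) / len(results)

def interpret(fv1: float, fv2: float,
              lower: float = 0.25, upper: float = 0.75) -> str:
    """Classify a task from the FVs of the stronger (fv1)
    and weaker (fv2) half of the class."""
    if fv1 < fv2:
        return "unusual: badly designed task or malpractice"
    if fv1 < lower:      # both groups below the expected minimum
        return "too difficult: more time and practice needed"
    if fv2 < lower:      # just enough for the high-performers
        return "not enough practice time: re-teach, then re-test"
    if fv2 > upper:      # both groups above the expected maximum
        return "too easy"
    return "well chosen for the group"

# Stronger half: 7 of 10 solved the task; weaker half: 4 of 10.
fv1 = group_fv([True] * 7 + [False] * 3)   # 0.7
fv2 = group_fv([True] * 4 + [False] * 6)   # 0.4
print(interpret(fv1, fv2))                 # well chosen for the group
```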
How does this relate to humanistic teaching?
Teachers generally find FV a useful tool to assess their own testing. In order to apply it properly, we need to know our students well and define the groups for FV1 and FV2 as precisely as possible. Relying on the previous term’s/year’s grades, or on a previous test’s results, is tempting but may be dangerous, as it ignores our students’ current work and progress. Someone who didn’t do well before may have found new motivation; a former top performer may have lost theirs.
Final thoughts
If you work in a context demanding testing, you know that it will always be stressful for students and can never produce “ideal results” in terms of validity, reliability and positive backwash. You also know that good teaching is possible with bad testing, but bad teaching cannot result in good testing. For some, the effort to humanise testing may amount to no more than an effort to minimise damage.
I believe, however, that we can achieve a considerable improvement in both teaching and testing once we focus on our students and their success. A test can be an opportunity to outline progress and increase confidence, a tool for positive reinforcement. On most occasions, it is we, teachers, who make this possible.
Marina Marinova, Germany