Readability Tests: A Cautionary Tale

By Richard

In the first article on “Readability” we talked about the basic concept. For example:

Everyone wears undies. Some boys call them jocks or underpants. Some girls call them knickers or pants. But I like to call them undies. The thing I look for in a good pair of undies is that they have to be tight. I don’t like loose undies. (Loose Undies, Comedy)

The Flesch test rates “Loose Undies” at Grade 2.5. (A quick note: the Flesch instrument was developed by Dr Rudolf Flesch, of Columbia University. His original Reading Ease formula dates from 1948, and the grade-level version, usually called Flesch-Kincaid, followed in 1975. It has been repeatedly subjected to research, and overall returns a raw reliability of 93%. So it’s pretty good.)

On Ziptales we rank this story as a “Zip Stage 5” text, suitable for children between Grade 2 and 4. It is therefore a ‘mid level’ text in our reading band scheme of 10 stages (where Stage 1 is simple three-word sentences and Stage 10 is extended texts of up to 5,000 words).

The Flesch test is right to say this is easy to read. Flesch’s famous algorithm works from two ratios: words per sentence and syllables per word. Clearly in this sample, the sentences are short and the words themselves are short - out of a total of 47 words only 7 are multisyllabic. Thus the text is “easy” reading, suitable for younger children.
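For readers who like to see the arithmetic, the two ratios can be sketched in a few lines of Python. This is only a minimal sketch, assuming the grade-level (Flesch-Kincaid) form of the formula with its published constants 0.39, 11.8 and 15.59. The function names and the syllable counter (a rough vowel-group heuristic) are invented for illustration. It is not the engine used for the scores quoted in this article, so its results will not match them exactly.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per group of vowels."""
    word = word.lower()
    groups = re.findall(r"[aeiouy]+", word)
    count = len(groups)
    # Treat a lone trailing 'e' (as in "like" or "loose") as silent,
    # but keep endings such as "-le" ("table") and "-ie" ("Katie").
    if count > 1 and groups[-1] == "e" and word.endswith("e") and not word.endswith("le"):
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Grade level from words per sentence and syllables per word."""
    # A sentence is any non-empty run of text ending in . ! ? or an ellipsis.
    sentences = [s for s in re.split(r"[.!?…]+", text) if s.strip()]
    # A word is a run of letters, optionally with one internal apostrophe.
    words = re.findall(r"[A-Za-z]+(?:['’][A-Za-z]+)?", text)
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


sample = (
    "Everyone wears undies. Some boys call them jocks or underpants. "
    "Some girls call them knickers or pants. But I like to call them undies. "
    "The thing I look for in a good pair of undies is that they have to be tight. "
    "I don't like loose undies."
)
print(round(flesch_kincaid_grade(sample), 1))
```

Different tools split sentences and count syllables in slightly different ways, so two “Flesch” calculators can easily disagree by a grade or so on a passage this short.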

However…

Readability tests are not as smart as humans. Even a brilliant test like the Flesch one, honed over decades and used throughout the world, cannot be right all the time.

I found out the hard way how rubbery even a good readability “engine” like Flesch can be when I was doing preliminary testing for a story we wanted to put up on Ziptales, in the Advanced Library (for older, highly able readers). It’s called Best Friends. Here is the introduction to the story, from the raw original manuscript, straight from the author.

Best Friends sample passage (unaltered)

Katie was queen bee. She was leader of the Year 6 girls. Other kids called them the 'Princesses'. But there was trouble in the kingdom.
It had all started with the sleepover just before Christmas. Katie had been adamant that all her best friends should be there.
After a dinner of pizza, they all settled in for a movie.
‘What would you like to watch?’ asked Katie’s Mum. ‘Narnia? The Princess Diaries? Freaky Friday? Nannie McPhee? Alice in Wonderland?’
‘Oh Mum,’ said Katie, blushing. ‘They’re all so babyish. What about Pretty Woman?’
‘Oh that’s far too adult for you girls…’

(100 words. Raw manuscript: Grade 2.3)

I tested it and couldn’t believe what I saw. The test suggested the story had a “readability” of Grade 2.3.

It is all very well to suggest Year 2 in reading difficulty, but this is not the same as saying “suitable for Grade 2”. Take the word “adamant”. Far too difficult. I knew the story was a very complicated study of bullying by girls in Grade 6, and at 2,800 words it was a very demanding piece. The themes and references were totally inappropriate for younger children. The reading difficulty may have been Year 2, but the story wasn’t. So why had the test scored it so low?

I realized that the Flesch engine was picking up on the short sentences. The author uses lots of dialogue, and dialogue almost always means short sentences!

I then decided to test something. What if I took out the direct speech, and joined the sentences? What would happen? I tested this theory on the first 100 words.

Best Friends sample passage (direct speech removed, some sentences combined)

Katie was queen bee and leader of the Year 6 girls. Other kids called them the 'Princesses'; however, there was trouble in the kingdom.
It had all started with the sleepover just before Christmas. Katie had been adamant that all her best friends should be there.
After a dinner of pizza, they all settled in for a movie.
Katie’s Mum asked what they would like to watch, suggesting Narnia, The Princess Diaries, Freaky Friday, Nannie McPhee and Alice in Wonderland. Katie was embarrassed by these choices, and blushed, saying, ‘They’re all so babyish. What about Pretty Woman?’, but her mother replied, ‘Oh that’s far too adult for you girls…’

(109 words) (Altered manuscript: Grade 6.2)

Sure enough, the rating changed dramatically - by nearly four whole grade levels! I was pleased that the new score actually matched the target audience (Grade 6), as I saw it. And all I had done was to fiddle with the length of the sentences. What if I fiddled more?

Best Friends sample passage (all sentences combined)

Katie was queen bee and leader of the Year 6 girls, whom other kids called the 'Princesses'; however, there was trouble in the kingdom.
It had all started with the sleepover just before Christmas, something Katie had insisted on, adamant that all her best friends should be there.
After a dinner of pizza, they all settled in for a movie.
Katie’s Mum asked what they would like to watch, suggesting Narnia, The Princess Diaries, Freaky Friday, Nannie McPhee and Alice in Wonderland. Katie was embarrassed by these choices, and blushed, saying, ‘They’re all so babyish. What about Pretty Woman?’, but her mother replied, ‘Oh that’s far too adult for you girls…’

(111 words) (Altered text: Grade 7.9)

Wow! Now the suggested grade level was nearly Year 8! And it was the same 100 words (give or take). A gap of nearly six years in “readability”, with effectively the same text.

But you can see how the score can be manipulated. Flesch largely charts syllables and sentence length – short words, short sentences = simple text.
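To see why joining sentences moves the score so much, here is a small Python sketch of the grade-level formula with the word and syllable counts held fixed and only the number of sentences changed. It assumes the Flesch-Kincaid form of the formula, and the counts (100 words, 130 syllables) are made up for illustration; they are not measured from the Best Friends passages.

```python
def grade_from_counts(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid grade level computed directly from raw counts."""
    return 0.39 * words / sentences + 11.8 * syllables / words - 15.59


# Hypothetical passage: same words and syllables, punctuated three ways.
words, syllables = 100, 130
for sentences in (12, 6, 4):
    grade = grade_from_counts(words, sentences, syllables)
    print(f"{sentences} sentences -> grade {grade:.1f}")
```

Every extra word in the average sentence adds 0.39 of a grade, so going from 12 sentences to 4 in the same 100 words lifts the score by about six and a half grades without touching the vocabulary at all.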

While the middle test score (Passage 2) is probably close to being “right”, in terms of notching up an ‘appropriate’ grade score (6.2), it is not what the author wrote. Yet to publish the story and say it was OK for Grade 2 would be flagrantly wrong.

I feel that Passage 1, the original, is actually the best, not only because it is what the author wrote, but because cutting out the mother’s dialogue makes it too bland. To artificially ‘push’ it up to Grade 6 or 7 (as Passages 2 and 3 do) would completely change the feel of the original.

In the end, this story was never published on Ziptales. One of our readers felt it was inclined to make the central girl, Emma, seem too much of a victim. So it was shelved.

There is a moral in all this. Readability engines cannot think. They are just clever bits of code. Very useful as a short cut, but not completely reliable, because they only look at the language. They have no way of understanding the content.

We would have put this story into the Advanced Library, at about Zip Stage 9 or 10 – to reflect the more sophisticated content – if we had published it.

You can be comfortable in knowing that the allocation of stories to the different Zip Stages is reliable. The allocations were arrived at by combining the raw scores from the readability engines (e.g. “Loose Undies” scores 2.5 on Flesch and 480 on Lexile) with the opinion of a bank of teacher ‘readers’ (or auditors), who also considered the content of the narrative and, if necessary, adjusted the Zip Stage to better reflect where the story would work best for children.

Readability is not just the words. It’s also the ideas and themes. It takes a teacher to figure this out. We hope you like our readability scheme.

For more information on the Ziptales readability measures, check out our readability page.
