“Item bias as a consequence of hidden social markers in assessment content”. A coffee conversation with cApStAn co-founder Steve Dept at E-ATP 2021

by Pisana Ferrari – cApStAn Ambassador to the Global Village

This year’s (virtual) edition of the E-ATP conference, of which cApStAn is a proud sponsor, focussed on the challenges to the testing industry brought on by the COVID-19 pandemic, and how these were met. These include, inter alia, how to address the shift to technology-based assessment (TBA) and remote proctoring, catering to new audiences, adapting technology and test questions, providing flexibility in delivery, and ensuring accessibility to all. Fairness and equity in assessments were at the forefront of many discussions at the conference and at the heart of our own Steve Dept’s coffee conversation on Day 3 on how hidden social markers in assessment content can introduce item bias. In this lively and well-attended session Steve explained that tests contain far more stereotypes than we think and may conceal social markers that can put some socio-demographic groups at a disadvantage. Playing on the idea of the “coffee” conversation, he gave this first example:

“As I pondered over this, I sipped from my coffee for half an hour”.

“I entered the bar, ordered a coffee, gulped it down, paid and went to my meeting”.

The first sentence can be understood by an American, but would be challenging for an Italian test taker (who is unlikely to sit down for an extended period of time over a coffee, also given that coffee cups in Italy are tiny while mugs are a relatively unknown item in the country). The second sentence would be straightforward for any Italian test taker (this is the way Italians normally consume their coffee, standing up at the bar and then rushing off), and would probably be understood in the same way only by a small subset of urban Americans familiar with “espresso shots”. The word “coffee” in itself is not a social marker, but the context in which it is presented is a social marker, Steve said. “Inference from context varies widely across cultural groups”.

The second example relates to the stimulus of a mathematics item.

“The size of the Makathinis’ lawn is 3.5 times that of a standard rugby field. How long would it take Mrs Makathini to mow her lawn if […a discrete set of conditions]?”

Which is the salient hidden social marker in this case?

– Is it rugby?

– Mowing the lawn?

– The fact that it is a woman mowing?

– Or the name “Makathini”?

“Rugby” is easy to adapt to football, cricket, or anything else that indicates an area, according to the local culture. “Woman” is a distractor — not construct relevant — as is the name “Makathini”. The correct answer is “mowing the lawn”. This puts the reader in a “category”, that of a person who does not just “cut the grass”. It implies that the test taker is familiar with the idea of having a garden, of taking care of a garden or having somebody take care of it, and could belong to a privileged social category. “Cut the grass” is more inclusive and is a concept that anyone can relate to more easily, regardless of their socio-economic or demographic background.

These examples reveal just how much scrutiny must go into checking the language of assessments for bias, Steve said, and from how many different perspectives. You need to have strategies in place, or at least a plan, to avoid such sources of construct-irrelevant variance (and item bias). More diversity in test writer pools and SME panels, with representatives of different communities and cultural backgrounds, would ensure that at least some of the problematic expressions are flagged and the necessary adaptations made – or not.

cApStAn’s multicultural Diversity Equity Inclusion and Bias Reduction (DEI-BR) panel analyzes the mature draft of an assessment, even if the test is only to be administered in English, “and we mentally translate it into our strongest language to identify places where the representation of what is written can vary across different demographics”, explained Steve. The objective is to spot socially-loaded noise in the items.

This is not just about taking into account cultural sensitivities: it also implies knowledge of notions or concepts that can blur the constructs:

– A problem-solving item that uses a map of the underground as a stimulus may put respondents who live in rural areas at a disadvantage.

– A science item that includes references to rules and regulations may elicit different cognitive strategies in Northern versus Southern countries.

“That was a dense exchange”, Steve wrote on his LinkedIn blog after the session, “and the input of some of the attendees helped make the point that including non-native English speakers in cultural review panels adds a lot of perspective (and value) when applying a Diversity, Equity, Inclusion and Bias Reduction (DEI-BR) filter to test content developed in English.”