Trend Measurement in International Assessment Surveys

by Andrea Ferrari – Senior Associate

In international surveys in general, and in comparative assessments in particular, investigators are interested in collecting data on the knowledge, skills and competences of a given population at a given point in time.

In addition, if periodic data collections are planned, it is of great interest to also measure change over time. For this purpose, items that have been administered in the past (trend items or link items) are administered again, usually in conjunction with newly developed items.

Implications for linguistic quality assurance: the theory

Albert Beaton famously said, “if you want to measure change, do not change the measure” (Beaton et al., 1990). This maxim serves as the starting point for the guidelines on trend items in international studies such as OECD’s PISA or IEA’s TIMSS.

Indeed, there is abundant literature on the impact of even minor changes in wording or punctuation on test scores. In theory, then, linguistic quality assurance would focus only on newly developed items (which need to be carefully translated/adapted into the languages of the survey), while translated/adapted trend items already exist and would simply need to be carried forward, untouched.

… And the practice

In practice, things are not so simple, and the ‘exceptions’ that may need to be accommodated can be numerous. On the one hand, trend items may need to be updated in the source language, e.g., because of new scientific or technological developments, or because the delivery mode of the survey has changed, e.g., from a paper-and-pencil test to a computer-delivered or tablet-delivered test. Such updates to the source then need to be carried over to all target versions, i.e., the translated/adapted items.

On the other hand, even when the source is unchanged, existing translated/adapted items can become outdated due to various, sometimes unpredictable, factors, e.g., a spelling reform in the target language, an educational reform in the target country, or a contextual change such as a change of currency in the target country. Also, errors that went undetected in the first administration of an item may come to light through poor item functioning in the field or while the next administration is being prepared.

Another risk factor is that new project teams in local field institutes or national centres of participating countries[1] may be inclined to review trend materials and propose or implement significant revisions to their wording. In such cases, it can be difficult to assess whether the proposed changes are preferential or stylistic edits (which may or may not be linguistic improvements), whether they correct outright errors, or whether they are necessary or desirable modifications reflecting a change in local usage or local context.

The need for a trend management process

In light of the above, strict procedures are needed in any case to filter and control changes to trend content, so that even the smallest edit is clearly documented and its effect can be tracked.

In an ideal world, requests for changes to trend content should always be supported by data: if an item showed a clearly perceptible country-by-item or language-by-item interaction that could be described as differential item functioning (DIF) or item bias, there would be good reason to scrutinize its wording or cultural adaptations carefully and possibly to propose alternative wording to remedy the situation.

Conversely, one could argue that once an item appears to have worked well (i.e., the translated/adapted item has been ‘validated’ by being fielded and scored without showing any unusual statistics), even correcting a residual error may be an unnecessary risk. However, it may be difficult to convince a national team that a clearly identified error should remain uncorrected and be carried over to the next administration.
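The article does not prescribe any particular tooling for documenting trend changes. As a minimal sketch, assuming each trend item is stored as plain text per item and language, the requirement that every edit be documented and tracked could be supported by a change log that computes a line-level diff between the archived and the proposed version of an item and records the stated reason for the edit. All names below (`TrendChangeLog`, `ChangeRecord`, `ChangeReason`, the item ID and the sample sentence) are hypothetical; the reason categories simply mirror the kinds of change discussed in this article.

```python
from dataclasses import dataclass, field
from difflib import unified_diff
from enum import Enum
from typing import List, Optional


class ChangeReason(Enum):
    """Hypothetical categories mirroring the kinds of change discussed in the article."""
    SOURCE_UPDATE = "update carried over from the source version"
    ERROR_CORRECTION = "correction of an outright error"
    CONTEXT_CHANGE = "change in local usage or context (spelling reform, currency, ...)"
    STYLISTIC = "preferential or stylistic edit"


@dataclass
class ChangeRecord:
    item_id: str
    language: str
    reason: ChangeReason
    justification: str
    diff: List[str]                 # line-level diff, archived vs proposed
    approver: Optional[str] = None  # set once the change has been accepted


@dataclass
class TrendChangeLog:
    records: List[ChangeRecord] = field(default_factory=list)

    def propose(self, item_id: str, language: str, archived: str, proposed: str,
                reason: ChangeReason, justification: str) -> Optional[ChangeRecord]:
        """Log a proposed edit to a trend item; return None if the text is identical."""
        diff = list(unified_diff(
            archived.splitlines(), proposed.splitlines(),
            fromfile=f"{item_id}/{language}/archived",
            tofile=f"{item_id}/{language}/proposed",
            lineterm="",
        ))
        # Any difference, even a single changed character, yields a non-empty diff.
        if not diff:
            return None
        record = ChangeRecord(item_id, language, reason, justification, diff)
        self.records.append(record)
        return record

    def approve(self, record: ChangeRecord, approver: str) -> None:
        """Mark a logged change as accepted by whoever is responsible for trend decisions."""
        record.approver = approver

    def pending(self) -> List[ChangeRecord]:
        """Changes that have been logged but not yet accepted."""
        return [r for r in self.records if r.approver is None]


if __name__ == "__main__":
    log = TrendChangeLog()
    rec = log.propose("ITEM-01", "fr", "Le prix est de 10 francs.",
                      "Le prix est de 10 euros.", ChangeReason.CONTEXT_CHANGE,
                      "Currency changed in the target country")
    print("\n".join(rec.diff))
```

Separating the proposal from the approval leaves room for a negotiation step, as in the more centralised approaches, where someone other than the requesting team decides whether a change is accepted.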

Different approaches

A resolutely “decentralised” approach, used in IEA studies, is to leave full responsibility for trend items to the participating country teams and to have cApStAn verifiers identify any differences in trend items without expressing an opinion on whether those differences are desirable or appropriate.

Variations of a more “centralised” approach have been used in PISA surveys, whereby participating country teams review their trend materials and make requests for changes, which are then negotiated. In a first variant, the trend items are still under the control of the participating countries, while in a stricter variant of this approach the trend items are locked for editing and agreed changes are implemented by the international project team, not by the countries.

Open questions

1. Is it sensible to transfer known errors across survey cycles?

Although the idea is difficult to get across, this would be a necessary by-product of the strictest possible approach to trend management: “No changes whatsoever to trend items, under any circumstances”. With such an approach, trend items would not be opened for review at all.

Our take: this should be decided on a case-by-case basis; if an error is corrected, this may mean giving up the item as a trend item and treating it as a new (better, corrected) item.

2. Who should be assigned the role of determining whether a change is acceptable?

In PISA, this is the role of the “Translation Referee”, who advises countries on translation plans, reviews all verification feedback, and negotiates with countries on crucial issues until corrective action is agreed, liaising with item developers as needed.

3. For surveys where national teams in participating countries are responsible for translation and adaptation (decentralized management), is it nevertheless possible for the international project team to manage trend content, with a view to curbing the urge to revise materials and reducing the risks?

The PISA experience seems positive in this regard.

[1] Many international surveys apply a decentralized translation/adaptation model: the participating entities (countries or regions) produce or adapt their survey instruments, whereas guidance and quality assurance (before, during, and after this work) are the remit of the contractors implementing the project.

cApStAn has been responsible for ensuring linguistic equivalence of multiple language versions of various large-scale international surveys and tests since its inception.