Interrater Reliability

For any research program that requires qualitative rating by different researchers, it is important to establish a good level of interrater reliability, also known as interobserver reliability.

This ensures that the generated results meet the accepted criteria for reliability by quantifying the degree of agreement between two or more observers.
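The simplest way to quantify that agreement is percent agreement: the fraction of items on which two raters give the same score. A minimal sketch (the rater names and scores are hypothetical, purely for illustration):

```python
# Illustrative sketch: percent agreement, the simplest measure of
# how often two observers give the same rating to the same items.

def percent_agreement(ratings_a, ratings_b):
    """Fraction of items on which two raters gave identical ratings."""
    if len(ratings_a) != len(ratings_b):
        raise ValueError("Both raters must score the same items")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)

# Two hypothetical raters scoring ten items as pass/fail:
rater_1 = ["pass", "pass", "fail", "pass", "fail",
           "pass", "pass", "fail", "pass", "pass"]
rater_2 = ["pass", "fail", "fail", "pass", "fail",
           "pass", "pass", "pass", "pass", "pass"]
print(percent_agreement(rater_1, rater_2))  # 0.8
```

Percent agreement is easy to read but does not account for agreement that would occur by chance alone, which is why chance-corrected statistics are often preferred.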

Interrater Reliability and the Olympics

Interrater reliability is the most easily understood form of reliability, because everybody has encountered it.

For example, any sport judged by humans, such as Olympic ice skating or a dog show, relies upon the judges maintaining a high degree of consistency with one another. If even one judge scores erratically, this can jeopardize the entire system and deny a participant their rightful prize.

Outside the world of sport and hobbies, interrater reliability has far more important consequences and can directly influence your life.

Examiners marking school and university exams are assessed regularly to ensure that they all adhere to the same standards. This is one of the most important examples of interobserver reliability: it would be extremely unfair to fail an exam simply because the examiner was having a bad day.

For most examination boards, appeals are rare, suggesting that the interrater reliability process is fairly robust.

An Example From Experience

I used to work for a bird protection charity and, every morning, we went down to the seashore to estimate the number of individuals of each bird species.

Obviously, you cannot count thousands of birds individually; apart from the sheer numbers, they constantly move, leaving and rejoining the group. Drawing on experience, we each estimated the numbers and then compared our estimates.

If one person estimated 1000 dunlin, another 4000 and a third 12000, then something was wrong with our estimation process and the results were highly unreliable.

If, however, we independently came up with figures of 4000, 5000 and 6000, then that was accurate enough for our purposes, and we knew that we could use the average with a good degree of confidence.

Qualitative Assessments and Interrater Reliability

Any qualitative assessment using two or more researchers must establish interrater reliability to ensure that the results generated will be useful.

One good example is Bandura's Bobo Doll experiment, which used a scale to rate the levels of displayed aggression in young children. Apart from extensive pre-testing, the observers constantly compared and calibrated their ratings, adjusting their scales to ensure that they were as similar as possible.
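When raters use a scale like this, agreement is commonly quantified with a chance-corrected statistic such as Cohen's kappa, which subtracts the agreement two raters would reach by guessing from their marginal rating frequencies. A minimal sketch, using hypothetical aggression ratings (not Bandura's actual data):

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters (Cohen's kappa)."""
    n = len(ratings_a)
    # Observed agreement: fraction of items rated identically.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected chance agreement from each rater's marginal frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical low/medium/high aggression ratings from two observers:
obs_1 = ["low", "low", "medium", "high", "medium", "low", "high", "medium"]
obs_2 = ["low", "low", "medium", "high", "low", "low", "high", "high"]
print(round(cohens_kappa(obs_1, obs_2), 2))  # 0.63
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance; values above roughly 0.6 are conventionally read as substantial agreement.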

Guidelines and Experience

Interobserver reliability is strengthened by clear guidelines and thorough experience. If observers are given clear and concise instructions about how to rate or estimate behavior, interobserver reliability increases.

Experience is also a great teacher; researchers who have worked together for a long time will be fully aware of each other's strengths, and will be surprisingly similar in their observations.

Citation: 

(Aug 16, 2009). Interrater Reliability. Retrieved Apr 25, 2014 from Explorable.com: https://explorable.com/interrater-reliability