In everyday language, we use the word reliable to mean that something is dependable and gives the same outcome every time. We might describe a football player as reliable, meaning that he performs well game after game.
Reliability and Science
Reliability is something that every scientist, especially in social sciences and biology, must be aware of.
In science, the idea is the same, but it needs a much narrower and unequivocal definition.
Another way of looking at this is as maximizing the inherent repeatability or consistency in an experiment. For maintaining reliability internally, a researcher will use as many repeat sample groups as possible, to reduce the chance of an abnormal sample group skewing the results.
If you use three replicate samples for each manipulation, and one generates completely different results from the others, then there may be something wrong with the experiment.
For many experiments, results follow a 'normal distribution' and there is always a chance that your sample group produces results lying at one of the extremes. Using multiple sample groups will smooth out these extremes and generate a more accurate spread of results.
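The smoothing effect of multiple sample groups can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function name spread_of_estimates, the normal distribution with mean 100 and standard deviation 15, the group size of 10 and the number of trials are all illustrative choices, not values from a real experiment.

```python
import random
from statistics import mean, pstdev


def spread_of_estimates(groups_per_experiment, group_size=10, trials=2000, seed=0):
    """Spread of the overall estimate when each experiment averages
    several normally distributed sample groups.

    All parameters are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    estimates = []
    for _ in range(trials):
        # One mean per sample group, each drawn from the same population.
        group_means = [
            mean(rng.gauss(100, 15) for _ in range(group_size))
            for _ in range(groups_per_experiment)
        ]
        # The experiment's result is the average over its sample groups.
        estimates.append(mean(group_means))
    return pstdev(estimates)


single_group = spread_of_estimates(groups_per_experiment=1)
five_groups = spread_of_estimates(groups_per_experiment=5)
print(f"spread with 1 group:  {single_group:.2f}")
print(f"spread with 5 groups: {five_groups:.2f}")
```

Under these assumptions, the estimate built from five groups should vary noticeably less from one experiment to the next than the estimate built from a single group, which is the sense in which extra sample groups smooth out extremes.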
If your results continue to be wildly different, then there is likely to be something very wrong with your design; it is unreliable.
Reliability and Cold Fusion
Reliability is also extremely important externally, and another researcher should be able to perform exactly the same experiment, with similar equipment, under similar conditions, and achieve exactly the same results. If they cannot, then the design is unreliable.
A good example of a failure to apply the definition of reliability correctly is provided by the cold fusion case of 1989.
Fleischmann and Pons announced to the world that they had achieved nuclear fusion at room temperature, generating excess heat in a simple tabletop apparatus instead of the huge and expensive tori used in most research into nuclear fusion.
This announcement shook the world, but researchers in many other institutions attempted to replicate the experiment, with no success. Whether the researchers lied or genuinely made a mistake is unclear, but their results were clearly unreliable.
Reliability and Statistics
Physical scientists generally expect to obtain the same results every single time, within the limits of measurement error, due to the relative predictability of the physical realms. If you are a nuclear physicist or an inorganic chemist, repeat experiments should give the same results, time after time.
Ecologists and social scientists, on the other hand, understand fully that achieving exactly the same results is an exercise in futility. Research in these disciplines incorporates random factors and natural fluctuations and, whilst any experimental design must attempt to eliminate confounding variables and natural variations, there will always be some disparities.
Reliability and validity are often confused, but the terms actually describe two completely different concepts, although they are often closely inter-related. This distinct difference is best summed up with an example:
A researcher devises a new test that measures IQ more quickly than the standard IQ test:
If the new test delivers scores for a candidate of 87, 65, 143 and 102, then the test is neither reliable nor valid, and it is fatally flawed.
If the test consistently delivers a score of 100 when checked, but the candidate's real IQ is 120, then the test is reliable, but not valid.
If the researcher's test delivers a consistent score of 118, then that is pretty close, and the test can be considered both valid and reliable.
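The three cases above can be sketched in code. This is a rough illustration only: the function name assess and the two thresholds (max_spread and max_bias) are hypothetical choices, not established standards, and the candidate's real IQ is assumed to be 120 in all three cases.

```python
from statistics import mean, pstdev


def assess(scores, true_value, max_spread=5.0, max_bias=5.0):
    """Return (reliable, valid) for repeated scores from one candidate.

    max_spread and max_bias are illustrative thresholds, not standards.
    """
    spread = pstdev(scores)                 # how consistent the repeats are
    bias = abs(mean(scores) - true_value)   # how far the average is from the truth
    reliable = spread <= max_spread
    valid = reliable and bias <= max_bias   # a test cannot be valid yet unreliable
    return reliable, valid


print(assess([87, 65, 143, 102], true_value=120))   # (False, False)
print(assess([100, 100, 100, 100], true_value=120)) # (True, False)
print(assess([118, 118, 118, 118], true_value=120)) # (True, True)
```

Note that validity is only checked once reliability has been established, mirroring the point that a test can be reliable without being valid, but not the other way round.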
Reliability is an essential component of validity but, on its own, is not a sufficient measure of validity. A test can be reliable but not valid, whereas a test cannot be valid yet unreliable.
Reliability, in simple terms, describes the repeatability and consistency of a test. Validity defines the strength of the final results and whether they can be regarded as accurately describing the real world.
The Definition of Reliability - An Example
Imagine that a researcher discovers a new drug that she believes helps people to become more intelligent, a process measured by a series of mental exercises. After analyzing the results, she finds that the group given the drug performed the mental tests much better than the control group.
For her results to be reliable, another researcher must be able to perform exactly the same experiment on another group of people and generate results with the same statistical significance. If repeat experiments fail, then there may be something wrong with the original research.
Testing Reliability for Social Sciences and Education
In the social sciences, testing reliability is a matter of comparing two different versions of the instrument and ensuring that they are similar. When we talk about instruments, we do not necessarily mean a physical instrument, such as a mass-spectrometer or a pH-testing strip; a questionnaire or an exam is an instrument too.
The Test-Retest Method is the simplest method for testing reliability, and involves testing the same subjects at a later date, ensuring that there is a correlation between the results. An educational test retaken after a month should yield the same results as the original.
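The correlation between two sittings of the same test can be computed as in the following sketch. It uses the Pearson correlation coefficient, which is one common choice for this check; the helper name pearson_r and the scores for the five students are invented purely for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cov / sqrt(ss_x * ss_y)


# Hypothetical scores for the same five students, one month apart.
first_sitting = [52, 61, 70, 78, 85]
second_sitting = [55, 60, 72, 80, 84]

r = pearson_r(first_sitting, second_sitting)
print(f"test-retest correlation: {r:.3f}")
```

A value close to 1 suggests that students kept roughly the same ranking and spacing across the two sittings; how high the value must be before a test counts as reliable is a judgment call for the researcher.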
The difficulty with this method is that it assumes that nothing has changed in that time period. Staying with education, if you administer exactly the same test, the student may perform much better because they remember the questions and have thought about the questions.
How many times have you left an exam and, after a couple of hours, thought: “How could I have been so stupid - I knew the answer to that one!” Of course, next time, you will get that question right, so the two sets of results will differ and the test will appear unreliable.
For this reason, if you have to retake an exam, you will be faced with different questions and may be marked a little more strictly to take into account that you had extra time to revise. This is not the complete picture, because the two exams will need to be compared, to ensure that they produce the same results. This shows the importance of reliability in our lives and also highlights the fact that there is no easy way to test it.
For example, sticking with exams, imagine that an examining board wants to test that its new mathematics exam is reliable, and selects a group of test students. For each section of the exam, such as calculus, geometry, algebra and trigonometry, they actually ask two questions, designed to measure the aptitude of the student in that particular area.
If there is a high internal consistency, and the results for the two sets of questions are similar, then the new test is likely to be reliable. The test-retest method involves two separate administrations of the same instrument, whereas internal consistency measures two different versions at the same time.
A horribly complicated statistical formula, called Cronbach's Alpha, tests the reliability by comparing the various pairs of questions but, luckily, computer programs take care of that and spit out a single number, summarizing how reliable the test is!
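For readers curious about what such a program does, here is a minimal sketch of the standard Cronbach's Alpha formula: alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The function name cronbach_alpha and the scores for five students on two paired questions are made up for illustration; real statistical packages offer more options and checks.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's Alpha for a list of items.

    item_scores holds one list per item (question), each containing
    one score per student, in the same student order.
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(item_scores[0])
    if n < 2 or any(len(item) != n for item in item_scores):
        raise ValueError("every item needs the same number (>= 2) of scores")

    # Sum of the sample variances of the individual items.
    item_variance_sum = sum(variance(item) for item in item_scores)
    # Sample variance of each student's total score across all items.
    totals = [sum(scores) for scores in zip(*item_scores)]
    total_variance = variance(totals)
    if total_variance == 0:
        raise ValueError("alpha is undefined when all total scores are identical")

    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical marks for five students on two paired calculus questions.
question_a = [4, 6, 7, 9, 5]
question_b = [5, 6, 8, 9, 4]

alpha = cronbach_alpha([question_a, question_b])
print(f"Cronbach's Alpha: {alpha:.2f}")
```

The closer the value is to 1, the more consistently the paired questions rank the students; two items that give identical scores yield exactly 1.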
Reliability - One of the Foundations of Science
As we have seen, understanding the definition of reliability is extremely important for any scientist but, for social scientists, biologists and psychologists, amongst others, it is a crucial foundation of any research design. If any test is not reliable then it cannot be valid and the experiment is a waste of time.
For this reason, extensive research programs always involve a number of pre-tests, ensuring that all of the instruments used are consistent. Even physical scientists perform instrumental pretests, ensuring that all of their measuring equipment is calibrated against established standards.