Research Design: Measurement, Reliability, and Validity

What is External Validity?

Perri and Lichtenwald provide a starting point for a discussion about a wide range of reliability and validity topics in their analysis of a wrongful murder conviction.

Validity in Research Design

Validity is used to determine whether research measures what it intended to measure and to approximate the truthfulness of the results. Researchers often use their own definitions of what counts as valid. In quantitative research, testing for validity and reliability is a given.

However, some qualitative researchers have gone so far as to suggest that validity does not apply to their research, even as they acknowledge the need for some qualifying checks or measures in their work. To disregard validity is to put the trustworthiness of your work in question and to undermine others' confidence in its results.

Even when qualitative measures are used in research, they need to be evaluated for reliability and validity in order to sustain the trustworthiness of the results.

Quality research depends on a commitment to testing and increasing both the validity and the reliability of your research results. Any research worth its salt is concerned with whether what is being measured is what is intended to be measured, and considers the ways in which observations are influenced by the circumstances in which they are made.

The basis on which our conclusions are made plays an important role in addressing the broader substantive issues of any given study. For this reason we are going to look at the various validity types that have been formulated as part of legitimate research methodology. The first of these, face validity, is the least scientific type of validity, as it is not quantified using statistical methods; it is not validity in the technical sense of the term.

It is concerned with whether it seems like we measure what we claim to measure. Here we look at how valid a measure appears on the surface and make subjective judgments based on that appearance. In research it is never sufficient to rely on face judgments alone; more quantifiable methods of validity are necessary in order to draw acceptable conclusions. Because there are many instruments of measurement to consider, face validity is useful as a first pass for distinguishing one approach from another.

Face validity should never be trusted on its own merits. Content validity is also a subjective measure, but unlike face validity it asks whether the content of a measure covers the full domain of the construct being measured.

If a researcher wanted to measure introversion, they would first have to decide what constitutes a relevant domain of content for that trait. Where content validity distinguishes itself is through its use of experts in the field or individuals belonging to a target population. Such a study can be made more objective through the use of rigorous statistical tests. For example, a content validity study can inform researchers how well the items used in a survey represent their content domain, how clear they are, and the extent to which they maintain the theoretical factor structure assessed by factor analysis.
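
As an illustration of how such expert judgments can be quantified, the sketch below (Python, with hypothetical ratings) computes an item-level content validity index: the proportion of experts who rate an item as relevant to the domain. The source does not prescribe this particular statistic, so treat it only as one possible approach.

    # Minimal sketch of an item-level content validity index (I-CVI)
    # using hypothetical expert ratings; not a prescribed procedure.
    import numpy as np

    # Rows = items on a draft introversion survey, columns = expert judges.
    # Ratings use a 4-point relevance scale (1 = not relevant ... 4 = highly relevant).
    ratings = np.array([
        [4, 3, 4, 4, 3],
        [2, 3, 2, 1, 2],
        [4, 4, 3, 4, 4],
    ])

    # I-CVI: proportion of experts rating the item 3 or 4.
    i_cvi = (ratings >= 3).mean(axis=1)
    for item, cvi in enumerate(i_cvi, start=1):
        print(f"item {item}: I-CVI = {cvi:.2f}")  # low values flag items for revision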

A construct is a collection of behaviors that are associated in a meaningful way to create an image or an idea invented for a research purpose. Depression, for example, is a construct representing a personality trait that manifests itself in behaviors such as oversleeping, loss of appetite, difficulty concentrating, and so on. The existence of a construct is inferred by observing the collection of related indicators.
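
A minimal sketch of this idea, using hypothetical indicator scores: if indicators such as oversleeping, loss of appetite, and difficulty concentrating reflect one underlying construct, they should be positively intercorrelated.

    # Minimal sketch: do the indicators of a construct "hang together"?
    # The data below are invented for illustration only.
    import numpy as np

    # Columns: oversleeping, loss of appetite, difficulty concentrating (one row per respondent).
    indicators = np.array([
        [6, 5, 7],
        [2, 1, 2],
        [5, 6, 6],
        [1, 2, 1],
        [7, 6, 5],
        [3, 2, 3],
    ])

    # Pairwise correlations among the indicators; strong positive values
    # are consistent with a single underlying construct.
    print(np.corrcoef(indicators, rowvar=False).round(2))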

Any one indicator, however, may be associated with several constructs; a person with difficulty concentrating, for example, may be exhibiting any of several different constructs. Construct validity is the degree to which inferences can legitimately be made from the operationalizations in your study, which connect concepts to observations, to the theoretical constructs on which those operationalizations are based.

To establish construct validity you must first provide evidence that your data support the theoretical structure. You must also show that you control the operationalization of the construct; in other words, show that your theory has some correspondence with reality. Internal validity, by contrast, refers to the extent to which the independent variable can accurately be stated to produce the observed effect. If the effect on the dependent variable is due only to the independent variable(s), then internal validity is achieved.

In the simplest pre-experimental design, the one-shot case study (X O), a group is introduced to a treatment or condition (X) and then observed (O) for changes, which are attributed to the treatment. The problems with this design are a total lack of control and very little scientific value, since securing scientific evidence requires making a comparison and recording differences or contrasts.

Adding a pretest gives the one-group pretest-posttest design (O1 X O2). However, there exist threats to the validity of the assertion that X produced the change from O1 to O2. History -- between O1 and O2 many events may have occurred apart from X to produce the differences in outcomes; the longer the time lapse between O1 and O2, the more likely history becomes a threat. In the static-group comparison (X O1 versus O2), threats to validity include: Selection -- the groups selected may actually be disparate prior to any treatment.

Three True Experimental Designs

The next three designs discussed are the most strongly recommended designs. An explanation of how the first of these controls for the threats above follows. History -- this is controlled in that the general history events which may have contributed to the O1 and O2 effects would also produce the O3 and O4 effects.

This is true only if the experiment is run in a specific manner -- meaning that you may not test the treatment and control groups at different times and in vastly different settings, as these differences may affect the results.

Rather, you must test the control and experimental groups simultaneously. Intrasession history must also be taken into consideration. For example, if the groups truly are run simultaneously, then different experimenters must be involved, and the differences between the experimenters may contribute to effects. A solution to history in this case is the randomization of experimental occasions -- balanced in terms of experimenter, time of day, day of the week, and so on.
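
One way to implement such balancing, sketched here with hypothetical labels for experimenters, times, and conditions, is to enumerate every experimenter-by-time-by-condition combination once and then randomize the running order.

    # Minimal sketch of balanced randomization of experimental occasions.
    # Labels are hypothetical placeholders, not part of the original text.
    import itertools
    import random

    experimenters = ["A", "B"]
    times_of_day = ["morning", "afternoon"]
    conditions = ["treatment", "control"]

    # Every combination appears exactly once, so the schedule is balanced;
    # shuffling then randomizes the order in which occasions are run.
    occasions = list(itertools.product(experimenters, times_of_day, conditions))
    random.shuffle(occasions)
    for i, (experimenter, time, condition) in enumerate(occasions, start=1):
        print(f"occasion {i}: experimenter {experimenter}, {time}, {condition} group")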

The factors described so far affect internal validity; they could produce changes which may be interpreted as the result of the treatment. These are called main effects, and they have been controlled in this design, giving it internal validity. However, in this design there are also threats to external validity, called interaction effects because they involve the interaction of the treatment with some other variable, and it is that interaction which causes the threat to validity.

It is important to note here that external validity or generalizability always turns out to involve extrapolation into a realm not represented in one's sample.

In contrast, problems of internal validity are solvable within the limits of the logic of probability statistics. This means that we can control for internal validity based on probability statistics within the experiment conducted; external validity or generalizability, however, cannot be established in the same way, because we cannot logically extrapolate to conditions not represented in the experiment.

This reflects Hume's truism that induction, or generalization, is never fully justified logically. Interaction of testing and X -- because the interaction between taking a pretest and the treatment itself may affect the results of the experimental group, it is desirable to use a design which does not employ a pretest.

Research should be conducted in schools in this manner: ideas for research should originate with teachers or other school personnel, the designs for this research should be worked out with someone expert in research methodology, and the research itself should be carried out by those who came up with the research idea. Results should be analyzed by the expert, and the final interpretation delivered by an intermediary.

Tests of significance for this design -- although this design may be developed and conducted appropriately, statistical tests of significance are not always used appropriately.

Wrong statistic in common use -- many researchers compute two t-tests, one for the pre-post difference in the experimental group and one for the pre-post difference in the control group.

If the experimental group's t-test is statistically significant while the control group's is not, the treatment is said to have an effect. However, this does not take into consideration how "close" the two tests may really have been. A better procedure is to run a 2x2 repeated-measures (mixed) ANOVA, testing the pre-post difference as the within-subject factor, the group difference as the between-subject factor, and the interaction effect of the two factors.
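
The contrast between the two approaches can be sketched as follows, using simulated, hypothetical data. With only two time points, the group-by-time interaction from the mixed ANOVA is equivalent to an independent-samples t-test on the pre-to-post gain scores, which is what the sketch uses to keep it simple.

    # Minimal sketch: two separate paired t-tests vs. a direct test of the
    # group-by-time interaction (via gain scores). Data are simulated.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # Hypothetical data: 20 subjects per group, pretest and posttest scores.
    exp_pre = rng.normal(50, 10, 20)
    exp_post = exp_pre + rng.normal(5, 10, 20)   # treatment group improves on average
    ctl_pre = rng.normal(50, 10, 20)
    ctl_post = ctl_pre + rng.normal(2, 10, 20)   # control group drifts slightly

    # Common but weaker approach: two separate paired t-tests.
    t_exp, p_exp = stats.ttest_rel(exp_post, exp_pre)
    t_ctl, p_ctl = stats.ttest_rel(ctl_post, ctl_pre)
    print(f"experimental pre-post: t={t_exp:.2f}, p={p_exp:.3f}")
    print(f"control pre-post:      t={t_ctl:.2f}, p={p_ctl:.3f}")

    # Better: test the group-by-time interaction directly. With two time points,
    # this is equivalent to comparing the gain scores between groups.
    exp_gain = exp_post - exp_pre
    ctl_gain = ctl_post - ctl_pre
    t_int, p_int = stats.ttest_ind(exp_gain, ctl_gain)
    print(f"group x time interaction (gain scores): t={t_int:.2f}, p={p_int:.3f}")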

What is Reliability?

Internal validity dictates how an experimental design is structured and encompasses all of the steps of the scientific research method. Even if your results are strong, a sloppy and inconsistent design will compromise your integrity in the eyes of the scientific community.

Please note that the validity discussed here is in the context of experimental design, not in the context of measurement. Internal validity refers specifically to whether an experimental treatment or condition makes a difference, and whether there is sufficient evidence to support that claim.

In general, VALIDITY is an indication of how sound your research is. More specifically, validity applies to both the design and the methods of your research. Validity in data collection means that your findings truly represent the phenomenon you are claiming to measure; valid claims are solid claims. This brings us to the measurement of constructs and to the reliability of measures. Reliability is consistency in measurement over repeated measures, and reliable measures are those with low random (chance) error. Reliability is assessed by one of four methods, the first of which is the retest method.
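
As a minimal sketch of the retest method, using hypothetical scores: administer the same measure twice to the same respondents and correlate the two sets of scores; a high positive correlation indicates consistency over repeated measurement.

    # Minimal sketch of test-retest reliability with hypothetical scores.
    import numpy as np
    from scipy import stats

    time1 = np.array([12, 18, 25, 31, 22, 15, 28, 19])   # scores at first administration
    time2 = np.array([14, 17, 27, 30, 24, 13, 29, 21])   # same respondents, later administration

    r, p = stats.pearsonr(time1, time2)
    print(f"test-retest reliability r = {r:.2f} (p = {p:.3f})")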