
Reliability vs Validity: Differences & Examples

By Jim Frost

Reliability and validity are criteria by which researchers assess measurement quality. Measuring a person or item involves assigning scores to represent an attribute. This process creates the data that we analyze. However, to provide meaningful research results, that data must be good. And not all data are good!


For data to be good enough to allow you to draw meaningful conclusions from a research study, they must be reliable and valid. What are the properties of good measurements? In a nutshell, reliability relates to the consistency of measures, and validity addresses whether the measurements are quantifying the correct attribute.

In this post, learn about reliability vs. validity, their relationship, and the various ways to assess them.

Learn more about Experimental Design: Definition, Types, and Examples .

Reliability

Reliability refers to the consistency of the measure. High reliability indicates that the measurement system produces similar results under the same conditions. If you measure the same item or person multiple times, you want to obtain comparable values. They are reproducible.

If you take measurements multiple times and obtain very different values, your data are unreliable. Numbers are meaningless if repeated measures do not produce similar values. What’s the correct value? No one knows! This inconsistency hampers your ability to draw conclusions and understand relationships.

Suppose you have a bathroom scale that displays very inconsistent results from one time to the next. It’s very unreliable. It would be hard to use your scale to determine your correct weight and to know whether you are losing weight.

Inadequate data collection procedures and low-quality or defective data collection tools can produce unreliable data. Additionally, some characteristics are more challenging to measure reliably. For example, the length of an object is concrete. On the other hand, psychological constructs, such as conscientiousness, depression, and self-esteem, can be trickier to measure reliably.

When assessing studies, evaluate data collection methodologies and consider whether any issues undermine their reliability.

Validity

Validity refers to whether the measurements reflect what they’re supposed to measure. This concept is a broader issue than reliability. Researchers need to consider whether they’re measuring what they think they’re measuring, or whether the measurements reflect something else entirely. It’s a question that addresses the appropriateness of the data rather than whether measurements are repeatable.

Validity is a smaller concern for tangible measurements like height and weight. You might have a biased bathroom scale if it tends to read too high or too low—but it still measures weight. Validity is a bigger concern in the social sciences, where you can measure elusive concepts such as positive outlook and self-esteem. If you’re assessing the psychological construct of conscientiousness, you need to confirm that the instrument poses questions that appraise this attribute rather than, say, obedience.

Reliability vs Validity

A measurement must be reliable first before it has a chance of being valid. After all, if you don’t obtain consistent measurements for the same object or person under similar conditions, it can’t be valid. If your scale displays a different weight every time you step on it, it’s unreliable, and it is also invalid.

So, having reliable measurements is the first step towards having valid measures. Reliability is necessary for validity, but it is not sufficient by itself.

Suppose you have a reliable measurement. You step on your scale a few times in a short period, and it displays very similar weights. It’s reliable. But the weight might be incorrect.

Just because you can measure the same object multiple times and get consistent values, it does not necessarily indicate that the measurements reflect the desired characteristic.

How can you determine whether measurements are both valid and reliable? Assessing reliability vs. validity is the topic for the rest of this post!

Reliability: Similar measurements for the same person/item under the same conditions. Stability of results across time, between observers, and within the test. Unreliable measurements typically cannot be valid.

Validity: Measurements reflect what they’re supposed to measure. Measures have appropriate relationships to theories, similar measures, and different measures. Valid measurements are also reliable.

How to Assess Reliability

Reliability relates to measurement consistency. To evaluate reliability, analysts assess consistency over time, within the measurement instrument, and between different observers. These types of consistency are known as test-retest, internal, and inter-rater reliability, respectively. Typically, appraising these forms of reliability involves taking multiple measures of the same person, object, or construct and assessing scatterplots and correlations of the measurements. Reliable measurements have high correlations because the scores are similar.

Test-Retest Reliability

Analysts often assume that measurements should be consistent across a short time. If you measure your height twice over a couple of days, you should obtain roughly the same measurements.

To assess test-retest reliability, the experimenters typically measure a group of participants on two occasions within a few days. Usually, you’ll evaluate the reliability of the repeated measures using scatterplots and correlation coefficients . You expect to see high correlations and tight lines on the scatterplot when the characteristic you measure is consistent over a short period, and you have a reliable measurement system.
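To make this concrete, here is a minimal Python sketch of a test-retest check. The participant scores and variable names are hypothetical; the point is simply that a high correlation between the two measurement occasions indicates consistent measurements.

```python
# Minimal sketch of a test-retest reliability check (hypothetical data).
import numpy as np
from scipy import stats

# Scores for eight participants measured on two occasions a few days apart
time_1 = np.array([72.1, 65.4, 80.3, 55.0, 68.7, 90.2, 77.5, 61.8])
time_2 = np.array([71.8, 66.0, 79.9, 56.1, 69.3, 89.5, 78.0, 60.9])

r, p_value = stats.pearsonr(time_1, time_2)
print(f"Test-retest correlation: r = {r:.3f} (p = {p_value:.4f})")
# r close to 1 suggests high test-retest reliability; a scatterplot of
# time_1 vs. time_2 should show points falling along a tight line.
```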

This type of reliability establishes the degree to which a test can produce stable, consistent scores across time. However, in practice, measurement instruments are never entirely consistent.

Keep in mind that some characteristics should not be consistent across time. A good example is your mood, which can change from moment to moment. A test-retest assessment of mood is not likely to produce a high correlation even though it might be a useful measurement instrument.

Internal Reliability

This type of reliability assesses consistency across items within a single instrument. Researchers evaluate internal reliability when they’re using instruments such as a survey or personality inventories. In these instruments, multiple items relate to a single construct. Questions that measure the same characteristic should have a high correlation. People who indicate they are risk-takers should also note that they participate in dangerous activities. If items that supposedly measure the same underlying construct have a low correlation, they are not consistent with each other and might not measure the same thing.

Inter-Rater Reliability

This type of reliability assesses consistency across different observers, judges, or evaluators. When various observers produce similar measurements for the same item or person, their scores are highly correlated. Inter-rater reliability is essential when the subjectivity or skill of the evaluator plays a role. For example, assessing the quality of a writing sample involves subjectivity. Researchers can employ rating guidelines to reduce subjectivity. Comparing the scores from different evaluators for the same writing sample helps establish the measure’s reliability. Learn more about inter-rater reliability .
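As an illustration, the following is a minimal Python sketch of an inter-rater comparison for two evaluators scoring the same ten writing samples. The scores are hypothetical; in practice you might also use a dedicated agreement statistic such as Cohen's kappa or an intraclass correlation.

```python
# Minimal sketch of an inter-rater reliability check (hypothetical data).
import numpy as np
from scipy import stats

# Two evaluators score the same ten writing samples on a 0-100 scale
rater_a = np.array([85, 72, 90, 60, 78, 88, 65, 70, 95, 80])
rater_b = np.array([82, 75, 88, 62, 80, 85, 68, 74, 93, 79])

r, _ = stats.pearsonr(rater_a, rater_b)
mean_abs_diff = np.mean(np.abs(rater_a - rater_b))
print(f"Correlation between raters: r = {r:.3f}")
print(f"Mean absolute difference:   {mean_abs_diff:.1f} points")
# A high correlation and a small average difference suggest that the
# rating guidelines yield consistent scores across evaluators.
```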

Related post : Interpreting Correlation

Cronbach’s Alpha

Cronbach’s alpha measures the internal consistency, or reliability, of a set of survey items. Use this statistic to help determine whether a collection of items consistently measures the same characteristic. Learn more about Cronbach’s Alpha .
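As a rough illustration, here is a minimal Python sketch that computes Cronbach's alpha directly from its standard formula for a hypothetical four-item survey scale; the response data are invented for the example.

```python
# Minimal sketch of Cronbach's alpha for a hypothetical 4-item scale:
#   alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
import numpy as np

# Rows = respondents, columns = items (e.g., 1-5 Likert responses)
items = np.array([
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
    [3, 2, 3, 3],
])

k = items.shape[1]
item_variances = items.var(axis=0, ddof=1)      # variance of each item
total_variance = items.sum(axis=1).var(ddof=1)  # variance of summed scores

alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(f"Cronbach's alpha = {alpha:.3f}")
# Values around 0.7 or higher are often treated as acceptable internal
# consistency, though the appropriate threshold depends on the application.
```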

Gage R&R Studies

These studies evaluate a measurement system’s reliability and identify sources of variation, which can help you target improvement efforts effectively. Learn more about Gage R&R Studies.

How to Assess Validity

Validity is more difficult to evaluate than reliability. After all, with reliability, you only assess whether the measures are consistent across time, within the instrument, and between observers. On the other hand, evaluating validity involves determining whether the instrument measures the correct characteristic. This process frequently requires examining relationships between these measurements, other data, and theory. Validating a measurement instrument requires you to use a wide range of subject-area knowledge and different types of constructs to determine whether the measurements from your instrument fit in with the bigger picture!

An instrument with high validity produces measurements that correctly fit the larger picture with other constructs. Validity assesses whether the web of empirical relationships aligns with the theoretical relationships.

The measurements must have a positive relationship with other measures of the same construct. Additionally, they need to correlate in the correct direction (positively or negatively) with the theoretically correct constructs. Finally, the measures should have no relationship with unrelated constructs.
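The sketch below illustrates these checks with simulated data: a hypothetical self-esteem score is correlated with a related construct, a theoretically opposite construct, and an unrelated variable. All names and numbers are invented for illustration.

```python
# Minimal sketch of construct validity checks using simulated scores.
# Expected pattern: positive, negative, and near-zero correlations.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 100

self_esteem = rng.normal(50, 10, n)
optimism = 0.7 * self_esteem + rng.normal(0, 8, n)       # related construct
hopelessness = -0.6 * self_esteem + rng.normal(0, 9, n)  # opposite construct
shoe_size = rng.normal(42, 2, n)                         # unrelated variable

for name, scores in [("optimism", optimism),
                     ("hopelessness", hopelessness),
                     ("shoe size", shoe_size)]:
    r, _ = stats.pearsonr(self_esteem, scores)
    print(f"self-esteem vs {name:12s}: r = {r:+.2f}")
# If the empirical correlations match the theoretical relationships among
# the constructs, that supports the instrument's construct validity.
```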

If you need more detailed information, read my post that focuses on Measurement Validity . In that post, I cover the various types, how to evaluate them, and provide examples.

Experimental validity relates to experimental designs and methods. To learn about that topic, read my post about Internal and External Validity .

Whew, that’s a lot of information about reliability vs. validity. Using these concepts, you can determine whether a measurement instrument produces good data!



Reliability vs Validity in Research | Differences, Types & Examples

Published on 3 May 2022 by Fiona Middleton. Revised on 10 October 2022.

Reliability and validity are concepts used to evaluate the quality of research. They indicate how well a method , technique, or test measures something. Reliability is about the consistency of a measure, and validity is about the accuracy of a measure.

It’s important to consider reliability and validity when you are creating your research design , planning your methods, and writing up your results, especially in quantitative research .

Reliability vs validity

What does it tell you?
  • Reliability: The extent to which the results can be reproduced when the research is repeated under the same conditions.
  • Validity: The extent to which the results really measure what they are supposed to measure.

How is it assessed?
  • Reliability: By checking the consistency of results across time, across different observers, and across parts of the test itself.
  • Validity: By checking how well the results correspond to established theories and other measures of the same concept.

How do they relate?
  • Reliability: A reliable measurement is not always valid: the results might be reproducible, but they’re not necessarily correct.
  • Validity: A valid measurement is generally reliable: if a test produces accurate results, they should be reproducible.

Table of contents

  • Understanding reliability vs validity
  • How are reliability and validity assessed?
  • How to ensure validity and reliability in your research
  • Where to write about reliability and validity in a thesis

Reliability and validity are closely related, but they mean different things. A measurement can be reliable without being valid. However, if a measurement is valid, it is usually also reliable.

What is reliability?

Reliability refers to how consistently a method measures something. If the same result can be consistently achieved by using the same methods under the same circumstances, the measurement is considered reliable.

What is validity?

Validity refers to how accurately a method measures what it is intended to measure. If research has high validity, that means it produces results that correspond to real properties, characteristics, and variations in the physical or social world.

High reliability is one indicator that a measurement is valid. If a method is not reliable, it probably isn’t valid.

However, reliability on its own is not enough to ensure validity. Even if a test is reliable, it may not accurately reflect the real situation.

Validity is harder to assess than reliability, but it is even more important. To obtain useful results, the methods you use to collect your data must be valid: the research must be measuring what it claims to measure. This ensures that your discussion of the data and the conclusions you draw are also valid.


Reliability can be estimated by comparing different versions of the same measurement. Validity is harder to assess, but it can be estimated by comparing the results to other relevant data or theory. Methods of estimating reliability and validity are usually split up into different types.

Types of reliability

Different types of reliability can be estimated through various statistical methods.

  • Test-retest reliability assesses the consistency of a measure across time: do you get the same results when you repeat the measurement? Example: A group of participants complete a questionnaire designed to measure personality traits. If they repeat the questionnaire days, weeks, or months apart and give the same answers, this indicates high test-retest reliability.
  • Inter-rater reliability assesses the consistency of a measure across observers: do you get the same results when different people conduct the same measurement? Example: Based on an assessment criteria checklist, five examiners submit substantially different results for the same student project. This indicates that the assessment checklist has low inter-rater reliability (for example, because the criteria are too subjective).
  • Internal consistency assesses the consistency of the measurement itself: do you get the same results from different parts of a test that are designed to measure the same thing? Example: You design a questionnaire to measure self-esteem. If you randomly split the results into two halves, there should be a strong correlation between the two sets of results. If the two results are very different, this indicates low internal consistency. (A code sketch of this split-half idea follows below.)
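Here is a minimal Python sketch of the split-half idea described above, using simulated questionnaire data; the Spearman-Brown formula then adjusts the half-test correlation to estimate reliability for the full test. The data and parameters are illustrative only.

```python
# Minimal sketch of split-half internal consistency (simulated data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_respondents, n_items = 50, 10
true_trait = rng.normal(0, 1, n_respondents)
# Each item reflects the underlying trait plus item-specific noise
responses = true_trait[:, None] + rng.normal(0, 0.8, (n_respondents, n_items))

odd_half = responses[:, 0::2].sum(axis=1)   # items 1, 3, 5, ...
even_half = responses[:, 1::2].sum(axis=1)  # items 2, 4, 6, ...

r_half, _ = stats.pearsonr(odd_half, even_half)
spearman_brown = 2 * r_half / (1 + r_half)
print(f"Split-half correlation: {r_half:.3f}")
print(f"Spearman-Brown estimate of full-test reliability: {spearman_brown:.3f}")
```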

Types of validity

The validity of a measurement can be estimated based on three main types of evidence. Each type can be evaluated through expert judgement or statistical methods.

  • Construct validity: The adherence of a measure to existing theory and knowledge of the concept being measured. Example: A self-esteem questionnaire could be assessed by measuring other traits known or assumed to be related to the concept of self-esteem (such as social skills and optimism). Strong correlation between the scores for self-esteem and associated traits would indicate high construct validity.
  • Content validity: The extent to which the measurement covers all aspects of the concept being measured. Example: A test that aims to measure a class of students’ level of Spanish contains reading, writing, and speaking components, but no listening component. Experts agree that listening comprehension is an essential aspect of language ability, so the test lacks content validity for measuring the overall level of ability in Spanish.
  • Criterion validity: The extent to which the result of a measure corresponds to other valid measures of the same concept. Example: A survey is conducted to measure the political opinions of voters in a region. If the results accurately predict the later outcome of an election in that region, this indicates that the survey has high criterion validity.

To assess the validity of a cause-and-effect relationship, you also need to consider internal validity (the design of the experiment ) and external validity (the generalisability of the results).

The reliability and validity of your results depend on creating a strong research design, choosing appropriate methods and samples, and conducting the research carefully and consistently.

Ensuring validity

If you use scores or ratings to measure variations in something (such as psychological traits, levels of ability, or physical properties), it’s important that your results reflect the real variations as accurately as possible. Validity should be considered in the very earliest stages of your research, when you decide how you will collect your data .

  • Choose appropriate methods of measurement

Ensure that your method and measurement technique are of high quality and targeted to measure exactly what you want to know. They should be thoroughly researched and based on existing knowledge.

For example, to collect data on a personality trait, you could use a standardised questionnaire that is considered reliable and valid. If you develop your own questionnaire, it should be based on established theory or the findings of previous studies, and the questions should be carefully and precisely worded.

  • Use appropriate sampling methods to select your subjects

To produce valid generalisable results, clearly define the population you are researching (e.g., people from a specific age range, geographical location, or profession). Ensure that you have enough participants and that they are representative of the population.

Ensuring reliability

Reliability should be considered throughout the data collection process. When you use a tool or technique to collect data, it’s important that the results are precise, stable, and reproducible.

  • Apply your methods consistently

Plan your method carefully to make sure you carry out the same steps in the same way for each measurement. This is especially important if multiple researchers are involved.

For example, if you are conducting interviews or observations, clearly define how specific behaviours or responses will be counted, and make sure questions are phrased the same way each time.

  • Standardise the conditions of your research

When you collect your data, keep the circumstances as consistent as possible to reduce the influence of external factors that might create variation in the results.

For example, in an experimental setup, make sure all participants are given the same information and tested under the same conditions.

It’s appropriate to discuss reliability and validity in various sections of your thesis, dissertation, or research paper. Showing that you have taken them into account in planning your research and interpreting the results makes your work more credible and trustworthy.

Reliability and validity in a thesis

  • Literature review: What have other researchers done to devise and improve methods that are reliable and valid?
  • Methodology: How did you plan your research to ensure reliability and validity of the measures used? This includes the chosen sample set and size, sample preparation, external conditions, and measuring techniques.
  • Results: If you calculate reliability and validity, state these values alongside your main results.
  • Discussion: This is the moment to talk about how reliable and valid your results actually were. Were they consistent, and did they reflect true values? If not, why not?
  • Conclusion: If reliability and validity were a big problem for your findings, it might be helpful to mention this here.



Reliability and Validity – Definitions, Types & Examples

Published by Alvin Nicolas on August 16th, 2021. Revised on October 26, 2023.

A researcher must test the collected data before drawing any conclusions. Every research design needs to be concerned with reliability and validity to measure the quality of the research.

What is Reliability?

Reliability refers to the consistency of the measurement. Reliability shows how trustworthy the score of the test is. If the collected data shows the same results after being tested using various methods and sample groups, the information is reliable. Reliability is a prerequisite for validity, but a reliable method does not by itself guarantee valid results.

Example: If you weigh yourself on a weighing scale throughout the day, you’ll get the same results. These are considered reliable results obtained through repeated measures.

Example: A teacher conducts the same math test with her students and repeats it the next week with the same questions. If the students get the same scores, then the reliability of the test is high.

What is Validity?

Validity refers to the accuracy of the measurement. Validity shows how a specific test is suitable for a particular situation. If the results are accurate according to the researcher’s situation, explanation, and prediction, then the research is valid. 

If the method of measuring is accurate, then it’ll produce accurate results. If a method is not reliable, it cannot be valid. However, a reliable method is not automatically valid.

Example:  Your weighing scale shows different results each time you weigh yourself within a day even after handling it carefully, and weighing before and after meals. Your weighing machine might be malfunctioning. It means your method had low reliability. Hence you are getting inaccurate or inconsistent results that are not valid.

Example: Suppose a questionnaire is distributed among a group of people to check the quality of a skincare product, and the same questionnaire is repeated with many groups. If you get the same response from the various participants, the questionnaire has high reliability, which supports, but does not by itself establish, its validity.

Most of the time, validity is difficult to measure even though the process of measurement is reliable. It isn’t easy to interpret the real situation.

Example: If the weighing scale shows the same result, let’s say 70 kg each time, even though your actual weight is 55 kg, then the weighing scale is malfunctioning. It shows consistent results, so it is reliable, but those results are inaccurate. The method has high reliability but low validity.

Internal Vs. External Validity

One of the key features of randomised designs is that they have significantly high internal and external validity.

Internal validity  is the ability to draw a causal link between your treatment and the dependent variable of interest. It means the observed changes should be due to the experiment conducted, and any external factor should not influence the  variables .

Examples of such external factors: age, education level, height, and grade.

External validity  is the ability to identify and generalise your study outcomes to the population at large. The relationship between the study’s situation and the situations outside the study is considered external validity.

Also, read about Inductive vs Deductive reasoning in this article.


Threats to Internal Validity

  • Confounding factors: Unexpected events during the experiment that are not part of the treatment. Example: You attribute the increased weight of your participants to a lack of physical activity, when it was actually due to their consumption of coffee with sugar.
  • Maturation: Changes in participants due to the passage of time that influence the results. Example: During a long-term experiment, subjects may become tired, bored, and hungry.
  • Testing: The results of one test affect the results of another test. Example: Participants of the first experiment may react differently during the second experiment.
  • Instrumentation: Changes in the instrument’s calibration. Example: A change in the measuring instrument may give different results instead of the expected results.
  • Statistical regression: Groups selected on the basis of extreme scores are not as extreme on subsequent testing. Example: Students who failed the pre-final exam are likely to pass the final exam; they might be more confident and careful than before.
  • Selection bias: Choosing comparison groups without randomisation. Example: A group of trained and efficient teachers is selected to teach children communication skills instead of selecting teachers randomly.
  • Experimental mortality: Participants may leave the experiment if it runs longer than planned. Example: Because of multi-tasking and competing commitments, participants may drop out because they are dissatisfied with the time extension, even if they were doing well.

Threats to External Validity

  • Reactive/interactive effects of testing: Participants in a pre-test may become aware of the upcoming experiment, and the treatment may not be effective without the pre-test. Example: Students who failed the pre-final exam are likely to pass the final exam; they might be more confident and careful than before.
  • Selection of participants: When a group of participants is selected for specific characteristics, the treatment may work only on participants possessing those characteristics. Example: If an experiment is conducted specifically on the health issues of pregnant women, the same treatment cannot be applied to male participants.

How to Assess Reliability and Validity?

Reliability can be measured by comparing the consistency of the procedure and its results. There are various methods to measure validity and reliability. Reliability can be measured through various statistical methods depending on the type of reliability, as explained below:

Types of Reliability

  • Test-retest reliability: Measures the consistency of results at different points in time; it identifies whether the results stay the same after repeated measures. Example: Suppose a questionnaire about the quality of a skincare product is given to a group of people and then repeated with the same group some time later. If the responses are the same both times, the questionnaire has high test-retest reliability.
  • Inter-rater reliability: Measures the consistency of results obtained at the same time by different raters (researchers). Example: Suppose five researchers measure the academic performance of the same student, using questions drawn from all the academic subjects, and submit widely different results. This shows that the assessment has low inter-rater reliability.
  • Parallel forms reliability: Measures equivalence between different forms of the same test performed on the same participants. Example: Suppose the same researcher conducts two different forms of a test on the same topic with the same students, such as a written and an oral test. If the results are the same, the parallel-forms reliability of the test is high; otherwise, it is low.
  • Internal consistency: Measures the consistency of the items within the measurement. The results of the same test are split into two halves and compared with each other. Example: If there is a large difference between the two halves, the internal consistency of the test is low.

Types of Validity

As we discussed above, the reliability of the measurement alone cannot determine its validity. Validity is difficult to measure even if the method is reliable. The following types of tests are conducted to measure validity.

  • Content validity: Shows whether all the aspects of the test/measurement are covered. Example: A language test is designed to measure reading, writing, listening, and speaking skills; covering all of these indicates that the test has high content validity.
  • Face validity: Concerns whether a test appears, on its surface, to measure what it is supposed to measure. Example: The types of questions included in the question paper, the time and marks allotted, and the number and categories of questions: on the face of it, is this a good paper for measuring the academic performance of students?
  • Construct validity: Shows whether the test measures the correct construct (ability, attribute, trait, or skill). Example: Does a test designed to measure communication skills actually measure communication skills?
  • Criterion validity: Shows whether the test scores obtained are similar to other measures of the same concept. Example: The results of a pre-final exam accurately predict the results of the later final exam, which shows that the test has high criterion validity.


How to Increase Reliability?

  • Use an appropriate questionnaire to measure the competency level.
  • Ensure a consistent environment for participants
  • Make the participants familiar with the criteria of assessment.
  • Train the participants appropriately.
  • Analyse the research items regularly to avoid poor performance.

How to Increase Validity?

Ensuring validity is not an easy job either. Steps that help ensure validity are given below:

  • Reactivity should be minimised as a primary concern.
  • The Hawthorne effect should be reduced.
  • The respondents should be motivated.
  • The intervals between the pre-test and post-test should not be lengthy.
  • Dropout rates should be avoided.
  • The inter-rater reliability should be ensured.
  • Control and experimental groups should be matched with each other.

How to Implement Reliability and Validity in your Thesis?

According to experts, it is helpful to address reliability and validity explicitly, especially in a thesis or dissertation. A method for doing so is given below:

  • Methodology: All the planning about reliability and validity is discussed here, including the chosen samples and sizes and the techniques used to measure reliability and validity.
  • Results and discussion: Talk about the level of reliability and validity of your results and their influence on your conclusions.
  • Literature review: Discuss the contributions of other researchers to improving reliability and validity.

Frequently Asked Questions

What is reliability and validity in research?

Reliability in research refers to the consistency and stability of measurements or findings. Validity relates to the accuracy and truthfulness of results, measuring what the study intends to. Both are crucial for trustworthy and credible research outcomes.

What is validity?

Validity in research refers to the extent to which a study accurately measures what it intends to measure. It ensures that the results are truly representative of the phenomena under investigation. Without validity, research findings may be irrelevant, misleading, or incorrect, limiting their applicability and credibility.

What is reliability?

Reliability in research refers to the consistency and stability of measurements over time. If a study is reliable, repeating the experiment or test under the same conditions should produce similar results. Without reliability, findings become unpredictable and lack dependability, potentially undermining the study’s credibility and generalisability.

What is reliability in psychology?

In psychology, reliability refers to the consistency of a measurement tool or test. A reliable psychological assessment produces stable and consistent results across different times, situations, or raters. It ensures that an instrument’s scores are not due to random error, making the findings dependable and reproducible in similar conditions.

What is test retest reliability?

Test-retest reliability assesses the consistency of measurements taken by a test over time. It involves administering the same test to the same participants at two different points in time and comparing the results. A high correlation between the scores indicates that the test produces stable and consistent results over time.

How to improve reliability of an experiment?

  • Standardise procedures and instructions.
  • Use consistent and precise measurement tools.
  • Train observers or raters to reduce subjective judgments.
  • Increase sample size to reduce random errors.
  • Conduct pilot studies to refine methods.
  • Repeat measurements or use multiple methods.
  • Address potential sources of variability.

What is the difference between reliability and validity?

Reliability refers to the consistency and repeatability of measurements, ensuring results are stable over time. Validity indicates how well an instrument measures what it’s intended to measure, ensuring accuracy and relevance. While a test can be reliable without being valid, a valid test must inherently be reliable. Both are essential for credible research.

Are interviews reliable and valid?

Interviews can be both reliable and valid, but they are susceptible to biases. The reliability and validity depend on the design, structure, and execution of the interview. Structured interviews with standardised questions improve reliability. Validity is enhanced when questions accurately capture the intended construct and when interviewer biases are minimised.

Are IQ tests valid and reliable?

IQ tests are generally considered reliable, producing consistent scores over time. Their validity, however, is a subject of debate. While they effectively measure certain cognitive skills, whether they capture the entirety of “intelligence” or predict success in all life areas is contested. Cultural bias and over-reliance on tests are also concerns.

Are questionnaires reliable and valid?

Questionnaires can be both reliable and valid if well-designed. Reliability is achieved when they produce consistent results over time or across similar populations. Validity is ensured when questions accurately measure the intended construct. However, factors like poorly phrased questions, respondent bias, and lack of standardisation can compromise their reliability and validity.

  • What is the difference between reliability and validity?

Last updated 12 February 2023. Reviewed by Cathy Heath.


  • What is reliability?

Reliability indicates the extent to which the results of an experiment can be replicated when it is performed multiple times. Effective research and experiments should produce similar results over time when performed by other people, as long as instructions and conditions (methodology) are followed correctly or in the same manner each time. 

Experiments that generate unusually large differences in results should be questioned. Just because an experiment can be reproduced correctly doesn’t make the results or outcomes reliable. 

  • What is validity?

Validity indicates the extent to which your research usefully and accurately measures what you are trying to measure and how that stacks up with other established concepts.  Validity can be harder to determine than reliability, but a high level of reliability assists in proving that your research is valid.  

Reliability and validity provide slightly different indications about the overall quality of your research and whether the data you obtain can accurately be used in service of your reason for doing the experiment. It is possible for an experiment to be reliable but not valid, or valid but not reliable, but the most accurate experiments produce strong, consistent results in both categories. 

  • Why is this difference important?

Experiments that cannot be replicated with similar results and those that do not adequately address the question they were designed to solve produce results that should not be used for further research or as a basis for important decisions and/or policies. Since reliability and validity are different, quality research should reflect both of these principles.

  • How are reliability and validity assessed?

Several types of assessments can be used to determine whether the information you gather is reliable and valid. Here are some of the most common tests used to assess the reliability and validity of research.

  • Types of reliability

Test-retest reliability, internal reliability, and inter-rater reliability are among the most common types of reliability assessment. Each option can be used to help you determine the overall accuracy and consistency of your research. External reliability, parallel forms reliability, or other assessment options may also be appropriate, depending on the nature of the research you are conducting.

Each of these reliability types looks at a different factor that may affect the outcome of your research. This means that it is generally a good idea to use a combination of tests to create the most complete picture of the reliability of your data. 

Internal reliability

Many complex experiments include two or more components that are intended to measure the same type of data. This technique, which is known as internal reliability, assesses whether elements that are supposed to produce the same results do so consistently. Strong internal reliability increases the overall confidence that your data is accurate, consistent, and replicable. 

External reliability

External reliability indicates how consistent a type of measure is over a period of time or with different types of survey conditions, such as different individuals, and how successfully this can be generalized. Reliability can be determined if standard operating procedures (SOPs) are used to manage the way research is conducted so it can be reproduced. 

Test-retest reliability

Test-retest reliability measures how well tests produce similar results over time. This type of assessment provides insights into whether your experiment yields stable, consistent data when it is repeated multiple times externally, including at a later date. 

Inter-rater reliability

Well-designed tests should produce similar results regardless of who is performing them. Tests that result in vastly different data when different types of individuals are used in an experiment, for example, can indicate that there is too much variation in the equipment or tools the researchers are using. Another possibility is that the instructions are not clear enough to be followed exactly as the experiment's designers intended.

  • Types of validity

Likewise, there are several types of validity assessments that can be combined to better understand how well your data represents what you want it to represent. 

Some types of validity that may be used include:

  • Convergent
  • Concurrent
  • Construct
  • Criterion
  • Predictive

  • How to ensure reliability and validity in your research

Ensuring reliability and validity is a crucial step in knowing that the information your research provides is accurate, consistent, and valuable. Making sure that methods are applied consistently and taking steps to conduct research in conditions that are as similar as possible can help ensure your research is reliable.

In addition, choosing appropriate sampling, measurement tools, and methods can help make sure your research is valid. Being vigilant about ensuring reliability and validity in your research from the beginning can help you get the most out of the time, money, and other resources that are used to conduct it. 

  • Can something be reliable but not valid or valid but not reliable?

It is possible for your research to fit into one category but not the other. Your data may show consistent results for something other than the question you are actually studying, or inconsistent results that do address your question but indicate that another issue is skewing your data.

Information you gain from research that is only reliable or only valid can form a helpful starting point in continuing to develop your research, but it should not be used as your final results.


Validity vs. Reliability in Research: What's the Difference?


Introduction

  • What is the difference between reliability and validity in a study?
  • What is an example of reliability and validity?
  • How to ensure validity and reliability in your research
  • Critiques of reliability and validity

In research, validity and reliability are crucial for producing robust findings. They provide a foundation that assures scholars, practitioners, and readers alike that the research's insights are both accurate and consistent. However, the nuanced nature of qualitative data often blurs the lines between these concepts, making it imperative for researchers to discern their distinct roles.

This article seeks to illuminate the intricacies of reliability and validity, highlighting their significance and distinguishing their unique attributes. By understanding these critical facets, qualitative researchers can ensure their work conveys both authenticity and trustworthiness.


In the domain of research, whether qualitative or quantitative , two concepts often arise when discussing the quality and rigor of a study: reliability and validity . These two terms, while interconnected, have distinct meanings that hold significant weight in the world of research.

Reliability, at its core, speaks to the consistency of a study. If a study or test measures the same concept repeatedly and yields the same results, it demonstrates a high degree of reliability. A common method for assessing reliability is through internal consistency reliability, which checks if multiple items that measure the same concept produce similar scores.

Another method often used is inter-rater reliability , which gauges the consistency of scores given by different raters. This approach is especially amenable to qualitative research , and it can help researchers assess the clarity of their code system and the consistency of their codings . For a study to be more dependable, it's imperative to ensure a sufficient measurement of reliability is achieved.
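As a small illustration of how agreement between two coders might be quantified, here is a Python sketch that computes percent agreement and Cohen's kappa by hand for a hypothetical set of codings; the code labels and values are invented for the example.

```python
# Minimal sketch of inter-rater agreement for two qualitative coders,
# using percent agreement and Cohen's kappa (hypothetical codings).
from collections import Counter

coder_1 = ["theme_A", "theme_B", "theme_A", "theme_C", "theme_B",
           "theme_A", "theme_C", "theme_B", "theme_A", "theme_B"]
coder_2 = ["theme_A", "theme_B", "theme_A", "theme_B", "theme_B",
           "theme_A", "theme_C", "theme_B", "theme_A", "theme_C"]

n = len(coder_1)
observed = sum(a == b for a, b in zip(coder_1, coder_2)) / n

# Expected chance agreement, based on each coder's code frequencies
counts_1, counts_2 = Counter(coder_1), Counter(coder_2)
expected = sum((counts_1[c] / n) * (counts_2[c] / n)
               for c in set(counts_1) | set(counts_2))

kappa = (observed - expected) / (1 - expected)
print(f"Percent agreement: {observed:.2f}")
print(f"Cohen's kappa:     {kappa:.2f}")
# Kappa corrects for chance agreement; values above roughly 0.6 to 0.8 are
# commonly read as substantial agreement, though conventions vary by field.
```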

On the other hand, validity is concerned with accuracy. It looks at whether a study truly measures what it claims to. Within the realm of validity, several types exist. Construct validity, for instance, verifies that a study measures the intended abstract concept or underlying construct. If a research aims to measure self-esteem and accurately captures this abstract trait, it demonstrates strong construct validity.

Content validity ensures that a test or study comprehensively represents the entire domain of the concept it seeks to measure. For instance, if a test aims to assess mathematical ability, it should cover arithmetic, algebra, geometry, and more to showcase strong content validity.

Criterion validity is another form of validity that ensures that the scores from a test correlate well with a measure from a related outcome. A subset of this is predictive validity, which checks if the test can predict future outcomes. For instance, if an aptitude test can predict future job performance, it can be said to have high predictive validity.

The distinction between reliability and validity becomes clear when one considers the nature of their focus. While reliability is concerned with consistency and reproducibility, validity zeroes in on accuracy and truthfulness.

A research tool can be reliable without being valid. For instance, a faulty instrument might consistently give bad readings (reliable but not valid). Conversely, the same test administered multiple times could sometimes hit the mark and at other times miss it entirely, producing different test scores each time; this would make it valid in some instances but not reliable.

For a study to be robust, it must achieve both reliability and validity. Reliability ensures the study's findings are reproducible while validity confirms that it accurately represents the phenomena it claims to. Ensuring both in a study means the results are both dependable and accurate, forming a cornerstone for high-quality research.


Understanding the nuances of reliability and validity becomes clearer when contextualized within a real-world research setting. Imagine a qualitative study where a researcher aims to explore the experiences of teachers in urban schools concerning classroom management. The primary method of data collection is semi-structured interviews .

To ensure the reliability of this qualitative study, the researcher crafts a consistent list of open-ended questions for the interview. This ensures that, while each conversation might meander based on the individual’s experiences, there remains a core set of topics related to classroom management that every participant addresses.

The essence of reliability in this context isn't necessarily about garnering identical responses but rather about achieving a consistent approach to data collection and subsequent interpretation . As part of this commitment to reliability, two researchers might independently transcribe and analyze a subset of these interviews. If they identify similar themes and patterns in their independent analyses, it suggests a consistent interpretation of the data, showcasing inter-rater reliability .

Validity , on the other hand, is anchored in ensuring that the research genuinely captures and represents the lived experiences and sentiments of teachers concerning classroom management. To establish content validity, the list of interview questions is thoroughly reviewed by a panel of educational experts. Their feedback ensures that the questions encompass the breadth of issues and concerns related to classroom management in urban school settings.

As the interviews are conducted, the researcher pays close attention to the depth and authenticity of responses. After the interviews, member checking could be employed, where participants review the researcher's interpretation of their responses to ensure that their experiences and perspectives have been accurately captured. This strategy helps in affirming the study's construct validity, ensuring that the abstract concept of "experiences with classroom management" has been truthfully and adequately represented.

In this example, we can see that while the interview study is rooted in qualitative methods and subjective experiences, the principles of reliability and validity can still meaningfully inform the research process. They serve as guides to ensure the research's findings are both dependable and genuinely reflective of the participants' experiences.

Ensuring validity and reliability in research, irrespective of its qualitative or quantitative nature, is pivotal to producing results that are both trustworthy and robust. Here's how you can integrate these concepts into your study to ensure its rigor:

Reliability is about consistency. One of the most straightforward ways to gauge it in quantitative research is using test-retest reliability. It involves administering the same test to the same group of participants on two separate occasions and then comparing the results.

A high degree of similarity between the two sets of results indicates good reliability. This can often be measured using a correlation coefficient, where a value closer to 1 indicates a strong positive consistency between the two test iterations.

Validity, on the other hand, ensures that the research genuinely measures what it intends to. There are various forms of validity to consider. Convergent validity ensures that two measures of the same construct or those that should theoretically be related, are indeed correlated. For example, two different measures assessing self-esteem should show similar results for the same group, highlighting that they are measuring the same underlying construct.

Face validity is the most basic form of validity and is gauged by the sheer appearance of the measurement tool. If, at face value, a test seems like it measures what it claims to, it has face validity. This is often the first step and is usually followed by more rigorous forms of validity testing.

Criterion-related validity, a subtype of the previously discussed criterion validity , evaluates how well the outcomes of a particular test or measurement correlate with another related measure. For example, if a new tool is developed to measure reading comprehension, its results can be compared with those of an established reading comprehension test to assess its criterion-related validity. If the results show a strong correlation, it's a sign that the new tool is valid.
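Building on the reading-comprehension scenario above, the following is a minimal Python sketch of a criterion-related validity check; the scores for the hypothetical new tool and the established test are invented for illustration.

```python
# Minimal sketch of a criterion-related validity check (hypothetical data).
import numpy as np
from scipy import stats

# Scores from a new reading-comprehension tool and an established test,
# for the same twelve students (values invented for illustration)
new_tool = np.array([55, 62, 70, 48, 81, 66, 73, 59, 88, 51, 77, 64])
established = np.array([58, 60, 72, 50, 79, 68, 70, 61, 90, 49, 75, 66])

r, p_value = stats.pearsonr(new_tool, established)
print(f"Criterion validity correlation: r = {r:.3f} (p = {p_value:.4f})")
# A strong positive correlation with the established measure supports the
# new tool's criterion-related validity.
```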

Ensuring both validity and reliability requires deliberate planning, meticulous testing, and constant reflection on the study's methods and results. This might involve using established scales or measures with proven validity and reliability, conducting pilot studies to refine measurement tools, and always staying cognizant of the fact that these two concepts are important considerations for research robustness.

While reliability and validity are foundational concepts in many traditional research paradigms, they have not escaped scrutiny, especially from critical and poststructuralist perspectives. These critiques often arise from the fundamental philosophical differences in how knowledge, truth, and reality are perceived and constructed.

From a poststructuralist viewpoint, the very pursuit of a singular "truth" or an objective reality is questionable. In such a perspective, multiple truths exist, each shaped by its own socio-cultural, historical, and individual contexts.

Reliability, with its emphasis on consistent replication, might then seem at odds with this understanding. If truths are multiple and shifting, how can consistency across repeated measures or observations be a valid measure of anything other than the research instrument's stability?

Validity, too, faces critique. In seeking to ensure that a study measures what it purports to measure, there's an implicit assumption of an observable, knowable reality. Poststructuralist critiques question this foundation, arguing that reality is too fluid, multifaceted, and influenced by power dynamics to be pinned down by any singular measurement or representation.

Moreover, the very act of determining "validity" often requires an external benchmark or "gold standard." This brings up the issue of who determines this standard and the power dynamics and potential biases inherent in such decisions.

Another point of contention is the way these concepts can inadvertently prioritize certain forms of knowledge over others. For instance, privileging research that meets stringent reliability and validity criteria might marginalize more exploratory, interpretive, or indigenous research methods. These methods, while offering deep insights, might not align neatly with traditional understandings of reliability and validity, potentially relegating them to the periphery of "accepted" knowledge production.

To be sure, reliability and validity serve as guiding principles in many research approaches. However, it's essential to recognize their limitations and the critiques posed by alternative epistemologies. Engaging with these critiques doesn't diminish the value of reliability and validity but rather enriches our understanding of the multifaceted nature of knowledge and the complexities of its pursuit.


  • Reliability vs Validity in Research: Types & Examples

busayo.longe

In everyday life, we probably use reliability to describe how something is valid. However, in research and testing, reliability and validity are not the same things.

When it comes to data analysis, reliability refers to how easily replicable an outcome is. For example, if you measure a cup of rice three times, and you get the same result each time, that result is reliable.

Validity, on the other hand, refers to the measurement’s accuracy. This means that if the standard weight for a cup of rice is 5 grams, and you measure a cup of rice, it should be 5 grams.

So, while reliability and validity are intertwined, they are not synonymous. If one of the measurement parameters, such as your scale, is distorted, the results will be consistent but invalid.

Data must be consistent and accurate to be used to draw useful conclusions. In this article, we’ll look at how to assess data reliability and validity, as well as how to apply it.

Read: Internal Validity in Research: Definition, Threats, Examples

What is Reliability?

When a measurement is consistent it’s reliable. But of course, reliability doesn’t mean your outcome will be the same, it just means it will be in the same range. 

For example, if you scored 95% on a test the first time and the next time you score 96%, your results are reliable. So, even if there is a minor difference in the outcomes, as long as it is within the error margin, your results are reliable.

Reliability allows you to assess the degree of consistency in your results. So, if you’re getting similar results, reliability provides an answer to the question of how similar your results are.

What is Validity?

A measurement or test is valid when it corresponds to the expected result. Validity examines the accuracy of your result.

Here's where things get tricky: to establish the validity of a test, the results must also be consistent. In most experiments (especially physical measurements), the standard value that establishes the accuracy of a measurement comes from repeating the test and obtaining consistent results.


For example, before I can conclude that all 12-inch rulers are one foot long, I must repeat the measurement several times and obtain very similar results, indicating that 12-inch rulers are indeed one foot.

In most scientific experiments, validity and reliability are closely linked. If you're measuring distance or depth, for example, valid answers are likely to be reliable.

But for social constructs, one isn't an indication of the other. For example, many people believe that people who wear glasses are smart.

Of course, I'll find examples of people who wear glasses and have high IQs (reliability), but the truth is that most people who wear glasses simply need better vision (validity).

So reliable answers aren't always correct, but valid answers are always reliable.

How Are Reliability and Validity Assessed?

When assessing reliability, we want to know whether the measurement can be replicated. Of course, we'd have to vary some conditions to ensure that the test holds; the most important of these are time, items, and observers.

If the main factor you change when performing a reliability test is time, you’re performing a test-retest reliability assessment.


However, if you are changing items, you are performing an internal consistency assessment. It means you’re measuring multiple items with a single instrument.

Finally, if you’re measuring the same item with the same instrument but using different observers or judges, you’re performing an inter-rater reliability test.

Assessing Validity

Evaluating validity can be more tedious than evaluating reliability. With reliability, you're attempting to demonstrate that your results are consistent, whereas with validity, you want to prove the correctness of your outcome.

Although validity is mainly categorized under two sections (internal and external), there are more than fifteen ways to check the validity of a test. In this article, we’ll be covering four.

First, content validity measures whether the test covers all the content it needs to produce the outcome you expect.

Suppose I wanted to test the hypothesis that 90% of Generation Z uses social media polls for surveys while 90% of millennials use forms. I’d need a sample size that accounts for how Gen Z and millennials gather information.

Next, criterion validity is when you compare your results to what you should obtain based on a chosen criterion. It can be measured in two ways: predictive or concurrent validity.


Following that, we have face validity. It reflects whether a test appears, at face value, to measure what it claims to. For instance, when answering a customer service survey, I'd expect to be asked how I feel about the service provided.

Lastly, there is construct-related validity. It's a little more complicated, but it shows how the validity of research is built on different pieces of evidence.

As a result, it provides information that either supports or refutes the claim that certain things are related.

Types of Reliability

We have three main types of reliability assessment and here’s how they work:

1) Test-retest Reliability

This assessment refers to the consistency of outcomes over time. Testing reliability over time does not imply changing the amount of time it takes to conduct an experiment; rather, it means repeating the experiment multiple times in a short time.

For example, if I measure the length of my hair today and again tomorrow, I'll most likely get the same result each time.

A short period is relative in terms of reliability; two days for measuring hair length is considered short. But that’s far too long to test how quickly water dries on the sand.

A test-retest correlation is used to compare the consistency of your results. This is typically a scatter plot that shows how similar your values are between the two experiments.

If your results are reliable, the points on the scatter plot will cluster tightly along a line; if they aren't, the points (values) will be spread across the graph.
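To make this concrete, here is a minimal sketch of a test-retest correlation, using invented scores and SciPy's Pearson correlation (neither the data nor the library comes from the article):

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical scores for the same six participants, tested twice a week apart
scores_time1 = np.array([95, 88, 76, 91, 84, 79])
scores_time2 = np.array([96, 85, 78, 90, 86, 80])

# Test-retest correlation: values near 1 indicate highly consistent results
r, p_value = pearsonr(scores_time1, scores_time2)
print(f"Test-retest correlation: r = {r:.2f}")
```

A value of r close to 1 would suggest high test-retest reliability; a value near 0 would suggest the results are not reproducible.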


2) Internal Consistency

It’s also known as internal reliability. It refers to the consistency of results for various items when measured on the same scale.

This is particularly important in social science research, such as surveys, because it helps determine the consistency of people’s responses when asked the same questions.

Most introverts, for example, would say they enjoy spending time alone and having few friends. However, if some respondents who claim to be introverts also say that they do not want time alone or that they prefer being surrounded by many friends, the responses don't add up.

Either these respondents are not really introverts, or one of these items isn't a reliable way of measuring introversion.

Internal reliability helps you demonstrate the consistency of a test across its items. It's a little tough to measure quantitatively, but you can use the split-half correlation.

The split-half correlation simply means dividing the items used to measure the underlying construct into two halves and plotting the two half-scores against each other in a scatter plot.

Introverts, for example, are assessed both on their need for alone time and on their preference for having few friends. If this plot is widely dispersed, it is likely that one of the traits does not indicate introversion.
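As an illustration, the sketch below computes a split-half correlation for a short, made-up introversion scale; the odd/even split and the Spearman-Brown step are illustrative choices, not something the article prescribes:

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical responses: 6 people answering 6 introversion items on a 1-5 scale
items = np.array([
    [5, 4, 5, 4, 5, 4],
    [2, 3, 2, 2, 3, 2],
    [4, 4, 5, 4, 4, 5],
    [1, 2, 1, 2, 1, 1],
    [3, 3, 4, 3, 3, 3],
    [5, 5, 4, 5, 5, 4],
])

half_a = items[:, ::2].sum(axis=1)   # total score on odd-numbered items
half_b = items[:, 1::2].sum(axis=1)  # total score on even-numbered items

r_half, _ = pearsonr(half_a, half_b)
# Spearman-Brown correction estimates reliability of the full-length scale
full_length = 2 * r_half / (1 + r_half)
print(f"Split-half r = {r_half:.2f}, Spearman-Brown estimate = {full_length:.2f}")
```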

3) Inter-Rater Reliability

This method of measuring reliability helps prevent personal bias. Inter-rater reliability assessment helps judge outcomes from the different perspectives of multiple observers.

A good example is if you ordered a meal and found it delicious. You could be biased in your judgment for several reasons: your perception of the meal, your mood, and so on.

But it’s highly unlikely that six more people would agree that the meal is delicious if it isn’t. Another factor that could lead to bias is expertise. Professional dancers, for example, would perceive dance moves differently than non-professionals. 


So, if a person dances and records it, and both groups (professional and unprofessional dancers) rate the video, there is a high likelihood of a significant difference in their ratings.

But if they both agree that the person is a great dancer, despite their opposing viewpoints, the person is likely a great dancer.
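For ratings like these, inter-rater agreement can be quantified. The following sketch uses invented 1-to-5 ratings from two observers and computes raw agreement plus Cohen's kappa, a chance-corrected agreement statistic that the article does not name but that is commonly used for this purpose:

```python
import numpy as np

# Hypothetical 1-5 ratings of the same 8 dance videos by two observers
rater_a = np.array([5, 4, 4, 5, 3, 4, 5, 4])
rater_b = np.array([5, 4, 3, 5, 3, 4, 4, 4])

p_observed = np.mean(rater_a == rater_b)  # raw proportion of identical ratings

# Chance agreement: probability both raters pick the same category independently
categories = np.union1d(rater_a, rater_b)
p_a = np.array([np.mean(rater_a == c) for c in categories])
p_b = np.array([np.mean(rater_b == c) for c in categories])
p_chance = np.sum(p_a * p_b)

kappa = (p_observed - p_chance) / (1 - p_chance)  # Cohen's kappa
print(f"Agreement: {p_observed:.2f}, Cohen's kappa: {kappa:.2f}")
```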

Types of Validity

Researchers use validity to determine whether a measurement is accurate or not. The accuracy of measurement is usually determined by comparing it to the standard value.

When a measurement is consistent over time and has high internal consistency, it increases the likelihood that it is valid.

1) Content Validity

This refers to determining validity by evaluating what is being measured. So content validity tests if your research is measuring everything it should to produce an accurate result.

For example, if I were to measure what causes hair loss in women, I'd have to consider things like postpartum hair loss, alopecia, hair manipulation, dryness, and so on.

By omitting any of these critical factors, you risk significantly reducing the validity of your research because you won’t be covering everything necessary to make an accurate deduction. 


For example, suppose a woman is losing her hair due to postpartum hair loss, excessive manipulation, and dryness, but in my research I only look at postpartum hair loss. My research will show that she has postpartum hair loss, which isn't the whole picture.

Yes, my conclusion is correct, but it does not fully account for the reasons why this woman is losing her hair.

2) Criterion Validity

This measures how well your measurement correlates with the variables you want to compare it with to get your result. The two main classes of criterion validity are predictive and concurrent.

3) Predictive validity

It helps predict future outcomes based on the data you have. For example, if a large number of students performed exceptionally well on a test, you can use this to predict that they understood the concept the test was based on and will perform well in their exams.

4) Concurrent validity

Concurrent validity, on the other hand, involves testing with different variables at the same time, for example, setting up a literature test for your students on two different books and assessing them at the same time.

You’re measuring your students’ literature proficiency with these two books. If your students truly understood the subject, they should be able to correctly answer questions about both books.

5) Face Validity

Quantifying face validity can be a bit difficult because you are measuring the perceived validity of a method, not validity itself. Face validity is therefore concerned with whether the measurement method appears likely to produce accurate results, rather than with the measurement itself.

If the method used for measurement doesn't appear suited to testing what you want to measure, its face validity is low.

Here's an example: suppose I want to show that less than 40% of men over the age of 20 in Texas, USA, are at least 6 feet tall. The most logical approach would be to collect height data from men over the age of twenty in Texas.

However, asking men over the age of 20 what their favorite meal is in order to determine their height would be bizarre. That method has very low face validity because it bears no apparent relationship to what I want to measure.

6) Construct-Related Validity

Construct-related validity assesses the accuracy of your research by collecting multiple pieces of evidence. It helps determine the validity of your results by comparing them to evidence that supports or refutes your measurement.

7) Convergent validity

If you're assessing evidence that strongly correlates with the concept, that's convergent validity.

8) Discriminant validity

Discriminant validity examines the validity of your research by determining what it should not be related to; you rule out elements that are not strong factors in order to help validate your research. Being a vegan, for example, does not imply that you are allergic to meat.

How to Ensure Validity and Reliability in Your Research

You need a bulletproof research design to ensure that your research is both valid and reliable. This means that your methods, sample, and even you, the researcher, shouldn’t be biased.

  • Ensuring Reliability

To enhance the reliability of your research, you need to apply your measurement method consistently. The chances of reproducing the same results for a test are higher when you maintain the method you’re using to experiment.

For example, suppose you want to determine the reliability of the weight of a bag of chips using a scale. You have to use that same scale consistently to measure the bag of chips each time you run the experiment.

You must also keep the conditions of your research consistent. For instance, if you’re experimenting to see how quickly water dries on sand, you need to consider all of the weather elements that day.

So, if you experimented on a sunny day, the next experiment should also be conducted on a sunny day to obtain a reliable result.

  • Ensuring Validity

There are several ways to determine the validity of your research, and the majority of them require the use of highly specific and high-quality measurement methods.

Before you begin your test, choose the best method for producing the desired results. This method should be pre-existing and proven.

Also, your sample should be very specific. If you’re collecting data on how dogs respond to fear, your results are more likely to be valid if you base them on a specific breed of dog rather than dogs in general.

Validity and reliability are critical for achieving accurate and consistent results in research. While reliability does not always imply validity, validity establishes that a result is reliable. Validity is heavily dependent on previous results (standards), whereas reliability is dependent on the similarity of your results.


Reliability vs. Validity in Research: Types & Examples

When it comes to research, getting things right is crucial. That’s where the concepts of “Reliability vs Validity in Research” come in. 

Imagine it like a balancing act – making sure your measurements are consistent and accurate at the same time. This is where test-retest reliability, having different researchers check things, and keeping things consistent within your research play a big role.

As we dive into this topic, we’ll uncover the differences between reliability and validity, see how they work together, and learn how to use them effectively.

Understanding Reliability vs. Validity in Research

When it comes to collecting data and conducting research, two crucial concepts stand out: reliability and validity. 

These pillars uphold the integrity of research findings, ensuring that the data collected and the conclusions drawn are both meaningful and trustworthy. Let's dive into the heart of these two concepts, reliability and validity, to truly comprehend their significance in research.

What is reliability?

Reliability refers to the consistency and dependability of the data collection process. It’s like having a steady hand that produces the same result each time it reaches for a task. 

In the research context, reliability is all about ensuring that if you were to repeat the same study using the same reliable measurement technique, you’d end up with the same results. It’s like having multiple researchers independently conduct the same experiment and getting outcomes that align perfectly.

Imagine you’re using a thermometer to measure the temperature of the water. You have a reliable measurement if you dip the thermometer into the water multiple times and get the same reading each time. This tells you that your method and measurement technique consistently produce the same results, whether it’s you or another researcher performing the measurement.

What is validity?

On the other hand, validity refers to the accuracy and meaningfulness of your data. It’s like ensuring that the puzzle pieces you’re putting together actually form the intended picture. When you have validity, you know that your method and measurement technique are consistent and capable of producing results aligned with reality.

Think of it this way: imagine you're conducting a test that claims to measure a specific trait, like problem-solving ability. If the test consistently produces results that accurately reflect participants' problem-solving skills, then the test has high validity. In this case, the test produces accurate results that truly correspond to the trait it aims to measure.

In essence, while reliability assures you that your data collection process is like a well-oiled machine producing the same results, validity steps in to ensure that these results are not only consistent but also relevantly accurate. 

Together, these concepts provide researchers with the tools to conduct research that stands on a solid foundation of dependable methods and meaningful insights.

Types of Reliability

Let’s explore the various types of reliability that researchers consider to ensure their work stands on solid ground.

Test-retest reliability

Test-retest reliability involves assessing the consistency of measurements over time. It’s like taking the same measurement or test twice – once and then again after a certain period. If the results align closely, it indicates that the measurement is reliable over time. Think of it as capturing the essence of stability. 

Inter-rater reliability

When multiple researchers or observers are part of the equation, interrater reliability comes into play. This type of reliability assesses the level of agreement between different observers when evaluating the same phenomenon. It’s like ensuring that different pairs of eyes perceive things in a similar way. 

Internal reliability

Internal consistency dives into the harmony among different items within a measurement tool aiming to assess the same concept. This often comes into play in surveys or questionnaires, where participants respond to various items related to a single construct. If the responses to these items consistently reflect the same underlying concept, the measurement is said to have high internal consistency. 

Types of validity

Let’s explore the various types of validity that researchers consider to ensure their work stands on solid ground.

Content validity

It delves into whether a measurement truly captures all dimensions of the concept it intends to measure. It’s about making sure your measurement tool covers all relevant aspects comprehensively. 

Imagine designing a test to assess students’ understanding of a history chapter. It exhibits high content validity if the test includes questions about key events, dates, and causes. However, if it focuses solely on dates and omits causation, its content validity might be questionable.

Construct validity

It assesses how well a measurement aligns with established theories and concepts. It’s like ensuring that your measurement is a true representation of the abstract construct you’re trying to capture. 

Criterion validity

Criterion validity examines how well your measurement corresponds to other established measurements of the same concept. It’s about making sure your measurement accurately predicts or correlates with external criteria.

Differences between reliability and validity in research

Let’s delve into the differences between reliability and validity in research.

  • Meaning: Reliability focuses on the consistency of measurements over time and conditions; validity concerns the accuracy and relevance of measurements in capturing the intended concept.
  • What it assesses: Reliability assesses whether the same results can be obtained consistently from repeated measurements; validity assesses whether measurements truly measure what they are intended to measure.
  • Assessment methods: Reliability is evaluated through test-retest consistency, interrater agreement, and internal consistency; validity is assessed through content coverage, construct alignment, and criterion correlation.
  • Interrelation: A measurement can be reliable (consistent) without being valid (accurate); a valid measurement is typically reliable, but high reliability doesn't guarantee validity.
  • Importance: Reliability ensures data consistency and replicability; validity guarantees meaningful and credible results.
  • Focus: Reliability focuses on the stability and consistency of measurement outcomes; validity focuses on their meaningfulness and accuracy.
  • Outcome: For reliability, reproducibility of measurements is the key outcome; for validity, meaningful and accurate measurement outcomes are the primary goal.

While both reliability and validity contribute to trustworthy research, they address distinct aspects. Reliability ensures consistent results, while validity ensures accurate and relevant results that reflect the true nature of the measured concept.

Example of Reliability and Validity in Research

In this section, we’ll explore instances that highlight the differences between reliability and validity and how they play a crucial role in ensuring the credibility of research findings.

Example of reliability

Imagine you are studying the reliability of a smartphone’s battery life measurement. To collect data, you fully charge the phone and measure the battery life three times in the same controlled environment—same apps running, same brightness level, and same usage patterns. 

If the measurements consistently show a similar battery life duration each time you repeat the test, it indicates that your measurement method is reliable. The consistent results under the same conditions assure you that the battery life measurement can be trusted to provide dependable information about the phone’s performance.
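A minimal sketch of that consistency check (with made-up battery-life readings) is simply to look at how tightly the repeated measurements cluster around their mean:

```python
import numpy as np

# Hypothetical battery-life readings (hours) from three repeated runs
# under identical conditions
readings = np.array([11.8, 12.1, 11.9])

mean = readings.mean()
spread = readings.std(ddof=1)
cv = spread / mean  # coefficient of variation: smaller means more consistent

print(f"Mean = {mean:.2f} h, SD = {spread:.2f} h, CV = {cv:.1%}")
```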

Example of validity

Researchers collect data from a group of participants in a study aiming to assess the validity of a newly developed stress questionnaire. To ensure validity, they compare the scores obtained from the stress questionnaire with the participants’ actual stress levels measured using physiological indicators such as heart rate variability and cortisol levels. 

If participants’ scores correlate strongly with their physiological stress levels, the questionnaire is valid. This means the questionnaire accurately measures participants’ stress levels, and its results correspond to real variations in their physiological responses to stress. 

Validity, assessed here through the correlation between questionnaire scores and physiological measures, ensures that the questionnaire is effectively measuring what it claims to measure: participants' stress levels.
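As a rough sketch of this kind of criterion check, one could correlate questionnaire scores with a physiological indicator; the data below are invented and the analysis is not a feature of any particular survey tool:

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical data for 8 participants
questionnaire_scores = np.array([22, 35, 18, 40, 27, 31, 15, 38])           # stress questionnaire
cortisol_levels = np.array([8.1, 13.0, 7.2, 14.5, 9.8, 11.6, 6.5, 13.8])    # physiological proxy

r, p = pearsonr(questionnaire_scores, cortisol_levels)
print(f"Criterion validity correlation: r = {r:.2f} (p = {p:.3f})")
```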

In the world of research, differentiating between reliability and validity is crucial. Reliability ensures consistent results, while validity confirms accurate measurements. Using tools like QuestionPro enhances data collection for both reliability and validity. For instance, measuring self-esteem over time showcases reliability, and aligning questions with theories demonstrates validity. 



Reliability Vs Validity

Reliability and validity are two important concepts in research that are used to evaluate the quality of measurement instruments or research studies.

Reliability

Reliability refers to the degree to which a measurement instrument or research study produces consistent and stable results over time, across different observers or raters, or under different conditions.

In other words, reliability is the extent to which a measurement instrument or research study produces results that are free from random error. A reliable measurement instrument or research study should produce similar results each time it is used or conducted, regardless of who is using it or conducting it.

Validity, on the other hand, refers to the degree to which a measurement instrument or research study accurately measures what it is supposed to measure or tests what it is supposed to test.

In other words, validity is the extent to which a measurement instrument or research study measures or tests what it claims to measure or test. A valid measurement instrument or research study should produce results that accurately reflect the concept or construct being measured or tested.

Difference Between Reliability Vs Validity

Here’s a comparison table that highlights the differences between reliability and validity:

  • Definition: Reliability is the degree to which a measurement instrument or research study produces consistent and stable results over time, across different observers or raters, or under different conditions; validity is the degree to which it accurately measures what it is supposed to measure or tests what it is supposed to test.
  • Concerned with: Consistency and stability of results (reliability) versus accuracy and truthfulness of results (validity).
  • Common types: Test-retest reliability, inter-rater reliability, and internal consistency reliability versus content validity, criterion validity, and construct validity.
  • How assessed: Degree of agreement or correlation between repeated measures or observers versus degree of association between a measure and an external criterion, or degree to which a measure assesses the intended construct.
  • Example: A bathroom scale that consistently provides the same weight measurement when used multiple times in a row versus a math test that measures only the math skills it is intended to test and not other factors, such as test-taking anxiety or language ability.

About the author: Muhammad Hassan, Researcher, Academic Writer, Web developer

Validity & Reliability In Research

A Plain-Language Explanation (With Examples)

By: Derek Jansen (MBA) | Expert Reviewer: Kerryn Warren (PhD) | September 2023

Validity and reliability are two related but distinctly different concepts within research. Understanding what they are and how to achieve them is critically important to any research project. In this post, we’ll unpack these two concepts as simply as possible.

This post is based on our popular online course, Research Methodology Bootcamp. In the course, we unpack the basics of methodology using straightforward language and loads of examples.

Overview: Validity & Reliability

  • The big picture
  • Validity 101
  • Reliability 101 
  • Key takeaways

First, The Basics…

First, let’s start with a big-picture view and then we can zoom in to the finer details.

Validity and reliability are two incredibly important concepts in research, especially within the social sciences. Both validity and reliability have to do with the measurement of variables and/or constructs – for example, job satisfaction, intelligence, productivity, etc. When undertaking research, you’ll often want to measure these types of constructs and variables and, at the simplest level, validity and reliability are about ensuring the quality and accuracy of those measurements .

As you can probably imagine, if your measurements aren’t accurate or there are quality issues at play when you’re collecting your data, your entire study will be at risk. Therefore, validity and reliability are very important concepts to understand (and to get right). So, let’s unpack each of them.


What Is Validity?

In simple terms, validity (also called “construct validity”) is all about whether a research instrument accurately measures what it’s supposed to measure .

For example, let's say you have a set of Likert scales that are supposed to quantify someone's level of overall job satisfaction. If this set of scales focused purely on one dimension of job satisfaction, say pay satisfaction, this would not be a valid measurement, as it only captures one aspect of the multidimensional construct. In other words, pay satisfaction alone is only one contributing factor toward overall job satisfaction, and therefore it's not a valid way to measure someone's job satisfaction.


Oftentimes in quantitative studies, the way in which the researcher or survey designer interprets a question or statement can differ from how the study participants interpret it . Given that respondents don’t have the opportunity to ask clarifying questions when taking a survey, it’s easy for these sorts of misunderstandings to crop up. Naturally, if the respondents are interpreting the question in the wrong way, the data they provide will be pretty useless . Therefore, ensuring that a study’s measurement instruments are valid – in other words, that they are measuring what they intend to measure – is incredibly important.

There are various types of validity and we’re not going to go down that rabbit hole in this post, but it’s worth quickly highlighting the importance of making sure that your research instrument is tightly aligned with the theoretical construct you’re trying to measure .  In other words, you need to pay careful attention to how the key theories within your study define the thing you’re trying to measure – and then make sure that your survey presents it in the same way.

For example, sticking with the “job satisfaction” construct we looked at earlier, you’d need to clearly define what you mean by job satisfaction within your study (and this definition would of course need to be underpinned by the relevant theory). You’d then need to make sure that your chosen definition is reflected in the types of questions or scales you’re using in your survey . Simply put, you need to make sure that your survey respondents are perceiving your key constructs in the same way you are. Or, even if they’re not, that your measurement instrument is capturing the necessary information that reflects your definition of the construct at hand.


What Is Reliability?

As with validity, reliability is an attribute of a measurement instrument – for example, a survey, a weight scale or even a blood pressure monitor. But while validity is concerned with whether the instrument is measuring the “thing” it’s supposed to be measuring, reliability is concerned with consistency and stability . In other words, reliability reflects the degree to which a measurement instrument produces consistent results when applied repeatedly to the same phenomenon , under the same conditions .

As you can probably imagine, a measurement instrument that achieves a high level of consistency is naturally more dependable (or reliable) than one that doesn’t – in other words, it can be trusted to provide consistent measurements . And that, of course, is what you want when undertaking empirical research. If you think about it within a more domestic context, just imagine if you found that your bathroom scale gave you a different number every time you hopped on and off of it – you wouldn’t feel too confident in its ability to measure the variable that is your body weight 🙂

It’s worth mentioning that reliability also extends to the person using the measurement instrument . For example, if two researchers use the same instrument (let’s say a measuring tape) and they get different measurements, there’s likely an issue in terms of how one (or both) of them are using the measuring tape. So, when you think about reliability, consider both the instrument and the researcher as part of the equation.

As with validity, there are various types of reliability and various tests that can be used to assess the reliability of an instrument. A popular one that you’ll likely come across for survey instruments is Cronbach’s alpha , which is a statistical measure that quantifies the degree to which items within an instrument (for example, a set of Likert scales) measure the same underlying construct . In other words, Cronbach’s alpha indicates how closely related the items are and whether they consistently capture the same concept . 
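To make the idea concrete, here is a minimal sketch of Cronbach's alpha computed from a respondent-by-item matrix; the Likert responses are invented, and plain NumPy is used rather than any particular statistics package:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a matrix with rows = respondents, columns = scale items."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical responses: 6 respondents answering 4 Likert items (1-5)
responses = np.array([
    [4, 5, 4, 4],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
    [3, 4, 3, 3],
])
print(f"Cronbach's alpha: {cronbach_alpha(responses):.2f}")
```

Higher values (commonly, values above roughly 0.7) suggest the items are consistently capturing the same underlying construct.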

Reliability reflects whether an instrument produces consistent results when applied to the same phenomenon, under the same conditions.

Recap: Key Takeaways

Alright, let’s quickly recap to cement your understanding of validity and reliability:

  • Validity is concerned with whether an instrument (e.g., a set of Likert scales) is measuring what it’s supposed to measure
  • Reliability is concerned with whether that measurement is consistent and stable when measuring the same phenomenon under the same conditions.

In short, validity and reliability are both essential to ensuring that your data collection efforts deliver high-quality, accurate data that help you answer your research questions . So, be sure to always pay careful attention to the validity and reliability of your measurement instruments when collecting and analysing data. As the adage goes, “rubbish in, rubbish out” – make sure that your data inputs are rock-solid.

Literature Review Course

Psst… there’s more!

This post is an extract from our bestselling short course, Methodology Bootcamp . If you want to work smart, you don't want to miss this .

Kennedy Sinkamba

THE MATERIAL IS WONDERFUL AND BENEFICIAL TO ALL STUDENTS.

THE MATERIAL IS WONDERFUL AND BENEFICIAL TO ALL STUDENTS AND I HAVE GREATLY BENEFITED FROM THE CONTENT.

Submit a Comment Cancel reply

Your email address will not be published. Required fields are marked *

Save my name, email, and website in this browser for the next time I comment.

  • Print Friendly
  • Key Differences

Know the Differences & Comparisons

Difference Between Validity and Reliability

validity vs reliability

For the purpose of checking accuracy and applicability, a multi-item measurement scale needs to be evaluated in terms of reliability, validity, and generalizability. These are preferred qualities that gauge how well the scale measures the characteristics under consideration. Validity is all about the genuineness of the research, whereas reliability is about the repeatability of the outcomes. This article will break down the fundamental differences between validity and reliability.

Comparison Chart

  • Meaning: Validity implies the extent to which the research instrument measures what it is intended to measure; reliability refers to the degree to which the scale produces consistent results when repeated measurements are made.
  • Instrument: A valid instrument is always reliable, whereas a reliable instrument need not be a valid instrument.
  • Related to: Validity relates to accuracy; reliability relates to precision.
  • Value: Validity is considered more valuable; reliability comparatively less so.
  • Assessment: Validity is difficult to assess; reliability is easy to assess.

Definition of Validity

In statistics, the term validity implies utility. It is the most important yardstick, signalling the degree to which the research instrument gauges what it is supposed to measure.

Simply put, it measures the extent to which differences discovered with the scale reflect true differences among objects on the characteristics under study, rather than systematic or random error. To be considered perfectly valid, the scale should not possess any measurement error. There are three types of validity, which are:

  • Content Validity: Otherwise known as face validity, it is the point to which the scale provides adequate coverage of the subject being tested.
  • Criterion Validity: The type of validity that gauges the performance of the measuring instrument, i.e. whether it performs as expected or estimated with respect to other variables chosen as meaningful parameters. The criterion should be relevant, unbiased, reliable, etc.
  • Construct Validity: The degree to which the scale measures the theoretical construct it is intended to measure; it is further divided into convergent validity, discriminant validity, and nomological validity.

Definition of Reliability

Reliability means the extent to which the measurement tool provides consistent outcomes when the measurement is repeated. Approaches used to assess reliability include test-retest, internal consistency methods, and alternative forms. Two key aspects, which need to be indicated separately, are:

  • Stability : Degree of stability can be checked by making a comparison of the results of repeated measurement.
  • Equivalence : Equivalence can be gauged when two researchers compare the observations of the same events.

Systematic errors do not affect reliability, but random errors lead to inconsistency of results and thus lower reliability. When the research instrument is reliable, one can be sure that temporary and situational factors are not interfering. Reliability can be improved by:

  • Standardizing the conditions under which the measurement occurs, i.e. source through which variation takes place should be removed or minimized.
  • Designing the directions for measurement carefully, employing individuals who have enough experience and motivation to carry out the research, and increasing the number of samples being tested.

Key Differences Between Validity and Reliability

The points presented below explain the fundamental differences between validity and reliability:

  • The degree to which the scale gauges what it is designed to gauge is known as validity. On the other hand, reliability refers to the degree of reproducibility of the results if repeated measurements are done.
  • When it comes to the instrument, a valid instrument is always reliable, but the reverse is not true, i.e. a reliable instrument need not be a valid instrument.
  • While evaluating multi-item scale, validity is considered more valuable in comparison to reliability.
  • One can easily assess the reliability of a measuring instrument; assessing validity, however, is difficult.
  • Validity focuses on accuracy, i.e. it checks whether the scale produces expected results or not. Conversely, reliability concentrates on precision, which measures the extent to which scale produces consistent outcomes.

To sum up, validity and reliability are two vital tests of sound measurement. The reliability of an instrument can be evaluated by identifying the proportion of systematic variation in the instrument. The validity of the instrument, on the other hand, is assessed by determining the degree to which variation in the observed scale score indicates actual variation among those being tested.



Reliability and validity: Importance in Medical Research

Affiliations.

  • 1 Al-Nafees Medical College,Isra University, Islamabad, Pakistan.
  • 2 Fauji Foundation Hospital, Foundation University Medical College, Islamabad, Pakistan.
  • PMID: 34974579
  • DOI: 10.47391/JPMA.06-861

Reliability and validity are among the most important and fundamental domains in the assessment of any measuring methodology for data-collection in a good research. Validity is about what an instrument measures and how well it does so, whereas reliability concerns the truthfulness in the data obtained and the degree to which any measuring tool controls random error. The current narrative review was planned to discuss the importance of reliability and validity of data-collection or measurement techniques used in research. It describes and explores comprehensively the reliability and validity of research instruments and also discusses different forms of reliability and validity with concise examples. An attempt has been taken to give a brief literature review regarding the significance of reliability and validity in medical sciences.

Keywords: Validity, Reliability, Medical research, Methodology, Assessment, Research tools.


The 4 Types of Reliability in Research | Definitions & Examples

Published on August 8, 2019 by Fiona Middleton . Revised on June 22, 2023.

Reliability tells you how consistently a method measures something. When you apply the same method to the same sample under the same conditions, you should get the same results. If not, the method of measurement may be unreliable or bias may have crept into your research.

There are four main types of reliability. Each can be estimated by comparing different sets of results produced by the same method.

Each type of reliability measures the consistency of a different facet of the test:

  • Test-retest reliability: the same test over time.
  • Interrater reliability: the same test conducted by different people.
  • Parallel forms reliability: different versions of a test which are designed to be equivalent.
  • Internal consistency: the individual items of a test.


Test-retest reliability measures the consistency of results when you repeat the same test on the same sample at a different point in time. You use it when you are measuring something that you expect to stay constant in your sample.

Why it’s important

Many factors can influence your results at different points in time: for example, respondents might experience different moods, or external conditions might affect their ability to respond accurately.

Test-retest reliability can be used to assess how well a method resists these factors over time. The smaller the difference between the two sets of results, the higher the test-retest reliability.

How to measure it

To measure test-retest reliability, you conduct the same test on the same group of people at two different points in time. Then you calculate the correlation between the two sets of results.

Test-retest reliability example

You devise a questionnaire to measure the IQ of a group of participants (a property that is unlikely to change significantly over time). You administer the test two months apart to the same group of people, but the results are significantly different, so the test-retest reliability of the IQ questionnaire is low.

Improving test-retest reliability

  • When designing tests or questionnaires , try to formulate questions, statements, and tasks in a way that won’t be influenced by the mood or concentration of participants.
  • When planning your methods of data collection , try to minimize the influence of external factors, and make sure all samples are tested under the same conditions.
  • Remember that changes or recall bias can be expected to occur in the participants over time, and take these into account.


Interrater reliability (also called interobserver reliability) measures the degree of agreement between different people observing or assessing the same thing. You use it when data is collected by researchers assigning ratings, scores or categories to one or more variables , and it can help mitigate observer bias .

People are subjective, so different observers’ perceptions of situations and phenomena naturally differ. Reliable research aims to minimize subjectivity as much as possible so that a different researcher could replicate the same results.

When designing the scale and criteria for data collection, it’s important to make sure that different people will rate the same variable consistently with minimal bias . This is especially important when there are multiple researchers involved in data collection or analysis.

To measure interrater reliability, different researchers conduct the same measurement or observation on the same sample. Then you calculate the correlation between their different sets of results. If all the researchers give similar ratings, the test has high interrater reliability.

Interrater reliability example

A team of researchers observe the progress of wound healing in patients. To record the stages of healing, rating scales are used, with a set of criteria to assess various aspects of wounds. The results of different researchers assessing the same set of patients are compared, and there is a strong correlation between all sets of results, so the test has high interrater reliability.
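A small sketch of that comparison, using invented healing-stage scores from three researchers, averages the pairwise correlations between raters:

```python
import numpy as np
from itertools import combinations
from scipy.stats import pearsonr

# Hypothetical healing-stage scores: rows = raters, columns = patients
ratings = np.array([
    [3, 5, 2, 4, 4],   # researcher A
    [3, 4, 2, 4, 5],   # researcher B
    [2, 5, 2, 3, 4],   # researcher C
])

# Average the correlation over every pair of raters
pairwise = [pearsonr(ratings[i], ratings[j])[0]
            for i, j in combinations(range(len(ratings)), 2)]
print(f"Mean inter-rater correlation: {np.mean(pairwise):.2f}")
```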

Improving interrater reliability

  • Clearly define your variables and the methods that will be used to measure them.
  • Develop detailed, objective criteria for how the variables will be rated, counted or categorized.
  • If multiple researchers are involved, ensure that they all have exactly the same information and training.

Parallel forms reliability measures the correlation between two equivalent versions of a test. You use it when you have two different assessment tools or sets of questions designed to measure the same thing.

If you want to use multiple different versions of a test (for example, to avoid respondents repeating the same answers from memory), you first need to make sure that all the sets of questions or measurements give reliable results.

The most common way to measure parallel forms reliability is to produce a large set of questions to evaluate the same thing, then divide these randomly into two question sets.

The same group of respondents answers both sets, and you calculate the correlation between the results. High correlation between the two indicates high parallel forms reliability.

Parallel forms reliability example

A set of questions is formulated to measure financial risk aversion in a group of respondents. The questions are randomly divided into two sets, and the respondents are randomly divided into two groups. Both groups take both tests: group A takes test A first, and group B takes test B first. The results of the two tests are compared, and the results are almost identical, indicating high parallel forms reliability.

Improving parallel forms reliability

  • Ensure that all questions or test items are based on the same theory and formulated to measure the same thing.

Internal consistency assesses the correlation between multiple items in a test that are intended to measure the same construct.

You can calculate internal consistency without repeating the test or involving other researchers, so it’s a good way of assessing reliability when you only have one data set.

When you devise a set of questions or ratings that will be combined into an overall score, you have to make sure that all of the items really do reflect the same thing. If responses to different items contradict one another, the test might be unreliable.

Two common methods are used to measure internal consistency.

  • Average inter-item correlation : For a set of measures designed to assess the same construct, you calculate the correlation between the results of all possible pairs of items and then calculate the average.
  • Split-half reliability : You randomly split a set of measures into two sets. After testing the entire set on the respondents, you calculate the correlation between the two sets of responses.

Internal consistency example

A group of respondents are presented with a set of statements designed to measure optimistic and pessimistic mindsets. They must rate their agreement with each statement on a scale from 1 to 5. If the test is internally consistent, an optimistic respondent should generally give high ratings to optimism indicators and low ratings to pessimism indicators. The correlation is calculated between all the responses to the “optimistic” statements, but the correlation is very weak. This suggests that the test has low internal consistency.
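Here is a minimal sketch of the first method, the average inter-item correlation, using invented 1-to-5 agreement ratings for four optimism items:

```python
import numpy as np

# Hypothetical 1-5 agreement ratings: rows = respondents, columns = optimism items
responses = np.array([
    [5, 4, 5, 4],
    [2, 2, 3, 2],
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 4, 5],
    [1, 2, 2, 1],
])

corr = np.corrcoef(responses, rowvar=False)        # item-by-item correlation matrix
off_diag = corr[np.triu_indices_from(corr, k=1)]   # correlations for unique item pairs
print(f"Average inter-item correlation: {off_diag.mean():.2f}")
```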

Improving internal consistency

  • Take care when devising questions or measures: those intended to reflect the same concept should be based on the same theory and carefully formulated.


It’s important to consider reliability when planning your research design , collecting and analyzing your data, and writing up your research. The type of reliability you should calculate depends on the type of research  and your  methodology .

What is my methodology, and which form of reliability is relevant?

  • Measuring a property that you expect to stay the same over time: test-retest.
  • Multiple researchers making observations or ratings about the same topic: interrater.
  • Using two different tests to measure the same thing: parallel forms.
  • Using a multi-item test where all the items are intended to measure the same variable: internal consistency.

If possible and relevant, you should statistically calculate reliability and state this alongside your results .

Frequently asked questions about types of reliability
Reliability and validity are both about how well a method measures something:

  • Reliability refers to the consistency of a measure (whether the results can be reproduced under the same conditions).
  • Validity refers to the accuracy of a measure (whether the results really do represent what they are supposed to measure).

If you are doing experimental research, you also have to consider the internal and external validity of your experiment.

You can use several tactics to minimize observer bias .

  • Use masking (blinding) to hide the purpose of your study from all observers.
  • Triangulate your data with different data collection methods or sources.
  • Use multiple observers and ensure interrater reliability.
  • Train your observers to make sure data is consistently recorded between them.
  • Standardize your observation procedures to make sure they are structured and clear.

Reproducibility and replicability are related terms.

  • A successful reproduction shows that the data analyses were conducted in a fair and honest manner.
  • A successful replication shows that the reliability of the results is high.

Research bias affects the validity and reliability of your research findings , leading to false conclusions and a misinterpretation of the truth. This can have serious implications in areas like medical research where, for example, a new form of treatment may be evaluated.

J Family Med Prim Care. 2015 Jul-Sep; 4(3).

Validity, reliability, and generalizability in qualitative research

Lawrence Leung

1 Department of Family Medicine, Queen's University, Kingston, Ontario, Canada

2 Centre of Studies in Primary Care, Queen's University, Kingston, Ontario, Canada

In general practice, qualitative research contributes as significantly as quantitative research, in particular regarding psycho-social aspects of patient care, health services provision, policy setting, and health administration. In contrast to quantitative research, qualitative research as a whole has been constantly critiqued, if not disparaged, for the lack of consensus on assessing its quality and robustness. This article illustrates with five published studies how qualitative research can impact and reshape the discipline of primary care, spiraling out from clinic-based health screening to community-based disease monitoring, from evaluation of out-of-hours triage services to a provincial psychiatric care pathways model and, finally, to national legislation of core measures for children's healthcare insurance. Fundamental concepts of validity, reliability, and generalizability as applicable to qualitative research are then addressed, with an update on the current views and controversies.

Nature of Qualitative Research versus Quantitative Research

The essence of qualitative research is to make sense of and recognize patterns among words in order to build up a meaningful picture without compromising its richness and dimensionality. Like quantitative research, qualitative research aims to seek answers to questions of “how, where, when, who and why” with a perspective to build a theory or refute an existing one. Unlike quantitative research, which deals primarily with numerical data and their statistical interpretation under a reductionist, logical and strictly objective paradigm, qualitative research handles nonnumerical information and its phenomenological interpretation, which inextricably ties in with human senses and subjectivity. While human emotions and perspectives from both subjects and researchers are considered undesirable biases confounding results in quantitative research, the same elements are considered essential and inevitable, if not treasurable, in qualitative research, as they invariably add extra dimensions and color to enrich the corpus of findings. However, the issue of subjectivity and contextual ramifications has fueled incessant controversies regarding yardsticks for the quality and trustworthiness of qualitative research results in healthcare.

Impact of Qualitative Research upon Primary Care

In many ways, qualitative research contributes significantly, if not more so than quantitative research, to the field of primary care at various levels. Five qualitative studies are chosen to illustrate how various methodologies of qualitative research helped in advancing primary healthcare, from novel monitoring of chronic obstructive pulmonary disease (COPD) via mobile-health technology,[1] informed decision-making for colorectal cancer screening,[2] triaging out-of-hours GP services,[3] evaluating care pathways for community psychiatry,[4] and finally prioritization of healthcare initiatives for legislation purposes at the national level.[5]

With the recent advances in information technology and mobile connected devices, self-monitoring and management of chronic diseases via tele-health technology may seem beneficial to both the patient and the healthcare provider. Recruiting COPD patients who were given tele-health devices that monitored lung function, Williams et al.[1] conducted phone interviews and analyzed their transcripts via a grounded theory approach, identifying themes which enabled them to conclude that such a mobile-health setup and application helped to engage patients, with better adherence to treatment and overall improvement in mood. Such positive findings were in contrast to previous studies, which opined that elderly patients were often challenged by operating computer tablets[6] or by conversing with the tele-health software.[7]

To explore the content of recommendations for colorectal cancer screening given out by family physicians, Wackerbarth et al.[2] conducted semi-structured interviews with subsequent content analysis and found that most physicians delivered information to enrich patient knowledge with little regard to patients’ true understanding, ideas, and preferences in the matter. These findings suggested room for improvement for family physicians to better engage their patients in recommending preventative care.

Faced with various models of out-of-hours triage services for GP consultations, Egbunike et al.[3] conducted thematic analysis on semi-structured telephone interviews with patients and doctors in various urban, rural and mixed settings. They found that the efficiency of triage services remained a prime concern for both users and providers, among issues of access to doctors and unfulfilled or mismatched expectations from users, which could arouse dissatisfaction and have legal implications.

In the UK, a care pathways model for community psychiatry had been introduced, but its benefits were unclear. Khandaker et al.[4] hence conducted a qualitative study using semi-structured interviews with medical staff and other stakeholders; adopting a grounded-theory approach, major themes emerged which included improved equality of access, more focused logistics, increased work throughput and better accountability for community psychiatry provided under the care pathways model.

Finally, at the US national level, Mangione-Smith et al.[5] employed a modified Delphi method to gather consensus from a panel of nominators who were recognized experts and stakeholders in their disciplines, and identified a core set of quality measures for children's healthcare under the Medicaid and Children's Health Insurance Program. These core measures were made transparent for public opinion and later passed on for full legislation, illustrating the impact of qualitative research upon social welfare and policy improvement.

Overall Criteria for Quality in Qualitative Research

Given the diverse genera and forms of qualitative research, there is no consensus on how to assess any piece of qualitative research work. Various approaches have been suggested, the two leading schools of thought being that of Dixon-Woods et al.,[8] which emphasizes methodology, and that of Lincoln et al.,[9] which stresses the rigor of interpretation of results. By identifying commonalities of qualitative research, Dixon-Woods produced a checklist of questions for assessing the clarity and appropriateness of the research question; the description of, and appropriateness of, the sampling, data collection and data analysis; the levels of support and evidence for claims; the coherence between data, interpretation and conclusions; and finally the level of contribution of the paper. These criteria inform the 10 questions of the Critical Appraisal Skills Programme checklist for qualitative studies.[10] However, these methodology-weighted criteria may not do justice to qualitative studies that differ in epistemological and philosophical paradigms,[11, 12] one classic example being positivist versus interpretivist.[13] Equally, without a robust methodological layout, the rigorous interpretation of results advocated by Lincoln et al.[9] will not suffice either. Meyrick[14] argued from a different angle and proposed fulfillment of the dual core criteria of “transparency” and “systematicity” for good quality qualitative research. In brief, every step of the research logistics (from theory formation, design of study, sampling, data acquisition and analysis to results and conclusions) has to be validated as transparent or systematic enough. In this manner, both the research process and the results can be assured of high rigor and robustness.[14] Finally, Kitto et al.[15] epitomized six criteria for assessing the overall quality of qualitative research: (i) clarification and justification, (ii) procedural rigor, (iii) sample representativeness, (iv) interpretative rigor, (v) reflexive and evaluative rigor and (vi) transferability/generalizability, which also double as evaluative landmarks for manuscript review for the Medical Journal of Australia. As with quantitative research, quality in qualitative research can be assessed in terms of validity, reliability, and generalizability.

Validity

Validity in qualitative research means “appropriateness” of the tools, processes, and data: whether the research question is valid for the desired outcome, whether the choice of methodology is appropriate for answering the research question, whether the design is valid for the methodology, whether the sampling and data analysis are appropriate, and finally whether the results and conclusions are valid for the sample and context. In assessing the validity of qualitative research, the challenge can start from the ontology and epistemology of the issue being studied, e.g. the concept of “individual” is seen differently by humanistic and positive psychologists due to differing philosophical perspectives:[16] where humanistic psychologists believe the “individual” is a product of existential awareness and social interaction, positive psychologists think the “individual” exists side-by-side with the formation of any human being. Set off on different pathways, qualitative research regarding the individual's wellbeing will be concluded with varying validity. The choice of methodology must enable detection of findings/phenomena in the appropriate context for it to be valid, with due regard to cultural and contextual variability. For sampling, procedures and methods must be appropriate for the research paradigm and be distinguished between systematic,[17] purposeful[18] or theoretical (adaptive) sampling,[19, 20] where systematic sampling has no a priori theory, purposeful sampling often has a certain aim or framework, and theoretical sampling is molded by the ongoing process of data collection and theory in evolution. For data extraction and analysis, several methods have been adopted to enhance validity, including first-tier triangulation (of researchers) and second-tier triangulation (of resources and theories),[17, 21] a well-documented audit trail of materials and processes,[22, 23, 24] multidimensional analysis as concept- or case-orientated[25, 26] and respondent verification.[21, 27]

Reliability

In quantitative research, reliability refers to exact replicability of the processes and the results. In qualitative research, with its diverse paradigms, such a definition of reliability is challenging and epistemologically counter-intuitive. Hence, the essence of reliability for qualitative research lies with consistency.[24, 28] A margin of variability in results is tolerated in qualitative research provided the methodology and epistemological logistics consistently yield data that are ontologically similar but may differ in richness and ambience within similar dimensions. Silverman[29] proposed five approaches to enhancing the reliability of process and results: refutational analysis, constant data comparison, comprehensive data use, inclusion of the deviant case and use of tables. As data are extracted from the original sources, researchers must verify their accuracy in terms of form and context with constant comparison,[27] either alone or with peers (a form of triangulation).[30] The scope and analysis of the data included should be as comprehensive and inclusive as possible, with reference to quantitative aspects where feasible.[30] Adopting the Popperian dictum of falsifiability as the essence of truth and science, attempts to refute the qualitative data and analyses should be made to assess reliability.[31]

Generalizability

Most qualitative research studies, if not all, are meant to study a specific issue or phenomenon in a certain population or ethnic group, of a focused locality in a particular context; hence generalizability of qualitative research findings is usually not an expected attribute. However, with the rising trend of knowledge synthesis from qualitative research via meta-synthesis, meta-narrative or meta-ethnography, evaluation of generalizability becomes pertinent. A pragmatic approach to assessing generalizability for qualitative studies is to adopt the same criteria as for validity: that is, use of systematic sampling, triangulation and constant comparison, proper audit and documentation, and multi-dimensional theory.[17] However, some researchers espouse the approach of analytical generalization,[32] where one judges the extent to which the findings in one study can be generalized to another under a similar theoretical framework, and the proximal similarity model, where generalizability of one study to another is judged by similarities between the time, place, people and other social contexts.[33] That said, Zimmer[34] questioned the suitability of meta-synthesis in view of the basic tenets of grounded theory,[35] phenomenology[36] and ethnography.[37] He concluded that any valid meta-synthesis must retain the other two goals of theory development and higher-level abstraction while in search of generalizability, and must be executed as a third-level interpretation using Gadamer's concepts of the hermeneutic circle,[38, 39] dialogic process[38] and fusion of horizons.[39] Finally, Toye et al.[40] reported the practicality of using “conceptual clarity” and “interpretative rigor” as intuitive criteria for assessing quality in meta-ethnography, which somehow echoed Rolfe's controversial aesthetic theory of research reports.[41]

Food for Thought

Despite various measures to enhance or ensure the quality of qualitative studies, some researchers have opined from a purist ontological and epistemological angle that qualitative research is not a unified field but an ipso facto diverse one,[8] hence any attempt to synthesize or appraise different studies under one system is impossible and conceptually wrong. Barbour argued from a philosophical angle that these special measures or “technical fixes” (like purposive sampling, multiple coding, triangulation, and respondent validation) can never confer the rigor as conceived.[11] In extremis, Rolfe et al., writing from the field of nursing research, opined that any set of formal criteria used to judge the quality of qualitative research is futile and without validity, and suggested that any qualitative report should be judged by the form in which it is written (aesthetic) and not by the contents (epistemic).[41] Rolfe's novel view is rebutted by Porter,[42] who argued via logical premises that two of Rolfe's fundamental statements were flawed: (i) “the content of research reports is determined by their forms” may not be a fact, and (ii) research appraisal being “subject to individual judgment based on insight and experience” would mean that those without sufficient experience of performing research would be unable to judge adequately – hence an elitist principle. From a realism standpoint, Porter then proposed multiple and open approaches to validity in qualitative research that incorporate parallel perspectives[43, 44] and diversification of meanings.[44] Any work of qualitative research, when read, is always a two-way interactive process, such that validity and quality have to be judged by the receiving end too and not by the researcher end alone.

In summary, the three gold criteria of validity, reliability and generalizability apply in principle to assessing quality for both quantitative and qualitative research; what differs is the nature and type of processes that ontologically and epistemologically distinguish the two.

Source of Support: Nil.

Conflict of Interest: None declared.


Validity Vs. Reliability: What’s The Difference?


The difference between validity and reliability is important in research, testing, and statistical analysis. Both are used to determine how well a test measures something, but the two of them tell you different things about your test. Validity is all about accuracy in your measurements, while reliability determines consistency. Ideally, you want your equipment to be both reliable and valid – or consistent and accurate – be it a thermometer, questionnaire, or scale.

Key Takeaways:

  • If a measurement is accurate, then it’s valid. If a measurement is consistent, then it’s reliable.
  • Validity is essential in all types of testing. If your results are skewed, then your conclusion is likely to be as well.
  • Reliability is also important. If your instruments for collecting data don’t produce reliable results, you can’t draw any conclusions.
  • Test results can’t be valid if they aren’t reliable. If you keep getting different results from measurements under the same conditions, then it’s neither reliable nor correct.
  • A tool can have reliable measurements that aren’t valid. If a radar gun isn’t properly calibrated, it may register 50 mph for every car that goes by at 35 mph. It’s reliable, but it isn’t valid.
  • There are three major types of determinations of validity: criterion, content, and construct. There are four major types of determinations of reliability: test-retest, inter-rater, parallel forms, and internal consistency.

What is Validity?

Validity is the measure of whether or not your test is accurate. If you have a ten-pound weight and your scale reads it as ten pounds, then it’s valid. Valid test results need not be consistent as long as they’re accurate. If the conditions change – even if you’re unaware of them – then you should get a different measurement.

Hard measurements – such as weight, temperature, and pH – aren’t the only type of measurements that require determining validity. Validity is also assessed in medicine and psychology to determine how useful surveys and questionnaires are.

For instance, a questionnaire created to determine if a person has a type of illness is valid if the answers predict whether or not the patient suffers from that disease. And if it’s valid, it can be a useful tool for diagnosis.

Of course, validity isn’t quite as simple as that. There are three major types of validity that are referenced in tests.

Criterion Validity. This determines whether or not the test fits the criteria. To put it plainly, it’s whether or not it stacks up to other valid measurements of the same thing.

Construct Validity. Does this test measure what it’s meant to measure? If you want to measure someone’s reading comprehension and instead design a test that is a great indicator of short-term memory, it’s not valid.

Content Validity. Closely related to (though not the same as) face validity, this measures whether or not the test adequately covers what you’re attempting to measure. For instance, if it’s a test to determine comprehension of a subject in a course, it should cover all the key knowledge learned in the course.

As with most things in studies, validity isn’t a hard measure. Most studies have a sliding scale of validity, and they try to get it as close to the top as reasonably possible, but it’s essentially impossible to have something that’s truly, completely valid.

What is Reliability?

Reliability is the measure of the consistency of your instruments. If a weight put on a scale consistently comes up as ten pounds, then your scale is reliable. It should be noted that the weight in question doesn’t need to weigh ten pounds. If it’s a five-pound weight and the scale is off by five pounds, but it comes up with the same answer every time, it’s still reliable.

As with validity, there are different types of ways to determine reliability.

Test-retest reliability. This determination is exactly what it sounds like. Tests are conducted multiple different times in order to determine the reliability of the results. This is best for something like temperature under similar conditions – something that isn’t going to change.

Parallel forms reliability. With this one, they use different tests that are designed to be equivalent to one another. Sometimes this is also done with split-half reliability, where the test is split into two pieces, and those are compared.

Internal consistency reliability. This is often used in personality tests, where the questions are related to what you’re trying to determine. In personality tests, they will even ask multiple similar questions in order to help determine reliability.

Inter-rater reliability. For this type of reliability, different people score or rate the same test, observation, or piece of work, and their results are compared. The related idea of replication, where another person runs a similar or identical study to make sure the results hold up, is the basis of many serious studies.

Like validity, reliability isn’t binary in most studies. The goal is to try to get as high a level of reliability as possible. The idea of limited reliability is seen most often in polling – there’s always a listed margin of error. If the margin of error is large enough, it also calls into question the validity.
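When ratings are categorical, inter-rater reliability is often summarized with an agreement statistic such as Cohen's kappa, which corrects raw agreement for the agreement expected by chance. Below is a minimal sketch; the ratings and the use of scikit-learn are illustrative assumptions rather than part of any study described above.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical pass/fail ratings from two raters scoring the same ten essays.
rater_1 = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail", "pass", "pass"]
rater_2 = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "fail", "pass", "pass"]

kappa = cohen_kappa_score(rater_1, rater_2)
print(f"Cohen's kappa: {kappa:.2f}")  # 1.0 = perfect agreement, 0 = chance-level agreement
```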

Validity vs. Reliability FAQ

What is the relationship between validity and reliability?

The relationship between validity and reliability is that they’re both used to determine the efficacy of a test or a study. Validity determines whether or not it’s accurate, while reliability determines whether or not the results are consistent.

What are examples of reliability and validity?

An example of validity is a poll accurately predicting whether or not a candidate will win reelection. An example of reliability is that a poll gets similar results from similar parts of the electorate.

Can something be valid but not reliable?

No, something can’t be valid but not reliable. If your results aren’t reliable, they’re inherently not valid. Validity is accuracy, so if your results aren’t consistent in similar conditions, they can’t possibly be accurate. However, something can be reliable but not valid.

How do you measure reliability and validity?

There are several different ways to measure reliability and validity. For reliability, the best way to do it is to repeat the test multiple times in order to make sure you get the same results. For validity, it’s best to try to compare to other similar results that you know are valid.


Reliability vs. Validity: A Comparison for Research Study


Reliability vs. validity – the differences between them are not always that clear for new English learners and researchers. These two concepts are essential to consider when designing and conducting research studies. While they both relate to the quality of data, they have distinct differences that are important to understand. In this article, we will discuss the differences between reliability vs validity and provide examples to help clarify these concepts.

In the following sections, we will delve deeper into these concepts and provide more examples to help you understand the differences between reliability and validity.

Reliability vs. Validity: Definitions

Understanding Reliability

Definition of Reliability

Reliability refers to the consistency and stability of results obtained from a measurement tool or instrument. In other words, it is the degree to which a measurement tool produces consistent results over time and across different settings.

For example, if a student takes an English grammar test and scores 80% on the first attempt and 85% on the second attempt, we can say that the test is reliable because the scores are consistent. However, if the student scores 80% on the first attempt and 95% on the second attempt, we can say that the test is not reliable because the scores are not consistent.

Importance of Reliability in English Grammar and Writing

Reliability is important in English grammar and writing because it ensures that the results obtained from a measurement tool are accurate and trustworthy. In other words, if a measurement tool is reliable, we can be confident that the results obtained from it are not due to chance or random error.

For example, if a teacher uses a grammar test to assess the writing skills of their students, they need to be sure that the test is reliable. If the test is not reliable, the teacher may get inconsistent results, which may lead to incorrect conclusions about the writing skills of their students.

To ensure reliability in English grammar and writing, it is important to use measurement tools that have been validated and standardized. This means that the tools have been tested and proven to produce consistent and accurate results.

In summary, reliability is an important concept in English grammar and writing because it ensures that the results obtained from a measurement tool are accurate and trustworthy. It is important to use validated and standardized measurement tools to ensure reliability.

Understanding Validity

Definition of Validity

Validity refers to the accuracy of a measure in assessing what it is intended to measure. In other words, validity measures whether a test or assessment is measuring what it is supposed to measure. It is a crucial concept in English grammar and writing as it ensures that the assessment or test is measuring the intended construct.

For instance, if a writing assessment is designed to measure a student’s ability to write an argumentative essay, then the validity of the assessment would depend on whether it measures the student’s ability to write an argumentative essay and not any other writing skills.

Significance of Validity in English Grammar and Writing

Validity is significant in English grammar and writing as it ensures that the assessment or test is measuring the intended construct. For example, in language proficiency tests, validity ensures that the test measures the intended language skills. Without validity, the test results may not accurately reflect the student’s language proficiency, leading to inaccurate placement in language courses.

In writing, validity ensures that the assessment measures the intended writing skills. For instance, if a writing assessment is designed to measure a student’s ability to write a persuasive essay, then the validity of the assessment would depend on whether it measures the student’s ability to write a persuasive essay and not any other writing skills.

Table: Reliability vs. Validity

  • Reliability: consistency of a measure. Validity: accuracy of a measure.
  • Reliability measures consistency in results. Validity measures whether a test or assessment is measuring what it is supposed to measure.
  • A reliable measure may not be valid. A valid measure is always reliable.

In conclusion, understanding validity is crucial in English grammar and writing as it ensures that the assessment or test measures the intended construct. It is essential to consider validity when designing assessments or tests to ensure that they accurately measure what they are intended to measure.

Reliability vs. Validity: A Comparison in Detail

When it comes to language testing and writing assessment, reliability and validity are two terms that are often used interchangeably. However, they are distinct concepts that have different implications for assessment. In this section, we will explore the differences between reliability and validity in language testing and writing assessment.

Reliability vs. Validity in Language Testing

Reliability in language testing refers to the consistency of test scores across different administrations of the same test. A reliable test produces consistent scores, regardless of when or where it is administered. For example, if a student takes a language proficiency test twice and scores the same both times, the test can be considered reliable.

Validity, on the other hand, refers to the degree to which a test measures what it is intended to measure. In language testing, validity is concerned with whether a test accurately measures a student’s language skills. For example, a language proficiency test that only measures reading skills may not be valid for assessing a student’s overall language proficiency.

To illustrate the difference between reliability and validity in language testing, consider the following example. Suppose a language proficiency test is administered to a group of students twice, with a one-week interval between the two administrations. If the test produces consistent scores both times, it can be considered reliable. However, if the test only measures reading skills and not speaking or writing skills, it may not be valid for assessing a student’s overall language proficiency.
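The test-retest check described in this example is typically quantified by correlating the scores from the two administrations. A minimal Python sketch with hypothetical scores (the numbers are invented purely for illustration):

```python
import numpy as np

# Hypothetical proficiency scores for the same eight students, one week apart.
week_1 = np.array([72, 85, 60, 91, 78, 66, 88, 74])
week_2 = np.array([70, 87, 62, 90, 80, 64, 86, 75])

test_retest_r = np.corrcoef(week_1, week_2)[0, 1]
print(f"Test-retest reliability: {test_retest_r:.2f}")  # values near 1 indicate consistent scores
```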

Reliability vs. Validity in Writing Assessment

Reliability in writing assessment refers to the consistency of scores assigned to the same piece of writing by different raters. A reliable writing assessment produces consistent scores, regardless of who is doing the scoring. For example, if two raters score the same piece of writing and assign the same score, the assessment can be considered reliable.

Validity, in writing assessment, refers to the degree to which a writing assessment measures what it is intended to measure. In writing assessment, validity is concerned with whether an assessment accurately measures a student’s writing skills. For example, a writing assessment that only measures grammar and spelling may not be valid for assessing a student’s overall writing ability.

To illustrate the difference between reliability and validity in writing assessment, consider the following example. Suppose a writing assessment is administered to a group of students, and two raters score the same piece of writing. If the raters assign the same score, the assessment can be considered reliable. However, if the assessment only measures grammar and spelling and not organization or coherence, it may not be valid for assessing a student’s overall writing ability.

In summary, reliability and validity are two distinct concepts that are essential for assessing language skills and writing ability. While reliability refers to consistency, validity refers to accuracy. It is important to consider both reliability and validity when designing language tests and writing assessments to ensure that they produce accurate and consistent results.

Reliability vs. Validity: Common Misconceptions

When it comes to research, reliability and validity are two essential concepts that are often misunderstood. In this section, we will address some of the common misconceptions about these two concepts.

Misconception 1: Reliability and validity are the same thing

One of the most common misconceptions about reliability and validity is that they are interchangeable terms. However, reliability refers to the consistency of a measure, while validity refers to the accuracy of a measure. In other words, reliability is about how consistently a measure produces the same results, while validity is about whether the measure is actually measuring what it is supposed to measure.

Misconception 2: A measure can be valid but not reliable

Another common misconception is that a measure can be valid without being reliable. While it is true that a measure can be reliable without being valid, it is not possible for a measure to be valid without also being reliable. This is because a measure that is not consistent cannot be accurate.

Misconception 3: Reliability and validity are only important in quantitative research

While reliability and validity are often discussed in the context of quantitative research, they are also important in qualitative research. In qualitative research, reliability refers to the consistency of the findings, while validity refers to the accuracy of the interpretations.

Misconception 4: Reliability and validity are fixed properties of a measure

Finally, it is important to understand that reliability and validity are not fixed properties of a measure. Instead, they are influenced by a variety of factors, including the context in which the measure is used and the population being studied. It is important to consider these factors when evaluating the reliability and validity of a measure.

In summary, reliability and validity are two important concepts in research that are often misunderstood. By understanding these concepts and addressing common misconceptions, researchers can ensure that their findings are accurate and reliable.

How to Improve Reliability and Validity in English Grammar and Writing

When it comes to English grammar and writing, ensuring both reliability and validity is crucial. Here are a few tips to help you improve both factors in your writing:

Reliability

  • Use consistent grammar rules: Stick to a set of grammar rules throughout your writing, and avoid changing them frequently. This will help ensure consistency and reliability in your writing.
  • Proofread your work: Go through your writing multiple times to check for errors and inconsistencies. This will help you catch any mistakes and improve the reliability of your writing.
  • Use reliable sources: When researching for your writing, make sure to use reliable sources that are accurate and trustworthy. This will help ensure the reliability of your information.

Validity

  • Use relevant examples: When making a point or argument in your writing, use relevant examples to support your claims. This will help improve the validity of your writing.
  • Check your facts: Make sure to fact-check any information you include in your writing. This will help ensure the validity of your writing.
  • Avoid exaggeration: Avoid making exaggerated or false claims in your writing. Stick to the facts and use evidence to support your claims. This will help improve the validity of your writing.

In conclusion, improving both reliability and validity in your English grammar and writing is important for creating high-quality content. By following these tips, you can ensure that your writing is both reliable and valid.

In conclusion, understanding the differences between reliability and validity is crucial in conducting research. While both concepts are important, they serve different purposes. Reliability refers to the consistency and stability of a measure or research findings, while validity refers to the accuracy of a measure in measuring what it claims to measure.

To summarize, reliability is about the consistency of data over time, while validity is about whether the data is measuring what it is supposed to measure. Researchers should consider both reliability and validity when designing their research methods and collecting data.

It is important to note that reliability does not guarantee validity. A measure can be reliable but not valid, whereas a measure cannot be valid without also being reliable. Therefore, it is crucial to assess both reliability and validity to ensure that the data collected are both accurate and consistent.

To help differentiate between reliability and validity, the following table provides a comparison of the two concepts:

  • Reliability refers to the consistency and stability of data; validity refers to the accuracy of data.
  • Reliability can be measured through test-retest, inter-rater, or internal consistency methods; validity can be measured through content, criterion, or construct approaches.
  • A measure can be reliable but not valid; a valid measure must also be reliable.
  • Reliability is important in quantitative research; validity is important in both quantitative and qualitative research.

In conclusion, understanding the differences between reliability and validity is crucial in conducting research. By considering both concepts, researchers can ensure that the data collected is both accurate and consistent, leading to more reliable and valid research findings.

Frequently Asked Questions

What is the difference between reliability and validity in research?

Reliability refers to the consistency of a measure, while validity refers to the accuracy of a measure. In other words, reliability is about whether a measurement produces consistent results over time and across different situations, while validity is about whether a measurement actually measures what it’s supposed to measure.

How can you differentiate between reliability and validity with examples?

One way to differentiate between validity and reliability is to think of a dartboard. If you’re trying to hit the bullseye, reliability is about whether you can hit the same spot repeatedly, while validity is about whether you’re actually hitting the bullseye or not. For example, if you’re consistently hitting the same spot on the board, that’s reliable, but if that spot is nowhere near the bullseye, that’s not valid.

What is the importance of validity and reliability in research?

Validity and reliability are important because they help ensure that research findings are accurate and trustworthy. Without validity and reliability, researchers can’t be sure that their measurements are actually measuring what they’re supposed to be measuring, or that their findings are consistent and repeatable.

What is content validity and how does it differ from construct validity?

Content validity is about whether a measure covers all aspects of a particular concept or construct. For example, if you’re measuring intelligence, content validity would mean that your measure covers all aspects of intelligence, not just one or two. Construct validity, on the other hand, is about whether a measure actually measures the construct it’s supposed to be measuring. For example, if you’re measuring intelligence, construct validity would mean that your measure is actually measuring intelligence, not something else.

What are inter-rater reliability and internal consistency, and how do they relate to validity and reliability?

Inter-rater reliability is about whether different raters or observers can agree on their measurements. For example, if you’re measuring the quality of a restaurant, inter-rater reliability would mean that different people who rate the restaurant would come up with similar ratings. Internal consistency, on the other hand, is about whether different items in a measure are consistent with each other. For example, if you’re measuring depression, internal consistency would mean that different items in your measure (such as “feeling sad” and “loss of interest”) are consistent with each other. Both inter-rater reliability and internal consistency are important for ensuring the validity and reliability of a measure.

How do you determine the validity and reliability of an article?

To determine the validity and reliability of an article, you should look for information about the methods used to collect and analyze data, as well as any measures used in the study. You can also look for information about the sample size and characteristics, as well as any potential biases or limitations in the study. Additionally, you can look for information about the study’s findings and whether they are consistent with other research in the field. By considering all of these factors, you can get a better sense of the validity and reliability of the article.

Reliability vs Validity in Research: Main Differences


Reliability and validity are two important concepts in research design that are used to assess the quality of research results.

  • Reliability refers to the consistency of research findings over time or across different studies. Research is considered reliable if it produces consistent outcomes when repeated under similar conditions.
  • Validity means the accuracy or truthfulness of research findings. A valid study measures what it is supposed to measure and its results can be applied to the population of interest.

Why is it important to know the difference between reliability vs validity? Conducting complex research typically requires some preparation, particularly to evaluate your data collection and analysis methods. Do they produce correct results? Are they applicable to this subject? Both validity and reliability values make it possible to quickly evaluate how well your research approach works in a particular case. Specific techniques like test-retest help to calculate the correlation between the results of subsequent measurements and thus show whether those results are reliable. Checking how well the results correspond to what they are supposed to measure helps determine whether they are valid. If you want to learn more about these two major parameters, let’s get into the details together!

Reliability vs Validity: Definition

To better explain validity vs reliability, we need to start with the basics. In fact, there is a strong relationship between these two parameters, as both are elements of quality. However, it can happen that your assessment method appears to provide valid results at first, but its reliability turns out to be low because you cannot achieve consistency when using it again. So, let’s dive into the details.

What Does Valid Mean: Definition

Let’s start with the meaning of validity. It is a quality parameter that shows how accurately a measurement is performed. If the test results match the expected values or correspond to other properties of the subject or the surrounding environment, they are most probably valid. The point of this parameter is that it indicates whether it is safe to make assumptions based on the results of a measurement. The main types of validity are:

  • content
  • criterion
  • construct.

What Is Reliability: Definition

As for a reliability definition , it is a parameter that indicates consistency of a tool or a method. In case it repeatedly produces the same or similar results we can call it reliable, meaning that it does not degrade as time passes. The goal of a researcher who measures some values again and again is to understand whether the tool in question can be safely reused. Main types of reliability are:

  • test-retest
  • internal consistency.

Reliable vs Valid: What Is the Difference

To understand the concept of reliability vs validity, just keep in mind that they represent different aspects of quality and evaluate measurement results from different angles. The first indicates whether an assessment tool works properly under different conditions and after being used repeatedly. The validity level of this tool shows whether it is able to measure the right thing at all. Both these parameters are crucial for ensuring the quality of a research project and the mark it earns, regardless of the academic field it belongs to. Let’s see how to use them and how exactly they can help with a dissertation or other research.

Reliability and Validity: How to Use in Your Research

The validity and reliability of your results indicate the quality level of your research. They show whether your results can be trusted, whether they are useful, and whether they support your statements as intended. So, you should use these parameters in order to create a strong research design, ensuring that all your methods, samples, and other components are appropriate. These parameters are equally crucial for in-depth scientific research and for student-level work. So, let’s dig deeper and find out how to use both of them in research.

Validity in Research

In general, validity and reliability in research are to be used together to ensure you can reach your research goals. When it comes to ensuring validity, it is often recommended to do so at the earlier stages of your research. When you work on your research design, and particularly when you decide how you will collect your data, you can verify the available methods to see whether they are helpful in your particular case. Once you ensure they are valid, you can proceed to evaluating their reliability. Otherwise, it would hardly be useful to have reliable methods that consistently provide incorrect results.

Reliability in Research

Speaking about validity vs reliability in research, it is important to understand that checking your methods only once, after a first run, is not enough. Depending on specific conditions, their performance may change at later steps, so it is highly recommended to keep verifying their consistency. You need to consider the reliability of your tools and methods throughout the entire data collection process. The more you invest in this verification, the more confidence you will have in the quality of your overall work.

Reliability vs Validity: Examples

Finally, let’s review some reliability vs validity examples. This will help to illustrate the meaning and usage of both these concepts in case you still have any questions after reading the explanations above.

Validity vs Reliability: Key Takeaways

So, we have learned about the concepts of reliable vs valid approaches in research. This article has covered the most important elements: the meaning of both these quality parameters, their main differences and their usage in research projects.


Example: a longitudinal study

Let’s suppose that a group of a local mall’s consumers is monitored by a research team for several years. Their shopping habits and preferences are examined by conducting surveys. If their responses do not change significantly over time, this indicates high reliability of this approach. Alternatively, if different researchers conducting the survey on this group’s subsections also get correlated results, it is safe to assume that these tests are reliable. Now let’s suppose that at some point it becomes clear that some questions in the survey contain mistakes and aren’t actually collecting the data that is needed. In this case, this approach is invalid despite the tests being consistent. It is necessary to ensure validity at the start of the research to avoid such outcomes.

Validity and Reliability: Frequently Asked Questions

1. Where do you write about reliability and validity in a thesis?

You may write about reliability and validity in various sections of your thesis or dissertation, as it depends on your work’s structure. However, it would be best to include these evaluations to the part where you describe your research design. You need to explain how you will assess the quality of your approach and your results before you conduct actual research steps and make conclusions about your topic.

2. Can something be valid but not reliable?

A measure can be valid at first but lose reliability if it stops returning consistent results later on, for instance because circumstances change. The reverse is also possible: a measure can be reliable without being valid if it measures something very consistently but measures the wrong construct the whole time. So although the two properties are closely connected, they do not always go together.

3. Is reliability necessary for validity?

In most cases we cannot call a test valid if it is not reliable; test score reliability is effectively a component of validity. However, a researcher must remember that verifying reliability is not enough on its own, and additional checks are needed to establish the validity of a set of tests. The two criteria cannot replace one another.

4. What does it mean that reliability is necessary but not sufficient for validity?

Reliability is a component of validity: a test that produces errors or collects inappropriate data at some point cannot be considered valid. At the same time, the overall reliability of a set of tests is not sufficient to assume their validity, because they might deliver the wrong result consistently. That would make them reliable, yet they would simply repeat the same error again and again.
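A minimal simulation can make this distinction concrete. The sketch below is not from the original text; it uses a hypothetical miscalibrated thermometer with made-up noise levels purely for illustration: one instrument is highly consistent but systematically wrong (reliable, not valid), the other is unbiased on average but very noisy (closer to valid, not reliable).

```python
import numpy as np

rng = np.random.default_rng(42)
true_value = 37.0   # hypothetical true body temperature in degrees C
n_repeats = 1000

# Instrument A: very consistent but miscalibrated -> reliable, not valid
consistent_biased = true_value + 1.5 + rng.normal(0, 0.05, n_repeats)

# Instrument B: unbiased on average but very noisy -> valid on average, not reliable
noisy_unbiased = true_value + rng.normal(0, 1.5, n_repeats)

for name, readings in [("consistent but biased", consistent_biased),
                       ("unbiased but noisy", noisy_unbiased)]:
    print(f"{name}: mean error = {readings.mean() - true_value:+.2f}, "
          f"spread (SD) = {readings.std(ddof=1):.2f}")
```

The first instrument's tiny spread reflects high reliability, yet its large systematic error means the readings are not valid; the second shows the opposite pattern.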

Assessing the Reliability and Validity of the Mini Clinical Evaluation Exercise for Internal Medicine Residency Training

Durning, Steven J.; Cation, Lannie J.; Markert, Ronald J.; Pangaro, Louis N. Academic Medicine, volume 77, published September 1, 2002 (peer-reviewed journal article; record held in the Wright State University repository).

PURPOSE: The mini-clinical evaluation exercise, or mini-CEX, assesses residents' history and physical examination skills. To date, no study has assessed the validity of the mini-CEX (mCEX) evaluation format. The authors' objective was to determine the reliability and validity of the mCEX evaluation format. METHOD: Twenty-three first-year residents at Wright-Patterson Medical Center in Dayton, Ohio, were included in the study (academic years 1996-97, 1997-98, and 1998-99). Validity of the instrument was determined by comparing mCEX scores with scores from corresponding sections of a modified version of the standard American Board of Internal Medicine's (ABIM's) monthly evaluation form (MEF) and the American College of Physicians-American Society of Internal Medicine In-Training Examination (ITE). All ABIM MEFs were used without exclusionary criteria, including ABIM MEFs from months where a corresponding mCEX evaluation was not performed. RESULTS: Each resident in the study had an average of seven mCEX evaluations and 12 ABIM MEFs. Of the 168 required mCEX evaluations, 162 were studied. Internal consistency reliability was .90. Statistically significant correlations were found for the following: mCEX history with ABIM history; mCEX physical exam with ABIM physical exam; mCEX clinical judgment with ABIM clinical judgment, medical care, medical knowledge, and the ITE; mCEX humanistic attributes with ABIM humanistic attributes, and mCEX overall clinical competence with ABIM overall clinical competence, medical care, medical knowledge, and the ITE. Analysis of variance comparing sequential mean mCEX scores yielded no significant difference. CONCLUSIONS: This study suggests that the mCEX is a feasible and reliable evaluation tool. The validity of the mCEX is supported by the strong correlations between mCEX scores and corresponding ABIM MEF scores as well as the ITE.

Repository link: https://corescholar.libraries.wright.edu/internal_medicine/77


Validity and reliability of a commercially available inertial sensor for measuring barbell mechanics during weightlifting.




Validity of the Enode sensor against the 3D criterion, by phase and variable.

| Phase | Variable | Enode, Mean ± SD [95% CI] | 3D Criterion, Mean ± SD [95% CI] | Intercept [95% CI] | Slope [95% CI] | RSD [95% CI] | RSD (%) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1st pull | v1 | 1.06 ± 0.18 [0.61–1.52] | 1.15 ± 0.2 [0.66–1.65] | −0.01 [−0.2–0.17] | 1.1 [0.94–1.29] | 0.04 [−0.08–0.08] | 4% |
| | x1 | 1.94 ± 1.65 [1.11–2.76] | 1.04 ± 1.52 [0.6–1.49] | −0.45 [−1.98–0.06] | 0.94 [0.64–1.73] | 0.88 [−1.73–1.73] | 45% |
| | y1 | 25.96 ± 4.27 [14.86–37.06] | 30.95 ± 4.44 [17.72–44.18] | 1.01 [−38.03–14.61] | 1.17 [0.61–2.71] | 2.48 [−4.86–4.86] | 10% |
| Transition | vT | 1.51 ± 0.16 [0.86–2.15] | 1.51 ± 0.13 [0.87–2.16] | 0.23 [0–0.51] | 0.86 [0.67–1] | 0.03 [−0.05–0.05] | 2% |
| | vLoss | 0.44 ± 0.18 [0.25–0.63] | 0.36 ± 0.15 [0.21–0.51] | −0.07 [−0.21–0.10] | 0.93 [0.55–1.31] | 0.07 [−0.14–0.14] | 47% |
| 2nd Pull | v2 | 2.08 ± 0.16 [1.19–2.97] | 2 ± 0.15 [1.15–2.86] | 0.07 [−0.35–0.31] | 0.93 [0.81–1.12] | 0.02 [−0.05–0.05] | 1% |
| | y2 | 81.11 ± 6.33 [46.43–115.79] | 79.65 ± 6.24 [45.6–113.71] | 1.35 [−15.78–14.23] | 0.97 [0.8–1.17] | 1.16 [−2.27–2.27] | 1% |
| Turnover | yPBH | 118.62 ± 9.56 [67.91–169.34] | 116.42 ± 8.99 [66.65–166.19] | 4.88 [−13.71–24.4] | 0.94 [0.78–1.09] | 1.52 [−2.97–2.97] | 1% |
| Receive | yR | 108.66 ± 8.5 [62.2–155.11] | 107.75 ± 8.35 [61.69–153.82] | 0.88 [−24.82–21.61] | 0.99 [0.79–1.23] | 1.67 [−3.27–3.27] | 2% |
| Catch | D1 | 9.96 ± 4.16 [5.7–14.22] | 8.66 ± 3.53 [4.96–12.37] | 0.22 [−1.85–1.33] | 0.83 [0.73–1.09] | 0.69 [−1.34–1.34] | 7% |
| | yCatch | 94.87 ± 8.06 [54.31–135.42] | 95.6 ± 8.26 [54.73–136.48] | −6.7 [−88.35–25.04] | 1.08 [0.74–1.97] | 2.14 [−4.2–4.2] | 2% |
| | D2 | 13.79 ± 6.05 [7.9–19.69] | 12.15 ± 6.15 [6.96–17.35] | −1.93 [−3.57–1.98] | 1 [0.78–1.15] | 0.99 [−1.94–1.94] | 7% |
| | D3 | 23.76 ± 9.36 [13.6–33.91] | 20.81 ± 8.97 [11.92–29.71] | −1.62 [−5.58–0.8] | 0.94 [0.83–1.16] | 1.46 [−2.87–2.87] | 6% |
| | yLoop | 101.23 ± 7.96 [57.95–144.5] | 101.78 ± 6.98 [58.27–145.3] | 19.83 [−29.72–41.07] | 0.82 [0.61–1.3] | 2.75 [−5.39–5.39] | 3% |
| | Loop | 8.67 ± 2.54 [4.96–12.37] | 8.69 ± 1.81 [4.98–12.41] | 2.47 [−4.78–5.92] | 0.72 [0.35–1.61] | 1.2 [−2.36–2.36] | 14% |
| Force and Power | AvgP | 770 ± 176 [441–1099] | 745 ± 176 [427–1064] | −10 [−153.97–63.22] | 0.97 [0.86–1.2] | 31 [−61–61] | 4% |
| | PP | 1799 ± 473 [1030–2569] | 1636 ± 372 [937–2336] | 157.13 [−147.02–434.16] | 0.82 [0.67–1.04] | 85 [−166–166] | 5% |
Session 1: Rep 1 vs. Rep 2, with ICC [95% CI], SEM, SDD and Hedges g [95% CI], by phase and variable.

| Phase | Variable | Rep 1, Mean ± SD [95% CI] | Rep 2, Mean ± SD [95% CI] | ICC [95% CI] | SEM | SDD | Hedges g [95% CI] |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1st pull | v1 | 1.08 ± 0.18 [0.62, 1.54] | 1.04 ± 0.20 [0.59, 1.48] | 0.940 [0.757, 0.983] | 0.00 | 0.01 | 0.2 [−0.61, 1.02] |
| | x1 | 2.25 ± 2.13 [1.29, 3.21] | 1.95 ± 1.76 [1.12, 2.79] | 0.667 [0.208, 0.885] | 0.08 | 0.23 | 0.14 [−0.67, 0.96] |
| | y1 | 26.43 ± 5.32 [15.13, 37.73] | 26.18 ± 5.00 [14.99, 37.37] | 0.871 [0.629, 0.959] | 0.05 | 0.13 | 0.05 [−0.76, 0.86] |
| Transition | vT | 1.52 ± 0.15 [0.87, 2.16] | 1.52 ± 0.20 [0.87, 2.17] | 0.727 [0.306, 0.909] | 0.00 | 0.01 | −0.04 [−0.85, 0.77] |
| | xT | 5.53 ± 3.75 [3.17, 7.90] | 5.52 ± 3.48 [3.16, 7.87] | 0.837 [0.544, 0.948] | 0.00 | 0.01 | 0.00 [−0.81, 0.81] |
| | yT | 50.23 ± 6.62 [28.76, 71.71] | 50.97 ± 7.05 [29.18, 72.76] | 0.772 [0.407, 0.924] | 0.18 | 0.49 | −0.10 [−0.91, 0.71] |
| | vLoss | 0.44 ± 0.20 [0.25, 0.62] | 0.48 ± 0.23 [0.28, 0.69] | 0.848 [0.587, 0.951] | 0.01 | 0.03 | −0.21 [−1.03, 0.60] |
| 2nd Pull | v2 | 2.08 ± 0.16 [1.19, 2.97] | 2.09 ± 0.17 [1.20, 2.98] | 0.928 [0.783, 0.977] | 0.00 | 0.00 | −0.05 [−0.85, 0.76] |
| | x2 | −0.14 ± 5.11 [−0.08, −0.20] | −0.09 ± 5.27 [−0.05, −0.13] | 0.901 [0.705, 0.969] | 0.01 | 0.02 | −0.01 [−0.82, 0.80] |
| | y2 | 80.85 ± 7.21 [46.29, 115.42] | 81.88 ± 6.34 [46.87, 116.88] | 0.705 [0.280, 0.900] | 0.28 | 0.77 | −0.15 [−0.96, 0.66] |
| Turnover | xPBH | 3.44 ± 5.98 [1.97, 4.91] | 4.02 ± 6.98 [2.30, 5.73] | 0.756 [0.371, 0.919] | 0.14 | 0.39 | −0.09 [−0.90, 0.72] |
| | yPBH | 117.90 ± 10.18 [67.49, 168.31] | 118.28 ± 9.79 [67.71, 168.84] | 0.766 [0.386, 0.923] | 0.09 | 0.25 | −0.04 [−0.85, 0.77] |
| Receive | xR | 9.42 ± 7.90 [5.39, 13.44] | 9.74 ± 8.80 [5.57, 13.90] | 0.629 [0.124, 0.871] | 0.10 | 0.27 | −0.04 [−0.85, 0.77] |
| | yR | 108.09 ± 8.96 [61.88, 154.31] | 106.79 ± 9.90 [61.14, 152.45] | 0.728 [0.322, 0.908] | 0.34 | 0.94 | 0.13 [−0.68, 0.94] |
| Catch | D1 | 9.81 ± 5.02 [5.61, 14.00] | 11.48 ± 5.71 [6.57, 16.39] | 0.841 [0.518, 0.950] | 0.33 | 0.93 | −0.3 [−1.12, 0.51] |
| | xCatch | 12.62 ± 9.01 [7.23, 18.02] | 12.19 ± 12.10 [6.98, 17.40] | 0.519 [0.048, 0.827] | 0.15 | 0.41 | 0.04 [−0.77, 0.85] |
| | yCatch | 95.75 ± 8.96 [54.81, 136.68] | 93.85 ± 10.58 [53.73, 133.98] | 0.696 [0.271, 0.895] | 0.52 | 1.45 | 0.19 [−0.62, 1.00] |
| | D2 | 12.35 ± 5.00 [7.07, 17.62] | 12.94 ± 4.99 [7.41, 18.47] | 0.877 [0.654, 0.960] | 0.10 | 0.29 | −0.11 [−0.93, 0.70] |
| | D3 | 22.15 ± 8.96 [12.68, 31.63] | 24.42 ± 9.57 [13.98, 34.86] | 0.925 [0.682, 0.979] | 0.31 | 0.86 | −0.24 [−1.05, 0.58] |
| | xLoop | −2.68 ± 5.31 [−1.53, −3.82] | −2.03 ± 6.73 [−1.16, −2.90] | 0.822 [0.518, 0.942] | 0.14 | 0.38 | −0.10 [−0.91, 0.71] |
| | yLoop | 101.25 ± 8.48 [57.96, 144.53] | 100.42 ± 8.69 [57.49, 143.36] | 0.584 [0.057, 0.853] | 0.27 | 0.74 | 0.09 [−0.72, 0.90] |
| | Loop | 8.21 ± 2.64 [4.70, 11.72] | 7.95 ± 3.47 [4.55, 11.35] | 0.850 [0.583, 0.952] | 0.05 | 0.14 | 0.08 [−0.73, 0.89] |
| Force and Power | AvgP | 760 ± 177 [435, 1086] | 760 ± 175 [435, 1085] | 0.990 [0.968, 0.997] | 0.01 | 0.02 | 0.00 [−0.81, 0.81] |
| | PP | 1781 ± 482 [1019, 2542] | 1802 ± 511 [1032, 2572] | 0.964 [0.889, 0.989] | 2.03 | 5.63 | −0.04 [−0.85, 0.77] |
| | PF | 1000 ± 236 [572, 1427] | 1004 ± 248 [575, 1433] | 0.985 [0.950, 0.995] | 0.25 | 0.69 | −0.02 [−0.83, 0.79] |
Session 2: Rep 1 vs. Rep 2, with ICC [95% CI], SEM, SDD and Hedges g [95% CI], by phase and variable.

| Phase | Variable | Rep 1, Mean ± SD [95% CI] | Rep 2, Mean ± SD [95% CI] | ICC [95% CI] | SEM | SDD | Hedges g [95% CI] |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1st pull | v1 | 1.06 ± 0.17 [0.61–1.52] | 1.08 ± 0.2 [0.62–1.54] | 0.835 [0.543, 0.947] | 0.00 | 0.01 | −0.1 [−0.91, 0.71] |
| | x1 | 1.98 ± 2.06 [1.13–2.82] | 1.57 ± 2.01 [0.9–2.24] | 0.829 [0.546, 0.944] | 0.08 | 0.23 | 0.2 [−0.62, 1.01] |
| | y1 | 25.57 ± 4.19 [14.64–36.5] | 25.65 ± 4.52 [14.69–36.62] | 0.707 [0.266, 0.901] | 0.02 | 0.06 | −0.02 [−0.83, 0.79] |
| Transition | vT | 1.49 ± 0.19 [0.85–2.12] | 1.5 ± 0.19 [0.86–2.14] | 0.738 [0.329, 0.913] | 0.00 | 0.01 | −0.05 [−0.86, 0.76] |
| | xT | 5.52 ± 3.31 [3.16–7.88] | 4.65 ± 3.52 [2.66–6.63] | 0.836 [0.551, 0.947] | 0.18 | 0.49 | 0.25 [−0.57, 1.06] |
| | yT | 50.12 ± 7.01 [28.69–71.54] | 50 ± 7.28 [28.62–71.37] | 0.833 [0.534, 0.946] | 0.02 | 0.07 | 0.02 [−0.79, 0.83] |
| | vLoss | 0.42 ± 0.19 [0.24–0.6] | 0.42 ± 0.15 [0.24–0.6] | 0.869 [0.624, 0.958] | 0.00 | 0.00 | 0.00 [−0.81, 0.81] |
| 2nd Pull | v2 | 2.07 ± 0.18 [1.18–2.95] | 2.09 ± 0.17 [1.2–2.99] | 0.950 [0.841, 0.984] | 0.00 | 0.01 | −0.11 [−0.92, 0.70] |
| | x2 | −0.38 ± 4.89 [−0.22–0.54] | −1.89 ± 5.34 [−1.08–2.7] | 0.831 [0.520, 0.946] | 0.31 | 0.86 | 0.29 [−0.53, 1.10] |
| | y2 | 81.05 ± 6.4 [46.4–115.71] | 80.66 ± 7.62 [46.18–115.15] | 0.861 [0.606, 0.956] | 0.07 | 0.20 | 0.05 [−0.76, 0.86] |
| Turnover | xPBH | 3.5 ± 6.46 [2–5] | 1.5 ± 6.44 [0.86–2.14] | 0.649 [0.206, 0.876] | 0.59 | 1.64 | 0.30 [−0.51, 1.11] |
| | yPBH | 118.42 ± 10.71 [67.79–169.05] | 119.88 ± 10.72 [68.63–171.14] | 0.893 [0.696, 0.966] | 0.24 | 0.66 | −0.13 [−0.94, 0.68] |
| Receive | xR | 9.42 ± 8.2 [5.39–13.45] | 7.7 ± 8.3 [4.41–10.99] | 0.630 [0.156, 0.870] | 0.52 | 1.45 | 0.20 [−0.61, 1.01] |
| | yR | 108.84 ± 10.95 [62.31–155.37] | 110.91 ± 10.21 [63.49–158.32] | 0.760 [0.398, 0.919] | 0.51 | 1.41 | −0.19 [−1.00, 0.62] |
| Catch | D1 | 9.58 ± 5.34 [5.49–13.68] | 8.98 ± 2.71 [5.14–12.81] | 0.504 [0.057, 0.819] | 0.21 | 0.59 | 0.14 [−0.67, 0.95] |
| | xCatch | 13.02 ± 10.04 [7.46–18.59] | 10.95 ± 10.61 [6.27–15.63] | 0.613 [0.125, 0.863] | 0.64 | 1.78 | 0.19 [−0.62, 1.01] |
| | yCatch | 93.81 ± 7.86 [53.7–133.91] | 96.05 ± 9.96 [54.99–137.12] | 0.798 [0.477, 0.933] | 0.50 | 1.40 | −0.24 [−1.05, 0.57] |
| | D2 | 15.03 ± 8.67 [8.6–21.46] | 14.85 ± 6.64 [8.5–21.2] | 0.870 [0.625, 0.959] | 0.03 | 0.09 | 0.02 [−0.79, 0.83] |
| | D3 | 24.62 ± 10.8 [14.09–35.14] | 23.83 ± 9.05 [13.64–34.02] | 0.962 [0.884, 0.988] | 0.08 | 0.21 | 0.08 [−0.73, 0.89] |
| | xLoop | −3.22 ± 5.29 [−1.85–4.6] | −5.12 ± 5.76 [−2.93–7.3] | 0.754 [0.374, 0.918] | 0.47 | 1.31 | 0.33 [−0.48, 1.15] |
| | yLoop | 100.89 ± 8.43 [57.76–144.03] | 102.34 ± 10.48 [58.59–146.09] | 0.835 [0.558, 0.946] | 0.29 | 0.82 | −0.15 [−0.96, 0.66] |
| | Loop | 8.75 ± 2.59 [5.01–12.49] | 9.76 ± 2.88 [5.59–13.93] | 0.702 [0.287, 0.898] | 0.28 | 0.76 | −0.36 [−1.17, 0.46] |
| Force and Power | AvgP | 772 ± 183 [442–1102] | 785 ± 176 [450–1121] | 0.975 [0.924, 0.992] | 1.03 | 2.85 | −0.07 [−0.88, 0.74] |
| | PP | 1800 ± 476 [1031–2570] | 1814 ± 443 [1039–2590] | 0.968 [0.898, 0.990] | 1.25 | 3.47 | −0.03 [−0.84, 0.78] |
| | PF | 1014 ± 244 [580–1447] | 1004 ± 234 [575–1433] | 0.968 [0.902, 0.990] | 0.89 | 2.48 | 0.04 [−0.77, 0.85] |
Session 1 vs. Session 2, with ICC [95% CI], SEM, SDD and Hedges g [95% CI], by phase and variable.

| Phase | Variable | Session 1, Mean ± SD [95% CI] | Session 2, Mean ± SD [95% CI] | ICC [95% CI] | SEM | SDD | Hedges g [95% CI] |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1st pull | v1 | 1.06 ± 0.19 [0.61–1.51] | 1.07 ± 0.18 [0.61–1.53] | 0.917 [0.727, 0.975] | 0.00 | 0.00 | −0.05 [−0.86, 0.76] |
| | x1 | 2.1 ± 1.78 [1.2–3] | 1.77 ± 1.95 [1.02–2.53] | 0.732 [0.115, 0.918] | 0.09 | 0.24 | 0.17 [−0.64, 0.98] |
| | y1 | 26.3 ± 4.98 [15.06–37.55] | 25.61 ± 4.01 [14.66–36.56] | 0.881 [0.620, 0.963] | 0.12 | 0.33 | 0.15 [−0.66, 0.96] |
| Transition | vT | 1.52 ± 0.16 [0.87–2.17] | 1.49 ± 0.18 [0.85–2.13] | 0.942 [0.816, 0.982] | 0.00 | 0.01 | 0.17 [−0.64, 0.98] |
| | xT | 5.52 ± 3.45 [3.16–7.88] | 5.08 ± 3.29 [2.91–7.26] | 0.919 [0.741, 0.975] | 0.06 | 0.17 | 0.13 [−0.68, 0.94] |
| | yT | 50.6 ± 6.42 [28.97–72.23] | 50.06 ± 6.82 [28.66–71.46] | 0.966 [0.890, 0.989] | 0.05 | 0.14 | 0.08 [−0.73, 0.89] |
| | vLoss | 0.46 ± 0.21 [0.26–0.66] | 0.42 ± 0.17 [0.24–0.6] | 0.862 [0.565, 0.957] | 0.01 | 0.02 | 0.20 [−0.61, 1.01] |
| 2nd Pull | v2 | 2.09 ± 0.16 [1.19–2.98] | 2.08 ± 0.17 [1.19–2.97] | 0.954 [0.850, 0.986] | 0.00 | 0.00 | 0.06 [−0.75, 0.87] |
| | x2 | −0.12 ± 5.05 [−0.07–0.16] | −1.13 ± 4.93 [−0.65–1.62] | 0.937 [0.797, 0.981] | 0.13 | 0.35 | 0.20 [−0.62, 1.01] |
| | y2 | 81.37 ± 6.25 [46.58–116.15] | 80.86 ± 6.77 [46.29–115.43] | 0.943 [0.817, 0.983] | 0.06 | 0.17 | 0.08 [−0.73, 0.89] |
| Turnover | xPBH | 3.73 ± 6.07 [2.13–5.32] | 2.5 ± 5.88 [1.43–3.57] | 0.834 [0.474, 0.949] | 0.25 | 0.69 | 0.20 [−0.61, 1.01] |
| | yPBH | 118.09 ± 9.35 [67.6–168.58] | 119.15 ± 10.42 [68.21–170.1] | 0.930 [0.777, 0.979] | 0.14 | 0.39 | −0.10 [−0.91, 0.71] |
| Receive | xR | 9.58 ± 7.5 [5.48–13.67] | 8.56 ± 7.44 [4.9–12.22] | 0.818 [0.403, 0.945] | 0.22 | 0.60 | 0.13 [−0.68, 0.94] |
| | yR | 107.44 ± 8.75 [61.51–153.38] | 109.87 ± 9.93 [62.9–156.85] | 0.785 [0.324, 0.933] | 0.56 | 1.56 | −0.25 [−1.06, 0.56] |
| Catch | D1 | 10.65 ± 5.2 [6.09–15.2] | 9.28 ± 3.65 [5.31–13.25] | 0.825 [0.456, 0.946] | 0.29 | 0.80 | 0.30 [−0.52, 1.11] |
| | xCatch | 12.41 ± 9.23 [7.1–17.71] | 11.98 ± 9.25 [6.86–17.11] | 0.698 [0.048, 0.909] | 0.12 | 0.33 | 0.05 [−0.76, 0.85] |
| | yCatch | 94.8 ± 9.02 [54.27–135.33] | 94.93 ± 8.53 [54.34–135.52] | 0.826 [0.412, 0.948] | 0.03 | 0.08 | −0.01 [−0.82, 0.80] |
| | D2 | 12.64 ± 4.84 [7.24–18.05] | 14.94 ± 7.45 [8.55–21.33] | 0.897 [0.599, 0.970] | 0.37 | 1.02 | −0.35 [−1.17, 0.46] |
| | D3 | 23.29 ± 9.15 [13.33–33.25] | 24.22 ± 9.87 [13.87–34.58] | 0.966 [0.894, 0.99] | 0.09 | 0.24 | −0.09 [−0.9, 0.72] |
| | xLoop | −2.35 ± 5.77 [−1.35–3.36] | −4.17 ± 5.22 [−2.39–5.95] | 0.862 [0.555, 0.958] | 0.34 | 0.94 | 0.32 [−0.49, 1.14] |
| | yLoop | 100.83 ± 7.6 [57.72–143.94] | 101.62 ± 9.11 [58.17–145.06] | 0.894 [0.655, 0.968] | 0.13 | 0.36 | −0.09 [−0.90, 0.72] |
| | Loop | 8.08 ± 2.96 [4.62–11.53] | 9.25 ± 2.55 [5.3–13.21] | 0.787 [0.332, 0.934] | 0.27 | 0.75 | −0.41 [−1.23, 0.41] |
| Force and Power | AvgP | 760 ± 176 [435–1086] | 779 ± 178 [446–1112] | 0.986 [0.952, 0.996] | 1.10 | 3.05 | −0.10 [−0.91, 0.71] |
| | PP | 1791 ± 492 [1025–2557] | 1807 ± 456 [1035–2580] | 0.993 [0.978, 0.998] | 0.66 | 1.84 | −0.03 [−0.84, 0.78] |
| | PF | 1002 ± 241 [574–1430] | 1009 ± 237 [578–1440] | 0.997 [0.989, 0.999] | 0.20 | 0.57 | −0.03 [−0.84, 0.78] |

Source: Chavda, S.; Sandau, I.; Bishop, C.; Xu, J.; Turner, A.N.; Lake, J.P. Validity and Reliability of a Commercially Available Inertial Sensor for Measuring Barbell Mechanics during Weightlifting. Appl. Sci. 2024, 14(16), 7397. https://doi.org/10.3390/app14167397


Development, validity and reliability of the healthy lifestyle behavior scale

Open access. Published: 21 August 2024. Volume 21, article number 62 (2024).


Authors: Ugurcan Sayili (ORCID: 0000-0002-5925-2128), Kevser Sak (ORCID: 0000-0002-8440-5763), Sumeyye Nur Aydin (ORCID: 0000-0002-0891-2587), Busra Kara (ORCID: 0000-0001-8698-9654), Deniz Turgut (ORCID: 0000-0002-9304-3178) and Osman Bisgin (ORCID: 0009-0006-2372-5559).


Healthy lifestyle behaviors encompass activities aimed at promoting, maintaining, or reclaiming health. Evaluating these behaviors accurately requires comprehensive, valid, and reliable tools.

This study aimed to develop the Healthy Lifestyle Behavior Scale and evaluate its psychometric properties in the Turkish population.

For this methodological research, a cross-sectional online-based survey was conducted between 21 March 2023 and 31 March 2023 among 330 participants who were recruited via convenience sampling. The initial item pool included 90 items across seven domains (exercise, health responsibility, preventive health actions, sleep, stress and social support, nutrition, smoking and alcohol). The content validity of the scale was verified by taking expert opinions. Construct validity and reliability were assessed using principal component analysis (PCA) and Cronbach's alpha.

A total of 330 people were recruited for the study (65.2% female, mean age 34.2 ± 9.4 years). The final scale comprised 34 items: 5 on exercise, 4 on health responsibility, 4 on preventive health actions, 2 on sleep, 5 on social support, 3 on stress management, 5 on nutrition, 4 on smoking and 2 on alcohol. The construct validity analysis revealed a 9-factor structure explaining 62.35% of the variance (Kaiser–Meyer–Olkin value = 0.807). Internal consistency was confirmed with Cronbach's alpha (α = 0.863 for the scale, > 0.7 for subscales) and high item-total correlation.

Conclusions

Our newly developed Healthy Lifestyle Behavior Scale demonstrated good validity and reliability. It outperformed existing scales, boasting higher alpha values for subfactors and explained variance. This scale is a robust tool for assessing healthy lifestyle behaviors in adults.


1 Introduction

People's lifestyles and choices affect their risk of developing many noncommunicable diseases, such as cancer, heart disease, stroke and diabetes [ 1 ]. Many chronic diseases that cause high morbidity and mortality in the world and in our country, that place serious burdens on national insurance institutions and that account for high numbers of disability-adjusted life years (DALYs) and quality-adjusted life years (QALYs) can be prevented by the implementation of healthy lifestyle behaviors [ 2 ]. In 2019, noncommunicable diseases accounted for 74% of deaths worldwide, 88% in high-income countries and 90% in Türkiye [ 3 , 4 ].

Healthy lifestyle behaviors are considered to be any activity undertaken to promote, maintain or regain health. Behavioral changes made through such activities continue to be an important element of health promotion [ 5 ]. It is known that the healthy lifestyle behaviors exhibited by individuals are affected by many factors, such as age, sex, educational status, socioeconomic status and chronic diseases [ 6 , 7 ].

Physical activity, personal responsibility, sleep, stress and social support, nutrition, and smoking and alcohol use are the main elements of healthy lifestyle behavior in our scale development study. Healthy lifestyle behaviors such as regular physical activity, healthy eating, sleeping 7–8 h a day, and maintaining weight control reduce the risk of mortality [ 8 ]. It has been proven that regular physical activity improves health-related quality of life, contributes to the management of chronic diseases, prevents weight gain and is beneficial for mental health and cognitive functions [ 9 ]. Personal health responsibility is the behavior that an individual should perform to maintain physical, mental and social well-being. However, to fulfill this responsibility, information and social support and all necessary facilities must be provided by health providers and nongovernmental organizations [ 10 ]. Sleep has an important role in supporting mental health [ 11 ]. Insufficient sleep leads to depressive mood, increased anxiety, obesity and loss of attention [ 12 ]. Stress is a situation whose severity varies according to personal perception and causes physical and mental discomfort and tension depending on various factors [ 13 ]. Stress leads to different conditions, such as musculoskeletal and sleep problems and myocardial ischemia [ 14 , 15 ]. According to the 2020 World Health Organization (WHO) guidelines, there are behavioral recommendations for combating stress, being in contact with the environment in accordance with one's own values, and behavioral recommendations against situations that cause stress [ 16 ]. For nutrition, the CINDI dietary guidelines developed by the WHO recommend a predominantly plant-based diet, products with low sugar content, keeping daily salt consumption below one teaspoon (6 g), keeping alcohol consumption below 20 g per day, preparing food in a safe and hygienic environment, and cooking by steaming or boiling [ 17 ]. Smoking may negatively affect people's health [ 18 ]. In 2020, 22.3% of the global population were smokers, and there were more than 8 million smoking-related deaths in 2019 [ 19 , 20 ]. Harmful use of alcohol, a toxic and psychoactive substance, is responsible for 5.1% (7.1% for men; 2.2% for women) of the global burden of disease [ 21 ].

To obtain more accurate and valid information on health prevention initiatives, it is necessary to evaluate the health behaviors of individuals with valid and reliable tools suitable for their culture. Walker et al. (1987) developed the 'Healthy Lifestyle Behaviors Scale' based on Pender's health promotion model. The first version of the Health-Promoting Lifestyle Profile consists of 48 items and six factors (self-actualization, health responsibility, exercise, nutrition, interpersonal support, and stress management) [ 22 ]. The scale was reworked and revised in 1996 and named the Health-Promoting Lifestyle Profile II, which consists of 52 items and six factors (spiritual growth, interpersonal relations, nutrition, physical activity, health responsibility, and stress management) [ 23 ]. Recommendations on healthy lifestyle behaviors change as research accumulates, and guidelines are updated accordingly. Therefore, the scales we use to measure healthy lifestyle behaviors should be based on up-to-date information, and there is a need for a scale that covers all aspects of healthy living behaviors and rests on current evidence. The existing scales have shown high validity and reliability, but they do not include important issues such as sexual health, dental health, smoking and alcohol use, vaccinations and screening programs. Many studies in the literature have focused on these points and emphasized the negative health outcomes associated with them [ 16 , 17 , 24 , 25 ].

We carried out this study because a new scale was needed: one grounded in current information from guidelines, directives, and studies of health behavior, generalizable to the population, and covering all aspects of healthy lifestyle behaviors. The target of this study was to develop a brief, easy-to-use, easy-to-interpret scale with good psychometric properties that is based on current knowledge and includes many aspects of healthy living behaviors.

This study aimed to develop the Healthy Lifestyle Behavior Scale, investigate its psychometric properties and evaluate its validity and reliability. As a secondary aim, we also evaluated the relationships between participants' sociodemographic characteristics and their scores on the scale once it had been shown to be valid and reliable.

2 Materials and methods

2.1 Study design and sampling

For this methodological research, a cross-sectional online-based open survey was conducted with 330 participants between 21 March 2023 and 31 March 2023. The target population and inclusion criteria consisted of Turkish citizens aged 18–65 years with internet access who were users of WhatsApp and/or Instagram, literate in Turkish and willing to participate in the study. The study setting was online, utilizing social media platforms for participant recruitment and data collection. Study data were collected through the LimeSurvey platform. The survey consisted of two pages: the first page contained 18 questions on demographic, socioeconomic and personal characteristics, and the second page contained the 59-item scale. The usability and technical functionality of the electronic questionnaire were tested by the researchers before the survey was administered; these tests were not included in the data. Participants received the survey link via the WhatsApp and Instagram applications. The data were obtained by the researchers by distributing the questionnaires to various groups on the specified platforms.

Researchers shared stories and posts from their own social media accounts (WhatsApp, Instagram) and distributed the survey link to groups they were part of or could reach. Participation in the survey was voluntary and open to anyone over the age of 18 who could read and understand Turkish. No incentives were offered to participants (e.g., money, prizes, or non-monetary incentives such as an offer to provide the survey results). An information form was included at the beginning of the questionnaire. At the beginning of the survey, the respondents were informed about the aim of the study, the confidentiality/anonymity of the data, the number of questions and the estimated completion time of the survey. The survey questions were displayed in the same order for each participant. To submit the questionnaire, all questions had to be answered. The respondents were able to review and change their answers with a back button. Only the data of the respondents who completed the questionnaire were included. Although the total reach of the posts is unknown, 465 people clicked on the link and viewed the first page of the survey. Of these, 330 respondents gave a complete response (response rate 71.0%). To prevent duplicate participation, “cookie usage” was selected in the survey interface. In addition, at the beginning of the survey, respondents were instructed to participate only once.

The general recommendation for sample size in the guidelines is a ratio of approximately 5 to 10 subjects per item up to approximately 300 subjects [ 26 ]. Since the candidate scale contains 59 questions, the sample size was determined to be 300 participants. Data collection ended with 330 participants using the convenience sampling method. Convenience sampling was employed because of its ease of access and time-saving benefits.

2.2 Survey development

2.2.1 Item creation and internal validity evaluation

The authors conducted a literature review for item generation and scale constructs. The WHO guidelines and directives were taken into consideration when creating the items in the newly developed scale. At this stage, 90 items were created for the 7 constructs of the pooled scale (Exercise, Health Responsibility, Preventive Health Actions, Sleep, Stress and Social Support, Nutrition, Smoking and Alcohol).

Items in the subdimensions of exercise, health responsibility, preventive health actions, sleep, stress and social support, nutrition, smoking and alcohol were rated on a 5-point Likert scale (0: never, 1: rarely, 2: sometimes, 3: often, 4: always). Negatively worded statements were reverse-coded (0: always, 1: often, 2: sometimes, 3: rarely, 4: never).

The internal validity of the questions was analyzed by the content validity ratio (CVR) and content validity index (CVI). The scale was sent to 30 experts via e-mail or the face-to-face method. Sixteen expert opinions were received. These were experts in internal medicine, psychiatry, psychology, nutrition and dietetics, and nursing.

To determine the CVR, the experts were asked for their opinions using a three-point scale (appropriate, appropriate but not necessary, unnecessary) for each item.

The following formula was applied to determine the CVR.
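The formula itself appears to be missing from the extracted text. The standard Lawshe content validity ratio, which matches the definition of N and n_e in the next sentence, is presumably what was applied:

$$\mathrm{CVR} = \frac{n_e - N/2}{N/2}$$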

In the formula, N is the total number of experts, and n_e is the number of experts who rated the item as appropriate (essential). The Lawshe technique was used to evaluate the content validity ratio (CVR).

According to the Lawshe technique, a minimum CVR of 0.500 and above was considered appropriate for an evaluation with 16 experts (p < 0.05) [ 27 ].

The content validity index (CVI) was assessed after CVR.

The CVI was calculated with the following formula:
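The CVI formula is likewise missing from the extracted text. A common item-level definition, consistent with the percentage-agreement criterion stated below and given here as an assumption rather than the authors' stated formula, is the proportion of experts who rate the item as appropriate:

$$\mathrm{CVI} = \frac{n_a}{N}$$

where n_a is the number of experts rating the item as appropriate and N is the total number of experts.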

The criterion for the content validity of the items was set. Items were considered adequate if there was > 79% agreement, questionable if there was 70–79% agreement, and unacceptable if there was < 69% agreement [ 28 ].

After content validity analyses, the first scale to be applied to the participants was determined.

2.2.2 Construct validity

Principal component analysis (PCA) was applied to demonstrate the construct validity of the following constructs, and to identify the items belonging to each of them: exercise, health responsibility, preventive health action, sleep, stress and social support, nutrition, smoking and alcohol. After the exploratory factor analysis (EFA), the resulting constructs and items were examined. Eigenvalues greater than 1 were accepted for factor identification in PCA. Items with factor loadings < 0.4 or factor loading differences < 0.1 were excluded. The Kaiser‒Meyer‒Olkin (KMO) value was 0.807, and Bartlett’s test was statistically significant (p < 0.001).
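For readers who want to see the mechanics, here is a minimal Python sketch of this item-selection logic (eigenvalue > 1 for component retention, a 0.4 loading cut-off for items). The data, sample size and variable names are placeholders rather than the study's data or analysis code, and the cross-loading rule is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n_respondents, n_items = 330, 59
items = rng.integers(0, 5, size=(n_respondents, n_items)).astype(float)  # fake Likert data

# Standardize items and run PCA via the correlation matrix
z = (items - items.mean(axis=0)) / items.std(axis=0, ddof=1)
corr = np.corrcoef(z, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(corr)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Kaiser criterion: retain components with eigenvalue > 1
n_factors = int(np.sum(eigenvalues > 1))
loadings = eigenvectors[:, :n_factors] * np.sqrt(eigenvalues[:n_factors])

# Flag items whose largest absolute loading is below 0.4 (candidates for removal)
weak_items = np.where(np.abs(loadings).max(axis=1) < 0.4)[0]
explained = eigenvalues[:n_factors].sum() / n_items * 100
print(f"{n_factors} components, {explained:.1f}% variance explained, "
      f"{len(weak_items)} weak items")
```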

2.2.3 Reliability

Cronbach's alpha coefficient and the item-rest correlation were used to evaluate the internal consistency of the scale and subdimensions.
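As a companion sketch (again illustrative only, with fabricated responses rather than the study's data), Cronbach's alpha and the item-rest correlation can be computed directly from a respondents-by-items matrix:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def item_rest_correlations(items: np.ndarray) -> np.ndarray:
    """Correlation of each item with the sum of the remaining items."""
    total = items.sum(axis=1)
    return np.array([
        np.corrcoef(items[:, j], total - items[:, j])[0, 1]
        for j in range(items.shape[1])
    ])

# Example with fabricated 0-4 Likert responses, purely for illustration
rng = np.random.default_rng(1)
demo = rng.integers(0, 5, size=(330, 5)).astype(float)
print(round(cronbach_alpha(demo), 3), np.round(item_rest_correlations(demo), 2))
```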

2.2.4 Other variables

The demographic variables included age, sex, height and weight. The body mass index (BMI) of the participants was calculated from self-reported weight and height (kg/m²), and participants were grouped as normal weight (BMI < 25) or overweight/obese (BMI ≥ 25). Participants were asked to indicate marital status as married or single and having children as yes or no. Participants were asked about their education level (high school or less vs. university or higher). Employment status was evaluated in three groups: working, not working and retired. Income status was evaluated in three groups considering local minimum income limits: < 9,000 Turkish Liras (TL), 9,001–18,000 TL and > 18,000 TL. Participants were asked whether they had a known chronic disease and regular medication use due to a disease. If they had a chronic disease, they were asked to specify whether they had hypertension, diabetes, coronary artery disease or another disease. After physical activity was defined for them, participants were asked on how many of the last 7 days they had been physically active for at least 60 min a day, and the number of physically active days was treated as a numerical variable. The self-rated health status of the participants was assessed with the question “How do you think your health is?”, answered as excellent, good, fair or poor. Responses were grouped as excellent/good or fair/poor.

The Cantril ladder method was used to measure life satisfaction. The measure is presented pictorially as an 11-point ladder from 0 to 10, with 10 indicating ‘the best possible life’ for the individual and 0 indicating ‘the worst possible life’. Participants were asked, “Where do you feel you are standing on the ladder right now?”. Values of 7 and above were grouped as high life satisfaction. For quality of life, participants were asked to give a score between 0 and 100, where 100 indicates “great quality of life”, 95 “almost great quality of life”, 85 “very good quality of life”, 70 “good quality of life”, 60 “moderately good quality of life”, 40 “somewhat poor quality of life”, 30 “poor quality of life”, 15 “very poor quality of life” and 0 “extremely poor quality of life”. Participants were asked, “At what level do you currently feel your quality of life is?”. Based on their answers, they were grouped as having high quality of life (70 and above) or very high quality of life (85 and above).

2.3 Statistical analysis

The Statistical Package for the Social Sciences version 21.0 for Windows (IBM Corp., Armonk, NY, USA), Jamovi 2.3.18 and Microsoft Office Excel were used for data evaluation and analysis. Categorical variables were presented as frequencies (n) and percentages (%), and numerical variables were presented as the mean ± standard deviation (SD) and median (interquartile range (IQR)). The Kolmogorov‒Smirnov test was applied to evaluate the normal distribution of continuous variables. Univariate hypothesis tests were applied to compare the scale scores according to demographic and socioeconomic factors. Independent samples t test was used to compare scale scores between two independent groups; one-way ANOVA was used to compare scale scores between more than two independent groups. The Mann‒Whitney U test was used to compare each item score between the 27% lower–upper groups. Principal component analysis, Cronbach's alpha and Spearman's correlation analysis were used for validity and reliability. A p value < 0.05 was accepted for statistical significance.
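One step worth illustrating is the extreme-groups item analysis mentioned above. The sketch below is illustrative only (the data, the item count and the variable names are placeholders, not the study's analysis); it compares each item between the bottom-27% and top-27% total-score groups with scipy's Mann-Whitney U test.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(2)
items = rng.integers(0, 5, size=(330, 34)).astype(float)  # fake 0-4 item scores
total = items.sum(axis=1)

cut = int(len(total) * 0.27)
order = np.argsort(total)
low_group, high_group = order[:cut], order[-cut:]

for j in range(items.shape[1]):
    stat, p = mannwhitneyu(items[high_group, j], items[low_group, j],
                           alternative="two-sided")
    if p >= 0.05:  # items that fail to discriminate between the extreme groups
        print(f"item {j + 1}: U = {stat:.0f}, p = {p:.3f}")
```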

3.1 Demographic characteristics

A total of 330 people participated in the study, and most of the participants (65.2%) were female. The mean participant age was 34.2 ± 9.4 years. The mean BMI was 24.9 ± 4.2, and 54.2% of the participants had a BMI < 25. A total of 61.5% of the participants were married, 52.7% had children, 85% were university graduates, and 76.1% were employed. Most of the participants (68.8%) had an income of 18,000 TL or above. A total of 21.2% of the participants had a chronic disease, and 18.2% were taking regular medication. Hypertension was reported by 4.8%, diabetes mellitus by 2.4%, coronary artery disease by 2.1% and other chronic diseases by 18.8%. A total of 53.6% of the participants described their health status as excellent-good. Of the participants, 50.6% reported high quality of life, 17% reported very high quality of life, and 53% reported high life satisfaction.

3.2 Content validity analysis

The scale pool consisted of 90 items. In the content validity analyses, 59 items that met the content validity ratio (CVR) ≥ 0.50 and content validity index (CVI) > 79% criteria remained in the scale. Of the 59 items, 7 were related to exercise, 13 to health responsibility and preventive health actions, 7 to sleep, 10 to stress and social support, 13 to nutrition, 5 to smoking and 4 to alcohol.

3.3 Construct validity analysis

Exploratory factor analysis (EFA) was applied for construct validity. The Kaiser‒Meyer‒Olkin (KMO) index was 0.807, Bartlett's test was significant (p < 0.001), and the total variance explained by the 9-factor structure was 62.35%.

Exploratory factor analysis (EFA) revealed a 9-factor structure. The items designed as personal responsibility constructs were divided into two different constructs, and the stress and social support constructs were likewise divided into two different constructs. The 25 items with low factor loadings (< 0.4), factor loading differences < 0.1 or loadings on two factors were removed. A 9-factor structure consisting of 34 items was obtained. Of the 34 items, 5 concerned exercise, 8 personal health responsibility (4 evaluated as health responsibility and 4 as preventive health actions in separate constructs), 2 sleep, 5 social support, 3 stress management, 5 nutrition, 4 smoking and 2 alcohol (Table  1 ). The differences between the groups with the highest 27% and the lowest 27% of scores on all 34 items are shown in Table  2 .

3.4 Reliability analysis

The scale subscales had good reliability results. The Cronbach’s alpha value was 0.863 for the scale and above 0.7 for the subscales. The item-rest correlation and Cronbach’s alpha values of the subscales are presented in Table  3 .

There was a significant correlation above 0.30 between the total scale score and the other subcategories except alcohol. There was a very weak correlation between alcohol and the total score of the scale (r = 0.14, p = 0.01). When the relationship between alcohol and other subcategories was analyzed, a significant and very weak relationship was found with smoking (r = 0.20, p < 0.01).

3.5 Scale scoring

To develop a 0–100 point scale, never (0), rarely (1), sometimes (2), often (3), and always (4) points were accepted. The item scores were summed, divided by the total number of items and multiplied by 25 to obtain a scale scored in the range of 0–100 points. For the factors, the scores of the items loaded on the factor were summed, divided by the number of items in each factor and multiplied by 25 to obtain factor scores in the range of 0–100 points.
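As a quick illustration of this scoring rule (a sketch, not the authors' code), converting a set of 0-4 item responses to the 0-100 metric looks like this:

```python
from typing import Sequence

def scale_score(item_scores: Sequence[int]) -> float:
    """Convert 0-4 Likert item scores into a 0-100 scale (or factor) score:
    sum the items, divide by the number of items, multiply by 25."""
    return sum(item_scores) / len(item_scores) * 25

# A respondent answering "often" (3) on every one of the 34 items would score 75.
print(scale_score([3] * 34))  # 75.0
```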

3.6 Associations with demographic and socioeconomic factors

When the relationship between the sociodemographic characteristics of the participants and healthy life behavior scores was examined, a significant relationship was observed between sex, BMI, educational status, income level, DM, self-rated health status, high quality of life, high life satisfaction and healthy life behavior (p < 0.05). There was no significant relationship between marital status, having children, employment status, having chronic diseases, HT, regular medication use and healthy living behavior (p > 0.05).

Compared with male participants, female participants had a greater mean total score on the Healthy Living Behavior Scale (60.7 ± 13.1; 54.7 ± 13.3, p < 0.001). Participants with a BMI < 25 had a higher mean total score on the healthy living behavior scale (p = 0.013). Participants with an educational level of university and above had a higher mean total score on the healthy living behavior scale (p = 0.019). Participants with higher income status had higher scores on the Healthy Living Behavior Scale than did those in other income groups (p = 0.001). Patients without DM had a greater mean total score on the healthy living behavior scale than patients with DM (p = 0.032). Participants who described their health status as excellent-good had a higher mean total score on the healthy life behavior scale than those who described their health status as fair-poor (p < 0.001). The scale scores were greater for participants with high quality of life and high life satisfaction (p < 0.001). The relationships between the participants' healthy living behavior scale scores and their sociodemographic characteristics are presented in Table  4 .

4 Discussion

In our study, the validity and reliability of the scale, which was created in light of current information and recommendations, were demonstrated. The Cronbach’s alpha value was 0.863 for the scale and above 0.7 for the subscales. These values are in the “satisfactory to good” range [ 29 ]. The KMO index was 0.807, and Bartlett's test was significant [ 30 ]. A 9-factor structure consisting of a total of 34 items was obtained. The total variance explained by the 9-factor structure was 62.35%. In our study, a significant relationship was found between sex, BMI, educational status, income level, DM, self-rated health status, high quality of life, high life satisfaction and healthy life behavior. No significant relationships were found between marital status, having children, employment status, having chronic diseases, HT, regular medication use and healthy life behavior.

The overall alpha coefficient of the first version of the ‘Healthy Lifestyle Behaviors Scale’ developed by Walker was 0.92, and the variance explained by the six factors was 47.1%. In 1996, the scale was revised and named the Health-Promoting Lifestyle Profile II, and the Cronbach’s alpha value was found to be 0.94 for the total scale [ 22 ]. The validity and reliability of these scales in Türkiye have been examined in different studies [ 31 , 32 ]. The alpha value for the first version of the 48-item scale was found to be 0.91 in Esin’s study and 0.90 in Akça’s study. The alpha values of the subfactors ranged between 0.55 and 0.84 in Esin’s study and between 0.52 and 0.81 in Akça’s study [ 31 , 32 ]. By identifying areas where these scales were inadequate, we developed our new scale, which is more up-to-date. The acceptable item-rest correlation for a multidimensional questionnaire/scale ranges between 0.2 and 0.4 [ 33 ]. In our study, this value was found to be below 0.2 only for alcohol substances. The effects of alcohol on health are well known. In the short term, consequences such as falls, drowning, murder, suicide, and alcohol poisoning may occur. Excessive alcohol use is associated with many diseases, such as high blood pressure, heart disease, stroke, liver disease, digestive problems, and various cancers. The CDC also emphasizes that excessive alcohol consumption is harmful to health [ 34 , 35 ]. According to the World Health Organization, 3 million deaths worldwide each year are caused by the harmful use of alcohol. This represents 5.3% of all deaths. The harmful use of alcohol causes social and economic losses to individuals and society. A total of 13.5% of the total deaths between the ages of 20 and 39 can be attributed to alcohol [ 36 ]. For these reasons, it is important to ask about alcohol use when assessing healthy living behavior, as we did in our scale; therefore, alcohol consumption items that were not present in previous scales were retained in the scale.

In this study, exercise was found to be a factor with five items and a high factor load. Physical activity is a risk reducer for breast cancer, colorectal cancer, diabetes, heart disease, etc. Physical activity reduces the risk of high blood pressure and stroke, improves mental health and cognitive function and prevents weight gain. It helps to age in a healthy way. It improves sleep, reduces the risk of falls, improves balance and joint mobility, helps protect weak bones and prevents muscle loss [ 37 , 38 ]. According to the World Health Organization, the recommended duration of physical activity for adults is at least 150–300 min of moderate-intensity aerobic physical activity, at least 75–150 min of high-intensity aerobic physical activity or an equivalent combination of moderate and high-intensity activity per week [ 38 ]. Therefore, being physically active is very important for healthy life behavior.

Walker et al. assessed health responsibility in their health responsibility subscale mainly through items on consulting experts and reading. In the scale we developed, personal health responsibility formed two factors, one concerning consultation and one concerning behavior. From a public health perspective, primary prevention is the first goal, so vaccinations, screenings, and prevention of sexually transmitted diseases are important topics [ 24 ]. The corresponding items cover getting recommended vaccinations that are not included in the routine vaccination program, having annual dental check-ups, undergoing recommended cancer screenings, and researching sexual health and methods of protection against sexually transmitted diseases or seeking expert advice on them. What makes this study unique is that we developed a scale that includes these topics.

In this study, sleep pattern was found to be a factor with two items and high factor loadings. According to the American Academy of Sleep Medicine (AASM), sleep is essential to health, alongside healthy life behaviors such as nutrition and exercise; adolescents between the ages of 13 and 18 years should sleep 8–10 h each night to support optimal health, while adults should sleep at least 7 h each night [ 39 ].

In this scale, smoking constituted a factor with four items. According to the CDC, smoking causes cancer, heart disease, stroke, lung disease, diabetes and chronic obstructive pulmonary disease. Exposure to cigarette smoke causes approximately 41,000 deaths among nonsmoking adults and 400 deaths among infants each year. Passive smoking causes stroke, lung cancer and coronary heart disease in adults [ 40 ].

This study shows that sex is an important social determinant of health that shapes how women and men engage in health behaviors. In our study, the mean total score of female participants on the Healthy Living Behavior Scale was higher than that of male participants. The Gender Equality Index 2021 report and other studies generally describe women as engaging in health-promoting behaviors, whereas men are considered more likely to adopt risky behaviors despite their harmful consequences [ 41 , 42 ]. Among the participants, those with a BMI < 25 had a higher mean total score on the healthy lifestyle behavior scale. In support of our findings, other studies have reported an inverse relationship between a healthy lifestyle and BMI [ 43 , 44 ]. However, it has also been observed that obese and overweight people exhibit healthier behaviors in terms of diet and exercise than people with normal weight [ 45 ]. In our study, participants with a university education or above and participants with a higher income level had higher healthy lifestyle behavior scale scores than the other groups, and there are findings supporting this both in our country and abroad [ 18 , 46 , 47 ]. We think this result may arise because financial means facilitate access to both preventive and curative health services and because knowing that this access is available gives people confidence. Participants were asked to rate their health status as excellent, good, fair, or poor. The answer to this short question is considered a dynamic assessment of the trajectory of health, not just current health at a given time, and this self-assessment is thought to influence behaviors that affect health status [ 48 ]. In our study, those who described their health status as excellent or good had significantly higher healthy living behavior scores than those who described it as fair or poor. In a study of retired adults, the health behavior score of those who rated their perceived health status as very good was significantly higher than that of those who rated it as good or poor [ 49 ].

5 Strengths and limitations

This study has several limitations. An important one is the convenience sampling method, a nonprobability approach: despite benefits such as being cost-effective and less time-consuming, it limits the generalizability of the sample and its ability to represent a large population. Future research should therefore use probability sampling techniques to increase the representativeness and validity of the findings [ 50 ]. Collecting data through an electronic survey restricted the sample to literate individuals with internet access, which limits the external validity of the study. Finally, the study was a self-report survey, so participants may have been hesitant to provide information or may have given incorrect information.

In addition to its limitations, this study has several strengths. The overall internal consistency coefficient of the scale (Cronbach's alpha = 0.863) shows that the scale is reliable. The alpha values of the subfactors and the total explained variance were greater than those of the earlier scales, suggesting that it provides more consistent and reliable results and measures the characteristics of interest better. In addition, our study included items on smoking, alcohol, dental health, vaccination and sexual health, which are important aspects of healthy life behaviors.

6 Conclusion

The Healthy Lifestyle Behavior Scale can be used as a reliable and valid tool for assessing healthy lifestyle behaviors, which will enable effective planning and implementation of health interventions. Although this study was conducted with a sample of limited generalizability, the scale showed high validity and reliability. Further studies in different community groups and with higher participation rates are needed, and the scale can be adapted for use in other languages and populations through additional validity and reliability studies.

Data availability

The data that support the findings of this study are not openly available due to legal and ethical restrictions but are available from the corresponding author upon reasonable request.

Loef M, Walach H. The combined effects of healthy lifestyle behaviors on all cause mortality: a systematic review and meta-analysis. Prev Med. 2012;55(3):163–70. https://doi.org/10.1016/j.ypmed.2012.06.017 .

Danaei G, Ding EL, et al. The preventable causes of death in the United States: comparative risk assessment of dietary, lifestyle, and metabolic risk factors. PLoS Med. 2009;6(4):e1000058. https://doi.org/10.1371/journal.pmed.1000058 .

World Health Organization: WHO noncommunicable diseases. 2022. https://www.who.int/news-room/fact-sheets/detail/noncommunicable-diseases . Accessed 24 Aug 2023

World Bank Open Data. Cause of death, by non-communicable diseases (% of total)—high income. https://data.worldbank.org/indicator/SH.DTH.NCOM.ZS?end=2019&locations=XD&name_desc=false&start=2000&view=chart . Accessed 24 Aug 2023

World Health Organization. Health promotion glossary of terms 2021. https://apps.who.int/iris/bitstream/handle/10665/350161/9789240038349-eng.pdf . Accessed 28 Aug 2023.

Ochieng BM. Factors affecting choice of a healthy lifestyle: implications for nurses. Br J Community Nurs. 2006;11(2):78–81. https://doi.org/10.12968/BJCN.2006.11.2.20445 .

Zhen J, Liu S, Zhao G, et al. Impact of healthy lifestyles on risk of hypertension in the Chinese population: finding from SHUN-CVD study. Fam Pract. 2023. https://doi.org/10.1093/fampra/cmad041 .

Fernández-Ballesteros R, Valeriano-Lorenzo E, et al. Behavioral lifestyles and survival: a meta-analysis. Front Psychol. 2022. https://doi.org/10.3389/fpsyg.2021.786491 .

World Health Organization. WHO guidelines on physical activity and sedentary behavior: at a glance. 2020. https://apps.who.int/iris/handle/10665/337001 . Accessed 28 Aug 2023.

Avcı YD. Personal health responsibility. TAF Prev Med Bull. 2016. https://doi.org/10.5455/pmb.1-1445494881 .

Scott AJ, Webb, et al. Improving sleep quality leads to better mental health: a meta-analysis of randomised controlled trials. Sleep Med Rev. 2021. https://doi.org/10.1016/j.smrv.2021.101556 .

Tarokh L, Saletin, et al. Sleep in adolescence: physiology, cognition and mental health. Neurosci Biobehav Rev. 2016;70:182–8. https://doi.org/10.1016/j.neubiorev.2016.08.008 .

Duhault JL. Stress prevention and management: a challenge for patients and physicians. Metabolism. 2022;51(6):46–8. https://doi.org/10.1053/meta.2002.33192 .

Hämmig O. Work-and stress-related musculoskeletal and sleep disorders among health professionals: a cross-sectional study in a hospital setting in Switzerland. BMC Musculoskelet Disord. 2020;21(1):1–11. https://doi.org/10.1186/s12891-020-03327-w .

Jiang W, Samad, et al. Prevalence and clinical characteristics of mental stress–induced myocardial ischemia in patients with coronary heart disease. J Am Coll Cardiol. 2013;61(7):714–22. https://doi.org/10.1016/j.jacc.2012.11.037 .

World Health Organization. Mental health and substance use. Doing what matters in times of stress. 2020. https://www.who.int/publications/i/item/9789240003927 . Accessed 15 Aug 2023.

World Health Organization. Regional Office for Europe. CINDI dietary guide. 2020. https://apps.who.int/iris/handle/10665/108342 . Accessed 15 Aug 2023.

Al-Othman N, Ghanim, et al. Comparison between smoking and nonsmoking palestinian medical students in the health-promoting behaviors and lifestyle characteristics. Biomed Res Int. 2021. https://doi.org/10.1155/2021/5536893 .

Tobacco, N. WHO global report on trends in prevalence of tobacco use 2000–2025, 4th edn. 2021. https://www.who.int/publications/i/item/9789240039322 . Accessed 15 Aug 2023.

Murray CJ, Aravkin, et al. Global burden of 87 risk factors in 204 countries and territories, 1990–2019: a systematic analysis for the global burden of disease study 2019. Lancet. 2020;396(10258):1223–49. https://doi.org/10.1016/s0140-6736(20)30752-2 .

World Health Organization. Harmful use of alcohol. 2018. https://www.who.int/health-topics/alcohol#tab=tab_1 . Accessed 15 Aug 2023.

Walker SN, Sechrist KR, Pender NJ. The health-promoting lifestyle profile: development and psychometric characteristics. Nursing Res. 1987;36(2):76–81.

Walker SN, Sechrist KR, Pender NJ. Health Promotion Model - Instruments to Measure Health Promoting Lifestyle : HealthPromoting Lifestyle Profile [HPLP II] (Adult Version). https://deepblue.lib.umich.edu/handle/2027.42/85349 . Accessed 25 Aug 2023

Conner M. Health behaviors. In: Wright JD, editor. International encyclopedia of the social & behavioral sciences. 2nd ed. Amsterdam: Elsevier; 2015. p. 582–7.

Ruiz MC, Devonport, et al. A cross-cultural exploratory study of health behaviors and wellbeing during COVID-19. Front Psychol. 2020;11:608216. https://doi.org/10.3389/fpsyg.2020.608216 .

DeVellis RF. Scale development: theory and applications. 4th ed. Washington DC: Sage publications; 2016.

Ayre C, Scally, et al. Critical values for Lawshe’s content validity ratio: revisiting the original methods of calculation. Meas Eval Couns Dev. 2014;47(1):79–86. https://doi.org/10.1177/0748175613513808 .

Hyrkäs K, Appelqvist-Schmidlechner, et al. Validating an instrument for clinical supervision using an expert panel. Int J Nurs Stud. 2003;40(6):619–25. https://doi.org/10.1016/s0020-7489(03)00036-1 .

Hair JF, Risher, et al. When to use and how to report the results of PLS-SEM. Eur Bus Rev. 2019;31(1):2–24. https://doi.org/10.1108/EBR-11-2018-0203 .

IBM. SPSS Statistics documentation. KMO and Bartlett’s Test. https://www.ibm.com/docs/tr/spss-statistics/29.0.0?topic=detection-kmo-bartletts-test . Accessed 25 Aug 2023.

Esin-Özabacı M. Determination and development of health behaviors of workers working in industrial area [dissertation]. Istanbul: Istanbul University Institute of Health Sciences Department of Nursing; 1997.

Akça Ş. Health promotion behaviors of university instructors and evaluation of factors affecting these behaviors [dissertation]. İzmir: Ege University Institute of Health Sciences; 1998.

Hobart J, Cano, et al. Improving the evaluation of therapeutic interventions in multiple sclerosis: the role of new psychometric methods. Health Technol Assess. 2009;13(12):1–200. https://doi.org/10.3310/hta13120 .

CDC (Centers for Disease Control and Prevention). Drinking too much alcohol can harm your health. Learn the facts. https://www.cdc.gov/alcohol/fact-sheets/alcohol-use.htm . Accessed 25 Aug 2023.

World Health Organization. Global status report on alcohol and health. 2018. https://www.who.int/publications/i/item/9789241565639 . Accessed 25 Aug 2023.

World Health Organization: WHO. Alcohol. https://www.who.int/news-room/fact-sheets/detail/alcohol . Accessed 18 Jul 2023.

Centers for Disease Control and Prevention. Physical activity. https://www.cdc.gov/physicalactivity/about-physical-activity/index.html . Accessed 1 Jul 2023.

World Health Organization: WHO. Physical activity. https://www.who.int/news-room/fact-sheets/detail/physical-activity . Accessed 14 Jul 2023.

Troy, D. Healthy Sleep—Sleep Education by the AASM. Sleep Education. https://sleepeducation.org/healthy-sleep/ Accessed Jul 18 2023.

Centers for Disease Control and Prevention. Health effects of smoking and tobacco use. https://www.cdc.gov/tobacco/basic_information/health_effects/index.htm Accessed Jul 18 2023.

European Institute for Gender Equality. Health and risk behaviors are clearly gendered. https://eige.europa.eu/publications-resources/toolkits-guides/gender-equality-index-2021-report/health-and-risk-behaviours-are-clearly-gendered Accessed 5 Jul 2023.

Deeks A, Lombard, et al. The effects of gender and age on health related behaviors. BMC Publ Health. 2009;9(1):1–8. https://doi.org/10.1186/1471-2458-9-213 .

Bulló M, Garcia-Aloy, et al. Association between a healthy lifestyle and general obesity and abdominal obesity in an elderly population at high cardiovascular risk. Prev Med. 2011;53(3):155–61. https://doi.org/10.1016/j.ypmed.2011.06.008 .

Marconcin P, Ihle, et al. The association of healthy lifestyle behaviors with overweight and obesity among older adults from 21 countries. Nutrients. 2021;13(2):315. https://doi.org/10.3390/nu13020315 .

Stanziano DC, Phoebe, et al. Differences in health-related behaviors and body mass index risk categories in African American women in college. J Natl Med. 2011;103(1):4–8. https://doi.org/10.1016/s0027-9684(15)30236-4 .

Braveman P, Gottlieb, et al. The social determinants of health: it’s time to consider the causes of the causes. Public Health Rep. 2014;129:19–31. https://doi.org/10.1177/00333549141291s206 .

Koçoğlu D, Akın B. The Relationship of Socioeconomic Inequalities with Healthy Lifestyle Behaviors and Quality of Life. Dokuz Eylül Üniversitesi Hemşirelik Yüksekokulu Elektronik Dergisi, 2009;2(4):145–154.

Benyamini Y. Why does self-rated health predict mortality? an update on current knowledge and a research agenda for psychologists. Psychol Health. 2011;26(11):1407–13. https://doi.org/10.1080/08870446.2011.621703 .

Lara J, McCrum L-A, et al. Association of Mediterranean diet and other health behaviours with barriers to healthy eating and perceived health among British adults of retirement age. Maturitas. 2014;79(3):292–8. https://doi.org/10.1016/j.maturitas.2014.07.003 .

Golzar J, Noor S, Tajik O. Convenience sampling. Int J Educ Lang Stud. 2022;1(2):72–7. https://doi.org/10.22034/ijels.2022.162981 .

Acknowledgements

Not applicable.

The authors report that no funding has been received for the study.

Author information

Authors and affiliations.

Department of Public Health, Cerrahpaşa Faculty of Medicine, Istanbul University-Cerrahpaşa, Kocamustafapasa, Fatih, 34098, Istanbul, Türkiye

Ugurcan Sayili, Kevser Sak, Sumeyye Nur Aydin, Busra Kara, Deniz Turgut & Osman Bisgin

Contributions

US designed the study. All authors participated in the data collection. KS, SNA, BK, DT and OB performed the statistical analysis. KS, SNA and BK interpreted the results. DT and OB created the tables. KS, SNA, BK, DT and OB wrote the manuscript. US revised the manuscript. US supervised the study. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Ugurcan Sayili .

Ethics declarations

Ethics approval and consent to participate.

This study was approved by the Ethics Committee of Istanbul University-Cerrahpasa Medical Faculty (Approval number and date: 13.01.2023-589421). Online informed consent was obtained from all participants before starting the study. When the survey link was clicked, a page introducing the study and including the informed consent form was opened. Participants who clicked on the “I agree to participate in the study” button reached the page containing the questionnaire and scale. The study was conducted in accordance with the Declaration of Helsinki.

Consent for publication

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .

About this article

Sayili, U., Sak, K., Aydin, S.N. et al. Development, validity and reliability of the healthy lifestyle behavior scale. Discov Public Health 21 , 62 (2024). https://doi.org/10.1186/s12982-024-00186-x

Received : 16 February 2024

Accepted : 14 August 2024

Published : 21 August 2024

DOI : https://doi.org/10.1186/s12982-024-00186-x

  • Healthy lifestyle behavior
  • Reliability

  • Open access
  • Published: 19 August 2024

Goniometry and fleximetry measurements to assess cervical range of motion in individuals with chronic neck pain: a validity and reliability study

  • Gabriel Gardhel Costa Araujo 1 , 2 ,
  • André Pontes-Silva 3 ,
  • Plínio da Cunha Leal 1 , 4 ,
  • Bruno Sousa Gomes 2 ,
  • Maisa Lopes Reis 2 ,
  • Sâmira Kennia de Mello Pereira Lima 2 ,
  • Cid André Fidelis-de-Paula-Gomes 5 &
  • Almir Vieira Dibai-Filho 1 , 4  

BMC Musculoskeletal Disorders volume 25, Article number: 651 (2024)

To assess the test–retest and inter-rater reliability of goniometry and fleximetry in measuring cervical range of motion in individuals with chronic neck pain.

A reliability study. Thirty individuals with chronic neck pain were selected. Cervical range of motion was measured by goniometry and fleximetry at two time points 7 days apart. To characterize the sample, we used the numerical pain rating scale, Pain-Related Catastrophizing Thoughts Scale, and Neck Disability Index. Intraclass correlation coefficient (ICC), standard error of measurement (SEM) and minimum detectable change (MDC) were calculated. Correlations between goniometry and fleximetry measurements were performed using Spearman’s correlation coefficient (rho).

For goniometry, we found excellent test–retest reliability (ICC ≥ 0.986, SEM ≤ 1.89%, MDC ≤ 5.23%) and inter-rater reliability (ICC ≥ 0.947, SEM ≤ 3.91%, MDC ≤ 10.84%). Similarly, we found excellent test–retest reliability (ICC ≥ 0.969, SEM ≤ 2.71%, MDC ≤ 7.52%) and inter-rater reliability (ICC ≥ 0.981, SEM ≤ 1.88%, MDC ≤ 5.20%) for fleximetry. Finally, we observed a strong correlation between the goniometry and the fleximetry for all cervical movements (rho ≥ 0.993).

Goniometry and fleximetry measurements are reliable for assessing cervical range of motion in individuals with chronic neck pain.

Introduction

Chronic neck pain is a multifactorial condition with variable clinical features [ 1 , 2 ]. Therefore, in addition to scales or questionnaires that measure, for example, pain intensity and disability, cervical range of motion should be part of the physical examination of individuals with chronic neck pain. It allows a better understanding of the clinical picture and is moderately correlated with pain intensity, disability, and fear of movement [ 3 ].

Various instruments for measuring cervical range of motion have been reported in the scientific literature, such as smartphone apps [ 4 ], the cervical range of motion (CROM) device [ 5 ], goniometry [ 6 ], and gravity inclinometry; in Brazil, fleximetry refers to gravity inclinometry with a Velcro fastening system [ 7 ]. The goniometer is a low-cost instrument commonly used to measure the range of motion of multiple joints. Regarding the reliability of goniometry for measuring cervical range of motion, Farooq et al. [ 6 ] and Rondoni et al. [ 8 ] identified adequate reliability in healthy individuals, with intraclass correlation coefficients (ICC) ranging from 0.79 to 0.98, whereas Chaves et al. [ 9 ] identified only moderate reliability for all cervical movements, with ICCs ranging from 0.44 to 0.54.

The fleximeter is used mainly in Brazil [ 7 ], where the equipment is widely commercialized by several companies. Its evaluation characteristics are similar to those of the goniometer, but it is attached to the body with Velcro. Among the few published studies on the cervical region, the most relevant found test–retest and inter-rater reliability ranging from moderate to excellent for the fleximeter in healthy individuals [ 7 ].

Findings from previous research in healthy individuals cannot be extrapolated to individuals with chronic neck pain [ 6 , 7 ]. Previous studies indicate that individuals with chronic neck pain have a lower cervical range of motion than healthy individuals, and that psychological aspects related to chronic pain, such as kinesiophobia, are associated with lower cervical range of motion [ 10 , 11 ]. Furthermore, there is a consensus in the literature that sample characteristics influence the measurement properties of instruments, which justifies new research on cervical range of motion in this population [ 12 ].

The cervical range of motion (CROM) device has good reliability and is widely used in clinical and research settings [ 13 ]; however, it is more expensive than a standard goniometer or fleximeter, and the greater the number of reliable instruments, the greater the clinical assessment repertoire. Therefore, the purpose of this study was to assess the test–retest and inter-rater reliability of goniometry and fleximetry in measuring cervical range of motion in individuals with chronic neck pain.

Design and ethics aspects

This is a reliability study whose report is based on the Guidelines for Reporting Reliability and Agreement Studies (GRRAS) [ 14 ]. Individuals signed an informed consent form prior to participation. The study was approved by the institution’s human research ethics committee (opinion number 2.935.437).

Individuals and eligibility criteria

Participants were recruited through posters, pamphlets, social networks, and messaging apps from August 2020 to July 2021. Data collection was performed in a private, well-lit, temperature-controlled room without external noise, located in a physiotherapy clinic in the city of São Luís (Maranhão, northeast Brazil).

An a priori sample size calculation was performed considering a confidence level of 0.95 and a confidence interval width of 0.30 for the ICC. The calculation assumed moderate reliability (ICC = 0.75), following the classification of Fleiss [ 15 ], and was based on the method of Bonett [ 16 ]; a minimum sample size of 24 individuals was estimated.
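
For readers who want to reproduce this kind of precision-based calculation, the sketch below applies Bonett's (2002) approximation for the sample size needed to estimate an ICC with a desired confidence-interval width. The choice of k = 3 ratings per subject is our assumption (it mirrors the mean-of-three-repetitions procedure mentioned in the discussion) rather than a value stated in this paragraph.

```python
import math

def bonett_n(rho: float, k: int, width: float, conf: float = 0.95) -> int:
    """Approximate n to estimate an ICC of `rho` from `k` ratings per subject
    with a confidence interval no wider than `width` (Bonett, 2002)."""
    # two-sided critical value; only a few common confidence levels are tabulated here
    z = {0.90: 1.644854, 0.95: 1.959964, 0.99: 2.575829}[conf]
    n = 8 * z**2 * (1 - rho)**2 * (1 + (k - 1) * rho)**2 / (k * (k - 1) * width**2) + 1
    return math.ceil(n)

# ICC = 0.75, interval width 0.30, 95% confidence, k = 3 ratings per subject
print(bonett_n(rho=0.75, k=3, width=0.30))   # prints 24
```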

The inclusion criteria were: age between 18 and 59 years; either sex; verbally reported neck pain for more than ninety days; and a score on the numerical pain rating scale (NPRS) ≥ 3 points [ 17 ]. The exclusion criteria were: specific neck pain (neck pain attributable to a specific and identifiable cause, such as a history of spinal surgery and/or vertebral fractures; spondylosis and spondylolisthesis; radiculopathy and/or herniated disc confirmed by imaging; or neurological impairment on physical examination, with altered sensitivity, reflexes and/or muscle strength); physical therapy for neck pain in the last ninety days or medication (anti-inflammatories, painkillers and/or muscle relaxants) in the last seven days; a medical diagnosis of cancer or of rheumatological, neurological, psychiatric, cardiovascular or metabolic disease; and pregnancy [ 18 , 19 ].

Pain assessment

We used three instruments to characterize the chronic neck pain of the individuals within the biopsychosocial context. Thus, pain intensity, disability, and catastrophizing were assessed to allow an understanding of the main components related to the multidimensional pain assessment.

The NPRS is a scale used to quantify pain intensity by means of a sequence of eleven numbers (0 represents "no pain" and 10 indicates "the worst pain you can imagine"). For pain with movement, we asked the individual to perform flexion, extension, lateral flexion to the left and right, and rotation to the left and right. After that, we evaluated the highest pain intensity perceived by the individual, regardless of which movement evoked the pain. This scale was previously validated in Portuguese [ 17 ].

The Pain-Related Catastrophizing Thoughts Scale (PCTS) is composed of nine items rated on a 0-to-5 Likert scale anchored by the words "almost never" and "almost always". The total score is obtained by summing the item scores and dividing by the number of items answered. Final scores range from 0 to 5 points, with higher scores indicating a higher occurrence of catastrophic thoughts, according to the version adapted to the Brazilian population [ 20 ].
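
Because the scoring rule averages only the answered items, a small helper makes the handling of skipped items explicit. This is an illustrative sketch; the function name and example responses are invented rather than taken from the PCTS documentation.

```python
from typing import Optional, Sequence

def pcts_score(responses: Sequence[Optional[int]]) -> float:
    """Mean of the answered items (0-5 each); unanswered items are passed as None."""
    answered = [r for r in responses if r is not None]
    if not answered:
        raise ValueError("No items were answered.")
    return sum(answered) / len(answered)

# Nine items, one left blank: score = sum of the eight answers / 8
print(pcts_score([3, 4, 2, None, 5, 1, 0, 2, 4]))   # 2.625
```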

The Neck Disability Index (NDI) is a questionnaire that has been adapted and validated for the Brazilian population, which is capable of measuring disability in individuals with chronic neck pain. It consists of 10 items with 6 response possibilities, ranging from 0 to 5. The total score varies from 0 to 50 points; the higher the value, the greater the disability [ 21 ].

The Baecke Habitual Physical Activity Questionnaire (BHPAQ) was used to assess the individuals’ habitual physical activity. It is an instrument that has already been validated for the Brazilian population and measures physical activity in the occupational, sports, and leisure dimensions. The score for each domain varies between 1 and 5 points, with no cutoff points; lower scores correspond to less active individuals [ 22 ].

Data collection flow

After application of the eligibility criteria and assessment of neck pain by an independent researcher (not involved in the measurement of cervical range of motion), the first rater measured the cervical range of motion (flexion, extension, lateral flexion to the left, lateral flexion to the right, rotation to the left, and rotation to the right) using a goniometer or fleximeter (the order of instruments was defined by drawing lots, i.e., goniometry followed by fleximetry or vice versa).

After 10 min of rest, the second rater performed the same measurement. After 10 min of rest, the first rater measured the movements using the instrument that was not previously used and, finally, after 10 min of rest, the second rater also carried out the evaluation with the second instrument (Fig.  1 ). For the reliability analysis, the raters repeated the procedure after a seven-day interval between the test sessions [ 23 ]. The order of movements to be measured was defined by drawing lots before starting the data collection.

figure 1

Flowchart for data collection. NPRS: Numerical pain rating scale; PCTS: Pain-Related Catastrophizing Thoughts Scale; NDI: Neck Disability Index; BHPAQ: Baecke Habitual Physical Activity Questionnaire; ROM: Range of motion

The raters had at least three years of clinical experience in evaluating and treating individuals with chronic neck pain. In addition, the two raters completed four weeks of training before the study: lectures covered the technical details of the equipment; the raters practiced handling the instruments; the approach and verbal commands used during data collection were standardized; and a safety protocol was established for adverse reactions (such as increased pain from repeated movements, nausea, and dizziness).

We measured the cervical range of motion for flexion, extension, lateral flexion and rotation with the individual seated according to Marques [ 24 ]. Flexion and extension: the axis of the goniometer was positioned at the level of the seventh cervical vertebra, with the fixed arm kept perpendicular to the ground, and at the end of the movement, the mobile arm was aligned with the earlobe. Lateral flexion (right/left): the axis of the goniometer was positioned over the spinous process of C7, the fixed arm was perpendicular to the ground, and the mobile arm was on the midline of the cervical spine. Rotation (right/left): the goniometer axis was positioned at the center of the head, the fixed arm was positioned at the center of the head, and at the end of the movement, the mobile arm was aligned with the nose. Figure  2 shows the measurement of cervical range of motion using the goniometer.

figure 2

Measurements with goniometer of range of motion for flexion ( A ), extension ( B ), rotation to the right ( C ), lateral flexion to the right ( D ), lateral flexion to the left ( E ), and rotation to the left ( F ). Before and during the movements, all individuals were instructed not to compensate the movement with the trunk (thoracolumbar region), thus isolating the movement specifically to the cervical region

We measured the cervical range of motion for flexion, extension, lateral flexion and rotation according to Florêncio et al. [ 7 ]. Flexion/extension: the volunteer was seated, with feet flat on the floor and the fleximeter on the right temporal region. Lateral flexion (right/left): the same position was used, with the fleximeter on the central occipital region. Rotation (right/left): the individual was placed supine, with the head off the stretcher and the fleximeter on the upper central region of the skull. Figure  3 shows the measurement of cervical range of motion using fleximetry.

figure 3

Measurements with fleximetry of range of motion for flexion ( A ), lateral flexion to the right ( B ), rotation to the left ( C ), extension ( D ), lateral flexion to the left ( E ), and rotation to the right ( F ). Before and during the movements, all individuals were instructed not to compensate the movement with the trunk (thoracolumbar region), thus isolating the movement specifically to the cervical region

Statistical analysis

The intraclass correlation coefficient (ICC) with 95% confidence intervals was used to determine the test–retest and inter-rater reliability of the cervical spine measurements obtained by goniometry and fleximetry, considering a two-way mixed-effects model, absolute agreement, and multiple raters/measurements [ 25 ]. We also calculated the standard error of measurement (SEM) and the minimum detectable change (MDC) at the 95% confidence level [ 26 ].

The ICC was classified according to Fleiss [ 15 ]: values below 0.40 indicate low reliability; between 0.40 and 0.75, moderate reliability; between 0.75 and 0.90, substantial reliability; and greater than 0.90, excellent reliability. The SEM was interpreted as follows: ≤ 5% = very good; > 5% and ≤ 10% = good; > 10% and ≤ 20% = doubtful; and > 20% = negative [ 27 ].
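
To make the link between these statistics concrete, the sketch below computes SEM and MDC95 from an ICC and a sample standard deviation using the usual formulas (SEM = SD·√(1 − ICC); MDC95 = 1.96·√2·SEM) and applies the Fleiss cut-offs quoted above. Note that the article reports SEM and MDC as percentages of the mean, an extra normalization step not shown here, and the numbers below are invented.

```python
import math

def sem_and_mdc95(sd: float, icc: float) -> tuple[float, float]:
    """Standard error of measurement and 95% minimum detectable change."""
    sem = sd * math.sqrt(1 - icc)
    mdc95 = 1.96 * math.sqrt(2) * sem
    return sem, mdc95

def fleiss_label(icc: float) -> str:
    """Classification of ICC values following Fleiss (1986)."""
    if icc < 0.40:
        return "low"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "substantial"
    return "excellent"

# Invented example: SD of 8 degrees for a cervical movement and an ICC of 0.97
sem, mdc = sem_and_mdc95(sd=8.0, icc=0.97)
print(f"{fleiss_label(0.97)} reliability, SEM = {sem:.2f} deg, MDC95 = {mdc:.2f} deg")
```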

For the correlations between goniometry and fleximetry, we first applied the Shapiro–Wilk test and then Spearman's correlation coefficient (rho), considering correlations above 0.7 adequate [ 12 ]. Data processing was performed using SPSS software, version 17.0 (Chicago, IL, USA), and a significance level of 5% was adopted in all analyses.
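
A minimal sketch of that normality-check-then-correlation workflow with SciPy (rather than SPSS) is shown below; the arrays are synthetic stand-ins for paired goniometry and fleximetry readings, not the study data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
goniometry = rng.normal(loc=45, scale=8, size=30)         # e.g. cervical flexion, degrees
fleximetry = goniometry + rng.normal(scale=1.5, size=30)  # closely agreeing second device

# Normality check on each instrument's measurements
for name, values in [("goniometry", goniometry), ("fleximetry", fleximetry)]:
    stat, p = stats.shapiro(values)
    print(f"Shapiro-Wilk {name}: W = {stat:.3f}, p = {p:.3f}")

# Spearman's rho between the two instruments (considered adequate if above 0.7)
rho, p = stats.spearmanr(goniometry, fleximetry)
print(f"Spearman rho = {rho:.3f} (p = {p:.4f})")
```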

Thirty-two individuals were recruited for the study, two of whom were excluded for not attending the retest session; thus, the final sample consisted of thirty individuals. Table 1 describes their characteristics. The majority were female (70%, n = 21), with a mean body mass of 66.24 kg and a mean stature of 1.61 (± 0.07) m.

For goniometry (Tables 2 and 3 ), we found excellent test–retest reliability (ICC ≥ 0.986, SEM ≤ 1.89%, MDC ≤ 5.23%) and inter-rater reliability (ICC ≥ 0.947, SEM ≤ 3.91%, MDC ≤ 10.84%). Similarly, for fleximetry (Tables 4 and 5 ), we found excellent test–retest reliability (ICC ≥ 0.969, SEM ≤ 2.71%, MDC ≤ 7.52%) and inter-rater reliability (ICC ≥ 0.981, SEM ≤ 1.88%, MDC ≤ 5.20%).

We observed a strong correlation between goniometry and fleximetry for all cervical movements (rho ≥ 0.993), indicating close agreement between the instruments (Table  6 ).

The instruments tested in this study (goniometry and fleximetry) showed excellent reliability for assessing cervical range of motion in individuals with chronic neck pain across test sessions and raters.

In addition to goniometry's reliability for cervical range of motion in healthy individuals [ 6 ], the literature reports its reliability for other joints: shoulder [ 28 , 29 ], hip [ 30 ], knee [ 31 ], ankle [ 32 ], finger [ 33 ], wrist [ 34 ], and lower back [ 35 ]. For fleximetry, in addition to reliability studies in healthy individuals [ 7 , 36 ], only one reliability study was conducted in people with chronic shoulder pain [ 28 , 29 ]. In our study, we observed higher ICC values (greater than 0.90), which can be explained by two factors: 1) the raters' clinical experience and their training before data collection, which standardized the scoring and consequently produced similar values in the measurement of cervical range of motion; and 2) the use of the mean of 3 repetitions in the statistical analysis, which reduces scatter.

In addition to the ICC, we found adequate SEM (< 5%) and MDC values. According to a previous study, SEM values less than 5% are very good, but for MDC, no interpretative values have been established in the literature [ 27 ]. From previously published reliability studies of cervical range of motion in healthy subjects, Farooq et al. [ 6 ] found SEM values ≤ 3.35º, while Florêncio et al. [ 7 ] did not calculate SEM. MDC was not calculated in either study.

Regarding clinical interpretation, Gajdosik and Bohannon [ 38 ] state that range of motion is just range of motion, although the relationship between cervical range of motion and disability has been described previously [ 37 ]. Therefore, measurement of this joint aspect needs to be complemented by other clinical measures such as pain intensity, disability, and kinesiophobia in individuals with chronic neck pain.

The high magnitude correlations between the goniometry and fleximetry found in the present study, in addition to the excellent reliability of the two instruments, support clinical professionals in choosing which instrument to use in their routine assessment of individuals with chronic neck pain.

Our study has some limitations that need to be described. We used analog devices, so the results cannot be extrapolated to digital devices for measuring cervical range of motion [ 38 , 39 ]. Furthermore, our results do not support the reliability of goniometry and fleximetry for assessing the range of motion of other spinal regions. We also did not assess the clinical stability of individuals' symptoms at retest, and this should be considered when interpreting the results. We suggest that future studies use self-report instruments specifically adapted to the sample of interest, investigate whether the number of joint movements affects the quality of the measure under investigation, and test whether other measurement methods are reliable.

Availability of data and materials

The data and materials in this paper are available from the corresponding author on request.

Data availability

No datasets were generated or analysed during the current study.

Abbreviations

Intraclass correlation coefficient

Standard error of measurement

Minimum detectable change

Spearman’s correlation coefficient

Rampazo ÉP, da Silva VR, de Andrade ALM, Back CGN, Madeleine PM, Arendt-Nielsen LLR. Sensory, motor, and psychosocial characteristics of individuals with chronic neck pain: a case-control study. Phys Ther. 2021;101:1–10.

Girasol CE, Dibai-Filho AV, de Oliveira AK, de Jesus Guirro RR. Correlation between skin temperature over myofascial trigger points in the upper trapezius muscle and range of motion, electromyographic activity, and pain in chronic neck pain patients. J Manipulative Physiol Ther. 2018;41:350–7.

Sarig Bahat H, Weiss PL, Sprecher E, Krasovsky A, Laufer Y. Do neck kinematics correlate with pain intensity, neck disability or with fear of motion? Man Ther. 2014;19:252–8.

Stenneberg MS, Busstra H, Eskes M, van Trijffel E, Cattrysse E, Scholten-Peeters GGM, et al. Concurrent validity and interrater reliability of a new smartphone application to assess 3D active cervical range of motion in patients with neck pain. Musculoskelet Sci Pract. 2018;34:59–65.

Fletcher JP, Bandy WD. Intrarater reliability of CROM measurement of cervical spine active range of motion in persons with and without neck pain. J Orthop Sports Phys Ther. 2008;38:640–5.

Farooq MN, Mohseni Bandpei MA, Ali M, Khan GA. Reliability of the universal goniometer for assessing active cervical range of motion in asymptomatic healthy persons. Pakistan J Med Sci. 2016;32:457–61.

Florêncio LL, Pereira PA, Silva ERT, Pegoretti KS, Gonçalves MC, Bevilaqua-Grossi D. Agreement and reliability of two non-invasive methods for assessing cervical range of motion among young adults. Rev Bras Fisioter. 2010;14:175–81.

Rondoni A, Rossettini G, Ristori D, Gallo F, Strobe M, Giaretta F, et al. Intrarater and inter-rater reliability of active cervical range of motion in patients with nonspecific neck pain measured with technological and common use devices: a systematic review with meta-regression. J Manipulative Physiol Ther. 2017;40:597–608.

Chaves TC, Nagamine HM, Belli JFC, de Hannai MCT, Bevilaqua-Grossi D, de Oliveira AS. Reliability of fleximetry and goniometry for assessing cervical range of motion among children. Rev Bras Fisioter. 2008;12:283–9.

Rampazo ÉP, da Silva VR, de Andrade ALM, Back CGN, Madeleine P, Arendt-Nielsen L, et al. Sensory, Motor, and Psychosocial Characteristics of Individuals With Chronic Neck Pain: A Case Control Study. Phys Ther. 2021;101:33774667.

Asiri F, Reddy RS, Tedla JS, Al Mohiza MA, Alshahrani MS, Govindappa SC, et al. Kinesiophobia and its correlations with pain, proprioception, and functional performance among individuals with chronic neck pain. PLoS ONE. 2021;16: e0254262.

Prinsen CAC, Mokkink LB, Bouter LM, Alonso J, Patrick DL, de Vet HCW, et al. COSMIN guideline for systematic reviews of patient-reported outcome measures. Qual Life Res. 2018;27:1147–57.

Williams MA, Williamson E, Gates S, Cooke MW. Reproducibility of the cervical range of motion (CROM) device for individuals with sub-acute whiplash associated disorders. Eur Spine J. 2012;21:872–8.

Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al. Guidelines for reporting reliability and agreement studies (GRRAS) were proposed. J Clin Epidemiol. 2011;64:96–106.

Fleiss JL. The design and analysis of clinical experiments. New York: Wiley; 1986.

Bonett DG. Sample size requirements for estimating intraclass correlations with desired precision. Stat Med. 2002;21:1331–5.

Ferreira-Valente MA, Pais-Ribeiro JL, Jensen MP. Validity of four pain intensity rating scales. Pain. 2011;152:2399–404.

Dibai-Filho AV, De Oliveira AK, Girasol CE, Dias FRC, De Jesus Guirro RR. Additional effect of static ultrasound and diadynamic currents on myofascial trigger points in a manual therapy program for patients with chronic neck pain: a randomized clinical trial. Am J Phys Med Rehabil. 2017;96:243–52.

Pontes-Silva A, Avila MA, Fidelis-de-Paula-Gomes CA, Dibai-Filho AV. The Short-Form Neck Disability Index has adequate measurement properties in chronic neck pain patients. Eur Spine J. 2021;30:3593–9.

Sardá-Junior J, Nicholas MK, Pereira IA, Pimenta CA de M, Asghari A, Cruz RMC. Validation of the pain-related catastrophizing thoughts scale. Bangladesh J Med Sci. 2008;34:1–17.

Cook C, Richardson JK, Braga L, Menezes A, Soler X, Kume P, et al. Cross-cultural adaptation and validation of the Brazilian Portuguese version of the Neck Disability Index and Neck Pain and Disability Scale. Spine (Phila Pa 1976). 2006;31:1621–7.

Florindo AA, Dias de Oliveira Latorre M do R, Constante Jaime P, Tanaka T, de Freitas Zerbini CA. Methodology to evaluation the habitual physical activity in men aged 50 years or more. Rev Saude Publica. 2004;38:307–14.

Pinheiro JS, Monteiro OLS, Pinheiro CAB, Penha LMB, Almeida MQG, Bassi-Dibai D, et al. Seated single-arm shot-put test to measure the functional performance of the upper limbs in exercise practitioners with chronic shoulder pain: a reliability study. J Chiropr Med. 2020;19:153–8.

Marques AP. Manual de Goniometria. 2 nd edition. São Paulo: Manole; 2003.

Koo TK, Li MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med. 2016;15:155–63.

Tucci HT, Martins J, Sposito GDC, Maria P, Camarini F. Closed kinetic chain Upper Extremity Stability test (CKCUES test): a reliability study in persons with and without shoulder impingement syndrome. Musculoskelet Disord. 2014;15:1–9.

Ostelo RWJG, De Vet HCW, Knol DL, Van Den Brandt PA. 24-Item Roland-Morris Disability Questionnaire was preferred out of six functional status questionnaires for post-lumbar disc surgery. J Clin Epidemiol. 2004;57:268–76.

MacDermid JC, Chesworth BM, Patterson S, Roth JH. Intratester and intertester reliability of goniometric measurement of passive lateral shoulder rotation. J Hand Ther. 1999;12:187–92.

Cools AM, De Wilde L, van Tongel A, Ceyssens C, Ryckewaert R, Cambier DC. Measuring shoulder external and internal rotation strength and range of motion: Comprehensive intra-rater and inter-rater reliability study of several testing protocols. J Shoulder Elb Surg. 2014;23:1454–61.

Pandya S, Florence JM, King WM, Robison JD, Oxman M, Province MA. Reliability of goniometric measurements in patients with Duchenne muscular dystrophy. Phys Ther. 1985;65:1339–42.

Hancock GE, Hepworth T, Wembridge K. Accuracy and reliability of knee goniometry methods. J Exp Orthop. 2018;5.

Youdas JW, Bogard CL, Suman VJ. Reliability of goniometric measurements and visual estimates of ankle joint active range of motion obtained in a clinical setting. Arch Phys Med Rehabil. 1993;74:1113–8.

van Kooij YE, Fink A, Nijhuis-van der Sanden MW, Speksnijder CM. The reliability and measurement error of protractor-based goniometry of the fingers: A systematic review. J Hand Ther. 2017;30:457–67.

Reissner L, Fischer G, List R, Taylor WR, Giovanoli P, Calcagni M. Minimal detectable difference of the finger and wrist range of motion: Comparison of goniometry and 3D motion analysis. J Orthop Surg Res. 2019;14:1–10.

Mieritz RM, Bronfort G, Kawchuk G, Breen A, Hartvigsen J. Reliability and measurement error of 3-dimensional regional lumbar motion measures: A systematic review. J Manipulative Physiol Ther. 2012;35:645–56.

Kubas C, Chen YW, Echeverri S, McCann SL, Denhoed MJ, Walker CJ, et al. Reliability and validity of cervical range of motion and muscle strength testing. J Strength Cond Res. 2017;31:1087–96.

Kim S, Kang K, Lee K. A correlation study on pain, range of motion of neck, Neck Disability Index and grip strength after thoracic manipulation and cervical stabilization training in chronic neck pain. J Korean Phys Ther. 2017;29:158–63.

Fraeulin L, Holzgreve F, Brinkbäumer M, Dziuba A, Friebe D, Klemz S, et al. Intra- and inter-rater reliability of joint range of motion tests using tape measure, digital inclinometer and inertial motion capturing. PLoS ONE. 2020;15: e0243646.

Torres R, Silva F, Pedrosa V, Ferreira J, Lopes A. The Acute Effect of Cryotherapy on Muscle Strength and Shoulder Proprioception. J Sport Rehabil. 2017;26:497–506.

Acknowledgements

We extend our gratitude to all the patients who so kindly volunteered to participate in this research, thereby forming our study sample.

This study was partially supported by the Coordination for the Improvement of Higher Education Personnel (CAPES, code 001) and by the Fundação de Amparo à Pesquisa e ao Desenvolvimento Científico e Tecnológico do Maranhão (FAPEMA, grant BM-01622/21). The funding source had no role in the study design, collection, analysis, interpretation of data, writing of the report, or in the decision to submit the article for publication.

Author information

Authors and affiliations.

Postgraduate Program in Physical Education, Department of Physical Education, Universidade Federal do Maranhão, São Luís, Maranhão, Brazil

Gabriel Gardhel Costa Araujo, Plínio da Cunha Leal & Almir Vieira Dibai-Filho

Instituto Center Fisio Inovare, São Luís, Maranhão, Brazil

Gabriel Gardhel Costa Araujo, Bruno Sousa Gomes, Maisa Lopes Reis & Sâmira Kennia de Mello Pereira Lima

Postgraduate Program in Physical Therapy, Department of Physical Therapy, Universidade Federal de São Carlos, São Carlos, São Paulo, Brazil

André Pontes-Silva

Postgraduate Program in Adult Health, Universidade Federal Do Maranhão, São Luís, Maranhão, Brazil

Plínio da Cunha Leal & Almir Vieira Dibai-Filho

Postgraduate Program in Rehabilitation Sciences, Universidade Nove de Julho, São Paulo, São Paulo, Brazil

Cid André Fidelis-de-Paula-Gomes

Contributions

GGCA, AP-S, PCL, BSG, MLR, SKMPL, CAFPG, AVDF – Conceptualization, Data curation, Formal Analysis, Investigation, Methodology, Validation, Visualization, Writing (original draft, review, and editing).

Corresponding author

Correspondence to André Pontes-Silva .

Ethics declarations

Ethics approval and consent to participate.

This study was approved by the Research Ethics Committee of the Universidade Federal do Maranhão – Brazil (report number: 2.935.437). Informed consent was obtained from all the individuals. All respondents participated in this study freely and with consent. All experiments were performed in accordance with relevant guidelines and regulations

Consent for publication

Informed consent was obtained from all subjects and/or their legal guardian(s).

Competing interests

AVDF, AP-S, and CAFPG are associate editors and peer reviewers of the BMC Musculoskeletal Disorders. The other authors have no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .

About this article

Cite this article.

Araujo, G.G.C., Pontes-Silva, A., Leal, P. et al. Goniometry and fleximetry measurements to assess cervical range of motion in individuals with chronic neck pain: a validity and reliability study. BMC Musculoskelet Disord 25 , 651 (2024). https://doi.org/10.1186/s12891-024-07775-6

Received : 29 December 2023

Accepted : 13 August 2024

Published : 19 August 2024

DOI : https://doi.org/10.1186/s12891-024-07775-6

  • Chronic Pain
  • Spine
  • Range of Motion, Articular
  • Reproducibility of Results

BMC Musculoskeletal Disorders

ISSN: 1471-2474


  26. Development, validity and reliability of the healthy ...

    Background Healthy lifestyle behaviors encompass activities aimed at promoting, maintaining, or reclaiming health. Evaluating these behaviors accurately requires comprehensive, valid, and reliable tools. Aims This study aimed to develop the Healthy Lifestyle Behavior Scale and evaluate its psychometric properties in the Turkish population. Methods For this methodological research, a cross ...

  27. Goniometry and fleximetry measurements to assess cervical range of

    Purpose To assess the test-retest and inter-rater reliability of goniometry and fleximetry in measuring cervical range of motion in individuals with chronic neck pain. Methods A reliability study. Thirty individuals with chronic neck pain were selected. Cervical range of motion was measured by goniometry and fleximetry at two time points 7 days apart. To characterize the sample, we used the ...