The simplest way to understand a variable is as any characteristic or attribute that can experience change or vary over time or context – hence the name “variable”. For example, the dosage of a particular medicine could be classified as a variable, as the amount can vary (i.e., a higher dose or a lower dose). Similarly, gender, age or ethnicity could be considered demographic variables, because each person varies in these respects.
Within research, especially scientific research, variables form the foundation of studies, as researchers are often interested in how one variable impacts another, and the relationships between different variables. For example:
As you can see, variables are often used to explain relationships between different elements and phenomena. In scientific studies, especially experimental studies, the objective is often to understand the causal relationships between variables. In other words, the role of cause and effect between variables. This is achieved by manipulating certain variables while controlling others – and then observing the outcome. But, we’ll get into that a little later…
Variables can be a little intimidating for new researchers because there are a wide variety of variables, and oftentimes, there are multiple labels for the same thing. To lay a firm foundation, we’ll first look at the three main types of variables, namely: independent variables, dependent variables and control variables.
Simply put, the independent variable is the “cause” in the relationship between two (or more) variables. In other words, when the independent variable changes, it has an impact on another variable.
For example:
It’s useful to know that independent variables can go by a few different names, including explanatory variables (because they explain an event or outcome) and predictor variables (because they predict the value of another variable). Terminology aside though, the most important takeaway is that independent variables are assumed to be the “cause” in any cause-effect relationship. As you can imagine, these types of variables are of major interest to researchers, as many studies seek to understand the causal factors behind a phenomenon.
While the independent variable is the “cause”, the dependent variable is the “effect” – or rather, the affected variable. In other words, the dependent variable is the variable that is assumed to change as a result of a change in the independent variable.
Keeping with the previous example, let’s look at some dependent variables in action:
In scientific studies, researchers will typically pay very close attention to the dependent variable (or variables), carefully measuring any changes in response to hypothesised independent variables. This can be tricky in practice, as it’s not always easy to reliably measure specific phenomena or outcomes – or to be certain that the actual cause of the change is in fact the independent variable.
As the adage goes, correlation is not causation. In other words, just because two variables have a relationship doesn’t mean that it’s a causal relationship – they may just happen to vary together. For example, you could find a correlation between the number of people who own a certain brand of car and the number of people who have a certain type of job. Just because the number of people who own that brand of car and the number of people who have that type of job is correlated, it doesn’t mean that owning that brand of car causes someone to have that type of job or vice versa. The correlation could, for example, be caused by another factor such as income level or age group, which would affect both car ownership and job type.
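To make the car-and-job example concrete, here’s a toy simulation in Python (the numbers are entirely made up) showing how a shared cause like income can make two outcomes correlate even though neither causes the other:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000

# Hypothetical confounder: income influences both outcomes below.
income = rng.normal(50_000, 10_000, size=n)

# Neither outcome causes the other; both depend only on income (plus noise).
car_ownership_score = 0.5 * income + rng.normal(0, 3_000, size=n)
job_type_score = 0.3 * income + rng.normal(0, 3_000, size=n)

# Yet the two outcomes are clearly correlated.
r = np.corrcoef(car_ownership_score, job_type_score)[0, 1]
print(f"correlation: {r:.2f}")
```

The correlation here is purely a by-product of the shared cause; statistically controlling for income would make most of it disappear.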
To confidently establish a causal relationship between an independent variable and a dependent variable (i.e., X causes Y), you’ll typically need an experimental design, where you have complete control over the environment and the variables of interest. But even so, this doesn’t always translate into the “real world”. Simply put, what happens in the lab sometimes stays in the lab!
As an alternative to pure experimental research, correlational or “ quasi-experimental ” research (where the researcher cannot manipulate or change variables) can be done on a much larger scale more easily, allowing one to understand specific relationships in the real world. These types of studies also assume some causality between independent and dependent variables, but it’s not always clear. So, if you go this route, you need to be cautious in terms of how you describe the impact and causality between variables and be sure to acknowledge any limitations in your own research.
In an experimental design, a control variable (or controlled variable) is a variable that is intentionally held constant to ensure it doesn’t have an influence on any other variables. As a result, this variable remains unchanged throughout the course of the study. In other words, it’s a variable that’s not allowed to vary – tough life 🙂
As we mentioned earlier, one of the major challenges in identifying and measuring causal relationships is that it’s difficult to isolate the impact of variables other than the independent variable. Simply put, there’s always a risk that there are factors beyond the ones you’re specifically looking at that might be impacting the results of your study. So, to minimise the risk of this, researchers will attempt (as best possible) to hold other variables constant. These factors are then considered control variables.
Some examples of variables that you may need to control include:
Which specific variables need to be controlled for will vary tremendously depending on the research project at hand, so there’s no generic list of control variables to consult. As a researcher, you’ll need to think carefully about all the factors that could vary within your research context and then consider how you’ll go about controlling them. A good starting point is to look at previous studies similar to yours and pay close attention to which variables they controlled for.
Of course, you won’t always be able to control every possible variable, and so, in many cases, you’ll just have to acknowledge their potential impact and account for them in the conclusions you draw. Every study has its limitations, so don’t get fixated or discouraged by troublesome variables. Nevertheless, always think carefully about the factors beyond what you’re focusing on – don’t make assumptions!
As we mentioned, independent, dependent and control variables are the most common variables you’ll come across in your research, but they’re certainly not the only ones you need to be aware of. Next, we’ll look at a few “secondary” variables that you need to keep in mind as you design your research.
Let’s jump into it…
A moderating variable is a variable that influences the strength or direction of the relationship between an independent variable and a dependent variable. In other words, moderating variables affect how much (or how little) the IV affects the DV, or whether the IV has a positive or negative relationship with the DV (i.e., moves in the same or opposite direction).
For example, in a study about the effects of sleep deprivation on academic performance, gender could be used as a moderating variable to see if there are any differences in how men and women respond to a lack of sleep. In such a case, one may find that gender has an influence on how much students’ scores suffer when they’re deprived of sleep.
It’s important to note that while moderators can have an influence on outcomes, they don’t necessarily cause them; rather they modify or “moderate” existing relationships between other variables. This means that it’s possible for two different groups with similar characteristics, but different levels of moderation, to experience very different results from the same experiment or study design.
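Statistically, a moderator is usually captured as an interaction term in a regression model. Here’s a rough sketch with simulated data (NumPy only; the variable names and effect sizes are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2_000

# Simulated study: sleep loss (IV) lowers test scores (DV), and group
# membership (the moderator, coded 0/1) changes how strong that effect is.
sleep_loss = rng.uniform(0, 8, n)     # hours of sleep lost
group = rng.integers(0, 2, n)         # moderator (e.g. two cohorts)
noise = rng.normal(0, 3, n)
score = 80 - 2.0 * sleep_loss - 1.5 * sleep_loss * group + noise

# Fit: score ~ intercept + sleep_loss + group + sleep_loss:group.
X = np.column_stack([np.ones(n), sleep_loss, group, sleep_loss * group])
coef, *_ = np.linalg.lstsq(X, score, rcond=None)

# A non-zero interaction coefficient is the statistical signature of moderation.
print(f"interaction coefficient: {coef[3]:.2f}")  # close to the true -1.5
```

In words: in group 0 each hour of lost sleep costs about 2 points, while in group 1 it costs about 3.5 points, and the interaction term picks up exactly that difference.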
Mediating variables are often used to explain the relationship between the independent and dependent variable(s). For example, if you were researching the effects of age on job satisfaction, then education level could be considered a mediating variable, as it may explain why older people have higher job satisfaction than younger people – they may have more experience or better qualifications, which lead to greater job satisfaction.
Mediating variables also help researchers understand how different factors interact with each other to influence outcomes. For instance, if you wanted to study the effect of stress on academic performance, then coping strategies might act as a mediating factor by influencing both stress levels and academic performance simultaneously. For example, students who use effective coping strategies might be less stressed but also perform better academically due to their improved mental state.
In addition, mediating variables can provide insight into causal relationships between two variables by helping researchers determine whether changes in one factor directly cause changes in another – or whether there is an indirect relationship between them mediated by some third factor(s). For instance, if you wanted to investigate the impact of parental involvement on student achievement, you would need to consider family dynamics as a potential mediator, since it could influence both parental involvement and student achievement simultaneously.
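One common way to quantify mediation is the product-of-coefficients approach: estimate the IV→mediator path, then the mediator→DV path while holding the IV fixed, and multiply the two. A minimal simulated sketch (invented numbers; a real mediation analysis would also include significance or bootstrap tests):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2_000

# Simulated chain: stress -> coping quality (mediator) -> performance.
stress = rng.normal(0, 1, n)
coping = -0.6 * stress + rng.normal(0, 1, n)                   # a-path
performance = 0.8 * coping + 0.1 * stress + rng.normal(0, 1, n)

def ols(X, y):
    """Least-squares coefficients for y ~ X (X includes an intercept column)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
a = ols(np.column_stack([ones, stress]), coping)[1]                # IV -> mediator
b = ols(np.column_stack([ones, coping, stress]), performance)[1]   # mediator -> DV, IV held fixed

indirect = a * b  # the part of stress's effect that flows through coping
print(f"indirect effect: {indirect:.2f}")  # close to the true -0.6 * 0.8 = -0.48
```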
A confounding variable (also known as a third variable or lurking variable) is an extraneous factor that can influence the relationship between two variables being studied. Specifically, for a variable to be considered a confounding variable, it needs to meet two criteria: it must be related to the independent variable, and it must influence the dependent variable.
Some common examples of confounding variables include demographic factors such as gender, ethnicity, socioeconomic status, age, education level, and health status. In addition to these, there are also environmental factors to consider. For example, air pollution could confound the impact of the variables of interest in a study investigating health outcomes.
Naturally, it’s important to identify as many confounding variables as possible when conducting your research, as they can heavily distort the results and lead you to draw incorrect conclusions . So, always think carefully about what factors may have a confounding effect on your variables of interest and try to manage these as best you can.
Latent variables are unobservable factors that can influence the behaviour of individuals and explain certain outcomes within a study. They’re also known as hidden or underlying variables, and what makes them rather tricky is that they can’t be directly observed or measured. Instead, latent variables must be inferred from other observable data points such as responses to surveys or experiments.
For example, in a study of mental health, the variable “resilience” could be considered a latent variable. It can’t be directly measured, but it can be inferred from measures of mental health symptoms, stress, and coping mechanisms. The same applies to a lot of concepts we encounter every day.
One way in which we overcome the challenge of measuring the immeasurable is latent variable models (LVMs). An LVM is a type of statistical model that describes a relationship between observed variables and one or more unobserved (latent) variables. These models allow researchers to uncover patterns in their data which may not have been visible before, and those patterns can then inform hypotheses about cause-and-effect relationships that weren’t apparent prior to running the LVM. Powerful stuff, we say!
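As a small illustration of the idea, here’s a one-factor model fitted with scikit-learn’s `FactorAnalysis`, using simulated data in place of real survey responses (the “resilience” trait and the three survey measures are invented for the sketch):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
n = 500

# A latent trait (say, "resilience") that is never observed directly...
latent = rng.normal(0, 1, n)

# ...but drives three observable survey measures, each with its own noise.
observed = np.column_stack([
    0.8 * latent + rng.normal(0, 0.5, n),  # e.g. coping-skills score
    0.7 * latent + rng.normal(0, 0.5, n),  # e.g. reverse-coded stress score
    0.6 * latent + rng.normal(0, 0.5, n),  # e.g. symptom inventory
])

# A one-factor model infers a score for the hidden variable from the data.
fa = FactorAnalysis(n_components=1, random_state=0)
scores = fa.fit_transform(observed).ravel()

# The inferred factor tracks the true latent trait closely (its sign is arbitrary).
r = abs(np.corrcoef(scores, latent)[0, 1])
print(f"|correlation| with the true latent trait: {r:.2f}")
```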
In the world of scientific research, there’s no shortage of variable types, some of which have multiple names and some of which overlap with each other. In this post, we’ve covered some of the popular ones, but remember that this is not an exhaustive list.
To recap, we’ve explored independent, dependent and control variables, as well as moderating, mediating, confounding and latent variables.
If you’re still feeling a bit lost and need a helping hand with your research project, check out our 1-on-1 coaching service, where we guide you through each step of the research journey. Also, be sure to check out our free dissertation writing course and our collection of free, fully-editable chapter templates.
This post was based on one of our popular Research Bootcamps . If you're working on a research project, you'll definitely want to check this out ...
A variable is an important element of research. It is a characteristic, number, or quantity of any category that can be measured or counted and whose value may change with time or other parameters.
Variables are defined in different ways in different fields. For instance, in mathematics, a variable is an alphabetic character that expresses a numerical value. In algebra, a variable represents an unknown entity, mostly denoted by a, b, c, x, y, z, etc. In statistics, variables represent real-world conditions or factors. Despite the differences in definitions, in all fields, variables represent the entity that changes and help us understand how one factor may or may not influence another factor.
Variables in research and statistics are of different types—independent, dependent, quantitative (discrete or continuous), qualitative (nominal/categorical, ordinal), intervening, moderating, extraneous, confounding, control, and composite. In this article we compare the first two types—independent vs dependent variables.
Researchers conduct experiments to understand the cause-and-effect relationships between various entities. In such experiments, the entities whose values change are called variables. These variables describe the relationships among various factors and help in drawing conclusions in experiments. They help in understanding how some factors influence others. Some examples of variables include age, gender, race, income, weight, etc.
As mentioned earlier, different types of variables are used in research. Of these, we will compare the most common types—independent vs dependent variables. The independent variable is the cause and the dependent variable is the effect, that is, independent variables influence dependent variables. In research, a dependent variable is the outcome of interest of the study and the independent variable is the factor that may influence the outcome. Let’s explain this with an independent and dependent variable example: In a study to analyze the effect of antibiotic use on microbial resistance, antibiotic use is the independent variable and microbial resistance is the dependent variable because antibiotic use affects microbial resistance.(1)
Here is a list of the important characteristics of independent variables.(2,3)
Independent variables in research are of the following two types:(4)
Quantitative independent variables differ in amounts or scales. They are numeric and answer questions like “how many” or “how often.”
Here are a few examples of quantitative independent variables:
Qualitative independent variables are non-numerical variables.
A few examples of qualitative independent variables are listed below:
A quantitative variable is represented by actual amounts and a qualitative variable by categories or groups.
Here are a few characteristics of dependent variables:(3)
Here are a few dependent variable examples:
Dependent variables are of two types:(5)
Continuous variables: These can take on any value within a given range and are measured on a continuous scale, for example, weight, height, temperature, time, distance, etc.
Discrete variables: These are divided into distinct categories. They are not measured on a continuous scale, so only a limited number of values are possible, for example, gender, race, etc.
The following table compares independent vs dependent variables.

| | Independent variable | Dependent variable |
|---|---|---|
| How to identify | Manipulated or controlled | Observed or measured |
| Purpose | Cause or predictor variable | Outcome or response variable |
| Relationship | Independent of other variables | Influenced by the independent variable |
| Control | Manipulated or assigned by researcher | Measured or observed during experiments |
Listed below are a few examples of research questions from various disciplines and their corresponding independent and dependent variables.(6)

| Discipline | Research question | Independent variable | Dependent variable |
|---|---|---|---|
| Genetics | What is the relationship between genetics and susceptibility to diseases? | genetic factors | susceptibility to diseases |
| History | How do historical events influence national identity? | historical events | national identity |
| Political science | What is the effect of political campaign advertisements on voter behavior? | political campaign advertisements | voter behavior |
| Sociology | How does social media influence cultural awareness? | social media exposure | cultural awareness |
| Economics | What is the impact of economic policies on unemployment rates? | economic policies | unemployment rates |
| Literature | How does literary criticism affect book sales? | literary criticism | book sales |
| Geology | How do a region’s geological features influence the magnitude of earthquakes? | geological features | earthquake magnitudes |
| Environment | How do changes in climate affect wildlife migration patterns? | climate changes | wildlife migration patterns |
| Gender studies | What is the effect of gender bias in the workplace on job satisfaction? | gender bias | job satisfaction |
| Film studies | What is the relationship between cinematographic techniques and viewer engagement? | cinematographic techniques | viewer engagement |
| Archaeology | How does archaeological tourism affect local communities? | archaeological tourism | local community development |
Experiments usually have at least two variables—independent and dependent. The independent variable is the entity that is being tested and the dependent variable is the result. Classifying independent and dependent variables as discrete and continuous can help in determining the type of analysis that is appropriate in any given research experiment, as shown in the table below.(7)
| Independent variable | Discrete dependent variable | Continuous dependent variable |
|---|---|---|
| Discrete | Chi-square, logistic regression, Phi, Cramer’s V | t-test, ANOVA, regression, point-biserial correlation |
| Continuous | Logistic regression, point-biserial correlation | Regression, correlation |
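To illustrate two of these combinations with SciPy (toy data; in practice the choice of test also depends on sample size and assumptions such as normality):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Discrete IV (treatment vs control), continuous DV (test score) -> t-test.
treatment = rng.normal(75, 10, 50)
control = rng.normal(70, 10, 50)
t_stat, t_p = stats.ttest_ind(treatment, control)

# Discrete IV (group A vs B), discrete DV (pass/fail) -> chi-square test
# on a contingency table of observed counts.
table = np.array([[30, 20],   # group A: passed, failed
                  [18, 32]])  # group B: passed, failed
chi2, chi_p, dof, expected = stats.chi2_contingency(table)

print(f"t-test p = {t_p:.3f}, chi-square p = {chi_p:.3f}")
```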
Here are some more research questions and their corresponding independent and dependent variables.(6)

| Research question | Independent variable | Dependent variable |
|---|---|---|
| What is the impact of online learning platforms on academic performance? | type of learning | academic performance |
| What is the association between exercise frequency and mental health? | exercise frequency | mental health |
| How does smartphone use affect productivity? | smartphone use | productivity levels |
| Does family structure influence adolescent behavior? | family structure | adolescent behavior |
| What is the impact of nonverbal communication on job interviews? | nonverbal communication | job interviews |
In addition to all the characteristics of independent and dependent variables listed previously, here are a few simple steps to identify the variable types in a research question.(8)
Let’s try out these steps with an example.
A researcher wants to conduct a study to see if his new weight loss medication performs better than two bestseller alternatives. He wants to randomly select 20 subjects from Richmond, Virginia, aged 20 to 30 years and weighing above 60 pounds. Each subject will be randomly assigned to three treatment groups.
To identify the independent and dependent variables, we convert this paragraph into a question, as follows: Does the new medication perform better than the alternatives? Here, the medications are the independent variable and their performances or effect on the individuals are the dependent variable.
Data visualization is the graphical representation of information by using charts, graphs, and maps. Visualizations help in making data more understandable by making it easier to compare elements, identify trends and relationships (among variables), among other functions.
Bar graphs, pie charts, and scatter plots are the best methods to graphically represent variables. While pie charts and bar graphs are suitable for depicting categorical data, scatter plots are appropriate for quantitative data. The independent variable is usually placed on the X-axis and the dependent variable on the Y-axis.
Figure 1 is a scatter plot that depicts the relationship between the number of household members and their monthly grocery expenses.(9) The number of household members is the independent variable and the expenses the dependent variable. The graph shows that as the number of members increases, the expenditure also increases.
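Here’s what such a plot looks like in code, using Matplotlib with made-up household data (following the convention of independent variable on the X-axis, dependent variable on the Y-axis):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script runs anywhere
import matplotlib.pyplot as plt

# Hypothetical data: household size (IV) vs monthly grocery spend (DV).
members = [1, 2, 3, 4, 5, 6]
expenses = [220, 380, 520, 640, 790, 900]

fig, ax = plt.subplots()
ax.scatter(members, expenses)
ax.set_xlabel("Number of household members (independent variable)")
ax.set_ylabel("Monthly grocery expenses (dependent variable)")
ax.set_title("IV on the X-axis, DV on the Y-axis")
fig.savefig("household_scatter.png")
```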
Let’s summarize the key takeaways about independent vs dependent variables from this article:
The following table lists the different types of variables used in research.(10)
| Variable type | Description | Examples |
|---|---|---|
| Categorical | Measures a construct that has different categories | gender, race, religious affiliation, political affiliation |
| Quantitative | Measures constructs that vary by degree or amount | weight, height, age, intelligence scores |
| Independent (IV) | Measures constructs considered to be the cause | Higher education (IV) leads to higher income (DV) |
| Dependent (DV) | Measures constructs that are considered the effect | Exercise (IV) will reduce anxiety levels (DV) |
| Intervening or mediating (MV) | Measures constructs that stand in between the cause and effect | Incarcerated individuals are more likely to have a psychiatric disorder (MV), which leads to disability in social roles |
| Confounding (CV) | “Rival explanations” that account for an apparent cause-and-effect relationship | Age (CV) explains the relationship between increased shoe size and increase in intelligence in children |
| Control variable | Extraneous variables whose influence can be controlled or eliminated | Demographic data such as gender, socioeconomic status, age |
2. Why is it important to differentiate between independent vs dependent variables?
Differentiating between independent vs dependent variables is important to ensure the correct application in your own research and also the correct understanding of other studies. An incorrectly framed research question can lead to confusion and inaccurate results. An easy way to differentiate is to identify the cause and effect.
3. How are independent and dependent variables used in non-experimental research?
So far in this article we talked about variables in relation to experimental research, wherein variables are manipulated or measured to test a hypothesis, that is, to observe the effect on dependent variables. Let’s examine non-experimental research and how variables are used.(11) In non-experimental research, variables are not manipulated but are observed in their natural state. Researchers do not have control over the variables and cannot manipulate them based on their research requirements. For example, a study examining the relationship between income and education level would not manipulate either variable. Instead, the researcher would observe and measure the levels of each variable in the sample population. The level of control researchers have is the major difference between experimental and non-experimental research. Another difference is the causal relationship between the variables. In non-experimental research, it is not possible to establish a causal relationship because other variables may be influencing the outcome.
4. Are there any advantages and disadvantages of using independent vs dependent variables?
Here are a few advantages and disadvantages of both independent and dependent variables.(12)
Advantages:
Disadvantages:
We hope this article has provided you with an insight into the use and importance of independent vs dependent variables, which can help you effectively use variables in your next research study.
Dependent Variable The variable that depends on other factors that are measured. These variables are expected to change as a result of an experimental manipulation of the independent variable or variables. It is the presumed effect.
Independent Variable The variable that is stable and unaffected by the other variables you are trying to measure. It refers to the condition of an experiment that is systematically manipulated by the investigator. It is the presumed cause.
Don't feel bad if you are confused about what is the dependent variable and what is the independent variable in social and behavioral sciences research. However, it's important that you learn the difference because framing a study using these variables is a common approach to organizing the elements of a social sciences research study in order to discover relevant and meaningful results. Specifically, it is important for these two reasons:
A variable in research simply refers to a person, place, thing, or phenomenon that you are trying to measure in some way. The best way to understand the difference between a dependent and independent variable is that the meaning of each is implied by what the words tell us about the variable you are using. You can do this with a simple exercise from the website, Graphic Tutorial. Take the sentence, "The [independent variable] causes a change in [dependent variable] and it is not possible that [dependent variable] could cause a change in [independent variable]." Insert the names of variables you are using in the sentence in the way that makes the most sense. This will help you identify each type of variable. If you're still not sure, consult with your professor before you begin to write.
The process of examining a research problem in the social and behavioral sciences is often framed around methods of analysis that compare, contrast, correlate, average, or integrate relationships between or among variables. Techniques include associations, sampling, random selection, and blind selection. Designation of the dependent and independent variable involves unpacking the research problem in a way that identifies a general cause and effect and classifying these variables as either independent or dependent.
The variables should be outlined in the introduction of your paper and explained in more detail in the methods section. There are no rules about the structure and style for writing about independent or dependent variables but, as with any academic writing, clarity and succinctness are most important.
After you have described the research problem and its significance in relation to prior research, explain why you have chosen to examine the problem using a method of analysis that investigates the relationships between or among independent and dependent variables. State what it is about the research problem that lends itself to this type of analysis. For example, if you are investigating the relationship between corporate environmental sustainability efforts [the independent variable] and dependent variables associated with measuring employee satisfaction at work using a survey instrument, you would first identify each variable and then provide background information about the variables. What is meant by "environmental sustainability"? Are you looking at a particular company [e.g., General Motors] or are you investigating an industry [e.g., the meat packing industry]? Why is employee satisfaction in the workplace important? How does a company make their employees aware of sustainability efforts and why would a company even care that its employees know about these efforts?
Identify each variable for the reader and define each. In the introduction, this information can be presented in a paragraph or two when you describe how you are going to study the research problem. In the methods section, you build on the literature review of prior studies about the research problem to describe in detail background about each variable, breaking each down for measurement and analysis. For example, what activities do you examine that reflect a company's commitment to environmental sustainability? Levels of employee satisfaction can be measured by a survey that asks about things like volunteerism or a desire to stay at the company for a long time.
The structure and writing style of describing the variables and their application to analyzing the research problem should be stated and unpacked in such a way that the reader obtains a clear understanding of the relationships between the variables and why they are important. This is also important so that the study can be replicated in the future using the same variables but applied in a different way.
Chris Drew (PhD)
Dr. Chris Drew is the founder of the Helpful Professor. He holds a PhD in education and has published over 20 articles in scholarly journals. He is the former editor of the Journal of Learning Development in Higher Education.
In research and statistics, a variable is a characteristic or attribute that can take on different values or categories. It represents data points or information that can be measured, observed, or manipulated within a study.
Statistical and experimental analysis aims to explore the relationships between variables. For example, researchers may hypothesize a connection between a particular variable and an outcome, like the association between physical activity levels (an independent variable) and heart health (a dependent variable).
Variables play a crucial role in data analysis . Data sets collected through research typically consist of multiple variables, and the analysis is driven by how these variables are related, how they influence each other, and what patterns emerge from these relationships.
Therefore, as a researcher, your understanding of variables and their manipulation forms the crux of your study.
To help with your understanding, I’ve presented 27 of the most common types of variables below.
1. Quantitative (Numerical) Variables
Definition: Quantitative variables, also known as numerical variables, are quantifiable in nature and represented in numbers, allowing the data collected to be measured on a scale or range (Moodie & Johnson, 2021). These variables generally yield data that can be organized, ranked, measured, and subjected to mathematical operations.
Explanation: The values of quantitative variables can either be counted (referred to as discrete variables) or measured (continuous variables). Quantifying data in numerical form allows for a range of statistical analysis techniques to be applied, from calculating averages to finding correlations.
Pros | Cons |
---|---|
They provide a precise measure, allow for a higher level of measurement, and can be manipulated statistically for inferential analysis. The resulting data is objective and consistent (Moodie & Johnson, 2021). | Collecting quantitative data can be time-consuming and costly. Secondly, important context or explanation may be lost when data is purely numerical (Katz, 2006). |
Quantitative Variable Example : Consider a marketing survey where you ask respondents to rate their satisfaction with your product on a scale of 1 to 10. The satisfaction score here represents a quantitative variable. The data can be quantified and used to calculate average satisfaction scores, identify the scope for product improvement, or compare satisfaction levels across different demographic groups.
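As a quick illustration of what quantitative data permits, here is a minimal Python sketch computing summary statistics from satisfaction scores. The ratings are made-up sample values, not real survey data:

```python
# Satisfaction ratings on a 1-10 scale: a quantitative (numerical) variable,
# so arithmetic operations like averaging are meaningful.
from statistics import mean, median

ratings = [7, 9, 6, 8, 10, 5, 7, 8]  # hypothetical survey responses

average = mean(ratings)   # average satisfaction score
middle = median(ratings)  # middle value when ratings are sorted

print(average)  # 7.5
print(middle)   # 7.5
```

The same operations would be meaningless on a qualitative variable such as favorite color, which is the practical difference the section above describes.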
Definition: Continuous variables are a subtype of quantitative variables that can have an infinite number of measurements within a specified range. They provide detailed insights based on precise measurements and are often representative on a continuous scale (Christmann & Badgett, 2009).
Explanation: The variable is “continuous” because there are an infinite number of possible values within the chosen range. For instance, variables like height, weight, or time are measured continuously.
Pros | Cons |
---|---|
They give a higher level of detail, useful in determining precise measurements, and allow for complex statistical analysis (Christmann & Badgett, 2009). | They can easily lead to information overload due to granularity (Allen, 2017). The representation and interpretation of results may also be more complex. |
Continuous Variable Example: The best real-world example of a continuous variable is time. For instance, the time it takes for a customer service representative to resolve a customer issue can range anywhere from a few seconds to several hours, and can accurately be measured down to the second, providing an almost infinite set of possible values.
Definition: Discrete variables are a form of quantitative variable that can only assume a finite number of values. They are typically count-based (Frankfort-Nachmias & Leon-Guerrero, 2006).
Explanation: Discrete variables are commonly used in situations where the “count” or “quantity” is distinctly separate. For instance, the number of children in a family is a common example – you can’t have 2.5 kids.
Pros | Cons |
---|---|
They are easier to comprehend and simpler to analyze, as they provide direct and countable insight (Frankfort-Nachmias & Leon-Guerrero, 2006). | They might lack in-depth information because they cannot provide the granularity that continuous variables offer (Privitera, 2022). |
Discrete Variable Example : The number of times a customer contacts customer service within a month. This is a discrete variable because it can only take a whole number of values – you can’t call customer service 2.5 times.
Definition: Qualitative, or categorical variables, are non-numerical data points that categorize or group data entities based on shared features or qualities (Moodie & Johnson, 2021).
Explanation: They are often used in research to classify particular traits, characteristics, or properties of subjects that are not easily quantifiable, such as colors, textures, tastes, or smells.
Pros | Cons |
---|---|
Essences or characteristics that cannot be measured numerically can be captured. They provide richer, subjective, and explanatory data (Moodie & Johnson, 2021). | The analysis might be challenging because these variables cannot be subjected to mathematical calculations or operations (Creswell & Creswell, 2018). |
Qualitative Variable Example : Consider a survey that asks respondents to identify their favorite color from a list of choices. The color preference would be a qualitative variable as it categorizes data into different categories corresponding to different colors.
Definition: Nominal variables, a subtype of qualitative variables, represent categories without any inherent order or ranking (Norman & Streiner, 2008).
Explanation: Nominal variables are often used to label or categorize particular sets of items or individuals, with no intention of giving numerical value or order. For example, race, gender, or religion.
Pros | Cons |
---|---|
They are simple to understand and effective in segregating data into clearly defined, mutually exclusive categories (Norman & Streiner, 2008). | They can often be overly simplistic, leading to a loss of data differentiation and information (Katz, 2006). They also do not provide any directionality or order. |
Nominal Variable Example : For instance, the type of car someone owns (sedan, SUV, truck, etc.) is a nominal variable. Each category is unique and one is not inherently higher, better, or larger than the others.
Definition: Ordinal variables are a subtype of categorical (qualitative) variables with a key feature of having a clear, distinct, and meaningful order or ranking to the categories (De Vaus, 2001).
Explanation: Ordinal variables represent categories that can be logically arranged in a specific order or sequence but the difference between categories is unknown or doesn’t matter, such as satisfaction rating scale (unsatisfied, neutral, satisfied).
Pros | Cons |
---|---|
Ordinal variables categorize data while also reflecting a ranking or order, allowing more nuanced insights from your data (De Vaus, 2001). | Data analysis becomes challenging due to the unequal intervals (Katz, 2006). Differences between adjacent categories are unknown and not measurable. |
Ordinal Variable Example : A classic example is asking survey respondents how strongly they agree or disagree with a statement (strongly disagree, disagree, neither agree nor disagree, agree, strongly agree). The answers form an ordinal scale; they can be ranked, but the intervals between responses are not necessarily equal.
Definition: Dichotomous or binary variables are a type of categorical variable that consist of only two opposing categories like true/false, yes/no, success/failure, and so on (Adams & McGuire, 2022).
Explanation: Dichotomous variables refer to situations where there can only be two, and just two, possible outcomes – there is no middle ground.
Pros | Cons |
---|---|
Dichotomous variables simplify analysis. They are particularly useful for “yes/no” questions, which can be coded into a numerical format for statistical analysis (Coolidge, 2012). | Dichotomous variables might oversimplify complex phenomena, losing valuable information by reducing them to just two categories (Adams & McGuire, 2022). |
Dichotomous Variable Example : Whether a customer completed a transaction (Yes or No) is a binary variable. Either they completed the purchase (yes) or they did not (no).
Definition: Ratio variables are the highest level of quantitative variables that contain a zero point or absolute zero, which represents a complete absence of the quantity (Norman & Streiner, 2008).
Explanation: Besides being able to categorize and order units, ratio variables also allow for the relative degree of difference between them to be calculated. For example, income, height, weight, and temperature (in Kelvin) are ratio variables.
Pros | Cons |
---|---|
Having an inherent zero value allows for a broad range of statistical analysis that involves ratios (Norman & Streiner, 2008). It provides a larger volume of information than any other variable type. | If the measured quantity lacks a true zero point, treating it as a ratio variable can produce results that do not reflect reality (De Vaus, 2001). |
Ratio Variable Example : An individual’s annual income is a ratio variable. You can say someone earning $50,000 earns twice as much as someone making $25,000. The zero point in this case would be an income of $0, which indicates that no income is being earned.
Definition: Interval variables are quantitative variables that have equal, predictable differences between values, but they do not have a true zero point (Norman & Streiner, 2008).
Explanation: Interval variables are similar to ratio variables; both provide a clear ordering of categories and have equal intervals between successive values. The primary difference is the absence of an absolute zero.
Pros | Cons |
---|---|
Interval variables allow for more complex statistical analyses as they can accommodate a range of mathematical operations like addition and subtraction (Norman & Streiner, 2008). | They restrict the ability to measure the ratio of categories since there’s no true zero (Babbie, Halley & Zaino, 2007). |
Interval Variable Example : The classic example of an interval variable is the temperature in Fahrenheit or Celsius. The difference between 20 degrees and 30 degrees is the same as the difference between 70 degrees and 80 degrees, but there isn’t a true zero because the scale doesn’t start from absolute nonexistence of the quantity being measured.
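The interval-vs-ratio distinction can be demonstrated in a few lines of Python: dividing Celsius values suggests a "twice as hot" claim the physics doesn't support, while the same comparison in Kelvin (a ratio scale with a true zero) is legitimate:

```python
# Celsius is an interval scale (no true zero); Kelvin is a ratio scale.
def c_to_k(celsius: float) -> float:
    """Convert Celsius (interval scale) to Kelvin (ratio scale)."""
    return celsius + 273.15

naive_ratio = 20 / 10                 # 2.0 -- misleading: 20 C is not "twice" 10 C
true_ratio = c_to_k(20) / c_to_k(10)  # ~1.035 -- valid, since Kelvin has a true zero

print(round(true_ratio, 3))  # 1.035
```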
Related: Quantitative Reasoning Examples
Definition: The dependent variable is the outcome or effect that the researcher wants to study. Its value depends on or is influenced by one or more other variables known as independent variables.
Explanation: In a research study, the dependent variable is the phenomenon or behavior that may be affected by manipulations in the independent variable. It’s what you measure to see if your predictions about the effects of the independent variable are correct.
Pros | Cons |
---|---|
It provides the results for the research question. Without a dependent variable, it would be impossible to draw conclusions from the conducted experiment or study. | It’s not always straightforward to isolate the impact of independent variables on the dependent variable, especially when multiple independent variables are influencing the results. |
Dependent Variable Example: Suppose you want to study the impact of exercise frequency on weight loss. In this case, the dependent variable is weight loss, which changes based on how often the subject exercises (the independent variable).
Definition: The independent variable, or the predictor variable, is what the researcher manipulates to test its effect on the dependent variable.
Explanation: The independent variable is presumed to have some effect on the dependent variable in a study. It can often be thought of as the cause in a cause-and-effect relationship.
Pros | Cons |
---|---|
Manipulating the independent variable allows researchers to observe changes it causes in the dependent variable, aiding in understanding causal relationships in the data. | It can be challenging to isolate the impact of a single independent variable when multiple factors may influence the dependent variable. |
Independent Variable Example: In a study looking at how different dosages of a medication affect the severity of symptoms, the medication dosage is an independent variable. Researchers will adjust the dosage to see what effect it has on the symptoms (the dependent variable).
See Also: Independent and Dependent Variable Examples
Definition: Confounding variables—also known as confounders—are variables that might distort, confuse or interfere with the relationship between an independent variable and a dependent variable, leading to a false correlation (Boniface, 2019).
Explanation: Confounders are typically related in some way to both the independent and dependent variables. Because of this, they can create or hide relationships, leading researchers to make inaccurate conclusions about causality.
Pros | Cons |
---|---|
Identifying potential confounders during study design can help optimize the process and add credibility to the conclusions drawn (Knapp, 2017). | Confounders can introduce bias and affect the validity of a study. If overlooked, they can lead to false conclusions about correlations or cause-and-effect relationships (Boniface, 2019). |
Confounding Variable Example : If you’re studying the relationship between physical activity and heart health, diet could potentially act as a confounding variable. People who are physically active often also eat healthier diets, which could independently improve heart health [National Heart, Lung, and Blood Institute].
Definition: Control variables are variables in a research study that the researcher keeps constant to prevent them from interfering with the relationship between the independent and dependent variables (Sproull, 2002).
Explanation: Control variables allow researchers to isolate the effects of the independent variable on the dependent variable, ensuring that any changes observed are solely due to the manipulation of the independent variable and not an external factor.
Pros | Cons |
---|---|
Control variables increase the reliability of experiments, ensure a fair comparison between groups, and support the validity of the conclusions (Sproull, 2002). | Misidentification or non-consideration of control variables might affect the outcome of the experiment, leading to biased results (Boniface, 2019). |
Control Variable Example : In a study evaluating the impact of a tutoring program on student performance, some control variables could include the teacher’s experience, the type of test used to measure performance, and the student’s previous grades.
Definition: Latent variables—also referred to as hidden or unobserved variables—are variables that are not directly observed or measured but are inferred from other variables that are observed (measured directly).
Explanation: Latent variables can represent abstract concepts like intelligence, socioeconomic status, or even happiness. They are often used in psychological and sociological research, where certain concepts can’t be measured directly.
Pros | Cons |
---|---|
Latent variables can help capture unseen factors and give insight into the underlying constructs affecting observable behaviors. | Inferring the values of latent variables can involve complex statistical methods and assumptions. Also, there might be several ways to interpret the values of latent variables, potentially impacting the validity and consistency of findings. |
Latent Variable Example: In a study on job satisfaction, factors like job stress, financial reward, work-life balance, or relationship with colleagues can be measured directly. However, “job satisfaction” itself is a latent variable as it is inferred from these observed variables.
Definition: Derived variables are variables that are created or developed based on existing variables in a dataset. They involve applying certain calculations or manipulations to one or more variables to create a new one.
Explanation: Derived variables can be created by either transforming a single variable (like taking the square root) or combining multiple variables (computing the ratio of two variables).
Pros | Cons |
---|---|
Derived variables can reduce complexity, extract more relevant information, and create new insights from existing data. | They require careful creation as any errors in the genesis of the original variables will impact the derived variable. Also, the process of deriving variables needs to be adequately documented to ensure replicability and avoid misunderstanding. |
Derived Variable Example: In a dataset containing a person’s height and weight, a derived variable could be the Body Mass Index (BMI). The BMI is calculated by dividing weight (in kilograms) by the square of height (in meters).
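The BMI example can be sketched directly. The records below are hypothetical, but the formula is the standard one given above (weight in kilograms divided by height in meters squared):

```python
# Deriving a new variable (BMI) from two existing variables (weight, height).
def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2

people = [
    {"weight_kg": 70.0, "height_m": 1.75},  # assumed sample records
    {"weight_kg": 90.0, "height_m": 1.80},
]
for p in people:
    p["bmi"] = round(bmi(p["weight_kg"], p["height_m"]), 1)  # the derived variable

print([p["bmi"] for p in people])  # [22.9, 27.8]
```

Note that any measurement error in the original height and weight variables propagates directly into the derived BMI values, which is the caveat the table above raises.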
Definition: Time-series variables are a set of data points ordered or indexed in time order. They provide a sequence of data points, each associated with a specific instance in time.
Explanation: Time-series variables are often used in statistical models to study trends, analyze patterns over time, make forecasts, and understand underlying causes and characteristics of the trend.
Pros | Cons |
---|---|
Time series variables allow for the exploration of causal relationships, testing of theories, and forecasting of future values based on established patterns. | They can be difficult to work with due to issues like seasonality, irregular intervals, autocorrelation, or non-stationarity. Often, additional statistical techniques, such as decomposition, differencing, or transformations, may need to be employed. |
Time-series Variable Example : The quarterly GDP (Gross Domestic Product) data over a period of several years would be an example of a time series variable. Economists use such data to examine economic trends over time.
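As a sketch of working with a time-series variable, the snippet below applies a trailing four-quarter moving average to an invented GDP series; averaging over a full year of quarters is one simple way to smooth out the seasonality mentioned above. The values are illustrative, not real data:

```python
# Hypothetical quarterly GDP figures, indexed by quarter in time order.
gdp = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0]

def moving_average(series, window=4):
    """Trailing moving average; a 4-quarter window spans one full year."""
    return [
        round(sum(series[i - window + 1 : i + 1]) / window, 2)
        for i in range(window - 1, len(series))
    ]

print(moving_average(gdp))  # [102.0, 103.75, 104.75]
```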
Definition: Cross-sectional variables are data collected from many subjects at the same point in time or without regard to differences in time.
Explanation: This type of data provides a “snapshot” of the variables at a specific time. They’re often used in research to compare different population groups at a single point in time.
Pros | Cons |
---|---|
Cross-sectional data can be relatively easy and quick to collect. They are useful for examining the relationship between different variables at a given point in time. | Cross-sectional data does not provide any information about causality or the sequence of events. It’s also susceptible to “snapshot bias” since it does not take into account changes over time. |
Cross-sectional Variable Example: A basic example of a set of cross-sectional data could be a national survey that asks respondents about their current employment status. The data captured represents a single point in time and does not track changes in employment over time.
Definition: A predictor variable—also known as independent or explanatory variable—is a variable that is being manipulated in an experiment or study to see how it influences the dependent or response variable.
Explanation: In a cause-and-effect relationship, the predictor variable is the cause. Its modification allows the researcher to study its effect on the response variable.
Pros | Cons |
---|---|
Predictor variables establish cause-and-effect relationships and allow for the prediction of outcomes for the response variable. | It can be challenging to isolate a single predictor variable’s impact when multiple predictor variables are involved, leading to potential interaction effects. |
Predictor Variable Example : In a study evaluating the impact of studying hours on exam score, the number of studying hours is a predictor variable. Researchers alter the study duration to see its impact on the exam results (response variable).
Definition: A response variable—also known as the dependent or outcome variable—is what the researcher observes for any changes in an experiment or study. Its value depends on the predictor or independent variable.
Explanation: The response variable is the “effect” in a cause-and-effect scenario. Any changes occurring to this variable due to the predictor variable are observed and recorded.
Pros | Cons |
---|---|
The response variable supplies the results for the research question, offering crucial insights into the study. | It may be influenced by several predictor variables making it difficult to isolate the effect of one specific predictor. |
Response Variable Example: Continuing from the previous example, the exam score is the response variable. It changes based on the manipulation of the predictor variable, i.e., the number of studying hours.
Definition: Exogenous variables are variables that are not affected by other variables in the system but can affect other variables within the same system.
Explanation: In a model, an exogenous variable is considered to be an input, it’s determined outside the model, and its value is simply imposed on the system.
Pros | Cons |
---|---|
Exogenous variables are often used as control variables in experimental studies, making them essential for creating cause-and-effect relationships. | The relationship between exogenous variables and the dependent variable can be complex and challenging to identify precisely. |
Exogenous Variable Example: In an economic model, the government’s taxation rate may be considered an exogenous variable. The rate is set externally (not determined within the economic model) but impacts variables within the model, such as business profitability.
Definition: In contrast, endogenous variables are variables whose value is determined by the functional relationships within the system in an economic or statistical model. They depend on the values of other variables in the model.
Explanation: These are the “output” variables of a system, determined through cause-and-effect relationships within the system.
Pros | Cons |
---|---|
Endogenous variables play a significant role in understanding complex systems’ dynamics and aid in developing nuanced mathematical or statistical models. | It can be difficult to untangle the causal relationships and influences surrounding endogenous variables. |
Endogenous Variable Example: To continue the previous example, business profitability in an economic model may be considered an endogenous variable. It is influenced by several other variables within the model, including the exogenous taxation rate set by the government.
Definition: Causal variables are variables which can directly cause an effect on the outcome or dependent variable. Their value or level determines the value or level of other variables.
Explanation: In a cause-and-effect relationship, a causal variable is the cause. The understanding of causal relationships is the basis of scientific enquiry, allowing researchers to manipulate variables to see the effect.
Pros | Cons |
---|---|
Identifying and understanding causal variables can lead to practical interventions as it offers the opportunity to control or change the outcome. | Confusion can arise between correlation and causation. Just because two variables move together doesn’t necessarily mean that one causes the other to move. |
Causal Variable Example: In a study examining the effect of fertilizer on plant growth, the type or amount of fertilizer used is the causal variable. Changing its type or amount should directly affect the outcome—plant growth.
Definition: Moderator variables are variables that can affect the strength or direction of the association between the predictor (independent) and response (dependent) variable. They specify when or under what conditions a relationship holds.
Explanation: The role of a moderator is to illustrate “how” or “when” an independent variable’s effect on a dependent variable changes.
Pros | Cons |
---|---|
The identification of moderator variables can provide a more nuanced understanding of the relationship between independent and dependent variables. | It’s often challenging to identify potential moderators, and careful experimental design is required to appropriately assess their impact. |
Moderator Variable Example: If you are studying the effect of a training program on job performance, a potential moderator variable could be the employee’s education level. The influence of the training program on job performance could depend on the employee’s initial level of education.
Definition: Mediator variables are variables that account for, or explain, the relationship between an independent variable and a dependent variable, providing an understanding of “why” or “how” an effect occurs.
Explanation: Often, the relationship between an independent and a dependent variable isn’t direct—it’s through a third, intervening, variable known as a mediator variable.
Pros | Cons |
---|---|
The identification of mediators can enhance the understanding of underlying processes or mechanisms that explain why an effect exists. | The establishment of mediation effects requires strong and complex modeling techniques, and it may be difficult to establish temporal precedence, a prerequisite for mediation. |
Mediator Variable Example: In a study looking at the relationship between socioeconomic status and academic performance, a mediator variable might be the access to educational resources. Socioeconomic status may influence access to educational resources, which in turn affects academic performance. The relationship between socioeconomic status and academic performance isn’t direct but through access to resources.
Definition: Extraneous variables are variables that are not of primary interest to a researcher but might influence the outcome of a study. They can add “noise” to the research data if not controlled.
Explanation: An extraneous variable is anything else that has the potential to influence our dependent variable or confound our results if not kept in check, other than our independent variable.
Pros | Cons |
---|---|
The identification and control of extraneous variables can improve the validity of the study’s conclusions by minimizing potential sources of bias. | These variables can confuse the outcome of a study if not adequately observed, measured, and controlled. |
Extraneous Variable Example : Consider an experiment to test whether temperature influences the rate of a chemical reaction. Potential extraneous variables could include the light level, humidity, or impurities in the chemicals used—each could affect the reaction rate and, thus, should be controlled to ensure valid results.
Definition: Dummy variables, often used in regression analysis, are artificial variables created to represent an attribute with two or more distinct categories or levels.
Explanation: They are used to turn a qualitative variable into a quantitative one to facilitate mathematical processing. Typically, dummy variables are binary – taking a value of either 0 or 1.
Pros | Cons |
---|---|
Using dummy variables allows the modelling of categorical or nominal variables in regression equations, which can only handle numerical values. | Creating too many dummy variables—known as the “dummy variable trap”—can lead to multicollinearity in regression models, making the results hard to interpret. |
Dummy Variable Example: Consider a dataset that includes a variable “Gender” with categories “male” and “female”. A corresponding dummy variable “IsMale” could be introduced, where males get classified as 1 and females as 0.
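A minimal Python sketch of dummy coding, using the car-type example from earlier in the article (the function and column names are illustrative). Dropping one reference category, so that k categories yield k-1 dummies, is the standard way to avoid the dummy variable trap described above:

```python
# Convert a nominal variable into 0/1 dummy variables for regression.
def dummy_code(values, categories):
    """One dummy per category except the first, which serves as the
    reference level (avoids the dummy variable trap)."""
    return [
        {f"is_{c}": int(v == c) for c in categories[1:]}
        for v in values
    ]

cars = ["sedan", "suv", "truck", "sedan"]  # assumed sample data
print(dummy_code(cars, ["sedan", "suv", "truck"]))
# [{'is_suv': 0, 'is_truck': 0}, {'is_suv': 1, 'is_truck': 0},
#  {'is_suv': 0, 'is_truck': 1}, {'is_suv': 0, 'is_truck': 0}]
```

A row of all zeros identifies the reference category ("sedan" here), so no information is lost despite using only two columns for three categories.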
Definition: Composite variables are new variables created by combining or grouping two or more variables.
Explanation: Depending upon their complexity, composite variables can help assess concepts that are explicit (e.g., “total score”) or relatively abstract (e.g., “life quality index”).
Pros | Cons |
---|---|
They can simplify analysis by reducing the number of variables considered and may help in handling multicollinearity in statistical models. | The creation of composite variables requires careful consideration of the underlying variables that make up the composite. It might be hard to interpret and requires an understanding of the individual variables. |
Composite Variable Example: A “Healthy Living Index” might be created as a composite of multiple variables such as eating habits, physical activity level, sleep quality, and stress level. Each of these variables contributes to the overall “Healthy Living Index”.
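A composite index like this is often computed as a weighted sum of its components. In the sketch below, the weights and component scores are entirely assumed for illustration; a real index would need each component validated and scaled consistently (here, 0 to 10):

```python
# Hypothetical weights for the components of a "Healthy Living Index".
WEIGHTS = {"diet": 0.3, "activity": 0.3, "sleep": 0.2, "stress": 0.2}

def healthy_living_index(scores: dict) -> float:
    """Weighted sum of component scores, each assumed on a 0-10 scale."""
    return round(sum(scores[k] * w for k, w in WEIGHTS.items()), 2)

print(healthy_living_index({"diet": 8, "activity": 6, "sleep": 7, "stress": 5}))
# 6.6
```

The choice of weights is itself a modeling decision, which is exactly the "careful consideration of the underlying variables" the table above warns about.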
Knowing your variables will make you a better researcher. Some, such as confounding variables, always need to be in the back of your mind; others you need to think about during study design, when matching the research design to the research objectives.
Adams, K. A., & McGuire, E. K. (2022). Research Methods, Statistics, and Applications . SAGE Publications.
Allen, M. (2017). The SAGE Encyclopedia of Communication Research Methods (Vol. 1). New York: SAGE Publications.
Babbie, E., Halley, F., & Zaino, J. (2007). Adventures in Social Research: Data Analysis Using SPSS 14.0 and 15.0 for Windows (6th ed.). New York: SAGE Publications.
Boniface, D. R. (2019). Experiment Design and Statistical Methods For Behavioural and Social Research . CRC Press. ISBN: 9781351449298.
Christmann, E. P., & Badgett, J. L. (2009). Interpreting Assessment Data: Statistical Techniques You Can Use. New York: NSTA Press.
Coolidge, F. L. (2012). Statistics: A Gentle Introduction (3rd ed.). SAGE Publications.
Creswell, J. W., & Creswell, J. D. (2018). Research Design: Qualitative, Quantitative, and Mixed Methods Approaches . New York: SAGE Publications.
De Vaus, D. A. (2001). Research Design in Social Research . New York: SAGE Publications.
Katz, M. (2006). Study Design and Statistical Analysis: A Practical Guide for Clinicians. Cambridge: Cambridge University Press.
Knapp, H. (2017). Intermediate Statistics Using SPSS. SAGE Publications.
Moodie, P. F., & Johnson, D. E. (2021). Applied Regression and ANOVA Using SAS. CRC Press.
Norman, G. R., & Streiner, D. L. (2008). Biostatistics: The Bare Essentials . New York: B.C. Decker.
Privitera, G. J. (2022). Research Methods for the Behavioral Sciences . New Jersey: SAGE Publications.
Feroze Kaliyadan
Department of Dermatology, King Faisal University, Al Hofuf, Saudi Arabia
1 Department of Dermatology, Prayas Amrita Clinic, Pune, Maharashtra, India
This short “snippet” covers three important aspects related to statistics: the concept of variables, the importance and practical aspects of descriptive statistics, and issues related to sampling, including types of sampling and sample size estimation.
What is a variable?[1,2] To put it in very simple terms, a variable is an entity whose value varies. A variable is an essential component of any statistical data. It is a feature of a member of a given sample or population, which is unique, and can differ in quantity or quality from another member of the same sample or population. Variables either are the primary quantities of interest or act as practical substitutes for the same. The importance of variables is that they help in the operationalization of concepts for data collection. For example, if you want to do an experiment based on the severity of urticaria, one option would be to measure severity using a scale that grades the severity of itching. This becomes an operational variable. For a variable to be “good,” it needs to have properties such as good reliability and validity, low bias, feasibility/practicality, low cost, objectivity, clarity, and acceptance. Variables can be classified in various ways, as discussed below.
A variable can collect either qualitative or quantitative data. A variable differing in quantity is called a quantitative variable (e.g., weight of a group of patients), whereas a variable differing in quality is called a qualitative variable (e.g., the Fitzpatrick skin type)
A simple test which can be used to differentiate between qualitative and quantitative variables is the subtraction test. If you can subtract the value of one variable from the other to get a meaningful result, then you are dealing with a quantitative variable (this of course will not apply to rating scales/ranks).
Discrete variables are variables in which no values may be assumed between the two given values (e.g., number of lesions in each patient in a sample of patients with urticaria).
Continuous variables, on the other hand, can take any value in between the two given values (e.g., duration for which the weals last in the same sample of patients with urticaria). One way of differentiating between continuous and discrete variables is to use the “mid-way” test. If, for every pair of values of a variable, a value exactly mid-way between them is meaningful, the variable is continuous. For example, two values for the time taken for a weal to subside can be 10 and 13 min. The mid-way value would be 11.5 min which makes sense. However, for a number of weals, suppose you have a pair of values – 5 and 8 – the midway value would be 6.5 weals, which does not make sense.
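The “mid-way” test is easy to express as a quick sanity check. A minimal sketch in Python using the urticaria values from the example; the judgment of whether the midpoint is meaningful of course remains with the researcher:

```python
def midpoint(a, b):
    """Return the value exactly mid-way between two observed values."""
    return (a + b) / 2

# Duration of weals (continuous): the mid-way value is meaningful.
print(midpoint(10, 13))  # 11.5 min is a perfectly sensible duration

# Number of weals (discrete): the mid-way value is not meaningful.
print(midpoint(5, 8))    # 6.5 weals does not make sense
```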
Nominal/categorical variables are, as the name suggests, variables which can be slotted into different categories (e.g., gender or type of psoriasis).
Ordinal variables or ranked variables are similar to categorical, but can be put into an order (e.g., a scale for severity of itching).
In the context of an experimental study, the dependent variable (also called the outcome variable) is directly linked to the primary outcome of the study. For example, in a clinical trial on psoriasis, the PASI (Psoriasis Area and Severity Index) would possibly be one dependent variable. The independent variable (sometimes also called the explanatory variable) is something which is not affected by the experiment itself but which can be manipulated to affect the dependent variable. Other terms sometimes used synonymously include blocking variable, covariate, or predictor variable. Confounding variables are extra variables which can have an effect on the experiment. They are linked with the dependent and independent variables and can cause spurious associations. For example, in a clinical trial of a topical treatment for psoriasis, the concomitant use of moisturizers might be a confounding variable. A control variable is a variable that must be kept constant during the course of an experiment.
Statistics can be broadly divided into descriptive statistics and inferential statistics.[ 3 , 4 ] Descriptive statistics give a summary about the sample being studied without drawing any inferences based on probability theory. Even if the primary aim of a study involves inferential statistics, descriptive statistics are still used to give a general summary. When we describe the population using tools such as frequency distribution tables, percentages, and other measures of central tendency like the mean, for example, we are talking about descriptive statistics. When we use a specific statistical test (e.g., Mann–Whitney U-test) to compare the mean scores and express it in terms of statistical significance, we are talking about inferential statistics. Descriptive statistics can help in summarizing data in the form of simple quantitative measures such as percentages or means or in the form of visual summaries such as histograms and box plots.
Descriptive statistics can be used to describe a single variable (univariate analysis) or more than one variable (bivariate/multivariate analysis). In the case of more than one variable, descriptive statistics can help summarize relationships between variables using tools such as scatter plots.
Descriptive statistics can be broadly put under two categories:
Sorting and grouping is most commonly done using frequency distribution tables. For continuous variables, it is generally better to use groups in the frequency table. Ideally, group sizes should be equal (except in extreme ends where open groups are used; e.g., age “greater than” or “less than”).
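As a sketch of equal-width grouping for a frequency distribution table, the following uses hypothetical ages (not data from the text) and a class width of 10 years:

```python
from collections import Counter

# Hypothetical ages of 13 patients (illustrative data only)
ages = [23, 27, 31, 34, 35, 41, 44, 45, 52, 58, 61, 67, 72]

# Assign each age to the lower bound of its 10-year class, then count
groups = Counter((age // 10) * 10 for age in ages)
for lower in sorted(groups):
    print(f"{lower}-{lower + 9}: {groups[lower]}")
```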
Another form of presenting frequency distributions is the “stem and leaf” diagram, which is considered to be a more accurate form of description.
Suppose the weight in kilograms of a group of 10 patients is as follows:
56, 34, 48, 43, 87, 78, 54, 62, 61, 59
The “stem” records the value of the “tens” place (or higher) and the “leaf” records the value in the “ones” place [Table 1].
Stem and leaf plot
0 | - |
1 | - |
2 | - |
3 | 4 |
4 | 3 8 |
5 | 4 6 9 |
6 | 1 2 |
7 | 8 |
8 | 7 |
9 | - |
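A stem-and-leaf table like the one above can be generated from the raw weights with a few lines of Python (a sketch; the output formatting is a matter of taste):

```python
from collections import defaultdict

weights = [56, 34, 48, 43, 87, 78, 54, 62, 61, 59]

# Group each value by its "stem" (tens digit), collecting "leaves" (ones digit)
leaves = defaultdict(list)
for w in sorted(weights):
    leaves[w // 10].append(w % 10)

for stem in range(10):
    row = " ".join(str(d) for d in leaves[stem]) if stem in leaves else "-"
    print(f"{stem} | {row}")
```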
The most common tools used for visual display include frequency diagrams, bar charts (for noncontinuous variables) and histograms (for continuous variables). Composite bar charts can be used to compare variables. For example, the frequency distribution in a sample population of males and females can be illustrated as given in Figure 1 .
Composite bar chart
A pie chart helps show how a total quantity is divided among its constituent variables. Scatter diagrams can be used to illustrate the relationship between two variables – for example, the global improvement scores for a condition like acne as rated by the patient and the doctor [Figure 2].
Scatter diagram
The main tools used for summary statistics are broadly grouped into measures of central tendency (such as mean, median, and mode) and measures of dispersion or variation (such as range, standard deviation, and variance).
Imagine that the data below represent the weights of a sample of 15 pediatric patients arranged in ascending order:
30, 35, 37, 38, 38, 38, 42, 42, 44, 46, 47, 48, 51, 53, 86
Just having the raw data does not mean much to us, so we try to express it in terms of some values, which give a summary of the data.
The mean is basically the sum of all the values divided by the total number. In this case, we get a value of 45.
The problem is that extreme values (outliers), like 86 in this case, can skew the value of the mean. In such cases, we consider other measures like the median, which is the point that divides the distribution into two equal halves. It is also referred to as the 50th percentile (50% of the values are above it and 50% are below it). In our previous example, since we have already arranged the values in ascending order, the point which divides the distribution into two equal halves is the 8th value – 42. When the total number of values is even, we take the average of the two middle values as the median.
The mode is the most common data point. In our example, this would be 38. The mode as in our case may not necessarily be in the center of the distribution.
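Python's standard statistics module reproduces these three summary values for the same 15 weights:

```python
import statistics

weights = [30, 35, 37, 38, 38, 38, 42, 42, 44, 46, 47, 48, 51, 53, 86]

print(statistics.mean(weights))    # 45
print(statistics.median(weights))  # 42, the 8th of the 15 ordered values
print(statistics.mode(weights))    # 38, the most frequent value
```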
The median is the best measure of central tendency among the mean, median, and mode. In a symmetric distribution, all three are the same, whereas in skewed data the mean and median differ: both are pulled toward the skew, with the mean lying further toward the tail than the median. For example, Figure 3 shows a right-skewed distribution (the direction of skew is named for the tail): the distribution of data values stretches further on the right-hand (positive) side than on the left-hand side. The mean is typically greater than the median in such cases.
Location of mode, median, and mean
The range gives the spread between the lowest and highest values. In our previous example, this will be 86 − 30 = 56.
A more valuable measure is the interquartile range. A quartile is one of the values which break the distribution into four equal parts. The 25th percentile is the data point which divides the group between the first one-fourth and the last three-fourths of the data; the first one-fourth forms the first quartile. The 75th percentile is the data point which divides the distribution into the first three-fourths and the last one-fourth (the last one-fourth being the fourth quartile). The range between the 25th percentile and the 75th percentile is called the interquartile range.
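A sketch of both measures in Python; note that different packages interpolate quartiles slightly differently, so the exact interquartile range can vary by convention:

```python
import statistics

weights = [30, 35, 37, 38, 38, 38, 42, 42, 44, 46, 47, 48, 51, 53, 86]

print(max(weights) - min(weights))  # range: 86 - 30 = 56

# quantiles() with n=4 returns the three quartile cut points
q1, q2, q3 = statistics.quantiles(weights, n=4)
print(q3 - q1)  # interquartile range
```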
Variance is also a measure of dispersion. The larger the variance, the further the individual units are from the mean. Let us consider the same example we used for calculating the mean. The mean was 45.
For the first value (30), the deviation from the mean will be 15; for the last value (86), the deviation will be 41. Similarly, we can calculate the deviations for all values in a sample. Adding these deviations and averaging them would give a clue to the total dispersion, but since the deviations are a mix of negative and positive values, their total is zero. To calculate the variance, this problem is overcome by squaring the deviations before adding them. The variance is thus the sum of the squared deviations divided by the total number in the population (for a sample we use “n − 1”). To get a more realistic value of the average dispersion, we take the square root of the variance, which is called the “standard deviation.”
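The same steps, sketched with the standard library; note how the signed deviations cancel, and how the sample version divides by n − 1:

```python
import statistics

weights = [30, 35, 37, 38, 38, 38, 42, 42, 44, 46, 47, 48, 51, 53, 86]
mean = statistics.mean(weights)  # 45

# Signed deviations always sum to zero, which is why we square them
print(sum(w - mean for w in weights))  # 0

print(statistics.pvariance(weights))  # population variance: divide by n
print(statistics.variance(weights))   # sample variance: divide by n - 1
print(statistics.stdev(weights))      # sample standard deviation
```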
The box plot is a composite representation that portrays the median, the interquartile range, the range, and the outliers [Figure 4].
Skewness is a measure of the symmetry of a distribution. If the distribution curve is symmetric, it looks the same on either side of the central point; when this is not the case, it is said to be skewed. Kurtosis is a representation of outliers. Distributions with high kurtosis tend to have “heavy tails,” indicating a larger number of outliers, whereas distributions with low kurtosis have light tails, indicating fewer outliers. There are formulas to calculate both skewness and kurtosis [Figures 5–8].
Positive skew
High kurtosis (positive kurtosis – also called leptokurtic)
Negative skew
Low kurtosis (negative kurtosis – also called “Platykurtic”)
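The usual moment-based formulas for these two measures can be sketched in plain Python (illustrative definitions; statistical packages offer several bias-corrected variants). Applied to the 15 weights used earlier, the outlier of 86 produces both a positive skew and heavy tails:

```python
import statistics

def skewness(xs):
    """Third standardized moment: mean cubed deviation over SD cubed."""
    m, sd = statistics.fmean(xs), statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * sd ** 3)

def excess_kurtosis(xs):
    """Fourth standardized moment minus 3 (0 for a normal distribution)."""
    m, sd = statistics.fmean(xs), statistics.pstdev(xs)
    return sum((x - m) ** 4 for x in xs) / (len(xs) * sd ** 4) - 3

weights = [30, 35, 37, 38, 38, 38, 42, 42, 44, 46, 47, 48, 51, 53, 86]
print(skewness(weights) > 0)         # True: right (positive) skew
print(excess_kurtosis(weights) > 0)  # True: heavy tails (leptokurtic)
```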
In an ideal study, we should be able to include all units of a particular population under study, something that is referred to as a census.[5,6] This would remove the chance of sampling error (the difference between the outcome characteristics in a random sample and the true population values – something that is virtually unavoidable when you take a random sample). However, it is obvious that this would not be feasible in most situations. Hence, we have to study a subset of the population to reach our conclusions. This representative subset is a sample, and we need to have sufficient numbers in this sample to make meaningful and accurate conclusions and to reduce the effect of sampling error.
Broadly, sampling can be divided into two types: probability sampling and nonprobability sampling. Examples of probability sampling include simple random sampling (each member of the population has an equal chance of being selected), stratified random sampling (a nonhomogeneous population is divided into subgroups, followed by random sampling within each subgroup), systematic sampling (selection follows a fixed system – e.g., every third person is selected for a survey), and cluster sampling (similar to stratified sampling, except that the clusters are preexisting groups, whereas in stratified sampling the researcher decides on the stratification criteria). In nonprobability sampling, every unit in the population does not have an equal chance of inclusion in the sample; examples include convenience sampling (e.g., a sample selected based on ease of access) and purposive sampling (only people who meet specific criteria are included).
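Three of the probability methods can be sketched with the standard library's random module; the patient IDs and the 60/40 gender split are made up for illustration:

```python
import random

population = list(range(1, 101))  # hypothetical patient IDs 1..100
rng = random.Random(42)           # fixed seed so the draw is repeatable

# Simple random sampling: every member has an equal chance of selection
simple = rng.sample(population, 10)

# Systematic sampling: every k-th member after a random start
k = len(population) // 10
systematic = population[rng.randrange(k)::k]

# Stratified random sampling: random sampling within each subgroup
strata = {"male": population[:60], "female": population[60:]}
stratified = [m for group in strata.values() for m in rng.sample(group, 5)]

print(len(simple), len(systematic), len(stratified))  # 10 10 10
```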
An accurate calculation of sample size is an essential aspect of good study design, and it is important to calculate the sample size well in advance rather than resort to post hoc analysis. A sample size that is too small may leave the study underpowered, whereas a sample size larger than necessary wastes resources.
We will first go through the sample size calculation for a hypothesis-based design (like a randomized control trial).
The important factors to consider for sample size calculation include the study design, the type of statistical test, the level of significance, the power and effect size, the variance (standard deviation for quantitative data), and the expected proportions in the case of qualitative data. These inputs are based on previous data – either previous studies or the clinicians' experience. If the study is the first of its kind, a pilot study may be conducted to generate these data for a subsequent study with a larger sample size. It is also important to know whether the data follow a normal distribution or not.
Two essential concepts to understand are Type I and Type II errors. In a study comparing two groups, the null hypothesis assumes that there is no significant difference between the two groups, with any observed difference being due to sampling or experimental error. Rejecting a null hypothesis that is actually true is a Type I error (denoted “alpha,” corresponding to the significance level). Failing to reject a null hypothesis when the alternative hypothesis is actually true is a Type II error (denoted “beta”); the power of a test is expressed as “1 − β.” While there are no absolute rules, the conventional thresholds are 0.05 for α (corresponding to a significance level of 5%) and 0.20 for β (corresponding to a minimum recommended power of 1 − 0.20, or 80%).
For a clinical trial, the investigator will have to decide in advance what clinically detectable change is significant (for numerical data, this could be the anticipated outcome means in the two groups, whereas for categorical data, it could be the proportions of successful outcomes in the two groups). While we will not go into the details of the formula for sample size calculation, some important points are as follows:
Where effect size is involved, the sample size is inversely proportional to the square of the effect size: reducing the effect size to be detected increases the required sample size, and halving it roughly quadruples the required sample size.
Reducing the level of significance (alpha) or increasing power (1-β) will lead to an increase in the calculated sample size.
An increase in variance of the outcome leads to an increase in the calculated sample size.
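Under the usual normal-approximation assumptions, the three points above fall out of the standard formula for comparing two means, n per group = 2σ²(z₁₋α/₂ + z₁₋β)² / Δ². A sketch in Python; the function name and defaults are illustrative, not from the text:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(sd, delta, alpha=0.05, power=0.80):
    """Approximate sample size per group for comparing two means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    return ceil(2 * (sd * (z_alpha + z_beta) / delta) ** 2)

print(n_per_group(sd=10, delta=5))    # 63 per group
print(n_per_group(sd=10, delta=2.5))  # 252: halving the effect quadruples n
print(n_per_group(sd=10, delta=5, power=0.90))  # higher power, larger n
```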
Note that for estimation-type studies/surveys, sample size calculation needs to consider some additional factors. One is the total population size (this generally makes little difference once the population is above 20,000, so when the population size is unknown we can assume a population of 20,000 or more). Another is the “margin of error” – the amount of deviation, expressed as a percentage, that the investigators find acceptable. Regarding confidence levels, a 95% confidence level is the minimum recommended for surveys as well. Finally, we need an estimate of the expected/crude prevalence, based either on previous studies or on informed estimates.
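For such surveys, the common formula is n = z²p(1 − p)/e², optionally adjusted with a finite population correction. A hedged sketch (again, the function name and defaults are illustrative):

```python
from math import ceil
from statistics import NormalDist

def survey_n(prevalence, margin=0.05, confidence=0.95, population=None):
    """Sample size to estimate a proportion within a given margin of error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z ** 2 * prevalence * (1 - prevalence) / margin ** 2
    if population is not None:
        n /= 1 + (n - 1) / population  # finite population correction
    return ceil(n)

# p = 0.5 maximizes p(1 - p) and is the conservative default assumption
print(survey_n(0.50))                     # 385
print(survey_n(0.50, population=20_000))  # 377: little change above ~20,000
```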
Sample size calculation also needs to add corrections for patient drop-outs/lost-to-follow-up patients and missing records. An important point is that in some studies dealing with rare diseases, it may be difficult to achieve the desired sample size. In these cases, the investigators might have to rework the outcomes or pool data from multiple centers. Although post hoc power can be analyzed, a better suggested approach is to calculate 95% confidence intervals for the outcome and interpret the study results based on these.
Conflicts of interest.
There are no conflicts of interest.
Examples of Independent and Dependent Variables
Variables in psychology are things that can be changed or altered, such as a characteristic or value. Variables are generally used in psychology experiments to determine if changes to one thing result in changes to another.
Variables in psychology play a critical role in the research process. By systematically changing some variables in an experiment and measuring what happens as a result, researchers are able to learn more about cause-and-effect relationships.
The two main types of variables in psychology are the independent variable and the dependent variable. Both variables are important in the process of collecting data about psychological phenomena.
This article discusses different types of variables that are used in psychology research. It also covers how to operationalize these variables when conducting experiments.
Students often report problems with identifying the independent and dependent variables in an experiment. While this task can become more difficult as the complexity of an experiment increases, in a psychology experiment the independent variable is the one the experimenter manipulates, and the dependent variable is the one that is measured.
So how do you differentiate between the independent and dependent variables? Start by asking yourself what the experimenter is manipulating. The things that change, either naturally or through direct manipulation from the experimenter, are generally the independent variables. What is being measured? The dependent variable is the one that the experimenter is measuring.
Intervening variables, also sometimes called intermediate or mediator variables, are factors that play a role in the relationship between two other variables. For example, sleep problems in university students are often influenced by factors such as stress. As a result, stress might be an intervening variable that plays a role in how much sleep people get, which may then influence how well they perform on exams.
Independent and dependent variables are not the only variables present in many experiments. In some cases, extraneous variables may also play a role. This type of variable is one that may have an impact on the relationship between the independent and dependent variables.
For example, in our previous example of an experiment on the effects of sleep deprivation on test performance, other factors such as age, gender, and academic background may have an impact on the results. In such cases, the experimenter will note the values of these extraneous variables so any impact can be controlled for.
There are two basic types of extraneous variables:
Other extraneous variables include the following:
In many cases, extraneous variables are controlled for by the experimenter. A controlled variable is one that is held constant throughout an experiment.
In the case of participant variables, the experiment might select participants that are the same in background and temperament to ensure that these factors don't interfere with the results. Holding these variables constant is important for an experiment because it allows researchers to be sure that all other variables remain the same across all conditions.
Using controlled variables means that when changes occur, the researchers can be sure that these changes are due to the manipulation of the independent variable and not caused by changes in other variables.
It is important to also note that a controlled variable is not the same thing as a control group. The control group in a study is the group of participants who do not receive the treatment or change in the independent variable.
All other variables between the control group and experimental group are held constant (i.e., they are controlled). The dependent variable being measured is then compared between the control group and experimental group to see what changes occurred because of the treatment.
If a variable cannot be controlled for, it becomes what is known as a confounding variable. This type of variable can have an impact on the dependent variable, which can make it difficult to determine if the results are due to the influence of the independent variable, the confounding variable, or an interaction of the two.
An operational definition describes how the variables are measured and defined in the study. Before conducting a psychology experiment, it is essential to create firm operational definitions for both the independent variable and the dependent variable.
For example, in our imaginary experiment on the effects of sleep deprivation on test performance, we would need to create very specific operational definitions for our two variables. If our hypothesis is "Students who are sleep deprived will score significantly lower on a test," then we would have a few different concepts to define:
Once all the variables are operationalized, we're ready to conduct the experiment.
Variables play an important part in psychology research. Manipulating an independent variable and measuring the dependent variable allows researchers to determine if there is a cause-and-effect relationship between them.
Understanding the different types of variables used in psychology research is important if you want to conduct your own psychology experiments. It is also helpful for people who want to better understand what the results of psychology research really mean and become more informed consumers of psychology information.
Independent and dependent variables are used in experimental research. Unlike some other types of research (such as correlational studies), experiments allow researchers to evaluate cause-and-effect relationships between two variables.
Researchers can use statistical analyses to determine the strength of a relationship between two variables in an experiment. Two of the most common ways to do this are to calculate a p-value or a correlation. The p-value indicates if the results are statistically significant while the correlation can indicate the strength of the relationship.
In an experiment on how sugar affects short-term memory, sugar intake would be the independent variable and scores on a short-term memory task would be the dependent variable.
In an experiment looking at how caffeine intake affects test anxiety, the amount of caffeine consumed before a test would be the independent variable and scores on a test anxiety assessment would be the dependent variable.
Just as with other types of research, the independent variable in a cognitive psychology study would be the variable that the researchers manipulate. The specific independent variable would vary depending on the specific study, but it might be focused on some aspect of thinking, memory, attention, language, or decision-making.
American Psychological Association. Operational definition . APA Dictionary of Psychology.
American Psychological Association. Mediator . APA Dictionary of Psychology.
Altun I, Cınar N, Dede C. The contributing factors to poor sleep experiences in according to the university students: A cross-sectional study . J Res Med Sci . 2012;17(6):557-561. PMID:23626634
Skelly AC, Dettori JR, Brodt ED. Assessing bias: The importance of considering confounding . Evid Based Spine Care J . 2012;3(1):9-12. doi:10.1055/s-0031-1298595
By Kendra Cherry, MSEd. Kendra Cherry is a psychosocial rehabilitation specialist, psychology educator, and author of the "Everything Psychology Book."
In any scientific research, there are typically two variables of interest: independent variables and dependent variables. In forming the backbone of scientific experiments , they help scientists understand relationships, predict outcomes and, in general, make sense of the factors that they're investigating.
Understanding the independent variable vs. dependent variable is so fundamental to scientific research that you need to have a good handle on both if you want to design your own research study or interpret others' findings.
To grasp the distinction between the two, let's delve into their definitions and roles.
The independent variable, often denoted as X, is the variable that is manipulated or controlled by the researcher intentionally. It's the factor that researchers believe may have a causal effect on the dependent variable.
In simpler terms, the independent variable is the variable you change or vary in an experiment so you can observe its impact on the dependent variable.
The dependent variable, often represented as Y, is the variable that is observed and measured to determine the outcome of the experiment.
In other words, the dependent variable is the variable that is affected by the changes in the independent variable. The values of the dependent variable always depend on the independent variable.
Let's consider an example to illustrate these concepts. Imagine you're conducting a research study aiming to investigate the effect of studying techniques on test scores among students.
In this scenario, the independent variable manipulated would be the studying technique, which you could vary by employing different methods, such as spaced repetition, summarization or practice testing.
The dependent variable, in this case, would be the test scores of the students. As the researcher following the scientific method , you would manipulate the independent variable (the studying technique) and then measure its impact on the dependent variable (the test scores).
You can also categorize variables as predictor variables or outcome variables. Sometimes a researcher will refer to the independent variable as the predictor variable since they use it to predict or explain changes in the dependent variable, which is also known as the outcome variable.
When conducting an experiment or study, it's crucial to acknowledge the presence of other variables, or extraneous variables, which may influence the outcome of the experiment but are not the focus of study.
These variables can potentially confound the results if they aren't controlled. In the example from above, other variables might include the students' prior knowledge, level of motivation, time spent studying and preferred learning style.
As a researcher, it would be your goal to control these extraneous variables to ensure you can attribute any observed differences in the dependent variable to changes in the independent variable. In practice, however, it's not always possible to control every variable.
The distinction between independent and dependent variables is essential for designing and conducting research studies and experiments effectively.
By manipulating the independent variable and measuring its impact on the dependent variable while controlling for other factors, researchers can gain insights into the factors that influence outcomes in their respective fields.
Whether investigating the effects of a new drug on blood pressure or studying the relationship between socioeconomic factors and academic performance, understanding the role of independent and dependent variables is essential for advancing knowledge and making informed decisions.
Understanding the relationship between independent and dependent variables is essential for making sense of research findings. Depending on the nature of this relationship, researchers may identify correlations or infer causation between the variables.
Correlation implies that changes in one variable are associated with changes in another variable, while causation suggests that changes in the independent variable directly cause changes in the dependent variable.
In experimental research, the researcher has control over the independent variable, allowing them to manipulate it to observe its effects on the dependent variable. This controlled manipulation distinguishes experiments from other types of research designs.
For example, in observational studies, researchers merely observe variables without intervention, meaning they don't control or manipulate any variables.
Independent, dependent, and other variables can vary in different contexts, whether intentionally or unintentionally, and their effects may differ based on various factors, such as age, characteristics of the participants, environmental influences, and so on.
Researchers employ statistical analysis techniques to measure and analyze the relationships between these variables, helping them to draw meaningful conclusions from their data.
We created this article in conjunction with AI technology, then made sure it was fact-checked and edited by a HowStuffWorks editor.
What are independent and dependent variables?
You can think of independent and dependent variables in terms of cause and effect: an independent variable is the variable you think is the cause, while a dependent variable is the effect.
In an experiment, you manipulate the independent variable and measure the outcome in the dependent variable. For example, in an experiment about the effect of nutrients on crop growth, the amount of nutrients added would be the independent variable, and the measured growth of the crops would be the dependent variable.
Defining your variables, and deciding how you will manipulate and measure them, is an important part of experimental design.
Attrition refers to participants leaving a study. It always happens to some extent—for example, in randomized controlled trials for medical research.
Differential attrition occurs when attrition or dropout rates differ systematically between the intervention and the control group. As a result, the characteristics of the participants who drop out differ from the characteristics of those who stay in the study. Because of this, study results may be biased.
Action research is conducted in order to solve a particular issue immediately, while case studies are often conducted over a longer period of time and focus more on observing and analyzing a particular ongoing phenomenon.
Action research is focused on solving a problem or informing individual and community-based knowledge in a way that impacts teaching, learning, and other related processes. It is less focused on contributing theoretical input, instead producing actionable input.
Action research is particularly popular with educators as a form of systematic inquiry because it prioritizes reflection and bridges the gap between theory and practice. Educators are able to simultaneously investigate an issue as they solve it, and the method is very iterative and flexible.
A cycle of inquiry is another name for action research. It is usually visualized in a spiral shape following a series of steps, such as “planning → acting → observing → reflecting.”
To make quantitative observations, you need to use instruments that are capable of measuring the quantity you want to observe. For example, you might use a ruler to measure the length of an object or a thermometer to measure its temperature.
Criterion validity and construct validity are both types of measurement validity. In other words, they both show you how accurately a method measures something.
While construct validity is the degree to which a test or other measurement method measures what it claims to measure, criterion validity is the degree to which a test can predictively (in the future) or concurrently (in the present) measure something.
Construct validity is often considered the overarching type of measurement validity. You need to have face validity, content validity, and criterion validity in order to achieve construct validity.
Convergent validity and discriminant validity are both subtypes of construct validity. Together, they help you evaluate whether a test measures the concept it was designed to measure.
You need to assess both in order to demonstrate construct validity. Neither one alone is sufficient for establishing construct validity.
Content validity shows you how accurately a test or other measurement method taps into the various aspects of the specific construct you are researching.
In other words, it helps you answer the question: “does the test measure all aspects of the construct I want to measure?” If it does, then the test has high content validity.
The higher the content validity, the more accurate the measurement of the construct.
If the test fails to include parts of the construct, or irrelevant parts are included, the validity of the instrument is threatened, which brings your results into question.
Face validity and content validity are similar in that they both evaluate how suitable the content of a test is. The difference is that face validity is subjective, and assesses content at surface level.
When a test has strong face validity, anyone would agree that the test’s questions appear to measure what they are intended to measure.
For example, looking at a 4th grade math test consisting of problems in which students have to add and multiply, most people would agree that it has strong face validity (i.e., it looks like a math test).
On the other hand, content validity evaluates how well a test represents all the aspects of a topic. Assessing content validity is more systematic and relies on expert evaluation of each question, analyzing whether each one covers the aspects that the test was designed to cover.
A 4th grade math test would have high content validity if it covered all the skills taught in that grade. Experts (in this case, math teachers) would have to evaluate the content validity by comparing the test to the learning objectives.
Snowball sampling is a non-probability sampling method . Unlike probability sampling (which involves some form of random selection ), the initial individuals selected to be studied are the ones who recruit new participants.
Because not every member of the target population has an equal chance of being recruited into the sample, selection in snowball sampling is non-random.
Snowball sampling is a non-probability sampling method , where there is not an equal chance for every member of the population to be included in the sample .
This means that you cannot use inferential statistics to make generalizations, which are often the goal of quantitative research. As such, a snowball sample is not representative of the target population and is usually a better fit for qualitative research.
Snowball sampling relies on the use of referrals. Here, the researcher recruits one or more initial participants, who then recruit the next ones.
Participants share similar characteristics and/or know each other. Because of this, not every member of the population has an equal chance of being included in the sample, giving rise to sampling bias .
Snowball sampling is best used in the following cases:
The reproducibility and replicability of a study can be ensured by writing a transparent, detailed method section and using clear, unambiguous language.
Reproducibility and replicability are related terms.
Stratified sampling and quota sampling both involve dividing the population into subgroups and selecting units from each subgroup. The purpose in both cases is to select a representative sample and/or to allow comparisons between subgroups.
The main difference is that in stratified sampling, you draw a random sample from each subgroup ( probability sampling ). In quota sampling you select a predetermined number or proportion of units, in a non-random manner ( non-probability sampling ).
Purposive and convenience sampling are both sampling methods that are typically used in qualitative data collection.
A convenience sample is drawn from a source that is conveniently accessible to the researcher. Convenience sampling does not distinguish characteristics among the participants. On the other hand, purposive sampling focuses on selecting participants possessing characteristics associated with the research study.
The findings of studies based on either convenience or purposive sampling can only be generalized to the (sub)population from which the sample is drawn, and not to the entire population.
Random sampling or probability sampling is based on random selection. This means that each unit has an equal chance (i.e., equal probability) of being included in the sample.
On the other hand, convenience sampling involves selecting whoever happens to be available, which means that not everyone has an equal chance of being selected, depending on the place, time, or day you are collecting your data.
Convenience sampling and quota sampling are both non-probability sampling methods. They both use non-random criteria like availability, geographical proximity, or expert knowledge to recruit study participants.
However, in convenience sampling, you continue to sample units or cases until you reach the required sample size.
In quota sampling, you first need to divide your population of interest into subgroups (strata) and estimate their proportions (quota) in the population. Then you can start your data collection, using convenience sampling to recruit participants, until the proportions in each subgroup coincide with the estimated proportions in the population.
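The quota procedure described above can be sketched in a few lines of Python. This is a toy illustration, not a reference implementation; the names, the age cutoff, and the quota sizes are all invented for the example.

```python
def quota_sample(stream, get_group, quotas):
    """Recruit conveniently available people until each subgroup quota is full."""
    sample, counts = [], {g: 0 for g in quotas}
    for person in stream:                      # people in the order you encounter them
        g = get_group(person)
        if g in quotas and counts[g] < quotas[g]:
            sample.append(person)
            counts[g] += 1
        if counts == quotas:                   # all quotas met, so stop recruiting
            break
    return sample

# Hypothetical quota: 2 people under 30 and 3 people 30 or older.
people = [("Ana", 25), ("Ben", 41), ("Cam", 29), ("Dee", 35),
          ("Eli", 22), ("Fay", 60), ("Gus", 33)]
sample = quota_sample(people,
                      lambda p: "under_30" if p[1] < 30 else "30_plus",
                      {"under_30": 2, "30_plus": 3})
```

Note how Eli is skipped even though he is available: his subgroup's quota is already full, which is exactly the non-random selection the answer above describes.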
A sampling frame is a list of every member in the entire population . It is important that the sampling frame is as complete as possible, so that your sample accurately reflects your population.
Stratified and cluster sampling may look similar, but bear in mind that groups created in cluster sampling are heterogeneous , so the individual characteristics in the cluster vary. In contrast, groups created in stratified sampling are homogeneous , as units share characteristics.
Relatedly, in cluster sampling you randomly select entire groups and include all units of each group in your sample. However, in stratified sampling, you select some units of all groups and include them in your sample. In this way, both methods can ensure that your sample is representative of the target population .
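The contrast between the two selection rules can be made concrete with a short Python sketch. The school names and group sizes here are hypothetical, chosen only to make the difference visible: stratified sampling draws some units from every group, while cluster sampling keeps all units from a few randomly chosen groups.

```python
import random

# Hypothetical population grouped by school (the grouping variable).
population = {
    "school_A": ["A1", "A2", "A3", "A4"],
    "school_B": ["B1", "B2", "B3", "B4"],
    "school_C": ["C1", "C2", "C3", "C4"],
}

def stratified_sample(groups, per_stratum, rng):
    """Stratified: randomly draw SOME units from EVERY group (stratum)."""
    return [u for units in groups.values() for u in rng.sample(units, per_stratum)]

def cluster_sample(groups, n_clusters, rng):
    """Cluster: randomly pick WHOLE groups and keep ALL of their units."""
    chosen = rng.sample(list(groups), n_clusters)
    return [u for c in chosen for u in groups[c]]

stratified = stratified_sample(population, per_stratum=2, rng=random.Random(0))  # 2 units per school
clustered = cluster_sample(population, n_clusters=2, rng=random.Random(0))       # 2 whole schools
```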
A systematic review is secondary research because it uses existing research. You don’t collect new data yourself.
The key difference between observational studies and experimental designs is that a well-done observational study does not influence the responses of participants, while experiments do have some sort of treatment condition applied to at least some participants by random assignment .
An observational study is a great choice for you if your research question is based purely on observations. If there are ethical, logistical, or practical concerns that prevent you from conducting a traditional experiment, an observational study may be a good choice. In an observational study, there is no interference with or manipulation of the research subjects, and no control or treatment groups.
It’s often best to ask a variety of people to review your measurements. You can ask experts, such as other researchers, or laypeople, such as potential participants, to judge the face validity of tests.
While experts have a deep understanding of research methods , the people you’re studying can provide you with valuable insights you may have missed otherwise.
Face validity is important because it’s a simple first step to measuring the overall validity of a test or technique. It’s a relatively intuitive, quick, and easy way to start checking whether a new measure seems useful at first glance.
Good face validity means that anyone who reviews your measure says that it seems to be measuring what it’s supposed to. With poor face validity, someone reviewing your measure may be left confused about what you’re measuring and why you’re using this method.
Face validity is about whether a test appears to measure what it’s supposed to measure. This type of validity is concerned with whether a measure seems relevant and appropriate for what it’s assessing only on the surface.
Statistical analyses are often applied to test validity with data from your measures. You test convergent validity and discriminant validity with correlations to see if results from your test are positively or negatively related to those of other established tests.
You can also use regression analyses to assess whether your measure is actually predictive of outcomes that you expect it to predict theoretically. A regression analysis that supports your expectations strengthens your claim of construct validity .
When designing or evaluating a measure, construct validity helps you ensure you’re actually measuring the construct you’re interested in. If you don’t have construct validity, you may inadvertently measure unrelated or distinct constructs and lose precision in your research.
Construct validity is often considered the overarching type of measurement validity , because it covers all of the other types. You need to have face validity , content validity , and criterion validity to achieve construct validity.
Construct validity is about how well a test measures the concept it was designed to evaluate. It's one of four types of measurement validity, alongside face validity, content validity, and criterion validity.
There are two subtypes of construct validity.
Naturalistic observation is a valuable tool because of its flexibility, external validity , and suitability for topics that can’t be studied in a lab setting.
The downsides of naturalistic observation include its lack of scientific control , ethical considerations , and potential for bias from observers and subjects.
Naturalistic observation is a qualitative research method where you record the behaviors of your research subjects in real world settings. You avoid interfering or influencing anything in a naturalistic observation.
You can think of naturalistic observation as “people watching” with a purpose.
A dependent variable is what changes as a result of the independent variable manipulation in experiments . It’s what you’re interested in measuring, and it “depends” on your independent variable.
In statistics, dependent variables are also called:
An independent variable is the variable you manipulate, control, or vary in an experimental study to explore its effects. It’s called “independent” because it’s not influenced by any other variables in the study.
Independent variables are also called:
As a rule of thumb, questions related to thoughts, beliefs, and feelings work well in focus groups. Take your time formulating strong questions, paying special attention to phrasing. Be careful to avoid leading questions , which can bias your responses.
Overall, your focus group questions should be:
A structured interview is a data collection method that relies on asking questions in a set order to collect data on a topic. Structured interviews are often quantitative in nature, and they are best used when:
More flexible interview options include semi-structured interviews , unstructured interviews , and focus groups .
Social desirability bias is the tendency for interview participants to give responses that will be viewed favorably by the interviewer or other participants. It occurs in all types of interviews and surveys , but is most common in semi-structured interviews , unstructured interviews , and focus groups .
Social desirability bias can be mitigated by ensuring participants feel at ease and comfortable sharing their views. Make sure to pay attention to your own body language and any physical or verbal cues, such as nodding or widening your eyes.
This type of bias can also occur in observations if the participants know they’re being observed. They might alter their behavior accordingly.
The interviewer effect is a type of bias that emerges when a characteristic of an interviewer (race, age, gender identity, etc.) influences the responses given by the interviewee.
There is a risk of an interviewer effect in all types of interviews, but it can be mitigated by writing high-quality interview questions.
A semi-structured interview is a blend of structured and unstructured types of interviews. Semi-structured interviews are best used when:
An unstructured interview is the most flexible type of interview, but it is not always the best fit for your research topic.
Unstructured interviews are best used when:
The four most common types of interviews are:
Deductive reasoning is commonly used in scientific research, and it’s especially associated with quantitative research .
In research, you might have come across something called the hypothetico-deductive method . It’s the scientific method of testing hypotheses to check whether your predictions are substantiated by real-world data.
Deductive reasoning is a logical approach where you progress from general ideas to specific conclusions. It’s often contrasted with inductive reasoning , where you start with specific observations and form general conclusions.
Deductive reasoning is also called deductive logic.
There are many different types of inductive reasoning that people use formally or informally.
Here are a few common types:
Inductive reasoning is a bottom-up approach, while deductive reasoning is top-down.
Inductive reasoning takes you from the specific to the general, while in deductive reasoning, you make inferences by going from general premises to specific conclusions.
In inductive research , you start by making observations or gathering data. Then, you take a broad scan of your data and search for patterns. Finally, you make general conclusions that you might incorporate into theories.
Inductive reasoning is a method of drawing conclusions by going from the specific to the general. It’s usually contrasted with deductive reasoning, where you proceed from general information to specific conclusions.
Inductive reasoning is also called inductive logic or bottom-up reasoning.
A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.
A hypothesis is not just a guess — it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations and statistical analysis of data).
Triangulation can help:
But triangulation can also pose problems:
There are four main types of triangulation :
Many academic fields use peer review , largely to determine whether a manuscript is suitable for publication. Peer review enhances the credibility of the published manuscript.
However, peer review is also common in non-academic settings. The United Nations, the European Union, and many individual nations use peer review to evaluate grant applications. It is also widely used in medical and health-related fields as a teaching or quality-of-care measure.
Peer assessment is often used in the classroom as a pedagogical tool. Both receiving feedback and providing it are thought to enhance the learning process, helping students think critically and collaboratively.
Peer review can stop obviously problematic, falsified, or otherwise untrustworthy research from being published. It also represents an excellent opportunity to get feedback from renowned experts in your field. It acts as a first defense, helping you ensure your argument is clear and that there are no gaps, vague terms, or unanswered questions for readers who weren’t involved in the research process.
Peer-reviewed articles are considered a highly credible source due to the stringent process they go through before publication.
In general, the peer review process follows the following steps:
Exploratory research is often used when the issue you’re studying is new or when the data collection process is challenging for some reason.
You can use exploratory research if you have a general idea or a specific question that you want to study but there is no preexisting knowledge or paradigm with which to study it.
Exploratory research is a methodology approach that explores research questions that have not previously been studied in depth. It is often used when the issue you’re studying is new, or the data collection process is challenging in some way.
Explanatory research is used to investigate how or why a phenomenon occurs. Exploratory research, in contrast, is often one of the first stages in the research process, serving as a jumping-off point for future research.
Exploratory research aims to explore the main aspects of an under-researched problem, while explanatory research aims to explain the causes and consequences of a well-defined problem.
Explanatory research is a research method used to investigate how or why something occurs when only a small amount of information is available pertaining to that topic. It can help you increase your understanding of a given topic.
Clean data are valid, accurate, complete, consistent, unique, and uniform. Dirty data include inconsistencies and errors.
Dirty data can come from any part of the research process, including poor research design , inappropriate measurement materials, or flawed data entry.
Data cleaning takes place between data collection and data analyses. But you can use some methods even before collecting data.
For clean data, you should start by designing measures that collect valid data. Data validation at the time of data entry or collection helps you minimize the amount of data cleaning you’ll need to do.
After data collection, you can use data standardization and data transformation to clean your data. You’ll also deal with any missing values, outliers, and duplicate values.
Every dataset requires different techniques to clean dirty data , but you need to address these issues in a systematic way. You focus on finding and resolving data points that don’t agree or fit with the rest of your dataset.
These data might be missing values, outliers, duplicate values, incorrectly formatted, or irrelevant. You’ll start with screening and diagnosing your data. Then, you’ll often standardize and accept or remove data to make your dataset consistent and valid.
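As a minimal sketch of this screening step, the following Python function de-duplicates records, drops missing values, and flags out-of-range outliers. The weight data and the valid range are invented for illustration; real cleaning rules depend on your codebook and measurement instrument.

```python
def clean(records, valid_range=(30.0, 250.0)):
    """Screen (id, weight) records: drop duplicates and missing values,
    and flag implausible outliers. The kg range is an assumed validity rule."""
    seen, cleaned, issues = set(), [], []
    for pid, weight in records:
        if pid in seen:
            issues.append((pid, "duplicate"))
        elif weight is None:
            seen.add(pid)
            issues.append((pid, "missing"))
        elif not valid_range[0] <= weight <= valid_range[1]:
            seen.add(pid)
            issues.append((pid, "outlier"))   # e.g., weight entered in grams
        else:
            seen.add(pid)
            cleaned.append((pid, weight))
    return cleaned, issues

records = [(1, 72.5), (2, None), (3, 7250.0), (1, 72.5), (4, 68.0)]
cleaned, issues = clean(records)  # keeps only records 1 and 4
```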
Data cleaning is necessary for valid and appropriate analyses. Dirty data contain inconsistencies or errors , but cleaning your data helps you minimize or resolve these.
Without data cleaning, you could end up with a Type I or II error in your conclusion. These erroneous conclusions can have serious practical consequences, because they lead to misplaced investments or missed opportunities.
Data cleaning involves spotting and resolving potential data inconsistencies or errors to improve your data quality. An error is any value (e.g., recorded weight) that doesn’t reflect the true value (e.g., actual weight) of something that’s being measured.
In this process, you review, analyze, detect, modify, or remove “dirty” data to make your dataset “clean.” Data cleaning is also called data cleansing or data scrubbing.
Research misconduct means making up or falsifying data, manipulating data analyses, or misrepresenting results in research reports. It’s a form of academic fraud.
These actions are committed intentionally and can have serious consequences; research misconduct is not a simple mistake or a point of disagreement but a serious ethical failure.
Anonymity means you don’t know who the participants are, while confidentiality means you know who they are but remove identifying information from your research report. Both are important ethical considerations .
You can only guarantee anonymity by not collecting any personally identifying information—for example, names, phone numbers, email addresses, IP addresses, physical characteristics, photos, or videos.
You can keep data confidential by using aggregate information in your research report, so that you only refer to groups of participants rather than individuals.
Research ethics matter for scientific integrity, human rights and dignity, and collaboration between science and society. These principles make sure that participation in studies is voluntary, informed, and safe.
Ethical considerations in research are a set of principles that guide your research designs and practices. These principles include voluntary participation, informed consent, anonymity, confidentiality, potential for harm, and results communication.
Scientists and researchers must always adhere to a certain code of conduct when collecting data from others .
These considerations protect the rights of research participants, enhance research validity , and maintain scientific integrity.
In multistage sampling , you can use probability or non-probability sampling methods .
For a probability sample, you have to conduct probability sampling at every stage.
You can mix it up by using simple random sampling , systematic sampling , or stratified sampling to select units at different stages, depending on what is applicable and relevant to your study.
Multistage sampling can simplify data collection when you have large, geographically spread samples, and you can obtain a probability sample without a complete sampling frame.
But multistage sampling may not lead to a representative sample, and larger samples are needed for multistage samples to achieve the statistical properties of simple random samples .
These are four of the most common mixed methods designs :
Triangulation in research means using multiple datasets, methods, theories and/or investigators to address a research question. It’s a research strategy that can help you enhance the validity and credibility of your findings.
Triangulation is mainly used in qualitative research , but it’s also commonly applied in quantitative research . Mixed methods research always uses triangulation.
In multistage sampling , or multistage cluster sampling, you draw a sample from a population using smaller and smaller groups at each stage.
This method is often used to collect data from a large, geographically spread group of people in national surveys, for example. You take advantage of hierarchical groupings (e.g., from state to city to neighborhood) to create a sample that’s less expensive and time-consuming to collect data from.
No, the steepness or slope of the line isn’t related to the correlation coefficient value. The correlation coefficient only tells you how closely your data fit on a line, so two datasets with the same correlation coefficient can have very different slopes.
To find the slope of the line, you’ll need to perform a regression analysis .
Correlation coefficients always range between -1 and 1.
The sign of the coefficient tells you the direction of the relationship: a positive value means the variables change together in the same direction, while a negative value means they change together in opposite directions.
The absolute value of a number is equal to the number without its sign. The absolute value of a correlation coefficient tells you the magnitude of the correlation: the greater the absolute value, the stronger the correlation.
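Both points, that r measures how closely the data fit a line rather than how steep the line is, can be demonstrated with a short Python sketch. The data are invented, and the formulas are hand-rolled rather than taken from a statistics library.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def slope(xs, ys):
    """Least-squares regression slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

x = [1, 2, 3, 4, 5]
y1 = [2.0, 4.0, 6.0, 8.0, 10.0]   # steep line (slope 2)
y2 = [0.5, 1.0, 1.5, 2.0, 2.5]    # shallow line (slope 0.5)
# Both datasets lie perfectly on a line, so r = 1.0 for each,
# even though their slopes differ fourfold.
```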
These are the assumptions your data must meet if you want to use Pearson’s r :
Quantitative research designs can be divided into two main categories:
Qualitative research designs tend to be more flexible. Common types of qualitative design include case study , ethnography , and grounded theory designs.
A well-planned research design helps ensure that your methods match your research aims, that you collect high-quality data from credible sources, and that you use the right kind of analysis to answer your questions. This allows you to draw valid, trustworthy conclusions.
The priorities of a research design can vary depending on the field, but you usually have to specify:
A research design is a strategy for answering your research question . It defines your overall approach and determines how you will collect and analyze data.
Questionnaires can be self-administered or researcher-administered.
Self-administered questionnaires can be delivered online or in paper-and-pen formats, in person or through mail. All questions are standardized so that all respondents receive the same questions with identical wording.
Researcher-administered questionnaires are interviews that take place by phone, in-person, or online between researchers and respondents. You can gain deeper insights by clarifying questions for respondents or asking follow-up questions.
You can organize the questions logically, with a clear progression from simple to complex, or randomly between respondents. A logical flow helps respondents process the questionnaire more easily and quickly, but it may lead to bias. Randomization can minimize the bias from order effects.
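Randomizing the question order per respondent can be sketched in a few lines of Python. The questions are hypothetical, and seeding the shuffle with the respondent ID is just one way to keep each respondent's randomization reproducible.

```python
import random

questions = ["Q1: age", "Q2: satisfaction", "Q3: usage frequency", "Q4: comments"]

def randomized_order(questions, respondent_id):
    """Give each respondent an independent random question order,
    seeded per respondent so the order can be reproduced later."""
    order = questions[:]                       # copy, leave the master list intact
    random.Random(respondent_id).shuffle(order)
    return order
```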
Closed-ended, or restricted-choice, questions offer respondents a fixed set of choices to select from. These questions are easier to answer quickly.
Open-ended or long-form questions allow respondents to answer in their own words. Because there are no restrictions on their choices, respondents can answer in ways that researchers may not have otherwise considered.
A questionnaire is a data collection tool or instrument, while a survey is an overarching research method that involves collecting and analyzing data from people using questionnaires.
The third variable and directionality problems are two main reasons why correlation isn’t causation .
The third variable problem means that a confounding variable affects both variables to make them seem causally related when they are not.
The directionality problem is when two variables correlate and might actually have a causal relationship, but it’s impossible to conclude which variable causes changes in the other.
Correlation describes an association between variables : when one variable changes, so does the other. A correlation is a statistical indicator of the relationship between variables.
Causation means that changes in one variable bring about changes in the other (i.e., there is a cause-and-effect relationship between the variables). The two variables are correlated with each other, and there's also a causal link between them.
While causation and correlation can exist simultaneously, correlation does not imply causation. In other words, correlation is simply a relationship where A relates to B, but A doesn't necessarily cause B to happen (or vice versa). Mistaking correlation for causation is a common error and can lead to the false cause fallacy.
Controlled experiments establish causality, whereas correlational studies only show associations between variables.
In general, correlational research is high in external validity while experimental research is high in internal validity .
A correlation is usually tested for two variables at a time, but you can test correlations between three or more variables.
A correlation coefficient is a single number that describes the strength and direction of the relationship between your variables.
Different types of correlation coefficients might be appropriate for your data based on their levels of measurement and distributions . The Pearson product-moment correlation coefficient (Pearson’s r ) is commonly used to assess a linear relationship between two quantitative variables.
A correlational research design investigates relationships between two variables (or more) without the researcher controlling or manipulating any of them. It’s a non-experimental type of quantitative research .
A correlation reflects the strength and/or direction of the association between two or more variables.
Random error is almost always present in scientific studies, even in highly controlled settings. While you can’t eradicate it completely, you can reduce random error by taking repeated measurements, using a large sample, and controlling extraneous variables .
You can avoid systematic error through careful design of your sampling , data collection , and analysis procedures. For example, use triangulation to measure your variables using multiple methods; regularly calibrate instruments or procedures; use random sampling and random assignment ; and apply masking (blinding) where possible.
Systematic error is generally a bigger problem in research.
With random error, multiple measurements will tend to cluster around the true value. When you’re collecting data from a large sample , the errors in different directions will cancel each other out.
Systematic errors are much more problematic because they can skew your data away from the true value. This can lead you to false conclusions ( Type I and II errors ) about the relationship between the variables you’re studying.
Random and systematic error are two types of measurement error.
Random error is a chance difference between the observed and true values of something (e.g., a researcher misreading a weighing scale records an incorrect measurement).
Systematic error is a consistent or proportional difference between the observed and true values of something (e.g., a miscalibrated scale consistently records weights as higher than they actually are).
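A quick simulation makes the difference concrete. The true weight, the noise level, and the +2 kg bias below are all invented for illustration: averaging many readings shrinks the random error toward zero, but no amount of averaging removes the systematic offset.

```python
import random

rng = random.Random(7)
true_weight = 70.0  # hypothetical true value in kg

# Random error: readings scatter around the true value and tend to
# cancel out across a large number of measurements.
random_readings = [true_weight + rng.gauss(0, 0.5) for _ in range(10000)]

# Systematic error: a miscalibrated scale adds a constant +2 kg bias
# to every reading, so the bias survives averaging.
biased_readings = [r + 2.0 for r in random_readings]

mean_random = sum(random_readings) / len(random_readings)   # close to 70.0
mean_biased = sum(biased_readings) / len(biased_readings)   # close to 72.0
```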
On graphs, the explanatory variable is conventionally placed on the x-axis, while the response variable is placed on the y-axis.
The term “ explanatory variable ” is sometimes preferred over “ independent variable ” because, in real world contexts, independent variables are often influenced by other variables. This means they aren’t totally independent.
Multiple independent variables may also be correlated with each other, so “explanatory variables” is a more appropriate term.
The difference between explanatory and response variables is simple:
In a controlled experiment , all extraneous variables are held constant so that they can’t influence the results. Controlled experiments require:
Depending on your study topic, there are various other methods of controlling variables .
There are 4 main types of extraneous variables :
An extraneous variable is any variable that you’re not investigating that can potentially affect the dependent variable of your research study.
A confounding variable is a type of extraneous variable that not only affects the dependent variable, but is also related to the independent variable.
In a factorial design, multiple independent variables are tested.
If you test two variables, each level of one independent variable is combined with each level of the other independent variable to create different conditions.
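Crossing every level of one independent variable with every level of the other is exactly what a Cartesian product does, so the condition list of a factorial design can be generated in two lines of Python. The caffeine and sleep variables here are hypothetical examples.

```python
from itertools import product

# Hypothetical 2x3 factorial design: two independent variables,
# each with its own levels.
caffeine = ["none", "200mg"]
sleep = ["4h", "6h", "8h"]

# Each level of one IV is combined with each level of the other,
# yielding 2 x 3 = 6 experimental conditions.
conditions = list(product(caffeine, sleep))
```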
Within-subjects designs have many potential threats to internal validity , but they are also very statistically powerful .
Advantages:
Disadvantages:
While a between-subjects design has fewer threats to internal validity , it also requires more participants for high statistical power than a within-subjects design .
Yes. Between-subjects and within-subjects designs can be combined in a single study when you have two or more independent variables (a factorial design). In a mixed factorial design, one variable is altered between subjects and another is altered within subjects.
In a between-subjects design , every participant experiences only one condition, and researchers assess group differences between participants in various conditions.
In a within-subjects design , each participant experiences all conditions, and researchers test the same participants repeatedly for differences between conditions.
The word “between” means that you’re comparing different conditions between groups, while the word “within” means you’re comparing different conditions within the same group.
Random assignment is used in experiments with a between-groups or independent measures design. In this research design, there’s usually a control group and one or more experimental groups. Random assignment helps ensure that the groups are comparable.
In general, you should always use random assignment in this type of experimental design when it is ethically possible and makes sense for your study topic.
To implement random assignment , assign a unique number to every member of your study’s sample .
Then, you can use a random number generator or a lottery method to randomly assign each number to a control or experimental group. You can also do so manually, by flipping a coin or rolling a die to randomly assign participants to groups.
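A software equivalent of the lottery method is a seeded shuffle. This sketch uses hypothetical participant IDs and deals the shuffled participants round-robin into the groups, which also keeps the group sizes balanced.

```python
import random

def random_assignment(participant_ids, groups=("control", "experimental"), seed=None):
    """Shuffle the participants, then deal them round-robin into groups."""
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    assignment = {g: [] for g in groups}
    for i, pid in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(pid)
    return assignment

# 20 hypothetical participants split into two equal groups.
assigned = random_assignment(range(1, 21), seed=42)
```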
Random selection, or random sampling , is a way of selecting members of a population for your study’s sample.
In contrast, random assignment is a way of sorting the sample into control and experimental groups.
Random sampling enhances the external validity or generalizability of your results, while random assignment improves the internal validity of your study.
In experimental research, random assignment is a way of placing participants from your sample into different groups using randomization. With this method, every member of the sample has a known or equal chance of being placed in a control group or an experimental group.
“Controlling for a variable” means measuring extraneous variables and accounting for them statistically to remove their effects on other variables.
Researchers often model control variable data along with independent and dependent variable data in regression analyses and ANCOVAs . That way, you can isolate the control variable’s effects from the relationship between the variables of interest.
Control variables help you establish a correlational or causal relationship between variables by enhancing internal validity .
If you don’t control relevant extraneous variables , they may influence the outcomes of your study, and you may not be able to demonstrate that your results are really an effect of your independent variable .
A control variable is any variable that’s held constant in a research study. It’s not a variable of interest in the study, but it’s controlled because it could influence the outcomes.
Including mediators and moderators in your research helps you go beyond studying a simple relationship between two variables for a fuller picture of the real world. They are important to consider when studying complex correlational or causal relationships.
Mediators are part of the causal pathway of an effect, and they tell you how or why an effect takes place. Moderators usually help you judge the external validity of your study by identifying the limitations of when the relationship between variables holds.
If something is a mediating variable :
A confounder is a third variable that affects variables of interest and makes them seem related when they are not. In contrast, a mediator is the mechanism of a relationship between two variables: it explains the process by which they are related.
A mediator variable explains the process through which two variables are related, while a moderator variable affects the strength and direction of that relationship.
There are three key steps in systematic sampling :
Systematic sampling is a probability sampling method where researchers select members of the population at a regular interval – for example, by selecting every 15th person on a list of the population. If the population is in a random order, this can imitate the benefits of simple random sampling .
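The interval-based selection described above can be sketched in Python. This is an illustrative example with a hypothetical population list, not a prescribed implementation.

```python
import random

def systematic_sample(population, sample_size, seed=None):
    """Select every k-th member after a random starting point, where k
    is the sampling interval (population size // sample size)."""
    k = len(population) // sample_size          # sampling interval
    start = random.Random(seed).randrange(k)    # random starting point in [0, k)
    return [population[start + i * k] for i in range(sample_size)]

# Example: draw 4 members from a population of 60 (interval k = 15,
# i.e. every 15th person, as in the text)
people = [f"person_{i}" for i in range(60)]
sample = systematic_sample(people, sample_size=4, seed=7)
```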
Yes, you can create a stratified sample using multiple characteristics, but you must ensure that every participant in your study belongs to one and only one subgroup. In this case, you multiply the numbers of subgroups for each characteristic to get the total number of groups.
For example, if you were stratifying by location with three subgroups (urban, rural, or suburban) and marital status with five subgroups (single, divorced, widowed, married, or partnered), you would have 3 x 5 = 15 subgroups.
You should use stratified sampling when your sample can be divided into mutually exclusive and exhaustive subgroups that you believe will take on different mean values for the variable that you’re studying.
Using stratified sampling will allow you to obtain more precise (with lower variance ) statistical estimates of whatever you are trying to measure.
For example, say you want to investigate how income differs based on educational attainment, but you know that this relationship can vary based on race. Using stratified sampling, you can ensure you obtain a large enough sample from each racial group, allowing you to draw more precise conclusions.
In stratified sampling , researchers divide subjects into subgroups called strata based on characteristics that they share (e.g., race, gender, educational attainment).
Once divided, each subgroup is randomly sampled using another probability sampling method.
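The two-step procedure above (divide into strata, then randomly sample within each) can be illustrated in Python. This is a sketch with made-up field names, assuming proportionate allocation (the same sampling fraction in every stratum).

```python
import random
from collections import defaultdict

def stratified_sample(population, strata_key, fraction, seed=None):
    """Group members into strata, then take a simple random sample of the
    same fraction from each stratum (proportionate allocation)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in population:
        strata[strata_key(member)].append(member)
    sample = []
    for members in strata.values():
        n = round(len(members) * fraction)      # stratum's share of the sample
        sample.extend(rng.sample(members, n))
    return sample

# Example: 40 'urban' and 60 'rural' records, sampled at 10% per stratum
records = [{"id": i, "area": "urban" if i < 40 else "rural"} for i in range(100)]
sample = stratified_sample(records, strata_key=lambda r: r["area"],
                           fraction=0.1, seed=3)
```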
Cluster sampling is more time- and cost-efficient than other probability sampling methods , particularly when it comes to large samples spread across a wide geographical area.
However, it provides less statistical certainty than other methods, such as simple random sampling , because it is difficult to ensure that your clusters properly represent the population as a whole.
There are three types of cluster sampling : single-stage, double-stage and multi-stage clustering. In all three types, you first divide the population into clusters, then randomly select clusters for use in your sample.
Cluster sampling is a probability sampling method in which you divide a population into clusters, such as districts or schools, and then randomly select some of these clusters as your sample.
The clusters should ideally each be mini-representations of the population as a whole.
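Single-stage cluster sampling, where whole clusters are drawn at random and every member of a selected cluster is included, can be sketched as follows (the names and data are illustrative only):

```python
import random

def single_stage_cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters, then include every member of each
    selected cluster (single-stage clustering)."""
    rng = random.Random(seed)
    chosen = rng.sample(list(clusters), n_clusters)   # pick cluster labels
    return [member for c in chosen for member in clusters[c]]

# Example: five schools as clusters of 30 pupils each; sample two schools in full
schools = {f"school_{i}": [f"s{i}_pupil_{j}" for j in range(30)]
           for i in range(5)}
sample = single_stage_cluster_sample(schools, n_clusters=2, seed=11)
```

In double- or multi-stage variants, you would additionally subsample within each chosen cluster instead of taking every member.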
If properly implemented, simple random sampling is usually the best sampling method for ensuring both internal and external validity . However, it can sometimes be impractical and expensive to implement, depending on the size of the population to be studied.
If you have a list of every member of the population and the ability to reach whichever members are selected, you can use simple random sampling.
The American Community Survey is an example of simple random sampling . In order to collect detailed data on the population of the US, Census Bureau officials randomly select 3.5 million households per year and use a variety of methods to convince them to fill out the survey.
Simple random sampling is a type of probability sampling in which the researcher randomly selects a subset of participants from a population . Each member of the population has an equal chance of being selected. Data is then collected from as large a percentage as possible of this random subset.
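In Python, simple random sampling without replacement is available directly in the standard library via `random.sample`; the sketch below uses a hypothetical numbered sampling frame.

```python
import random

# Simple random sampling: every member of the sampling frame has an equal
# chance of selection. random.sample draws without replacement, so no
# member can appear twice.
population = list(range(1, 1001))          # a numbered sampling frame
sample = random.Random(2024).sample(population, k=50)
```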
Quasi-experimental design is most useful in situations where it would be unethical or impractical to run a true experiment .
Quasi-experiments have lower internal validity than true experiments, but they often have higher external validity as they can use real-world interventions instead of artificial laboratory settings.
A quasi-experiment is a type of research design that attempts to establish a cause-and-effect relationship. The main difference with a true experiment is that the groups are not randomly assigned.
Blinding is important to reduce research bias (e.g., observer bias , demand characteristics ) and ensure a study’s internal validity .
If participants know whether they are in a control or treatment group , they may adjust their behavior in ways that affect the outcome that researchers are trying to measure. If the people administering the treatment are aware of group assignment, they may treat participants differently and thus directly or indirectly influence the final results.
Blinding means hiding who is assigned to the treatment group and who is assigned to the control group in an experiment .
A true experiment (a.k.a. a controlled experiment) always includes at least one control group that doesn’t receive the experimental treatment.
However, some experiments use a within-subjects design to test treatments without a control group. In these designs, you usually compare one group’s outcomes before and after a treatment (instead of comparing outcomes between different groups).
For strong internal validity , it’s usually best to include a control group if possible. Without a control group, it’s harder to be certain that the outcome was caused by the experimental treatment and not by other variables.
An experimental group, also known as a treatment group, receives the treatment whose effect researchers wish to study, whereas a control group does not. They should be identical in all other ways.
Individual Likert-type questions are generally considered ordinal data , because the items have clear rank order, but don’t have an even distribution.
Overall Likert scale scores are sometimes treated as interval data. These scores are considered to have directionality and even spacing between them.
The type of data determines what statistical tests you should use to analyze your data.
A Likert scale is a rating scale that quantitatively assesses opinions, attitudes, or behaviors. It is made up of 4 or more questions that measure a single attitude or trait when response scores are combined.
To use a Likert scale in a survey , you present participants with Likert-type questions or statements and a continuum of possible responses, usually 5 or 7, to capture their degree of agreement.
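Combining individual Likert items into an overall scale score can be sketched as below. This is an illustrative example, not a prescribed scoring procedure; reverse-keyed items (where agreement indicates a low level of the trait) are flipped before summing.

```python
def likert_scale_score(responses, reverse_items=(), points=5):
    """Sum item responses (each 1..points), reversing any reverse-keyed
    items so that higher always means more of the trait."""
    total = 0
    for i, r in enumerate(responses):
        total += (points + 1 - r) if i in reverse_items else r
    return total

# Example: four 5-point items, where item at index 2 is reverse-keyed
score = likert_scale_score([4, 5, 2, 4], reverse_items={2})  # 4 + 5 + 4 + 4 = 17
```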
In scientific research, concepts are the abstract ideas or phenomena that are being studied (e.g., educational achievement). Variables are properties or characteristics of the concept (e.g., performance at school), while indicators are ways of measuring or quantifying variables (e.g., yearly grade reports).
The process of turning abstract concepts into measurable variables and indicators is called operationalization .
There are various approaches to qualitative data analysis , but they all share five steps in common:
The specifics of each step depend on the focus of the analysis. Some common approaches include textual analysis , thematic analysis , and discourse analysis .
There are five common approaches to qualitative research :
Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.
Operationalization means turning abstract conceptual ideas into measurable observations.
For example, the concept of social anxiety isn’t directly observable, but it can be operationally defined in terms of self-rating scores, behavioral avoidance of crowded places, or physical anxiety symptoms in social situations.
Before collecting data , it’s important to consider how you will operationalize the variables that you want to measure.
When conducting research, collecting original data has significant advantages:
However, there are also some drawbacks: data collection can be time-consuming, labor-intensive and expensive. In some cases, it’s more efficient to use secondary data that has already been collected by someone else, but the data might be less reliable.
Data collection is the systematic process by which observations or measurements are gathered in research. It is used in many different contexts by academics, governments, businesses, and other organizations.
There are several methods you can use to decrease the impact of confounding variables on your research: restriction, matching, statistical control and randomization.
In restriction , you restrict your sample by only including certain subjects that have the same values of potential confounding variables.
In matching , you match each of the subjects in your treatment group with a counterpart in the comparison group. The matched subjects have the same values on any potential confounding variables, and only differ in the independent variable .
In statistical control , you include potential confounders as variables in your regression .
In randomization , you randomly assign the treatment (or independent variable) in your study to a sufficiently large number of subjects, which allows you to control for all potential confounding variables.
A confounding variable is closely related to both the independent and dependent variables in a study. An independent variable represents the supposed cause , while the dependent variable is the supposed effect . A confounding variable is a third variable that influences both the independent and dependent variables.
Failing to account for confounding variables can cause you to wrongly estimate the relationship between your independent and dependent variables.
To ensure the internal validity of your research, you must consider the impact of confounding variables. If you fail to account for them, you might over- or underestimate the causal relationship between your independent and dependent variables , or even find a causal relationship where none exists.
Yes, but including more than one of either type requires multiple research questions .
For example, if you are interested in the effect of a diet on health, you can use multiple measures of health: blood sugar, blood pressure, weight, pulse, and many more. Each of these is its own dependent variable with its own research question.
You could also choose to look at the effect of exercise levels as well as diet, or even the additional effect of the two combined. Each of these is a separate independent variable .
To ensure the internal validity of an experiment , you should only change one independent variable at a time.
No. The value of a dependent variable depends on an independent variable, so a variable cannot be both independent and dependent at the same time. It must be either the cause or the effect, not both!
You want to find out how blood sugar levels are affected by drinking diet soda and regular soda, so you conduct an experiment .
Determining cause and effect is one of the most important parts of scientific research. It’s essential to know which is the cause – the independent variable – and which is the effect – the dependent variable.
In non-probability sampling , the sample is selected based on non-random criteria, and not every member of the population has a chance of being included.
Common non-probability sampling methods include convenience sampling , voluntary response sampling, purposive sampling , snowball sampling, and quota sampling .
Probability sampling means that every member of the target population has a known chance of being included in the sample.
Probability sampling methods include simple random sampling , systematic sampling , stratified sampling , and cluster sampling .
Using careful research design and sampling procedures can help you avoid sampling bias . Oversampling can be used to correct undercoverage bias .
Some common types of sampling bias include self-selection bias , nonresponse bias , undercoverage bias , survivorship bias , pre-screening or advertising bias, and healthy user bias.
Sampling bias is a threat to external validity – it limits the generalizability of your findings to a broader group of people.
A sampling error is the difference between a population parameter and a sample statistic .
A statistic refers to measures about the sample , while a parameter refers to measures about the population .
Populations are used when a research question requires data from every member of the population. This is usually only feasible when the population is small and easily accessible.
Samples are used to make inferences about populations . Samples are easier to collect data from because they are practical, cost-effective, convenient, and manageable.
There are seven threats to external validity : selection bias , history, the experimenter effect, the Hawthorne effect , the testing effect, aptitude-treatment interaction, and the situation effect.
The two types of external validity are population validity (whether you can generalize to other groups of people) and ecological validity (whether you can generalize to other situations and settings).
The external validity of a study is the extent to which you can generalize your findings to different groups of people, situations, and measures.
Cross-sectional studies cannot establish a cause-and-effect relationship or analyze behavior over a period of time. To investigate cause and effect, you need to do a longitudinal study or an experimental study .
Cross-sectional studies are less expensive and time-consuming than many other types of study. They can provide useful insights into a population’s characteristics and identify correlations for further research.
Sometimes only cross-sectional data is available for analysis; other times your research question may only require a cross-sectional study to answer it.
Longitudinal studies can last anywhere from weeks to decades, although they tend to be at least a year long.
The 1970 British Cohort Study , which has collected data on the lives of 17,000 Brits since their births in 1970, is one well-known example of a longitudinal study .
Longitudinal studies are better to establish the correct sequence of events, identify changes over time, and provide insight into cause-and-effect relationships, but they also tend to be more expensive and time-consuming than other types of studies.
Longitudinal studies and cross-sectional studies are two different types of research design . In a cross-sectional study you collect data from a population at a specific point in time; in a longitudinal study you repeatedly collect data from the same sample over an extended period of time.
Longitudinal study | Cross-sectional study |
---|---|
Repeated observations over time | Observations at a single point in time |
Observes the same group multiple times | Observes different groups (a “cross-section”) in the population |
Follows changes in participants over time | Provides a snapshot of society at a given point |
There are eight threats to internal validity : history, maturation, instrumentation, testing, selection bias , regression to the mean, social interaction and attrition .
Internal validity is the extent to which you can be confident that a cause-and-effect relationship established in a study cannot be explained by other factors.
In mixed methods research , you use both qualitative and quantitative data collection and analysis methods to answer your research question .
The research methods you use depend on the type of data you need to answer your research question .
A confounding variable , also called a confounder or confounding factor, is a third variable in a study examining a potential cause-and-effect relationship.
A confounding variable is related to both the supposed cause and the supposed effect of the study. It can be difficult to separate the true effect of the independent variable from the effect of the confounding variable.
In your research design , it’s important to identify potential confounding variables and plan how you will reduce their impact.
Discrete and continuous variables are two types of quantitative variables :
Quantitative variables are any variables where the data represent amounts (e.g. height, weight, or age).
Categorical variables are any variables where the data represent groups. This includes rankings (e.g. finishing places in a race), classifications (e.g. brands of cereal), and binary outcomes (e.g. coin flips).
You need to know what type of variables you are working with to choose the right statistical test for your data and interpret your results .
Experimental design means planning a set of procedures to investigate a relationship between variables . To design a controlled experiment, you need:
When designing the experiment, you decide:
Experimental design is essential to the internal and external validity of your experiment.
I nternal validity is the degree of confidence that the causal relationship you are testing is not influenced by other factors or variables .
External validity is the extent to which your results can be generalized to other contexts.
The validity of your experiment depends on your experimental design .
Reliability and validity are both about how well a method measures something:
If you are doing experimental research, you also have to consider the internal and external validity of your experiment.
A sample is a subset of individuals from a larger population . Sampling means selecting the group that you will actually collect data from in your research. For example, if you are researching the opinions of students in your university, you could survey a sample of 100 students.
In statistics, sampling allows you to test a hypothesis about the characteristics of a population.
Quantitative research deals with numbers and statistics, while qualitative research deals with words and meanings.
Quantitative methods allow you to systematically measure variables and test hypotheses . Qualitative methods allow you to explore concepts and experiences in more detail.
Methodology refers to the overarching strategy and rationale of your research project . It involves studying the methods used in your field and the theories or principles behind them, in order to develop an approach that matches your objectives.
Methods are the specific tools and procedures you use to collect and analyze data (for example, experiments, surveys , and statistical tests ).
In shorter scientific papers, where the aim is to report the findings of a specific study, you might simply describe what you did in a methods section .
In a longer or more complex research project, such as a thesis or dissertation , you will probably include a methodology section , where you explain your approach to answering the research questions and cite relevant sources to support your choice of methods.
Dependent Variable – Definition, Types and Example
Definition:
Dependent variable is a variable in a study or experiment that is being measured or observed and is affected by the independent variable. In other words, it is the variable that researchers are interested in understanding, predicting, or explaining based on the changes made to the independent variable.
The purpose of the dependent variable is to help researchers understand the relationship between the independent variable and the outcome they are studying. By measuring the changes in the dependent variable, researchers can determine the effects of different variables on the outcome of interest.
AIDS Research and Therapy, volume 21, Article number: 58 (2024)
HIV/AIDS is one of the most dangerous diseases globally, impacting public health, economics, society, political issues, and communities. As of 2023, the World Health Organization estimates that 40.4 million people are living with HIV/AIDS. This study aimed to identify the determinants of survival time for HIV/AIDS patients in the pastoralist region of Borena at Yabelo General Hospital.
The study design was a retrospective cohort study, with a sample size of 293 individuals living with HIV/AIDS, based on recorded data. This research utilized survival model analysis, employing Kaplan-Meier plots, the log-rank test, and Cox proportional hazard model analysis.
Out of the total sample size, 179 (61.1%) were female and 114 (38.9%) were male. Among these males, 36 (31.6%) were deceased. The analysis using the Cox proportional hazard model revealed that the following variables were significantly associated with the survival time of HIV/AIDS patients: gender, educational status, area of residence, tuberculosis (TB), and opportunistic infections.
We concluded that individuals living with HIV/AIDS in urban areas have a lower risk of death compared to those in rural areas, indicating that rural residents have a reduced survival probability. Therefore, the Borena zone administration should focus on adult patients to enhance life expectancy.
The human immunodeficiency virus (HIV) is the world’s most critical public health issue. According to estimates from the World Health Organization, approximately 40.4 million people were living with HIV by mid-2023. In the African region, an estimated 25.6 million individuals had HIV by that time, as reported by the WHO. In 2022, over 20.9 million people received antiretroviral treatment. That same year, an estimated 660,000 individuals acquired HIV, and by mid-2023 the rate of new HIV infections across all ages had decreased to 0.57 per 1,000 uninfected population, down from 1.75 in 2010 [ 1 ].
Survival patterns among African communities following HIV infection before the introduction of ART served as an initial benchmark for assessing the future viability of intervention initiatives [ 2 ]. Since the advent of antiretroviral therapy (ART), HIV infection has transitioned from a severe condition to a chronic illness [ 3 ]. In Ethiopia, current estimates indicate a slight decline in PLWH, from 610,350 in 2022 to 603,537 in 2023. Reported prevalence shows that the number of PLHIV in the Oromia region gradually decreased, from 158,152 in 2022 to 156,184 in 2023 [ 4 ].
The Borena pastoralist community has long been organized under the Gada system’s cultural, social, and political institutions, led by the Abba Gada, the elders of Borena. Modern education in Borena began only after 1950, while the Gada system’s structure has been in place since around the 14th century, so access to contemporary education remains limited. According to a report from the Ethiopia Public Health Institute [ 5 ], 2,600 adult Borena individuals are living with HIV infection, and many pastoralists remain unaware of how the disease is transmitted. This vulnerability is prevalent throughout all areas of the Borena pastoralist community. Consequently, numerous individuals have been infected, primarily due to insufficient protective measures and education.
In addition, concurrent extramarital sexual activities, polygamy, and marrying a deceased wife’s sister have been identified as risk factors for HIV infection. Although not extensively documented, the practices of maintaining extramarital sexual partners (by both men and women), widow inheritance, and polygamy appear to have decreased, though they continue to occur in secret [ 6 , 7 ]. Despite the lack of studies on vulnerability within the Borana population, a few behavioral and biological studies indicate a very high HIV prevalence in the region compared to similar contexts [ 8 , 9 ]. This study aimed to determine the survival time for HIV/AIDS patients in the pastoralist region of Borena at Yabelo General Hospital from January 2016 to December 2019. The results provide information about the determinants of survival time for people living with HIV/AIDS in the pastoralist region of Borena.
The study was conducted at Yabelo General Hospital, situated in Yabelo town, Borena Zone. This zone is one of twenty-one zones in the Oromia Region. In 2010, the hospital was upgraded from a Health Center to a general hospital. It provides various services to the residents of Borena Zone and other Ethiopian ethnic groups. Currently, the zone comprises ten rural pastoralist woredas and one town administration, Yabelo, which has a state function. The zone is located in the southern part of the Oromia region. It shares borders with the West Guji Zone to the north, the South Nations, Nationalities, and Peoples region to the west, the Somali region to the southeast, and an international boundary with Kenya to the south (as shown in the geographical map below, Fig. 1 ).
Map of Borena zone
According to the 2023 report from the Borena Zone Administration Office, over 1.4 million people reside in the zone, with a male-to-female ratio of 1:1. This suggests significant variation in settlement patterns from district to district. Approximately 89% of the population inhabits the rural pastoralist areas of the zone [ 10 ]. The Borana Zone is one of the most pastoralist regions in Ethiopia, primarily relying on livestock rearing. The livestock population in Borena includes 1,482,053 goats, 1,179,645 sheep, 637,632 horses, 2,222 mules, 5,525 donkeys, 68,799 camels, and 185,382 cattle [ 11 ].
The study is a retrospective cohort analysis: all events and exposures detailed in the reviewed subjects’ patient cards and information sheets occurred in the past. All individuals diagnosed with HIV at Yabelo General Hospital and receiving ART were included in the study at regular intervals. Based on the inclusion and exclusion criteria, 293 adult HIV/AIDS patients were selected from their medical records. Participants were HIV-positive individuals receiving follow-up antiretroviral therapy during the study intervals. The study encompassed all adult HIV-positive patients who visited the hospital for treatment three or more times, as well as adult HIV/AIDS patients who initiated treatment between January 2016 and December 2019. According to hospital records, 1,147 HIV patients underwent ART treatment and were assessed for baseline CD4 cell counts during the study periods.
The required sample size was calculated using the standard sample-size formula [ 12 ]. Following [ 13 ], the sample size was determined by analyzing the mortality rates in two groups of HIV-positive individuals on ART, categorized by their WHO clinical stage as exposure status. Consequently, the sample size for the current study reached 293 HIV/AIDS-positive subjects after applying the inclusion criteria (further calculations are available in the supplementary material).
The outcome variable for survival analysis is the survival time and/or time to death of patients under follow-up among HIV-infected adults. The predictors included in this study were gender, age, marital status, educational status, place of residence, WHO stages, TB, adherence to ART treatment, functional status, family history, and opportunistic infectious diseases.
WHO clinical stages: These are the clinical stages of patients based on CD4 values, classified into four stages: stage I, stage II, stage III, and stage IV.
Tuberculosis (TB): Individuals with HIV and weakened immune systems are at a higher risk of contracting tuberculosis compared to those with typical immune systems.
Family history: This refers to previous occurrences of HIV/AIDS among family members.
Opportunistic infections: These are infections that occur more frequently and are more severe in individuals with declining immune systems.
Functional status: Working: able to perform usual work in or out of the house; Ambulatory: able to carry out activities of daily living; Bedridden: unable to perform activities of daily living [ 14 ].
Adherence: Adherence was categorized as good if patients adhered to at least 95% of the prescribed medication, fair if they adhered between 85% and 95%, and poor if they adhered to less than 85% of the prescribed medication [ 15 ].
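The adherence cut-offs above translate directly into a small classifier. This is an illustrative sketch, not code from the study (the study's analysis used R):

```python
def adherence_category(percent_taken):
    """Classify ART adherence using the cut-offs given above:
    >= 95% good, 85-95% fair, < 85% poor."""
    if percent_taken >= 95:
        return "good"
    if percent_taken >= 85:
        return "fair"
    return "poor"
```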
The analysis was conducted using R software version 4.3.1. It includes descriptive statistics of variables, the Kaplan-Meier method, the log-rank test, and the Cox proportional hazards model for the time-to-event data from the survival datasets.
Survival analysis is a branch of statistics that investigates the anticipated duration until one or more events take place [ 16 ]. This data shows that not all patients experience the event by the conclusion of the observation period; thus, the actual survival times for some individuals living with HIV/AIDS remain unknown, a phenomenon referred to as censoring, which must be accounted for in the study to yield meaningful results [ 17 , 18 ].
The Kaplan-Meier estimator [ 19 ] provides a non-parametric maximum likelihood estimate of the survival function.
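As an illustration, the Kaplan-Meier product-limit estimate can be computed in a few lines of Python. This is a generic sketch with made-up follow-up data, not the study's analysis (which used R); at each event time the survival probability is multiplied by (1 − deaths / number at risk), and censored subjects leave the risk set without triggering a factor.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function.

    times  : observed follow-up times
    events : 1 if the event (death) occurred, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, curve, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = at_this_time = 0
        while i < len(data) and data[i][0] == t:   # handle tied times
            deaths += data[i][1]
            at_this_time += 1
            i += 1
        if deaths:                                 # censoring adds no factor
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= at_this_time
    return curve

# Example: 6 patients; events at times 1, 3, 4, 6; censored at 2 and 5
curve = kaplan_meier([1, 2, 3, 4, 5, 6], [1, 0, 1, 1, 0, 1])
```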
The basic model for survival analysis investigated here is the Cox proportional hazards model, originated by Cox [ 16 ]. In this model, the effect of a unit increase in a covariate is multiplicative with respect to the hazard rate, and the covariates can be time-independent. The model implies that the hazard function \( \lambda(t, X; \beta) \) is the product of a baseline hazard \( \lambda_0(t) \) and a function of the covariates.
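Written out in standard notation (a generic presentation, not the paper's exact formula), the model and the hazard-ratio interpretation of a single coefficient are:

```latex
\lambda(t \mid X) = \lambda_0(t)\,\exp\left(\beta_1 x_1 + \dots + \beta_p x_p\right),
\qquad
\frac{\lambda(t \mid x_j + 1, \dots)}{\lambda(t \mid x_j, \dots)} = e^{\beta_j}
```

That is, a one-unit increase in covariate \( x_j \) multiplies the hazard by the constant \( e^{\beta_j} \), holding the other covariates fixed.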
In this study, records of 293 individuals living with HIV/AIDS were included; 179 (61.1%) were female. Among the females, 33 (18.4%) died, while the rest were censored; among the male patients, 36 (31.6%) died. Of the total sample, 83 (28.3%) had tuberculosis (TB); among these TB patients, 34 (41.0%) died, whereas 35 (16.7%) of the non-TB patients died. Regarding functional status, 221 (75.4%) of the patients were working, 45 (15.4%) were ambulatory, and 27 (9.2%) were bedridden. Among those who were working, 50 (22.6%) died.
In the baseline results, 201 (68.6%) of the patients had no family history of HIV/AIDS. In addition, 92 (31.4%) of the patients presented with opportunistic infections, of whom 35 (38.0%) died (Table 1 ).
Comparison of grouped survival data.
The survival data for this study consist of baseline information extracted from the entire patient sample. Differences between groups were assessed using Kaplan-Meier plots and the log-rank test. Figure 2 illustrates significant differences between the categorical groups: female patients had slightly higher survival than males from the beginning to the end of follow-up, and patients from urban areas exhibited a higher survival probability than those from rural areas. The log-rank test indicates a statistically significant difference between patients from urban and rural areas (Supplementary Table 1 ).
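The log-rank test used here compares observed with expected deaths in each group at every event time. As a rough two-group sketch in pure Python (the study used R, and the data below are made up), the chi-square statistic can be computed as:

```python
def logrank_statistic(times, events, groups):
    """Two-group log-rank chi-square statistic (1 degree of freedom).

    times, events, groups are parallel lists; groups holds 0 or 1.
    Compare the result against 3.84 for significance at the 5% level.
    """
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    observed1 = expected1 = variance = 0.0
    for t in death_times:
        n = sum(1 for tt in times if tt >= t)                 # at risk, overall
        n1 = sum(1 for tt, g in zip(times, groups) if tt >= t and g == 1)
        d = sum(1 for tt, e in zip(times, events) if tt == t and e == 1)
        d1 = sum(1 for tt, e, g in zip(times, events, groups)
                 if tt == t and e == 1 and g == 1)
        observed1 += d1
        expected1 += d * n1 / n                               # deaths expected in group 1
        if n > 1:                                             # hypergeometric variance term
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return (observed1 - expected1) ** 2 / variance

# Made-up example: all deaths in group 1 occur later than those in group 0.
stat = logrank_statistic([1, 2, 3, 4, 5, 6], [1] * 6, [0, 0, 0, 1, 1, 1])
print(stat > 3.84)  # True: the groups differ at the 5% level
```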
A Kaplan-Meier plot comparing the different educational statuses of patients is also presented in Fig. 2 . The groups show little visible separation in the plot: primary and secondary education displayed similar patterns, as did the not formally educated and tertiary groups. The log-rank test, however, reveals a statistically significant difference ( P = 0.02) among the not formally educated, primary, secondary, and tertiary groups with respect to survival time in months.
Among tuberculosis (TB) patients, the Kaplan-Meier estimate plot indicates that individuals living with HIV/AIDS who did not have TB were more likely to survive than those who had TB, in terms of survival time in months. The log-rank test for these variables also demonstrates a statistically significant difference between patients with TB and those without (Supplementary Table 1 ).
Kaplan-Meier plots of different categorical variables
The results of the covariates and the global test for the proportionality assumption of the Cox proportional hazards model are presented. The p-values for the covariate terms and the global test are not significant at the 5% level, indicating that the proportional hazards assumption is not violated. In the Schoenfeld residual plot, no patterns are observed between the variables and time. The proportional hazards assumption is therefore satisfied under both checks (Supplementary Table 2 and Supplementary Fig. 1 ).
Variables such as gender, educational status, place of residence, tuberculosis, family history, and opportunistic infections were significantly associated with the survival time of adults living with HIV/AIDS undergoing ART treatment at the 5% level of significance. According to the adjusted hazard ratio, male HIV-infected patients had 1.69 times the risk of death of their female counterparts (HR = 1.69, p-value = 0.036); that is, male patients faced a 69% higher risk of experiencing the event than female patients (Table 2 ).
Patients educated to the secondary level had an estimated hazard ratio of 0.31, indicating a 69% lower risk of death compared with non-formally educated patients (HR = 0.31, p-value = 0.028). HIV-infected adults with TB had 1.72 times the mortality risk of those without TB; that is, TB patients faced a 72% higher risk of death.
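The conversions between a hazard ratio and a percent change in risk used above follow a simple rule: for HR > 1 the increase is (HR - 1) * 100%, and for HR < 1 the reduction is (1 - HR) * 100%. A small helper (illustrative; the function name is ours) makes this explicit:

```python
def hr_to_percent(hr):
    """Express a hazard ratio as a percent higher or lower risk."""
    if hr >= 1:
        return f"{(hr - 1) * 100:.0f}% higher risk"
    return f"{(1 - hr) * 100:.0f}% lower risk"

print(hr_to_percent(1.72))  # 72% higher risk (TB vs. no TB)
print(hr_to_percent(0.31))  # 69% lower risk (secondary vs. no formal education)
```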
Regarding family history, patients from families with a history of HIV/AIDS were at 1.66 times higher risk of death than those without such a history (HR = 1.66, p-value = 0.047). Concerning opportunistic infections, patients with opportunistic infections had 2.30 times the risk of death of patients without them (HR = 2.30, p-value = 0.002). However, marital status and WHO stage did not significantly affect survival time to death in HIV patients.
This study aimed to identify factors affecting the survival time of adult HIV/AIDS patients in the pastoralist area of Borena at Yabelo General Hospital from January 2016 to December 2019. In the current study, the gender variable is significantly associated with survival time until death, consistent with several other studies [ 20 , 21 , 22 ]. The mortality risk for adult male patients was higher than that for adult female patients, suggesting that female patients are more likely to know their HIV status at an earlier stage and to start ART with higher CD4 counts than males [ 20 ]. According to other studies, gender status was not associated with survival time until HIV/AIDS-related risks [ 23 , 24 , 25 , 26 ].
The findings of this study revealed that individuals living with HIV/AIDS who had a secondary educational status had a lower hazard ratio of death than those with no formal education. Various studies supported the notion that secondary educational status was linked to a lower risk of mortality among HIV-infected antiretroviral therapy users, indicating significant effects on the survival time of adult patients [ 25 , 27 , 28 , 29 , 30 , 31 ].
Patients living in urban areas had 0.46 times the death rate of patients living in rural areas, a 54% lower risk, indicating that urban patients are more likely to survive than rural ones. Similarly, a study at Debre Tabor Referral Hospital found that patients in urban areas had significantly higher survival rates than those from rural areas [ 32 ]. In contrast, a study examining the impact of the "universal test and treat" program on HIV treatment in the Gurage Zone found that rural patients had significantly better survival rates than urban patients [ 33 ]. Possible reasons for the urban advantage include better drug adherence, improved access to services, closer proximity to health centers, superior care, and differing levels of knowledge.
According to the findings of this study, patients with both tuberculosis (TB) and HIV faced 1.72 times the risk of death of patients without TB; patients without coinfections therefore have better survival than those with them. A similar study conducted at Goba Hospital in Southeast Ethiopia found that TB coinfection at the start of ART was strongly associated with increased mortality among ART patients [ 26 , 33 ]. However, other studies did not demonstrate any association between baseline TB infection and the hazard of death [ 23 ].
In our study, opportunistic infections were associated with an increased risk of death among people living with HIV/AIDS. Patients with opportunistic infections alongside other diseases were estimated to face a higher risk of death than those without such infections. Various studies support the finding that opportunistic infections are significantly associated with the survival and mortality of HIV-infected patients [ 23 , 25 ].
The main objective of this study was to determine the survival time for HIV/AIDS patients in the pastoralist region of Borena at Yabelo General Hospital from January 2016 to December 2019. In this study, a total of 293 adults living with HIV/AIDS were analyzed. According to the Cox-PH model, covariates such as gender, educational status, place of residence, TB, family history, and opportunistic infections were identified as factors affecting the survival time of HIV-infected individuals. Patients residing in urban areas have a lower risk of death than those living in rural areas, indicating that rural patients have a lower survival probability compared to their urban counterparts. Therefore, the Borena zone administration should pay special attention to adult patients to enhance life expectancy.
Abbreviations
EPHI: Ethiopian Public Health Institute
PH: Proportional Hazard
TB: Tuberculosis
UNAIDS: Joint United Nations Programme on HIV/AIDS
WHO: World Health Organization
The path that ends AIDS: UNAIDS Global AIDS Update. 2023. Geneva: Joint United Nations Programme on HIV/AIDS, 2023. License: CC BY-NC-SA 3.0 IGO.
US Department of Health and Human Services. Guidelines for the use of antiretroviral agents in HIV-1-infected adults and adolescents. 2009. http://aidsinfo.nih.gov/OrderPublication/OrderPubsBrowseSearchResultsTable.aspx?ID=115
EPHI. (2023). HIV-Related Estimates and Projections in Ethiopia for the Year 2022–2023, Addis Ababa.
The Ethiopia Public Health Institute. (EPHI, 2023). HIV Related Estimates and Projections in Ethiopia for the Year 2022–2023. May 2023, Addis Ababa. https://ephi.gov.et/wp-content/uploads/2021/02/HIV-Estimates-and-projection-for-the-year-2022-and-2023.pdf
Mirgissa K, Ibrahim A, Damen HM. Extramarital sexual practices and perceived association with HIV infection among the Borana pastoral community. Ethiop J Health Dev. 2013;27(1):25–32.
Miz-Hasab Research Centre. HIV/AIDS and gender in Ethiopia: the case of ten Weredas in Oromia and Southern Nations and Nationalities people’s region. Addis Ababa: Miz-Hasab research center; 2004.
Tefera B, Ahmed Y. Contribution of the anti HIV/AIDS community conversation programs in preventing and controlling the spread of HIV/ AIDS. Ethiop J Health Dev. 2013;27(3):216–29.
Mela Research. Know Your HIV Epidemic/Know Your HIV Response (KYE/KYR) Synthesis in Oromia, Ethiopia. Addis Ababa, Ethiopia; 2014.
Collett D. Modeling survival data in medical research. Chapman and Hall/CRC; 2023.
Borena. Borena zone administration office report on the severe drought effects on the population in 2023. 2023. Unpublished document.
Fenetahun Y, Fentahun T. Socio-economic profile of arid and semi-arid agro-pastoral region of Borana rangeland Southern Ethiopia. MOJ Eco Environ Sci. 2020;5(3):113–22.
Gebrerufael GG, Asfaw ZG, Chekole DM. The effect of longitudinal body weight and CD4 cell progression for the survival of HIV/AIDS patients. Cogent Med. 2021;8(1):1986269.
Cox DR. Regression models and life-tables. J Roy Stat Soc: Ser B (Methodol). 1972;34(2):187–202.
Tsegaye E, Worku A. Assessment of antiretroviral treatment outcome in public hospitals, South nations Nationalities and Peoples Region, Ethiopia. Ethiop J Health Dev. 2011;25:102–9.
Abbastabar H, Rezaianzadeh A, Rajaeefard A, Ghaem H, Motamedifar M, Kazeroon PA. Determining factors of CD4 cell count in HIV patients: a historical cohort study. International Journal of Life Science and Pharma Research. 2016:93–101.
Schober P, Vetter TR. Survival analysis and interpretation of time-to-event data: the tortoise and the hare. Anesth Analgesia. 2018;127(3):792–8.
George B, Seals S, Aban I. Survival analysis and regression models. J Nuclear Cardiol. 2014;21(4):686–94.
Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. J Am Stat Assoc. 1958;53(282):457–81.
Mageda K, Leyna GH, Mmbaga EJ. High initial HIV/AIDS-Related mortality and predictors among patients on antiretroviral therapy in the Kagera Region of Tanzania: a five-year retrospective cohort study. AIDS Res Treat. 2012;2012(1):843598.
Zheng H, Wang L, Huang P, Norris J, Wang Q, Guo W, Peng Z, Yu R, Wang N. Incidence and risk factors for AIDS-related mortality in HIV patients in China: a cross-sectional study. BMC Public Health. 2014;14:1–9.
Mengesha S, Belayihun B, Kumie A. Predictors of survival in HIV-infected patients after initiation of HAART in Zewditu Memorial Hospital, Addis Ababa, Ethiopia. Int Sch Res Notices. 2014;2014(1):250913.
Seyoum D, Degryse JM, Kifle YG, Taye A, Tadesse M, Birlie B, Banbeta A, Rosas-Aguirre A, Duchateau L, Speybroeck N. Risk factors for mortality among adult HIV/AIDS patients following antiretroviral therapy in Southwestern Ethiopia: an assessment through survival models. Int J Environ Res Public Health. 2017;14(3):296.
Tegegne AS, Ndlovu P, Zewotir T. Determinants of CD4 cell count change and time-to default from HAART; a comparison of separate and joint models. BMC Infect Dis. 2018;18:1–1.
Setegn T, Takele A, Gizaw T, Nigatu D, Haile D. Predictors of mortality among adult antiretroviral therapy users in southeastern Ethiopia: retrospective cohort study. AIDS Res Treat. 2015;2015(1):148769.
Hassan AS, Mwaringa SM, Ndirangu KK, Sanders EJ, de Wit TF, Berkley JA. Incidence and predictors of attrition from antiretroviral care among adults in a rural HIV clinic in Coastal Kenya: a retrospective cohort study. BMC Public Health. 2015;15:1–9.
Tadesse K, Haile F, Hiruy N. Predictors of mortality among patients enrolled on antiretroviral therapy in Aksum hospital, northern Ethiopia: a retrospective cohort study. PLoS ONE. 2014;9(1):e87392.
Bello SI, Itiola OA. Drug adherence amongst tuberculosis patients in the University of Ilorin Teaching Hospital, Ilorin, Nigeria. Afr J Pharm Pharmacol. 2010;4(3):109–14.
Jarrin I, Lumbreras B, Ferrero I, Pérez-Hoyos S, Hurtado I, Hernández-Aguado I. Effect of education on overall and cause-specific mortality in injecting drug users, according to HIV and introduction of HAART. Int J Epidemiol. 2007;36(1):187–94.
Seid A, Getie M, Birlie B, Getachew Y. Joint modeling of longitudinal CD4 cell counts and time-to-default from HAART treatment: a comparison of separate and joint models. Electron J Appl Stat Anal. 2014;7(2):292–314.
Kebede MM, Zegeye DT, Zeleke BM. Predictors of CD4 count changes after initiation of antiretroviral treatment in University of Gondar Hospital, Gondar in Ethiopia. Clin Res HIV/AIDS. 2015;1(2):1–5.
Birhan H, Seyoum A, Derebe K, Muche S, Wale M, Sisay S. Joint clinical and socio-demographic determinants of CD4 cell count and body weight in HIV/TB co-infected adult patients on HAART. Sci Afr. 2022;18:e01396.
Girum T, Yasin F, Wasie A, Shumbej T, Bekele F, Zeleke B. The effect of the universal test and treat program on HIV treatment outcomes and patient survival among a cohort of adults taking antiretroviral treatment (ART) in low-income settings of Gurage Zone, South Ethiopia. AIDS Res Therapy. 2020;17:1–9.
Ayalew J, Moges H, Worku A. Identifying factors related to the survival of AIDS patients under the follow-up of antiretroviral therapy (ART): the case of South Wollo. Int J Data Envelopment Anal Oper Res. 2014;1:21–7.
First and foremost, I would like to thank the Almighty God for being with me in every step of my life. Next, I would like to express my sincere gratitude to my principal advisor, Dr. Markos Abiso (PhD).
The authors received no specific funding for this work.
Authors and affiliations.
Borena Zone Labour and Social Affairs Office, Borena, Oromia, Ethiopia
Galgalo Jaba Nura
Department of Economics, Borena University, Borena, Ethiopia
Kumbi Sara Wario
Department of Statistics, Arba Minch University, Arba Minch, Ethiopia
Markos Abiso Erango
Conceptualization: Galgalo Jaba Nura, Markos Abiso Erango. Data curation: Galgalo Jaba Nura, Kumbi Sara Wario. Formal analysis: Galgalo Jaba Nura. Investigation: Galgalo Jaba Nura, Kumbi Sara Wario, Markos Abiso Erango. Methodology: Galgalo Jaba Nura, Markos Abiso Erango. Project administration: Markos Abiso Erango. Software: Galgalo Jaba Nura, Markos Abiso Erango. Supervision: Markos Abiso Erango. Validation: Markos Abiso Erango. Writing – original draft: Kumbi Sara Wario, Markos Abiso Erango. Writing – review & editing: Kumbi Sara Wario, Markos Abiso Erango.
Correspondence to Galgalo Jaba Nura .
Competing interests.
The authors declare no competing interests.
Publisher’s note.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .
Cite this article.
Nura, G.J., Wario, K.S. & Erango, M.A. Determinants of survival time for HIV/AIDS patients in the pastoralist region of Borena: a study at Yabelo General Hospital, South East Ethiopia. AIDS Res Ther 21 , 58 (2024). https://doi.org/10.1186/s12981-024-00644-1
Received : 03 July 2024
Accepted : 09 August 2024
Published : 28 August 2024
ISSN: 1742-6405
Variables of the study
The outcome variable for survival analysis is the survival time and/or time to death of patients under follow-up among HIV-infected adults. The predictors included in this study were gender, age, marital status, educational status, place of residence, WHO stages, TB, adherence to ART treatment, functional status, family history, and opportunistic infections.