
How to Quote in a Research Paper

Last Updated: September 30, 2022 Fact Checked

This article was co-authored by Christopher Taylor, PhD. Christopher Taylor is an Adjunct Assistant Professor of English at Austin Community College in Texas. He received his PhD in English Literature and Medieval Studies from the University of Texas at Austin in 2014. There are 16 references cited in this article, which can be found at the bottom of the page. This article has been fact-checked, ensuring the accuracy of any cited facts and confirming the authority of its sources. This article has been viewed 910,195 times.

A research paper can be made stronger through the use of quotations. You may use quotes when you need to cite a key piece of primary source material, strengthen your argument through another writer's work, or highlight a term of art. To write a strong paper and avoid plagiarism, you need to both use quotations effectively and cite them properly.


Using Different Types of Quotes

Step 1 Understand how to use dropped quotes.

  • Use a complete sentence to incorporate a dropped quote. Ex: As Rembrandt’s skill developed, he began painting landscapes that are “romantic and visionary” (Wallace 96).
  • Use a short phrase to incorporate a dropped quote. Ex: Rembrandt’s landscapes are “romantic and visionary” (Wallace 96).

Step 2 Understand how to use full sentence quotes.

  • Use a complete sentence to introduce a full sentence quote. Ex: Over the course of time, Rembrandt’s work began to change and focus on different themes, but as Wallace points out: “Rembrandt’s great gift as an etcher lay in preserving a sense of spontaneity while scrupulously attending to close detail” (142).
  • Use a signal phrase to introduce your full sentence quote. Ex: As Wallace states, “Rembrandt’s great gift as an etcher lay in preserving a sense of spontaneity while scrupulously attending to close detail” (142).

Step 3 Understand how to use block quotes.

  • Introduce your block quote with a colon. Ex: According to Wallace: (add a line break here, and then indent the entire quote).
  • Block quotes do not use quotation marks, because your introductory sentence has already identified the author or the source being quoted. Do, however, add the in-text parenthetical citation after the period at the end of the quote.
  • If your block quote is inside a paragraph, you don’t have to start a new paragraph at the end of it. Simply add another line break and resume writing at the left margin (with no indent). [4] However, if you are quoting more than 1 paragraph, you will need to indent the second paragraph by an extra 0.25 in (0.64 cm). [5]

Step 4 Understand how to use indirect quotes.

  • Change the structure of the sentence by moving clauses around. Aim to change at least half of the sentence into a new structure, but also make sure that the grammar is correct and the meaning of the sentence is still clear. You can use a thesaurus to exchange words with synonyms.
  • Paraphrase only when you are certain that you understand the content you are paraphrasing. If you are unclear about the meaning of the quote, you won’t be able to put it adequately into your own words.
  • When you write your paraphrase, don’t look at the quote. Keep the meaning in your head and create a new sentence to match. [7]

Formatting Your Quotes

Step 1 Know where to place commas and periods.

  • To use a comma, you might structure the quote within a sentence like this: “Yogurt provides beneficial bacteria to your gut,” so it is good to include 1 serving per day in your diet.
  • To use a period, you might structure the quote like this: “Carrots are a valuable source of vitamin A.”

Step 2 Know where to place exclamation points and question marks.

  • Example of a quotation that comes with a question mark: Alice said, “But where will I go?” (24).
  • Example of asking a question about a quotation: With so much contention, will literary scholars ever agree on “the dream-like quality of Alice’s adventure” (39)?
  • Example of a question about a quoted question (use only the question mark inside the quotation): At this point in the story, don't readers communally ask, “But where will I go?” (24).

Step 3 Use ellipses correctly.

  • Ellipses can be used in the middle of a quote to leave out words that add length to the statement without adding value. For example: As the man stated, “reading the book was...enlightening and life-changing.” This shortens the full statement: As the man stated, “reading the book over the last few weeks was not only incredibly enjoyable, but also enlightening and life-changing.”
  • Use ellipses either before or after a quote, not both. If you are quoting only a part from the middle of a passage, it is simply a partial or dropped quote. Keep in mind, too, that ellipses rarely come at the beginning of a quotation. [11]

Step 4 Use brackets correctly.

  • For example: As scholars have noted, “Rembrandt’s portrait of her [Hendrickje, his mistress] was both accurate and emotion-filled” (Wallace 49).

Step 5 Use colons and semicolons correctly.

  • Ex: As Dormer has noted, “his work is much more valuable now then [sic] it was at the time of its creation.”

Quoting in Different Styles

Step 1 Quote in MLA format.

  • Ex: We can therefore ascertain that “Rembrandt’s decline in popularity may have been his dedication to Biblical painting” (Wallace 112).
  • Ex: According to some, “another reason for Rembrandt’s decline in popularity may have been his dedication to Biblical painting” (Wallace 112), but not everyone agrees on this matter.
  • Ex: Wallace states that “another reason for Rembrandt’s decline in popularity may have been his dedication to Biblical painting” (112). [15]

Step 2 Quote in APA format.

  • Ex: As Billy’s character is described, we learn that “Billy wasn’t a Catholic, even though he grew up with a ghastly crucifix on his wall” (Vonnegut, 1969).
  • Ex: Vonnegut gives a factual statement with a clear opinion thrown in when he says, “Billy wasn’t a Catholic, even though he grew up with a ghastly crucifix on his wall” (1969).
  • Ex: With the knowledge that “Billy wasn’t a Catholic, even though he grew up with a ghastly crucifix on his wall” (Vonnegut, 1969), we begin to understand his philosophical outlook.

Step 3 Quote in Chicago style.

Quoting Successfully

Step 1 Choose the quotations you want to use in the paper with care.


  • Keep a list of quotations as you take research notes, and star your favorites to return to later.
  • Watch for quotations that are quoted by other researchers again and again. Often secondary material will give you hints to finding the best parts of the primary sources.
  • Quote the opposition so that you can directly pick apart their argument. It's easier to argue against someone if you're using exactly what they said and pointing out its flaws. Otherwise, the opposition can claim that you simply twisted their meaning. Rely on their words and attack directly.


  • Don't let a research paper become a sea of he-said, she-said. While you want to set up the arguments that have been made on both sides in the past, you also want to make a compelling argument for yourself. Rephrasing, re-organizing an argument, and synthesizing different arguments in your own words makes it clear that you understand what you've researched and makes the paper interesting to read. The reader is searching for a new way to understand the research or a new idea. Too many quotes tend to bury the lead.
  • Don't rely too heavily on one source. It's easy to fall in love with a single book when doing research, particularly if there aren't a lot of books on the subject and one author particularly agrees with you. Try to limit how much you quote that author, particularly if a lot of your argument is relying on his or her groundwork already. Look for quotations that complement or challenge that person, and provide your own analysis.
  • Don't be a sloppy note-taker. Unfortunately, accidental plagiarism is all too common, and it has serious consequences. If you use someone else's words without indicating that they are a direct quotation, you are plagiarizing, whether you meant to or not. Relying only on lecture notes instead of your own research is lazy, and failing to mark direct quotes as you take notes from texts reflects poor organization. Always mark quotations in your notes. It's also better to write down a lot of exact quotations and paraphrase them later than to write down a paraphrased version while taking notes. A note-taking paraphrase that stays close to the original is especially risky, because during revision you may unwittingly drift back to the original wording without realizing it is a quotation. It's better to have the original right in front of you. If you can't find better language than the original, just quote it properly.


  • https://midway.libguides.com/c.php?g=1100261&p=8025172
  • https://facultyweb.ivcc.edu/rrambo/eng1001/quotes.htm
  • https://owl.purdue.edu/owl/research_and_citation/mla_style/mla_formatting_and_style_guide/mla_formatting_quotations.html
  • http://public.wsu.edu/~campbelld/engl402/cited.htm
  • https://owl.purdue.edu/owl/general_writing/punctuation/quotation_marks/index.html
  • http://writing.wisc.edu/Handbook/QPA_paraphrase2.html
  • http://www.thepunctuationguide.com/ellipses.html
  • https://www.unr.edu/writing-speaking-center/student-resources/writing-speaking-resources/mla-quotation-punctuation
  • https://guides.libraries.psu.edu/mlacitation/intext
  • http://owl.english.purdue.edu/owl/resource/747/03/
  • https://apastyle.apa.org/style-grammar-guidelines/citations/quotations
  • https://owl.purdue.edu/owl/research_and_citation/chicago_manual_17th_edition/cmos_formatting_and_style_guide/general_format.html
  • https://writingcenter.uagc.edu/quoting-paraphrasing-summarizing
  • https://writingcenter.unc.edu/tips-and-tools/quotations/
  • https://academicguides.waldenu.edu/writingcenter/evidence/quotation

About This Article

Christopher Taylor, PhD

To quote in a research paper in APA style, add an in-text parenthetical citation at the end of each quote that gives the author's last name and the year the text was published. If you mention the author's name in the sentence with the quote, include only the year of publication in the citation. If you're citing a quote in MLA style, do the same thing you would for APA style, but use the page number instead of the year the text was published.


Starting Your Research Paper: Writing an Introductory Paragraph


The Dreaded Introductory Paragraph

Writing the introductory paragraph can be a frustrating and slow process, but it doesn't have to be. If you planned your paper out, then most of the introductory paragraph is already written. Now you just need a beginning and an end.

Here's an introductory paragraph for a paper I wrote. I started the paper with a factoid, presented each main point of my paper, and ended with my thesis statement.

  Breakdown:

1st Sentence: I lead with a quick factoid about comics.
2nd & 3rd Sentences: These sentences define graphic novels and give a brief history. This is also how the body of my paper starts.
4th Sentence: This sentence introduces the current issue. See how I gave the history first and now give the current issue? That's flow.
5th Sentence: Since I was pro-graphic novels, I gave the opposing (con) side first. Remember: if you're picking a side, give the other side first and then your side.
6th Sentence: Now I can give my pro-graphic novel argument.
7th Sentence: This further expands my pro-graphic novel argument.
8th Sentence: This is my thesis statement.
  • Last Updated: Feb 12, 2024 12:16 PM
  • URL: https://libguides.astate.edu/papers


Research Paper Introduction Examples


Looking for research paper introduction examples? Quotes, anecdotes, questions, examples, and broad statements—all of them can be used successfully to write an introduction for a research paper. It’s instructive to see them in action, in the hands of skilled academic writers.

Let’s begin with David M. Kennedy’s superb history, Freedom from Fear: The American People in Depression and War, 1929–1945. Kennedy begins each chapter with a quote, followed by his text. The quote above chapter 1 shows President Hoover speaking in 1928 about America’s golden future. The text below it begins with the stock market collapse of 1929. It is a riveting account of just how wrong Hoover was. The text about the Depression is stronger because it contrasts so starkly with the optimistic quotation.


“We in America today are nearer the final triumph over poverty than ever before in the history of any land.”—Herbert Hoover, August 11, 1928

Like an earthquake, the stock market crash of October 1929 cracked startlingly across the United States, the herald of a crisis that was to shake the American way of life to its foundations. The events of the ensuing decade opened a fissure across the landscape of American history no less gaping than that opened by the volley on Lexington Common in April 1775 or by the bombardment of Sumter on another April four score and six years later. The ratcheting ticker machines in the autumn of 1929 did not merely record avalanching stock prices. In time they came also to symbolize the end of an era. (David M. Kennedy, Freedom from Fear: The American People in Depression and War, 1929–1945. New York: Oxford University Press, 1999, p. 10)

Kennedy has exciting, wrenching material to work with. John Mueller faces the exact opposite problem. In Retreat from Doomsday: The Obsolescence of Major War, he is trying to explain why Great Powers have suddenly stopped fighting each other. For centuries they made war on each other with devastating regularity, killing millions in the process. But now, Mueller thinks, they have not just paused; they have stopped permanently. He is literally trying to explain why “nothing is happening now.” That may be an exciting topic intellectually, and it may have great practical significance, but “nothing happened” is not a very promising subject for an exciting opening paragraph. Mueller manages to make it exciting and, at the same time, shows why it matters so much. Here’s his opening, aptly entitled “History’s Greatest Nonevent”:

On May 15, 1984, the major countries of the developed world had managed to remain at peace with each other for the longest continuous stretch of time since the days of the Roman Empire. If a significant battle in a war had been fought on that day, the press would have bristled with it. As usual, however, a landmark crossing in the history of peace caused no stir: the most prominent story in the New York Times that day concerned the saga of a manicurist, a machinist, and a cleaning woman who had just won a big Lotto contest. This book seeks to develop an explanation for what is probably the greatest nonevent in human history. (John Mueller, Retreat from Doomsday: The Obsolescence of Major War. New York: Basic Books, 1989, p. 3)

In the space of a few sentences, Mueller sets up his puzzle and reveals its profound human significance. At the same time, he shows just how easy it is to miss this milestone in the buzz of daily events. Notice how concretely he does that. He doesn’t just say that the New York Times ignored this record-setting peace. He offers telling details about what they covered instead: “a manicurist, a machinist, and a cleaning woman who had just won a big Lotto contest.” Likewise, David Kennedy immediately entangles us in concrete events: the stunning stock market crash of 1929. These are powerful openings that capture readers’ interest, establish puzzles, and launch narratives.

Sociologist James Coleman begins in a completely different way, by posing the basic questions he will study. His ambitious book, Foundations of Social Theory, develops a comprehensive theory of social life, so it is entirely appropriate for him to begin with some major questions. But he could just as easily have begun with a compelling story or anecdote. He includes many of them elsewhere in his book. His choice for the opening, though, is to state his major themes plainly and frame them as a paradox. Sociologists, he says, are interested in aggregate behavior (how people act in groups, organizations, or large numbers), yet they mostly examine individuals:

A central problem in social science is that of accounting for the function of some kind of social system. Yet in most social research, observations are not made on the system as a whole, but on some part of it. In fact, the natural unit of observation is the individual person… This has led to a widening gap between theory and research… (James S. Coleman, Foundations of Social Theory. Cambridge, MA: Harvard University Press, 1990, pp. 1–2)

After expanding on this point, Coleman explains that he will not try to remedy the problem by looking solely at groups or aggregate-level data. That’s a false solution, he says, because aggregates don’t act; individuals do. So the real problem is to show the links between individual actions and aggregate outcomes, between the micro and the macro.

The major problem for explanations of system behavior based on actions and orientations at a level below that of the system [in this case, on individual-level actions] is that of moving from the lower level to the system level. This has been called the micro-to-macro problem, and it is pervasive throughout the social sciences. (Coleman, Foundations of Social Theory, p. 6)

Explaining how to deal with this “micro-to-macro problem” is the central issue of Coleman’s book, and he announces it at the beginning.

Coleman’s theory-driven opening stands at the opposite end of the spectrum from engaging stories or anecdotes, which are designed to lure the reader into the narrative and ease the path to a more analytic treatment later in the text. Take, for example, the opening sentences of Robert L. Herbert’s sweeping study Impressionism: Art, Leisure, and Parisian Society: “When Henry Tuckerman came to Paris in 1867, one of the thousands of Americans attracted there by the huge international exposition, he was bowled over by the extraordinary changes since his previous visit twenty years before.” (Robert L. Herbert, Impressionism: Art, Leisure, and Parisian Society. New Haven, CT: Yale University Press, 1988, p. 1.) Herbert fills in the evocative details to set the stage for his analysis of the emerging Impressionist art movement and its connection to Parisian society and leisure in this period.

David Bromwich writes about Wordsworth, a poet so familiar to students of English literature that it is hard to see him afresh, before his great achievements, when he was just a young outsider starting to write. To draw us into Wordsworth’s early work, Bromwich wants us to set aside our entrenched images of the famous mature poet and see him as he was in the 1790s, as a beginning writer on the margins of society. He accomplishes this ambitious task in the opening sentences of Disowned by Memory: Wordsworth’s Poetry of the 1790s:

Wordsworth turned to poetry after the revolution to remind himself that he was still a human being. It was a curious solution, to a difficulty many would not have felt. The whole interest of his predicament is that he did feel it. Yet Wordsworth is now so established an eminence—his name so firmly fixed with readers as a moralist of self-trust emanating from complete self-security—that it may seem perverse to imagine him as a criminal seeking expiation. Still, that is a picture we get from The Borderers and, at a longer distance, from “Tintern Abbey.” (David Bromwich, Disowned by Memory: Wordsworth’s Poetry of the 1790s. Chicago: University of Chicago Press, 1998, p. 1)

That’s a wonderful opening! Look at how much Bromwich accomplishes in just a few words. He not only prepares the way for analyzing Wordsworth’s early poetry; he juxtaposes the anguished young man who wrote it to the self-confident, distinguished figure he became—the eminent man we can’t help remembering as we read his early poetry.

Let us highlight a couple of other points in this passage because they illustrate some intelligent writing choices. First, look at the odd comma in this sentence: “It was a curious solution, to a difficulty many would not have felt.” Any standard grammar book would say that comma is wrong and should be omitted. Why did Bromwich insert it? Because he’s a fine writer, thinking of his sentence rhythm and the point he wants to make. The comma does exactly what it should. It makes us pause, breaking the sentence into two parts, each with an interesting point. One is that Wordsworth felt a difficulty others would not have; the other is that he solved it in a distinctive way. It would be easy for readers to glide over this double message, so Bromwich has inserted a speed bump to slow us down. Most of the time, you should follow grammatical rules, like those about commas, but you should bend them when it serves a good purpose. That’s what the writer does here.

The second small point is the phrase “after the revolution” in the first sentence: “Wordsworth turned to poetry after the revolution to remind himself that he was still a human being.” Why doesn’t Bromwich say “after the French Revolution”? Because he has judged his book’s audience. He is writing for specialists who already know which revolution is reverberating through English life in the 1790s. It is the French Revolution, not the earlier loss of the American colonies. If Bromwich were writing for a much broader audience—say, the New York Times Book Review—he would probably insert the extra word to avoid confusion.

The message “Know your audience” applies to all writers. Don’t talk down to them by assuming they can’t get dressed in the morning. Don’t strut around showing off your book learnin’ by tossing in arcane facts and esoteric language for its own sake. Neither will win over readers.

Bromwich, Herbert, and Coleman open their works in different ways, but their choices work well for their different texts. Your task is to decide what kind of opening will work best for yours. Don’t let that happen by default, by grabbing the first idea you happen upon. Consider a couple of different ways of opening your thesis and then choose the one you prefer. Give yourself some options, think them over, then make an informed choice.

Using the Introduction to Map out Your Writing

Whether you begin with a story, puzzle, or broad statement, the next part of the research paper introduction should pose your main questions and establish your argument. This is your thesis statement—your viewpoint along with the supporting reasons and evidence. It should be articulated plainly so readers understand full well what your paper is about and what it will argue.

After that, give your readers a road map of what’s to come. That’s normally done at the end of the introductory section (or, in a book, at the end of the introductory chapter). Here’s John J. Mearsheimer presenting such a road map in The Tragedy of Great Power Politics . He not only tells us the order of upcoming chapters, he explains why he’s chosen that order and which chapters are most important:

The Plan of the Book The rest of the chapters in this book are concerned mainly with answering the six big questions about power which I identified earlier. Chapter 2, which is probably the most important chapter in the book, lays out my theory of why states compete for power and why they pursue hegemony. In Chapters 3 and 4, I define power and explain how to measure it. I do this in order to lay the groundwork for testing my theory… (John J. Mearsheimer, The Tragedy of Great Power Politics. New York: W. W. Norton, 2001, p. 27)

As this excerpt makes clear, Mearsheimer has already laid out his “six big questions” in the research paper introduction. Now he’s showing us the path ahead, the path to answering those questions.

At the end of the research paper introduction, give your readers a road map of what’s to come. Tell them what the upcoming sections will be and why they are arranged in this particular order.


Research Blog

How to write a research paper introduction (with examples).


I hope you enjoy reading this blog post.


Welcome to our comprehensive guide on crafting the perfect introduction for your research paper. In this blog, we’ll explore the crucial elements of a strong introduction, highlight common pitfalls to avoid, and provide practical tips to effectively set the stage for your study’s objectives and significance. 

Table of Contents

  • Common mistakes to avoid:
      – Lack of a clear thesis statement
      – Lack of clear objectives and scope
      – Failure to establish the research significance
      – Insufficient background information
      – Inadequate literature review
      – Ignoring the research gap
      – Overly technical language
      – Poor organization and flow
      – Neglecting the audience
  • The importance of a good introduction

A strong introduction sets the tone for the entire paper, guiding the reader through the research journey. It provides context, establishes relevance, and ensures the reader understands the importance of the study.

Starting a research project is exciting, but getting the introduction right is key. It’s like opening the door to your study and inviting readers in. However, there are some common missteps that can trip you up along the way.

Common mistakes to avoid.

Lack of a Clear Thesis Statement

A thesis statement is the central argument or claim that guides the entire research paper. It is a concise summary of the main point or claim of the paper and is typically found at the end of the introduction. A clear thesis statement helps to focus the research, provide direction, and inform the reader of the paper’s purpose. Expert reviewers may even skip the rest of the introduction (as they are well versed in the topic) and focus only on your thesis statement, so it’s vital to make sure it is perfect!

When a research introduction lacks a clear thesis statement, several issues can arise:

  • Ambiguity: Without a clear thesis, the reader may be confused about the paper’s purpose and the main argument. Do not talk in vague terms. Whenever possible, use terminology established in recent literature. Narrow down the key aspects of the association that you are investigating (the study sample, the outcome and predictor measures) as much as possible.
  • Lack of Focus: The paper can become unfocused and meander through unrelated topics, making it difficult for the reader to follow the argument. Do not try to have more than 1-2 main aims in a paper. Even if you have done supplementary analysis, it is better to mention it in the discussion. As a rule of thumb, try to answer one major question only!
  • Weak Argumentation: A well-defined thesis provides a strong foundation for building arguments. Without it, the arguments may appear weak and unsupported.

Let's be more practical:

1- In this paper, I will discuss climate change.

  • Problem: This statement is too broad and vague. It does not provide a clear direction or specific argument.

2- This paper argues that climate change, measured by global average temperature change, is primarily driven by human activities, such as deforestation and the burning of fossil fuels, and proposes policy measures to mitigate its impact.(1)

  • Strengths:
      – Specificity: It clearly states that the paper will focus on human activities as the main drivers of climate change.
      – Argument: It presents a specific claim that the paper will argue.
      – Direction: It hints at the structure of the paper by mentioning policy measures.


Powerful Tips:

  • Be Specific: Clearly define the main argument or claim. Avoid vague or broad statements.
  • Be Concise: Keep the thesis statement concise, ideally one to two sentences.
  • Provide Direction: Indicate the structure of the paper by hinting at the main points that will be discussed.
  • Revise as Needed: Be prepared to revise the thesis statement as your research progresses and your understanding deepens.

Lack of Clear Objectives and Scope

A clear statement of objectives and scope is crucial in a research paper introduction because it outlines what the study aims to achieve and defines the boundaries within which the research will be conducted.

Example of Lacking Clear Objectives and Scope: This paper examines the impacts of climate change on agriculture.

  • Problem: This statement is too broad and vague. It does not specify what aspects of climate change or agriculture will be studied, nor does it define the geographical or temporal scope.

Example with Clear Objectives and Scope: This study aims to investigate the effects of rising temperatures and changing precipitation patterns on crop yields in the Midwest United States from 2000 to 2010. The objectives are to (1) assess the impact of temperature changes on corn and soybean yields, (2) analyze how variations in precipitation affect crop growth, and (3) identify adaptive strategies employed by farmers in the region.(2)

Powerful Tips:

  • Be Specific: Clearly state what the study aims to achieve and avoid vague or broad statements.
  • Identify Key Areas: Outline the main areas or aspects that the research will focus on.
  • Set Boundaries: Define the geographical, temporal, and conceptual boundaries of the research.
  • List Objectives: Clearly articulate the specific research objectives or questions that the study will address.
  • Stay Realistic: Ensure that the objectives and scope are achievable within the constraints of the research project.
  • Make It Flow: Avoid repeating the same concepts as the thesis statement, since these two sections often appear back-to-back in the final paragraph of the introduction. Remember: the thesis statement is your hypothesis or question, and your objectives describe ‘how’ you are going to test it.

Failure to Establish Research Significance

This mistake can result in the research appearing trivial or irrelevant, diminishing its potential impact. When the significance of the research is not well-established, readers may struggle to understand the value of the study and why they should care about it.

Example of Failure to Establish Research Significance: This study investigates the effects of social media usage on sleep patterns among teenagers.

  • Problem: The significance of studying social media’s impact on sleep patterns is not explained. The reader may wonder why this research is important or what implications it has.

Example with Established Research Significance: This study investigates the effects of social media usage on sleep patterns among teenagers. Understanding this relationship is crucial because insufficient sleep is linked to numerous health issues, including decreased academic performance, heightened stress levels, and increased risk of mental health problems. With the pervasive use of social media among adolescents, identifying how it impacts sleep can inform strategies for promoting healthier habits and improving overall well-being in this vulnerable age group.(3)

  • Link to Broader Issues: Connect the research topic to broader issues or trends that highlight its relevance and importance.
  • Explain Practical Implications: Discuss the potential practical applications or benefits of the research findings.
  • Address Gaps in Knowledge: Identify gaps in the existing literature that the research aims to fill.
  • Highlight Potential Impact: Emphasize the potential impact of the research on the field, society, or specific populations.
  • Use Concrete Examples: Provide concrete examples or scenarios to illustrate the significance of the research.

Insufficient Background Information

Insufficient background information in the introduction of a research paper refers to failing to provide enough context for the reader to understand the research problem and its significance. Background information sets the stage for the research by offering necessary details about the topic, relevant theories, previous studies, and key terms.

This may lead to:

  • Reader Confusion: Without adequate context, readers may struggle to understand the research question, its importance, and how it fits into the broader field of study.
  • Weak Justification: Insufficient background can undermine the rationale for the research, making it difficult to justify why the study is necessary or valuable.
  • Misinterpretation: Lack of context can lead to misinterpretation of the research objectives, methods, and findings.

Example of Insufficient Background Information: In recent years, many researchers have studied the effects of social media on teenagers. This paper explores the relationship between social media use and anxiety among teenagers.

  • Problem: This introduction lacks specific details about previous research, the theoretical framework, and key terms. It does not provide enough context for the reader to understand why the study is important.

Example of Adequate Background Information: Social media platforms have become an integral part of teenagers’ daily lives, with studies showing that 95% of teens have access to a smartphone and 45% are online almost constantly. Previous research has linked excessive social media use to various mental health issues, including anxiety and depression. However, the mechanisms underlying this relationship remain unclear. This paper explores the impact of social media use on anxiety levels among teenagers, focusing on the roles of social comparison and cyberbullying.(4)


  • Review Relevant Literature: Summarize key studies and theories related to your topic.
  • Provide Context: Explain the broader context of your research problem.
  • Define Key Terms: Ensure that any specialized terms or concepts are clearly defined.
  • Identify the Research Gap: Highlight what is not yet known or understood about your topic.
  • Be Concise: Provide enough information to set the stage without overwhelming the reader with details.

Inadequate Literature Review

This mistake can occur when the literature review is too brief, lacks depth, omits key studies, or fails to critically analyze previous work. An inadequate literature review can undermine the foundation of the research by failing to provide the necessary context and justification for the study.

Example of Inadequate Literature Review: There has been some research on the relationship between exercise and mental health. This paper will investigate this relationship further.

  • Problem: This review is too general and does not provide sufficient detail about the existing research or how it informs the current study.

Example with Adequate Literature Review: Research has consistently shown that regular physical activity has positive effects on mental health. For example, a study by Gujral et al. (2017) demonstrated that aerobic exercise can significantly reduce symptoms of depression and anxiety. Similarly, Smith and Lee (2020) found that strength training also contributes to improved mood and reduced stress levels. However, much of the existing research has focused on adult populations, with relatively few studies examining these effects in adolescents. Additionally, the specific types of exercise that are most beneficial for different mental health outcomes have not been thoroughly investigated. This study aims to explore the effects of various types of exercise on the mental health of high school students, thereby addressing these gaps in the literature.(5-6)

  • Be Comprehensive: Review a broad range of studies related to the research topic to provide a thorough context.
  • Be Specific: Cite specific studies, including their methodologies, findings, and relevance to the current research.
  • Be Critical: Analyze and evaluate the existing research, identifying strengths, weaknesses, and gaps.
  • Be Structured: Organize the literature review logically, grouping studies by themes or findings to create a coherent narrative.
  • Be Relevant: Focus on the most relevant studies that directly relate to the research question and objectives.

Ignoring the Research Gap

Ignoring the research gap in a research paper introduction means failing to identify and articulate what specific aspect of the topic has not been explored or adequately addressed in existing literature. The research gap is a critical component because it justifies the necessity and originality of the study. Without highlighting this gap, the research may appear redundant or lacking in significance.

How serious is this mistake?

  • Lack of Justification: The study may not appear necessary or relevant, diminishing its perceived value.
  • Redundancy: The research may seem to duplicate existing studies, offering no new insights or contributions to the field. Even if you use a methodology similar to that of previous studies, explain why (e.g., few studies have used that specific methodology, and you would like to validate it in your sample population).
  • Reader Disinterest: Readers may lose interest if they do not see the unique contribution or purpose of the research.

Example of Ignoring the Research Gap: Many studies have examined the effects of exercise on mental health. This paper looks at the relationship between physical activity and depression.

  • Problem: This introduction does not specify which aspect of the relationship between physical activity and depression has not been studied, failing to highlight the unique contribution of the research.

Example of Identifying the Research Gap: Numerous studies have demonstrated the general benefits of physical activity on mental health, particularly its role in alleviating symptoms of depression. However, there is limited research on how different types of exercise (e.g., aerobic vs. anaerobic) specifically impact depression levels among various age groups. This study investigates the differential effects of aerobic and anaerobic exercise on depression in young adults, aiming to fill this gap in the literature.(6)

  • Conduct a Thorough Literature Review: Understand the current state of research in your field to identify what has been studied and where gaps exist.
  • Be Specific: Clearly articulate which specific aspect has not been covered in existing studies.
  • Link to Your Study: Explain how your research will address this gap and contribute to the field.
  • Use Evidence: Support your identification of the gap with references to previous studies.
  • Emphasize Significance: Highlight why filling this gap is important for advancing knowledge or practical applications.

Overly Technical Language

Overly technical language refers to the excessive use of jargon, complex terms, and highly specialized language that may be difficult for readers, especially those not familiar with the field, to understand. While technical language is sometimes necessary in academic writing, overusing it in the introduction can create several problems:

  • Reader Alienation: Readers may find the text intimidating or inaccessible, leading to disengagement.
  • Lack of Clarity: The main points and significance of the research can become obscured by complex terminology.
  • Reduced Impact: The research may fail to communicate its importance effectively if readers struggle to understand the introduction.

Example of Overly Technical Language: The present study examines the metacognitive strategies employed by individuals in the domain of second language acquisition, specifically focusing on the interaction between declarative and procedural memory systems in the process of syntactic parsing.

  • Problem: This sentence is loaded with jargon (“metacognitive strategies,” “second language acquisition,” “declarative and procedural memory systems,” “syntactic parsing”), which can be overwhelming and confusing for readers not familiar with these terms.

Example with Simplified Language: This study looks at the thinking strategies people use when learning a second language. It focuses on how different types of memory, such as the knowledge of facts and the skills for doing things, help in understanding sentence structures.(7)

  • Know Your Audience: Tailor the language to the intended audience, ensuring it is accessible to both specialists and non-specialists.
  • Define Terms: When technical terms are necessary, provide clear definitions or explanations.
  • Use Analogies: Simplify complex concepts using analogies or examples that are easy to understand.
  • Avoid Jargon: Limit the use of jargon and specialized terms, especially in the introduction.
  • Seek Feedback: Ask peers or non-experts to read the introduction and provide feedback on clarity and accessibility.

Poor Organization and Flow

Poor organization and flow in a research paper introduction refer to a lack of logical structure and coherence that makes the introduction difficult to follow. This can occur when ideas are presented in a haphazard manner, transitions between sections are weak or non-existent, and the overall narrative is disjointed. A well-organized introduction should smoothly guide the reader from the general context to the specific objectives of the study.

Example of Poor Organization and Flow: “Climate change affects agriculture in various ways. Many studies have looked at the impact on crop yields. This paper will discuss the economic implications of these changes. Climate models predict increased variability in weather patterns, which will affect water availability. Researchers have found that higher temperatures reduce the growing season for many crops.”

  • Problem: The ideas are presented in a scattered manner without clear connections. The mention of economic implications seems out of place, and there are abrupt shifts between topics.

Example with Good Organization and Flow: Climate change poses significant challenges to agriculture by altering weather patterns, impacting crop yields, and affecting water availability. Numerous studies have shown that increased temperatures can shorten the growing season for many crops, leading to reduced yields. Additionally, climate models predict increased variability in weather patterns, which complicates water management for farmers. These changes not only affect food production but also have substantial economic implications for agricultural communities. This paper will examine the economic impacts of climate-induced changes in agriculture, focusing on crop yield variability and water resource management.(1)

  • Create an Outline: Before writing, outline the main points you want to cover in the introduction.
  • Think in Terms of an Inverted Triangle: Begin broadly to introduce basic concepts related to your topic. As you progress through the introduction, introduce increasingly specific topics until you have enough information to justify your thesis statement.
  • Use Transitional Phrases: Employ transitional phrases and sentences to connect ideas and sections smoothly.
  • Follow a Logical Sequence: Present information in a logical order, moving from general context to specific objectives.
  • Maintain Focus: Stay focused on the main topic and avoid introducing unrelated ideas.
  • Revise for Coherence: Review and revise the introduction to ensure that it flows well and that each part contributes to the overall narrative.

Neglecting the Audience

Neglecting the audience refers to failing to consider the background, knowledge level, and interests of the intended readers when writing the introduction of a research paper. This mistake can manifest in several ways, such as using overly technical language for a general audience, providing insufficient background information for readers unfamiliar with the topic, or failing to engage the readers’ interest.

Example of Neglecting the Audience: For experts in genomic sequencing, this study explores the epigenetic modifications resulting from CRISPR-Cas9 interventions, focusing on the methylation patterns and histone modifications observed in gene-edited cells.

  • Problem: This introduction assumes a high level of expertise in genomic sequencing and epigenetics, which may alienate readers without this background.

Example with Audience Consideration: CRISPR-Cas9 is a groundbreaking tool in genetic research that allows scientists to edit DNA with precision. However, altering genes can lead to unexpected changes in how genes are expressed, known as epigenetic modifications. This study investigates these changes by looking at specific markers on DNA, such as methylation patterns, and how they affect gene activity in cells that have been edited using CRISPR-Cas9. Our goal is to understand the broader implications of gene editing on cellular functions, which is crucial for advancing medical research and treatments.(8)

  • Identify the Audience: Determine who the intended readers are (e.g., experts, students, the general public) and tailor the language and content accordingly. Read papers from the journals you are considering for submission. Professional editors curate the language used in these papers, so they are a great starting point for gauging your audience’s level of expertise.
  • Simplify Language: Use clear and straightforward language, avoiding jargon and technical terms unless they are necessary and well explained.
  • Provide Background Information: Include sufficient background information to help readers understand the context and significance of the research.
  • Engage the Reader: Start with an engaging introduction that highlights the relevance and importance of the research topic.
  • Anticipate Questions: Consider what questions or concerns the audience might have and address them in the introduction.


By following these guidelines and avoiding common pitfalls, you can create an introduction that not only grabs the attention of your readers but also sets the stage for a compelling and impactful research paper.

Final Tips:

  • Revise and refine your introduction multiple times to ensure clarity and coherence.
  • Seek feedback from peers, mentors, or advisors to identify areas for improvement.
  • Keep your audience in mind and tailor your language and content to their needs and interests.
  • Stay focused on your research objectives and ensure that every part of your introduction contributes to achieving them.
  • Be confident in the significance of your research and its potential impact on your field or community.

Let your introduction be more than just words on a page; it is a doorway to understanding.

Wishing you success on your research journey!

Marina Ramzy Mourid, Hamza Ibad, MBBS

Dr. Ibad graduated from the Aga Khan University Medical College and completed a post-doctoral research fellowship at Johns Hopkins in the Department of Radiology (Musculoskeletal Division). Dr. Ibad’s research and clinical interests include deep-learning applications for automated image interpretation, osteoarthritis, and sarcopenia-related health outcomes.


1. Abbass K, Qasim MZ, Song H, Murshed M, Mahmood H, Younis I. A review of the global climate change impacts, adaptation, and sustainable mitigation measures. Environ Sci Pollut Res. 2022;29(28):42539-42559. doi:10.1007/s11356-022-19718-6

2. Cai X, Wang D, Laurent R. Impact of climate change on crop yield: a case study of rainfed corn in central Illinois. J Appl Meteorol Climatol. 2009;48(9):1868-1881. doi:10.1175/2009JAMC1880.1

3. Van Den Eijnden RJJM, Geurts SM, Ter Bogt TFM, Van Der Rijst VG, Koning IM. Social media use and adolescents’ sleep: a longitudinal study on the protective role of parental rules regarding internet use before sleep. Int J Environ Res Public Health. 2021;18(3):1346. doi:10.3390/ijerph18031346

4. Schmitt M. Effects of social media and technology on adolescents: what the evidence is showing and what we can do about it. Journal of Family and Consumer Sciences Education. 2021;38(1):51-59.

5. Gujral S, Aizenstein H, Reynolds CF, Butters MA, Erickson KI. Exercise effects on depression: possible neural mechanisms. Gen Hosp Psychiatry. 2017;49:2-10. doi:10.1016/j.genhosppsych.2017.04.012

6. Smith PJ, Merwin RM. The role of exercise in management of mental health disorders: an integrative review. Annu Rev Med. 2021;72(1):45-62. doi:10.1146/annurev-med-060619-022943

7. Sun Q, Zhang LJ. Understanding learners’ metacognitive experiences in learning to write in English as a foreign language: a structural equation modeling approach. Front Psychol. 2022;13:986301. doi:10.3389/fpsyg.2022.986301

8. Kolanu ND. CRISPR-Cas9 gene editing: curing genetic diseases by inherited epigenetic modifications. Glob Med Genet. 2024;11(1):113-122. doi:10.1055/s-0044-1785234

SciSpace Resources

How to Write an Introduction for a Research Paper

Sumalatha G


The introduction is a critical element of a research paper, but condensing a large amount of information into a concise form can seem challenging. The introduction sets the tone for your research and provides the context for your study. In this article, we will guide you through the process of writing an effective introduction that grabs the reader's attention and captures the essence of your research paper.

Understanding the Purpose of a Research Paper Introduction

The introduction acts as a road map for your research paper, guiding the reader through the main ideas and arguments. Its purpose is to present your research topic to the reader and explain why your study is relevant. It helps the reader situate your research within the broader body of related scientific work. The introduction should also state the objectives and scope of your study, giving the reader an overview of what to expect in the paper. A comprehensive introduction establishes your credibility as an author and convinces the reader that your research is worth their time and attention.

Key Elements to Include in Your Introduction

When writing your research paper introduction, there are several key elements you should include to ensure it is comprehensive and informative.

  • A hook or attention-grabbing statement to capture the reader's interest. It can be a thought-provoking question, a surprising statistic, or a compelling anecdote that relates to your research topic.
  • A brief overview of the research topic and its significance. By highlighting the gap in existing knowledge or the problem your research aims to address, you create a compelling case for the relevance of your study.
  • A clear research question or problem statement. This serves as the foundation of your research and guides the reader in understanding the unique focus of your study. It should be concise, specific, and clearly articulated.
  • An outline of the paper's structure and main arguments, to help the readers navigate through the paper with ease.

Preparing to Write Your Introduction

Before diving into writing your introduction, it is essential to prepare adequately. This involves 3 important steps:

  • Conducting Preliminary Research: Immerse yourself in the existing literature to develop a clear research question and position your study within the academic discourse.
  • Identifying Your Thesis Statement: Define a specific, focused, and debatable thesis statement, serving as a roadmap for your paper.
  • Considering Broader Context: Reflect on the significance of your research within your field, understanding its potential impact and contribution.

By engaging in these preparatory steps, you can ensure that your introduction is well-informed, focused, and sets the stage for a compelling research paper.

Structuring Your Introduction

Now that you have prepared yourself to tackle the introduction, it's time to structure it effectively. A well-structured introduction will engage the reader from the beginning and provide a logical flow to your research paper.

Starting with a Hook

Begin your introduction with an attention-grabbing hook that captivates the reader's interest. This hook serves as a way to make your introduction more engaging and compelling. For example, if you are writing a research paper on the impact of climate change on biodiversity, you could start your introduction with a statistic about the number of species that have gone extinct due to climate change. This will immediately grab the reader's attention and make them realize the urgency and importance of the topic.

Introducing Your Topic

Provide a brief overview, which should give the reader a general understanding of the subject matter and its significance. Explain the importance of the topic and its relevance to the field. This will help the reader understand why your research is significant and why they should continue reading. Continuing with the example of climate change and biodiversity, you could explain how climate change is one of the greatest threats to global biodiversity, how it affects ecosystems, and the potential consequences for both wildlife and human populations. By providing this context, you are setting the stage for the rest of your research paper and helping the reader understand the importance of your study.

Presenting Your Thesis Statement

The thesis statement should directly address your research question and preview the main arguments or findings discussed in your paper. Make sure your thesis statement is clear, concise, and well supported by the evidence you will present. A strong, focused thesis statement tells the reader what to expect in the rest of the paper. This helps them understand the purpose and scope of your study and makes them more inclined to continue reading.

Writing Techniques for an Effective Introduction

When crafting an introduction, it is crucial to pay attention to the finer details that can elevate your writing to the next level. By utilizing specific writing techniques, you can captivate your readers and draw them into your research journey.

Using Clear and Concise Language

One of the most important writing techniques to employ in your introduction is the use of clear and concise language. By choosing your words carefully, you can effectively convey your ideas to the reader. It is essential to avoid using jargon or complex terminology that may confuse or alienate your audience. Instead, focus on communicating your research in a straightforward manner to ensure that your introduction is accessible to both experts in your field and those who may be new to the topic. This approach allows you to engage a broader audience and make your research more inclusive.

Establishing the Relevance of Your Research

One way to establish the relevance of your research is by highlighting how it fills a gap in the existing literature. Explain how your study addresses a significant research question that has not been adequately explored. By doing this, you demonstrate that your research is not only unique but also contributes to the broader knowledge in your field. Furthermore, it is important to emphasize the potential impact of your research. Whether it is advancing scientific understanding, informing policy decisions, or improving practical applications, make it clear to the reader how your study can make a difference.

By employing these two writing techniques in your introduction, you can effectively engage your readers. Take your time to craft an introduction that is both informative and captivating, leaving your readers eager to delve deeper into your research.

Revising and Polishing Your Introduction

Once you have written your introduction, it is crucial to revise and polish it to ensure that it effectively sets the stage for your research paper.

Self-Editing Techniques

Review your introduction for clarity, coherence, and logical flow. Ensure each paragraph introduces a new idea or argument with smooth transitions.

Check for grammatical errors, spelling mistakes, and awkward sentence structures.

Ensure that your introduction aligns with the overall tone and style of your research paper.

Seeking Feedback for Improvement

Consider seeking feedback from peers, colleagues, or your instructor. They can provide valuable insights and suggestions for improving your introduction. Be open to constructive criticism and use it to refine your introduction and make it more compelling for the reader.

Writing an introduction for a research paper requires careful thought and planning. By understanding the purpose of the introduction, preparing adequately, structuring effectively, and employing sound writing techniques, you can create an engaging and informative introduction for your research. Remember to revise and polish your introduction to ensure that it accurately represents the main ideas and arguments in your research paper. With a well-crafted introduction, you will capture the reader's attention and keep them engaged with your paper.

Quoting and integrating sources into your paper

In any study of a subject, people engage in a “conversation” of sorts, where they read or listen to others’ ideas, consider them with their own viewpoints, and then develop their own stance. It is important in this “conversation” to acknowledge when we use someone else’s words or ideas. If we didn’t come up with it ourselves, we need to tell our readers who did come up with it.

It is important to draw on the work of experts to formulate your own ideas. Quoting and paraphrasing authors who write about your topic adds expert support to your argument and thesis statement. Through your writing, you contribute to a conversation among scholars who are experts on your topic. This is what distinguishes a scholarly research paper from other papers: you must bring your own voice to your analysis and ideas alongside those of scholars or experts.

All your sources must relate to your thesis, or central argument, whether they are in agreement or not. It is a good idea to address all sides of the argument or thesis to make your stance stronger. There are two main ways to incorporate sources into your research paper.

Quoting is when you use the exact words from a source. You will need to put quotation marks around the words that are not your own and cite where they came from. For example:

“It wasn’t really a tune, but from the first note the beast’s eyes began to droop . . . Slowly the dog’s growls ceased – it tottered on its paws and fell to its knees, then it slumped to the ground, fast asleep” (Rowling 275).

Follow these guidelines when choosing to quote a passage:

  • Choose to quote passages that seem especially well phrased or are unique to the author or subject matter.
  • Be selective in your quotations. Avoid over-quoting. You also don’t have to quote an entire passage. Use ellipses (. . .) to indicate omitted words. Check with your professor for their ideal length of quotations – some professors place word limits on how much of a sentence or paragraph you should quote.
  • Before or after quoting a passage, include an explanation that interprets the significance of the quote for the reader. Avoid “hanging quotes” that have no context or introduction. It is better to spell out your point explicitly than to assume readers will follow your thought process exactly.
  • If you are having trouble paraphrasing (putting something into your own words), that may be a sign that you should quote it.
  • Shorter quotes are generally incorporated into the flow of a sentence while longer quotes may be set off in “blocks.” Check your citation handbook for quoting guidelines.

Paraphrasing is when you state the ideas from another source in your own words. Even when you use your own words, if the ideas or facts came from another source, you need to cite where they came from. Quotation marks are not used. For example:

With the simple music of the flute, Harry lulled the dog to sleep (Rowling 275).

Follow these guidelines when opting to paraphrase a passage:

  • Don’t take a passage and change a word here or there. You must write out the idea in your own words. Simply changing a few words from the original source or restating the information exactly using different words is considered plagiarism.
  • Read the passage, reflect upon it, and restate it in a way that is meaningful to you within the context of your paper. You are using this to back up a point you are making, so your paraphrased content should be tailored to that point specifically.
  • After reading the passage that you want to paraphrase, look away from it, and imagine explaining the main point to another person.
  • After paraphrasing the passage, go back and compare it to the original. Are there any phrases that have come directly from the original source? If so, you should rephrase it or put the original in quotation marks. If you cannot state an idea in your own words, you should use the direct quotation.

A summary is similar to paraphrasing, but used in cases where you are trying to give an overview of many ideas. As in paraphrasing, quotation marks are not used, but a citation is still necessary. For example:

Through a combination of skill and their invisibility cloak, Harry, Ron, and Hermione slipped through Hogwarts to the dog’s room and down through the trapdoor within (Rowling 271-77).

Important guidelines

When integrating a source into your paper, remember to use these three important components:

  • Introductory phrase to the source material: mention the author, date, or any other relevant information when introducing a quote or paraphrase.
  • Source material: a direct quote, paraphrase, or summary with proper citation.
  • Analysis of source material: your response, interpretations, or arguments regarding the source material should introduce or follow it. When incorporating source material into your paper, relate your source and analysis back to your original thesis.

Ideally, papers will contain a good balance of direct quotations, paraphrasing and your own thoughts. Too much reliance on quotations and paraphrasing can make it seem like you are only using the work of others and have no original thoughts on the topic.

Always properly cite an author’s original idea, whether you have directly quoted or paraphrased it. If you have questions about how to cite properly in your chosen citation style, browse these citation guides. You can also review our guide to understanding plagiarism.

University Writing Center

The University of Nevada, Reno Writing Center provides helpful guidance on quoting and paraphrasing and explains how to make sure your paraphrasing does not veer into plagiarism. If you have any questions about quoting or paraphrasing, or need help at any point in the writing process, schedule an appointment with the Writing Center.

Works Cited

Rowling, J.K. Harry Potter and the Sorcerer's Stone. A.A. Levine Books, 1998.


Is it bad to start an introduction with a direct quote?

I'm an undergraduate statistics student writing an undergraduate thesis about genetics. My objectives are already well defined, and I will write the introduction in line with them.

I found a text on a website, with the author's name and publication date, that would be nice to put in my introduction. Is it bad to start an introduction with a citation?

EDIT: It is a direct quote.


  • Perhaps this is tangential but monograph (which generally refers to a single-author book) can't be the right word for an undergraduate thesis (at least in AmE). –  virmaior Commented Aug 28, 2016 at 13:53
  • Do you mean literally the very first words? I put a long quote at the end of the first paragraph of my thesis and I think it was fine, but it wasn't the very first words; it was more the eye-catcher on the first page of the Introduction. –  yo' Commented Aug 28, 2016 at 19:56
  • Are you quoting a website, or are you quoting a book which was also cited by this website? If the latter, you most certainly need to follow up and get the book and check that the quote exists, is in fact the exact wording you're using, and that you're comfortable with the work as your initial building block - it says a lot about what you're doing, particularly to people who have read it in full (have you?). –  E.P. Commented Aug 28, 2016 at 19:58
  • Also, compare the following two narratives: "I found this great quote in some website and I just decided to pop it into my thesis, and as my star starting lines no less", versus "I chased up this quote from some website, which led me to this great book which I read and I thought it was awesome, so I found the bit I liked best and I quoted it in my thesis". Spot the difference? –  E.P. Commented Aug 28, 2016 at 20:00


While I am not aware of any hard and fast rule against it, I would find it off-putting to read a scientific document that started its prose with a direct quote. There are just so many ways to start with one's own words that it feels lazy and lacking in self-confidence to me to use another's words for the first impression of a reader.

Note, however, that I am speaking of using a quotation as the first piece of one's prose, as distinct from the practice of putting a separate "thematic" quotation at the start of a document or chapter. I see this somewhat more often, and though I find it a bit "cute" in scientific work, I don't have a problem with it the way I would if somebody starts their prose with a quotation.


I generally am not crazy about starting anything with a direct quote unless it absolutely adds value. It's easy when you use this particular trope to just use a quote for the sake of using a quote, and often it comes off as cheesy. In the context of a scientific paper I would probably shy away from doing this, unless you have some sort of reason that you want to "lighten up" your content. Honestly, I really only have found it to be done tastefully with fiction.


It depends a lot on the purpose of the quote. If it is there to replace your own explanations and thoughts, then it is bad. However, a short quote from a respected source that emphasizes the importance or the difficulty of the problem discussed in the thesis can look far more convincing than any of your own blah-blah-blah in this respect, especially if you later have something to say about the problem that other people could not. In general, use your own words when speaking about the nitty-gritty of what you do, but refer to other people's opinions, if you can, when you discuss meta-issues such as why it is worth doing. In the first case, excessive quotation creates the impression that you are just parroting somebody else's work; in the second case, the absence of any appeal to external opinion may make you look as if you are desperately trying to sell something no one else cares about, especially if you use a lot of standard buzzwords (another thing I would rather abstain from unless they are part of a direct quote).


The introduction of any document should show readers why the work is important and necessary. If the quote speaks to that importance, you can use it, introducing it with a sentence of your own. Generally speaking, direct quotes are not well accepted in academic writing, with the exception of quotes from pioneers of the field. Instead, you can convey the point of the quote in your own words, with a citation to the original document.


Writing Studio

Who Said What? Introducing and Contextualizing Quotations


Quotations (as well as paraphrases and summaries) play an essential role in academic writing, from literary analyses to scientific research papers; they are part of a writer’s ever-important evidence, or support, for his or her argument.

But oftentimes, writers aren’t sure how to incorporate quotes and thus shove them into paragraphs without much attention to logic or style.

For better quotations (and better writing), try these tips.

Identify Clearly Where the Borrowed Material Begins

The quotation should include a signal phrase, or introductory statement, which tells the reader whom or what you are citing. The phrase may indicate the author’s name or credentials, the title of the source, and/or helpful background information.

Sample signal phrases

  • According to (author/article)
  • Author + verb

Some key verbs for signal phrases

  • says, writes, accepts, criticizes, describes, disagrees, discusses, explains, identifies, insists, offers, points out, suggests, warns

Two Signal Phrase Examples

  • According to scholar Mary Poovey, Shelley’s narrative structure, which allows the creature to speak from a first-person point of view, forces the reader “to identify with [the creature’s] anguish and frustration” (259).
  • In an introduction to Frankenstein in 1831, the author Mary Shelley describes even her own creative act with a sense of horror: “The idea so possessed my mind, that a thrill of fear ran through me, and I wished to exchange that ghastly image of my fancy for the realities around” (172).

Create Context for the Material

Don’t just plop in quotes and expect the reader to understand. Explain, expand, or refute the quote. Remember, quotations should be used to support your ideas and points.

Here’s one simple, useful pattern: Introduce quote, give quote, explain quote.

“Introduce, Give, Explain” Example 1

[Introduce] Dorianne Laux’s “Girl in the Doorway” uses many metaphors to evoke a sense of change between the mother and daughter: [Give] “I stand at the dryer, listening/through the thin wall between us, her voice/rising and falling as she describes her new life” (3-5). [Explain] The “thin wall” is literal but also references their communication barrier; “rising and falling” is the sound of the girl’s voice but also a reference to her tumultuous preteen emotions.

“Introduce, Give, Explain” Example 2 (longer block quotation)

[Introduce] After watching the cottagers with pleasure, Frankenstein’s creature has a startling moment of revelation and horror when he sees his own reflection for the first time:

[Give] I had admired the perfect forms of my cottagers — their grace, beauty, and delicate complexions: but how was I terrified, when I viewed myself in a transparent pool! At first I started back, unable to believe that it was indeed I who was reflected in the mirror; and when I became fully convinced that I was in reality the monster that I am, I was filled with the bitterest sensations of despondence and mortification. Alas! I did not yet entirely know the fatal effects of this miserable deformity. (76)

[Explain] This literal moment of reflection is key in the creature’s growing reflection of self: In comparing himself with humans, he sees himself not just as different but as “the monster that I am.”

Additional Advice

Pay attention to proper format and grammar (see the VU Writing Studio handout Quotation Basics: Grammar, Punctuation, and Style), and always, always credit your source in order to avoid plagiarism.

Citation styles (e.g. MLA, APA, or Chicago) vary by discipline. Ask your professor if you are uncertain, and then check style guides for formats. (The above examples use MLA format.)

Last revised: 06/2008 | Adapted for web delivery: 06/2021


How to Quote | Citing Quotes in APA, MLA & Chicago

Published on April 15, 2022 by Shona McCombes and Jack Caulfield. Revised on May 31, 2023.

Quoting means copying a passage of someone else’s words and crediting the source. To quote a source, you must ensure:

  • The quoted text is enclosed in quotation marks or formatted as a block quote
  • The original author is correctly cited
  • The text is identical to the original

The exact format of a quote depends on its length and on which citation style you are using. Quoting and citing correctly is essential to avoid plagiarism, which plagiarism checkers can easily detect.


Every time you quote, you must cite the source correctly. This looks slightly different depending on the citation style you’re using. Three of the most common styles are APA, MLA, and Chicago.

Citing a quote in APA Style

To cite a direct quote in APA, you must include the author’s last name, the year, and a page number, all separated by commas. If the quote appears on a single page, use “p.”; if it spans a page range, use “pp.”

An APA in-text citation can be parenthetical or narrative. In a parenthetical citation, you place all the information in parentheses after the quote. In a narrative citation, you name the author in your sentence (followed by the year), and place the page number after the quote.

Punctuation marks such as periods and commas are placed after the citation, not within the quotation marks.

  • Evolution is a gradual process that “can act only by very short and slow steps” (Darwin, 1859, p. 510).
  • Darwin (1859) explains that evolution “can act only by very short and slow steps” (p. 510).
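To make the APA rule above concrete (last name, year, and “p.” or “pp.”, with only the page left in parentheses for a narrative citation), here is a minimal Python sketch. The function name apa_quote_citation and its parameters are hypothetical, invented for this illustration; it handles a single author only and assumes page ranges are written with an en dash.

```python
from typing import Optional


def apa_quote_citation(author: str, year: int, first_page: int,
                       last_page: Optional[int] = None,
                       narrative: bool = False) -> str:
    """Format the parenthetical that follows a direct quote in APA Style.

    Hypothetical helper: single author only. Single-page quotes use "p.";
    quotes spanning a range use "pp." with an en dash between page numbers.
    """
    if last_page is None or last_page == first_page:
        pages = f"p. {first_page}"
    else:
        pages = f"pp. {first_page}\u2013{last_page}"
    if narrative:
        # Narrative citation: the sentence already names the author and year,
        # e.g. "Darwin (1859) explains that ...", so only the page remains.
        return f"({pages})"
    # Parenthetical citation: author, year, and page, separated by commas.
    return f"({author}, {year}, {pages})"


print(apa_quote_citation("Darwin", 1859, 510))                  # (Darwin, 1859, p. 510)
print(apa_quote_citation("Darwin", 1859, 510, narrative=True))  # (p. 510)
```

The two calls reproduce the parenthetical and narrative examples above; the sentence-ending period still goes after the closing parenthesis.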


Citing a quote in MLA Style

An MLA in-text citation includes only the author’s last name and a page number. As in APA, it can be parenthetical or narrative, and a period (or other punctuation mark) appears after the citation.

  • Evolution is a gradual process that “can act only by very short and slow steps” (Darwin 510).
  • Darwin explains that evolution “can act only by very short and slow steps” (510).


Citing a quote in Chicago Style

Chicago style uses Chicago footnotes to cite sources. A note, indicated by a superscript number placed directly after the quote, specifies the author, title, and page number, or sometimes fuller information.

Unlike with parenthetical citations, in this style, the period or other punctuation mark should appear within the quotation marks, followed by the footnote number.

  • Evolution is a gradual process that “can act only by very short and slow steps.”¹

1. Darwin, …, 510.


Make sure you integrate quotes properly into your text by introducing them in your own words, showing the reader why you’re including the quote, and providing any context necessary to understand it. Don’t present quotations as stand-alone sentences.

There are three main strategies you can use to introduce quotes in a grammatically correct way:

  • Add an introductory sentence
  • Use an introductory signal phrase
  • Integrate the quote into your own sentence

The following examples use APA Style citations, but these strategies can be used in all styles.

Introductory sentence

Introduce the quote with a full sentence ending in a colon. Don’t use a colon if the text before the quote isn’t a full sentence.

If you name the author in your sentence, you may use present-tense verbs, such as “states,” “argues,” “explains,” “writes,” or “reports,” to describe the content of the quote.

  • Incorrect: In Denmark, a recent poll shows that: “A membership referendum held today would be backed by 55 percent of Danish voters” (Levring, 2018, p. 3).
  • Correct: In Denmark, a recent poll shows that support for the EU has grown since the Brexit vote: “A membership referendum held today would be backed by 55 percent of Danish voters” (Levring, 2018, p. 3).
  • Correct: Levring (2018) reports that support for the EU has grown since the Brexit vote: “A membership referendum held today would be backed by 55 percent of Danish voters” (p. 3).

Introductory signal phrase

You can also use a signal phrase that mentions the author or source, but doesn’t form a full sentence. In this case, you follow the phrase with a comma instead of a colon.

  • According to a recent poll, “A membership referendum held today would be backed by 55 percent of Danish voters” (Levring, 2018, p. 3).
  • As Levring (2018) explains, “A membership referendum held today would be backed by 55 percent of Danish voters” (p. 3).

Integrated into your own sentence

To quote a phrase that doesn’t form a full sentence, you can also integrate it as part of your sentence, without any extra punctuation.

  • A recent poll suggests that EU membership “would be backed by 55 percent of Danish voters” in a referendum (Levring, 2018, p. 3).
  • Levring (2018) reports that EU membership “would be backed by 55 percent of Danish voters” in a referendum (p. 3).

When you quote text that itself contains another quote, this is called a nested quotation or a quote within a quote. It may occur, for example, when quoting dialogue from a novel.

To distinguish this quote from the surrounding quote, you enclose it in single (instead of double) quotation marks (even if this involves changing the punctuation from the original text). Make sure to close both sets of quotation marks at the appropriate moments.

Note that if you only quote the nested quotation itself, and not the surrounding text, you can just use double quotation marks.

  • Incorrect: Carraway introduces his narrative by quoting his father: ““Whenever you feel like criticizing anyone,” he told me, “just remember that all the people in this world haven’t had the advantages that you’ve had”” (Fitzgerald 1).
  • Incorrect: Carraway introduces his narrative by quoting his father: “‘Whenever you feel like criticizing anyone,’ he told me, ‘just remember that all the people in this world haven’t had the advantages that you’ve had” (Fitzgerald 1).
  • Correct: Carraway introduces his narrative by quoting his father: “‘Whenever you feel like criticizing anyone,’ he told me, ‘just remember that all the people in this world haven’t had the advantages that you’ve had’” (Fitzgerald 1).
  • Correct: Carraway begins by quoting his father’s invocation to “remember that all the people in this world haven’t had the advantages that you’ve had” (Fitzgerald 1).

Note: When the quoted text in the source comes from another source, it’s best to find that original source and quote it directly. If you can’t find the original source, you can instead cite it indirectly.

Often, incorporating a quote smoothly into your text requires you to make some changes to the original text. It’s fine to do this, as long as you clearly mark the changes you’ve made to the quote.

Shortening a quote

If some parts of a passage are redundant or irrelevant, you can shorten the quote by removing words, phrases, or sentences and replacing them with an ellipsis (…). Put a space before and after the ellipsis.

Be careful that removing the words doesn’t change the meaning. The ellipsis indicates that some text has been removed, but the shortened quote should still accurately represent the author’s point.

Altering a quote

You can add or replace words in a quote when necessary. This might be because the original text doesn’t fit grammatically with your sentence (e.g., it’s in a different verb tense), or because extra information is needed to clarify the quote’s meaning.

Use brackets to distinguish words that you have added from words that were present in the original text.

The Latin term “sic” is used to indicate a (factual or grammatical) mistake in a quotation. It shows the reader that the mistake is from the quoted material, not a typo of your own.

In some cases, it can be useful to italicize part of a quotation to add emphasis, showing the reader that this is the key part to pay attention to. Use the phrase “emphasis added” to show that the italics were not part of the original text.

You usually don’t need to use brackets to indicate minor changes to punctuation or capitalization made to ensure the quote fits the style of your text.


If you quote more than a few lines from a source, you must format it as a block quote. Instead of using quotation marks, you set the quote on a new line and indent it so that it forms a separate block of text.

Block quotes are cited just like regular quotes, except that if the quote ends with a period, the citation appears after the period.

To the end of his days Bilbo could never remember how he found himself outside, without a hat, a walking-stick or any money, or anything that he usually took when he went out; leaving his second breakfast half-finished and quite unwashed-up, pushing his keys into Gandalf’s hands, and running as fast as his furry feet could carry him down the lane, past the great Mill, across The Water, and then on for a mile or more. (16)

Avoid relying too heavily on quotes in academic writing. To integrate a source, it’s often best to paraphrase, which means putting the passage in your own words. This helps you integrate information smoothly and keeps your own voice dominant.

However, there are some situations in which quoting is more appropriate.

When focusing on language

If you want to comment on how the author uses language (for example, in literary analysis), it’s necessary to quote so that the reader can see the exact passage you are referring to.

When giving evidence

To convince the reader of your argument, interpretation or position on a topic, it’s often helpful to include quotes that support your point. Quotes from primary sources (for example, interview transcripts or historical documents) are especially credible as evidence.

When presenting an author’s position or definition

When you’re referring to secondary sources such as scholarly books and journal articles, try to put others’ ideas in your own words when possible.

But if a passage does a great job of expressing, explaining, or defining something, and it would be very difficult to paraphrase without changing the meaning or weakening the idea’s impact, it’s worth quoting directly.


A quote is an exact copy of someone else’s words, usually enclosed in quotation marks and credited to the original author or speaker.

In academic writing, there are three main situations where quoting is the best choice:

  • To analyze the author’s language (e.g., in a literary analysis essay)
  • To give evidence from primary sources
  • To accurately present a precise definition or argument

Don’t overuse quotes; your own voice should be dominant. If you just want to provide information from a source, it’s usually better to paraphrase or summarize.

Every time you quote a source, you must include a correctly formatted in-text citation. This looks slightly different depending on the citation style.

For example, a direct quote in APA is cited like this: “This is a quote” (Streefkerk, 2020, p. 5).

Every in-text citation should also correspond to a full reference at the end of your paper.

A block quote is a long quote formatted as a separate “block” of text. Instead of using quotation marks, you place the quote on a new line and indent the entire quote to mark it apart from your own words.

The rules for when to apply block quote formatting depend on the citation style:

  • APA block quotes are 40 words or longer.
  • MLA block quotes are more than 4 lines of prose or 3 lines of poetry.
  • Chicago block quotes are longer than 100 words.
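The thresholds listed above can be encoded as a small lookup, sketched below in Python under stated assumptions. The function needs_block_quote is hypothetical; it expects you to have already counted the quote's words (for APA and Chicago) or its lines as they will appear in your formatted document (for MLA), because MLA's rule is stated in lines rather than words.

```python
def needs_block_quote(style: str, word_count: int = 0,
                      line_count: int = 0, is_poetry: bool = False) -> bool:
    """Return True if a quote should be set off as a block quote.

    Hypothetical helper using the thresholds listed above:
      APA     - 40 words or more
      MLA     - more than 4 lines of prose, or more than 3 lines of poetry
      Chicago - more than 100 words
    """
    key = style.strip().lower()
    if key == "apa":
        return word_count >= 40
    if key == "mla":
        # MLA counts lines as rendered in your paper, not words.
        return line_count > (3 if is_poetry else 4)
    if key == "chicago":
        return word_count > 100
    raise ValueError(f"unsupported citation style: {style!r}")


# A 45-word quote in an APA paper should be formatted as a block quote.
print(needs_block_quote("APA", word_count=45))  # True
```

Note that the APA cutoff is inclusive (exactly 40 words already counts), while the MLA and Chicago cutoffs require exceeding the stated number.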

If you’re quoting from a text that paraphrases or summarizes other sources and cites them in parentheses, APA and Chicago both recommend retaining the citations as part of the quote. However, MLA recommends omitting citations within a quote:

  • APA: Smith states that “the literature on this topic (Jones, 2015; Sill, 2019; Paulson, 2020) shows no clear consensus” (Smith, 2019, p. 4).
  • MLA: Smith states that “the literature on this topic shows no clear consensus” (Smith 4).

Footnote or endnote numbers that appear within quoted text should be omitted in all styles.

If you want to cite an indirect source (one you’ve only seen quoted in another source), either locate the original source or use the phrase “as cited in” in your citation.

In scientific subjects, the information itself is more important than how it was expressed, so quoting should generally be kept to a minimum. In the arts and humanities, however, well-chosen quotes are often essential to a good paper.

In social sciences, it varies. If your research is mainly quantitative, you won’t include many quotes, but if it’s more qualitative, you may need to quote from the data you collected.

As a general guideline, quotes should take up no more than 5–10% of your paper. If in doubt, check with your instructor or supervisor how much quoting is appropriate in your field.
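To check a draft against the 5–10% guideline, you could estimate what fraction of its words sit inside double quotation marks. The Python sketch below is only an approximation under stated assumptions: the function name quoted_word_share is hypothetical, it counts words between matching double quotes (curly or straight) only, and it ignores block quotes, which carry no quotation marks and would need to be added separately.

```python
import re

# Matches text between curly double quotes (“...”) or straight double quotes ("...").
QUOTED_SPAN = re.compile(r'“([^”]*)”|"([^"]*)"')


def quoted_word_share(text: str) -> float:
    """Estimate the fraction of words in `text` that appear inside double quotes.

    Hypothetical helper: block quotes (no quotation marks) are not counted.
    """
    total_words = len(text.split())
    if total_words == 0:
        return 0.0
    quoted_words = 0
    for match in QUOTED_SPAN.finditer(text):
        # Exactly one of the two groups participates in each match.
        inner = match.group(1) if match.group(1) is not None else match.group(2)
        quoted_words += len(inner.split())
    return quoted_words / total_words


draft = 'Darwin explains that evolution “can act only by very short and slow steps” (510).'
print(f"{quoted_word_share(draft):.0%}")  # 64%
```

A single sentence built around a quote will naturally score high; the figure is meaningful only when run over a whole paper.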

Cite this Scribbr article


McCombes, S. & Caulfield, J. (2023, May 31). How to Quote | Citing Quotes in APA, MLA & Chicago. Scribbr. Retrieved July 30, 2024, from https://www.scribbr.com/working-with-sources/how-to-quote/

Is this article helpful?

Shona McCombes


Words that introduce Quotes or Paraphrases


Remember that you are required to cite your sources for both paraphrases and direct quotes. For more information on MLA Style, APA Style, Chicago Style, ASA Style, CSE Style, and I-Search Format, refer to our Gallaudet TIP Citations and References link.

Words that introduce quotes or paraphrases fall into three basic types of verbs:

  • Neutral Verbs
  • Stronger Verbs
  • Inference Verbs

Neutral Verbs: When used to introduce a quote, the following verbs basically mean “says.”

Examples of Neutral Verbs

The author says. The author notes. The author believes. The author observes. The author comments. The author relates. The author declares. The author remarks. The author discusses. The author reports. The author explains. The author reveals. The author expresses. The author states. The author mentions. The author acknowledges. The author suggests. The author thinks. The author points out. The author responds. The author shows. The author confirms.

Sample Sentences

  • Dr. Billow says that being exposed to television violence at a young age desensitizes children to violence in real life (author’s last name p.##).
  • As the author notes, “In an ideal classroom, both gifted children and learning disabled children should feel challenged” (p.##).
  • Burdow believes that being able to write using proper English grammar is an important skill (author’s last name p.##).
  • Dr. Patel observes that “most people tend to respond well to hypnotherapy” (p. ##).
  • We see this self-doubt again in the second scene, when Agatha comments, “Oh, times like this I just don’t know whether I am right or wrong, good or bad” (p. ##).
  • Geoff then relates that his childhood was “the time he learned to live on less than bread alone” (p. ##).
  • The author declares, “All people, rich or poor, should pay the same taxes to the government” (p. ##).
  • Godfried remarks, “Ignorance is a skill learned by many of the greatest fools” (author’s last name p.##).
  • The article discusses the qualities of a good American housewife in the 1950s (author’s last name p.##).
  • After the war is over, the General reports that “[i]t seemed a useless battle to fight even from the start” (p.##).
  • Danelli explains, “All mammals have hair” (p.##).
  • The author reveals his true feelings with his ironic remark that we should “just resort to cannibalism to defeat world hunger” (p. ##).
  • Forton expresses disapproval of the American welfare system (author’s last name, year, p. ##).
  • The author states that “[m]ore than fifty percent of all marriages end in divorce” (p. ##).
  • He also mentions, “Many children grow up feeling responsible for their parents’ mistakes” (p. ##).
  • Jones acknowledges that although the divorce rate is increasing, most young children still dream of getting married (author’s last name, year, p. ##).
  • The author suggests that we hone our English skills before venturing into the work force (author’s last name, year, p. ##).
  • The author thinks that the recent weather has been too hot (author’s last name, year, p. ##).
  • Folsh points out that there were hundreds of people from varying backgrounds at the convention (author’s last name, year, p. ##).
  • Julia Hertz responds to allegations that her company was aware of the faulty tires on its cars (author’s last name, year, p. ##).
  • His research shows that 7% of Americans suffer from Social Anxiety Disorder (author’s last name, year, p. ##).
  • Jostin’s research confirms his earlier hypothesis: mice really are smarter than rats (author’s last name, year, p. ##).

Stronger Verbs: These verbs indicate that there is some kind of argument, and that the quote shows either support of or disagreement with one side of the argument.

Examples of Stronger Verbs

The author agrees. The author rejects. The author argues. The author compares (the two studies). The author asserts. The author admits. The author cautions. The author disputes. The author emphasizes. The author contends. The author insists. The author denies. The author maintains. The author refutes. The author claims. The author endorses.

Sample Sentences MLA Style

  • Despite criticism, Johnston agrees that smoking should be banned in all public places (author’s last name p.##).
  • The author argues that “subjecting non-smokers to toxic second-hand smoke is not only unfair, but a violation of their right to a safe environment” (p.##).
  • Vick asserts that “cigarette smoke is unpleasant, and dangerous” (p.##).
  • The author cautions that “people who subject themselves to smoky bars night after night could develop illnesses such as emphysema or lung cancer” (p.##).
  • Rosentrhaw emphasizes that “second-hand smoke can kill” (p.##).
  • Still, tobacco company executives insist that they “were not fully aware of the long term damages caused by smoking” when they launched their nationwide advertising campaign (author’s last name p.##).
  • Though bar owners disagree, Johnston maintains that banning smoking in all public places will not negatively affect bar business (author’s last name p.##).
  • Jefferson claims that banning smoking in public places will hurt America’s economy (author’s last name p.##).
  • Johnson refutes allegations that his personal finances have been in trouble for the past five years (author’s last name, year, p. ##).
  • Whiley rejects the idea that the earth could have been formed by a massive explosion in space (author’s last name, year, p. ##).
  • Lucci compares the house prices in Maryland, Virginia, and the District of Columbia (author’s last name, year, p. ##).
  • Although they have stopped short of admitting that smoking causes cancer in humans, tobacco companies have admitted that “smoking causes cancer in laboratory rats” (p. ##).
  • For years, local residents have been disputing the plans to build a new highway right through the center of town (author’s last name, year, p. ##).
  • Residents contend that the new highway will lower property values (author’s last name, year, p. ##).
  • The Department of Transportation denies claims that the new bridge will damage the fragile ecosystem of the Potomac River (author’s last name, year, p. ##).
  • Joley endorses the bridge, saying “our goal is to make this city more accessible to those who live outside of it” (p. ##).

Inference Verbs: These verbs indicate that a point is implied rather than stated directly, so the sentence reports an inference drawn from the source.

Examples of Inference Verbs

The author implies. The author suggests. The author thinks.

Sample Sentences MLA Style

  • By calling them ignorant, the author implies that they were unschooled and narrow-minded (author’s last name p.##).
  • Her preoccupation with her looks suggests that she is too superficial to be a believable character (author’s last name p.##).
  • Based on his research, we can assume Hatfield thinks that our treatment of our environment has been careless (author’s last name p.##).

One phrase that is often used to introduce a quotation is: According to the author, . . .

  • According to the author, children with ADD have a shorter attention span than children without ADD (author’s last name, year, p. ##).

Copyright © 2024 Gallaudet University. All rights reserved.


How to Write a Research Proposal (with Examples & Templates)


Before conducting a study, researchers should create a research proposal that outlines their plans and methodology and submit it to the relevant evaluating organization or person. Creating a research proposal is an important step to ensure that researchers are on track and are moving forward as intended. A research proposal can be defined as a detailed plan or blueprint for the research you intend to undertake. It gives readers a snapshot of your project by describing what you will investigate, why it is needed, and how you will conduct the research.

Your research proposal should explain to readers why your research is relevant and original, and it should show that you understand the context and current state of the field, that you have the appropriate resources to conduct the research, and that the research is feasible given the usual constraints.

This article will describe in detail the purpose and typical structure of a research proposal, along with examples and templates to help you ace this step in your research journey.

What is a Research Proposal?

A research proposal¹,² can be defined as a formal report that describes your proposed research, its objectives, methodology, implications, and other important details. Research proposals are the framework of your research and are used to obtain approvals or grants to conduct the study from various committees or organizations. Consequently, research proposals should convince readers of your study’s credibility, accuracy, achievability, practicality, and reproducibility.

With research proposals, researchers usually aim to persuade readers, funding agencies, educational institutions, and supervisors to approve the proposal. To achieve this, the report should be well structured, with the objectives written in clear, understandable language devoid of jargon. A well-organized research proposal conveys to the readers or evaluators that the writer has thought out the research plan meticulously and has the resources to ensure timely completion.

Purpose of Research Proposals  

A research proposal is a sales pitch and therefore should be detailed enough to convince your readers, who could be supervisors, ethics committees, universities, etc., that what you’re proposing has merit and is feasible. Research proposals can help students discuss their dissertation with their faculty or fulfill course requirements, and they also help researchers obtain funding. A well-structured proposal gives readers confidence in your ability to conduct and complete the study as proposed.

Research proposals can be written for several reasons:³  

  • Describe the importance of research on the specific topic
  • Address any potential challenges you may encounter  
  • Showcase knowledge in the field and your ability to conduct a study  
  • Apply for a role at a research institute  
  • Convince a research supervisor or university that your research can satisfy the requirements of a degree program  
  • Highlight the importance of your research to organizations that may sponsor your project  
  • Identify implications of your project and how it can benefit the audience  

What Goes in a Research Proposal?    

Research proposals should aim to answer the three basic questions—what, why, and how.  

The What question should be answered by describing the specific subject being researched. It should typically include the objectives, the cohort details, and the location or setting.  

The Why question should be answered by describing the existing scenario of the subject, listing unanswered questions, identifying gaps in the existing research, and describing how your study can address these gaps, along with the implications and significance.  

The How question should be answered by describing the proposed research methodology, the data analysis tools you expect to use, and any other relevant methodological details.

Research Proposal Example  

Here is a research proposal sample template (with examples) from the University of Rochester Medical Center.⁴ The sections in all research proposals are essentially the same, although different terminology and other specific sections may be used depending on the subject.

Research Proposal Template

Structure of a Research Proposal  

If you want to know how to make a research proposal impactful, include the following components:¹  

1. Introduction  

This section provides background for the study, including the research topic, what is already known about it and where the gaps lie, and the significance of the proposed research.

2. Literature review  

This section describes the previous studies relevant to the research topic. Each cited study should be summarized in a few sentences, moving from general studies to more specific ones. This section builds on the understanding readers gained in the Introduction and supports it with relevant prior literature, showing readers that you have thoroughly researched your subject.

3. Objectives  

Once the background and gaps in the research topic have been established, authors must now state the aims of the research clearly. Hypotheses should be mentioned here. This section further helps readers understand what your study’s specific goals are.  

4. Research design and methodology  

Here, authors should clearly describe the methods they intend to use to achieve their proposed objectives. Important components of this section include the population and sample size, data collection and analysis methods and duration, statistical analysis software, measures to avoid bias (randomization, blinding), etc.  

5. Ethical considerations  

This refers to the protection of participants’ rights, such as the right to privacy and the right to confidentiality. Researchers need to obtain informed consent and approval from the required institutional review authorities, and they should state this clearly for transparency.

6. Budget/funding  

Researchers should prepare their budget and include all expected expenditures. An additional allowance for contingencies such as delays should also be factored in.  

7. Appendices  

This section typically includes information that supports the research proposal and may include informed consent forms, questionnaires, participant information, measurement tools, etc.  

8. Citations  


Important Tips for Writing a Research Proposal  

Writing a research proposal begins long before the actual task of writing. Planning the structure and content of the proposal is an important stage that, if done efficiently, can help you transition seamlessly into writing.³,⁵

The Planning Stage  

  • Manage your time efficiently. Plan to have the draft version ready at least two weeks before your deadline and the final version at least two to three days before the deadline.
  • Answer the following questions about your research:
  • What is the primary objective of your research?  
  • Will your research address any existing gap?  
  • What is the impact of your proposed research?  
  • Do people outside your field find your research applicable in other areas?  
  • If your research is unsuccessful, would there still be other useful research outcomes?  

The Writing Stage

  • Create an outline using the main section headings that are typically used.
  • Focus only on writing and getting your points across without worrying about the format of the research proposal, grammar, punctuation, etc. These can be fixed in subsequent passes. Add details under each section heading you created at the beginning.
  • Ensure your sentences are concise and use plain language. A research proposal usually contains about 2,000 to 4,000 words, or four to seven pages.
  • Don’t overload the text with technical terms and abbreviations on the assumption that readers will know them. Define any abbreviations and technical terms you use.
  • Ensure that the entire content is readable. Avoid long paragraphs because they disrupt the flow of reading. Break them into shorter paragraphs and introduce some white space for readability.
  • Focus on only the major research issues and cite sources accordingly. Don’t include generic information or their sources in the literature review.  
  • Proofread your final document to ensure there are no grammatical errors so readers can enjoy a seamless, uninterrupted read.  
  • Use academic, scholarly language because it brings formality into a document.  
  • Ensure that your title is created using the keywords in the document and is neither too long and specific nor too short and general.  
  • Cite all sources appropriately to avoid plagiarism.  
  • Make sure that you follow guidelines, if provided. This includes rules as simple as using a specific font or a hyphen or en dash between numerical ranges.  
  • Ensure that you’ve answered all questions requested by the evaluating authority.  

Key Takeaways   

Here’s a summary of the main points about research proposals discussed in the previous sections:  

  • A research proposal is a document that outlines the details of a proposed study and is created by researchers to submit to evaluators who could be research institutions, universities, faculty, etc.  
  • Research proposals are usually about 2,000-4,000 words long, but this depends on the evaluating authority’s guidelines.  
  • A good research proposal demonstrates that you’ve done your background research and assessed the feasibility of the research.
  • Research proposals have the following main sections—introduction, literature review, objectives, methodology, ethical considerations, and budget.  


Frequently Asked Questions  

Q1. How is a research proposal evaluated?  

A1. In general, most evaluators, including universities, use the following broad criteria to evaluate research proposals:⁶

  • Significance —Does the research address any important subject or issue, which may or may not be specific to the evaluator or university?  
  • Content and design —Is the proposed methodology appropriate to answer the research question? Are the objectives clear and well aligned with the proposed methodology?  
  • Sample size and selection —Is the target population or cohort size clearly mentioned? Is the sampling process used to select participants randomized, appropriate, and free of bias?  
  • Timing —Are the proposed data collection dates mentioned clearly? Is the project feasible given the specified resources and timeline?  
  • Data management and dissemination —Who will have access to the data? What is the plan for data analysis?  

Q2. What is the difference between the Introduction and Literature Review sections in a research proposal?

A2. The Introduction or Background section in a research proposal sets the context of the study by describing the current scenario of the subject and identifying the gaps and need for the research. A Literature Review, on the other hand, provides references to all prior relevant literature to help corroborate the gaps identified and the research need.  

Q3. How long should a research proposal be?  

A3. Research proposal lengths vary with the evaluating authority (such as a university or committee) and with the subject. Here’s a table that lists typical research proposal lengths for a few universities.

University | Program | Typical length (words)
 | Arts programs | 1,000-1,500
University of Birmingham | Law School programs | 2,500
 | PhD | 2,500
 |  | 2,000
 | Research degrees | 2,000-3,500

Q4. What are the common mistakes to avoid in a research proposal ?  

A4. Here are a few common mistakes to avoid when writing a research proposal:⁷

  • No clear objectives: Objectives should be clear, specific, and measurable so that readers can easily understand them.
  • Incomplete or unconvincing background research: Background research usually includes a review of the current state of the particular industry and a review of the previous literature on the subject. This helps readers understand that you are undertaking this research because you identified gaps in the existing research.
  • Overlooking project feasibility: The project scope and estimates should be realistic considering the resources and time available.
  • Neglecting the impact and significance of the study: In a research proposal, readers and evaluators look for the implications or significance of your research and how it contributes to the existing research. This information should always be included.
  • Unstructured format of a research proposal: A well-structured document gives evaluators confidence that you have read the guidelines carefully and are well organized in your approach, which in turn suggests that you will be able to undertake the research as described in your proposal.
  • Ineffective writing style: The language used should be formal and grammatically correct. If needed, consult editors or AI-based tools such as Paperpal to refine the structure and language of the research proposal.

Thus, a research proposal is an essential document that can help you promote your research and secure funds and grants for conducting your research. Consequently, it should be well written in clear language and include all essential details to convince the evaluators of your ability to conduct the research as proposed.  

This article has described all the important components of a research proposal and provided tips to improve your writing style. We hope these tips will help you write a well-structured research proposal, whether to secure grants or for any other purpose.

References  

  • Sudheesh K, Duggappa DR, Nethra SS. How to write a research proposal? Indian J Anaesth. 2016;60(9):631-634. Accessed July 15, 2024. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5037942/  
  • Writing research proposals. Harvard College Office of Undergraduate Research and Fellowships. Harvard University. Accessed July 14, 2024. https://uraf.harvard.edu/apply-opportunities/app-components/essays/research-proposals  
  • What is a research proposal? Plus how to write one. Indeed website. Accessed July 17, 2024. https://www.indeed.com/career-advice/career-development/research-proposal  
  • Research proposal template. University of Rochester Medical Center. Accessed July 16, 2024. https://www.urmc.rochester.edu/MediaLibraries/URMCMedia/pediatrics/research/documents/Research-proposal-Template.pdf  
  • Tips for successful proposal writing. Johns Hopkins University. Accessed July 17, 2024. https://research.jhu.edu/wp-content/uploads/2018/09/Tips-for-Successful-Proposal-Writing.pdf  
  • Formal review of research proposals. Cornell University. Accessed July 18, 2024. https://irp.dpb.cornell.edu/surveys/survey-assessment-review-group/research-proposals  
  • 7 Mistakes you must avoid in your research proposal. Aveksana (via LinkedIn). Accessed July 17, 2024. https://www.linkedin.com/pulse/7-mistakes-you-must-avoid-your-research-proposal-aveksana-cmtwf/  


Suggested Ways to Introduce Quotations

When you quote another writer's words, it's best to introduce or contextualize the quote. 

How To Quote In An Essay?

To introduce a quote in an essay, include the author's last name and page number (MLA) or the author, date, and page number (APA) in your citation. Shown below are some possible ways to introduce quotations. The examples use MLA format.

Use A Full Sentence Followed by A Colon To Introduce A Quotation

  • The setting emphasizes deception: "Nothing is as it appears" (Smith 1).
  • Piercy ends the poem on an ironic note: "To every woman a happy ending" (25).

Begin A Sentence with Your Own Words, Then Complete It with Quoted Words

Note that in the second example below, a slash with a space on either side ( / ) marks a line break in the original poem.

  • Hamlet's task is to avenge a "foul and most unnatural murder" (Shakespeare 925).
  • The speaker is mystified by her sleeping baby, whose "moth-breath / flickers among the flat pink roses" (Plath 17).

Use An Introductory Phrase Naming The Source, Followed By A Comma to Quote A Critic or Researcher

Note that the first letter after the quotation marks should be upper case. According to MLA guidelines, if you change the case of a letter from the original, you must indicate this with brackets. APA format doesn't require brackets.

  • According to Smith, "[W]riting is fun" (215).
  • In Smith's words, " . . .
  • In Smith's view, " . . .

Use A Descriptive Verb, Followed by A Comma To Introduce A Critic's Words

Avoid using says unless the words were originally spoken aloud, for instance, during an interview.

  • Smith states, "This book is terrific" (102).
  • Smith remarks, " . . .
  • Smith writes, " . . .
  • Smith notes, " . . .
  • Smith comments, " . . .
  • Smith observes, " . . .
  • Smith concludes, " . . .
  • Smith reports, " . . .
  • Smith maintains, " . . .
  • Smith adds, " . . .

Don't Use A Comma If Your Lead-In to The Quotation Ends in "That" or "As"

The first letter of the quotation should be lower case.

  • Smith points out that "millions of students would like to burn this book" (53).
  • Smith emphasizes that " . . .
  • Smith interprets the hand washing in Macbeth as "an attempt at absolution" (106).
  • Smith describes the novel as "a celebration of human experience" (233).


Quote Explanation Generator + Guide & Tips


👁️‍🗨️ Intro to Our Quote Explanation Generator

A quotation is a direct repetition of someone else's words, enclosed within quotation marks, and attributed to the original author. You can use it to provide evidence, support arguments, or add authority to writing or speech.

Quote analysis involves examining a quotation's context, intended meaning, and implications. You should go beyond surface-level understanding, exploring the underlying ideas, emotions, and significance conveyed by the selected words.

Quotations hold the power to inspire, inform, and challenge us. Thus, we created a quote explanation generator – the ultimate tool for unraveling the depths of any saying! Whether you picked a random quote that sparked your curiosity or you have an analysis assignment, it’ll provide insightful explanations for each phrase.

3 Key Reasons to Try Our Quotation Explanation Generator

Sometimes, quotes have double meanings or a thick layer of context that’s difficult to get through. Our quote explanation generator lets you get to the heart of the matter without spending hours on this process. Several factors make our app a valuable tool in your academic pursuits:

⌚ Instant Analysis. The tool conducts each step of the quotation analysis at the speed of light.
🤗 User-Friendly Interface. We tailored the tool to have the most straightforward user interface students can turn to any time they wish.
💡 Overcome Writer’s Block. Our quotation explanation generator can give you an extra boost of creativity that will help you finish a paper much faster.

👨‍🏫 7 Benefits of Quotations in Your Writing

To master the art of writing papers, you need a lot of practice. Producing volumes of bland text isn’t enough. Your work should be interesting to follow, and quotes are one of the best tools to improve the quality of your written assignments.

  • Better persuasion. Any piece should ultimately get the reader to accept your argument. Well-chosen, impactful quotes in the body of the text help you achieve this goal.
  • Connect text and sources. Sometimes your writing doesn’t connect data from cited sources to the ideas discussed in its paragraphs. A good quote or two can bridge this gap.
  • Enhance your writing. Quotes can make your writing more nuanced and creative, so the text becomes more interesting and less monotonous.
  • Larger context. Adding quotes lets you better evaluate and discuss the topic. They expand on the context and give essential details to your piece.
  • More credibility. Quotes make your paper more credible by showing readers that your arguments aren’t based merely on your opinions.
  • A touch of sophistication. Quotes are a great way to make your paper more sophisticated, which can help earn higher grades from your professors.
  • Stronger arguments. Adding quotations lets you better introduce arguments and analyze ideas.

✨ How to Perform a Good Quote Analysis

Finding the core meaning of sayings can enrich your written assignment. In this segment, we’ve prepared a step-by-step guide to analyzing quotes and uncovering their potential. These guidelines will help you evaluate any quote you come across.

  • Step 1. Select a suitable quotation. Choose one relevant to your paper’s subject; it should relate to either a person or an argument. Prefer short quotations to save time on analysis.
  • Step 2. Identify literary devices. Look at the word choice in the quote. Pay attention to personification, similes, metaphors, rhythm, alliteration, and other devices. This helps you better understand what the author tried to say.
  • Step 3. Establish the effect of the quote. Assess how the literary devices and other techniques affect the quote. Experiment with different interpretations and consider which one is the most convincing.
  • Step 4. Find out the author’s intent behind the quote. Consider why the author chose to phrase their words in a certain way. For example, note whether the quote has multiple meanings that keep it deliberately ambiguous.

🔎 Quotation Types & Examples

During your research, you might come across different kinds of quotes. There are three common ways of presenting a quotation in a piece of writing. Here is a short guide to telling them apart:

📖 Direct quotation. A direct quotation reproduces a person’s exact words and is used when quoting someone directly. Example: According to the eyewitness, “It happened so quickly; we didn’t even have the time to react.”
✍️ Indirect quotation. An indirect quotation reports what was said or written without quotation marks; some of the words can be altered or omitted. Example: The eyewitness said it happened so quickly there was no time to react.
🪄 Integrated quotation. The quoted words become part of your own sentence and can serve many grammatical purposes. Example: Oscar Wilde once wrote that a person “can resist everything except temptation.”

Integrating Quotations into Sentences

Knowing how to correctly incorporate quotes into your text is an essential skill that helps improve the value of your work. This section contains practical tips that make this process easier and enables you to create more credible writing.

  • Introduce the quote with a complete sentence followed by a colon. This is the most common way to introduce a quotation set off by quotation marks. For example, Shakespeare captures human uncertainty in a single line: “We know what we are, but know not what we may be.”
  • Separate an explanatory or introductory phrase from the quotation with a comma. Here, you give a bit of context to explain a quote better. In The Picture of Dorian Gray, Oscar Wilde writes, “Nowadays people know the price of everything and the value of nothing.”
  • Make the quote part of your sentence without additional punctuation. In this instance, the quotation becomes a part of your writing. For example, Dostoevsky argues that “conscience without God is a horror. It can get lost to the most immoral.”
  • Use several words from the quote in your sentence. This works the same way as the previous technique, although with smaller bits of the quotation. Dumas’s advice to “wait and hope” runs throughout The Count of Monte Cristo.

🏆 Great Example of Quote Analysis

Here we've prepared an analysis of Les Brown's inspirational quote, “Shoot for the moon. Even if you miss, you will land among the stars.” This sample can inspire you to write your own essay on quote analysis; take advantage of our generator and tips!

🧩 Literary Devices. The quote uses metaphor to compare aiming for the moon to setting ambitious goals, and landing among the stars to achieving significant success. It also evokes vivid mental images of shooting toward the moon and landing among the stars, creating a sense of grandeur and aspiration. In addition, the quote includes hyperbole: landing among the stars after missing the moon is an exaggeration that emphasizes achieving remarkable things despite not reaching the primary goal.
🎭 Quotation Effect. This is a motivational quote and therefore creates an uplifting effect. It encourages individuals to pursue their dreams without fear of failure. Its vivid imagery and hyperbole instill a sense of wonder and aspiration, pushing people to aim high and strive for greatness. The parallel structure reinforces the idea that even if one's primary objective is not achieved, there will still be valuable and extraordinary accomplishments.
✒️ Author's Intent. Les Brown’s intent behind the quote is to inspire individuals to overcome self-doubt and take risks in pursuing success in life. By urging people to “shoot for the moon,” he encourages them to be daring, set audacious goals, and be unafraid of failure. The quote reflects Brown's belief in the power of ambition and determination. He intends to convey that the journey will lead to significant achievements even if you do not reach your ultimate goal. The quote serves as a reminder that taking action and striving for greatness can lead to unexpected and fulfilling outcomes, ultimately motivating readers to pursue their dreams with enthusiasm and perseverance.

We did our best to provide a comprehensive guide on quote analysis. After you check out our specialized tool, please take a look at our FAQ section. Lastly, if you need help analyzing other types of text, try our rhetorical analyzer.

❓ Explain the Quote Generator – FAQ

Updated: Oct 25th, 2023



White Paper: Types, Purpose, and How to Write One

Adam Hayes, Ph.D., CFA, is a financial writer with 15+ years of Wall Street experience as a derivatives trader. Besides his extensive derivative trading expertise, Adam is an expert in economics and behavioral finance. Adam received his master's in economics from The New School for Social Research and his Ph.D. from the University of Wisconsin-Madison in sociology. He is a CFA charterholder and holds FINRA Series 7, 55, and 63 licenses. He currently researches and teaches economic sociology and the social studies of finance at the Hebrew University in Jerusalem.

Thomas J Catalano is a CFP and Registered Investment Adviser with the state of South Carolina, where he launched his own financial advisory firm in 2018. Thomas' experience gives him expertise in a variety of areas including investments, retirement, insurance, and financial planning.

What Is a White Paper?

A white paper is an informational document issued by a company or not-for-profit organization to promote or highlight the features of a solution, product, or service that it offers or plans to offer.

White papers are also used as a method of presenting government policies and legislation and gauging public opinion.

Key Takeaways

  • A white paper promotes a certain product, service, or methodology to influence current and prospective customer or investor decisions.
  • Three main types of white papers include backgrounders, numbered lists, and problem/solution white papers.
  • A white paper provides persuasive and factual evidence that a particular offering is a superior product or method of solving a problem.
  • White papers are commonly designed for business-to-business (B2B) marketing purposes between a manufacturer and a wholesaler, or between a wholesaler and a retailer.

White papers are sales and marketing documents used to entice or persuade potential customers to learn more about a particular product, service, technology, or methodology.

White papers are commonly designed for business-to-business (B2B) marketing purposes between a manufacturer and a wholesaler, or between a wholesaler and a retailer. A white paper can provide an in-depth report or guide about a specific product or topic and is meant to educate its readers.

The facts presented in white papers are often backed by research and statistics from reliable sources and can include charts, graphs, tables, and other ways of visualizing data. A white paper can communicate an organization’s philosophy or present research findings related to an industry.

Types of White Papers

A startup, large corporation, or government agency will use white papers differently. There are three main types of white papers: backgrounders, numbered lists, and problem/solution white papers.

Backgrounders detail the technical features of a new product or service. Designed to simplify complicated technical information, they are used to:

  • Support a technical evaluation
  • Launch a product
  • Promote a product or industry leader

Numbered lists highlight the key takeaways of a new product or service, and are often formatted with headings and bullet points such as the following familiar format:

  • 3 Questions to Ask
  • 5 Things You Need to Know

Problem/solution papers identify specific problems faced by potential customers and make a data-driven argument that a featured product or service provides a solution. They are used to:

  • Generate new sales
  • Educate salespeople on product characteristics
  • Build industry interest

White papers differ from other marketing materials, such as brochures. Brochures and traditional marketing materials might be flashy and obvious, but a white paper is intended to provide persuasive and factual evidence that solves a problem or challenge.

White papers are commonly at least 2,500 words in length and written in an academic style.

A white paper should provide well-researched information that cannot be found with a simple Internet search and present it as a compelling narrative that keeps the reader’s attention. The author of a white paper should:

  • Research and fully define the topic
  • Create an accurate outline of information
  • Write an attention-grabbing introduction
  • Format the paper for easy reading
  • Revise and proofread

What Is an Example of a White Paper?

All of the documents listed below, publicly available on Microsoft’s website, focus on aspects of the company’s suite of cloud services. In contrast with brochures, these white papers don’t have a clear sales pitch. Instead, they dive into relevant topics, such as cloud security, hybrid clouds, and the economic benefits of adopting cloud computing.

  • Digital Transformation and the Art of the Possible
  • Harvard Business Review Analytic Services: Drive Agility and Innovation with ERP in the Cloud
  • IDC: The Business Value of Migrating and Modernizing with Azure

How Have New Industries Used White Papers?

Cryptocurrency projects have also frequently published white papers during initial coin offerings (ICOs) to entice users and “investors” to their projects.

Bitcoin launched a few months after the pseudonymous Satoshi Nakamoto published its white paper online in October 2008.

Why Is It Called a White Paper?

White papers may have developed from the use of “Blue Papers” in 19th-century Britain, where Parliamentary reports had blue covers. When a report dealt with a less serious topic, it was published with a white cover instead, and these reports were called White Papers. In the United States, a government white paper is often a background report or guidance on a specific issue.

A white paper is an informational document issued by a company, government agency, or not-for-profit organization to promote the features of a solution, product, or service that it offers or plans to offer. The facts presented in white papers are often backed by research and statistics from reliable sources and are commonly written in one of three formats: backgrounders, numbered lists, and problem/solution papers.


  • Open access
  • Published: 29 July 2024

Predicting hospital length of stay using machine learning on a large open health dataset

Raunak Jain¹, Mrityunjai Singh¹, A. Ravishankar Rao² & Rahul Garg¹

BMC Health Services Research, volume 24, Article number 860 (2024)


Governments worldwide are facing growing pressure to increase transparency, as citizens demand greater insight into decision-making processes and public spending. An example is the release of open healthcare data to researchers, as healthcare is one of the top economic sectors. Significant information systems development and computational experimentation are required to extract meaning and value from these datasets. We use a large open health dataset provided by the New York State Statewide Planning and Research Cooperative System (SPARCS) containing 2.3 million de-identified patient records. One of the fields in these records is a patient’s length of stay (LoS) in a hospital, which is crucial in estimating healthcare costs and planning hospital capacity for future needs. Hence it would be very beneficial for hospitals to be able to predict the LoS early. The area of machine learning offers a potential solution, which is the focus of the current paper.

We investigated multiple machine learning techniques including feature engineering, regression, and classification trees to predict the length of stay (LoS) of all the hospital procedures currently available in the dataset. Whereas many researchers focus on LoS prediction for a specific disease, a unique feature of our model is its ability to simultaneously handle 285 diagnosis codes from the Clinical Classifications Software (CCS). We focused on the interpretability and explainability of input features and the resulting models. We developed separate models for newborns and non-newborns.

The study yields promising results, demonstrating the effectiveness of machine learning in predicting LoS. The best R² scores achieved are noteworthy: 0.82 for newborns using linear regression and 0.43 for non-newborns using CatBoost regression. Focusing on cardiovascular disease refines the predictive capability, achieving an improved R² score of 0.62. The models not only demonstrate high performance but also provide understandable insights. For instance, birth weight is employed for predicting LoS in newborns, while diagnosis-related group classification proves valuable for non-newborns.

Our study showcases the practical utility of machine learning models in predicting LoS at the time of patient admission. The emphasis on interpretability ensures that the models can be easily comprehended and replicated by other researchers. Healthcare stakeholders, including providers, administrators, and patients, stand to benefit significantly. The findings offer valuable insights for cost estimation and capacity planning, contributing to the overall enhancement of healthcare management and delivery.


Introduction

Democratic governments worldwide are placing increasing importance on transparency, as it leads to better governance, greater market efficiency, and the improvement and acceptance of government policies. This is highlighted by reports from the Organization for Economic Co-operation and Development (OECD), an international organization whose mission is to shape policies that foster prosperity, equality, opportunity and well-being for all [ 1 ]. Openness and transparency have been recognized as pillars of democracy and as drivers of the sustainable development goals [ 2 ], which are a major focus of the United Nations ( https://sustainabledevelopment.un.org/sdg16 ).

An important government function is to provide for the healthcare needs of its citizens. The U.S. spends about $3.6 trillion a year on healthcare, which represents 18% of its GDP [ 3 ]. Other developed nations spend around 10% of their GDP on healthcare. The percentage of GDP spent on healthcare is rising as populations age. Consequently, research on healthcare expenditure and patient outcomes is crucial to maintain viable national economies. It is advantageous for nations to combine investigations by the private sector, government sector, non-profit agencies, and universities to find the best solutions. A promising path is to make health data open, which allows investigators from all sectors to participate and contribute their expertise. Though there are obvious patient privacy concerns, open health data has been made available by organizations such as New York State Statewide Planning and Research Cooperative System (SPARCS) [ 4 ].

Once the data is made available, it needs to be suitably processed to extract meaning and insights that will help healthcare providers and patients. We favor the creation and use of an open-source analytics system so that the entire research community can benefit from the effort [ 5 , 6 , 7 ]. As a concrete demonstration of the utility of our system and approach, we revealed that there is a growing incidence of mental health issues amongst adolescents in specific counties in New York State [ 8 ]. This has resulted in targeted interventions to address these problems in these communities [ 8 ]. Knowing where the problems lie allows policymakers and funding agencies to direct resources where needed.

Healthcare in the U.S. is largely provided through private insurance companies and it is difficult for patients to reliably understand what their expected healthcare costs are [ 9 , 10 ]. It is ironic that consumers can readily find prices of electronic items, books, clothes etc. online, but cannot find information about healthcare as easily. The availability of healthcare information including costs, incidence of diseases, and the expected length of stay for different procedures will allow consumers and patients to make better and more informed choices. For instance, in the U.S., patients can budget pre-tax contributions to health savings accounts, or decide when to opt for an elective surgery based on the expected duration of that procedure.

To achieve this capability, it is essential to have the underlying data and models that interpret the data. Our goal in this paper is twofold: (a) to demonstrate how to design an analytics system that works with open health data and (b) to apply it to a problem of interest to both healthcare providers and patients. Significant advances have been made recently in the fields of data mining, machine-learning and artificial intelligence, with growing applications in healthcare [ 11 ]. To make our work concrete, we use our machine-learning system to predict the length of stay (LoS) in hospitals given the patient information in the open healthcare data released by New York State SPARCS [ 4 ].

The LoS is an important variable in determining healthcare costs, as costs directly increase for longer stays. The analysis by Jones [ 12 ] shows that the trends in LoS, hospital bed capacity and population growth have to be carefully analyzed for capacity planning and to ensure that adequate healthcare can be provided in the future. With certain health conditions such as cardiovascular disease, the hospital LoS is expected to increase due to the aging of the population in many countries worldwide [ 13 ]. During the COVID-19 pandemic, hospital bed capacity became a critical issue [ 14 ], and many regions in the world experienced a shortage of healthcare resources. Hence it is desirable to have models that can predict the LoS for a variety of diseases from available patient data.

The LoS is usually unknown at the time a patient is admitted. Hence, the objective of our research is to investigate whether we can predict the patient LoS from variables collected at the time of admission. By building a predictive model through machine learning techniques, we demonstrate that it is possible to predict the LoS from data that includes the Clinical Classifications Software (CCS) diagnosis code, severity of illness, and the need for surgery. We investigate several analytics techniques including feature selection, feature encoding, feature engineering, model selection, and model training in order to thoroughly explore the choices that affect eventual model performance. By using a linear regression model, we obtain an R² value of 0.42 when we predict the LoS from a set of 23 patient features. The success of our model will be beneficial to healthcare providers and policymakers for capacity planning purposes and to understand how to control healthcare costs. Patients and consumers can also use our model to estimate the LoS for procedures they are undergoing or for planning elective surgeries.
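The kind of pipeline described above (encode a categorical diagnosis code, fit a linear regression, score it with R² on held-out data) can be sketched with scikit-learn. This is a minimal sketch on synthetic records, not SPARCS data and not the authors' code. The column names `ccs_diagnosis`, `severity`, and `surgical` are hypothetical stand-ins for admission-time fields, and all effect sizes are invented. It also computes R² = 1 − SS_res/SS_tot by hand to show what the reported score measures.

```python
# Minimal sketch on synthetic data; column names are hypothetical, not SPARCS fields.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
n = 2000

# Synthetic admission-time features.
df = pd.DataFrame({
    "ccs_diagnosis": rng.choice(["hernia", "ami", "septicemia"], size=n),
    "severity": rng.integers(1, 5, size=n),   # severity of illness, 1..4
    "surgical": rng.integers(0, 2, size=n),   # 1 if surgery is needed
})

# Synthetic LoS in days: a diagnosis-specific baseline plus linear effects and noise.
baseline = df["ccs_diagnosis"].map({"hernia": 2.0, "ami": 5.0, "septicemia": 8.0})
df["los"] = baseline + 1.5 * df["severity"] + 2.0 * df["surgical"] + rng.normal(0, 1.0, n)

X = df[["ccs_diagnosis", "severity", "surgical"]]
y = df["los"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# One-hot encode the categorical diagnosis code; pass numeric columns through unchanged.
model = Pipeline([
    ("encode", ColumnTransformer(
        [("ccs", OneHotEncoder(handle_unknown="ignore"), ["ccs_diagnosis"])],
        remainder="passthrough",
    )),
    ("regress", LinearRegression()),
])
model.fit(X_train, y_train)
pred = model.predict(X_test)

# R^2 = 1 - SS_res / SS_tot, computed by hand and compared with scikit-learn.
ss_res = np.sum((y_test.to_numpy() - pred) ** 2)
ss_tot = np.sum((y_test.to_numpy() - y_test.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot
print(f"manual R^2 = {r2_manual:.3f}, sklearn R^2 = {r2_score(y_test, pred):.3f}")
```

The synthetic LoS is generated from a linear formula, so the R² here will be much higher than the values the paper reports. The sketch demonstrates the mechanics, not the performance to expect on real hospital data.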

Stone et al. [ 15 ] present a survey of techniques used to predict the LoS, which include statistical and arithmetic methods, intelligent data mining approaches and operations-research based methods. Lequertier et al. [ 16 ] surveyed methods for LoS prediction.

The main gap in the literature is that most methods focus on analyzing trends in the LoS, predict the LoS only for specific conditions, or restrict their analysis to data from specific hospitals. For instance, Sridhar et al. [ 17 ] created a model to predict the LoS for joint replacements in rural hospitals in the state of Montana by using a training set with 127 patients and a test set with 31 patients. In contrast, we have developed our model to predict the LoS for 285 different CCS diagnosis codes, over a set of 2.3 million patients across all hospitals in New York State. The CCS diagnosis code refers to the code used by the Clinical Classifications Software system, which encompasses 285 possible diagnosis and procedure categories [ 18 ]. Since the CCS diagnosis codes are too numerous to list, we give a few examples that we analyzed, including but not limited to abdominal hernia, acute myocardial infarction, acute renal failure, behavioral disorders, bladder cancer, Hodgkin's disease, multiple sclerosis, multiple myeloma, schizophrenia, septicemia, and varicose veins. To the best of our knowledge, no existing models predict the LoS for such a variety of diagnosis codes, with a patient sample greater than 2 million records, and with freely available open data. In this respect, our investigation is unique.

Sotodeh et al. [ 19 ] developed a Markov model to predict the LoS in intensive care unit patients. Ma et al. [ 20 ] used decision tree methods to predict LoS in 11,206 patients with respiratory disease.

Burn et al. examined trends in the LoS for patients undergoing hip replacement and knee replacement in the U.K. [ 21 ]. Their study demonstrated a steady decline in the LoS from 1997 to 2012. The purpose of their study was to determine factors that contributed to this decline, and they identified improved surgical techniques such as fast-track arthroplasty. However, they did not develop any machine-learning models to predict the LoS.

Hachesu et al. examined the LoS for cardiac disease patients [ 22 ] and found that blood pressure is an important predictor of LoS. Garcia et al. determined factors influencing the LoS for patients undergoing treatment for hip fracture [ 23 ]. B. Vekaria et al. analyzed the variability of LoS for COVID-19 patients [ 24 ]. Arjannikov et al. [ 25 ] used positive-unlabeled learning to develop a predictive model for LoS.

Gupta et al. [ 26 ] conducted a meta-analysis of previously published papers on the role of nutrition in the LoS of cancer patients, and found that nutrition status is especially important in predicting LoS for gastrointestinal cancer. Similarly, Almashrafi et al. [ 27 ] performed a meta-analysis of existing literature on cardiac patients and reviewed factors affecting their LoS. However, they did not develop quantitative models in their work. Kalgotra et al. [ 28 ] used recurrent neural networks to build a prediction model for LoS.

Daghistani et al. [ 13 ] developed a machine learning model to predict length of stay for cardiac patients. They used a database of 16,414 patient records and classified the length of stay into three classes: short LoS (< 3 days), intermediate LoS (3–5 days), and long LoS (> 5 days). They used detailed patient information, including blood test results, blood pressure, and patient history such as smoking habits. Such detailed information is not available in the much larger SPARCS dataset that we utilized in our study.
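The three-class scheme can be expressed as a simple binning function. This is a hypothetical helper, not Daghistani et al.'s code. It assumes that stays of exactly 3 and exactly 5 days fall into the intermediate class, matching the "3–5 days" range.

```python
def los_class(days: float) -> str:
    """Map a length of stay in days to one of three classes.

    Boundaries follow the scheme described above: short (< 3 days),
    intermediate (3-5 days, inclusive), long (> 5 days).
    """
    if days < 0:
        raise ValueError("length of stay cannot be negative")
    if days < 3:
        return "short"
    if days <= 5:
        return "intermediate"
    return "long"


for stay in [1, 3, 5, 8]:
    print(stay, los_class(stay))
```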

Awad et al. [ 29 ] provide a comprehensive review of various techniques to predict the LoS. Though simple statistical methods have been used in the past, they make assumptions that the LoS is normally distributed, whereas the LoS has an exponential distribution [ 29 ]. Consequently, it is preferable to use techniques that do not make assumptions about the distribution of the data. Candidate techniques include regression, classification and regression trees, random forests, and neural networks. Rather than using statistical parametric techniques that fit parameters to specific statistical distributions, we favor data-driven techniques that apply machine-learning.
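A quick simulation shows why a normality assumption fits LoS poorly. This sketch assumes LoS is exactly exponential with a hypothetical mean of 4 days. For an exponential distribution the standard deviation equals the mean, so a normal distribution fitted to the same data puts roughly 16% of its probability on impossible negative stays. The right skew also pushes the mean above the median.

```python
import random
import statistics

random.seed(0)
MEAN_LOS = 4.0  # hypothetical mean length of stay in days (assumption)
samples = [random.expovariate(1 / MEAN_LOS) for _ in range(50_000)]

mu = statistics.fmean(samples)
sigma = statistics.pstdev(samples)
median = statistics.median(samples)

# Fit a normal distribution with the same mean and standard deviation.
normal_fit = statistics.NormalDist(mu, sigma)
# Because sigma is close to mu for exponential data, this is close to Phi(-1).
p_negative = normal_fit.cdf(0.0)

print(f"mean={mu:.2f}  median={median:.2f}  P(LoS < 0 under normal fit)={p_negative:.3f}")
```

A distribution-free method such as a regression tree makes no such assumption, which is the motivation given above for preferring data-driven techniques.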

In 2020, during the height of the COVID-19 pandemic, the Lancet, a premier medical journal, drew widespread rebuke [ 30 , 31 , 32 ] for publishing a paper based on questionable data. Many medical journals published expressions of concern [ 33 , 34 ]. The Lancet itself retracted the questionable paper [ 35 ], which is available at [ 36 ] with the stamp “retracted” placed on all pages. One possible way to prevent such incidents is for top medical journals to require authors to make their data available for verification by the scientific community. Patient privacy concerns can be mitigated by de-identifying the records made available, as is already done by the New York State SPARCS effort [ 4 ]. Our methodology and analytics system design will become more relevant in the future, as there is a desire to prevent a repetition of the Lancet debacle. Even before the Lancet incident, there was declining trust amongst the public related to medicine and healthcare policy [ 37 ]. This situation continues today, with multiple factors at play, including biased news reporting in mainstream media [ 38 ]. A desirable solution is to make these fields more transparent, by releasing data to the public and explaining the various decisions in terms that the public can understand. The research in this paper demonstrates how such a solution can be developed.

Requirements

We describe the following three requirements of an ideal system for processing open healthcare data:

Utilize open-source platforms to permit easy replicability and reproducibility.

Create interpretable and explainable models.

Demonstrate an understanding of how the input features determine the outcomes of interest.

The first requirement captures the need for research to be easily reproduced by peers in the field. There is growing concern that scientific results are becoming hard for researchers to reproduce [ 39 , 40 , 41 ]. This undermines the validity of the research and ultimately hurts the fields involved. Baker termed this the “reproducibility crisis” and analyzed the top factors that lead to irreproducible research [ 39 ]. Two of the top factors are the unavailability of raw data and of code.

The second requirement addresses the need for the machine-learning models to produce explanations of their results. Though deep-learning models are popular today, they have been criticized for functioning as black boxes whose precise workings are hard to discern. In the field of healthcare, it is more desirable to have models that can be explained easily [ 42 ]. Unless healthcare providers understand how a model works, they will be reluctant to apply it in their practice. For instance, Reyes et al. determined that interpretable Artificial Intelligence systems can be better verified, trusted, and adopted in radiology practice [ 43 ].

The third requirement reflects the need to capture relevant patient features that can be related to the outcomes of interest, such as LoS, total cost, and mortality rate. Furthermore, healthcare providers should be able to understand the influence of these features on the performance of the model [ 44 ]. This is especially critical when feature engineering methods are used to combine existing features and create new ones.

In the subsequent sections, we present our design for a healthcare analytics system that satisfies these requirements. We apply this methodology to the specific problem of predicting the LoS.

We have designed the overall system architecture shown in Fig. 1. This system is built to handle any open data source; for concreteness, we show New York SPARCS as one of the data sources. Our framework can be applied to data from multiple sources, such as the Centers for Medicare & Medicaid Services (CMS in the U.S.), as shown in our previous work [ 6 ]. We chose a Python-based framework that utilizes Pandas [ 45 ] and scikit-learn [ 46 ]. Python is currently the most popular programming language for engineering and system design applications [ 47 ].

figure 1

Shows the system architecture. We use Python-based open-source tools such as Pandas and Scikit-Learn to implement the system

In Fig.  2 , we provide a detailed overview of the necessary processing stages. The specific algorithms used in each stage are described in the following sections.

figure 2

Shows the processing stages in our analytics pipeline

Recent research has shown that it is highly desirable for machine learning models used in the healthcare domain to be explainable to healthcare providers and professionals [ 48 ]. Hence, we focused on the interpretability and explainability of input features in our dataset and the models we chose to explore. We restricted our investigation to models that are explainable, including regression models, multinomial logistic regression, random forests, and decision trees. We also developed separate models for newborns and non-newborns.

Brief description of the dataset

During our investigation, we utilized open-health data provided by the New York State SPARCS system. The data we accessed was from the year 2016, which was the most recent year available at the time. This data was provided in the form of a CSV file, containing 2,343,429 rows and 34 columns. Each row contains de-identified in-patient discharge information. The dataset columns contained various types of information. They included geographic descriptors related to the hospital where care was provided, demographic descriptors such as patient race, ethnicity, and age, medical descriptors such as the CCS diagnosis code, APR DRG code, severity of illness, and length of stay. Additionally, payment descriptors were present, which included information about the type of insurance, total charges, and total cost of the procedure.

Detailed descriptions of all the elements in the data can be found in [ 49 ]. The CCS diagnosis code has been described earlier. The term “DRG” stands for Diagnostic Related Group [ 49 ], which is used by the Centers for Medicare & Medicaid Services in the U.S. for reimbursement purposes [ 50 ].

The data includes all patients who underwent inpatient procedures at all New York State hospitals [ 51 ]. The payment for the care can come from multiple sources: Department of Corrections, Federal/State/Local/Veterans Administration, Managed Care, Medicare, Medicaid, Miscellaneous, Private Health Insurance, and Self-Pay. Because the New York State SPARCS dataset covers a wider patient population than Medicare/Medicaid alone, it is more valuable than datasets composed exclusively of Medicare/Medicaid patients. For instance, Gilmore et al. analyzed only Medicare patients [ 52 ].

We examine the distribution of the LoS in the dataset, as shown in Fig.  3 . We note that the providers of the data have truncated the length of stay to 120 days. This explains the peak we see at the tail of the distribution.

figure 3

Distribution of the length of stay in the dataset

Data pre-processing and cleaning

We identified 36,280 samples (1.55% of the data) with missing values and discarded them from further analysis. We also removed samples with Type of Admission = ‘Unknown’ (0.02% of samples). The final dataset therefore has 2,306,668 samples. ‘Payment Typology 2’ and ‘Payment Typology 3’ have missing values in at least 50% of samples; these were replaced by the string ‘None’.
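The cleaning steps above can be sketched in pandas as follows. This is a minimal sketch, not the authors' code: it assumes the CSV has been loaded into a DataFrame whose column names match those quoted in the text, and the five-row frame is made-up example data. Filling the two payment-typology columns before calling `dropna` is an assumption of this sketch, since dropping every row with a missing secondary payer would remove far more than 1.55% of the data.

```python
import numpy as np
import pandas as pd


def clean_sparcs(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning steps described above to a SPARCS-style DataFrame."""
    df = df.copy()
    # Secondary/tertiary payer columns are mostly empty: keep them as an explicit
    # 'None' category (done first, so the dropna below does not discard those rows).
    for col in ["Payment Typology 2", "Payment Typology 3"]:
        df[col] = df[col].fillna("None")
    # Discard rows that still contain any missing value.
    df = df.dropna()
    # Discard admissions whose type is recorded as 'Unknown'.
    df = df[df["Type of Admission"] != "Unknown"]
    return df.reset_index(drop=True)


# Tiny hypothetical example frame (not SPARCS data).
raw = pd.DataFrame({
    "Type of Admission": ["Emergency", "Elective", "Unknown", "Urgent", "Emergency"],
    "Payment Typology 2": [np.nan, "Medicaid", np.nan, np.nan, "Self-Pay"],
    "Payment Typology 3": [np.nan, "Medicare", np.nan, np.nan, np.nan],
    "Length of Stay": [3, 2, 5, np.nan, 4],
})
clean = clean_sparcs(raw)
```

The 'Unknown' admission and the row with a missing LoS are dropped, while rows that only lack a secondary payer are kept.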

We note that approximately 10% of the dataset consists of rows representing newborns. We treat this group as a separate category. We found that the ‘Birth Weight’ feature had a zero value for non-newborn samples. Accordingly, to make better use of the ‘Birth Weight’ feature, we partitioned the data into two groups: newborns and non-newborns. This results in two sets of models, one for newborns and the second for all other patients. We removed the ‘Birth Weight’ feature from the input for the non-newborn samples, as its value was zero for those samples.

The columns ‘Total Costs’ and ‘Total Charges’ are usually proportional to the LoS, so using them to predict the LoS would be unfair, as they effectively leak the target. Hence, we removed these columns. We found that the columns ‘Discharge Year’ and ‘Abortion Edit Indicator’ are redundant for LoS prediction models, and we removed them. We also removed the columns ‘CCS Diagnosis Description’, ‘CCS Procedure Description’, ‘APR DRG Description’, ‘APR MDC Description’, and ‘APR Severity of Illness Description’, as their corresponding numerical codes are already available as features.
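The partitioning and column removal can be sketched as below. The sketch assumes that a non-zero ‘Birth Weight’ identifies a newborn record (the text only states that the value is zero for non-newborns); the helper name and the three-row frame are hypothetical.

```python
import pandas as pd

# Columns removed in the text: cost/charge columns are near-proxies for the LoS,
# the others are redundant or duplicated by numeric codes.
DROP_COLUMNS = [
    "Total Costs", "Total Charges", "Discharge Year", "Abortion Edit Indicator",
    "CCS Diagnosis Description", "CCS Procedure Description", "APR DRG Description",
    "APR MDC Description", "APR Severity of Illness Description",
]


def split_newborns(df: pd.DataFrame):
    """Drop the excluded columns and split into newborn / non-newborn frames."""
    df = df.drop(columns=[c for c in DROP_COLUMNS if c in df.columns])
    # Assumption: a non-zero 'Birth Weight' marks a newborn record.
    is_newborn = df["Birth Weight"] > 0
    newborns = df[is_newborn].reset_index(drop=True)
    # Birth weight is always zero outside the newborn group, so it is dropped there.
    others = df[~is_newborn].drop(columns=["Birth Weight"]).reset_index(drop=True)
    return newborns, others


df = pd.DataFrame({
    "Birth Weight": [3200, 0, 0],
    "Total Costs": [5000.0, 12000.0, 800.0],
    "APR DRG Code": [640, 194, 720],
    "Length of Stay": [2, 5, 1],
})
newborns, others = split_newborns(df)
```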

Since the focus of this paper is on the prediction of the LoS, we analyzed the distribution of LoS values in the dataset.

We developed regression models using all the LoS values, from 1–120. We also developed classification models where we discretized the LoS into specific bins. Since the distribution of LoS values is not uniform, and is heavily clustered around smaller values, we discretized the LoS into a small number of bins, e.g. 6 to 8 bins.

We utilized 10% of the data as a holdout test-set, which was not seen during the training phase. For the remaining 90% of the data, we used tenfold cross-validation in order to train the model and determine the best parameters to use.
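The evaluation protocol can be sketched with scikit-learn using synthetic data and `LinearRegression` as a placeholder model: 10% of the rows are held out, tenfold cross-validation runs on the remaining 90%, and the holdout is scored once at the end. The random seeds, the shuffled `KFold`, and the synthetic features are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))  # synthetic stand-in for the encoded features
y = X @ np.array([1.5, -2.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=1000)

# 10% holdout set that is never touched during training or model selection.
X_dev, X_test, y_dev, y_test = train_test_split(X, y, test_size=0.10, random_state=0)

# Tenfold cross-validation on the remaining 90%.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
cv_scores = cross_val_score(LinearRegression(), X_dev, y_dev, cv=cv, scoring="r2")

# Final fit on the development data, evaluated once on the holdout.
model = LinearRegression().fit(X_dev, y_dev)
test_r2 = model.score(X_test, y_test)
```

In the same setting, hyperparameter choices would be compared using `cv_scores` only, so the holdout score stays an unbiased final estimate.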

Feature encoding

Many variables in the dataset are categorical; e.g., the variable “APR Severity of Illness Description” takes values in the set [Major, Minor, Moderate, Extreme]. We used distribution-dependent target encoding techniques and one-hot encoding techniques to improve model performance [ 53 ]. We replaced each category value with the product of the mean LoS and the median LoS over the samples having that value. The encoded feature can then better capture how the distribution of the LoS depends on the value of the categorical feature.
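The mean × median target encoding can be expressed with a pandas `groupby`. This is a sketch under stated assumptions: the helper name `target_encode_mean_median` and the toy LoS values are hypothetical.

```python
import pandas as pd


def target_encode_mean_median(train: pd.DataFrame, col: str, target: str = "Length of Stay"):
    """Map each category of `col` to mean(LoS) * median(LoS) over rows with that category."""
    stats = train.groupby(col)[target].agg(["mean", "median"])
    return stats["mean"] * stats["median"]


train = pd.DataFrame({
    "APR Severity of Illness Description": ["Minor", "Minor", "Minor", "Extreme", "Extreme"],
    "Length of Stay": [1, 2, 6, 10, 20],
})
encoding = target_encode_mean_median(train, "APR Severity of Illness Description")
# Replace the category labels with their encoded values.
train["severity_te"] = train["APR Severity of Illness Description"].map(encoding)
```

With the toy data, ‘Minor’ maps to 3 × 2 = 6 and ‘Extreme’ to 15 × 15 = 225. In a cross-validated setting the statistics would normally be computed on the training folds only, so the encoding does not see the LoS of the rows being evaluated; categories unseen during training would then need a fallback value.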

For the linear regression model [ 54 ], we sampled a set of 6 categorical features, [‘Type of Admission’, ‘Patient Disposition’, ‘APR Severity of Illness Code’, ‘APR Medical Surgical Description’, ‘APR MDC Code’], which we target encoded with the mean and the median of the LoS. We then one-hot encoded every feature (all features are categorical) and, for each resulting one-hot column, created one new feature per feature in the sampled set by replacing the ones in that column with the target-encoded value of the sampled feature. For example, after one-hot encoding ‘Operating Certificate Number’, the indicator column for the value 3 yielded 6 features: samples with the value 3 were assigned the target-encoded values of the sampled-set features, and all other samples were assigned zero. We used such constructions to exploit the linear relation between the LoS and each feature.
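The one-hot × target-encoded feature construction might look like the following. The helper `onehot_times_target` and the columns `te_admission` and `te_severity` (standing in for already target-encoded features) are hypothetical; only ‘Operating Certificate Number’ comes from the dataset.

```python
import pandas as pd


def onehot_times_target(df: pd.DataFrame, onehot_col: str, te_cols):
    """For each one-hot indicator of `onehot_col`, create one feature per target-encoded
    column: the indicator's ones are replaced by that column's value, zeros stay zero."""
    dummies = pd.get_dummies(df[onehot_col], prefix=onehot_col, dtype=float)
    feats = {}
    for d in dummies.columns:
        for te in te_cols:
            feats[f"{d}*{te}"] = dummies[d] * df[te]
    return pd.DataFrame(feats, index=df.index)


df = pd.DataFrame({
    "Operating Certificate Number": [3, 7, 3],
    "te_admission": [4.0, 9.0, 4.5],   # hypothetical target-encoded feature
    "te_severity": [6.0, 225.0, 6.0],  # hypothetical target-encoded feature
})
X = onehot_times_target(df, "Operating Certificate Number", ["te_admission", "te_severity"])
```

Each indicator column expands into one feature per target-encoded column, so the width grows as (number of categories) × (size of the sampled set).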

According to the sklearn documentation [ 55 ], a random forest regressor is “a meta estimator that fits a number of decision tree regressors on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting”. The random forest regressor leverages ensemble learning based on many randomized decision trees to make accurate and robust predictions for regression problems. The averaging of many trees protects against single trees overfitting the training data.

The random forest classifier is also an ensemble learning technique and uses many randomized decision trees to make predictions for classification problems. The “wisdom of crowds” concept suggests that the decision made by a larger group of people is typically better than that of an individual. The random forest classifier uses this intuition and allows each decision tree to make a prediction. Finally, the class predicted by the most trees is chosen as the overall classification.

For the Random Forest Regressor [ 56 , 57 ] and Random Forest Classifier [ 58 ], we used only the distribution-dependent target encoding, because random forest classifiers and regressors are poorly suited to sparse one-hot encoded columns.

Multinomial logistic regression is a type of regression analysis that predicts the probabilities of the different possible outcomes of a categorically distributed dependent variable, given a set of independent variables. It allows for more than two discrete outcomes, extending binomial logistic regression for binary classification to models with multiple class membership. For the multinomial logistic regression model [ 59 ], we used only one-hot encoding, and not target encoding, as the target value was categorical.

Finally, we experimented with combinations of target encoding and one-hot encoding. We can either use target encoding, or one-hot encoding, or both. When both encodings are employed, the dimensionality of the data increases to accommodate the one-hot encoded features. For each combination of encodings, we also experimented with different regression models including linear regression and random forest regression.

Feature importance, selection, and feature engineering

We experimented with different feature selection methods. Since the focus of our work is on developing interpretable and explainable models, we used SHAP analysis to determine relevant features.

We examine the importance of different features in the dataset. We used the SHAP value (SHapley Additive exPlanations), a popular measure of feature importance [ 60 ]. Intuitively, the SHAP value measures the difference in model predictions when a feature is used versus omitted, averaged over all subsets of the other features. It is captured by the following formula:

$$\phi_{i}=\sum_{S\subseteq \{1,\dots ,n\}\setminus \{i\}}\frac{|S|!\,\left(n-|S|-1\right)!}{n!}\left[p\left(S\cup \{i\}\right)-p\left(S\right)\right]$$

where \(\phi_{i}\) is the SHAP value of feature \(i\), \(p(S)\) is the prediction of the model using the features in \(S\), \(n\) is the number of features, and \(S\) is any set of features that does not include feature \(i\). The specific model we used for the prediction was the random forest regressor, where we target-encoded all features with the product of the mean and the median of the LoS, since most of the features were categorical.
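To make the Shapley formula concrete, the sketch below enumerates every subset of the other features and applies the weights |S|!(n − |S| − 1)!/n! directly. It assumes that an “omitted” feature is replaced by a fixed baseline value; the SHAP framework instead averages over background data, and libraries such as shap use model-specific algorithms (e.g., TreeExplainer for tree ensembles) rather than exponential enumeration. The toy model is hypothetical.

```python
from itertools import combinations
from math import factorial

import numpy as np


def exact_shapley(predict, x, baseline):
    """Exact Shapley values by enumerating every subset S of the other features.

    Assumption: a feature omitted from S is replaced by its baseline value."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    n = len(x)

    def p(subset):
        # Model prediction when only the features in `subset` take their real values.
        z = baseline.copy()
        for j in subset:
            z[j] = x[j]
        return predict(z)

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                phi[i] += weight * (p(S + (i,)) - p(S))
    return phi


def model(z):
    # Hypothetical model with an interaction between features 0 and 1.
    return z[0] * z[1] + z[2]


phi = exact_shapley(model, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
```

The interaction term z₀z₁ is split equally between features 0 and 1, giving values of 1, 1, and 3, which sum to the difference between the prediction at `x` (5) and at the baseline (0).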

Classification models

One approach to the problem is to bin the LoS into different classes and train a classifier to predict which class an input sample falls in. We binned the LoS into roughly balanced classes as follows: 1 day, 2 days, 3 days, 4–6 days, > 6 days. This strategy is based on the distribution of the LoS shown in Figs. 3 and 4.
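The five LoS classes can be produced with `pandas.cut`, using right-closed intervals so that (3, 6] captures stays of 4 to 6 days. The label strings are an illustrative choice.

```python
import numpy as np
import pandas as pd

# Class edges for 1 day, 2 days, 3 days, 4-6 days, >6 days (right-closed intervals).
EDGES = [0, 1, 2, 3, 6, np.inf]
LABELS = ["1", "2", "3", "4-6", ">6"]


def bin_los(los: pd.Series) -> pd.Series:
    """Map integer LoS values to the five classes used by the classifiers."""
    return pd.cut(los, bins=EDGES, labels=LABELS)


los = pd.Series([1, 2, 3, 4, 6, 7, 120])
classes = bin_los(los)
```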

figure 4

A density plot of the distribution of the length of stay. The area under the curve is 1. We used a kernel density estimation with a Gaussian kernel [ 61 ] to generate the plot
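A Gaussian kernel density estimate like the one in Fig. 4 can be sketched with scikit-learn's `KernelDensity`. The bandwidth of 1.0 day and the LoS values are assumptions for illustration (the text does not state the bandwidth used); the Riemann sum checks that the estimated density integrates to approximately 1.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Hypothetical LoS values (days), shaped as a single-feature column.
los = np.array([1, 1, 2, 2, 2, 3, 3, 4, 5, 7, 10, 15], dtype=float).reshape(-1, 1)
kde = KernelDensity(kernel="gaussian", bandwidth=1.0).fit(los)

# Evaluate the density on a wide grid; score_samples returns the log-density.
grid = np.linspace(-20, 40, 6001).reshape(-1, 1)
density = np.exp(kde.score_samples(grid))
area = density.sum() * (grid[1, 0] - grid[0, 0])  # Riemann-sum approximation of the integral
```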

We used three different classification models, comprising the following:

Multinomial Logistic Regression

Random Forest Classifier

CatBoost classifier [ 62 ].

We used a Multinomial Logistic Regression model [ 59 ] trained and tested using tenfold cross validation to classify the LoS into one of the bins. The multinomial logistic regression model is capable of providing explainable results, which is part of the requirements. We used the feature engineering techniques described in the previous section.

We used a Random Forest Classifier model trained and tested using tenfold cross validation to classify the LoS into one of the bins. We used a maximum depth of 10 so as to get explainable insights into the model.

Finally, we used a CatBoost Classifier model trained and tested using tenfold cross validation to classify the LoS into one of the bins.
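The classifier comparison can be sketched with scikit-learn using stratified tenfold cross-validation. The synthetic data, the class-generating rule, and `n_estimators=100` are assumptions; the multinomial model receives one-hot inputs and the random forest is limited to depth 10, as described above, although for brevity the forest is fed raw integer codes rather than target-encoded values. CatBoost is left out so that the sketch depends on scikit-learn only; if the catboost package is installed, a `CatBoostClassifier` could be added to the same dictionary.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
n = 1000
# Hypothetical categorical inputs (e.g., a severity code 1-4 and a disposition code 0-2).
X = np.column_stack([rng.integers(1, 5, n), rng.integers(0, 3, n)])
noise = rng.choice([-1, 0, 1], size=n, p=[0.1, 0.8, 0.1])
y = np.clip(X[:, 0] + X[:, 1] - 1 + noise, 0, 4)  # 5 hypothetical LoS classes

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
models = {
    # One-hot inputs for the multinomial model (lbfgs fits a softmax over all classes).
    "multinomial_logreg": make_pipeline(OneHotEncoder(handle_unknown="ignore"),
                                        LogisticRegression(max_iter=1000)),
    # Depth-limited forest, mirroring the maximum depth of 10 mentioned above.
    "random_forest": RandomForestClassifier(n_estimators=100, max_depth=10, random_state=0),
}
scores = {name: cross_val_score(m, X, y, cv=cv, scoring="accuracy") for name, m in models.items()}
```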

Regression models

We used three different regression models with the feature engineering techniques described above (see the Feature encoding section). These comprise:

Linear regression

CatBoost regression

Random forest regression

The linear regression was implemented using the nn.Linear module of the open-source library PyTorch [ 63 ]. We used the Adam optimization algorithm [ 64 ] in a mini-batch setting to train the model weights.
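The linear model was trained with PyTorch's `nn.Linear` and the Adam optimizer. To keep the sketch dependency-free, the NumPy version below applies the same mini-batch Adam update rule to a least-squares linear model; it is a stand-in, not the authors' code, and the learning rate, batch size, epoch count, and synthetic data are hypothetical values (the actual settings are listed in Table 16).

```python
import numpy as np


def fit_linear_adam(X, y, lr=0.02, epochs=100, batch_size=64,
                    beta1=0.9, beta2=0.999, eps=1e-8, seed=0):
    """Least-squares linear regression trained with mini-batch Adam (NumPy sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    params = np.zeros(d + 1)                  # weights followed by the bias
    m = np.zeros_like(params)                 # first-moment estimate
    v = np.zeros_like(params)                 # second-moment estimate
    Xb = np.hstack([X, np.ones((n, 1))])      # constant column for the bias
    t = 0
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            residual = Xb[idx] @ params - y[idx]
            grad = 2.0 * Xb[idx].T @ residual / len(idx)   # gradient of the batch MSE
            t += 1
            m = beta1 * m + (1 - beta1) * grad
            v = beta2 * v + (1 - beta2) * grad ** 2
            m_hat = m / (1 - beta1 ** t)                   # bias-corrected moments
            v_hat = v / (1 - beta2 ** t)
            params -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return params[:-1], params[-1]


rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=1000)
w, b = fit_linear_adam(X, y)
```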

We investigated CatBoost regression in order to find minimal feature sets, i.e., models with a small number of input features that still provide adequate results. Accordingly, we trained a CatBoost Regressor [ 65 ] to determine the relationship between combinations of features and the prediction accuracy as measured by the R² score.
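The minimal-feature-set search amounts to scoring every feature combination and comparing cross-validated R² values. The sketch below substitutes scikit-learn's `LinearRegression` for the CatBoost Regressor so that it runs without extra packages; any estimator factory can be passed as `make_model`. The helper name and the synthetic data, in which only `a` and `b` influence the target, are hypothetical.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def subset_scores(X: pd.DataFrame, y, max_size, make_model=LinearRegression, cv=5):
    """Mean cross-validated R^2 for every combination of up to `max_size` columns."""
    results = {}
    for size in range(1, max_size + 1):
        for cols in combinations(X.columns, size):
            scores = cross_val_score(make_model(), X[list(cols)], y, cv=cv, scoring="r2")
            results[cols] = scores.mean()
    return results


rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(400, 4)), columns=["a", "b", "c", "d"])
y = 3 * X["a"] - 2 * X["b"] + rng.normal(scale=0.5, size=400)
results = subset_scores(X, y, max_size=2)
best_pair = max((c for c in results if len(c) == 2), key=results.get)
```

Exhaustive enumeration grows combinatorially with the number of features, so it is only practical for small subset sizes.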

The random forest regression was implemented using the RandomForestRegressor class in scikit-learn [ 55 ].

Model performance measures

For the regression models, we used the following metrics to compare the model performance.

The R² score and the p-value. We use a significance level of α = 0.05 (5%) for our statistical tests. If the p-value is less than α, the R² score is statistically significant.
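The text does not specify how the p-value of the R² score was obtained. One option, shown here as an assumption, is a permutation test: scikit-learn's `permutation_test_score` refits the model on shuffled targets and reports the fraction (with a +1 correction) of shuffles that score at least as well as the real data. The synthetic data and the 200 permutations are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, permutation_test_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=300)

# R^2 on the real targets versus R^2 on 200 shuffled copies of the targets.
score, perm_scores, p_value = permutation_test_score(
    LinearRegression(), X, y, scoring="r2",
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    n_permutations=200, random_state=0,
)
alpha = 0.05
significant = p_value < alpha
```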

For classifier models, we used the following metrics to compare the model performance.

True positive rate, false negative rate, and F1 score [ 66 ].
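These one-versus-rest rates can be read off a confusion matrix: for class c, true positives sit on the diagonal, false negatives are the rest of row c, and false positives are the rest of column c. The sketch uses hypothetical labels; averaging the per-class values gives a macro average, and weighting them by class support gives a weighted average, both cross-checked against scikit-learn's `f1_score`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score


def per_class_metrics(y_true, y_pred, labels):
    """One-vs-rest TPR, FNR and F1 for each class, plus class support."""
    cm = confusion_matrix(y_true, y_pred, labels=labels)  # rows: true, columns: predicted
    tp = np.diag(cm).astype(float)
    support = cm.sum(axis=1)          # number of samples whose true class is c
    fn = support - tp                 # true class c, predicted as another class
    fp = cm.sum(axis=0) - tp          # predicted c, true class is another class
    tpr = tp / (tp + fn)              # recall / sensitivity
    fnr = fn / (tp + fn)              # equals 1 - TPR
    precision = tp / (tp + fp)
    f1 = 2 * precision * tpr / (precision + tpr)
    return tpr, fnr, f1, support


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
tpr, fnr, f1, support = per_class_metrics(y_true, y_pred, labels=[0, 1, 2])
macro_f1 = f1.mean()                              # unweighted mean over classes
weighted_f1 = np.average(f1, weights=support)     # weighted by class support
sklearn_macro = f1_score(y_true, y_pred, average="macro")
sklearn_weighted = f1_score(y_true, y_pred, average="weighted")
```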

We computed the Brier score using Brier’s original formulation [ 67 ]:

$$B=\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{R}\left({\widehat{y}}_{i,c}-{I}_{i,c}\right)^{2}$$

where \(N\) is the number of samples, \(R\) is the number of classes, \({\widehat{y}}_{i,c}\) is the probability the model assigns to class \(c\) for the \(i\)th sample, and \({I}_{i,c}=1\) if the \(i\)th sample belongs to class \(c\) and \({I}_{i,c}=0\) otherwise. Because the predicted probabilities for each sample sum to 1, B varies between 0 and 2, with 0 being the best possible score.
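A direct NumPy implementation of Brier's multi-class score, as a sketch: predicted probabilities are given as an (N, R) array and labels as integers 0..R−1 (an assumed input layout).

```python
import numpy as np


def brier_multiclass(y_true, proba):
    """Brier's original multi-class score: mean over samples of the squared distance
    between the predicted probability vector and the one-hot outcome."""
    proba = np.asarray(proba, dtype=float)
    onehot = np.eye(proba.shape[1])[np.asarray(y_true)]  # indicator I_{i,c}
    return float(np.mean(np.sum((proba - onehot) ** 2, axis=1)))


perfect = brier_multiclass([0, 2], [[1, 0, 0], [0, 0, 1]])
worst = brier_multiclass([0], [[0, 1, 0]])
uniform5 = brier_multiclass([3], [[0.2] * 5])
```

A perfect forecast scores 0, putting all probability mass on a wrong class scores 2, and a uniform forecast over five classes scores 0.8² + 4 × 0.2² = 0.8.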

We used the DeLong test [ 68 ] to compare the AUC for different classifiers.
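The DeLong test compares two AUCs measured on the same samples by estimating the covariance of the two estimates from per-sample “placement” values. The sketch below is a straightforward O(m·n) version for one binary task; applying it one-versus-rest to each LoS class is an assumption about how a multiclass comparison would be set up. The scores are synthetic, and the p-value uses the usual normal approximation.

```python
import math

import numpy as np
from sklearn.metrics import roc_auc_score


def _structural_components(pos_scores, neg_scores):
    """DeLong placement values: psi = 1 if a positive outranks a negative, 0.5 on ties."""
    diff = pos_scores[:, None] - neg_scores[None, :]
    psi = (diff > 0).astype(float) + 0.5 * (diff == 0)
    return psi.mean(axis=1), psi.mean(axis=0), psi.mean()


def delong_test(y_true, scores_a, scores_b):
    """Two-sided DeLong test for the difference between two correlated ROC AUCs."""
    y = np.asarray(y_true).astype(bool)
    m, n = y.sum(), (~y).sum()          # numbers of positives and negatives
    v10, v01, aucs = [], [], []
    for s in (np.asarray(scores_a, float), np.asarray(scores_b, float)):
        comp_pos, comp_neg, auc = _structural_components(s[y], s[~y])
        v10.append(comp_pos)
        v01.append(comp_neg)
        aucs.append(auc)
    # Covariance of the two AUC estimates (each row corresponds to one classifier).
    cov = np.cov(np.vstack(v10)) / m + np.cov(np.vstack(v01)) / n
    var = cov[0, 0] + cov[1, 1] - 2 * cov[0, 1]
    if var <= 0:                         # degenerate case, e.g. identical rankings
        return aucs[0], aucs[1], 1.0
    z = (aucs[0] - aucs[1]) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return aucs[0], aucs[1], p_value


rng = np.random.default_rng(0)
y = np.array([0] * 100 + [1] * 100)
strong = y + rng.normal(scale=0.5, size=200)   # hypothetical informative scores
weak = rng.normal(size=200)                    # hypothetical uninformative scores
auc_strong, auc_weak, p = delong_test(y, strong, weak)
sklearn_auc_strong = roc_auc_score(y, strong)
```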

These metrics will allow other researchers to replicate our study and provide benchmarks for future improvements.

In this section we present the results of applying the techniques in the Methods section.

Descriptive statistics

We provide descriptive statistics that help the reader understand the distributions of the variables of interest.

Table 1 summarizes basic statistical properties of the LoS variable.

Figure  5 shows the distribution of the LoS variable for newborns.

figure 5

This figure depicts the distribution of the LoS variable for newborns

Table 2 shows the top 20 APR DRG descriptions based on their frequency of occurrence in the dataset.

Figure  6 shows the distribution of the LoS variable for the top 20 most frequently occurring APR DRG descriptions shown in Table  2 .

figure 6

A 3-D plot showing the distribution of the LoS for the top-20 most frequently occurring APR DRG descriptions. The x-axis (horizontal) depicts the LoS, the y-axis shows the APR DRG codes, and the z-axis shows the density or frequency of occurrence of the LoS

We experimented with different encoding schemes for the categorical variables, and for each encoding we examined different regression techniques. Our results are shown in Table 3. We experimented with the three encoding schemes shown in the first column. The last row in the table shows a combination of one-hot encoding and target encoding, where the number of columns in the dataset is increased to accommodate one-hot encoded feature values for categorical variables.

Feature importance, selection and feature engineering

We obtained the SHAP plots using a Random Forest Regressor trained with target-encoded features.

Figures 7 and 8 show the SHAP value plots obtained for the newborn and non-newborn partitions of the dataset, respectively. We find that the features “APR DRG Code”, “APR Severity of Illness Code”, “Patient Disposition”, and “CCS Procedure Code” are very useful in predicting the LoS. For instance, high feature values for “APR Severity of Illness Code”, which are encoded by red dots, have higher SHAP values than the blue dots, which correspond to low feature values.

figure 7

SHAP Value plot for newborns

figure 8

1-D SHAP plot, in order of decreasing feature importance: top to bottom (for non-newborns)

A similar interpretation can be applied to the features in the non-newborn partition of the dataset. We note that “Operating Certificate Number” is among the top-10 most important features in both the newborn and non-newborn partitions. This finding is discussed in the Discussion section.

From Fig. 9, we observe that as the severity of illness code increases from 1 to 4, there is a corresponding increase in the SHAP values.

figure 9

A 2-D plot showing the relationship between SHAP values for one feature, “APR Severity of Illness Code”, and the feature values themselves (non-newborns)

To further understand the relationship between the APR Severity of Illness code and the LoS, we created the plot in Fig.  10 . This shows that the most frequently occurring APR Severity of Illness code is 1 (Minor), and that the most frequently occurring LoS is 2 days. We provide this 2-D projection of the overall distribution of the multi-dimensional data as a way of understanding the relationship between the input features and the target variable, LoS.

figure 10

A density plot showing the relationship between APR Severity of Illness Code and the LoS. The color scale on the right determines the interpretation of colors in the plot. We used a kernel density estimation with a Gaussian kernel [ 61 ] to generate the plot

Similarly, Fig.  11 shows the relationship between the birth weight and the length of stay. The most common length of stay is two days.

figure 11

A density plot showing the distribution of the birth weight values (in grams) versus the LoS. The colorbar on the right shows the interpretation of color values shown in the plot. We used a kernel density estimation with a Gaussian kernel [ 61 ] to generate the plot

Classification

We obtained a classification accuracy of 46.98% using Multinomial Logistic Regression with tenfold cross-validation in the 5-class classification task for non-newborn cases. The confusion matrix in Fig. 12 shows that most samples lie on or close to the diagonal. The model's errors occur mostly between adjacent classes, as can be inferred from the confusion matrix.

figure 12

Confusion matrix for classification of non-newborns. The number inside each square along the diagonal represents the number of correctly classified samples. The color is coded so lighter colors represent lower numbers

For the newborn cases, we obtained a classification accuracy of 60.08% using a Random Forest Classification model with tenfold cross-validation in the 5-class classification task. The confusion matrix in Fig. 13 shows that the majority of data samples lie on or close to the diagonal. The model's errors occur mostly between adjacent classes, as can be inferred from the confusion matrix.

figure 13

Confusion matrix for classification of newborns. The number inside each square along the diagonal represents the number of correctly classified samples. The color is coded so lighter colors represent lower numbers

The density plot in Fig.  14 shows the relationship between the actual LoS and the predicted LoS. For a LoS of 2 days, the centroid of the predicted LoS cluster is between 2 and 3 days.

figure 14

Shows the density plot of the predicted length of stay versus actual length of stay for the classifier model for non-newborns. We used a kernel density estimation with a Gaussian kernel [ 61 ] to generate the plot

A quantitative depiction of our model errors is shown in Fig. 15. The values in Fig. 15 are interpreted as follows. Referring to the column for LoS = 2, the top row shows that 51% of the predicted LoS values for an actual stay of 2 days are also 2 days (zero error), and that 23% of the predicted values for LoS equal to 2 days have an error of 1 day, and so on. The relatively high values in the top row indicate that the model performs well, with an error of at most 1 day. There are relatively few instances of errors between 2 and 3 days (typically less than 10% of the values appear in this row). The only exception is the class corresponding to LoS greater than 8 days: the truncation used to produce this class results in larger model errors specifically for this class.
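The error bands in Fig. 15 can be tabulated as below: for each actual-LoS class, the fraction of predictions whose absolute error falls in each band. The helper name, the band edges, and the example values are hypothetical.

```python
import numpy as np
import pandas as pd

# Absolute-error bands as (lower, upper] intervals.
BANDS = {"<=1": (-np.inf, 1), "(1,2]": (1, 2), "(2,3]": (2, 3), ">3": (3, np.inf)}


def error_table(actual, predicted):
    """Rows: error bands; columns: actual LoS classes; each column sums to 1."""
    actual = np.asarray(actual)
    err = np.abs(np.asarray(predicted) - actual)
    table = {}
    for c in np.unique(actual):
        e = err[actual == c]
        table[c] = [np.mean((e > lo) & (e <= hi)) for lo, hi in BANDS.values()]
    return pd.DataFrame(table, index=list(BANDS))


t = error_table(actual=[2, 2, 2, 2, 1, 1], predicted=[2, 3, 5, 9, 1, 3])
```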

figure 15

Shows the distribution of correctly predicted LoS values for each class used in our model. Along the columns, we depict the different classes used in the model, consisting of LoS equal to 1, 2, 3 …8, and more than 8. Each row depicts different errors made in the prediction. For instance, the top row depicts an error of less than or equal to one day between the actual LoS and the predicted LoS. The second row from the top depicts an error which is greater than 1 and less than or equal to 2 days, and so on for the other rows, for non-newborns

Figures 16 and 17 show the scatter plots for the linear regression models. The exact-fit line has slope 1, and a perfect model would produce points that all lie on this line.

figure 16

Scatter plot showing an instance of a linear regression fit to the data (newborns). The R² score is 0.82. The blue line represents an exact fit, where the predicted LoS equals the actual LoS (slope of the line is 1)

figure 17

Scatter plot for linear regression (non-newborns). The R² score is 0.42. The blue line represents an exact fit, where the predicted LoS equals the actual LoS (slope of the line is 1)

Figure  18 shows a density plot depicting the relationship between the predicted length of stay and the actual length of stay.

figure 18

Shows the density plot of the predicted length of stay versus actual length of stay for the classifier model for non-newborns. We used a kernel density estimation with a Gaussian kernel [ 61 ] to generate the plot. The best-fit regression line to our predictions is shown in green, whereas the blue line represents the ideal fit (line of slope 1, where actual LoS and predicted LoS are equal)

Most of the existing literature on LoS prediction is based on data for specific disease conditions such as cancer or cardiac disease. Hence, in order to understand which CCS diagnosis codes produce good model fits, we produced the plot in Fig. 19.

figure 19

This figure shows the three CCS diagnosis codes that produced the top three R² scores using linear regression. These are 101, 100 and 109. The three CCS Diagnosis codes that produced the lowest R² scores are 159, 657, and 659

Table 4 provides descriptions of the three CCS Diagnosis Codes in Fig. 19 with the top R² scores using linear regression.

Similarly, Table 5 describes the three CCS Diagnosis Codes in Fig. 19 with the lowest R² scores using linear regression.

Models with minimal feature sets

We trained a CatBoost Regressor [ 65 ] on the complete dataset in order to determine the relationship between combinations of features and the prediction accuracy as measured by the R² score. This is shown in Fig. 20.

figure 20

The labels for each row on the left show combinations of different input features. A CatBoost regression model was developed using the selected combination of features. The R² scores for each model are shown in the bar graph

We can infer from Fig. 20 that only four features (‘APR MDC Code’, ‘APR Severity of Illness Code’, ‘APR DRG Code’, ‘Patient Disposition’) are sufficient for the model to come very close to its maximum performance. We obtained similar results when using other regression models for the same experiment.

Classification trees

We used a random forest tree approach to generate the trees in Figs.  21 and 22 .

figure 21

A random forest tree that represents a best-fit model to the data for newborns. With 4 levels of the decision tree, the R² score is 0.65

figure 22

A random forest tree of depth 3 that represents a best-fit model to the data for non-newborns. The R² score is 0.28. We can generate deeper trees that fit the data better, but we show only a depth of 3 for readability in the printed version of this paper; otherwise, the tree would be too large to be legible on the page. The main point of this figure is to show how easily the working of the model can be interpreted through rules

We used tenfold cross validation to determine the regression scores. The results are summarized in Tables  6 and 7 .

We computed the multi-class classifier metrics for logistic regression, using one-hot encoding for non-newborns. The results are presented in Table 8. The first row gives the metrics for Class 0 compared against the rest of the classes; the other rows are interpreted in the same one-versus-rest manner. The macro average gives the unweighted mean of the per-class recall and precision, and the resulting F1 score. The weighted average weights each per-class metric by its support (number of samples). The overall accuracy is the number of correct predictions (49,686) divided by the total number of samples (105,932), which yields 0.47.

For the category of non-newborns, Fig.  23  provides a graphical plot that visualizes the ROC curves for the different multiclass classifiers we developed.

figure 23

This figure applies to data concerning non-newborns. We show the multiclass ROC curves for the performance of the CatBoost classifier for the different classes shown. The area under the ROC curve is 0.7844

In Table  9 we compare the performance of our multiclass classifier using logistic regression developed on 2016 SPARCS data against 2017 SPARCS data.

In order to compare the performance of the different classifiers, we computed the AUC measures reported in Table 10. Figure 24 visualizes the data in Table 10, and Fig. 25 visualizes the data in Table 11. In Tables 12 and 13 we report the results of the DeLong test for non-newborns and newborns, respectively. In Tables 14 and 15 we report the Brier scores for non-newborns and newborns, respectively.

figure 24

A bar chart that depicts the data in Table  10 for non-newborns

figure 25

A bar chart that depicts the data in Table  11

Model parameters

In Table  16 we present the parameter and hyperparameter values used in the different models.

Additional results shown in the Appendix/Supplementary material

Due to space restrictions, we show additional results in the Appendix/Supplementary Material. These results are in tabular form and describe the R² scores for different segmentations of the variables in the dataset, e.g. according to age group, severity of illness code, etc.

The most significant result we obtain is shown in Figs. 21 and 22, which illustrate how the decision trees produced by random forest modeling can be interpreted. Figure 21 for newborns shows that birth weight features prominently in the decision tree, appearing at the root node. Low birth weights are represented on the left side of the tree and are typically associated with longer hospital stays. Higher birth weights occur on the right side of the tree, and the node in the bottom row with 189,574 samples shows that the most frequently occurring predicted stay is 2.66 days. Figure 22 for non-newborns shows that “APR DRG Code”, “APR Severity of Illness Code” and “Patient Disposition” are the most important top-level features for predicting the LoS. This provides a relatively simple rule-based model, which can be easily interpreted by healthcare providers as well as patients. For instance, the right-most branch of the tree assigns a relatively high LoS (46 days) when the APR DRG Code is greater than 813.55 and the APR Severity of Illness Code is less than 91.

The results in Fig. 19 and Table 4 show that if we restrict our model to specific CCS Diagnosis descriptions such as “coronary atherosclerosis and other heart disease”, we obtain a good R² score of 0.62. The objective of our work is not to cherry-pick CCS Diagnosis codes that produce good results, but rather to develop a single model for the entire SPARCS dataset to obtain a bird's-eye perspective. In future work, we can explicitly build separate models for each CCS Diagnosis code, which could be relevant to specific medical specialties, such as cardiovascular care.

Similarly, the results in Fig.  19 and Table  5 show that there are CCS Diagnosis codes corresponding to schizophrenia and mood disorders that produce a poor model fit. Factors that contribute to this include the type of data in the SPARCS dataset, where information about patient vitals, medications, or a patient’s income level is not provided, and the inherent variability in treating schizophrenia and mood disorders. Baeza et al. [ 69 ] identified several variables that affect the LoS in psychiatric patients, which include psychiatric admissions in the previous years, psychiatric rating scale scores, history of attempted suicide, and not having sufficient income. Such variables are not provided in the SPARCS dataset. Hence a policy implication is to collect and make such data available, perhaps as a separate dataset focused on mental health issues, which have proven challenging to treat.

Figures 16 and 17 show that a better regression fit is obtained when a specific CCS Diagnosis code is used to build the model, such as “Newborn” in Fig. 16. To put these results in context, we note that it is difficult to obtain a high R² value for healthcare datasets in general, and especially for large numbers of patient samples that span multiple hospitals. For instance, Bertsimas [ 70 ] reported an R² value of 0.2 and Kshirsagar [ 71 ] reported an R² value of 0.33 for similar types of prediction problems as studied in this paper.

Further details on the segmentation of R² scores by variable category are provided in the Appendix/Supplementary Material. For instance, the table for Age Groups shows close agreement between the mean LoS predicted by our model and the mean actual LoS. Furthermore, the mean LoS increases steadily from 4.8 days for the 0–17 age group to 6.4 days for ages 70 or older. A discussion of these tables is outside the scope of this paper; they are provided to help other researchers form hypotheses for further investigation or find supporting evidence for ongoing research.
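
Per-category tables of this kind can be produced with a small grouping routine. The sketch below is not the authors’ code: it assumes the actual and predicted LoS values are already available as parallel lists, and the age-group labels and numbers are made-up toy values rather than SPARCS results. Within each group it computes R² = 1 - SS_res/SS_tot together with the mean actual and mean predicted LoS.

```python
from collections import defaultdict


def r2_score(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when every true value is identical")
    return 1.0 - ss_res / ss_tot


def segment_report(groups, y_true, y_pred):
    """Per-group sample count, R^2, mean actual LoS and mean predicted LoS."""
    buckets = defaultdict(lambda: ([], []))
    for group, t, p in zip(groups, y_true, y_pred):
        buckets[group][0].append(t)
        buckets[group][1].append(p)
    return {
        group: {
            "n": len(ts),
            "r2": r2_score(ts, ps),
            "mean_actual": sum(ts) / len(ts),
            "mean_predicted": sum(ps) / len(ps),
        }
        for group, (ts, ps) in buckets.items()
    }


# Toy values for illustration only (not SPARCS data).
groups = ["0 to 17"] * 3 + ["70 or Older"] * 3
actual = [2, 4, 6, 5, 7, 9]
predicted = [2.5, 4, 5.5, 6, 7, 8]
report = segment_report(groups, actual, predicted)
for group, stats in report.items():
    print(group, stats)
```

A group can show close agreement between mean predicted and mean actual LoS while still having a modest R², because R² measures per-record agreement rather than agreement of averages.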

Table 3 shows that the best encoding scheme is to combine target encoding with one-hot encoding and then apply linear regression. This produces an R² score of 0.42 for the non-newborn data, the best fit among the combinations in that table. The table also shows that significant improvements can be obtained by exploring the search space of feature-encoding strategies and regression methods. No theoretical framework determines the optimal choice, so the best approach is an experimental search. An important contribution of the current paper is to explore this search space so that other researchers can use and build upon our methodology.
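
The encoding combination in Table 3 can be illustrated with a minimal sketch. It assumes that a high-cardinality column (here a hypothetical DRG code) is target-encoded with the mean LoS per category and a low-cardinality column (here a hypothetical admission type) is one-hot encoded; which columns the paper assigned to each scheme, and whether any smoothing was used, is not stated in this section. The `smoothing` parameter is an illustrative assumption that shrinks rare categories toward the global mean.

```python
def fit_target_encoder(categories, targets, smoothing=0.0):
    """Map each category to its (optionally smoothed) mean target value."""
    global_mean = sum(targets) / len(targets)
    sums, counts = {}, {}
    for c, y in zip(categories, targets):
        sums[c] = sums.get(c, 0.0) + y
        counts[c] = counts.get(c, 0) + 1
    mapping = {
        c: (sums[c] + smoothing * global_mean) / (counts[c] + smoothing)
        for c in sums
    }
    return mapping, global_mean


def target_encode(categories, mapping, default):
    # Categories unseen during fitting fall back to the global mean.
    return [mapping.get(c, default) for c in categories]


def fit_one_hot(categories):
    return sorted(set(categories))


def one_hot(categories, levels):
    return [[1 if c == level else 0 for level in levels] for c in categories]


def encode_rows(high_card, low_card, te_mapping, te_default, levels):
    """Feature row = [target-encoded value] + one-hot indicator columns."""
    te = target_encode(high_card, te_mapping, te_default)
    oh = one_hot(low_card, levels)
    return [[t] + row for t, row in zip(te, oh)]


# Hypothetical toy columns (not actual SPARCS values).
drg = ["A", "A", "B", "C"]
admission = ["Emergency", "Elective", "Emergency", "Urgent"]
los = [3, 5, 2, 8]

mapping, global_mean = fit_target_encoder(drg, los)
levels = fit_one_hot(admission)
rows = encode_rows(drg, admission, mapping, global_mean, levels)
print(rows)  # [[4.0, 0, 1, 0], [4.0, 1, 0, 0], [2.0, 0, 1, 0], [8.0, 0, 0, 1]]
```

In practice the encoder should be fitted on the training split only; fitting it on test records would leak the target into the features. The resulting rows can then be passed to any regressor.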

The distribution of errors in Fig. 15 shows that truncating the LoS at 8 days produces artifacts in the prediction model, as all stays longer than 8 days are lumped into one class. Nevertheless, the distribution of LoS values in Fig. 4 shows that relatively few data samples have an LoS greater than 8 days. Investigating different truncation levels is outside the scope of the current paper and is left for future work. With our methodology, practitioners in the field, including hospital administrators and other researchers, can also tune the truncation level.
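
The truncation artifact can be made concrete with a short sketch. It assumes that truncation means capping every LoS value at the chosen level, so that all longer stays share a single value; the LoS values are toy numbers, and the cap is a parameter so that other truncation levels can be tried.

```python
TRUNCATION_DAYS = 8  # level discussed in the text; practitioners can tune it


def truncate_los(los_values, cap=TRUNCATION_DAYS):
    """Cap every LoS at `cap` days, so all longer stays collapse onto one value."""
    return [min(v, cap) for v in los_values]


def fraction_truncated(los_values, cap=TRUNCATION_DAYS):
    """Share of records whose LoS exceeds the cap and is therefore lumped together."""
    return sum(1 for v in los_values if v > cap) / len(los_values)


# Toy LoS values in days (not SPARCS data).
los = [1, 2, 3, 8, 9, 30, 4, 5]
capped = truncate_los(los)        # [1, 2, 3, 8, 8, 8, 4, 5]
share = fraction_truncated(los)   # 0.25

# A prediction of 8 days for the 30-day stay looks perfect against the capped
# target, even though it misses the true stay by 22 days.
error_vs_capped = abs(capped[5] - 8)  # 0
error_vs_true = abs(los[5] - 8)       # 22
```

Reporting `fraction_truncated` alongside the model metrics shows how many records are affected by a given cap.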

Our results in Fig. 7 show that certain features are not useful in predicting the LoS. The SHAP plot shows that features such as race, gender, and ethnicity contribute little to the LoS predictions. Had this not been the case, it would have suggested systemic bias based on race, gender, or ethnicity; for instance, a person of a given race might receive a shorter LoS because of their demographic identity. This would be unacceptable in the medical field. It is reassuring that a large and detailed healthcare dataset does not show evidence of such bias in the LoS.
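
SHAP values are Shapley values from cooperative game theory. For intuition, the sketch below computes exact Shapley values by enumerating every feature subset and replacing "absent" features with a single baseline vector. This is a simplification: the SHAP library averages over background data and uses much faster algorithms (such as TreeSHAP for tree ensembles), whereas exact enumeration is exponential in the number of features. The toy model and its weights are hypothetical; the zero weight on the third feature mimics a feature that contributes nothing to the prediction.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_linear_model(z):
    # Hypothetical weights, chosen only for illustration.
    return 2.0 * z[0] + 0.5 * z[1] + 0.0 * z[2] + 1.0


phi = shapley_values(toy_linear_model, x=[3, 4, 1], baseline=[1, 2, 0])
print(phi)  # approximately [4.0, 1.0, 0.0]
```

For a linear model each value reduces to w_i(x_i - baseline_i), and the values always sum to the difference between the prediction and the baseline prediction (the efficiency property), which is what makes the ranking in a SHAP plot additive.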

To place this finding in context, racial bias is an important area of research in the U.S., especially in fields such as criminology and access to financial services such as loans. In the U.S., the disproportional imprisonment of Black and Hispanic males is well documented [72]. Researchers working on criminal justice have found racial bias in sentencing and parole decisions, with Black defendants adversely affected [73]. Algorithms trained on the underlying data can reinforce this bias. There is also evidence that banks discriminate against loan applicants based on their race or gender [74].

This does not appear to be the case in our analysis of the SPARCS data. Although we did not specifically investigate racial bias in the LoS, the feature analysis we conducted provides relevant evidence. Other researchers, including those in the U.K. [21], have also found that gender does not affect LoS or costs. Hence, the results in the current paper are consistent with the findings of researchers in other countries working on entirely different datasets.

From Table 6, we see that for non-newborn data, catboost regression performs the best, with an R² score of 0.432. The p-value is less than 0.01, indicating that the correlation between the actual and predicted LoS values from catboost regression is statistically significant. Similarly, the p-values for linear regression and random forest regression indicate that the correlations between these models’ predictions and the actual values are statistically significant, i.e., they are unlikely to have arisen by random chance.
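
This section does not specify which test produced the p-values in Tables 6 and 7. As one illustrative alternative (a swapped-in technique, not necessarily the authors’ procedure), a permutation test estimates how often a correlation at least as strong as the observed one arises when the predicted values are randomly shuffled. SciPy’s `scipy.stats.pearsonr` offers a parametric p-value instead; the standard-library sketch below keeps the logic explicit. The data are toy values.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def permutation_p_value(actual, predicted, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for H0: no association between actual and predicted."""
    rng = random.Random(seed)
    observed = abs(pearson_r(actual, predicted))
    shuffled = list(predicted)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(actual, shuffled)) >= observed:
            extreme += 1
    # The +1 terms count the observed arrangement, so the estimate is never 0.
    return (extreme + 1) / (n_permutations + 1)


# Toy actual vs. predicted LoS values (not from Tables 6 or 7).
actual = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
predicted = [1.2, 1.9, 3.1, 4.2, 4.8, 6.1, 7.2, 7.9, 9.1, 9.8]
r = pearson_r(actual, predicted)
p = permutation_p_value(actual, predicted)
print(f"r = {r:.4f}, permutation p = {p:.4f}")
```

With very few records even a perfect correlation cannot reach a small p-value (for four records, two of the 24 orderings give |r| = 1), so significance at the scale of the SPARCS dataset says more about sample size than about effect size; R² remains the measure of fit.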

Table 7, which refers to newborn data, shows that linear regression performs the best, with an R² score of 0.82. The p-value is less than 0.01, indicating that the correlation between the actual and predicted LoS values from linear regression is statistically significant. Similarly, the p-values for random forest regression and catboost regression indicate that these models’ predictions are statistically significant.

We examine the performance of classifiers on non-newborn data in Tables 10 and 12. The DeLong test results in Table 12 show a statistically significant difference between the AUCs in each pairwise comparison of the models. Hence, we conclude that the catboost classifier performs the best, with an average AUC of 0.7844. The improvement from using the catboost classifier instead of the random forest classifier is marginal, and both perform better than logistic regression. The best-performing model for non-newborns is therefore the catboost classifier, followed by the random forest classifier and then logistic regression.
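
The DeLong test compares two AUCs computed on the same records while accounting for their correlation. A compact sketch for a single binary label follows; it is an illustration rather than the authors’ implementation, and it uses the quadratic-time pairwise form of the structural components, which is fine for a handful of records but far too slow for millions, where faster implementations are preferable. Labels and scores are toy values.

```python
from math import erfc, sqrt


def _psi(pos_score, neg_score):
    # Mann-Whitney kernel: 1 if the positive outranks the negative, 0.5 for a tie.
    if pos_score > neg_score:
        return 1.0
    return 0.5 if pos_score == neg_score else 0.0


def _structural_components(pos, neg):
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(labels, scores_a, scores_b):
    """Compare two correlated AUCs on the same binary labels (1 = positive).

    Returns (auc_a, auc_b, z, two_sided_p).
    """
    pos_idx = [i for i, lab in enumerate(labels) if lab == 1]
    neg_idx = [i for i, lab in enumerate(labels) if lab == 0]
    m, n = len(pos_idx), len(neg_idx)
    if m < 2 or n < 2:
        raise ValueError("need at least two positives and two negatives")
    a10, a01 = _structural_components(
        [scores_a[i] for i in pos_idx], [scores_a[i] for i in neg_idx]
    )
    b10, b01 = _structural_components(
        [scores_b[i] for i in pos_idx], [scores_b[i] for i in neg_idx]
    )
    auc_a, auc_b = sum(a10) / m, sum(b10) / m
    # Var(AUC_a - AUC_b) from the covariances of the structural components.
    var = (_cov(a10, a10) + _cov(b10, b10) - 2 * _cov(a10, b10)) / m + (
        _cov(a01, a01) + _cov(b01, b01) - 2 * _cov(a01, b01)
    ) / n
    if var <= 0:
        raise ValueError("zero variance: the DeLong z statistic is undefined")
    z = (auc_a - auc_b) / sqrt(var)
    p = erfc(abs(z) / sqrt(2))  # two-sided standard-normal p-value
    return auc_a, auc_b, z, p


labels = [1, 1, 1, 1, 0, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]  # toy scores
model_b = [0.9, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6, 0.1]  # toy scores
auc_a, auc_b, z, p = delong_test(labels, model_a, model_b)
print(f"AUC A = {auc_a:.4f}, AUC B = {auc_b:.4f}, z = {z:.3f}, p = {p:.3f}")
```

With only eight toy records, the difference between AUCs of 1.0 and 0.5625 does not reach p < 0.05, which illustrates why pairwise differences as small as those between the catboost and random forest classifiers need large samples to become significant.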

For newborn data, we examine the performance of the classifiers in Tables 11 and 13. From Table 13, the p-values in all rows are less than 0.05, except for the random forest vs. catboost comparison for the binary task “one vs. rest for class 3”. Hence, for this comparison, we cannot conclude that there is a statistically significant difference between the performance of the two classifiers. Table 11 shows that the AUCs of these two classifiers are very similar. We also note that newborn cases make up only about 10% of the dataset.
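
The “one vs. rest” rows in Table 13 come from decomposing the multi-class LoS problem into one binary problem per class. A minimal sketch of that decomposition follows, computing each binary AUC as the probability that a random positive outranks a random negative (ties count one half). The class labels and predicted probabilities are toy values, not outputs of the paper’s models; scikit-learn’s `roc_auc_score` with `multi_class="ovr"` performs a comparable computation.

```python
def binary_auc(labels, scores):
    """AUC = P(random positive scores above random negative), ties counted as 0.5."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_aucs(y_true, proba, classes):
    """proba[i][k] is the predicted probability of classes[k] for record i."""
    aucs = {}
    for k, cls in enumerate(classes):
        labels = [1 if y == cls else 0 for y in y_true]
        scores = [row[k] for row in proba]
        aucs[cls] = binary_auc(labels, scores)
    return aucs


classes = [1, 2, 3]  # hypothetical LoS classes
y_true = [1, 2, 3, 1, 2, 3]
proba = [
    [0.7, 0.2, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
    [0.5, 0.3, 0.2],
    [0.3, 0.25, 0.45],
    [0.2, 0.2, 0.6],
]
aucs = one_vs_rest_aucs(y_true, proba, classes)
macro_auc = sum(aucs.values()) / len(aucs)
print(aucs, macro_auc)  # {1: 1.0, 2: 0.75, 3: 1.0} and about 0.9167
```

Each per-class pair of score lists can then be fed to a pairwise test such as DeLong’s to produce one row per class.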

Table 14 shows that the catboost classifier has the lowest Brier score on non-newborn data; a lower Brier score indicates better performance. By this measure, the catboost classifier performs the best on non-newborn data, followed by the random forest classifier and then logistic regression. Table 15 shows that for newborns, the random forest classifier performs the best, followed by the catboost classifier and logistic regression, although the random forest and catboost classifiers perform very similarly.
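
The Brier score is the mean squared difference between predicted class probabilities and the one-hot encoded outcome. The sketch below implements Brier’s original multi-category form, which ranges from 0 to 2; for two classes it equals twice the binary score returned by scikit-learn’s `brier_score_loss`. This section does not state which variant underlies Tables 14 and 15, so the scale used here is an assumption, and the classes and probabilities are toy values.

```python
def multiclass_brier(y_true, proba, classes):
    """Brier's multi-category score: mean over records of sum_k (p_k - o_k)^2."""
    if len(y_true) != len(proba):
        raise ValueError("y_true and proba must have the same length")
    total = 0.0
    for y, row in zip(y_true, proba):
        total += sum((p - (1.0 if cls == y else 0.0)) ** 2 for cls, p in zip(classes, row))
    return total / len(y_true)


classes = [0, 1, 2]  # hypothetical LoS classes
perfect = multiclass_brier([0, 2], [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]], classes)  # 0.0
hedged = multiclass_brier([2], [[0.2, 0.3, 0.5]], classes)  # about 0.38
```

Unlike AUC, which depends only on how records are ranked, the Brier score also penalizes poorly calibrated probabilities, which is why the two metrics can order similar classifiers differently.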

From a practical perspective, it may make sense to use a catboost classifier for both newborn and non-newborn data, as a single model type simplifies the processing pipeline. The ultimate decision rests with the administrators and implementers of these decision systems in the hospital environment.

Burn et al. [21] observe that although the U.S. has reported declines in LoS similar to those in the U.K., the overall costs of joint replacement have risen. The U.K. government created policies to encourage the formation of specialist centers for joint replacement, which have reduced both the LoS and costs. The results and analysis presented in the current paper can help educate patients and healthcare consumers about trends in healthcare costs and how they can be reduced. An informed electorate can press its elected representatives to change the healthcare system for the benefit of the populace.

Hachesu et al. [22] examined the LoS for cardiac patients, using data from around 5000 patients and 35 input variables to build a predictive model. They found that the LoS was longer in patients with high blood pressure. In contrast, our method uses data from 2.5 million patients and considers multiple disease conditions simultaneously. Owing to limitations of the existing New York State SPARCS data, we also do not have access to patient vitals such as blood pressure measurements.

Garcia et al. [23] studied elderly patients (older than 60) to understand the factors governing the LoS for hip fracture treatment. Using 660 patient records, they determined that the most significant variable was the American Society of Anesthesiologists (ASA) classification system. The ASA score ranges from 1 to 5 and captures the anesthesiologist’s impression of a patient’s health and comorbidities at the time of surgery. Garcia et al. showed a monotonically increasing relationship between the ASA score and the LoS, although they did not build a specific predictive model. Their work shows that single variables with significant information content can be found for estimating the LoS. The New York SPARCS dataset that we used does not contain the ASA score. Hence, a policy implication of our research is to encourage healthcare authorities to include variables such as the ASA score, where relevant, in future dataset releases. The additional storage required is very small (one byte per patient record).

Arjannikov et al. [25] developed predictive models by binarizing the LoS into two categories, e.g., LoS ≤ 2 days or LoS > 2 days. We did not employ such a discretization; instead, we used continuous regression techniques as well as classification into more than two bins. We consider it preferable to stay as close to the actual data as possible.
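
The information lost by binarization can be shown with a few lines. In the sketch below, the threshold of 2 days follows the example above, while the multi-class bin edges (2, 4 and 8 days) are hypothetical values chosen for illustration and are not the bins used in this paper. The LoS values are toy numbers.

```python
from bisect import bisect_left


def binarize_los(los_values, threshold=2):
    """Two classes: 0 if LoS <= threshold, else 1."""
    return [0 if v <= threshold else 1 for v in los_values]


def bin_los(los_values, edges):
    """Class k holds edges[k-1] < LoS <= edges[k]; the last class is open-ended.

    `edges` must be sorted ascending; the edges used below are hypothetical.
    """
    return [bisect_left(edges, v) for v in los_values]


los = [1, 2, 3, 5, 9, 20]  # toy LoS values in days
print(binarize_los(los))          # [0, 0, 1, 1, 1, 1]
print(bin_los(los, [2, 4, 8]))    # [0, 0, 1, 2, 3, 3]
```

Under binarization a 3-day stay and a 20-day stay become indistinguishable, whereas finer bins (or continuous regression) keep them apart.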

Almashrafi et al. [27] and Cots et al. [75] observed that larger hospitals tended to have longer LoS for patients undergoing cardiac surgery. Although we did not specifically examine cardiac surgery outcomes, our feature analysis indicated that the hospital operating certificate number had lower relevance than other features such as DRG codes. Nevertheless, the SHAP plots in Figs. 7 and 8 show that the hospital operating certificate number is among the top 10 features ranked by SHAP value. We will investigate this relationship in more detail in future research, as it requires determining the size of the hospital from the operating certificate number and creating an appropriate machine-learning model. The Appendix contains results showing that certain operating certificate numbers produce a good model fit to the data.

A major focus of our research is on building interpretable and explainable models. Based on the principle of parsimony, models with fewer features are preferable, as they provide simpler explanations to healthcare professionals and patients. Fig. 20 shows that a model with five features performs just as well as a model with seven features. These features also make intuitive sense, so the model’s operation can be understood by both patients and healthcare providers.
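
One standard way to weigh fit against the number of features is the adjusted R², 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of records and p the number of features. It is offered here as a general illustration of parsimony, not as the criterion behind Fig. 20; the R² values below are hypothetical.

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1); penalizes extra features."""
    if n_samples - n_features - 1 <= 0:
        raise ValueError("need n_samples > n_features + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Hypothetical: both models reach R^2 = 0.40.
small_5 = adjusted_r2(0.40, n_samples=50, n_features=5)   # about 0.332
small_7 = adjusted_r2(0.40, n_samples=50, n_features=7)   # about 0.300
large_7 = adjusted_r2(0.40, n_samples=2_300_000, n_features=7)  # about 0.399998
```

With around 2.3 million records the penalty for two extra features is on the order of 10⁻⁶, so at this scale the case for the five-feature model rests on interpretability rather than on adjusted fit.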

Patients in the U.S. increasingly have to pay for medical procedures out-of-pocket because insurance does not cover all expenses, leading to unexpectedly large bills [76]. Many patients in the U.S. also lack health insurance, and uninsured patients are often charged the highest prices [77]. Kullgren et al. [78] observe that patients in the U.S. need to be discerning healthcare consumers, as they can optimize the value they receive from out-of-pocket spending. In addition to estimating the cost of medical procedures, patients will also benefit from estimating the expected LoS for a procedure such as joint replacement, which allows them to budget adequate time for it. Patients and consumers will benefit from estimates derived from an unbiased open data source such as New York State SPARCS combined with our model.

Other researchers have developed specific LoS models for particular health conditions, such as cardiac disease [22], hip replacement [21], cancer [26], or COVID-19 [24]. In addition, researchers typically assume a prior statistical distribution for the outcomes, such as a Weibull distribution [24]. However, we have not assumed any specific prior statistical distribution, nor have we restricted our analysis to specific diseases. Consequently, our model and techniques should be more widely applicable, especially in the face of rapidly changing disease trajectories worldwide.

Our study is based exclusively on freely available open health data. Consequently, we cannot control the granularity of the data and must use the data as-is. We are unable to obtain more detailed patient information, such as physiological variables like blood pressure and heart-rate variability, at the time of admission and during the stay. Hospitals, healthcare providers, and insurers have access to this data, but there is no mandate for them to make it available to researchers outside their own organizations. Sometimes they sell de-identified data to interested parties such as pharmaceutical companies [79]. Because of the high cost of purchasing this data, researchers worldwide, especially in developing countries, are at a disadvantage in developing AI algorithms for healthcare.

There is growing recognition that medical researchers need to standardize data formats and analysis tools and share them openly. One such effort is the Observational Health Data Sciences and Informatics (OHDSI) initiative, described in [80].

Twitter has demonstrated an interesting path forward, where a small percentage of its data was made available freely to all users for non-commercial purposes through an API [ 81 ]. Recently, Twitter has made a larger proportion of its data available to qualified academic researchers [ 82 ]. In the future, the profit motives of companies need to be balanced with considerations for the greater public good. An advantage of using the Twitter model is that it spurs more academic research and allows universities to train students and the workforce of the future on real-world and relevant datasets.

In the U.S., a new law went into effect in January 2021 requiring hospitals to make pricing data publicly available. The premise is that this data would provide better transparency into the workings of the U.S. healthcare system and lead to cost efficiencies. However, most hospitals are not in compliance with this law [83]. Concerted efforts by government officials as well as public pressure will be necessary to achieve compliance. If the eventual release of such data is not accompanied by corresponding interest from academics, healthcare researchers, policymakers, and the public, the very premise of the utility of this data is likely to be called into question. Furthermore, merely dumping large quantities of data into the public domain is unlikely to benefit anyone. Hence, research efforts such as the one presented in this paper will be valuable in demonstrating the utility of this data to all stakeholders.

Our machine-learning pipeline can easily be applied to new data released periodically by New York SPARCS, and also to hospital pricing data [83]. Because our methodology is open source, other researchers can readily extend our work and apply it to extract meaning from open health data. This improves reproducibility, an essential aspect of science. We will make our code available on GitHub to interested researchers for non-commercial purposes.

Limitations of our models

Our models are restricted to the data available through New York State SPARCS, which does not provide detailed information about patient vitals. More detailed physiological data is available through the Multiparameter Intelligent Monitoring in Intensive Care (MIMIC) framework [84], though for a smaller number of patients. We plan to extend our methodology to handle such data in the future. Another limitation of our study is that it does not account for patient comorbidities. This arises from the de-identification process used to release the SPARCS data, which removes patient-identifying information. Hence, we are unable to analyze multiple hospital admissions for a given patient, possibly for different conditions. The main advantage of our approach is that it uses large-scale population data (2.3 million patients), albeit at a coarse level of granularity without physiological data. Nevertheless, our approach provides a high-level view of the operation of the healthcare system, which yields valuable insights.

There is growing interest in using data analytics to increase government transparency and inform policymaking. It is expected that the meaning and insights gained from such evidence-based analysis will translate to better policies and optimal usage of the available infrastructure. This requires cooperation between computer scientists, domain experts, and policy makers. Open healthcare data is especially valuable in this context due to its economic significance. This paper presents an open-source analytics system to conduct evidence-based analysis on openly available healthcare data.

The goal is to develop interpretable machine learning models that identify key drivers and make accurate predictions related to healthcare costs and utilization. Such models can provide actionable insights to guide healthcare administrators and policy makers. A specific illustration is provided via a robust machine learning pipeline that predicts hospital length of stay across 285 disease categories based on 2.3 million de-identified patient records. The length of stay is directly related to costs.

We focused on the interpretability and explainability of input features and the resulting models. Hence, we developed separate models for newborns and non-newborns, given the differences in their input features. The best-performing model for non-newborn data was catboost regression, which achieved an R² score of 0.43. The best-performing model for newborns was linear regression, which achieved an R² score of 0.82. Key newborn predictors included birth weight, while non-newborn models relied heavily on the diagnosis-related group (DRG) classification. This demonstrates model interpretability, which is important for adoption. There is an opportunity to further improve performance for specific diseases: if we restrict our analysis to cardiovascular disease, we obtain an improved R² score of 0.62.

The presented approach has several desirable qualities. Firstly, transparency and reproducibility are enabled through the open-source methodology. Secondly, the model generalizability facilitates insights across numerous disease states. Thirdly, the technical framework can easily integrate new data while allowing modular extensions by the research community. Lastly, the evidence generated can readily inform multiple key stakeholders including healthcare administrators planning capacity, policy makers optimizing delivery, and patients making medical decisions.

Availability of data and materials

Data is publicly available at the website mentioned in the paper, https://www.health.ny.gov/statistics/sparcs/

There is an “About Us” tab on the website that contains all the contact details. The authors are not affiliated with this website, which is maintained by New York State.

Gurría A. Openness and Transparency - Pillars for Democracy, Trust and Progress. OECD.org. Available: https://www.oecd.org/unitedstates/opennessandtransparency-pillarsfordemocracytrustandprogress.htm . Accessed 28 June 2024.

Jetzek T. The Sustainable Value of Open Government Data: Uncovering the Generative Mechanisms of Open Data through a Mixed Methods Approach. Copenhagen Business School, Institut for IT-Ledelse Department of IT Management. 2015.

Move fast and heal things: How health care is turning into a consumer product. The Economist. 2022.  https://www.economist.com/business/how-health-care-is-turning-into-a-consumer-product/21807114 . Accessed 28 June 2024.

New York State Department Of Health, Statewide Planning and Research Cooperative System (SPARCS).  https://www.health.ny.gov/statistics/sparcs/ . Accessed 5 Oct 2022.

Rao AR, Chhabra A, Das R, Ruhil V. A framework for analyzing publicly available healthcare data. In 2015 17th International Conference on E-health Networking, Application & Services (IEEE HealthCom). 2015: IEEE, pp. 653–656.

Rao AR, Clarke D. A fully integrated open-source toolkit for mining healthcare big-data: architecture and applications. In IEEE International Conference on Healthcare Informatics ICHI, Chicago. 2016: IEEE, pp. 255–261.

Rao AR, Garai S, Dey S, Peng H. PIKS: A Technique to Identify Actionable Trends for Policy-Makers Through Open Healthcare Data. SN Computer Science. 2021;2(6):1–22.

Rao AR, Rao S, Chhabra R. Rising mental health incidence among adolescents in Westchester, NY. Community Ment Health J. 2021:1–1. 

Boylan J F. My $145,000 Surprise Medical Bill. New York Times. 2020.  https://www.nytimes.com/2020/02/19/opinion/surprise-medical-bill.html . Accessed 28 June 2024.

Peterson K, Bykowicz J. Congress Debates Push to End Surprise Medical Billing. Wall Street J. 2020.  https://www.wsj.com/articles/congress-debates-push-to-end-surprise-medical-billing-11589448603 . Accessed 28 June 2024.

Wang S, Zhang J, Fu Y, Li Y. ACM TIST Special Issue on Deep Learning for Spatio-Temporal Data: Part 1. 12th ed. NY: ACM New York; 2021. p. 1–3.

Jones R. Declining length of stay and future bed numbers. BJHCM. 2015;21(9):440–1.

Daghistani TA, Elshawi R, Sakr S, Ahmed AM, Al-Thwayee A, Al-Mallah MH. Predictors of in-hospital length of stay among cardiac patients: a machine learning approach. Int J Cardiol. 2019;288:140–7.

Sen-Crowe B, Sutherland M, McKenney M, Elkbuli A. A closer look into global hospital beds capacity and resource shortages during the COVID-19 pandemic. J Surg Res. 2021;260:56–63.

Stone K, Zwiggelaar R, Jones P, Mac Parthaláin N. A systematic review of the prediction of hospital length of stay: Towards a unified framework. PLOS Digital Health. 2022;1(4):e0000017.

Lequertier V, Wang T, Fondrevelle J, Augusto V, Duclos A. Hospital length of stay prediction methods: a systematic review. Med Care. 2021;59(10):929–38.

Sridhar S, Whitaker B, Mouat-Hunter A, McCrory B. Predicting Length of Stay using machine learning for total joint replacements performed at a rural community hospital. PLoS ONE. 2022;17(11):e0277479.

CCS (Clinical Classifications Software) - Synopsis. https://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/CCS/index.html . Accessed 13 Jan 2022.

Sotoodeh M, Ho JC. Improving length of stay prediction using a hidden Markov model. AMIA Summits on Translational Science Proceedings. 2019;2019:425.

Ma F, Yu L, Ye L, Yao DD, Zhuang W. Length-of-stay prediction for pediatric patients with respiratory diseases using decision tree methods. IEEE J Biomed Health Inform. 2020;24(9):2651–62.

Burn E, et al. Trends and determinants of length of stay and hospital reimbursement following knee and hip replacement: evidence from linked primary care and NHS hospital records from 1997 to 2014. BMJ Open. 2018;8(1):e019146.

Hachesu PR, Ahmadi M, Alizadeh S, Sadoughi F. Use of data mining techniques to determine and predict length of stay of cardiac patients. Healthcare informatics research. 2013;19(2):121–9.

Garcia AE, et al. Patient variables which may predict length of stay and hospital costs in elderly patients with hip fracture. J Orthop Trauma. 2012;26(11):620–3.

Vekaria B, et al. Hospital length of stay for COVID-19 patients: Data-driven methods for forward planning. BMC Infect Dis. 2021;21(1):1–15.

Arjannikov T, Tzanetakis G. An empirical investigation of PU learning for predicting length of stay. In 2021 IEEE 9th International Conference on Healthcare Informatics (ICHI). 2021: IEEE, pp. 41–47.

Gupta D, Vashi PG, Lammersfeld CA, Braun DP. Role of nutritional status in predicting the length of stay in cancer: a systematic review of the epidemiological literature. Ann Nutr Metab. 2011;59(2–4):96–106.

Almashrafi A, Elmontsri M, Aylin P. Systematic review of factors influencing length of stay in ICU after adult cardiac surgery. BMC Health Serv Res. 2016;16(1):318.

Kalgotra P, Sharda R. When will I get out of the hospital? Modeling Length of Stay using Comorbidity Networks. J Manag Inf Syst. 2021;38(4):1150–84.

Awad A, Bader-El-Den M, McNicholas J. Patient length of stay and mortality prediction: a survey. Health Serv Manage Res. 2017;30(2):105–20.

Editorial-Board. The Lancet, HCL and Trump. Wall Street J. 2020.  https://www.wsj.com/articles/the-lancet-hcl-and-trump-11591226880 . Accessed 28 June 2024.

Servick K, Enserink M. A mysterious company’s coronavirus papers in top medical journals may be unraveling. Science. 2020.  https://www.science.org/content/article/mysterious-company-s-coronavirus-papers-top-medical-journals-may-be-unraveling . Accessed 28 June 2024.

Gabler E, Rabin RC. The Doctor Behind the Disputed Covid Data. New York Times. 2020.  https://www.nytimes.com/2020/07/27/science/coronavirus-retracted-studies-data.html . Accessed 28 June 2024.

Lancet-Editors. Expression of concern: Hydroxychloroquine or chloroquine with or without a macrolide for treatment of COVID-19: a multinational registry analysis. 2020;395:10240. https://www.science.org/content/article/mysterious-company-s-coronavirus-papers-topmedical-journals-may-be-unraveling . Accessed 28 June 2024.

Editorial-Board. Expression of Concern: Mehra MR et al. Cardiovascular Disease, Drug Therapy, and Mortality in Covid-19. N Engl J Med. 2020.  https://www.nejm.org/doi/full/10.1056/NEJMoa2007621 . Accessed 28 June 2024.

Hopkins JS, Gold R. Authors Retract Studies That Found Risks of Using Antimalaria Drugs Against Covid-19. Wall Street J. 2020. https://www.wsj.com/articles/authors-retract-study-that-found-risks-of-using-antimalaria-drug-against-covid-19-11591299329 . Accessed 28 June 2024.

https://www.thelancet.com/pdfs/journals/lancet/PIIS0140-6736(20)31180-6.pdf . Accessed 9 Jan 2022.

Wolfensberger M, Wrigley A. Trust in Medicine. Cambridge University Press. 2019. ISBN-13: 978-1108487191.

Bhattacharya J, Nicholson T. A Deceptive Covid Study, Unmasked. Wall Street J. 2022. https://www.wsj.com/articles/deceptive-covid-study-unmasked-abc-misleading-omicron-north-carolina-students-duke-mask-test-to-stay-11641933613 . Accessed 28 June 2024.

Baker M. 1,500 scientists lift the lid on reproducibility. Nature. 2016;533(7604):452–4.

Begley CG, Ioannidis JP. Reproducibility in science: improving the standard for basic and preclinical research. Circ Res. 2015;116(1):116–26.

Eisner D. Reproducibility of science: Fraud, impact factors and carelessness. J Mol Cell Cardiol. 2018;114:364–8.

Wang F, Kaushal R, Khullar D. Should health care demand interpretable artificial intelligence or accept “black box” medicine? Am College Phys. 2020;172:59–60.

Reyes M, et al. On the interpretability of artificial intelligence in radiology: challenges and opportunities. Radiol Art Intell. 2020;2(3):e190043.

Savadjiev P, et al. Demystification of AI-driven medical image interpretation: past, present and future. Eur Radiol. 2019;29(3):1616–24.

McKinney W. Python for data analysis: Data wrangling with Pandas, NumPy, and IPython. O’Reilly Media, Inc.; 2012.

Pedregosa F, et al. Scikit-learn: Machine learning in Python. J Machine Learn Res. 2011;12:2825–30.

Cass S. The top programming languages: Our latest rankings put Python on top-again-[Careers]. IEEE Spectr. 2020;57(8):22–22.

Tjoa E, Guan C. A survey on explainable artificial intelligence (XAI): Toward medical XAI. IEEE Transactions on Neural Networks and Learning Systems. 2020.

https://www.health.ny.gov/statistics/sparcs/docs/sparcs_data_dictionary.xlsx . Accessed 28 June 2024.

Design and development of the Diagnosis Related Group (DRG). https://www.cms.gov/icd10m/version37-fullcode-cms/fullcode_cms/Design_and_development_of_the_Diagnosis_Related_Group_(DRGs).pdf . Accessed 5 Oct 2022.

ARTICLE 28, Hospitals, Public Health (PBH) CHAPTER 45. 2023. Available: https://www.nysenate.gov/legislation/laws/PBH/A28 . Accessed 28 June 2024.

Gilmore‐Bykovskyi A, et al. Disparities in 30‐day readmission rates among Medicare enrollees with dementia. J Am Geriatr Soc. 2023.

Rodríguez P, Bautista MA, Gonzalez J, Escalera S. Beyond one-hot encoding: Lower dimensional target embedding. Image Vis Comput. 2018;75:21–31.

Montgomery DC, Peck EA, Vining GG. Introduction to linear regression analysis. 6th ed. John Wiley & Sons; 2021. ISBN-13 978-1119578727.

Random forest regressor in sklearn. Available: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html . Accessed 28 June 2024.

Breiman L. Random forests. Mach Learn. 2001;45:5–32.

Svetnik V, Liaw A, Tong C, Culberson JC, Sheridan RP, Feuston BP. Random forest: a classification and regression tool for compound classification and QSAR modeling. J Chem Inf Comput Sci. 2003;43(6):1947–58.

Liaw A, Wiener M. Classification and regression by randomForest. R news. 2002;2(3):18–22.

Böhning D. Multinomial logistic regression algorithm. Ann Inst Stat Math. 1992;44(1):197–200.

Vaid A, et al. Machine Learning to Predict Mortality and Critical Events in a Cohort of Patients With COVID-19 in New York City: Model Development and Validation. J Med Internet Res. 2020;22(11):e24018.

Density Estimation.  https://scikit-learn.org/stable/modules/density.html . Accessed 5 Oct 2022.

CatBoost, a high-performance open source library for gradient boosting on decision trees. Available:  https://catboost.ai/  and https://catboost.ai/en/docs/concepts/python-usages-examples . Accessed 28 June 2024.

PyTorch documentation for torch.nn, the basic building blocks for graphs. Available: https://pytorch.org/docs/stable/nn.html . Accessed 28 June 2024.

Kingma DP, Ba J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980. 2014.

Prokhorenkova L, Gusev G, Vorobev A, Dorogush AV, Gulin A. CatBoost: unbiased boosting with categorical features. arXiv preprint arXiv:1706.09516. 2017.

Tharwat A. Classification assessment methods. Applied computing and informatics. 2020;17(1):168–92.

Brier GW. Verification of forecasts expressed in terms of probability. Mon Weather Rev. 1950;78(1):1–3.

DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988:837–45.

Baeza FL, da Rocha NS, Fleck MP. Predictors of length of stay in an acute psychiatric inpatient facility in a general hospital: a prospective study. Brazilian Journal of Psychiatry. 2017;40:89–96.

Bertsimas D, et al. Algorithmic prediction of health-care costs. Oper Res. 2008;56(6):1382–92.

Kshirsagar R. Accurate and Interpretable Machine Learning for Transparent Pricing of Health Insurance Plans. Presented at the AAAI 2021 Conference. 2021.

Ulmer J, Painter-Davis N, Tinik L. Disproportional imprisonment of Black and Hispanic males: Sentencing discretion, processing outcomes, and policy structures. Justice Q. 2016;33(4):642–81.

Angwin J, Larson J, Mattu S, Kirchner L. Machine bias: There’s software used across the country to predict future criminals. And it’s biased against blacks. ProPublica. 2016.

Steil JP, Albright L, Rugh JS, Massey DS. The social structure of mortgage discrimination. Hous Stud. 2018;33(5):759–76.

Cots F, Mercadé L, Castells X, Salvador X. Relationship between hospital structural level and length of stay outliers: Implications for hospital payment systems. Health Policy. 2004;68(2):159–68.

Evans M, McGinty T. Hospital Prices Are Arbitrary. Just Look at the Kingsburys’ $100,000 Bill. Wall Street J. 2021. https://www.wsj.com/articles/hospital-prices-arbitrary-healthcare-medical-bills-insurance-11635428943 . Accessed 28 June 2024.

Evans M. Hospitals Often Charge Uninsured People the Highest Prices, New Data Show. Wall Street J. 2021. https://www.wsj.com/articles/hospitals-often-charge-uninsured-people-the-highest-prices-new-data-show-11625584448 . Accessed 28 June 2024.

Kullgren JT, et al. A survey of Americans with high-deductible health plans identifies opportunities to enhance consumer behaviors. Health Aff. 2019;38(3):416–24.

Wetsman N. Hospitals are selling treasure troves of medical data — what could go wrong? The Verge. 2021. Available: https://www.theverge.com/2021/6/23/22547397/medical-records-health-data-hospitals-research . Accessed 28 June 2024.

Hripcsak G, et al. Observational Health Data Sciences and Informatics (OHDSI): Opportunities for Observational Researchers. Stud Health Technol Inform. 2015;216:574–8.


Gabarron E, Dorronzoro E, Rivera-Romero O, Wynn R. Diabetes on Twitter: a sentiment analysis. J Diabetes Sci Technol. 2019;13(3):439–44.

Statt N. Twitter is opening up its full tweet archive to academic researchers for free. The Verge. 2021. Available: https://www.theverge.com/2021/1/26/22250203/twitter-academic-research-public-tweet-archive-free-access . Accessed 28 June 2024. 

Evans M, Mathews AW, McGinty T. Hospitals Still Not Fully Complying With Federal Price-Disclosure Rules. Wall Street J. 2021. https://www.wsj.com/articles/hospital-price-public-biden-11640882507 .

Johnson AE, et al. MIMIC-III, a freely accessible critical care database. Scientific data. 2016;3(1):1–9.


Acknowledgements

We are grateful to the New York State SPARCS program for making the data available freely to the public. We greatly appreciate the feedback provided by the anonymous reviewers which helped in improving the quality of this manuscript.

No external funding was available for this research.

Author information

Authors and affiliations

Indian Institute of Technology, Delhi, India

Raunak Jain, Mrityunjai Singh & Rahul Garg

Fairleigh Dickinson University, Teaneck, NJ, USA

A. Ravishankar Rao


Contributions

Raunak Jain, Mrityunjai Singh, A. Ravishankar Rao, and Rahul Garg contributed equally to all stages of preparation of the manuscript.

Corresponding author

Correspondence to A. Ravishankar Rao.

Ethics declarations

Ethics approval and consent to participate

Not applicable as no human subjects were used in our study.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary material 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Jain, R., Singh, M., Rao, A.R. et al. Predicting hospital length of stay using machine learning on a large open health dataset. BMC Health Serv Res 24, 860 (2024). https://doi.org/10.1186/s12913-024-11238-y


Received: 19 June 2023

Accepted: 24 June 2024

Published: 29 July 2024

DOI: https://doi.org/10.1186/s12913-024-11238-y


  • Machine learning
  • Artificial intelligence
  • Health informatics
  • Open-source software
  • Healthcare analytics

BMC Health Services Research

ISSN: 1472-6963


Volume 8, Issue 7

War-related sexual and gender-based violence in Tigray, Northern Ethiopia: a community-based study


  • Girmatsion Fisseha 1 ,
  • Tesfay Gebregzabher Gebrehiwot 2 ,
  • Mengistu Welday Gebremichael 3 ,
  • Shishay Wahdey 1 ,
  • http://orcid.org/0000-0001-6563-0001 Gebrekiros Gebremichael Meles 1 ,
  • http://orcid.org/0000-0002-5874-3304 Kebede Embaye Gezae 1 ,
  • Awol Yemane Legesse 4 ,
  • Akeza Awealom Asgedom 1 ,
  • Mache Tsadik 1 ,
  • Abraha Woldemichael 1 ,
  • Aregawi Gebreyesus 1 ,
  • Haftom Temesgen Abebe 1 ,
  • Yibrah Alemayehu Haile 5 ,
  • Selome Gezahegn 6 , 7 ,
  • Maru Aregawi 8 ,
  • http://orcid.org/0000-0003-2303-8493 Kiros T Berhane 9 ,
  • Hagos Godefay 5 ,
  • Afework Mulugeta 1
  • 1 School of Public Health , Mekelle University College of Health Sciences , Mekelle , Tigray , Ethiopia
  • 2 Epidemiology , Mekelle University College of Health Sciences , Mekelle , Tigray , Ethiopia
  • 3 Department of Midwifery , Mekelle University College of Health Sciences , Mekelle , Tigray , Ethiopia
  • 4 School of Medicine , Mekelle University College of Health Sciences , Mekelle , Tigray , Ethiopia
  • 5 Tigray Health Bureau , Mekelle , Tigray , Ethiopia
  • 6 Hennepin Healthcare , Minneapolis , Minnesota , USA
  • 7 University of Minnesota Medical School , Minneapolis , Minnesota , USA
  • 8 Global Malaria Program , World Health Organization , Geneva , Switzerland
  • 9 Biostatistics , Columbia University , New York , New York , USA
  • Correspondence to Professor Kiros T Berhane; kiros.berhane{at}columbia.edu

Introduction Sexual and gender-based violence (SGBV) during armed conflicts has serious ramifications with women and girls disproportionally affected. The impact of the conflict that erupted in November 2020 in Tigray on SGBV is not well documented. This study is aimed at assessing war-related SGBV in war-affected Tigray, Ethiopia.

Methods A community-based survey was conducted in 52 (out of 84) districts of Tigray, excluding its western zone and some districts bordering Eritrea for security reasons. Using multistage cluster sampling, a total of 5171 women of reproductive age (15–49 years) were randomly selected and included in the study. The analysis used weighted descriptive statistics, regression modelling and tests of association.

Results Overall, 43.3% (2241/5171) of women experienced at least one type of gender-based violence. The proportions of women of reproductive age who experienced sexual, physical and psychological violence, and rape, were 9.7% (500/5171), 28.6% (1480/5171), 40.4% (2090/5171) and 7.9% (411/5171), respectively. Rape accounted for 82.2% (411/500) of sexual violence cases, and 68.4% (247/361) of the rape survivors who answered the question reported being gang raped. Young women (aged 15–24 years) were the most affected by sexual violence, accounting for 29.2% (146/500) of cases. Commonly reported SGBV-related consequences were physical trauma, 23.8% (533/2241); sexually transmitted infections, 16.5% (68/411); HIV infection, 2.7% (11/411); unwanted pregnancy, 9.5% (39/411); and depression, 19.2% (431/2241). Most survivors (89.7%) did not receive any post-violence medical or psychological support.

Conclusions Systemic war-related SGBV was prevalent in Tigray, with gang-rape as the most common form of sexual violence. Immediate medical and psychological care, and long-term rehabilitation and community support for survivors are urgently needed and recommended.

  • health policy
  • public health
  • community-based survey

Data availability statement

Data are available on reasonable request.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:  http://creativecommons.org/licenses/by-nc/4.0/ .

https://doi.org/10.1136/bmjgh-2022-010270


WHAT IS ALREADY KNOWN ON THIS TOPIC

Conflict-related sexual and gender-based violence (SGBV) is known to have serious immediate and long-term adverse societal impacts, with women and girls affected the most.

WHAT THIS STUDY ADDS

This study provides first-of-its-kind, systematically collected primary data on the scale of SGBV in the Tigray region, Northern Ethiopia, resulting from the conflict that erupted in November 2020.

HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY

By providing carefully collected evidence on the level and impact of SGBV in the Tigray region, the study findings will help policymakers develop models for working with women who have experienced SGBV in the context of war, establish accountability for the atrocities committed and plan for the unprecedented medical, psychological and rehabilitation needs of SGBV survivors.

Introduction

Sexual and gender-based violence (SGBV) is a worldwide phenomenon that crosses geographical, cultural, social, economic, ethnic and other boundaries. It is violence inflicted on the basis of gender differences. 1 War-related SGBV has significant and severe adverse social impact, both during and after the conflict. SGBV against women is often committed on a massive scale during wars and conflicts: women and girls are disproportionately targeted, systematically raped, intimidated, sexually and physically abused, forced into unwanted pregnancies and/or killed. 2–6

War-related sexual and human rights abuses remain prevalent globally. They occur mostly during conflicts in low-income countries (especially in Africa), but also in some high-income countries. 7–15 Independent studies in many countries, including in Africa, have reported a prevalence of sexual violence ranging from 2.6% in the current war in Ukraine to 21.3% in South Sudan during the civil war between 2005 and 2011. 8–15 However, sexual violence in most conflicts has not been well assessed, for reasons including prolonged periods of conflict and sociocultural and other complex issues, especially in low-income countries. 7

Most sexually abused women suffer emotional breakdowns, especially those from rural communities where moral codes are strict. Raped women do not routinely report the incidents for fear of family alienation and stigmatisation by their communities. In low-income countries, raped daughters are often disowned by their parents, and raped wives rejected by their husbands. 16

Many women who become pregnant after rape undergo ‘back-street’ abortions that put their lives at risk. Some cannot bear to look at their babies; others give them away. 5 6 During a conflict, men and women often lose their lives from various causes and are likely to be tortured and abused in various ways for biological, psychological or socio-economic reasons. While relatively more men are killed during wars, women often experience violence, forced pregnancy, abduction, sexual abuse and slavery. The harm, silence and shame women experience because of war are pervasive, and redress is almost non-existent. Concrete action on the situation of women in armed conflicts has been systematically neglected, 5 17 although the United Nations has designated SGBV as a war crime in Article 8 of the Rome Statute of the International Criminal Court. 18

On 4 November 2020, war erupted in the Tigray region of Ethiopia following years of growing tension between the federal government of Ethiopia and the regional government of Tigray. The causes and development of the war in Tigray were highly complex and multidimensional, involving the interests of both internal and external parties. 19 A complete account and analysis of the causes of the conflict is beyond the scope of this manuscript. Several parties were involved in the war, with the Tigray regional special forces on one side and allied forces, namely the Ethiopian National Defence Forces, the Amhara regional special armed forces and Amhara militias, and the Eritrean Defence Forces, on the other. 20 21

The war in Tigray that erupted at the beginning of November 2020 has resulted in a massive humanitarian crisis. Preliminary reports have shown that Tigrayan women and girls experienced deliberate, organised and widespread war-related SGBV, with some subjected to severe violence including gang rape and the insertion of foreign objects into their reproductive organs. 20 According to a Human Rights Watch (HRW) report, 2204 survivors sought services for sexual violence at health facilities across Tigray from November 2020 through June 2021. 21 This figure is likely an undercount, because many victims had poor access to healthcare and some were reluctant to seek care for fear of stigmatisation. In addition, many health facilities were non-functional: the war eroded more than two decades of investment and progress in the health system, leaving 70% of health institutions either destroyed or of unknown status. 20 22

Furthermore, a recent survey found that only 17.5% of health centres were functional 6 months into the war. 22 Most of the available evidence on SGBV in Tigray during the conflict was based on reports from the few functioning health facilities and is therefore likely to be unrepresentative. The scale and burden of war-related violence at community level in Tigray are also not comprehensively known. 21 The purpose of this study was therefore to determine the extent and distribution of SGBV and its impact on survivors, using a community-level survey covering the first round of active fighting in Tigray. The findings are intended to serve as baseline data on the burden, severity and associated factors of SGBV during the war in Tigray, Ethiopia. They also provide data for humanitarian agencies and national and local authorities to deliver comprehensive medical and psychological support to survivors, and to reduce the burden of SGBV against women and girls during wars and conflicts in Tigray and elsewhere. Finally, the findings provide guidance on the need for, and availability of, health services for SGBV survivors, including further interventions to establish medical and psychological services and continuous follow-up and support for survivors.

A community-based survey was conducted in six zones of Tigray after the Eritrean, Ethiopian and Amhara forces left Mekelle, the capital of Tigray. By 28 June 2021, the Regional Government of Tigray had restored administrative control over most parts of Tigray. After the withdrawal of the allied forces, active conflict was reduced in the parts of Tigray under the control of the regional forces of the Government of Tigray. The survey was therefore conducted during 4–20 August 2021, immediately after the allied forces withdrew from most parts of Tigray. The western zone of Tigray and the districts bordering Eritrea were not included for security reasons.

Women of reproductive age (ie, 15–49 years) recruited from the study communities were the primary respondents in this survey. Information on girls under 15 years and women aged 50 years and older was also collected from the primary respondents and is presented separately. The study period for SGBV incidents was 4 November 2020 to 28 June 2021.

Multistage cluster sampling was used to select women of reproductive age from selected households (HHs). The Tabiya/Kebelle (the smallest administrative unit) was treated as a cluster. A total of 52 of the 84 districts in the six zones of Tigray were randomly selected; these 52 districts account for 64.4% of the Tigray population. From each of these districts, 4 Tabiyas/clusters were randomly selected, and from each cluster, 20 HHs were randomly selected, giving a total sample size of 4160 HHs. If a selected HH had more than one woman of reproductive age, only one woman was randomly selected for interview.
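The selection described above (districts, then Tabiyas/clusters within each district, then households within each cluster, then one eligible woman per household) can be sketched as simple random sampling at each stage. This is a minimal illustration, not the study's procedure: the frame structure, the identifiers and the use of Python's `random` module are assumptions, and the toy frame (84 districts of 10 clusters with 100 households each) is hypothetical.

```python
import random

def multistage_sample(frame, n_districts, n_clusters, n_households, seed=None):
    """Simple random sampling at each stage of a hypothetical sampling frame.

    frame: dict mapping district -> dict mapping cluster -> list of household ids.
    Returns a list of (district, cluster, household) tuples.
    """
    rng = random.Random(seed)
    selected = []
    for district in rng.sample(sorted(frame), n_districts):
        clusters = frame[district]
        for cluster in rng.sample(sorted(clusters), n_clusters):
            for household in rng.sample(clusters[cluster], n_households):
                selected.append((district, cluster, household))
    return selected

def pick_respondent(eligible_women, rng):
    """Pick one woman aged 15-49 at random when a household has several."""
    return rng.choice(eligible_women)

# Hypothetical frame: 84 districts x 10 clusters x 100 households.
toy_frame = {
    f"D{d}": {f"D{d}-C{c}": [f"D{d}-C{c}-H{h}" for h in range(100)] for c in range(10)}
    for d in range(84)
}

sample = multistage_sample(toy_frame, n_districts=52, n_clusters=4, n_households=20, seed=1)
print(len(sample))  # 4160 = 52 * 4 * 20
respondent = pick_respondent(["woman_a", "woman_b", "woman_c"], random.Random(1))
```

Seeding the generator makes the draw reproducible. The same 52 × 4 clusters, with two interviewers per cluster, also account for the 416 HEWs reported in the data-collection section.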

Taking the prevailing situation on the ground into consideration, two sampling approaches (random and purposive) were designed, each with its own sample. In the random approach, the list of all Tabiyas/clusters in the district was used as the sampling frame and four Tabiyas/clusters were selected randomly. In the purposive approach, the Tabiyas/clusters were grouped into moderately and severely war-affected ones; the list of severely affected Tabiyas/clusters was used as the sampling frame and four of them were randomly selected (figure 1). The ‘random group’ and ‘purposive group’ were defined using information on the local war situation obtained from local administrative authorities at field level. The ‘purposive group’ comprised clusters whose communities had sustained repeated fighting, a longer presence of combatants and harassment of community members. The ‘random group’, by contrast, comprised randomly selected clusters irrespective of active fighting and other characteristics of the prevailing situation. At the outset, we planned to select four distinct Tabiyas/clusters per district for each group. However, because some districts had only a few Tabiyas/clusters, some Tabiyas/clusters were selected in both groups, so HH data from this subset of Tabiyas/clusters were shared by both groups. A total of 3693 HHs were included in the purposive group, of which 1489 were unique to it. From the random group, 3682 HHs were randomly selected, of which 1478 were unique to it. Thus, 2204 HHs were included in both groups. In the preliminary analysis, there were no significant differences in sexual violence/rape between the two groups (online supplemental table 1).
The findings reported in this study are therefore based on the merged data of 5171 HHs. The study also captured SGBV incidents affecting other HH members (including underage girls, men and older women), as reported by the index woman interviewed in each HH. Detailed interviews were then conducted with each reported HH member about the types of violence experienced and their consequences. In total, the dataset for the final analysis included 5171 women of reproductive age (15–49 years), 1196 men, 53 girls aged <15 years and 227 women aged 50 years and older.
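The group totals and unique counts reported for the purposive and random groups determine both the overlap and the merged sample by inclusion-exclusion: the shared households equal each group's total minus its unique households, and the merged sample is the two unique counts plus the shared count. A minimal check in Python (the function name is hypothetical) is:

```python
def merge_overlapping_groups(purposive_total, purposive_unique, random_total, random_unique):
    """Return (shared, merged) household counts for two overlapping samples."""
    shared_purposive = purposive_total - purposive_unique
    shared_random = random_total - random_unique
    # Both groups must agree on how many households they share.
    if shared_purposive != shared_random:
        raise ValueError(f"inconsistent overlap: {shared_purposive} vs {shared_random}")
    merged = purposive_unique + random_unique + shared_purposive
    return shared_purposive, merged

shared, merged = merge_overlapping_groups(3693, 1489, 3682, 1478)
print(shared, merged)  # 2204 5171
```

Both groups imply the same overlap of 2204 households, and the merged total reproduces the 5171 HHs analysed.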

Supplemental material


Figure 1 The sampling framework for the study. HHs, households.

The outcome variables of this study were the types of SGBV and the consequences of SGBV. For modelling the SGBV outcomes, age, residence, religion, occupation, education, reproductive health characteristics, family member violence and health-facility utilisation were used as explanatory variables. For the consequences of SGBV, type of SGBV, age and education were used as explanatory variables.
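Rape is a binary outcome, and the analysis modelled it with logistic regression in STATA. The Python sketch below fits a logistic model with a single binary explanatory variable by Newton-Raphson, to show what the fitted coefficient means; it is not the authors' model, which used several explanatory variables, and the 2x2 counts are hypothetical. With one binary predictor the model is saturated, so exp(b1) should reproduce the crude odds ratio from the 2x2 table, which serves as a check.

```python
import math

def fit_logistic(xs, ys, iterations=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0          # score (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0  # Fisher information matrix entries
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve (information matrix) * step = score for the 2x2 system.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def odds_ratio_2x2(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Crude odds ratio from a 2x2 table."""
    return (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)

# Hypothetical data: x = 1 for a hypothetical exposure group, y = 1 if the outcome occurred.
xs = [1] * 100 + [0] * 200
ys = [1] * 30 + [0] * 70 + [1] * 20 + [0] * 180

b0, b1 = fit_logistic(xs, ys)
print(round(math.exp(b1), 4), round(odds_ratio_2x2(30, 70, 20, 180), 4))  # 3.8571 3.8571
```

A fixed iteration count keeps the sketch short; a production fit would stop when the step size falls below a tolerance and would also report standard errors from the inverse information matrix.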

Data collection, management and analysis

Data collection and field coordination were challenging because of the ongoing war. Telephone networks, electricity, transportation and other services were unavailable in Tigray because of the war and siege. 21 For these and other security-related reasons, electronic data collection tools, including mobile applications, could not be used. We therefore used a standard, validated, interviewer-administered paper-based questionnaire adapted from the tool used in the WHO multicountry study 23 (online supplemental file 1). The questionnaire was translated from English into Tigrigna (the local language) and then back-translated into English by a different translator to ensure consistency. It included sections on sociodemographic and reproductive health characteristics, SGBV, its consequences, coping mechanisms, violence against the respondent and family members, and health-facility utilisation by victims.

We recruited two supervisors per district, each with an MSc or higher degree, from the College of Health Sciences of Mekelle University, the Tigray Health Research Institute and the Tigray Regional Health Bureau. They were trained for 5 days in Mekelle, the capital of Tigray, on the objectives of the study and the administration of the tool. Transportation for fieldwork was arranged 1 week before data collection began, to allow the recruitment of competent female health extension workers (HEWs) as data collectors, the mapping of each selected cluster and the listing of HHs in the selected clusters. Two female HEWs served as interviewers (data collectors) in each selected Tabiya/cluster, and a total of 416 HEWs participated across the 52 districts. HEWs were assigned one per group, for the ‘purposive group’ and the ‘random group’. The supervisors trained the data collectors for 3 days in each district. A representative of the health office in each selected district took part in the orientation of the HEWs to support data collection. The HEWs then completed a 1-day field exercise and pretest of the tool in the same community, in HHs that were not sampled for the study, and the tool was further adapted to maximise the validity and reliability of the collected data. The supervisors used the closest health facility for accommodation and the district health office as a meeting place. They also checked the completed questionnaires daily for unclear or incomplete information and coding errors, and any errors were addressed in the field. The entire data collection process was coordinated and supervised by six teams of investigators (one per zone), and challenges arising during data collection were addressed promptly.

The collected data were entered into EpiData V.3.1. Data quality was further ensured during data entry and cleaning through visualisation and thorough correction of errors and outliers, and rejected responses were re-evaluated. All these processes were carried out together with senior biostatisticians. Descriptive statistical analyses, with sampling weights as necessary, were used for tabulation, cross-tabulation and computation of frequencies and percentages for selected variables. A sensitivity analysis was conducted by applying weights for each selected district (Woreda) in the descriptive analysis. The χ² test was used to test for associations between SGBV types and the consequences of SGBV, with attention to potential corrections for multiple comparisons. 24 A logistic regression model for binary outcomes was also used to assess the factors associated with war-related rape in the study area. All analyses were performed using STATA V.15.1 (StataCorp, College Station, Texas, USA), and associations were considered statistically significant at p<0.05.
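The χ² test of association can be illustrated on a 2x2 table, for example one SGBV type against one consequence. The sketch below computes Pearson's statistic without continuity correction and, for one degree of freedom, the p-value as erfc(sqrt(χ²/2)); the counts are hypothetical and the study itself used STATA. The methods do not name the multiple-comparison correction, so Bonferroni is shown here only as one common option.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (no continuity correction).

    Returns (statistic, p_value). With 1 degree of freedom the chi-square
    survival function equals erfc(sqrt(statistic / 2)).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic, math.erfc(math.sqrt(statistic / 2))

def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical counts: rows = SGBV type present/absent, columns = consequence yes/no.
stat, p = chi_square_2x2([[30, 70], [20, 180]])
print(round(stat, 2), p < 0.05)  # 19.2 True
print(bonferroni([0.01, 0.04, 0.5]))
```

For tables larger than 2x2 the degrees of freedom rise and the erfc shortcut no longer applies; a library routine such as SciPy's `chi2_contingency` handles the general case.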

Measurements

SGBV was measured according to WHO standard guidelines. 23

Sexual violence

Includes, at least, rape/attempted rape, sexual abuse and sexual exploitation. Sexual violence is any sexual act, attempt to obtain a sexual act, unwanted sexual comments or advances or acts to traffic a person’s sexuality, using coercion, threats of harm or physical force, by any person regardless of relationship to the victim, in any setting, including but not limited to home and work. Sexual violence can take many forms, including rape, sexual slavery and/or trafficking, forced pregnancy, sexual harassment, sexual exploitation and/or abuse and forced abortion.

Rape was defined as any act of non-consensual sexual intercourse experienced by a woman. This can include the invasion of any part of the body with a sexual organ and/or the invasion of the genital or anal opening with any object or body part. Rape and attempted rape involve the use of force, threat of force and/or coercion. Any penetration without consent is considered rape; efforts to rape someone that do not result in penetration are considered attempted rape.

Physical violence

An act of physical violence that is not sexual in nature. A woman was classified as having experienced physical violence if she had experienced at least one of the following: hitting, slapping, choking, cutting, shoving, burning, shooting or the use of any weapon, acid attacks or any other act that results in pain, discomfort or injury. This incident type does not include female genital mutilation or cutting.

Psychological violence

Infliction of mental or emotional pain or injury. A woman was classified as having experienced psychological violence if she had experienced at least one of the following: threats of physical or sexual violence, intimidation, humiliation, forced isolation, stalking, harassment, unwanted attention, remarks, gestures or written words of a sexual and/or menacing nature, destruction of cherished things, etc.
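Each definition above is applied as an "at least one item" rule, and the overall SGBV measure counts a woman if she meets any of the sexual, physical or psychological definitions. A minimal sketch of that logic follows. The item codes are hypothetical shorthand for the listed acts (the questionnaire's own coding is in online supplemental file 1 and is not reproduced here), and the rape/attempted-rape rule follows the penetration-without-consent definition.

```python
from typing import Iterable, Optional

# Hypothetical item codes standing in for the acts listed in each definition.
SEXUAL_ITEMS = frozenset({"rape", "attempted_rape", "sexual_abuse", "sexual_exploitation",
                          "sexual_slavery", "forced_pregnancy", "forced_abortion"})
PHYSICAL_ITEMS = frozenset({"hitting", "slapping", "choking", "cutting", "shoving",
                            "burning", "weapon_use", "acid_attack"})
PSYCHOLOGICAL_ITEMS = frozenset({"threats", "intimidation", "humiliation", "forced_isolation",
                                 "stalking", "harassment", "destruction_of_cherished_things"})

def classify_assault(penetration: bool, consented: bool, attempted: bool) -> Optional[str]:
    """Rape = penetration without consent; attempted rape = attempt without penetration."""
    if consented:
        return None
    if penetration:
        return "rape"
    if attempted:
        return "attempted_rape"
    return None

def violence_flags(reported_items: Iterable[str]) -> dict:
    """Flag each violence type if at least one of its items was reported."""
    reported = set(reported_items)
    flags = {
        "sexual": bool(reported & SEXUAL_ITEMS),
        "physical": bool(reported & PHYSICAL_ITEMS),
        "psychological": bool(reported & PSYCHOLOGICAL_ITEMS),
    }
    # A woman counts toward overall SGBV if any type applies.
    flags["any_sgbv"] = any(flags.values())
    return flags

print(violence_flags(["slapping", "threats"]))
print(classify_assault(penetration=False, consented=False, attempted=True))  # attempted_rape
```

Keeping the item lists as sets makes each "at least one" rule a single intersection test.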

‘Purposive group’

Tabiyas/clusters selected randomly from the severely war-affected Tabiyas/clusters. Tabiyas/clusters were considered severely affected if they had been exposed to sustained and/or intermittent active fighting, a long presence of combatants in the area and harassment of community members.

‘Random group’

Tabiyas/clusters selected randomly from a given district, using the list of all Tabiyas/clusters (moderately and severely affected) as the sampling frame. Selection was random, regardless of active fighting and other characteristics of the situation.

Calculation of weights

The lists of districts and Tabiyas/clusters included in the ‘random group’ and ‘purposive group’, with the population of women aged 15–49 years in each, were obtained from the Tigray Regional Health Bureau for 2020/21. For each group, three columns were then created in Excel: district, total sample taken and total population of women in the selected clusters of the district. To calculate the weights, we summed the total samples and the total population in each group separately. Weights were then calculated as follows:

where w1 (district in purposive group) = total population in each district in the purposive group / total population in the purposive group; w1 (district in random group) = total population in each district in the random group / total population in the random group; w2 (district in purposive group) = total sample in each district in the purposive group / total sample in the purposive group; and w2 (district in random group) = total sample in each district in the random group / total sample in the random group.

Weighting data were then generated in STATA. The weight assigned to each selected district is given in online supplemental table 4.
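The equation combining w1 and w2 is not reproduced in the text. A common construction, and the one assumed in the sketch below, is a district weight equal to the population share divided by the sample share, w = w1 / w2, so that a district sampled in proportion to its population gets a weight of 1; the authors' exact formula may differ. District names and counts are hypothetical, and the study computed its weights in STATA rather than Python.

```python
def district_weights(population, sample):
    """Assumed weight per district: (population share) / (sample share) within one group.

    population, sample: dicts mapping district -> count for a single group.
    """
    total_population = sum(population.values())
    total_sample = sum(sample.values())
    weights = {}
    for district in population:
        w1 = population[district] / total_population  # district share of the group population
        w2 = sample[district] / total_sample          # district share of the group sample
        weights[district] = w1 / w2
    return weights

def weighted_prevalence(records, weights):
    """records: iterable of (district, outcome) pairs with outcome coded 0 or 1."""
    numerator = denominator = 0.0
    for district, outcome in records:
        w = weights[district]
        numerator += w * outcome
        denominator += w
    return numerator / denominator

# Hypothetical group: district A is three times as populous as B, but both were sampled equally.
weights = district_weights({"A": 3000, "B": 1000}, {"A": 80, "B": 80})
records = [("A", 1)] * 8 + [("A", 0)] * 72 + [("B", 1)] * 24 + [("B", 0)] * 56
print(weights)                                          # {'A': 1.5, 'B': 0.5}
print(round(weighted_prevalence(records, weights), 3))  # 0.15, versus 32/160 = 0.2 unweighted
```

The weighted estimate equals the population-share average of the district prevalences (0.75 × 0.10 + 0.25 × 0.30), which is the point of weighting an equal-allocation sample.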

Patient and public involvement

The research topic was directly motivated by the unprecedented experience of the people of Tigray during the devastating conflict and by the need to document conflict-related SGBV. Study subjects participated by voluntarily responding to the survey instruments. Findings from the study are disseminated to local communities and the appropriate authorities to inform policy decisions on restoring health services and providing much-needed medical and psychological treatment for SGBV victims.

Ethical considerations

Because the interviews were conducted by trained female HEWs, the risk of harm or discomfort from this assessment was anticipated to be minimal. All completed questionnaires were anonymised using subject-level numerical identifiers and kept confidential.

Sociodemographic characteristics of women of reproductive age

In this study, information on 5171 women of reproductive age from six zones was included. Participants were mainly from the central (32.3%), eastern (28.2%) and north-west (15.2%) zones of Tigray. The median age was 32 years (IQR 26–38). Roughly half of the participants lived in rural areas (2602, 50.3%). More than a third of the women interviewed were unable to read or write (1879, 36.3%); 70% were married, and 34.5% reported their occupation as housewife and 29.9% as farmer (online supplemental table 2). To enable contextual comparisons, the current population profile of Tigray, Northern Ethiopia, is included in online supplemental table 3.

Reproductive and obstetric characteristics of women aged 15–49 years

Of the 5171 surveyed women, 3476 (67.2%) had at least one child under 5 years of age, and 422 (8%) were pregnant at the time of the survey. Of these pregnancies, 75 (17.8%) were unwanted. A total of 137 (2.7%) women had an abortion during the study period (the first 8 months of the war) (online supplemental table 5).

Magnitude of sexual and gender-based violence

The proportions of women who experienced sexual, physical and psychological violence were 9.7%, 28.6% and 40.4%, respectively. Detailed disaggregated data on types of violence are provided in online supplemental table 6. Overall, 43.3% (2241/5171) of women experienced at least one type of gender-based violence (psychological, physical or sexual violence). Many women experienced more than one form of violence: nearly 7.4% of the women who experienced sexual violence had also experienced both physical and psychological violence (figure 2).

Figure 2 War-time prevalence of sexual and gender-based violence (SGBV) among women of reproductive age in 52 districts of Tigray (n=5171), 4 November 2020 to 28 June 2021.

Sexual and gender-based violence in relation to some sociodemographic characteristics

The level of SGBV varied among zones and by age, residence, education and marital status. Sexual, physical and psychological violence were highest in the central zone, followed by the eastern zone of Tigray. Of the 500 women who reported experiencing sexual violence, 202 (40.4%) were from the central zone and 144 (28.8%) from the eastern zone. Similarly, young women aged 15–24 years were the age group most affected by sexual violence (146; 29.2%), whereas women aged 35–39 years were the most affected by physical (307/1480; 20.7%) and psychological (459/2091; 21.9%) violence. Women of reproductive age living in urban areas were more affected by sexual violence (243; 48.6%), whereas rural residents were more affected by physical (660; 44.6%) and psychological (1011; 48.3%) violence. Women with no formal education (both those ‘unable to read and write’ and those ‘able to read and write’) were the group most affected by sexual violence (249; 43.8%). Of the 692 pregnant women who responded to the violence-related questions for the war period, 62 (12.6%) reported being sexually abused (online supplemental table 7).

About 9.7% (500/5171) of women of reproductive age experienced sexual violence, and 7.9% (411/5171) were raped during the study period; that is, 411 of the 500 women who experienced sexual violence (82.2%) reported being raped (online supplemental table 7). Sexual violence was often repeated: of the 411 women who reported being raped, 271 (65.9%) were raped once, 62 (15.1%) twice and 78 (19.0%) three times or more (figure 3). Women were sexually abused by a median of three soldiers (IQR 2–6), up to a maximum of nine. Most raped women reported being gang-raped (68.4%; 247/361); the remaining 50 raped women did not respond to this question. The largest share (45.9%) of the sexual violence against women of reproductive age took place during the first 150 days of the war, while 26.3% occurred during the subsequent 86 days, a period that corresponded to the second round of a large-scale military operation by the allied forces against Tigray forces.

Frequency and type of rape during Tigray war, 4 November 2020 to 28 June 2021.

Consequences of SGBV

The consequences of sexual, psychological and physical violence during the conflict ranged from behavioural problems and injury to potentially lifelong health and physical complications. In this study, physical trauma was the most common consequence of SGBV (28.3%; 533/2241), including dislocations (84 cases), fractures or broken bones (37 cases), perforated eardrums or eye injuries (20 cases) and broken teeth (8 cases). Severe consequences of sexual violence among those who visited a health facility included HIV infection (11 cases), sexually transmitted infections other than HIV (68 cases), unwanted pregnancy (39 cases) and others. Emotional and behavioural consequences of physical and sexual violence were also common, including depression (19.2%; 431/2241), social isolation (3.8%; 85/2241), suicidal ideation and attempt (2.6%; 58/2241) and others. As a result of SGBV, the majority of women reported emotional changes such as stress (84.1%; 1438/1709) and anxiety (11.5%; 197/1709), as well as flashbacks of the incident, nightmares or sleep disturbance. Although many women of reproductive age experienced various forms of violence, nearly 90% of them (1629/1817) did not receive any medical care at a health facility (424 women did not report whether they used healthcare after the violence). The main reasons given for not receiving healthcare were destruction of health facilities (52.9%; 618/1169), victims’ disappointment leading to hopelessness (27.0%; 316/1169) and other reasons, including physical disability and lack of transport (table 1).


Consequences of SGBV in women of reproductive age from Tigray, 4 November 2020 to 28 June 2021

Physically abused women were significantly more likely to report physical trauma, depression, suicidal ideation or attempt and social isolation (p<0.001). Similarly, sexually abused women were significantly more likely to report physical trauma, excessive worry, depression and suicidal ideation or attempt (p<0.001) (table 2). These associations remained statistically significant after Bonferroni-type corrections for multiple comparisons within each type of violence (ie, physical or sexual).

Consequences of SGBV in relation to the types of violence and sociodemography of women, Tigray, 2021
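The Bonferroni-type correction mentioned above can be illustrated with its simplest (classic) form, which tests each of m comparisons at level α/m, or equivalently multiplies each p value by m (capping it at 1) and compares the result with α. The paper does not state which Bonferroni-type variant was used, so the Python sketch below is only an illustration; the function names, α = 0.05 and the p values are hypothetical and do not come from the study.

```python
def bonferroni_adjust(p_values):
    """Classic Bonferroni adjustment: multiply each p value by m, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag which comparisons stay significant at family-wise level alpha."""
    return [p_adj < alpha for p_adj in bonferroni_adjust(p_values)]


# Hypothetical p values for four outcomes tested within one violence type
# (illustrative only; not taken from table 2).
example_p = [0.0004, 0.0009, 0.02, 0.3]
print(bonferroni_adjust(example_p))
print(significant_after_bonferroni(example_p))
```

Since p<0.001 implies m·p<0.05 whenever m ≤ 50, any comparison with p<0.001 survives the classic correction as long as a family contains at most 50 comparisons, which is why results of that strength are robust to it.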

SGBV to any household members other than women of reproductive age

During the first 8 months of the armed conflict, gender-based violence was not confined to women of reproductive age. Other HH members (children, men/boys or elderly women) were also victims of SGBV, as reported by the women of reproductive age interviewed in the study. According to these respondents, more than a quarter (27.8%; 1417/5171) of other HH members (children, men/boys or elderly women) were victims of some form of psychological, physical or sexual violence. In total, 57.4% (686/1196) of men/boys were reported to have experienced physical violence. Moreover, about 17.0% (9/53) of girls under 15 years of age and 8.4% (19/227) of women older than 49 years experienced sexual violence. The deaths of 242 HH members at the hands of the allied forces were also reported (table 3).

SGBV among children, elderly women household members and women of reproductive age in Tigray war, 4 November 2020 to 28 June 2021

Factors associated with war-related rape

After significant variables (p<0.05) were entered into the bivariate and multivariate logistic regression models, the following were associated with war-related rape: age 20–24 years, no formal education, urban residence, being a student or unemployed, living in a temporary shelter within the community or in a rental house, and knowledge of rape incidents in the community (table 4).

Factors associated with war-related rape in Tigray, Ethiopia (n=411)
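The associations in table 4 come from logistic regression. For a single binary exposure, the coefficient of a bivariate (crude) logistic model equals the log of the cross-product odds ratio ad/bc from the 2×2 table, so the crude step can be sketched directly. The Python example below assumes a Woolf (log-scale) 95% confidence interval; the function name and counts are hypothetical, and adjusted odds ratios from the multivariate model would differ because they account for the other covariates.

```python
import math


def crude_odds_ratio(a, b, c, d, z=1.96):
    """Crude odds ratio with a Woolf (log-scale) confidence interval.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    if min(a, b, c, d) == 0:
        raise ValueError("all cells must be non-zero for the Woolf interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical 2x2 counts (not from this study).
or_, lo, hi = crude_odds_ratio(a=30, b=170, c=380, d=4590)
print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```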

SGBV was one of the most serious and life-threatening occurrences affecting women and children during the first 8 months of the war in Tigray. The findings from women of reproductive age showed that 1 in 10 women and girls experienced sexual violence, mostly rape, and that physical and psychological violence were also common. Many women experienced severe forms of sexual violence such as gang-rape, sexual slavery, insertion of foreign objects into the vagina, and harassment. Gang-rape was the most common form of sexual violence reported among those who were raped. Girls under 15 years and women over 49 years were also victims of sexual violence, mostly rape. Living in a temporary shelter within the community, age 20–24 years, urban residence, no formal education, and being a student or unemployed were the risk factors associated with war-related rape in this study.

The findings from this study indicate a higher incidence (nearly 10%) of rape than reported in other conflicts, such as in Northern Uganda (4.2%), 13 Sierra Leone (8%) 15 and Ukraine (2.6%). 8 Similarly, the prevalence of physical violence (28.6%) observed in this study was higher than that reported for East Timor, Indonesia, where 22.7% of women were physically assaulted. 25 However, the finding of the current study is lower than those of other studies in the Kurdistan region of Iraq (16.6%) 10 and South Sudan (21.3%). 11 The difference from the Iraq study might be because that study was conducted in a camp with a particularly vulnerable group, and the South Sudan study covered a long period of civil war, from 2005 to 2011. In contrast, the current study was conducted at community level and covered only 8 months of war. Overall, sexual violence in the current study was consistently and systematically higher in the central and eastern zones of Tigray, where the invading allied forces of Ethiopia and Eritrea had more control. The higher incidence of rape and physical violence in this study compared with other studies may be due to (i) the perpetrators’ intent to use systematic rape as a weapon of war to dehumanise the population and (ii) the relatively long duration of the war (8 months) inside Tigray at the hands of the invaders, which exposed victims to more violence.

Most of the sexual violence reported in the current study took the form of gang-rape, mainly perpetrated by soldiers and other armed groups affiliated with the governments of Ethiopia and Eritrea. This is a severe form of sexual violence that is likely to cause severe psychological, mental, social and sexual disorders among survivors. Similar findings during conflicts were reported in eastern Democratic Republic of the Congo (DRC), Rwanda and Bosnia and Herzegovina. 14 26 27 Moreover, in the current study, many women and girls were sexually abused by three to nine soldiers. This is broadly in line with the joint report of the Office of the United Nations High Commissioner for Human Rights (OHCHR) and the Ethiopian Human Rights Commission (EHRC), according to which women and girls in Tigray were sexually abused by up to six armed men during the war. 20 Most sexual and physical violence was committed by the Eritrean defence forces, consistent with the report of Human Rights Watch (HRW), in which the most brutal sexual violence in the Tigray war was attributed to the Eritrean defence forces. 21

Severe forms of sexual violence, such as insertion of foreign objects into the vagina and anus, and a high rate of HIV transmission were reported in the current study, again consistent with the February 2021 HRW report, in which insertion of foreign objects into the vagina and transmission of HIV by different armed forces were reported in the Tigray war. 20 21 28 The severity of sexual violence in the Tigray war found in this study is consistent with similar observations elsewhere. 3 16 17 The risk of HIV infection during conflicts is reported to be relatively higher in Africa because of multiple perpetrators of sexual violence. 29

Sexual violence against underage girls and elderly women, ascertained indirectly via the interviewed index participants, is also consistent with reports from different humanitarian agencies on the Tigray war during the same period. 20 This indicates that sexual violence, mostly rape, was highly prevalent among girls and women of all age groups. The mental and psychological problems experienced by underage girls could have severe, lifelong and intergenerational effects on the sexual health, behaviour, productivity and well-being of the victims and their families.

The incidence of sexual violence, mostly rape, was higher in urban centres during the first 8 months of the war (12% urban vs 8.3% rural; online supplemental table 5). This is consistent with other reports on the Tigray war, in which most survivors who sought health services after sexual violence (rape) were from big towns. 20 Moreover, women of lower socioeconomic status are less likely to disclose sexual violence, as also reported in a study from eastern DRC. 16 The SGBV figures reported in this study are likely to be the tip of the iceberg: most women, particularly in rural Tigray, are highly traditional and religious, have less sexual literacy, and had less access to healthcare during the early days of the war, as most communities trapped by the war were hiding in hard-to-reach areas away from their houses and villages. Cases could also be grossly underestimated because of strict moral codes, fear of stigma in traditional communities and family hostility. 16 All survivors, whether from rural or urban areas, appear to share the same level of mental, psychological and health consequences of sexual violence.

In this study, 90% of the sexually abused women were unable to receive medical and psychological support after experiencing sexual violence. This implies that most women and girls who experienced SGBV are likely to suffer from its complications, such as medical, behavioural and emotional disorders. This is consistent with a study of the DRC conflict, in which fewer than 5% of female SGBV victims sought medical care and the remaining women had to wait a year or longer before accessing SGBV services. 30 According to a recent survey, >70% of the health facilities in Tigray were non-operational in the first 6 months of the war (4 November 2020 to April 2021) 22 and only 17.5% of the health centres were functional, most of them in the vicinity of Mekelle and other big cities on the main roads, such as Shire, Axum, Adwa and Adigrat. 20 The poor health seeking by survivors could therefore be explained by destroyed or closed health facilities, lack of medications, absence of trained health providers, lack of transportation to health facilities, traditional and religious barriers to discussing sexual matters, and fear for physical safety. The mental, psychological, economic and behavioural impacts of SGBV on survivors and their families are thus severe and lifelong. Provision of immediate medical and psychological support for all survivors is therefore long overdue.

During the prewar period, accessibility of health services in Tigray was >90%. Health centres could typically provide services for survivors of SGBV, including administering post-exposure prophylaxis (PEP) in case of HIV exposure, pregnancy testing and other health services. However, during the war, 70% of the health facilities were non-functional, 22 either because of complete destruction or because of absence of medications and/or trained health providers. Most functional health facilities were located in big towns, and survivors usually used these during the war. Throughout the war, the interim office of the Tigray health bureau reported the monthly number of rape survivors who sought care at health facilities. 20 The interim Tigray health bureau and various non-governmental organisations (NGOs) also ran mobile clinics to increase access to healthcare during the war. However, the availability of health services to respond to SGBV during the war was very low. Only a few organisations, such as Médecins Sans Frontières-Spain, the International Committee of the Red Cross (ICRC) and the local NGO Mothers and Children Multisectoral Development Organization (MCMDO), along with mobile clinics, were supporting survivors. Overall, it was very difficult to ascertain whether survivors were using health services appropriately, given that 70% of the health facilities were non-functional during the war. 22 In this study, we did not assess whether the health facilities serving each catchment area were functional. We simply asked survivors about their use of health services after the violence and, where appropriate, their reasons for not getting health services.

Survivors of sexual violence are likely to be at high risk of severe and long-lasting health problems, including death from injuries or suicide. 31 Deaths from sexual violence were not captured in this study, as it was based solely on responses from women of reproductive age who were alive at the time of the study. Survivors reported health consequences such as unwanted pregnancy, unsafe or self-induced abortion and sexually transmitted infections (STIs), including HIV infection. Physical, mental and psychological trauma, emotional breakdown, stigma and discrimination were also reported as consequences of sexual and physical violence. These consequences have a lifelong impact. 20 21 Treatment of STIs and HIV, abortion care for late pregnancy, care for children born of rape, and emotional and psychological support from local and international healthcare and humanitarian stakeholders are therefore urgently recommended. Involving family members in counselling with survivors is also an important strategy to reduce conflict, stigma, disagreement and divorce within families and couples. A comprehensive approach to caring for survivors, including medical, psychological, social and economic support, is critically needed. Establishing rehabilitation centres to care for survivors at the affected sites and devising local integration strategies are therefore important.

This study found an association of rape with young age and with living in a temporary shelter within the community during the war. The higher incidence of rape in these groups may be explained by family breakdown and the absence of legal and social protection during the crisis; living in a shelter without protection is difficult for women. The higher incidence of rape in urban areas may be explained by the larger number of combatants who stayed in towns and at distribution sites, where most of the fighting took place. In this study, women of lower status (girls/women with no formal education, students and/or unemployed women) were more often abused. This may be explained by poor awareness among girls and women about combatants, and by their disproportionate exposure owing to low status and poor protection by their families.

The findings of this study should be interpreted in the context of its strengths and limitations. The study has several notable strengths. It is the first population-based study of SGBV to be conducted in Tigray since the outbreak of the war, carried out while the region was still under severe siege. The study was carefully designed, using sampling techniques that ensured an adequate, randomly selected sample representing all parts of the Tigray region (except the western zone and some northeastern districts occupied by the invaders), to systematically assess critical cross-cutting issues at the community level. Because the survey was conducted immediately after the invading forces left most parts of Tigray, not long after the violence, the likelihood of recall bias is minimal. In the analysis, we used weighting in the summary statistics to minimise error in prevalence estimates arising from the sampling procedure. However, weighting has limitations in this study: the population figures are likely to give only a partial picture of the population during the conflict, given the high potential for forced mobility. Weights calculated from district-, Tabiya- or cluster-level population figures may therefore not reflect the actual population during the conflict, although they should remain valid in relative terms. That the main findings did not change after weighting is a strength and further evidence of the robustness of the findings reported in this study.
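As a rough illustration of the weighting described above, a common design-weighted prevalence estimator is the weighted proportion p = Σ w_i·y_i / Σ w_i, where y_i is 1 if respondent i reported the outcome and w_i is a sampling weight, often the inverse of that respondent’s selection probability. The paper does not give its exact weighting formula, so this Python sketch is an assumption-based illustration; the function name and data are hypothetical.

```python
def weighted_prevalence(outcomes, weights):
    """Design-weighted prevalence: sum(w_i * y_i) / sum(w_i).

    outcomes: 0/1 indicators (1 = respondent reported the outcome)
    weights:  sampling weights, e.g. inverse selection probabilities (assumed)
    """
    if len(outcomes) != len(weights):
        raise ValueError("outcomes and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * y for y, w in zip(outcomes, weights)) / total_weight


# Hypothetical toy data: respondents from two clusters with different weights.
y = [1, 0, 0, 1, 0]
w = [2.0, 2.0, 2.0, 0.5, 0.5]
weighted = weighted_prevalence(y, w)
unweighted = sum(y) / len(y)
print(weighted, unweighted)
```

With equal weights the estimate reduces to the ordinary proportion (0.4 in the toy data, versus 2.5/7 ≈ 0.357 weighted); comparing the two is one simple way to check, as the authors did, whether weighting changes the main findings. Confidence intervals for such estimates require design-based variance methods, which this sketch does not attempt.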

The interpretation of the findings should also consider some limitations of the study. Although the use of female local data collectors (ie, HEWs) may have created an environment of safety and comfort in which respondents could disclose information, the fact that the HEWs lived in the same places as the victims might also have led respondents to hide or distort information for fear of shame or leaks. Moreover, because the HEWs are government employees in the areas they serve, they may have overestimated or underestimated findings. However, adequate orientation and training were given, with close supervision during data collection. Most of the western zone of Tigray and some pockets of areas bordering Eritrea were not included, since they are currently under the control of Ethiopian and Eritrean forces and remain active war zones. The level of SGBV is expected to be even higher in these areas, where the conflict has been more severe and is still ongoing. 20 21 The findings of this study are therefore likely to underestimate the true extent of SGBV in Tigray resulting from the war. Some data were missing because respondents were reluctant to answer questions on sensitive variables. Moreover, internally displaced women living in camps were not included, although displaced women living within the community or in their own new HHs were included. In general, the number and proportions of recorded sexual violence might be underestimated because of under-reporting by surviving victims and/or lack of information from victims who were murdered or died of injuries during or after the assault.

Survivors/victims generally do not speak of the incident for many reasons, including self-blame, fear of reprisals, mistrust of data collectors, risk or fear of re-victimisation, fear of shaming and blaming, social stigma and, often, rejection by the survivor/victim’s family and community. Nevertheless, the findings of this study indicate levels of SGBV that are substantially and alarmingly high, similar to or even higher than reported levels from other conflict areas around the world. 3 16

Conclusions

SGBV was highly prevalent during the first 8 months of the Tigray war. Almost 10% of the girls and women of reproductive age interviewed were sexually abused, mostly by rape. Gang-rape was the most common form of sexual violence. Physical and psychological violence were common too. Underage girls, elderly women and men were also victims of sexual and physical violence. Physical trauma, depression, suicidal attempts, emotional change, unwanted pregnancy, STIs and HIV infection were the most common consequences of SGBV reported. Ninety per cent of the survivors of sexual violence did not receive medical or psychological care, because most health facilities were destroyed and looted. An urgent survivor-centred approach with medical and psychological services, together with sustained community support, is recommended to reduce the lifelong impact on the behavioural, emotional, sexual, social and economic fortunes of SGBV victims. Importantly, further investigation of SGBV among girls and women in rural and urban communities of Tigray is needed, including those excluded from this study because of the occupation. Where and when possible, a complete inventory and investigation covering every girl and woman of reproductive age in each Tabia/Kebele is recommended, to provide a full picture of the extent of SGBV resulting from the war throughout Tigray and to reach all victims comprehensively, ensuring that they receive short-term and long-term rehabilitation, access to post-trauma services and socioeconomic support.

Region-wide tracing of survivors is needed for further medical and psychological support. Because of the complete collapse of the health system in Tigray, dealing with the health and psychosocial ramifications of this unprecedented SGBV crisis will require urgent, multisectoral effort by local and international partners.

Ethics statements

Patient consent for publication.

Not applicable.

Ethics approval

Ethical clearance was obtained from the Institutional Review Board (IRB) of the College of Health Sciences at Mekelle University (reference no: MU-IRB1905/2011), and support letters were secured from the Tigray Health Bureau and District Health Offices before data collection began. Written consent (for adult women) or assent (for underage girls) was obtained prior to data collection.

  • UN general assembly
  • Muluneh MD,
  • Francis L, et al
  • García-Moreno C,
  • Rashida M,
  • Guidelines on reporting on sexual violence in conflict. n.d. Available: https://www.coveringcrsv.org/wpcontent/uploads/2021/05/CRSV_Downloadable_UK_FULL.pdf
  • Capasso A,
  • Skipalska H,
  • Guttmacher S, et al
  • Goessmann K,
  • Ibrahim H,
  • Ellsberg M,
  • Murphy M, et al
  • Gingerich T,
  • Kinyanda E,
  • Biryabarema C, et al
  • Bartels SA,
  • Mukwege D, et al
  • Amowitz LL,
  • Lyons KH, et al
  • Betancourt TS,
  • Verhoeven H,
  • Woldemariam M
  • EHRC and OHCHR
  • Human Rights Watch
  • Gesesew H,
  • Berhane K,
  • Siraj ES, et al
  • Schraiber LB,
  • Latorre M do RDO,
  • França I, et al
  • Kleinbaum D,
  • Nizam A, et al
  • Robertson K,
  • Ward J, et al
  • Jovanović N, et al
  • Elisabeth R,
  • Hossain M, et al
  • Steiner B,
  • Benner MT,
  • Sondorp E, et al
  • Bastick M,

Supplementary materials

Supplementary data.

This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

  • Data supplement 1
  • Data supplement 2

Handling editor Seye Abimbola

Contributors All authors contributed significantly to the conduct of the study, analysis of the data, writing and interpretation of the manuscript and revision of this submission. GF is the guarantor, with full responsibility for the work and/or the conduct of the study, had access to the data, and controlled the decision to publish.

Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

Competing interests None declared.

Patient and public involvement Patients and/or the public were involved in the design, or conduct, or reporting, or dissemination plans of this research. Refer to the 'Methods' section for further details.

Provenance and peer review Not commissioned; externally peer reviewed.

Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.

Read the full text or download the PDF:

IMAGES

  1. How To Introduce A Quote In An Essay

    research paper intro quote

  2. 💣 Can you start a research paper with a quote. How to Start an Essay

    research paper intro quote

  3. How to use Quotes in an Essay in 7 Simple Steps (2024)

    research paper intro quote

  4. How to Write an Introduction for a Research Paper Step-by-Step?

    research paper intro quote

  5. Quote In Essay Example

    research paper intro quote

  6. Example Of An Introduction For A Research Paper

    research paper intro quote

VIDEO

  1. Imperial Paper Intro

  2. My SpongeBob paper intro

  3. Magic paper intro.... ☺️ #art #craft #creativeswara

  4. Paper intro

  5. Weekly Outline Assignment and Final Paper Intro

  6. Research Article writing Workshop Part 2: Writing Introduction and Abstract

COMMENTS

  1. Writing a Research Paper Introduction

    Table of contents. Step 1: Introduce your topic. Step 2: Describe the background. Step 3: Establish your research problem. Step 4: Specify your objective (s) Step 5: Map out your paper. Research paper introduction examples. Frequently asked questions about the research paper introduction.

  2. How to Start an Essay With a Quote: 14 Steps (with Pictures)

    5. Hook your reader. Think of a quotation as a "hook" that will get your reader's attention and make her want to read more of your paper. The well-executed quotation is one way to draw your reader in to your essay. [2] 6. Ensure that the quotation contributes to your essay.

  3. How to Write a Research Paper Introduction (with Examples)

    Define your specific research problem and problem statement. Highlight the novelty and contributions of the study. Give an overview of the paper's structure. The research paper introduction can vary in size and structure depending on whether your paper presents the results of original empirical research or is a review paper.

  4. 5 Ways to Quote in a Research Paper

    4. Quote important evidence. Quotations can be particularly helpful for an argumentative or study-based research paper, as you can use them to provide direct evidence for an important point you are making. Add oomph to your position by quoting someone who also backs it, with good reason.

  5. Starting Your Research Paper: Writing an Introductory Paragraph

    The middle sentences cover the different points in your paper. If you've already planned which order to write the points in the paper, you already know which order to place them in your introductory paragraph. (Hint: it's the same order). You don't have to include every single point, but make sure the important ones get in there. Ending Sentence

  6. How to Write an Introduction for a Research Paper

    After you've done some extra polishing, I suggest a simple test for the introductory section. As an experiment, chop off the first few paragraphs. Let the paper begin on, say, paragraph 2 or even page 2. If you don't lose much, or actually gain in clarity and pace, then you've got a problem. There are two solutions.

  7. Research Paper Introduction Examples

    Quotes, anecdotes, questions, examples, and broad statements—all of them can be used successfully to write an introduction for a research paper. It's instructive to see them in action, in the hands of skilled academic writers. Let's begin with David M. Kennedy's superb history, Freedom from Fear: The American People in Depression and ...

  8. How to Write a Research Paper Introduction in 4 Steps

    1. Get your readers' attention. To speak to your readers effectively, you need to know who they are. Consider who is likely to read the paper and the extent of their knowledge on the topic. Then begin your introduction with a sentence or two that will capture their interest.

  9. How to Write a Research Paper Introduction (with Examples)

    1- In this paper, I will discuss climate change. Problem: This statement is too broad and vague. It does not provide a clear direction or specific argument. 2- This paper argues that climate change, measured by global average temperature change, is primarily driven by human activities, such as.

  10. How to Write an Introduction for a Research Paper

    When writing your research paper introduction, there are several key elements you should include to ensure it is comprehensive and informative. A hook or attention-grabbing statement to capture the reader's interest. It can be a thought-provoking question, a surprising statistic, or a compelling anecdote that relates to your research topic.

  11. Quoting and integrating sources into your paper

    Important guidelines. When integrating a source into your paper, remember to use these three important components: Introductory phrase to the source material: mention the author, date, or any other relevant information when introducing a quote or paraphrase. Source material: a direct quote, paraphrase, or summary with proper citation.

  12. Is it bad to start an introduction with a direct quote?

    If the quote is relevant to the importance you can use it by a presentence of yourself. Overally speaking the direct quote in academics is not generally acceptable with an exception for pioneers of that field. You can use an implication of this qoute and write in your language with a citation to the original document. Share. Improve this answer.

  13. Who Said What? Introducing and Contextualizing Quotations

    Download this page as a PDF: Introducing and Contextualizing Quotations. Return to Writing Studio Handouts. Quotations (as well as paraphrases and summaries) play an essential role in academic writing, from literary analyses to scientific research papers; they are part of a writer's ever-important evidence, or support, for his or her argument.

  14. How to Quote

    Citing a quote in APA Style. To cite a direct quote in APA, you must include the author's last name, the year, and a page number, all separated by commas. If the quote appears on a single page, use "p."; if it spans a page range, use "pp.". An APA in-text citation can be parenthetical or narrative.

  15. Words that introduce Quotes or Paraphrases

    For more information on MLA Style, APA style, Chicago Style, ASA Style, CSE Style, and I-Search Format, refer to our Gallaudet TIP Citations and References link. Words that introduce Quotes or Paraphrases are basically three keys verbs: Neutral Verbs: When used to introduce a quote, the following verbs basically mean "says".

  16. How to Write a Research Proposal: (with Examples & Templates)

    The Introduction or Background section in a research proposal sets the context of the study by describing the current state of the subject and identifying the gaps in knowledge and the need for the research. A Literature Review, on the other hand, provides references to all prior relevant literature to help corroborate the identified gaps and the research need.

  17. Suggested Ways to Introduce Quotations

    Use an introductory phrase naming the source, followed by a comma, to quote a critic or researcher. Note that the first letter after the opening quotation mark should be uppercase. According to MLA guidelines, if you change the case of a letter from the original, you must indicate the change with brackets. APA format doesn't require brackets.

  18. Accurate Quote Explanation Generator + Helpful Guide & Tips

    Quotes can make your writing more subtle and creative, which makes the text more interesting and less monotonous. Larger context: adding quotes lets you better evaluate and discuss the topic, because they expand on the context and add essential details to your piece. More credibility: having quotes in your paper makes it more credible.

  19. Ted Bundy Research Paper

    Introduction: "I didn't know what made people want to be friends. I didn't know what made people attractive to one another. I didn't know what underlay social interactions." ... psychopaths. This quote shows the most basic characteristics used to describe a psychopath: someone having no emotion and not ...

  20. White Paper: Types, Purpose, and How to Write One

    White Paper: A white paper is an informational document, issued by a company or not-for-profit organization, to promote or highlight the features of a solution, product, or service. White papers ...

  21. Leading role of Saharan dust on tropical cyclone rainfall in the ...

    The predicted/observed mean Tropical Cyclone Rainrate (TCR) within 600 km of the TC center (R < 600): for (A) the non-DOD model and (B) the DOD model, using a scatter density plot (out-of-sample predictions are made for five testing sets and then combined; then 100 bins with equal intervals are generated for the TCR ranges. The count of scatters is summarized within each box); (C) difference ...

  22. Predicting hospital length of stay using machine learning on a large

    Stone et al. [] present a survey of techniques used to predict the LoS, which include statistical and arithmetic methods, intelligent data mining approaches, and operations-research-based methods. Lequertier et al. [] surveyed methods for LoS prediction. The main gap in the literature is that most methods focus on analyzing trends in the LoS or predicting the LoS only for specific conditions or ...

  23. Intercultural family-school cooperation and the dilemma

    Introduction. Europe has historically been characterised by its cultural and linguistic diversity. In the current era, marked by globalisation and increased mobility, the diversity is perpetuated by both internal population movements within Europe and external immigration to the continent (Eurostat, 2023). Consequently, this rich cultural and linguistic diversity is also mirrored in ...

  24. War-related sexual and gender-based violence in Tigray, Northern

    Introduction: Sexual and gender-based violence (SGBV) during armed conflicts has serious ramifications, with women and girls disproportionately affected. The impact on SGBV of the conflict that erupted in November 2020 in Tigray is not well documented. This study aimed to assess war-related SGBV in war-affected Tigray, Ethiopia. Methods: A community-based survey was conducted in 52 (out of ...