Backing Up the Evidence: Systematic Review and Meta-Analysis

A meta-analysis or systematic review (MA/SR) can be a powerful tool to aggregate data on a specific question and arrive at a single, pooled answer based on the current literature. Meta-analyses are often used to guide best practice in clinical medicine. However, quality varies between meta-analyses – not every meta-analysis is done well. Likewise, high-quality systematic reviews do not have to be published in established review databases, such as the Cochrane Database of Systematic Reviews.1 It is important to be able to distinguish MA/SRs with high-quality evidence from the rest to help determine best evidence-based practice.

Several guidelines have been published on how to write a quality MA/SR, including the Cochrane Handbook,1 MOOSE,2 QUOROM,3 and PRISMA4 guidelines. These, however, are tailored toward authors rather than readers. This article will help clarify what you need to know to evaluate an MA/SR.

Is There a Clear Question?

It is important to consider if the systematic review or meta-analysis had a question that met the SMART criteria used for goal setting: Specific, Measurable, Attainable, Realistic, and Time-Bound (Table 1).

Clear questions will help lead to more clinically and statistically relevant answers. Studies that address different questions from different perspectives may have variability between them, leading to statistical heterogeneity. Too much heterogeneity between studies may make the meta-analysis results irrelevant to clinical practice. It is also important to consider if there is enough literature on the topic to make an informed decision and whether it is realistic that the study found every article relevant to the study question.
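To make heterogeneity concrete, here is a minimal sketch in Python (using made-up study data, not figures from any real meta-analysis) of Cochran's Q and the I² statistic, which many meta-analyses report as the percentage of variability attributable to between-study differences rather than chance:

```python
def heterogeneity(effects, variances):
    """Cochran's Q and I^2 for a set of study effect estimates.

    effects:   per-study effect sizes (e.g., log odds ratios)
    variances: per-study sampling variances
    """
    weights = [1.0 / v for v in variances]
    # Inverse-variance weighted pooled estimate
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    # Q: weighted sum of squared deviations from the pooled estimate
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of Q beyond what chance (df) would explain, floored at 0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2

# Three hypothetical studies with similar effects -> negligible heterogeneity
q, i2 = heterogeneity([0.30, 0.35, 0.28], [0.02, 0.03, 0.025])
print(f"Q = {q:.3f}, I^2 = {i2:.1f}%")
```

As a rough rule of thumb often cited in the Cochrane Handbook, I² values above roughly 50–75% suggest substantial heterogeneity and should make you question whether pooling was appropriate.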

Time frame is another factor. Several important issues related to time include new technology (e.g., ultrasound), changes in hospital systems and healthcare policy, and the availability of alternative treatments or diagnostic tools. Consider, for example, that the diagnostic accuracy of a test depends on what the “gold standard” (reference test) is, but 20 years ago the “gold standard” may not have been as accurate. The same holds true for the “standard of care” against which a medication or procedure may be compared.

What is the Protocol?

It is important to consider what the study defined from the outset as its inclusion and exclusion strategy for identified studies, reported in a PRISMA format.5 Consider whether the authors searched an adequate number of databases and included broad enough search terms. For example, “ultrasound” could also be indexed under “sonogram,” “ultrasonogram,” “sonograph,” “POCUS,” or “ultrasonography.” While a simple search of basic keywords may be adequate for a background literature search, it is not enough to meet the rigor needed for an MA/SR.
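As an illustration, a broad search string can be assembled from a synonym list. The terms below are hypothetical examples for the “ultrasound” concept only; a real MA/SR search strategy should also use controlled vocabulary (e.g., MeSH headings) and ideally be built with a medical librarian:

```python
# Hypothetical synonym list for one search concept; a real strategy
# should also include controlled vocabulary (e.g., MeSH headings).
ultrasound_terms = [
    "ultrasound", "sonogram", "ultrasonogram",
    "sonograph*", "POCUS", "ultrasonography",
]
# Join synonyms with OR so a record matching any variant is retrieved
query = "(" + " OR ".join(f'"{t}"' for t in ultrasound_terms) + ")"
print(query)
```

Concept groups like this one are then typically combined with AND (e.g., the ultrasound group AND a pleural-effusion group) to narrow the result set without losing variant terminology.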

How Did They Analyze the Data?

It is important to understand the basic statistics in a meta-analysis. One important issue is whether the results were synthesized using a “fixed” or “random” effects model. Simply, a fixed effect model assumes the included studies all estimate the same true effect, so any differences between their results are due to chance alone. A random effects model assumes the studies may have true variability between them in areas such as design, population, or risk of bias. The forest plot is a visual tool commonly used to summarize study results. It displays the result and effect size of each study around a solid, vertical “line of no effect.” The summarized results are generally shown at the bottom of the graph. An important note about the forest plot is that if the confidence interval bar crosses the line of no effect, for an individual study or for the summary statistic, that result is not statistically significant.
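The difference between the two models can be sketched in a few lines of Python. This is a simplified illustration using inverse-variance weighting and the DerSimonian-Laird estimate of between-study variance, with made-up study data, not production meta-analysis code:

```python
import math

def pool(effects, variances, model="fixed"):
    """Inverse-variance pooled estimate and 95% CI.

    model="fixed":  assumes one true effect shared by all studies.
    model="random": DerSimonian-Laird; adds between-study variance (tau^2).
    """
    w = [1.0 / v for v in variances]
    if model == "random":
        pooled_fe = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
        q = sum(wi * (e - pooled_fe) ** 2 for wi, e in zip(w, effects))
        df = len(effects) - 1
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - df) / c)          # between-study variance
        w = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    se = math.sqrt(1.0 / sum(w))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)

# Hypothetical log odds ratios and variances from three disagreeing studies
effects, variances = [0.5, 0.1, 0.9], [0.04, 0.05, 0.06]
fe, fe_ci = pool(effects, variances, "fixed")
re, re_ci = pool(effects, variances, "random")
```

When the studies disagree, the random-effects weights flatten out and the confidence interval widens, reflecting the extra between-study uncertainty; a fixed-effect analysis of the same heterogeneous data can look misleadingly precise.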

A systematic review generally does not contain the formal statistical summation of a meta-analysis. This is usually by design: there was too much variability in the questions the included studies addressed, there were not enough data or studies on the subject, or synthesizing the data was otherwise not possible. It is important to look at how the review grouped the included studies and analyzed the results. There could be confounders, or other ways of grouping the studies, that would yield a different conclusion. Combining multiple outcomes is one example: to determine whether a drug will be beneficial, it is important to know its rate of adverse drug events in addition to its effectiveness.

Also, beware of surrogate outcomes. Flecainide and encainide are historical examples. These antiarrhythmics were FDA-approved based on their ability to decrease arrhythmias, but further analysis showed they actually increased mortality.6

Of What Quality are the Studies?

It is important that the MA/SR assesses the quality of the studies analyzed, preferably with a previously validated tool. Some examples for various types of studies include the Medical Education Research Study Quality Instrument (MERSQI),7 the Quality Assessment of Studies of Diagnostic Accuracy Included in Systematic Reviews 2 (QUADAS-2),8 and the Cochrane tool for assessing risk of bias in randomized controlled studies.9 Using an appropriate validated tool also makes it possible to compare the quality of the included studies against other, similar studies. It is also best if more than one person reviewed each article to score its quality.

It is necessary to consider the level of evidence of the included studies. It is possible there were no, or few, high-quality studies on a topic. However, if the authors limited the MA/SR to only randomized controlled trials, you need to evaluate the synthesis appropriately. Also, consider if the authors acknowledge that the quality of evidence is limited.

Table 1.  Examples of Good and Bad Research Questions Based on the SMART Criteria

Criteria | Good example | Bad example
Specific | Does tPA administered within 3 hours of presentation of acute ischemic stroke improve mortality? | Does tPA improve mortality for stroke?
Measurable | Does IV magnesium administered to status asthmaticus patients in the pre-hospital environment decrease admissions? | Should IV magnesium be administered in pre-hospital status asthmaticus?
Attainable | Does early aspirin administration improve mortality in patients with suspected MI? | Does [new drug X (which was just FDA approved last month)] improve mortality in patients with MI?
Realistic | Do heart rate measurements at triage predict ED LOS? | Does any vital sign measured any time in the ED predict ED LOS?
Time-bound | What are the test characteristics for point-of-care ultrasound in diagnosing pleural effusion? | What are the test characteristics of ultrasound in diagnosing pleural effusion?


Do the Results Make Sense?

Possibly the most important aspect of the process is to determine if the results of the MA/SR are consistent with what you see in your current practice. If not, think about why there are differences between them. Is the MA/SR high quality, and should it be taken at face value? Do the study question and included studies address all of the complex issues you see in practice? Make sure the results make sense to you, and, if not, discover why.

In conclusion, MA/SRs are among our best tools to determine the evidence that supports our clinical practice. It is important, though, to realize that not all MA/SRs should carry the same weight in your decision-making. You should now be able to critically appraise an MA/SR quickly, identify methodologic weaknesses, and incorporate that information to provide the highest-quality care based on current research.


References

2. Stroup DF, Berlin JA, Morton SC, et al. Meta-analysis of observational studies in epidemiology: a proposal for reporting. JAMA. 2000;283:2008-2012.
3. Clarke M. The QUOROM statement. Lancet. 2000;355:756-757.
4. Moher D, Liberati A, Tetzlaff J, Altman DG, The PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med. 2009;6(6):e1000097.
6. Echt D, Liebson PR, Mitchell LB, Peters RW, Obias-Manno D, Barker AH, et al. Mortality and morbidity in patients receiving encainide, flecainide, or placebo. The Cardiac Arrhythmia Suppression Trial. NEJM. 1991;324(12):781-788.
7. Reed DA, Beckman TJ, Wright SM, Levine RB, Kern DE, Cook DA. Predictive validity evidence for medical education research study quality instrument scores: quality of submissions to JGIM's medical education special issue. J Gen Intern Med. 2008;23(7):903-907.
8. Whiting PF, Rutjes AW, Westwood ME, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155(8):529-536.
9. Higgins J, Altman D, Gotzsche P, et al. The Cochrane Collaboration's tool for assessing risk of bias in randomised trials. BMJ. 2011;343:d5928.