Determining the correct timing of extubation in patients receiving mechanical ventilation (MV) is crucial, and predictors of successful extubation remain a topic of debate among specialists, since extubation failure contributes to mortality and to a variety of life-threatening complications [1–4]. Physicians' subjective ability to predict successful weaning has low accuracy [5, 6]; therefore, objective clinical signs and conditions have been evaluated for their role in predicting weaning failure [7]. Although traditional objective indices and the rapid shallow breathing index (RSBI) may summarise the patient's overall condition, they may not clarify the underlying reason for weaning trial failure [6–10]. Of these indices, however, the RSBI has been shown to predict extubation outcomes more accurately when specific cut-off values are used [11]. Recent findings suggest that diaphragm dysfunction (DD) is frequently involved in weaning failure and is associated with a poor prognosis at the time of liberation from MV. Because DD accounts for a large number of extubation failures, ultrasonographic evaluation of the diaphragm muscle has recently shown promise for improving the prediction of successful weaning [12, 13]. Studies have not only confirmed that ultrasonographically measured diaphragm thickness correlates with lung volumes during inspiration, but have also accurately diagnosed diaphragm atrophy and paralysis [14–16]. The two principal diaphragm evaluations by ultrasonography (US) are measurement of diaphragmatic excursion (DE) and measurement of diaphragm muscle thickness during inspiration and expiration [17, 18].
According to the literature, these imaging techniques are non-invasive and appear to provide acceptably high diagnostic accuracy for evaluating diaphragm function when compared with the reference method of diaphragm assessment, phrenic nerve stimulation, particularly in critically ill patients admitted to intensive care units (ICUs) [17–19]. Although recent studies have advanced our understanding of the advantages of US examination for evaluating diaphragm function, it has not yet been adopted as a conventional approach to monitoring diaphragm function and predicting the optimal time for extubation. Thus, a reliable, accurate and applicable method for predicting weaning outcomes is still needed. The present study aims to evaluate the accuracy and applicability of bedside US examination of the diaphragm muscle in predicting successful ventilator weaning.
METHODS
A systematic review was carried out of published articles reporting the accuracy of diaphragm US in predicting weaning success in critically ill patients undergoing MV. The study was structured according to the PICOS format (i.e. population, intervention, comparisons, outcome and study type), as follows:
Population: Critically ill patients admitted to the ICU, receiving MV, and eligible for ventilator weaning.
Intervention: Bedside ultrasonographic examination of the diaphragm muscle to evaluate diaphragm thickness or excursion.
Comparisons: 1) diaphragmatic excursion; 2) diaphragmatic dysfunction (analysis subdivided based on the pressure support during the weaning trial); 3) RSBI.
Outcome: Data relevant to diaphragm muscle characteristics including end-inspiratory and end-expiratory thickness and muscle excursion, as well as data regarding ventilator weaning success rate. Additionally, prediction of successful weaning [sensitivity (SE), specificity (SP), positive predictive value (PPV), negative predictive value (NPV), true negative (TN), true positive (TP), false negative (FN), and false positive (FP)] was taken into consideration.
Study type: Reviews, protocols, experimental studies, letters, comments, editorials and case reports were excluded. The reference lists of the retrieved articles were screened for additional relevant studies. The study is in accordance with the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-II) guidelines (Table 1, Figure 1) [20].
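The accuracy measures listed under Outcome all follow from the four cells of a single 2 × 2 table. As a minimal sketch (the function name and the example counts are hypothetical, and whether weaning success or weaning failure counts as the "positive" result must follow each study's own convention), they can be computed as follows:

```python
def accuracy_measures(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Sensitivity, specificity and predictive values from one 2 x 2 table."""
    return {
        "SE": tp / (tp + fn),   # positives correctly identified by the index test
        "SP": tn / (tn + fp),   # negatives correctly identified by the index test
        "PPV": tp / (tp + fp),  # probability that a positive test is a true positive
        "NPV": tn / (tn + fn),  # probability that a negative test is a true negative
    }

# Hypothetical counts for a single study
print(accuracy_measures(tp=40, fp=5, fn=10, tn=45))
```

Unlike SE and SP, PPV and NPV depend on the prevalence of weaning success in each cohort, which is one reason pooled analyses usually concentrate on SE, SP and the diagnostic odds ratio.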
TABLE 1
Search strategy and selection criteria
A systematic literature search of Medline (PubMed), Web of Science (ISI), Embase and Google Scholar was conducted by combining the following search terms using PubMed Boolean syntax: “Ultrasonography” AND “Diaphragm” AND (“Ventilator weaning” OR “Discontinuation of the mechanical ventilation”). The search covered all relevant articles published up to August 2020.
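PubMed processes Boolean operators from left to right, so a chained query needs parentheses to keep the two weaning synonyms together as a single alternative; otherwise the OR term is combined with the result of all preceding ANDs. A minimal sketch of assembling such a string (the helper `build_query` is hypothetical and not part of any PubMed tool):

```python
def build_query(required: list[str], alternatives: list[str]) -> str:
    """AND together the required terms, then AND in one parenthesised OR-group."""
    quoted = [f'"{term}"' for term in required]
    or_group = "(" + " OR ".join(f'"{term}"' for term in alternatives) + ")"
    return " AND ".join(quoted + [or_group])

query = build_query(
    ["Ultrasonography", "Diaphragm"],
    ["Ventilator weaning", "Discontinuation of the mechanical ventilation"],
)
print(query)
```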
Inclusion and exclusion criteria
The inclusion criteria were as follows: (1) type of study: prospective or retrospective study involving human participants, published in a peer-reviewed journal; (2) population: patients receiving invasive MV for at least 24 hours; (3) intervention: diaphragm thickness and excursion measured by ultrasound during the weaning process or during a spontaneous breathing trial (SBT); and (4) predefined outcomes: the primary outcome was the accuracy of diaphragm ultrasound in predicting weaning outcomes in critically ill adults. Weaning failure was defined broadly as SBT failure, or the need for reintubation, non-invasive MV, or death within 48 hours. Weaning success was defined as the absence of any failure criterion. The secondary outcome was the influence of DD on weaning outcome. The exclusion criteria were as follows: (1) abstracts, letters, editorials, expert opinions, reviews and case reports; (2) articles without sufficient data to calculate odds ratios (ORs) or relative risks with 95% CIs; (3) studies performed in settings other than critical care (i.e., patients ventilated for elective surgery); and (4) studies that used maximal rather than mean DE as the ultrasound measurement.
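Because any single failure criterion classifies a patient as a weaning failure, and success is defined only by the absence of all of them, the outcome definition reduces to a logical OR. A minimal sketch, assuming a hypothetical per-patient record whose field names are illustrative and whose events are already restricted to the 48-hour window:

```python
from dataclasses import dataclass

@dataclass
class PostTrialEvents:
    """Hypothetical per-patient record; field names are illustrative only."""
    sbt_failed: bool
    reintubated_within_48h: bool
    niv_within_48h: bool
    died_within_48h: bool

def weaning_outcome(ev: PostTrialEvents) -> str:
    """Apply the broad failure definition: any single criterion means failure."""
    failed = (ev.sbt_failed or ev.reintubated_within_48h
              or ev.niv_within_48h or ev.died_within_48h)
    # Success is defined only as the absence of every failure criterion.
    return "failure" if failed else "success"

# A patient who passes the SBT but needs non-invasive ventilation within 48 hours
print(weaning_outcome(PostTrialEvents(False, False, True, False)))
```

Making every criterion an explicit field keeps the definition auditable when studies differ in which events they recorded.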
Study selection and data extraction
The titles and abstracts of the retrieved articles were independently screened by two authors (A.R. and S.F.). The same authors then reviewed the full texts of potentially relevant articles to select studies and extract data (Figure 2). Disagreements about the inclusion or exclusion of a study were to be resolved by a third author (A.M.). The following variables were extracted from the included studies: first author, total sample size, country, study design, baseline patient characteristics, reason for ventilation, ICU severity scores, mean ventilation time, US assessment technique, diaphragm thickness or excursion, weaning success rate, mean length of admission, and complications such as reintubation and mortality. The authors achieved 100% agreement on the inclusion of studies. The quality of the included studies was assessed with the QUADAS-II tool [20], with each item scored as “low risk” if reported, “high risk” if not reported, or “unclear”.
Statistical analysis
There was extensive heterogeneity in the cut-off points used by the studies for diaphragm thickening fraction (DTF) and DE, which precluded a single uniform meta-analysis. DTF cut-off values were therefore grouped into three categories, and pooled diagnostic accuracy was analysed in the following subgroups: a) accuracy of DTF measured during pressure support and spontaneous breathing weaning trials; b) accuracy of US in predicting a successful weaning trial according to DTF threshold; c) accuracy of DE in predicting a successful weaning trial; and d) accuracy of RSBI in predicting weaning trial tolerance. All statistics are reported as point estimates with 95% confidence intervals (CIs). Extracted data were used to construct 2 × 2 tables, in which index test results were classified as TP, FP, FN or TN against the reference standard. No indeterminate results were encountered during data extraction. The diagnostic odds ratio (DOR), calculated as (TP × TN)/(FP × FN), served as the overall indicator of diagnostic performance; it expresses how much greater the odds of weaning trial failure are in patients with decreased DTF or increased RSBI than in patients with increased DTF or lower RSBI. Summary receiver operating characteristic (SROC) curves were constructed to examine the trade-off between sensitivity and specificity. Sensitivity analysis was performed in STATA version 14 (StataCorp, College Station, TX, USA) by excluding each study in turn. Meta-DiSc 1.4 was used for further analyses, including the calculation and assessment of heterogeneity.
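The DOR formula and the exclude-one-study sensitivity analysis can be illustrated together. The sketch below is not the review's STATA or Meta-DiSc procedure: it assumes fixed-effect inverse-variance pooling of log DORs, a 0.5 continuity correction when any cell is zero, and hypothetical counts; names such as `Table2x2` and `leave_one_out` are illustrative.

```python
import math
from typing import NamedTuple

class Table2x2(NamedTuple):
    tp: float
    fp: float
    fn: float
    tn: float

def log_dor(t: Table2x2) -> tuple[float, float]:
    """Return (log DOR, variance of log DOR) for one study's 2 x 2 table."""
    # Assumption: add 0.5 to every cell when any cell is zero.
    if 0 in t:
        t = Table2x2(*(c + 0.5 for c in t))
    ldor = math.log((t.tp * t.tn) / (t.fp * t.fn))
    var = 1 / t.tp + 1 / t.fp + 1 / t.fn + 1 / t.tn
    return ldor, var

def pooled_dor(tables: list[Table2x2]) -> tuple[float, float, float]:
    """Fixed-effect inverse-variance pooled DOR with its 95% CI."""
    stats = [log_dor(t) for t in tables]
    weights = [1 / v for _, v in stats]
    mean = sum(w * l for w, (l, _) in zip(weights, stats)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return math.exp(mean), math.exp(mean - 1.96 * se), math.exp(mean + 1.96 * se)

def leave_one_out(tables: list[Table2x2]) -> list[tuple[float, float, float]]:
    """Re-pool the DOR with each study excluded in turn."""
    return [pooled_dor(tables[:i] + tables[i + 1:]) for i in range(len(tables))]

# Hypothetical 2 x 2 tables for three studies
tables = [Table2x2(20, 5, 2, 18), Table2x2(35, 8, 4, 25), Table2x2(15, 0, 3, 12)]
print(pooled_dor(tables))
for i, (dor, lo, hi) in enumerate(leave_one_out(tables)):
    print(f"without study {i + 1}: DOR {dor:.1f} (95% CI {lo:.1f}-{hi:.1f})")
```

A random-effects model (for example DerSimonian-Laird) would widen the interval when studies disagree; the fixed-effect version is used here only to keep the sketch short.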
RESULTS
Characteristics of the studies
The initial search of PubMed, ISI Web of Science, Embase and Google Scholar yielded 2738 articles. Studies that enrolled patients not receiving MV or patients with neuromuscular disease, or that evaluated diaphragm atrophy, dysfunction or thickness changes without regard to weaning outcomes, were excluded. After screening of titles and abstracts and removal of duplicates, the full texts of 43 articles were assessed for inclusion. Of these, 24 articles were excluded for the following reasons: ten were not primary studies, five evaluated diaphragm muscle atrophy, two used phrenic nerve stimulation rather than ultrasonography, two reported only the reproducibility of ultrasonography examination, one evaluated patients under MV, one evaluated only patients at high risk of reintubation, and three were conducted in children or in non-mechanically ventilated patients. Therefore, 19 studies were included in the final analysis [15, 21–38]. The quality of the included studies, assessed with the QUADAS-II tool, is shown in Table 1. QUADAS-II evaluates bias through 11 signalling questions covering risk of bias (patient selection, index test, reference standard, and flow and timing) and applicability concerns (patient selection, index test, reference standard). The ventilation mode and technique used during US examination, as well as RSBI calculations and cut-off values, are reported in Table 2. DTF was calculated at tidal inspiration as (diaphragm thickness at end-inspiration − diaphragm thickness at end-expiration)/diaphragm thickness at end-expiration × 100%.
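The DTF formula is a relative change in thickness, so the measurement units cancel and the result is read directly against percentage thresholds such as 25% or 30%. A minimal sketch (the function name and example thicknesses are hypothetical):

```python
def thickening_fraction(t_end_insp_mm: float, t_end_exp_mm: float) -> float:
    """DTF (%) = (thickness at end-inspiration - thickness at end-expiration)
    / thickness at end-expiration * 100."""
    if t_end_exp_mm <= 0:
        raise ValueError("end-expiratory thickness must be positive")
    return (t_end_insp_mm - t_end_exp_mm) / t_end_exp_mm * 100.0

# Hypothetical measurements: 2.5 mm at end-inspiration, 2.0 mm at end-expiration
print(thickening_fraction(2.5, 2.0))
```

Dividing by the end-expiratory value, rather than the end-inspiratory one, is what makes DTF a thickening (not thinning) fraction.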
TABLE 2
Diaphragm thickening fraction
Across the twelve studies reporting DTF, the pooled sensitivity and specificity were 89% (I² = 72.9%) and 81% (I² = 66.5%), respectively, with a DOR of 36.2 (I² = 46.7%), and the area under the summary ROC curve (AUC) was 0.93 (Figures 3 and 4).
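The I² values quantify how much of the between-study variation exceeds what chance alone would produce. One common way to obtain I² for sensitivity is Cochran's Q on logit-transformed sensitivities with inverse-variance weights, shown below as a sketch; Meta-DiSc's exact procedure may differ, and the 0.5 continuity correction, function names and counts are assumptions.

```python
import math

def logit_with_var(events: float, total: float) -> tuple[float, float]:
    """Logit of a proportion and its approximate variance."""
    # Assumption: 0.5 continuity correction when the proportion is 0 or 1.
    if events == 0 or events == total:
        events, total = events + 0.5, total + 1.0
    p = events / total
    return math.log(p / (1 - p)), 1 / events + 1 / (total - events)

def i_squared(pairs: list[tuple[int, int]]) -> float:
    """I^2 (%) from Cochran's Q; each pair is (TP, TP + FN) for one study."""
    stats = [logit_with_var(e, n) for e, n in pairs]
    w = [1 / v for _, v in stats]
    mean = sum(wi * y for wi, (y, _) in zip(w, stats)) / sum(w)
    q = sum(wi * (y - mean) ** 2 for wi, (y, _) in zip(w, stats))
    df = len(pairs) - 1
    # I^2 = (Q - df) / Q, truncated at zero when Q does not exceed df.
    return max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

# Hypothetical sensitivities: 45/50 and 20/50
print(round(i_squared([(45, 50), (20, 50)]), 1))
```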
In patients who underwent a pressure support (PS) weaning trial, the pooled sensitivity, specificity and DOR of DTF for predicting weaning success were 84%, 77% and 22.4, respectively. By contrast, DTF measured by US during SBT showed a pooled sensitivity of 92%, specificity of 78% and DOR of 48.1. Studies were further categorised according to the DTF threshold used to define DD: DTF < 25%, DTF 25–30%, and DTF > 30% (Table 3).
TABLE 3
TABLE 4
The pooled sensitivity, specificity and DOR for the respective DTF threshold groups are presented in Table 4.
Diaphragmatic excursion (DE)
DE was assessed during spontaneous breathing in 701 patients enrolled in eleven studies [15, 25, 26, 30–34, 36–38]. The pooled sensitivity and specificity were 79.9% (I² = 65.3%) and 69% (I² = 75.1%), respectively, with a DOR of 9.1 (I² = 59.1%).
Rapid shallow breathing index
Nine studies calculated RSBI during SBT, as breathing frequency/tidal volume, to assess its value as a predictor of a successful weaning trial [15, 21, 26, 28, 29, 31–33, 37, 38]. For predicting weaning trial outcome, RSBI showed a pooled sensitivity of 74% (I² = 91.3%), specificity of 73% (I² = 83.4%) and DOR of 9.94 (I² = 46.7%).
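RSBI is conventionally expressed in breaths/min/L, so tidal volume recorded in millilitres must be converted to litres before dividing. In the sketch below the function names are hypothetical, and the default cut-off of 105 breaths/min/L is the classic Yang and Tobin threshold, used only for illustration; the cut-offs actually applied by the included studies are those listed in Table 2.

```python
def rsbi(resp_rate_per_min: float, tidal_volume_ml: float) -> float:
    """Rapid shallow breathing index in breaths/min/L (f / VT, with VT in litres)."""
    if tidal_volume_ml <= 0:
        raise ValueError("tidal volume must be positive")
    return resp_rate_per_min / (tidal_volume_ml / 1000.0)

def predicts_weaning_success(rsbi_value: float, cutoff: float = 105.0) -> bool:
    # Lower RSBI (slower, deeper breathing) favours weaning success.
    # The 105 default is illustrative; pass each study's own cut-off instead.
    return rsbi_value < cutoff

# Hypothetical patient: 30 breaths/min with a 250 mL tidal volume
value = rsbi(30, 250)
print(value, predicts_weaning_success(value))
```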
DISCUSSION
Our study demonstrated that US imaging of the diaphragm muscle has a potential role in predicting ventilator weaning outcome. The non-invasiveness and accessibility of US-derived measures give them an advantage over measurement of transdiaphragmatic pressure (Pdi), which is considered the gold standard for diagnosing diaphragmatic dysfunction. Pdi is an effort-dependent measure that requires coaching of the patient, and although various techniques have been introduced to improve its applicability and reduce its cost [18, 39], it does not offer an easy bedside method for evaluating diaphragm strength and ventilator weaning tolerance.
In several studies, DTF measured by US proved to be a practical tool for assessing diaphragm function and breathing workload [40, 41]. Although Cartwright et al. [42] reported no statistically significant change in diaphragm thickness during ICU admission, other studies showed that initiation of MV leads to acute thinning and atrophy of the diaphragm, which in turn prolongs MV and lowers the probability of liberation from MV [43–46]. Although these studies stated that the association between diaphragm thinning and DD is unclear, subsequent studies evaluating diaphragm thickness and excursion showed that DD is accompanied by a reduction in muscle thickness, which predisposes patients to weaning trial failure. In another study, Goligher et al. [45] used a different US index, the thickness of the diaphragm (TDI); although it demonstrated diaphragm thinning in patients receiving MV, no significant correlation was found between TDI and weaning outcomes. Subsequently, Mistri et al. [47] showed that diaphragm atrophy during MV is associated with a decrease in DTF in patients admitted to a paediatric ICU. Increased DTF was also suggested as a potential predictor of successful extubation. Building on these findings, Vallette et al. [48] reported their experience of diagnosing DD with diaphragm US at ICU admission in patients with acute respiratory failure and suggested that diaphragm ultrasonography may help identify patients at high risk of difficult weaning.
Our results suggest that DTF measured by US during SBT predicts weaning outcome more accurately than DTF measured during pressure support, which is consistent with the literature [49]. Although we compared diagnostic accuracy across DTF thresholds to obtain more precise estimates, the small number of studies in each group precluded a definitive comparison. Nevertheless, a lower DTF threshold appeared to increase the accuracy of US measurement in discriminating successful weaning, which is inconsistent with previous reports [50].
For the studies evaluating DE by US, we were able to perform a meta-analysis and calculate the pooled sensitivity and specificity for predicting weaning outcome. Nevertheless, many of these studies used different cut-off values to identify DD, and this heterogeneity prevented us from reaching firmer conclusions; the high heterogeneity lowers the certainty of the pooled estimates. DE nonetheless provided lower diagnostic accuracy than DTF measurement. Most of the studies in this review that performed US examination of the diaphragm evaluated its repeatability and reproducibility across measurement sessions and operators, and reported high reproducibility and feasibility in mechanically ventilated patients. Other studies have shown that diaphragm US provides acceptable intraclass correlation in both children and adults. The overall sensitivity and specificity of RSBI, regardless of threshold, were 74% and 73%, comparable to those of DE. However, the diagnostic accuracies of both DE and RSBI were noticeably lower than that of DTF obtained by diaphragm US. In addition, a single study evaluated RSBI and DTF in combination for predicting successful weaning and found lower sensitivity and specificity than for either RSBI or DTF alone. We therefore suggest that US-derived indices of DD may predict weaning outcome with higher sensitivity and specificity than conventional parameters, although the lack of a single, well-defined cut-off value for identifying diaphragmatic dysfunction prevents more reliable conclusions. This review also highlights several methodological strengths and weaknesses of the included studies.
None of the included studies used cross-sectional or case-control designs, which notably reduced the risk of bias. Although the reference standard (tolerance of ventilator weaning for 48 hours) was identical across all studies, which reflects high quality, some studies derived their diagnostic cut-off points from their own data, which makes the pooled performance estimates less meaningful. A recent meta-analysis showed that lung and diaphragm US can help predict weaning outcome, but that its accuracy may vary by patient subpopulation; its sensitivity was low because weaning is also affected by factors unrelated to the diaphragm. Further research in subgroups of critically ill patients, applying a homogeneous definition of weaning and uniformly conducted measurements, is needed to establish the accuracy of diaphragm US [51–53].
Our review has some potential limitations. First, and most importantly, the included studies were substantially heterogeneous (e.g., in the cut-off values used for US measurements). Second, only a limited number of randomised controlled trials was available for inclusion. Furthermore, only two studies evaluated the diagnostic accuracy of DE in mechanically ventilated patients, which limits the generalisability of the review's findings.