Meta-analysis methods

ImaGEO implements two of the main approaches to gene expression meta-analysis: meta-analysis based on effect size combination and meta-analysis based on p-value combination.

Effect size combination

Effect size can be defined as a quantitative measure of the strength of a phenomenon that can be compared across different studies. In our case, we calculate the standardized mean difference between two groups (e.g. case and control, case1 and case2, etc.), using the approximation of Hedges’ g as its estimator:

\[g = J \times d = J \times \frac{\overline{x}_{1} - \overline{x}_{2}}{S}\]

where:

  • \(J= 1 -\frac{3}{4df -1}\), is the correction factor. In this case, when considering independent groups, the degrees of freedom (\(df\)) is \(n_{1} + n_{2} - 2\).
  • \(d\) is Cohen’s d estimator.
  • \(\overline{x}_{1}\) and \(\overline{x}_{2}\) are the means of the first and second groups respectively.
  • \(S = \sqrt{ \frac{ (n_{1}-1)S_{1}^{2} + (n_{2}-1)S_{2}^{2} } {n_{1}+n_{2}-2} }\), where \(n_{1}\) and \(n_{2}\) are the sample sizes of the first and second groups respectively and \(S_{1}^{2}\) and \(S_{2}^{2}\) are the variances of the first and second group respectively.

The variance of this estimator is:

\[V_{g} = J^{2} \times V_{d} = J^{2} \times \left( \frac{n_{1} + n_{2}}{n_{1}n_{2}} + \frac{d^{2}}{2(n_{1} + n_{2})} \right)\]

where \(V_{d}\) is the variance of the Cohen’s \(d\) estimator.
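
As an illustration, the formulas above can be sketched in Python for a single gene in a single study. This is a minimal sketch, not ImaGEO's own code: the function name `hedges_g` and its inputs (two lists of expression values) are assumptions made for the example.

```python
import math
from statistics import mean, variance


def hedges_g(group1, group2):
    """Return (g, V_g) for two independent groups of expression values.

    Hypothetical helper written for illustration; not part of ImaGEO.
    """
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    # Pooled standard deviation S (statistics.variance is the sample variance)
    s_pooled = math.sqrt(
        ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    )
    d = (mean(group1) - mean(group2)) / s_pooled   # Cohen's d
    j = 1 - 3 / (4 * df - 1)                       # correction factor J
    v_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return j * d, j ** 2 * v_d


g, v_g = hedges_g([2, 4, 6], [1, 2, 3])
print(g, v_g)
```

Each group needs at least two samples so that its variance is defined.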

To combine the effect sizes from different studies, we consider two methods:

Fixed Effect Method (FEM)

FEM is a linear model that assumes the different studies share a common true effect size. The combined effect size is calculated as a weighted mean of the different effect sizes:

\[\overline{M} = \frac{\sum_{i=1}^{k} \omega_{i} Y_{i}}{\sum_{i=1}^{k} \omega_{i}}\]

where:

  • \(Y_{i}\) is the effect of each study. In this case, these are the Hedges’ g estimates calculated for each study.
  • \(\omega_{i} = \frac{1}{V_{Y_{i}}}\) is the weight assigned to each study, that is, the inverse of its within-study variance. \(V_{Y_{i}}\) is the within-study variance, that is to say, the \(V_{g}\) calculated for each effect.

The variance of this combined effect is calculated as:

\[V(\overline{M}) = \frac{1}{\sum_{i=1}^{k} \omega_{i}}\]

The combined effect is then standardized into a Z statistic, which follows a standard normal distribution, \(N(0,1)\), under the null hypothesis:

\[Z = \frac{\overline{M}}{\sqrt{V(\overline{M})}}\]

Therefore, we obtain a two-tailed p-value:

\[\text{P-value} = 2[ 1- \Phi(|Z|)]\]

where \(\Phi\) is the standard normal cumulative distribution function.
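
The FEM steps above can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name `fixed_effect` is hypothetical, and the inputs are the per-study Hedges’ g values and their variances \(V_{g}\) for one gene.

```python
import math
from statistics import NormalDist


def fixed_effect(effects, variances):
    """Return (M, V(M), Z, two-tailed p-value) for the fixed effect model.

    Hypothetical helper written for illustration; not part of ImaGEO.
    """
    weights = [1 / v for v in variances]          # inverse-variance weights
    total_w = sum(weights)
    m = sum(w * y for w, y in zip(weights, effects)) / total_w
    v_m = 1 / total_w
    z = m / math.sqrt(v_m)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed
    return m, v_m, z, p_value


m, v_m, z, p_value = fixed_effect([0.5, 1.0], [0.25, 0.25])
print(m, v_m, z, p_value)
```

With equal variances, as in the usage line, the combined effect reduces to the plain average of the study effects.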

Random Effect Method (REM)

Unlike FEM, the random-effects model (REM) assumes that the true effect can vary from one study to another. In this case, the combined effect size represents the average of the true effects. In practice, this implies assuming that in the calculation of the weights for the weighted mean, there are two sources of error: the within-study variance (similar to FEM) and the between-study variance (\(\tau^{2}\)). To calculate \(\tau^{2}\), we use the method of moments (DerSimonian and Laird):

\[\tau^{2} = \max\left(0, \frac{Q-df}{C}\right)\]

where:

  • \(Q = \sum_{i=1}^{k} \omega_{i} (Y_{i} - \overline{M})^{2}\), is the total variance (the weighted sum of squared deviations of each effect from the combined effect). In this case \(\omega_{i}\), \(Y_{i}\) and \(\overline{M}\) are, respectively, the weights used in the FEM model, the effect sizes and the combined effect size obtained in the FEM model.
  • \(df = k-1\), is the degrees of freedom, where \(k\) is the number of studies.
  • \(C = \sum_{i=1}^{k} \omega_{i} - \frac{\sum_{i=1}^{k} \omega_{i}^{2} }{\sum_{i=1}^{k} \omega_{i} }\)

In this way, the weight of each study is:

\[\omega_{i}^{*} = \frac{1}{V_{Y_{i}} + \tau^{2}}\]

Therefore, similarly to the FEM, the combined effect size for the REM is calculated as:

\[\overline{M^{*}} = \frac{\sum_{i=1}^{k} \omega_{i}^{*} Y_{i}}{\sum_{i=1}^{k} \omega_{i}^{*}}\]

And similarly:

\[V(\overline{M^{*}}) = \frac{1}{\sum_{i=1}^{k} \omega_{i}^{*}}\]

\[Z^{*} = \frac{\overline{M^{*}}}{\sqrt{V(\overline{M^{*}})}}\]

\[\text{P-value} = 2[ 1- \Phi(|Z^{*}|)]\]
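
The REM calculation, including the DerSimonian and Laird estimate of \(\tau^{2}\), can be sketched as follows. This is a minimal sketch and not ImaGEO's own code: the function name `random_effects` is hypothetical, and the sketch assumes at least two studies (with a single study, \(C\) is zero and \(\tau^{2}\) cannot be estimated).

```python
import math
from statistics import NormalDist


def random_effects(effects, variances):
    """Return (tau2, M*, V(M*), Z*, two-tailed p-value) for the REM.

    Hypothetical helper written for illustration; assumes k >= 2 studies.
    """
    k = len(effects)
    w = [1 / v for v in variances]                 # FEM weights
    sum_w = sum(w)
    m_fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - m_fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)             # between-study variance
    w_star = [1 / (v + tau2) for v in variances]   # REM weights
    sum_w_star = sum(w_star)
    m_star = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum_w_star
    v_star = 1 / sum_w_star
    z_star = m_star / math.sqrt(v_star)
    p_value = 2 * (1 - NormalDist().cdf(abs(z_star)))
    return tau2, m_star, v_star, z_star, p_value


print(random_effects([0.0, 2.0], [0.25, 0.25]))
```

When the studies agree closely, \(Q\) falls below \(df\), \(\tau^{2}\) is truncated to zero and the result coincides with the FEM.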

FEM should only be used when the studies included in the analysis are functionally identical (not independently conducted) and the results do not need to be generalized to other studies. In the case of meta-analysis of differential expression, it’s challenging to fulfill these conditions. Therefore, we recommend always applying a random-effects model unless the researcher is entirely certain that the conditions for applying a fixed-effects model are met.

P-value combination methods:

These techniques aim to integrate the P-values of the individual analyses into a single combined P-value.

Fisher’s method:

This technique uses as its statistic minus twice the sum of the logarithms of the p-values:

\[- 2 \times \sum_{i=1}^{k} \ln(p_{i}) \sim \chi^{2}_{2 \times k} \; under \; H_{0}\]

where \(k\) is the number of studies.
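
A minimal sketch of Fisher’s method using only the Python standard library is shown below. The function name `fisher_combined_p` is hypothetical. Because the degrees of freedom are always even (\(2k\)), the upper tail of the chi-square distribution has a closed form, \(e^{-x/2} \sum_{j=0}^{k-1} (x/2)^{j}/j!\), which the sketch uses instead of a statistics library.

```python
import math


def fisher_combined_p(p_values):
    """Return (statistic, combined p-value) for Fisher's method.

    Hypothetical helper written for illustration; p-values must be in (0, 1].
    """
    k = len(p_values)
    statistic = -2 * sum(math.log(p) for p in p_values)
    half = statistic / 2
    # Upper tail of chi-square with 2k degrees of freedom (closed form)
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return statistic, math.exp(-half) * total


print(fisher_combined_p([0.05, 0.05]))
```

With a single study the combined p-value equals the input p-value, which is a quick sanity check on the formula.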

Stouffer’s method:

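This method first transforms each p-value into a Z-value:

\[Z_{i} = \Phi^{-1}(1-P_{i})\]

where \(\Phi^{-1}\) is the inverse of the standard normal cumulative distribution function and \(P_{i}\) is the p-value of study \(i\).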

The statistic used in this method is the combination of Z-values:

\[\frac{\sum_{i=1}^{k} Z_{i}}{\sqrt{k}} \sim N(0,1) \; under \; H_{0}\]
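
Stouffer’s combination can be sketched as follows. The function name `stouffer_combined_p` is hypothetical, and the sketch assumes that the combined p-value is taken from the upper tail of \(N(0,1)\), which matches the transformation \(\Phi^{-1}(1-P)\) but is not stated explicitly above.

```python
import math
from statistics import NormalDist


def stouffer_combined_p(p_values):
    """Return (combined Z, combined p-value) for Stouffer's method.

    Hypothetical helper written for illustration. p-values must lie strictly
    between 0 and 1, otherwise NormalDist.inv_cdf raises an error.
    """
    norm = NormalDist()
    z_scores = [norm.inv_cdf(1 - p) for p in p_values]
    z = sum(z_scores) / math.sqrt(len(p_values))
    # Assumption: upper-tail p-value of the combined Z
    return z, 1 - norm.cdf(z)


print(stouffer_combined_p([0.01, 0.04, 0.20]))
```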

Tippett’s method (minP):

The statistic of this method is the minimum of the P-values of all studies:

\[\min(p_{1},\dots, p_{i},\dots,p_{k}) \sim Beta(1,k) \; under \; H_{0}\]
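
Since the cumulative distribution function of \(Beta(1,k)\) is \(1-(1-x)^{k}\), the combined p-value of the minP method has a simple closed form. The sketch below illustrates it; the function name `tippett_combined_p` is hypothetical.

```python
def tippett_combined_p(p_values):
    """Return (min p, combined p-value) for the minP method.

    Hypothetical helper written for illustration; not part of ImaGEO.
    """
    k = len(p_values)
    p_min = min(p_values)
    # Beta(1, k) cumulative distribution function evaluated at p_min
    return p_min, 1 - (1 - p_min) ** k


print(tippett_combined_p([0.01, 0.5, 0.9]))
```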

Wilkinson’s method (maxP):

The statistic of this method is the maximum of the P-values of all studies:

\[\max(p_{1},\dots, p_{i},\dots,p_{k}) \sim Beta(k,1) \; under \; H_{0}\]
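
The cumulative distribution function of \(Beta(k,1)\) is \(x^{k}\), so the combined p-value of the maxP method is the largest p-value raised to the number of studies. A minimal sketch, with the hypothetical function name `wilkinson_combined_p`:

```python
def wilkinson_combined_p(p_values):
    """Return (max p, combined p-value) for the maxP method.

    Hypothetical helper written for illustration; not part of ImaGEO.
    """
    k = len(p_values)
    p_max = max(p_values)
    # Beta(k, 1) cumulative distribution function evaluated at p_max
    return p_max, p_max ** k


print(wilkinson_combined_p([0.1, 0.2, 0.5]))
```

Because the statistic is the largest p-value, a gene is only declared significant when every study supports it, which makes maxP more conservative than minP.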

One notable aspect of these methods is that they treat all studies equally, irrespective of their size, because the individually obtained P-values are combined directly. They are also more compatible with combining studies from diverse platforms or conditions than approaches based on effect size combination, and they can directly combine the outcomes of disparate analyses. Nonetheless, P-value combination methods suffer from a significant drawback: they lose the directional information of the expression pattern.