Evaluation and improvement of the regulatory inference for large co-expression networks with limited sample size

Wenbin Guo, Cristiane P. G. Calixto, Nikoleta Tzioutziou, Ping Lin, Robbie Waugh, John W. S. Brown, Runxuan Zhang (Lead / Corresponding author)

Research output: Contribution to journalArticle

3 Citations (Scopus)
30 Downloads (Pure)

Abstract

Background: Co-expression has been widely used to identify novel regulatory relationships using high throughput measurements, such as microarray and RNA-seq data. Evaluation studies on co-expression network analysis methods mostly focus on networks of small or medium size of up to a few hundred nodes. For large networks, simulated expression data usually consist of hundreds or thousands of profiles with different perturbations or knock-outs, which is uncommon in real experiments due to their cost and the amount of work required. Thus, the performances of co-expression network analysis methods on large co-expression networks consisting of a few thousand nodes, with only a small number of profiles with a single perturbation, which more accurately reflect normal experimental conditions, are generally uncharacterized and unknown.

Methods: We proposed a novel network inference methods based on Relevance Low order Partial Correlation (RLowPC). RLowPC method uses a two-step approach to select on the high-confidence edges first by reducing the search space by only picking the top ranked genes from an intial partial correlation analysis and, then computes the partial correlations in the confined search space by only removing the linear dependencies from the shared neighbours, largely ignoring the genes showing lower association.

Results: We selected six co-expression-based methods with good performance in evaluation studies from the literature: Partial correlation, PCIT, ARACNE, MRNET, MRNETB and CLR. The evaluation of these methods was carried out on simulated time-series data with various network sizes ranging from 100 to 3000 nodes. Simulation results show low precision and recall for all of the above methods for large networks with a small number of expression profiles. We improved the inference significantly by refinement of the top weighted edges in the pre-inferred partial correlation networks using RLowPC. We found improved performance by partitioning large networks into smaller co-expressed modules when assessing the method performance within these modules.

Conclusions: The evaluation results show that current methods suffer from low precision and recall for large co-expression networks where only a small number of profiles are available. The proposed RLowPC method effectively reduces the indirect edges predicted as regulatory relationships and increases the precision of top ranked predictions. Partitioning large networks into smaller highly co-expressed modules also helps to improve the performance of network inference methods. The RLowPC R package for network construction, refinement and evaluation is available at GitHub: https://github.com/wyguo/RLowPC .

Original languageEnglish
Article number62
Pages (from-to)1-12
Number of pages12
JournalBMC Systems Biology
Volume11
DOIs
Publication statusPublished - 19 Jun 2017

Fingerprint

Correlation methods
Partial Correlation
Electric network analysis
Sample Size
Genes
Evaluation
Microarrays
RNA
Time series
Throughput
Association reactions
Costs
Experiments
Network Analysis
Module
Search Space
Partitioning
Refinement
Vertex of a graph
Confined Spaces

Keywords

  • Gene co-expression networks
  • Gene regulatory networks
  • Network method evaluation
  • Partial correlation
  • Synthetic data

Cite this

@article{7c5c02981ac54b10a66015a52a84ef95,
title = "Evaluation and improvement of the regulatory inference for large co-expression networks with limited sample size",
abstract = "Background: Co-expression has been widely used to identify novel regulatory relationships using high throughput measurements, such as microarray and RNA-seq data. Evaluation studies on co-expression network analysis methods mostly focus on networks of small or medium size of up to a few hundred nodes. For large networks, simulated expression data usually consist of hundreds or thousands of profiles with different perturbations or knock-outs, which is uncommon in real experiments due to their cost and the amount of work required. Thus, the performances of co-expression network analysis methods on large co-expression networks consisting of a few thousand nodes, with only a small number of profiles with a single perturbation, which more accurately reflect normal experimental conditions, are generally uncharacterized and unknown.Methods: We proposed a novel network inference methods based on Relevance Low order Partial Correlation (RLowPC). RLowPC method uses a two-step approach to select on the high-confidence edges first by reducing the search space by only picking the top ranked genes from an intial partial correlation analysis and, then computes the partial correlations in the confined search space by only removing the linear dependencies from the shared neighbours, largely ignoring the genes showing lower association.Results: We selected six co-expression-based methods with good performance in evaluation studies from the literature: Partial correlation, PCIT, ARACNE, MRNET, MRNETB and CLR. The evaluation of these methods was carried out on simulated time-series data with various network sizes ranging from 100 to 3000 nodes. Simulation results show low precision and recall for all of the above methods for large networks with a small number of expression profiles. We improved the inference significantly by refinement of the top weighted edges in the pre-inferred partial correlation networks using RLowPC. We found improved performance by partitioning large networks into smaller co-expressed modules when assessing the method performance within these modules.Conclusions: The evaluation results show that current methods suffer from low precision and recall for large co-expression networks where only a small number of profiles are available. The proposed RLowPC method effectively reduces the indirect edges predicted as regulatory relationships and increases the precision of top ranked predictions. Partitioning large networks into smaller highly co-expressed modules also helps to improve the performance of network inference methods. The RLowPC R package for network construction, refinement and evaluation is available at GitHub: https://github.com/wyguo/RLowPC .",
keywords = "Gene co-expression networks, Gene regulatory networks, Network method evaluation, Partial correlation, Synthetic data",
author = "Wenbin Guo and Calixto, {Cristiane P. G.} and Nikoleta Tzioutziou and Ping Lin and Robbie Waugh and Brown, {John W. S.} and Runxuan Zhang",
note = "This project was supported by joint PhD studentship Program from the James Hutton Institute and the University of Dundee [to W.G] and the Scottish Government Rural and Environment Science and Analytical Services division (RESAS) [to J.B., R.W. and R.Z.].",
year = "2017",
month = "6",
day = "19",
doi = "10.1186/s12918-017-0440-2",
language = "English",
volume = "11",
pages = "1--12",
journal = "BMC Systems Biology",
issn = "1752-0509",
publisher = "Springer Verlag",

}

Evaluation and improvement of the regulatory inference for large co-expression networks with limited sample size. / Guo, Wenbin; Calixto, Cristiane P. G.; Tzioutziou, Nikoleta; Lin, Ping; Waugh, Robbie; Brown, John W. S.; Zhang, Runxuan (Lead / Corresponding author).

In: BMC Systems Biology, Vol. 11, 62, 19.06.2017, p. 1-12.

Research output: Contribution to journalArticle

TY - JOUR

T1 - Evaluation and improvement of the regulatory inference for large co-expression networks with limited sample size

AU - Guo, Wenbin

AU - Calixto, Cristiane P. G.

AU - Tzioutziou, Nikoleta

AU - Lin, Ping

AU - Waugh, Robbie

AU - Brown, John W. S.

AU - Zhang, Runxuan

N1 - This project was supported by joint PhD studentship Program from the James Hutton Institute and the University of Dundee [to W.G] and the Scottish Government Rural and Environment Science and Analytical Services division (RESAS) [to J.B., R.W. and R.Z.].

PY - 2017/6/19

Y1 - 2017/6/19

N2 - Background: Co-expression has been widely used to identify novel regulatory relationships using high throughput measurements, such as microarray and RNA-seq data. Evaluation studies on co-expression network analysis methods mostly focus on networks of small or medium size of up to a few hundred nodes. For large networks, simulated expression data usually consist of hundreds or thousands of profiles with different perturbations or knock-outs, which is uncommon in real experiments due to their cost and the amount of work required. Thus, the performances of co-expression network analysis methods on large co-expression networks consisting of a few thousand nodes, with only a small number of profiles with a single perturbation, which more accurately reflect normal experimental conditions, are generally uncharacterized and unknown.Methods: We proposed a novel network inference methods based on Relevance Low order Partial Correlation (RLowPC). RLowPC method uses a two-step approach to select on the high-confidence edges first by reducing the search space by only picking the top ranked genes from an intial partial correlation analysis and, then computes the partial correlations in the confined search space by only removing the linear dependencies from the shared neighbours, largely ignoring the genes showing lower association.Results: We selected six co-expression-based methods with good performance in evaluation studies from the literature: Partial correlation, PCIT, ARACNE, MRNET, MRNETB and CLR. The evaluation of these methods was carried out on simulated time-series data with various network sizes ranging from 100 to 3000 nodes. Simulation results show low precision and recall for all of the above methods for large networks with a small number of expression profiles. We improved the inference significantly by refinement of the top weighted edges in the pre-inferred partial correlation networks using RLowPC. We found improved performance by partitioning large networks into smaller co-expressed modules when assessing the method performance within these modules.Conclusions: The evaluation results show that current methods suffer from low precision and recall for large co-expression networks where only a small number of profiles are available. The proposed RLowPC method effectively reduces the indirect edges predicted as regulatory relationships and increases the precision of top ranked predictions. Partitioning large networks into smaller highly co-expressed modules also helps to improve the performance of network inference methods. The RLowPC R package for network construction, refinement and evaluation is available at GitHub: https://github.com/wyguo/RLowPC .

AB - Background: Co-expression has been widely used to identify novel regulatory relationships using high throughput measurements, such as microarray and RNA-seq data. Evaluation studies on co-expression network analysis methods mostly focus on networks of small or medium size of up to a few hundred nodes. For large networks, simulated expression data usually consist of hundreds or thousands of profiles with different perturbations or knock-outs, which is uncommon in real experiments due to their cost and the amount of work required. Thus, the performances of co-expression network analysis methods on large co-expression networks consisting of a few thousand nodes, with only a small number of profiles with a single perturbation, which more accurately reflect normal experimental conditions, are generally uncharacterized and unknown.Methods: We proposed a novel network inference methods based on Relevance Low order Partial Correlation (RLowPC). RLowPC method uses a two-step approach to select on the high-confidence edges first by reducing the search space by only picking the top ranked genes from an intial partial correlation analysis and, then computes the partial correlations in the confined search space by only removing the linear dependencies from the shared neighbours, largely ignoring the genes showing lower association.Results: We selected six co-expression-based methods with good performance in evaluation studies from the literature: Partial correlation, PCIT, ARACNE, MRNET, MRNETB and CLR. The evaluation of these methods was carried out on simulated time-series data with various network sizes ranging from 100 to 3000 nodes. Simulation results show low precision and recall for all of the above methods for large networks with a small number of expression profiles. We improved the inference significantly by refinement of the top weighted edges in the pre-inferred partial correlation networks using RLowPC. We found improved performance by partitioning large networks into smaller co-expressed modules when assessing the method performance within these modules.Conclusions: The evaluation results show that current methods suffer from low precision and recall for large co-expression networks where only a small number of profiles are available. The proposed RLowPC method effectively reduces the indirect edges predicted as regulatory relationships and increases the precision of top ranked predictions. Partitioning large networks into smaller highly co-expressed modules also helps to improve the performance of network inference methods. The RLowPC R package for network construction, refinement and evaluation is available at GitHub: https://github.com/wyguo/RLowPC .

KW - Gene co-expression networks

KW - Gene regulatory networks

KW - Network method evaluation

KW - Partial correlation

KW - Synthetic data

U2 - 10.1186/s12918-017-0440-2

DO - 10.1186/s12918-017-0440-2

M3 - Article

VL - 11

SP - 1

EP - 12

JO - BMC Systems Biology

JF - BMC Systems Biology

SN - 1752-0509

M1 - 62

ER -