我们将使用系统内安装的R:
# bash命令
export PATH="/sibcb1/bioinformatics/biocourse/biocourse01/miniforge3/envs/scRNA/bin:$PATH"
加载需要使用的软件包:
## devtools::install_github("gpli/DNB")
library(DNB) ##加载R包DNB
# Load data
gene_expr <- read.csv(system.file("extdata", "gene_expr.csv", package = "DNB"), row.names = 1)
sample_info <- read.csv(system.file("extdata", "sample_info.csv", package = "DNB"), row.names = 1)
head(gene_expr[, 1:5])
s_1 s_2 s_3 s_4 s_5
g_1 86.090028 0.01700653 21.8332070 109.768187 118.198947
g_2 63.159675 0.78383604 641.5469855 1.183683 1047.381835
g_3 27.923068 0.11859829 431.1271239 7.960984 92.287214
g_4 249.504697 28.04153114 0.7283007 126.189366 0.088603
g_5 0.261581 0.23176937 1.9417726 68.252337 16.135678
g_6 0.000000 0.00000000 0.0000000 21.311293 8.737244
head(sample_info)
time group
s_1 6 0
s_2 6 1
s_3 6 1
s_4 6 0
s_5 3 1
s_6 3 0
# New DNB object
# all-zero genes at any time point will be removed
dnb <- new_DNB(data = gene_expr, time = as.factor(sample_info$time))
dnb
$data
1818 x 74 matrix of doubles:
s_1 s_2 s_3 ... s_74
g_1 86.0900277 0.0170065 21.8332070 ... 112.5990364
g_2 63.1596748 0.7838360 641.5469855 ... 2.2705360
g_3 27.9230676 0.1185983 431.1271239 ... 3.4109098
... ... ... ... ... ...
g_1873 487.4775445 143.9325181 17.5177474 ... 390.7652305
$time
[1] 6 6 6 6 3 3 2 2 3 3 4 4 4 4 2 2 2 2 2 2 4 4 6 6 6 6 6 6 6 6 4 4 5
[34] 5 2 2 6 6 3 3 3 3 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 4 4 3 3 5 5 4 4
[67] 4 4 6 6 6 6 6 6
Levels: 2 3 4 5 6
$group
NULL
dnb <- cal_cor(dnb) ##计算不同分组样本内基因的相关性系数
2
3
4
5
6
dnb <- cal_cv(dnb) ##计算不同分组样本内基因的变异系数
2
3
4
5
6
##变异系数(Coefficient of Variation):标准差与平均数的比
##当需要比较两组数据离散程度大小的时候,如果两组数据的测量尺度相差太大,
##或者数据量纲的不同,直接使用标准差来进行比较不合适,此时就应当消除测量尺度和量纲的影响
dnb <- search_candidates(dnb, min_size = 20, max_size = 100)
time point 2
Hierarchical clustering genes ...
Calculating module attributes...
time point 3
Hierarchical clustering genes ...
Calculating module attributes...
time point 4
Hierarchical clustering genes ...
Calculating module attributes...
time point 5
Hierarchical clustering genes ...
Calculating module attributes...
time point 6
Hierarchical clustering genes ...
Calculating module attributes...
## 对基因表达进行聚类分成不同的基因module, 并计算每个gene module的DNB score
## 找到每个时间点DNB Score最高的基因module
## score <- cv_in * cor_in / cor_out
dnb <- cal_final(dnb)
##Compare and get the time point with highest score, and use that set of
##DNB genes to recalculate DNB attibutes for all time points.
candidates <- get_candidates(dnb)
plot_DNB(candidates)
最终结果:
dnb_genes <- get_DNB_genes(dnb)
final <- get_final(dnb)
plot_DNB(final)
dnb <- new_DNB(data = gene_expr,
time = as.factor(sample_info$time),
group = as.factor(sample_info$group))
dnb <- cal_cor(dnb)
2 (case)
2 (control)
3 (case)
3 (control)
4 (case)
4 (control)
5 (case)
5 (control)
6 (case)
6 (control)
dnb <- cal_cv(dnb)
2 (case)
2 (control)
3 (case)
3 (control)
4 (case)
4 (control)
5 (case)
5 (control)
6 (case)
6 (control)
dnb
$data
1685 x 74 matrix of doubles:
s_1 s_2 s_3 ... s_74
g_1 86.0900277 0.0170065 21.8332070 ... 112.5990364
g_2 63.1596748 0.7838360 641.5469855 ... 2.2705360
g_3 27.9230676 0.1185983 431.1271239 ... 3.4109098
... ... ... ... ... ...
g_1873 487.4775445 143.9325181 17.5177474 ... 390.7652305
$time
[1] 6 6 6 6 3 3 2 2 3 3 4 4 4 4 2 2 2 2 2 2 4 4 6 6 6 6 6 6 6 6 4 4 5
[34] 5 2 2 6 6 3 3 3 3 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 4 4 3 3 5 5 4 4
[67] 4 4 6 6 6 6 6 6
Levels: 2 3 4 5 6
$group
[1] 0 1 1 0 1 0 1 0 0 1 0 1 0 1 0 1 0 1 1 0 0 1 1 0 0 1 1 0 0 1 0 1 0
[34] 1 0 1 0 1 1 0 1 0 1 0 0 1 0 1 1 0 1 0 0 1 1 0 1 0 1 0 1 0 0 1 0 1
[67] 1 0 1 0 0 1 1 0
Levels: 0 1
$correlation
list of length 5
[[1]]
time point "2"
1685 x 1685 matrix of doubles:
g_1 g_2 g_3 ... g_1873
g_1 1.0000000 -0.3939425 -0.1021136 ... -0.4051167
g_2 -0.3939425 1.0000000 -0.5705402 ... 0.9700166
g_3 -0.1021136 -0.5705402 1.0000000 ... -0.4061931
... ... ... ... ... ...
g_1873 -0.4051167 0.9700166 -0.4061931 ... 1.0000000
...
$correlation_ctrl
list of length 5
[[1]]
time point "2"
1685 x 1685 matrix of doubles:
g_1 g_2 g_3 ... g_1873
g_1 1.0000000 -0.3781295 -0.5869382 ... -0.2721603
g_2 -0.3781295 1.0000000 0.6430009 ... -0.5344707
g_3 -0.5869382 0.6430009 1.0000000 ... 0.4171144
... ... ... ... ... ...
g_1873 -0.2721603 -0.5344707 0.4171144 ... 1.0000000
...
$CV
list of length 5
[[1]]
time point "2"
g_1 g_2 g_3 g_1873
1.003978 1.100766 2.021448 ... 1.614518
...
$CV_ctrl
list of length 5
[[1]]
time point "2"
g_1 g_2 g_3 g_1873
2.293950 1.146814 1.076170 ... 1.990480
...
这里的Score会根据ctrl的score进行校准:
##找到每个时间点DNB score最高的基因module
## score <- cv_in * cor_in / cor_out
## score <- score / cv_in_ctrl * cor_out_ctrl / cor_in_ctrl
dnb <- search_candidates(dnb, min_size = 50, max_size = 500)
time point 2
Hierarchical clustering genes ...
Calculating module attributes...
time point 3
Hierarchical clustering genes ...
Calculating module attributes...
time point 4
Hierarchical clustering genes ...
Calculating module attributes...
time point 5
Hierarchical clustering genes ...
Calculating module attributes...
time point 6
Hierarchical clustering genes ...
Calculating module attributes...
dnb <- cal_final(dnb)
##Compare and get the time point with highest score, and use that set of
##DNB genes to recalculate DNB attibutes for all time points.
candidates <- get_candidates(dnb)
plot_DNB(candidates)
最终结果的可视化:
dnb_genes <- get_DNB_genes(dnb)
final <- get_final(dnb)
plot_DNB(final)
本文档由陈洛南组唐诗婕博士构建,如需帮助,请联系bioinformatics[at]sibcb.ac.cn
获得唐博士的联系方式。
R version 4.3.3 (2024-02-29)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.6 (Maipo)
Matrix products: default
BLAS/LAPACK: /sibcb1/bioinformatics/biocourse/biocourse01/miniforge3/envs/scRNA/lib/libopenblasp-r0.3.27.so; LAPACK version 3.12.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Asia/Shanghai
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] DNB_0.1.1 knitr_1.46 rmarkdown_2.26
loaded via a namespace (and not attached):
[1] dendextend_1.17.1 gtable_0.3.4 jsonlite_1.8.8
[4] highr_0.10 dplyr_1.1.4 compiler_4.3.3
[7] tidyselect_1.2.1 gridExtra_2.3 jquerylib_0.1.4
[10] scales_1.3.0 yaml_2.3.8 fastmap_1.1.1
[13] ggplot2_3.5.0 R6_2.5.1 labeling_0.4.3
[16] generics_0.1.3 viridis_0.6.5 tibble_3.2.1
[19] bookdown_0.40 distill_1.6 munsell_0.5.1
[22] bslib_0.7.0 pillar_1.9.0 rlang_1.1.3
[25] utf8_1.2.4 cachem_1.0.8 ramify_0.3.3
[28] xfun_0.43 sass_0.4.9 viridisLite_0.4.2
[31] memoise_2.0.1 cli_3.6.2 withr_3.0.0
[34] magrittr_2.0.3 digest_0.6.35 grid_4.3.3
[37] rstudioapi_0.16.0 fontawesome_0.5.2 cowplot_1.1.3
[40] lifecycle_1.0.4 vctrs_0.6.5 downlit_0.4.4
[43] evaluate_0.23 glue_1.7.0 farver_2.1.1
[46] fansi_1.0.6 colorspace_2.1-0 tools_4.3.3
[49] pkgconfig_2.0.3 htmltools_0.5.8.1