我们这里的转录调控网络是指转录因子和靶基因所组成的有向、有权重的网络。有很多方法可以构建此类网络,在这次课程,我们主要学习ARACNE。该方法使用互信息(Mutual information)来衡量转录因子和靶基因之间的的相关性,不仅能捕捉线性相关,也能发现非线性的相关性。
在上图中,不同类型的相关性都能被MI
识别。
DPI(Data Processing Inequality)基于信息论中的一个定理,该定理指出:如果变量A、B、C形成一个马尔可夫链A->B->C,那么A和C之间的互信息不会超过A和B之间或B和C之间的互信息。DPI用于识别和消除网络中的间接相互作用。例如,如果基因A调控基因B,基因B调控基因C,DPI可以帮助识别出A和C之间的关系可能是间接的。
在上图中,A和C之间会被判定为间接作用。
我们将使用系统内安装的R:
/sibcb/program/install/r-4.0/bin/R
设定R的library path:
准备文件:
ln -s /sibcb1/bioinformatics/biocourse/biocourse01/network/raw .
ln -s /sibcb1/bioinformatics/biocourse/biocourse01/network_lesson2/out/aracne .
运行ARACNE
需要两个文件,表达谱矩阵和转录因子列表。
ex = read.table(file = "raw/GSE6477.txt", row.names = 1, header = TRUE, sep = "\t")
ex[1:10, 1:4]
GSM148911_ReMM GSM148912_MGUS GSM148914_NewMM GSM148915_NewMM
RFC2 5.102640 4.813578 5.445759 4.977311
HSPA6 6.466330 6.772655 6.467316 7.720870
PAX8 6.249809 6.425345 6.111605 6.553714
GUCA1A 5.871829 5.771262 5.657366 5.667482
THRA 5.435354 5.555236 5.528174 5.142718
PTPN21 5.857098 6.145216 5.890093 5.564663
CCL5 5.286222 5.288690 4.992404 5.128804
CYP2E1 5.427245 5.872459 5.490823 5.463017
EPHB3 7.704572 7.967523 7.560548 7.352723
ESRRA 8.096653 8.393159 7.903102 7.762751
head raw/Regulator.txt
AATF
ADNP
ADNP2
AEBP1
AFF3
AHCTF1
AHR
APP
ARID4A
ARNT
java -Xmx16g -jar /sibcb1/bioinformatics/training_shared/software/Aracne.jar \
\
-e raw/GSE6477.txt \
-o out \
--tfs raw/Regulator.txt \
--pvalue 1E-8 \
--seed 1 --calculateThreshold
cat aracne/miThreshold_p1E-8_samples162.txt
0.1586100591876638
for i in {1..100}; do
java -Xmx16g -jar /sibcb1/bioinformatics/training_shared/software/Aracne.jar \
-e raw/GSE6477.txt \
-o aracne \
--tfs raw/Regulator.txt \
--pvalue 1E-8 \
--seed ${i}
done
java -Xmx16g -jar /sibcb1/bioinformatics/training_shared/software/Aracne.jar \
\
-o aracne --consolidate
运行得到的结果:
/sibcb1/bioinformatics/biocourse/biocourse01/miniforge3/bin/tree aracne/
aracne/
├── bootstrapNetwork_1o16trgvqmt17tec7b1ma8njof.txt
├── bootstrapNetwork_1paet1fa307s0v7c9clkigh9eh.txt
├── bootstrapNetwork_1rjmsnbk3aiqq0scre9qro8uki.txt
├── bootstrapNetwork_1ssusd9u3ktlj2ld5ftp402k2k.txt
├── bootstrapNetwork_31hatvjvp5vku1ck9grn8sebs6.txt
├── bootstrapNetwork_32qit5ga1gajn33kjhflhk81a7.txt
├── bootstrapNetwork_333qtbck1qleg4qktj3rqs1mg9.txt
├── bootstrapNetwork_34d2shaua4099mjl7knq33rc6b.txt
├── bootstrapNetwork_4aamtpja7vd3dn1slm9mgo0pdu.txt
├── bootstrapNetwork_4cjutffk09o26oosvotopfqek0.txt
├── bootstrapNetwork_4dt6sldu8j2svqht1phr27i421.txt
├── bootstrapNetwork_4j4qtpg1fqhu2ogb0itbji9jng.txt
├── bootstrapNetwork_5ldat9euf35cmee53vbs1bcs5o.txt
├── bootstrapNetwork_5nmisvd8fdgbfg55m0vu9j4hjp.txt
├── bootstrapNetwork_5su6t3hbmjvcie7jcobertq0p8.txt
├── bootstrapNetwork_5u7et9flmtabbfsjmqvh45lmfa.txt
├── bootstrapNetwork_6v6mt3e8ltiv5k3dg5pv8mt9ng.txt
├── bootstrapNetwork_70fuspaim7tpv5qdq7e1humuti.txt
├── bootstrapNetwork_75eatnibl3208i5rmt5fqhip4v.txt
├── bootstrapNetwork_76nitdelldcv1jqrovpi39ceb0.txt
├── bootstrapNetwork_770qt3cvtnnpqljs30dgc163p2.txt
├── bootstrapNetwork_8n1uthgd3nkdd99idv745ne0g9.txt
├── bootstrapNetwork_8ob6t7en41v86b0ig1r6ef7lub.txt
├── bootstrapNetwork_8qkestd14ba6vcniq2f8n71bcc.txt
├── bootstrapNetwork_8rtms39bcll1oeej443b0ep0ie.txt
├── bootstrapNetwork_a1ratbhn2g1vsusqi6lbd2ue21.txt
├── bootstrapNetwork_a24it1c1arcqlglr4799mao383.txt
├── bootstrapNetwork_a3dqsnabb5nleicre9tfuihou4.txt
├── bootstrapNetwork_a8c6tli4a1rvouk8r0kq7lfith.txt
├── bootstrapNetwork_b9betvin104jj2r2sbf8c6p65o.txt
├── bootstrapNetwork_bakmt5f19afec4i36d3alegrbp.txt
├── bootstrapNetwork_bctusrdb9kq956b38enctmagpr.txt
├── bootstrapNetwork_bi5itvhe8q9e84bhf62tfh00fa.txt
├── bootstrapNetwork_bjeqt5foh4k9162hh8mvo8pllb.txt
├── bootstrapNetwork_cke2tfeb84ssrq9bajhdsq38ti.txt
├── bootstrapNetwork_clnaslclge7rkrubkl5c5hsubj.txt
├── bootstrapNetwork_cruut9gonkmonq0pjdgsncidh2.txt
├── bootstrapNetwork_ct86tff2nu1ngrnpte5304a374.txt
├── bootstrapNetwork_cuheslbco8cm9teq7gp18s5od5.txt
├── bootstrapNetwork_dvgmsvbvn7l641lk0rjjdddblb.txt
├── bootstrapNetwork_e4f2ttjom3pcedv1lib1m0b5so.txt
├── bootstrapNetwork_e5oat3g2ue4b7fk1vkv3uo2r2q.txt
├── bootstrapNetwork_e61it9ccuof601d29lj67vsg8s.txt
├── bootstrapNetwork_f7qthhlrsfio7p3r4jj2d56en.txt
├── bootstrapNetwork_fd8etnh2st6utjia1pp0trtj6h.txt
├── bootstrapNetwork_ffhmtdfct7hpmlbabqd76jl8ki.txt
├── bootstrapNetwork_fgqusjdn5hskfn2als15fretqk.txt
├── bootstrapNetwork_gh2t7fvs6qdh9g4567hb4ursp.txt
├── bootstrapNetwork_h0s2t1h4bhp82ao0orqp8hmqpr.txt
├── bootstrapNetwork_h15at7debr46rcf12serhpeg7t.txt
├── bootstrapNetwork_h2eistboc5f1ke61cu2tqh85lu.txt
├── bootstrapNetwork_h7cutrjhb1jbtqff1kqc347vlb.txt
├── bootstrapNetwork_h8m6t1frjbu6nc6fbmeecbvkrd.txt
├── bootstrapNetwork_hqastc9sh58ab94f7rnjsmh2q.txt
├── bootstrapNetwork_i9letbeeab6qhgd9d18sgt983j.txt
├── bootstrapNetwork_ibumshcoilhlai49n3sup50thl.txt
├── bootstrapNetwork_ih6atlgrhr0qdg4nlr8fbfmcv3.txt
├── bootstrapNetwork_iifitbf5i5bl6htnvsshjng2d5.txt
├── bootstrapNetwork_ijoqshdfqfmjvjio1ugjsv9nj7.txt
├── bootstrapNetwork_jko2src2hfv7q7phr9b210jb3d.txt
├── bootstrapNetwork_jqvmtfg5gle8t5rvq1miir8q8s.txt
├── bootstrapNetwork_js8ut5efovp7m7h043agrj2fut.txt
├── bootstrapNetwork_jti6srapp942f980e5un4aq54v.txt
├── bootstrapNetwork_kvqmsb9moinh2v7q9hd7hjtdr7.txt
├── bootstrapNetwork_l3fqtjj5v4gojpo7s7gjhv1ici.txt
├── bootstrapNetwork_l4p2t9hfverncrf8684hqmr7qk.txt
├── bootstrapNetwork_l52atfdpvo6i5t688aok3ekt0l.txt
├── bootstrapNetwork_l7bisla403hguuv8ibcmc6cimn.txt
├── bootstrapNetwork_mc96ttgftuub3fdg0dumpajvua.txt
├── bootstrapNetwork_meiet3eq6895s14gifil22dlcc.txt
├── bootstrapNetwork_mfrmspd46ik0l2tgsg6rba5aie.txt
├── bootstrapNetwork_nm2itnhq4nbpil2okkcm164d83.txt
├── bootstrapNetwork_nnbqtde452mkbmroul0oadu2m4.txt
├── bootstrapNetwork_npl2sjcedc1j4ogp0nkqilno46.txt
├── bootstrapNetwork_nquas98odmcdtq9pao8srthda8.txt
├── bootstrapNetwork_o062tdercsrj0o879hkdd86svm.txt
├── bootstrapNetwork_o1fasjb5l66dpq17ji8fmfui5o.txt
├── bootstrapNetwork_o6dmthiuk2ak36al89vtuiscd5.txt
├── bootstrapNetwork_o7mut7f8kclis81liajs7qm1j7.txt
├── bootstrapNetwork_o906tddikm0dlpolsc82gifn18.txt
├── bootstrapNetwork_pavesnc5jm91gdvfln2cl3pa9e.txt
├── bootstrapNetwork_pg72trg8iso6irvtkfe16uepmt.txt
├── bootstrapNetwork_phgat1eir631btmtuh23fm6f4v.txt
├── bootstrapNetwork_pipisncsrgds5ffu0im5oe04b0.txt
├── bootstrapNetwork_qp0etlhipl5l21l60ms0ea170l.txt
├── bootstrapNetwork_qr9mtbfspvgjrja6aog6n1osmn.txt
├── bootstrapNetwork_qsiusha72arekl36kp4509ihsp.txt
├── bootstrapNetwork_s2gitpiio588olje2rm1ddpv4c.txt
├── bootstrapNetwork_s3pqtfgsofj7hn8ecta7m5hkie.txt
├── bootstrapNetwork_s432t5d70pu2b91emuu5utb9of.txt
├── bootstrapNetwork_s6casrbh138t4aoep0i87l4veh.txt
├── bootstrapNetwork_tb9utjht6uln8b6mf244l9acm4.txt
├── bootstrapNetwork_tdj6t9e7790i1svmh3oath2246.txt
├── bootstrapNetwork_tesesvch7jbcqummr5c96otna7.txt
├── bootstrapNetwork_ul3atth75o35ngtur8ibskqq7s.txt
├── bootstrapNetwork_umcit3fhe2e0h2kv5a6a5skfdu.txt
├── bootstrapNetwork_uolqspdrecova4bvfbqce4e4s0.txt
├── bootstrapNetwork_upv2sf85en3q362vhdeenc7q21.txt
├── bootstrapNetwork_uutetdhudi80cicde45t0f5k1e.txt
├── bootstrapNetwork_vvsmt7ghcigk7mj77f0b4gd7hl.txt
├── miThreshold_p1E-8_samples162.txt
├── network_1.txt
└── network.txt
1 directory, 103 files
head aracne/network.txt
Regulator Target MI pvalue
FUBP3 TOPORS 0.4333787339157187 0.0
ZNF593 MRPL4 0.3711633496258149 0.0
TAF12 MRC1 0.274655604191936 0.02477282013150539
CBX2 PRB3 0.3652975573975554 0.02477282013150539
CBX2 PRB4 0.34497953379788715 0.0
CBX2 PRG3 0.32253416742820235 0.02477282013150539
CTBP1 PRKAR1A 0.3515585095031235 6.558268594858419E-8
HMGA2 ADORA2B 0.25260373635361727 0.02477282013150539
ZNF415 DRAP1 0.29908699522267373 0.02477282013150539
ARACNE
构建的networknetwork <- read.table(file = "aracne/network.txt", row.names = NULL, header = TRUE, sep = "\t")
head(network)
Regulator Target MI pvalue
1 FUBP3 TOPORS 0.4333787 0.00000000
2 ZNF593 MRPL4 0.3711633 0.00000000
3 TAF12 MRC1 0.2746556 0.02477282
4 CBX2 PRB3 0.3652976 0.02477282
5 CBX2 PRB4 0.3449795 0.00000000
6 CBX2 PRG3 0.3225342 0.02477282
构建了网络之后,我们可以用VIPER来进行网络分析,检测两种状态之间转录因子活性的变化。
ARACNE
构建的网络ExS = as.matrix(read.table(file = "raw/GSE6477.txt", row.names = 1, header = TRUE, sep = "\t"))
regulon = suppressWarnings(aracne2regulon("aracne/network.txt", ExS))
number of iterations= 335
[1] "AATF" "ADNP" "ADNP2" "AEBP1" "AFF3" "AHCTF1"
查看每个转录因子的靶基因:
head(regulon[["CBX2"]]$tfmode)
PRB3 PRB4 PRG3 PPIE PPCS POGZ
0.9862316 0.9970921 0.8368823 -0.9810494 -0.9728839 -0.9175349
head(regulon[["CBX2"]]$likelihood)
[1] 0.3652976 0.3449795 0.3225342 0.3522193 0.3375608 0.3320467
激活的靶基因:
plot(ExS["CBX2", ], ExS["PRB3", ], xlab = "CBX2", ylab = "PRB3", main = "CBX2 vs PRB3", type = "p", col = "red")
抑制的靶基因:
plot(ExS["CBX2", ], ExS["PPIE", ], xlab = "CBX2", ylab = "PPIE", main = "CBX2 vs PPIE", type = "p", col = "blue")
我们分析多发性骨髓瘤中新发的病例:
[1] 12494 73
基因水平的均一化:
ExC <- Ex - apply(Ex, 1, mean)
ExC[1:10, 1:3]
GSM148914_NewMM GSM148915_NewMM GSM148917_NewMM
RFC2 0.29945900 -0.16898876 -0.268771656
HSPA6 -0.36654649 0.88700750 0.377417082
PAX8 -0.23744655 0.20466264 -0.003593696
GUCA1A -0.27842014 -0.26830384 0.306076143
THRA 0.16143512 -0.22402003 -0.200295535
PTPN21 -0.07940156 -0.40483195 0.088706134
CCL5 -0.24402822 -0.10762812 -0.058526014
CYP2E1 0.03985734 0.01205131 0.156893418
EPHB3 0.11090254 -0.09692283 0.194864154
ESRRA -0.05772633 -0.19807728 -0.209893059
导入用来分类的标志物:
$UP
[1] "TXNIP" "RELN" "AC245060.6" "IL2RB" "IL32"
[6] "TRPC4"
$DN
[1] "CCL4" "NR4A2" "FOS" "CCL4L2" "JUND" "DUSP1"
$PG
[1] "DPM1" "SCYL3" "C1orf112" "FUCA2" "GCLC" "NFYA"
我们可以使用rank-sum test
检验这两个标志物在样本中的富集程度
SimpleRankTest <- function(Pvalue, gsList, size = 4) {
# The Pvalue is pvalue vector with Entrez ID as names
# gsList is the gene set list
GeneID <- names(Pvalue)
M <- length(gsList)
zscore <- rep(NaN,M)
N <- length(Pvalue)
RankP <- rank(Pvalue,na.last = TRUE)
for(i in 1:M) {
Index <- GeneID %in% gsList[[i]]
n1 <- sum( Index )
if(n1 < size)
next
n2 <- N - n1
R1 <- sum(RankP[Index])
K1 <- R1 - n1*(n1+1)/2
Umean <- n1*n2/2
Uvar <- sqrt(n1*n2*(n1+n2+1)/12)
zscore[i] <- (K1-Umean)/Uvar
}
# return values
names(zscore) <- names(gsList)
return(zscore)
}
计算所有的样本:
检查富集程度:
cbind(zDIF)
zDIF
GSM149010_NewMM -10.7867863
GSM148961_NewMM -7.6253545
GSM149059_NewMM -6.7517171
GSM149021_NewMM -6.3821158
GSM148979_NewMM -6.0103738
GSM149018_NewMM -5.7146125
GSM148925_NewMM -4.0913544
GSM148960_NewMM -4.0857549
GSM149017_NewMM -4.0225267
GSM148989_NewMM -3.4952527
GSM149038_NewMM -3.4242261
GSM148940_NewMM -3.3006190
GSM149002_NewMM -3.0070938
GSM148944_NewMM -2.5112228
GSM148973_NewMM -2.4443350
GSM148978_NewMM -2.3576782
GSM148964_NewMM -2.2148657
GSM148999_NewMM -2.2019378
GSM148981_NewMM -2.0402653
GSM148955_NewMM -2.0009582
GSM149005_NewMM -1.8851118
GSM148968_NewMM -1.8690704
GSM148918_NewMM -1.5680894
GSM149048_NewMM -1.5245688
GSM149004_NewMM -1.5161443
GSM148971_NewMM -1.5103623
GSM149049_NewMM -1.1769248
GSM148987_NewMM -1.1277184
GSM148921_NewMM -1.0566599
GSM148939_NewMM -0.8764960
GSM149052_NewMM -0.5200675
GSM149016_NewMM -0.2851927
GSM149037_NewMM -0.2440378
GSM148986_NewMM 0.1161754
GSM148914_NewMM 0.1659326
GSM148997_NewMM 0.2203500
GSM149050_NewMM 0.3441627
GSM148949_NewMM 0.4429702
GSM148956_NewMM 0.4638226
GSM148917_NewMM 0.6611341
GSM148991_NewMM 0.9454948
GSM148915_NewMM 1.0190211
GSM149025_NewMM 1.0496826
GSM148951_NewMM 1.1105393
GSM148946_NewMM 1.1835128
GSM148974_NewMM 1.5657098
GSM149040_NewMM 1.5941324
GSM149009_NewMM 2.0572438
GSM149055_NewMM 2.2552075
GSM148998_NewMM 2.3816011
GSM149054_NewMM 2.4192577
GSM148988_NewMM 2.5358233
GSM149019_NewMM 2.6342345
GSM149031_NewMM 2.6785674
GSM149007_NewMM 2.7504196
GSM148958_NewMM 3.0686307
GSM148947_NewMM 3.1064238
GSM148938_NewMM 3.1728299
GSM149044_NewMM 3.7870587
GSM148922_NewMM 3.9556022
GSM148942_NewMM 4.1914853
GSM148950_NewMM 4.4281523
GSM149006_NewMM 4.6457489
GSM148990_NewMM 5.1434197
GSM148923_NewMM 5.3091977
GSM149061_NewMM 5.6542414
GSM149033_NewMM 6.2964675
GSM148995_NewMM 6.9684832
GSM148993_NewMM 7.0429813
GSM149056_NewMM 7.1122615
GSM148967_NewMM 8.2635866
GSM149008_NewMM 9.2495623
GSM148983_NewMM 11.2821098
根据已知 signature z-score 高、低将 NewMM 样本分为 high/low 两组,并各取其表达谱:
先比较两类之间表达谱的差异:
signature <- rowTtest(hM, lM)
head(data.frame(signature$statistic, signature$p.value))
signature.statistic signature.p.value
RFC2 0.65903165 0.512037596
HSPA6 -1.65906375 0.101577316
PAX8 -0.52824854 0.598997606
GUCA1A 0.03782808 0.969932434
THRA -2.87466306 0.005353084
PTPN21 -0.55973305 0.577448303
将p-value转成Z-score:
signzed_zscore <- (qnorm(signature$p.value/2, lower.tail = FALSE) *sign(signature$statistic))[, 1]
head(signzed_zscore)
RFC2 HSPA6 PAX8 GUCA1A THRA
0.65566826 -1.63725444 -0.52584255 0.03769303 -2.78498110
PTPN21
-0.55711582
构建背景:
nullmodel <- ttestNull(hM, lM, per = 1000, repos = TRUE, seed = 1, verbose = FALSE)
nullmodel[1:5,1:5]
1 2 3 4 5
RFC2 0.70341900 -2.0957027 0.14256862 -0.3056710 -0.6340376
HSPA6 0.96298940 -0.2053427 -0.08795436 -0.4622612 0.6203836
PAX8 -1.07571418 1.2828265 -0.07530774 1.9562127 0.9869166
GUCA1A -0.04172332 1.2565216 -0.91892153 2.0362804 -0.2365597
THRA -0.80796274 -0.6313287 0.09873444 0.4074121 1.0040930
检验转录因子活性的变化:
mrs <- msviper(signzed_zscore, regulon, nullmodel, verbose = FALSE)
res <- summary(mrs, 100)
head(res)
Regulon Size NES p.value FDR
SATB2 SATB2 68 3.63 0.000282 0.0724
MYC MYC 206 3.25 0.001140 0.0724
HMGA1 HMGA1 278 3.23 0.001250 0.0724
BRD8 BRD8 178 3.21 0.001330 0.0724
TAF1B TAF1B 97 3.19 0.001400 0.0724
CDKN2A CDKN2A 167 3.14 0.001670 0.0724
画出最显著的前30个转录因子:
plot(mrs, 30, cex = .9)
使用fgsea
检查MYC
的富集性:
mycTargets = regulon[["MYC"]]$tfmode
gs = list(activated = names(mycTargets)[mycTargets > 0], repressed = names(mycTargets)[mycTargets < 0])
res = suppressWarnings(fgsea(gs, signzed_zscore, minSize = 15, maxSize = 500))
res
pathway pval padj log2err ES NES
<char> <num> <num> <num> <num> <num>
1: activated 1.000000e-50 2.000000e-50 NA 0.7554677 3.675944
2: repressed 2.124813e-05 2.124813e-05 0.5756103 -0.5559996 -2.105326
size leadingEdge
<int> <list>
1: 166 PUS7, NO....
2: 40 APOC1, T....
展示富集结果:
plotEnrichment(gs[["activated"]], signzed_zscore) + labs(title="MYC activated targets")