我们将使用系统内安装的R:
# bash命令
export PATH="/sibcb1/bioinformatics/biocourse/biocourse01/miniforge3/envs/scRNA/bin:$PATH"
加载需要使用的软件包:
超几何分布检验是做通路富集分析最常用的方法:
DE | non-DE | total | |
---|---|---|---|
Insiside Gene Set | k | m-k | m |
Outside Gene Set | n-k | N+k-n-m | N-m |
total | n | N-n | N |
其中:
在R
中可以使用如下函数检验显著性:
phyper(k, m, N-m, n, lower.tail = FALSE)
# 读入所有的基因
res = readRDS(file = "example_DESeq2.RDS")
res = na.exclude(res)
Pg = rownames(res)
head(res)
log2 fold change (MLE): group stroke vs ctrl
Wald test p-value: group stroke vs ctrl
DataFrame with 6 rows and 8 columns
baseMean log2FoldChange lfcSE stat pvalue
<numeric> <numeric> <numeric> <numeric> <numeric>
A1BG 138.9053 0.5696487 0.366932 1.5524628 0.12055154
A1BG-AS1 94.0295 -0.0194117 0.329088 -0.0589862 0.95296312
A1CF 12.2568 -0.0143727 0.875907 -0.0164089 0.98690817
A2M 175.9281 -1.5962724 0.612288 -2.6070619 0.00913229
A2M-AS1 70.6930 -0.9688552 0.494190 -1.9604894 0.04993861
A2ML1 20.0849 -0.3049334 0.624204 -0.4885156 0.62518465
padj DEG LogFDR
<numeric> <character> <numeric>
A1BG 0.2522373 NC 0.59819076
A1BG-AS1 0.9747968 NC 0.01108592
A1CF 0.9929060 NC 0.00309186
A2M 0.0325865 DN 1.48696254
A2M-AS1 0.1296658 NC 0.88717461
A2ML1 0.7703000 NC 0.11334008
# 以调变的基因作为例子,log2FoldChange是可选的筛选的标准
Up = rownames(filter(data.frame(res), DEG == "UP", log2FoldChange > 1))
Dn = rownames(filter(data.frame(res), DEG == "DN", log2FoldChange < -1))
# 我们模拟一些pathway,并检测富集性
for(i in 1:10){
pw = c(sample(Up, 10*i), sample(Dn, 100 - 10*i))
m = length(pw)
N = length(Pg)
n = length(Up)
k = length(intersect(pw, Up))
p = phyper(k, m, N-m, n, lower.tail = FALSE)
cat('Backgrond:', N, 'Up:', n, 'Pathway:', m, 'Overlap:', k, 'p-value:', p, '\n')
}
Backgrond: 29556 Up: 3930 Pathway: 100 Overlap: 10 p-value: 0.7921274
Backgrond: 29556 Up: 3930 Pathway: 100 Overlap: 20 p-value: 0.0213996
Backgrond: 29556 Up: 3930 Pathway: 100 Overlap: 30 p-value: 3.409843e-06
Backgrond: 29556 Up: 3930 Pathway: 100 Overlap: 40 p-value: 5.987918e-12
Backgrond: 29556 Up: 3930 Pathway: 100 Overlap: 50 p-value: 1.769887e-19
Backgrond: 29556 Up: 3930 Pathway: 100 Overlap: 60 p-value: 9.705833e-29
Backgrond: 29556 Up: 3930 Pathway: 100 Overlap: 70 p-value: 7.936605e-40
Backgrond: 29556 Up: 3930 Pathway: 100 Overlap: 80 p-value: 4.927482e-53
Backgrond: 29556 Up: 3930 Pathway: 100 Overlap: 90 p-value: 4.028224e-69
Backgrond: 29556 Up: 3930 Pathway: 100 Overlap: 100 p-value: 0
ClusterProfiler
是R中做功能富集分析最常用的软件包,它提供了多种功能,包括GO富集分析,KEGG通路富集分析,MSigDB数据库的富集分析等。详细文档,可参考ClusterProfiler。
ego <- enrichGO(gene = gene.df$ENTREZID,
OrgDb = org.Hs.eg.db,
keyType = "ENTREZID",
ont = "BP",
pAdjustMethod = "BH",
pvalueCutoff = 0.01,
qvalueCutoff = 0.05,
readable = TRUE)
head(ego[, 1:7])
ID
GO:0061644 GO:0061644
GO:0006334 GO:0006334
GO:0034728 GO:0034728
GO:0030099 GO:0030099
GO:0061515 GO:0061515
GO:0010506 GO:0010506
Description
GO:0061644 protein localization to CENP-A containing chromatin
GO:0006334 nucleosome assembly
GO:0034728 nucleosome organization
GO:0030099 myeloid cell differentiation
GO:0061515 myeloid cell development
GO:0010506 regulation of autophagy
GeneRatio BgRatio pvalue p.adjust qvalue
GO:0061644 15/2235 18/18870 6.981139e-12 4.162155e-08 3.859468e-08
GO:0006334 42/2235 118/18870 1.601777e-11 4.774896e-08 4.427648e-08
GO:0034728 44/2235 138/18870 3.343898e-10 6.645439e-07 6.162158e-07
GO:0030099 95/2235 430/18870 9.466825e-10 1.411030e-06 1.308415e-06
GO:0061515 29/2235 85/18870 6.333996e-08 6.752295e-05 6.261243e-05
GO:0010506 77/2235 355/18870 8.253368e-08 6.752295e-05 6.261243e-05
可视化:
dotplot(ego, showCategory=20)+
theme(axis.text.y = element_text(size=15), axis.text.x = element_text(size = 15), legend.text = element_text(size=15),
panel.grid.major = element_blank(),panel.grid.minor = element_blank())
也可以使用barplot来展示:
barplot(ego,showCategory=20)+
theme(axis.text.y = element_text(size=15), axis.text.x = element_text(size = 15),
legend.text = element_text(size=15), panel.grid = element_blank())
还可以自定义展示FDR:
ggplot(ego[1:20,], aes(x=-log(p.adjust, base=10), y=reorder(Description,p.adjust,decreasing = T))) +
geom_bar(stat = "identity", fill="red") +
theme_bw() +
xlab(label='-log10 FDR') +
ylab(label='') +
theme(axis.text.y = element_text(size=12,colour="black"), axis.text.x = element_text(size = 15,colour="black"),
panel.grid= element_blank())
需要注意的是,在分析GO和KEGG富集的时候,需要根据情况设定背景基因集,在enrichGO
和enrichKEGG
中,默认的是数据库中所有的基因。可以自定义背景:
Dn.df <- bitr(Dn, fromType = "SYMBOL", toType = "ENTREZID", OrgDb = org.Hs.eg.db)
ego1 <- enrichGO(gene = gene.df$ENTREZID,
OrgDb = org.Hs.eg.db,
keyType = "ENTREZID",
ont = "BP",
pAdjustMethod = "BH",
universe = unique(c(gene.df$ENTREZID, Dn.df$ENTREZID)),
pvalueCutoff = 0.01,
qvalueCutoff = 0.05,
readable = TRUE)
head(ego1)
ID
GO:0007267 GO:0007267
GO:0007167 GO:0007167
GO:0055085 GO:0055085
GO:0009888 GO:0009888
GO:0007169 GO:0007169
GO:0048646 GO:0048646
Description
GO:0007267 cell-cell signaling
GO:0007167 enzyme-linked receptor protein signaling pathway
GO:0055085 transmembrane transport
GO:0009888 tissue development
GO:0007169 transmembrane receptor protein tyrosine kinase signaling pathway
GO:0048646 anatomical structure formation involved in morphogenesis
GeneRatio BgRatio pvalue p.adjust qvalue
GO:0007267 240/2235 404/5013 3.204136e-10 5.715669e-07 4.675181e-07
GO:0007167 162/2235 255/5013 3.353282e-10 5.715669e-07 4.675181e-07
GO:0055085 243/2235 413/5013 9.088696e-10 1.032779e-06 8.447703e-07
GO:0009888 270/2235 470/5013 2.847550e-09 2.426825e-06 1.985042e-06
GO:0007169 114/2235 175/5013 2.032991e-08 1.386093e-05 1.133767e-05
GO:0048646 167/2235 278/5013 6.893108e-08 3.916434e-05 3.203481e-05
geneID
GO:0007267 ABCC4/ABR/ACE/ACHE/ACP4/ADAM8/ADGRA2/ADGRB1/ADORA2A/AKT1/ALOX5/AMH/ANAPC2/APBA3/APC/APC2/AQP1/ARRB2/ASPM/ATP6V0C/AXIN1/BAIAP2/BAIAP3/BCR/BGLAP/BRSK2/BTRC/C1QTNF12/C2CD2L/CA2/CACNA1E/CACNB1/CAPN10/CCN6/CCND1/CDH1/CDK16/CELSR1/CHRFAM7A/CHRNA10/COL1A1/CRY2/CSNK1A1L/CSNK1G2/CYP19A1/DAB2IP/DBH/DEPDC1B/DGKZ/DISC1/DNM1/DRP2/DVL1/DVL2/DVL3/DYSF/EFNA4/EPHA5/EPHB1/ETV2/EZH2/FAT1/FBXW4/FFAR1/FGF14/FGF17/FGF4/FKBP1B/FRAT1/FRAT2/FRMD8/FZD5/GABRR2/GAD1/GAL/GATA1/GATA6/GDF15/GFAP/GIT1/GJB7/GJC2/GLI1/GNA15/GPLD1/GPR176/GPR27/GRIN2A/GRIN2D/GRIN3B/GRK2/GRK6/GSK3A/HAP1/HCN2/HDAC6/HNF4A/HRAS/IGFBP2/INHBA/IRS2/JADE1/JAK2/JRK/KCNJ5/KCNQ1/KLF15/KREMEN1/KREMEN2/LBX2/LHB/LILRB1/LILRB2/LMBR1L/LPAR3/LRP1/LRRC4/LRRTM1/LRRTM2/LTBP3/LTBP4/LZIC/MAFA/MIDN/MITF/MPZ/NADK/NAPA/NEUROD2/NIBAN2/NOTCH1/NPHP4/NRARP/NSMF/NTNG2/NTRK1/OMP/OSM/P2RX1/PANX2/PCDH1/PCLO/PCSK5/PDGFB/PGF/PIAS4/PIM3/PINK1/PIP5K1C/PKD1/PLA2G6/PPM1A/PRKAR1B/PRKCG/PRKCZ/PRR7/PTK2B/PTPN23/PTPRA/PTPRN/PVALB/RAB11B/RAB3A/RAC3/RAPGEF2/RARA/RBX1/RELN/RETN/RFX3/RGS14/RIMS4/RNF10/RNF14/RPS6KA1/SCRIB/SEMA3B/SEPTIN5/SFRP2/SHANK2/SHISA6/SHISA7/SIAH2/SKI/SLC12A4/SLC12A7/SLC16A10/SLC25A22/SLC44A4/SLC6A9/SLC8B1/SMO/SNAI2/SNCA/SNCAIP/SNX3/SOX9/SQSTM1/SRC/SREBF1/STAB1/STK11/STX1B/STXBP2/SYT8/TBC1D24/TCIRG1/TERT/TFR2/TGFB1I1/TGM2/TLE2/TLE3/TMED1/TMEM132A/TMEM88/TNFRSF11A/TNXB/TPBGL/TPGS1/TRABD2B/TRPV1/TSC2/TSHR/TSPAN32/TSPO/TUBB2B/UBAC2/UBE3A/UCP2/UNC119/VGF/VPS18/WNK1/WNT1/ZACN/ZDHHC12/ZDHHC2/ZDHHC3/ZYX
GO:0007167 ACP4/ADAMTS3/ADGRA2/ADIPOR1/AKT1/AKT1S1/AKT2/AMH/ARID4B/ARRB2/AXIN1/BAIAP2/BMPER/C1QTNF12/CCN1/CHMP6/CHRD/CHRDL2/CHST11/COL1A1/COL4A2/CSPG4/CSRNP1/CTSD/DAB2IP/DOK1/DOK3/ECSIT/EFNA4/EMILIN1/EPHA2/EPHA3/EPHA5/EPHB1/EPHB3/EPHB4/ERBB2/ERBB4/ERCC2/ETV2/FAM20C/FAM83A/FAM89B/FBXL15/FGF17/FGF19/FGF4/FGFR4/FGFRL1/FKBP8/FLCN/FLRT1/FLT4/FOXO4/FRK/FRS3/FSTL3/FURIN/FUT7/GATA6/GDF15/GIT1/GPC1/GPLD1/GSK3A/HAP1/HGS/HIP1R/HNF4A/HRAS/HSPB1/IGF1/IGF1R/IGF2/IGFBP2/INHBA/INPP5K/INPPL1/INSR/IRS2/ITGA5/ITGA8/JAK2/JAK3/KCNQ1OT1/KDR/KIAA0319/LEFTY1/LONP1/LRP1/LTBP2/LTBP3/LTBP4/MAP2K2/MEGF8/MIR26B/MMP9/NFIA/NIBAN2/NKX2-1/NOTCH1/NOTCH2/NRTN/NTRK1/NTRK3/OSBPL8/PDGFB/PDK2/PGF/PIGR/PIK3CD/PIK3R2/PIP4K2A/PPM1A/PPP2R5B/PRKCZ/PTK2B/PTP4A3/PTPRA/PTPRF/PTPRT/PXN/RAPGEF2/RASGRP4/RET/RGS14/RHBDF1/RHBDF2/RNF126/SCX/SESN3/SFRP2/SH2B1/SH2B2/SHKBP1/SKI/SLC2A4/SNCA/SOCS3/SORBS1/SOX9/SRC/SREBF1/STK11/TGFB1I1/TMPRSS6/TNS2/TNXB/TSC2/TYK2/UBE2O/VASN/VEGFA/WFIKKN1/WNT1/ZBTB7B/ZC3H3/ZDHHC17/ZEB1/ZGPAT/ZNF451/ZYX
GO:0055085 ABCA13/ABCA2/ABCA7/ABCB10/ABCB5/ABCB6/ABCC2/ABCC3/ABCC4/ABCC5/ABCD1/ABCG2/ACE/AKT1/AKT2/ANKH/ANO3/ANO8/ANO9/ANTKMT/AQP1/AQP10/ARL6IP1/ASIC3/ASPSCR1/ATP13A1/ATP13A2/ATP1B2/ATP2A3/ATP6V0A1/ATP6V0B/ATP6V0C/ATP7B/BCR/BEST1/BEST4/BLOC1S3/C1QTNF12/CA2/CACNA1E/CACNA1I/CACNA2D4/CACNB1/CAPN1/CAPN10/CATSPER1/CBARP/CHRFAM7A/CHRNA10/CLCN2/CLCN7/CLIC2/CNIH2/CNIH3/CYB561D2/CYBRD1/FGF14/FGF19/FKBP1B/G6PD/GABRR2/GAL/GFAP/GJB7/GJC2/GNB2/GP1BB/GP9/GPR89A/GPR89B/GRIN2A/GRIN2D/GRIN3B/GSK3A/HAMP/HAP1/HCN2/IGF1/INPP5K/INSR/IRS2/ITLN1/JSRP1/KCNAB3/KCNC3/KCND1/KCNG2/KCNH2/KCNH3/KCNJ15/KCNJ5/KCNK7/KCNQ1/KCNQ4/KCNT1/KEL/KLF15/MCOLN1/MFSD10/MFSD12/MFSD2B/MFSD3/MMP9/NCS1/NEDD4L/NIPAL1/OAZ1/ORAI2/ORAI3/OSBPL8/P2RX1/PAM16/PANX2/PDE4D/PEX10/PEX16/PIEZO1/PIM1/PKD1/PKD1L3/PLCB2/PLCB3/PLCD1/PLCH2/PPIF/PTK2B/PTPN6/RELN/RHAG/RHCE/RHD/RNF5/SCN4A/SCNN1A/SFXN5/SHISA6/SHISA7/SLC10A3/SLC10A7/SLC11A1/SLC11A2/SLC12A4/SLC12A7/SLC12A9/SLC14A1/SLC16A10/SLC16A3/SLC16A5/SLC17A9/SLC18B1/SLC19A1/SLC1A5/SLC22A16/SLC22A17/SLC22A18/SLC22A20P/SLC22A23/SLC22A4/SLC23A3/SLC24A4/SLC25A21/SLC25A22/SLC25A37/SLC25A38/SLC25A39/SLC25A42/SLC25A48/SLC26A1/SLC26A11/SLC26A6/SLC26A8/SLC29A2/SLC2A1/SLC2A14/SLC2A4/SLC2A6/SLC30A6/SLC34A3/SLC35C2/SLC35E2A/SLC35E2B/SLC36A1/SLC37A3/SLC38A10/SLC38A5/SLC39A13/SLC40A1/SLC43A1/SLC43A2/SLC44A2/SLC44A3/SLC44A4/SLC49A3/SLC4A1/SLC4A2/SLC4A3/SLC5A1/SLC5A2/SLC5A8/SLC66A1/SLC6A19/SLC6A8/SLC6A9/SLC7A1/SLC7A5/SLC7A5P1/SLC8B1/SLC9A1/SLC9A3/SLCO1A2/SLCO1B3/SLMAP/SNCA/SORBS1/SPHK2/SPNS2/TCIRG1/TERT/TESC/TIMM17B/TIMM23B/TLR9/TMC5/TMC6/TMC8/TMEM120A/TMEM175/TMEM38A/TMEM63B/TOMM20L/TPCN1/TPCN2/TRPM2/TRPM5/TRPV1/TRPV5/TSC1/TSPO2/TST/TTYH3/UCP2/WNK1/ZACN
GO:0009888 ACADVL/ADAM8/ADAMTSL4/ADIPOR1/ADM/AGPAT2/AKT1/AKT2/ALOX5/ALPK3/ALPL/AMBRA1/AMH/ANKH/APC/AQP1/ARID4B/ARL13B/ARRB2/AXIN1/BCR/BGLAP/BMPER/BRSK2/BSG/BTRC/CA2/CAMSAP3/CARMIL2/CBS/CCDC78/CCN1/CCND1/CD151/CDH23/CDHR2/CEBPA/CEBPB/CELSR1/CEP63/CHRD/CHRDL2/CHST11/CLCN2/COL11A1/COL12A1/COL18A1/COL1A1/COL4A2/COL7A1/CRLF1/CSMD1/CST6/CTDP1/CTSB/CUL7/CYP1B1/CYP26C1/DAB2IP/DCHS1/DGAT2/DIPK2A/DSP/DUSP2/DVL1/DVL2/DVL3/DYRK1B/E2F4/EFEMP2/EPCAM/EPHA2/EPHA3/EPHB1/EPOR/EPPK1/ERBB4/ERCC2/ETV2/EVPL/EZH2/FAM20A/FAM20C/FAM83H/FASN/FAT1/FBXL15/FES/FGF19/FGF4/FHL2/FKBP8/FLCN/FLOT2/FRAS1/FZD5/FZR1/G6PD/GAA/GAL/GATA1/GATA6/GCNT3/GLI1/GPC1/GPLD1/GPX1/GSK3A/GTF3C5/HRAS/HSF4/HYAL2/HYAL3/IFITM5/IFT122/IGF1/IGF2/INHBA/INSR/IQGAP3/ITGA5/ITGA7/ITGA8/ITGAX/ITGB4/JAG2/JAK2/JUNB/KANK2/KCNQ1/KDM6B/KDR/KEL/KLF15/KLF2/KRT1/LAMA5/LAMB2/LAMB3/LBX2/LFNG/LGALS3/LTBP3/MAP2K2/MAPK11/MATN1/MBOAT2/MEF2D/MEGF8/MICAL2/MMP9/MTHFR/MYBPC3/MYH6/MYL6/NCOA1/NFIA/NFKBIZ/NIBAN2/NKX2-1/NOTCH1/NOTCH2/NOTCH4/NPRL3/NRARP/NRTN/NTRK1/OVOL3/PBX1/PDCD10/PDE2A/PDE4D/PDGFB/PDZD7/PERCC1/PHOSPHO1/PIAS4/PIK3CD/PIM1/PKD1/PKDCC/PLEC/PLXNA1/PLXNB2/PLXND1/POFUT2/PPARGC1A/PPL/PTCH2/PTK2B/RADIL/RAPGEF2/RARA/RCN3/RET/RFX3/RGCC/RPS6KA6/RXRA/SBNO2/SCRIB/SCX/SEMA3B/SEMA4A/SEMA4B/SEMA4C/SEMA6B/SEMA6C/SFRP2/SHARPIN/SIPA1/SKI/SLC24A4/SLC39A13/SLC40A1/SLC44A4/SLC4A2/SLC9A1/SMO/SNAI1/SNAI2/SOCS3/SOX15/SOX6/SOX9/SPINT1/SRC/SRPK3/STIL/TAL1/TBC1D32/TBX6/TCAP/TCIRG1/TECTA/TFDP1/TGFB1I1/TGM1/TGM2/THBS3/TLR9/TMEM79/TMOD1/TNXB/TOLLIP/TPRN/TRIOBP/TSC1/TSC2/TST/UBB/UMODL1/UPK2/USP19/VASN/VEGFA/WNT1/WT1/XK/YBX1/YBX3/YIPF6/ZBTB17/ZBTB7B/ZEB1/ZFP36/ZFP36L1/ZFPM1/ZNF358/ZNF516
GO:0007169 ACP4/ADAMTS3/ADGRA2/ADIPOR1/AKT1/AKT1S1/AKT2/BAIAP2/C1QTNF12/CHMP6/COL1A1/COL4A2/CSPG4/CSRNP1/CTSD/DAB2IP/DOK1/DOK3/EFNA4/EMILIN1/EPHA2/EPHA3/EPHA5/EPHB1/EPHB3/EPHB4/ERBB2/ERBB4/ERCC2/FAM20C/FAM83A/FGF17/FGF19/FGF4/FGFR4/FGFRL1/FLRT1/FLT4/FOXO4/FRK/FRS3/FUT7/GDF15/GIT1/GPC1/GPLD1/GSK3A/HAP1/HGS/HIP1R/HRAS/HSPB1/IGF1/IGF1R/IGF2/IGFBP2/INPP5K/INPPL1/INSR/IRS2/ITGA5/JAK2/JAK3/KDR/LONP1/LRP1/MAP2K2/MMP9/NIBAN2/NRTN/NTRK1/NTRK3/OSBPL8/PDGFB/PDK2/PGF/PIGR/PIK3CD/PIK3R2/PIP4K2A/PPP2R5B/PRKCZ/PTK2B/PTP4A3/PTPRA/PTPRT/PXN/RAPGEF2/RASGRP4/RET/RGS14/RHBDF1/RHBDF2/RNF126/SESN3/SH2B1/SH2B2/SHKBP1/SLC2A4/SNCA/SOCS3/SORBS1/SOX9/SRC/SREBF1/TNS2/TNXB/TSC2/TYK2/VEGFA/WNT1/ZBTB7B/ZDHHC17/ZGPAT
GO:0048646 ABCA2/ACE/ADAM8/ADGRA2/ADGRB1/ADM/ADM2/AGO2/AKT1/ALOX5/AQP1/AXIN1/BCL3/BCL6/BMPER/BSG/CCN1/CCN6/CELSR1/CEND1/COL11A1/COL12A1/COL18A1/COL1A1/COL4A2/COL7A1/CSF3R/CSPG4/CYP19A1/CYP1B1/DAB2IP/DCHS1/DUSP2/DVL1/DVL2/DVL3/DYRK1B/E2F2/EGFL7/EMILIN1/EPHA2/EPHB1/EPHB3/EPHB4/EPN1/ERBB2/ERCC2/ETV2/FAM20A/FAM20C/FHL2/FLT4/FOXO4/FURIN/FZD5/GADD45A/GATA1/GATA6/GDF15/GLUL/GPC1/GPLD1/GPX1/HDAC5/HDAC7/HERC1/HGS/HIF3A/HSPB1/HSPG2/IFT122/IL4R/INHBA/ITGA5/ITGA7/ITGA8/ITGAX/ITGB4/JMJD8/JUNB/KDM6B/KDR/KLF2/KRT1/LAMB3/LFNG/MAP2K2/MAPK7/MBOAT7/MFNG/MIR126/MMP9/MTHFR/MYH6/NBEAL2/NFKB2/NIBAN2/NINJ1/NKX2-1/NOTCH1/NOTCH2/NOTCH4/NRARP/OBSCN/PDCD10/PGF/PIK3CD/PIM1/PLCD3/PLEC/PLXNA1/PLXNA3/PLXNB2/PLXND1/POFUT2/PTK2B/PTPN6/RARA/RELN/RELT/RET/RFX2/RGCC/S100A1/SBNO2/SCRIB/SCX/SDK1/SEMA4A/SEMA4C/SFRP2/SKI/SLC24A4/SLC40A1/SLC44A4/SLC4A2/SMARCD3/SMO/SNAI1/SNAI2/SOX9/SPHK1/SPINT1/STAB1/STIL/TAL1/TBX6/TBXA2R/TCAP/TCIRG1/TERT/TGM2/THBS3/TMOD1/TNFAIP2/TNFRSF12A/TSC1/TSC2/TYMP/UNC13D/VEGFA/WASF2/WNK1/WNT1/WT1/ZC3H12A/ZFPM1
Count
GO:0007267 240
GO:0007167 162
GO:0055085 243
GO:0009888 270
GO:0007169 114
GO:0048646 167
kk <- enrichKEGG(gene = gene.df$ENTREZID,
organism = 'hsa',
pAdjustMethod = "BH",
pvalueCutoff = 0.01,
qvalueCutoff = 0.05,)
head(kk[,1:7])
category subcategory
hsa04613 Organismal Systems Immune system
hsa05034 Human Diseases Substance dependence
hsa05322 Human Diseases Immune disease
hsa04010 Environmental Information Processing Signal transduction
hsa05231 Human Diseases Cancer: overview
hsa04072 Environmental Information Processing Signal transduction
ID Description GeneRatio
hsa04613 hsa04613 Neutrophil extracellular trap formation 75/1135
hsa05034 hsa05034 Alcoholism 72/1135
hsa05322 hsa05322 Systemic lupus erythematosus 59/1135
hsa04010 hsa04010 MAPK signaling pathway 69/1135
hsa05231 hsa05231 Choline metabolism in cancer 29/1135
hsa04072 hsa04072 Phospholipase D signaling pathway 38/1135
BgRatio pvalue
hsa04613 193/8866 2.183791e-20
hsa05034 188/8866 3.426042e-19
hsa05322 141/8866 4.308680e-18
hsa04010 300/8866 5.311868e-07
hsa05231 99/8866 1.025024e-05
hsa04072 149/8866 1.727945e-05
MSigDB是一个Molucular Signatures Database
, 也包含了多种数据库通路的基因集,如KEGG
,‘GO’,‘BIOCARTA’, REACTOME
等。
x <- msigdbr_collections()
print(x, n = 1000)
# A tibble: 23 × 3
gs_cat gs_subcat num_genesets
<chr> <chr> <int>
1 C1 "" 299
2 C2 "CGP" 3384
3 C2 "CP" 29
4 C2 "CP:BIOCARTA" 292
5 C2 "CP:KEGG" 186
6 C2 "CP:PID" 196
7 C2 "CP:REACTOME" 1615
8 C2 "CP:WIKIPATHWAYS" 664
9 C3 "MIR:MIRDB" 2377
10 C3 "MIR:MIR_Legacy" 221
11 C3 "TFT:GTRD" 518
12 C3 "TFT:TFT_Legacy" 610
13 C4 "CGN" 427
14 C4 "CM" 431
15 C5 "GO:BP" 7658
16 C5 "GO:CC" 1006
17 C5 "GO:MF" 1738
18 C5 "HPO" 5071
19 C6 "" 189
20 C7 "IMMUNESIGDB" 4872
21 C7 "VAX" 347
22 C8 "" 700
23 H "" 50
还是选择其中的KEGG
进行分析:
# A tibble: 6 × 15
gs_cat gs_subcat gs_name gene_symbol entrez_gene ensembl_gene
<chr> <chr> <chr> <chr> <int> <chr>
1 C2 CP:KEGG KEGG_ABC_TRAN… ABCA1 19 ENSG0000016…
2 C2 CP:KEGG KEGG_ABC_TRAN… ABCA10 10349 ENSG0000015…
3 C2 CP:KEGG KEGG_ABC_TRAN… ABCA12 26154 ENSG0000014…
4 C2 CP:KEGG KEGG_ABC_TRAN… ABCA13 154664 ENSG0000017…
5 C2 CP:KEGG KEGG_ABC_TRAN… ABCA2 20 ENSG0000010…
6 C2 CP:KEGG KEGG_ABC_TRAN… ABCA3 21 ENSG0000016…
# ℹ 9 more variables: human_gene_symbol <chr>,
# human_entrez_gene <int>, human_ensembl_gene <chr>, gs_id <chr>,
# gs_pmid <chr>, gs_geoid <chr>, gs_exact_source <chr>,
# gs_url <chr>, gs_description <chr>
制作Term to Gene
表:
KEGG_DB_t2g <- KEGG_DB %>% dplyr::distinct(gs_name, entrez_gene) %>% as.data.frame()
head(KEGG_DB_t2g)
gs_name entrez_gene
1 KEGG_ABC_TRANSPORTERS 19
2 KEGG_ABC_TRANSPORTERS 10349
3 KEGG_ABC_TRANSPORTERS 26154
4 KEGG_ABC_TRANSPORTERS 154664
5 KEGG_ABC_TRANSPORTERS 20
6 KEGG_ABC_TRANSPORTERS 21
进行富集分析:
upKEGG2 <- enricher(gene = gene.df$ENTREZID, universe = backgroud, TERM2GENE = KEGG_DB_t2g, pvalueCutoff = 0.05)
upKEGG2 <- setReadable(upKEGG2, OrgDb = 'org.Hs.eg.db', keyType = 'ENTREZID')
head(upKEGG2[,1:7])
ID
KEGG_SYSTEMIC_LUPUS_ERYTHEMATOSUS KEGG_SYSTEMIC_LUPUS_ERYTHEMATOSUS
KEGG_GLYCEROPHOSPHOLIPID_METABOLISM KEGG_GLYCEROPHOSPHOLIPID_METABOLISM
Description
KEGG_SYSTEMIC_LUPUS_ERYTHEMATOSUS KEGG_SYSTEMIC_LUPUS_ERYTHEMATOSUS
KEGG_GLYCEROPHOSPHOLIPID_METABOLISM KEGG_GLYCEROPHOSPHOLIPID_METABOLISM
GeneRatio BgRatio pvalue
KEGG_SYSTEMIC_LUPUS_ERYTHEMATOSUS 59/722 117/4402 5.707778e-18
KEGG_GLYCEROPHOSPHOLIPID_METABOLISM 24/722 70/4402 1.798461e-04
p.adjust qvalue
KEGG_SYSTEMIC_LUPUS_ERYTHEMATOSUS 1.010277e-15 9.673182e-16
KEGG_GLYCEROPHOSPHOLIPID_METABOLISM 1.591638e-02 1.523959e-02
可视化:
dotplot(upKEGG2,showCategory=20)+
theme(axis.text.y = element_text(size=15), axis.text.x = element_text(size = 15), legend.text = element_text(size=15),
panel.grid.major = element_blank(),panel.grid.minor = element_blank())
GSEA是广泛使用的通路富集分析方法,当样本的生物学重复数较少时,可以按照差异表达水平对基因进行排序,然后检验给定的基因集是否随机分布在这个排序上。如果不随机,说明显著性富集。
fc = data.frame(gene_symbol = rownames(res), log2FoldChange = res$log2FoldChange)
fc = filter(fc, !is.na(log2FoldChange))
head(fc)
gene_symbol log2FoldChange
1 A1BG 0.56964866
2 A1BG-AS1 -0.01941167
3 A1CF -0.01437268
4 A2M -1.59627245
5 A2M-AS1 -0.96885522
6 A2ML1 -0.30493344
write.table(fc, file='training_stroke_vs_control_DESeq2_log2FC.rnk', sep='\t', col.names=F, row.names=F, quote=F)
在命令行运行GSEA:
/sibcb1/bioinformatics/biocourse/biocourse01/software/GSEA_Linux_4.3.3/gsea-cli.sh GSEAPreranked \
-rnk training_stroke_vs_control_DESeq2_log2FC.rnk \
-gmx c2.cp.kegg_medicus.v2023.2.Hs.symbols.gmt \
-collapse false \
-scoring_scheme weighted \
-nperm 1000 \
-rpt_label kegg \
-make_sets true \
-plot_top_x 20 \
-set_max 500 -set_min 15 \
-zip_report false \
-create_svgs true \
-out gsea
分析结果在gsea
文件夹中。