加载R包:
在这里,我们通过两个示例数据集,展示scRNA-seq差异基因表达分析的基本方法,并通过热图展示结果。
col_palatte <- c('#6965CE', '#EE5755', '#75E15A', '#D5CE85', '#9F33D9', '#9BE2CF', '#69A5C8',
'#FC9500', '#D846AF', '#CE95C4', '#0018F9')
我们推荐使用秩和检验对单细胞数据进行差异表达分析:test.use = “wilcox”
m1 = FindMarkers(pbmc, ident.1 = "FCR3A+ Monocytes", only.pos = TRUE, test.use = "wilcox")
m1 %>% head() %>% arrange(-pct.1)
p_val avg_log2FC pct.1 pct.2 p_val_adj
FCGR3A 3.511192e-184 4.064902 0.975 0.134 4.815249e-180
RP11-290F20.3 8.429376e-200 4.063794 0.840 0.068 1.156005e-195
MS4A7 1.710321e-186 4.387850 0.809 0.073 2.345534e-182
HES4 1.516412e-207 4.993200 0.586 0.020 2.079608e-203
CDKN1C 1.689064e-212 5.428812 0.506 0.010 2.316382e-208
CKB 8.230606e-168 5.881212 0.370 0.005 1.128745e-163
这里我们想比较FCR3A+ Monocytes
和其他细胞之间表达差异,也就是One versus all
,可以看到FCGR3A是差异表达最显著的基因之一。默认情况下,ident.1
提取的是Idents
的结果,我们也可以认为指定其他的变量。
Naive T cells CD14+ Monocytes Memory CD4+ T cells B CD8+ T
0 684 0 0 0 0
1 0 481 0 0 0
2 0 0 476 0 0
3 0 0 0 344 0
4 0 0 0 0 291
5 0 0 0 0 0
6 0 0 0 0 0
7 0 0 0 0 0
8 0 0 0 0 0
FCR3A+ Monocytes NK DC Platelet
0 0 0 0 0
1 0 0 0 0
2 0 0 0 0
3 0 0 0 0
4 0 0 0 0
5 162 0 0 0
6 0 155 0 0
7 0 0 32 0
8 0 0 0 13
m2 = FindMarkers(pbmc, ident.1 = 5, group.by = 'seurat_clusters', only.pos = TRUE, test.use = "wilcox")
m2 %>% head() %>% arrange(-pct.1)
p_val avg_log2FC pct.1 pct.2 p_val_adj
FCGR3A 3.511192e-184 4.064902 0.975 0.134 4.815249e-180
RP11-290F20.3 8.429376e-200 4.063794 0.840 0.068 1.156005e-195
MS4A7 1.710321e-186 4.387850 0.809 0.073 2.345534e-182
HES4 1.516412e-207 4.993200 0.586 0.020 2.079608e-203
CDKN1C 1.689064e-212 5.428812 0.506 0.010 2.316382e-208
CKB 8.230606e-168 5.881212 0.370 0.005 1.128745e-163
这里我们比较了meta.data的’seurat_clusters’这一列所定义的cluster 5(FCR3A+ Monocytes)和其他细胞之间的差异基因表达。可以看出,我们得到了完全相同的结果。
m3 = FindMarkers(pbmc, ident.1 = "Naive T cells", ident.2 = 'Memory CD4+ T cells', only.pos = FALSE, test.use = "wilcox")
m3 %>% filter(rownames(m3) %in% c('CCR7', 'S100A4'))
p_val avg_log2FC pct.1 pct.2 p_val_adj
S100A4 9.932175e-69 -1.440462 0.680 0.952 1.362099e-64
CCR7 6.200586e-16 1.256239 0.447 0.252 8.503484e-12
可以看出,CCR7是Naive T cells的marker,相对于Memory CD4+ T cells是高表达的。反过来,S100A4是Memory CD4+ T cells的marker,在其中高表达。
这里我们分析了小鼠Nash模型的肝组织单细胞测序数据GSE129516,包含6个样本,3个处理组(nash),三个对照组(chow)。
mouse_harmony = readRDS("output/mouse_harmony.RDS")
mouse_harmony = JoinLayers(mouse_harmony)
DimPlot(mouse_harmony, label = T, pt.size = 0.0001, cols = col_palatte)
在文章的Figure 3D中,作者比较了’Endo’细胞在两种状态下基因表达谱的变化,我们看看能否观测到类似的结果。
endoDEG = c('Fabp4', 'Cd36', 'Cxcl9', 'Bmp2', 'Nrp1', 'Lifr')
m4 = FindMarkers(mouse_harmony, ident.1 = "nash", ident.2 = "chow", group.by = 'sample', subset.ident = "Endo")
m4[endoDEG, ]
p_val avg_log2FC pct.1 pct.2 p_val_adj
Fabp4 0.000000e+00 2.6730162 0.986 0.579 0.000000e+00
Cd36 0.000000e+00 1.1905355 0.927 0.588 0.000000e+00
Cxcl9 3.065900e-96 1.3880939 0.471 0.279 5.932210e-92
Bmp2 1.306220e-175 -0.8483164 0.772 0.893 2.527404e-171
Nrp1 0.000000e+00 -1.0563196 0.844 0.964 0.000000e+00
Lifr 6.993346e-301 -1.1174045 0.671 0.905 1.353143e-296
查看这些基因:
FeaturePlot(mouse_harmony, features = 'Fabp4', split.by = 'sample', cols=c('grey95', 'red3'), label=F, pt.size=0.001) &
theme(legend.position = c(0.05,0.8))
我们可以使用DoHeatmap
方便的展示热图。需要注意的是,该函数默认展示的是scale.data
这个slot,更清晰地展示相对表达变化,也能改成data
。
mx = FindMarkers(pbmc, ident.1 = "Memory CD4+ T cells", only.pos = TRUE, test.use = "wilcox")
top20 = rownames(mx[1:20, ])
DoHeatmap(pbmc, features = top20, label = FALSE) + theme(legend.position="bottom")
mx = FindMarkers(mouse_harmony, ident.1 = "nash", ident.2 = "chow", group.by = 'sample', subset.ident = "Endo")
gx = mx %>% filter(avg_log2FC > 0, p_val_adj < 0.05) %>% head(20)
cx = mouse_harmony[[]] %>% filter(cellType == 'Endo', orig.ident %in% c('chow1', 'nash1'))
DoHeatmap(mouse_harmony, features = rownames(gx), cells = rownames(cx), group.by = "sample") + theme(legend.position="bottom")
首先我们通过提取一部分数据画图,只展示Endo细胞:
我们还可以展示所有的细胞类型:
R version 4.3.3 (2024-02-29)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.7 (Maipo)
Matrix products: default
BLAS/LAPACK: /sibcb1/bioinformatics/biocourse/biocourse01/miniforge3/envs/scRNA/lib/libopenblasp-r0.3.27.so; LAPACK version 3.12.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Asia/Shanghai
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] patchwork_1.2.0 reticulate_1.35.0 harmony_1.2.0
[4] Rcpp_1.0.12 Seurat_5.0.3 SeuratObject_5.0.2
[7] sp_2.1-3 dplyr_1.1.4 data.table_1.15.2
[10] ggplot2_3.5.0 knitr_1.46 rmarkdown_2.26
loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 rstudioapi_0.16.0
[3] jsonlite_1.8.8 magrittr_2.0.3
[5] spatstat.utils_3.0-5 farver_2.1.1
[7] fs_1.6.3 vctrs_0.6.5
[9] ROCR_1.0-11 memoise_2.0.1
[11] spatstat.explore_3.2-6 usethis_2.2.3
[13] htmltools_0.5.8.1 curl_5.1.0
[15] sass_0.4.9 sctransform_0.4.1
[17] parallelly_1.37.1 KernSmooth_2.23-22
[19] bslib_0.7.0 desc_1.4.3
[21] htmlwidgets_1.6.4 ica_1.0-3
[23] fontawesome_0.5.2 plyr_1.8.9
[25] plotly_4.10.4 zoo_1.8-12
[27] cachem_1.0.8 igraph_2.0.2
[29] mime_0.12 lifecycle_1.0.4
[31] pkgconfig_2.0.3 Matrix_1.6-5
[33] R6_2.5.1 fastmap_1.1.1
[35] fitdistrplus_1.1-11 future_1.33.2
[37] shiny_1.8.1.1 digest_0.6.35
[39] colorspace_2.1-0 ps_1.7.7
[41] tensor_1.5 RSpectra_0.16-1
[43] irlba_2.3.5.1 pkgload_1.3.4
[45] labeling_0.4.3 progressr_0.14.0
[47] fansi_1.0.6 spatstat.sparse_3.0-3
[49] httr_1.4.7 polyclip_1.10-6
[51] abind_1.4-5 compiler_4.3.3
[53] remotes_2.5.0 withr_3.0.0
[55] fastDummies_1.7.3 pkgbuild_1.4.4
[57] highr_0.10 MASS_7.3-60
[59] sessioninfo_1.2.2 tools_4.3.3
[61] lmtest_0.9-40 httpuv_1.6.15
[63] future.apply_1.11.2 goftest_1.2-3
[65] glue_1.7.0 callr_3.7.6
[67] nlme_3.1-164 promises_1.3.0
[69] grid_4.3.3 Rtsne_0.17
[71] cluster_2.1.6 reshape2_1.4.4
[73] generics_0.1.3 gtable_0.3.4
[75] spatstat.data_3.0-4 tidyr_1.3.1
[77] utf8_1.2.4 spatstat.geom_3.2-9
[79] RcppAnnoy_0.0.22 ggrepel_0.9.5
[81] RANN_2.6.1 pillar_1.9.0
[83] stringr_1.5.1 spam_2.10-0
[85] RcppHNSW_0.6.0 limma_3.58.1
[87] later_1.3.2 splines_4.3.3
[89] lattice_0.22-6 survival_3.5-8
[91] deldir_2.0-4 tidyselect_1.2.1
[93] miniUI_0.1.1.1 pbapply_1.7-2
[95] downlit_0.4.4 gridExtra_2.3
[97] bookdown_0.40 distill_1.6
[99] scattermore_1.2 xfun_0.43
[101] statmod_1.5.0 devtools_2.4.5
[103] matrixStats_1.1.0 stringi_1.8.3
[105] lazyeval_0.2.2 yaml_2.3.8
[107] evaluate_0.23 codetools_0.2-20
[109] tibble_3.2.1 cli_3.6.2
[111] uwot_0.1.16 xtable_1.8-4
[113] processx_3.8.4 munsell_0.5.1
[115] jquerylib_0.1.4 globals_0.16.3
[117] spatstat.random_3.2-3 png_0.1-8
[119] parallel_4.3.3 ellipsis_0.3.2
[121] presto_1.0.0 dotCall64_1.1-1
[123] profvis_0.3.8 urlchecker_1.0.1
[125] listenv_0.9.1 viridisLite_0.4.2
[127] scales_1.3.0 ggridges_0.5.6
[129] leiden_0.4.3.1 purrr_1.0.2
[131] rlang_1.1.3 cowplot_1.1.3