Pseudotime analysis using Monocle2

loading packages

In [1]:
library(Seurat)
library(monocle)
library(data.table)
Loading required package: SeuratObject

Loading required package: sp


Attaching package: ‘SeuratObject’


The following objects are masked from ‘package:base’:

    intersect, saveRDS


Loading Seurat v5 beta version 
To maintain compatibility with previous workflows, new Seurat objects will use the previous object structure by default
To use new Seurat v5 assays: Please run: options(Seurat.object.assay.version = 'v5')

Loading required package: Matrix

Loading required package: Biobase

Loading required package: BiocGenerics


Attaching package: ‘BiocGenerics’


The following object is masked from ‘package:SeuratObject’:

    intersect


The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs


The following objects are masked from ‘package:base’:

    anyDuplicated, aperm, append, as.data.frame, basename, cbind,
    colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
    get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
    Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
    table, tapply, union, unique, unsplit, which.max, which.min


Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.


Loading required package: ggplot2

Loading required package: VGAM

Loading required package: stats4

Loading required package: splines

Loading required package: DDRTree

Loading required package: irlba

load data

In [2]:
project <- readRDS("data/PbmcRDS/pbmc.rds")
In [3]:
project
An object of class Seurat 
13714 features across 2638 samples within 1 assay 
Active assay: RNA (13714 features, 2000 variable features)
 3 layers present: counts, data, scale.data
 3 dimensional reductions calculated: pca, tsne, umap
In [54]:
DimPlot(project)
In [58]:
FeaturePlot(object = project, features = 'CD8B', label = TRUE, cols=c('grey95', 'red3'), pt.size=0.001)
In [61]:
features = c("CD8A", "LYZ", "CCL5", "IL32", "PTPRCAP", "FCGR3A", "PF4")
DotPlot(project, features = features) + RotatedAxis()

building new CDS object

In [6]:
data <- as(as.matrix(project@assays$RNA@counts),'sparseMatrix')
pd <- project@meta.data
fd <- data.frame(gene_short_name = row.names(project),row.names=row.names(project))
In [7]:
head(pd)
A data.frame: 6 × 7
orig.identnCount_RNAnFeature_RNApercent.mtRNA_snn_res.0.5seurat_clustersCellType
<fct><dbl><int><dbl><fct><fct><fct>
AAACATACAACCAC-1pbmc3k2419 7793.017775922CD14+ Mono
AAACATTGAGCTAC-1pbmc3k490313523.793595833B
AAACATTGATCAGC-1pbmc3k314711290.889736322CD14+ Mono
AAACCGTGCTTCCG-1pbmc3k2639 9601.743084511Memory CD4 T
AAACCGTGTATGCG-1pbmc3k 980 5211.224489866NK
AAACGCACTGGTAC-1pbmc3k2163 7811.664355122CD14+ Mono
In [8]:
head(fd)
A data.frame: 6 × 1
gene_short_name
<chr>
AL627309.1AL627309.1
AP006222.2AP006222.2
RP11-206L10.2RP11-206L10.2
RP11-206L10.9RP11-206L10.9
LINC00115LINC00115
NOC2LNOC2L
In [9]:
pd <- new("AnnotatedDataFrame",data=pd)
fd <- new("AnnotatedDataFrame",data=fd)
cds <- newCellDataSet(data, #expression data matrix for an experiment
                      phenoData = pd, #data frame containing attributes of individual cells
                      featureData = fd, #data frame containing attributes of features (e.g. genes)
                      lowerDetectionLimit = 0.5, #the minimum expression level that consistitutes true expression
                      expressionFamily = negbinomial.size()) #the VGAM family function to be used for expression response variables
In [10]:
cds
CellDataSet (storageMode: environment)
assayData: 13714 features, 2638 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: AAACATACAACCAC-1 AAACATTGAGCTAC-1 ... TTTGCATGCCTCAC-1
    (2638 total)
  varLabels: orig.ident nCount_RNA ... Size_Factor (8 total)
  varMetadata: labelDescription
featureData
  featureNames: AL627309.1 AP006222.2 ... SRSF10.1 (13714 total)
  fvarLabels: gene_short_name
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation:  

Estimating size factors and dispersions

In [11]:
cds <- estimateSizeFactors(cds)
cds <- estimateDispersions(cds)
Warning message:
“`group_by_()` was deprecated in dplyr 0.7.0.
 Please use `group_by()` instead.
 See vignette('programming') for more help
 The deprecated feature was likely used in the monocle package.
  Please report the issue to the authors.”
Warning message:
“`select_()` was deprecated in dplyr 0.7.0.
 Please use `select()` instead.
 The deprecated feature was likely used in the monocle package.
  Please report the issue to the authors.”
Warning message:
“glm.fit: algorithm did not converge”
Removing 276 outliers

In [12]:
cds <- detectGenes(cds, min_expr = 0.1)
In [13]:
head(fData(cds))
A data.frame: 6 × 2
gene_short_namenum_cells_expressed
<chr><int>
AL627309.1AL627309.1 9
AP006222.2AP006222.2 3
RP11-206L10.2RP11-206L10.2 5
RP11-206L10.9RP11-206L10.9 3
LINC00115LINC00115 18
NOC2LNOC2L 254
In [14]:
head(pData(cds))
A data.frame: 6 × 9
orig.identnCount_RNAnFeature_RNApercent.mtRNA_snn_res.0.5seurat_clustersCellTypeSize_Factornum_genes_expressed
<fct><dbl><int><dbl><fct><fct><fct><dbl><int>
AAACATACAACCAC-1pbmc3k2419 7793.017775922CD14+ Mono 1.1076350 779
AAACATTGAGCTAC-1pbmc3k490313523.793595833B 2.24503291352
AAACATTGATCAGC-1pbmc3k314711290.889736322CD14+ Mono 1.44097871129
AAACCGTGCTTCCG-1pbmc3k2639 9601.743084511Memory CD4 T1.2083708 960
AAACCGTGTATGCG-1pbmc3k 980 5211.224489866NK 0.4487318 521
AAACGCACTGGTAC-1pbmc3k2163 7811.664355122CD14+ Mono 0.9904153 781

Constructing Single Cell Trajectories

Step 1: choosing genes that define progress

In [15]:
des <- FindAllMarkers(project)
Calculating cluster Naive CD4 T

Calculating cluster Memory CD4 T

Calculating cluster CD14+ Mono

Calculating cluster B

Calculating cluster CD8 T

Calculating cluster FCGR3A+ Mono

Calculating cluster NK

Calculating cluster DC

Calculating cluster Platelet

In [16]:
head(des)
A data.frame: 6 × 7
p_valavg_log2FCpct.1pct.2p_val_adjclustergene
<dbl><dbl><dbl><dbl><dbl><fct><chr>
RPS121.273332e-143 0.73780801.0000.9911.746248e-139Naive CD4 TRPS12
RPS66.817653e-143 0.69280271.0000.9959.349729e-139Naive CD4 TRPS6
RPS274.661810e-141 0.73633210.9990.9926.393206e-137Naive CD4 TRPS27
RPL328.158412e-138 0.62589420.9990.9951.118845e-133Naive CD4 TRPL32
RPS145.177478e-130 0.63283491.0000.9947.100394e-126Naive CD4 TRPS14
CYBA8.340652e-128-1.76596020.6590.9131.143837e-123Naive CD4 TCYBA
In [17]:
diff <- subset(des[grep("^RP[L|S]",des$gene, ignore.case = FALSE,invert=TRUE),],subset=avg_log2FC>0.25 & pct.1 > 0.25 & (pct.1 > pct.2) & p_val < 0.05)
In [18]:
dim(diff)
  1. 3110
  2. 7
In [19]:
head(diff)
A data.frame: 6 × 7
p_valavg_log2FCpct.1pct.2p_val_adjclustergene
<dbl><dbl><dbl><dbl><dbl><fct><chr>
LDHB3.746131e-1121.19262890.9120.5925.137444e-108Naive CD4 TLDHB
EEF1A1 7.956185e-980.53512210.9940.991 1.091111e-93Naive CD4 TEEF1A1
MALAT1 1.266916e-910.64957511.0000.999 1.737448e-87Naive CD4 TMALAT1
CCR7 9.571984e-882.21446460.4470.108 1.312702e-83Naive CD4 TCCR7
TPT1 1.801906e-850.60990040.9970.982 2.471133e-81Naive CD4 TTPT1
CD3D 1.154695e-761.04954500.8450.406 1.583548e-72Naive CD4 TCD3D
In [20]:
write.table(diff,file="data/PbmcRDS/monocle_DEG.txt",col.names = T,row.names = F,sep="\t",quote = F)
In [21]:
ordergene <- as.vector(unique(diff$gene))
cds <- setOrderingFilter(cds,ordergene)
In [22]:
cds
CellDataSet (storageMode: environment)
assayData: 13714 features, 2638 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: AAACATACAACCAC-1 AAACATTGAGCTAC-1 ... TTTGCATGCCTCAC-1
    (2638 total)
  varLabels: orig.ident nCount_RNA ... num_genes_expressed (9 total)
  varMetadata: labelDescription
featureData
  featureNames: AL627309.1 AP006222.2 ... SRSF10.1 (13714 total)
  fvarLabels: gene_short_name num_cells_expressed use_for_ordering
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation:  
In [23]:
plot_ordering_genes(cds)
Warning message:
“Transformation introduced infinite values in continuous y-axis”
Warning message:
“Transformation introduced infinite values in continuous y-axis”

Step 2: reducing the dimensionality of the data

In [24]:
cds <- reduceDimension(cds, max_components = 2, method = 'DDRTree')
Found more than one class "dist" in cache; using the first, from namespace 'spam'

Also defined by ‘BiocGenerics’

Found more than one class "dist" in cache; using the first, from namespace 'spam'

Also defined by ‘BiocGenerics’

step 3: order cells along the trajectory

In [25]:
cds = orderCells(cds)
Warning message in graph.dfs(dp_mst, root = root_cell, neimode = "all", unreachable = FALSE, :
“Argument `neimode' is deprecated; use `mode' instead”
Warning message in graph.dfs(dp_mst, root = root_cell, neimode = "all", unreachable = FALSE, :
“Argument `neimode' is deprecated; use `mode' instead”
In [62]:
#choose the root
#cds <- orderCells(cds,root_state=?)

Visualization

By Pseudotime

In [26]:
plot_cell_trajectory(cds,color_by="Pseudotime", size=1,show_backbone=TRUE)

By cell type

In [27]:
plot_cell_trajectory(cds,color_by="CellType", size=1,show_backbone=TRUE)

By cell State

In [28]:
plot_cell_trajectory(cds, color_by = "State",size=1,show_backbone=TRUE)
In [29]:
plot_cell_trajectory(cds, color_by = "CellType") + facet_wrap("~State", nrow = 1)

Displaying specific genes

In [30]:
keygenes <- head(ordergene,5)
plot_genes_in_pseudotime(cds[keygenes],color_by = "State")
In [31]:
plot_genes_in_pseudotime(cds[keygenes],color_by = "CellType")

Finding trajectory-associated genes

In [32]:
#fullModelFormulaStr = "~sm.ns(Pseudotime)"  "Pseudotime","CellType","Cluster"
In [33]:
head(pData(cds))
A data.frame: 6 × 11
orig.identnCount_RNAnFeature_RNApercent.mtRNA_snn_res.0.5seurat_clustersCellTypeSize_Factornum_genes_expressedPseudotimeState
<fct><dbl><int><dbl><fct><fct><fct><dbl><int><dbl><fct>
AAACATACAACCAC-1pbmc3k2419 7793.017775922CD14+ Mono 1.1076350 77917.75859113
AAACATTGAGCTAC-1pbmc3k490313523.793595833B 2.2450329135211.61878691
AAACATTGATCAGC-1pbmc3k314711290.889736322CD14+ Mono 1.4409787112913.76633385
AAACCGTGCTTCCG-1pbmc3k2639 9601.743084511Memory CD4 T1.2083708 960 0.69597161
AAACCGTGTATGCG-1pbmc3k 980 5211.224489866NK 0.4487318 52117.20850375
AAACGCACTGGTAC-1pbmc3k2163 7811.664355122CD14+ Mono 0.9904153 78115.12647942
In [34]:
Deg_Pseudotime <- differentialGeneTest(cds[ordergene,],cores = 1,fullModelFormulaStr = "~ Pseudotime")
In [35]:
Deg_Pseudotime <- Deg_Pseudotime[order(Deg_Pseudotime$qval),]
In [36]:
head(Deg_Pseudotime)
A data.frame: 6 × 7
statusfamilypvalqvalgene_short_namenum_cells_expresseduse_for_ordering
<chr><chr><dbl><dbl><chr><int><lgl>
S100A9OKnegbinomial.size4.717768e-2888.751459e-285S100A9 943TRUE
S100A8OKnegbinomial.size4.938210e-2714.580190e-268S100A8 730TRUE
LYZOKnegbinomial.size4.503770e-2112.784831e-208LYZ 1595TRUE
CST3OKnegbinomial.size1.144560e-2045.307895e-202CST3 1050TRUE
LGALS2OKnegbinomial.size6.539955e-1932.426323e-190LGALS2 564TRUE
FCN1OKnegbinomial.size1.544015e-1744.773578e-172FCN1 782TRUE
In [37]:
write.table(Deg_Pseudotime, file = "data/PbmcRDS/Diff_pseudotime_heatmap.txt", sep = "\t",row.names = F)
In [38]:
# show single gene
pData(cds)$S100A9 =log2(exprs(cds)['S100A9',]+1)
plot_cell_trajectory(cds,color_by = "S100A9")
In [39]:
# heatmap
top50 = (Deg_Pseudotime$gene_short_name)[1:50]
plot_pseudotime_heatmap(cds[top50,],num_clusters = 4,cores = 1,show_rownames = T)
In [40]:
plot_genes_in_pseudotime(cds[top50[1:4],],nrow= 2,ncol = 2)

Branched expression analysis modeling (BEAM)

In [41]:
BEAM_res <- BEAM(cds, branch_point = 1, cores = 1)
In [42]:
BEAM_res <- BEAM_res[order(BEAM_res$qval),]
In [43]:
head(BEAM_res)
A data.frame: 6 × 7
statusfamilypvalqvalgene_short_namenum_cells_expresseduse_for_ordering
<chr><chr><dbl><dbl><chr><int><lgl>
GZMBOKnegbinomial.size3.593340e-1644.857117e-160GZMB 318TRUE
PPBPOKnegbinomial.size3.188004e-1572.154613e-153PPBP 76TRUE
GNLYOKnegbinomial.size9.031612e-1294.069343e-125GNLY 475TRUE
NKG7OKnegbinomial.size3.622942e-1091.224283e-105NKG7 788TRUE
PF4OKnegbinomial.size1.198009e-101 3.238697e-98PF4 41TRUE
FGFBP2OKnegbinomial.size 4.806126e-94 1.082740e-90FGFBP2290TRUE
In [46]:
plot_genes_branched_heatmap(cds[row.names(BEAM_res[1:50,]),],branch_point = 1,
                            num_clusters =4,cores = 1,use_gene_short_name = T,
                            show_rownames = T, return_heatmap = F)
In [63]:
write.table(BEAM_res,file="data/PbmcRDS/branch1_genes_analysis.xls",
            sep="\t",quote=FALSE,col.names=TRUE,row.names=FALSE)
In [64]:
save(list=c("cds","BEAM_res"),file = "data/PbmcRDS/monocle_PbmcRDS.RData")
In [ ]: