SPOTlight vignette

The goal of SPOTlight is to provide a tool that enables the deconvolution of cell types and cell type proportions present within each capture locations comprising mixtures of cells, originally developed for 10X’s Visium - spatial trancsiptomics- technology, it can be used for all technologies returning mixtures of cells. SPOTlight is based on finding topic profile signatures, by means of an NMFreg model, for each cell type and finding which combination fits best the spot we want to deconvolute.

Graphical abstract

0.1 Installation

You can install the latest stable version from the GitHub repository SPOTlight with:

# install.packages("devtools")
devtools::install_github("https://github.com/MarcElosua/SPOTlight")
devtools::install_git("https://github.com/MarcElosua/SPOTlight")

Or the latest version in development by downloading the devel branch

devtools::install_github("https://github.com/MarcElosua/SPOTlight", ref = "devel")
devtools::install_git("https://github.com/MarcElosua/SPOTlight", ref = "devel")

0.2 Libraries

library(Matrix)
library(data.table)
library(Seurat)
library(SeuratData)
library(dplyr)
library(gt)
library(SPOTlight)
library(igraph)
library(RColorBrewer)

0.3 Load data

For the purpose of this tutorial we are going to use adult mouse brain data. The scRNAseq data can be downloaded here while the spatial data is the one put out publicly by 10X and the processed object can be downloaded using SeuratData as shown below.

Load single-cell reference dataset.

path_to_data <- system.file(package = "SPOTlight")
cortex_sc <- readRDS(glue::glue("{path_to_data}/allen_cortex_dwn.rds"))

Load Spatial data

if (! "stxBrain" %in% SeuratData::AvailableData()[, "Dataset"]) {
  # If dataset not downloaded proceed to download it
  SeuratData::InstallData("stxBrain")
}

# Load data
anterior <- SeuratData::LoadData("stxBrain", type = "anterior1")

0.4 Pre-processing

set.seed(123)
cortex_sc <- Seurat::SCTransform(cortex_sc, verbose = FALSE) %>%
  Seurat::RunPCA(., verbose = FALSE) %>%
  Seurat::RunUMAP(., dims = 1:30, verbose = FALSE)

Visualize the clustering

Seurat::DimPlot(cortex_sc,
                group.by = "subclass",
                label = TRUE) + Seurat::NoLegend()

0.5 Descriptive

cortex_sc@meta.data %>%
  dplyr::count(subclass) %>%
  gt::gt(.[-1, ]) %>%
  gt::tab_header(
    title = "Cell types present in the reference dataset",
  ) %>%
  gt::cols_label(
    subclass = gt::html("Cell Type")
  )

Cell types present in the reference dataset

Cell Type	n
Astro	70
CR	7
Endo	70
L2/3 IT	70
L4	70
L5 IT	70
L5 PT	70
L6 CT	70
L6 IT	70
L6b	70
Lamp5	70
Macrophage	51
Meis2	45
NP	70
Oligo	70
Peri	32
Pvalb	70
Serpinf1	27
SMC	55
Sncg	70
Sst	70
Vip	70
VLMC	67

0.6 Compute marker genes

To determine the most important marker genes we can use the function Seurat::FindAllMarkers which will return the markers for each cluster.

Seurat::Idents(object = cortex_sc) <- cortex_sc@meta.data$subclass
cluster_markers_all <- Seurat::FindAllMarkers(object = cortex_sc, 
                                              assay = "SCT",
                                              slot = "data",
                                              verbose = TRUE, 
                                              only.pos = TRUE)

saveRDS(object = cluster_markers_all,
        file = here::here("inst/markers_sc.RDS"))

0.6.1 SPOTlight Decomposition

set.seed(123)

spotlight_ls <- spotlight_deconvolution(
  se_sc = cortex_sc,
  counts_spatial = anterior@assays$Spatial@counts,
  clust_vr = "subclass", # Variable in sc_seu containing the cell-type annotation
  cluster_markers = cluster_markers_all, # Dataframe with the marker genes
  cl_n = 100, # number of cells per cell type to use
  hvg = 3000, # Number of HVG to use
  ntop = NULL, # How many of the marker genes to use (by default all)
  transf = "uv", # Perform unit-variance scaling per cell and spot prior to factorzation and NLS
  method = "nsNMF", # Factorization method
  min_cont = 0 # Remove those cells contributing to a spot below a certain threshold 
  )

saveRDS(object = spotlight_ls, file = here::here("inst/spotlight_ls.rds"))

Read RDS object

spotlight_ls <- readRDS(file = here::here("inst/spotlight_ls.rds"))

nmf_mod <- spotlight_ls[[1]]
decon_mtrx <- spotlight_ls[[2]]

0.6.2 Assess deconvolution

Before even looking at the decomposed spots we can gain insight on how well the model performed by looking at the topic profiles for the cell types.

The first thing we can do is look at how specific the topic profiles are for each cell type.

h <- NMF::coef(nmf_mod[[1]])
rownames(h) <- paste("Topic", 1:nrow(h), sep = "_")
topic_profile_plts <- SPOTlight::dot_plot_profiles_fun(
  h = h,
  train_cell_clust = nmf_mod[[2]])

topic_profile_plts[[2]] + ggplot2::theme(
  axis.text.x = ggplot2::element_text(angle = 90), 
  axis.text = ggplot2::element_text(size = 12))

Next we can take a look at the how the individual topic profiles of each cell within each cell-type behave.
Here we expect that all the cells from the same cell type show a similar topic profile distribution, if not there might be a bit more substructure in that cluster and we may only be capturing one or the other.

topic_profile_plts[[1]] + theme(axis.text.x = element_text(angle = 90), 
                                axis.text = element_text(size = 12))

Lastly we can take a look at which genes are the most important for each topic and therefore get an insight into which genes are driving them.

basis_spotlight <- data.frame(NMF::basis(nmf_mod[[1]]))

colnames(basis_spotlight) <- unique(stringr::str_wrap(nmf_mod[[2]], width = 30))

basis_spotlight %>%
  dplyr::arrange(desc(Astro)) %>%
  round(., 5) %>% 
  DT::datatable(., filter = "top")

0.7 Visualization

Join decomposition with metadata

# This is the equivalent to setting min_cont to 0.04
decon_mtrx_sub <- decon_mtrx[, colnames(decon_mtrx) != "res_ss"]
decon_mtrx_sub[decon_mtrx_sub < 0.08] <- 0
decon_mtrx <- cbind(decon_mtrx_sub, "res_ss" = decon_mtrx[, "res_ss"])
rownames(decon_mtrx) <- colnames(anterior)

decon_df <- decon_mtrx %>%
  data.frame() %>%
  tibble::rownames_to_column("barcodes")

anterior@meta.data <- anterior@meta.data %>%
  tibble::rownames_to_column("barcodes") %>%
  dplyr::left_join(decon_df, by = "barcodes") %>%
  tibble::column_to_rownames("barcodes")

0.7.1 Specific cell-types

we can use the standard Seurat::SpatialFeaturePlot to view predicted celltype proportions one at a time.

Seurat::SpatialFeaturePlot(
  object = anterior,
  features = c("L2.3.IT", "L6b", "Meis2", "Oligo"),
  alpha = c(0.1, 1))

0.7.2 Spatial scatterpies

Plot spot composition of all the spots.

cell_types_all <- colnames(decon_mtrx)[which(colnames(decon_mtrx) != "res_ss")]

SPOTlight::spatial_scatterpie(se_obj = anterior,
                              cell_types_all = cell_types_all,
                              img_path = "sample_data/spatial/tissue_lowres_image.png",
                              pie_scale = 0.4)

Plot spot composition of spots containing cell-types of interest

SPOTlight::spatial_scatterpie(se_obj = anterior,
                              cell_types_all = cell_types_all,
                              img_path = "sample_data/spatial/tissue_lowres_image.png",
                              cell_types_interest = "L6b",
                              pie_scale = 0.8)

0.7.3 Spatial interaction graph

Now that we know which cell types are found within each spot we can make a graph representing spatial interactions where cell types will have stronger edges between them the more often we find them within the same spot. To do this we will only need to run the function get_spatial_interaction_graph, this function prints the plot and returns the elements needed to plot it.

graph_ntw <- SPOTlight::get_spatial_interaction_graph(decon_mtrx = decon_mtrx[, cell_types_all])

If you want to tune how the graph looks you can do the following or you can check out more options here:

deg <- degree(graph_ntw, mode="all")

# Get color palette for difusion
edge_importance <- E(graph_ntw)$importance

# Select a continuous palette
qual_col_pals <- brewer.pal.info[brewer.pal.info$category == 'seq',]

# Create a color palette
getPalette <- colorRampPalette(brewer.pal(9, "YlOrRd"))

# Get how many values we need
grad_edge <- seq(0, max(edge_importance), 0.1)

# Generate extended gradient palette dataframe
graph_col_df <- data.frame(value = as.character(grad_edge),
                           color = getPalette(length(grad_edge)),
                           stringsAsFactors = FALSE)
# Assign color to each edge
color_edge <- data.frame(value = as.character(round(edge_importance, 1)), stringsAsFactors = FALSE) %>%
  dplyr::left_join(graph_col_df, by = "value") %>%
  dplyr::pull(color)

# Open a pdf file
plot(graph_ntw,
     # Size of the edge
     edge.width = edge_importance,
     edge.color = color_edge,
     # Size of the buble
     vertex.size = deg,
     vertex.color = "#cde394",
     vertex.frame.color = "white",
     vertex.label.color = "black",
     vertex.label.family = "Ubuntu", # Font family of the label (e.g.“Times”, “Helvetica”)
     layout = layout.circle)

Lastly one can compute cell-cell correlations to see groups of cells that correlate positively or negatively.

# Remove cell types not predicted to be on the tissue
decon_mtrx_sub <- decon_mtrx[, cell_types_all]
decon_mtrx_sub <- decon_mtrx_sub[, colSums(decon_mtrx_sub) > 0]

# Compute correlation
decon_cor <- cor(decon_mtrx_sub)

# Compute correlation P-value
p.mat <- corrplot::cor.mtest(mat = decon_mtrx_sub, conf.level = 0.95)

# Visualize
ggcorrplot::ggcorrplot(
  corr = decon_cor,
  p.mat = p.mat[[1]],
  hc.order = TRUE,
  type = "full",
  insig = "blank",
  lab = TRUE,
  outline.col = "lightgrey",
  method = "square",
  # colors = c("#4477AA", "white", "#BB4444"))
  colors = c("#6D9EC1", "white", "#E46726"),
  title = "Predicted cell-cell proportion correlation",
  legend.title = "Correlation\n(Pearson)") +
  ggplot2::theme(
    plot.title = ggplot2::element_text(size = 22, hjust = 0.5, face = "bold"),
    legend.text = ggplot2::element_text(size = 12),
    legend.title = ggplot2::element_text(size = 15),
    axis.text.x = ggplot2::element_text(angle = 90),
    axis.text = ggplot2::element_text(size = 18, vjust = 0.5))

0.8 Step-by-Step insight

Here we are going to show step by step what is going on and all the different steps involved in the process.

SPOTlight scheme

0.8.1 Downsample data

If the dataset is very large we want to downsample it, both in terms of number of cells and number of genes, to train the model. To do this downsampling we want to keep a representative amount of cells per cluster and the most important genes. We show that this downsampling doesn’t affect the performance of the model and greatly speeds up the model training.

# Downsample scRNAseq to select gene set and number of cells to train the model
se_sc_down <- downsample_se_obj(se_obj = cortex_sc,
                                clust_vr = "subclass",
                                cluster_markers = cluster_markers_all,
                                cl_n = 100,
                                hvg = 3000)

0.8.2 Train NMF model

Once we have the data ready to pass to the model we can train it as shown below.

start_time <- Sys.time()
nmf_mod_ls <- train_nmf(cluster_markers = cluster_markers_all, 
                        se_sc = se_sc_down, 
                        mtrx_spatial = anterior@assays$Spatial@counts,
                        clust_vr = "subclass",
                        ntop = NULL,
                        hvg = 3000,
                        transf = "uv",
                        method = "nsNMF")

nmf_mod <- nmf_mod_ls[[1]]

Extract matrices form the model:

# get basis matrix W
w <- basis(nmf_mod)
dim(w)

# get coefficient matrix H
h <- coef(nmf_mod)
dim(h)

Look at cell-type specific topic profile

rownames(h) <- paste("Topic", 1:nrow(h), sep = "_")
topic_profile_plts <- dot_plot_profiles_fun(
  h = h,
  train_cell_clust = nmf_mod_ls[[2]]
  )

topic_profile_plts[[2]] + theme(axis.text.x = element_text(angle = 90))

0.8.3 Spot Deconvolution

# Extract count matrix
spot_counts <- anterior@assays$Spatial@counts

# Subset to genes used to train the model
spot_counts <- spot_counts[rownames(spot_counts) %in% rownames(w), ]

Run spots through the basis to get the pertinent coefficients. To do this for every spot we are going to set up a system of linear equations where we need to find the coefficient, we will use non-negative least squares to determine the best coefficient fit.

ct_topic_profiles <- topic_profile_per_cluster_nmf(h = h,
                              train_cell_clust = nmf_mod_ls[[2]])

decon_mtrx <- mixture_deconvolution_nmf(nmf_mod = nmf_mod,
                          mixture_transcriptome = spot_counts,
                          transf = "uv",
                          reference_profiles = ct_topic_profiles, 
                          min_cont = 0.09)

0.9 Session Info

sessionInfo()

## R version 4.0.3 (2020-10-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=es_ES.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=es_ES.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] grid      parallel  stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] jpeg_0.1-8.1              png_0.1-7                 imager_0.42.3             magrittr_2.0.1            cowplot_1.1.1             ggplot2_3.3.3             tidyr_1.1.2               tibble_3.0.6              stringr_1.4.0             Biobase_2.50.0            BiocGenerics_0.36.0       RColorBrewer_1.1-2        igraph_1.2.6              SPOTlight_0.1.4           gt_0.2.2                  dplyr_1.0.4               stxBrain.SeuratData_0.1.1 SeuratData_0.2.1          SeuratObject_4.0.0        Seurat_4.0.0              data.table_1.13.6         Matrix_1.3-2             
## 
## loaded via a namespace (and not attached):
##   [1] backports_1.2.1      corrplot_0.84        NMF_0.23.0           plyr_1.8.6           lazyeval_0.2.2       splines_4.0.3        gmp_0.6-2            crosstalk_1.1.1      listenv_0.8.0        scattermore_0.7      gridBase_0.4-7       digest_0.6.27        foreach_1.5.1        htmltools_0.5.1.1    tiff_0.1-6           checkmate_2.0.0      tensor_1.5           cluster_2.1.0        doParallel_1.0.16    ROCR_1.0-11          globals_0.14.0       matrixStats_0.58.0   colorspace_2.0-0     rappdirs_0.3.3       ggrepel_0.9.1        xfun_0.20            crayon_1.4.0         jsonlite_1.7.2       scatterpie_0.1.5     spatstat_1.64-1      spatstat.data_1.7-0  survival_3.2-7       zoo_1.8-8            iterators_1.0.13     glue_1.4.2           polyclip_1.10-0      registry_0.5-1       gtable_0.3.0         leiden_0.3.7         future.apply_1.7.0   abind_1.4-5          scales_1.1.1         DBI_1.1.1            rngtools_1.5         miniUI_0.1.1.1       Rcpp_1.0.6           viridisLite_0.3.0    xtable_1.8-4         reticulate_1.18      DT_0.17              htmlwidgets_1.5.3    httr_1.4.2           ellipsis_0.3.1       ica_1.0-2            pkgconfig_2.0.3      farver_2.0.3        
##  [57] sass_0.3.1           uwot_0.1.10          deldir_0.2-9         here_1.0.1           ggcorrplot_0.1.3     tidyselect_1.1.0     labeling_0.4.2       rlang_0.4.10         reshape2_1.4.4       later_1.1.0.1        munsell_0.5.0        tools_4.0.3          cli_2.3.0            generics_0.1.0       ggridges_0.5.3       evaluate_0.14        fastmap_1.1.0        yaml_2.2.1           goftest_1.2-2        knitr_1.31           fitdistrplus_1.1-3   arrangements_1.1.9   purrr_0.3.4          RANN_2.6.1           readbitmap_0.1.5     pbapply_1.4-3        future_1.21.0        nlme_3.1-151         mime_0.9             compiler_4.0.3       rstudioapi_0.13      plotly_4.9.3         spatstat.utils_2.0-0 tweenr_1.0.1         stringi_1.5.3        highr_0.8            RSpectra_0.16-0      lattice_0.20-41      vctrs_0.3.6          pillar_1.4.7         lifecycle_0.2.0      BiocManager_1.30.10  lmtest_0.9-38        RcppAnnoy_0.0.18     irlba_2.3.3          httpuv_1.5.5         patchwork_1.1.1      R6_2.5.0             promises_1.1.1       KernSmooth_2.23-18   gridExtra_2.3        bmp_0.3              parallelly_1.23.0    codetools_0.2-18     MASS_7.3-53          assertthat_0.2.1    
## [113] pkgmaker_0.32.2.900  rprojroot_2.0.2      withr_2.4.1          sctransform_0.3.2    mgcv_1.8-33          rpart_4.1-15         rvcheck_0.1.8        rmarkdown_2.6        Rtsne_0.15           ggforce_0.3.2        shiny_1.6.0