Set Up Your Own Project

Prerequisites

We provide a custom R script to export Seurat objects into a format compatible with CellAnalyst. This R script depends on several R packages. Please install the following packages before running the script.

# CRAN packages
install.packages("Seurat")
install.packages("ggplot2")
install.packages("gtools")
install.packages("dplyr")
install.packages("Matrix")
install.packages("data.table")
# GitHub packages
install.packages("devtools")
devtools::install_github("satijalab/seurat-wrappers")
devtools::install_github("immunogenomics/presto")
devtools::install_github("genecell/COSGR")
# Bioconductor packages
install.packages("BiocManager")
BiocManager::install("clusterProfiler")
BiocManager::install("org.Hs.eg.db") # for human
BiocManager::install("org.Mm.eg.db") # for mouse
BiocManager::install("org.Dr.eg.db") # for zebrafish
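Before running the export script, it can help to confirm that everything above installed correctly. The following is a minimal sketch using base R's `requireNamespace()`. The package names are assumptions inferred from the install commands above: for example, the "genecell/COSGR" repository is assumed to install a package named `COSG`, and "satijalab/seurat-wrappers" is assumed to install `SeuratWrappers`. The helper `missing_packages()` is hypothetical and not part of CellAnalyst.

```r
# Package names are assumptions based on the install commands above.
required_pkgs <- c(
  "Seurat", "ggplot2", "gtools", "dplyr", "Matrix", "data.table",
  "SeuratWrappers", "presto", "COSG",
  "clusterProfiler", "org.Hs.eg.db", "org.Mm.eg.db", "org.Dr.eg.db"
)

# Hypothetical helper: return the packages in `pkgs` whose namespace
# cannot be loaded from the current library paths.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

missing <- missing_packages(required_pkgs)
if (length(missing) > 0) {
  message("Missing packages: ", paste(missing, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

Only the organism package for your own species (org.Hs.eg.db, org.Mm.eg.db, or org.Dr.eg.db) is strictly needed, so a missing entry for another species can be ignored.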

Prepare your project

With Seurat V3/4/5

The R script is compatible with Seurat V3/4/5, so you can load your Seurat object and export the required files directly.

# Source the ExportToCellAnalyst.R script, which defines ExportToCellAnalyst() (see the manual in section 3)
source("https://bis.zju.edu.cn/cellanalyst/scripts/ExportToCellAnalyst.R")

# Load your Seurat object
seuratObj <- readRDS("example_seurat.rds")

# Run the pipeline (all exports will be saved to the "./cellanalyst" folder)
results <- ExportToCellAnalyst(seuratObj, species = "human", outputDir = "./cellanalyst", 
                                downsampleSize = 2000, embedding = "umap", group.by = "celltype")

With Scanpy/AnnData

If you have a Scanpy .h5ad file, please convert it to a Seurat object first. Either the SeuratDisk or the sceasy package can be used for the conversion.

Option 1: use SeuratDisk

# Install the SeuratDisk package from GitHub using devtools.
devtools::install_github("mojaveazure/seurat-disk")

# Load the SeuratDisk package
library(SeuratDisk)
# Convert the AnnData file ("example_anndata.h5ad") to the Seurat file format ("h5seurat").
Convert("example_anndata.h5ad", dest = "h5seurat", overwrite = TRUE)

# Load the converted .h5seurat file into a Seurat object.
seurat_object <- LoadH5Seurat("example_anndata.h5seurat")

# Save the Seurat object in RDS format
saveRDS(seurat_object, file = "example_seurat.rds")

Option 2: use sceasy

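# Install the Bioconductor packages required by sceasy
BiocManager::install(c("LoomExperiment", "SingleCellExperiment"))
# Install sceasy from GitHub (it also needs the Python "anndata" module, accessed through reticulate)
devtools::install_github("cellgeni/sceasy")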

# Load sceasy and reticulate packages
library(sceasy)
library(reticulate)

# Convert the file "example.h5ad" (AnnData format) to the Seurat format.
sceasy::convertFormat(obj = "example.h5ad", from = "anndata", 
                      to = "seurat", outFile = "example_seurat.rds")

After converting the data to a Seurat RDS file, you can load it directly and export the files for CellAnalyst.

# Source the ExportToCellAnalyst.R script, which defines ExportToCellAnalyst() (see the manual in section 3)
source("https://bis.zju.edu.cn/cellanalyst/scripts/ExportToCellAnalyst.R")

# Load your Seurat object
seuratObj <- readRDS("example_seurat.rds")

# Run the pipeline (all exports will be saved to the "./cellanalyst" folder)
results <- ExportToCellAnalyst(seuratObj, species = "human", outputDir = "./cellanalyst", 
                                downsampleSize = 2000, embedding = "umap", group.by = "celltype")

ExportToCellAnalyst.R Manual

Description

This script orchestrates a complete workflow for processing a Seurat object by performing multiple steps commonly required for single-cell RNA-seq data analysis. The pipeline includes:

  • Quality Control: Calculation of standard QC metrics (e.g., number of features, counts, percentage of counts from mitochondrial genes, log10GenesPerUMI).
  • Expression Calculations: Computation of mean and percent expression per cell type.
  • Marker Identification: Identification of marker genes using two independent approaches:
    • COSG-based marker detection.
    • Presto-based marker selection with significance filtering.
  • Marker Combination: Merging marker data with average and percent expression matrices.
  • Enrichment Analyses: Running Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses on significant markers.
  • Downsampling: Optionally downsampling the Seurat object to a target number of cells.
  • Exporting Results: Writing out metadata, cell embedding (e.g., UMAP coordinates), feature names, various marker outputs, and the expression matrix in a binary triplet format.
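To make the "Expression Calculations" step concrete, here is a minimal base R sketch of mean and percent expression per group, assuming a dense genes x cells matrix and one group label per cell. It is an illustration only, not the code in ExportToCellAnalyst.R. The helper name `group_expression_summary()` is hypothetical, and the sketch averages the values exactly as given; which assay layer the actual script uses is not stated here.

```r
# Hypothetical helper: per-group mean expression and percentage of cells
# with expression > 0, for a dense genes x cells matrix `expr`.
group_expression_summary <- function(expr, groups) {
  stopifnot(ncol(expr) == length(groups))
  groups <- as.character(groups)
  lv <- sort(unique(groups))
  # Mean of each gene across the cells of each group (genes x groups)
  avg <- do.call(cbind, lapply(lv, function(g) {
    rowMeans(expr[, groups == g, drop = FALSE])
  }))
  # Percentage of cells in each group with a non-zero value (genes x groups)
  pct <- do.call(cbind, lapply(lv, function(g) {
    100 * rowMeans(expr[, groups == g, drop = FALSE] > 0)
  }))
  colnames(avg) <- lv
  colnames(pct) <- lv
  list(average = avg, percent = pct)
}

# Toy data: 2 genes x 4 cells, two cell types (made-up values)
expr <- matrix(c(0, 2, 4, 0,
                 1, 1, 0, 0),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("GeneA", "GeneB"), paste0("cell", 1:4)))
groups <- c("T", "T", "B", "B")
res <- group_expression_summary(expr, groups)
print(res$average)
print(res$percent)
```

In a Seurat workflow, `expr` would correspond to an expression matrix extracted from the object and `groups` to the `group.by` metadata column.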

Usage

ExportToCellAnalyst(
  seuratObj, 
  species = "human", 
  outputDir = "./cellanalyst", 
  downsampleSize = 2000, 
  pAdjThreshold = 0.05, 
  avgLog2FCThreshold = 0.5, 
  embedding = "umap",
  group.by = "celltype"
)

Source code can be downloaded from https://bis.zju.edu.cn/cellanalyst/scripts/ExportToCellAnalyst.R or from GitHub.

Arguments

  • **seuratObj**
    A Seurat object containing the single-cell RNA-seq data. This object should include:
    • A metadata column corresponding to cell groupings (e.g., cell type).
    • At least one dimensionality reduction embedding (e.g., UMAP, tSNE).
  • **species**
    A character string indicating the species for the analysis. In the current version, the accepted values are "human", "mouse", or "zebrafish".
    Default: "human"
  • **outputDir**
    A character string specifying the directory where all output files will be saved. The function creates the directory if it does not exist.
    Default: "./cellanalyst"
  • **downsampleSize**
    An integer indicating the target number of cells for the downsampled Seurat object. Downsampling may be useful for visualization or when dealing with very large datasets.
    Default: 2000
  • **pAdjThreshold**
    A numeric value representing the adjusted p-value threshold used for marker significance filtering when using Presto.
    Default: 0.05
  • **avgLog2FCThreshold**
    A numeric value representing the minimum average log2 fold-change threshold for selecting significant markers (via Presto).
    Default: 0.5
  • **embedding**
    A character string specifying the reduction embedding to export (e.g., "umap").
    Note: The Seurat object must contain this embedding in its reductions slot.
    Default: "umap"
  • **group.by**
    A character string indicating the metadata column to use for grouping the cells (e.g., cell types) during the analysis. This column is used in multiple steps, including marker and expression calculations.
    Default: "celltype"
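The two marker thresholds and `downsampleSize` can be illustrated with a minimal base R sketch. It assumes a marker table with Seurat-style columns `p_val_adj` and `avg_log2FC`; these column names, the strict inequalities, the restriction to positive fold changes, and uniform random downsampling are all assumptions, since the script's internals are not described here. Both helpers (`filter_significant_markers()`, `downsample_cells()`) are hypothetical and not part of CellAnalyst.

```r
# Hypothetical helper: keep markers passing both thresholds.
# Column names and strict inequalities are assumptions.
filter_significant_markers <- function(markers,
                                       pAdjThreshold = 0.05,
                                       avgLog2FCThreshold = 0.5) {
  keep <- markers$p_val_adj < pAdjThreshold &
    markers$avg_log2FC > avgLog2FCThreshold
  markers[keep, , drop = FALSE]
}

# Hypothetical helper: uniformly sample at most `downsampleSize` cell barcodes.
# Note that set.seed() resets the global random number generator.
downsample_cells <- function(cells, downsampleSize = 2000, seed = 42) {
  if (length(cells) <= downsampleSize) return(cells)
  set.seed(seed)
  sample(cells, downsampleSize)
}

# Toy marker table (made-up values for illustration only)
markers <- data.frame(
  gene       = c("CD3E", "MS4A1", "ACTB"),
  cluster    = c("T", "B", "T"),
  avg_log2FC = c(2.1, 1.4, 0.2),
  p_val_adj  = c(1e-10, 0.01, 0.2),
  stringsAsFactors = FALSE
)
sig <- filter_significant_markers(markers)
print(sig$gene)  # "CD3E" "MS4A1"

cells <- paste0("cell", 1:5000)
print(length(downsample_cells(cells, downsampleSize = 2000)))  # 2000
```

With a real Seurat object, the selected barcodes could be passed to `subset(seuratObj, cells = downsample_cells(colnames(seuratObj)))`. The actual script may instead downsample per cell type to preserve rare populations.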

Value

The function returns a list containing the following elements:

  • **seuratObj**
    The processed (and possibly downsampled) Seurat object with updated QC metrics.
  • **cosgResults**
    A list of outputs from the COSG marker identification step. This includes:
    • The COSG marker table.
    • The top 10 genes per cell type.
    • Subset average and percent expression matrices for the top markers.
  • **prestoResults**
    A list of outputs from the Presto marker identification step. It contains:
    • The complete list of markers.
    • The subset of markers meeting the significance criteria.
  • **combinedMarkers**
    A data frame that combines significant marker data with the average and percent expression values.
  • **goResults**
    A list of results from the GO enrichment analysis. This includes:
    • Individual GO enrichment objects per cell type.
    • A merged GO results table.
    • A filtered table of significant GO terms.
  • **keggResults**
    A list of results from the KEGG enrichment analysis. This includes:
    • Individual KEGG enrichment objects per cell type.
    • A merged KEGG results table.
    • A table of significant KEGG terms.
  • **outputFiles**
    A list of full file paths to the output files generated in the specified outputDir. Expected files include:
    • meta.tsv
    • embedding.coords.tsv
    • features.tsv
    • data_sparseMatrix.mtx.bin
    • data_sparseMatrix.index.bin
    • celltype_markers_cosg.tsv
    • celltype_average_expression_cosg.tsv
    • celltype_percent_expression_cosg.tsv
    • celltype_markers_enrichment_significant.tsv
    • celltype_markers_seurat_significant_datatable.tsv
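After the pipeline finishes, a quick way to verify the export is to check that each expected file exists in `outputDir`. The following is a minimal base R sketch; the file names are copied from the list above, while the helper name `missing_outputs()` is hypothetical.

```r
# Expected output files, copied from the list above
expected_outputs <- c(
  "meta.tsv", "embedding.coords.tsv", "features.tsv",
  "data_sparseMatrix.mtx.bin", "data_sparseMatrix.index.bin",
  "celltype_markers_cosg.tsv", "celltype_average_expression_cosg.tsv",
  "celltype_percent_expression_cosg.tsv",
  "celltype_markers_enrichment_significant.tsv",
  "celltype_markers_seurat_significant_datatable.tsv"
)

# Hypothetical helper: return the expected files not found in outputDir.
missing_outputs <- function(outputDir = "./cellanalyst", files = expected_outputs) {
  files[!file.exists(file.path(outputDir, files))]
}

# After ExportToCellAnalyst() has finished:
# missing_outputs("./cellanalyst")  # character(0) when every file is present
```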
