1. What is the meaning of each column title in the data table?
p_val :   The p-value (probability value) under the statistical model used to compare the two groups. The smaller the p-value, the higher the significance.
avg_diff :   Log fold-change of the average expression between the two groups. Positive values indicate that the gene is more highly expressed in the first group.
pct.1 :   The percentage of cells in which the gene is detected in the current group.
pct.2 :   The percentage of cells in which the gene is detected in the other groups.
Annotation :   Defined cell type.
gene :   Gene symbol.
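The columns above can be combined to shortlist marker genes. The following is a minimal sketch, assuming the table is available as comma-separated text with the column titles listed above. The rows, the gene names, the thresholds (max_p, min_diff) and the helper select_markers are made up for illustration and are not taken from the real data.

```python
import csv
import io

# Hypothetical rows for illustration only; not taken from the real tables.
SAMPLE = """gene,p_val,avg_diff,pct.1,pct.2,Annotation
GeneA,1e-30,1.8,0.92,0.10,Example cell type
GeneB,0.20,0.3,0.40,0.35,Example cell type
GeneC,1e-12,-1.1,0.05,0.60,Example cell type
"""

def select_markers(rows, max_p=0.01, min_diff=0.5):
    """Keep genes that are significant (small p_val) and more highly
    expressed in the first group (positive avg_diff)."""
    kept = []
    for row in rows:
        p_val = float(row["p_val"])
        avg_diff = float(row["avg_diff"])
        if p_val <= max_p and avg_diff >= min_diff:
            kept.append(row["gene"])
    return kept

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
markers = select_markers(rows)
print(markers)
```

GeneC is excluded even though its p_val is small, because its negative avg_diff means it is more highly expressed in the other groups.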
2. How did the ECL do the cell type annotation?
We read as many papers as possible to match cluster-specific genes to known cell types. If a cell type has not been described before, we call it XXX_high cells, based on its most specific marker XXX. However, the annotation might not always be accurate. We would really appreciate your help in correcting annotations on the ECL.
3. Why do the BAM files on GEO not contain the cell barcode and molecular index?
Because of the huge amount of data, the raw data we uploaded to GEO are the QC-filtered and trimmed BAM files, in which the cell barcode is tagged with XC and the UMI is tagged with XM. GEO changed the storage format to SRA files, in which barcodes should be included as a spot group instead of as a custom tag, so the custom XC and XM tags were not carried over. We will soon provide a local server hosting the original BAM files.
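For readers who obtain the original BAM files, the XC and XM tags can be read from the optional fields of each alignment record. The following is a minimal sketch, assuming the BAM file has first been converted to SAM text (for example with samtools view). The helper barcode_and_umi and the example record, including the barcode and UMI sequences and their lengths, are hypothetical.

```python
def barcode_and_umi(sam_line):
    """Return (cell barcode, UMI) from one SAM alignment line.

    A SAM record has 11 mandatory tab-separated fields, followed by
    optional fields of the form TAG:TYPE:VALUE.
    """
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    # XC holds the cell barcode and XM holds the UMI; None if absent.
    return tags.get("XC"), tags.get("XM")

# Hypothetical record for illustration only.
EXAMPLE = "\t".join([
    "read1", "0", "chr1", "100", "255", "10M", "*", "0", "0",
    "ACGTACGTAC", "FFFFFFFFFF", "XC:Z:AAACCCGGGTTT", "XM:Z:ACGTAC",
])

cell, umi = barcode_and_umi(EXAMPLE)
print(cell, umi)
```

A record without these tags, such as one extracted from the SRA version, would return None for both values.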
4. How accurate is the cell number ratio in different tissues?
The ratio might be affected by the cell digestion method as well as the gene expression profiling method. The cell number ratio identified by Microwell-seq might differ from the ratio identified by other methods such as FACS.
5. How to find the corresponding dataset and code in this work?