7: Working with Squidpy

Demonstrate the joint usage of spatialcells and squidpy to analyze 4i dataset by Gut et al, 2018

@author: Guihong Wan and Boshen Yan
@date: Feb 15 2024
@last updated: Feb 15 2024

Installing the environment

[1]:
%%bash

# conda env create --name spatialcells_env --file=conda.yaml
# conda activate spatialcells_env
# pip install scimap
# conda install -c conda-forge squidpy
[2]:
import numpy as np
import matplotlib.pyplot as plt
import squidpy as sq

import spatialcells as spc

Squidpy analysis

Original tutorial analysis for 4i from squidpy: link

[3]:
adata = sq.datasets.four_i()
100%|██████████| 173M/173M [00:08<00:00, 22.4MB/s]
/Users/ghwan/opt/anaconda3/envs/spatialcells_env/lib/python3.10/site-packages/anndata/__init__.py:51: FutureWarning: `anndata.read` is deprecated, use `anndata.read_h5ad` instead. `ad.read` will be removed in mid 2024.
  warnings.warn(
/Users/ghwan/opt/anaconda3/envs/spatialcells_env/lib/python3.10/site-packages/anndata/_core/anndata.py:1908: UserWarning: Variable names are not unique. To make them unique, call `.var_names_make_unique`.
  utils.warn_names_duplicates("var")
[4]:
sq.pl.spatial_scatter(
    adata, color="cluster", shape=None, figsize=(10, 10)
)
WARNING: Please specify a valid `library_id` or set it permanently in `adata.uns['spatial']`
/Users/ghwan/opt/anaconda3/envs/spatialcells_env/lib/python3.10/site-packages/squidpy/pl/_spatial_utils.py:955: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap', 'norm' will be ignored
  _cax = scatter(
../_images/tutorial_7_tutorial_squidpy_joint_analysis_6_2.png
[5]:
# Build spatial neighbors graph
sq.gr.spatial_neighbors(adata, coord_type="generic")
sq.gr.nhood_enrichment(adata, cluster_key="cluster")
sq.pl.nhood_enrichment(adata, cluster_key="cluster", method="ward", vmin=-100, vmax=100)
/Users/ghwan/opt/anaconda3/envs/spatialcells_env/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
100%|██████████| 1000/1000 [00:22<00:00, 45.12/s]
/Users/ghwan/opt/anaconda3/envs/spatialcells_env/lib/python3.10/site-packages/anndata/_core/anndata.py:522: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.
  warnings.warn(
/Users/ghwan/opt/anaconda3/envs/spatialcells_env/lib/python3.10/site-packages/squidpy/pl/_utils.py:555: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  row_labels = adata.obs[key][row_order]
../_images/tutorial_7_tutorial_squidpy_joint_analysis_7_1.png

The squidpy tutorial allows calculating the neighborhood enrichment score for the entire sample. However, we may only be interested in a region of the image, for example, only around the three cells on the left. Spatialcells simplifies the region annotation, while providing convenient methods for region-based quantification.

[6]:
adata.obs["X_centroid"] = adata.obsm["spatial"][:, 0]
adata.obs["Y_centroid"] = adata.obsm["spatial"][:, 1]
adata.obs
[6]:
x y is_nucleous is_cytplasm cell_id leiden cluster X_centroid Y_centroid
908860 1777 1201 False True 140 1 Cell_periphery_1 1777 1201
682735 1447 982 True False 118 5 Nucleus 1447 982
806699 2089 1104 False True 136 7 Endosomes_golgi_1 2089 1104
790375 1306 1089 True False 127 0 Nucleolus 1306 1089
835641 1902 1133 False True 142 4 ER_mitochondria_2 1902 1133
... ... ... ... ... ... ... ... ... ...
961485 1364 1246 False True 143 1 Cell_periphery_1 1364 1246
806705 2095 1104 False True 136 7 Endosomes_golgi_1 2095 1104
603424 1386 909 False True 118 2 Cell_periphery_2 1386 909
839951 2033 1137 True False 136 0 Nucleolus 2033 1137
768582 2054 1067 True False 136 5 Nucleus 2054 1067

270876 rows × 9 columns

[7]:
marker = ["is_nucleous"]
communitycolumn = "Nucleolus"
ret = spc.spatial.getCommunities(adata, marker, eps=10, newcolumn=communitycolumn)

fig, ax = plt.subplots(figsize=(10, 4))
spc.plt.plotCommunities(
    adata, ret, communitycolumn, plot_first_n_clusters=20, s=2, fontsize=10, ax=ax
)
ax.invert_yaxis()
plt.show()
../_images/tutorial_7_tutorial_squidpy_joint_analysis_10_0.png
[8]:
# selecting cells of interest: the three cells on the left
communityIndexList = [1, 10, 11]
boundary = spc.spatial.getBoundary(
    adata, communitycolumn, communityIndexList, alpha=10
)
roi_boundary = spc.spa.getExtendedBoundary(boundary, offset=40)

markersize = 1
fig, ax = plt.subplots(figsize=(10, 4))
## all points
ax.scatter(
    *zip(*adata.obs[["X_centroid", "Y_centroid"]].to_numpy()),
    s=markersize,
    color="grey",
    alpha=0.2
)
# Points in selected commnities
xy = adata.obs[adata.obs[communitycolumn].isin(communityIndexList)][
    ["X_centroid", "Y_centroid"]
].to_numpy()
ax.scatter(xy[:, 0], xy[:, 1], s=markersize, color="r")
# Bounds of points in selected commnities
spc.plt.plotBoundary(boundary, ax=ax, label="Boundary", color="b")
spc.plt.plotBoundary(roi_boundary, ax=ax, label="ROI boundary", color="g")
ax.invert_yaxis()
plt.show()
../_images/tutorial_7_tutorial_squidpy_joint_analysis_11_0.png
[9]:
spc.spatial.assignPointsToRegions(
    adata,
    [boundary, roi_boundary],
    ["Nucleolus", "Nucleolus_ROI"],
    assigncolumn="region",
    default="Others",
)

point_size = 1
fig, ax = plt.subplots(figsize=(10, 4))
for region in sorted(set(adata.obs["region"])):
    tmp = adata.obs[adata.obs.region == region]
    ax.scatter(
        *zip(*tmp[["X_centroid", "Y_centroid"]].to_numpy()),
        s=point_size,
        alpha=0.7,
        label=region
    )
# Bounds of points in selected commnities
spc.plt.plotBoundary(boundary, ax=ax, label="Boundary", color="purple")
spc.plt.plotBoundary(roi_boundary, ax=ax, label="ROI boundary", color="r")
plt.legend(loc="upper right")
ax.invert_yaxis()
plt.show()
45636it [00:02, 19639.93it/s]
Assigned points to region: Nucleolus
40441it [00:01, 24158.65it/s]
Assigned points to region: Nucleolus_ROI
../_images/tutorial_7_tutorial_squidpy_joint_analysis_12_4.png
[10]:
roi_adata = adata[adata.obs.region != "Others"].copy()
roi_adata.obsm["spatial"] = np.array(roi_adata.obs[["X_centroid", "Y_centroid"]])

sq.gr.spatial_neighbors(roi_adata, coord_type="generic")
sq.gr.nhood_enrichment(roi_adata, cluster_key="cluster")
sq.pl.nhood_enrichment(roi_adata, cluster_key="cluster", method="ward", vmin=-100, vmax=100)
/Users/ghwan/opt/anaconda3/envs/spatialcells_env/lib/python3.10/site-packages/anndata/_core/anndata.py:1908: UserWarning: Variable names are not unique. To make them unique, call `.var_names_make_unique`.
  utils.warn_names_duplicates("var")
100%|██████████| 1000/1000 [00:08<00:00, 117.28/s]
/Users/ghwan/opt/anaconda3/envs/spatialcells_env/lib/python3.10/site-packages/anndata/_core/anndata.py:522: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.
  warnings.warn(
/Users/ghwan/opt/anaconda3/envs/spatialcells_env/lib/python3.10/site-packages/squidpy/pl/_utils.py:555: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  row_labels = adata.obs[key][row_order]
../_images/tutorial_7_tutorial_squidpy_joint_analysis_13_1.png

Quantifying the protein composition within each region

[11]:
nucleolus_comp = spc.msmt.getRegionComposition(adata, "cluster", regions=["Nucleolus"])
nucleolus_comp.rename(columns={"cell_count": "protein_count"}, inplace=True)
nucleolus_comp
[11]:
cluster protein_count composition
0 Nucleolus 10883 0.540528
1 Nucleus 8035 0.399076
2 Endosomes_golgi_1 534 0.026522
3 Nuclear_envelope 316 0.015695
4 ER_mitochondria_1 306 0.015198
5 Cell_periphery_3 57 0.002831
6 Endosomes_golgi_2 2 0.000099
7 ER_mitochondria_2 1 0.000050
8 Cell_periphery_1 0 0.000000
9 Cell_periphery_2 0 0.000000
[12]:
roi_comp = spc.msmt.getRegionComposition(adata, "cluster", regions=["Nucleolus_ROI"])
roi_comp.rename(columns={"cell_count": "protein_count"}, inplace=True)
roi_comp
[12]:
cluster protein_count composition
0 ER_mitochondria_1 7690 0.256376
1 Cell_periphery_3 5687 0.189598
2 ER_mitochondria_2 4836 0.161227
3 Endosomes_golgi_1 4643 0.154792
4 Cell_periphery_1 4495 0.149858
5 Cell_periphery_2 2117 0.070578
6 Nuclear_envelope 278 0.009268
7 Nucleus 232 0.007735
8 Nucleolus 12 0.000400
9 Endosomes_golgi_2 5 0.000167
[13]:
combined_comp = spc.msmt.getRegionComposition(adata, "cluster", regions=["Nucleolus", "Nucleolus_ROI"])
combined_comp.rename(columns={"cell_count": "protein_count"}, inplace=True)
combined_comp
[13]:
cluster protein_count composition
0 Nucleolus 10895 0.217339
1 Nucleus 8267 0.164915
2 ER_mitochondria_1 7996 0.159508
3 Cell_periphery_3 5744 0.114584
4 Endosomes_golgi_1 5177 0.103274
5 ER_mitochondria_2 4837 0.096491
6 Cell_periphery_1 4495 0.089669
7 Cell_periphery_2 2117 0.042231
8 Nuclear_envelope 594 0.011849
9 Endosomes_golgi_2 7 0.000140

Comparison with the protein composition of the entire sample.

[14]:
overall_comp = spc.msmt.getRegionComposition(adata, "cluster")
overall_comp.rename(columns={"cell_count": "protein_count"}, inplace=True)
overall_comp
[14]:
cluster protein_count composition
0 Nucleolus 51802 0.191239
1 Cell_periphery_1 37885 0.139861
2 Cell_periphery_2 36226 0.133736
3 ER_mitochondria_1 33660 0.124264
4 ER_mitochondria_2 32378 0.119531
5 Nucleus 28458 0.105059
6 Cell_periphery_3 22788 0.084127
7 Endosomes_golgi_1 22402 0.082702
8 Nuclear_envelope 3674 0.013563
9 Endosomes_golgi_2 1603 0.005918
[ ]: