Overview of PyTDC Model Server

Figure 2. AI inferencing and model evaluation components. The PyTDC model server (sections 3.2 and C) streamlines retrieval, inferencing, and training setup for an array of context-aware biological foundation models and models spanning multiple modalities. A model store retrieval API provides unified access to model weights stored on the Hugging Face Model Hub, in Chan Zuckerberg CELLxGENE Census fine-tuned models, and in TDC (Huang et al., 2021; 2022; Velez-Arce et al., 2024) storage. The model server also provides access to model classes, tokenizer functions, and inference endpoints supporting PyTorch (Paszke et al., 2019) and Hugging Face Transformers (Wolf et al., 2020). Extracted embeddings, whether produced by model server inference or read from pre-computed embedding storage, are ready for downstream use by task-specific benchmarking modules.

We present PyTDC, a machine-learning platform that provides streamlined training, evaluation, and inference software for single-cell biological foundation models, with the goal of accelerating research on transfer learning methods for therapeutics. PyTDC introduces an API-first architecture that unifies heterogeneous, continuously updated data sources. The platform also introduces a model server, which provides unified access to model weights across distributed repositories together with standardized inference endpoints. The model server accelerates research workflows by exposing state-of-the-art, research-ready models and training setups for biomedical representation learning across modalities. Building upon the Therapeutics Data Commons (TDC), we present single-cell therapeutics tasks, datasets, and benchmarks for model development and evaluation.
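The idea of unified access to weights held in different repositories can be illustrated with a small registry. This is a minimal sketch under stated assumptions, not PyTDC's actual API: `ModelEntry`, `ModelStore`, `build_demo_store`, the backend labels, and the `placeholder/...` locations are all hypothetical names chosen for illustration, and the loaders are stubs that return a description instead of downloading anything.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# NOTE: every name in this sketch is hypothetical. It illustrates the
# "one call, many repositories" pattern, not the real PyTDC interface.


@dataclass(frozen=True)
class ModelEntry:
    name: str       # user-facing model name, e.g. "scGPT"
    backend: str    # label of the repository that holds the weights
    location: str   # backend-specific identifier (placeholder values here)


class ModelStore:
    """Resolve a model name to its weights through a single entry point."""

    def __init__(self) -> None:
        self._entries: Dict[str, ModelEntry] = {}
        self._loaders: Dict[str, Callable[[str], object]] = {}

    def register_backend(self, backend: str, loader: Callable[[str], object]) -> None:
        # A loader knows how to fetch weights from one kind of repository.
        self._loaders[backend] = loader

    def register_model(self, entry: ModelEntry) -> None:
        if entry.backend not in self._loaders:
            raise KeyError(f"unknown backend: {entry.backend}")
        # Names are matched case-insensitively.
        self._entries[entry.name.lower()] = entry

    def load(self, name: str) -> object:
        try:
            entry = self._entries[name.lower()]
        except KeyError:
            raise KeyError(f"model not registered: {name}") from None
        # Dispatch to whichever backend holds this model.
        return self._loaders[entry.backend](entry.location)


def build_demo_store() -> ModelStore:
    store = ModelStore()
    # Stub loaders: a real loader would download and deserialize weights.
    for backend in ("huggingface", "cellxgene_census", "tdc"):
        store.register_backend(
            backend, lambda loc, b=backend: {"source": b, "location": loc}
        )
    # Placeholder locations, not real repository identifiers.
    store.register_model(ModelEntry("scGPT", "huggingface", "placeholder/scgpt"))
    store.register_model(ModelEntry("Geneformer", "huggingface", "placeholder/geneformer"))
    store.register_model(ModelEntry("scVI", "huggingface", "placeholder/scvi"))
    return store


if __name__ == "__main__":
    store = build_demo_store()
    print(store.load("scGPT"))
```

The caller only supplies a model name; which repository serves the weights is a detail of the registry. Swapping a stub for a real downloader would not change the calling code.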

Here you can find an overview of all the models available in the system.

scFMs	Single-cell Foundation Models	Single-cell foundation models (scFMs) are a class of large pretrained models for analyzing and interpreting single-cell data. They are designed to capture the complex relationships and interactions within single-cell datasets, giving researchers deeper insight into cellular heterogeneity, cell type identification, and functional characterization.
scGPT	scGPT is a generative pre-trained transformer model designed for single-cell RNA sequencing (scRNA-seq) data. It uses a transformer architecture to capture relationships between genes and cells, supporting cell type identification, differential expression analysis, and trajectory inference.	Hugging Face Model Hub
Geneformer	Geneformer is a foundational transformer model pretrained on a large-scale corpus of single-cell transcriptomes to enable context-aware predictions in network biology settings with limited data.	Hugging Face Model Hub
scVI	Single-cell variational inference (scVI) is a tool for the probabilistic analysis of single-cell transcriptomics data. It uses deep generative models to address technical noise and batch effects, providing a robust framework for a range of downstream analysis tasks. To load the pre-trained model, use the files listed under the Files and versions tab.	Hugging Face Model Hub