The **Embeddings** extension extracts latent feature representations (embeddings) from TabPFN models. These embeddings let you analyze, visualize, or reuse the representations learned by TabPFN’s transformer architecture, which is useful for downstream tasks such as clustering, feature analysis, search, or meta-learning. The extension offers two embedding extraction modes:
  • Vanilla embeddings - extracted from a model fitted once on the full training set
  • Cross-validated embeddings - extracted via K-fold cross-validation
The embeddings extension is not compatible with the tabpfn-client package since the client does not expose internal model representations yet.

Getting Started

Install the embeddings extension:
pip install "tabpfn-extensions[embedding]"
You can use a classifier or regressor depending on your task. The example below demonstrates the workflow with a TabPFNClassifier.

1. Initialize the Extractor

from tabpfn_extensions import TabPFNClassifier
from tabpfn_extensions.embedding import TabPFNEmbedding
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

# Load example dataset
X, y = fetch_openml(name="kc1", version=1, as_frame=False, return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# Initialize TabPFN model
clf = TabPFNClassifier(n_estimators=1)

# Create embedding extractor (no cross-validation)
embedding_extractor = TabPFNEmbedding(tabpfn_clf=clf, n_fold=0)

2. Extract Vanilla Embeddings

When n_fold=0, the model is fitted once on the entire training set and embeddings are extracted from that single fit.
# Train the model
embedding_extractor.fit(X_train, y_train)

# Extract embeddings for both training and test sets
train_embeddings = embedding_extractor.get_embeddings(X_train, y_train, X_test, data_source="train")
test_embeddings = embedding_extractor.get_embeddings(X_train, y_train, X_test, data_source="test")
This produces NumPy arrays containing dense vector representations of each sample.
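Once extracted, the arrays can be used like any other feature matrix. The sketch below illustrates one of the downstream uses mentioned earlier, similarity search. It uses random NumPy arrays as stand-ins for train_embeddings and test_embeddings so that it runs without TabPFN installed. It assumes the extractor may return a leading estimator axis, i.e. shape (n_estimators, n_samples, dim); check .shape on your own output before relying on this. The helpers to_2d and top_k_cosine are hypothetical names for illustration, not part of tabpfn-extensions.

```python
import numpy as np

def to_2d(emb, estimator=0):
    """Return a (n_samples, dim) matrix from a 2-D or 3-D embedding array.

    Assumption: a 3-D array is laid out as (n_estimators, n_samples, dim),
    in which case the slice for one estimator is selected.
    """
    emb = np.asarray(emb, dtype=float)
    if emb.ndim == 3:
        return emb[estimator]
    if emb.ndim == 2:
        return emb
    raise ValueError(f"expected a 2-D or 3-D array, got shape {emb.shape}")

def top_k_cosine(queries, corpus, k=3):
    """For each query row, return indices of the k most cosine-similar corpus rows."""
    q = queries / np.clip(np.linalg.norm(queries, axis=1, keepdims=True), 1e-12, None)
    c = corpus / np.clip(np.linalg.norm(corpus, axis=1, keepdims=True), 1e-12, None)
    sims = q @ c.T  # shape (n_queries, n_corpus)
    return np.argsort(-sims, axis=1)[:, :k]

# Stand-in data: random arrays in place of real TabPFN embeddings.
rng = np.random.default_rng(0)
train_emb = to_2d(rng.normal(size=(1, 100, 16)))  # mimics n_estimators=1
test_emb = to_2d(rng.normal(size=(1, 10, 16)))

# Indices of the 3 training samples closest to each test sample.
neighbors = top_k_cosine(test_emb, train_emb, k=3)
print(neighbors.shape)  # (10, 3)
```

With real embeddings, replace the random arrays with the outputs of get_embeddings; the same matrices could equally be passed to a clustering or classification model.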

3. Extract Cross-Validated Embeddings

When n_fold > 0, K-fold cross-validation is applied following the method described in “A Closer Look at TabPFN v2: Strength, Limitation, and Extension” (arXiv:2502.17361). Larger values of n_fold generally yield more robust embeddings, at the cost of one additional model fit per fold.
# Enable 5-fold cross-validation
embedding_extractor = TabPFNEmbedding(tabpfn_clf=clf, n_fold=5)

# Generate embeddings via K-fold training
embeddings = embedding_extractor.get_embeddings(X_train, y_train, X_test, data_source="test")
Under the hood, this:
  1. Splits the training data into n_fold partitions
  2. For each fold, trains a new TabPFN model on the remaining partitions
  3. Extracts embeddings from the validation partition of each fold
  4. Concatenates all fold embeddings into a unified matrix
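
The steps above follow the usual out-of-fold pattern, which can be sketched in plain NumPy. This illustrates the pattern only and is not the extension's actual code: fit_and_embed is a hypothetical callable standing in for "fit a fresh TabPFN model on the training part and embed the held-out part", and the toy embedder in the demo merely centers the features. The sketch also assumes contiguous, unshuffled folds; whether the library shuffles rows is not stated here.

```python
import numpy as np

def out_of_fold_embeddings(X, y, fit_and_embed, n_fold=5):
    """Embed every row of X using a model that never saw that row while fitting."""
    X = np.asarray(X)
    y = np.asarray(y)
    if n_fold < 2:
        raise ValueError("n_fold must be at least 2 for cross-validation")
    all_idx = np.arange(len(X))
    # Step 1: split row indices into n_fold contiguous partitions.
    folds = np.array_split(all_idx, n_fold)
    parts = []
    for val_idx in folds:
        train_idx = np.setdiff1d(all_idx, val_idx)
        # Steps 2-3: fit on the other partitions, embed the held-out one.
        parts.append(fit_and_embed(X[train_idx], y[train_idx], X[val_idx]))
    # Step 4: contiguous folds concatenate back in the original row order.
    return np.concatenate(parts, axis=0)

def toy_fit_and_embed(X_fit, y_fit, X_val):
    # Toy stand-in for a model: center held-out rows by the fitting-set mean.
    return X_val - X_fit.mean(axis=0)

X_demo = np.arange(40, dtype=float).reshape(20, 2)
y_demo = np.arange(20) % 2
oof = out_of_fold_embeddings(X_demo, y_demo, toy_fit_and_embed, n_fold=5)
print(oof.shape)  # (20, 2)
```

Because each row is embedded by a model that did not see it during fitting, the resulting matrix avoids representing a sample with a model that already had access to its label.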

Google Colab Example

Check out our Google Colab for a demo.