The Embeddings extension extracts latent feature representations (embeddings) from TabPFN models. These dense vectors capture the representations learned by TabPFN’s transformer and can be reused for downstream tasks such as clustering, search, visualization, meta-learning, or as features for a simpler model.
TabPFNEmbedding is a scikit-learn-style transformer with the familiar fit / fit_transform / transform API. It supports two extraction modes:
  • Out-of-fold embeddings (n_fold >= 2, recommended) — robust, leakage-free training-set embeddings extracted via K-fold cross-validation. These generalize better and give stronger downstream performance.
  • Vanilla embeddings (n_fold=0) — a single model is trained on the full dataset and used for everything; cheaper, but the training embeddings leak label information.
Embeddings require the full local tabpfn package — they expose internal model representations that the tabpfn-client cloud backend does not provide. Passing a client model raises a TypeError.

Getting Started

The embedding module ships in the base tabpfn-extensions package (no extra needed). Install it alongside the local tabpfn engine:
pip install tabpfn tabpfn-extensions

The Interface

TabPFNEmbedding follows the scikit-learn transformer contract, and the method you call depends on whose embeddings you want:
Method | Use for | Returns
--- | --- | ---
fit_transform(X_train, y_train) | the training data | Out-of-fold embeddings when n_fold >= 2 (no label leakage); full-model embeddings when n_fold == 0
transform(X) | unseen / held-out data | Embeddings from the model trained on the full training set
transform always runs through the final full-data model. It never returns cached training embeddings, even if X happens to equal the training set — so for leakage-free training embeddings, always use fit_transform (or read the train_embeddings_ attribute after fit).
Pass a configured TabPFN model via the model= parameter. Use a classifier or regressor depending on your task; the inline examples below use a classifier, and the example script covers both.

1. Out-of-fold embeddings

With n_fold >= 2, the training data is split into K folds. For each fold, a fresh model is trained on the remaining folds and used to embed the held-out partition. The out-of-fold (OOF) embeddings are reassembled into the original sample order, and a final model is refit on the full training set to embed unseen data.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier
from tabpfn_extensions.embedding import TabPFNEmbedding

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)

embedding = TabPFNEmbedding(
    n_fold=10,
    model=TabPFNClassifier(n_estimators=1, random_state=42),
)

train_embeddings = embedding.fit_transform(X_train, y_train)  # out-of-fold
test_embeddings = embedding.transform(X_test)                 # final model
Why prefer out-of-fold embeddings? Vanilla embeddings use a single model to embed both the training and the test rows. The training rows were seen by that model together with their targets, whereas the test rows were not, which risks information leakage. OOF embeddings remove this leakage: every training point is embedded by a model that never saw it, so the training embeddings match the statistics of the held-out embeddings produced by transform. This is the robust variant introduced in “A Closer Look at TabPFN v2: Strength, Limitation, and Extension” (arXiv:2502.17361), and larger n_fold values yield more robust embeddings.
In practice this lifts downstream performance: the get_embeddings.py example compares a baseline linear model, vanilla TabPFN embeddings, and K-fold embeddings on the same data, and the K-fold embeddings come out ahead for both classification accuracy and regression R².
Classifiers use StratifiedKFold and regressors use KFold. Set shuffle=True (with an optional random_state) to shuffle the split. n_fold=1 is invalid; use 0 for vanilla or >= 2 for cross-validation.
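The out-of-fold procedure described above can be sketched with NumPy alone. This is only an illustration of the mechanism, not the extension's implementation: toy_embed is a hypothetical stand-in for a fitted TabPFN model (it merely standardizes rows using statistics from the rows it was "trained" on), and the folds are plain contiguous splits rather than StratifiedKFold or KFold.

```python
import numpy as np


def toy_embed(X_context, X_query):
    """Hypothetical stand-in for a fitted model: standardize the query rows
    using statistics computed on the context (training) rows only."""
    mean = X_context.mean(axis=0)
    std = X_context.std(axis=0) + 1e-8  # avoid division by zero
    return (X_query - mean) / std


def out_of_fold_embeddings(X, n_fold):
    """Embed every row with a 'model' that never saw that row."""
    n_samples = X.shape[0]
    folds = np.array_split(np.arange(n_samples), n_fold)  # contiguous, unshuffled
    oof = np.empty_like(X, dtype=float)
    for held_out in folds:
        train_mask = np.ones(n_samples, dtype=bool)
        train_mask[held_out] = False
        # Fit on the other folds, embed only the held-out partition,
        # and write the result back at the original row positions.
        oof[held_out] = toy_embed(X[train_mask], X[held_out])
    return oof


rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 3))
X_test = rng.normal(size=(5, 3))

train_emb = out_of_fold_embeddings(X_train, n_fold=5)  # analogous to fit_transform
test_emb = toy_embed(X_train, X_test)                  # final "model" sees all training rows
print(train_emb.shape, test_emb.shape)
```

Writing each fold's result back by index is what keeps the output aligned with the original sample order, so the embeddings can be paired with y_train directly.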

2. Vanilla embeddings

With n_fold=0, a single model is trained on the entire training set and reused for both training and unseen data. This is cheaper (one fit instead of K + 1) and fine when you only need embeddings for unseen data via transform, but avoid it for training-set embeddings you plan to feed into a downstream model — see the leakage caveat above.
embedding = TabPFNEmbedding(
    n_fold=0,
    model=TabPFNClassifier(n_estimators=1, random_state=42),
)

train_embeddings = embedding.fit_transform(X_train, y_train)  # training data
test_embeddings = embedding.transform(X_test)                 # unseen data
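To make the cost difference between the two modes concrete, here is a small sketch that counts model fits for a given n_fold. count_fits is a hypothetical helper written for this illustration and is not part of tabpfn-extensions; the ValueError for invalid values is likewise an assumption, since the exception the library raises is not specified here.

```python
def count_fits(n_fold: int) -> int:
    """Number of model fits implied by an n_fold setting (illustrative helper)."""
    if n_fold == 0:
        return 1            # vanilla: one model on the full training set
    if n_fold >= 2:
        return n_fold + 1   # one fit per fold plus the final full-data refit
    # n_fold == 1 (and, by assumption here, negative values) is rejected.
    raise ValueError("n_fold must be 0 (vanilla) or >= 2 (cross-validation)")


for k in (0, 2, 10):
    print(f"n_fold={k}: {count_fits(k)} fit(s)")
```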
Output shape. Both fit_transform and transform return a 3D array of shape (n_estimators, n_samples, embed_dim) — one embedding matrix per ensemble member. This is not a drop-in 2D input for an sklearn Pipeline. Select a single member (embeddings[0]) or aggregate across axis=0 before passing the result to a downstream estimator.
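As a rough sketch of the options for reducing the 3D output to 2D, the snippet below uses a random array as a stand-in for real embeddings; the sizes (4 estimators, 100 samples, 192 dimensions) are arbitrary assumptions. Concatenating members along the feature axis is shown as one further possibility beyond selecting or averaging.

```python
import numpy as np

# Stand-in for the output of fit_transform / transform:
# shape (n_estimators, n_samples, embed_dim). The sizes are arbitrary.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(4, 100, 192))

first_member = embeddings[0]            # (100, 192): one ensemble member
mean_pooled = embeddings.mean(axis=0)   # (100, 192): average over members
# Concatenate members along the feature axis: (100, 4 * 192)
concatenated = np.concatenate(list(embeddings), axis=1)

print(first_member.shape, mean_pooled.shape, concatenated.shape)
```

With n_estimators=1, as in the examples on this page, all three options carry the same information, and embeddings[0] is the simplest choice.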

Using Embeddings as Features

A common pattern is to use TabPFN embeddings as features for a lightweight downstream model. Because the embeddings are 3D, select an ensemble member (embeddings[0]) to get a 2D feature matrix.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier
from tabpfn_extensions.embedding import TabPFNEmbedding

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)

embedding = TabPFNEmbedding(
    n_fold=10,
    model=TabPFNClassifier(n_estimators=1, random_state=42),
)
train_embeddings = embedding.fit_transform(X_train, y_train)
test_embeddings = embedding.transform(X_test)

clf = LogisticRegression(max_iter=5000)
clf.fit(train_embeddings[0], y_train)          # pick ensemble member 0 -> 2D
y_pred = clf.predict(test_embeddings[0])
print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")
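The same 2D slice can serve the other downstream uses mentioned earlier, such as search. The sketch below runs a cosine-similarity nearest-neighbor lookup with NumPy; cosine_top_k is a hypothetical helper, and random arrays stand in for train_embeddings[0] and test_embeddings[0].

```python
import numpy as np


def cosine_top_k(query, corpus, k=3):
    """Return indices of the k corpus rows most similar to each query row."""
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    similarity = q @ c.T  # (n_query, n_corpus)
    # argsort is ascending, so reverse each row and keep the first k columns
    return np.argsort(similarity, axis=1)[:, ::-1][:, :k]


rng = np.random.default_rng(0)
train_2d = rng.normal(size=(50, 16))  # stand-in for train_embeddings[0]
test_2d = rng.normal(size=(5, 16))    # stand-in for test_embeddings[0]

neighbors = cosine_top_k(test_2d, train_2d, k=3)
print(neighbors.shape)
```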

Parameters

Parameter | Type | Default | Description
--- | --- | --- | ---
n_fold | int | 0 | 0 disables CV (vanilla). >= 2 enables K-fold out-of-fold embeddings. 1 is invalid.
model | TabPFNClassifier, TabPFNRegressor, or None | None | Pre-configured TabPFN estimator. When None, the task is inferred from y at fit time (with a warning). Passing it explicitly is recommended.
shuffle | bool | False | Whether to shuffle the K-fold split.
random_state | int or None | None | Seed used by the K-fold split when shuffle=True.
After fitting, two attributes are available: model_ (the fitted full-data model) and train_embeddings_ (the training-set embeddings, OOF when n_fold >= 2).
Migration. The old get_embeddings(X_train, y_train, X, data_source=...) method and the tabpfn_clf / tabpfn_reg constructor arguments are deprecated. Use model= together with fit_transform (training, OOF) and transform (unseen data) instead.

Example Script

Full runnable example for classification and regression.

Google Colab Example

Check out our Google Colab for a demo.