TabPFN’s internal preprocessing pipeline is one of the most powerful tuning levers. Each estimator in the ensemble cycles through a list of preprocessing configurations, creating diversity.
These settings control how features are transformed before being fed to the transformer.
## Configuration Options
| Field | Default | Options |
|---|---|---|
| `name` | (required) | `"quantile_uni"`, `"squashing_scaler_default"`, `"safepower"`, `"quantile_uni_coarse"`, `"kdi"`, `"robust"`, `"none"` |
| `categorical_name` | `"none"` | `"none"`, `"numeric"`, `"onehot"`, `"ordinal"`, `"ordinal_shuffled"`, `"ordinal_very_common_categories_shuffled"` |
| `append_original` | `False` | `True`, `False`, `"auto"` |
| `max_features_per_estimator` | `500` | int; features are subsampled above this limit |
| `global_transformer_name` | `None` | `None`, `"svd"`, `"svd_quarter_components"` |
For maximal diversity, supply as many different preprocessing transforms as you have estimators (default 8); each estimator is assigned the next transform in the list, wrapping around when the list is shorter than the ensemble.
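The cycling behaviour can be sketched with plain Python. The transform names below are a small subset chosen for illustration (not the full default list), and the real pipeline pairs each estimator with a complete preprocessing configuration, not a bare string:

```python
from itertools import cycle

# Four transforms, eight estimators: each transform is reused twice.
transforms = ["quantile_uni", "safepower", "robust", "none"]
n_estimators = 8

assignment = [t for _, t in zip(range(n_estimators), cycle(transforms))]
print(assignment)
# ['quantile_uni', 'safepower', 'robust', 'none',
#  'quantile_uni', 'safepower', 'robust', 'none']
```

With a list as long as the ensemble, every estimator sees a distinct preprocessing pipeline, which is where the extra diversity comes from.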
For regression tasks, you can control how the target variable y is transformed. This is especially useful for skewed targets:
```python
from tabpfn import TabPFNRegressor

model = TabPFNRegressor(
    inference_config={
        "REGRESSION_Y_PREPROCESS_TRANSFORMS": (
            "none",
            "safepower",
            "quantile_norm",
            "quantile_uni",
            "1_plus_log",
        ),
    },
)
```
| Transform | When to Use |
|---|---|
| `"none"` | Symmetric, well-behaved targets |
| `"safepower"` | Skewed targets (handles negatives) |
| `"quantile_norm"` | Heavily skewed or multi-modal targets |
| `"quantile_uni"` | Alternative to `quantile_norm` |
| `"1_plus_log"` | Non-negative targets with a large range |
Adding more transforms to the tuple increases ensemble diversity, which helps when the target distribution is non-trivial.
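To see why a log-style transform helps a non-negative target spanning several orders of magnitude, this stdlib-only sketch applies `log1p` (the operation behind a `"1_plus_log"`-style transform; TabPFN's exact implementation may differ) to a toy target:

```python
import math

# A target spanning four orders of magnitude.
y = [1.0, 10.0, 100.0, 1000.0, 10000.0]
y_log = [math.log1p(v) for v in y]

# The raw spread lets the largest values dominate any scale-sensitive step;
# log1p compresses the ratio between the extremes from 10000x to ~13x.
print(max(y) / min(y))          # 10000.0
print(max(y_log) / min(y_log))  # ~13.3
```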
## Other Inference Settings
```python
from tabpfn import TabPFNClassifier

model = TabPFNClassifier(
    inference_config={
        "POLYNOMIAL_FEATURES": "no",    # "no", an int, or "all" for O(n^2) interactions
        "FINGERPRINT_FEATURE": True,    # hash-based row identifier
        "OUTLIER_REMOVAL_STD": "auto",  # "auto" (12.0), None, or a float
        "SUBSAMPLE_SAMPLES": None,      # None, int, float, or list
    },
)
```
`POLYNOMIAL_FEATURES`: Generates interaction features. Can help when interactions matter, but increases the feature count quadratically.
`FINGERPRINT_FEATURE`: Adds a hash-based row identifier. Useful by default; try disabling it if you have very few features.
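A hypothetical sketch of the idea behind a hash-based row identifier: derive a deterministic pseudo-random value from a row's contents, so identical rows share a fingerprint. TabPFN's actual fingerprint feature may be computed differently:

```python
import hashlib

def row_fingerprint(row):
    """Map a row's contents to a deterministic float in [0, 1]."""
    digest = hashlib.md5(repr(tuple(row)).encode()).hexdigest()
    return int(digest[:8], 16) / 0xFFFFFFFF

X = [[1.0, 2.0], [1.0, 2.0], [3.0, 4.0]]
fps = [row_fingerprint(r) for r in X]
print(fps[0] == fps[1])  # identical rows -> identical fingerprint: True
print(fps[0] == fps[2])  # different rows -> different fingerprint: False
```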
`OUTLIER_REMOVAL_STD`: Removes extreme outliers before fitting. Lower values are more aggressive.
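A minimal illustration of standard-deviation-based outlier filtering, assuming a simple "drop values beyond k·σ of the mean" rule; TabPFN's internal procedure may clip rather than drop, so treat this as a sketch of the threshold semantics only:

```python
from statistics import mean, stdev

def remove_outliers(values, n_std=12.0):
    """Keep values within n_std standard deviations of the mean."""
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) <= n_std * sigma]

data = [1.0, 2.0, 3.0, 2.0, 1.0, 500.0]
print(remove_outliers(data, n_std=2.0))   # 500.0 is dropped
print(remove_outliers(data, n_std=12.0))  # the default is very permissive
```

The generous default of 12.0 explains why "lower values are more aggressive": at 2.0 the extreme point is removed, while at 12.0 everything survives.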
`SUBSAMPLE_SAMPLES`: Subsamples training rows for faster iteration during experimentation.
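A sketch of the int-vs-float semantics, assuming the common convention (worth verifying against the TabPFN documentation) that an int is an absolute row count and a float is a fraction of the training set:

```python
import random

def subsample_indices(n_rows, subsample, seed=0):
    """Illustrative: None keeps all rows, int = count, float = fraction."""
    if subsample is None:
        return list(range(n_rows))
    k = subsample if isinstance(subsample, int) else max(1, int(n_rows * subsample))
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_rows), k))

print(len(subsample_indices(1000, None)))  # 1000
print(len(subsample_indices(1000, 200)))   # 200
print(len(subsample_indices(1000, 0.1)))   # 100
```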