TabPFN’s internal preprocessing pipeline is one of the most powerful tuning levers. Each estimator in the ensemble cycles through a list of preprocessing configurations, creating diversity.
These settings control how features are transformed before being fed to the transformer.
## Configuration Options
| Field | Default | Options |
|---|---|---|
| `name` | (required) | `"quantile_uni"`, `"squashing_scaler_default"`, `"safepower"`, `"quantile_uni_coarse"`, `"kdi"`, `"robust"`, `"none"` |
| `categorical_name` | `"none"` | `"none"`, `"numeric"`, `"onehot"`, `"ordinal"`, `"ordinal_shuffled"`, `"ordinal_very_common_categories_shuffled"` |
| `append_original` | `False` | `True`, `False`, `"auto"` |
| `max_features_per_estimator` | `500` | int; features are subsampled above this limit |
| `global_transformer_name` | `None` | `None`, `"svd"`, `"svd_quarter_components"` |
For maximal diversity, supply as many different preprocessing transforms as you have estimators (default 8); each estimator is assigned the next transform in the list, wrapping around when the list is shorter than the ensemble.
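The cycling behaviour can be sketched with plain Python. The transform names below are a small subset chosen for illustration (not the full default list), and the real pipeline pairs each estimator with a complete preprocessing configuration, not a bare string:

```python
from itertools import cycle

# Four transforms, eight estimators: each transform is reused twice.
transforms = ["quantile_uni", "safepower", "robust", "none"]
n_estimators = 8

assignment = [t for _, t in zip(range(n_estimators), cycle(transforms))]
print(assignment)
# ['quantile_uni', 'safepower', 'robust', 'none',
#  'quantile_uni', 'safepower', 'robust', 'none']
```

With a list as long as the ensemble, every estimator sees a distinct preprocessing pipeline, which is where the extra diversity comes from.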
For regression tasks, you can control how the target variable y is transformed. This is especially useful for skewed targets:
```python
from tabpfn import TabPFNRegressor

model = TabPFNRegressor(
    inference_config={
        "REGRESSION_Y_PREPROCESS_TRANSFORMS": (
            "none",
            "safepower",
            "quantile_norm",
            "quantile_uni",
            "1_plus_log",
        ),
    },
)
```
| Transform | When to Use |
|---|---|
| `"none"` | Symmetric, well-behaved targets |
| `"safepower"` | Skewed targets (handles negatives) |
| `"quantile_norm"` | Heavily skewed or multi-modal targets |
| `"quantile_uni"` | Alternative to `quantile_norm` |
| `"1_plus_log"` | Non-negative targets with a large range |
Adding more transforms to the tuple increases ensemble diversity, which helps when the target distribution is non-trivial.
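To see why a log-style transform helps a non-negative target spanning several orders of magnitude, this stdlib-only sketch applies `log1p` (the operation behind a `"1_plus_log"`-style transform; TabPFN's exact implementation may differ) to a toy target:

```python
import math

# A target spanning four orders of magnitude.
y = [1.0, 10.0, 100.0, 1000.0, 10000.0]
y_log = [math.log1p(v) for v in y]

# The raw spread lets the largest values dominate any scale-sensitive step;
# log1p compresses the ratio between the extremes from 10000x to ~13x.
print(max(y) / min(y))          # 10000.0
print(max(y_log) / min(y_log))  # ~13.3
```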
## Other Inference Settings
```python
from tabpfn import TabPFNClassifier

model = TabPFNClassifier(
    inference_config={
        "POLYNOMIAL_FEATURES": "no",    # "no", an int, or "all" for O(n^2) interactions
        "FINGERPRINT_FEATURE": True,    # hash-based row identifier
        "OUTLIER_REMOVAL_STD": "auto",  # "auto" (12.0), None, or a float
        "SUBSAMPLE_SAMPLES": None,      # None, int, float, or list
    },
)
```
`POLYNOMIAL_FEATURES`: Generates interaction features. Can help when interactions matter, but increases the feature count quadratically.
`FINGERPRINT_FEATURE`: Adds a hash-based row identifier. Useful by default; try disabling it if you have very few features.
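A hypothetical sketch of the idea behind a hash-based row identifier: derive a deterministic pseudo-random value from a row's contents, so identical rows share a fingerprint. TabPFN's actual fingerprint feature may be computed differently:

```python
import hashlib

def row_fingerprint(row):
    """Map a row's contents to a deterministic float in [0, 1]."""
    digest = hashlib.md5(repr(tuple(row)).encode()).hexdigest()
    return int(digest[:8], 16) / 0xFFFFFFFF

X = [[1.0, 2.0], [1.0, 2.0], [3.0, 4.0]]
fps = [row_fingerprint(r) for r in X]
print(fps[0] == fps[1])  # identical rows -> identical fingerprint: True
print(fps[0] == fps[2])  # different rows -> different fingerprint: False
```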
`OUTLIER_REMOVAL_STD`: Removes extreme outliers before fitting. Lower values are more aggressive.
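A minimal illustration of standard-deviation-based outlier filtering, assuming a simple "drop values beyond k·σ of the mean" rule; TabPFN's internal procedure may clip rather than drop, so treat this as a sketch of the threshold semantics only:

```python
from statistics import mean, stdev

def remove_outliers(values, n_std=12.0):
    """Keep values within n_std standard deviations of the mean."""
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) <= n_std * sigma]

data = [1.0, 2.0, 3.0, 2.0, 1.0, 500.0]
print(remove_outliers(data, n_std=2.0))   # 500.0 is dropped
print(remove_outliers(data, n_std=12.0))  # the default is very permissive
```

The generous default of 12.0 explains why "lower values are more aggressive": at 2.0 the extreme point is removed, while at 12.0 everything survives.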
`SUBSAMPLE_SAMPLES`: Subsamples training rows for faster iteration during experimentation.
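A sketch of the int-vs-float semantics, assuming the common convention (worth verifying against the TabPFN documentation) that an int is an absolute row count and a float is a fraction of the training set:

```python
import random

def subsample_indices(n_rows, subsample, seed=0):
    """Illustrative: None keeps all rows, int = count, float = fraction."""
    if subsample is None:
        return list(range(n_rows))
    k = subsample if isinstance(subsample, int) else max(1, int(n_rows * subsample))
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_rows), k))

print(len(subsample_indices(1000, None)))  # 1000
print(len(subsample_indices(1000, 200)))   # 200
print(len(subsample_indices(1000, 0.1)))   # 100
```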