# Preprocessing Transforms

> Configure TabPFN's internal preprocessing pipeline for maximum ensemble diversity.

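TabPFN's internal preprocessing pipeline is one of its most effective tuning levers. Estimators in the ensemble are assigned configurations by cycling through a list of preprocessing configurations, so different estimators see differently transformed views of the same data. That variation is what creates ensemble diversity.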
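
As a rough mental model, the cycling can be pictured as round-robin assignment. The sketch below is illustrative only: `assign_configs` is a hypothetical helper, not a TabPFN function, and the library may shuffle or balance the assignment differently. The configuration names are taken from the options table further down this page.

```python
from itertools import cycle, islice


def assign_configs(configs, n_estimators):
    """Hypothetical helper: estimator i gets the config at position i % len(configs)."""
    return list(islice(cycle(configs), n_estimators))


# Three preprocessing configurations shared by eight estimators.
configs = ["quantile_uni", "safepower", "none"]
assignment = assign_configs(configs, n_estimators=8)

for i, name in enumerate(assignment):
    print(f"estimator {i}: {name}")

# Number of distinct configurations actually in use across the ensemble.
print(len(set(assignment)))
```

With fewer configurations than estimators, some configurations are reused, so the ensemble sees fewer distinct views of the data than it has members.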

## PREPROCESS\_TRANSFORMS

Controls how features are transformed before they are passed to the transformer model. Each entry in the list is one preprocessing configuration with the fields below.

### Configuration Options

| Field                        | Default    | Options                                                                                                                                             |
| ---------------------------- | ---------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
| `name`                       | (required) | `"quantile_uni"`, `"quantile_uni_extrapolate"`, `"squashing_scaler_default"`, `"safepower"`, `"quantile_uni_coarse"`, `"kdi"`, `"robust"`, `"none"` |
| `categorical_name`           | `"none"`   | `"none"`, `"numeric"`, `"onehot"`, `"ordinal"`, `"ordinal_shuffled"`, `"ordinal_very_common_categories_shuffled"`                                   |
| `append_original`            | `False`    | `True`, `False`, `"auto"`                                                                                                                           |
| `max_features_per_estimator` | `500`      | int — subsamples features if above this limit                                                                                                       |
| `global_transformer_name`    | `None`     | `None`, `"svd"`, `"svd_quarter_components"`                                                                                                         |

<Tip>
  For maximum diversity, provide as many distinct preprocessing configurations as you have estimators (8 by default). Estimators are assigned configurations by cycling through the list, so a shorter list means some configurations are reused.
</Tip>
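
A minimal sketch of assembling such a list from the fields in the table above. It assumes each configuration can be written as a plain dict keyed by those field names and passed under a `"PREPROCESS_TRANSFORMS"` key of `inference_config`, in the same way as the other settings on this page; check the API reference for the exact accepted type. `make_transform` is a hypothetical helper that only validates values against the documented options.

```python
# Allowed values copied from the Configuration Options table.
NAMES = {
    "quantile_uni", "quantile_uni_extrapolate", "squashing_scaler_default",
    "safepower", "quantile_uni_coarse", "kdi", "robust", "none",
}
CATEGORICAL_NAMES = {
    "none", "numeric", "onehot", "ordinal", "ordinal_shuffled",
    "ordinal_very_common_categories_shuffled",
}
GLOBAL_TRANSFORMERS = {None, "svd", "svd_quarter_components"}


def make_transform(
    name,
    categorical_name="none",
    append_original=False,
    max_features_per_estimator=500,
    global_transformer_name=None,
):
    """Hypothetical helper: build one config dict, rejecting undocumented values."""
    if name not in NAMES:
        raise ValueError(f"unknown name: {name!r}")
    if categorical_name not in CATEGORICAL_NAMES:
        raise ValueError(f"unknown categorical_name: {categorical_name!r}")
    if append_original not in (True, False, "auto"):
        raise ValueError(f"invalid append_original: {append_original!r}")
    if global_transformer_name not in GLOBAL_TRANSFORMERS:
        raise ValueError(f"unknown global_transformer_name: {global_transformer_name!r}")
    return {
        "name": name,
        "categorical_name": categorical_name,
        "append_original": append_original,
        "max_features_per_estimator": max_features_per_estimator,
        "global_transformer_name": global_transformer_name,
    }


# Four deliberately different configurations; extend the list toward the
# number of estimators for more diversity.
preprocess_transforms = [
    make_transform("quantile_uni_coarse",
                   categorical_name="ordinal_very_common_categories_shuffled",
                   append_original="auto"),
    make_transform("safepower", categorical_name="onehot"),
    make_transform("robust", categorical_name="numeric",
                   global_transformer_name="svd"),
    make_transform("none", categorical_name="numeric"),
]

inference_config = {"PREPROCESS_TRANSFORMS": preprocess_transforms}

# Assumption: passed like the other inference_config keys on this page, e.g.
# model = TabPFNClassifier(inference_config=inference_config)
```

Validating against the documented options up front surfaces typos in transform names before any model is constructed.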

## Target Transforms (Regression)

For regression tasks, you can control how the target variable `y` is transformed. This is especially useful for skewed targets:

```python theme={null}
from tabpfn import TabPFNRegressor

model = TabPFNRegressor(
    inference_config={
        "REGRESSION_Y_PREPROCESS_TRANSFORMS": (
            "none",
            "safepower",
            "quantile_norm",
            "quantile_uni",
            "1_plus_log"
        ),
    },
)
```

| Transform         | When to Use                           |
| ----------------- | ------------------------------------- |
| `"none"`          | Symmetric, well-behaved targets       |
| `"safepower"`     | Skewed targets (handles negatives)    |
| `"quantile_norm"` | Heavily skewed or multi-modal targets |
| `"quantile_uni"`  | Alternative to `quantile_norm`        |
| `"1_plus_log"`    | Non-negative targets with large range |

Adding more transforms to the tuple increases ensemble diversity, which can help when the target is skewed, heavy-tailed, or multi-modal.
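
One way to decide which transforms to include is to look at the skewness of `y` first. The sketch below uses only the standard library. `suggest_y_transforms` and its threshold of 1.0 are hypothetical choices for illustration, not part of TabPFN, and the use of `log(1 + y)` as a stand-in for `"1_plus_log"` is an assumption based on the transform's name.

```python
import math
import statistics


def skewness(values):
    """Population skewness: mean of cubed z-scores."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return 0.0
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


def suggest_y_transforms(y, skew_threshold=1.0):
    """Hypothetical helper: pick candidate transforms following the table above."""
    transforms = ["none"]
    if abs(skewness(y)) > skew_threshold:
        # Skewed target: add the transforms the table recommends for skew.
        transforms += ["safepower", "quantile_norm"]
        if min(y) >= 0:
            # The log-based transform is listed for non-negative targets only.
            transforms.append("1_plus_log")
    return tuple(transforms)


y_symmetric = [-2, -1, 0, 1, 2]
y_skewed = [1, 2, 2, 3, 3, 4, 5, 8, 40, 900]

print(suggest_y_transforms(y_symmetric))
print(suggest_y_transforms(y_skewed))

# Assumption: log(1 + y) approximates what "1_plus_log" does to the target.
y_logged = [math.log1p(v) for v in y_skewed]
print(skewness(y_skewed), skewness(y_logged))
```

The resulting tuple can then be supplied as the value of `"REGRESSION_Y_PREPROCESS_TRANSFORMS"` in `inference_config`, as in the example above.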

## Other Inference Settings

```python theme={null}
from tabpfn import TabPFNClassifier

model = TabPFNClassifier(
    inference_config={
        "POLYNOMIAL_FEATURES": "no",       # "no", int, or "all" for O(n^2) interactions
        "FINGERPRINT_FEATURE": True,        # hash-based row identifier
        "OUTLIER_REMOVAL_STD": "auto",      # "auto" (12.0), None, or float
        "SUBSAMPLE_SAMPLES": None,          # None, int, float, or list
    },
)
```

* **`POLYNOMIAL_FEATURES`**: Generates interaction features. Can help when interactions matter but increases feature count quadratically.
* **`FINGERPRINT_FEATURE`**: Adds a hash-based row identifier. Useful by default; try disabling if you have very few features.
* **`OUTLIER_REMOVAL_STD`**: Removes extreme outliers before fitting. Lower values are more aggressive.
* **`SUBSAMPLE_SAMPLES`**: Subsample training rows for faster iteration during experimentation.
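
Two back-of-the-envelope checks make the trade-offs above concrete: how fast the number of pairwise interactions grows, and how a standard-deviation threshold changes what counts as an outlier. This is a standard-library sketch with hypothetical helper names. It does not reproduce TabPFN's internals, which may for example clip values rather than drop them, or use a different statistic.

```python
import math
import statistics


def n_pairwise_interactions(n_features):
    """Number of distinct feature pairs, C(n, 2)."""
    return math.comb(n_features, 2)


def count_outliers(values, n_std):
    """Count values more than n_std population standard deviations from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return 0
    return sum(1 for v in values if abs(v - mean) > n_std * std)


# Quadratic growth of pairwise interactions with the feature count.
for n in (10, 100, 500):
    print(n, "features ->", n_pairwise_interactions(n), "pairs")

# 1000 ordinary values plus one moderate and one extreme outlier.
data = [0.0, 1.0] * 500 + [200.0, 1000.0]

# Lower thresholds flag more values, i.e. they are more aggressive.
for threshold in (40.0, 12.0, 3.0):
    print("threshold", threshold, "->", count_outliers(data, threshold), "flagged")
```

A threshold as high as 12 standard deviations only touches values that are extreme relative to the rest of the column, which is why lowering it should be done deliberately.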
