# Fine-Tuning

> Adapt TabPFN's pretrained foundation model to your data with gradient-based fine-tuning.

Fine-tuning updates TabPFN's pretrained transformer parameters using gradient descent on your dataset. This retains TabPFN's learned priors while aligning the model more closely with your target data distribution.

Two fine-tuning estimators are available:

* [`FinetunedTabPFNClassifier`](https://github.com/PriorLabs/tabpfn/blob/main/src/tabpfn/finetuning/finetuned_classifier.py) — for classification tasks
* [`FinetunedTabPFNRegressor`](https://github.com/PriorLabs/tabpfn/blob/main/src/tabpfn/finetuning/finetuned_regressor.py) — for regression tasks

## How It Works

TabPFN performs in-context learning: during inference, it processes both training data and test samples in a single forward pass, using attention to identify relevant patterns. Fine-tuning adapts the transformer's weights so that the attention mechanism more accurately reflects the similarity structure of your specific data.

Concretely, after fine-tuning:

* The query representations of test samples and key representations of training samples produce dot products that better reflect their target similarity.
* This allows the fine-tuned model to more appropriately weight relevant in-context samples when making predictions.
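
To make the role of query-key dot products concrete, here is a toy single-head attention computation in plain Python. It is only an illustration of the general mechanism, not TabPFN's actual architecture: the vectors, the `attention_weights` and `predict` helpers, and the two-row training set are all made up for this sketch.

```python
import math

def attention_weights(query, keys):
    """Softmax over scaled dot products between one query and each key."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(query, keys, targets):
    """Average the training targets, weighted by attention."""
    weights = attention_weights(query, keys)
    return sum(w * t for w, t in zip(weights, targets))

# Toy in-context set: two training rows with targets 0.0 and 1.0.
train_keys = [[1.0, 0.0], [0.0, 1.0]]
train_targets = [0.0, 1.0]
test_query = [0.0, 2.0]  # aligned with the second training row's key

weights = attention_weights(test_query, train_keys)
prediction = predict(test_query, train_keys, train_targets)
print(weights, prediction)
```

The test query has a larger dot product with the second key, so the second training row receives more weight and the prediction moves toward its target. Fine-tuning changes the learned projections that produce such queries and keys, which in turn changes these weights.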

During fine-tuning, the preprocessing pipeline runs as a separate step and produces transformed tensors that mirror the preprocessing configurations used at inference. The model is therefore optimized on the same data variations it encounters when making predictions.

## When to Fine-Tune

Fine-tuning is not always necessary. TabPFN's in-context learning already adapts to your data at inference time. Fine-tuning adds value in specific scenarios:

### Good Candidates for Fine-Tuning

<CardGroup cols={2}>
  <Card title="Amortized prediction cost" icon="coins">
    You predict on the same schema repeatedly. Fine-tuning has an upfront cost but pays off across many future predictions.
  </Card>

  <Card title="Niche or specialized domains" icon="microscope">
    Your data represents a distribution not well-covered by TabPFN's pretraining priors — e.g., molecular properties, specialized sensor data, or domain-specific financial instruments.
  </Card>

  <Card title="Multiple related tables" icon="layer-group">
    You have a family of related datasets (e.g., multiple experiments, regional variants) and want to fine-tune a single model across them.
  </Card>
</CardGroup>

### When Fine-Tuning Is Less Likely to Help

* On very small datasets (\< 1000 rows), the risk of overfitting often outweighs adaptation benefits.
* If baseline TabPFN is already within a few percent of your target metric, the simpler approaches in [Tips & Tricks](/improving-performance) often close the gap with less effort.
* On datasets with gradual temporal distribution shifts and many features, fine-tuning can be less stable. Make sure your train/validation split respects the time ordering.

### Decision Flowchart

<Steps>
  <Step title="Run baseline TabPFN">
    Evaluate default `TabPFNClassifier` or `TabPFNRegressor` on your task.
  </Step>

  <Step title="Try quick wins first">
    Apply [feature engineering](/improving-performance/feature-engineering), [metric tuning](/improving-performance/model-parameters), and [preprocessing tuning](/improving-performance/preprocessing) — these are faster to iterate on.
  </Step>

  <Step title="Fine-tune once performance plateaus">
    If performance has plateaued and you have sufficient data (1000+ rows), fine-tuning can push past the ceiling by adapting the model's internal representations.
  </Step>
</Steps>
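
The three steps can also be written down as a small helper. This is a hypothetical sketch, not part of the TabPFN package: `recommend_next_step` and its arguments are invented for illustration, the 1000-row threshold comes from the guidance above, and scores are assumed to be "higher is better".

```python
def recommend_next_step(n_rows, baseline_score, target_score,
                        quick_wins_tried, plateaued, min_rows=1000):
    """Return a short recommendation that follows the three steps above.

    Scores are assumed to be "higher is better" (e.g. accuracy or ROC AUC).
    """
    if baseline_score >= target_score:
        return "keep baseline"  # step 1 already meets the target
    if not quick_wins_tried:
        return "try quick wins"  # step 2 is cheaper to iterate on
    if not plateaued:
        return "keep iterating on quick wins"
    if n_rows < min_rows:
        return "too little data to fine-tune safely"
    return "fine-tune"  # step 3

print(recommend_next_step(
    n_rows=25_000, baseline_score=0.86, target_score=0.90,
    quick_wins_tried=True, plateaued=True,
))
```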

## Getting Started

The fine-tuning estimators share the same `fit` / `predict` interface as `TabPFNClassifier` and `TabPFNRegressor`.

### 1. Prepare Your Dataset

Load and split your data into train and test sets. Use a proper validation strategy: for time-dependent data, use temporal splits rather than random splits.

```python theme={null}
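from sklearn.model_selection import train_test_split

# X is the feature matrix and y the target vector, loaded beforehand.
# A random split like this one suits data without time ordering.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)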
```
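
For time-dependent data, a temporal split keeps the most recent rows for testing. The helper below is a minimal sketch in plain Python; `temporal_split_indices` is a hypothetical name, and it assumes you have one sortable timestamp per row.

```python
def temporal_split_indices(timestamps, test_size=0.2):
    """Return (train_idx, test_idx) with the most recent rows in the test set."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be between 0 and 1")
    # Row positions ordered from oldest to newest.
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    n_test = max(1, round(len(order) * test_size))
    cutoff = len(order) - n_test
    return order[:cutoff], order[cutoff:]

timestamps = [5, 1, 4, 2, 3, 9, 7, 8, 6, 10]
train_idx, test_idx = temporal_split_indices(timestamps, test_size=0.2)
print(test_idx)  # positions of the two most recent rows
```

The returned positions can then be used to select rows, for example `X[train_idx]` for a NumPy array or `X.iloc[train_idx]` for a pandas DataFrame.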

### 2. Configure and Train

<CodeGroup>
  ```python Classifier theme={null}
  from tabpfn.finetuning import FinetunedTabPFNClassifier

  finetuned_clf = FinetunedTabPFNClassifier(
      device="cuda",
      epochs=30,
      learning_rate=1e-5,
  )

  finetuned_clf.fit(X_train, y_train)
  ```

  ```python Regressor theme={null}
  from tabpfn.finetuning import FinetunedTabPFNRegressor

  finetuned_reg = FinetunedTabPFNRegressor(
      device="cuda",
      epochs=30,
      learning_rate=1e-5,
  )

  finetuned_reg.fit(X_train, y_train)
  ```
</CodeGroup>

By default, fine-tuning splits off 10% of the training data for validation and uses early stopping (patience of 8 epochs). You can also provide your own validation set, which is useful for temporal data or other cases where a random split isn't appropriate:

```python theme={null}
# X_val, y_val: a validation set you hold out yourself (e.g. the most recent rows)
finetuned_clf.fit(X_train, y_train, X_val=X_val, y_val=y_val)
```

### 3. Predict

<CodeGroup>
  ```python Classifier theme={null}
  y_pred = finetuned_clf.predict(X_test)
  y_pred_proba = finetuned_clf.predict_proba(X_test)
  ```

  ```python Regressor theme={null}
  y_pred = finetuned_reg.predict(X_test)
  ```
</CodeGroup>

## Hyperparameters

### Core Parameters

| Parameter       | Default  | Description                                                                             |
| --------------- | -------- | --------------------------------------------------------------------------------------- |
| `epochs`        | `30`     | Number of fine-tuning epochs. More epochs allow deeper adaptation but risk overfitting. |
| `learning_rate` | `1e-5`   | Step size for gradient updates. Lower values are safer but slower to converge.          |
| `device`        | `"cuda"` | GPU is strongly recommended. Fine-tuning on CPU is very slow.                           |

### Tuning Guidelines

**Learning rate:**

* Start with `1e-5` (the default). This is conservative and preserves pretrained knowledge.
* For larger datasets (10k+ rows), you can try `3e-5` to `1e-4` for faster convergence.
* If you see training loss spike or diverge, reduce the learning rate.

**Epochs:**

* 10–30 epochs is a good starting range for most datasets.
* For high-accuracy tasks where you're fine-tuning carefully, use more epochs (50–100) with a lower learning rate to allow gradual adaptation without destroying pretrained representations.
* Monitor validation loss to detect overfitting — stop if validation performance degrades.
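
The patience rule behind early stopping can be illustrated with a few lines of plain Python. This is a generic sketch of patience-based early stopping, not the library's internal implementation; `early_stopping_epoch` is a hypothetical helper, and the loss values are made up.

```python
def early_stopping_epoch(val_losses, patience=8):
    """Return (stop_epoch, best_epoch) as 0-based indices into val_losses.

    Training stops once the validation loss has not improved for
    `patience` consecutive epochs, or when the losses run out.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Validation loss improves for three epochs, then degrades.
losses = [0.70, 0.62, 0.58, 0.59, 0.60, 0.61]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=3)
print(stop_epoch, best_epoch)
```

Here the best validation loss occurs at epoch index 2, and training stops three epochs later because no further improvement was seen.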

<Warning>
  Fine-tuning is only practical with GPU acceleration. It will run on CPU, but training times will be impractical for most use cases.
</Warning>

## Multi-GPU Fine-Tuning

Fine-tuning supports multi-GPU training via PyTorch DDP (Distributed Data Parallel). Distributed mode is detected automatically when the script is launched with `torchrun`:

```bash theme={null}
torchrun --nproc-per-node=4 your_finetuning_script.py
```

No code changes are needed. The DDP setup is handled internally based on the `LOCAL_RANK` environment variable that `torchrun` sets. Note that `.fit()` should only be called once per `torchrun` session.
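
If your own script needs to know whether it is running under `torchrun`, for example to log or save results from a single process, it can read the same environment variables. The snippet below is a sketch and not TabPFN's internal logic; `distributed_context` is a hypothetical helper. `LOCAL_RANK` and `WORLD_SIZE` are set by `torchrun` for every worker process.

```python
import os

def distributed_context(env=None):
    """Read the variables torchrun exports for each worker process."""
    env = os.environ if env is None else env
    local_rank = env.get("LOCAL_RANK")
    if local_rank is None:
        # Plain `python script.py` launch: a single process.
        return {"distributed": False, "local_rank": 0, "world_size": 1}
    return {
        "distributed": True,
        "local_rank": int(local_rank),
        "world_size": int(env.get("WORLD_SIZE", "1")),
    }

ctx = distributed_context()
if ctx["local_rank"] == 0:
    # Only the first process on each node prints.
    print(f"distributed={ctx['distributed']}, world_size={ctx['world_size']}")
```

On a single node, checking `local_rank == 0` is enough to pick one process. For multi-node runs, the global `RANK` variable is the one that is unique across all workers.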

## Best Practices

<AccordionGroup>
  <Accordion title="Always compare against baseline">
    Before fine-tuning, establish a baseline with the default `TabPFNClassifier` or `TabPFNRegressor`. Fine-tuning should measurably improve on this baseline — if it doesn't, the simpler model is preferable.
  </Accordion>

  <Accordion title="Use proper validation">
    Hold out a validation set and monitor performance on it across epochs. For time-dependent data, use a temporal split rather than random cross-validation.
  </Accordion>

  <Accordion title="Start conservative, then adjust">
    Begin with the defaults (`epochs=30`, `learning_rate=1e-5`). Only raise the learning rate or the number of epochs if you see clear room for improvement without signs of overfitting.
  </Accordion>

  <Accordion title="Combine with feature engineering">
    Fine-tuning and [feature engineering](/improving-performance/feature-engineering) are complementary. Good features make fine-tuning more effective by giving the model better signal to adapt to.
  </Accordion>

  <Accordion title="Watch for overfitting on small data">
    With fewer than \~1000 rows, fine-tuning can overfit quickly. Use fewer epochs and a lower learning rate, or evaluate whether the default (non-fine-tuned) TabPFN already meets your target.
  </Accordion>
</AccordionGroup>
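
The baseline comparison from the first practice can be wrapped in a small helper. This is a hypothetical sketch: `compare_to_baseline` and `min_gain` are invented names, the helper works with any two estimators that expose `fit` and `predict`, and `score_fn` is assumed to be a "higher is better" metric with the signature `score_fn(y_true, y_pred)`.

```python
def compare_to_baseline(baseline, candidate, X_train, y_train,
                        X_test, y_test, score_fn, min_gain=0.0):
    """Fit both estimators on the same split and compare their test scores."""
    baseline.fit(X_train, y_train)
    candidate.fit(X_train, y_train)
    baseline_score = score_fn(y_test, baseline.predict(X_test))
    candidate_score = score_fn(y_test, candidate.predict(X_test))
    gain = candidate_score - baseline_score
    return {
        "baseline_score": baseline_score,
        "candidate_score": candidate_score,
        "gain": gain,
        # Keep the fine-tuned model only if it clearly beats the baseline.
        "keep_candidate": gain > min_gain,
    }

# Example usage (requires tabpfn and scikit-learn):
# from sklearn.metrics import accuracy_score
# from tabpfn import TabPFNClassifier
# from tabpfn.finetuning import FinetunedTabPFNClassifier
# result = compare_to_baseline(
#     TabPFNClassifier(), FinetunedTabPFNClassifier(device="cuda"),
#     X_train, y_train, X_test, y_test, accuracy_score,
# )
```

Setting `min_gain` above zero guards against keeping a fine-tuned model whose improvement is within noise.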

## Enterprise Fine-Tuning

For organizations with proprietary datasets, Prior Labs offers an enterprise fine-tuning program that includes:

* Fine-tuning on your organization's data corpus for a customized, high-performance model
* Support for fine-tuning across collections of related datasets
* Optimized training infrastructure

<Card title="Enterprise Fine-Tuning" icon="building" horizontal href="mailto:sales@priorlabs.ai">
  Learn more about fine-tuning TabPFN for your organization.
</Card>

## Related

<Card title="Improving Performance" icon="lightbulb" href="/improving-performance">
  Quick wins to try before fine-tuning.
</Card>

<Card title="GitHub Examples" icon="book" horizontal href="https://github.com/PriorLabs/TabPFN/blob/main/examples/finetune_classifier.py">
  See more examples and fine-tuning utilities in our TabPFN GitHub repository.
</Card>
