Fine-tuning updates TabPFN’s pretrained transformer parameters using gradient descent on your dataset. This retains TabPFN’s learned priors while aligning the model more closely with your target data distribution. You can fine-tune both:
- FinetunedTabPFNClassifier — for classification tasks
- FinetunedTabPFNRegressor — for regression tasks
How It Works
TabPFN performs in-context learning: during inference, it processes both training data and test samples in a single forward pass, using attention to identify relevant patterns. Fine-tuning adapts the transformer’s weights so that the attention mechanism more accurately reflects the similarity structure of your specific data. Concretely, after fine-tuning:
- The query representations of test samples and key representations of training samples produce dot products that better reflect their target similarity.
- This allows the fine-tuned model to more appropriately weight relevant in-context samples when making predictions.
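To make this concrete, here is a deliberately simplified toy sketch in plain NumPy (not TabPFN’s actual architecture) showing how query/key dot products become attention weights over in-context training labels:

```python
import numpy as np

# Toy illustration only: one test query attending over training keys.
# TabPFN's real transformer is far more elaborate; this just shows the
# mechanism that fine-tuning reshapes.
rng = np.random.default_rng(0)
K = rng.normal(size=(5, 8))          # key representations of 5 training samples
y = np.array([0, 1, 1, 0, 1])        # their binary labels
q = K[2] + 0.1 * rng.normal(size=8)  # query representation of a test sample

scores = K @ q                        # dot-product similarity (query . key)
weights = np.exp(scores - scores.max())
weights /= weights.sum()              # softmax attention weights

# Prediction = attention-weighted blend of in-context labels. Fine-tuning
# adjusts the projections behind q and K so these weights better track
# true target similarity.
print(weights.round(3), float(weights @ y))
```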
When to Fine-Tune
Fine-tuning is not always necessary. TabPFN’s in-context learning already adapts to your data at inference time. Fine-tuning adds value in specific scenarios:
Good Candidates for Fine-Tuning
Amortized prediction cost
You predict on the same schema repeatedly. Fine-tuning has an upfront cost but pays off across many future predictions.
Niche or specialized domains
Your data represents a distribution not well-covered by TabPFN’s pretraining priors — e.g., molecular properties, specialized sensor data, or domain-specific financial instruments.
Multiple related tables
You have a family of related datasets (e.g., multiple experiments, regional variants) and want to fine-tune a single model across them.
When Fine-Tuning is Less Likely to Help
- On very small datasets (< 1000 rows), the risk of overfitting often outweighs adaptation benefits.
- If baseline TabPFN is already within a few percent of your target metric, the simpler approaches in Tips & Tricks often close the gap with less effort.
- On datasets with gradual temporal distribution shifts and many features, fine-tuning can be less stable. Make sure your train/validation split respects the time ordering.
Decision Flowchart
1. Try quick wins first. Apply feature engineering, metric tuning, and preprocessing tuning — these are faster to iterate on.
2. Try AutoTabPFN or HPO. If you’re still below your target metric after quick wins, try AutoTabPFN ensembles or hyperparameter optimization.
Getting Started
Fine-tuning shares the same interface as TabPFNClassifier and TabPFNRegressor.
1. Prepare Your Dataset
Load and split your data into train and test sets. Use a proper validation strategy: for time-dependent data, use temporal splits rather than random splits.
2. Configure and Train
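A minimal sketch covering steps 1 and 2. The estimator name and defaults come from this page; the import path and example dataset are assumptions, so check your installed package layout:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Import path is an assumption; adjust to your installation.
from tabpfn import FinetunedTabPFNClassifier

# Step 1: prepare the data (random split shown; prefer a temporal
# split if your data is time-dependent).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Step 2: configure and fine-tune with the documented defaults.
clf = FinetunedTabPFNClassifier(
    epochs=30,           # number of fine-tuning epochs
    learning_rate=1e-5,  # conservative default, preserves pretrained knowledge
    device="cuda",       # GPU strongly recommended
)
clf.fit(X_train, y_train)
```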
3. Predict
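Prediction then uses the familiar scikit-learn-style interface (continuing the sketch above):

```python
from sklearn.metrics import accuracy_score

# Same interface as TabPFNClassifier: predict and predict_proba.
y_pred = clf.predict(X_test)
y_proba = clf.predict_proba(X_test)
print("accuracy:", accuracy_score(y_test, y_pred))
```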
Hyperparameters
Core Parameters
| Parameter | Default | Description |
|---|---|---|
| epochs | 30 | Number of fine-tuning epochs. More epochs allow deeper adaptation but risk overfitting. |
| learning_rate | 1e-5 | Step size for gradient updates. Lower values are safer but slower to converge. |
| device | "cuda" | GPU is strongly recommended. Fine-tuning on CPU is very slow. |
Tuning Guidelines
Learning rate:
- Start with 1e-5 (the default). This is conservative and preserves pretrained knowledge.
- For larger datasets (10k+ rows), you can try 3e-5 to 1e-4 for faster convergence.
- If you see training loss spike or diverge, reduce the learning rate.

Epochs:
- 10–30 epochs is a good starting range for most datasets.
- For high-accuracy tasks where you’re fine-tuning carefully, use more epochs (50–100) with a lower learning rate to allow gradual adaptation without destroying pretrained representations.
- Monitor validation loss to detect overfitting — stop if validation performance degrades.

The sketch below shows these adjustments in code.
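These configurations use the same assumed interface as the Getting Started example; the specific values are illustrative:

```python
# Larger dataset (10k+ rows): a somewhat higher learning rate for
# faster convergence; drop it again if training loss spikes.
clf_large = FinetunedTabPFNClassifier(epochs=20, learning_rate=3e-5, device="cuda")

# Careful high-accuracy regime: more epochs with a lower learning rate
# so adaptation is gradual and pretrained representations survive.
clf_careful = FinetunedTabPFNClassifier(epochs=80, learning_rate=5e-6, device="cuda")
```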
Multi-GPU Fine-Tuning
Fine-tuning supports multi-GPU training via PyTorch DDP (Distributed Data Parallel). This is auto-detected when launched with torchrun: the model reads the LOCAL_RANK environment variable that torchrun sets. Note that .fit() should only be called once per torchrun session.
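A minimal launch sketch; only the torchrun auto-detection and LOCAL_RANK behavior are documented above, and the script contents are an assumption:

```python
# finetune_ddp.py
# Launch with, for example:
#   torchrun --nproc_per_node=4 finetune_ddp.py
# torchrun sets LOCAL_RANK for each worker process, which is how
# DDP mode is auto-detected.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from tabpfn import FinetunedTabPFNClassifier  # import path is an assumption

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = FinetunedTabPFNClassifier(epochs=30, learning_rate=1e-5, device="cuda")
clf.fit(X_train, y_train)  # call .fit() only once per torchrun session
```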
Best Practices
Always compare against baseline
Before fine-tuning, establish a baseline with the default TabPFNClassifier or TabPFNRegressor. Fine-tuning should measurably improve on this baseline — if it doesn’t, the simpler model is preferable.
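A sketch of that comparison, reusing the fine-tuned clf from the Getting Started example (the metric choice is illustrative):

```python
from sklearn.metrics import roc_auc_score
from tabpfn import TabPFNClassifier  # default baseline, same interface

baseline = TabPFNClassifier(device="cuda")
baseline.fit(X_train, y_train)

auc_base = roc_auc_score(y_test, baseline.predict_proba(X_test)[:, 1])
auc_ft = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
print(f"baseline AUC: {auc_base:.4f}, fine-tuned AUC: {auc_ft:.4f}")
# Keep the fine-tuned model only if it measurably beats the baseline.
```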
Use proper validation
Split a held-out validation set and monitor performance across epochs. For time-series or temporal data, use a temporal split rather than random cross-validation.
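For temporal data, a simple index-based holdout looks like this (a sketch; assumes rows are already sorted oldest-first):

```python
# Train on the past, validate on the most recent 20% of rows.
cutoff = int(0.8 * len(X))
X_train_t, X_val_t = X[:cutoff], X[cutoff:]
y_train_t, y_val_t = y[:cutoff], y[cutoff:]
```

scikit-learn’s TimeSeriesSplit offers a rolling-window variant of the same idea.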
Start conservative, then adjust
Begin with the defaults (epochs=30, learning_rate=1e-5). Only increase aggressiveness if you see clear room for improvement without signs of overfitting.
Combine with feature engineering
Fine-tuning and feature engineering are complementary. Good features make fine-tuning more effective by giving the model better signal to adapt to.
Watch for overfitting on small data
With fewer than ~1000 rows, fine-tuning can overfit quickly. Use fewer epochs, a lower learning rate, or consider whether AutoTabPFN ensembles might be more appropriate.
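For example, a deliberately conservative small-data configuration (values are illustrative, chosen per the guidance above):

```python
# Small dataset (< ~1000 rows): fewer epochs and a lower learning
# rate to limit overfitting. Values are illustrative, not prescriptive.
clf_small = FinetunedTabPFNClassifier(epochs=10, learning_rate=5e-6, device="cuda")
```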
Enterprise Fine-Tuning
For organizations with proprietary datasets, Prior Labs offers an enterprise fine-tuning program that includes:
- Fine-tuning on your organization’s data corpus for a customized, high-performance model
- Support for fine-tuning across collections of related datasets
- Optimized training infrastructure
Learn more about fine-tuning TabPFN for your organization.
Related
Improving Performance
Quick wins to try before fine-tuning.
AutoTabPFN Ensembles
Automated ensembling as an alternative to fine-tuning.
Hyperparameter Optimization
Automated search over TabPFN’s hyperparameter space.
GitHub Examples
See more examples and fine-tuning utilities in our TabPFN GitHub repository.