Key Capabilities
- Zero-shot regression - Predicts continuous targets instantly in a single forward pass.
- Calibrated uncertainty - Produces reliable mean, median, or quantile-based predictions for confidence estimation.
- Robust to real-world noise - Handles outliers, missing values, and mixed data types.
- Text-aware inputs - Detects textual columns, extracts embeddings, and integrates them into predictions.
- Minimal preprocessing - Works directly with raw numerical and categorical data.
- Fast inference - Zero-shot predictions complete in seconds, even on mid-range GPUs.
Getting Started
First, load your training and test datasets. The training dataset is used for in-context learning - it’s passed directly into the model’s forward pass, allowing TabPFN to condition its predictions on the data distribution without performing gradient-based optimization. The test dataset is then used for making predictions, leveraging what the model inferred from the context provided by your training data.
Advanced Features
Auto fine-tuning
The AutoTabPFNRegressor automatically builds ensembles of strong models to maximize accuracy:
- Runs an automated hyperparameter search for optimal settings.
- Builds a Post-Hoc Ensemble (PHE) for improved calibration and performance.
- Balances accuracy vs. latency with the max_time parameter.
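The exact search and ensembling procedure inside AutoTabPFNRegressor is not described here. To illustrate what a time-budgeted post-hoc ensemble can look like, the sketch below applies Caruana-style greedy ensemble selection to validation predictions from several already-fitted candidate models. The function names and the way max_time is honored are hypothetical; this is a generic technique, not the library's implementation.

```python
import time
from typing import Dict, List, Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def greedy_ensemble(
    val_preds: Dict[str, List[float]],
    y_val: Sequence[float],
    n_rounds: int = 20,
    max_time: float = 30.0,
) -> Dict[str, int]:
    """Caruana-style greedy ensemble selection with replacement (illustrative).

    Each round adds the candidate whose inclusion most lowers the validation MSE
    of the averaged prediction; the search stops early once max_time seconds
    have elapsed. Returns how often each candidate was picked, which serves as
    its (unnormalized) ensemble weight.
    """
    counts = {name: 0 for name in val_preds}
    if not val_preds:
        return counts
    start = time.monotonic()
    ensemble_sum = [0.0] * len(y_val)  # running sum of the selected predictions
    n_chosen = 0
    for _ in range(n_rounds):
        if time.monotonic() - start > max_time:
            break  # time budget exhausted: keep the ensemble built so far
        best_name, best_score = None, float("inf")
        for name, preds in val_preds.items():
            # Score the ensemble average as if this candidate were added once more.
            trial = [(s + p) / (n_chosen + 1) for s, p in zip(ensemble_sum, preds)]
            score = mse(y_val, trial)
            if score < best_score:
                best_name, best_score = name, score
        counts[best_name] += 1
        ensemble_sum = [s + p for s, p in zip(ensemble_sum, val_preds[best_name])]
        n_chosen += 1
    return counts
```

Final test predictions would then be the count-weighted average of the selected candidates' predictions. Selecting with replacement lets strong candidates earn larger weights without a separate weight optimizer, and a smaller max_time trades ensemble quality for lower latency.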
Quantile Regression
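Quantile predictions can be read directly off a predictive distribution. As a minimal sketch, assume the distribution for one test row is given as a histogram: sorted bin edges plus a probability mass per bin, with the mass spread uniformly inside each bin (similar in spirit to the discretized distribution TabPFN predicts for regression). The helper bar_quantile is hypothetical and not part of the tabpfn API; with the tabpfn package you would normally request quantiles from predict directly (recent releases accept output_type="quantiles" together with a quantiles list, but check the documentation for your installed version).

```python
from typing import Sequence


def bar_quantile(edges: Sequence[float], probs: Sequence[float], q: float) -> float:
    """q-quantile of a piecewise-uniform (histogram) distribution (illustrative helper).

    edges: len(probs) + 1 sorted bin boundaries.
    probs: probability mass per bin; normalized here, so raw weights are fine.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    total = sum(probs)
    cum = 0.0  # cumulative probability up to the left edge of bin i
    for i, p in enumerate(probs):
        mass = p / total
        if mass > 0 and cum + mass >= q:
            # Mass is uniform inside the bin, so the CDF is linear there: interpolate.
            frac = (q - cum) / mass
            return edges[i] + frac * (edges[i + 1] - edges[i])
        cum += mass
    return edges[-1]  # guards against floating-point round-off when q == 1
```

For example, a 90% prediction interval is the pair bar_quantile(edges, probs, 0.05) and bar_quantile(edges, probs, 0.95), and q=0.5 gives the median.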
How can I apply a custom loss (e.g. MAPE, asymmetric loss) to regression?
TabPFN provides a full predictive distribution rather than a single point estimate, so you can make loss-aware predictions without retraining: compute the Bayes-optimal point prediction, i.e. the value that minimizes the expected custom loss under the predicted distribution. The model itself stays unchanged.
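Below is a minimal sketch of this decision rule, assuming the predictive distribution for one test row has been reduced to discrete support points with probabilities (for example bin midpoints and bin masses). All helper names are hypothetical, and searching only over a finite candidate grid is a simplifying assumption rather than an exact minimization.

```python
from typing import Callable, Optional, Sequence

# loss(y_true, y_pred) -> non-negative cost
Loss = Callable[[float, float], float]


def bayes_optimal_point(
    support: Sequence[float],
    probs: Sequence[float],
    loss: Loss,
    candidates: Optional[Sequence[float]] = None,
) -> float:
    """Pick the candidate c that minimizes E_y[loss(y, c)] under a discrete distribution."""
    if candidates is None:
        candidates = support  # search over the support points themselves

    def expected_loss(c: float) -> float:
        return sum(p * loss(y, c) for y, p in zip(support, probs))

    return min(candidates, key=expected_loss)


def squared_loss(y: float, c: float) -> float:
    return (y - c) ** 2


def mape_loss(y: float, c: float) -> float:
    # Absolute percentage error; undefined when the true value is 0.
    return abs(y - c) / abs(y)


def asymmetric_loss(under_cost: float, over_cost: float) -> Loss:
    """Pinball-style loss: each unit of under-prediction costs under_cost, of over-prediction over_cost."""

    def loss(y: float, c: float) -> float:
        return under_cost * (y - c) if y > c else over_cost * (c - y)

    return loss
```

The squared loss selects the candidate closest to the predictive mean, while the asymmetric loss is minimized at the under_cost / (under_cost + over_cost) quantile, so heavily penalizing under-prediction pushes the prediction upward. MAPE divides by the true value, which tends to pull its optimum below the median.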
GPU memory keeps increasing after multiple regression predictions - what’s happening?
This is caused by cached tensors that are not cleared between predictions. It occurs mainly in regression, because distributional predictions retain more GPU buffers. Until automatic cleanup is added in a future release, explicitly freeing cached GPU memory after each prediction prevents this gradual VRAM growth.
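A minimal workaround sketch, assuming the growth comes from memory held by PyTorch's CUDA caching allocator and that the model is an sklearn-style estimator with a predict method. The wrapper predict_and_release is hypothetical, not a tabpfn function; gc.collect, torch.cuda.is_available and torch.cuda.empty_cache are standard Python and PyTorch calls.

```python
import gc
from typing import Any


def predict_and_release(model: Any, X: Any) -> Any:
    """Hypothetical wrapper: run model.predict(X), then release cached GPU memory."""
    preds = model.predict(X)
    gc.collect()  # drop unreachable Python objects that may still hold GPU tensors
    try:
        import torch
    except ImportError:  # environment without PyTorch: nothing to release
        return preds
    if torch.cuda.is_available():
        # Returns unoccupied blocks held by PyTorch's caching allocator to the driver.
        # It cannot free tensors that are still referenced somewhere.
        torch.cuda.empty_cache()
    return preds
```

Because empty_cache only releases memory that no live tensor references, drop references to old outputs (or re-create the estimator) if usage keeps climbing. Comparing torch.cuda.memory_allocated() before and after each call is a quick way to check whether the workaround helps.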