The AutoTabPFN Ensembles Extension leverages AutoGluon to perform hyperparameter search and ensembling. It automatically explores the TabPFN hyperparameter space, trains multiple candidate models, and constructs an optimized weighted ensemble from the best performers. How it works:
- Randomly sample configurations from the TabPFN hyperparameter space.
- Train a TabPFN model for each sampled configuration (`n_ensemble_models` total).
- Use AutoGluon to evaluate, select, and weight the top-performing models.
- Combine them into a final, optimized meta-ensemble of TabPFNs.
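The steps above can be sketched conceptually in a few lines. This is a minimal illustration of the sample-train-weight loop, not the extension's actual internals; `SEARCH_SPACE`, `sample_config`, and the uniform weights are all illustrative assumptions.

```python
import random

# Illustrative search space; the real TabPFN space has more dimensions.
SEARCH_SPACE = {
    "n_estimators": [4, 8, 16],
    "softmax_temperature": [0.75, 0.9, 1.0],
}

def sample_config(rng):
    # Step 1: randomly sample one configuration from the space.
    return {key: rng.choice(values) for key, values in SEARCH_SPACE.items()}

def search_and_ensemble(n_ensemble_models, seed=0):
    rng = random.Random(seed)
    # Step 2: one candidate configuration per model to be trained.
    candidates = [sample_config(rng) for _ in range(n_ensemble_models)]
    # Steps 3-4: in the real extension, AutoGluon scores each trained model on
    # validation data and learns the ensemble weights. Uniform weights here
    # only illustrate the shape of the final (config, weight) ensemble.
    weight = 1.0 / len(candidates)
    return [(config, weight) for config in candidates]
```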
Getting Started
To install the extension, include the `post_hoc_ensembles` extra:
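The exact command is not shown on this page; assuming the package is published as `tabpfn-extensions` (the name used by the project's repository), a typical pip invocation would be:

```shell
pip install "tabpfn-extensions[post_hoc_ensembles]"
```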
This installs all dependencies, including AutoGluon and the TabPFN core library.
Core Parameters
The interface is sklearn-compatible and built around two parameter sets: AutoGluon control parameters and TabPFN model parameters. See the GitHub repository for more details.
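A hedged usage sketch of the sklearn-compatible interface. The import path and the `AutoTabPFNClassifier` name follow the tabpfn-extensions repository, and `max_time` is assumed to be the search budget in seconds; verify both against your installed version. The import is guarded so the sketch stays runnable without the extension.

```python
# Hedged sketch: import path and class name assumed from tabpfn-extensions.
try:
    from tabpfn_extensions.post_hoc_ensembles.sklearn_interface import (
        AutoTabPFNClassifier,
    )
except ImportError:  # extension not installed in this environment
    AutoTabPFNClassifier = None

def build_classifier(max_time=120, presets="medium_quality"):
    """Return a configured auto-ensemble classifier, or the kwargs as a
    fallback when the extension is unavailable."""
    kwargs = {"max_time": max_time, "presets": presets}
    if AutoTabPFNClassifier is None:
        return kwargs
    return AutoTabPFNClassifier(**kwargs)

clf = build_classifier()
# With the extension installed, the object follows the scikit-learn API:
# clf.fit(X_train, y_train); clf.predict(X_test); clf.predict_proba(X_test)
```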
AutoGluon Parameters
| Parameter | Description |
|---|---|
| `presets` | Controls the trade-off between training time and predictive accuracy. Common options: `'medium_quality'`, `'best_quality'`. |
| `phe_init_args` | Dictionary of arguments passed directly to `AutoGluon.TabularPredictor()` for advanced customization. |
| `phe_fit_args` | Arguments passed to `AutoGluon.TabularPredictor.fit()` to control training specifics such as early stopping, validation splits, and resource usage. |
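For illustration, the three AutoGluon parameters might be assembled like this. The keys inside `phe_fit_args` follow AutoGluon's `TabularPredictor.fit()` signature (`time_limit` in seconds); check the AutoGluon docs for your version before relying on them.

```python
# Illustrative AutoGluon-side parameters for the ensemble interface.
autogluon_params = {
    "presets": "best_quality",          # favor accuracy over training time
    "phe_init_args": {"verbosity": 1},  # forwarded to TabularPredictor()
    "phe_fit_args": {
        "time_limit": 600,              # stop the search after 10 minutes
    },
}
```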
TabPFN Model Parameters
| Parameter | Description |
|---|---|
| `n_estimators` | Number of internal transformers to ensemble within each individual TabPFN model. Increasing this can boost performance at the cost of compute time. (int, default=8) |
| `balance_probabilities` | Balances class probabilities for imbalanced datasets. Recommended for skewed classification tasks. (bool, default=False) |
| `ignore_pretraining_limits` | Bypasses TabPFN’s dataset size and feature limits (50k samples / 2k features). Use with caution - performance beyond these limits may degrade. (bool, default=False) |
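A short example of how the model-side parameters from this table might be set for a skewed classification task; how they are forwarded to each candidate model is assumed to mirror the plain TabPFN estimators.

```python
# TabPFN model parameters tuned for an imbalanced classification task.
tabpfn_params = {
    "n_estimators": 16,                  # more internal transformers, more compute
    "balance_probabilities": True,       # recommended for skewed class ratios
    "ignore_pretraining_limits": False,  # stay within 50k samples / 2k features
}
```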
Best Practices
- Start small: Try `max_time=300` to quickly explore configurations.
- Use for accuracy-critical tasks: The ensemble adds compute cost but yields higher accuracy and better calibration.
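The two-phase workflow implied by the tips above can be made explicit. Budget values and the idea of promoting the exploratory settings to a longer final run are illustrative, not prescribed by the extension.

```python
# Illustrative two-phase budgets (max_time in seconds, as used above).
EXPLORE_BUDGET = 300    # quick first pass over configurations
FINAL_BUDGET = 3600     # longer, accuracy-critical run (illustrative value)

def budget_for(phase):
    """Pick a search budget for the given workflow phase."""
    return EXPLORE_BUDGET if phase == "explore" else FINAL_BUDGET
```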