This guide outlines best practices for getting optimal fit and predict speed out of TabPFN-2.5.
To achieve good performance, we recommend the following:
Use a dedicated GPU or GPUs: We recommend NVIDIA H100 or A100 GPUs. Any dedicated
GPU supported by PyTorch is compatible, but some GPU models may not have enough memory
for larger datasets or may run slowly. Integrated GPUs, MPS (Apple Silicon), and CPUs are also
supported, but are only suitable for small datasets.
Use multiple GPUs: For larger datasets, fit + predict time can be dramatically reduced by
parallelizing inference over several GPUs. To enable this, set the device parameter of
TabPFNClassifier and TabPFNRegressor.
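For example, a minimal sketch covering both the single-GPU and multi-GPU cases (the exact
multi-GPU value accepted by device is an assumption here; check the code documentation for
the supported form):

    from tabpfn import TabPFNClassifier

    # Run inference on a single dedicated GPU.
    clf = TabPFNClassifier(device="cuda:0")

    # Parallelize inference over several GPUs. Passing a list of CUDA devices
    # is an assumed form; consult the code documentation for the exact syntax.
    clf_multi = TabPFNClassifier(device=["cuda:0", "cuda:1"])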
Use batch inference: Unless the fitted-model cache is enabled (see below), the model is retrained
each time .predict() is called, so it is much faster to predict for all of your test points in a
single .predict() call than to call it repeatedly. If you run out of memory, split the test points
into batches of 1,000 to 10,000 and call .predict() on each batch.
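A minimal batching helper might look like the following sketch (the helper name, the batch
size, and the assumption that X_test is a NumPy array are illustrative, not part of the TabPFN
API):

    import numpy as np

    def predict_in_batches(model, X_test, batch_size=5_000):
        # Ceil-divide so that no chunk exceeds batch_size rows.
        n_batches = max(1, -(-len(X_test) // batch_size))
        predictions = [model.predict(chunk) for chunk in np.array_split(X_test, n_batches)]
        return np.concatenate(predictions)

Remember that, without the fitted-model cache, each of these .predict() calls retrains the model,
so use the largest batch size that fits in memory.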
Use PyTorch 2.8 or above: TabPFN-2.5 also supports earlier versions of PyTorch, but these
may have lower performance.
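A quick runtime check of the installed version might look like this sketch (it uses the packaging
library, which is an assumption about your environment):

    import torch
    from packaging.version import Version

    # Warn when the installed PyTorch predates the recommended 2.8 release.
    if Version(torch.__version__.split("+")[0]) < Version("2.8"):
        print(f"PyTorch {torch.__version__} found; 2.8 or above is recommended for best speed.")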
For small datasets, enable the fitted-model cache: This is an experimental feature that trains
and stores the model during .fit(), making subsequent .predict() calls fast by reusing a
KV-cache. It is enabled by setting the fit_mode parameter of TabPFNClassifier and
TabPFNRegressor to fit_with_cache. With this setting, classification models consume
approximately 6.1 KB of GPU memory and 48.8 KB of CPU memory per cell in the training
dataset (regression models about 25% less), so the cache is currently only suitable for small
training datasets. For larger datasets and CPU-based inference, we recommend the
TabPFN-as-MLP/Tree output engine.
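As a sketch, assuming a hypothetical training set of 1,000 rows and 100 features (100,000 cells),
with X_train, y_train, and X_test as placeholders:

    from tabpfn import TabPFNClassifier

    # 1,000 rows x 100 features = 100,000 cells. At ~6.1 KB of GPU memory and
    # ~48.8 KB of CPU memory per cell, the cache needs roughly 0.6 GB of GPU
    # memory and 4.9 GB of CPU memory for a classifier.
    clf = TabPFNClassifier(fit_mode="fit_with_cache", device="cuda")
    clf.fit(X_train, y_train)     # the model is trained and cached here
    y_pred = clf.predict(X_test)  # subsequent predictions reuse the KV-cache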
If speed is important for your application, consider tuning the memory_saving_mode
and n_preprocessing_jobs parameters of TabPFNClassifier and TabPFNRegressor. See the
code documentation for further information.
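For example (the parameter values shown are assumptions for illustration, not recommended
defaults; see the code documentation for the accepted settings):

    from tabpfn import TabPFNRegressor

    reg = TabPFNRegressor(
        device="cuda",
        memory_saving_mode=False,  # assumed setting: trade GPU memory for speed
        n_preprocessing_jobs=4,    # assumed setting: parallelize preprocessing
    )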