TabPFNRegressor returns a full probability distribution over the target in a single forward pass, with no extra inference cost. The default .predict(X) call returns the distribution mean for scikit-learn compatibility, but the full distribution is always available through .predict(X, output_type="full").
Most models return only a single number per prediction. This hides everything about how confident the model is, whether the uncertainty is symmetric, and whether the target might have two plausible values instead of one. The full predictive distribution exposes all of it.
The full distribution lets you:
- Build calibrated prediction intervals from the model’s own quantiles.
- Detect skew, heavy tails, and multimodality that a point estimate hides.
- Pick the point estimate (mean, median, mode) that matches your loss function.
- Plot the per-sample density to inspect or communicate uncertainty.

Getting the full distribution
Set output_type="full" to get the raw distribution alongside all point estimates.
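A minimal sketch of unpacking the full output follows. The helper name unpack_full_prediction is hypothetical. It assumes reg is a fitted TabPFNRegressor from the local tabpfn package, and that the returned dict carries the keys listed on this page.

```python
def unpack_full_prediction(reg, X_test):
    """Request the full output once and split it into its documented parts.

    `reg` is assumed to be a fitted TabPFNRegressor (local `tabpfn` package).
    Any object whose `predict` accepts `output_type="full"` works the same way.
    """
    preds = reg.predict(X_test, output_type="full")

    # Point estimates: one value per test row.
    point_estimates = {name: preds[name] for name in ("mean", "median", "mode")}

    # One array per requested quantile.
    quantiles = preds["quantiles"]

    # Raw distribution: the bucket grid lives on the criterion,
    # and there is one logit per bucket for every test row.
    criterion = preds["criterion"]
    logits = preds["logits"]

    return point_estimates, quantiles, criterion, logits
```

With the local package installed, a call might look like unpack_full_prediction(TabPFNRegressor().fit(X_train, y_train), X_test).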
The output_type values below are convenience shortcuts that all derive from the same distribution:
| output_type | Returns | When to use |
|---|---|---|
| "mean" (default) | np.ndarray, shape (n,) | MSE-style metrics; minimizes expected squared error, most representative for unimodal posteriors |
| "median" | np.ndarray, shape (n,) | Heavy-tailed or skewed targets; minimizes MAE |
| "mode" | np.ndarray, shape (n,) | "Most likely" answer for clean unimodal posteriors only |
| "quantiles" | list[np.ndarray], one per entry in quantiles=[...] | Calibrated intervals and risk-aware decisions |
| "main" | dict with keys "mean", "median", "mode", and "quantiles" | All point estimates in one call |
| "full" | "main" plus "criterion" and "logits" | Inspecting or plotting the distribution directly |
output_type="full" requires the local tabpfn package. The cloud client (tabpfn-client) does not return raw logits. Use output_type="main" or output_type="quantiles" there instead.
Point estimates
All three point estimates are derived from the same (logits, criterion) pair:
- Mean
- Median
- Mode
Mean: the probability-weighted average over bucket midpoints. Under squared loss this is the optimal point estimate given the model’s learned distribution. If the distribution is bimodal, however, the mean falls between the modes and may itself be a low-probability outcome, so use the full distribution or the median when multimodality is likely. You can also compute the mean manually from the full output.
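The arithmetic behind the mean can be sketched in plain Python. The borders and logits below are made-up toy values. With real output they would come from preds["criterion"].borders and one row of preds["logits"]. The library may treat the outermost buckets differently, so read this as an illustration of the midpoint rule rather than a replacement for the built-in mean.

```python
import math


def softmax(logits):
    """Convert bucket logits to bucket probabilities."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bucket_mean(borders, logits):
    """Probability-weighted average of bucket midpoints."""
    probs = softmax(logits)
    midpoints = [(lo + hi) / 2 for lo, hi in zip(borders[:-1], borders[1:])]
    return sum(p * m for p, m in zip(probs, midpoints))


# Toy bimodal distribution: most mass in the first and last buckets.
borders = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
logits = [2.0, -2.0, -4.0, -2.0, 2.0]

probs = softmax(logits)
mean = bucket_mean(borders, logits)

# The mean lands in the middle bucket [4, 6), which holds very little mass.
middle_bucket_prob = probs[2]
```

In this toy case the mean sits at the centre of the grid while the middle bucket carries well under 1% of the probability, which is the failure mode described above.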

Use Cases
Quantiles and credible intervals
Pass output_type="quantiles" with a quantiles list, or compute them directly from the bar distribution’s inverse CDF.
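The inverse-CDF route can be illustrated without the library. The sketch below inverts a piecewise-uniform CDF on a toy bucket grid. The function name and the numbers are hypothetical. With real output, the equivalent call is assumed to be preds["criterion"].icdf(preds["logits"], q).

```python
def piecewise_uniform_icdf(borders, probs, q):
    """Return y such that CDF(y) = q for a piecewise-uniform density.

    borders: K + 1 increasing bucket edges.
    probs:   K bucket probabilities that sum to 1.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    cumulative = 0.0
    for lo, hi, p in zip(borders[:-1], borders[1:], probs):
        if p > 0.0 and cumulative + p >= q:
            # Inside a bucket the CDF is linear, so interpolate between its edges.
            return lo + (q - cumulative) / p * (hi - lo)
        cumulative += p
    return borders[-1]  # only reached when q equals 1 up to rounding error


# Toy right-skewed distribution on a non-uniform grid.
borders = [0.0, 1.0, 2.0, 4.0, 8.0]
probs = [0.1, 0.4, 0.4, 0.1]

lower = piecewise_uniform_icdf(borders, probs, 0.05)
median = piecewise_uniform_icdf(borders, probs, 0.50)
upper = piecewise_uniform_icdf(borders, probs, 0.95)
```

Here the central 90% interval is wider above the median than below it, so the interval reflects the skew of the distribution instead of assuming symmetry.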
The icdf method inverts the piecewise-uniform CDF exactly within the model’s bucket grid, so the intervals are exact with respect to the model’s predicted distribution. Empirical calibration on held-out data is still recommended before relying on these intervals in production.
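A held-out calibration check can be as simple as counting how often the true target falls inside the predicted interval. The sketch below uses made-up numbers. With a real model, lower and upper would come from reg.predict(X_val, output_type="quantiles", quantiles=[0.1, 0.9]), and the result would be compared with the nominal 80%.

```python
def empirical_coverage(y_true, lower, upper):
    """Fraction of targets that fall inside their predicted interval."""
    if len(y_true) == 0:
        raise ValueError("need at least one sample")
    inside = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return inside / len(y_true)


# Toy held-out targets with per-sample interval bounds.
y_val = [1.0, 2.5, 3.0, 4.2, 5.0]
lower = [0.5, 2.0, 3.5, 4.0, 4.0]
upper = [1.5, 3.0, 4.5, 5.0, 4.5]

coverage = empirical_coverage(y_val, lower, upper)
```

Coverage well below the nominal level suggests the intervals are too narrow for the data at hand. Coverage well above it suggests they are wider than necessary.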
Single-sample plots
The core tabpfn package can draw the full predictive density for single test points. The helper renders the density as a curve, marks the mean, median, and mode, and shades a central credible interval.

The helper takes the output_type="full" dict directly and selects which row to plot with sample_idx; see plot_regression_distribution below for all arguments. Because it accepts an ax, you can lay several panels side by side to compare test points in one figure.
Plotting requires the optional matplotlib extra: pip install "tabpfn[viz]". See also examples/plot_regression_distribution.py.
Visualizing many points at once
To compare many test points at once, the bar_distribution_plot.py example helper in tabpfn-extensions renders the per-sample predictive density as a vertical heatmap. It is a standalone example file, so copy bar_distribution_plot.py into your project and import it locally.
X_test[:, 0] is the first feature column, used here as the x-axis for plotting. Replace it with whichever 1D value makes sense as a horizontal axis for your data. See plot_bar_distribution below for the merging, cropping, and palette options.

Setting merge_bars=4 merges adjacent buckets for a coarser but faster render. Use the coarser view during exploration and the full-resolution view for final figures.
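What merging means for the distribution can be sketched on a toy grid: the probability mass of adjacent buckets adds up, and only every n-th border survives. The helper below is hypothetical and is not the implementation used by plot_bar_distribution. How the real helper handles leftover bars is not covered here.

```python
def merge_buckets(borders, probs, merge_bars):
    """Merge every `merge_bars` adjacent buckets into one coarser bucket."""
    if merge_bars < 1:
        raise ValueError("merge_bars must be at least 1")
    merged_borders = [borders[0]]
    merged_probs = []
    for start in range(0, len(probs), merge_bars):
        stop = min(start + merge_bars, len(probs))
        merged_probs.append(sum(probs[start:stop]))  # probability mass adds up
        merged_borders.append(borders[stop])         # keep the outer right edge
    return merged_borders, merged_probs


# Eight unit-width toy buckets merged four at a time.
borders = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
probs = [0.05, 0.05, 0.1, 0.3, 0.3, 0.1, 0.05, 0.05]

merged_borders, merged_probs = merge_buckets(borders, probs, merge_bars=4)
```

The merged view keeps the total probability but loses detail inside each group. In this toy case, the peak around y = 4 is no longer visible.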
The full runnable example is at examples/predictive_distribution/predictive_distribution_example.py.
What is the bar distribution?
TabPFN treats regression as classification over a grid of buckets on the target axis. The model outputs one logit per bucket; a softmax converts these to bucket probabilities, and within each bucket the density is uniform. The result is a piecewise-uniform probability density over y.
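Written out from the description above, the density takes the following form. The symbols are introduced here for illustration: bucket borders b_0 < b_1 < ... < b_K, logits z_1, ..., z_K, and bucket probabilities p_k.

$$
p(y \mid x) = \sum_{k=1}^{K} \frac{p_k}{b_k - b_{k-1}} \, \mathbb{1}\left[ b_{k-1} \le y < b_k \right],
\qquad
p_k = \frac{\exp(z_k)}{\sum_{j=1}^{K} \exp(z_j)}
$$

Each bucket spreads its probability p_k evenly over its width, so the density integrates to 1. Under this form the mean is the probability-weighted average of the bucket midpoints (b_{k-1} + b_k) / 2.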
Buckets are non-uniform: they are packed densely near the bulk of the training targets and spread into long tails. The model can therefore represent skewed, multimodal, or heteroscedastic targets with no parametric assumptions.
You do not need to follow the mathematical details to use TabPFN’s predicted regression distribution. All you need to remember is that TabPFN can output a distribution instead of a single point estimate.
Library Reference
tabpfn.visualisation.plot_regression_distribution
Plot the predicted target distribution for a single sample as a density curve, marking point estimates and shading a central credible interval. Requires the matplotlib extra (pip install "tabpfn[viz]"). All arguments after prediction are keyword-only.
| Parameter | Type | Default | Description |
|---|---|---|---|
| prediction | dict | required | Output of reg.predict(X, output_type="full"). May hold several samples; pick the one to plot with sample_idx. |
| sample_idx | int | 0 | Index of the sample to plot within prediction. |
| statistics | Sequence[str] | ("mean", "median", "mode") | Point statistics to mark with a vertical line. Any of "mean", "median", "mode". |
| quantile_interval | tuple[float, float] \| None | (0.1, 0.9) | Central interval to shade, e.g. (0.1, 0.9) for the 80% interval. Pass None to disable. |
| zoom_quantile | float \| None | 0.99 | Fraction of probability mass to keep in view, centred on the median. Pass None to show the full support. |
| smooth | float | 0.005 | Width of the display-only moving average over the density, as a fraction of the number of bars. Pass 0 to show the raw bar density. |
| ax | matplotlib.axes.Axes \| None | None | Existing axes to draw on. A new figure is created if omitted. |
| color | str | "#1f77b4" | Base colour of the density curve. |

Returns: matplotlib.axes.Axes, the axes containing the plot.
bar_distribution_plot.plot_bar_distribution
Plot TabPFN’s per-sample bar distribution as a vertical heatmap, one column per test point. This is an example helper from tabpfn-extensions (copy it into your project); it depends on seaborn for its default palette.
| Parameter | Type | Default | Description |
|---|---|---|---|
| ax | matplotlib.axes.Axes | required | Axis to draw on (fig, ax = plt.subplots()). |
| x | torch.Tensor | required | 1D positions of shape (num_examples,) to place along the x-axis. |
| bar_borders | torch.Tensor | required | Borders of the bar distribution, from preds["criterion"].borders. |
| logits | torch.Tensor | required | Raw logits of shape (num_examples, len(bar_borders) - 1), from preds["logits"]. |
| merge_bars | int \| None | None | If set, merge this many adjacent bars into one for a faster, coarser plot. |
| restrict_to_range | tuple[float, float] \| None | None | (min_y, max_y) to crop the y-axis to a range of target values. |
| plot_log_probs | bool | False | If True, plot log-densities (useful when a few bars dominate). |
| **kwargs | | | Forwarded to heatmap_with_box_sizes (e.g. palette, threshold_i). |

Returns: None. The heatmap is drawn onto the supplied ax.
Summary
The bar distribution gives you everything a point estimate discards: shape, skew, multimodality, and calibrated uncertainty. Use output_type="mean" when you need scikit-learn compatibility, output_type="quantiles" when you need intervals, and output_type="full" when you want to inspect or plot the distribution directly. For inputs well outside the training range, treat the distribution with caution: bucket boundaries are fixed at training time and the model does not flag OOD inputs automatically.