TabPFN uses transformer attention over all features. Irrelevant or noisy features dilute the model’s attention budget and can reduce predictive power, especially as feature count grows.
Greedy feature selection: remove features one at a time and measure the effect on performance. This works particularly well on smaller datasets, where the computational cost stays low.

Mutual information filtering: rank features by mutual information with the target and keep the top k.
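A minimal sketch of greedy backward elimination, using scikit-learn's cross-validation and a logistic regression stand-in for TabPFN (swap in a TabPFN classifier if it is installed); the helper name `greedy_backward_select` is illustrative, not a library API:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def greedy_backward_select(X, y, model, min_features=1):
    """Drop one feature at a time, keeping any removal that does not
    hurt cross-validated accuracy; stop when no removal helps."""
    kept = list(range(X.shape[1]))
    best = cross_val_score(model, X[:, kept], y, cv=3).mean()
    improved = True
    while improved and len(kept) > min_features:
        improved = False
        for f in list(kept):
            trial = [c for c in kept if c != f]
            score = cross_val_score(model, X[:, trial], y, cv=3).mean()
            if score >= best:  # removing f did not hurt: accept and restart
                best, kept, improved = score, trial, True
                break
    return kept, best

X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=4, random_state=0)
kept, score = greedy_backward_select(X, y, LogisticRegression(max_iter=1000))
```

The inner loop is O(d) model fits per removal, which is why this approach is best reserved for small feature counts.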
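Mutual information filtering maps directly onto scikit-learn's `SelectKBest` with `mutual_info_classif` as the scoring function; a short sketch on synthetic data, with `k=8` as an arbitrary illustrative choice:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=5, random_state=0)

# Score every feature by mutual information with y, keep the 8 best.
selector = SelectKBest(mutual_info_classif, k=8)
X_top = selector.fit_transform(X, y)

print(X_top.shape)          # reduced feature matrix
print(selector.get_support(indices=True))  # indices of the kept columns
```

Because the filter scores each feature independently, it is cheap, but it can keep redundant features and miss features that are only informative in combination.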