When your dataset has many features (especially beyond 500), feature selection can improve both performance and speed.

Why It Helps

TabPFN uses transformer attention over all features. Irrelevant or noisy features dilute the model’s attention budget and can reduce predictive power, especially as feature count grows.

Approaches

Greedy feature selection — remove features one at a time and keep the subset that preserves performance. This works well on smaller datasets, where the repeated refits stay cheap.

Mutual information filtering — rank features by mutual information with the target and keep the top-k:
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Keep the 50 features with the highest mutual information with the target
selector = SelectKBest(mutual_info_classif, k=50)
X_train_selected = selector.fit_transform(X_train, y_train)
X_test_selected = selector.transform(X_test)  # reuse the fitted selector
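The greedy approach above can be sketched with scikit-learn's SequentialFeatureSelector in backward mode, which drops one feature per round and keeps the subset with the best cross-validated score. A minimal, self-contained sketch — LogisticRegression and the synthetic dataset stand in for your TabPFN estimator and real data (both are assumptions for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Synthetic data: 20 features, only 5 informative
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=0
)

# LogisticRegression is a cheap stand-in for the real estimator
selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=5,
    direction="backward",  # remove features one at a time
    cv=3,
)
X_selected = selector.fit_transform(X, y)
print(X_selected.shape)  # (200, 5)
```

Backward selection refits the estimator many times, which is why this approach is best reserved for smaller datasets.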
PCA / TruncatedSVD — reduce dimensionality while retaining variance:
from sklearn.decomposition import PCA

# Project onto the 50 directions of highest variance
pca = PCA(n_components=50)
X_train_reduced = pca.fit_transform(X_train)
X_test_reduced = pca.transform(X_test)  # apply the same fitted projection
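For sparse inputs (e.g. one-hot or TF-IDF matrices), TruncatedSVD performs the same kind of variance-preserving reduction without densifying the data, which PCA's centering step would force. A short sketch with a synthetic sparse matrix (the sizes here are arbitrary assumptions):

```python
from scipy.sparse import random as sparse_random
from sklearn.decomposition import TruncatedSVD

# Synthetic sparse input: 300 samples, 1000 mostly-zero features
X_sparse = sparse_random(300, 1000, density=0.01, random_state=0)

# Reduce to 50 components while operating directly on the sparse matrix
svd = TruncatedSVD(n_components=50, random_state=0)
X_reduced = svd.fit_transform(X_sparse)
print(X_reduced.shape)  # (300, 50)
```

As with PCA, fit the transformer on training data only and apply the fitted transform to the test set.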