Feature engineering is one of the most impactful ways to improve TabPFN’s performance. The goal is to encode domain knowledge that TabPFN cannot learn from raw columns alone.

Domain-Specific Features

Create features that capture known relationships in your data:
  • Ratios: price / area, revenue / headcount
  • Interactions: weight / (height ** 2) (BMI), voltage * current (power)
  • Group aggregations: mean, count, or standard deviation of a numeric column grouped by a categorical (e.g., average spend per customer segment)
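As a rough illustration of the three feature types listed above, the following pandas sketch builds one ratio, one interaction, and a few group aggregations. The DataFrame and its column names (`segment`, `price`, `area`, `weight`, `height`) are made up for the example and are not part of TabPFN.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "segment": ["a", "a", "b", "b", "b"],
    "price": [200.0, 300.0, 150.0, 450.0, 600.0],
    "area": [50.0, 60.0, 30.0, 90.0, 100.0],
    "weight": [70.0, 80.0, 60.0, 90.0, 75.0],  # kilograms
    "height": [1.75, 1.80, 1.60, 1.90, 1.70],  # metres
})

# Ratio: price per unit of area.
df["price_per_area"] = df["price"] / df["area"]

# Interaction: body mass index from weight and height.
df["bmi"] = df["weight"] / df["height"] ** 2

# Group aggregations: transform() broadcasts each group's statistic
# back onto every row belonging to that group.
grouped = df.groupby("segment")["price"]
df["segment_price_mean"] = grouped.transform("mean")
df["segment_price_std"] = grouped.transform("std")
df["segment_count"] = grouped.transform("count")
```

When a train/test split is involved, it is generally safer to compute the group statistics on the training rows only and map them onto the test rows, so that no test information leaks into the features.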

Datetime Features

TabPFN cannot interpret raw datetime objects. Extract structured features instead:
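import numpy as np
import pandas as pd

# Make sure the column has a datetime dtype before using the .dt accessor
df["date"] = pd.to_datetime(df["date"])

df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["dayofweek"] = df["date"].dt.dayofweek
df["hour"] = df["date"].dt.hour

# Cyclical encoding for periodic features (month has a period of 12)
df["month_sin"] = np.sin(2 * np.pi * df["month"] / 12)
df["month_cos"] = np.cos(2 * np.pi * df["month"] / 12)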
For datasets with a time dimension, also consider adding a running index feature (sequential 0, 1, 2, …) to help TabPFN detect trends.
The TabPFN API automatically detects and embeds date features. This manual extraction is primarily needed when using the local package.
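The running index suggested above could be built along these lines. This is a minimal sketch on a made-up DataFrame: the column names `time_idx` and `days_since_start` are hypothetical, and the elapsed-days variant is an additional option for irregularly spaced data rather than something the text above prescribes.

```python
import pandas as pd

# Hypothetical toy data, deliberately out of chronological order.
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-03-01", "2024-01-01", "2024-02-01"]),
    "value": [3.0, 1.0, 2.0],
})

# Sort chronologically so the index reflects time order, then number the rows.
df = df.sort_values("date").reset_index(drop=True)
df["time_idx"] = range(len(df))

# Optional variant: elapsed days since the first observation, which keeps
# the true spacing when observations are not evenly spaced in time.
df["days_since_start"] = (df["date"] - df["date"].min()).dt.days
```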

Text and String Features

The best approach depends on cardinality and semantic content:
  • Low cardinality: Feed directly to TabPFN, which auto-encodes strings as categoricals
  • Medium/High cardinality: Use CountVectorizer or TfidfVectorizer with dimensionality reduction (PCA or TruncatedSVD)
  • Semantic content: Use the TabPFN API, which automatically handles semantic text encoding.
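For the medium/high cardinality case, one possible setup is a scikit-learn pipeline that chains `TfidfVectorizer` with `TruncatedSVD`. The sample texts, the choice of two components, and the `text_svd_*` column names below are assumptions made for illustration; a real dataset would typically use more components, chosen by validation.

```python
import pandas as pd
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical free-text column.
texts = pd.Series([
    "late delivery and damaged box",
    "fast delivery, great packaging",
    "product stopped working after a week",
    "works perfectly, very happy",
    "damaged on arrival, refund requested",
    "great product and fast shipping",
])

n_components = 2  # illustrative; tune on real data

# TF-IDF yields a sparse, high-dimensional matrix; TruncatedSVD works
# directly on sparse input and compresses it to a few dense columns.
text_pipeline = make_pipeline(
    TfidfVectorizer(),
    TruncatedSVD(n_components=n_components, random_state=0),
)
text_features = text_pipeline.fit_transform(texts)

# Wrap the result in a DataFrame so it can be joined to the other features.
text_df = pd.DataFrame(
    text_features,
    columns=[f"text_svd_{i}" for i in range(n_components)],
    index=texts.index,
)
```

The resulting `text_df` can be concatenated with the remaining feature columns (for example with `pd.concat([..., text_df], axis=1)`) before fitting. The pipeline should be fitted on training data only and then applied to test data with `transform`.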