Domain-Specific Features
Create features that capture known relationships in your data:- Ratios:
price / area,revenue / headcount - Interactions:
weight / (height ** 2)(BMI),voltage * current(power) - Group aggregations: mean, count, or standard deviation of a numeric column grouped by a categorical (e.g., average spend per customer segment)
Datetime Features
TabPFN cannot interpret raw datetime objects. Extract structured features instead:The TabPFN API automatically detects and embeds date features. This manual extraction is primarily needed when using the local package.
Text and String Features
The best approach depends on cardinality and semantic content:- Low cardinality: Feed directly to TabPFN, which auto-encodes strings as categoricals
- Medium/High cardinality: Use
CountVectorizerorTfidfVectorizerwith dimensionality reduction (PCA or TruncatedSVD) - Semantic content: Use TabPFN API that automatically handles semantic text encoding.