Feature Engineering - Prior Labs

Domain-Specific Features

Create features that capture known relationships in your data:

Ratios: price / area, revenue / headcount

Interactions: weight / (height ** 2) (BMI), voltage * current (power)

Group aggregations: mean, count, or standard deviation of a numeric column grouped by a categorical (e.g., average spend per customer segment)
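The three feature types above can be sketched with pandas as follows. The DataFrame and all of its column names (price, area, weight, height, segment, spend) are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical example data; column names are illustrative assumptions
df = pd.DataFrame({
    "price": [300000.0, 450000.0, 200000.0, 520000.0],
    "area": [100.0, 150.0, 80.0, 130.0],
    "weight": [70.0, 85.0, 60.0, 95.0],   # kilograms
    "height": [1.75, 1.80, 1.60, 1.90],   # meters
    "segment": ["a", "b", "a", "b"],
    "spend": [10.0, 40.0, 30.0, 20.0],
})

# Ratio feature
df["price_per_area"] = df["price"] / df["area"]

# Interaction feature (BMI)
df["bmi"] = df["weight"] / (df["height"] ** 2)

# Group aggregations, broadcast back to every row with transform
grouped = df.groupby("segment")["spend"]
df["segment_spend_mean"] = grouped.transform("mean")
df["segment_spend_std"] = grouped.transform("std")
df["segment_count"] = grouped.transform("count")
```

Using transform rather than agg keeps the original row count, so each aggregated value lines up with the row it describes.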

Datetime Features

When using the local package, TabPFN cannot interpret raw datetime objects. Extract structured features instead:

import numpy as np

# Assumes df["date"] already has a datetime64 dtype (convert with pd.to_datetime if not)
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["dayofweek"] = df["date"].dt.dayofweek
df["hour"] = df["date"].dt.hour

# Cyclical encoding for periodic features
df["month_sin"] = np.sin(2 * np.pi * df["month"] / 12)
df["month_cos"] = np.cos(2 * np.pi * df["month"] / 12)
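
To see why the sine/cosine pair helps, the following minimal sketch compares December and January in both representations. The helper name cyclical_encode is hypothetical and not part of TabPFN, pandas, or NumPy.

```python
import numpy as np

def cyclical_encode(values, period):
    """Map periodic values onto the unit circle as (sin, cos) pairs."""
    angle = 2 * np.pi * np.asarray(values, dtype=float) / period
    return np.column_stack([np.sin(angle), np.cos(angle)])

months = np.array([1, 6, 12])  # January, June, December
encoded = cyclical_encode(months, period=12)

# Raw integers place December and January 11 apart, although they are adjacent
raw_gap = abs(months[2] - months[0])

# On the unit circle, December to January is much closer than June to January
dec_jan = np.linalg.norm(encoded[2] - encoded[0])
jun_jan = np.linalg.norm(encoded[1] - encoded[0])
```

The same helper applies to other periodic columns, for example period=7 for dayofweek or period=24 for hour.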

For datasets with a time dimension, also consider adding a running index feature (sequential 0, 1, 2, …) to help TabPFN detect trends.
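
A minimal sketch of such a running index, assuming the rows should be ordered chronologically by a hypothetical date column before they are numbered:

```python
import pandas as pd

# Hypothetical time-stamped data, deliberately out of order
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-03-01", "2024-01-01", "2024-02-01"]),
    "value": [3.0, 1.0, 2.0],
})

# Sort chronologically, then number the rows 0, 1, 2, ...
df = df.sort_values("date").reset_index(drop=True)
df["running_index"] = range(len(df))
```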

The TabPFN API automatically detects and embeds date features. This manual extraction is primarily needed when using the local package.

Text and String Features

The best approach depends on cardinality and semantic content:

Low cardinality: Feed directly to TabPFN, which auto-encodes strings as categoricals

Medium/High cardinality: Use CountVectorizer or TfidfVectorizer with dimensionality reduction (PCA or TruncatedSVD)

Semantic content: Use the TabPFN API, which automatically handles semantic text encoding
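
For the medium/high-cardinality case, one possible setup with scikit-learn is sketched below. The example texts, the choice of two components, and the text_svd_ column prefix are illustrative assumptions, not recommendations from the TabPFN documentation.

```python
import pandas as pd
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical free-text column
texts = pd.Series([
    "red cotton shirt",
    "blue cotton shirt",
    "leather office chair",
    "wooden office desk",
    "red leather sofa",
])

# TF-IDF yields a sparse matrix; TruncatedSVD compresses it to a few dense columns
n_components = 2  # illustrative value; tune for your data
text_pipeline = make_pipeline(
    TfidfVectorizer(),
    TruncatedSVD(n_components=n_components, random_state=0),
)
text_features = text_pipeline.fit_transform(texts)

# Wrap the result as named columns so it can be joined to the other features
text_df = pd.DataFrame(
    text_features,
    columns=[f"text_svd_{i}" for i in range(n_components)],
    index=texts.index,
)
```

TruncatedSVD accepts the sparse TF-IDF matrix directly, which makes it a convenient reducer here. Fit the pipeline on training rows only and call transform on test rows so both share the same vocabulary and projection.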
