The Many Class Classifier extension lets TabPFN handle classification problems with more classes than its default limit (currently 10). It uses an error-correcting output code (ECOC) approach that:
  • Encodes the multi-class task into multiple binary or small-class subtasks.
  • Trains the base TabPFNClassifier on these subtasks.
  • Decodes the results back into the original class space.
This approach enables TabPFN to scale to hundreds of classes efficiently while maintaining accuracy and calibration.
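
The encode and decode steps can be illustrated with a small standalone sketch. It is not the extension's implementation: the helper names (make_codebook, encode_labels, decode) are hypothetical, the codebook is a simple balanced random assignment, and decoding by summed log-probabilities is one common ECOC choice assumed here for illustration. It uses only the Python standard library and fakes the sub-estimator outputs instead of calling TabPFN.

```python
import math
import random


def make_codebook(n_classes, alphabet_size, n_estimators, seed=0):
    """Hypothetical helper: row e maps each original class to one symbol."""
    rng = random.Random(seed)
    codebook = []
    for _ in range(n_estimators):
        # Balanced assignment: cycle through the symbols, then shuffle.
        row = [c % alphabet_size for c in range(n_classes)]
        rng.shuffle(row)
        codebook.append(row)
    return codebook


def encode_labels(y, codebook_row):
    """Relabel original classes as sub-task symbols for one sub-estimator."""
    return [codebook_row[label] for label in y]


def decode(sub_probas, codebook):
    """Turn per-sub-estimator symbol probabilities into class probabilities.

    sub_probas[e][s] is the probability that sub-estimator e assigns to
    symbol s for a single sample. Each class is scored by the summed
    log-probability of its code word, then scores are normalised.
    """
    n_classes = len(codebook[0])
    log_scores = [
        sum(math.log(max(p[row[c]], 1e-12)) for p, row in zip(sub_probas, codebook))
        for c in range(n_classes)
    ]
    peak = max(log_scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


n_classes, alphabet_size, n_estimators = 25, 10, 8
codebook = make_codebook(n_classes, alphabet_size, n_estimators, seed=42)

# Stand-in for trained sub-estimators: each one is confident about the
# symbol that its codebook row assigns to the true class.
true_class = 17
sub_probas = []
for row in codebook:
    p = [0.01] * alphabet_size
    p[row[true_class]] = 0.91
    sub_probas.append(p)

class_probas = decode(sub_probas, codebook)
predicted = max(range(n_classes), key=class_probas.__getitem__)
print(predicted)
```

Each sub-task here sees at most 10 distinct symbols, which is why a base model limited to 10 classes can still be used. Redundant rows make it unlikely that two classes share the same code word, so one mistaken sub-estimator does not necessarily flip the final prediction.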

Getting Started

Install the many_class extension:
pip install "tabpfn-extensions[many_class]"
Then wrap your existing TabPFNClassifier with ManyClassClassifier to support datasets with a large number of classes.
from tabpfn_extensions.many_class import ManyClassClassifier
from tabpfn import TabPFNClassifier

estimator = TabPFNClassifier(device="cuda")
classifier = ManyClassClassifier(
    estimator=estimator,
    # The parameters below are optional — shown here for clarity.
    n_estimators_redundancy=4, # Multiplier on the auto-chosen number of sub-estimators; higher = more stable, slower.
    random_state=42,
)
classifier.fit(X_train, y_train)
predictions = classifier.predict(X_test)

Key parameters

All parameters below are optional — sensible defaults are used if they are not provided.
  • alphabet_size — number of classes each sub-estimator is trained on. Leave it unset (the default) so that it is inferred from the base estimator’s max_num_classes_ attribute; future TabPFN models that support more classes per sub-task will then be used at full capacity automatically. It should not exceed the base estimator’s class limit (10 for TabPFNClassifier with model version ≤ 2.6).
  • n_estimators_redundancy — redundancy multiplier on the minimum number of sub-estimators needed to cover all classes. Higher values improve accuracy and calibration at the cost of runtime. Default is 4.
  • n_estimators — set this to override the auto-derived number of sub-estimators entirely. Leave as None to let it be chosen from alphabet_size, the number of classes in the training data, and n_estimators_redundancy.
  • random_state — controls the randomness of the sub-task encoding, ensuring reproducible results.
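
To get a feel for how alphabet_size, n_estimators_redundancy, and n_estimators might interact, here is a rough sizing sketch. It assumes the minimum number of sub-estimators is the shortest code length L with alphabet_size ** L >= n_classes, which is the information-theoretic lower bound; the extension's actual rule may differ, so treat the numbers as an estimate. Both function names are hypothetical and are not part of tabpfn_extensions.

```python
def min_code_length(n_classes, alphabet_size):
    """Smallest L such that alphabet_size ** L >= n_classes.

    Integer arithmetic avoids floating-point rounding in logarithms.
    """
    length, capacity = 1, alphabet_size
    while capacity < n_classes:
        capacity *= alphabet_size
        length += 1
    return length


def estimate_n_estimators(
    n_classes, alphabet_size=10, n_estimators_redundancy=4, n_estimators=None
):
    """Hypothetical estimate of how many sub-estimators would be fitted."""
    if n_estimators is not None:
        # An explicit n_estimators overrides the derived value entirely.
        return n_estimators
    # Assumption: minimum code length times the redundancy multiplier.
    return min_code_length(n_classes, alphabet_size) * n_estimators_redundancy


print(estimate_n_estimators(100))                  # 2 * 4 = 8
print(estimate_n_estimators(500))                  # 3 * 4 = 12
print(estimate_n_estimators(500, n_estimators=5))  # explicit override: 5
```

Under this assumption, runtime grows roughly linearly with n_estimators_redundancy, since each sub-estimator is a separate fit and predict of the base model.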