
Making Predictions with TensorFlow Decision Forests

Apr 17, 2025

Lecture on Making Predictions with TensorFlow Decision Forests (TF-DF)

Overview

This lecture covers making predictions with the TensorFlow Decision Forests (TF-DF) Python API: the different ways to feed data to a trained model, and how to benchmark the model's inference speed.

Key Points

  • The TF-DF Python API is user-friendly and well suited to experimentation.
  • For production, other APIs such as TensorFlow Serving and the C++ API are recommended because they are faster and more stable.
  • Consistency in Datasets: Prediction datasets must have the same feature names and types as the training dataset; otherwise prediction fails with an error.

Methods for Making Predictions

1. Using model.predict() with pd_dataframe_to_tf_dataset

  • Converts Pandas DataFrame to TensorFlow Dataset using tfdf.keras.pd_dataframe_to_tf_dataset().
  • Example:
      pd_dataset = pd.DataFrame({
          "feature_1": [1, 2, 3],
          "feature_2": ["a", "b", "c"],
          "label": [0, 1, 0],
      })
      tf_dataset = tfdf.keras.pd_dataframe_to_tf_dataset(pd_dataset, label="label")
  • Predictions are then generated by passing this TensorFlow dataset to model.predict().

2. Using model.predict() with Manual TF Datasets

  • Suitable for large datasets.
  • Create datasets using tf.data.Dataset.from_tensor_slices() and batch them.
  • TF-DF expects a batched dataset whose elements are tuples (features, label) or (features, label, weights), where features is a dictionary mapping feature names to tensors.
  • Example Dataset:
      tf_dataset = tf.data.Dataset.from_tensor_slices((
          {"feature_1": [1, 2, 3], "feature_2": [4, 5, 6]},
          [0, 1, 0],
      )).batch(2)

3. Using model.predict(...) and model.predict_on_batch() on Dictionaries

  • Allows prediction using dictionaries of NumPy arrays.
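  • model.predict() splits the examples into batches automatically; model.predict_on_batch() instead runs the model on the whole input as a single batch, which gives you control over how batches are formed.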
  • Example: model.predict({"feature_1": np.random.rand(100), "feature_2": np.random.rand(100)}, verbose=0)[:10]

4. Inference with the YDF Format

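  • Runs a TF-DF model trained with the Python API through the Yggdrasil Decision Forests (YDF) command-line tools, which read the model files TF-DF stores in the model's assets directory.
  • The YDF benchmark tool is then used to measure the model's inference speed.
  • Save the model and the serving dataset:
      model.save("my_model")
      pd_serving_dataset.to_csv("dataset.csv")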
  • Perform predictions using YDF tools: ./predict --model=my_model/assets --dataset=csv:dataset.csv --output=csv:predictions.csv
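If you drive the YDF tools from Python rather than a shell, a small wrapper can assemble the same `predict` command. This is a sketch under assumptions: the `predict` binary from a YDF CLI release has been downloaded into the working directory, and `write_serving_csv`, `build_predict_command` and `run_ydf_predict` are hypothetical helper names, not part of TF-DF or YDF. Writing the CSV with `index=False` avoids the extra unnamed index column that `DataFrame.to_csv` adds by default.

```python
import subprocess
from pathlib import Path

import pandas as pd


def write_serving_csv(df: pd.DataFrame, path: str = "dataset.csv") -> None:
    """Hypothetical helper: save serving data without the pandas index column."""
    df.to_csv(path, index=False)


def build_predict_command(model_dir: str, dataset_csv: str, output_csv: str,
                          binary: str = "./predict") -> list:
    """Hypothetical helper: assemble the YDF `predict` CLI call.

    TF-DF keeps the underlying YDF model in the `assets` sub-directory.
    """
    return [
        binary,
        f"--model={model_dir}/assets",
        f"--dataset=csv:{dataset_csv}",
        f"--output=csv:{output_csv}",
    ]


def run_ydf_predict(model_dir: str, dataset_csv: str, output_csv: str,
                    binary: str = "./predict") -> pd.DataFrame:
    """Hypothetical helper: run the CLI, then load its CSV output."""
    if not Path(binary).exists():
        raise FileNotFoundError(f"YDF predict tool not found at {binary}")
    cmd = build_predict_command(model_dir, dataset_csv, output_csv, binary)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return pd.read_csv(output_csv)
```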

Important Remarks

  • Prediction Consistency: Ensure feature types are consistent across training and prediction datasets.
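  • API Differences: A model instantiated in the current Python session may convert compatible feature types automatically (for example, integers to floats); a model loaded from disk does not, so a type mismatch causes an error.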
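One way to guard against type mismatches, especially with models loaded from disk, is to align the serving DataFrame to the training DataFrame before converting it. The sketch below uses pandas only; `align_to_training_schema` and its default `label="label"` column name are hypothetical, and casting is only safe when the values really are compatible (e.g. integers to floats).

```python
import pandas as pd


def align_to_training_schema(serving_df: pd.DataFrame, training_df: pd.DataFrame,
                             label: str = "label") -> pd.DataFrame:
    """Hypothetical helper: return serving_df restricted to the training features,
    in training column order, cast to the training dtypes.

    Raises KeyError if a training feature is missing at prediction time.
    """
    feature_dtypes = training_df.drop(columns=[label]).dtypes
    missing = [name for name in feature_dtypes.index if name not in serving_df.columns]
    if missing:
        raise KeyError(f"Missing features at prediction time: {missing}")
    # Keep only training features (extra serving columns are dropped) and cast each
    # column to its training dtype, e.g. int64 -> float64.
    return serving_df[list(feature_dtypes.index)].astype(feature_dtypes.to_dict())


training = pd.DataFrame({"feature_1": [1.0, 2.0], "feature_2": ["a", "b"], "label": [0, 1]})
serving = pd.DataFrame({"feature_2": ["c", "d"], "feature_1": [3, 4], "extra": [0, 0]})
aligned = align_to_training_schema(serving, training)
print(aligned.dtypes)
```

The aligned DataFrame can then be passed to tfdf.keras.pd_dataframe_to_tf_dataset() without a label.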

Installation and Setup

  • Install TensorFlow Decision Forests: pip install tensorflow_decision_forests
  • Import the necessary libraries:
      import tensorflow_decision_forests as tfdf
      import numpy as np
      import pandas as pd
      import tensorflow as tf

Benchmarking

  • Measure inference speed with the YDF benchmark_inference tool.
  • Example benchmark command (the leading ! runs it from a notebook cell; drop it in a regular shell):
      !./benchmark_inference \
        --model=my_model/assets \
        --dataset=csv:dataset.csv \
        --batch_size=100 \
        --warmup_runs=10 \
        --num_runs=50
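For a rough comparison from the Python side, a timing loop can mirror the warmup_runs and num_runs idea of benchmark_inference. This is a hedged sketch, not the YDF tool: `time_per_example` is a hypothetical helper, and its numbers include Python and Keras overhead, so they are not directly comparable to the C++ benchmark.

```python
import time
from typing import Any, Callable, Mapping, Sized


def time_per_example(predict_fn: Callable[[Any], Any],
                     batch: Mapping[str, Sized],
                     warmup_runs: int = 10,
                     num_runs: int = 50) -> float:
    """Hypothetical helper: average wall-clock seconds per example for predict_fn(batch)."""
    # All feature arrays share the same length; use the first one.
    num_examples = len(next(iter(batch.values())))
    # Warm-up calls are not timed (e.g. to let graph tracing and caches settle).
    for _ in range(warmup_runs):
        predict_fn(batch)
    start = time.perf_counter()
    for _ in range(num_runs):
        predict_fn(batch)
    elapsed = time.perf_counter() - start
    return elapsed / (num_runs * num_examples)
```

With a trained TF-DF model, a call such as time_per_example(model.predict_on_batch, batch) on a dictionary of 100 float32 rows gives seconds per example, analogous to --batch_size=100.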

Conclusion

This lecture showed several ways to make predictions with TensorFlow Decision Forests: from Pandas DataFrames, from hand-built tf.data datasets, from dictionaries of NumPy arrays, and through the YDF command-line tools. The choice of input method and prediction function affects both convenience and inference speed, which matters most for large datasets and production environments.