Lecture on Making Predictions with TensorFlow Decision Forests (TF-DF)

Overview

This lecture covers how to make predictions with the TensorFlow Decision Forests (TF-DF) Python API. It walks through several ways to feed data to a model for prediction and shows how to benchmark model inference speed.

Key Points

The TF-DF Python API is easy to use and well suited to experimentation.
For production, other APIs such as TensorFlow Serving and the C++ API are recommended because they are faster and more stable.
Consistency in Datasets: the dataset used for prediction must have the same feature names and types as the training dataset; otherwise, prediction can fail with errors.

Methods for Making Predictions

1. Using `model.predict()` with pd_dataframe_to_tf_dataset

Convert a pandas DataFrame to a TensorFlow dataset with tfdf.keras.pd_dataframe_to_tf_dataset().
Example:
    pd_dataset = pd.DataFrame({
        "feature_1": [1, 2, 3],
        "feature_2": ["a", "b", "c"],
        "label": [0, 1, 0],
    })
    tf_dataset = tfdf.keras.pd_dataframe_to_tf_dataset(pd_dataset, label="label")
Then pass the resulting TensorFlow dataset to model.predict() to generate predictions.

2. Using `model.predict()` with Manual TF Datasets

Suitable for large datasets.
Create the dataset with tf.data.Dataset.from_tensor_slices() and batch it.
TF-DF expects each dataset element to be a (features, label) or (features, label, weights) tuple, where features is a dictionary mapping feature names to values.
Example dataset:
    tf_dataset = tf.data.Dataset.from_tensor_slices((
        {"feature_1": [1, 2, 3], "feature_2": [4, 5, 6]},
        [0, 1, 0],
    )).batch(2)

3. Using `model.predict()` and `model.predict_on_batch()` on Dictionaries

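Both functions accept a dictionary of NumPy arrays, one array per feature.
model.predict() splits the input into batches automatically; model.predict_on_batch() processes the whole input as a single batch, which gives you control over batching.
Example (first 10 predictions):
    model.predict({
        "feature_1": np.random.rand(100),
        "feature_2": np.random.rand(100),
    }, verbose=0)[:10]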

4. Inference with the YDF Format

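Runs a TF-DF model, trained and saved in Python, with the Yggdrasil Decision Forests (YDF) command-line (CLI) tools.
The YDF benchmark tool can then measure the model's inference speed.
Save the model and the serving dataset:
    model.save("my_model")
    pd_serving_dataset.to_csv("dataset.csv")
Generate predictions with the YDF predict tool, which reads the model files from the assets directory of the saved model:
    ./predict --model=my_model/assets --dataset=csv:dataset.csv --output=csv:predictions.csv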

Important Remarks

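Prediction Consistency: Ensure that feature names and types match between the training and prediction datasets.
API Differences: A model created and trained in the current Python session applies some type conversions automatically; a model loaded from disk does not, so the feature types passed to it must match the training types exactly.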
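One way to enforce the consistency rule before calling predict() is a small pandas check. align_to_training_schema is a hypothetical helper, not part of TF-DF: it rejects missing features, drops extra columns, and casts each feature to its training dtype. It assumes the training dtypes are taken from the feature columns only, without the label.

```python
import pandas as pd


def align_to_training_schema(df: pd.DataFrame, train_dtypes: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: match feature names, order and dtypes to training."""
    missing = set(train_dtypes.index) - set(df.columns)
    if missing:
        raise ValueError(f"missing features: {sorted(missing)}")
    # Keep only the training features, in training order, cast to training dtypes.
    return df[list(train_dtypes.index)].astype(train_dtypes.to_dict())


# Training features (label excluded): feature_1 is float, feature_2 is categorical.
train_features = pd.DataFrame({"feature_1": [1.0, 2.0, 3.0], "feature_2": ["a", "b", "c"]})

# Serving data with shuffled columns, an int feature_1 and an extra column.
serving_df = pd.DataFrame({"feature_2": ["b", "a"], "feature_1": [4, 5], "extra": [0, 0]})

aligned = align_to_training_schema(serving_df, train_features.dtypes)
print(aligned.dtypes)
```

Casting on the pandas side keeps the check independent of whether the model was trained in the current session or loaded from disk.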

Installation and Setup

Install TensorFlow Decision Forests:
    pip install tensorflow_decision_forests
Import the necessary libraries:
    import tensorflow_decision_forests as tfdf
    import numpy as np
    import pandas as pd
    import tensorflow as tf

Benchmarking

Measure inference speed with the YDF benchmark_inference tool.
Example benchmark command:
    ./benchmark_inference \
      --model=my_model/assets \
      --dataset=csv:dataset.csv \
      --batch_size=100 \
      --warmup_runs=10 \
      --num_runs=50

Conclusion

This lecture covered several ways to make predictions with TensorFlow Decision Forests. Choosing the right input method and prediction function affects inference speed, which matters most for large datasets and production environments.
