Stochastic Inventory Management Optimization
Jul 27, 2024
Overview
Continuation of inventory management topics
Previously discussed: EOQ model, Newsvendor Problem, Deterministic Dynamic Inventory Optimization
Focus: Stochastic Dynamic Inventory Management
Introduction to applications of reinforcement learning
Key Concepts
Stochastic Inventory Problem
Demand is uncertain and probabilistically distributed (demand distribution known)
This distinguishes it from the deterministic dynamic inventory problem
Decision-making involves ordering quantities based on current inventory and demand forecast
Costs: ordering cost (Co) and holding/carrying cost (Ch)
Assumptions
Demand is discrete and an IID random variable with a stationary pmf
Instantaneous delivery (no lead time)
Finite warehouse capacity (state variable ≤ M)
Non-perishable inventory
Lost sales assumption (unmet demand is lost)
Sequence of Events
Observe current inventory, place an order, then update inventory for the next time period
Immediate costs incurred: ordering cost and holding cost
Focus on optimizing trade-off between ordering and holding costs
State updates based on realized demand
References
Martin Puterman's book on MDPs (Markov Decision Processes)
Abhijit Gosavi's Simulation-Based Optimization
Use of Bellman equation for policy/value iteration
Standard notation follows Puterman's book
Model Assumptions and Definitions
Model Assumptions
Demand is discrete
No lead time
IID demand
Inventory is non-perishable
Lost sales scenario
Sequence of Events in Each Time Period
Start of period: Observe beginning inventory (St)
Place order (At)
Demand realization (Dt)
End of period: Compute next beginning inventory, St+1 = max(St + At - Dt, 0) (see the one-period sketch below)
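As a complement to the event sequence above, here is a minimal one-period simulation sketch (not the lecture's code): the price and cost values are placeholders, and charging the holding cost on end-of-period inventory is an assumed convention.

```python
import numpy as np

rng = np.random.default_rng(0)

def step(s_t, a_t, demand_pmf, price=8.0, c_o=2.0, c_h=1.0):
    """One period: observe inventory s_t, order a_t, realize demand, update state.
    price/c_o/c_h are placeholder values; holding cost is charged on ending
    inventory here, which may differ from the lecture's convention."""
    d_t = rng.choice(len(demand_pmf), p=demand_pmf)  # demand drawn from the known pmf
    sales = min(s_t + a_t, d_t)                      # unmet demand is lost, not backordered
    s_next = max(s_t + a_t - d_t, 0)                 # next period's beginning inventory
    reward = price * sales - c_o * a_t - c_h * s_next
    return s_next, reward

# Example: start with 2 units, order 3, demand over {0, 1, 2}
print(step(2, 3, np.array([0.2, 0.5, 0.3])))
```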
Bellman Equation for MDPs
Used for solving dynamic programming problems
Takes into account current state, action, and expected future rewards
Immediate Reward Function
Considers revenue, ordering cost, and holding cost
R(St, At) = revenue (price × units sold) − ordering cost (Co) − holding cost (Ch)
State Transition
State update: St+1 = max(St + At - Dt, 0) (lost sales scenario)
Transition probabilities follow from the demand distribution (the Bellman equation for this model is written out below)
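Written out in the notation above, one common form of the finite-horizon Bellman equation for this model is sketched below; whether holding cost is charged on post-order or ending inventory, and how revenue enters, should follow the lecture's exact reward definition.

```latex
% V_t(s): optimal expected total reward from period t onward with inventory s;
% p_j = P(D_t = j); M = warehouse capacity.
\[
V_t(s) = \max_{a:\; s + a \le M} \Big[\, R(s, a)
       + \sum_{j \ge 0} p_j \, V_{t+1}\big(\max(s + a - j,\, 0)\big) \Big]
\]
\[
R(s, a) = \sum_{j \ge 0} p_j \cdot \text{price} \cdot \min(s + a,\, j)
        \;-\; C_o\, a \;-\; C_h\,(s + a)
\]
```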
Formulating the Problem as an MDP
Components of the MDP
Stage Variable:
Time (days, weeks, etc.)
State Variable:
Inventory level (St)
Action Variable:
Order quantity (At)
Transition Probabilities
Governed by the demand distribution (pmf: p_j = P(demand = j))
They determine the next state, conditional on the current state and action
Feasible orders are limited by the warehouse capacity (M); see the sketch below
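A small sketch (my own illustration, not the lecture's code) of how the transition probabilities P(s' | s, a) follow from the demand pmf p_j under lost sales and capacity M:

```python
import numpy as np

def transition_matrices(M, demand_pmf):
    """Return P[a], an (M+1) x (M+1) matrix with P[a][s, s'] = P(S_{t+1}=s' | S_t=s, A_t=a),
    under lost sales; orders with s + a > M are treated as infeasible (rows left at zero)."""
    n = M + 1
    P = {a: np.zeros((n, n)) for a in range(n)}
    for a in range(n):
        for s in range(n):
            if s + a > M:
                continue  # infeasible: order would exceed warehouse capacity
            for j, p_j in enumerate(demand_pmf):
                P[a][s, max(s + a - j, 0)] += p_j  # demand beyond stock collapses to state 0
    return P
```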
Objective Function
Maximization of total reward over a time horizon (includes immediate and expected future rewards)
Example Policies
(s, S) policy (order-up-to-S):
Order up to level S if beginning inventory < S; otherwise don't order
Example of a stationary policy (sketched below)
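The rule above can be written as a one-line stationary policy; a minimal sketch, where S is an illustrative target level and the order is capped by the capacity M:

```python
def order_up_to(s_t, S, M):
    """Order-up-to-S policy: raise inventory to S whenever it is below S, else order nothing.
    The order is capped so that s_t + a_t never exceeds the warehouse capacity M."""
    return max(min(S, M) - s_t, 0)
```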
Numerical Example
Fixed numerical values for costs and revenues
Deterministic time horizon: 3 periods
Demand probabilities provided
Solved using the Bellman equation via backward induction (see the sketch below)
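The lecture's actual price, costs, and demand probabilities are not reproduced in these notes, so the sketch below solves a 3-period instance by backward induction on the Bellman equation with placeholder numbers; all numeric values are hypothetical and should be swapped for the lecture's.

```python
import numpy as np

# Hypothetical parameters -- substitute the lecture's values.
M, T = 5, 3                       # warehouse capacity, number of periods
price, c_o, c_h = 8.0, 2.0, 1.0   # per-unit revenue, ordering cost, holding cost
pmf = np.array([0.2, 0.5, 0.3])   # P(D = 0), P(D = 1), P(D = 2)

def expected_reward(s, a):
    """Expected one-period reward for inventory s and order a (lost sales)."""
    revenue = sum(p * price * min(s + a, j) for j, p in enumerate(pmf))
    return revenue - c_o * a - c_h * (s + a)

V = np.zeros(M + 1)               # terminal values V_T(s) = 0
for t in reversed(range(T)):      # backward induction: t = T-1, ..., 0
    V_new = np.zeros(M + 1)
    policy = np.zeros(M + 1, dtype=int)
    for s in range(M + 1):
        best = -np.inf
        for a in range(M + 1 - s):                    # keep s + a <= M
            cont = sum(p * V[max(s + a - j, 0)] for j, p in enumerate(pmf))
            q = expected_reward(s, a) + cont
            if q > best:
                best, policy[s] = q, a
        V_new[s] = best
    V = V_new

print("Period-0 values:", np.round(V, 2))
print("Period-0 policy:", policy)   # optimal order quantity for each inventory level
```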
Advanced Topics
Value Iteration
Iterative algorithm to calculate the value of being in each state
Uses the Bellman equation to iteratively update state values until convergence (sketched below)
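For an infinite-horizon, discounted version of the same model, a minimal value-iteration sketch; the discount factor gamma, the tolerance, and the reward convention below are assumptions for illustration.

```python
import numpy as np

def value_iteration(M, pmf, price, c_o, c_h, gamma=0.95, tol=1e-8):
    """Discounted value iteration for the lost-sales inventory MDP."""
    n = M + 1
    V = np.zeros(n)
    while True:
        V_new = np.empty(n)
        for s in range(n):
            best = -np.inf
            for a in range(n - s):        # feasible orders keep s + a <= M
                reward = (sum(p * price * min(s + a, j) for j, p in enumerate(pmf))
                          - c_o * a - c_h * (s + a))
                cont = sum(p * V[max(s + a - j, 0)] for j, p in enumerate(pmf))
                best = max(best, reward + gamma * cont)
            V_new[s] = best
        if np.max(np.abs(V_new - V)) < tol:   # stop once the values have converged
            return V_new
        V = V_new
```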
Policy Iteration
Combines policy evaluation and improvement steps
Iteratively refines the policy to maximize rewards (sketched below)
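A compact policy-iteration sketch over generic MDP inputs; P[a] and R could be built from the inventory model above (infeasible orders should carry a very negative reward), and gamma is an assumed discount factor.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.95):
    """Policy iteration. P[a] is an n x n transition matrix for action a,
    R[s, a] is the expected immediate reward (use a large negative value
    for infeasible actions). Returns the optimal policy and its values."""
    n_states, n_actions = R.shape
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = np.array([P[policy[s]][s] for s in range(n_states)])
        R_pi = R[np.arange(n_states), policy]
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to V.
        Q = np.array([[R[s, a] + gamma * P[a][s] @ V for a in range(n_actions)]
                      for s in range(n_states)])
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):   # stable policy => optimal
            return policy, V
        policy = new_policy
```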
Q-Learning
Model-free learning method for MDPs
Used when transition probabilities are unknown
Q-value updates converge to the optimal policy without a model (sketched below)
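A tabular Q-learning sketch: the transition probabilities never appear, only sampled transitions from an environment function. The step size, exploration rate, discount, and episode counts are illustrative assumptions, and env_step is a hypothetical stand-in for a simulator such as the one-period step sketched earlier.

```python
import numpy as np

def q_learning(env_step, n_states, n_actions, episodes=5000,
               horizon=50, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Tabular Q-learning; env_step(s, a, rng) -> (next_state, reward) samples
    a transition, so the demand distribution can remain unknown to the agent."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = int(rng.integers(n_states))            # random starting inventory level
        for _ in range(horizon):
            # epsilon-greedy exploration
            a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
            s_next, r = env_step(s, a, rng)
            # move Q(s, a) toward the bootstrapped target r + gamma * max_a' Q(s', a')
            Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
            s = s_next
    return Q.argmax(axis=1), Q                     # greedy policy and learned Q-table
```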
Conclusion
Python demonstration of solving the numerical example
Introduction to more advanced methods (TD, etc.) in subsequent sessions