Stochastic Inventory Management Optimization
Jul 27, 2024
Overview
Continuation of inventory management topics
Previously discussed: EOQ model, Newsvendor Problem, Deterministic Dynamic Inventory Optimization
Focus: Stochastic Dynamic Inventory Management
Introduction to applications of reinforcement learning
Key Concepts
Stochastic Inventory Problem
Demand is uncertain and probabilistically distributed (demand distribution known)
This distinguishes it from the deterministic dynamic inventory problem
Decision-making involves ordering quantities based on current inventory and demand forecast
Costs: ordering cost (Co) and holding/carrying cost (Ch)
Assumptions
Demand is discrete and an IID random variable with a stationary pmf
Instantaneous delivery (no lead time)
Finite warehouse capacity (state variable ≤ M)
Non-perishable inventory
Lost sales assumption (unmet demand is lost)
Sequence of Events
Observe current inventory, place an order, then update inventory for the next time period
Immediate costs incurred: ordering cost and holding cost
Focus on optimizing trade-off between ordering and holding costs
State updates based on realized demand
References
Martin Puterman's book on MDPs (Markov Decision Processes)
Abhijit Gosavi's Simulation-Based Optimization
Use of Bellman equation for policy/value iteration
Standard notation follows Puterman's book
Model Assumptions and Definitions
Model Assumptions
Demand is discrete
No lead time
IID demand
Inventory is non-perishable
Lost sales scenario
Sequence of Events in Each Time Period
Start of period: Observe beginning inventory (St)
Place order (At)
Demand realization (Dt)
End of period: Compute next beginning inventory, St+1 = max(St + At - Dt, 0) (see the one-period sketch below)
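As a complement to the event sequence above, here is a minimal one-period simulation sketch (not the lecture's code): the price and cost values are placeholders, and charging the holding cost on end-of-period inventory is an assumed convention.

```python
import numpy as np

rng = np.random.default_rng(0)

def step(s_t, a_t, demand_pmf, price=8.0, c_o=2.0, c_h=1.0):
    """One period: observe inventory s_t, order a_t, realize demand, update state.
    price/c_o/c_h are placeholder values; holding cost is charged on ending
    inventory here, which may differ from the lecture's convention."""
    d_t = rng.choice(len(demand_pmf), p=demand_pmf)  # demand drawn from the known pmf
    sales = min(s_t + a_t, d_t)                      # unmet demand is lost, not backordered
    s_next = max(s_t + a_t - d_t, 0)                 # next period's beginning inventory
    reward = price * sales - c_o * a_t - c_h * s_next
    return s_next, reward

# Example: start with 2 units, order 3, demand over {0, 1, 2}
print(step(2, 3, np.array([0.2, 0.5, 0.3])))
```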
Bellman Equation for MDPs
Used for solving dynamic programming problems
Takes into account current state, action, and expected future rewards
Immediate Reward Function
Considers revenue, ordering cost, and holding cost
R(St, At) = revenue (price × units sold) − ordering cost (Co) − holding cost (Ch)
State Transition
State update: St+1 = max(St + At - Dt, 0) (lost sales scenario)
Transition probabilities follow from the demand distribution (the Bellman equation for this model is written out below)
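Written out in the notation above, one common form of the finite-horizon Bellman equation for this model is sketched below; whether holding cost is charged on post-order or ending inventory, and how revenue enters, should follow the lecture's exact reward definition.

```latex
% V_t(s): optimal expected total reward from period t onward with inventory s;
% p_j = P(D_t = j); M = warehouse capacity.
\[
V_t(s) = \max_{a:\; s + a \le M} \Big[\, R(s, a)
       + \sum_{j \ge 0} p_j \, V_{t+1}\big(\max(s + a - j,\, 0)\big) \Big]
\]
\[
R(s, a) = \sum_{j \ge 0} p_j \cdot \text{price} \cdot \min(s + a,\, j)
        \;-\; C_o\, a \;-\; C_h\,(s + a)
\]
```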
Formulating the Problem as an MDP
Components of the MDP
Stage Variable:
Time (days, weeks, etc.)
State Variable:
Inventory level (St)
Action Variable:
Order quantity (At)
Transition Probabilities
Governed by the demand distribution (pmf: p_j = P(demand = j))
They determine the next state, conditional on the current state and action
Feasible orders are limited by the warehouse capacity (M); see the sketch below
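A small sketch (my own illustration, not the lecture's code) of how the transition probabilities P(s' | s, a) follow from the demand pmf p_j under lost sales and capacity M:

```python
import numpy as np

def transition_matrices(M, demand_pmf):
    """Return P[a], an (M+1) x (M+1) matrix with P[a][s, s'] = P(S_{t+1}=s' | S_t=s, A_t=a),
    under lost sales; orders with s + a > M are treated as infeasible (rows left at zero)."""
    n = M + 1
    P = {a: np.zeros((n, n)) for a in range(n)}
    for a in range(n):
        for s in range(n):
            if s + a > M:
                continue  # infeasible: order would exceed warehouse capacity
            for j, p_j in enumerate(demand_pmf):
                P[a][s, max(s + a - j, 0)] += p_j  # demand beyond stock collapses to state 0
    return P
```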
Objective Function
Maximization of total reward over a time horizon (includes immediate and expected future rewards)
Example Policies
(s, S) policy (order-up-to-S):
Order up to level S if beginning inventory < S; otherwise don't order
Example of a stationary policy (sketched below)
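The rule above can be written as a one-line stationary policy; a minimal sketch, where S is an illustrative target level and the order is capped by the capacity M:

```python
def order_up_to(s_t, S, M):
    """Order-up-to-S policy: raise inventory to S whenever it is below S, else order nothing.
    The order is capped so that s_t + a_t never exceeds the warehouse capacity M."""
    return max(min(S, M) - s_t, 0)
```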
Numerical Example
Fixed numerical values for costs and revenues
Deterministic time horizon: 3 periods
Demand probabilities provided
Solved using the Bellman equation via backward induction (see the sketch below)
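The lecture's actual price, costs, and demand probabilities are not reproduced in these notes, so the sketch below solves a 3-period instance by backward induction on the Bellman equation with placeholder numbers; all numeric values are hypothetical and should be swapped for the lecture's.

```python
import numpy as np

# Hypothetical parameters -- substitute the lecture's values.
M, T = 5, 3                       # warehouse capacity, number of periods
price, c_o, c_h = 8.0, 2.0, 1.0   # per-unit revenue, ordering cost, holding cost
pmf = np.array([0.2, 0.5, 0.3])   # P(D = 0), P(D = 1), P(D = 2)

def expected_reward(s, a):
    """Expected one-period reward for inventory s and order a (lost sales)."""
    revenue = sum(p * price * min(s + a, j) for j, p in enumerate(pmf))
    return revenue - c_o * a - c_h * (s + a)

V = np.zeros(M + 1)               # terminal values V_T(s) = 0
for t in reversed(range(T)):      # backward induction: t = T-1, ..., 0
    V_new = np.zeros(M + 1)
    policy = np.zeros(M + 1, dtype=int)
    for s in range(M + 1):
        best = -np.inf
        for a in range(M + 1 - s):                    # keep s + a <= M
            cont = sum(p * V[max(s + a - j, 0)] for j, p in enumerate(pmf))
            q = expected_reward(s, a) + cont
            if q > best:
                best, policy[s] = q, a
        V_new[s] = best
    V = V_new

print("Period-0 values:", np.round(V, 2))
print("Period-0 policy:", policy)   # optimal order quantity for each inventory level
```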
Advanced Topics
Value Iteration
Iterative algorithm to calculate the value of being in each state
Uses the Bellman equation to iteratively update state values until convergence (sketched below)
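For an infinite-horizon, discounted version of the same model, a minimal value-iteration sketch; the discount factor gamma, the tolerance, and the reward convention below are assumptions for illustration.

```python
import numpy as np

def value_iteration(M, pmf, price, c_o, c_h, gamma=0.95, tol=1e-8):
    """Discounted value iteration for the lost-sales inventory MDP."""
    n = M + 1
    V = np.zeros(n)
    while True:
        V_new = np.empty(n)
        for s in range(n):
            best = -np.inf
            for a in range(n - s):        # feasible orders keep s + a <= M
                reward = (sum(p * price * min(s + a, j) for j, p in enumerate(pmf))
                          - c_o * a - c_h * (s + a))
                cont = sum(p * V[max(s + a - j, 0)] for j, p in enumerate(pmf))
                best = max(best, reward + gamma * cont)
            V_new[s] = best
        if np.max(np.abs(V_new - V)) < tol:   # stop once the values have converged
            return V_new
        V = V_new
```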
Policy Iteration
Combines policy evaluation and improvement steps
Iteratively refines the policy to maximize rewards (sketched below)
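A compact policy-iteration sketch over generic MDP inputs; P[a] and R could be built from the inventory model above (infeasible orders should carry a very negative reward), and gamma is an assumed discount factor.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.95):
    """Policy iteration. P[a] is an n x n transition matrix for action a,
    R[s, a] is the expected immediate reward (use a large negative value
    for infeasible actions). Returns the optimal policy and its values."""
    n_states, n_actions = R.shape
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = np.array([P[policy[s]][s] for s in range(n_states)])
        R_pi = R[np.arange(n_states), policy]
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to V.
        Q = np.array([[R[s, a] + gamma * P[a][s] @ V for a in range(n_actions)]
                      for s in range(n_states)])
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):   # stable policy => optimal
            return policy, V
        policy = new_policy
```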
Q-Learning
Model-free learning method for MDPs
Used when transition probabilities are unknown
Q-value updates converge to the optimal policy without a model (sketched below)
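A tabular Q-learning sketch: the transition probabilities never appear, only sampled transitions from an environment function. The step size, exploration rate, discount, and episode counts are illustrative assumptions, and env_step is a hypothetical stand-in for a simulator such as the one-period step sketched earlier.

```python
import numpy as np

def q_learning(env_step, n_states, n_actions, episodes=5000,
               horizon=50, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Tabular Q-learning; env_step(s, a, rng) -> (next_state, reward) samples
    a transition, so the demand distribution can remain unknown to the agent."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = int(rng.integers(n_states))            # random starting inventory level
        for _ in range(horizon):
            # epsilon-greedy exploration
            a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
            s_next, r = env_step(s, a, rng)
            # move Q(s, a) toward the bootstrapped target r + gamma * max_a' Q(s', a')
            Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
            s = s_next
    return Q.argmax(axis=1), Q                     # greedy policy and learned Q-table
```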
Conclusion
Python demonstration of solving the numerical example
Introduction to more advanced methods (TD, etc.) in subsequent sessions