Stochastic Inventory Management Optimization

Jul 27, 2024

Overview

  • Continuation of inventory management topics
  • Previously discussed: EOQ model, Newsvendor Problem, Deterministic Dynamic Inventory Optimization
  • Focus: Stochastic Dynamic Inventory Management
  • Introduction to applications of reinforcement learning

Key Concepts

Stochastic Inventory Problem

  • Demand is uncertain and follows a known probability distribution
  • This differs from the deterministic dynamic inventory problem, where demand is known in advance
  • Decision-making: choose an order quantity based on the current inventory and the demand distribution
  • Costs: ordering cost (Co) and holding cost (Ch)

Assumptions

  • Demand is a discrete IID random variable with a stationary pmf
  • Instantaneous delivery (no lead time)
  • Finite warehouse capacity (state variable ≤ M)
  • Non-perishable inventory
  • Lost sales assumption (unmet demand is lost)

Sequence of Events

  • Observe current inventory, place an order, then update inventory for the next period
  • Immediate costs incurred: ordering cost and holding cost
  • Focus on optimizing trade-off between ordering and holding costs
  • State updates based on realized demand

References

  • Martin L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming
  • Abhijit Gosavi, Simulation-Based Optimization
  • Use of the Bellman equation for policy/value iteration
  • Standard notation follows Puterman's book

Model Assumptions and Definitions

Model Assumptions

  • Demand is discrete
  • No lead time
  • IID demand
  • Inventory is non-perishable
  • Lost sales scenario

Sequence of Events in Each Time Period

  1. Start of period: Observe beginning inventory (St)
  2. Place order (At)
  3. Demand realization (Dt)
  4. End of period: Compute the next beginning inventory, max(St + At - Dt, 0) (see the sketch below)
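
A minimal Python sketch of one such period under the lost-sales assumption; the demand pmf and the state/order values below are hypothetical, not numbers from the notes:

    import random

    def step(s, a, demand_pmf):
        """Simulate one period: start with inventory s, order a, realize demand."""
        # Sample demand D_t from the pmf {demand value: probability}.
        d = random.choices(list(demand_pmf), weights=list(demand_pmf.values()))[0]
        sales = min(s + a, d)      # can sell at most the post-order stock
        s_next = s + a - sales     # lost sales: unmet demand simply disappears
        return s_next, sales, d

    # Hypothetical demand pmf and one simulated period.
    pmf = {0: 0.25, 1: 0.50, 2: 0.25}
    s_next, sales, d = step(s=1, a=2, demand_pmf=pmf)
    print(f"demand={d}, sold={sales}, next inventory={s_next}")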

Bellman Equation for MDPs

  • Used for solving dynamic programming problems
  • Takes into account the current state, action, and expected future rewards (written out below)
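
In the notation used here (roughly following Puterman), the finite-horizon Bellman equation for this problem can be written as follows, assuming orders cannot exceed the remaining warehouse capacity and taking the terminal value as zero:

    V_t(s) = \max_{0 \le a \le M - s} \Big\{ R(s, a)
             + \sum_{j \ge 0} p_j \, V_{t+1}\big(\max(s + a - j,\ 0)\big) \Big\},
    \qquad V_{T+1}(\cdot) \equiv 0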

Immediate Reward Function

  • Considers revenue, ordering cost, and holding cost
  • R(St, At) = expected revenue from sales (price × expected units sold) – ordering cost (Co) – holding cost (Ch); a Python sketch follows
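
A hedged sketch of this reward in Python: it charges the holding cost on the post-order inventory s + a and assumes a per-unit ordering cost, and the price and cost values are placeholders rather than figures from the notes:

    def expected_reward(s, a, pmf, price=8.0, Co=2.0, Ch=1.0):
        """Expected one-period reward R(s, a) for the lost-sales model.

        revenue : price * E[min(s + a, D)]  -- cannot sell more than on hand
        ordering: Co per unit ordered
        holding : Ch per unit of post-order inventory (s + a)
        """
        expected_sales = sum(p * min(s + a, d) for d, p in pmf.items())
        return price * expected_sales - Co * a - Ch * (s + a)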

State Transition

  • State update: St+1 = max(St + At - Dt, 0) (lost sales scenario)
  • The corresponding state transition probabilities are written out below
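
Spelled out, the transition probabilities implied by this update are (writing p_j for the demand pmf):

    p(s' \mid s, a) =
    \begin{cases}
      p_{\,s + a - s'} & \text{if } 0 < s' \le s + a,\\
      \sum_{j \ge s + a} p_j & \text{if } s' = 0,\\
      0 & \text{if } s' > s + a.
    \end{cases}

The s' = 0 case lumps together every demand realization at or above the post-order stock, which is exactly the lost-sales truncation.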

Formulating the Problem as an MDP

Components of the MDP

  • Stage Variable: Time (days, weeks, etc.)
  • State Variable: Inventory level (St)
  • Action Variable: Order quantity (At)

Transition Probabilities

  • Governed by the demand distribution (pmf p_j = P(Dt = j))
  • Transition probabilities give the distribution of the next state conditioned on the current state and action
  • Feasible states and orders are bounded by the warehouse capacity (M); see the helper below
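
The same probabilities as a small Python helper; the capacity M and the pmf in the example call are illustrative assumptions:

    def transition_probs(s, a, pmf, M):
        """Return [P(s'=0|s,a), ..., P(s'=M|s,a)] under lost sales."""
        probs = [0.0] * (M + 1)
        for j, pj in pmf.items():
            s_next = max(s + a - j, 0)   # demands >= s+a all collapse to state 0
            probs[s_next] += pj
        return probs

    # Example with a hypothetical pmf; each row of the transition matrix sums to 1.
    pmf = {0: 0.25, 1: 0.50, 2: 0.25}
    print(transition_probs(s=1, a=1, pmf=pmf, M=3))   # [0.25, 0.5, 0.25, 0.0]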

Objective Function

  • Maximize total reward over the time horizon: immediate reward plus expected future rewards (see the expression below)
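
For a finite horizon of T periods this objective can be written as below; a discount factor can be added for the infinite-horizon variant:

    \max_{\pi} \; \mathbb{E}^{\pi} \Bigg[ \sum_{t=1}^{T} R(S_t, A_t) \Bigg]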

Example Policies

  • (s, S) policy: if beginning inventory falls below s, order up to level S; otherwise don’t order (the example takes s = S, i.e. an order-up-to rule)
  • An example of a stationary policy: the same rule is applied in every period (see the sketch below)
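
A one-line Python version of this rule; the threshold S is whatever level the modeller picks:

    def order_up_to(inventory, S):
        """Base-stock rule: order enough to raise inventory to S, else nothing."""
        return max(S - inventory, 0)

Because the order depends only on the current state and not on the period index, this is a stationary policy.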

Numerical Example

  • Fixed numerical values for costs and revenues
  • Fixed, finite time horizon: 3 periods
  • Demand probabilities provided
  • Solved using the Bellman equation via backward induction (a placeholder-value sketch follows)
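
The notes do not record the actual cost, revenue, or probability values used, so the sketch below solves a 3-period instance by backward induction on the Bellman equation above; every number in it is a made-up placeholder:

    # Backward induction for a 3-period lost-sales problem (all numbers hypothetical).
    M = 3                                  # warehouse capacity
    pmf = {0: 0.25, 1: 0.50, 2: 0.25}      # demand pmf p_j
    price, Co, Ch = 8.0, 2.0, 1.0          # revenue and cost parameters
    T = 3                                  # planning horizon

    def reward(s, a):
        exp_sales = sum(p * min(s + a, d) for d, p in pmf.items())
        return price * exp_sales - Co * a - Ch * (s + a)

    V = [0.0] * (M + 1)                    # terminal values V_{T+1} = 0
    policy = []
    for t in range(T, 0, -1):              # sweep t = T, ..., 1
        V_new, pi_t = [0.0] * (M + 1), [0] * (M + 1)
        for s in range(M + 1):
            best = float("-inf")
            for a in range(M - s + 1):     # orders must respect capacity M
                q = reward(s, a) + sum(
                    p * V[max(s + a - d, 0)] for d, p in pmf.items())
                if q > best:
                    best, pi_t[s] = q, a
            V_new[s] = best
        V, policy = V_new, [pi_t] + policy

    print("V_1(s) for s = 0..M:", [round(v, 2) for v in V])
    print("optimal period-1 orders:", policy[0])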

Advanced Topics

Value Iteration

  • Iterative algorithm to calculate the value of being in each state
  • Uses the Bellman equation to iteratively find state values (see the sketch below)
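
A compact sketch of value iteration for the discounted, infinite-horizon version of the same model; parameters are again illustrative:

    # Value iteration: apply the Bellman operator until values stop moving.
    M, gamma, tol = 3, 0.95, 1e-8
    pmf = {0: 0.25, 1: 0.50, 2: 0.25}
    price, Co, Ch = 8.0, 2.0, 1.0

    def reward(s, a):
        return (price * sum(p * min(s + a, d) for d, p in pmf.items())
                - Co * a - Ch * (s + a))

    V = [0.0] * (M + 1)
    while True:
        V_new = [max(reward(s, a) + gamma * sum(
                         p * V[max(s + a - d, 0)] for d, p in pmf.items())
                     for a in range(M - s + 1))
                 for s in range(M + 1)]
        if max(abs(x - y) for x, y in zip(V, V_new)) < tol:
            break
        V = V_new
    print("discounted state values:", [round(v, 2) for v in V_new])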

Policy Iteration

  • Combines policy evaluation and improvement steps
  • Iteratively refines the policy to maximize rewards (see the sketch below)
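
A sketch of policy iteration for the same discounted model; policy evaluation is done here by simple fixed-point iteration rather than by solving the linear system exactly:

    # Policy iteration: evaluate the current policy, then improve it greedily.
    M, gamma = 3, 0.95
    pmf = {0: 0.25, 1: 0.50, 2: 0.25}
    price, Co, Ch = 8.0, 2.0, 1.0

    def q_value(s, a, V):
        r = (price * sum(p * min(s + a, d) for d, p in pmf.items())
             - Co * a - Ch * (s + a))
        return r + gamma * sum(p * V[max(s + a - d, 0)] for d, p in pmf.items())

    policy = [0] * (M + 1)                       # start from "never order"
    while True:
        V = [0.0] * (M + 1)
        for _ in range(2000):                    # approximate policy evaluation
            V = [q_value(s, policy[s], V) for s in range(M + 1)]
        new_policy = [max(range(M - s + 1), key=lambda a: q_value(s, a, V))
                      for s in range(M + 1)]     # greedy improvement
        if new_policy == policy:                 # stable policy => done
            break
        policy = new_policy
    print("policy-iteration orders:", policy)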

Q-Learning

  • Model-free learning method for MDPs
  • Used when transition probabilities are unknown
  • Updates Q-values from sampled transitions to find an optimal policy (see the sketch below)
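
A tabular Q-learning sketch for the same environment; note the update itself uses only sampled transitions and never the pmf, which is the point of the method (the hyperparameters are arbitrary placeholders):

    import random

    M, gamma, alpha, eps = 3, 0.95, 0.1, 0.1
    pmf = {0: 0.25, 1: 0.50, 2: 0.25}            # used only to simulate demand
    price, Co, Ch = 8.0, 2.0, 1.0

    Q = [[0.0] * (M - s + 1) for s in range(M + 1)]  # Q[s][a], feasible a only
    s = 0
    for _ in range(200_000):
        # Epsilon-greedy action selection over feasible orders.
        if random.random() < eps:
            a = random.randrange(M - s + 1)
        else:
            a = max(range(M - s + 1), key=lambda x: Q[s][x])
        d = random.choices(list(pmf), weights=list(pmf.values()))[0]
        r = price * min(s + a, d) - Co * a - Ch * (s + a)   # realized reward
        s_next = max(s + a - d, 0)                          # lost-sales update
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next
    print("greedy orders:", [max(range(M - s + 1), key=lambda x: Q[s][x])
                             for s in range(M + 1)])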

Conclusion

  • Python demonstration of solving the numerical example problem
  • Introduction to more advanced methods (temporal-difference learning, etc.) in subsequent sessions