Abstract
This paper presents a novel reinforcement learning (RL) framework for adaptive order execution in cryptocurrency markets. Our approach uses a Proximal Policy Optimization (PPO) agent trained on historical order book data to minimize execution slippage while maximizing fill rates across varying market conditions.
Introduction
Large order execution in cryptocurrency markets presents unique challenges compared to traditional finance. Thin order books, fragmented liquidity across exchanges, and 24/7 operation require execution algorithms that can adapt in real-time to changing market microstructure.
Methodology
We trained a PPO agent on 12 months of Level 2 order book data from Binance, Coinbase, and Kraken for BTC/USDT and ETH/USDT pairs. The state space includes:
- Current order book depth (10 levels)
- Recent trade flow imbalance
- Volatility regime indicator
- Remaining order quantity
- Time elapsed since order start
The action space consists of limit order placement at various price levels and market order triggers.
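The state features listed above can be sketched as a single observation vector. This is a hypothetical illustration, not the paper's actual code: the function names, the normalization choices, and the book-snapshot format are all assumptions.

```python
import numpy as np

N_LEVELS = 10  # order book depth levels per side, as described above

def trade_flow_imbalance(buy_vol, sell_vol):
    """Signed imbalance of recent aggressor volume, in [-1, 1]."""
    total = buy_vol + sell_vol
    return 0.0 if total == 0 else (buy_vol - sell_vol) / total

def build_state(bid_sizes, ask_sizes, buy_vol, sell_vol,
                vol_regime, qty_remaining, qty_total,
                t_elapsed, t_horizon):
    """Assemble the observation: 10 bid levels, 10 ask levels,
    trade-flow imbalance, a volatility-regime flag, the remaining
    quantity fraction, and normalized elapsed time (24 dims)."""
    depth = np.concatenate([bid_sizes[:N_LEVELS], ask_sizes[:N_LEVELS]])
    depth = depth / (depth.sum() + 1e-9)  # normalize book depth to sum to 1
    return np.concatenate([
        depth,
        [trade_flow_imbalance(buy_vol, sell_vol)],
        [vol_regime],                      # e.g. 0 = calm, 1 = volatile
        [qty_remaining / qty_total],
        [t_elapsed / t_horizon],
    ]).astype(np.float32)
```

A vector like this plugs directly into a standard PPO policy network; how the paper actually encodes the volatility regime or normalizes depth is not specified, so the choices here are placeholders.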
Results
| Metric | TWAP Baseline | VWAP Baseline | RL Agent |
|---|---|---|---|
| Avg Slippage (bps) | 12.4 | 8.7 | 4.2 |
| Fill Rate | 94.2% | 96.1% | 98.7% |
| Execution Time | Fixed | Fixed | Adaptive |
Our RL agent reduced average slippage by 52% relative to the VWAP baseline (8.7 bps to 4.2 bps) and achieved a 98.7% fill rate across all test scenarios.
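For concreteness, slippage in basis points is typically measured as implementation shortfall against the arrival mid-price. The paper does not state its benchmark, so the arrival-mid convention below is an assumption:

```python
def slippage_bps(avg_fill_price, arrival_mid, side="buy"):
    """Implementation shortfall in basis points.
    Positive values mean a worse fill than the arrival mid-price."""
    sign = 1.0 if side == "buy" else -1.0
    return sign * (avg_fill_price - arrival_mid) / arrival_mid * 1e4

# Example: a buy order with arrival mid $60,000 filled on average at
# $60,025.20 has slipped 4.2 bps, matching the RL agent's reported average.
print(slippage_bps(60025.20, 60000.0))
```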
Conclusion
Reinforcement learning provides a powerful framework for adaptive order execution in crypto markets. The agent's ability to learn market microstructure patterns and adjust execution strategy in real-time offers significant advantages over static algorithmic approaches. Future work will extend this framework to cross-exchange execution and DeFi AMM interactions.