Discussion about this post

The AI Architect

Really smart application of pipeline parallelism here. The cache-resident vs. memory-resident split makes sense given the performance gap between LLC and DRAM. I'm curious, though, whether the auto-tuner logic causes any thrashing when workloads shift unexpectedly between hot and cold key distributions.
