EL-Attention is a memory-efficient, lossless attention method that reduces memory requirements during inference. This post gives a simple summary of the approach.