Pimba: A Processing-in-Memory Acceleration for Post-Transformer Large Language Model Serving
Wonung Kim, Yubin Lee, Yoonsung Kim, Jinwoo Hwang, Seongryong Oh, Jiyong Jung, Aziz Huseynov, Woong Gyu Park, Chang Hyun Park, Divya Mahajan, and Jongse Park