uiuc-cs 12 minutes ago

great work!!

shradhasehgal 5 hours ago

Super interesting work. Wild that AF3 launched 100x more kernels. The 768-token training results seem cool.