Fast AI engines for your chip
We design high-performance AI engines and silicon IP for your AI chip, ASIC, RISC-V SoC, FPGA and chiplet.
Over the last 10 years, our team has designed AI chips at Google (TPUs), Groq, Apple, and Waymo. We are based in the Bay Area and VC-backed by Mozilla.
Slim Attention
For some LLMs and many vision-language models, Slim Attention shrinks your context memory by up to 8x and boosts inference speed without sacrificing accuracy.
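The sketch below illustrates the core idea as we understand it: in standard multi-head attention, the value projections can be recovered exactly from the keys, so only the K cache needs to be stored. All shapes, names, and the random data are illustrative assumptions, not the production implementation.

```python
import torch

# Illustrative sketch of a K-cache-only attention cache (the idea behind
# Slim Attention), assuming standard MHA with a square, invertible key
# projection. Since K = X @ W_k and V = X @ W_v, the values can be
# reconstructed on the fly as V = K @ (W_k^-1 @ W_v), so V never needs
# to be cached.

d_model = 64
torch.manual_seed(0)
W_k = torch.randn(d_model, d_model) / d_model**0.5   # key projection (assumed invertible)
W_v = torch.randn(d_model, d_model) / d_model**0.5   # value projection
W_kv = torch.linalg.inv(W_k) @ W_v                   # precomputed once, offline

X = torch.randn(10, d_model)                         # token activations
K = X @ W_k                                          # only K is kept in the context cache

V_cached = X @ W_v                                   # what a conventional KV cache stores
V_from_k = K @ W_kv                                  # reconstructed from K at inference time

print(torch.allclose(V_cached, V_from_k, atol=1e-4)) # True, up to numerical error
```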
Flash Normalization
FlashNorm is an exact but faster implementation of RMSNorm, LayerNorm, and Dynamic Tanh (DyT). RMSNorm is used by virtually all modern LLMs, including Llama, Gemma, GPT-OSS, Mistral, and OLMo 2.
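As a rough illustration of why an exact speed-up is possible, the sketch below folds the RMSNorm scaling weights into the linear layer that follows it, a common pattern in transformer blocks. The names, shapes, and merging step are assumptions made for this example, not FlashNorm's actual code.

```python
import torch

# Minimal sketch: RMSNorm followed by a linear layer, with the per-channel
# scaling weights g merged into the weight matrix ahead of time.
# Standard:  y = (x / rms(x) * g) @ W
# Merged:    y = (x / rms(x)) @ W_merged,  where W_merged = diag(g) @ W
# The per-token elementwise multiply by g disappears, and the result is exact.

def rms(x, eps=1e-6):
    return torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)

d_in, d_out = 64, 128
torch.manual_seed(0)
x = torch.randn(4, d_in)                  # a batch of activations
g = torch.rand(d_in) + 0.5                # RMSNorm scaling weights
W = torch.randn(d_in, d_out) / d_in**0.5  # weights of the following linear layer

y_standard = ((x / rms(x)) * g) @ W       # RMSNorm, then linear
W_merged = g.unsqueeze(1) * W             # fold g into W once, offline
y_merged = (x / rms(x)) @ W_merged        # same output, one fewer elementwise op

print(torch.allclose(y_standard, y_merged, atol=1e-5))  # True
```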