Fast AI engines
for your chip

We design high-performance AI engines and silicon IP
for your AI chip, ASIC, RISC-V SoC, FPGA, or chiplet.

Our team has spent the last 10 years designing AI chips at Google (TPUs), Groq, Apple, and Waymo. We are based in the Bay Area and VC-backed by Mozilla.

Slim Attention

For some LLMs and many vision-language models, Slim Attention shrinks context memory by up to 8x and speeds up inference without sacrificing accuracy.
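
For readers curious about the mechanics, here is a toy numpy sketch of the K-cache-only idea: when the key and value projections are square and the key projection is invertible (as in standard multi-head attention), V can be reconstructed exactly from the cached K, so only K needs to be stored. All names and dimensions below are illustrative, not the production implementation.

import numpy as np

rng = np.random.default_rng(0)
d, n = 16, 4                          # model dim and context length (toy sizes)
X = rng.standard_normal((n, d))       # token activations
W_K = rng.standard_normal((d, d))     # key projection (assumed square and invertible)
W_V = rng.standard_normal((d, d))     # value projection

K = X @ W_K                           # only K is kept in context memory
V_ref = X @ W_V                       # the V-cache a standard implementation would store

# Offline, fold both projections into one matrix: W_KV = W_K^-1 @ W_V
W_KV = np.linalg.inv(W_K) @ W_V

# At inference time, reconstruct V from the K-cache on the fly
V_from_K = K @ W_KV
assert np.allclose(V_ref, V_from_K)   # exact: X @ W_K @ W_K^-1 @ W_V == X @ W_V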


Flash Normalization

FlashNorm is an exact but faster implementation of RMSNorm, LayerNorm, and Dynamic Tanh (DyT). RMSNorm is used by virtually all modern LLMs, including Llama, Gemma, GPT-OSS, Mistral, and OLMo 2.
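
To give a flavor of how an exact-but-faster normalization is possible, here is a minimal numpy sketch of the weight-folding idea: the elementwise RMSNorm scale can be absorbed into the following linear layer's weights once at load time, so the per-token work reduces to a single division by the RMS. The sketch is illustrative and is not the FlashNorm code itself.

import numpy as np

def rms(x, eps=1e-6):
    return np.sqrt(np.mean(x * x) + eps)

def rmsnorm_then_linear(x, g, W):
    # Baseline: RMSNorm with scale g, followed by a linear layer W
    return W @ (g * (x / rms(x)))

def folded_linear(x, W_folded):
    # Folded: the scale g has been merged into W, so only the division remains
    return W_folded @ (x / rms(x))

rng = np.random.default_rng(0)
d = 8
x = rng.standard_normal(d)
g = rng.standard_normal(d)
W = rng.standard_normal((d, d))

W_folded = W * g                      # equivalent to W @ diag(g), done once offline
assert np.allclose(rmsnorm_then_linear(x, g, W), folded_linear(x, W_folded))  # exact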

Join Our Newsletter


Sign up for updates on our products.