Fast AI engines
for your chip

We design high-performance AI engines and silicon IP
for your AI chip, ASIC, RISC-V SoC, FPGA, or chiplet.

Our team has spent the last 10 years designing AI chips at Google (TPUs), Groq, Apple, and Waymo. We are based in the Bay Area and VC-backed by Mozilla.

Slim Attention

For some LLMs and many vision-language models, Slim Attention shrinks context memory by up to 8x and speeds up inference without sacrificing accuracy.
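
For readers curious about the mechanics, here is a toy numpy sketch of the K-cache-only idea: when the key and value projections are square and the key projection is invertible (as in standard multi-head attention), V can be reconstructed exactly from the cached K, so only K needs to be stored. All names and dimensions below are illustrative, not the production implementation.

import numpy as np

rng = np.random.default_rng(0)
d, n = 16, 4                          # model dim and context length (toy sizes)
X = rng.standard_normal((n, d))       # token activations
W_K = rng.standard_normal((d, d))     # key projection (assumed square and invertible)
W_V = rng.standard_normal((d, d))     # value projection

K = X @ W_K                           # only K is kept in context memory
V_ref = X @ W_V                       # the V-cache a standard implementation would store

# Offline, fold both projections into one matrix: W_KV = W_K^-1 @ W_V
W_KV = np.linalg.inv(W_K) @ W_V

# At inference time, reconstruct V from the K-cache on the fly
V_from_K = K @ W_KV
assert np.allclose(V_ref, V_from_K)   # exact: X @ W_K @ W_K^-1 @ W_V == X @ W_V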


Flash Normalization

FlashNorm is an exact but faster implementation of RMSNorm, LayerNorm, and Dynamic Tanh (DyT). RMSNorm is used by virtually all modern LLMs, including Llama, Gemma, GPT-OSS, Mistral, and OLMo 2.
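
To give a flavor of how an exact-but-faster normalization is possible, here is a minimal numpy sketch of the weight-folding idea: the elementwise RMSNorm scale can be absorbed into the following linear layer's weights once at load time, so the per-token work reduces to a single division by the RMS. The sketch is illustrative and is not the FlashNorm code itself.

import numpy as np

def rms(x, eps=1e-6):
    return np.sqrt(np.mean(x * x) + eps)

def rmsnorm_then_linear(x, g, W):
    # Baseline: RMSNorm with scale g, followed by a linear layer W
    return W @ (g * (x / rms(x)))

def folded_linear(x, W_folded):
    # Folded: the scale g has been merged into W, so only the division remains
    return W_folded @ (x / rms(x))

rng = np.random.default_rng(0)
d = 8
x = rng.standard_normal(d)
g = rng.standard_normal(d)
W = rng.standard_normal((d, d))

W_folded = W * g                      # equivalent to W @ diag(g), done once offline
assert np.allclose(rmsnorm_then_linear(x, g, W), folded_linear(x, W_folded))  # exact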

Join Our Newsletter


Sign up for updates on our products.