Open-source ESP32 + Rust toolkit for building self-hosted, real-time voice AI agents on embedded hardware.
EchoKit is an open-source toolkit by Second State for building real-time voice AI agents on ESP32-based hardware. It combines an ESP32 device (WiFi/BLE, microphone, speaker, optional LCD) with a Rust-powered backend that orchestrates a low-latency audio pipeline: VAD (Voice Activity Detection) → ASR (Automatic Speech Recognition) → LLM (Large Language Model) → streaming TTS (Text-to-Speech).

Designed for developers, students, educators, and AI enthusiasts, EchoKit enables practical, self-hostable voice AI experiences, from smart home devices and voice assistants to always-on edge AI agents. Unlike cloud-only solutions, EchoKit runs entirely on your own hardware with your own models, ensuring complete privacy and zero monthly subscription fees.

**Available Kits**

- **EchoKit Box ($59)**: Pre-assembled device, plug and play
- **EchoKit DIY ($49)**: Component kit to assemble yourself (Lego-style)
- **Fun Voice (Free)**: Browser-based playground to try voice AI without hardware

**Key Highlights**

- Full-stack open-source: firmware (ESP32) to AI inference server (Rust)
- Works with any LLM, ASR, and TTS model, open-source or commercial
- MCP (Model Context Protocol) tool support for agentic actions
- Private knowledge base support for personalised responses
- Voice cloning with GPT-SoVITS integration
- Used in universities including Texas State, Tsinghua, and more
ESP32 hardware platform (WiFi/BLE, mic, speaker, optional LCD)
Rust backend: VAD → ASR → LLM → streaming TTS pipeline
Works with any open-source or commercial AI models
MCP tool integrations and agentic search capabilities
Self-hostable with private knowledge base support
$0/month
$29/month
Voice cloning via GPT-SoVITS
Pre-assembled Box ($59) and DIY kit ($49) options
Free browser-based Fun Voice playground
Full documentation, tutorials, and course materials
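
To make the VAD → ASR → LLM → streaming TTS pipeline concrete, here is a minimal Rust sketch of one conversational turn. It is an illustration under stated assumptions, not EchoKit's actual server code: the `Asr`, `Llm`, and `Tts` traits, the stub implementations, the `run_turn` function, and the energy-threshold VAD are all hypothetical stand-ins for whatever models and detectors a real deployment plugs in.

```rust
/// One frame of 16-bit PCM audio (hypothetical framing; real frame sizes vary).
type Frame = Vec<i16>;

/// Energy-threshold VAD: a simple stand-in, not the detector EchoKit ships.
/// Returns true when the frame's RMS amplitude exceeds `threshold`.
fn is_speech(frame: &[i16], threshold: f64) -> bool {
    if frame.is_empty() {
        return false;
    }
    let sum_sq: f64 = frame.iter().map(|&s| (s as f64) * (s as f64)).sum();
    (sum_sq / frame.len() as f64).sqrt() > threshold
}

// Hypothetical stage interfaces, so any model can be swapped in.
trait Asr {
    fn transcribe(&self, audio: &[i16]) -> String;
}
trait Llm {
    fn reply(&self, prompt: &str) -> String;
}
trait Tts {
    fn synthesize(&self, sentence: &str) -> Vec<i16>;
}

// Stub stages that return placeholder data instead of calling real models.
struct StubAsr;
impl Asr for StubAsr {
    fn transcribe(&self, audio: &[i16]) -> String {
        format!("{} samples of speech", audio.len())
    }
}

struct StubLlm;
impl Llm for StubLlm {
    fn reply(&self, prompt: &str) -> String {
        format!("I heard {}. How can I help?", prompt)
    }
}

struct StubTts;
impl Tts for StubTts {
    // Emits one silent sample per byte of text, purely as a placeholder.
    fn synthesize(&self, sentence: &str) -> Vec<i16> {
        vec![0; sentence.len()]
    }
}

/// Runs one turn: VAD filters frames, ASR transcribes, the LLM answers,
/// and TTS synthesizes the answer sentence by sentence so that playback
/// could begin before the whole reply has been rendered.
fn run_turn(
    frames: &[Frame],
    threshold: f64,
    asr: &dyn Asr,
    llm: &dyn Llm,
    tts: &dyn Tts,
) -> Vec<Vec<i16>> {
    // VAD: keep only the samples from frames classified as speech.
    let speech: Vec<i16> = frames
        .iter()
        .filter(|f| is_speech(f.as_slice(), threshold))
        .flatten()
        .copied()
        .collect();
    if speech.is_empty() {
        return Vec::new(); // nothing was said, so skip the later stages
    }

    let text = asr.transcribe(&speech);
    let answer = llm.reply(&text);

    // "Streaming" TTS here means one audio chunk per sentence.
    answer
        .split_inclusive(|c: char| matches!(c, '.' | '?' | '!'))
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(|s| tts.synthesize(s))
        .collect()
}
```

Calling `run_turn` with one silent frame and one loud frame passes only the loud frame to the ASR stub and yields one audio chunk per sentence of the reply. Keeping each stage behind a trait mirrors the "works with any model" idea: replacing a stub with a real ASR, LLM, or TTS client would not change the pipeline function.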