Senior AI Engineer – Voice Agent Platform
At Gorilla Logic, we build smarter, faster, and stronger systems that push the boundaries of what’s possible. We’re looking for a Senior AI Engineer who thrives on solving complex problems, loves cutting-edge AI, and wants to create the future of agentic voice intelligence.
This is your chance to join a world-class engineering team designing real-time, human-like voice agents that think, speak, and act with purpose.
What You’ll Do
Agentic AI Systems
Design and implement LangGraph-based agent architectures with multi-turn memory, real-time decision-making, and complex state management.
Build autonomous voice agents that handle interruptions, context switching, and live customer interactions.
Develop specialized agent types (customer service, sales, routing) with intelligent tool and function calling capabilities.
Implement agent evaluation systems using LLM-as-Judge methodologies to assess accuracy, detect hallucinations, and measure goal achievement.
Create configurable templates for rapid, multi-tenant deployment and scalability.
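To give a concrete flavor of the agent architectures described above, here is a minimal TypeScript sketch of a tool-calling loop with multi-turn memory. It is a hand-rolled illustration, not the LangGraph API; the callLLM function and the tool bodies are hypothetical placeholders.

```typescript
// Minimal illustration of an agentic loop: the model either answers the caller
// or requests a tool call, and conversation state persists across turns.
// `callLLM` and the tool bodies are hypothetical placeholders, not a real SDK.

type Message = { role: "system" | "user" | "assistant" | "tool"; content: string };
type ToolCall = { tool: string; args: Record<string, unknown> };
type LLMDecision = { reply?: string; toolCall?: ToolCall };

const tools: Record<string, (args: Record<string, unknown>) => Promise<string>> = {
  lookupOrder: async (args) => `Order ${args.orderId} ships tomorrow.`,
  transferCall: async (args) => `Transferring to ${args.department}.`,
};

// Placeholder for a provider call (OpenAI, Anthropic, Groq, ...) that returns
// either a user-facing reply or a structured tool-call request.
declare function callLLM(history: Message[]): Promise<LLMDecision>;

export async function runTurn(history: Message[], userUtterance: string): Promise<Message[]> {
  history.push({ role: "user", content: userUtterance });

  // Allow a few tool steps per turn before forcing a spoken reply.
  for (let step = 0; step < 4; step++) {
    const decision = await callLLM(history);

    if (decision.toolCall) {
      const { tool, args } = decision.toolCall;
      const result = await (tools[tool]?.(args) ?? Promise.resolve(`Unknown tool: ${tool}`));
      history.push({ role: "tool", content: result });
      continue; // feed the tool result back for another decision
    }

    history.push({ role: "assistant", content: decision.reply ?? "" });
    break;
  }
  return history; // persisted memory for the next turn
}
```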
LLM Solution Engineering
Integrate and optimize LLM providers (OpenAI GPT-4o/GPT-5, Groq Llama 4, Anthropic Claude) with dynamic model routing and fallback strategies.
Apply advanced prompt engineering techniques for voice-first applications, including templating, few-shot learning, and context management.
Build streaming LLM pipelines that coordinate sentence-level text generation with real-time text-to-speech synthesis.
Develop function calling frameworks for tools like call transfer, conferencing, recording, and external integrations.
Implement cost optimization strategies balancing performance, latency, and API usage across thousands of sessions.
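The streaming-pipeline bullet above is about emitting speech as soon as the model completes a sentence rather than waiting for the full response. A minimal sketch of that coordination, assuming hypothetical streamLLMTokens and synthesizeSentence helpers and a deliberately naive sentence-boundary heuristic:

```typescript
// Sketch of sentence-level streaming: tokens arrive from the LLM, are buffered
// until a sentence boundary, and each complete sentence is handed to TTS so
// audio playback can begin before the model finishes the whole answer.
// `streamLLMTokens` and `synthesizeSentence` are hypothetical placeholders.

declare function streamLLMTokens(prompt: string): AsyncIterable<string>;
declare function synthesizeSentence(sentence: string): Promise<void>; // plays or enqueues audio

// Naive boundary check; production systems use smarter segmentation.
const SENTENCE_END = /[.!?]\s*$/;

export async function speakStreamingAnswer(prompt: string): Promise<void> {
  let buffer = "";

  for await (const token of streamLLMTokens(prompt)) {
    buffer += token;

    // Flush to TTS whenever the buffer ends at a sentence boundary.
    if (SENTENCE_END.test(buffer)) {
      await synthesizeSentence(buffer.trim());
      buffer = "";
    }
  }

  // Flush any trailing partial sentence once the stream closes.
  if (buffer.trim().length > 0) {
    await synthesizeSentence(buffer.trim());
  }
}
```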
Voice and Audio Processing
Build real-time speech-to-text pipelines using Deepgram Nova-3 with voice activity detection and interruption handling.
Implement multi-provider text-to-speech orchestration (ElevenLabs, Deepgram, Cartesia) with voice cloning and tone control.
Develop low-latency audio streaming over WebSockets with buffering, codec handling, and error recovery.
Create dual-channel recording systems with speaker separation for QA and data collection.
Optimize end-to-end latency across the STT → LLM → TTS pipeline to achieve natural conversational flow.
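The audio-streaming bullets above boil down to moving small PCM frames over a socket with careful buffering. The sketch below uses the Node ws package; the endpoint URL, auth header, and transcript message shape are placeholders rather than any specific provider's protocol.

```typescript
// Sketch of low-latency audio streaming over a WebSocket, using the `ws` package.
// The endpoint URL, auth header, and transcript payload are placeholders; a real
// integration follows the chosen STT provider's protocol (Deepgram, etc.).
import WebSocket from "ws";

const FRAME_MS = 20;             // send small frames to keep latency low
const SAMPLE_RATE = 8000;        // e.g. 8 kHz telephony audio
const BYTES_PER_FRAME = (SAMPLE_RATE / 1000) * FRAME_MS * 2; // 16-bit PCM

export function streamCallAudio(audioSource: AsyncIterable<Buffer>): void {
  const ws = new WebSocket("wss://stt.example.com/listen", {
    headers: { Authorization: `Token ${process.env.STT_API_KEY ?? ""}` },
  });

  ws.on("open", async () => {
    let pending = Buffer.alloc(0);
    for await (const chunk of audioSource) {
      pending = Buffer.concat([pending, chunk]);
      // Flush fixed-size frames; keep the remainder buffered.
      while (pending.length >= BYTES_PER_FRAME) {
        ws.send(pending.subarray(0, BYTES_PER_FRAME));
        pending = pending.subarray(BYTES_PER_FRAME);
      }
    }
    ws.close();
  });

  ws.on("message", (data) => {
    // Placeholder transcript handling; real payloads are provider-specific.
    const event = JSON.parse(data.toString());
    if (event.is_final) {
      console.log("final transcript:", event.transcript);
    }
  });

  ws.on("error", (err) => console.error("STT socket error:", err));
}
```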
Multimodal AI Capabilities
Extend agents to handle text, voice, and vision inputs using GPT-4o multimodal capabilities.
Build cross-modal reasoning systems that combine transcription, context, and visual data.
Implement document and image understanding features for real-time reference during conversations.
Design evaluation frameworks to assess multimodal performance and interaction quality.
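As a small example of the cross-modal work above, combining a transcript excerpt with an image follows the content-parts pattern used by GPT-4o-style chat APIs. This sketch uses the official openai Node SDK; treat the model name and request fields as assumptions to verify against current documentation.

```typescript
// Sketch of a cross-modal request: a transcript snippet plus a document image
// sent to a GPT-4o-style multimodal chat endpoint via the `openai` Node SDK.
// Model name and field shapes should be checked against the current API docs.
import OpenAI from "openai";

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function describeDocumentDuringCall(
  transcriptExcerpt: string,
  imageUrl: string,
): Promise<string> {
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text:
              `The caller just said: "${transcriptExcerpt}". ` +
              "Summarize what the attached document shows and how it answers them.",
          },
          { type: "image_url", image_url: { url: imageUrl } },
        ],
      },
    ],
  });

  return response.choices[0]?.message?.content ?? "";
}
```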
Distributed Systems and Infrastructure
Architect event-driven microservices using NATS JetStream for reliable message delivery.
Build multi-tenant RPC frameworks with access controls, secrets management, and isolation.
Deploy to Kubernetes with autoscaling, health checks, and fault-tolerant design.
Implement observability solutions using OpenTelemetry for full pipeline visibility.
Create idempotency and reliability mechanisms to handle high concurrency at scale.
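To ground the messaging and idempotency bullets above, here is a minimal publish-side sketch using the nats JavaScript client and JetStream's message-ID deduplication. Stream and subject names and the payload shape are illustrative only.

```typescript
// Sketch of reliable, idempotent event publishing with NATS JetStream using the
// `nats` JavaScript client. Subject names and payload are illustrative; JetStream
// drops publishes that reuse the same msgID within the stream's dedup window.
import { connect, JSONCodec } from "nats";

const jc = JSONCodec();

export async function publishCallEvent(event: {
  callId: string;
  sequence: number;
  type: string;
}): Promise<void> {
  const nc = await connect({ servers: process.env.NATS_URL ?? "nats://localhost:4222" });
  const js = nc.jetstream();

  // A deterministic msgID makes retries safe: re-publishing the same event
  // within the dedup window is acknowledged but not stored twice.
  const ack = await js.publish(
    `calls.${event.callId}.events`,
    jc.encode(event),
    { msgID: `${event.callId}:${event.sequence}` },
  );

  console.log(`stored in stream ${ack.stream} as seq ${ack.seq}`);
  await nc.drain();
}
```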
What You Bring
Core AI and LLM Expertise
Proven experience building production-grade agentic AI systems using LangChain, LangGraph, or AutoGPT.
Deep understanding of ReAct agent architectures, tool use, memory systems, and multi-agent orchestration.
Hands-on integration with LLM APIs such as OpenAI GPT-4o/GPT-5, Anthropic Claude, and Groq Llama 4.
Expertise in prompt engineering, few-shot learning, and system prompt optimization.
Experience managing function calling pipelines, latency, hallucination control, and streaming responses.
Voice and Multimodal Development
2+ years developing voice AI systems with Deepgram, OpenAI Whisper, ElevenLabs, or similar providers.
Knowledge of audio codecs (MULAW, PCM), VAD, noise cancellation, and real-time audio streaming.
Experience with WebRTC, LiveKit, Twilio, or Telnyx for real-time communications.
Familiarity with multimodal AI models like GPT-4o or Gemini for cross-modal reasoning.
Backend Engineering
Strong proficiency in Node.js (22+) and TypeScript, using modern async and event-driven patterns.
Experience with Express.js and MongoDB (Mongoose) for high-write and time-series workloads.
Knowledge of NATS JetStream, Kafka, or RabbitMQ for message streaming.
Skilled in designing RESTful APIs and WebSocket services.
Infrastructure and DevOps
Hands-on experience with Kubernetes and Docker for scalable deployments.
Familiarity with AWS services such as Secrets Manager and S3.
Proficiency in OpenTelemetry for distributed tracing and observability.
Experience using GitHub Actions, Kustomize, pnpm workspaces, and Changesets for CI/CD.
Understanding of distributed systems fundamentals—idempotency, retries, circuit breakers, and high availability.
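The distributed-systems fundamentals listed above show up constantly around flaky provider calls. As a flavor of what that means day to day, a small retry-with-backoff helper might look like the sketch below; the parameters are arbitrary defaults, not values from an existing codebase.

```typescript
// Illustrative retry helper with exponential backoff and jitter, the kind of
// reliability primitive expected around transient provider failures.
export async function withRetries<T>(
  operation: () => Promise<T>,
  { attempts = 4, baseDelayMs = 200 }: { attempts?: number; baseDelayMs?: number } = {},
): Promise<T> {
  let lastError: unknown;

  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      // Exponential backoff with jitter to avoid retry stampedes.
      const delay = baseDelayMs * 2 ** attempt * (0.5 + Math.random());
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}

// Usage: wrap an idempotent call so transient failures are retried
// (sttClient here is a hypothetical provider client).
// const transcript = await withRetries(() => sttClient.transcribe(audio));
```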
Audio and Performance Optimization
Experience optimizing end-to-end latency in voice pipelines.
Knowledge of Silero VAD, dual-channel recording, and audio data collection strategies.
Familiarity with performance profiling, testing AI systems, and cost optimization for large-scale voice agents.
Security and SaaS Fundamentals
Understanding of multi-tenant SaaS security, RBAC, and secrets management.
Experience designing for fault tolerance and data isolation at scale.
Bonus Points
Contributions to open-source AI frameworks such as LangChain, LlamaIndex, or Haystack.
Published research or blogs on agentic AI, LLM orchestration, or voice AI.
Experience with telephony systems (SIP, Twilio, Telnyx, WebRTC).
Proven success optimizing LLM cost and performance in production.
Participation in AI safety, evaluation, or red-teaming initiatives.
Experience building or debugging agent observability systems.