
Data Engineer

ConnexAI • 🌐 In Person

Posted 2 days, 10 hours ago

Job Description

Build the Future of Conversational AI with ConnexAI

As a Speech Data Engineer, your work will power the data behind real-time speech systems used by millions worldwide, ensuring our AI learns from clean, accurate, and reliable datasets. By curating, analysing, and engineering the voice data that fuels our models, you’ll help shape products that transform how people and businesses communicate.

You’ll be part of the team that manages and scales our massive speech corpora, builds automated pipelines for cleaning and validation, and works closely with annotation and Machine Learning teams to keep our models at the cutting edge.

This is a pivotal moment for ConnexAI as we expand our speech technology capabilities and push the boundaries of conversational intelligence. Join us and be part of the team setting the industry standard.

Core Responsibilities

Organise and maintain large-scale audio and text corpora, ensuring they are correctly versioned, catalogued, and easy to retrieve.

Build automated pipelines using Python, AWS, and Docker to clean, validate, and standardise speech data, detecting duplicates, transcription inconsistencies, or quality issues.

Develop and integrate APIs to streamline ingestion and processing of new datasets.

Analyse speech datasets to support ASR and TTS model development, performance evaluation, and linguistic research.

Implement and manage data version control tools to ensure dataset reproducibility and traceability.

Contribute to evaluation frameworks for ASR and TTS performance by analysing metrics such as Word Error Rate (WER), Speaker Similarity (SSim), and Mean Opinion Score (MOS) to generate data-driven insights.

Document data processes and tools so that all datasets and analyses are reproducible and compliant with internal standards.

Collaborate closely with data scientists, ML engineers, and product teams to identify opportunities to improve data quality, balance, and diversity through targeted analysis and feedback loops.

Key Skills & Experience

Strong programming skills in Python for data processing, analysis, and automation.

Proficiency with SQL for developing and managing large-scale datasets.

Experience with AWS cloud services.

Hands-on experience with Docker and containerised development environments.

Familiarity with data versioning tools (e.g., LakeFS, DVC) and dataset reproducibility principles.

Strong collaboration and communication skills.

Background in speech, audio, or NLP data processing is highly desirable.

Interview Process

30-minute video call with the team lead

Take-home technical exercise

90-minute face-to-face interview

About ConnexAI

ConnexAI is an award-winning Conversational AI platform. Designed by an elite engineering team, ConnexAI’s technology enables organisations to maximise profitability, increase revenue, and take productivity to new levels.

ConnexAI provides cutting-edge, enterprise-grade AI applications, including AI Agent, AI Guru, AI Analytics, ASR, AI Voice, and AI Quality.

We value growth both for our products and our people. As we scale, there will be clear opportunities to progress into senior data science, leadership, or principal research roles. Our high retention rate reflects our inclusive, supportive, and empowering environment.
