Develop and optimize OCR pipelines focused on reading and extracting structured and unstructured data from PDF documents.

Create and maintain Python scripts for data processing, validation, and transformation.

Implement tests, tune performance, and continuously improve OCR models and extraction rules.

Collaborate with the client’s internal teams (engineering and business) to understand requirements and propose efficient technical solutions.

Contribute to technical documentation and maintain the developed solution.

Intermediate to advanced English for technical collaboration with the client’s global team.

Experience developing and optimizing OCR solutions.

Proficiency in Python (automation, PDF manipulation, regular expressions, testing).

Experience with data pipelines, modeling, and result integration (e.g., JSON, CSV, APIs).

Strong analytical skills and attention to detail to ensure high extraction accuracy.

Experience with version control (Git) and agile methodologies.

Previous experience in document automation projects within the energy, utilities, or financial industries.

Knowledge of Machine Learning applied to OCR (e.g., layout analysis, entity recognition).

Familiarity with Google Vision, AWS Textract, or Azure Cognitive Services.

We strive to provide our team with a welcoming, dynamic, and collaborative environment. To make that happen, we offer several initiatives, such as:

100% remote opportunities

Home office assistance

Regular feedback sessions

Employee referral program

Psychological support

Workplace stretching sessions

Knowledge Academy

Partnership with an English school

Monthly transparency meetings

Online happy hours

Welcome kit

Software Engineer Python & OCR - Fluent English

Job Description