Responsibilities:
Develop and optimize OCR pipelines focused on reading and extracting structured and unstructured data from PDF documents.
Create and maintain Python scripts for data processing, validation, and transformation.
Implement tests, tune performance, and continuously improve OCR models and extraction rules.
Collaborate with the client’s internal teams (engineering and business) to understand requirements and propose efficient technical solutions.
Contribute to technical documentation and maintain the developed solution.
Technical Requirements:
Intermediate to advanced English for technical collaboration with the client’s global team.
Experience developing and optimizing OCR solutions.
Proficiency in Python (automation, PDF manipulation, regular expressions, testing).
Experience with data pipelines, modeling, and result integration (e.g., JSON, CSV, APIs).
Strong analytical skills and attention to detail to ensure high extraction accuracy.
Experience with version control (Git) and agile methodologies.
Nice-to-Have:
Previous experience in document automation projects within the energy, utilities, or financial industries.
Knowledge of Machine Learning applied to OCR (e.g., layout analysis, entity recognition).
Familiarity with Google Vision, AWS Textract, or Azure Cognitive Services.
We strive to provide our team with a welcoming, dynamic, and collaborative environment. To make that happen, we offer several initiatives, such as:
100% remote opportunities
Home office assistance
Regular feedback sessions
Employee referral program
Psychological support
Workplace stretching sessions
Knowledge Academy
Partnership with an English school
Monthly transparency meetings
Online happy hours
Welcome kit