đŸ‘šđŸ»â€đŸ’» postech.work

Web Archiving Data Analyst and Crawl Engineer

The National Archives • 🌐 In Person


Job Description

As the living, growing home of our national story, The National Archives is already a special place to work. We’re an institution nearly 200 years old with a collection spanning 1,000 years of history. But it’s where we go next that makes things really interesting.

In our strategic vision, Archives for Everyone, we set ourselves the challenge of becoming the 21st-century national archive: a different kind of cultural and heritage institution that is Inclusive, Entrepreneurial and Disruptive. We won’t become this overnight. It will take time, focus, effort and daring.

That’s where you come in. Because we can’t do this without you.

Job Overview

Salary: £40,000 per annum

Contract type: Permanent

Band: E / Higher Executive Officer

Closing date: Tuesday 21st October 2025

Archives are special. As a home of our collective memory, The National Archives (TNA) plays a unique role. We hold records of events of national and international importance, as well as documents that speak to our everyday lives, from across the last one thousand years. Our web archives are unparalleled in their quality and richness and provide a unique source of evidence of our contemporary government and state.

We’re looking for an enthusiastic and skilled Data Analyst with experience in web crawling, scraping or analysis to support our workflows, help us understand more about our collection, and grow our web archiving capability.

Web archives are fascinating. At The National Archives, we deliver three public web archive services. These are vast collections of government websites and social media. The scale of these collections, the variety of users’ needs, and the complexity of the data make them challenging and fertile ground for innovation. This role is fundamental to our mission: improving our collection processes to ensure the highest quality and fidelity, understanding our collection, and conveying these insights to a range of people.

As The National Archives’ Web Archiving Data Analyst and Crawl Engineer, you will bring your expertise and in-depth knowledge to develop and shape key aspects of our web archiving services, making you a key member of the team as those services evolve.

We work with suppliers who deliver many of our technical services, but we are increasing our in-house capability and expertise. These in-house workflows are now important parts of our service, and you will own and develop them, including by finding ways to improve efficiency and resilience. You will embrace challenge and look for opportunities to do things differently.

Working closely with the Senior Data Engineer and our Web Archivists, you will help deepen our understanding of our web archiving and social media collections and use these insights to help tell the story of government online. This includes engaging with experts within The National Archives as well as with external organisations across the digital preservation community and other government departments, by sharing your knowledge with others and raising the profile of our work.

You will be passionate about data and technology. You will thrive in an environment which values and supports continuous learning and self-development.

Web archiving is an exciting, specialist, varied and rapidly evolving field that is a lot of fun to be involved in. Building and maintaining excellent web archiving services calls on a range of skills: problem solving, creativity, developing new techniques for capturing and replaying content, supporting research, and managing stakeholders and projects.

You will support others’ research by developing tools that help users explore our services “as data”. You will also contribute to the team’s tools and processes, ensuring that we can work as efficiently and effectively as possible.

This is a full-time post. However, TNA are open to considering requests for part-time working, flexible working and job sharing. A combination of onsite and home working is available and applicants should be able to regularly travel to our Kew site for a minimum of 60% of their work time.

Application Process:

Interview: Interviews will be held on-site. Before the interview, we will ask you to complete a technical task, which you will then talk us through at the interview.

Personal Statement: We ask all applicants to submit work history details and a personal statement of no more than 1,200 words. In your statement, we'd like you to explain how you meet the five criteria below, your passion for technology, and what you believe you will bring to the role.

Essential criteria for personal statement:

Substantial experience in using web data extraction technologies that include web scraping, crawling, data extraction from websites, and handling web-based data formats (HTML, XML, JSON, WARC, CDX)

Demonstrable proficiency in programming languages (e.g. Python, R, SQL, JavaScript) with experience using data analysis tools and statistical techniques to extract insights from large datasets

Proven ability to build and maintain data pipelines that clean, transform, and aggregate data from multiple sources, ensuring data quality throughout the process

Understanding of data management principles for large-scale digital collections, including quality assurance approaches and working with both structured and unstructured data

Evidence of problem-solving skills with ability to research complex issues, propose innovative solutions, and adapt to changing priorities

Artificial intelligence can be a useful tool to support your application; however, all examples and statements provided must be truthful, factually accurate and taken directly from your own experience. Where plagiarism has been identified (presenting the ideas and experiences of others, or content generated by artificial intelligence, as your own), applications may be withdrawn and internal candidates may be subject to disciplinary action. Please visit the Civil Service Careers website, where you can find further information on the use of AI in the application guidance section.

Sponsorship:

We are unable to offer sponsorship for this role.


Role and Responsibilities

Deliver comprehensive analysis of web archive and social media archive data: working with the Senior Data Engineer, design and develop data structures, visualisations, and reports that unlock insights from our vast digital collections, demonstrating the impact and value of our archiving efforts to stakeholders.

Develop robust analytical solutions: write modular, well-documented code following best practices and quality standards, including comprehensive testing to ensure solutions can be maintained and extended by team members.

Engineer automated solutions: collaborate with the team to build cross-system applications and automated workflows that tackle unique web and social media archiving challenges, improving our ability to archive at scale through enhanced accuracy and operational efficiency.

Manage our in-house web crawling and social media capture lifecycles: take responsibility for overseeing our crawling operations to ensure the processes are well managed from initiation to completion, implementing efficient quality assurance processes and optimising workflows to maintain high capture standards.

Innovate access methods: leverage your technical understanding to create new ways for researchers, government bodies, and the public to access our collections through APIs, published datasets, and interactive dashboards. Support our strategic mission by developing tools and supplying data that facilitate researchers' use of our collections "as data," enabling novel research on government's digital history.

Leverage web crawling and social media capture technologies: experiment with cutting-edge web archiving tools to continuously improve our capturing capabilities, ensuring we keep pace with the evolving digital landscape of government and push forward the state of the art.

Extract actionable insights: find innovative approaches to add and extract value from our data, providing quantitative insights about both our collections and daily operations to inform strategic decisions. This will involve working with a combination of structured and unstructured data.

Conduct ongoing research: ensure that the tools and approaches we use are the best fit for the challenges we face, and keep informed about emerging technologies and tools to support this effort.

Represent and communicate our work within The National Archives: communicate developments and findings through show-and-tells, team meetings, and broader presentations, demonstrating technical capabilities while making complex analysis accessible to varied audiences.

Represent and communicate our work externally: present findings to the wider archiving community and collaborate with other web archiving institutions to address shared challenges and advance the field collectively.

Working Conditions

Normal office environment

Display Screen Equipment user

Person Specification

Essential criteria:

Substantial experience in using web data extraction technologies that include web scraping, crawling, data extraction from websites, and handling web-based data formats (HTML, XML, JSON, WARC, CDX)

Demonstrable proficiency in programming languages (e.g. Python, R, SQL, JavaScript) with experience using data analysis tools and statistical techniques to extract insights from large datasets

Proven ability to build and maintain data pipelines that clean, transform, and aggregate data from multiple sources, ensuring data quality throughout the process

Experience creating data visualisations, reports, and dashboards that effectively communicate complex findings to diverse audiences, including senior stakeholders

Understanding of data management principles for large-scale digital collections, including quality assurance approaches and working with both structured and unstructured data

Strong ability to translate technical concepts for non-technical stakeholders and gather requirements from users at all organisational levels

Evidence of problem-solving skills with ability to research complex issues, propose innovative solutions, and adapt to changing priorities

Strong organisational skills with proven ability to manage workflows, meet deadlines, and work effectively within multidisciplinary project teams

Desirable criteria:

Experience with cloud technologies (e.g. AWS), APIs, databases, and modern data infrastructure approaches

Knowledge of digital preservation, web archiving software, or research data management principles with understanding of metadata standards and compliance requirements

Familiarity with software testing, version control, code documentation best practices, and interest in emerging technologies for digital data challenges

The Civil Service is committed to attracting, retaining and investing in talent wherever it is found. To learn more, please see the Civil Service People Plan and the Civil Service D&I Strategy.

Benefits

Generous benefits package, including pension, sports and social club facilities, onsite gym, discounted rates at our on-site cafe and opportunities for training and development. Annual leave entitlement is 22 days per calendar year (rising to 25 after the first year, and incrementally to 30 days after six years), plus 10½ days of public and privilege holidays per annum.

Any move to The National Archives from another employer will mean you can no longer access childcare vouchers. This includes moves between government departments. You may however be eligible for other government schemes, including Tax-Free Childcare. Determine your eligibility at https://www.childcarechoices.gov.uk/

Reasonable adjustments

If a person with disabilities is put at a substantial disadvantage compared to a non-disabled person, we have a duty to make reasonable changes to our processes.

If you need a change to be made so that you can make your application, you should:

Contact The National Archives via careers@nationalarchives.gov.uk as soon as possible before the closing date to discuss your needs

Complete the ‘Reasonable Adjustments’ section of your application form to tell us what changes or help you might need further on in the recruitment process. For instance, you may need wheelchair access at interview or, if you’re deaf, a Language Service Professional.

Feedback will only be provided if you attend an interview or assessment.

Security

Successful candidates must pass a disclosure and barring security check.

People working with government assets must complete basic personnel security standard checks.

Nationality requirements

This job is broadly open to the following groups:

UK nationals

nationals of the Republic of Ireland

nationals of Commonwealth countries who have the right to work in the UK

nationals of the EU, Switzerland, Norway, Iceland or Liechtenstein and family members of those nationalities with settled or pre-settled status under the European Union Settlement Scheme (EUSS)

nationals of the EU, Switzerland, Norway, Iceland or Liechtenstein and family members of those nationalities who have made a valid application for settled or pre-settled status under the European Union Settlement Scheme (EUSS)

individuals with limited leave to remain or indefinite leave to remain who were eligible to apply for EUSS on or before 31 December 2020

Turkish nationals, and certain family members of Turkish nationals, who have accrued the right to work in the Civil Service

Further information on nationality requirements

Working for the Civil Service

The Civil Service Code sets out the standards of behaviour expected of civil servants.

We recruit by merit on the basis of fair and open competition, as outlined in the Civil Service Commission's recruitment principles.

The Civil Service embraces diversity and promotes equal opportunities. As such, we run a Disability Confident Scheme (DCS) for candidates with disabilities who meet the minimum selection criteria.

The Civil Service also offers a Redeployment Interview Scheme to civil servants who are at risk of redundancy, and who meet the minimum requirements for the advertised vacancy.

This vacancy is part of the Great Place to Work for Veterans initiative.

Contact point for applicants:

Name: The National Archives Recruitment Team

Email: careers@nationalarchives.gov.uk

Further information

If you feel your application has not been treated in accordance with the Recruitment Principles and you wish to make a complaint, in the first instance, you should contact The National Archives by email at careers@nationalarchives.gov.uk. If you are not satisfied with the response you receive from the Department, you can contact the Civil Service Commission at https://civilservicecommission.independent.gov.uk/recruitment/recruitment-complaints/
