Differential Privacy and Synthetic Data Consultant

Desired Candidate Profile

Title: Differential Privacy and Synthetic Data Consultant
Location: Home-based
Duration: 01/05/2024 to 31/10/2024 (Negotiable)
Time Type: Full-time/Part-time depending on the availability of the selected candidate
Contract Type: Individual Consultant (Local)

General Background

UNHCR, the UN Refugee Agency, is a global organization dedicated to saving lives, protecting rights, and building a better future for refugees, forcibly displaced communities, and stateless people. We work to ensure that everybody has the right to seek asylum and find safe refuge, having fled violence, persecution, war, or disaster at home. Since 1950, we have faced multiple crises on multiple continents, and provided vital assistance to refugees, asylum-seekers, internally displaced and stateless people, many of whom have nobody left to turn to. We help to save lives and build better futures for millions forced from home.

UNHCR's Data Transformation Strategy 2020-2025 envisions that, by 2025, UNHCR will be a trusted leader on data and information related to refugees and other affected populations, thereby enabling actions that protect, include, and empower them. As part of its commitment to open data, UNHCR currently shares sample survey data on the MDL, UNHCR’s MicroData Library (microdata.unhcr.org), after anonymizing the datasets using statistical disclosure control (SDC) methods implemented through the R package sdcMicro (sdctools.github.io/sdcMicro). The MDL currently holds over 700 microdatasets, mostly sample-based surveys on a variety of topics conducted by field operations, regional offices, or external partners, and is evolving to accommodate an increasing number of dataset types, including registration data and other population censuses. However, because of the sensitivity of this data, SDC would allow only a small subsample to be shared, which would greatly limit the final utility compared to the full dataset.

The Statistics and Demographics Section, in collaboration with UNHCR’s Innovation Service, under the Data Innovation Fund, is currently exploring the use of new Privacy Enhancing Technologies on UNHCR’s survey data, with the purpose of identifying new ways to safely share data with an optimal privacy-utility balance to promote collaboration via the MDL platform.

The project team is focusing on two of these technologies, differential privacy (with libraries like OpenDP, DiffPriv, etc.) and synthetic data (with REaLTabFormer or similar libraries), and their application to tabular data. The project will concentrate on UNHCR’s registration data, a relational database composed of several tables with parent/child relations.

The purpose of the project is to identify, test and compare the most suitable technologies/libraries to process this kind of data, document best practices and provide practical guidance on how to implement them. The team is seeking an expert in differential privacy and synthetic data who can support the team in exploring and testing these technologies on real-case datasets.

Overall Purpose and Scope of Consultancy

Main Objective:
To test and propose new solutions for microdata sharing among humanitarian and development actors.

The objectives of the project are to:
• Research and test various differential privacy and synthetic data libraries on registration data.
• Measure privacy and utility results and compare them with the currently used technology (statistical disclosure control methods).
• Document the process and the script code used.
• Identify the libraries that deliver the best privacy-utility balance.
• Write practical guidance about best practices and how to use these technologies.

Project deliverables:
• Scripts to test various libraries, measure privacy and utility levels and compare them with the original dataset. The code should be well documented and have meaningful code comments.
• Technical documentation on the testing process with details and results.
• Final practical guidance on the use of differential privacy and synthetic data for tabular data. It should include:
– General introduction to differential privacy and synthetic data.
– Description of libraries/packages and instructions for installing and using them.
– Reusable snippets of code to use the libraries.
– Guidance on the choice of parameters for each technology and on any characteristics a dataset should have to be used with these technologies.
– Guidance on how to identify acceptable thresholds of risk and utility for data sharing.

Duties and Responsibilities

Main responsibilities and activities:
• Help scope differential privacy and synthetic data libraries and their functionalities to identify the most promising ones for use on relational tabular data.
• Clean and prepare the provided datasets so that they can be used with the selected libraries.
• Write code to test differential privacy and synthetic data libraries and help set up an evaluation framework to compare them.
• Identify the best indicators or statistics to measure the privacy and utility level obtained with each technology, and develop thresholds or criteria to determine the level needed for publication.
• Write a technical paper on the process so that it can be documented and reproduced. Showcase the results and explain which technologies perform best.
• Write practical guidance on the use of the libraries, choice of parameters and reusable code examples.
• Consult the project team and stakeholders on needs and expectations.
• Provide inputs and feedback on the project schedule and activities, as needed.

Qualification and Experience Required

• University degree in Statistics, Mathematics, Engineering, Computer Science, Information Technology, Cryptography or a related field.

Required experience and skills:
• 3 years of relevant experience with an undergraduate degree; or 2 years of relevant experience with a graduate degree; or 1 year of relevant experience with a doctorate.
• Excellent understanding of privacy technologies, in particular differential privacy, synthetic data, and statistical disclosure control.
• Experience in Python and/or other statistical programming languages.
• Experience working with privacy technology packages/libraries.
• Fluency in English.
• Experience working with privacy technologies for tabular data.
• Experience writing, commenting and documenting programming code and scripts. Please include in the application links to repositories like GitHub or GitLab to showcase your code/projects or attach them to the application.

Desirable skills and knowledge:
• Previous experience in working for UN organizations / International organizations / NGOs / Public Sector / Universities, etc.
• Previous participation in privacy technologies projects, hackathons, etc. Please include this info in the application if available.
• PhD in Statistics, Mathematics, Engineering, Computer Science, Information Technology, Cryptography or a related field.

Location and Conditions

The successful candidate will be home-based. The remuneration of the individual consultant will be calculated based on the equivalent gross National Officer salary scale of the duty station.

Shortlisted candidates might be required to sit for a written test. Only shortlisted candidates will be notified. No late applications will be accepted.

The time type and duration of the consultancy are negotiable, depending on the availability of the selected candidate.

Please note that UNHCR does not charge a fee at any stage of its recruitment process (application, interview, meeting, travelling, processing, training or any other fees).

All UNHCR workforce members must, individually and collectively, contribute towards a working environment where each person feels safe and empowered to perform their duties. This includes demonstrating no tolerance for sexual exploitation and abuse, harassment including sexual harassment, sexism, gender inequality, discrimination, and abuse of power.

As individuals and as managers, all must be proactive in preventing and responding to inappropriate conduct, supporting ongoing dialogue on these matters, and speaking up and seeking guidance and support from relevant UNHCR resources when these issues arise.

How to apply

Apply here.
