Organization: Armed Conflict Location and Events Dataset
The Armed Conflict Location & Event Data Project (ACLED) is a research project that records, publishes and analyses disaggregated data on violent political conflict. These data are publicly available, along with analysis and information about the project, at www.acleddata.com. These data are used for a range of purposes, from academic research on the dynamics of conflict to informing diplomatic policy and humanitarian and development work in conflict-affected contexts.
ACLED uses hundreds of individual sources to create a sourcing database from which events are extracted. ACLED therefore needs a web scraper that can go to each news website, each day, and extract articles based on an established series of words that are relevant to ACLED events. ACLED is seeking to contract someone to harvest and extract news report data from websites (not social media).
This needs to be done for each country, each day, and each source we have already identified. The output of the web crawler should be a country-specific Excel sheet with the date, title of the piece, source, body of the piece, and relevant terminology as separate columns. This should be repeated for all countries. All ACLED media sources are accessible online. All models must return all sources, stories and events, with the matching terms highlighted.
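The output described above can be sketched as follows. This is a minimal illustration, not a specification: the term list, the field names, and the function names are all hypothetical, and CSV is used as a stand-in for the Excel sheet so that the sketch needs only the Python standard library.

```python
import csv
import re
from collections import defaultdict

# Hypothetical search terms; the real list would be supplied by ACLED.
TERMS = ["attack", "protest", "clash"]

# One column per detail requested in the posting.
COLUMNS = ["date", "title", "source", "body", "matched_terms"]


def find_terms(text, terms=TERMS):
    """Return the terms that occur in text as whole words, ignoring case."""
    return [
        term for term in terms
        if re.search(r"\b" + re.escape(term) + r"\b", text, flags=re.IGNORECASE)
    ]


def build_country_rows(articles, terms=TERMS):
    """Group matching articles by country, one row (dict) per article."""
    rows = defaultdict(list)
    for art in articles:
        matched = find_terms(art["title"] + " " + art["body"], terms)
        if matched:  # keep only articles that mention at least one term
            rows[art["country"]].append({
                "date": art["date"],
                "title": art["title"],
                "source": art["source"],
                "body": art["body"],
                "matched_terms": "; ".join(matched),
            })
    return dict(rows)


def write_sheet(rows, fileobj):
    """Write one country's rows to an open text file as CSV."""
    writer = csv.DictWriter(fileobj, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
```

Calling `build_country_rows` on a day's scraped articles yields one list of rows per country, and each list can then be written to its own file with `write_sheet`. Whole-word matching is a design choice made here for illustration; it would miss inflected forms such as "attacks" unless those are listed as separate terms.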
Additionally, ACLED seeks a contractor to support the identification of an initial set of machine learning techniques to help classify and sort the media data for a higher degree of relevance to coding. This will be done based on test cases. Further details are provided on the specific tasks below.
ACLED is recruiting a Data Science consultant to complete the following tasks:
1) Development of a web scraping tool
The contractor will develop a web scraper to pull news articles from a provided list of several hundred international news websites based on pre-determined search queries. Specifically, the tool would need to extract and isolate key information, including the main body of the article, source, and date, and would need to be robust to the minor changes in website format that are likely to occur over time. Ideally this tool would work for news sites in multiple languages. The contractor will work closely with ACLED’s resident data scientist, who will continue to implement the tool after the termination of the contract. The coding language for this task is left to the discretion of the contractor; however, Python is preferred.
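As a rough sketch of the extraction step, the example below pulls a title, a publication date, and the paragraph text out of a page using only Python's standard `html.parser` module. It assumes the page exposes its date through an `article:published_time` meta tag and keeps its text in `<p>` elements; many sites will not follow this layout, so a real tool would need per-site fallbacks. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class ArticleParser(HTMLParser):
    """Collect the <title>, a published-time meta tag, and <p> text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.date = None
        self.paragraphs = []
        self._tag = None  # the <title> or <p> element currently open
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("property") == "article:published_time":
            self.date = attrs.get("content")
        elif tag in ("title", "p"):
            self._tag = tag
            self._buf = []

    def handle_data(self, data):
        if self._tag:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == self._tag:
            # Collapse runs of whitespace left over from the page layout.
            text = " ".join("".join(self._buf).split())
            if tag == "title":
                self.title = text
            elif text:
                self.paragraphs.append(text)
            self._tag = None


def extract_article(html, source):
    """Return the fields requested for each article as a dict."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    return {
        "source": source,
        "date": parser.date,
        "title": parser.title,
        "body": "\n".join(parser.paragraphs),
    }
```

Relying on generic tags rather than site-specific class names is one way to tolerate minor redesigns, at the cost of sometimes picking up navigation or footer text that happens to sit in `<p>` elements.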
2) Identification of news articles of interest (machine learning/NLP):
Once the web scraper has been completed, the contractor will develop a model to flag articles of interest within the scraped news articles. ACLED researchers will provide a set of ‘relevant’ and ‘irrelevant’ news articles for training the model. Ideally, the contractor would develop versions of the model parametrized to work on non-English news sources as well; however, this is not essential. The type of classification model to be used is left to the discretion of the contractor. Here, too, the contractor will work closely with ACLED’s resident data scientist, who will continue to implement the model following the termination of the contract. Python or R is preferred.
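For illustration only, the sketch below shows one possible baseline for such a model: a multinomial naive Bayes classifier with add-one smoothing, written in plain Python and trained on labelled example texts. The posting leaves the choice of model to the contractor, so this is an assumed starting point rather than the required method, and the class name and labels are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[c] / total_docs)
            total_words = sum(self.word_counts[c].values())
            for tok in tokenize(text):
                if tok in self.vocab:  # ignore words never seen in training
                    count = self.word_counts[c][tok]
                    score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = c, score
        return best_label
```

A model like this is trained with `NaiveBayes().fit(texts, labels)`, where `labels` holds values such as "relevant" and "irrelevant", and then applied to each scraped article with `predict`. Because the tokenizer only splits on word characters, it applies to any language written with spaces between words, though a separate model would normally be trained per language.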
Skills and competencies
ACLED is seeking applicants who have the following skills and experience:
- Fluency in English (essential);
- Strong problem-solving skills;
- Ability to work remotely with limited supervision;
- Experience using programming languages and software (R and Python, especially);
- In-depth knowledge of machine learning techniques and ability to apply them;
- Excellent written and verbal communication skills for coordinating across teams;
- Master's or PhD in Statistics, Mathematics, Computer Science or another quantitative field;
- Research focus or professional experience in conflict research or peace-building (essential).
To apply, please visit ACLED’s career portal to submit a CV and cover letter detailing qualifications and experience, along with two separate detailed technical and financial proposals describing your proposed approach and budget/timeline for completion of tasks 1 and 2.
This means your proposal should have 4 distinct components: a financial proposal for each task (separated) and a technical proposal for each task (separated). Please combine the CV and technical/financial proposals into one PDF to upload as the CV (make sure to label the sections clearly). Applications without the proposals will not be considered.
Further information on the project is available online at http://www.acleddata.com/. Applications will be reviewed on a rolling basis. Interested candidates are advised to apply as soon as possible.