Research Challenge 5: Data Engineering and Governance
Team Lead: Mark Roantree
Strand 5.1: Data Discovery — John Breslin (University of Galway)
Strand 5.2: Data Integration — Mark Roantree (DCU)
Strand 5.3: Data Exploitability — Andrew McCarren (DCU)
Research Challenge 5 focuses on the fundamental data engineering and governance aspects that underpin data science research. Data engineering is about acquiring, transforming and preparing data, while governance is focused on surfacing the value contained in data. Without innovation and a quality-based approach to both activities, the accuracy and integrity of the data science process cannot be guaranteed. This research challenge therefore seeks to contribute leading-edge research addressing the following challenges:
- Managing the Heterogeneity of data, especially data rich in semantics and connected data from many different sources.
- Addressing the Complexity of data where solutions require data engineering for complex software infrastructures.
- Ensuring the Quality of data where large, rich, connected data leads to incompleteness, biases, errors, etc.
- Developing protocols for the Governance of data to address issues such as data provenance, rights and workflows in a connected infrastructure.
Through a shared graph-based data model and ETL infrastructure, Research Challenge 5 delivers widely applicable solutions for Data Discoverability, Integration and Exploitability.