It’s Insight Communities month and Prof Noel O’Connor considers the definition of community for a data scientist
Mimi Ọnụọha is a Nigerian-American artist and researcher whose work highlights the social relationships and power dynamics behind data collection. One of her works is a white filing cabinet containing labelled files, called ‘The Library of Missing Datasets’.
If you look in the file labelled ‘Mobility for older adults with physical disabilities’ you will find it empty. Empty too is the file labelled ‘Poverty and employment statistics that include people behind bars’. In fact, all the files are empty.
The Library of Missing Datasets is a physical repository of communities that have been excluded in a society where so much data is collected. It stands for groups of people who are absent from US datasets.
We have plenty of empty datasets in Ireland too. As data scientists, we must challenge ourselves to imagine the communities whose data is not collected and to find ways to draw more complete data maps. An approach to AI that does otherwise cannot call itself intelligent.
What are we doing in Insight to enable underrepresented communities to generate, capture and analyse their own data? How are we ensuring that the AI systems we create reach and empower these communities? How are we harnessing AI connectivity to bring isolated communities in from the cold?
Datasets come in myriad forms – maps and GPS systems are two everyday examples. Maps are not neutral: mapmakers and GPS developers select which features to document. When you type a destination into a GPS system, it will commonly give you travel options for cars, bikes, pedestrians and public transport. However, if you are from the community of wheelchair users, or from the visually impaired community, your travel options are not explicitly featured. Here is a perfect example of missing data that excludes certain communities.
At Insight we have developed a project called Crowd4Access that empowers people with disabilities to collect data on accessible routes through their own cities and to input that data into a GPS platform for others to use. We hope that over time this project will spread beyond the cities and become a common resource for all to use. The process of data collection, which is citizen-led, is already mobilising communities – we have Crowd4Access communities popping up across the country.
Insight has for some years been collecting novel datasets of young people’s fundamental literacy skills (FLS), as part of the Moving Well Being Well project. These skills are important because low levels of FLS correlate with low levels of physical activity. We have established that many young people need FLS training to help them to participate in healthy physical activity. We are now turning our lens on children and adults with visual impairment. In partnership with Vision Sports Ireland, we are building a unique dataset of FLS in children with visual impairment so that we can develop skills training programmes that address their specific needs. This dataset should prove valuable to communities of visually impaired people all over the world.
Traction is an EU collaborative project that aims to provide a bridge between opera professionals and specific communities at risk of exclusion. A toolset is being designed and developed to foster the democratisation of opera, using technology as a means to reach new audiences and to connect artists with audiences. The team in Insight has been working on algorithms that deliver the best overall playback experience for the Cocreation Stage, a tool that allows performers on multiple stages to perform opera together. The technology is being trialled with disadvantaged communities, including people in socially deprived areas in Spain and young offenders in Portugal.
These are just three projects. One of the challenges of reaching underrepresented communities is that we don’t always know where to look. SFI’s Creating our Future initiative took to the road this year to hear directly from communities about what they want from the national research effort. We look forward to hearing the results of that initiative and remain committed to drawing as many communities as possible into data research to form new connections, to develop more democratic AI systems and to make AI truly intelligent.