Enhancing Research Centre profiling through Natural Language Processing: Gender representation in a large research institute
Sampritha Manjunath, Paul Buitelaar
Efficiently identifying and characterising the research focus of a large research centre is pivotal for promoting collaboration and synergy among its members, as well as for fostering connections beyond the Centre’s confines. Natural Language Processing techniques hold significant promise for this purpose. In this study, Insight researchers Sampritha Manjunath and Paul Buitelaar of the University of Galway apply topic extraction methods to the titles and abstracts of scientific publications authored by members of the research centre to delineate their respective areas of research, and then analyse gender representation across those areas.
The member data used in this study was captured using Insight’s own reporting tool, the ‘Performance Update Presentation System’ (PUPS). Paul Buitelaar and Sampritha Manjunath teamed up with Insight members Derek Greene and Eoghan Cunningham of UCD to extract members’ publication details, including title and abstract, from Semantic Scholar and Scopus. The study covered 423 female centre members and 987 male centre members.
The ‘Saffron’ tool was used to extract the topics of interest, and the researchers considered 20 topics for this study. Author information extracted from PUPS, including gender, was mapped to the publication data.
The researchers analysed this data under three headings:
Gender balance in overall top 20 topics
Gender balance in top 20 topics extracted for female cohort
Gender balance in top 20 topics for male cohort
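A per-topic gender balance of this kind can be illustrated with a minimal Python sketch. It is not the researchers’ implementation: the record layout, the function names `gender_balance` and `rank_by_parity`, and the toy records are all assumptions made for illustration, and the topic labels are placeholders rather than study data.

```python
from collections import defaultdict

# Hypothetical records of (author_id, gender, topic). These are
# placeholder values for illustration only, not data from the study.
records = [
    ("a1", "F", "topic_a"),
    ("a2", "M", "topic_a"),
    ("a2", "M", "topic_a"),  # same author, second publication on the topic
    ("a3", "M", "topic_b"),
    ("a4", "M", "topic_b"),
    ("a5", "F", "topic_b"),
]


def gender_balance(records):
    """Return {topic: female share of distinct authors on that topic}.

    Assumes gender labels are exactly "F" or "M".
    """
    authors = defaultdict(lambda: {"F": set(), "M": set()})
    for author_id, gender, topic in records:
        # Sets ensure each author is counted once per topic.
        authors[topic][gender].add(author_id)
    balance = {}
    for topic, by_gender in authors.items():
        female = len(by_gender["F"])
        male = len(by_gender["M"])
        balance[topic] = female / (female + male)
    return balance


def rank_by_parity(balance):
    """Order topics from closest to parity (share 0.5) to furthest."""
    return sorted(balance, key=lambda topic: abs(balance[topic] - 0.5))


balance = gender_balance(records)
ranking = rank_by_parity(balance)
```

Counting distinct authors rather than publications is one possible design choice; counting authorships per publication would weight prolific authors more heavily and could give a different picture.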
The topics that came closest to parity between male and female researchers included ‘Physical Activity’ and ‘Quality of Life’. The involvement of female researchers relative to male researchers was lowest in fields such as ‘Natural Language Processing’ and ‘Recommender Systems’.
The next step for researchers Buitelaar and Manjunath is to obtain updated data and to extend the study with an analysis of collaboration among cohorts and of first and last authorship.