Dr Brian Mac Namee of Insight at UCD and Prof Sarah Jane Delany of TU Dublin are collaborating on the AL4AI project. Dr Mac Namee explains:
The availability of a large corpus of labelled training data is a key component in developing effective machine learning models, and obtaining such data remains a bottleneck in the model development life-cycle. This is particularly the case when labels are time-consuming or expensive to obtain, or when high-level domain expertise is required for labelling (for example, in medical imaging projects). The difficulty of creating large labelled datasets for machine learning can be greatly reduced using transfer learning and active learning, two techniques that allow models to be built using a limited amount of labelled training data.
The AL4AI project is developing an online, open-source, extensible active labelling platform that can be used for image and text data. The platform will incorporate modern active learning and semi-supervised learning methods, including a range of selection strategies and label propagation techniques. It will also provide an intuitive interface for manual artefact labelling, and a range of data representation techniques that can be applied, through transfer learning, within the active learning process. Finally, the platform will use techniques for identifying and mitigating bias in training datasets to enhance its selection strategies, with the aim of delivering less biased labelled training data.
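To give a flavour of what a "selection strategy" does, here is a minimal sketch of one common, generic strategy, least-confidence uncertainty sampling. It is only an illustration and not the AL4AI platform's implementation. It assumes a model has already produced class probabilities for each unlabelled item; the function names `least_confidence_scores` and `select_for_labelling` are hypothetical.

```python
from typing import Sequence


def least_confidence_scores(probabilities: Sequence[Sequence[float]]) -> list[float]:
    """Uncertainty score per item: 1 minus the probability of its most likely class."""
    return [1.0 - max(p) for p in probabilities]


def select_for_labelling(probabilities: Sequence[Sequence[float]], batch_size: int) -> list[int]:
    """Hypothetical helper: indices of the `batch_size` unlabelled items the model is least sure about."""
    scores = least_confidence_scores(probabilities)
    # Rank items from most to least uncertain; sorted() is stable, so ties keep their original order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]


if __name__ == "__main__":
    # Illustrative model outputs for four unlabelled items in a two-class problem.
    probs = [[0.9, 0.1], [0.55, 0.45], [0.6, 0.4], [0.99, 0.01]]
    print(select_for_labelling(probs, batch_size=2))  # [1, 2]
```

In a typical active learning loop, the selected items would be shown to a human labeller, added to the labelled training set, and the model retrained before the next round of selection.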