Dr. Lili Zhang

AI for Good: Aligning AI with Human Values

Submitted on Thursday, 20/06/2024

Enhancing the Cognitive and Ethical Robustness of Large Language Model-based Agents through Transdisciplinary Approaches
Dr. Lili Zhang, Insight SFI Research Centre for Data Analytics at Dublin City University

As artificial intelligence (AI) systems, particularly Large Language Models (LLMs), rapidly advance towards surpassing human cognitive capabilities, ensuring their alignment with human values and safety standards is becoming a formidable challenge. This challenge, known as ‘superalignment’, is critical as these systems are deployed across increasingly diverse domains, including sensitive applications such as healthcare and finance.
Our research at the Insight SFI Research Centre for Data Analytics at Dublin City University (DCU) focuses on integrating principles from cognitive science to gain a deeper understanding of the behavioural dynamics and functionality of LLM-based agents. By examining how these models process, generate, and comprehend language when making decisions under uncertainty, the project aims to identify parallels between human cognition and artificial intelligence, enhancing the interpretability and potentially the alignment of LLM-based agents.
Cognitive Science Meets Artificial Intelligence
Cognitive science provides a rich framework for understanding the intricate processes behind human thought, decision-making, and language comprehension. By leveraging methodologies from this discipline, we aim to dissect and analyse the behaviour of LLMs in ways that reveal their underlying decision-making mechanisms. This understanding is crucial for enhancing the interpretability of these models, making their operations more transparent and predictable.
A key component of our project involves investigating the adversarial vulnerabilities of LLMs’ decision-making processes. Through experimental paradigms such as a choice-based task where options have uncertain rewards and a multi-round interaction game requiring trust and strategy, we probe LLMs to uncover their cognitive biases, risk aversion tendencies, and strategic adherence patterns. These tasks, traditionally used to study human decision-making, provide a robust platform for assessing how LLMs perform under conditions of uncertainty and strategic interaction.
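As a rough illustration of what such a choice-based probe might look like, the sketch below repeatedly offers a model a sure payoff versus a gamble with a higher expected value and records how often it takes the risk. The payoff values, probabilities, and the ask_model wrapper are illustrative assumptions for this article, not the exact protocol used in our experiments.

```python
import random

def risky_choice_trial(ask_model, rng):
    """One trial of a two-option choice task with uncertain rewards.

    Option A pays a sure 50 points; Option B pays 120 points with
    probability 0.5, otherwise 0 (higher expected value, higher variance).
    An agent that consistently prefers A is behaving risk-aversely.
    """
    prompt = (
        "You are playing a game for points.\n"
        "Option A: receive 50 points for certain.\n"
        "Option B: 50% chance of 120 points, 50% chance of 0 points.\n"
        "Reply with exactly one letter: A or B."
    )
    reply = ask_model(prompt).strip().upper()
    choice = "B" if reply.startswith("B") else "A"
    payoff = 50 if choice == "A" else (120 if rng.random() < 0.5 else 0)
    return choice, payoff

def estimate_risk_preference(ask_model, n_trials=50, seed=0):
    """Fraction of trials on which the agent takes the risky option."""
    rng = random.Random(seed)
    risky = sum(risky_choice_trial(ask_model, rng)[0] == "B" for _ in range(n_trials))
    return risky / n_trials

if __name__ == "__main__":
    # Stand-in "model" that always plays it safe, purely for demonstration;
    # in practice ask_model would wrap a call to the LLM under study.
    always_safe = lambda prompt: "A"
    print(estimate_risk_preference(always_safe))  # 0.0 -> never takes the gamble
```

In practice the interesting quantity is how this risky-choice rate shifts as the stakes, framing, or conversational context of the prompt are varied.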
Findings and Implications
Our preliminary findings reveal that LLMs often exhibit cognitive biases similar to those found in humans. For instance, they show tendencies towards risk aversion and can adhere to specific strategies that adversarial agents may exploit. This similarity between human and artificial cognition underscores the necessity of developing more advanced training methodologies that incorporate cognitive and ethical considerations.
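To make the second point concrete, the toy example below shows how a rigidly predictable policy loses out in a repeated game against an adversary that never reciprocates. The prisoner's-dilemma payoffs and the "always cooperate" policy are illustrative assumptions, not behaviour measured from any particular model.

```python
# A rigid strategy in a repeated game is easy to exploit: an adversary that
# simply defects every round collects the maximum payoff while the
# predictable cooperator collects nothing.

PAYOFFS = {  # (my_move, their_move) -> my payoff
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def play(agent, adversary, rounds=20):
    """Run a repeated game and return (agent_total, adversary_total)."""
    agent_total = adversary_total = 0
    history = []
    for _ in range(rounds):
        a = agent(history)
        b = adversary(history)
        agent_total += PAYOFFS[(a, b)]
        adversary_total += PAYOFFS[(b, a)]
        history.append((a, b))
    return agent_total, adversary_total

rigid_cooperator = lambda history: "C"   # the predictable policy
exploiter = lambda history: "D"          # adversary that never reciprocates

print(play(rigid_cooperator, exploiter))  # (0, 100): rigidity is punished
```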
Understanding these biases is not just an academic exercise; it has practical implications for the deployment of AI systems in real-world scenarios. For example, in financial applications, an LLM that is risk-averse might make overly conservative investment decisions, while in healthcare, biases in decision-making could affect diagnostic accuracy. By identifying and mitigating these biases, we can enhance the reliability and safety of AI systems.
Towards Enhanced Training Methodologies
To address these challenges, our research advocates for the development of training methodologies that are informed by cognitive science and ethics. This approach involves not only refining the algorithms that power LLMs but also ensuring that they are trained in environments that reflect the complexity and variability of real-world decision-making contexts. Ethical considerations are paramount in this process, as they help ensure that AI systems operate within the bounds of societal norms and values.
The Path Forward
Ultimately, our research aspires to advance the reliability and explainability of AI systems and their alignment with human cognitive and psychological patterns. By bridging the gap between cognitive science and artificial intelligence, we aim to contribute significantly to the fields of data analytics and AI. Our work promises to make AI systems not only more robust and reliable but also more aligned with human values, enhancing their safety and efficacy in diverse applications.
In conclusion, the integration of cognitive science principles into the development and training of LLM-based agents offers a promising pathway towards creating AI systems that are not only powerful but also ethically and cognitively robust. As we continue to explore this transdisciplinary approach, we look forward to uncovering new insights that will drive the next generation of AI innovations.

Insight DCU collaborator Dr. Lili Zhang and Prof. Tomas Ward, Site Director of the Insight SFI Research Centre for Data Analytics at DCU, focus on the intersection of cognitive science and artificial intelligence to enhance the robustness and reliability of AI systems.