SURC 2025: SUNY Undergraduate Research Conference Student Presentations

Explanation-based automated reference data

Authors: Lily Plotkin, Phung Lai

SUNY Campus: University at Albany

Presentation Type: Poster

Location: Old Union Hall

Presentation #: 38

Timeslot: Session C 1:45-2:45 PM

Abstract: Large language models (LLMs) have demonstrated remarkable capabilities in tasks such as text generation, machine translation, and knowledge understanding. To ensure that LLMs operate effectively and align with human values, ethical standards, and specific objectives, it is crucial to train them with constraints based on reference data. These datasets typically consist of a mixture of “normal” and “harmful” examples, designed to guide LLMs in distinguishing appropriate from inappropriate behaviors or outputs. However, creating such reference data often relies on labor-intensive manual labeling, which is time-consuming, error-prone, and limited in scope. To address these issues, we propose a new mechanism based on explainable AI (XAI) techniques that automates the generation of reference data for model alignment. By leveraging explainability, this approach ensures that the selection and generation of reference data are transparent, interpretable, and aligned with ethical principles. The proposed mechanism can also dynamically update and refine the reference data, enabling continuous adaptation to new information or shifts in societal values without constant human intervention, improving both the efficiency and effectiveness of LLM alignment. Experimental results show that our approach significantly improves the efficiency and accuracy of reference data generation over traditional manual methods. These findings demonstrate the potential of XAI-driven automation to build more responsible, fair, and scalable AI systems.
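
The abstract does not give implementation details, but a minimal sketch of the general idea, automatically labeling candidate examples from explanation scores so they can serve as reference data, might look like the following. Everything here (the toy_attribution function, the FLAGGED_TERMS lexicon, and the threshold) is an illustrative assumption rather than the authors' mechanism; a real pipeline would obtain attributions from an actual XAI method applied to the LLM.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CandidateExample:
    text: str
    label: str = "unlabeled"   # filled in by the automated labeling step


# Hypothetical lexicon: tokens whose high attribution we treat as a harm signal.
FLAGGED_TERMS = {"exploit", "attack", "weapon"}


def toy_attribution(text: str) -> Dict[str, float]:
    """Stand-in for a real XAI attribution method (e.g., gradient saliency or
    SHAP values): flagged tokens get a fixed high score, everything else low."""
    return {tok: (1.0 if tok.lower().strip("?.!,") in FLAGGED_TERMS else 0.1)
            for tok in text.split()}


def auto_label(example: CandidateExample,
               attribute: Callable[[str], Dict[str, float]] = toy_attribution,
               threshold: float = 0.5) -> CandidateExample:
    """Mark an example 'harmful' if any token's attribution exceeds the
    threshold, otherwise 'normal'; the labeled pool becomes reference data."""
    scores = attribute(example.text)
    top_score = max(scores.values(), default=0.0)
    example.label = "harmful" if top_score > threshold else "normal"
    return example


if __name__ == "__main__":
    pool = [
        CandidateExample("How do I exploit this web service?"),
        CandidateExample("Summarize this news article for me."),
    ]
    reference_data = [auto_label(ex) for ex in pool]
    for ex in reference_data:
        print(f"{ex.label:>7}: {ex.text}")
```

In this toy setting the explanation scores also make each label auditable: a human reviewer can inspect which tokens drove the "harmful" decision, which is the kind of transparency the abstract attributes to the XAI-based approach.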