Rethinking AI Evaluation

The theme of this year’s SAIL Spring School was ‘Innovating AI Evaluation: Beyond Accuracy and Precision’. From 26 to 28 March, researchers from various disciplines discussed how AI systems should be evaluated. While classical metrics such as accuracy and precision continue to play a role, this year’s Spring School also examined broader dimensions of evaluation, such as ethical and social implications, interpretable AI, mathematical guarantees and user evaluation.

International experts presented their latest research results in various tutorials. In addition to lectures and workshops, the event provided a platform for young researchers to present their work in the form of posters – including SAIL doctoral student Clarissa Sabrina Arlinghaus. Her research focuses on social exclusion in human-technology interaction, especially in hybrid teams in which humans work together with AI systems or robots. In her poster presentation, she introduces an innovative method for LLM-based coding in qualitative research, in which a large language model (LLM) helps to automatically sort and categorise qualitative data (e.g. interviews or open-ended responses). Unlike quantitative data, which is already available in numerical form and can be analysed statistically, qualitative data is often very time-consuming to analyse.

In a short interview at Bielefeld University before the SAIL Spring School, Clarissa gave an insight into her research and her upcoming participation.

Your research investigates social exclusion in human-technology interaction. What are the most important findings of your studies so far and what impact could they have on everyday work in hybrid teams?

What I find most important is the realisation that human-technology teams basically follow the same general social rules as purely human teams, but with a different intensity. For example, in our studies of human-robot teams in the catering industry, we found that people generally want to be noticed, but that the attention of human colleagues is valued more than that of robot colleagues. At the same time, people found it more threatening to be excluded by human colleagues than by robot colleagues. In other studies, we found that while participants generally disapprove of workplace bullying, they are significantly more likely to condemn the perpetrators and intervene when the bully is human than when it is an AI. Our findings clearly show that working with technical systems is not exactly the same as working in human teams, despite what is often claimed.

Work remains a central component of social participation. In the face of increasing technologisation, human contact in the workplace should therefore be prioritised in hybrid teams, as it has the greatest influence on the satisfaction and threat of social needs. This is the only way to maintain the psychological well-being of employees in the long term and to prevent serious individual or organisational consequences – such as prolonged absence due to depression.

You are presenting a new method for LLM-supported coding of qualitative data at the Spring School. What are the challenges of AI-supported qualitative research, and how can your method help to solve them?

The analysis of qualitative data has always been very time-consuming. At the same time, qualitative data often provides valuable insights that can be extremely helpful in interpreting quantitative data. My impression is that the number of studies drawing on both quantitative and qualitative data is increasing, but also that analysing the qualitative data is often a major challenge – one that, in case of doubt, leads to qualitative data being neglected for capacity reasons and potential insights being lost. Our method for LLM-assisted coding of qualitative data offers a solution to this problem: large amounts of qualitative data can be summarised into inductive categories in a short time and with high quality. It is also possible to run replications, use different LLMs, or adapt LLMs to generate more robust solutions than traditional coding methods. We provide templates and instructions so that the method is accessible even to people without programming knowledge or prior experience.
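At a high level, an LLM-assisted coding loop with replications can be illustrated as follows. This is only a minimal sketch, not Arlinghaus’s actual method: the `call_llm` stub, the prompt wording and the majority-vote step are assumptions made for illustration. In practice, the stub would be replaced by a call to a real LLM API.

```python
from collections import Counter

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM API call.

    Stubbed here so the sketch is self-contained: it returns a fixed
    category label based on a keyword, whereas a real implementation
    would send `prompt` to a language model and return its answer.
    """
    if "robot" in prompt.lower():
        return "human-robot interaction"
    return "other"

def code_response(response: str, n_replications: int = 3) -> str:
    """Assign an inductive category to one qualitative response.

    The same prompt is run several times (replications), and the
    majority label is kept to make the coding more robust.
    """
    prompt = f"Assign a short category label to this answer: {response}"
    labels = [call_llm(prompt) for _ in range(n_replications)]
    return Counter(labels).most_common(1)[0][0]

# Example: coding one open-ended survey answer.
category = code_response("I felt ignored by the robot waiter.")
```

Replications could also be run with different LLMs or prompts and the resulting labels compared, analogous to inter-rater agreement between human coders.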

As a doctoral student at Bielefeld University, you are researching in an environment with strong interdisciplinary networking. What significance does the SAIL Spring School have for your academic networking and further development?

I enjoy the interdisciplinary exchange and find it very enriching. You can see what other researchers are working on and get inspiration for your own new projects. Sometimes interesting opportunities for collaboration arise. In general, I think it’s important in science to build up a good network. Events like the SAIL Spring School can be very helpful. That’s why I’m looking forward to participating.