Research - 10.11.2021 - 00:00 

What instant messages can reveal about users’ demographics and what this means for data privacy

A study from the University of St.Gallen suggests that language use in instant messages allows for the prediction of users’ age and gender. Digital language footprints left in private instant messaging services, such as WhatsApp, could give technology companies a way to profile and target users beyond demographics.
Source: HSG Newsroom

10 November 2021. With the ubiquitous use of smartphones globally, text messaging has become one of the most prevalent types of communication and, therefore, one of the most common types of digital data. Research from Prof. Dr. Clemens Stachl from the Institute of Behavioral Science and Technology (IBT) at the University of St.Gallen (HSG), Timo Koch at Ludwig-Maximilians-Universität München and Peter Romero at Keio University shows that there are distinct language differences between gender and age groups that allow for the prediction of user demographics using machine learning algorithms.

The study analyzed more than 300,000 WhatsApp messages from 226 German volunteers. Some of the key findings regarding age and gender differences showed that:

  • Younger participants used first-person singular pronouns (e.g., “I” or “me”), informal language (for example, the German “geil”, which translates to “hot/great”), and emoticons (e.g., “:)”) more frequently in their messages.
  • Female users tended to use emoji more frequently and employed a broader range of emoji than men. Further, women incorporated more function words, particularly personal pronouns in 1st person singular, in their messages.
  • Men, on the other hand, scored higher on the "analytic thinking" language summary variable, which is calculated from the use of, for example, personal pronouns and conjunctions. High values in "analytic thinking" could be indicative of a rather logical and hierarchical thinking style, in contrast to a personal, here-and-now, and narrative thinking style.

This study indicates that private instant messages could be more predictive of user characteristics than public social media posts because users engage in more self-disclosure in their messages. As a consequence, digital language footprints in instant messages would allow tech firms to profile users and could threaten individual privacy rights beyond user demographics. Given the overall trend away from public posting and towards private communication, these findings open up many questions about how instant messaging data should be protected.

The corresponding paper has been published in Computers in Human Behavior and is openly accessible via:

Institute of Behavioral Science and Technology (IBT-HSG)

The relationship between human beings and technology is constantly evolving. For this reason, HSG founded the Institute of Behavioral Science and Technology (IBT-HSG) in 2021 as a division branched out from the marketing institute. The newly founded institute specializes in interdisciplinary research that delivers evidence-based insight. The goal is to better understand and predict this relationship, and to understand its implications for individuals, organizations, and society at large.

Spearheading the institute are Christian Hildebrand, Clemens Stachl, and Emanuel De Bellis. Together, they explore research areas such as robotics, conversational AI, mobile sensing and digital ethics.

For more information please contact:

Prof. Dr. Clemens Stachl, Associate Professor of Behavioral Science, University of St.Gallen (HSG)

Timo Koch, Department of Psychology, Ludwig-Maximilians-Universität München

Image: Adobe Stock / Alex Ruhl