SoDA Lab

Studying social phenomena
with large-scale data and AI

Led by Haewoon Kwak and Jisun An, we study human behavior on online platforms: how to measure it, understand it, and design for it, and how to assess its implications. Every day we use our devices to read the news, watch videos, search for places to eat, chat with friends, and post on social media. Those electronic footprints make it possible to study both individual and collective behavior, including what people like and dislike, how they feel about different topics, and how they engage with one another. Understanding human behavior on these platforms has become essential.

We develop new computational methods and tools for understanding, predicting, and shaping human behavior online. A central challenge is the sheer diversity and complexity of online data. We work across many kinds of large-scale data, examining existing tools, surfacing their limits, and developing new measurements, machine-learning models, and linguistic methods that let us understand online behavior and address real-world problems.

But our goal is not only to solve problems. We also want to improve the online spaces themselves. We are interested in identifying obstacles to a trustworthy public space online, developing methodologies that make those obstacles transparent, building frameworks for real-time large-scale monitoring, and ultimately helping the online public square become more credible.

We are based at the Luddy School of Informatics, Computing, and Engineering at Indiana University Bloomington and are a member of the Center for Complex Networks and Systems Research (CNetS).

Featured work

Zoher Kachwala, Bao Tran Truong, Rasika Muralidharan, Haewoon Kwak, Jisun An, Filippo Menczer

ACL · 2026

Different online communities have different rules: what gets you banned from one subreddit may be the norm in another. PluRule is a multimodal, multilingual benchmark with 13,371 rule violations across 1,989 Reddit communities, 2,885 rules, and 9 languages. Even GPT-5.2 performs only slightly better than a trivial baseline, exposing pluralistic moderation as a fundamental challenge.

Fan Huang, Haewoon Kwak, Jisun An

ACL Findings · 2026

Are LLMs persuaded out of their beliefs? Using the Source–Message–Channel–Receiver (SMCR) communication framework across six mainstream LLMs and three domains (facts, medical QA, social bias), we measure how stable each model's stated beliefs are under persuasive pressure. The smallest model flips on 82.5% of attempts at the first turn. Counterintuitively, asking models to verbalize confidence makes them more vulnerable.

Byunghwee Lee, Rachith Aiyappa, Yong-Yeol Ahn, Haewoon Kwak, Jisun An

Nature Human Behaviour · 2025 · 16 cites

How do beliefs interconnect, and what drives a person to adopt new ones? We fine-tune large language models on online debate data to map thousands of beliefs into a semantic space where proximity reflects coherence. Position in that space predicts which beliefs an individual is likely to adopt next and quantifies cognitive dissonance via distance between existing and new beliefs.

Kunihiro Miyazaki, Taichi Murayama, Takayuki Uchiba, Jisun An, Haewoon Kwak

EPJ Data Science · 2024 · 80 cites

What does the public actually think of generative AI? We analyze 3M tweets from 2019 to 2023 and find broad interest across occupations, not just tech. Sentiment is generally positive and tracks exposure, with one exception: illustrators are notably negative, reflecting concerns over training-data ethics.