Publications

Rematch: Robust and Efficient Matching of Local Knowledge Graphs to Improve Structural and Semantic Similarity

AI/ML/NLP

How can we quickly and accurately compare the knowledge graphs that structure a sentence’s meaning? Existing metrics have limitations, often failing to capture true semantic similarity while being computationally expensive. We developed REMATCH, a new and efficient metric that addresses this by extracting core semantic elements called ‘motifs’ and comparing their collective sets. REMATCH measures semantic similarity 1–5 percentage points more accurately than previous state-of-the-art metrics, and it is five times faster. This model can play a key role in building more sophisticated natural language understanding systems and directly benefits downstream applications that rely on analyzing semantic relationships between sentences.

Zoher Kachwala, Jisun An, Haewoon Kwak, Filippo Menczer

NAACL Findings, 2024

ChatGPT Rates Natural Language Explanation Quality Like Humans: But on Which Scales?

AI/ML/NLP HCI

How can we evaluate the quality of Natural Language Explanations for decisions made by AI? Direct human evaluation is accurate, but it is a difficult, time-consuming, and expensive process. We tested whether ChatGPT can evaluate AI explanations for ‘informativeness’ and ‘clarity’ like expert annotators, and how its judgment aligns with humans across various scales. The results show that ChatGPT performs very similarly to humans when sorting explanations into broad categories like “good/bad,” but struggles to assign fine-grained scores from 1 to 7. Notably, in ‘pairwise comparison’ tasks (judging which of two explanations is better), it demonstrated high accuracy comparable to human experts. This research shows that Large Language Models can be used as reliable and efficient tools to supplement human evaluators under specific conditions, which can accelerate the development of transparent and responsible AI systems.

Fan Huang, Haewoon Kwak, Kunwoo Park, Jisun An

LREC-COLING, 2024

The Impact of Toxic Trolling Comments on Anti-vaccine YouTube Videos

computational social science AI/ML/NLP social media

What is the true impact of toxic trolling in the comment sections of anti-vaccine YouTube videos? While it’s widely believed that such comments spread fear and fuel vaccine hesitancy, there has been little empirical evidence to measure this effect. Our latest study tackles this question by analyzing the complex interplay between toxicity and fear across 484 anti-vaccine videos and more than 414,000 of their comments. Using machine learning to score each comment, we found a significant link between the overall toxicity of a video’s comment section and the level of fear expressed within it. More importantly, we discovered a powerful contagion effect; the toxicity of highly liked early comments was significantly associated with a rise in fear in subsequent comments. This influence was also found to be bidirectional, as highly liked fearful comments were linked to an increase in later toxicity.

Kunihiro Miyazaki, Takayuki Uchiba, Haewoon Kwak, Jisun An, Kazutoshi Sasahara

Scientific Reports, 2024

Public Perception of Generative AI on Twitter: An Empirical Study Based on Occupation and Usage

computational social science AI/ML/NLP social media

The emergence of generative AI has sparked substantial discussions, with the potential to have profound impacts on society in all aspects. As emerging technologies continue to advance, it is imperative to facilitate their proper integration into society, managing expectations and fear. This paper investigates users’ perceptions of generative AI using 3M posts on Twitter from January 2019 to March 2023, especially focusing on their occupation and usage. We find that people across various occupations, not just IT-related ones, show a strong interest in generative AI. The sentiment toward generative AI is generally positive, …

Kunihiro Miyazaki, Taichi Murayama, Takayuki Uchiba, Jisun An, Haewoon Kwak

EPJ Data Science, 2024

Press coverage-Blockchain News

Enhancing Spatiotemporal Traffic Prediction through Urban Human Activity Analysis

computational social science AI/ML/NLP

Traffic prediction is one of the key elements to ensure the safety and convenience of citizens. Existing traffic prediction models primarily focus on deep learning architectures to capture spatial and temporal correlation. They often overlook the underlying nature of traffic. Specifically, the sensor networks in most traffic datasets do not accurately represent the actual road network exploited by vehicles, failing to provide insights into the traffic patterns in urban activities. To overcome these limitations, we propose an improved traffic prediction method based on graph convolution deep learning algorithms. …

Sumin Han, Youngjun Park, Minji Lee, Jisun An, Dongman Lee

ACM CIKM, 2023

Can We Trust the Evaluation on ChatGPT?

AI/ML/NLP

ChatGPT, the first large language model (LLM) with mass adoption, has demonstrated remarkable performance in numerous natural language tasks. Despite its evident usefulness, evaluating ChatGPT’s performance in diverse problem domains remains challenging due to the closed nature of the model and its continuous updates via Reinforcement Learning from Human Feedback (RLHF). We highlight the issue of data contamination in ChatGPT evaluations, with a case study of the task of stance detection. We discuss the challenge of preventing data contamination and ensuring fair model evaluation in the age of closed and continuously trained models.

Rachith Aiyappa, Jisun An, Haewoon Kwak, Yong-Yeol Ahn

TrustNLP (Collocated with ACL), 2023

100+ papers citing this work (Google scholar)

Wearing Masks Implies Refuting Trump?: Towards Target-specific User Stance Prediction across Events in COVID-19 and US Election 2020

computational social science AI/ML/NLP social media

People who share similar opinions towards controversial topics could form an echo chamber and may share similar political views toward other topics as well. The existence of such connections, which we call connected behavior, gives researchers a unique opportunity to predict how one would behave for a future event given their past behaviors. In this work, we propose a framework to conduct connected behavior analysis. Neural stance detection models are trained on Twitter data collected on three seemingly independent topics, i.e., wearing a mask, racial equality, and Trump, to detect people’s stance, …

Hong Zhang, Haewoon Kwak, Wei Gao, Jisun An

ACM WebSci, 2023

5+ papers citing this work (Google scholar)

Political Honeymoon Effect on Social Media: Characterizing Social Media Reaction to the Changes of Prime Minister in Japan

computational social science social media

New leaders in democratic countries typically enjoy high approval ratings immediately after taking office. This phenomenon is called the honeymoon effect and is regarded as a significant political phenomenon; however, its mechanism remains underexplored. Therefore, this study examines how social media users respond to changes in political leadership in order to better understand the honeymoon effect in politics. In particular, we constructed a 15-year Twitter dataset of 6.6M tweets covering eight changes of Japanese prime minister and analyzed it in terms of sentiments, topics, and users. …

Kunihiro Miyazaki, Taichi Murayama, Akira Matsui, Masaru Nishikawa, Takayuki Uchiba, Haewoon Kwak, Jisun An

ACM WebSci, 2023

5+ papers citing this work (Google scholar)

Is ChatGPT better than Human Annotators? Potential and Limitations of ChatGPT in Explaining Implicit Hate Speech

AI/ML/NLP online harm

Recent studies have raised the alarm that much online hate speech is implicit. Because of its subtle nature, explaining the detection of such hateful speech has been a challenging problem. In this work, we examine whether ChatGPT can be used for providing natural language explanations (NLEs) for implicit hateful speech detection. We design our prompt to elicit concise ChatGPT-generated NLEs and conduct user studies to evaluate their quality by comparison with human-generated NLEs. We discuss the potential and limitations of ChatGPT in the context of implicit hateful speech research.

Fan Huang, Haewoon Kwak, Jisun An

WWW Companion, 2023

340+ papers citing this work (Google scholar)

Chain of Explanation: New Prompting Method to Generate Higher Quality Natural Language Explanation for Implicit Hate Speech

AI/ML/NLP online harm

Recent studies have exploited advanced generative language models to generate Natural Language Explanations (NLE) for why a certain text could be hateful. We propose the Chain of Explanation (CoE) Prompting method, using the heuristic words and target group, to generate high-quality NLE for implicit hate speech. We improved the BLEU score from 44.0 to 62.3 for NLE generation by providing accurate target information. We then evaluate the quality of generated NLE using various automatic metrics and human annotations of informativeness and clarity scores.

Fan Huang, Haewoon Kwak, Jisun An

WWW Companion, 2023

'This is Fake News': Characterizing the Spontaneous Debunking from Twitter Users to COVID-19 False Information

computational social science computational journalism social media online harm

False information spreads on social media, and fact-checking is a potential countermeasure. However, there is a severe shortage of fact-checkers; an efficient way to scale fact-checking is desperately needed, especially in pandemics like COVID-19. In this study, we focus on spontaneous debunking by social media users, which has been missed in existing research despite its indicated usefulness for fact-checking and countering false information. Specifically, we characterize the tweets with false information, or fake tweets, that tend to be debunked and Twitter users who often debunk fake tweets.

Kunihiro Miyazaki, Takayuki Uchiba, Kenji Tanaka, Jisun An, Haewoon Kwak, Kazutoshi Sasahara

AAAI ICWSM, 2023

You Have Earned a Trophy: Characterize In-Game Achievements and Their Completions

game analytics HCI user engagement

Achievement systems have been actively adopted in gaming platforms to maintain players’ interests. Among them, trophies in PlayStation games are one of the most successful achievement systems. While the importance of trophy design has been casually discussed in many game developers’ forums, there has been no systematic study of the historical dataset of trophies yet. In this work, we construct a complete dataset of PlayStation games and their trophies and investigate them from both the developers’ and players’ perspectives.

Haewoon Kwak

ACM WebSci, 2022

MAANG? MANGA? Characterizing Spontaneous Ideation Contest on Social Media

computational social science network science social media

Social media is not only a place for people to communicate about daily matters but also a virtual venue to transmit and exchange various ideas. Such ideas are known as the raw voices of potential consumers, which come from a wide range of people who may not participate in consumer surveys, and therefore their opinions may be of high value to companies. However, how users share their ideas on social media is still underexplored. This study investigates a spontaneous ideation contest about a generic term for new Big Tech companies, which occurred when Facebook changed its name to Meta. We constructed a comprehensive dataset of tweets containing candidates and examined how they were suggested, spread, and exchanged by social media users. Our findings indicate that different ideas are better on different metrics. The ranking of ideas was not decided immediately after the idea contest started. The first people to post ideas have fewer followers than those who post secondarily or who only share the idea. We also confirmed that replies accumulate unique ideas, but most of them are added in the first depth in reply trees. This study would promote the use of social media as a part of open innovation and co-creation processes in the industry.

Kunihiro Miyazaki, Takayuki Uchiba, Haewoon Kwak, Jisun An

IEEE BigData, 2022 (short)

Modeling Political Activism around Gun Debate via Social Media

computational social science political science network science social media

The United States has some of the highest rates of gun violence among developed countries. Yet, there is disagreement about the extent to which firearms should be regulated. In this study, we employ social media signals to examine the predictors of offline political activism, at both the population and individual levels. We show that it is possible to classify the stance of users on the gun issue, especially accurately when network information is available. Alongside socioeconomic variables, network information such as the relative size of the two sides of the debate is also predictive of state-level gun policy. At the individual level, we build a statistical model using network, content, and psycho-linguistic features that predicts real-life political action, and explore the most predictive linguistic features. Thus, we argue that, alongside demographics and socioeconomic indicators, social media provides useful signals in the holistic modeling of political engagement around the gun debate.

Yelena Mejova, Jisun An, Gianmarco De Francisci Morales, Haewoon Kwak

ACM Transactions on Social Computing, 2022

5+ papers citing this work (Google scholar)

Storm the Capitol: Linking Offline Political Speech and Online Twitter Extra-Representational Participation on QAnon and the January 6 Insurrection

computational social science AI/ML/NLP political science social media online harm

The transfer of power stemming from the 2020 presidential election occurred during an unprecedented period in United States history. Uncertainty from the COVID-19 pandemic, ongoing societal tensions, and a fragile economy increased societal polarization, exacerbated by the outgoing president’s offline rhetoric. As a result, online groups such as QAnon engaged in extra political participation beyond the traditional platforms. This research explores the link between offline political speech and online extra-representational participation by examining Twitter within the context of the January 6 insurrection. Using a mixed-methods approach of quantitative and qualitative thematic analyses, the study combines offline speech information with Twitter data during key speech addresses leading up to the date of the insurrection, exploring the link between Trump’s offline speeches and QAnon’s hashtags across a 3-day timeframe. We find that links between online extra-representational participation and offline political speech exist. This research illuminates this phenomenon and offers policy implications for the role of online messaging as a tool of political mobilization.

Claire Seungeun Lee, Juan Merizalde, John D. Colautti, Jisun An, Haewoon Kwak

Frontiers in Sociology, 2022

Press coverage-PsyPost

Measuring 9 Emotions of News Posts from 8 News Organizations across 4 Social Media Platforms for 8 Months

computational journalism social media user engagement

Using Plutchik’s wheel of emotions framework, we identify the emotional content of 133,487 social media posts and the audience’s emotional engagement expressed in 2,824,162 comments on those posts. We measure nine emotions (anger, anticipation, anxiety, disgust, joy, fear, sadness, surprise, trust) and two sentiments (positive and negative) using two extraction resources (EmoLex, LIWC) for eight major news outlets across four social media platforms (Facebook, Instagram, Twitter, and YouTube) during eight months. We then apply two approaches (Logistic Regression, Long Short-Term Memory) to predict emotional audience reactions before and after publishing the posts. …

Kholoud Khalil Aldous, Jisun An, Bernard J. Jansen

ACM Transactions on Social Computing, 2022

Understanding Toxicity Triggers on Reddit in the Context of Singapore

computational social science AI/ML/NLP social media online harm

While the contagious nature of online toxicity sparked increasing interest in its early detection and prevention, most of the literature focuses on the Western world. In this work, we demonstrate that 1) it is possible to detect toxicity triggers in an Asian online community, and 2) toxicity triggers can be strikingly different between Western and Eastern contexts.

Yun Yu Chong, Haewoon Kwak

Proceedings of the 16th International AAAI Conference on Web and Social Media (ICWSM), 2022 (short)

Press coverage-AI Ethics Brief Newsletter by Montreal AI Ethics Institute

Who Is Missing? Characterizing the Participation of Different Demographic Groups in a Korean Nationwide Daily Conversation Corpus

AI/ML/NLP dataset/tool bias/fairness

A conversation corpus is essential to build interactive AI applications. However, the demographic information of the participants in such corpora is largely underexplored mainly due to the lack of individual data in many corpora. In this work, we analyze a Korean nationwide daily conversation corpus constructed by the National Institute of Korean Language (NIKL) to characterize the participation of different demographic (age and sex) groups in the corpus.

Haewoon Kwak, Jisun An, Kunwoo Park

Proceedings of the 16th International AAAI Conference on Web and Social Media (ICWSM), 2022 (short)

What really matters?: characterising and predicting user engagement of news postings using multiple platforms, sentiments and topics

computational journalism social media user engagement

This research characterises user engagement of approximately 3,000,000 news postings of 53 news outlets and 50,000,000 associated user comments during 8 months on 5 social media platforms (i.e. Facebook, Instagram, Twitter, YouTube, and Reddit). We investigate the effect of sentiments and topics on user engagement across four levels of user engagement expressions (i.e. views, likes, comments, cross-platform posting). We find that sentiments and topics differ by both news outlets and social media platforms, and that both sentiments and topics differ by the four levels of user engagement expression. …

Kholoud Khalil Aldous, Jisun An, Bernard J. Jansen

Behaviour & Information Technology, 2022

Predicting Anti-Asian Hateful Users on Twitter during COVID-19

computational social science AI/ML/NLP social media online harm

We investigate predictors of anti-Asian hate among Twitter users throughout COVID-19. With the rise of xenophobia and polarization that has accompanied widespread social media usage in many nations, online hate has become a major social issue, attracting many researchers. Here, we apply natural language processing techniques to characterize social media users who began to post anti-Asian hate messages during COVID-19. We compare two user groups – those who posted anti-Asian slurs and those who did not – with respect to a rich set of features measured with data prior to COVID-19 and show that it is possible to predict who later publicly posted anti-Asian slurs. …

Jisun An, Haewoon Kwak, Claire Seungeun Lee, Bogang Jun, Yong-Yeol Ahn

Findings of the Association for Computational Linguistics: EMNLP 2021

Code repo (github)

30+ papers citing this work (Google scholar)

Precision Public Health Campaign: Delivering Persuasive Messages to Relevant Segments Through Targeted Advertisements on Social Media

social media user engagement

We propose a novel precision public health campaign framework to structure and standardize the process of designing and delivering tailored health messages to target particular population segments using social media–targeted advertising tools. Our framework consists of five stages: defining a campaign goal, priority audience, and evaluation metrics; splitting the target audience into smaller segments; tailoring the message for each segment and conducting a pilot test; running the health campaign formally; and evaluating the performance of the campaigns. We demonstrate how the framework works through two case studies. The precision public health campaign framework has the potential to support higher population uptake and engagement rates by encouraging a more standardized, concise, efficient, and targeted approach to public health campaign development.

Jisun An, Haewoon Kwak, Hanya M Qureshi, Ingmar Weber

JMIR Form Res 2021;5(9):e22313, 2021

25+ papers citing this work (Google scholar)

FrameAxis: characterizing microframe bias and intensity with word embedding

computational social science AI/ML/NLP dataset/tool bias/fairness

Framing is a process of emphasizing a certain aspect of an issue over the others, nudging readers or listeners towards different positions on the issue even without making a biased argument. Here, we propose FrameAxis, a method for characterizing documents by identifying the most relevant semantic axes (“microframes”) that are overrepresented in the text using word embedding. Our unsupervised approach can be readily applied to large datasets because it does not require manual annotations. …

Haewoon Kwak, Jisun An, Elise Jing, Yong-Yeol Ahn

PeerJ Computer Science 7:e644, 2021

Code repo (github)

Populist Supporters on Reddit: A Comparison of Content and Behavioral Patterns Within Publics of Supporters of Donald Trump and Hillary Clinton

computational social science political science social media

We investigate differences along these dimensions on the online forum Reddit by comparing linguistic patterns and content of comments in two subreddits focusing on a populist, Donald Trump (/r/The_Donald), and a center-left politician, Hillary Clinton (/r/hillaryclinton), during the 2016 U.S. presidential election campaign.

Andreas Jungherr, Oliver Posegga, Jisun An

Social Science Computer Review. March 2021.

How-to Present News on Social Media: A Causal Analysis of Editing News Headlines for Boosting User Engagement

computational journalism AI/ML/NLP social media user engagement

We first build a parallel corpus of original news articles and their corresponding tweets that were shared by eight media outlets. Then, we explore how those media edited tweets against original headlines, and the effects would be …

Kunwoo Park, Haewoon Kwak, Jisun An, Sanjay Chawla

Proceedings of the 15th International AAAI Conference on Web and Social Media (ICWSM), 2021

A Systematic Media Frame Analysis of 1.5 Million New York Times Articles from 2000 to 2017

computational journalism AI/ML/NLP bias/fairness

Framing is an indispensable narrative device for news media because even the same facts may lead to conflicting understandings if deliberate framing is employed. By developing a media frame classifier that achieves state-of-the-art performance, we systematically analyze the media frames of 1.5 million New York Times articles published from 2000 to 2017.

Haewoon Kwak, Jisun An, Yong-Yeol Ahn

Proceedings of the 12th ACM Conference on Web Science (WebSci), 2020

40+ papers citing this work (Google scholar)

Identifying and Characterizing Alternative News Media on Facebook

computational journalism network science social media bias/fairness

In this work, we propose a graph-based semi-supervised method to measure the political bias of pages in most countries and show the political split of the alternative media, mainstream media, and public figures pages. We validate our method using the publicly available U.S. dataset and then apply it to Brazilian pages, where we found a larger number of right-wing pages in general, except for alternative news media.

Samuel S Guimarães, Julio CS Reis, Lucas Lima, Filipe N Ribeiro, Marisa Vasconcelos, Jisun An, Haewoon Kwak, Fabrício Benevenuto

IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2020

What Was Written vs. Who Read It: News Media Profiling Using Text Analysis and Social Media Context

computational journalism AI/ML/NLP social media bias/fairness

Predicting the political bias and the factuality of reporting of entire news outlets are critical elements of media profiling, which is an understudied but increasingly important research direction. The present level of proliferation of fake, biased, and propagandistic content online has made it impossible to fact-check every single suspicious claim, either manually or automatically. Thus, it has been proposed to profile entire news outlets and to look for those that are likely to publish fake or biased content. This makes it possible to detect likely “fake news” the moment it is published, by simply checking the reliability of its source. From a practical perspective, political bias and factuality of reporting have a linguistic aspect but also a social context.

Ramy Baly, Georgi Karadzhov, Jisun An, Haewoon Kwak, Yoan Dinkov, Ahmed Ali, James Glass, Preslav Nakov

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), 2020

55+ papers citing this work (Google scholar)

Empirical Evaluation of Three Common Assumptions in Building Political Media Bias Datasets

computational journalism AI/ML/NLP dataset/tool bias/fairness

We empirically validate three common assumptions in building political media bias datasets, which are (i) labelers’ political leanings do not affect labeling tasks, (ii) news articles follow their source outlet’s political leaning, and (iii) political leaning of a news outlet is stable across different topics.

Soumen Ganguly, Juhi Kulshrestha, Jisun An, Haewoon Kwak

Proceedings of the 14th International AAAI Conference on Web and Social Media (ICWSM), 2020

25+ papers citing this work (Google scholar)

“Trust Me, I Have a Ph.D.”: A Propensity Score Analysis on the Halo Effect of Disclosing One's Offline Social Status in Online Communities

computational social science AI/ML/NLP social media user engagement

We study two Reddit communities that adopted this scheme, whereby posts include tags identifying education status referred to as flairs, and we examine how the “transferred” social status affects the interactions among the users.

Kunwoo Park, Haewoon Kwak, Hyunho Song, Meeyoung Cha

Proceedings of the 14th International AAAI Conference on Web and Social Media (ICWSM), 2020

Are These Comments Triggering? Predicting Triggers of Toxicity in Online Discussions

computational social science AI/ML/NLP social media online harm

We define toxicity triggers in online discussions as non-toxic comments that lead to toxic replies. Then, we build a neural network-based prediction model for toxicity triggers.

Hind Almerekhi, Haewoon Kwak, Bernard Jim Jansen, Joni Salminen (short)

Proceedings of The Web Conference (WWW), 2020

50+ papers citing this work (Google scholar)

Going beyond accuracy: estimating homophily in social networks using predictions

computational social science

We show that estimating homophily in a network can be viewed as a dyadic prediction problem, and that homophily estimates are unbiased when dyad-level residuals sum to zero in the network. Then, we propose a novel “ego-alter” modeling approach that outperforms standard node and dyad classification strategies.

George Berry, Antonio Sirianni, Ingmar Weber, Jisun An, Michael Macy (preprint)

arXiv preprint arXiv:2001.11171, 2020

Tanbih: Get To Know What You Are Reading

computational journalism AI/ML/NLP bias/fairness dataset/tool

We introduce Tanbih, a news aggregator with intelligent analysis tools to help readers understand what’s behind a news story. Our system displays news grouped into events and generates media profiles that show the general factuality of reporting, the degree of propagandistic content, hyper-partisanship, leading political ideology, general frame of reporting, and stance with respect to various claims and topics of a news outlet.

Yifan Zhang, Giovanni Da San Martino, Alberto Barrón-Cedeño, Salvatore Romeo, Jisun An, Haewoon Kwak, Todor Staykovski, Israa Jaradat, Georgi Karadzhov, Ramy Baly, Kareem Darwish, James Glass, Preslav Nakov (demo)

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019

Gender and Racial Diversity in Commercial Brands' Advertising Images on Social Media

computational social science social media bias/fairness

Gender and racial diversity in the mediated images from the media shape our perception of different demographic groups. In this work, we investigate gender and racial diversity of 85,957 advertising images shared by the 73 top international brands on Instagram and Facebook.

Jisun An, Haewoon Kwak

Proceedings of Social Informatics (SocInfo), 2019

Best Paper Award

Political Discussions in Homogeneous and Cross-Cutting Communication Spaces

computational social science political science social media

We use Reddit to explore the nature of political discussions in homogeneous and cross-cutting communication spaces. In particular, we develop an analytical template to study interaction and linguistic patterns within and between politically homogeneous and heterogeneous communication spaces. Our analyses reveal different behavioral patterns in homogeneous and cross-cutting communication spaces.

Jisun An, Haewoon Kwak, Oliver Posegga, Andreas Jungherr

Proceedings of the 13th International AAAI Conference on Web and Social Media (ICWSM), 2019

65+ papers citing this work (Google scholar)

View, Like, Comment, Post: Analyzing User Engagement by Topic at 4 Levels across 5 Social Media Platforms for 53 News Organizations

computational journalism social media user engagement

We evaluate the effects of the topics of social media posts on audiences across five social media platforms (i.e., Facebook, Instagram, Twitter, YouTube, and Reddit) at four levels of user engagement. We collected 3,163,373 social posts from 53 news organizations across five platforms during an 8-month period.

Kholoud Khalil Aldous, Jisun An, Bernard J. Jansen

Proceedings of the 13th International AAAI Conference on Web and Social Media (ICWSM), 2019

85+ papers citing this work (Google scholar)

Discursive Power in Contemporary Media Systems: A Comparative Framework

computational journalism social media

We propose the concept of discursive power. This describes the ability of contributors to communication spaces to introduce, amplify, and maintain topics, frames, and speakers, thus shaping public discourses and controversies that unfold in interconnected communication spaces.

Andreas Jungherr, Oliver Posegga, Jisun An

The International Journal of Press/Politics, 24(4), 2019

130+ papers citing this work (Google scholar)

Imaginary People Representing Real Numbers: Generating Personas from Online Social Media Data

HCI

We develop a methodology to automate creating imaginary people, referred to as personas, by processing complex behavioral and demographic data of social media audiences. From a popular social media account containing more than 30 million interactions by viewers from 198 countries engaging with more than 4,200 online videos produced by a global media corporation, we demonstrate that our methodology has several novel accomplishments.

Jisun An, Haewoon Kwak, Soon-gyo Jung, Joni Salminen, M. Admad, Bernard J. Jansen

ACM Transactions on the Web, 12(4), 2018

110+ papers citing this work (Google scholar)

Assessing the Accuracy of Four Popular Face Recognition Tools for Inferring Gender, Age, and Race

AI/ML/NLP dataset/tool bias/fairness

We evaluate four widely used face detection tools, which are Face++, IBM Bluemix Visual Recognition, AWS Rekognition, and Microsoft Azure Face API, using multiple datasets to determine their accuracy in inferring user attributes, including gender, race, and age.

Soon-gyo Jung, Jisun An, Haewoon Kwak, Joni Salminen, Bernard Jim Jansen

Proceedings of the 12th International AAAI Conference on Web and Social Media (ICWSM), 2018 (short)

65+ papers citing this work (Google scholar)

Anatomy of Online Hate: Developing a Taxonomy and Machine Learning Models for Identifying and Classifying Hate in Online News Media

computational social science social media online harm

We manually label 5,143 hateful expressions posted to YouTube and Facebook videos among a dataset of 137,098 comments from an online news media outlet. We then create a granular taxonomy of different types and targets of online hate and train machine learning models to automatically detect and classify the hateful comments in the full dataset.

Joni Salminen, Hind Almerekhi, Milica Milenković, Soon-gyo Jung, Jisun An, Haewoon Kwak, Bernard J. Jansen

Proceedings of the 12th International AAAI Conference on Web and Social Media (ICWSM), 2018

140+ papers citing this work (Google scholar)

Identifying Regional Trends in Avatar Customization

computational social science game analytics HCI

Peter Mawhorter, Sercan Şengün, Haewoon Kwak, D. Fox Harrell

IEEE Transactions on Games, 10(2), 2018

SemAxis: A Lightweight Framework to Characterize Domain-Specific Word Semantics Beyond Sentiment

computational social science AI/ML/NLP dataset/tool

We propose SemAxis, a simple yet powerful framework to characterize word semantics using many semantic axes in word-vector spaces beyond sentiment. We demonstrate that SemAxis can capture nuanced semantic representations in multiple online communities. We also show that, when the sentiment axis is examined, SemAxis outperforms the state-of-the-art approaches in building domain-specific sentiment lexicons.

Jisun An, Haewoon Kwak, Yong-Yeol Ahn

Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL), 2018

50+ papers citing this work (Google scholar)

What We Read, What We Search: Media Attention and Public Attention among 193 Countries

computational social science computational journalism network science

We investigate the alignment of international attention of news media organizations within 193 countries with the expressed international interests of the public within those same countries from March 7, 2016 to April 14, 2017. We collect fourteen months of longitudinal data of online news from Unfiltered News and web search volume data from Google Trends and build a multiplex network of media attention and public attention in order to study its structural and dynamic properties.

Haewoon Kwak, Jisun An, Joni Salminen, Soon-Gyo Jung, Bernard J. Jansen

Proceedings of the 2018 World Wide Web Conference (WWW), 2018

What is Gab? A Bastion of Free Speech or an Alt-Right Echo Chamber?

computational social science social media online harm

We provide, to the best of our knowledge, the first characterization of Gab. We collect and analyze 22M posts produced by 336K users between August 2016 and January 2018, finding that Gab is predominantly used for the dissemination and discussion of news and world events, and that it attracts alt-right users, conspiracy theorists, and other trolls.

Savvas Zannettou, Barry Bradlyn, Emiliano De Cristofaro, Haewoon Kwak, Michael Sirivianos, Gianluca Stringhini, Jeremy Blackburn

Companion Proceedings of The Web Conference (WWW), 2018

Press coverage: New Scientist and Vice

280+ papers citing this work (Google scholar)

“Is More Better?”: Impact of Multiple Photos on Perception of Persona Profiles

HCI

We investigate whether and how using more photos than a single headshot can heighten the level of information provided by persona profiles. We conduct eye-tracking experiments and qualitative interviews with three variations in the photos: a single headshot, a headshot and images of the persona in different contexts, and a headshot with pictures of different people representing key persona attributes.

Joni Salminen, Lene Nielsen, Soon-Gyo Jung, Jisun An, Haewoon Kwak, Bernard J. Jansen

Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI), 2018

Convergence of Media Attention Across 129 Countries

computational journalism network science

The objective of this study is to assess the longitudinal trends of media similarity and dissimilarity on the international scale. As news value has well-established political, cultural, and economic consequences, the degree to which media coverage and content is converging across countries has implications for international relations. To study this convergence, we use the daily data of the 100 topics that were overreported in each country, compared to other countries, from March 7 to October 9, 2016.

Jisun An, Hassan Aldarbesti, Haewoon Kwak

Proceedings of Social Informatics (SocInfo), 2017 (short)

Multidimensional Analysis of the News Consumption of Different Demographic Groups on a Nationwide Scale

computational journalism user engagement

Examining the 103,133 news articles that were most popular among different demographic groups on Daum News (the second most popular news portal in South Korea) throughout 2015, we provided multi-level analyses of gender and age differences in news consumption. We measured these differences at four levels: (1) actual news items, (2) section, (3) topic, and (4) subtopic. We characterized the news items at the four levels using computational techniques, namely topic modeling and vector representations of words and news items. We found that differences in news-reading behavior across demographic groups are most noticeable at the subtopic level, and not at the section or topic level.

Jisun An, Haewoon Kwak

Proceedings of Social Informatics (SocInfo), 2017

Multiplex Media Attention and Disregard Network among 129 Countries

computational journalism network science

We built a multiplex media attention and disregard network (MADN) among 129 countries over 212 days. By characterizing the MADN from multiple levels, we found that it is formed primarily by skewed, hierarchical, and asymmetric relationships. Also, we found strong evidence that our news world is becoming a “global village.” However, at the same time, unique attention blocks of the Middle East and North Africa (MENA) region, as well as Russia and its neighbors, still exist.

Haewoon Kwak, Jisun An

Proceedings of the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2017

Demographics of News Sharing in the US Twittersphere

computational journalism social media user engagement

The widespread adoption and dissemination of online news through social media systems have been revolutionizing many segments of our society and ultimately our daily lives. In these systems, users can play a central role as they share content with their friends. Despite that, little is known about news spreaders in social media. In this paper, we provide the first in-depth characterization of news spreaders in social media. In particular, we investigate their demographics, what kind of content they share, and the audience they reach.

Julio Reis, Haewoon Kwak, Jisun An, Johnnatan Messias, Fabrício Benevenuto

Proceedings of the 28th ACM Conference on Hypertext and Social Media (HT), 2017

Data-driven Approach to Measuring the Level of Press Freedom Using Media Attention Diversity from Unfiltered News

computational journalism bias/fairness

Published by Reporters Without Borders every year, the Press Freedom Index (PFI) reflects the fear and tension in the newsroom pushed by the government and private sectors. While the PFI is invaluable in monitoring media environments worldwide, the current survey-based method is inherently limited in how often it can be updated because of its cost and time requirements. In this work, we introduce an alternative way to measure the level of press freedom using media attention diversity compiled from Unfiltered News.

Jisun An, Haewoon Kwak

Proceedings of the ICWSM Workshop on NEws and publiC Opinion (NECO), 2017

Picked as The Best of the Physics arXiv (week ending April 15, 2017) in MIT Technology Review

I Would Not Plant Apple Trees If the World Will Be Wiped: Analyzing Hundreds of Millions of Behavioral Records of Players During an MMORPG Beta Test

computational social science game analytics

We use player behavior during the closed beta test of the MMORPG ArcheAge as a proxy for an extreme situation: at the end of the closed beta test, all user data is deleted, and thus the outcome (or penalty) of players’ in-game behaviors in the last few days loses its meaning. We analyzed 270 million records of player behavior in the 4th closed beta test of ArcheAge.

Ah Reum Kang, Jeremy Blackburn, Haewoon Kwak, Huy Kang Kim

Proceedings of the 26th International Conference on World Wide Web (WWW) Companion, 2017

Press coverage: New Scientist, IFL Science, PC Gamer, Massively OK, El Confidencial, Joongang Ilbo, and others.

Achievement and Friends: Key Factors of Player Retention Vary Across Player Levels in Online Multiplayer Games

game analytics user engagement

Retaining players over an extended period of time is a long-standing challenge in the game industry. Significant effort has gone into understanding what motivates players to enjoy games. While individuals may have varying reasons to play or abandon a game at different stages within the game, previous studies have looked at the retention problem from a snapshot view. This study, by analyzing in-game logs of 51,104 distinct individuals in an online multiplayer game, uniquely offers a multifaceted view of the retention problem over the players’ virtual life phases.

Kunwoo Park, Meeyoung Cha, Haewoon Kwak, Kuan-Ta Chen

Proceedings of the 26th International Conference on World Wide Web (WWW) Companion, 2017

High correlation of Middle East respiratory syndrome spread with Google search and Twitter trends in Korea

computational social science social media

The Middle East respiratory syndrome coronavirus (MERS-CoV) was exported to Korea in 2015, resulting in a threat to neighboring nations. We evaluated the possibility of using a digital surveillance system based on web searches and social media data to monitor this MERS outbreak. We collected the number of daily laboratory-confirmed MERS cases and quarantined cases from May 11, 2015 to June 26, 2015 using the Korean government MERS portal. The daily trends observed via Google search and Twitter during the same time period were also ascertained using Google Trends and Topsy. Correlations among the data were then examined using Spearman correlation analysis.

Soo-Yong Shin, Dong-Woo Seo, Jisun An, Haewoon Kwak, Sung-Han Kim, Jin Gwack, Min-Woo Jo

Scientific Reports 6, Article number 32920 (2016)

130+ papers citing this work (Google scholar)

Revealing the Hidden Patterns of News Photos: Analysis of Millions of News Photos Using GDELT and Deep Learning-based Vision APIs

computational journalism AI/ML/NLP bias/fairness dataset/tool

In this work, we analyze more than two million news photos published in January 2016. We demonstrate i) which objects appear the most in news photos; ii) what the sentiments of news photos are; iii) whether the sentiment of news photos is aligned with the tone of the text; iv) how gender is treated; and v) how differently political candidates are portrayed. To the best of our knowledge, this is the first large-scale study of news photo contents using deep learning-based vision APIs.

Haewoon Kwak, Jisun An

ICWSM Workshop on NEws and publiC Opinion (NECO), 2016

Picked as The Best of the Physics arXiv (week ending March 26, 2016) in MIT Technology Review