Publications

A list of selected papers in which research team members participated.
For a full list see below or go to Google Scholar (Jisun An and Haewoon Kwak).

computational journalism political science network science game analytics AI/ML/NLP HCI
online harm dataset/tool bias/fairness user engagement

The Impact of Toxic Trolling Comments on Anti-vaccine YouTube Videos

AI/ML/NLP

Anti-vaccine trolling on video-hosting websites hinders efforts to increase vaccination rates by using toxic language and threatening claims to intimidate people and promote vaccine hesitancy. However, there is a shortage of research investigating the effects of toxic messages on these platforms. This study focused on YouTube anti-vaccine videos and examined the relationship between toxicity and fear in the comment section of these videos. We discovered that highly liked toxic comments were associated with a significant level of fear in subsequent comments. Moreover, we found complex patterns of contagion between toxicity and fear in the comments. …

Kunihiro Miyazaki, Takayuki Uchiba, Haewoon Kwak, Jisun An, Kazutoshi Sasahara

Scientific Reports, 2024

Public Perception of Generative AI on Twitter: An Empirical Study Based on Occupation and Usage

AI/ML/NLP

The emergence of generative AI has sparked substantial discussions, with the potential to have profound impacts on society in all aspects. As emerging technologies continue to advance, it is imperative to facilitate their proper integration into society, managing expectations and fear. This paper investigates users’ perceptions of generative AI using 3M posts on Twitter from January 2019 to March 2023, especially focusing on their occupation and usage. We find that people across various occupations, not just IT-related ones, show a strong interest in generative AI. The sentiment toward generative AI is generally positive, …

Kunihiro Miyazaki, Taichi Murayama, Takayuki Uchiba, Jisun An, Haewoon Kwak

EPJ Data Science, 2024

Press coverage-Blockchain News

5+ papers citing this work (Google scholar)

Enhancing Spatiotemporal Traffic Prediction through Urban Human Activity Analysis

AI/ML/NLP

Traffic prediction is one of the key elements to ensure the safety and convenience of citizens. Existing traffic prediction models primarily focus on deep learning architectures to capture spatial and temporal correlation. They often overlook the underlying nature of traffic. Specifically, the sensor networks in most traffic datasets do not accurately represent the actual road network exploited by vehicles, failing to provide insights into the traffic patterns in urban activities. To overcome these limitations, we propose an improved traffic prediction method based on graph convolution deep learning algorithms. …

Sumin Han, Youngjun Park, Minji Lee, Jisun An, Dongman Lee

ACM CIKM, 2023

Can We Trust the Evaluation on ChatGPT?

AI/ML/NLP

ChatGPT, the first large language model (LLM) with mass adoption, has demonstrated remarkable performance in numerous natural language tasks. Despite its evident usefulness, evaluating ChatGPT’s performance in diverse problem domains remains challenging due to the closed nature of the model and its continuous updates via Reinforcement Learning from Human Feedback (RLHF). We highlight the issue of data contamination in ChatGPT evaluations, with a case study of the task of stance detection. We discuss the challenge of preventing data contamination and ensuring fair model evaluation in the age of closed and continuously trained models.

Rachith Aiyappa, Jisun An, Haewoon Kwak, Yong-Yeol Ahn

TrustNLP (Collocated with ACL), 2023

45+ papers citing this work (Google scholar)

Wearing Masks Implies Refuting Trump?: Towards Target-specific User Stance Prediction across Events in COVID-19 and US Election 2020

AI/ML/NLP

People who share similar opinions towards controversial topics could form an echo chamber and may share similar political views toward other topics as well. The existence of such connections, which we call connected behavior, gives researchers a unique opportunity to predict how one would behave for a future event given their past behaviors. In this work, we propose a framework to conduct connected behavior analysis. Neural stance detection models are trained on Twitter data collected on three seemingly independent topics, i.e., wearing a mask, racial equality, and Trump, to detect people’s stance, …

Hong Zhang, Haewoon Kwak, Wei Gao, Jisun An

ACM WebSci, 2023

Political Honeymoon Effect on Social Media: Characterizing Social Media Reaction to the Changes of Prime Minister in Japan

New leaders in democratic countries typically enjoy high approval ratings immediately after taking office. This phenomenon is called the honeymoon effect and is regarded as a significant political phenomenon; however, its mechanism remains underexplored. Therefore, this study examines how social media users respond to changes in political leadership in order to better understand the honeymoon effect in politics. In particular, we constructed a 15-year Twitter dataset of 6.6M tweets covering eight changes of Japanese prime minister and analyzed the tweets in terms of sentiments, topics, and users. …

Kunihiro Miyazaki, Taichi Murayama, Akira Matsui, Masaru Nishikawa, Takayuki Uchiba, Haewoon Kwak, Jisun An

ACM WebSci, 2023

Is ChatGPT better than Human Annotators? Potential and Limitations of ChatGPT in Explaining Implicit Hate Speech

AI/ML/NLP online harm

Recent studies have raised the alarm that much online hate speech is implicit. Because of its subtle nature, explaining the detection of such hateful speech has been a challenging problem. In this work, we examine whether ChatGPT can be used for providing natural language explanations (NLEs) for implicit hateful speech detection. We design our prompt to elicit concise ChatGPT-generated NLEs and conduct user studies to evaluate their quality by comparison with human-generated NLEs. We discuss the potential and limitations of ChatGPT in the context of implicit hateful speech research.

Fan Huang, Haewoon Kwak, Jisun An

WWW Companion, 2023

150+ papers citing this work (Google scholar)

Chain of Explanation: New Prompting Method to Generate Higher Quality Natural Language Explanation for Implicit Hate Speech

AI/ML/NLP online harm

Recent studies have exploited advanced generative language models to generate Natural Language Explanations (NLE) for why a certain text could be hateful. We propose the Chain of Explanation (CoE) Prompting method, using the heuristic words and target group, to generate high-quality NLE for implicit hate speech. We improved the BLEU score from 44.0 to 62.3 for NLE generation by providing accurate target information. We then evaluate the quality of generated NLE using various automatic metrics and human annotations of informativeness and clarity scores.

Fan Huang, Haewoon Kwak, Jisun An

WWW Companion, 2023

15+ papers citing this work (Google scholar)

'This is Fake News': Characterizing the Spontaneous Debunking from Twitter Users to COVID-19 False Information

computational journalism online harm

False information spreads on social media, and fact-checking is a potential countermeasure. However, there is a severe shortage of fact-checkers; an efficient way to scale fact-checking is desperately needed, especially in pandemics like COVID-19. In this study, we focus on spontaneous debunking by social media users, which has been missed in existing research despite its indicated usefulness for fact-checking and countering false information. Specifically, we characterize the tweets with false information, or fake tweets, that tend to be debunked and Twitter users who often debunk fake tweets.

Kunihiro Miyazaki, Takayuki Uchiba, Kenji Tanaka, Jisun An, Haewoon Kwak, Kazutoshi Sasahara

AAAI ICWSM, 2023

You Have Earned a Trophy: Characterize In-Game Achievements and Their Completions

game analytics HCI user engagement

Achievement systems have been actively adopted in gaming platforms to maintain players’ interests. Among them, trophies in PlayStation games are one of the most successful achievement systems. While the importance of trophy design has been casually discussed in many game developers’ forums, there has been no systematic study of the historical dataset of trophies yet. In this work, we construct a complete dataset of PlayStation games and their trophies and investigate them from both the developers’ and players’ perspectives.

Haewoon Kwak

ACM WebSci, 2022

MAANG? MANGA? Characterizing Spontaneous Ideation Contest on Social Media

network science

Social media is not only a place for people to communicate about daily matters but also a virtual venue to transmit and exchange various ideas. Such ideas are known as the raw voices of potential consumers, which come from a wide range of people who may not participate in consumer surveys, and therefore their opinions may contain high value to companies. However, how users share their ideas on social media is still underexplored. This study investigates a spontaneous ideation contest about a generic term for new Big Tech companies, which occurred when Facebook changed its name to Meta. We constructed a comprehensive dataset of tweets containing candidates and examined how they were suggested, spread, and exchanged by social media users. Our findings indicate that different ideas are better on different metrics. The ranking of ideas was not decided immediately after the idea contest started. The first people to post ideas have fewer followers than those who post secondarily or who only share the idea. We also confirmed that replies accumulate unique ideas, but most of them are added in the first depth in reply trees. This study would promote the use of social media as a part of open innovation and co-creation processes in the industry.

Kunihiro Miyazaki, Takayuki Uchiba, Haewoon Kwak, Jisun An

IEEE BigData, 2022 (short)

Modeling Political Activism around Gun Debate via Social Media

political science network science

The United States has some of the highest rates of gun violence among developed countries. Yet, there is a disagreement about the extent to which firearms should be regulated. In this study, we employ social media signals to examine the predictors of offline political activism, at both the population and individual levels. We show that it is possible to classify the stance of users on the gun issue, especially accurately when network information is available. Alongside socioeconomic variables, network information such as the relative size of the two sides of the debate is also predictive of state-level gun policy. At the individual level, we build a statistical model using network, content, and psycho-linguistic features that predicts real-life political action, and explore the most predictive linguistic features. Thus, we argue that, alongside demographics and socioeconomic indicators, social media provides useful signals in the holistic modeling of political engagement around the gun debate.

Yelena Mejova, Jisun An, Gianmarco De Francisci Morales, Haewoon Kwak

ACM Transactions on Social Computing, 2022

5+ papers citing this work (Google scholar)

Storm the Capitol: Linking Offline Political Speech and Online Twitter Extra-Representational Participation on QAnon and the January 6 Insurrection

AI/ML/NLP political science online harm

The transfer of power stemming from the 2020 presidential election occurred during an unprecedented period in United States history. Uncertainty from the COVID-19 pandemic, ongoing societal tensions, and a fragile economy increased societal polarization, exacerbated by the outgoing president’s offline rhetoric. As a result, online groups such as QAnon engaged in extra political participation beyond the traditional platforms. This research explores the link between offline political speech and online extra-representational participation by examining Twitter within the context of the January 6 insurrection. Using a mixed-methods approach of quantitative and qualitative thematic analyses, the study combines offline speech information with Twitter data during key speech addresses leading up to the date of the insurrection; exploring the link between Trump’s offline speeches and QAnon’s hashtags across a 3-day timeframe. We find that links between online extra-representational participation and offline political speech exist. This research illuminates this phenomenon and offers policy implications for the role of online messaging as a tool of political mobilization.

Claire Seungeun Lee, Juan Merizalde, John D. Colautti, Jisun An, Haewoon Kwak

Frontiers in Sociology, 2022

Press coverage-PsyPost

15+ papers citing this work (Google scholar)

Measuring 9 Emotions of News Posts from 8 News Organizations across 4 Social Media Platforms for 8 Months

computational journalism user engagement

Using Plutchik’s wheel of emotions framework, we identify the emotional content of 133,487 social media posts and the audience’s emotional engagement expressed in 2,824,162 comments on those posts. We measure nine emotions (anger, anticipation, anxiety, disgust, joy, fear, sadness, surprise, trust) and two sentiments (positive and negative) using two extraction resources (EmoLex, LIWC) for eight major news outlets across four social media platforms (Facebook, Instagram, Twitter, and YouTube) during eight months. We then apply two approaches (Logistic Regression, Long Short-Term Memory) to predict emotional audience reactions before and after publishing the posts. …

Kholoud Khalil Aldous, Jisun An, Bernard J. Jansen

ACM Transactions on Social Computing, 2022

10+ papers citing this work (Google scholar)

Understanding Toxicity Triggers on Reddit in the Context of Singapore

AI/ML/NLP online harm

While the contagious nature of online toxicity sparked increasing interest in its early detection and prevention, most of the literature focuses on the Western world. In this work, we demonstrate that 1) it is possible to detect toxicity triggers in an Asian online community, and 2) toxicity triggers can be strikingly different between Western and Eastern contexts.

Yun Yu Chong, Haewoon Kwak

Proceedings of the 16th International AAAI Conference on Web and Social Media (ICWSM), 2022 (short)

Press coverage-AI Ethics Brief Newsletter by Montreal AI Ethics Institute

10+ papers citing this work (Google scholar)

Who Is Missing? Characterizing the Participation of Different Demographic Groups in a Korean Nationwide Daily Conversation Corpus

AI/ML/NLP dataset/tool bias/fairness

A conversation corpus is essential to build interactive AI applications. However, the demographic information of the participants in such corpora is largely underexplored mainly due to the lack of individual data in many corpora. In this work, we analyze a Korean nationwide daily conversation corpus constructed by the National Institute of Korean Language (NIKL) to characterize the participation of different demographic (age and sex) groups in the corpus.

Haewoon Kwak, Jisun An, Kunwoo Park

Proceedings of the 16th International AAAI Conference on Web and Social Media (ICWSM), 2022 (short)

What really matters?: characterising and predicting user engagement of news postings using multiple platforms, sentiments and topics

computational journalism user engagement

This research characterises user engagement of approximately 3,000,000 news postings of 53 news outlets and 50,000,000 associated user comments during 8 months on 5 social media platforms (i.e. Facebook, Instagram, Twitter, YouTube, and Reddit). We investigate the effect of sentiments and topics on user engagement across four levels of user engagement expressions (i.e. views, likes, comments, cross-platform posting). We find that sentiments and topics differ by both news outlets and social media platforms, and both sentiments and topics by the four levels of user engagement expression. …

Kholoud Khalil Aldous, Jisun An, Bernard J. Jansen

Behaviour & Information Technology, 2022

10+ papers citing this work (Google scholar)

Predicting Anti-Asian Hateful Users on Twitter during COVID-19

AI/ML/NLP online harm

We investigate predictors of anti-Asian hate among Twitter users throughout COVID-19. With the rise of xenophobia and polarization that has accompanied widespread social media usage in many nations, online hate has become a major social issue, attracting many researchers. Here, we apply natural language processing techniques to characterize social media users who began to post anti-Asian hate messages during COVID-19. We compare two user groups – those who posted anti-Asian slurs and those who did not – with respect to a rich set of features measured with data prior to COVID-19 and show that it is possible to predict who later publicly posted anti-Asian slurs. …

Jisun An, Haewoon Kwak, Claire Seungeun Lee, Bogang Jun, Yong-Yeol Ahn

Findings of the Association for Computational Linguistics: EMNLP 2021

Code repo (github)

20+ papers citing this work (Google scholar)

Precision Public Health Campaign: Delivering Persuasive Messages to Relevant Segments Through Targeted Advertisements on Social Media

user engagement

We propose a novel precision public health campaign framework to structure and standardize the process of designing and delivering tailored health messages to target particular population segments using social media–targeted advertising tools. Our framework consists of five stages: defining a campaign goal, priority audience, and evaluation metrics; splitting the target audience into smaller segments; tailoring the message for each segment and conducting a pilot test; running the health campaign formally; and evaluating the performance of the campaigns. We have demonstrated how the framework works through 2 case studies. The precision public health campaign framework has the potential to support higher population uptake and engagement rates by encouraging a more standardized, concise, efficient, and targeted approach to public health campaign development.

Jisun An, Haewoon Kwak, Hanya M Qureshi, Ingmar Weber

JMIR Form Res 2021;5(9):e22313, 2021

10+ papers citing this work (Google scholar)

FrameAxis: characterizing microframe bias and intensity with word embedding

AI/ML/NLP dataset/tool bias/fairness

Framing is a process of emphasizing a certain aspect of an issue over the others, nudging readers or listeners towards different positions on the issue even without making a biased argument. Here, we propose FrameAxis, a method for characterizing documents by identifying the most relevant semantic axes (“microframes”) that are overrepresented in the text using word embedding. Our unsupervised approach can be readily applied to large datasets because it does not require manual annotations. …

Haewoon Kwak, Jisun An, Elise Jing, Yong-Yeol Ahn

PeerJ Computer Science 7:e644, 2021

Code repo (github)

25+ papers citing this work (Google scholar)

Populist Supporters on Reddit: A Comparison of Content and Behavioral Patterns Within Publics of Supporters of Donald Trump and Hillary Clinton

political science

We investigate differences along these dimensions on the online forum Reddit by comparing linguistic patterns and content of comments in two subreddits focusing on a populist, Donald Trump (/r/The_Donald), and a center-left politician, Hillary Clinton (/r/hillaryclinton), during the 2016 U.S. presidential election campaign.

Andreas Jungherr, Oliver Posegga, Jisun An

Social Science Computer Review. March 2021.

15+ papers citing this work (Google scholar)

How-to Present News on Social Media: A Causal Analysis of Editing News Headlines for Boosting User Engagement

computational journalism AI/ML/NLP user engagement

We first build a parallel corpus of original news articles and their corresponding tweets that were shared by eight media outlets. Then, we explore how those media edited tweets against original headlines, and the effects would be …

Kunwoo Park, Haewoon Kwak, Jisun An, Sanjay Chawla

Proceedings of the 15th International AAAI Conference on Web and Social Media (ICWSM), 2021

5+ papers citing this work (Google scholar)

A Systematic Media Frame Analysis of 1.5 Million New York Times Articles from 2000 to 2017

computational journalism AI/ML/NLP bias/fairness

Framing is an indispensable narrative device for news media because even the same facts may lead to conflicting understandings if deliberate framing is employed. By developing a media frame classifier that achieves state-of-the-art performance, we systematically analyze the media frames of 1.5 million New York Times articles published from 2000 to 2017.

Haewoon Kwak, Jisun An, Yong-Yeol Ahn

Proceedings of the 12th ACM Conference on Web Science (WebSci), 2020

20+ papers citing this work (Google scholar)

Identifying and Characterizing Alternative News Media on Facebook

computational journalism network science bias/fairness

In this work, we propose a graph-based semi-supervised method to measure the political bias of pages in most countries and show the political split of the alternative media, mainstream media, and public figures pages. We validate our method using the publicly available U.S. dataset and then apply it to Brazilian pages, where we found a larger number of right-wing pages in general, except for alternative news media.

Samuel S Guimarães, Julio CS Reis, Lucas Lima, Filipe N Ribeiro, Marisa Vasconcelos, Jisun An, Haewoon Kwak, Fabrício Benevenuto

IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2020

What Was Written vs. Who Read It: News Media Profiling Using Text Analysis and Social Media Context

computational journalism AI/ML/NLP bias/fairness

Predicting the political bias and the factuality of reporting of entire news outlets are critical elements of media profiling, which is an understudied but an increasingly important research direction. The present level of proliferation of fake, biased, and propagandistic content online has made it impossible to fact-check every single suspicious claim, either manually or automatically. Thus, it has been proposed to profile entire news outlets and to look for those that are likely to publish fake or biased content. This makes it possible to detect likely “fake news” the moment they are published, by simply checking the reliability of their source. From a practical perspective, political bias and factuality of reporting have a linguistic aspect but also a social context.

Ramy Baly, Georgi Karadzhov, Jisun An, Haewoon Kwak, Yoan Dinkov, Ahmed Ali, James Glass, Preslav Nakov

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), 2020

35+ papers citing this work (Google scholar)

Empirical Evaluation of Three Common Assumptions in Building Political Media Bias Datasets

computational journalism AI/ML/NLP dataset/tool bias/fairness

We empirically validate three common assumptions in building political media bias datasets, which are (i) labelers’ political leanings do not affect labeling tasks, (ii) news articles follow their source outlet’s political leaning, and (iii) political leaning of a news outlet is stable across different topics.

Soumen Ganguly, Juhi Kulshrestha, Jisun An, Haewoon Kwak

Proceedings of the 14th International AAAI Conference on Web and Social Media (ICWSM), 2020

15+ papers citing this work (Google scholar)

“Trust Me, I Have a Ph.D.”: A Propensity Score Analysis on the Halo Effect of Disclosing One's Offline Social Status in Online Communities

AI/ML/NLP user engagement

We study two Reddit communities that adopted this scheme, whereby posts include tags identifying education status referred to as flairs, and we examine how the “transferred” social status affects the interactions among the users.

Kunwoo Park, Haewoon Kwak, Hyunho Song, Meeyoung Cha

Proceedings of the 14th International AAAI Conference on Web and Social Media (ICWSM), 2020

5+ papers citing this work (Google scholar)

Are These Comments Triggering? Predicting Triggers of Toxicity in Online Discussions

AI/ML/NLP online harm

We define toxicity triggers in online discussions as non-toxic comments that lead to toxic replies. Then, we build a neural network-based prediction model for toxicity triggers.

Hind Almerekhi, Haewoon Kwak, Bernard Jim Jansen, Joni Salminen (short)

Proceedings of The Web Conference (WWW), 2020

35+ papers citing this work (Google scholar)

Going beyond accuracy: estimating homophily in social networks using predictions

We show that estimating homophily in a network can be viewed as a dyadic prediction problem, and that homophily estimates are unbiased when dyad-level residuals sum to zero in the network. Then, we propose a novel “ego-alter” modeling approach that outperforms standard node and dyad classification strategies.

George Berry, Antonio Sirianni, Ingmar Weber, Jisun An, Michael Macy (preprint)

arXiv preprint arXiv:2001.11171, 2020

Tanbih: Get To Know What You Are Reading

computational journalism AI/ML/NLP bias/fairness dataset/tool

We introduce Tanbih, a news aggregator with intelligent analysis tools to help readers understand what’s behind a news story. Our system displays news grouped into events and generates media profiles that show the general factuality of reporting, the degree of propagandistic content, hyper-partisanship, leading political ideology, general frame of reporting, and stance with respect to various claims and topics of a news outlet.

Yifan Zhang, Giovanni Da San Martino, Alberto Barrón-Cedeño, Salvatore Romeo, Jisun An, Haewoon Kwak, Todor Staykovski, Israa Jaradat, Georgi Karadzhov, Ramy Baly, Kareem Darwish, James Glass, Preslav Nakov (demo)

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019

Gender and Racial Diversity in Commercial Brands' Advertising Images on Social Media

bias/fairness

Gender and racial diversity in the mediated images from the media shape our perception of different demographic groups. In this work, we investigate gender and racial diversity of 85,957 advertising images shared by the 73 top international brands on Instagram and Facebook.

Jisun An, Haewoon Kwak

Proceedings of Social Informatics (SocInfo), 2019

Best Paper Award

20+ papers citing this work (Google scholar)

Political Discussions in Homogeneous and Cross-Cutting Communication Spaces

political science

We use Reddit to explore the nature of political discussions in homogeneous and cross-cutting communication spaces. In particular, we develop an analytical template to study interaction and linguistic patterns within and between politically homogeneous and heterogeneous communication spaces. Our analyses reveal different behavioral patterns in homogeneous and cross-cutting communication spaces.

Jisun An, Haewoon Kwak, Oliver Posegga, Andreas Jungherr

Proceedings of the 13th International AAAI Conference on Web and Social Media (ICWSM), 2019

65+ papers citing this work (Google scholar)

View, Like, Comment, Post: Analyzing User Engagement by Topic at 4 Levels across 5 Social Media Platforms for 53 News Organizations

computational journalism user engagement

We evaluate the effects of the topics of social media posts on audiences across five social media platforms (i.e., Facebook, Instagram, Twitter, YouTube, and Reddit) at four levels of user engagement. We collected 3,163,373 social posts from 53 news organizations across five platforms during an 8-month period.

Kholoud Khalil Aldous, Jisun An, Bernard J. Jansen

Proceedings of the 13th International AAAI Conference on Web and Social Media (ICWSM), 2019

85+ papers citing this work (Google scholar)

Discursive Power in Contemporary Media Systems: A Comparative Framework

computational journalism

We propose the concept of discursive power. This describes the ability of contributors to communication spaces to introduce, amplify, and maintain topics, frames, and speakers, thus shaping public discourses and controversies that unfold in interconnected communication spaces.

Andreas Jungherr, Oliver Posegga, Jisun An

The International Journal of Press/Politics, 24(4), 2019

130+ papers citing this work (Google scholar)

Imaginary People Representing Real Numbers: Generating Personas from Online Social Media Data

HCI

We develop a methodology to automate creating imaginary people, referred to as personas, by processing complex behavioral and demographic data of social media audiences. From a popular social media account containing more than 30 million interactions by viewers from 198 countries engaging with more than 4,200 online videos produced by a global media corporation, we demonstrate that our methodology has several novel accomplishments.

Jisun An, Haewoon Kwak, Soon-gyo Jung, Joni Salminen, M. Admad, Bernard J. Jansen

ACM Transactions on the Web, 12(4), 2018

110+ papers citing this work (Google scholar)

Assessing the Accuracy of Four Popular Face Recognition Tools for Inferring Gender, Age, and Race

AI/ML/NLP dataset/tool bias/fairness

We evaluate four widely used face detection tools, which are Face++, IBM Bluemix Visual Recognition, AWS Rekognition, and Microsoft Azure Face API, using multiple datasets to determine their accuracy in inferring user attributes, including gender, race, and age.

Soon-gyo Jung, Jisun An, Haewoon Kwak, Joni Salminen, Bernard Jim Jansen

Proceedings of the 12th International AAAI Conference on Web and Social Media (ICWSM), 2018 (short)

65+ papers citing this work (Google scholar)

Anatomy of Online Hate: Developing a Taxonomy and Machine Learning Models for Identifying and Classifying Hate in Online News Media

online harm

We manually label 5,143 hateful expressions posted to YouTube and Facebook videos among a dataset of 137,098 comments from an online news media. We then create a granular taxonomy of different types and targets of online hate and train machine learning models to automatically detect and classify the hateful comments in the full dataset.

Joni Salminen, Hind Almerekhi, Milica Milenković, Soon-gyo Jung, Jisun An, Haewoon Kwak, Bernard J. Jansen

Proceedings of the 12th International AAAI Conference on Web and Social Media (ICWSM), 2018

140+ papers citing this work (Google scholar)

Identifying Regional Trends in Avatar Customization

game analytics HCI

Peter Mawhorter, Sercan Şengün, Haewoon Kwak, D. Fox Harrell

IEEE Transactions on Games, 10(2), 2018

SemAxis: A Lightweight Framework to Characterize Domain-Specific Word Semantics Beyond Sentiment

AI/ML/NLP dataset/tool

We propose SemAxis, a simple yet powerful framework to characterize word semantics using many semantic axes in word-vector spaces beyond sentiment. We demonstrate that SemAxis can capture nuanced semantic representations in multiple online communities. We also show that, when the sentiment axis is examined, SemAxis outperforms the state-of-the-art approaches in building domain-specific sentiment lexicons.

Jisun An, Haewoon Kwak, Yong-Yeol Ahn

Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL), 2018

50+ papers citing this work (Google scholar)

What We Read, What We Search: Media Attention and Public Attention among 193 Countries

computational journalism network science

We investigate the alignment of international attention of news media organizations within 193 countries with the expressed international interests of the public within those same countries from March 7, 2016 to April 14, 2017. We collect fourteen months of longitudinal data of online news from Unfiltered News and web search volume data from Google Trends and build a multiplex network of media attention and public attention in order to study its structural and dynamic properties.

Haewoon Kwak, Jisun An, Joni Salminen, Soon-Gyo Jung, Bernard J. Jansen

Proceedings of the 2018 World Wide Web Conference (WWW), 2018

20+ papers citing this work (Google scholar)

What is Gab? A Bastion of Free Speech or an Alt-Right Echo Chamber?

online harm

We provide, to the best of our knowledge, the first characterization of Gab. We collect and analyze 22M posts produced by 336K users between August 2016 and January 2018, finding that Gab is predominantly used for the dissemination and discussion of news and world events, and that it attracts alt-right users, conspiracy theorists, and other trolls.

Savvas Zannettou, Barry Bradlyn, Emiliano De Cristofaro, Haewoon Kwak, Michael Sirivianos, Gianluca Stringhini, Jeremy Blackburn

Companion Proceedings of The Web Conference (WWW), 2018

Press coverage-New Scientist and Vice

280+ papers citing this work (Google scholar)

“Is More Better?”: Impact of Multiple Photos on Perception of Persona Profiles

HCI

We investigate if and how more photos than a single headshot can heighten the level of information provided by persona profiles. We conduct eye-tracking experiments and qualitative interviews with three variations in the photos: a single headshot, a headshot and images of the persona in different contexts, and a headshot with pictures of different people representing key persona attributes.

Joni Salminen, Lene Nielsen, Soon-Gyo Jung, Jisun An, Haewoon Kwak, Bernard J. Jansen

Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI), 2018

Convergence of Media Attention Across 129 Countries

computational journalism network science

The objective of this study is to assess the longitudinal trends of media similarity and dissimilarity on the international scale. As news value has well-established political, cultural, and economic consequences, the degree to which media coverage and content is converging across countries has implications for international relations. To study this convergence, we use the daily data of the 100 topics that were overreported in each country, compared to other countries, from March 7 to October 9, 2016.

Jisun An, Hassan Aldarbesti, Haewoon Kwak

Proceedings of Social Informatics (SocInfo), 2017 (short)

Multidimensional Analysis of the News Consumption of Different Demographic Groups on a Nationwide Scale

computational journalism user engagement

Examining 103,133 news articles that were the most popular among different demographic groups on Daum News (the second most popular news portal in South Korea) throughout 2015, we provided multi-level analyses of gender and age differences in news consumption. We measured these differences at four levels: (1) by actual news items, (2) by section, (3) by topic, and (4) by subtopic. We characterized the news items at the four levels using two computational techniques: topic modeling and vector representations of words and news items. We found that differences in news reading behavior across demographic groups are most noticeable at the subtopic level rather than at the section or topic level.

Jisun An, Haewoon Kwak

Proceedings of Social Informatics (SocInfo), 2017

Multiplex Media Attention and Disregard Network among 129 Countries

computational journalism network science

We built a multiplex media attention and disregard network (MADN) among 129 countries over 212 days. By characterizing the MADN from multiple levels, we found that it is formed primarily by skewed, hierarchical, and asymmetric relationships. Also, we found strong evidence that our news world is becoming a “global village.” However, at the same time, unique attention blocks of the Middle East and North Africa (MENA) region, as well as Russia and its neighbors, still exist.

Haewoon Kwak, Jisun An

Proceedings of the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2017

Demographics of News Sharing in the US Twittersphere

computational journalism user engagement

The widespread adoption and dissemination of online news through social media systems have been revolutionizing many segments of our society and ultimately our daily lives. In these systems, users can play a central role as they share content with their friends. Despite that, little is known about news spreaders in social media. In this paper, we provide the first-of-its-kind in-depth characterization of news spreaders in social media. In particular, we investigate their demographics, what kind of content they share, and the audience they reach.

Julio Reis, Haewoon Kwak, Jisun An, Johnnatan Messias, Fabrício Benevenuto

Proceedings of the 28th ACM Conference on Hypertext and Social Media (HT), 2017

35+ papers citing this work (Google scholar)

Data-driven Approach to Measuring the Level of Press Freedom Using Media Attention Diversity from Unfiltered News

computational journalism bias/fairness

Published by Reporters Without Borders every year, the Press Freedom Index (PFI) reflects the fear and tension in the newsroom pushed by the government and private sectors. While the PFI is invaluable in monitoring media environments worldwide, the current survey-based method has inherent limitations to updates in terms of cost and time. In this work, we introduce an alternative way to measure the level of press freedom using media attention diversity compiled from Unfiltered News.

Jisun An, Haewoon Kwak

Proceedings of the ICWSM Workshop on NEws and publiC Opinion (NECO), 2017

Picked as The Best of the Physics arXiv (week ending April 15, 2017) in MIT Technology Review

I Would Not Plant Apple Trees If the World Will Be Wiped: Analyzing Hundreds of Millions of Behavioral Records of Players During an MMORPG Beta Test

game analytics

We use player behavior during the closed beta test of the MMORPG ArcheAge as a proxy for an extreme situation: at the end of the closed beta test, all user data is deleted, and thus the outcome (or penalty) of players’ in-game behaviors in the last few days loses its meaning. We analyzed 270 million records of player behavior in the 4th closed beta test of ArcheAge.

Ah Reum Kang, Jeremy Blackburn, Haewoon Kwak, Huy Kang Kim

Proceedings of the 26th International Conference on World Wide Web (WWW) Companion, 2017

Press coverage-New Scientist, IFL Science, PC Gamer, Massively OK, El Confidencial, Joongang Ilbo, and so on.

10+ papers citing this work (Google scholar)

Achievement and Friends: Key Factors of Player Retention Vary Across Player Levels in Online Multiplayer Games

game analytics user engagement

Retaining players over an extended period of time is a long-standing challenge in the game industry. Significant effort has been devoted to understanding what motivates players to enjoy games. While individuals may have varying reasons to play or abandon a game at different stages within the game, previous studies have looked at the retention problem from a snapshot view. This study, by analyzing in-game logs of 51,104 distinct individuals in an online multiplayer game, uniquely offers a multifaceted view of the retention problem over the players’ virtual life phases.

Kunwoo Park, Meeyoung Cha, Haewoon Kwak, Kuan-Ta Chen

Proceedings of the 26th International Conference on World Wide Web (WWW) Companion, 2017

35+ papers citing this work (Google scholar)

High correlation of Middle East respiratory syndrome spread with Google search and Twitter trends in Korea

The Middle East respiratory syndrome coronavirus (MERS-CoV) was exported to Korea in 2015, resulting in a threat to neighboring nations. We evaluated the possibility of using a digital surveillance system based on web searches and social media data to monitor this MERS outbreak. We collected the number of daily laboratory-confirmed MERS cases and quarantined cases from May 11, 2015 to June 26, 2015 using the Korean government MERS portal. The daily trends observed via Google search and Twitter during the same time period were also ascertained using Google Trends and Topsy. Correlations among the data were then examined using Spearman correlation analysis.

Soo-Yong Shin, Dong-Woo Seo, Jisun An, Haewoon Kwak, Sung-Han Kim, Jin Gwack, Min-Woo Jo

Scientific Reports 6, Article number 32920 (2016)

130+ papers citing this work (Google scholar)

Revealing the Hidden Patterns of News Photos: Analysis of Millions of News Photos Using GDELT and Deep Learning-based Vision APIs

computational journalism AI/ML/NLP bias/fairness dataset/tool

In this work, we analyze more than two million news photos published in January 2016. We demonstrate i) which objects appear the most in news photos; ii) what the sentiments of news photos are; iii) whether the sentiment of news photos is aligned with the tone of the text; iv) how gender is treated; and v) how differently political candidates are portrayed. To the best of our knowledge, this is the first large-scale study of news photo contents using deep learning-based vision APIs.

Haewoon Kwak, Jisun An

ICWSM Workshop on NEws and publiC Opinion (NECO), 2016

Picked as The Best of the Physics arXiv (week ending March 26, 2016) in MIT Technology Review

20+ papers citing this work (Google scholar)

Are You Charlie or Ahmed? Cultural Pluralism in Charlie Hebdo Response on Twitter

We study the response to the Charlie Hebdo shootings of January 7, 2015 on Twitter across the globe. We ask whether the stances on the issue of freedom of speech can be modeled using established sociological theories, including Huntington’s culturalist Clash of Civilizations, and those taking into consideration social context, including Density and Interdependence theories. We find support for Huntington’s culturalist explanation, in that the established traditions and norms of one’s “civilization” predetermine some of one’s opinion.

Jisun An, Haewoon Kwak, Yelena Mejova, Sonia Alonso Saenz De Oger, Braulio Gomez Fortes

Proceedings of the 10th International Conference on Web and Social Media (ICWSM), 2016

45+ papers citing this work (Google scholar)

Exploring Cyberbullying and Other Toxic Behavior in Team Competition Online Games

game analytics HCI online harm

In this work we explore cyberbullying and other toxic behavior in team competition online games. Using a dataset of over 10 million player reports on 1.46 million toxic players along with corresponding crowdsourced decisions, we test several hypotheses drawn from theories explaining toxic behavior. Besides providing a large-scale, empirically based understanding of toxic behavior, our work can be used as a basis for building systems to detect, prevent, and counteract toxic behavior.

Haewoon Kwak, Jeremy Blackburn, Seungyeop Han

Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI), 2015

310+ papers citing this work (Google scholar)

A First Look at Global News Coverage of Disasters By Using the GDELT Dataset

computational journalism dataset/tool bias/fairness

In this work, we reveal the structure of global news coverage of disasters and its determinants by using a large-scale news coverage dataset collected by the GDELT (Global Data on Events, Location, and Tone) project, which monitors news media in over 100 languages from the whole world. Significant variables in our hierarchical (mixed-effect) regression model, such as population, political stability, damage, and more, are well aligned with previous research. However, we find strong regionalism in news geography, highlighting the necessity of comprehensive datasets for the study of global news coverage.

Haewoon Kwak, Jisun An

Proceedings of Social Informatics, 2014

Press Coverage-MIT Technology Review, ACM TechNews

60+ papers citing this work (Google scholar)

Linguistic Analysis of Toxic Behavior in an Online Video Game

game analytics online harm

In this paper we explore the linguistic components of toxic behavior by using crowdsourced data from over 590 thousand cases of accused toxic players in a popular match-based competition game, League of Legends. We perform a series of linguistic analyses to gain a deeper understanding of the role communication plays in the expression of toxic behavior. We characterize linguistic behavior of toxic players and compare it with that of typical players in an online competition game. We also find empirical support describing how a player transitions from typical to toxic behavior. Our findings can be helpful to automatically detect and warn players who may become toxic and thus insulate potential victims from toxic playing in advance.

Haewoon Kwak, Jeremy Blackburn

SocInfo Workshop on Exploration on Games and Gamers (EGG), 2014

110+ papers citing this work (Google scholar)

Searching for a Unique Style in Soccer

network science

We introduce the concept of “flow motifs” to characterize the statistically significant pass sequence patterns. It extends the idea of network motifs, highly significant subgraphs that usually consist of three or four nodes. The analysis of the motifs in the pass networks allows us to compare and differentiate the styles of different teams. Although most teams tend to apply a homogeneous style, surprisingly, a unique strategy of soccer exists. Specifically, FC Barcelona’s famous tiki-taka does not consist of uncountable random passes but rather has a precise, finely constructed structure.

Laszlo Gyarmati, Haewoon Kwak, Pablo Rodriguez

KDD Workshop on Large-Scale Sports Analytics, 2014

Press coverage-BBC, MIT Technology Review, The Times, The Economist, Slate, Pacific Standard, and so on.

110+ papers citing this work (Google scholar)

STFU NOOB! Predicting Crowdsourced Decisions on Toxic Behavior in Online Games

game analytics online harm

We propose a supervised learning approach for predicting crowdsourced decisions on toxic behavior with large-scale labeled data collections: over 10 million user reports involving 1.46 million toxic players and the corresponding crowdsourced decisions. Our results show good performance in detecting overwhelming majority cases and predicting crowdsourced decisions on them. We demonstrate good portability of our classifier across regions.

Jeremy Blackburn, Haewoon Kwak

Proceedings of the 23rd international conference on World wide web (WWW), 2014

Press coverage-Nature, Scientific American, Chosun Ilbo

190+ papers citing this work (Google scholar)

Recommending investors for crowdfunding projects

One of the most popular crowdfunding sites is Kickstarter. In it, creators post descriptions of their projects and advertise them on social media sites (mainly Twitter), while investors look for projects to support. We set out to propose different ways of recommending investors found on Twitter for specific Kickstarter projects. We do so by conducting hypothesis-driven analyses of pledging behavior and translating the corresponding findings into different recommendation strategies. The best strategy achieves, on average, 84% accuracy in predicting a list of potential investors’ Twitter accounts for any given project.

Jisun An, Daniele Quercia, Jon Crowcroft

Proceedings of the 23rd international conference on World wide web (WWW), 2014

Press coverage-FastCompany

100+ papers citing this work (Google scholar)

Fragile Online Relationship: a First Look at Unfollow Dynamics in Twitter

network science HCI

We analyze the dynamics of the behavior known as ‘unfollow’ in Twitter. We collected daily snapshots of the online relationships of 1.2 million Korean-speaking users for 51 days as well as all of their tweets. We found that Twitter users frequently unfollow. We then discover the major factors, including the reciprocity of the relationships, the duration of a relationship, the followees’ informativeness, and the overlap of the relationships, which affect the decision to unfollow. We conducted interviews with 22 Korean respondents to supplement the quantitative results.

Haewoon Kwak, Hyunwoo Chun, Sue Moon

Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI), 2011

Press coverage - Kyunghyang Shinmun

150+ papers citing this work (Google scholar)

Media Landscape in Twitter: A World of New Conventions and Political Diversity

computational journalism network science

We present a preliminary but groundbreaking study of the media landscape of Twitter. We use public data on who follows whom to uncover common behaviour in media consumption, the relationship between various classes of media, and the diversity of media content which social links may bring. Our analysis shows that there is a non-negligible amount of indirect media exposure, either through friends who follow particular media sources, or via retweeted messages. We show that the indirect media exposure expands the political diversity of news to which users are exposed to a surprising extent, increasing the range by 60-98%. These results are valuable because they have not been readily available to traditional media, and they can help predict how we will read news, and how publishers will interact with us in the future.

Jisun An, Meeyoung Cha, Krishna Gummadi, Jon Crowcroft

Proceedings of the 5th International AAAI Conference on Weblogs and Social Media (ICWSM), 2011

200+ papers citing this work (Google scholar)

What is Twitter, a Social Network or a News Media?

network science

We have crawled the entire Twitter site and obtained 41.7 million user profiles, 1.47 billion social relations, 4,262 trending topics, and 106 million tweets. In its follower-following topology analysis we have found a non-power-law follower distribution, a short effective diameter, and low reciprocity, which all mark a deviation from known characteristics of human social networks. In order to identify influentials on Twitter, we have ranked users by the number of followers and by PageRank and found the two rankings to be similar. Ranking by retweets differs from the previous two rankings, indicating a gap in influence inferred from the number of followers and that from the popularity of one’s tweets.

Haewoon Kwak, Changhyun Lee, Hosung Park, Sue Moon

Proceedings of the 19th international conference on World wide web (WWW), 2010

Press coverage - Mashable Op-Ed, ReadWrite, The Guardian, PC News, Chosun Ilbo, DongA Ilbo

9900+ papers citing this work (Google scholar)

Full List

The Impact of Toxic Trolling Comments on Anti-vaccine YouTube Videos
Kunihiro Miyazaki, Takayuki Uchiba, Haewoon Kwak, Jisun An, Kazutoshi Sasahara
Scientific Reports, 2024

Public Perception of Generative AI on Twitter: An Empirical Study Based on Occupation and Usage
Kunihiro Miyazaki, Taichi Murayama, Takayuki Uchiba, Jisun An, Haewoon Kwak
EPJ Data Science, 2024
Press coverage-Blockchain News
5+ papers citing this work (Google scholar)

Enhancing Spatiotemporal Traffic Prediction through Urban Human Activity Analysis
Sumin Han, Youngjun Park, Minji Lee, Jisun An, Dongman Lee
ACM CIKM, 2023

Can We Trust the Evaluation on ChatGPT?
Rachith Aiyappa, Jisun An, Haewoon Kwak, Yong-Yeol Ahn
TrustNLP (Collocated with ACL), 2023
45+ papers citing this work (Google scholar)

Wearing Masks Implies Refuting Trump?: Towards Target-specific User Stance Prediction across Events in COVID-19 and US Election 2020
Hong Zhang, Haewoon Kwak, Wei Gao, Jisun An
ACM WebSci, 2023

Political Honeymoon Effect on Social Media: Characterizing Social Media Reaction to the Changes of Prime Minister in Japan
Kunihiro Miyazaki, Taichi Murayama, Akira Matsui, Masaru Nishikawa, Takayuki Uchiba, Haewoon Kwak, Jisun An
ACM WebSci, 2023

Is ChatGPT better than Human Annotators? Potential and Limitations of ChatGPT in Explaining Implicit Hate Speech
Fan Huang, Haewoon Kwak, Jisun An
WWW Companion, 2023
150+ papers citing this work (Google scholar)

Chain of Explanation: New Prompting Method to Generate Higher Quality Natural Language Explanation for Implicit Hate Speech
Fan Huang, Haewoon Kwak, Jisun An
WWW Companion, 2023
15+ papers citing this work (Google scholar)

YouNICon: YouTube’s CommuNIty of Conspiracy Videos
Shaoyi Liaw, Fan Huang, Fabricio Benevenuto, Haewoon Kwak, Jisun An
ICWSM Dataset, 2023

‘This is Fake News’: Characterizing the Spontaneous Debunking from Twitter Users to COVID-19 False Information
Kunihiro Miyazaki, Takayuki Uchiba, Kenji Tanaka, Jisun An, Haewoon Kwak, Kazutoshi Sasahara
AAAI ICWSM, 2023

You Have Earned a Trophy: Characterize In-Game Achievements and Their Completions
Haewoon Kwak
ACM WebSci, 2022

MAANG? MANGA? Characterizing Spontaneous Ideation Contest on Social Media
Kunihiro Miyazaki, Takayuki Uchiba, Haewoon Kwak, Jisun An
IEEE BigData, 2022 (short)

Modeling Political Activism around Gun Debate via Social Media
Yelena Mejova, Jisun An, Gianmarco De Francisci Morales, Haewoon Kwak
ACM Transactions on Social Computing, 2022
5+ papers citing this work (Google scholar)

Storm the Capitol: Linking Offline Political Speech and Online Twitter Extra-Representational Participation on QAnon and the January 6 Insurrection
Claire Seungeun Lee, Juan Merizalde, John D. Colautti, Jisun An, Haewoon Kwak
Frontiers in Sociology, 2022
Press coverage-PsyPost
15+ papers citing this work (Google scholar)

Measuring 9 Emotions of News Posts from 8 News Organizations across 4 Social Media Platforms for 8 Months
Kholoud Khalil Aldous, Jisun An, Bernard J. Jansen
ACM Transactions on Social Computing, 2022
10+ papers citing this work (Google scholar)

Understanding Toxicity Triggers on Reddit in the Context of Singapore
Yun Yu Chong, Haewoon Kwak
Proceedings of the 16th International AAAI Conference on Web and Social Media (ICWSM), 2022 (short)
Press coverage-AI Ethics Brief Newsletter by Montreal AI Ethics Institute
10+ papers citing this work (Google scholar)

Who Is Missing? Characterizing the Participation of Different Demographic Groups in a Korean Nationwide Daily Conversation Corpus
Haewoon Kwak, Jisun An, Kunwoo Park
Proceedings of the 16th International AAAI Conference on Web and Social Media (ICWSM), 2022 (short)

What really matters?: characterising and predicting user engagement of news postings using multiple platforms, sentiments and topics
Kholoud Khalil Aldous, Jisun An, Bernard J. Jansen
Behaviour & Information Technology, 2022
10+ papers citing this work (Google scholar)

Predicting Anti-Asian Hateful Users on Twitter during COVID-19
Jisun An, Haewoon Kwak, Claire Seungeun Lee, Bogang Jun, Yong-Yeol Ahn
Findings of the Association for Computational Linguistics: EMNLP 2021
Code repo (github)
20+ papers citing this work (Google scholar)

Precision Public Health Campaign: Delivering Persuasive Messages to Relevant Segments Through Targeted Advertisements on Social Media
Jisun An, Haewoon Kwak, Hanya M Qureshi, Ingmar Weber
JMIR Form Res 2021;5(9):e22313, 2021
10+ papers citing this work (Google scholar)

FrameAxis: characterizing microframe bias and intensity with word embedding
Haewoon Kwak, Jisun An, Elise Jing, Yong-Yeol Ahn
PeerJ Computer Science 7:e644, 2021
Code repo (github)
25+ papers citing this work (Google scholar)

Populist Supporters on Reddit: A Comparison of Content and Behavioral Patterns Within Publics of Supporters of Donald Trump and Hillary Clinton
Andreas Jungherr, Oliver Posegga, Jisun An
Social Science Computer Review. March 2021.
15+ papers citing this work (Google scholar)

How-to Present News on Social Media: A Causal Analysis of Editing News Headlines for Boosting User Engagement
Kunwoo Park, Haewoon Kwak, Jisun An, Sanjay Chawla
Proceedings of the 15th International AAAI Conference on Web and Social Media (ICWSM), 2021
5+ papers citing this work (Google scholar)

A Systematic Media Frame Analysis of 1.5 Million New York Times Articles from 2000 to 2017
Haewoon Kwak, Jisun An, Yong-Yeol Ahn
Proceedings of the 12th ACM Conference on Web Science (WebSci), 2020
20+ papers citing this work (Google scholar)

Identifying and Characterizing Alternative News Media on Facebook
Samuel S Guimarães, Julio CS Reis, Lucas Lima, Filipe N Ribeiro, Marisa Vasconcelos, Jisun An, Haewoon Kwak, Fabrício Benevenuto
IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2020

What Was Written vs. Who Read It: News Media Profiling Using Text Analysis and Social Media Context
Ramy Baly, Georgi Karadzhov, Jisun An, Haewoon Kwak, Yoan Dinkov, Ahmed Ali, James Glass, Preslav Nakov
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), 2020
35+ papers citing this work (Google scholar)

Empirical Evaluation of Three Common Assumptions in Building Political Media Bias Datasets
Soumen Ganguly, Juhi Kulshrestha, Jisun An, Haewoon Kwak
Proceedings of the 14th International AAAI Conference on Web and Social Media (ICWSM), 2020
15+ papers citing this work (Google scholar)

“Trust Me, I Have a Ph.D.”: A Propensity Score Analysis on the Halo Effect of Disclosing One’s Offline Social Status in Online Communities
Kunwoo Park, Haewoon Kwak, Hyunho Song, Meeyoung Cha
Proceedings of the 14th International AAAI Conference on Web and Social Media (ICWSM), 2020
5+ papers citing this work (Google scholar)

Are These Comments Triggering? Predicting Triggers of Toxicity in Online Discussions
Hind Almerekhi, Haewoon Kwak, Bernard Jim Jansen, Joni Salminen (short)
Proceedings of The Web Conference (WWW), 2020
35+ papers citing this work (Google scholar)

Going beyond accuracy: estimating homophily in social networks using predictions
George Berry, Antonio Sirianni, Ingmar Weber, Jisun An, Michael Macy (preprint)
arXiv preprint arXiv:2001.11171, 2020

Persona Perception Scale: Development and Exploratory Validation of an Instrument for Evaluating Individuals’ Perceptions of Personas
Joni Salminen, Joao M. Santos, Haewoon Kwak, Jisun An, Soon-gyo Jung, Bernard J. Jansen
International Journal of Human-Computer Studies, 2020

Tanbih: Get To Know What You Are Reading
Yifan Zhang, Giovanni Da San Martino, Alberto Barrón-Cedeño, Salvatore Romeo, Jisun An, Haewoon Kwak, Todor Staykovski, Israa Jaradat, Georgi Karadzhov, Ramy Baly, Kareem Darwish, James Glass, Preslav Nakov (demo)
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019

Gender and Racial Diversity in Commercial Brands’ Advertising Images on Social Media
Jisun An, Haewoon Kwak
Proceedings of Social Informatics (SocInfo), 2019
Best Paper Award
20+ papers citing this work (Google scholar)

Stylistic Features Usage: Similarities and Differences Using Multiple Social Networks
Kholoud Khalil Aldous, Jisun An, Bernard J. Jansen
Proceedings of Social Informatics (SocInfo), 2019

Predicting Audience Engagement Across Social Media Platforms in the News Domain
Kholoud Khalil Aldous, Jisun An, Bernard J. Jansen
Proceedings of Social Informatics (SocInfo), 2019

Detecting Toxicity Triggers in Online Discussions
Hind Almerekhi, Haewoon Kwak, Bernard Jim Jansen, Joni Salminen (poster)
Proceedings of the 30th ACM Conference on Hypertext and Social Media (HT), 2019
40+ papers citing this work (Google scholar)

Political Discussions in Homogeneous and Cross-Cutting Communication Spaces
Jisun An, Haewoon Kwak, Oliver Posegga, Andreas Jungherr
Proceedings of the 13th International AAAI Conference on Web and Social Media (ICWSM), 2019
65+ papers citing this work (Google scholar)

View, Like, Comment, Post: Analyzing User Engagement by Topic at 4 Levels across 5 Social Media Platforms for 53 News Organizations
Kholoud Khalil Aldous, Jisun An, Bernard J. Jansen
Proceedings of the 13th International AAAI Conference on Web and Social Media (ICWSM), 2019
85+ papers citing this work (Google scholar)

The Challenges of Creating Engaging Content: Results from a Focus Group Study of a Popular News Media Organization
Kholoud Khalil Aldous, Jisun An, Bernard J. Jansen (Extended Abstracts)
Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems (CHI), 2019

Social media mining for journalism
Arkaitz Zubiaga, Bahareh Heravi, Jisun An, Haewoon Kwak (Guest editorial)
Online Information Review, 2019
10+ papers citing this work (Google scholar)

Discursive Power in Contemporary Media Systems: A Comparative Framework
Andreas Jungherr, Oliver Posegga, Jisun An
The International Journal of Press/Politics, 24(4), 2019
130+ papers citing this work (Google scholar)

Reports of the Workshops Held at the 2018 International AAAI Conference on Web and Social Media
Jisun An, Rumi Chunara, David J. Crandall, Darian Frajberg, Megan French, Bernard J. Jansen, Juhi Kulshrestha, Yelena Mejova, Daniel M. Romero, Joni Salminen, Amit Sharma, Amit Sheth, Chenhao Tan, Samuel Hardman Taylor, Sanjaya Wijeratne
AI Magazine, 2018

Imaginary People Representing Real Numbers: Generating Personas from Online Social Media Data
Jisun An, Haewoon Kwak, Soon-gyo Jung, Joni Salminen, M. Ahmad, Bernard J. Jansen
ACM Transactions on the Web, 12(4), 2018
110+ papers citing this work (Google scholar)

Assessing the Accuracy of Four Popular Face Recognition Tools for Inferring Gender, Age, and Race
Soon-gyo Jung, Jisun An, Haewoon Kwak, Joni Salminen, Bernard Jim Jansen
Proceedings of the 12th International AAAI Conference on Web and Social Media (ICWSM), 2018 (short)
65+ papers citing this work (Google scholar)

Anatomy of Online Hate: Developing a Taxonomy and Machine Learning Models for Identifying and Classifying Hate in Online News Media
Joni Salminen, Hind Almerekhi, Milica Milenković, Soon-gyo Jung, Jisun An, Haewoon Kwak, Bernard J. Jansen
Proceedings of the 12th International AAAI Conference on Web and Social Media (ICWSM), 2018
140+ papers citing this work (Google scholar)

Identifying Regional Trends in Avatar Customization
Peter Mawhorter, Sercan Şengün, Haewoon Kwak, D. Fox Harrell
IEEE Transactions on Games, 10(2), 2018

SemAxis: A Lightweight Framework to Characterize Domain-Specific Word Semantics Beyond Sentiment
Jisun An, Haewoon Kwak, Yong-Yeol Ahn
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL), 2018
50+ papers citing this work (Google scholar)

What We Read, What We Search: Media Attention and Public Attention among 193 Countries
Haewoon Kwak, Jisun An, Joni Salminen, Soon-Gyo Jung, Bernard J. Jansen
Proceedings of the 2018 World Wide Web Conference (WWW), 2018
20+ papers citing this work (Google scholar)

What is Gab? A Bastion of Free Speech or an Alt-Right Echo Chamber?
Savvas Zannettou, Barry Bradlyn, Emiliano De Cristofaro, Haewoon Kwak, Michael Sirivianos, Gianluca Stringhini, Jeremy Blackburn
Companion Proceedings of The Web Conference (WWW), 2018
Press coverage-New Scientist and Vice
280+ papers citing this work (Google scholar)

Fixation and Confusion: Investigating Eye-tracking Participants’ Exposure to Information in Personas
Joni Salminen, Bernard J. Jansen, Jisun An, Soon-Gyo Jung, Lene Nielsen, Haewoon Kwak
Proceedings of the 2018 Conference on Human Information Interaction & Retrieval (CHIIR), 2018

“Is More Better?”: Impact of Multiple Photos on Perception of Persona Profiles
Joni Salminen, Lene Nielsen, Soon-Gyo Jung, Jisun An, Haewoon Kwak, Bernard J. Jansen
Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI), 2018

Reports of the Workshops Held at the 2017 International AAAI Conference on Web and Social Media
Jisun An, Giovanni Luca Ciampaglia, Nir Grinberg, Kenneth Joseph, Alexios Mantzarlis, Gregory Maus, Filippo Menczer, Nicholas Proferes, Brooke Foucault Welles
AI Magazine, 2017

Convergence of Media Attention Across 129 Countries
Jisun An, Hassan Aldarbesti, Haewoon Kwak
Proceedings of Social Informatics (SocInfo), 2017 (short)

Multidimensional Analysis of the News Consumption of Different Demographic Groups on a Nationwide Scale
Jisun An, Haewoon Kwak
Proceedings of Social Informatics (SocInfo), 2017

Multiplex Media Attention and Disregard Network among 129 Countries
Haewoon Kwak, Jisun An
Proceedings of the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2017

Demographics of News Sharing in the US Twittersphere
Julio Reis, Haewoon Kwak, Jisun An, Johnnatan Messias, Fabrício Benevenuto
Proceedings of the 28th ACM Conference on Hypertext and Social Media (HT), 2017
35+ papers citing this work (Google scholar)

Data-driven Approach to Measuring the Level of Press Freedom Using Media Attention Diversity from Unfiltered News
Jisun An, Haewoon Kwak
Proceedings of the ICWSM Workshop on NEws and publiC Opinion (NECO), 2017
Picked as The Best of the Physics arXiv (week ending April 15, 2017) in MIT Technology Review

What Gets Media Attention and How Media Attention Evolves Over Time - Large-scale Empirical Evidence from 196 Countries
Jisun An, Haewoon Kwak (short)
Proceedings of the 11th International AAAI Conference on Web and Social Media (ICWSM), 2017

Persona Generation from Aggregated Social Media Data
Soon-Gyo Jung, Jisun An, Haewoon Kwak, Moeed Ahmad, Lene Nielsen, Bernard J. Jansen (Extended Abstract)
Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems (CHI), 2017

I Would Not Plant Apple Trees If the World Will Be Wiped: Analyzing Hundreds of Millions of Behavioral Records of Players During an MMORPG Beta Test
Ah Reum Kang, Jeremy Blackburn, Haewoon Kwak, Huy Kang Kim
Proceedings of the 26th International Conference on World Wide Web (WWW) Companion, 2017
Press coverage-New Scientist, IFL Science, PC Gamer, Massively OK, El Confidencial, Joongang Ilbo, and so on.
10+ papers citing this work (Google scholar)

Achievement and Friends: Key Factors of Player Retention Vary Across Player Levels in Online Multiplayer Games
Kunwoo Park, Meeyoung Cha, Haewoon Kwak, Kuan-Ta Chen
Proceedings of the 26th International Conference on World Wide Web (WWW) Companion, 2017
35+ papers citing this work (Google scholar)

Culturally-Grounded Analysis of Everyday Creativity in Social Media: A Case Study in Qatari Context
D. Fox Harrell, Sarah Vieweg, Haewoon Kwak, Chong-U Lim, Sercan Sengun, Ali Jahanian, Pablo Ortiz
Proceedings of the 2017 ACM SIGCHI Conference on Creativity and Cognition (C&C), 2017

Who Are Your Users? Comparing Media Professionals’ Preconception of Users to Data-Driven Personas
Lene Nielsen, Soon-Gyo Jung, Jisun An, Joni Salminen, Haewoon Kwak, Bernard J. Jansen
Proceedings of the 29th Australian Conference on Computer-Human Interaction (OZCHI), 2017

Generating Cultural Personas from Social Data: A Perspective of Middle Eastern Users
J. Salminen, S. Sengün, H. Kwak, B. Jansen, J. An, S. Jung, S. Vieweg, D. F. Harrell
Proceedings of the 5th International Conference on Future Internet of Things and Cloud Workshops, 2017

Personas for Content Creators via Decomposed Aggregate Audience Statistics
Jisun An, Haewoon Kwak, Bernard J. Jansen (short)
Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2017

High correlation of Middle East respiratory syndrome spread with Google search and Twitter trends in Korea
Soo-Yong Shin, Dong-Woo Seo, Jisun An, Haewoon Kwak, Sung-Han Kim, Jin Gwack, Min-Woo Jo
Scientific Reports 6, Article number 32920 (2016)
130+ papers citing this work (Google scholar)

Multidimensional Analysis of Gender and Age Differences in News Consumption
Jisun An, Haewoon Kwak
Computation+Journalism (C+J) Symposium (2016)

Revealing the Hidden Patterns of News Photos: Analysis of Millions of News Photos Using GDELT and Deep Learning-based Vision APIs
Haewoon Kwak, Jisun An
ICWSM Workshop on NEws and publiC Opinion (NECO), 2016
Picked as The Best of the Physics arXiv (week ending March 26, 2016) in MIT Technology Review
20+ papers citing this work (Google scholar)

Two Tales of the World: Comparison of Widely Used World News Datasets: GDELT and EventRegistry
Haewoon Kwak, Jisun An
Proceedings of the 10th International Conference on Web and Social Media (ICWSM), 2016 (short)
30+ papers citing this work (Google scholar)

Are You Charlie or Ahmed? Cultural Pluralism in Charlie Hebdo Response on Twitter
Jisun An, Haewoon Kwak, Yelena Mejova, Sonia Alonso Saenz De Oger, Braulio Gomez Fortes
Proceedings of the 10th International Conference on Web and Social Media (ICWSM), 2016
45+ papers citing this work (Google scholar)

#greysanatomy vs. #yankees: Demographics and Hashtag Use on Twitter
Jisun An, Ingmar Weber (short)
Proceedings of the 10th International Conference on Web and Social Media (ICWSM), 2016

Whom should we sense in ‘social sensing’ - analyzing which users work best for social media now-casting
Jisun An, Ingmar Weber
EPJ Data Science, 4, Article number 22, 2015

Consumers and Suppliers: Attention asymmetries. A Case Study of Aljazeera’s News Coverage and Comments
Sofiane Abbar, Jisun An, Haewoon Kwak, Yacine Messaoui, Javier Borge-Holthoefer
Computation+Journalism (C+J) Symposium, 2015

Breaking the News: First Impressions Matter on Online News
Julio Reis, Fabrício Benevenuto, Pedro Olmo, Raquel Prates, Haewoon Kwak, Jisun An
Proceedings of the 9th International Conference on Web and Social Media (ICWSM), 2015
Picked as Other Interesting arXiv Papers (Week ending April 11, 2015) in MIT Technology Review, and O Globo
200+ papers citing this work (Google scholar)

Exploring Cyberbullying and Other Toxic Behavior in Team Competition Online Games
Haewoon Kwak, Jeremy Blackburn, Seungyeop Han
Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI), 2015
310+ papers citing this work (Google scholar)

From Cells to Streets: Estimating Mobile Paths with Cellular-Side Data
Ilias Leontiadis, Antonio Lima, Haewoon Kwak, Rade Stanojevic, David Wetherall, Konstantina Papagiannaki
Proceedings of the 10th ACM International Conference on emerging Networking Experiments and Technologies (CoNEXT), 2014

Understanding News Geography and Major Determinants of Global News Coverage of Disasters
Haewoon Kwak, Jisun An (extension of SocInfo’14)
Computation+Journalism (C+J) Symposium, 2014
25+ papers citing this work (Google scholar)

A First Look at Global News Coverage of Disasters By Using the GDELT Dataset
Haewoon Kwak, Jisun An
Proceedings of Social Informatics, 2014
Press coverage-MIT Technology Review, ACM TechNews
60+ papers citing this work (Google scholar)

Linguistic Analysis of Toxic Behavior in an Online Video Game
Haewoon Kwak, Jeremy Blackburn
SocInfo Workshop on Exploration on Games and Gamers (EGG), 2014
110+ papers citing this work (Google scholar)

Searching for a Unique Style in Soccer
Laszlo Gyarmati, Haewoon Kwak, Pablo Rodriguez
KDD Workshop on Large-Scale Sports Analytics, 2014
Press coverage-BBC, MIT Technology Review, The Times, The Economist, Slate, Pacific Standard, and so on.
110+ papers citing this work (Google scholar)

Didn’t You See My Message? Predicting Attentiveness to Mobile Instant Messages
Martin Pielot, Rodrigo de Oliveira, Haewoon Kwak, Nuria Oliver
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI), 2014
240+ papers citing this work (Google scholar)

STFU NOOB! Predicting Crowdsourced Decisions on Toxic Behavior in Online Games
Jeremy Blackburn, Haewoon Kwak
Proceedings of the 23rd international conference on World wide web (WWW), 2014
Press coverage-Nature, Scientific American, Chosun Ilbo
190+ papers citing this work (Google scholar)

Has Much Potential but Biased: Exploring the Scholarly Landscape in Twitter
Haewoon Kwak, Jonggun Lee (poster)
Proceedings of the 23rd International Conference on World Wide Web Companion, 2014

Sharing political news: the balancing act of intimacy and socialization in selective exposure
Jisun An, Daniele Quercia, Meeyoung Cha, Krishna Gummadi, Jon Crowcroft
EPJ Data Science volume 3, Article number 12, 2014

Recommending investors for crowdfunding projects
Jisun An, Daniele Quercia, Jon Crowcroft
Proceedings of the 23rd international conference on World wide web (WWW), 2014
Press coverage-FastCompany
100+ papers citing this work (Google scholar)

Partisan Sharing: Facebook Evidence and Societal Consequences
Jisun An, Daniele Quercia, Jon Crowcroft
Proceedings of the Second ACM Conference on Online Social Networks (COSN), 2014
110+ papers citing this work (Google scholar)

Tower of Babel: A Crowdsourcing Game Building Sentiment Lexicons for Resource-scarce Languages
Yoonsung Hong, Haewoon Kwak, Youngmin Baek, Sue Moon
WWW Workshop on Multidisciplinary Approaches to Big Social Data Analysis, 2013
25+ papers citing this work (Google scholar)

Structures of Broken Ties: Exploring Unfollow Behavior on Twitter
Bo Xu, Yun Huang, Haewoon Kwak, Noshir S. Contractor
Proceedings of the 16th ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW), 2013
70+ papers citing this work (Google scholar)

Why Individuals Seek Diverse Opinions (or Why They Don’t)
Jisun An, Daniele Quercia, Jon Crowcroft
Proceedings of the 5th Annual ACM Web Science Conference (WebSci), 2013

Why Do I Retweet It? An Information Propagation Model for Microblogs
Fabio Pezzoni, Jisun An, Andrea Passarella, Jon Crowcroft, Marco Conti
Proceedings of the 5th International Conference on Social Informatics (SocInfo), 2013

Traditional Media Seen from Social Media
Jisun An, Daniele Quercia, Meeyoung Cha, Krishna Gummadi, Jon Crowcroft
Proceedings of the 5th Annual ACM Web Science Conference (WebSci), 2013

Fragmented Social Media: A Look into Selective Exposure to Political News
Jisun An, Daniele Quercia, Jon Crowcroft (poster)
Proceedings of the 22nd International Conference on World Wide Web (WWW) Companion, 2013

More of a Receiver than a Giver: Why Do People Unfollow in Twitter?
Haewoon Kwak, Sue Moon, Wonjae Lee (4 page poster)
Proceedings of the 6th International AAAI Conference on Weblogs and Social Media (ICWSM), 2012
50+ papers citing this work (Google scholar)

Visualizing Media Bias through Twitter
Jisun An, Meeyoung Cha, Krishna Gummadi, Jon Crowcroft, Daniele Quercia
ICWSM Workshop on the Potential of Social Media Tools and Data for Journalists, 2012

Consistent Community Identification in Complex Networks
Haewoon Kwak, Sue Moon, Young-Ho Eom, Yoonchan Choi, Hawoong Jeong
Journal of the Korean Physical Society, Vol. 59, No. 5, November 2011.

Fragile Online Relationship: a First Look at Unfollow Dynamics in Twitter
Haewoon Kwak, Hyunwoo Chun, Sue Moon
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI), 2011.
Press coverage-Kyunghyang Shinmun
150+ papers citing this work (Google scholar)

Media Landscape in Twitter: A World of New Conventions and Political Diversity
Jisun An, Meeyoung Cha, Krishna Gummadi, Jon Crowcroft
Proceedings of the 5th International AAAI Conference on Weblogs and Social Media (ICWSM), 2011.
200+ papers citing this work (Google scholar)

What is Twitter, a Social Network or a News Media?
Haewoon Kwak, Changhyun Lee, Hosung Park, Sue Moon
Proceedings of the 19th international conference on World wide web (WWW), 2010.
Press coverage-Mashable Op-Ed, ReadWrite, The Guardian, PC News, Chosun Ilbo, DongA Ilbo
9900+ papers citing this work (Google scholar)

Finding Influentials based on the Temporal Order of Information Adoption in Twitter
Changhyun Lee, Haewoon Kwak, Hosung Park, Sue Moon (poster)
Proceedings of the 19th international conference on World wide web (WWW), 2010.
200+ papers citing this work (Google scholar)

Understanding Topological Mesoscale Features in Community Mining
Sue Moon, Jinyoung You, Haewoon Kwak, Daniel Kim, and Hawoong Jeong (invited paper)
Proceedings of the Second International Conference on COMmunication Systems and NETworks (COMSNETS), 2010.

Analyzing the Video Popularity Characteristics of Large-Scale User Generated Content Systems
Meeyoung Cha, Haewoon Kwak, Pablo Rodriguez, Yong-Yeol Ahn, and Sue Moon
IEEE/ACM Transactions on Networking, Vol 17, Issue 5, 2009
600+ papers citing this work (Google scholar)

Mining Communities in Networks: a Solution for Consistency and Its Evaluation
Haewoon Kwak, Yoonchan Choi, Young-Ho Eom, Hawoong Jeong, Sue Moon
Proceedings of the 9th ACM SIGCOMM conference on Internet measurement (IMC), 2009
90+ papers citing this work (Google scholar)

The Wisdom of the Few: A Collaborative Filtering Approach based on Expert Opinions from the Web
Xavier Amatriain, Neal Lathia, Josep M. Pujol, Haewoon Kwak, Nuria Oliver
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval (SIGIR), 2009
150+ papers citing this work (Google scholar)

Connecting Users with Similar Interests Across Multiple Web Services
Haewoon Kwak, Hwa-Yong Shin, Jong-Il Yoon, Sue Moon (poster)
Proceedings of the 3rd International AAAI Conference on Weblogs and Social Media (ICWSM), 2009

Comparison of Online Social Relations in Volume vs Interaction: A Case Study of Cyworld
Hyunwoo Chun, Haewoon Kwak, Young-Ho Eom, Yong-Yeol Ahn, Sue Moon, and Hawoong Jeong
Proceedings of the 8th ACM SIGCOMM conference on Internet measurement (IMC), 2008
250+ papers citing this work (Google scholar)

I Tube, You Tube, Everybody Tubes: Analyzing the World’s Largest User Generated Content Video System
Meeyoung Cha, Haewoon Kwak, Pablo Rodriguez, Yong-Yeol Ahn, Sue Moon
Proceedings of the 7th ACM SIGCOMM conference on Internet measurement (IMC), 2007
Best paper award, The IMC Test of Time Award
2,200+ papers citing this work (Google scholar)

Analysis of topological characteristics of huge online social networking services
Yong-Yeol Ahn, Seungyeop Han, Haewoon Kwak, Sue Moon, Hawoong Jeong
Proceedings of the 16th international conference on World Wide Web (WWW), 2007
1,400+ papers citing this work (Google scholar)