Publications

A list of selected papers in which research team members participated.
For a full list see below or go to Google Scholar (Jisun An and Haewoon Kwak).

computational journalism political science network science game analytics AI/ML/NLP HCI
online harm dataset/tool bias/fairness user engagement

Who Is Missing? Characterizing the Participation of Different Demographic Groups in a Korean Nationwide Daily Conversation Corpus

AI/ML/NLP dataset/tool bias/fairness

A conversation corpus is essential to build interactive AI applications. However, the demographic information of the participants in such corpora is largely underexplored mainly due to the lack of individual data in many corpora. In this work, we analyze a Korean nationwide daily conversation corpus constructed by the National Institute of Korean Language (NIKL) to characterize the participation of different demographic (age and sex) groups in the corpus.

Haewoon Kwak, Jisun An, Kunwoo Park

Proceedings of the 16th International AAAI Conference on Web and Social Media (ICWSM), 2022 (short)

FrameAxis: characterizing microframe bias and intensity with word embedding

AI/ML/NLP dataset/tool bias/fairness

Framing is a process of emphasizing a certain aspect of an issue over the others, nudging readers or listeners towards different positions on the issue even without making a biased argument. Here, we propose FrameAxis, a method for characterizing documents by identifying the most relevant semantic axes (“microframes”) that are overrepresented in the text using word embedding. Our unsupervised approach can be readily applied to large datasets because it does not require manual annotations. …

Haewoon Kwak, Jisun An, Elise Jing, Yong-Yeol Ahn

PeerJ Computer Science 7:e644, 2021

Code repo (github)

25+ papers citing this work (Google scholar)

A Systematic Media Frame Analysis of 1.5 Million New York Times Articles from 2000 to 2017

computational journalism AI/ML/NLP bias/fairness

Framing is an indispensable narrative device for news media because even the same facts may lead to conflicting understandings if deliberate framing is employed. By developing a media frame classifier that achieves state-of-the-art performance, we systematically analyze the media frames of 1.5 million New York Times articles published from 2000 to 2017.

Haewoon Kwak, Jisun An, Yong-Yeol Ahn

Proceedings of the 12th ACM Conference on Web Science (WebSci), 2020

20+ papers citing this work (Google scholar)

Identifying and Characterizing Alternative News Media on Facebook

computational journalism network science bias/fairness

In this work, we propose a graph-based semi-supervised method to measure the political bias of pages on most countries and show the political split of the alternative media, mainstream media, and public figures pages. We validate our method using the publicly available U.S. dataset and then apply it to Brazilian pages, where we found a larger number of right-wing pages in general, except for alternative news media.

Samuel S Guimarães, Julio CS Reis, Lucas Lima, Filipe N Ribeiro, Marisa Vasconcelos, Jisun An, Haewoon Kwak, Fabrício Benevenuto

IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2020

What Was Written vs. Who Read It: News Media Profiling Using Text Analysis and Social Media Context

computational journalism AI/ML/NLP bias/fairness

Predicting the political bias and the factuality of reporting of entire news outlets are critical elements of media profiling, which is an understudied but an increasingly important research direction. The present level of proliferation of fake, biased, and propagandistic content online has made it impossible to fact-check every single suspicious claim, either manually or automatically. Thus, it has been proposed to profile entire news outlets and to look for those that are likely to publish fake or biased content. This makes it possible to detect likely “fake news” the moment they are published, by simply checking the reliability of their source. From a practical perspective, political bias and factuality of reporting have a linguistic aspect but also a social context.

Ramy Baly, Georgi Karadzhov, Jisun An, Haewoon Kwak, Yoan Dinkov, Ahmed Ali, James Glass, Preslav Nakov

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL) (2020)

35+ papers citing this work (Google scholar)

Empirical Evaluation of Three Common Assumptions in Building Political Media Bias Datasets

computational journalism AI/ML/NLP dataset/tool bias/fairness

We empirically validate three common assumptions in building political media bias datasets, which are (i) labelers’ political leanings do not affect labeling tasks, (ii) news articles follow their source outlet’s political leaning, and (iii) political leaning of a news outlet is stable across different topics.

Soumen Ganguly, Juhi Kulshrestha, Jisun An, Haewoon Kwak

Proceedings of the 14th International AAAI Conference on Web and Social Media (ICWSM), 2020

15+ papers citing this work (Google scholar)

Tanbih: Get To Know What You Are Reading

computational journalism AI/ML/NLP bias/fairness dataset/tool

We introduce Tanbih, a news aggregator with intelligent analysis tools to help readers understanding what’s behind a news story. Our system displays news grouped into events and generates media profiles that show the general factuality of reporting, the degree of propagandistic content, hyper-partisanship, leading political ideology, general frame of reporting, and stance with respect to various claims and topics of a news outlet.

Yifan Zhang, Giovanni Da San Martino, Alberto Barrón-Cedeño, Salvatore Romeo, Jisun An, Haewoon Kwak, Todor Staykovski, Israa Jaradat, Georgi Karadzhov, Ramy Baly, Kareem Darwish, James Glass, Preslav Nakov (demo)

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019

Gender and Racial Diversity in Commercial Brands' Advertising Images on Social Media

bias/fairness

Gender and racial diversity in the mediated images from the media shape our perception of different demographic groups. In this work, we investigate gender and racial diversity of 85,957 advertising images shared by the 73 top international brands on Instagram and Facebook.

Jisun An, Haewoon Kwak

Proceedings of Social Informatics (SocInfo), 2019

Best Paper Award

20+ papers citing this work (Google scholar)

Assessing the Accuracy of Four Popular Face Recognition Tools for Inferring Gender, Age, and Race

AI/ML/NLP dataset/tool bias/fairness

We evaluate four widely used face detection tools, which are Face++, IBM Bluemix Visual Recognition, AWS Rekognition, and Microsoft Azure Face API, using multiple datasets to determine their accuracy in inferring user attributes, including gender, race, and age.

Soon-gyo Jung, Jisun An, Haewoon Kwak, Joni Salminen, Bernard Jim Jansen

Proceedings of the 12th International AAAI Conference on Web and Social Media (ICWSM), 2018 (short)

65+ papers citing this work (Google scholar)

Data-driven Approach to Measuring the Level of Press Freedom Using Media Attention Diversity from Unfiltered News

computational journalism bias/fairness

Published by Reporters Without Borders every year, the Press Freedom Index (PFI) reflects the fear and tension in the newsroom pushed by the government and private sectors. While the PFI is invaluable in monitoring media environments worldwide, the current survey-based method has inherent limitations to updates in terms of cost and time. In this work, we introduce an alternative way to measure the level of press freedom using media attention diversity compiled from Unfiltered News.

Jisun An, Haewoon Kwak

Proceedings of the ICWSM Workshop on NEws and publiC Opinion (NECO), 2017

Picked as The Best of the Physics arXiv (week ending April 15, 2017) in MIT Technology Review

Revealing the Hidden Patterns of News Photos: Analysis of Millions of News Photos Using GDELT and Deep Learning-based Vision APIs

computational journalism AI/ML/NLP bias/fairness dataset/tool

In this work, we analyze more than two million news photos published in January 2016. We demonstrate i) which objects appear the most in news photos; ii) what the sentiments of news photos are; iii) whether the sentiment of news photos is aligned with the tone of the text; iv) how gender is treated; and v) how differently political candidates are portrayed. To our best knowledge, this is the first large-scale study of news photo contents using deep learning-based vision APIs.

Haewoon Kwak, Jisun An

ICWSM Workshop on NEws and publiC Opinion (NECO), 2016

Picked as The Best of the Physics arXiv (week ending March 26, 2016) in MIT Technology Review

20+ papers citing this work (Google scholar)

A First Look at Global News Coverage of Disasters By Using the GDELT Dataset

computational journalism dataset/tool bias/fairness

In this work, we reveal the structure of global news coverage of disasters and its determinants by using a large-scale news coverage dataset collected by the GDELT (Global Data on Events, Location, and Tone) project that monitors news media in over 100 languages from the whole world. Significant variables in our hierarchical (mixed-effect) regression model, such as population, political stability, damage, and more, are well aligned with a series of previous research. However, we find strong regionalism in news geography, highlighting the necessity of comprehensive datasets for the study of global news coverage.

Haewoon Kwak, Jisun An

Proceedings of Social Informatics, 2014

Press Coverage-MIT Technology Review, ACM TechNews

60+ papers citing this work (Google scholar)

 

Full List

Who Is Missing? Characterizing the Participation of Different Demographic Groups in a Korean Nationwide Daily Conversation Corpus
Haewoon Kwak, Jisun An, Kunwoo Park
Proceedings of the 16th International AAAI Conference on Web and Social Media (ICWSM), 2022 (short)

FrameAxis: characterizing microframe bias and intensity with word embedding
Haewoon Kwak, Jisun An, Elise Jing, Yong-Yeol Ahn
PeerJ Computer Science 7:e644, 2021
Code repo (github)
25+ papers citing this work (Google scholar)

A Systematic Media Frame Analysis of 1.5 Million New York Times Articles from 2000 to 2017
Haewoon Kwak, Jisun An, Yong-Yeol Ahn
Proceedings of the 12th ACM Conference on Web Science (WebSci), 2020

20+ papers citing this work (Google scholar)

Identifying and Characterizing Alternative News Media on Facebook
Samuel S Guimarães, Julio CS Reis, Lucas Lima, Filipe N Ribeiro, Marisa Vasconcelos, Jisun An, Haewoon Kwak, Fabrício Benevenuto
IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2020

What Was Written vs. Who Read It: News Media Profiling Using Text Analysis and Social Media Context
Ramy Baly, Georgi Karadzhov, Jisun An, Haewoon Kwak, Yoan Dinkov, Ahmed Ali, James Glass, Preslav Nakov
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL) (2020)

35+ papers citing this work (Google scholar)

Empirical Evaluation of Three Common Assumptions in Building Political Media Bias Datasets
Soumen Ganguly, Juhi Kulshrestha, Jisun An, Haewoon Kwak
Proceedings of the 14th International AAAI Conference on Web and Social Media (ICWSM), 2020

15+ papers citing this work (Google scholar)

Tanbih: Get To Know What You Are Reading
Yifan Zhang, Giovanni Da San Martino, Alberto Barrón-Cedeño, Salvatore Romeo, Jisun An, Haewoon Kwak, Todor Staykovski, Israa Jaradat, Georgi Karadzhov, Ramy Baly, Kareem Darwish, James Glass, Preslav Nakov (demo)
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019

Gender and Racial Diversity in Commercial Brands’ Advertising Images on Social Media
Jisun An, Haewoon Kwak
Proceedings of Social Informatics (SocInfo), 2019
Best Paper Award
20+ papers citing this work (Google scholar)

Assessing the Accuracy of Four Popular Face Recognition Tools for Inferring Gender, Age, and Race
Soon-gyo Jung, Jisun An, Haewoon Kwak, Joni Salminen, Bernard Jim Jansen
Proceedings of the 12th International AAAI Conference on Web and Social Media (ICWSM), 2018 (short)

65+ papers citing this work (Google scholar)

Data-driven Approach to Measuring the Level of Press Freedom Using Media Attention Diversity from Unfiltered News
Jisun An, Haewoon Kwak
Proceedings of the ICWSM Workshop on NEws and publiC Opinion (NECO), 2017
Picked as The Best of the Physics arXiv (week ending April 15, 2017) in MIT Technology Review

Multidimensional Analysis of Gender and Age Differences in News Consumption
Jisun An, Haewoon Kwak
Computation+Journalism (C+J) Symposium (2016)

Revealing the Hidden Patterns of News Photos: Analysis of Millions of News Photos Using GDELT and Deep Learning-based Vision APIs
Haewoon Kwak, Jisun An
ICWSM Workshop on NEws and publiC Opinion (NECO), 2016
Picked as The Best of the Physics arXiv (week ending March 26, 2016) in MIT Technology Review
20+ papers citing this work (Google scholar)

Understanding News Geography and Major Determinants of Global News Coverage of Disasters
Haewoon Kwak, Jisun An (extension of SocInfo’14)
Computation+Journalism (C+J) Symposium, 2014

25+ papers citing this work (Google scholar)

A First Look at Global News Coverage of Disasters By Using the GDELT Dataset
Haewoon Kwak, Jisun An
Proceedings of Social Informatics, 2014
Press Coverage-MIT Technology Review, ACM TechNews
60+ papers citing this work (Google scholar)