Publications

Selected papers co-authored by research team members.
For the full list, see below or visit the Google Scholar profiles of Jisun An and Haewoon Kwak.

Who Is Missing? Characterizing the Participation of Different Demographic Groups in a Korean Nationwide Daily Conversation Corpus

AI/ML/NLP dataset/tool bias/fairness

A conversation corpus is essential for building interactive AI applications. However, the demographic information of the participants in such corpora is largely underexplored, mainly due to the lack of individual data in many corpora. In this work, we analyze a Korean nationwide daily conversation corpus constructed by the National Institute of Korean Language (NIKL) to characterize the participation of different demographic (age and sex) groups in the corpus.

Haewoon Kwak, Jisun An, Kunwoo Park

Proceedings of the 16th International AAAI Conference on Web and Social Media (ICWSM), 2022 (short)

FrameAxis: characterizing microframe bias and intensity with word embedding

computational social science AI/ML/NLP dataset/tool bias/fairness

Framing is a process of emphasizing a certain aspect of an issue over the others, nudging readers or listeners towards different positions on the issue even without making a biased argument. Here, we propose FrameAxis, a method for characterizing documents by identifying the most relevant semantic axes (“microframes”) that are overrepresented in the text using word embedding. Our unsupervised approach can be readily applied to large datasets because it does not require manual annotations. …

Haewoon Kwak, Jisun An, Elise Jing, Yong-Yeol Ahn

PeerJ Computer Science 7:e644, 2021

Code repo (GitHub)

35+ papers citing this work (Google Scholar)

Empirical Evaluation of Three Common Assumptions in Building Political Media Bias Datasets

computational journalism AI/ML/NLP dataset/tool bias/fairness

We empirically validate three common assumptions in building political media bias datasets, which are (i) labelers’ political leanings do not affect labeling tasks, (ii) news articles follow their source outlet’s political leaning, and (iii) political leaning of a news outlet is stable across different topics.

Soumen Ganguly, Juhi Kulshrestha, Jisun An, Haewoon Kwak

Proceedings of the 14th International AAAI Conference on Web and Social Media (ICWSM), 2020

25+ papers citing this work (Google Scholar)

Tanbih: Get To Know What You Are Reading

computational journalism AI/ML/NLP bias/fairness dataset/tool

We introduce Tanbih, a news aggregator with intelligent analysis tools to help readers understand what’s behind a news story. Our system displays news grouped into events and generates media profiles that show the general factuality of reporting, the degree of propagandistic content, hyper-partisanship, leading political ideology, general frame of reporting, and stance with respect to various claims and topics of a news outlet.

Yifan Zhang, Giovanni Da San Martino, Alberto Barrón-Cedeño, Salvatore Romeo, Jisun An, Haewoon Kwak, Todor Staykovski, Israa Jaradat, Georgi Karadzhov, Ramy Baly, Kareem Darwish, James Glass, Preslav Nakov

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019 (demo)

Assessing the Accuracy of Four Popular Face Recognition Tools for Inferring Gender, Age, and Race

AI/ML/NLP dataset/tool bias/fairness

We evaluate four widely used face detection tools (Face++, IBM Bluemix Visual Recognition, AWS Rekognition, and Microsoft Azure Face API) using multiple datasets to determine their accuracy in inferring user attributes, including gender, race, and age.

Soon-gyo Jung, Jisun An, Haewoon Kwak, Joni Salminen, Bernard Jim Jansen

Proceedings of the 12th International AAAI Conference on Web and Social Media (ICWSM), 2018 (short)

65+ papers citing this work (Google Scholar)

SemAxis: A Lightweight Framework to Characterize Domain-Specific Word Semantics Beyond Sentiment

computational social science AI/ML/NLP dataset/tool

We propose SemAxis, a simple yet powerful framework to characterize word semantics using many semantic axes in word-vector spaces beyond sentiment. We demonstrate that SemAxis can capture nuanced semantic representations in multiple online communities. We also show that, when the sentiment axis is examined, SemAxis outperforms the state-of-the-art approaches in building domain-specific sentiment lexicons.

Jisun An, Haewoon Kwak, Yong-Yeol Ahn

Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL), 2018

50+ papers citing this work (Google Scholar)

Revealing the Hidden Patterns of News Photos: Analysis of Millions of News Photos Using GDELT and Deep Learning-based Vision APIs

computational journalism AI/ML/NLP bias/fairness dataset/tool

In this work, we analyze more than two million news photos published in January 2016. We demonstrate i) which objects appear the most in news photos; ii) what the sentiments of news photos are; iii) whether the sentiment of news photos is aligned with the tone of the text; iv) how gender is treated; and v) how differently political candidates are portrayed. To the best of our knowledge, this is the first large-scale study of news photo contents using deep learning-based vision APIs.

Haewoon Kwak, Jisun An

ICWSM Workshop on NEws and publiC Opinion (NECO), 2016

Picked as The Best of the Physics arXiv (week ending March 26, 2016) in MIT Technology Review

20+ papers citing this work (Google Scholar)

A First Look at Global News Coverage of Disasters By Using the GDELT Dataset

computational journalism dataset/tool bias/fairness

In this work, we reveal the structure of global news coverage of disasters and its determinants by using a large-scale news coverage dataset collected by the GDELT (Global Data on Events, Location, and Tone) project, which monitors news media in over 100 languages from around the world. Significant variables in our hierarchical (mixed-effect) regression model, such as population, political stability, and damage, are well aligned with previous research. However, we find strong regionalism in news geography, highlighting the necessity of comprehensive datasets for the study of global news coverage.

Haewoon Kwak, Jisun An

Proceedings of Social Informatics, 2014

Press coverage: MIT Technology Review, ACM TechNews

60+ papers citing this work (Google Scholar)

Full List

Who Is Missing? Characterizing the Participation of Different Demographic Groups in a Korean Nationwide Daily Conversation Corpus
Haewoon Kwak, Jisun An, Kunwoo Park
Proceedings of the 16th International AAAI Conference on Web and Social Media (ICWSM), 2022 (short)

FrameAxis: characterizing microframe bias and intensity with word embedding
Haewoon Kwak, Jisun An, Elise Jing, Yong-Yeol Ahn
PeerJ Computer Science 7:e644, 2021
Code repo (GitHub)
35+ papers citing this work (Google Scholar)

Empirical Evaluation of Three Common Assumptions in Building Political Media Bias Datasets
Soumen Ganguly, Juhi Kulshrestha, Jisun An, Haewoon Kwak
Proceedings of the 14th International AAAI Conference on Web and Social Media (ICWSM), 2020
25+ papers citing this work (Google Scholar)

Tanbih: Get To Know What You Are Reading
Yifan Zhang, Giovanni Da San Martino, Alberto Barrón-Cedeño, Salvatore Romeo, Jisun An, Haewoon Kwak, Todor Staykovski, Israa Jaradat, Georgi Karadzhov, Ramy Baly, Kareem Darwish, James Glass, Preslav Nakov
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019 (demo)

Assessing the Accuracy of Four Popular Face Recognition Tools for Inferring Gender, Age, and Race
Soon-gyo Jung, Jisun An, Haewoon Kwak, Joni Salminen, Bernard Jim Jansen
Proceedings of the 12th International AAAI Conference on Web and Social Media (ICWSM), 2018 (short)
65+ papers citing this work (Google Scholar)

SemAxis: A Lightweight Framework to Characterize Domain-Specific Word Semantics Beyond Sentiment
Jisun An, Haewoon Kwak, Yong-Yeol Ahn
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL), 2018
50+ papers citing this work (Google Scholar)

Revealing the Hidden Patterns of News Photos: Analysis of Millions of News Photos Using GDELT and Deep Learning-based Vision APIs
Haewoon Kwak, Jisun An
ICWSM Workshop on NEws and publiC Opinion (NECO), 2016
Picked as The Best of the Physics arXiv (week ending March 26, 2016) in MIT Technology Review
20+ papers citing this work (Google Scholar)

Two Tales of the World: Comparison of Widely Used World News Datasets: GDELT and EventRegistry
Haewoon Kwak, Jisun An
Proceedings of the 10th International AAAI Conference on Web and Social Media (ICWSM), 2016 (short)
30+ papers citing this work (Google Scholar)

Understanding News Geography and Major Determinants of Global News Coverage of Disasters
Haewoon Kwak, Jisun An (extension of SocInfo’14)
Computation+Journalism (C+J) Symposium, 2014
25+ papers citing this work (Google Scholar)

A First Look at Global News Coverage of Disasters By Using the GDELT Dataset
Haewoon Kwak, Jisun An
Proceedings of Social Informatics, 2014
Press coverage: MIT Technology Review, ACM TechNews
60+ papers citing this work (Google Scholar)