Guardrails in Communication Networks
Who decides what we see online? At its 3rd annual workshop, the Center for Information Networks and Democracy will gather an exceptional group of scholars to assess whether online discussions and content moderation enhance or degrade the quality of the information we see.
| schedule • abstracts • logistics |
Schedule
**Registration for this event will open soon**. Please complete this form if you would like to attend the workshop. Seats are limited and reserved for Penn community members. We will confirm participation via email.
Thursday, April 30
- 8:30-9:00am | Breakfast (workshop participants)
- 9:00-9:15 | Welcome remarks
- 9:15-10:45 | Keynote: Jaime Settle, College of William & Mary, "Talking About Politics When Everything is Political."
- 10:45-11:00 | Coffee break (workshop participants)
- 11:00-12:00pm | Presentation 1: Laura Edelson, Northeastern University, "Content Removal vs. Algorithmic Promotion: Exploring Their Combined Effect on User Exposure."
- 12:00-1:00 | Presentation 2: Douglas Guilbeault, Stanford University, "Age and Gender Distortion in Online Media and Large Language Models."
- 1:00-2:00 | Lunch (workshop participants)
- 2:00-3:00 | Presentation 3: Lisa P. Argyle, Purdue University, "Productive Disagreement as a Skill: AI Training for Political Conversation."
- 3:00-4:00 | Presentation 4: J. Nathan Matias, Cornell University, "Title TBC."
- 4:00-4:30 | Coffee break (workshop participants)
- 4:30-5:30 | Short Format Presentations. Speakers: Yijing Chen, Annenberg School for Communication, University of Pennsylvania, "Banned Communities Grow through a Distinctive Pattern of User Engagement and Networked Coordination" and Vishwanath E.V.S., Annenberg School for Communication, University of Pennsylvania, "Unpacking How Context (Conversation History) Shifts the Framing of Large Language Model Outputs."
- 6:30-8:30 | Dinner at the White Dog (invited speakers only)
Friday, May 1
- 8:30-9:15am | Breakfast (workshop participants)
- 9:15-10:45 | Keynote: Mor Naaman, Cornell Tech, "Title TBC."
- 10:45-11:00 | Coffee break (workshop participants)
- 11:00-12:00pm | Presentation 5: Matthew DeVerna, Stanford University, "Large Language Models Require Curated Context for Reliable Political Fact-Checking — Even with Reasoning and Web Search."
- 12:00-1:00 | Presentation 6: Ariel Hasell, University of Michigan, "How Information Abundance, News Negativity, and Media Distrust are Changing the Way Americans Engage with Journalism."
- 1:00-2:00 | Lunch (workshop participants)
- 2:00-3:00 | Short Format Presentations. Speakers: Emma Lurie, Computer and Information Science, University of Pennsylvania, "Longitudinal Monitoring of LLM Content Moderation of Social Issues" and Rehan Mirza, Annenberg School for Communication, University of Pennsylvania, "Motivated Moderation: How Partisan Alignment Influences Civility Judgments in Community-Based Meme Forums."
- 3:00-4:00 | Recap Discussion. Moderator: Alex Engler, Penn Center on Media, Tech, and Democracy.
- 4:00-4:15 | Closing remarks
- 4:15-5:30 | Happy Hour at Louie Louie (workshop participants)
Abstracts
"Talking About Politics When Everything is Political", by Jaime Settle
Do people make their divisions deeper when they interact with each other about politics? This question has long motivated scholars of political psychology and communication, but it has become all the more pressing in an era defined by polarization, hyperpartisanship, and the politicization of many facets of society. In this research presentation, I theorize more fully about the nature of organic political interactions, both online and offline, and the implications of the underlying psychology of communication for the public’s willingness to engage about substantive political topics. Integrating insights from research using a diverse set of methods, ranging from psychophysiological measurement to computational social science, we will unpack more realistic expectations about when political interaction might exacerbate our divides and when it might ameliorate them.
"Content Removal vs. Algorithmic Promotion: Exploring Their Combined Effect on User Exposure", by Laura Edelson
Content moderation and algorithmic feed recommendation are the two platform systems with the most direct control over what users see. Yet research and policy discussions almost always treat them in isolation — moderation scholarship measures removal rates and response times, while recommendation system research focuses on ranking objectives and engagement optimization. This separation obscures a basic question: what is the net effect of these two systems operating simultaneously, sometimes in tension with one another? In this talk, I draw on two empirical investigations, one into each side of this question. First, a measurement study of content removal on Facebook introduces the metric of prevented dissemination and finds that removals prevent only 24–30% of posts' predicted engagement, revealing the limits of moderation when engagement accrues faster than review. Second, a comparative survey of feed algorithm designs across six major platforms identifies key design dimensions — including usage intensity optimization, content timeliness, and inventory source selection — that create structurally different conditions for moderation to operate in. Building on these findings, I propose new directions for measuring the relative contribution of these two systems: How much of a user's exposure to harmful content is shaped by what the algorithm chose to promote versus what moderation failed to catch in time? What algorithm design traits make moderation more or less effective? I will outline approaches for connecting these measurement frameworks and discuss what such an integrated view could mean for platform safety research and policy.
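To make the "prevented dissemination" idea concrete, here is a minimal sketch in Python. The hourly-curve model is an assumption for illustration only, not the study's actual estimator: we treat a post's predicted engagement as a curve over time and count the share of that curve cut off by removal.

```python
# Illustrative sketch of a "prevented dissemination"-style calculation.
# The hourly-curve model below is a hypothetical stand-in, not the
# study's actual estimator.

def prevented_share(predicted_engagement, removal_hour):
    """Fraction of a post's predicted engagement that removal prevented.

    predicted_engagement: expected engagements per hour over the post's
    projected lifetime; removal_hour: hour at which the post was removed.
    """
    total = sum(predicted_engagement)
    accrued = sum(predicted_engagement[:removal_hour])
    return (total - accrued) / total if total else 0.0

# Example: engagement is front-loaded in the first hours, so a removal
# at hour 3 prevents only ~14% of predicted engagement.
curve = [500, 300, 150, 80, 40, 20, 10]
print(prevented_share(curve, removal_hour=3))  # 0.136...
```

The toy numbers mirror the talk's point: when engagement accrues faster than review, even a same-day removal prevents relatively little dissemination.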
"Age and Gender Distortion in Online Media and Large Language Models", by Douglas Guilbeault
Are widespread stereotypes accurate or socially distorted? This continuing debate is limited by the lack of large-scale multimodal data on stereotypical associations and the inability to compare these to ground truth indicators. In this talk, I will present our recent work in which we address this challenge in the analysis of age-related gender bias, for which age provides an objective anchor for evaluating stereotype accuracy. Despite there being no systematic age differences between women and men in the workforce according to the US Census, we find that women are represented as younger than men across occupations and social roles in nearly 1.4 million images and videos from Google, Wikipedia, IMDb, Flickr and YouTube, as well as in nine language models trained on billions of words from the internet. This age gap is starkest for content depicting occupations with higher status and earnings. We further show how mainstream algorithms amplify this bias. A nationally representative pre-registered experiment (n = 459) finds that Googling images of occupations amplifies age-related gender bias in participants’ beliefs and hiring preferences. We additionally show that when generating and evaluating resumes, ChatGPT assumes that women are younger and less experienced, rating older male applicants as higher quality. I conclude by discussing ongoing work that builds on this computational paradigm to show how we can leverage large-scale social data and artificial intelligence to discover novel dimensions of stereotypes that are predictive of human psychology.
"Productive Disagreement as a Skill: AI Training for Political Conversation", by Lysa P. Argyle
A growing body of academic research and practitioner evidence shows that, under the right conditions, political conversations can reduce political animosity, de-escalate conflict, and facilitate compromise. However, not all difficult political conversations are so idyllic: many are heated, combative, and create feelings of stress, frustration, or anger. Because many people are concerned about such negative impacts on themselves and their relationships, they avoid engaging in political conversations with people they disagree with at all. Even when they do choose to participate in these discussions, people generally lack the civic skills for productive engagement in divisive, disagreement-based, or conflictual interactions. We propose that productive disagreement is a skill that can be improved with instruction and practice. At the same time, existing programs that teach this skill are severely resource constrained and face limits on their scalability. In this project, we suggest that generative AI tools can step into this space both to train people to develop these disagreement skills and to test transfer from this learning into conversations with others. Specifically, we develop an AI agent tailored to provide real-time coaching and training to participants about engaging productively in political discussion, using evidence-based best practices drawn from a variety of fields. Post-training, we deploy a separate AI agent that both gives people practice engaging in potentially heated discussions, without posing any risk to their real-life relationships, and serves as an evaluation tool. We then evaluate whether practice and coaching improve people’s confidence and ability to engage productively in divisive political conversations.
"Title TBC", by J. Nathan Matias
Abstract TBC.
"Banned Communities Grow through a Distinctive Pattern of User Engagement and Networked Coordination", by Yijing Chen
Research on online content moderation overwhelmingly examines what happens after platforms intervene (e.g., subreddit bans or quarantines), with far less attention devoted to the processes that unfold before such interventions occur. To address this gap, we analyze the full histories of 4,237 subreddits banned in 2020 using the Pushshift Reddit dataset, and compare them with a matched sample of unbanned subreddits with similar lifespans and popularity. Our results show that subreddits banned in 2020 follow systematically different trajectories from comparable baselines well before their bans. At the macro level, banned subreddits accumulate substantially more users, submissions, and comments. At the micro level, they attract more committed users who join earlier in their platform lifetime and remain active for longer durations. At the meso level, they are more structurally embedded in the subreddit ecosystem: they occupy more central positions in co-posting, referencing, and activity-flow networks, with greater user overlap, broader cross-subreddit references, and higher volumes of activity flows. Furthermore, we show that subreddit growth is more a product of network-mediated participation than atomistic individual decisions, a pattern that exists in both banned and matched subreddits but is more salient in banned ones. User entries into banned subreddits are more temporally clustered among users with prior shared community affiliations, and activity flows into banned subreddits are more strongly associated with references to them in other subreddits. Overall, our findings highlight warning signals of moderation-relevant risks, and reframe high-risk communities not as isolated cases of rule violations, but as products of networked engagement that warrant an ecosystem-level perspective.
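For readers unfamiliar with these meso-level measures, the sketch below shows one way structural embeddedness in a co-posting network could be computed. It is a toy illustration with made-up data, not the authors' pipeline.

```python
# Toy illustration (not the authors' pipeline): measure how central a
# subreddit sits in a co-posting network, where subreddits are linked
# by the users they share.
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical (user, subreddit) posting records.
posts = [
    ("u1", "r/a"), ("u1", "r/b"),
    ("u2", "r/a"), ("u2", "r/b"), ("u2", "r/c"),
    ("u3", "r/b"), ("u3", "r/c"),
]

B = nx.Graph()
B.add_nodes_from({u for u, _ in posts}, bipartite=0)
B.add_nodes_from({s for _, s in posts}, bipartite=1)
B.add_edges_from(posts)

# Project onto subreddits; edge weights count shared users (user overlap).
G = bipartite.weighted_projected_graph(B, {s for _, s in posts})
centrality = nx.eigenvector_centrality(G, weight="weight")
print(sorted(centrality.items(), key=lambda kv: -kv[1]))
```

In the paper's setting, higher centrality in such networks is one signal that a community is deeply embedded in the surrounding ecosystem rather than an isolated pocket.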
"Unpacking How Context (Conversation History) Shifts the Framing of Large Language Model Outputs", by Vishwanath E.V.S.
One of the earliest and most prominent use cases proposed for Large Language Models (LLMs) is their potential to serve as search engines. Proponents have argued that querying LLMs would be akin to conversing with "domain experts" who return comprehensive answers, instead of a ranked list of sources containing relevant information. However, new search and information-seeking systems are not free of old problems, and in the case of LLM-powered search engines, the risk of creating or reinforcing echo chambers re-emerges through the new affordances and information-retrieval mechanisms intrinsic to the algorithmic operation of these models. LLM-driven echo chambers are still an emerging area of research. Recent work in this area finds that the conversational design of LLMs can encourage users to rely on confirmatory queries. This project aims to study whether LLMs generate responses aligned with users' political priors through an in-silico study evaluating how LLM responses to the same questions vary as a function of the ideological slant of the language used in prior conversational exchanges.
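A minimal sketch of what such an in-silico probe could look like, assuming the OpenAI chat API; the slanted histories, target question, and model name are illustrative assumptions, not the author's actual protocol.

```python
# Minimal sketch of an in-silico probe: pose the same question after
# conversation histories with different ideological slants and compare
# the responses. Histories and model choice are illustrative.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

QUESTION = "Should the government regulate social media platforms?"

def ask_with_history(history):
    """Prepend a prior exchange, then pose the identical target question."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=history + [{"role": "user", "content": QUESTION}],
    )
    return resp.choices[0].message.content

left = [{"role": "user", "content": "Big Tech has far too much unchecked power."},
        {"role": "assistant", "content": "Many people share that concern."}]
right = [{"role": "user", "content": "Government interference in markets usually backfires."},
         {"role": "assistant", "content": "Many people share that concern."}]

for label, history in [("left-slanted", left), ("right-slanted", right)]:
    print(label, "->", ask_with_history(history)[:200])
```

A full study would repeat this over many questions and sampled histories and then quantify the divergence between response sets.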
"Title TBC", by Moor Naaman
Abstract TBC.
"Large Language Models Require Curated Context for Reliable Political Fact-Checking—Even with Reasoning and Web Search", by Matthew DeVerna
Large language models (LLMs) have raised hopes for automated end-to-end fact-checking, but prior studies report mixed results. As mainstream chatbots increasingly ship with reasoning capabilities and web search tools—and millions of users already rely on them for verification—rigorous evaluation is urgent. In this talk, I will present results from a study evaluating 15 recent LLMs from OpenAI, Google, Meta, and DeepSeek on more than 6,000 claims fact-checked by PolitiFact, comparing standard models with reasoning-enabled and web-search-enabled variants. Standard models perform poorly, reasoning offers minimal benefits, and web search provides only moderate gains, despite fact-checks being available on the web. In contrast, a curated RAG system using PolitiFact summaries improves macro F1 by 233% on average across model variants. These findings suggest that giving models access to curated, high-quality context is a promising path for automated fact-checking.
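For context on the headline number: macro F1 averages per-class F1 scores with equal weight, so a model must perform well on rare verdict classes too. The toy example below illustrates the metric and a relative gain; the labels are invented, not the study's data.

```python
# Toy illustration of macro F1 and relative gain; labels are invented,
# not the study's data.
from sklearn.metrics import f1_score

labels = ["true", "half-true", "false"]
y_true     = ["true", "false", "half-true", "false", "true", "half-true"]
y_baseline = ["true", "true", "true", "true", "true", "true"]          # no curated context
y_rag      = ["true", "false", "half-true", "false", "true", "false"]  # with curated context

base = f1_score(y_true, y_baseline, labels=labels, average="macro", zero_division=0)
rag = f1_score(y_true, y_rag, labels=labels, average="macro", zero_division=0)
print(f"baseline={base:.2f}, with curated context={rag:.2f}, "
      f"relative gain={(rag - base) / base:.0%}")
```

Note how the always-"true" baseline scores near zero on macro F1 despite matching a third of the labels, which is exactly why the metric rewards curated context that helps with every verdict class.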
"How information abundance, news negativity, and media distrust are changing the way Americans engage with journalism", by Ariel Hasell
As Americans increasingly turn to social media for news, they face political information environments that are overwhelmingly crowded and emotional, and that lack clear epistemic hierarchies of informational sources. This has wide-reaching consequences for politics on issues like misinformation and polarization, but it is also changing public perceptions of what news media are and challenging traditional understandings of the role of news media in democracies. Using panel survey data from 2024, this talk explores how digital information environments can encourage news distrust and disengagement from news and politics, but also how alternative voices in these environments, like social media influencers, may play a role in encouraging more engagement with news and politics. Together, these studies highlight how social media are shaping how Americans define and consume news media.
"Longitudinal Monitoring of LLM Content Moderation of Social Issues", by Emma Lurie
Large language models' (LLMs') outputs are shaped by opaque and frequently changing company content moderation policies and practices. LLM moderation often takes the form of refusal; models' refusal to produce text about certain topics both reflects company policy and subtly shapes public discourse. We introduce AI Watchman, a longitudinal auditing system that publicly measures and tracks LLM refusals over time, providing transparency into an important and black-box aspect of LLMs. Using a dataset of over 400 social issues, we audit OpenAI's moderation endpoint, GPT-4.1, GPT-5, and DeepSeek (in both English and Chinese). We find that changes in company policies, even those not publicly announced, can be detected by AI Watchman, and we identify company- and model-specific differences in content moderation. We also qualitatively analyze and categorize different forms of refusal. This work contributes evidence for the value of longitudinal auditing of LLMs and offers AI Watchman as one system for doing so.
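A minimal sketch of a longitudinal refusal audit in this spirit, not the AI Watchman implementation; the prompts, refusal heuristic, and model name are assumptions.

```python
# Minimal sketch of a longitudinal refusal audit (not the AI Watchman
# implementation). Prompts, refusal heuristic, and model are assumptions.
import json
from datetime import datetime, timezone
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

ISSUE_PROMPTS = [
    "Describe the main viewpoints in the debate over gun control.",
    "Summarize arguments for and against affirmative action.",
]

REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "i am unable")  # crude heuristic

def looks_like_refusal(text):
    return text.strip().lower().startswith(REFUSAL_MARKERS)

def audit_once(model="gpt-4.1"):
    """Query each prompt once and record whether the reply reads as a refusal."""
    rows = []
    for prompt in ISSUE_PROMPTS:
        resp = client.chat.completions.create(
            model=model, messages=[{"role": "user", "content": prompt}])
        text = resp.choices[0].message.content or ""
        rows.append({"ts": datetime.now(timezone.utc).isoformat(),
                     "model": model, "prompt": prompt,
                     "refused": looks_like_refusal(text)})
    return rows

# Run on a schedule (e.g., daily cron) and append results to a log;
# shifts in refusal rates over time surface changes in moderation policy.
print(json.dumps(audit_once(), indent=2))
```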
"Motivated moderation: How partisan alignment influences civility judgments in community-based meme forums", by Rehan Mirza
Content moderation requires moderators to balance harm prevention with free expression, a tension that is especially acute for political incivility, where norm violations must be weighed against salient political speech. Prior work suggests that moderation decisions may be distorted by motivated partisan reasoning, yet evidence from community-driven settings, where users create and enforce rules, remains limited. This study examines whether political alignment biases moderation judgments in community settings. Using a 2x3 within-subjects survey experiment simulating a moderator queue, U.S.-based participants evaluated social media memes varying in political alignment (aligned vs. opposed) and civility (uncivil, borderline uncivil, civil), applying explicit community rules to select enforcement actions with specified consequences. Outcomes were measured across two dimensions: violation recognition (whether content should be censored) and enforcement severity (the level of action taken). Using memes with political messaging overlaid on cartoon images, I test whether alignment effects persist in humorous, seemingly low-stakes contexts where political salience may be reduced. Focusing on civility rather than fact-based violations like misinformation, the study isolates motivated reasoning mechanisms, including “party promotion” and “preference gaps”. Results show that increased incivility and partisan opposition significantly increase violation recognition and enforcement severity, with no interaction between the two. This indicates that partisan bias operates independently of interpretative ambiguity. Political content is systematically judged more stringently than non-political content, suggesting it may arouse negative affect. This study carries direct implications for platform oversight at a time when platforms are increasingly delegating moderation duties to volunteer communities.
Logistics
When?
The workshop will take place on April 30-May 1, 2026.
Where?
Room 500 at the Annenberg School for Communication (please use the Walnut Street entrance, where you will be directed to the room).