Advancing Policy Insights

Opinion Data Analysis and Discourse Structuring Using LLMs

Aaditya (Sonny) Bhatia

Advisor: Dr. Gita Sukthankar

University of Central Florida

2024-04-03

Introduction

Context

Policy decisions pose a complex, wicked problem 1 2

  • Effectiveness determined by solving it; single attempt
  • Measuring impact will shift problem
  • Public discourse helps shape solutions; crucial for policy-making

Determining public opinion

  • Surveys and polls -> Social Media -> Discussion Platforms
  • People willing to express freely
  • Digital platforms provide a wealth of data
  • Unstructured, vast, and complex

Background

The Issue-Based Information System (IBIS) relies on three types of nodes and nine relationship edges. It is the most commonly used AI argumentation approach and provides the basis for several other platforms. (J. Conklin and Begeman 1988)

Problem Statement

How can LLMs enable us to

  • ingest massive streams of unstructured information
  • incorporate diverse perspectives, and
  • distill them into actionable insights, that
  • align with public opinion?

Research Questions

  • How effectively can LLMs structure and enable access to large amounts of opinion data?

  • What metrics and insights can we generate from embeddings?

  • What are the inherent risks associated with the deployment of LLMs?

gIBIS

  • Networked decision support system (J. Conklin and Begeman 1988)
  • Structured conversation using
    • Issues
    • Positions
    • Arguments
  • Helped identify underlying assumptions
  • Promoted divergent and convergent thinking
  • Limited by structure, learning curve, scalability
  • Led to development of Compendium

gIBIS application supports several types of nodes and links for structured thinking. (J. Conklin and Begeman 1988)

Polis

  • “Real-time system for gathering, analyzing and understanding” public opinion (Small 2021)
  • Developed as an open source platform for public discourse
  • Published several case studies
  • Participants post short messages and vote on others
  • Polis algorithm ensures exposure to diverse opinions
  • \(\vec{comments} \times \vec{votes} =\) opinion matrix
    • fed into statistical models
    • understand where people agree or disagree

Polis live report from the 2018 town hall meeting in Bowling Green, Kentucky.
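
As a rough illustration of the opinion matrix described above, the sketch below arranges vote records into a participant-by-comment grid. The record layout, the `+1`/`-1`/`0` vote encoding, and the function name `build_opinion_matrix` are assumptions made for this example; they are not taken from the Polis codebase.

```python
# Hypothetical vote records: (voter_id, comment_id, vote).
# Assumed encoding: +1 = agree, -1 = disagree, 0 = pass.
votes = [
    ("p1", "c1", 1),
    ("p1", "c2", -1),
    ("p2", "c1", 1),
    ("p3", "c2", 1),
    ("p3", "c3", 0),
]


def build_opinion_matrix(records):
    """Return (participants, comments, matrix).

    matrix[i][j] holds the vote of participant i on comment j,
    or None when that participant never saw or voted on the comment.
    """
    participants = sorted({voter for voter, _, _ in records})
    comments = sorted({comment for _, comment, _ in records})
    row = {p: i for i, p in enumerate(participants)}
    col = {c: j for j, c in enumerate(comments)}
    matrix = [[None] * len(comments) for _ in participants]
    for voter, comment, vote in records:
        matrix[row[voter]][col[comment]] = vote
    return participants, comments, matrix


participants, comments, matrix = build_opinion_matrix(votes)
for name, votes_row in zip(participants, matrix):
    print(name, votes_row)
```

The matrix is typically sparse, since most participants vote on only a fraction of the comments; a statistical model would need a policy for the missing cells.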

Crowd-Scale Deliberation for Complex Problems

  • Deliberatorium (Klein 2022)
  • Knowledge Schema: QuAACR
    • Questions
    • Answers
    • Arguments (+, -)
    • Criteria
    • Ratings
    • Decision: Group Consensus
  • Attention Mediation determines next deliberation actions
    • Ideation: Generate more answers
    • Assessment: Evaluate answers
    • Selection: Pick best answers
  • Metrics: Support, Pareto-optimality, Controversiality, Maturity, Decision confidence, Value of information, User Expertise

“Wisdom of the crowd” is used to generate a deliberation map. This produces analytics that are evaluated across several metrics and fed into an attention mediation system, which determines which actions should be taken next and which portions of the tree need the most attention. Those posts are surfaced to the participants for further discussion. (Klein 2022)

D-Agree Crowd-Scale Discussion

  • Automated agent to facilitate online discussion (Ito, Hadfi, and Suzuki 2022)
  • IBIS-based discussion representation
  • Extracts and analyzes discussion structures from online discussions
  • Posts facilitation messages to incentivize participants and grow IBIS tree
  • Best results when agent augmented human facilitators (Hadfi and Ito 2022)
  • Results
    • Use of the agent produced more ideas for any given issue
    • Agent had 1.4 times more replies and 6 times shorter response intervals
    • Increased user satisfaction

Methodology


flowchart LR
    DataIngestion ==> Embeddings
    DataIngestion ==> CommentModeration
    Embeddings ==> TopicModeling
    TopicModeling ==> Labeling
    TopicModeling ==> Structure
    CommentModeration ==> TopicModeling
    Structure & Labeling ==> Tree
    Tree == Agreement Scoring ==> Insights

    DataIngestion[Data Ingestion]
    CommentModeration[Comment \n Moderation]
    Labeling[Topic Label \n Generation]
    TopicModeling[Topic Modeling]
    Structure[Insight \n Generation]
    Tree[Argument Mapping]
    Insights[Actionable Insights]

Data

  • Summary Statistics: conversation topic, number of participants, total comments, total votes
  • Comments: author, comment text, moderated, agree votes, disagree votes
  • Votes: voter ID, comment ID, timestamp, vote
  • Participant-Vote Matrix: participant ID, group ID, n-votes, n-agree, n-disagree, comment ID…
  • Stats History: votes, comments, visitors, voters, commenters

Summary of datasets used in the study

Dataset                              Participants  Comments  Accepted
american-assembly.bowling-green              2031       896       607
scoop-hivemind.biodiversity                   536       314       154
scoop-hivemind.taxes                          334       148        91
scoop-hivemind.affordable-housing             381       165       119
scoop-hivemind.freshwater                     117        80        51
scoop-hivemind.ubi                            234        78        71

Embeddings

  • Numerical vectors that capture the semantic meaning of a word or sentence
  • Transformer embeddings are contextually relevant; used in LLM inference
    • e.g. “bank” could be a financial institution or a riverbank
  • Calculated at comment level using Sentence Transformers library
  • Models considered
  • Language Model Selection Criteria

Transformer embedding vectors numerically represent the semantic meaning of words and can be used for clustering or simple calculations.
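
To make "simple calculations" on embeddings concrete, the sketch below compares comments by cosine similarity. The three-dimensional vectors are made up for illustration; real sentence embeddings have hundreds or thousands of dimensions and would come from an embedding model rather than being typed by hand.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings" (assumed values, not model output).
toy_embeddings = {
    "Fix the potholes on Main Street": [0.9, 0.1, 0.0],
    "Road repairs are overdue": [0.8, 0.2, 0.1],
    "We need more public parks": [0.1, 0.9, 0.2],
}

potholes = toy_embeddings["Fix the potholes on Main Street"]
sim_related = cosine_similarity(potholes, toy_embeddings["Road repairs are overdue"])
sim_unrelated = cosine_similarity(potholes, toy_embeddings["We need more public parks"])

print(f"related comments:   {sim_related:.3f}")
print(f"unrelated comments: {sim_unrelated:.3f}")
```

Comments about the same subject should score closer to 1 than comments about different subjects, which is the property clustering relies on.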

Text Generation

import guidance
from guidance import gen, select


@guidance
def character_maker(lm, id, description, valid_weapons):
    # Text outside gen()/select() is inserted verbatim into the context;
    # only the gen()/select() calls ask the model to produce tokens.
    lm += f"""\
The following is a character profile for an RPG game in JSON format.
```json
{{
    "id": "{id}",
    "description": "{description}",
    "name": "{gen('name', stop='"')}",
    "age": {gen('age', regex='[0-9]+', stop=',')},
    "armor": "{select(options=['leather', 'chainmail', 'plate'], name='armor')}",
    "weapon": "{select(options=valid_weapons, name='weapon')}",
    "class": "{gen('class', stop='"')}",
    "mantra": "{gen('mantra', stop='"')}",
    "strength": {gen('strength', regex='[0-9]+', stop=',')},
    "items": ["{gen('item', list_append=True, stop='"')}", "{gen('item', list_append=True, stop='"')}", "{gen('item', list_append=True, stop='"')}"]
}}```"""
    return lm

Example output produced by guidance. The green highlighted text is generated by the LLM, while the rest is programmatically inserted into the context. Inference is significantly faster since the model produces fewer tokens. Output format is strictly enforced using stop criteria, regular expressions, and fixed options.

Comment Moderation

  • Retrospective analysis that simulates real-time moderation
  • Goal is to identify spam, irrelevant comments, and those that violate Polis moderation guidelines
  • Framed as a multi-class classification task for spam detection
  • Gold-standard labels available in source dataset

flowchart LR
    subgraph Metrics
        Accuracy
        Precision
        FalsePositiveRate
        UnsureRate
    end

    subgraph Tasks
        Classification ---> Accuracy & Precision & FalsePositiveRate & UnsureRate
        Classification[Classification: Spam Detection]
    end

    subgraph Variables
        TargetLabels[Target Labels]
        TargetLabels --> Classification
        %% TargetLabels --> ThreeClass[ACCEPT, UNSURE, REJECT]
        %% TargetLabels --> SevenClass[ACCEPT, UNSURE, \nSPAM, IRRELEVANT, \nUNPROFESSIONAL, SCOPE, COMPLEX]
        %% ThreeClass & SevenClass --> Classification
        ReasoningTechnique[Reasoning Technique] --> Classification
        Examples[Use of Examples] --> Classification
    end
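
The four metrics in the diagram can be computed as sketched below. The source does not spell out the definitions, so this example makes two assumptions: REJECT is treated as the positive class (so a false positive is an acceptable comment that was rejected), and UNSURE predictions are excluded from accuracy, precision, and false positive rate and counted only in the unsure rate. The function name `moderation_metrics` and the sample labels are illustrative.

```python
def moderation_metrics(gold, predicted):
    """gold: 'ACCEPT' or 'REJECT' per comment.
    predicted: 'ACCEPT', 'UNSURE', or 'REJECT' per comment.
    Assumption: REJECT is the positive class; UNSURE is left undecided."""
    total = len(gold)
    unsure = sum(1 for p in predicted if p == "UNSURE")
    decided = [(g, p) for g, p in zip(gold, predicted) if p != "UNSURE"]

    tp = sum(1 for g, p in decided if g == "REJECT" and p == "REJECT")
    fp = sum(1 for g, p in decided if g == "ACCEPT" and p == "REJECT")
    tn = sum(1 for g, p in decided if g == "ACCEPT" and p == "ACCEPT")

    return {
        "accuracy": (tp + tn) / len(decided) if decided else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "unsure_rate": unsure / total if total else 0.0,
    }


# Made-up labels for six comments.
gold = ["ACCEPT", "ACCEPT", "ACCEPT", "REJECT", "REJECT", "ACCEPT"]
predicted = ["ACCEPT", "REJECT", "UNSURE", "REJECT", "ACCEPT", "ACCEPT"]

metrics = moderation_metrics(gold, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```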

Instructions: Three-Class Classification

Discussion Title: Improving Bowling Green / Warren County
Discussion Question: What do you believe should change in Bowling Green/Warren County in order to make it a better place to live, work and spend time?

---
You will be presented with comments posted on Polis discussion platform.
Classify each comment objectively based on whether it meets the given guidelines.

---
Classifications:
- ACCEPT: Comment is coherent, makes a suggestion, or presents a real problem or issue.
- UNSURE: Unclear whether the comment meets the guidelines for ACCEPT.
- REJECT: Comment should definitely be rejected for one of the reasons listed below.

---
Reasons for REJECT:
- SPAM: Comments which are spam and add nothing to the discussion.
- COMPLEX: Comments which state more than one idea. It is difficult to determine the where another person would agree or disagree.

---
Output format:
CLASSIFICATION: One of the following based on given guidelines: ACCEPT, UNSURE, REJECT.
THOUGHT: Express the reasoning for REJECT classification.
Am I certain: Answer with YES or NO. If unsure, state NO.
REASON: One of the following based on given guidelines: SPAM, COMPLEX
EXPLANATION: Provide an explanation for why the comment was classified as REJECT.
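
A response in the output format above can be turned into structured fields with a small parser. The sketch below is one possible approach, not the study's implementation: the function name `parse_response`, the sample response, and the choice to fall back to UNSURE when the classification is missing or malformed are all assumptions.

```python
import re

ALLOWED = {"ACCEPT", "UNSURE", "REJECT"}
FIELD_PATTERN = re.compile(
    r"^(CLASSIFICATION|THOUGHT|Am I certain|REASON|EXPLANATION):\s*(.*)$"
)


def parse_response(text):
    """Extract 'FIELD: value' lines from a model response."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_PATTERN.match(line.strip())
        if match:
            fields[match.group(1)] = match.group(2).strip()
    classification = fields.get("CLASSIFICATION", "").upper()
    if classification not in ALLOWED:
        # Assumed policy: never guess, route malformed output to a human.
        classification = "UNSURE"
    fields["CLASSIFICATION"] = classification
    return fields


# Made-up response text following the output format.
sample = """CLASSIFICATION: REJECT
THOUGHT: The comment is a string of random characters.
Am I certain: YES
REASON: SPAM
EXPLANATION: It adds nothing to the discussion."""

parsed = parse_response(sample)
malformed = parse_response("I think this comment is fine.")
print(parsed["CLASSIFICATION"], parsed["REASON"])
print(malformed["CLASSIFICATION"])
```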

Output: Three-Class Classification

Use of Examples

Second-Thought Technique

  • False Positives cause more harm
  • Allow the model to turn a REJECT into UNSURE
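
One plausible reading of this rule, sketched below, is a post-processing step: a REJECT stands only when the model also answers YES to the certainty question, and is otherwise downgraded to UNSURE. The function name `apply_second_thought` is hypothetical, and the source does not confirm that the downgrade was implemented this way.

```python
def apply_second_thought(classification, certain):
    """Downgrade an uncertain REJECT to UNSURE; leave everything else alone.

    classification: 'ACCEPT', 'UNSURE', or 'REJECT'
    certain: the model's answer to the certainty question ('YES' or 'NO')
    """
    if classification == "REJECT" and certain.strip().upper() != "YES":
        return "UNSURE"
    return classification


print(apply_second_thought("REJECT", "YES"))
print(apply_second_thought("REJECT", "NO"))
print(apply_second_thought("ACCEPT", "NO"))
```

The rule is deliberately one-sided: it can only reduce rejections, which matches the goal of limiting false positives at the cost of a higher unsure rate.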

Instructions: Seven-Class Classification

Discussion Title: Improving Bowling Green / Warren County
Discussion Question: What do you believe should change in Bowling Green/Warren County in order to make it a better place to live, work and spend time?

---
Classify each comment objectively based on the following guidelines:
- ACCEPT: mentions a real problem related to the discussion.
- ACCEPT: recommends a realistic and actionable solution related to the discussion.
- ACCEPT: makes a sincere suggestion related to the discussion.
- IRRELEVANT: frivolous, irrelevant, unrelated to the discussion.
- IRRELEVANT: does not contribute to the discussion in a meaningful way.
- SPAM: incoherent or lacks seriousness.
- SPAM: provides neither a problem nor a solution.
- UNPROFESSIONAL: the language is informal, colloquial, disrespectful or distasteful.
- SCOPE: cannot be addressed within the scope of original question.
- COMPLEX: introduces multiple ideas, even if they are related to the discussion.
- COMPLEX: discusses distinct problems, making it difficult to determine where another person would agree or disagree.
- UNSURE: may be accepted if it appears somewhat related to the discussion.

---
Output format:
CLASSIFICATION: One of the following based on given guidelines: ACCEPT, UNSURE, SPAM, IRRELEVANT, UNPROFESSIONAL, SCOPE, COMPLEX.
EXPLANATION: Provide an explanation for the classification.

Instructions: Comment Deconstruction

Output format:

PROBLEM: The specific problem mentioned in the comment. If only an action is suggested and no problem is explicitly mentioned, state None.
ACTION: What suggestion or change is proposed. If only a problem is mentioned and no action is suggested, state None.
HOW MANY IDEAS: Number of distinct ideas introduced in the comment.
THOUGHT: Deliberate about how the comment should be classified.
CLASSIFICATION: ACCEPT, UNSURE, SPAM, COMPLEX.
REASON: If comment was not classified as ACCEPT, explain.

Output: Comment Deconstruction and Thought Statements

Experimental Configurations

Config  Target Classes  Examples  Deconstruction  CoT Technique
1       3               No        No              N/A
2       3               Yes       No              N/A
3       3               No        No              Thought after rejection
4       3               Yes       No              Thought after rejection
5       7               No        No              N/A
6       7               No        No              Thought before decision
7       7               No        Yes             Thought before decision
8       7               No        Yes             N/A
9       3               No        Yes             Thought before decision

Topic Modeling

graph TD
    Vectors[Embedding Vectors]
    ReducedVectors[Reduced Embedding Vectors]
    Clusters[Hierarchical Clusters]
    BagOfWords["Bag of Words (Per-Cluster)"]
    Keywords[Significant Keywords]
    Labels[Topic Labels]
    Representation[Topic Representation]

    subgraph DataPreparation[Data Preparation]
        direction LR
        Statements -- Transformers --> Vectors -- UMAP --> ReducedVectors
    end

    subgraph KeywordExtraction[Keyword Extraction]
        direction LR
        BagOfWords -- c-TF-IDF --> Keywords
    end

    subgraph TopicRepresentation[Topic Representation]
        direction LR
        Representation -- LLM --> Labels
    end

    DataPreparation -- HDBSCAN --> Clusters
    Clusters -- Vectorizer --> KeywordExtraction
    KeywordExtraction -- MMR + POS Filtering --> TopicRepresentation
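
The MMR (maximal marginal relevance) step in the pipeline trades keyword relevance against redundancy. The sketch below is a generic, minimal version of that idea using toy vectors; the weighting parameter `lam`, the function names, and the vectors are assumptions for illustration and do not reproduce any specific library's implementation.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(topic_vec, candidates, k, lam=0.5):
    """Pick k keywords, balancing relevance to the topic against
    similarity to keywords already selected.

    candidates: dict mapping keyword -> vector
    lam: 1.0 means relevance only; lower values favour diversity
    """
    remaining = dict(candidates)
    selected = []
    while remaining and len(selected) < k:
        def score(word):
            relevance = cosine(remaining[word], topic_vec)
            redundancy = max(
                (cosine(remaining[word], candidates[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected


# Toy vectors (assumed values): "road" and "roads" are near-duplicates.
topic = [1.0, 1.0, 0.0]
keywords = {
    "road": [1.0, 0.2, 0.0],
    "roads": [1.0, 0.25, 0.0],
    "parking": [0.1, 1.0, 0.0],
}

diverse = mmr(topic, keywords, k=2, lam=0.5)
relevance_only = mmr(topic, keywords, k=2, lam=1.0)
print(diverse)
print(relevance_only)
```

With `lam=1.0` the two near-duplicate keywords are both chosen; lowering `lam` swaps the duplicate for a less similar keyword.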

UMAP

flowchart LR
    Embeddings["Transformer Embeddings \n 1024-4096 dimensions"] --> UMAP{"UMAP"}
    UMAP --> ReducedEmbeddings["Reduced Embeddings \n 10-100 dimensions"]

HDBSCAN

c-TF-IDF

\[ \text{c-TF-IDF}(\text{word}, \text{cluster}) = \frac{\text{word's frequency within that cluster}}{\text{word's frequency in the entire dataset}} \]

  • Used spaCy to remove stop words and optimize signal-to-noise ratio
  • Fed into a language model to generate concise topic labels
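
The ratio above is a simplified view. The sketch below follows the class-based weighting described in the BERTopic paper (Grootendorst 2022), where a word's frequency in a cluster is multiplied by log(1 + A / f), with A the average number of words per cluster and f the word's frequency across all clusters. Normalising term frequency by cluster size, the function name `c_tf_idf`, and the toy token lists are assumptions of this example.

```python
import math
from collections import Counter


def c_tf_idf(cluster_docs):
    """cluster_docs: dict mapping cluster id -> list of token lists.
    Returns dict mapping cluster id -> {word: score}."""
    # Merge all documents of a cluster into one bag of words.
    bags = {
        cluster: Counter(token for doc in docs for token in doc)
        for cluster, docs in cluster_docs.items()
    }
    # Frequency of each word across all clusters.
    total = Counter()
    for bag in bags.values():
        total.update(bag)
    avg_words_per_cluster = sum(total.values()) / len(bags)

    scores = {}
    for cluster, bag in bags.items():
        cluster_size = sum(bag.values())
        scores[cluster] = {
            word: (count / cluster_size)
            * math.log(1 + avg_words_per_cluster / total[word])
            for word, count in bag.items()
        }
    return scores


# Toy tokenised statements (stop words already removed).
clusters = {
    0: [["road", "repair", "city"], ["road", "traffic"]],
    1: [["park", "trees", "city"], ["park", "playground"]],
}

scores = c_tf_idf(clusters)
for cluster, words in scores.items():
    top = max(words, key=words.get)
    print(cluster, top, round(words[top], 3))
```

Words that appear in every cluster, such as "city" in the toy data, score lower than words of equal frequency that are specific to one cluster.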

Argument Generation

  • Actionable insights: arguments that urge specific actions to address issues
  • Problems and solutions proposed by participants
  • LLM synthesizes arguments from comments within each topic
  • Advocate for specific arguments urging actions to address issues
  • Filter these arguments to derive actionable insights

flowchart LR
    topic[Topic Description]
    comments[User Comments]
    areas[Areas of Improvement]
    agreeability([Filter by Agreeability \n 'agreeability threshold'])
    problems[Problems Discussed]
    solutions[Solutions Identified]
    arguments[Actionable Insights]
    selectTop([Select Top Insights per topic \n 'n_insights_per_topic'])
    report[Argument Map]
    comments --> agreeability & topic --> areas --> problems & solutions --> arguments
    arguments --> selectTop --> report

Argument Scoring

  • Goal: Quantify acceptance of each generated argument
  • Task: Identify comments that support each argument
  • Count the individuals that voted positively on supporting comments
  • Calculate an “acceptance” factor to indicate the degree of consensus

flowchart BT
    ind1["🙂 Alice"]
    ind2["🙂 Bob"]
    ind3["🙂 Charlie"]
    ind4["🙂 Daisy"]
    ind5["🙂 Eve"]
    arg1["🤖 Argument \n +3, -2"]
    s1["🙂 Comment 1 \n +3, -1"] -- SUPPORT --> arg1
    s2["🙂 Comment 2 \n +2, -1"] -- SUPPORT --> arg1
    ind1:::support --> s1
    ind2:::refute --x s1
    ind3:::support --> s1
    ind3:::support --> s2
    ind4:::support --> s1
    ind4:::support --> s2
    ind5:::refute --x s2

    classDef support fill:#8c8
    classDef refute fill:#c88
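
The diagram's numbers can be reproduced with a short sketch that counts individuals rather than raw votes. The source does not define the "acceptance" factor, so the formula used here (supporters divided by supporters plus refuters) is an assumption, as is the handling of people with mixed votes, who are classified by the sign of their net vote. The function name `score_argument` is illustrative.

```python
from collections import defaultdict

# Votes from the diagram: +1 = agree, -1 = disagree.
comment_votes = {
    "Comment 1": {"Alice": 1, "Bob": -1, "Charlie": 1, "Daisy": 1},
    "Comment 2": {"Charlie": 1, "Daisy": 1, "Eve": -1},
}


def score_argument(supporting_comments, votes):
    """Count unique individuals for and against an argument.

    Each person is counted once, however many supporting comments
    they voted on (assumption: sign of their net vote decides the side).
    """
    net = defaultdict(int)
    for comment_id in supporting_comments:
        for person, vote in votes[comment_id].items():
            net[person] += vote
    supporters = {person for person, value in net.items() if value > 0}
    refuters = {person for person, value in net.items() if value < 0}
    decided = len(supporters) + len(refuters)
    acceptance = len(supporters) / decided if decided else 0.0
    return len(supporters), len(refuters), acceptance


n_support, n_refute, acceptance = score_argument(
    ["Comment 1", "Comment 2"], comment_votes
)
print(f"+{n_support}, -{n_refute}, acceptance = {acceptance:.2f}")
```

Charlie and Daisy each agreed with both comments but are counted once, which is why the argument scores +3 rather than +5.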

Potential Biases in Argument Generation

  • Some comments, especially those posted earlier, may receive more votes than others
    • Use the ratio of agreement votes to total votes
  • Certain topics are more popular and have more comments than others
    • Generate a balanced number of arguments for each topic
  • Certain controversial topics are heavily downvoted
    • Comments: Filter by quantiles instead of fixed thresholds within each topic
    • Arguments: Select fixed number of “best arguments” from each topic
  • Some people vote more than others
    • Count the individuals that support an argument over hard vote count
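
Two of the mitigations above, agreement ratios and per-topic quantile filtering, can be sketched together. The nearest-rank quantile rule, the field names, and the sample vote counts are assumptions chosen for illustration.

```python
def agreement_ratio(agree, disagree):
    """Share of agree votes, independent of how many votes were cast."""
    total = agree + disagree
    return agree / total if total else 0.0


def filter_by_quantile(comments, quantile=0.5):
    """Keep, within each topic, the comments whose agreement ratio is at or
    above that topic's own quantile cutoff (nearest-rank, an assumed rule)."""
    by_topic = {}
    for comment in comments:
        by_topic.setdefault(comment["topic"], []).append(comment)

    kept = []
    for items in by_topic.values():
        ratios = sorted(agreement_ratio(c["agree"], c["disagree"]) for c in items)
        index = min(int(quantile * len(ratios)), len(ratios) - 1)
        cutoff = ratios[index]
        kept.extend(
            c for c in items
            if agreement_ratio(c["agree"], c["disagree"]) >= cutoff
        )
    return kept


# Made-up data: "taxes" is controversial and heavily downvoted.
comments = [
    {"id": "r1", "topic": "roads", "agree": 9, "disagree": 1},
    {"id": "r2", "topic": "roads", "agree": 8, "disagree": 2},
    {"id": "r3", "topic": "roads", "agree": 6, "disagree": 4},
    {"id": "r4", "topic": "roads", "agree": 5, "disagree": 5},
    {"id": "t1", "topic": "taxes", "agree": 4, "disagree": 6},
    {"id": "t2", "topic": "taxes", "agree": 3, "disagree": 7},
    {"id": "t3", "topic": "taxes", "agree": 2, "disagree": 8},
    {"id": "t4", "topic": "taxes", "agree": 1, "disagree": 9},
]

kept = filter_by_quantile(comments, quantile=0.5)
print([c["id"] for c in kept])
```

A fixed cutoff of 0.5 would discard every "taxes" comment in this toy data, whereas the per-topic quantile keeps the relatively best-received ones from each topic.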

Argument Mapping

  • Used Argdown syntax to generate argument maps
  • Developed a grammar generator to convert data into Argdown format
  • Generated argument maps for each topic to visualize the structure of the debate
===
sourceHighlighter:
    removeFrontMatter: true
webComponent:
    withoutMaximize: true
    height: 500px
===

# Argdown Syntax Example

[Statement]: Argdown is a simple syntax for defining argumentative structures, inspired by Markdown.
  + Writing a list of **pros & cons** in Argdown is as simple as writing a twitter message.
  + But you can also **logically reconstruct** more complex relations.
  + You can export Argdown as a graph and create **argument maps** of whole debates.
  - Not a tool for creating visualizations, but for **structuring arguments**.

<Argument>: Argdown is an excellent tool and should be used by the city of Bowling Green, KY.

[Statement]
  +> <Argument>
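
A converter from structured data to Argdown text can be quite small. The sketch below is not the grammar generator developed in this work; it only emits the statement-with-pros-and-cons pattern shown in the example above. The function name `to_argdown` and the sample strings are hypothetical, and the sketch does not escape characters that have special meaning in Argdown.

```python
def to_argdown(title, text, pros=(), cons=()):
    """Render one statement with supporting (+) and opposing (-) points."""
    lines = [f"[{title}]: {text}"]
    lines += [f"  + {point}" for point in pros]
    lines += [f"  - {point}" for point in cons]
    return "\n".join(lines)


argdown_text = to_argdown(
    "Road Repair",
    "The city should prioritise road maintenance.",
    pros=["Potholes damage vehicles."],
    cons=["Repairs cause short-term traffic delays."],
)
print(argdown_text)
```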

Results

Comment Moderation

  • Accuracy generally the same
  • Unsure rate increases with complexity of task
  • Deconstruction reduces false positive rate
  • CoT not as effective as deconstruction
  • Examples must be specific to dataset

Configurations

  • 1: Baseline
  • 2: Examples
  • 3: Thought
  • 4: Thought + Examples
  • 5: 7-class Baseline
  • 6: Thought
  • 7: Thought + Deconstruction
  • 8: Deconstruction
  • 9: Deconstruction, 3-class

UMAP 2D Projection of Reduced Embeddings

Final parameters:

  • n_neighbors = 8
  • min_dist = 0
  • n_components = 32
  • metric = 'cosine'

UMAP Projection of american-assembly.bowling-green dataset

Network connectivity graph that shows distance between high-density regions.

Topic Distribution

Each bar represents a topic cluster, with y-axis representing statement count; Topic -1 is reserved for outliers that do not initially belong to a cluster.

Upon reassigning outliers, each topic has a larger number of statements; we reassign all topics ensuring that no statement is discarded as noise.
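
One way to reassign outliers, sketched below, is to attach each statement labelled -1 to the topic whose centroid is most similar to its embedding. The source does not state which reassignment strategy was used, so nearest-centroid matching by cosine similarity is an assumption, as are the function names and the toy two-dimensional vectors. The sketch also assumes at least one non-outlier topic exists.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(v[i] for v in vectors) / count for i in range(len(vectors[0]))]


def reassign_outliers(embeddings, labels):
    """Replace every -1 label with the label of the most similar topic centroid."""
    clusters = {}
    for vector, label in zip(embeddings, labels):
        if label != -1:
            clusters.setdefault(label, []).append(vector)
    centroids = {label: centroid(vectors) for label, vectors in clusters.items()}

    new_labels = []
    for vector, label in zip(embeddings, labels):
        if label == -1:
            label = max(centroids, key=lambda c: cosine(vector, centroids[c]))
        new_labels.append(label)
    return new_labels


# Toy embeddings: the last two statements start as outliers.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.8, 0.3], [0.2, 0.7]]
labels = [0, 0, 1, 1, -1, -1]

new_labels = reassign_outliers(embeddings, labels)
print(new_labels)
```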

Statement Distribution

Statement Distribution after outlier reassignment

Hierarchical Topic Structure

Argument Generation and Scoring

Opioid Epidemic and Healthcare

Argument Generation and Scoring

Community Enrichment

Conclusion

LLMs in structuring online debates

  • Potential of LLMs for simple tasks
  • Chaining simple tasks for complex reasoning
  • Discovering topics in a large dataset and generating valuable new insights
  • Risk of hallucinations and incorrect output
  • Enhancing democratic processes by enabling public discourse
  • Critical need for ethical and inclusive technology deployment

Contributions to Advancing Policy Insights

  • Enhanced Moderation Techniques
  • Topic Modeling
  • Argument Generation
  • Argument Scoring and Mapping

Limitations and Challenges

  • Augmenting vs replacing human moderation processes
  • LLMs’ limitations in processing complex instructions and sentences
    • Complex instructions
    • Relationship modeling based on double and triple negatives
  • Reliability and bias

Future Research Directions

  • Semantic extraction and reasoning during discourse
  • Exploring connections across topics
  • Generalizing techniques to platforms like Kialo, Hacker News

References

Conklin, E. Jeffrey. 2006. Dialogue Mapping: Building Shared Understanding of Wicked Problems. Chichester, England ; Hoboken, NJ: Wiley.
Conklin, Jeff, and Michael L. Begeman. 1988. “gIBIS: A Hypertext Tool for Exploratory Policy Discussion.” ACM Transactions on Information Systems 6 (4): 303–31. https://doi.org/10.1145/58566.59297.
Conklin, Jeff, Albert Selvin, Simon Buckingham Shum, and Maarten Sierhuis. 2001. “Facilitated Hypertext for Collective Sensemaking: 15 Years on from gIBIS.” In Proceedings of the 12th ACM Conference on Hypertext and Hypermedia, 123–24. HYPERTEXT ’01. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/504216.504246.
Grootendorst, Maarten. 2022. “BERTopic: Neural Topic Modeling with a Class-Based TF-IDF Procedure.” arXiv. http://arxiv.org/abs/2203.05794.
Hadfi, Rafik, and Takayuki Ito. 2022. “Augmented Democratic Deliberation: Can Conversational Agents Boost Deliberation in Social Media?” In Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems, 1794–98. AAMAS ’22. Richland, SC: International Foundation for Autonomous Agents and Multiagent Systems.
Ito, Takayuki, Rafik Hadfi, and Shota Suzuki. 2022. “An Agent That Facilitates Crowd Discussion.” Group Decision and Negotiation 31 (3): 621–47. https://doi.org/10.1007/s10726-021-09765-8.
Klein, Mark. 2022. “Crowd-Scale Deliberation For Complex Problems: A Progress Report.” https://doi.org/10.13140/RG.2.2.34676.01928.
Klein, Mark, and Luca Iandoli. 2008. “Supporting Collaborative Deliberation Using a Large-Scale Argumentation System: The MIT Collaboratorium.” SSRN Scholarly Paper. Rochester, NY. https://doi.org/10.2139/ssrn.1099082.
Kunz, W., and H. W. J. Rittel. 1970. Issues as Elements of Information Systems. Working Paper No. 131. Center for Planning and Development Research, Institute of Urban and Regional Development, University of California. https://books.google.com/books?id=B-MaAQAAMAAJ.
Small, Christopher. 2021. “Polis: Scaling Deliberation by Mapping High Dimensional Opinion Spaces.” RECERCA. Revista de Pensament i Anàlisi, July. https://doi.org/10.6035/recerca.5516.