ArgMap

On this page

  • Import Packages and Setup Environment
  • Load Dataset
  • Grammar Specification for Text Generation
  • Initialize Language Model
  • Generate Topic Titles
    • TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ
    • mistralai/Mistral-7B-Instruct-v0.2
    • meta-llama/Llama-2-13b-chat-hf
    • openai-community/gpt2
  • Chat Instruction Format
    • Mistral without context reset
    • Mistral with context reset
  • View source

Topic Titles using LLMs

Test various language models to generate appealing and representative topic titles
Author

Sonny Bhatia

Published

February 18, 2024

Import Packages and Setup Environment

Code
from polis.data_model import Comments, Topics, HierarchicalTopics, Arguments, ArgumentCommentMap, Summary
import math
import sys
import os
from pprint import pprint
from tqdm.notebook import tqdm

import polars as pl
from argmap.dataModel import DataModel, Summary, Comments

from dotenv import load_dotenv
load_dotenv()

# this allows categorical data from various sources to be combined and handled gracefully; performance cost is acceptable
pl.enable_string_cache()
Code
import guidance
from guidance import models, gen, select, instruction, one_or_more, zero_or_more

Load Dataset

Code
from IPython.display import display_markdown

# for now, work with one source
DATASET = "american-assembly.bowling-green"
# "scoop-hivemind.biodiversity"
# "scoop-hivemind.freshwater"
# "scoop-hivemind.taxes"
# "scoop-hivemind.ubi"
# "scoop-hivemind.affordable-housing"
# "london.youth.policing"
# "canadian-electoral-reform"
# "brexit-consensus"
# "ssis.land-bank-farmland.2rumnecbeh.2021-08-01"

summary = Summary(DATASET)

comments = Comments(DATASET).load_from_parquet()
topics = Topics(DATASET).load_from_parquet()
hierarchicalTopics = HierarchicalTopics(DATASET).load_from_parquet()

display_markdown(f"""
### Dataset: {DATASET}
#### {summary.topic}
#### {summary.get('conversation-description')}
#### Full Report: [{summary.url}]({summary.url})
#### Topic Count: {topics.df.height}
""", raw=True)

print("Topics Dataset Overview:")
topics.glimpse()

Grammar Specification for Text Generation

Code
@guidance(stateless=True)
def generate_line(lm, name: str, max_tokens=50, temperature=0):
    return lm + gen(name=name, max_tokens=max_tokens, temperature=temperature, list_append=True, stop=['\n'])

@guidance(stateless=True)
def generate_phrase(lm, name: str, max_tokens=50, temperature=0):
    return lm + gen(name=name, max_tokens=max_tokens, temperature=temperature, list_append=True, stop=['\n', '.'])

@guidance(stateless=True)
def generate_number(lm, name: str, min: int, max: int):
    return lm + select(list(range(min, max+1)), name=name, list_append=True)
Code
@guidance
def generate_topic_titles(lm, main_title, question, df):
    lm += f"""\
    The following is a dataset of comments from an online discussion about {main_title}. Assign a terse title to each topic based on the given keywords. Avoid repetitive phrases.

    Question: {question}
    """

    for topic, keywords, docs in df.rows():
        lm += f"""
        Topic {topic}:
        Keywords: {', '.join(keywords)}
        Example Comments: {'; '.join(docs)}
        Topic Title: """ + gen(name='topic_title', list_append=True, stop=['\n', '.'], max_tokens=20, temperature=1) + "\n"

    return lm

Initialize Language Model

Code
import torch

if not torch.cuda.is_available():
    raise Exception("No CUDA device found")

mistral = models.TransformersChat("mistralai/Mistral-7B-Instruct-v0.2", device_map="auto") # ~31GB RAM

# mistral1 = models.TransformersChat("mistralai/Mistral-7B-v0.1", device_map="auto") ~26GB RAM
# llama = models.TransformersChat("meta-llama/Llama-2-13b-chat-hf", device_map="auto") # ~50GB RAM
# olmo = models.TransformersChat("allenai/OLMo-7B", device_map="auto") # unable to run without breaking environment
# mixtral = models.TransformersChat("TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ", device_map="auto") # #26GB RAM
# gpt2 = models.TransformersChat("openai-community/gpt2-large", device_map="auto") # ~4GB RAM

Generate Topic Titles

TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ

Code
# grab topic keywords
df_topic_keywords_docs = df_topics.select('Topic', 'Representation', 'Representative_Docs')

# instruct LLM to generate topic titles
lm = mixtral + generate_topic_titles(summary['topic'], summary['conversation-description'], df_topic_keywords_docs)
The following is a dataset of comments from an online discussion about Improving Bowling Green / Warren County. Assign a terse title to each topic based on the given keywords. Avoid repetitive phrases.

Question: What do you believe should change in Bowling Green/Warren County in order to make it a better place to live, work and spend time?

Topic 0:
Keywords: road, drivers, traffic, lane, roads, bypass, intersection, turn, congestion, cemetery
Example Comments: Widen Cemetry Road, Russellville Road, Lover’s Lane, Campbell Lane, The Old ByPass, and Scotsville Road from old ByPass to Natcher Parkway.; Smallhouse Rd (at Campbell Ln) heading into town needs to be widened to have room for three lanes (left turn, straight, right turn).; Access Rd, east side of Scottsville Rd. in shopping dist causes congestion. Give right of way to access road or cross streets, not a mix!
Topic Title: Widen roads and manage traffic

Topic 1:
Keywords: fairness, ordinance, noise, improvement, city, separation, lawns, code, food, ideas
Example Comments: City officials need to work with WKU admin. to address the noise & public nuisance problems with the frat houses and homeowners downtown.; NO to the "Fairness" Ordinance. Current law is sufficient.  ZERO reported instances where LGBTQ people have been discriminated against in BG; The city should craft a very simple fairness ordinance, saying no discrimination acquiring housing, but not giving LGBTQ EEOC Bypass @work
Topic Title: Enact fairness and noise ordinances

Topic 2:
Keywords: drug, marijuana, opioid, doctors, drugs, pain, deaths, dealers, sex, possession
Example Comments: The fact that there is not as much coverage about African American deaths from cocaine as there is about deaths in the white community from opioids, even though there are as many of them, is evidence of institutionalized racism.; In order to combat teen pregnancy and STD rates, high schools in Bowling Green should offer comprehensive, medically accurate sex education.; In order to better combat the opioid epidemic, it is time to view drug addiction as a health problem rather than a criminal justice problem.
Topic Title: Decriminalize drug use and provide education

Topic 3:
Keywords: homeless, parks, park, house, hangout, tiny house, community, aid, benefit, size
Example Comments: We need to replace "Room at the Inn" services with temporary decent housing connected to aid agencies for the homeless; I would like to see more aid for the homeless whether: tiny house park, health clinic, rehabilitation, job readiness sources.; Homeless people deserve full size places like studio apartments.  Not tiny homes like a smurf would desire in a tiny house park.
Topic Title: Provide larger, better housing for homeless

Topic 4:
Keywords: programs, activities, weekend, training, higher education, youth, museum, credit, teenagers, job
Example Comments: A favorite place to visit (elsewhere) is a park filled with gently used toys (donated).  It inspires imagination, creativity and sharing.; We  a true junior college which offers credit / non credit courses like many other states from basket weaving to cyber security; Need comprehesive probation rehab programs - include addiction treatment, life & job skills & ideas securing housing, training/job etc
Topic Title: Create rehab and educational programs

Topic 5:
Keywords: school, schools, parents, public schools, public school, kids, choice, education, charter, districts
Example Comments: Future charter schools would be a financial burden to public school funding, unless they are held to accountability by local school boards.; some children with cultural differences or dietary needs cannot eat a typical lunch at school please provide special diet options.; Both school districts shamefully manipulate low economic status parents to gain Title I funding. It should stop.
Topic Title: Improve public schools and distribute federal funds fairly

Topic 6:
Keywords: vote, officials, proposal, project, power, seller, influence, local officials, funds, unelected boards
Example Comments: Officials that use social media accounts in a professional capacity should not be allowed to block constituents except under rare conditions like being threatened.; Big names and corporations get away with too much. We need fairer regulation and level playing fields for the small and big guys alike.; Local elected government officials should not be able to hold a seat and also be on the TIF board or other unelected boards.
Topic Title: Distribute balance of power and regulate how power is used

Topic 7:
Keywords: campus, program, university, students, local businesses, miles, restaurants, courses, mental health, center
Example Comments: Residential Parking Permits for areas near WKU. We can rescue our yards from parking. We pay property taxes for the students' convenience,; Provide a food assistance program for college students as students lots of times do not qualify for food stamps due to living on campus; Make areas around WKU's campus livable for college students. Eliminating section 8 housing will lead to less crime & lower cost of attendance.
Topic Title: Provide resources for students and local businesses

Topic 8:
Keywords: river, immigrants, refugees, tourists, demographic, riverfront, communities, ways, art, immigrant
Example Comments: Need engaging art installations throughout the city. People will be able to experience art, have fun, and share on social media.; The very low number of immigrants coming to our county last year was a good thing. Saves costs: Schools, courts, social services-Our taxes!; The Barren River water front needs to be improved on both sides of the river and include  canoeing and kayaking water features in the river.
Topic Title: Engage with immigrant communities and improve riverfront

Topic 9:
Keywords: parking, space, garage, yard, square, shade, mall, trees, downtown, pedestrian
Example Comments: Better parking around the square would draw in more college students who like to shop but can not find parking downtown.; Owners of large parking lots should be required to break them up with pockets of green space and trees; perhaps create a walkable green path; Build parking garage behind Spencer’s, remove some parking from Ft. Square. Add a green space on top of garage! Soccer field or rooftop food
Topic Title: Design green urban spaces

Topic 10:
Keywords: police, cops, parking, enforcement, traffic, accidents, cameras, officers, tickets, presence
Example Comments: Houston "courtesy police" maintain safety on access rds enforcing right of way & parking/litter rules, assist traffic flow - we need!; BG needs to install traffic cameras that help police intersections like Gallatin, TN.  This would reduce accidents, traffic flow and add $; BGDN should publish these BGPD stats weekly – 1) traffic citations 2) vehicle accidents reported; 3) traffic accidents resulting in injury.
Topic Title: Improve parking enforcement and increase police presence

Topic 11:
Keywords: internet, cable, fiber, competition, rates, tv, choices, option, media, companies
Example Comments: BGMU needs to offer residential fiber internet as a UTILITY, using Chattanooga as a perfect example.; Cable companies need competition and not be allowed to raise rates of  customers. Once you sign on, that is your rate unless you upgrade.; Encourage growth of OTA tv stations. Lots of cord cutters and shavers enjoy the HD content they can get with an antenna.
Topic Title: Increase connectivity's speed and affordability

Topic 12:
Keywords: temp, wage, benefits, openings, minimum, retirement, employee, employer, wages, companies
Example Comments: Temp service need to pay higher hourly rate to temp than hiring firm pays their FTE to account for reduced economic security; City should bar garbage token health plans designed to prevent temp services paying $166/month Obamacare emp. fee. Let ppl get subsidy.; City needs to bar big COs employing from half a dozen temp services to keep 100s ineligble for FMLA due to less than 50 ppl/per 'employer'
Topic Title: Improve wage and benefits for temp workers

Topic 13:
Keywords: rental, rent, renters, rental property, properties, prices, property, landlords, units, laws
Example Comments: Rental property taxes should be reduced for private landlords to encourage them to develop more, as opposed to large ones like Chandler.; Stronger tenant and renters rights. Landlord accountability for keeping properties in good shape. So many overpriced dilapidated rentals.; Overdevelopment of rental property is detrimental to cities.  Renters do not pay property tax.  Schools suffer when the tax base is eroded.
Topic Title: Manage growth of residential property market

Topic 14:
Keywords: sidewalks, bike, trails, walking, trail, walkers, great addition, biking, ramps, small ramps
Example Comments: Install new sidewalks along major roads, like Three Springs, and city-wide bike routes, to decrease likelihood of accidents; Make it a walking city. Sidewalks along the main avenues (scottsville rd, campbell ln, 31 bypass) with plants, benches, art pieces.; Have a Dual Slalom Course, Pump Track & Bike Skills Course co-located by the Low Hollow Mountain Bike Trail at Weldon Peete Park.
Topic Title: Connect urban spaces through bike and pedestrian-friendly infrastructure

Topic 15:
Keywords: water, fire, lines, population, growth, bills, new schools, septic tanks, new roads, new subdivisions
Example Comments: Need new schools in Warren County to keep up with population growth, CTE is falling down, Drakes and Greenwood high are way overcrowded; County needs to extend water lines in Tuckertown area. About a dozen homes don't have county water or fire hydrants!; Tuckertown area needs fire protection. Currently rated Class 10 expensive insurance. Water lines are close by on Otter Gap & Tuckertown Rds.
Topic Title: Provide fire protection and manage development to ensure public safety

Topic 16:
Keywords: animals, smoking, pets, leashes, leash, dog, owners, ordinance, pet owners, stray
Example Comments: Better ordinances on keeping animals on leashes or in fenced areas are needed.; There should never be a leash law in the county - neighborhood dogs protect, play with and teach kids. It was a factor in living in county.; Relax restrictions on fencing in Home Owners Associations to promote people keeping animals on leashes or in fenced yards.
Topic Title: Balance pet and owner rights through sensible animal laws

Topic 17:
Keywords: swimming, complex, swim, pools, lessons, facilities, sports, safety, year, therapy
Example Comments: Bowling Green needs more indoor sports facilities, particularly a large complex with multiple indoor soccer fields, for youth.; BG's Kummer Little gym is open limited hours for indoor track. On other hand, County has new gyms that are open lots of hrs. Inequitable.; Indoor swimming pools for public use and therapy year around without time restrictions due to special swim leagues. Encourage healthier life
Topic Title: Provide affordable indoor sports and pool facilities

Topic 18:
Keywords: historic preservation, preservation, planning, building, buildings, historic building, urban planning, heritage, retail businesses, banks
Example Comments: We need urban planning to ensure that mixed use housing, shopping and new neighborhoods are created to make our community better; Historic preservation of homes and buildings needs countywide attention to maintain our unique architectural heritage.; We must ensure historic preservation codes are properly enforced in our historic districts; lack of consistency erodes our unique aesthetic.
Topic Title: Balancing historical preservation and urban development

Topic 19:
Keywords: recycling, litter, trash, bins, services, yards, junk, stronger enforcement, outdated equipment, dump
Example Comments: BG has a bad problem with littering.  My husband and I fill up to 5 recycle bins every week between downtown and campus, cleaning it all up.; Make Scott Waste follow their contract - Return trash cans next to the house and not leave them on the sidewalks.  Keep BG clean!; Recycling services need to be improved and modernized. Current contractor uses outdated equipment, services are messy, company unresponsive.
Topic Title: Improve trash and garbage management

Topic 20:
Keywords: store, food, farmers, grocery, marketing, fresh food, local chefs, brewers, distance, eggs
Example Comments: BG needs to develop a marketing campaign to draw Warren Co and surrounding counties to downtown for eating, shopping, and entertainment.; A fresh food grocery store to be built in Delafield and Morgantown rd areas within walking distance of residents.; We need more businesses like White Squirrel and Nats in BG. They add character to our town. Done with the Olive Garden and Belks.
Topic Title: Localize and diversify food and retail options

Topic 21:
Keywords: tax, homelessness, cents, occ, fund, homeless people, cuts, budget, taxes, overheads
Example Comments: Lower taxes by auditing every department for total accountability & make cuts where possible. Stop city government waste.; City had 109 homeless students in 2017, County Had 112.  Housing those families temporarily is worth raising tax rate from 1.85% to 1.91%.; If city eliminates homelessness via occupational taxes, employers of homeless people need to reimburse city via special occ. tax penalty
Topic Title: Manage cost of addressing homelessness

Topic 22:
Keywords: zoning, planning, land, apartments, single family, family, affected area, country, changes, residences
Example Comments: Too many apartments in Plano. Nice little country setting I bought my home in forever ruined!; We are loosing the city’s long established neighborhoods to multi family developments that do not fit in with the single family residences; Planning & Zoning Board needs an overhaul. Fewer builders appointed, fairness to opposition, stop rubber stamping by P&Z and City Comm.
Topic Title: Balance development and zoning to ensure better neighborhoods

mistralai/Mistral-7B-Instruct-v0.2

Code
# grab topic keywords
df_topic_keywords_docs = df_topics.select('Topic', 'Representation', 'Representative_Docs')

# instruct LLM to generate topic titles
lm = mistral + generate_topic_titles(summary['topic'], summary['conversation-description'], df_topic_keywords_docs)
The following is a dataset of comments from an online discussion about Improving Bowling Green / Warren County. Assign a terse title to each topic based on the given keywords. Avoid repetitive phrases.

Question: What do you believe should change in Bowling Green/Warren County in order to make it a better place to live, work and spend time?

Topic 0:
Keywords: road, drivers, traffic, lane, roads, bypass, intersection, turn, congestion, cemetery
Example Comments: Widen Cemetry Road, Russellville Road, Lover’s Lane, Campbell Lane, The Old ByPass, and Scotsville Road from old ByPass to Natcher Parkway.; Smallhouse Rd (at Campbell Ln) heading into town needs to be widened to have room for three lanes (left turn, straight, right turn).; Access Rd, east side of Scottsville Rd. in shopping dist causes congestion. Give right of way to access road or cross streets, not a mix!
Topic Title: Improving Road Infrastructure and Traffic Flow

Topic 1:
Keywords: fairness, ordinance, noise, improvement, city, separation, lawns, code, food, ideas
Example Comments: City officials need to work with WKU admin. to address the noise & public nuisance problems with the frat houses and homeowners downtown.; NO to the "Fairness" Ordinance. Current law is sufficient.  ZERO reported instances where LGBTQ people have been discriminated against in BG; The city should craft a very simple fairness ordinance, saying no discrimination acquiring housing, but not giving LGBTQ EEOC Bypass @work
Topic Title: Equal Rights and Fairness in the Community

Topic 2:
Keywords: drug, marijuana, opioid, doctors, drugs, pain, deaths, dealers, sex, possession
Example Comments: The fact that there is not as much coverage about African American deaths from cocaine as there is about deaths in the white community from opioids, even though there are as many of them, is evidence of institutionalized racism.; In order to combat teen pregnancy and STD rates, high schools in Bowling Green should offer comprehensive, medically accurate sex education.; In order to better combat the opioid epidemic, it is time to view drug addiction as a health problem rather than a criminal justice problem.
Topic Title: Health and Substance Abuse

Topic 3:
Keywords: homeless, parks, park, house, hangout, tiny house, community, aid, benefit, size
Example Comments: We need to replace "Room at the Inn" services with temporary decent housing connected to aid agencies for the homeless; I would like to see more aid for the homeless whether: tiny house park, health clinic, rehabilitation, job readiness sources.; Homeless people deserve full size places like studio apartments.  Not tiny homes like a smurf would desire in a tiny house park.
Topic Title: Homelessness and Housing

Topic 4:
Keywords: programs, activities, weekend, training, higher education, youth, museum, credit, teenagers, job
Example Comments: A favorite place to visit (elsewhere) is a park filled with gently used toys (donated).  It inspires imagination, creativity and sharing.; We  a true junior college which offers credit / non credit courses like many other states from basket weaving to cyber security; Need comprehesive probation rehab programs - include addiction treatment, life & job skills & ideas securing housing, training/job etc
Topic Title: Education, Youth Development and Community Programs

Topic 5:
Keywords: school, schools, parents, public schools, public school, kids, choice, education, charter, districts
Example Comments: Future charter schools would be a financial burden to public school funding, unless they are held to accountability by local school boards.; some children with cultural differences or dietary needs cannot eat a typical lunch at school please provide special diet options.; Both school districts shamefully manipulate low economic status parents to gain Title I funding. It should stop.
Topic Title: Public Education Funding and Equity

Topic 6:
Keywords: vote, officials, proposal, project, power, seller, influence, local officials, funds, unelected boards
Example Comments: Officials that use social media accounts in a professional capacity should not be allowed to block constituents except under rare conditions like being threatened.; Big names and corporations get away with too much. We need fairer regulation and level playing fields for the small and big guys alike.; Local elected government officials should not be able to hold a seat and also be on the TIF board or other unelected boards.
Topic Title: Ethics, Transparency and Public Officials

Topic 7:
Keywords: campus, program, university, students, local businesses, miles, restaurants, courses, mental health, center
Example Comments: Residential Parking Permits for areas near WKU. We can rescue our yards from parking. We pay property taxes for the students' convenience,; Provide a food assistance program for college students as students lots of times do not qualify for food stamps due to living on campus; Make areas around WKU's campus livable for college students. Eliminating section 8 housing will lead to less crime & lower cost of attendance.
Topic Title: College Campus and Student Life

Topic 8:
Keywords: river, immigrants, refugees, tourists, demographic, riverfront, communities, ways, art, immigrant
Example Comments: Need engaging art installations throughout the city. People will be able to experience art, have fun, and share on social media.; The very low number of immigrants coming to our county last year was a good thing. Saves costs: Schools, courts, social services-Our taxes!; The Barren River water front needs to be improved on both sides of the river and include  canoeing and kayaking water features in the river.
Topic Title: Multiculturalism and tourism

Topic 9:
Keywords: parking, space, garage, yard, square, shade, mall, trees, downtown, pedestrian
Example Comments: Better parking around the square would draw in more college students who like to shop but can not find parking downtown.; Owners of large parking lots should be required to break them up with pockets of green space and trees; perhaps create a walkable green path; Build parking garage behind Spencer’s, remove some parking from Ft. Square. Add a green space on top of garage! Soccer field or rooftop food
Topic Title: Parking and Urban Design

Topic 10:
Keywords: police, cops, parking, enforcement, traffic, accidents, cameras, officers, tickets, presence
Example Comments: Houston "courtesy police" maintain safety on access rds enforcing right of way & parking/litter rules, assist traffic flow - we need!; BG needs to install traffic cameras that help police intersections like Gallatin, TN.  This would reduce accidents, traffic flow and add $; BGDN should publish these BGPD stats weekly – 1) traffic citations 2) vehicle accidents reported; 3) traffic accidents resulting in injury.
Topic Title: Public Safety and Traffic Management

Topic 11:
Keywords: internet, cable, fiber, competition, rates, tv, choices, option, media, companies
Example Comments: BGMU needs to offer residential fiber internet as a UTILITY, using Chattanooga as a perfect example.; Cable companies need competition and not be allowed to raise rates of  customers. Once you sign on, that is your rate unless you upgrade.; Encourage growth of OTA tv stations. Lots of cord cutters and shavers enjoy the HD content they can get with an antenna.
Topic Title: Broadband and Television Competition

Topic 12:
Keywords: temp, wage, benefits, openings, minimum, retirement, employee, employer, wages, companies
Example Comments: Temp service need to pay higher hourly rate to temp than hiring firm pays their FTE to account for reduced economic security; City should bar garbage token health plans designed to prevent temp services paying $166/month Obamacare emp. fee. Let ppl get subsidy.; City needs to bar big COs employing from half a dozen temp services to keep 100s ineligble for FMLA due to less than 50 ppl/per 'employer'
Topic Title: Employment and Labor Standards

Topic 13:
Keywords: rental, rent, renters, rental property, properties, prices, property, landlords, units, laws
Example Comments: Rental property taxes should be reduced for private landlords to encourage them to develop more, as opposed to large ones like Chandler.; Stronger tenant and renters rights. Landlord accountability for keeping properties in good shape. So many overpriced dilapidated rentals.; Overdevelopment of rental property is detrimental to cities.  Renters do not pay property tax.  Schools suffer when the tax base is eroded.
Topic Title: Housing and Real Estate

Topic 14:
Keywords: sidewalks, bike, trails, walking, trail, walkers, great addition, biking, ramps, small ramps
Example Comments: Install new sidewalks along major roads, like Three Springs, and city-wide bike routes, to decrease likelihood of accidents; Make it a walking city. Sidewalks along the main avenues (scottsville rd, campbell ln, 31 bypass) with plants, benches, art pieces.; Have a Dual Slalom Course, Pump Track & Bike Skills Course co-located by the Low Hollow Mountain Bike Trail at Weldon Peete Park.
Topic Title: Pedestrian and Bike Infrastructure

Topic 15:
Keywords: water, fire, lines, population, growth, bills, new schools, septic tanks, new roads, new subdivisions
Example Comments: Need new schools in Warren County to keep up with population growth, CTE is falling down, Drakes and Greenwood high are way overcrowded; County needs to extend water lines in Tuckertown area. About a dozen homes don't have county water or fire hydrants!; Tuckertown area needs fire protection. Currently rated Class 10 expensive insurance. Water lines are close by on Otter Gap & Tuckertown Rds.
Topic Title: Utilities and Infrastructure

Topic 16:
Keywords: animals, smoking, pets, leashes, leash, dog, owners, ordinance, pet owners, stray
Example Comments: Better ordinances on keeping animals on leashes or in fenced areas are needed.; There should never be a leash law in the county - neighborhood dogs protect, play with and teach kids. It was a factor in living in county.; Relax restrictions on fencing in Home Owners Associations to promote people keeping animals on leashes or in fenced yards.
Topic Title: Animal Control and Pet Ownership Regulations

Topic 17:
Keywords: swimming, complex, swim, pools, lessons, facilities, sports, safety, year, therapy
Example Comments: Bowling Green needs more indoor sports facilities, particularly a large complex with multiple indoor soccer fields, for youth.; BG's Kummer Little gym is open limited hours for indoor track. On other hand, County has new gyms that are open lots of hrs. Inequitable.; Indoor swimming pools for public use and therapy year around without time restrictions due to special swim leagues. Encourage healthier life
Topic Title: Health and Recreation Facilities

Topic 18:
Keywords: historic preservation, preservation, planning, building, buildings, historic building, urban planning, heritage, retail businesses, banks
Example Comments: We need urban planning to ensure that mixed use housing, shopping and new neighborhoods are created to make our community better; Historic preservation of homes and buildings needs countywide attention to maintain our unique architectural heritage.; We must ensure historic preservation codes are properly enforced in our historic districts; lack of consistency erodes our unique aesthetic.
Topic Title: Urban Planning and Historic Preservation

Topic 19:
Keywords: recycling, litter, trash, bins, services, yards, junk, stronger enforcement, outdated equipment, dump
Example Comments: BG has a bad problem with littering.  My husband and I fill up to 5 recycle bins every week between downtown and campus, cleaning it all up.; Make Scott Waste follow their contract - Return trash cans next to the house and not leave them on the sidewalks.  Keep BG clean!; Recycling services need to be improved and modernized. Current contractor uses outdated equipment, services are messy, company unresponsive.
Topic Title: Waste Management and Refuse Collection

Topic 20:
Keywords: store, food, farmers, grocery, marketing, fresh food, local chefs, brewers, distance, eggs
Example Comments: BG needs to develop a marketing campaign to draw Warren Co and surrounding counties to downtown for eating, shopping, and entertainment.; A fresh food grocery store to be built in Delafield and Morgantown rd areas within walking distance of residents.; We need more businesses like White Squirrel and Nats in BG. They add character to our town. Done with the Olive Garden and Belks.
Topic Title: Local Food and Entrepreneurship

Topic 21:
Keywords: tax, homelessness, cents, occ, fund, homeless people, cuts, budget, taxes, overheads
Example Comments: Lower taxes by auditing every department for total accountability & make cuts where possible. Stop city government waste.; City had 109 homeless students in 2017, County Had 112.  Housing those families temporarily is worth raising tax rate from 1.85% to 1.91%.; If city eliminates homelessness via occupational taxes, employers of homeless people need to reimburse city via special occ. tax penalty
Topic Title: Taxation and Homelessness

Topic 22:
Keywords: zoning, planning, land, apartments, single family, family, affected area, country, changes, residences
Example Comments: Too many apartments in Plano. Nice little country setting I bought my home in forever ruined!; We are loosing the city’s long established neighborhoods to multi family developments that do not fit in with the single family residences; Planning & Zoning Board needs an overhaul. Fewer builders appointed, fairness to opposition, stop rubber stamping by P&Z and City Comm.
Topic Title: Zoning and Land Use Planning

meta-llama/Llama-2-13b-chat-hf

Code
df_topic_keywords_docs = df_topics.select('Topic', 'Representation', 'Representative_Docs')

lm = llama + generate_topic_titles(summary['topic'], summary['conversation-description'], df_topic_keywords_docs)
The following is a dataset of comments from an online discussion about Improving Bowling Green / Warren County. Assign a terse title to each topic based on the given keywords. Avoid repetitive phrases.

Question: What do you believe should change in Bowling Green/Warren County in order to make it a better place to live, work and spend time?

Topic 0:
Keywords: road, drivers, traffic, lane, roads, bypass, intersection, turn, congestion, cemetery
Example Comments: Widen Cemetry Road, Russellville Road, Lover’s Lane, Campbell Lane, The Old ByPass, and Scotsville Road from old ByPass to Natcher Parkway.; Smallhouse Rd (at Campbell Ln) heading into town needs to be widened to have room for three lanes (left turn, straight, right turn).; Access Rd, east side of Scottsville Rd. in shopping dist causes congestion. Give right of way to access road or cross streets, not a mix!
Topic Title: Widen Roads and Improve Intersections

Topic 1:
Keywords: fairness, ordinance, noise, improvement, city, separation, lawns, code, food, ideas
Example Comments: City officials need to work with WKU admin. to address the noise & public nuisance problems with the frat houses and homeowners downtown.; NO to the "Fairness" Ordinance. Current law is sufficient.  ZERO reported instances where LGBTQ people have been discriminated against in BG; The city should craft a very simple fairness ordinance, saying no discrimination acquiring housing, but not giving LGBTQ EEOC Bypass @work
Topic Title: BG Fairness for All

Topic 2:
Keywords: drug, marijuana, opioid, doctors, drugs, pain, deaths, dealers, sex, possession
Example Comments: The fact that there is not as much coverage about African American deaths from cocaine as there is about deaths in the white community from opioids, even though there are as many of them, is evidence of institutionalized racism.; In order to combat teen pregnancy and STD rates, high schools in Bowling Green should offer comprehensive, medically accurate sex education.; In order to better combat the opioid epidemic, it is time to view drug addiction as a health problem rather than a criminal justice problem.
Topic Title: Drug Abuse Solutions in BG

Topic 3:
Keywords: homeless, parks, park, house, hangout, tiny house, community, aid, benefit, size
Example Comments: We need to replace "Room at the Inn" services with temporary decent housing connected to aid agencies for the homeless; I would like to see more aid for the homeless whether: tiny house park, health clinic, rehabilitation, job readiness sources.; Homeless people deserve full size places like studio apartments.  Not tiny homes like a smurf would desire in a tiny house park.
Topic Title: Temprorary Shlethering Solutions for Homeeless Indviduals

Topic 4:
Keywords: programs, activities, weekend, training, higher education, youth, museum, credit, teenagers, job
Example Comments: A favorite place to visit (elsewhere) is a park filled with gently used toys (donated).  It inspires imagination, creativity and sharing.; We  a true junior college which offers credit / non credit courses like many other states from basket weaving to cyber security; Need comprehesive probation rehab programs - include addiction treatment, life & job skills & ideas securing housing, training/job etc
Topic Title: Young People Prosperity in BG

Topic 5:
Keywords: school, schools, parents, public schools, public school, kids, choice, education, charter, districts
Example Comments: Future charter schools would be a financial burden to public school funding, unless they are held to accountability by local school boards.; some children with cultural differences or dietary needs cannot eat a typical lunch at school please provide special diet options.; Both school districts shamefully manipulate low economic status parents to gain Title I funding. It should stop.
Topic Title: Public School Choice and Accoutability

Topic 6:
Keywords: vote, officials, proposal, project, power, seller, influence, local officials, funds, unelected boards
Example Comments: Officials that use social media accounts in a professional capacity should not be allowed to block constituents except under rare conditions like being threatened.; Big names and corporations get away with too much. We need fairer regulation and level playing fields for the small and big guys alike.; Local elected government officials should not be able to hold a seat and also be on the TIF board or other unelected boards.
Topic Title: Voter Power

Topic 7:
Keywords: campus, program, university, students, local businesses, miles, restaurants, courses, mental health, center
Example Comments: Residential Parking Permits for areas near WKU. We can rescue our yards from parking. We pay property taxes for the students' convenience,; Provide a food assistance program for college students as students lots of times do not qualify for food stamps due to living on campus; Make areas around WKU's campus livable for college students. Eliminating section 8 housing will lead to less crime & lower cost of attendance.
Topic Title: WKU Campus Habitat

Topic 8:
Keywords: river, immigrants, refugees, tourists, demographic, riverfront, communities, ways, art, immigrant
Example Comments: Need engaging art installations throughout the city. People will be able to experience art, have fun, and share on social media.; The very low number of immigrants coming to our county last year was a good thing. Saves costs: Schools, courts, social services-Our taxes!; The Barren River water front needs to be improved on both sides of the river and include  canoeing and kayaking water features in the river.
Topic Title: Barren River Experinces

Topic 9:
Keywords: parking, space, garage, yard, square, shade, mall, trees, downtown, pedestrian
Example Comments: Better parking around the square would draw in more college students who like to shop but can not find parking downtown.; Owners of large parking lots should be required to break them up with pockets of green space and trees; perhaps create a walkable green path; Build parking garage behind Spencer’s, remove some parking from Ft. Square. Add a green space on top of garage! Soccer field or rooftop food
Topic Title: Downtown Parking

Topic 10:
Keywords: police, cops, parking, enforcement, traffic, accidents, cameras, officers, tickets, presence
Example Comments: Houston "courtesy police" maintain safety on access rds enforcing right of way & parking/litter rules, assist traffic flow - we need!; BG needs to install traffic cameras that help police intersections like Gallatin, TN.  This would reduce accidents, traffic flow and add $; BGDN should publish these BGPD stats weekly – 1) traffic citations 2) vehicle accidents reported; 3) traffic accidents resulting in injury.
Topic Title: BGDP Prescence

Topic 11:
Keywords: internet, cable, fiber, competition, rates, tv, choices, option, media, companies
Example Comments: BGMU needs to offer residential fiber internet as a UTILITY, using Chattanooga as a perfect example.; Cable companies need competition and not be allowed to raise rates of  customers. Once you sign on, that is your rate unless you upgrade.; Encourage growth of OTA tv stations. Lots of cord cutters and shavers enjoy the HD content they can get with an antenna.
Topic Title: BG Cable and Fiber Mesuership

Topic 12:
Keywords: temp, wage, benefits, openings, minimum, retirement, employee, employer, wages, companies
Example Comments: Temp service need to pay higher hourly rate to temp than hiring firm pays their FTE to account for reduced economic security; City should bar garbage token health plans designed to prevent temp services paying $166/month Obamacare emp. fee. Let ppl get subsidy.; City needs to bar big COs employing from half a dozen temp services to keep 100s ineligble for FMLA due to less than 50 ppl/per 'employer'
Topic Title: BG Temp Agency Labor Practies

Topic 13:
Keywords: rental, rent, renters, rental property, properties, prices, property, landlords, units, laws
Example Comments: Rental property taxes should be reduced for private landlords to encourage them to develop more, as opposed to large ones like Chandler.; Stronger tenant and renters rights. Landlord accountability for keeping properties in good shape. So many overpriced dilapidated rentals.; Overdevelopment of rental property is detrimental to cities.  Renters do not pay property tax.  Schools suffer when the tax base is eroded.
Topic Title: Rental Properties and Taxation

Topic 14:
Keywords: sidewalks, bike, trails, walking, trail, walkers, great addition, biking, ramps, small ramps
Example Comments: Install new sidewalks along major roads, like Three Springs, and city-wide bike routes, to decrease likelihood of accidents; Make it a walking city. Sidewalks along the main avenues (scottsville rd, campbell ln, 31 bypass) with plants, benches, art pieces.; Have a Dual Slalom Course, Pump Track & Bike Skills Course co-located by the Low Hollow Mountain Bike Trail at Weldon Peete Park.
Topic Title: Bowing Green Walking and Biking

Topic 15:
Keywords: water, fire, lines, population, growth, bills, new schools, septic tanks, new roads, new subdivisions
Example Comments: Need new schools in Warren County to keep up with population growth, CTE is falling down, Drakes and Greenwood high are way overcrowded; County needs to extend water lines in Tuckertown area. About a dozen homes don't have county water or fire hydrants!; Tuckertown area needs fire protection. Currently rated Class 10 expensive insurance. Water lines are close by on Otter Gap & Tuckertown Rds.
Topic Title: Water Lines and Councilgrowth

Topic 16:
Keywords: animals, smoking, pets, leashes, leash, dog, owners, ordinance, pet owners, stray
Example Comments: Better ordinances on keeping animals on leashes or in fenced areas are needed.; There should never be a leash law in the county - neighborhood dogs protect, play with and teach kids. It was a factor in living in county.; Relax restrictions on fencing in Home Owners Associations to promote people keeping animals on leashes or in fenced yards.
Topic Title: Warren County Animal Control

Topic 17:
Keywords: swimming, complex, swim, pools, lessons, facilities, sports, safety, year, therapy
Example Comments: Bowling Green needs more indoor sports facilities, particularly a large complex with multiple indoor soccer fields, for youth.; BG's Kummer Little gym is open limited hours for indoor track. On other hand, County has new gyms that are open lots of hrs. Inequitable.; Indoor swimming pools for public use and therapy year around without time restrictions due to special swim leagues. Encourage healthier life
Topic Title: Indoor Sports Facilities

Topic 18:
Keywords: historic preservation, preservation, planning, building, buildings, historic building, urban planning, heritage, retail businesses, banks
Example Comments: We need urban planning to ensure that mixed use housing, shopping and new neighborhoods are created to make our community better; Historic preservation of homes and buildings needs countywide attention to maintain our unique architectural heritage.; We must ensure historic preservation codes are properly enforced in our historic districts; lack of consistency erodes our unique aesthetic.
Topic Title: Bowling Green Historic Preservation

Topic 19:
Keywords: recycling, litter, trash, bins, services, yards, junk, stronger enforcement, outdated equipment, dump
Example Comments: BG has a bad problem with littering.  My husband and I fill up to 5 recycle bins every week between downtown and campus, cleaning it all up.; Make Scott Waste follow their contract - Return trash cans next to the house and not leave them on the sidewalks.  Keep BG clean!; Recycling services need to be improved and modernized. Current contractor uses outdated equipment, services are messy, company unresponsive.
Topic Title: Clean and Green BG

Topic 20:
Keywords: store, food, farmers, grocery, marketing, fresh food, local chefs, brewers, distance, eggs
Example Comments: BG needs to develop a marketing campaign to draw Warren Co and surrounding counties to downtown for eating, shopping, and entertainment.; A fresh food grocery store to be built in Delafield and Morgantown rd areas within walking distance of residents.; We need more businesses like White Squirrel and Nats in BG. They add character to our town. Done with the Olive Garden and Belks.
Topic Title: BG Distance to Fresh Food

Topic 21:
Keywords: tax, homelessness, cents, occ, fund, homeless people, cuts, budget, taxes, overheads
Example Comments: Lower taxes by auditing every department for total accountability & make cuts where possible. Stop city government waste.; City had 109 homeless students in 2017, County Had 112.  Housing those families temporarily is worth raising tax rate from 1.85% to 1.91%.; If city eliminates homelessness via occupational taxes, employers of homeless people need to reimburse city via special occ. tax penalty
Topic Title: Taxation and Homeousness

Topic 22:
Keywords: zoning, planning, land, apartments, single family, family, affected area, country, changes, residences
Example Comments: Too many apartments in Plano. Nice little country setting I bought my home in forever ruined!; We are loosing the city’s long established neighborhoods to multi family developments that do not fit in with the single family residences; Planning & Zoning Board needs an overhaul. Fewer builders appointed, fairness to opposition, stop rubber stamping by P&Z and City Comm.
Topic Title: BG Apartments Boom

openai-community/gpt2

Code
df_topic_keywords_docs = df_topics.select('Topic', 'Representation', 'Representative_Docs')

lm = gpt2 + generate_topic_titles(summary['topic'], summary['conversation-description'], df_topic_keywords_docs)
The following is a dataset of comments from an online discussion about Improving Bowling Green / Warren County. Assign a terse title to each topic based on the given keywords. Avoid repetitive phrases.

Question: What do you believe should change in Bowling Green/Warren County in order to make it a better place to live, work and spend time?

Topic 0:
Keywords: road, drivers, traffic, lane, roads, bypass, intersection, turn, congestion, cemetery
Example Comments: Widen Cemetry Road, Russellville Road, Lover’s Lane, Campbell Lane, The Old ByPass, and Scotsville Road from old ByPass to Natcher Parkway.; Smallhouse Rd (at Campbell Ln) heading into town needs to be widened to have room for three lanes (left turn, straight, right turn).; Access Rd, east side of Scottsville Rd. in shopping dist causes congestion. Give right of way to access road or cross streets, not a mix!
Topic Title: Approaching Irving Road from Amherst

Topic 1:
Keywords: fairness, ordinance, noise, improvement, city, separation, lawns, code, food, ideas
Example Comments: City officials need to work with WKU admin. to address the noise & public nuisance problems with the frat houses and homeowners downtown.; NO to the "Fairness" Ordinance. Current law is sufficient.  ZERO reported instances where LGBTQ people have been discriminated against in BG; The city should craft a very simple fairness ordinance, saying no discrimination acquiring housing, but not giving LGBTQ EEOC Bypass @work
Topic Title: Trust River

Topic 2:
Keywords: drug, marijuana, opioid, doctors, drugs, pain, deaths, dealers, sex, possession
Example Comments: The fact that there is not as much coverage about African American deaths from cocaine as there is about deaths in the white community from opioids, even though there are as many of them, is evidence of institutionalized racism.; In order to combat teen pregnancy and STD rates, high schools in Bowling Green should offer comprehensive, medically accurate sex education.; In order to better combat the opioid epidemic, it is time to view drug addiction as a health problem rather than a criminal justice problem.
Topic Title: many low income people are getting caught up on their drug spending because they don't have any other exits

Topic 3:
Keywords: homeless, parks, park, house, hangout, tiny house, community, aid, benefit, size
Example Comments: We need to replace "Room at the Inn" services with temporary decent housing connected to aid agencies for the homeless; I would like to see more aid for the homeless whether: tiny house park, health clinic, rehabilitation, job readiness sources.; Homeless people deserve full size places like studio apartments.  Not tiny homes like a smurf would desire in a tiny house park.
Topic Title: Fayetteville is a good place for someone with an addiction to get HELP

Topic 4:
Keywords: programs, activities, weekend, training, higher education, youth, museum, credit, teenagers, job
Example Comments: A favorite place to visit (elsewhere) is a park filled with gently used toys (donated).  It inspires imagination, creativity and sharing.; We  a true junior college which offers credit / non credit courses like many other states from basket weaving to cyber security; Need comprehesive probation rehab programs - include addiction treatment, life & job skills & ideas securing housing, training/job etc
Topic Title: Getting Your Needs in Everyday Currencies

Topic 5:
Keywords: school, schools, parents, public schools, public school, kids, choice, education, charter, districts
Example Comments: Future charter schools would be a financial burden to public school funding, unless they are held to accountability by local school boards.; some children with cultural differences or dietary needs cannot eat a typical lunch at school please provide special diet options.; Both school districts shamefully manipulate low economic status parents to gain Title I funding. It should stop.
Topic Title:
---------------------------------------------------------------------------
OutOfMemoryError                          Traceback (most recent call last)
Cell In[12], line 3
      1 df_topic_keywords_docs = df_topics.select('Topic', 'Representation', 'Representative_Docs')
----> 3 lm = gpt2 + generate_topic_titles(summary['topic'], summary['conversation-description'], df_topic_keywords_docs)

File ~/.conda/envs/guidance/lib/python3.11/site-packages/guidance/models/_model.py:266, in Model.__add__(self, value)
    262         out = lm._run_stateless(value)
    264     # run stateful functions
    265     else:
--> 266         out = value(lm)
    268 # this flushes the display
    269 out._inplace_append("")

File ~/.conda/envs/guidance/lib/python3.11/site-packages/guidance/_grammar.py:54, in StatefulFunction.__call__(self, model)
     53 def __call__(self, model):
---> 54     return self.f(model, *self.args, **self.kwargs)

Cell In[11], line 10, in generate_topic_titles(lm, main_title, question, df)
      3 lm += f"""\
      4 The following is a dataset of comments from an online discussion about {main_title}. Assign a terse title to each topic based on the given keywords. Avoid repetitive phrases.
      5 
      6 Question: {question}
      7 """
      9 for topic, keywords, docs in df.rows():
---> 10     lm += f"""
     11     Topic {topic}:
     12     Keywords: {', '.join(keywords)}
     13     Example Comments: {'; '.join(docs)}
     14     Topic Title: """ + gen(name='topic_title', list_append=True, stop=['\n', '.'], max_tokens=20, temperature=1) + "\n"
     16 return lm

File ~/.conda/envs/guidance/lib/python3.11/site-packages/guidance/models/_model.py:262, in Model.__add__(self, value)
    260 # run stateless functions (grammar nodes)
    261 elif isinstance(value, StatelessFunction):
--> 262     out = lm._run_stateless(value)
    264 # run stateful functions
    265 else:
    266     out = value(lm)

File ~/.conda/envs/guidance/lib/python3.11/site-packages/guidance/models/_model.py:401, in Model._run_stateless(lm, stateless_function, temperature, top_p, n)
    399 delayed_bytes = b""
    400 # last_is_generated = False
--> 401 for new_bytes, is_generated, new_bytes_log_prob, capture_groups, capture_group_log_probs, new_token_count in gen_obj:
    402     # convert the bytes to a string (delaying if we don't yet have a valid unicode string)
    403     lm.token_count += new_token_count
    404     new_bytes = delayed_bytes + new_bytes

File ~/.conda/envs/guidance/lib/python3.11/site-packages/guidance/models/_model.py:692, in Model.__call__(self, grammar, max_tokens, n, top_p, temperature, ensure_bos_token, log_probs)
    690     token_ids,token_byte_positions = self._cleanup_tokens(token_ids, token_byte_positions)
    691     was_forced = False
--> 692 logits = self._get_logits(token_ids, parser.bytes[start_pos:forced_pos])
    694 # if requested we compute the log probabilities so we can track the probabilities of each node
    695 # TODO: we should lower this step to C++ with pybind11
    696 if log_probs:

File ~/.conda/envs/guidance/lib/python3.11/site-packages/guidance/models/transformers/_transformers.py:112, in Transformers._get_logits(self, token_ids, forced_bytes)
    110 if len(new_token_ids) > 0:
    111     with torch.no_grad():
--> 112         model_out = self.model_obj(
    113             input_ids=torch.tensor(new_token_ids).unsqueeze(0).to(self.device),
    114             past_key_values=self._cache_state["past_key_values"],
    115             use_cache=True,
    116             position_ids=torch.arange(past_length, past_length+len(new_token_ids)).unsqueeze(0).to(self.device),
    117             attention_mask=torch.ones(1, past_length + len(new_token_ids)).to(self.device),
    118             return_dict=True,
    119             output_attentions=False,
    120             output_hidden_states=False
    121         )
    123     # save the results
    124     self._cache_state["past_key_values"] = model_out.past_key_values

File ~/.conda/envs/guidance/lib/python3.11/site-packages/torch/nn/modules/module.py:1511, in Module._wrapped_call_impl(self, *args, **kwargs)
   1509     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1510 else:
-> 1511     return self._call_impl(*args, **kwargs)

File ~/.conda/envs/guidance/lib/python3.11/site-packages/torch/nn/modules/module.py:1520, in Module._call_impl(self, *args, **kwargs)
   1515 # If we don't have any hooks, we want to skip the rest of the logic in
   1516 # this function, and just call forward.
   1517 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1518         or _global_backward_pre_hooks or _global_backward_hooks
   1519         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1520     return forward_call(*args, **kwargs)
   1522 try:
   1523     result = None

File ~/.conda/envs/guidance/lib/python3.11/site-packages/transformers/models/gpt2/modeling_gpt2.py:1074, in GPT2LMHeadModel.forward(self, input_ids, past_key_values, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, labels, use_cache, output_attentions, output_hidden_states, return_dict)
   1066 r"""
   1067 labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
   1068     Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
   1069     `labels = input_ids` Indices are selected in `[-100, 0, ..., config.vocab_size]` All labels set to `-100`
   1070     are ignored (masked), the loss is only computed for labels in `[0, ..., config.vocab_size]`
   1071 """
   1072 return_dict = return_dict if return_dict is not None else self.config.use_return_dict
-> 1074 transformer_outputs = self.transformer(
   1075     input_ids,
   1076     past_key_values=past_key_values,
   1077     attention_mask=attention_mask,
   1078     token_type_ids=token_type_ids,
   1079     position_ids=position_ids,
   1080     head_mask=head_mask,
   1081     inputs_embeds=inputs_embeds,
   1082     encoder_hidden_states=encoder_hidden_states,
   1083     encoder_attention_mask=encoder_attention_mask,
   1084     use_cache=use_cache,
   1085     output_attentions=output_attentions,
   1086     output_hidden_states=output_hidden_states,
   1087     return_dict=return_dict,
   1088 )
   1089 hidden_states = transformer_outputs[0]
   1091 # Set device for model parallelism

File ~/.conda/envs/guidance/lib/python3.11/site-packages/torch/nn/modules/module.py:1511, in Module._wrapped_call_impl(self, *args, **kwargs)
   1509     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1510 else:
-> 1511     return self._call_impl(*args, **kwargs)

File ~/.conda/envs/guidance/lib/python3.11/site-packages/torch/nn/modules/module.py:1520, in Module._call_impl(self, *args, **kwargs)
   1515 # If we don't have any hooks, we want to skip the rest of the logic in
   1516 # this function, and just call forward.
   1517 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1518         or _global_backward_pre_hooks or _global_backward_hooks
   1519         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1520     return forward_call(*args, **kwargs)
   1522 try:
   1523     result = None

File ~/.conda/envs/guidance/lib/python3.11/site-packages/transformers/models/gpt2/modeling_gpt2.py:888, in GPT2Model.forward(self, input_ids, past_key_values, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, use_cache, output_attentions, output_hidden_states, return_dict)
    876     outputs = self._gradient_checkpointing_func(
    877         block.__call__,
    878         hidden_states,
   (...)
    885         output_attentions,
    886     )
    887 else:
--> 888     outputs = block(
    889         hidden_states,
    890         layer_past=layer_past,
    891         attention_mask=attention_mask,
    892         head_mask=head_mask[i],
    893         encoder_hidden_states=encoder_hidden_states,
    894         encoder_attention_mask=encoder_attention_mask,
    895         use_cache=use_cache,
    896         output_attentions=output_attentions,
    897     )
    899 hidden_states = outputs[0]
    900 if use_cache is True:

File ~/.conda/envs/guidance/lib/python3.11/site-packages/torch/nn/modules/module.py:1511, in Module._wrapped_call_impl(self, *args, **kwargs)
   1509     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1510 else:
-> 1511     return self._call_impl(*args, **kwargs)

File ~/.conda/envs/guidance/lib/python3.11/site-packages/torch/nn/modules/module.py:1520, in Module._call_impl(self, *args, **kwargs)
   1515 # If we don't have any hooks, we want to skip the rest of the logic in
   1516 # this function, and just call forward.
   1517 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1518         or _global_backward_pre_hooks or _global_backward_hooks
   1519         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1520     return forward_call(*args, **kwargs)
   1522 try:
   1523     result = None

File ~/.conda/envs/guidance/lib/python3.11/site-packages/transformers/models/gpt2/modeling_gpt2.py:390, in GPT2Block.forward(self, hidden_states, layer_past, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, use_cache, output_attentions)
    388 residual = hidden_states
    389 hidden_states = self.ln_1(hidden_states)
--> 390 attn_outputs = self.attn(
    391     hidden_states,
    392     layer_past=layer_past,
    393     attention_mask=attention_mask,
    394     head_mask=head_mask,
    395     use_cache=use_cache,
    396     output_attentions=output_attentions,
    397 )
    398 attn_output = attn_outputs[0]  # output_attn: a, present, (attentions)
    399 outputs = attn_outputs[1:]

File ~/.conda/envs/guidance/lib/python3.11/site-packages/torch/nn/modules/module.py:1511, in Module._wrapped_call_impl(self, *args, **kwargs)
   1509     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1510 else:
-> 1511     return self._call_impl(*args, **kwargs)

File ~/.conda/envs/guidance/lib/python3.11/site-packages/torch/nn/modules/module.py:1520, in Module._call_impl(self, *args, **kwargs)
   1515 # If we don't have any hooks, we want to skip the rest of the logic in
   1516 # this function, and just call forward.
   1517 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1518         or _global_backward_pre_hooks or _global_backward_hooks
   1519         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1520     return forward_call(*args, **kwargs)
   1522 try:
   1523     result = None

File ~/.conda/envs/guidance/lib/python3.11/site-packages/transformers/models/gpt2/modeling_gpt2.py:321, in GPT2Attention.forward(self, hidden_states, layer_past, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, use_cache, output_attentions)
    319     past_key, past_value = layer_past
    320     key = torch.cat((past_key, key), dim=-2)
--> 321     value = torch.cat((past_value, value), dim=-2)
    323 if use_cache is True:
    324     present = (key, value)

OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 3.81 GiB of which 12.00 MiB is free. Including non-PyTorch memory, this process has 3.79 GiB memory in use. Of the allocated memory 3.55 GiB is allocated by PyTorch, and 187.07 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Chat Instruction Format

So far, we used the completion technique to present a portion of the text to the model and have it generate text that best completes the content. In the following tests, we will use the chat instruction format instead. This is an interactive techqiue that starts with an instructional prompt, and then allows a conversation between user and assistant. Using guidance framework, we still control the content that is placed in assistant’s context, but it allows for a more natural conversation flow and enables the model to better understand the task.

Mistral without context reset

Code
topic_labels = None
topic_titles = None

from guidance import user, assistant, instruction
import re

@guidance
def generate_topic_titles(lm, name_title, name_label, main_title, question, df, temperature=0):
    global topic_labels, topic_titles

    main_title_stop_words = re.split(r'\W+', main_title.lower()) # pick words in summary


    with instruction():
        lm += f"""\
        Respond with a detailed title and a short label to best represent each given topic.
        Avoid repetitive words such as "Enhancing" or "Improving".
        Start title with a noun.
        Label should be terse and up to 4 words.
        Do not output any of the following words: "{', '.join(main_title_stop_words)}"
        """

    with user():
        lm += f"""\
        The following is a dataset of comments from an online discussion.
        Discussion Question: {question}

        KEYWORDS: [a set of keywords that describe the topic]
        SAMPLE STATEMENTS: [a set of statements that best represent the topic]
        """

    with assistant():
        lm += f"""\
        TITLE: [a descriptive sentence that represents the topic]
        LABEL: [terse phrase]
        """

    topic_labels = []
    topic_titles = []

    for topic, keywords, docs in df.iter_rows():
        with user():
            lm = lm + f"""
            # Topic {topic}
            KEYWORDS: {', '.join(keywords)}
            SAMPLE STATEMENTS: {'; '.join(docs)}
            """
        with assistant():
            lm += f"TITLE: " + gen(name=name_title, stop=['\n', '.'], max_tokens=20, temperature=temperature) + "\n"
            lm += f"LABEL: " + gen(name=name_label, stop=['\n', '.'], max_tokens=12, temperature=temperature) + "\n"

        # since the title almost always starts with a verb, remove the first word if it ends in 'ing'
        title = lm['topic_title']
        # if title.split(' ')[0].endswith('ing'):
        #     title = ' '.join(title.split(' ')[1:])
        topic_labels += [lm['topic_label']]
        topic_titles += [title]

    return lm

# grab topic keywords
df_topic_keywords_docs = pl.from_pandas(topic_model.get_topic_info()).select('Topic', 'Representation', 'Representative_Docs')

# instruct LLM to generate topic titles
lm = mistral + generate_topic_titles('topic_title', 'topic_label', summary['topic'], summary['conversation-description'], df_topic_keywords_docs, temperature=0)
instruction
Respond with a detailed title and a short label to best represent the each given topic. Avoid repetitive words such as "Enhancing" or "Improving". Start title with a noun. Label should be terse and up to 4 words. Do not output any of the following words: "improving, bowling, green, warren, county"
user
The following is a dataset of comments from an online discussion. Discussion Question: What do you believe should change in Bowling Green/Warren County in order to make it a better place to live, work and spend time? KEYWORDS: [a set of keywords that describe the topic] SAMPLE STATEMENTS: [a set of statements that best represent the topic]
assistant
TITLE: [a descriptive sentence that represents the topic] LABEL: [terse phrase]
user
# Topic 0 KEYWORDS: school, education, students, schools, program, arts, public, campus, community, parents SAMPLE STATEMENTS: Future charter schools would be a financial burden to public school funding, unless they are held to accountability by local school boards.; Provide a food assistance program for college students as students lots of times do not qualify for food stamps due to living on campus; Public school systems need to provide more funding for art education programs, such as music, theater, and visual arts.
assistant
TITLE: Enhancing Educational Opportunities LABEL: Education Funding
user
# Topic 1 KEYWORDS: rental, city, jobs, residents, services, population, water, grants, units, housing SAMPLE STATEMENTS: Rental property taxes should be reduced for private landlords to encourage them to develop more, as opposed to large ones like Chandler.; Overdevelopment of rental property is detrimental to cities. Renters do not pay property tax. Schools suffer when the tax base is eroded.; We must ensure historic preservation codes are properly enforced in our historic districts; lack of consistency erodes our unique aesthetic.
assistant
TITLE: Managing Rental Property Development LABEL: Rental Taxes
user
# Topic 2 KEYWORDS: roads, sidewalks, traffic, cars, areas, lights, stop, drivers, street, roadways SAMPLE STATEMENTS: We need to SHADE street lamps. This actually increases visibility, reduces glare, and saves energy/ money. Using LEDs would save money, too.; Busy roads like the by pass either need turn lanes or block left hand turns except at stop lights.; Install new sidewalks along major roads, like Three Springs, and city-wide bike routes, to decrease likelihood of accidents
assistant
TITLE: Improving Road Infrastructure LABEL: Road Safety
user
# Topic 3 KEYWORDS: ordinance, fairness, complex, separation, food, facilities, ideas, use, city, sports SAMPLE STATEMENTS: Bowling Green City Commission needs to pass an ordinance banning the sale or use of large fireworks in the city limits.; The city should craft a very simple fairness ordinance, saying no discrimination acquiring housing, but not giving LGBTQ EEOC Bypass @work; NO to the "Fairness" Ordinance. Current law is sufficient. ZERO reported instances where LGBTQ people have been discriminated against in BG
assistant
TITLE: Regulating Community Ordinances LABEL: Community Regulations
user
# Topic 4 KEYWORDS: road, lane, traffic, bypass, cemetery, intersection, access, congestion, light, park SAMPLE STATEMENTS: Widen Cemetry Road, Russellville Road, Lover’s Lane, Campbell Lane, The Old ByPass, and Scotsville Road from old ByPass to Natcher Parkway.; Make it a walking city. Sidewalks along the main avenues (scottsville rd, campbell ln, 31 bypass) with plants, benches, art pieces.; Access Rd, east side of Scottsville Rd. in shopping dist causes congestion. Give right of way to access road or cross streets, not a mix!
assistant
TITLE: Urban Planning and Transportation LABEL: Urban Development
user
# Topic 5 KEYWORDS: internet, store, yard, airport, fiber, people, town, vote, working, politics SAMPLE STATEMENTS: Working folks need ordinances struck down that prevent them from assembling to bargain for better pay. benefits and working conditions.; We need more businesses like White Squirrel and Nats in BG. They add character to our town. Done with the Olive Garden and Belks.; Wish the airport would get better flights. WheelsUp was cool but need more locations. Sad Owensboro is so far ahead there.
assistant
TITLE: Community Economic Development LABEL: Business Growth
user
# Topic 6 KEYWORDS: officials, proposal, possession, mental health, research, board, project, power, connections, executives SAMPLE STATEMENTS: Officials that use social media accounts in a professional capacity should not be allowed to block constituents except under rare conditions like being threatened.; Planning & Zoning Board needs an overhaul. Fewer builders appointed, fairness to opposition, stop rubber stamping by P&Z and City Comm.; Local elected government officials should not be able to hold a seat and also be on the TIF board or other unelected boards.
assistant
TITLE: Government Transparency and Accountability LABEL: Government Reform
user
# Topic 7 KEYWORDS: activities, weekend, tourism, advertising, hangout, professionals, young professionals, walkable city, art, tourists SAMPLE STATEMENTS: A favorite place to visit (elsewhere) is a park filled with gently used toys (donated). It inspires imagination, creativity and sharing.; SKYPAC needs to bring in younger, more entertaining acts. It will fall apart if they keep only catering to 50+. Could be a great venue.; BG needs to develop a marketing campaign to draw Warren Co and surrounding counties to downtown for eating, shopping, and entertainment.
assistant
TITLE: Attracting Tourists and Young Professionals LABEL: Marketing Campaign
user
# Topic 8 KEYWORDS: parking, downtown, square, river, garage, mall, restaurants, sides, area, subdivisions SAMPLE STATEMENTS: Develop the Greenway so that the paved trail is continuous and reaches major parts of the city including the square, WKU, GM, and others.; Residential Parking Permits for areas near WKU. We can rescue our yards from parking. We pay property taxes for the students' convenience,; Build parking garage behind Spencer’s, remove some parking from Ft. Square. Add a green space on top of garage! Soccer field or rooftop food
assistant
TITLE: Urban Planning and Parking LABEL: Parking Solutions
user
# Topic 9 KEYWORDS: police, enforcement, traffic, cops, laws, accidents, drivers, rules, officers, regulations SAMPLE STATEMENTS: Houston "courtesy police" maintain safety on access rds enforcing right of way & parking/litter rules, assist traffic flow - we need!; Why has the Police stop enforcing parking regulations? Park any way any direction. Is it time for a separate Traffic Enforcement Division?; BG needs to install traffic cameras that help police intersections like Gallatin, TN. This would reduce accidents, traffic flow and add $
assistant
TITLE: Traffic Law Enforcement LABEL: Traffic Safety
user
# Topic 10 KEYWORDS: drug, opioid, deaths, drugs, pain, doctors, crisis, dealers, sex, pregnancy SAMPLE STATEMENTS: The fact that there is not as much coverage about African American deaths from cocaine as there is about deaths in the white community from opioids, even though there are as many of them, is evidence of institutionalized racism.; In order to better combat the opioid epidemic, it is time to view drug addiction as a health problem rather than a criminal justice problem.; In order to combat teen pregnancy and STD rates, high schools in Bowling Green should offer comprehensive, medically accurate sex education.
assistant
TITLE: Combating Substance Abuse and Teen Pregnancy LABEL: Health Education
user
# Topic 11 KEYWORDS: trees, trails, parks, recycling, shade, complexes, leashes, lawns, bike, trash SAMPLE STATEMENTS: Developing more bike trails into surrounding areas of the city would allow for a more developed both transport and recreational experiences; More walking/nature trails in our sports-centric parks. Have certain hours where walkers can enjoy the beauty of Paul Walker Golf Course.; Owners of large parking lots should be required to break them up with pockets of green space and trees; perhaps create a walkable green path
assistant
TITLE: Urban Green Spaces and Recycling LABEL: Green Initiatives
user
# Topic 12 KEYWORDS: temp, wage, tax, openings, minimum, companies, occ, employee, job, rate SAMPLE STATEMENTS: Temp service need to pay higher hourly rate to temp than hiring firm pays their FTE to account for reduced economic security; City needs to bar big COs employing from half a dozen temp services to keep 100s ineligble for FMLA due to less than 50 ppl/per 'employer'; City should bar garbage token health plans designed to prevent temp services paying $166/month Obamacare emp. fee. Let ppl get subsidy.
assistant
TITLE: Temporary Worker Benefits and Wages LABEL: Temp Worker Rights
Code
print("Titles:", '\n'.join(topic_titles), "", "Labels:", '\n'.join(topic_labels), sep='\n')
Titles:
Enhancing Educational Opportunities
Managing Rental Property Development
Improving Road Infrastructure
Regulating Community Ordinances
Urban Planning and Transportation
Community Economic Development
Government Transparency and Accountability
Attracting Tourists and Young Professionals
Urban Planning and Parking
Traffic Law Enforcement
Combating Substance Abuse and Teen Pregnancy
Urban Green Spaces and Recycling
Temporary Worker Benefits and Wages

Labels:
Education Funding
Rental Taxes
Road Safety
Community Regulations
Urban Development
Business Growth
Government Reform
Marketing Campaign
Parking Solutions
Traffic Safety
Health Education
Green Initiatives
Temp Worker Rights

Mistral with context reset

Code
topic_labels = None
topic_titles = None

from guidance import user, assistant, instruction
import re

# TODO: use a reset_context parameter instead of redeffing the function

@guidance
def generate_topic_titles(lm, name_title, name_label, main_title, question, df, temperature=0):
    global topic_labels, topic_titles

    main_title_stop_words = re.split(r'\W+', main_title.lower()) # pick words in summary


    with instruction():
        lm += f"""\
        Respond with a detailed title and a short label to best represent each given topic.
        Avoid repetitive words such as "Enhancing" or "Improving".
        Start title with a noun.
        Label should be terse and up to 4 words.
        Do not output any of the following words: "{', '.join(main_title_stop_words)}"
        """

    with user():
        lm += f"""\
        The following is a dataset of comments from an online discussion.
        Discussion Question: {question}

        KEYWORDS: [a set of keywords that describe the topic]
        SAMPLE STATEMENTS: [a set of statements that best represent the topic]
        """

    with assistant():
        lm += f"""\
        TITLE: [a descriptive sentence that represents the topic]
        LABEL: [terse phrase]
        """

    topic_labels = []
    topic_titles = []

    for topic, keywords, docs in df.iter_rows():
        with user():
            lm_topic = lm + f"""
            # Topic {topic}
            KEYWORDS: {', '.join(keywords)}
            SAMPLE STATEMENTS: {'; '.join(docs)}
            """
        with assistant():
            lm_topic += f"TITLE: " + gen(name=name_title, stop=['\n', '.'], max_tokens=20, temperature=temperature) + "\n"
            lm_topic += f"LABEL: " + gen(name=name_label, stop=['\n', '.'], max_tokens=12, temperature=temperature) + "\n"

        # since the title almost always starts with a verb, remove the first word if it ends in 'ing'
        title = lm_topic['topic_title']
        # if title.split(' ')[0].endswith('ing'):
        #     title = ' '.join(title.split(' ')[1:])
        topic_labels += [lm_topic['topic_label']]
        topic_titles += [title]

    return lm

# grab topic keywords
df_topic_keywords_docs = pl.from_pandas(topic_model.get_topic_info()).select('Topic', 'Representation', 'Representative_Docs')

# instruct LLM to generate topic titles
lm = mistral + generate_topic_titles('topic_title', 'topic_label', summary['topic'], summary['conversation-description'], df_topic_keywords_docs, temperature=0)
instruction
Respond with a detailed title and a short label to best represent the each given topic. Avoid repetitive words such as "Enhancing" or "Improving". Start title with a noun. Label should be terse and up to 4 words. Do not output any of the following words: "improving, bowling, green, warren, county"
user
The following is a dataset of comments from an online discussion. Discussion Question: What do you believe should change in Bowling Green/Warren County in order to make it a better place to live, work and spend time? KEYWORDS: [a set of keywords that describe the topic] SAMPLE STATEMENTS: [a set of statements that best represent the topic]
assistant
TITLE: [a descriptive sentence that represents the topic] LABEL: [terse phrase]
Code
print("Titles:", '\n'.join(topic_titles), "", "Labels:", '\n'.join(topic_labels), sep='\n')
Titles:
Enhancing Educational Opportunities
Rental Property Taxation and Development in Cities
Infrastructure Improvements for Safer Roadways
Proposed Ordinances and Fairness Debates in Bowling Green
Urban Infrastructure Improvements
Community Development and Business Growth
Government Reforms for Transparency and Fairness
Enhancing Downtown Attractions for Young Professionals and Tourists
Urban Development and Parking Solutions in Downtown Bowling Green
Police and Traffic Enforcement
Addressing the Opioid Crisis and Sex Education
Urban Green Spaces Expansion
Wage and Tax Policies for Temporary Workers

Labels:
Education Funding
City Rental Taxes
Roadway Upgrades
Ordinance Debates
Infrastructure Upgrades
Business, Growth
Government Transparency
Downtown Revitalization
Urban Development, Parking
Police Traffic
Opioid Crisis, Sex Education
Green Spaces Expansion
Temp Wage, Tax Policy
 

© 2024 Aaditya Bhatia

  • View source