Conference Talk|Leveraging Python for Community Policy Evaluation in Taiwan

While still in development, this project emphasizes the sustainability of the approach. Leveraging Python's open-source tools and libraries, I hope to provide a modular analysis process that allows users of varying technical backgrounds to quickly adopt and apply it flexibly to various community policy discussion contexts, and help enable community members to engage more effectively and influence policy decisions.

My name’s KC. I’m from Taiwan and work as a data scientist, product owner, enterprise agile coach, and workshop facilitator. We all play various roles in society, and today, I want to focus on one important role of mine—being a citizen.

I’ll share with you why and how I leverage Python for policy evaluation.

This all began with a very simple question—how can we bridge the growing divides in our communities using the tools we have in our hands today?

Before diving into that question, let me briefly share the background of this micro project.


Background

About the Forums

It all started nearly 20 years ago with young people’s dissatisfaction with public affairs. From conflicts to sitting down with government officials for discussions, it eventually led to the establishment of regular deliberative democracy forums starting in 2006, inviting youth to discuss various topics. After nearly 20 years, there’s no doubt that there have been lots of changes.

Since 2017, the large-scale forums have transitioned into multiple micro-forums, each with around 30 participants. The government sets the theme for the year, and youth teams organize the discussion forums, deciding for themselves which topics to include under the annual theme and establishing their own discussion processes. Ultimately, the teams also collaborate with government members through discussion.


However, regardless of the format, the unchanging element over the past 20 years is that these discussions must be conducted in the spirit and model of deliberative democracy.

My Roles in This Series

In this long-standing democratic festival in Taiwan, I’ve participated as an attendee, facilitator, and part of the organizing team.

However, since 2020, I have taken on a new role as a mentor, guiding youth organizing teams to optimize the quality and process of their deliberative discussions.

At the same time, this series of forums has been included in the Taiwanese government’s Open Government Commitments, prompting me to start reflecting on my different roles.

The questions that have lingered in my mind for a long time are gradually becoming clearer. Chief among them: could we find a more organic and effective evaluation method, given that annual adjustments relied on review meetings with only a few participants?

To be honest, this was my first attempt to apply Python techniques to other fields I’m familiar with: facilitation and policy analysis.


Applying Python

My First Try in ’22

At that time, I proposed an experimental plan in which I asked the interviewers to go to the forum site. They would observe and record the state of the discussion based on the guidelines and reference indicators I designed, combining both qualitative and quantitative data. I also asked the interviewers to interview the organizing team directly.

In brief, I used four sources to cross-evaluate several variables: interview contents, observer notes, participant satisfaction, and questionnaires.

While this project had lots of limitations in interpreting results, it highlighted how such translation and visualization can intuitively convey the value of discussions and opinions across various fields.

If you’re interested, you can check out the process I followed.

Process & Learning of the Experimental Project

First, of course, was preprocessing, followed by the crucial speech-to-text conversion. I used pydub for silence detection, and then used the speech_recognition library with Google’s speech recognition API to convert the audio recordings of the interviews into text.
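The speech-to-text step could look roughly like the sketch below. It is only an illustration, not the code used in the 2022 project: the silence thresholds, the "zh-TW" language code, and the helper names transcribe and join_transcript are assumptions. It relies on pydub's split_on_silence and the speech_recognition library's recognize_google call, and pydub needs ffmpeg installed to read most audio formats.

```python
import tempfile
from pathlib import Path


def join_transcript(pieces):
    """Join per-chunk transcripts, dropping chunks where nothing was recognized."""
    return "\n".join(p.strip() for p in pieces if p and p.strip())


def transcribe(audio_path, language="zh-TW", min_silence_len=700, silence_thresh=-40):
    """Split a recording on silence, then send each chunk to Google's recognizer.

    The threshold values and language code are illustrative assumptions and
    should be tuned to the actual recordings.
    """
    # Third-party imports live inside the function so the helper above
    # stays usable even when these packages are not installed.
    from pydub import AudioSegment
    from pydub.silence import split_on_silence
    import speech_recognition as sr

    audio = AudioSegment.from_file(audio_path)
    chunks = split_on_silence(
        audio,
        min_silence_len=min_silence_len,  # milliseconds of quiet that count as a pause
        silence_thresh=silence_thresh,    # dBFS level below which audio counts as silent
        keep_silence=300,                 # keep a little padding around each chunk
    )

    recognizer = sr.Recognizer()
    pieces = []
    with tempfile.TemporaryDirectory() as tmp:
        for i, chunk in enumerate(chunks):
            wav_path = Path(tmp) / f"chunk_{i:04d}.wav"
            chunk.export(str(wav_path), format="wav")
            with sr.AudioFile(str(wav_path)) as source:
                audio_data = recognizer.record(source)
            try:
                pieces.append(recognizer.recognize_google(audio_data, language=language))
            except sr.UnknownValueError:
                # The recognizer could not make out any speech in this chunk.
                pieces.append("")
    return join_transcript(pieces)
```

Splitting on silence keeps each request short and makes it easier to see which parts of an interview failed to transcribe.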

To clarify, although this project isn’t a rigorous research study, I still ensured clear communication about the use of audio and data and obtained informed consent from the interviewees.

With the transcripts in hand, I then moved on to the text mining process, clustering the team’s thoughts to understand their expectations and goals and identify the pains and gains experienced during the execution process.

So, what does this attempt signify?

I’m pleased to say that since that year, relevant departments have continued to conduct observations and evaluations through think tanks, and adjustments have been made based on those evaluations.

Of course, I wouldn’t confidently say that this all stems entirely from this experimental project, but I’m glad to take part in collaborating with the department and think tank to share previous findings and challenges, helping to refine this mechanism further.

While this may seem like a small change, I resonate with Professor Cass Sunstein’s view that conflicts can be resolved through communication and deliberation. The key point is not to reconcile completely incompatible theories; it is to start with concrete issues and reach agreement among all parties on those. This is what is called an “incompletely theorized agreement.”

Change is built from small actions, which may seem insignificant but plant seeds of trust that gradually sprout and thrive.


What I’m Trying This Year

Motivation

Over the past few years, I’ve also tried using Python and AI in various forms of facilitation related to policy discussions, aiming to address issues like staff shortages or tight schedules while maintaining quality.

However, today I want to share some of my thoughts from this year.

This year, two significant things happened for me. First, I joined PyLadies Taiwan as a volunteer just a couple of weeks ago, and second, I discussed the integration of AI and other technologies during an activity for workshop facilitators.

This sparked a question in my mind—

Can connection and dialogue reduce polarization, and how can we ensure that the outcomes of these discussions continue to progress?


This is also related to a sense of efficacy, which influences everyone’s motivation and willingness to engage in such activities.

Practically speaking, beyond evaluating and analyzing the project or conclusions, it’s crucial to consider how stakeholders and nonprofits can utilize this information effectively.

So, the things I’m thinking about and trying this year can be divided into two parts:

The first is analyzing the data released from this series of activities, and the second is finding ways to lower the barriers for issue advocates to use the data and understand how to apply the conclusions from the deliberative discussions happening across Taiwan to policy initiatives.

Process

Those micro deliberation forums require youth organizing teams to provide participants with discussion materials and conclusion reports, which must also be publicly available. These documents are made available for review and download in PDF format on the official website.

So, first, we need to parse the PDFs and convert them into usable text, for example with PyMuPDF.
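As a minimal sketch of this parsing step, assuming the reports are text-based PDFs rather than scanned images, PyMuPDF can pull the text page by page. The helper names pdf_to_text and normalize_text and the cleanup rules (collapsing whitespace, dropping lines that contain only a page number) are illustrative assumptions, not part of the project.

```python
import re


def normalize_text(raw):
    """Collapse whitespace and drop empty or page-number-only lines."""
    lines = []
    for line in raw.splitlines():
        line = re.sub(r"\s+", " ", line).strip()
        if not line or line.isdigit():
            continue  # skip blank lines and bare page numbers
        lines.append(line)
    return "\n".join(lines)


def pdf_to_text(pdf_path):
    """Extract plain text from every page of a PDF with PyMuPDF."""
    import fitz  # PyMuPDF; imported here so normalize_text works without it

    with fitz.open(pdf_path) as doc:
        pages = [page.get_text() for page in doc]
    return normalize_text("\n".join(pages))
```

Scanned reports would need an OCR step first, which this sketch does not cover.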

Next, we’ll move on to the NLP part, starting with word segmentation.

For Mandarin, the commonly used jieba library is great, but considering the context and language habits here in Taiwan, I’ll use another locally developed one. Next comes topic modeling, for which several options are available. In my analysis, I plan to use chinese-RoBERTa-wwm-ext followed by Principal Component Analysis (PCA) for dimensionality reduction. After that, I’ll cluster with DBSCAN, visualize the reduced dimensions with matplotlib, and extract keywords using TF-IDF.
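The steps after segmentation might be sketched as follows. Several things here are assumptions rather than the project's actual choices: the Hugging Face model id "hfl/chinese-roberta-wwm-ext", mean pooling to get one vector per sentence, the eps and min_samples values, and a hand-rolled TF-IDF that treats each cluster as one document. Word segmentation is left out, so token_lists is expected to come from whichever segmenter is used.

```python
import math
from collections import Counter, defaultdict


def embed_sentences(sentences, model_name="hfl/chinese-roberta-wwm-ext"):
    """Return one mean-pooled vector per sentence (pooling choice is an assumption)."""
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()
    with torch.no_grad():
        batch = tokenizer(sentences, padding=True, truncation=True,
                          max_length=128, return_tensors="pt")
        hidden = model(**batch).last_hidden_state
        mask = batch["attention_mask"].unsqueeze(-1).float()
        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
    return pooled.numpy()


def cluster_embeddings(embeddings, n_components=2, eps=0.5, min_samples=3):
    """Reduce with PCA, then cluster the reduced points with DBSCAN."""
    from sklearn.cluster import DBSCAN
    from sklearn.decomposition import PCA

    reduced = PCA(n_components=n_components).fit_transform(embeddings)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(reduced)
    return reduced, labels


def plot_clusters(reduced, labels):
    """Scatter plot of the first two principal components, colored by cluster."""
    import matplotlib.pyplot as plt

    plt.scatter(reduced[:, 0], reduced[:, 1], c=labels, cmap="tab10")
    plt.xlabel("PC 1")
    plt.ylabel("PC 2")
    plt.show()


def top_keywords(token_lists, labels, k=3):
    """Rank tokens per cluster by TF-IDF, treating each cluster as one document."""
    cluster_tokens = defaultdict(list)
    for tokens, label in zip(token_lists, labels):
        if label == -1:
            continue  # DBSCAN marks noise points with -1
        cluster_tokens[label].extend(tokens)

    n_clusters = len(cluster_tokens)
    doc_freq = Counter()
    for tokens in cluster_tokens.values():
        doc_freq.update(set(tokens))

    result = {}
    for label, tokens in cluster_tokens.items():
        counts = Counter(tokens)
        scores = {
            tok: (cnt / len(tokens)) * (math.log((1 + n_clusters) / (1 + doc_freq[tok])) + 1)
            for tok, cnt in counts.items()
        }
        # Sort by score, breaking ties alphabetically so the output is stable.
        result[label] = sorted(scores, key=lambda t: (-scores[t], t))[:k]
    return result
```

DBSCAN is sensitive to eps, so the value usually needs tuning against the scale of the reduced embeddings before the clusters mean much.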

Some of the Discussions

The choice of DBSCAN is mainly because it’s effective in finding different sentence clusters and is beneficial for semantic analysis. However, it can consume more time and computational resources. Meanwhile, using gensim and LDA has advantages in topic interpretability. To compare which method is more suitable, I will try both.
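For the LDA side of that comparison, a minimal gensim sketch might look like this. The number of topics, the number of passes, and the filtering rule (dropping single-character tokens and stopwords) are assumptions chosen for illustration, and filter_tokens and lda_topics are hypothetical helper names.

```python
def filter_tokens(token_lists, stopwords=frozenset(), min_len=2):
    """Drop very short tokens and stopwords before building the LDA corpus."""
    return [
        [tok for tok in tokens if len(tok) >= min_len and tok not in stopwords]
        for tokens in token_lists
    ]


def lda_topics(token_lists, num_topics=5, num_words=8, passes=10, seed=0):
    """Fit an LDA model with gensim and return the top words of each topic."""
    from gensim import corpora
    from gensim.models import LdaModel

    dictionary = corpora.Dictionary(token_lists)
    corpus = [dictionary.doc2bow(tokens) for tokens in token_lists]
    lda = LdaModel(
        corpus=corpus,
        id2word=dictionary,
        num_topics=num_topics,
        passes=passes,
        random_state=seed,  # fixed seed so repeated runs are comparable
    )
    topics = lda.show_topics(num_topics=num_topics, num_words=num_words, formatted=False)
    return {topic_id: [word for word, _ in words] for topic_id, words in topics}
```

Because LDA returns a word list per topic, its output can be read side by side with the TF-IDF keywords from the DBSCAN clusters when judging which method is easier to interpret.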

Challenges & Limitations

The challenge lies in the fact that those conclusion reports were already reorganized by the facilitators and organizing teams, so they may be too broad, which may affect the interpretation and understanding of the results.

Since there is still a final co-creation discussion with the government that hasn’t taken place yet, my experimental plan is still in its early stages.

For me, the real key of this plan is how to lower the barriers for issue advocates to utilize the public conclusion reports. Combining my learnings from PyLadies Taiwan, I’m considering using Google Colab for explanations and teaching, and integrating with Gemini so that advocates without programming backgrounds can also learn how to analyze the data. In addition, the majority of universities in Taiwan use Google for Education, which is one of the reasons I chose Colab.

Moving in this direction, I’m currently balancing the computational resources needed for evaluation and analysis while figuring out how to provide suitable prompts.

You can refer to this outline for Colab.

First, I’ll cover the basic environment and instructions to help users understand how to use Colab.

Next, we’ll discuss how to connect with Gemini. However, I also recognize that some users may not fully grasp how to operate it and may find it challenging. Additionally, the majority of general users in Taiwan use ChatGPT. This may become even more apparent after the integration of Apple Intelligence with ChatGPT on iOS, as Apple is the smartphone brand with the highest market share in Taiwan. Therefore, I will also note that users can copy the prompts into ChatGPT and then update the generated results in Colab.
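One way to support both paths is to build the prompt as plain text first, so it can either be sent to Gemini or pasted into ChatGPT by hand. The sketch below is an assumption about how that could work, not the notebook's actual content: build_topic_prompt and ask_gemini are hypothetical helpers, the prompt wording is illustrative, and the Gemini call assumes the google-generativeai package with an API key and model name supplied by the user.

```python
def build_topic_prompt(keywords_by_cluster, theme):
    """Turn cluster keywords into a prompt that can be pasted into any chat model."""
    lines = [
        "The following keyword groups come from deliberative forum "
        f"conclusion reports on the theme '{theme}'.",
        "For each group, suggest a short topic label and one sentence "
        "on the policy concern it reflects.",
        "",
    ]
    for cluster_id in sorted(keywords_by_cluster):
        keywords = ", ".join(keywords_by_cluster[cluster_id])
        lines.append(f"Group {cluster_id}: {keywords}")
    return "\n".join(lines)


def ask_gemini(prompt, api_key, model_name):
    """Send the prompt to Gemini via google-generativeai (assumed to be installed).

    The caller supplies the model name, since available models change over time.
    """
    import google.generativeai as genai

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel(model_name)
    return model.generate_content(prompt).text
```

Users who prefer ChatGPT can print the result of build_topic_prompt, copy it over, and paste the answer back into the notebook.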

Once users understand how to operate it, things will become much simpler.

I will divide it into several chapters: handling PDFs, word segmentation, and topic modeling—here, I will provide more guidance on integrating Gemini, helping users know how to choose the most suitable methods.

Next, I’ll cover dimensionality reduction, visualization, and keyword extraction.

The sections will be broken down according to the actual steps involved, including my methods, any necessary pre-installation content, and corresponding prompt guidelines.

The Road Ahead

The core of this micro-project is quite simple; it’s essentially about the data processing and text mining that many data scientists do daily. This project is still in the early stages, focusing on preprocessing data and segmenting the publicly available files.


The person on the slide is Albert Camus, who once said, “Each generation doubtless feels called upon to reform the world. Mine knows that it will not reform it, but its task is perhaps even greater. It consists in preventing the world from destroying itself.”


PyLadies is a community where we experience support and empowerment. I hope to extend Python’s reach and our community’s resilience through this platform, empowering more citizens and issue advocates. More importantly, I believe that through the actions of people in our community, we can sow seeds of trust, create connections, and prevent the world from falling apart in these uncertain and ambiguous times.

KC @ PyLadies Con 2024