In today’s fast-paced policy environment, timely and accessible public information isn’t just a nice-to-have—it’s foundational to democratic life. But in practice, policy updates can be scattered, hard to find, or locked behind legacy formats that limit public comprehension and engagement.
As someone who works at the intersection of civic tech, policy design, and public participation, I often ask myself:
What would it look like if citizens could access policy information as easily as checking the weather or tracking a package?
Familiar with policy advocacy and issue-related work, I naturally know where to find policy information. However, some questions are frequently raised by other participants in the various civic discussions I take part in, which has made me start thinking about what can be done: How well are policies in Taiwan communicated to the public?
To explore these questions, I created a “micro project” titled “TW Civic Agent – Policy Information in Taiwan”, now published on Kaggle. This notebook represents my attempt to better understand the landscape of policy information dissemination in Taiwan—through structured data, exploratory analysis, and a civic-minded lens. However, due to time and platform constraints, this is currently just a very basic demo. To officially launch the tool, a complete frontend interface and deployment are still required.
This post walks you through the thinking behind this project, how the tool is structured, and why I believe it could become part of a broader ecosystem of civic innovation.
Policy Communication is Still Fragmented
Taiwan’s digital governance landscape is, in many ways, good. From participatory platforms like Join.gov.tw to agile pandemic responses, it’s clear that public sector innovation is alive.
And yet, the infrastructure for communicating everyday policy remains fragmented.
- Government updates are spread across multiple websites, often without consistent URLs or formats.
- Public announcements are frequently issued as static PDFs or image-based news posts.
- There is no unified public interface for discovering new policies across ministries, sectors, or topics.
This disjointedness makes it hard for educators, civil society actors, or even policy professionals to keep track of evolving public information—let alone the general public.
I didn’t want to start by criticizing this system. I wanted to prototype an alternative.
Build a Civic-First Information Scaffold
So, this is the vision.
Rather than conducting deep analysis or drawing conclusions, I approached this project as an infrastructure-building exercise. The central question became:
Can we create a unified, structured, and extensible way of collecting and organizing policy-related updates in Taiwan—something that others can build upon, analyze, or turn into services?
The focus here is on building capacity, not creating insights. My intention was to offer a public-facing tool that could:
- Collect structured metadata about policy updates
- Standardize titles, categories, publication dates, and source agencies
- Serve as a modular base for further development—be it research, journalism, or public services
What the Notebook Actually Does
However — and this is the reality we always have to face. As I mentioned earlier, due to time constraints, I have only tested some of the features. So, next, I will explain the actual content included in the Kaggle notebook.
The notebook performs several key functions that are modular and extensible by design:
Policy Metadata Extraction
At the very start, I construct an initial, hand-coded dataset using a Python dictionary—not to analyze, but to demonstrate what structured civic metadata should look like.
Each field represents essential metadata about a government-issued post. I included a sample of ten entries to model what future automation might produce, and more importantly, to propose a schema that reflects actual user needs:
- title: what the post is called (e.g. “Public Welfare Policy Update”)
- source_name: which agency published it
- source_link: where to find the post
- date: when it was published
- category: how to loosely classify it (e.g., education, labor)
- type: the form of communication (news, regulation, etc.)
- source_type: format origin (website, open data, etc.)
I’m proposing a practical, human-readable schema that civic technologists, researchers, and citizens can rely on. It’s simple, transparent, and encourages replication.
Enables Data Export for Further Use
I export the table as a CSV to emphasize that the dataset is ready to leave the notebook—for others to reuse, visualize, or integrate.
This positions the notebook not as a closed demonstration, but as the first node in an open, remixable information system.
Simulating System Memory
Before any feature runs, I initialize shared global states:
conversation_history
tracks the dialogue between the user and the assistant.subscriptions
holds mock user preferences (e.g., which policy topics they want to receive updates on).
These mimic how a real conversational platform would store session memory or user preferences—critical for contextual replies and tailored recommendations.
Few-shot Examples
By using examples, the model can better understand the task.
Interactive / Non-Interactive Mode
In the published notebook, the simulation is conducted in a non-interactive mode. However, within the notebook, you can also generate simulated conversations by removing the # in front of the `run chat()`. The result is shown in the images below.


Who This Is For
While I created this as a prototype, I see potential for various actors to benefit from and build on this foundation:
- Civic technologists can build dashboards, alerts, or policy browsers with live updates.
- Journalists can use the database to track patterns in public communication and identify what gets prioritized.
- Educators and advocates can develop civic literacy curricula or campaigns with timely content.
- Government innovation teams can see this as a blueprint for improving internal data workflows and public transparency.
This tool doesn’t claim to solve all information problems—it simply clears a path forward.
What’s Next
While this notebook establishes a basic framework for policy metadata organization, its real potential lies in being a launchpad for building a dynamic civic information infrastructure. Below are the next-phase ideas—both from what I noted in the notebook and what I envision for future contributors.
Automating the Pipeline: From Manual to MLOps
Right now, the dataset is manually curated. To make it scalable and sustainable, we’ll need to shift toward an automated data collection and curation process. This opens the door to applying MLOps practices—the integration of machine learning with DevOps—for intelligent policy data workflows.
Next steps include:
- Building Scrapers or API Connectors for government websites and open data portals
- Training NLP models to classify and tag new policy texts by topic or tone
- Automated deduplication and cleaning to handle variations across sources
- Unit testing and monitoring to ensure pipeline stability
With MLOps orchestration (e.g., via tools like Airflow, Prefect, or GitHub Actions), this project could evolve into a semi-autonomous civic observatory.
Designing a Front-End for Citizens and Researchers
While this notebook is designed for data creators and developers, its real impact depends on public access. That means building a clean, intuitive, multilingual front-end interface for non-technical users.
Potential features:
- Searchable directory of recent policy posts
- Filters by category, agency, or date range
- Weekly policy update subscriptions (via email or Line)
- Natural-language summaries or keyword tagging
Tools like React (or Vue), TailwindCSS, and Next.js would make it possible to create a modern, fast-loading, responsive frontend that feels accessible and empowering to the general public.
Building a Back-End Database and API
To move beyond static CSVs and toward real-time functionality, this project will eventually require a back-end database and public RESTful API or GraphQL endpoint.
Architecture suggestions:
- Use PostgreSQL or MongoDB for structured and semi-structured content
- Create a FastAPI or Django REST Framework back-end to serve metadata
- Schedule ETL pipelines with cron or containerized workflows (e.g., Docker + GitHub Actions)
This would allow third-party developers, researchers, and civic apps to pull structured policy data on demand—expanding the impact far beyond what the original notebook can do.
Deploying as a Live Public Tool
With front-end and back-end components in place, the next step is deployment—making it available to the world. Depending on scale and budget, here are possible pathways:
- Streamlit or Gradio for MVP: Lightweight deployment of an interactive UI for quick demos or hackathons
- Heroku / Render / Vercel: Easy cloud platforms for mid-scale web hosting
- GCP/AWS: Scalable hosting for collaboration with civic institutions
- Authentication Layer: Optional sign-ins or admin portals for content moderation or curation
By containerizing the project (e.g., Docker), we can support reliable, reproducible deployments across environments.
Connecting to Other Civic Datasets
The current notebook focuses on policy announcements, but this same schema could be extended to include or link with:
- Legislative voting records
- Socioeconomic indicators by region or demographic
- Civic feedback tools (e.g., public surveys, Join.gov.tw proposals)
- Researchs results
This opens the door to building multi-layered civic dashboards, tailored for NGOs, journalists, educators, or concerned citizens.
I don’t see this notebook as a finished product. I see it as an open invitation to developer, researcher, civic hacker, public servant, or just general public who interested in this issue.
My Brief Reflections
One of the lessons I’ve learned from this process is that building civic infrastructure often starts with very small, low-glamour steps. But behind these basic components is a deeper philosophy:
That usable public information is a civic right, and that designing for reuse is a civic act.
This project began as a technical sketch—a way of showing that policy data in Taiwan could be organized, structured, and standardized. But what excites me most is its potential to become part of something much larger: an open civic data infrastructure, designed for participation, transparency, and usability.
There’s still much to do. But we now have a foundation to build on. And if others are willing to contribute—designers, developers, linguists, data scientists, or civil servants—I believe we can make civic information in Taiwan radically more open and meaningful.
If this vision resonates with you, feel free to fork the notebook, build a crawler, suggest an API structure, or just get in touch. The full starter code is down below:
Together, we can turn data into dialogue—and infrastructure into empowerment.