Public Services, Powered by AI. How WhatsApp Is Streamlining City Reporting in Cape Town

Using AI and WhatsApp to lower barriers to public service reporting across Cape Town. The prototype enables faster, more inclusive, and more accurate service request logging.

The City of Cape Town processes hundreds of thousands of monthly service requests, covering everything from electricity outages to water leaks and sanitation problems. However, existing service channels all require significant manual effort and are time-consuming. A major constraint is the limited number of human agents available to receive and process these requests promptly. Compounding the issue of agent limitations, data indicates that service requests originating from informal areas are logged less frequently, despite these areas having access to the same channels as formal areas.

Introduction

There is an opportunity to complement existing channels with a faster, more convenient reporting tool. This case study outlines an AI-assisted WhatsApp reporting prototype designed to simplify the process, improve accessibility and reporting speed, enhance data quality, and reduce the burden on frontline teams.

The prototype uses natural language processing and structured categorisation logic to interpret resident reports submitted via text or voice notes. It extracts key details, identifies the service category, and presents a clear summary for the user to verify before submission. The approach aims to make reporting more accessible while supporting faster, more accurate resource allocation within the City.

Context and Problem

The City currently offers multiple reporting channels, including call centres, walk-in facilities, an online portal, and a WhatsApp line dedicated to water and electricity. These remain essential but rely on manual data capture. Residents often struggle to understand and navigate long lists of categories. Staff must spend significant time classifying each request, which creates delays and increases workload.

Advances in AI present an opportunity to simplify this process. With WhatsApp already widely used across Cape Town, introducing an AI-supported flow can make reporting more convenient while improving the quality of information captured at the point of entry.

City Services on WhatsApp – more than just a channel

The prototype provides a single, familiar entry point for service reporting through WhatsApp. Residents describe the issue in their own words, either in writing or through a voice note.

Solution Overview:

The AI assistant:
  • Identifies whether the message is a valid service request
  • Extracts the issue description
  • Assigns the correct service category using predefined City categories

Additional information is collected using static blocks (without the use of AI):
  • Location of the issue (shared via a Google Maps pin)
  • The user’s contact details

Finally, the prototype presents a summary for the resident to confirm.

The system supports English, Afrikaans, and Xhosa and was designed to reduce friction for residents while reducing manual processing for staff.

Theory of Change

The table below outlines the expected activities, outputs, outcomes, and impact of the prototype.

Activities: Build the WhatsApp reporting system using AI, pilot with residents and customer relations staff, and refine alongside service teams.
Outputs: A WhatsApp system that allows residents to easily log reports via voice or text directly from their phone, confirm details, and submit categorised requests.
Outcomes: Easier submission process, less manual logging for staff, and higher quality report information.
Impact: More reports logged and resolved faster, with wider reach across the city.

Design Approach

1. AI Classification and Extraction

AI blocks assess whether the resident’s message relates to a municipal service issue and extract the core problem statement verbatim. This ensures clarity and also serves to screen queries ahead of the categorisation step.

2. AI Categorisation: two approaches

LLM classifier
A dedicated categorisation function matches the issue against the City’s predefined categories. Early tests revealed that category formats and prompt structures significantly influenced accuracy. The final approach uses:
  • A structured .txt file with consistent category formatting
  • A small number of targeted examples
  • A simplified prompt for reliable matching
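
The three ingredients above can be illustrated with a minimal sketch. The City's actual file and prompt are not shown in this case study, so the file layout ("code | name | description"), the category codes, and the function names `parse_categories` and `build_prompt` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Category:
    code: str
    name: str
    description: str


# Hypothetical file layout: one category per line, "code | name | description".
SAMPLE_CATEGORY_FILE = """\
# code | name | description
ELEC-01 | Power outage (single property) | No electricity at one address
ELEC-02 | Power outage (area) | No electricity across a street or suburb
WATR-01 | Water leak | Leaking or burst pipe in a public area
"""


def parse_categories(text: str) -> List[Category]:
    """Parse the category file, skipping blank lines and '#' comments."""
    categories = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        code, name, description = (part.strip() for part in line.split("|"))
        categories.append(Category(code, name, description))
    return categories


def build_prompt(categories: List[Category],
                 examples: List[Tuple[str, str]],
                 issue: str) -> str:
    """Assemble a short prompt: instruction, category list, examples, issue."""
    lines = ["Choose the single best category code for the issue.",
             "Reply with the code only.", "", "Categories:"]
    lines += [f"{c.code}: {c.name} ({c.description})" for c in categories]
    lines += ["", "Examples:"]
    lines += [f"Issue: {text} -> {code}" for text, code in examples]
    lines += ["", f"Issue: {issue} ->"]
    return "\n".join(lines)


categories = parse_categories(SAMPLE_CATEGORY_FILE)
prompt = build_prompt(
    categories,
    examples=[("The whole street has no lights", "ELEC-02")],
    issue="Water is pouring out of a pipe on the pavement",
)
print(prompt)
```

Keeping every category on one line in the same shape means the prompt can be regenerated whenever the category list changes, without hand-editing the prompt text.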

Clarification agent
Instead of a single LLM classifier, we also experimented with an AI agent. The agent had access to the same category list, but its job was to disambiguate a user’s service request when needed. For instance, if a user said that their electricity was out, the agent assessed whether the outage affected only their property or the entire street, which are two distinct service request types. With the LLM classifier, a user has to inspect the summary and start over if the assigned category is wrong. With the agent, disambiguation happens before a category is assigned. Both approaches are promising.
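
The disambiguate-before-assigning idea can be sketched as follows. In the prototype this role is played by an AI agent; here a small rule table stands in for it so the example runs on its own. The `CLARIFICATIONS` table, the category names, and the `decide` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Decision:
    category: Optional[str] = None   # set once the request is unambiguous
    question: Optional[str] = None   # set when a clarifying question is needed


# Hypothetical disambiguation table: issue type -> question and answer mapping.
CLARIFICATIONS: Dict[str, dict] = {
    "electricity_outage": {
        "question": "Is the power out only at your property, or in the whole street?",
        "answers": {
            "property": "Power outage (single property)",
            "street": "Power outage (area)",
        },
    },
}


def decide(issue_type: str, default_category: str,
           answer: Optional[str] = None) -> Decision:
    """Ask a clarifying question before assigning an ambiguous category."""
    rule = CLARIFICATIONS.get(issue_type)
    if rule is None:
        return Decision(category=default_category)   # nothing to disambiguate
    if answer is None:
        return Decision(question=rule["question"])   # ask before categorising
    category = rule["answers"].get(answer.strip().lower())
    if category is None:
        return Decision(question=rule["question"])   # unrecognised reply: ask again
    return Decision(category=category)
```

A first call without an `answer` returns the question to send to the resident; a second call with their reply returns the category.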

3. Multilingual Support

Testing highlighted the need for clearer multilingual handling. The final version supports voice and text input and output in English, Afrikaans, and Xhosa, with improved examples and prompt instructions.

4. Voice and Text Flexibility

The flow accommodates both text and voice notes, acknowledging that many residents prefer speaking rather than typing. It also handles switching between input types.
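
One way to handle switching between input types is to normalise every inbound message to text before any AI step runs. The sketch below assumes a simplified message dictionary and a `transcribe` hook; both are hypothetical and do not reflect the actual WhatsApp or Turn.io message format.

```python
from typing import Callable, Dict

Transcriber = Callable[[bytes], str]  # hypothetical speech-to-text hook


def message_to_text(message: Dict, transcribe: Transcriber) -> str:
    """Normalise an inbound message to text, whatever its input type."""
    kind = message.get("type")
    if kind == "text":
        return message["body"].strip()
    if kind == "voice":
        return transcribe(message["audio"]).strip()
    raise ValueError(f"Unsupported message type: {kind!r}")
```

Because each message is normalised on its own, a resident can start with a voice note and continue in text without the later steps needing to know.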

5. User Confirmation

Before submission, residents receive a summary of their report. This step ensures accuracy and builds trust in the system.

6. Integration with the City system

On submission, the key details are sent to the City system via APIs, and a reference number is shared with the resident, who can use it to follow up on their report.
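
As an illustration only, the hand-off might look like the sketch below. The City's API is not described in this case study, so the field names, the JSON shape, and both function names are assumptions; no network call is made.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ServiceRequest:
    # All field names are hypothetical, chosen to mirror the details
    # the flow collects: category, description, location pin, contact.
    category: str
    description: str
    latitude: float
    longitude: float
    contact_number: str
    language: str


def to_payload(request: ServiceRequest) -> str:
    """Serialise the confirmed report as JSON for a hypothetical City endpoint."""
    return json.dumps(asdict(request), ensure_ascii=False, sort_keys=True)


def confirmation_message(reference: str) -> str:
    """Message sent to the resident once a reference number is returned."""
    return (f"Your report has been logged. Reference number: {reference}. "
            "Use this number to follow up on your report.")
```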

Development Iterations

Early versions attempted to extract all information using a single prompt referencing a PDF category list. This caused hallucinations, incorrect classifications, and repeated requests for details. The team then tested several variations, including long “all-in-one” prompts and shorter prompts.

The final approach used smaller, focused AI components:
  • One block to validate the input
  • One block dedicated to classification
  • One block to extract the issue description
This modular structure increased accuracy, reduced errors, and improved transparency.
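
A minimal sketch of this three-block structure, assuming a generic `llm` callable (prompt in, reply out) rather than any particular model API. The prompt wording, the `Report` type, and the rule that off-list categories are rejected are illustrative assumptions, not the team's actual prompts.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

LLM = Callable[[str], str]  # hypothetical interface: prompt in, reply text out


@dataclass
class Report:
    valid: bool
    category: Optional[str] = None
    description: Optional[str] = None


def validate(llm: LLM, message: str) -> bool:
    """Block 1: is this a municipal service request at all?"""
    reply = llm("VALIDATE\nIs this a municipal service request? "
                "Answer YES or NO.\n\n" + message)
    return reply.strip().upper().startswith("YES")


def classify(llm: LLM, message: str, categories: List[str]) -> Optional[str]:
    """Block 2: pick one category; any reply not on the list is rejected."""
    reply = llm("CLASSIFY\nReply with exactly one category from this list:\n"
                + "\n".join(categories) + "\n\nIssue: " + message).strip()
    return reply if reply in categories else None


def extract(llm: LLM, message: str) -> str:
    """Block 3: pull out the issue description without rewording it."""
    return llm("EXTRACT\nQuote the issue description from this message, "
               "unchanged:\n\n" + message).strip()


def process(llm: LLM, message: str, categories: List[str]) -> Report:
    """Run the blocks in order, stopping early on invalid input."""
    if not validate(llm, message):
        return Report(valid=False)
    return Report(True, classify(llm, message, categories), extract(llm, message))
```

Because each block has one job and one prompt, a failure can be traced to a single step and that prompt adjusted without touching the others.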

Prompt Experimentation

The team experimented with more than 15 different prompt structures to improve the accuracy of the AI assistant’s responses. At first, the assistant was instructed to extract the issue category, description, and location within a single message.
This approach resulted in several challenges, such as:
  • Hallucinations
  • Requesting information that had already been provided by the user
  • Misclassification of categories
The team also tested using one assistant for multiple languages. However, the assistant struggled to recognise local languages and frequently labelled valid user input as invalid.
Additional experiments included:
  • Instructing the assistant to summarise long user messages before classifying them, for example: “If the user described the issue in natural language (e.g., 'Someone dumped bricks on the pavement'), extract or summarize it clearly (e.g., 'Building materials dumped on pavement').”
  • Testing different tones of voice, starting with a friendly, conversational tone and later shifting to a more concise one
  • Adding multiple examples of issue classifications directly into the prompt to improve accuracy
The team tested each prompt against a range of issue reports with different levels of detail to assess how the AI assistant responded. Ultimately, the team decided to separate the prompts: the assistant is first instructed to categorise the reported issue and then, in a separate step, to extract the user’s report without altering it.

Evaluation Approach


Testing With Synthetic Data
To evaluate and refine the categorisation logic, the team developed a synthetic data testing process. This avoided trial and error and provided a systematic view of how well the prompts performed across all service categories.
The process included:
  • Generating synthetic complaints for each category in different tones
  • Running these complaints through the categorisation prompts
  • Comparing intended and predicted categories
  • Manually reviewing errors to understand patterns
This revealed recurring misclassifications, including confusion between departments (e.g., noise complaints assigned to the wrong units) and broad categorisation of infrastructure issues. Insights from this process guided prompt redesign and conversations with the Customer Relations Management team.
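
The comparison step in this process can be sketched as a small evaluation harness. The synthetic complaints, category names, and the `keyword_classifier` stand-in below are hypothetical; in practice `classify` would wrap the categorisation prompt being tested.

```python
from collections import Counter
from typing import Callable, List, Tuple


def evaluate(cases: List[Tuple[str, str]],
             classify: Callable[[str], str]) -> Tuple[float, Counter]:
    """Return accuracy and a count of (intended, predicted) mismatches."""
    confusions: Counter = Counter()
    correct = 0
    for text, intended in cases:
        predicted = classify(text)
        if predicted == intended:
            correct += 1
        else:
            confusions[(intended, predicted)] += 1
    accuracy = correct / len(cases) if cases else 0.0
    return accuracy, confusions


# Hypothetical synthetic complaints, each labelled with its intended category.
SYNTHETIC_CASES = [
    ("Water is gushing from a burst pipe on our road", "Water leak"),
    ("pipe leaking for days, nobody cares!!", "Water leak"),
    ("Loud music from the bar next door every night", "Noise complaint"),
    ("Please help, the street lights are all off", "Street lighting"),
]


def keyword_classifier(text: str) -> str:
    """Stand-in for the real categorisation prompt, used only for this demo."""
    lowered = text.lower()
    if "pipe" in lowered:
        return "Water leak"
    if "light" in lowered:
        return "Street lighting"
    return "General enquiry"


accuracy, confusions = evaluate(SYNTHETIC_CASES, keyword_classifier)
for (intended, predicted), count in confusions.most_common():
    print(f"{intended} -> {predicted}: {count}")
```

Sorting the mismatched pairs by frequency points the manual review at the most common confusions first.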

A similar approach to testing will be used for further iterations.

Challenges

Hallucinations
The assistant occasionally generated incorrect categories or failed to use the reference file.

Missing or repeated questions
The assistant sometimes requested information that had already been provided.

Image classification loops
Image recognition is a feature we’re hoping to integrate for future iterations.

Location
Finding locations and addresses via the WhatsApp location function proved difficult.

Multilingual performance
Model performance was weaker for Afrikaans and Xhosa, which have limited training data. Some valid inputs were incorrectly flagged as incomplete, and smaller models struggled to process these languages reliably. Using a larger model such as GPT-5 improved accuracy, though issues related to dialects, code-switching, and transcription quality still required careful testing and added cost.

Key Learnings

01.

Simplicity improves accuracy

Structured, concise prompts and a clean category file outperformed longer, complex approaches.

02.

Small, specialised prompts work best

Breaking tasks into smaller components reduced errors and improved control.

03.

Iterative testing is essential

Prompt behaviour changed with small adjustments, requiring ongoing refinement.

04.

Synthetic data strengthens reliability

Controlled testing identified weaknesses not visible through manual testing alone.

05.

Multilingual design must be intentional

Clear examples and tailored prompts improved recognition across languages.

06.

Voice is an important channel

Many residents rely on voice notes. Designing for both formats is necessary for inclusion.

07.

Shorter journeys improve completion

Simplified flows helped test groups reach the core value quickly.

08.

Quick prototypes improve alignment

09.

Human Capital Requirements

Establish a core AI team for prompt design and testing, supported by teams for CRM, category, and workflow alignment.

Conclusion:

Learnings, Impact, and the Open Playbook
The AI-powered service request logging prototype demonstrates how AI and WhatsApp can lower barriers to reporting and support faster, more accurate routing of service issues within a city. The approach aims to improve data quality, reduce manual processing, and highlight practical methods for designing AI-assisted public services.

Iterative prompt design, multilingual support, and synthetic evaluation played a critical role in developing a reliable model. The methodology is adaptable and can be applied by other cities aiming to strengthen public participation and modernise service reporting through accessible digital channels.

Playbook Journey

A public repository containing reusable flows & journeys, AI prompts, and UX documentation will be published at the conclusion of the project.