The question most GIS developers ask is: Is my data secure?
The better question is: If someone had full access to everything I've built, what would they find?
"Assume Breach" is a security framework that starts from the premise that a determined attacker will eventually get in. The question isn't whether your perimeter holds — it's what they find when it doesn't. Applied to GIS development, it changes how you design tools, where you put data, and what you ever allow yourself to touch during development.
Why GIS Data Is Different
Most software has sensitive data. GIS data is sensitive data with a location attached.
A donor record is sensitive. A donor record that shows where the donor lives, where they work, and the route they drive — that's a different threat surface entirely. Shelter intake records, damage assessment data, volunteer home addresses, patient geocodes from blood drives — GIS workflows routinely handle data that combines PII with precise location in ways that amplify exposure if breached.
This isn't hypothetical. Emergency management organizations, humanitarian groups, and government agencies operate GIS systems that, if compromised, could expose the locations of people in the most vulnerable moments of their lives.
The standard security advice — encrypt at rest, use OAuth, don't hardcode credentials — is all correct. It's also insufficient as a mindset. You need to design as if the encryption fails. As if the token gets stolen. As if someone is already inside.
Schema First, Data Never
The most practical application of Assume Breach in GIS development: never work with real data during development.
This sounds obvious. It almost never happens in practice.
The typical pattern is: a developer gets access to a live feature layer, queries it to see the fields, starts building the tool against it, and before long they're looking at real records in a localhost browser tab, pasting field values into test prompts, and logging attribute objects to the console. None of it leaves the machine. All of it creates risk.
The alternative: work with the schema only.
A schema is the structure of the data — field names, field types, domains, and value ranges — with no actual records attached. Everything you need to build a functional tool is in the schema. Everything you need to test a renderer, a popup, a filter, or a query is derivable from the schema. You don't need a real donor to test whether a dashboard renders a bar chart correctly.
What this looks like in practice:
- Request the schema from the layer's REST endpoint (/FeatureServer/0?f=json) — this returns field definitions without records
- Build synthetic data that matches the schema exactly — same field names, same types, plausible value ranges
- Develop and test entirely against synthetic data
- Only point the tool at the live layer for final integration testing, in an environment where that access is logged and appropriate
// Request schema only — no records, no PII
const schemaUrl = "https://services.arcgis.com/[org]/arcgis/rest/services/[layer]/FeatureServer/0?f=json";
// Response gives you field definitions like:
// { name: "DONOR_ID", type: "esriFieldTypeInteger" }
// { name: "GIFT_DATE", type: "esriFieldTypeDate" }
// { name: "GIFT_AMOUNT", type: "esriFieldTypeDouble" }
// Build synthetic records that match:
const syntheticFeatures = [
  { attributes: { DONOR_ID: 1001, GIFT_DATE: 1706745600000, GIFT_AMOUNT: 500 } },
  { attributes: { DONOR_ID: 1002, GIFT_DATE: 1709251200000, GIFT_AMOUNT: 1200 } }
];
// Build and test against synthetic data.
// The tool never sees a real record until it's deployed.
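The synthetic records don't have to be hand-written. One approach is to generate them mechanically from the field list itself; a minimal sketch, where the generation rules and placeholder values are illustrative choices, not anything prescribed by the ArcGIS API:

```javascript
// Sketch: generate synthetic records from a schema's field definitions.
// Field names and value ranges here are hypothetical — adapt to your layer.
function makeSyntheticFeatures(fields, count) {
  const features = [];
  for (let i = 0; i < count; i++) {
    const attributes = {};
    for (const field of fields) {
      switch (field.type) {
        case "esriFieldTypeInteger":
          attributes[field.name] = 1000 + i; // sequential fake IDs
          break;
        case "esriFieldTypeDouble":
          attributes[field.name] = Math.round(Math.random() * 5000 * 100) / 100;
          break;
        case "esriFieldTypeDate":
          // AGOL stores dates as epoch milliseconds
          attributes[field.name] = Date.UTC(2024, 0, 1 + i);
          break;
        default:
          // strings and anything else get an obviously-synthetic marker
          attributes[field.name] = `SYNTH_${field.name}_${i}`;
      }
    }
    features.push({ attributes });
  }
  return features;
}

// Field list as returned by the schema request shown above
const fields = [
  { name: "DONOR_ID", type: "esriFieldTypeInteger" },
  { name: "GIFT_DATE", type: "esriFieldTypeDate" },
  { name: "GIFT_AMOUNT", type: "esriFieldTypeDouble" }
];
const synthetic = makeSyntheticFeatures(fields, 50);
```

Because the generator reads the schema rather than any records, it can never leak a real value — the worst-case breach of your dev environment exposes made-up donors.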
The field names are not sensitive. The data those fields hold is. Keep them separate in your development process, and you eliminate an entire category of accidental exposure.
What Travels, What Stays
Assume Breach forces a specific question for every piece of data in your system: If this were intercepted, what would an attacker have?
The answer determines where it lives.
Stays in the org: Any layer containing PII, location data tied to individuals, operational data about ongoing incidents, or data that is sensitive in combination even if not individually.
Can travel: Aggregated data, public datasets, schema definitions, synthetic test data, summary statistics with identifiers removed.
The practical rule: if you have to think about whether something is okay to put in a GitHub repo, it's not okay. Your threat model should be GitHub-visible code that connects to private services — not code that embeds or proxies the data itself.
This shapes the architecture of the tools you build. A donor analytics dashboard doesn't query donor records from the browser — it calls a server function that queries, filters, and returns only the aggregated result. The browser never sees individual records. If someone intercepts the browser traffic, they get totals. Not names.
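The aggregation step can be sketched as a pure function that runs in the server-side layer (record and field names here are hypothetical); only the summary object it returns would ever be sent to the browser:

```javascript
// Sketch: runs inside a server function, never in the browser.
// It receives the queried records and returns only aggregates —
// no DONOR_ID, no names, no geometry survive past this point.
function aggregateGifts(records) {
  const total = records.reduce((sum, r) => sum + r.GIFT_AMOUNT, 0);
  const count = records.length;
  return {
    count,
    totalAmount: total,
    avgAmount: count ? total / count : 0
  };
}

// The server endpoint would respond with only this object:
// { count: 2, totalAmount: 1700, avgAmount: 850 }
```

The design choice is the boundary, not the math: by making the aggregate the only shape that crosses the server/browser line, intercepted traffic yields totals and nothing else.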
AGOL-Specific Rules
ArcGIS Online has its own surface area. A few rules that follow directly from Assume Breach:
Group-share, never org-share, for sensitive layers. Org-sharing makes a layer visible to every user in the org. Group-sharing limits it to a defined list. The default should be group-share, and the group should be small.
Never put sensitive layers in public web maps or apps. It's easy to accidentally embed a private layer in a public Experience Builder app. The layer won't load without credentials — but the URL is visible in the page source. URL discovery is a real attack vector.
Service account tokens expire for a reason. Non-expiring tokens are convenient and dangerous. If you're using one, it's probably already in too many places. Audit where it is.
Check sharing before you build. Before you embed a layer in anything, verify its sharing settings. Don't assume. The item's portal endpoint (/sharing/rest/content/items/{id}?f=json) reports its access level.
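That check can be made explicit in code. This sketch assumes the "access" field returned by the portal's item endpoint (/sharing/rest/content/items/{id}?f=json); the item objects shown are hypothetical fragments of that response:

```javascript
// Sketch: classify an item's sharing level before embedding it anywhere.
// "access" is one of: "private", "shared", "org", "public".
function sharingCheck(itemInfo) {
  return {
    safeForPublicApp: itemInfo.access === "public",
    orgShared: itemInfo.access === "org"
  };
}

// Hypothetical item JSON fragments:
const privateLayer = { id: "abc123", access: "private" };
const publicBasemap = { id: "def456", access: "public" };

// Gate the embed step on the check rather than on memory:
// if (!sharingCheck(itemInfo).safeForPublicApp) refuse to add it to a public app.
```

Making the check a function call, rather than a thing you remember to do, is the point — the URL-in-page-source leak described above only happens when a human assumes instead of verifying.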
The AI Complication
AI integrations introduce a new vector that most GIS security frameworks haven't caught up with.
When you use a language model to reason about map data — clicking a feature, sending its attributes to an API — the data leaves your environment. If those attributes include PII, you've just sent PII to a third party. Even if the provider anonymizes or discards the data on its end, that doesn't change what traveled over the wire.
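A complementary safeguard is an explicit allowlist, so that even in production only pre-approved fields can leave the browser. A minimal sketch, with hypothetical field names:

```javascript
// Sketch: only fields on this explicit allowlist may be sent to any
// external API. Everything else — IDs, names, addresses — is dropped.
// Field names are hypothetical.
const SAFE_FIELDS = ["GIFT_AMOUNT", "GIFT_DATE", "REGION"];

function redactAttributes(attributes) {
  return Object.fromEntries(
    Object.entries(attributes).filter(([key]) => SAFE_FIELDS.includes(key))
  );
}
```

An allowlist fails closed: a new PII field added to the layer next year is excluded by default, whereas a blocklist would silently let it through.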
The Assume Breach answer: develop with synthetic data, demo with synthetic data, label it clearly, and treat production AI integration as an IT approval conversation.
For internal use cases — tools that never leave your network — a local model via Ollama eliminates the data-leaving-the-building problem entirely. Same capability, zero external transmission.
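A sketch of that pattern, assuming Ollama's default local endpoint (http://localhost:11434/api/generate) and a hypothetical model name; the request never crosses the network boundary:

```javascript
// Sketch: send feature attributes to a locally running Ollama model.
// Model name "llama3" is a placeholder — use whatever model you've pulled.
function buildOllamaRequest(attributes, question) {
  return {
    model: "llama3",
    prompt: `Feature attributes: ${JSON.stringify(attributes)}\n\nQuestion: ${question}`,
    stream: false // return one complete response instead of a token stream
  };
}

// Everything below stays on localhost — nothing leaves the machine.
async function askLocal(attributes, question) {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    body: JSON.stringify(buildOllamaRequest(attributes, question))
  });
  const data = await res.json();
  return data.response;
}
```

Even here, pairing the local model with an attribute allowlist is cheap insurance — it keeps the tool safe if someone later swaps the localhost URL for a hosted endpoint.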
The chain is: schema first, synthetic data always, local model where possible, external API only after explicit approval.
Assume Breach isn't paranoia. It's the recognition that the question is not if — it's when, and what's exposed when it happens. Build with that assumption, and the answer to the second question becomes: not much.
Work with schemas and field names. Never with data.