If you've seen anvaka.github.io before, you probably stopped at the visuals. Every road in Tokyo rendered at once. The entire GitHub ecosystem laid out like a galaxy. A million particles flowing through vector fields in real time. It's easy to spend twenty minutes clicking and come away thinking "cool, but not relevant to my work."
That's the wrong read. Andrei Kashcha — the developer behind these tools — has spent years solving exactly the problems that show up when you try to visualize large geographic and network datasets in the browser. The visuals are the output. The techniques underneath are what matters.
The Projects
city-roads — Every road in any city rendered from live OpenStreetMap data. Search any city and the full road network appears. Tokyo loads 1.4 million road segments.
map-of-github — 690,000 GitHub repositories laid out as an interactive map. Similar projects cluster together. Uses MapLibre for tile-based rendering.
fieldplay — Up to one million particles simulating flow through a vector field in real time. Entirely GPU-computed — the CPU never touches particle positions after initialization.
Code Galaxies (pm) — npm, RubyGems, and Go packages visualized as force-directed star clusters. Zoom in and individual package names become readable.
What's Actually Happening Under the Hood
Force-directed layout at scale
The core engine behind map-of-github and the Code Galaxies is ngraph.forcelayout — Kashcha's physics simulation library. Force-directed layout means nodes push and pull each other until they settle into stable positions: connected nodes attract, all nodes repel each other, and the system runs until it reaches equilibrium.
The problem: doing this naively requires comparing every node to every other node — O(n²) complexity. With 690,000 nodes that's 476 billion comparisons per frame. It doesn't run.
The solution is the Barnes-Hut approximation. The algorithm builds a quadtree over all nodes, then approximates the repulsive force from a distant cluster as a single interaction rather than computing each pairwise force individually. This drops the complexity to O(n log n) — fast enough to compute layouts for massive graphs offline, then ship the result as a binary file for browser rendering.
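The trick is easy to sketch outside of ngraph. Below is an illustrative Python version (not Kashcha's implementation): a quadtree tracks each cell's point count and center of mass, and any cell that looks small from the query point (size divided by distance below a threshold θ) is treated as one body instead of being descended into.

```python
import math
import random

class Quad:
    """A quadtree cell covering the square [x, x+size] x [y, y+size]."""
    def __init__(self, x, y, size):
        self.x, self.y, self.size = x, y, size
        self.mass = 0            # number of points inside this cell
        self.cx = self.cy = 0.0  # center of mass of those points
        self.point = None        # a leaf holds exactly one point
        self.children = None     # an internal cell holds 4 sub-quads

    def insert(self, px, py):
        # Update the center of mass incrementally.
        self.cx = (self.cx * self.mass + px) / (self.mass + 1)
        self.cy = (self.cy * self.mass + py) / (self.mass + 1)
        self.mass += 1
        if self.mass == 1:                 # first point: store as a leaf
            self.point = (px, py)
            return
        if self.children is None:          # second point: split the leaf
            h = self.size / 2
            self.children = [Quad(self.x,     self.y,     h),
                             Quad(self.x + h, self.y,     h),
                             Quad(self.x,     self.y + h, h),
                             Quad(self.x + h, self.y + h, h)]
            old = self.point
            self.point = None
            self._push_down(*old)
        self._push_down(px, py)

    def _push_down(self, px, py):
        h = self.size / 2
        i = (1 if px >= self.x + h else 0) + (2 if py >= self.y + h else 0)
        self.children[i].insert(px, py)

def repulsion(quad, px, py, theta=0.8):
    """Approximate net repulsive force on (px, py) from every point in quad."""
    if quad.mass == 0 or quad.point == (px, py):
        return 0.0, 0.0
    dx, dy = px - quad.cx, py - quad.cy
    dist = math.hypot(dx, dy) or 1e-9
    # Far-away cells (size/dist < theta) act as one body at their center of mass.
    if quad.children is None or quad.size / dist < theta:
        f = quad.mass / (dist * dist)      # Coulomb-style 1/r^2 repulsion
        return f * dx / dist, f * dy / dist
    fx = fy = 0.0
    for child in quad.children:
        cfx, cfy = repulsion(child, px, py, theta)
        fx += cfx
        fy += cfy
    return fx, fy

random.seed(1)
points = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(500)]
root = Quad(0, 0, 100)
for x, y in points:
    root.insert(x, y)
fx, fy = repulsion(root, 50, 50)
```

Setting `theta` to 0 disables the shortcut entirely and recovers the exact all-pairs sum, which is a handy way to measure how much error a given θ introduces before trusting it at scale.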
This is directly relevant to anyone trying to visualize network data geographically — transportation systems, utility grids, shelter routing. The layout problem is the same problem.
The map-of-github pipeline is a GIS workflow in disguise
The process behind map-of-github maps almost exactly onto a GIS data pipeline:
| Step | What Kashcha did | GIS equivalent |
|---|---|---|
| Data collection | BigQuery analysis of 350M+ GitHub stars | Feature Layer query from AGOL |
| Similarity | Jaccard similarity between repos | Spatial join, proximity analysis |
| Clustering | Leiden algorithm → 1,500 communities | K-means or density clustering |
| Layout | ngraph.forcelayout per cluster, then global | Coordinate assignment, hierarchical rendering |
| Tiling | GeoJSON → Tippecanoe → vector tiles | Feature Layer → vector tile layer |
| Rendering | MapLibre GL JS | MapLibre, Deck.gl, ArcGIS JS SDK |
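The similarity step in the table is the easiest to make concrete. If each repository is represented by the set of users who starred it, Jaccard similarity is just intersection over union. A minimal sketch (the repository names and star sets below are invented):

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: |A ∩ B| / |A ∪ B|, ranging over [0, 1]."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical star data: each repo maps to the users who starred it.
stars = {
    "maplibre/maplibre-gl-js": {"u1", "u2", "u3", "u4"},
    "mapbox/mapbox-gl-js":     {"u2", "u3", "u4", "u5"},
    "torvalds/linux":          {"u6", "u7"},
}

print(jaccard(stars["maplibre/maplibre-gl-js"],
              stars["mapbox/mapbox-gl-js"]))  # → 0.6
```

Repos starred by the same people score high and end up near each other in the layout; repos with disjoint audiences score zero and drift apart.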
The final rendering step uses MapLibre, the same library that anchors the open-source mapping stack. The graph-as-map approach converts node positions into GeoJSON, runs the result through Tippecanoe to generate vector tiles, then serves the tiles through MapLibre exactly as you'd serve any geographic dataset. The rendering layer doesn't know or care whether the data came from a GPS device or a physics simulation.
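That conversion is almost embarrassingly small: layout coordinates are simply reinterpreted as longitude and latitude. A sketch under assumed inputs (the node names and positions are made up; a real pipeline would also clamp coordinates into valid lon/lat ranges):

```python
def nodes_to_geojson(positions: dict) -> dict:
    """Wrap force-layout positions as a GeoJSON FeatureCollection of Points.
    Each (x, y) layout coordinate is reinterpreted as [lon, lat]."""
    return {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "geometry": {"type": "Point", "coordinates": [x, y]},
                "properties": {"name": name},
            }
            for name, (x, y) in positions.items()
        ],
    }

# Hypothetical layout output: repo name -> (x, y) from the force simulation.
layout = {"repoA": (12.5, -3.2), "repoB": (-40.0, 7.7)}
geojson = nodes_to_geojson(layout)

# From here the file would flow into tiling, e.g.:
#   json.dump(geojson, open("nodes.geojson", "w"))
#   tippecanoe -o nodes.mbtiles nodes.geojson
```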
GPU-first particle simulation in fieldplay
Fieldplay achieves one million animated particles at 60 frames per second by moving the computation entirely to the GPU. The particle positions are stored in a texture, not a JavaScript array. Each frame, a shader reads the current texture, computes new positions using fourth-order Runge-Kutta integration (more accurate than simple Euler stepping), and writes the result to a second texture. Then the textures swap. The CPU never touches particle positions after initialization.
This is the same architecture behind Deck.gl's performance characteristics. The insight — that GPU memory and GPU compute can handle what JavaScript arrays and CPU loops cannot — is the foundation of everything worth building in browser-based data visualization today.
Send your data to the GPU once. Let the GPU handle everything from there. Whether you're rendering 1.4 million road segments, 690,000 repository nodes, or a million particles — the bottleneck is never the GPU. It's always the cost of moving data back and forth between JavaScript and the GPU.
What This Means for GIS Work
Your data pipelines are graph pipelines in disguise. Shelter locations connected to origin counties. Feeding sites connected to disaster zones. Call data routed by county. All of these are graphs — nodes and edges — and the techniques Kashcha developed for visualizing software ecosystems apply directly. The ngraph library is open-source, composable, and dependency-free.
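Concretely, each of those datasets reduces to an edge list. A minimal sketch with invented shelter and county names, using a plain adjacency list (the same nodes-and-edges structure a force-layout engine consumes):

```python
# Hypothetical relief data modeled as a graph: shelters and counties are
# nodes, "serves" relationships are edges.
edges = [
    ("Shelter-12", "Harris County"),
    ("Shelter-12", "Fort Bend County"),
    ("Shelter-07", "Harris County"),
]

# Build an undirected adjacency list from the edge list.
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

print(sorted(graph["Harris County"]))  # → ['Shelter-07', 'Shelter-12']
```

Once the data is in this shape, layout, clustering, and tiling are the generic steps described above, regardless of what the nodes represent.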
The vector tile workflow is universal. The path from GeoJSON to MapLibre tiles via Tippecanoe works the same whether the data came from a real geographic coordinate or a force-directed layout. If you're building tools that need to handle feature layers at scale without requiring a full ArcGIS enterprise backend, this pipeline is worth understanding.
fieldplay is a mental model, not just a demo. The particle field approach — defining behavior as a mathematical field, letting the GPU compute the rest — maps onto wind visualization, flood flow animation, evacuation route density. The visual language it uses is exactly right for disaster response data. The code is open source.
Kashcha's work lives at the intersection of graph theory, physics simulation, and browser rendering performance. For anyone building GIS tools with the ArcGIS SDK plus an ambition to push beyond what it ships with — this is worth more than a twenty-minute visit.
All projects: anvaka.github.io · Source: github.com/anvaka · ngraph: github.com/anvaka/ngraph