What This Is
Knowledge Graph RAG applied to structured federal geospatial data at national scale.
Four federal datasets — FEMA NRI, CDC SVI, United Way ALICE, FEMA Declarations —
joined on county FIPS and converted to 3,232 prose documents via a Python pipeline.
LightRAG (EMNLP 2025) ingested all documents, extracted entities and relationships,
and built the knowledge graph you see above.
The result: a hybrid retrieval system that answers relational spatial queries
that no SQL join, no vector search, and no standard RAG pipeline can handle alone.
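The conversion from a joined county row to a prose document can be pictured with a minimal sketch. The field names, the sentence template, and the sample values below are all hypothetical placeholders, not the project's actual schema or data; the real documents run to roughly 350 words each.

```python
from dataclasses import dataclass


@dataclass
class CountyRecord:
    """One joined row per county. All field names are hypothetical."""
    fips: str              # 5-digit county FIPS, kept as a string
    name: str
    state: str
    hurricane_risk: str    # assumed NRI-style rating label
    svi_percentile: float  # assumed 0-1 social vulnerability percentile
    poverty_rate: float    # assumed poverty rate in percent
    declarations: int      # assumed count of federal disaster declarations


def render_document(rec: CountyRecord) -> str:
    """Turn one structured record into a short prose paragraph."""
    if rec.declarations:
        history = f"has received {rec.declarations} federal disaster declarations"
    else:
        history = "has never received a federal disaster declaration"
    return (
        f"{rec.name}, {rec.state} (FIPS {rec.fips}) has a {rec.hurricane_risk} "
        f"hurricane risk rating. Its social vulnerability percentile is "
        f"{rec.svi_percentile:.2f} and its poverty rate is {rec.poverty_rate:.1f}%. "
        f"The county {history}."
    )


# Placeholder county with made-up values, for illustration only.
example = CountyRecord("99999", "Example County", "Example State",
                       "Very High", 0.91, 23.0, 12)
doc = render_document(example)
print(doc)
```

Writing the numbers into full sentences gives the entity extractor named things to link (a county, a risk rating, a declaration history) rather than bare columns.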
Why Knowledge Graph RAG, Not Standard RAG
Standard vector RAG retrieves chunks by embedding similarity. It finds counties
whose profiles sound like your query. It cannot follow entity relationships across documents.
LightRAG’s hybrid mode traverses the knowledge graph:
Orleans Parish → HAS_PATTERN → COMPOUND_VULNERABILITY → SHARED_BY → [211 counties]
The graph connects hurricane risk scores, SVI percentiles, poverty rates, and
declaration history as first-class entities — not buried in document chunks.
That is why “counties like pre-Katrina New Orleans” returns 211 structural
analogs with cited evidence — not a ranked list of similar-sounding documents.
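The traversal path shown above can be illustrated with a toy in-memory graph. This is only a sketch of following typed edges across documents, assuming a simple triple store; it is not LightRAG's internal storage or query code, and the county names are placeholders.

```python
from collections import defaultdict

# (subject, relation, object) triples. Toy data, not the real graph.
TRIPLES = [
    ("Orleans Parish", "HAS_PATTERN", "COMPOUND_VULNERABILITY"),
    ("COMPOUND_VULNERABILITY", "SHARED_BY", "Orleans Parish"),
    ("COMPOUND_VULNERABILITY", "SHARED_BY", "County A"),
    ("COMPOUND_VULNERABILITY", "SHARED_BY", "County B"),
    ("County C", "HAS_PATTERN", "NEVER_DECLARED"),
]


def build_index(triples):
    """Index objects by (subject, relation) for constant-time edge lookup."""
    index = defaultdict(list)
    for subj, rel, obj in triples:
        index[(subj, rel)].append(obj)
    return index


def follow(index, start, relations):
    """Follow a fixed sequence of relation types outward from one entity."""
    frontier = {start}
    for rel in relations:
        frontier = {obj for node in frontier for obj in index.get((node, rel), [])}
    return frontier


index = build_index(TRIPLES)
analogs = follow(index, "Orleans Parish", ["HAS_PATTERN", "SHARED_BY"])
analogs -= {"Orleans Parish"}  # drop the starting county itself
print(sorted(analogs))  # ['County A', 'County B']
```

County C never appears in the result because it shares no pattern entity with the starting county, regardless of how similar its text might read.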
Try asking
Which counties look like New Orleans did before Katrina?
Which counties have compound vulnerability with high hurricane risk and high poverty?
What counties have the highest expected annual loss from flooding?
Find counties with very high social vulnerability and very high hurricane risk
Which counties have never received a federal disaster declaration despite high risk?
Each query triggers hybrid retrieval — vector similarity over pgvector embeddings
AND graph traversal over extracted entity relationships. Matching counties highlight
on the ArcGIS JS SDK 4.32 map. FIPS codes are extracted from the LLM response via
a 3-layer parser: (1) explicit FIPS pattern, (2) county name → FIPS lookup table,
(3) bare 5-digit validation.
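A minimal sketch of what such a 3-layer parser might look like follows. The regular expressions, the two-entry lookup table, and the choice to validate every candidate against a known-FIPS set are assumptions made for illustration; the project's actual parser is not shown in the source.

```python
import re

# Hypothetical lookup table; a real one would cover every county.
NAME_TO_FIPS = {
    "orleans parish, louisiana": "22071",
    "harris county, texas": "48201",
}
VALID_FIPS = set(NAME_TO_FIPS.values())

EXPLICIT_FIPS = re.compile(r"\bFIPS[:\s]*(\d{5})\b", re.IGNORECASE)
BARE_FIVE_DIGITS = re.compile(r"\b(\d{5})\b")


def extract_fips(text: str) -> list[str]:
    """Return unique, known county FIPS codes in order of first discovery."""
    found: list[str] = []

    def add(code: str) -> None:
        # Assumption: every layer is checked against the known-FIPS set.
        if code in VALID_FIPS and code not in found:
            found.append(code)

    # Layer 1: explicit "FIPS 22071" or "FIPS: 22071" patterns.
    for match in EXPLICIT_FIPS.finditer(text):
        add(match.group(1))

    # Layer 2: county name -> FIPS lookup on the lowercased text.
    lowered = text.lower()
    for name, code in NAME_TO_FIPS.items():
        if name in lowered:
            add(code)

    # Layer 3: bare 5-digit numbers, kept only if they are real county codes.
    for match in BARE_FIVE_DIGITS.finditer(text):
        add(match.group(1))

    return found


answer = ("Orleans Parish, Louisiana (FIPS: 22071) resembles Harris County, Texas. "
          "ZIP 90210 is not a county code, but 48201 is.")
print(extract_fips(answer))  # ['22071', '48201']
```

The validation step is what keeps ZIP codes and other 5-digit numbers in the LLM response from highlighting the wrong county on the map.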
$3.30
Total cost to build the national knowledge graph.
GPT-4o-mini entity extraction + text-embedding-3-small embeddings.
3,232 counties. Four federal datasets. All 50 states.
The data was always there. The graph was not.
Architecture
Data Pipeline
FEMA NRI (ArcGIS FeatureServer) → pandas join on STCOFIPS
CDC SVI (state CSVs) → aggregated from tract to county via groupby/median on STCNTY
FEMA Declarations (OpenFEMA API) → county FIPS from fipsStateCode + fipsCountyCode
ALICE (United Way ACS) → poverty rate, unemployment, struggling rate
↓
3,232 prose documents (~350 words each)
Composite flags: PRE_KATRINA_RISK_PATTERN, COMPOUND_VULNERABILITY, CHRONIC_DISASTER_COUNTY, NEVER_DECLARED
↓
LightRAG v1.4.13 ingest → entity extraction (GPT-4o-mini)
→ knowledge graph (PostgreSQL 17 + pgvector)
→ embeddings (text-embedding-3-small, 1,536 dims)
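The join steps in the pipeline above can be sketched in pandas roughly as follows. The key columns STCOFIPS, STCNTY, fipsStateCode, and fipsCountyCode come from the diagram; the metric column names and every value in the toy frames are hypothetical stand-ins, and the ALICE join is omitted for brevity.

```python
import pandas as pd

# Toy stand-ins for the real sources. All metric values are made up.
nri = pd.DataFrame({
    "STCOFIPS": ["01001", "01003"],
    "hurricane_risk_score": [12.5, 88.0],  # hypothetical column name
})
svi_tracts = pd.DataFrame({
    "STCNTY": ["01001", "01001", "01001", "01003", "01003"],
    "svi_percentile": [0.20, 0.40, 0.90, 0.70, 0.80],  # hypothetical column name
})
declarations = pd.DataFrame({
    "fipsStateCode": ["01", "01", "01"],
    "fipsCountyCode": ["003", "003", "001"],
})

# Tract -> county: take the median tract value within each county.
svi_county = (
    svi_tracts.groupby("STCNTY", as_index=False)["svi_percentile"].median()
    .rename(columns={"STCNTY": "STCOFIPS"})
)

# Declarations: concatenate state + county codes into a 5-digit FIPS string,
# then count declarations per county.
declarations["STCOFIPS"] = declarations["fipsStateCode"] + declarations["fipsCountyCode"]
decl_counts = (
    declarations.groupby("STCOFIPS").size().rename("declaration_count").reset_index()
)

# Left-join everything onto the NRI county list so no county is dropped.
counties = (
    nri.merge(svi_county, on="STCOFIPS", how="left")
       .merge(decl_counts, on="STCOFIPS", how="left")
)
counties["declaration_count"] = counties["declaration_count"].fillna(0).astype(int)
print(counties)
```

Keeping FIPS codes as strings throughout preserves leading zeros, which matters for states such as Alabama (01) whose codes would otherwise collapse to four digits.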
Query Layer
User query → LightRAG hybrid mode
└ Vector: cosine similarity over pgvector embeddings
└ Graph: entity neighborhood traversal
↓
LLM synthesis (GPT-4o-mini)
↓
FIPS extraction (3-layer parser)
↓
ArcGIS JS SDK 4.32 county highlight
Query Modes
Naive (active now)
Vector similarity only. Fast.
Hybrid
Vector + graph traversal. Required for relational queries
like pre-Katrina pattern matching. To be enabled once knowledge graph ingestion completes.
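The difference between the two modes can be shown with a toy example. This sketch assumes a simple "rank by cosine similarity, then expand through shared entities" scheme; it is an illustration of the idea only, not LightRAG's actual retrieval algorithm, and the 3-dimensional vectors stand in for the real 1,536-dimensional embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy document embeddings and a toy entity graph (placeholder names).
EMBEDDINGS = {
    "County A": [0.9, 0.1, 0.0],
    "County B": [0.1, 0.9, 0.0],
    "County C": [0.0, 0.2, 0.9],
}
NEIGHBORS = {
    "County A": {"COMPOUND_VULNERABILITY"},
    "COMPOUND_VULNERABILITY": {"County A", "County C"},
}


def naive(query_vec, k=1):
    """Naive mode: top-k documents by embedding similarity only."""
    ranked = sorted(EMBEDDINGS, key=lambda n: cosine(query_vec, EMBEDDINGS[n]),
                    reverse=True)
    return ranked[:k]


def hybrid(query_vec, k=1):
    """Hybrid mode (sketch): vector hits plus counties reached via shared entities."""
    results = naive(query_vec, k)
    for seed in list(results):
        for entity in NEIGHBORS.get(seed, ()):
            for county in sorted(NEIGHBORS.get(entity, ())):
                if county in EMBEDDINGS and county not in results:
                    results.append(county)
    return results


query = [1.0, 0.0, 0.0]
print(naive(query))   # ['County A']
print(hybrid(query))  # ['County A', 'County C']
```

County C has zero embedding similarity to the query, so naive mode never surfaces it; it is reachable only through the entity it shares with County A.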
Stack
ArcGIS JS SDK 4.32
LightRAG v1.4.13
PostgreSQL 17 + pgvector
Railway Docker (1GB RAM)
GPT-4o-mini
text-embedding-3-small (1,536 dims)
Single HTML file · Zero build step
explorer.jbf.com
INDEPENDENT RESEARCH PROJECT
Not affiliated with or endorsed by the American Red Cross, FEMA, or any federal agency.
Data sourced from public federal datasets: FEMA NRI, CDC SVI, Census Bureau, United Way ALICE.
Views and analysis are solely those of the author.