1. Extract: S3 GeoParquet + Indonesia bbox filter
2. Load Raw: PyArrow → Iceberg poi_raw
3. Build Master: Raw → poi_master table
4. Dedup: Remove duplicates
5. Category Map: 2000+ categories → 3-level hierarchy
6. Load Custom: Merge internal POI data
7. Aggregate: Grid-based stats (GID)
8. Adv. Aggregate: Buffer/Isochrone catchment
9. Export: Public facility GeoParquet
⚡ Key Features
📦 Apache Iceberg
ACID transactions, time travel, schema evolution on S3 with AWS Glue catalog
🗺️ Geosquare Grid
Advanced spatial indexing (level 12 = ~60m² grid) replacing geohash
📂 Dual Aggregation
Direct grid aggregation + catchment area (buffer/isochrone) methods
🏷️ Category Mapping
2000+ raw Overture categories → standardized 3-level hierarchy
🔄 Monthly Updates
Incremental ETL from Overture releases (configured release: 2026-04-15.0)
📊 Dashboard
Streamlit + PyDeck for interactive visualization
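The category mapping feature can be pictured as a lookup from a raw Overture category string to a three-level tuple. The sketch below is only an illustration: the `CategoryPath` type, the level names, and the three sample entries are assumptions, not the project's actual hierarchy.

```python
from typing import NamedTuple, Optional


class CategoryPath(NamedTuple):
    level1: str
    level2: str
    level3: str


# Hypothetical sample entries; the real table would cover 2000+ raw categories.
CATEGORY_MAP = {
    "coffee_shop": CategoryPath("Food & Beverage", "Cafe", "Coffee Shop"),
    "indonesian_restaurant": CategoryPath("Food & Beverage", "Restaurant", "Indonesian"),
    "hospital": CategoryPath("Health", "Medical Facility", "Hospital"),
}

# Fallback for raw categories that have no mapping yet (assumed behaviour).
UNMAPPED = CategoryPath("Other", "Other", "Unmapped")


def map_category(raw: Optional[str]) -> CategoryPath:
    """Normalise the raw category string and look it up in the mapping table."""
    if raw is None:
        return UNMAPPED
    return CATEGORY_MAP.get(raw.strip().lower(), UNMAPPED)
```

Routing unknown categories to an explicit fallback keeps new Overture categories visible after a monthly update instead of silently dropping them.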
🔄 Data Pipeline Architecture
📡
Overture Maps (S3)
GeoParquet format with Indonesia bounding box filter (94°E-141°E, 11°S-6°N)
🔄
ETL Pipeline (9 Steps)
Python + DuckDB + PyArrow
🗄️
AWS S3 + Glue Catalog
Iceberg tables (poi_raw, poi_master, poi_clean)
📊
Streamlit Dashboard
Interactive POI visualization
⚙️ Configuration (AWS)
release: "2026-04-15.0"
catalog:
  type: "glue"
  warehouse: "s3://geosquare-warehouse"
  db_name: "geosquare_poi"
aws:
  region: "ap-southeast-3"
projects:
  - name: "geosquare"
    gid_level: 12
aggregation:
  precisions: [12]
advanced_aggregation:
  mode: "buffer"
  buffer:
    radius_meters: 500
  catchment:
    provider: "valhalla"
spatial_filter:
  country: "Indonesia"
  bbox:
    min_lon: 94.0
    max_lon: 141.0
    min_lat: -11.0
    max_lat: 6.0
  use_shapefile: true
  shapefile_path: "s3://geosquare-data/indonesia_admin.shp"
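As a rough illustration of the `bbox` part of `spatial_filter`, the sketch below tests points against the configured bounds. The `BBox` class and the sample coordinates are assumptions made for this example; the pipeline itself presumably applies the filter while reading GeoParquet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    min_lon: float
    max_lon: float
    min_lat: float
    max_lat: float

    def contains(self, lon: float, lat: float) -> bool:
        """Return True when the point lies inside the box (edges included)."""
        return (self.min_lon <= lon <= self.max_lon
                and self.min_lat <= lat <= self.max_lat)


# Bounds copied from the configuration above.
INDONESIA_BBOX = BBox(min_lon=94.0, max_lon=141.0, min_lat=-11.0, max_lat=6.0)

# Illustrative (lon, lat) pairs, not pipeline data.
points = {
    "Jakarta": (106.8456, -6.2088),
    "Singapore": (103.8198, 1.3521),
    "Tokyo": (139.6917, 35.6895),
}
inside = {name: INDONESIA_BBOX.contains(lon, lat)
          for name, (lon, lat) in points.items()}
```

Singapore passes the box test even though it is outside Indonesia, which is a plausible reason for the configuration to also enable a shapefile-based filter.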
📈 Processing Distribution
🔬 Two Aggregation Methods
Method 1: Direct Grid Aggregation
Counts POIs per Geosquare grid cell at the specified precision level.
- Level 12: ~60m² cells
- Fast computation: O(n) in the number of POIs
- Use case: Overall POI density
- Table: poi_stats_geosquare12
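Direct grid aggregation amounts to assigning each POI a cell key and counting per key in a single pass. The sketch below is a stand-in under stated assumptions: `cell_id` uses a plain longitude/latitude floor grid with a hypothetical `CELL_DEG` size, not the real Geosquare GID encoding.

```python
import math
from collections import Counter

# Hypothetical cell size in degrees; a placeholder for a level-12 Geosquare cell.
CELL_DEG = 0.0005


def cell_id(lon, lat, cell_deg=CELL_DEG):
    """Return a (col, row) key for the cell containing the point.

    This is a placeholder for a Geosquare GID, not the actual encoding.
    """
    return (math.floor(lon / cell_deg), math.floor(lat / cell_deg))


def count_per_cell(pois):
    """Single pass over the POIs: O(n) in the number of points."""
    return Counter(cell_id(lon, lat) for lon, lat in pois)


# Illustrative (lon, lat) points: the first two share a cell, the third does not.
pois = [
    (106.84510, -6.20880),
    (106.84520, -6.20870),
    (106.90010, -6.10010),
]
counts = count_per_cell(pois)
```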
Method 2: Catchment/Isochrone
Calculates a service area for each POI (buffer or isochrone), then intersects it with the grid.
- Buffer: Fixed radius (e.g., 500m)
- Isochrone: Travel time (e.g., 5 min)
- Use case: Accessibility analysis
- Table: poi_adv_stats_geosquare12
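The buffer variant can be approximated by counting, for each grid cell, the POIs whose fixed-radius circle covers the cell center. The sketch below makes several assumptions: it uses haversine distance on a spherical Earth, a brute-force loop instead of a spatial index, and hypothetical cell names and coordinates. An isochrone mode would replace the circle test with a polygon returned by a routing engine.

```python
import math
from collections import Counter

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def buffer_counts(pois, cell_centers, radius_m=500.0):
    """For each cell, count the POIs whose circular buffer covers the cell center.

    Brute force, O(cells * POIs); kept simple for illustration only.
    """
    counts = Counter()
    for cell, (clon, clat) in cell_centers.items():
        for plon, plat in pois:
            if haversine_m(plon, plat, clon, clat) <= radius_m:
                counts[cell] += 1
    return counts


# Illustrative data: two POIs and two hypothetical cell centers.
pois = [(106.8450, -6.2088), (106.8470, -6.2088)]
cell_centers = {
    "near": (106.8460, -6.2088),  # about 110 m from each POI
    "far": (106.8600, -6.2088),   # more than 1 km from both POIs
}
reach = buffer_counts(pois, cell_centers)
```

Unlike the direct count, a cell can score above zero here without containing any POI, which is what makes the method suitable for accessibility analysis.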
📋 Sample ETL Execution
$ python run_etl.py
$ python run_etl.py --steps 1,2,3,4
$ python run_etl.py --steps 7
$ python run_etl.py --steps 8
$ python run_etl.py --city jakarta --limit 1000
$ python run_etl.py --config config.aws.yaml
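The flags in these commands could be parsed along the following lines. This is a sketch, not the actual `run_etl.py`: the `parse_steps` helper, the `STEP_NAMES` table, and the choice to run every step when `--steps` is omitted are assumptions based on the nine steps listed earlier.

```python
import argparse

# Step numbers and names taken from the pipeline overview.
STEP_NAMES = {
    1: "Extract", 2: "Load Raw", 3: "Build Master", 4: "Dedup",
    5: "Category Map", 6: "Load Custom", 7: "Aggregate",
    8: "Adv. Aggregate", 9: "Export",
}


def parse_steps(value):
    """Turn a comma-separated string such as "1,2,3,4" into a list of step numbers."""
    steps = [int(part) for part in value.split(",") if part.strip()]
    unknown = [s for s in steps if s not in STEP_NAMES]
    if unknown:
        raise argparse.ArgumentTypeError(f"unknown step(s): {unknown}")
    return steps


def build_parser():
    parser = argparse.ArgumentParser(prog="run_etl.py")
    # Assumption: omitting --steps runs the whole pipeline.
    parser.add_argument("--steps", type=parse_steps, default=sorted(STEP_NAMES))
    parser.add_argument("--city")
    parser.add_argument("--limit", type=int)
    parser.add_argument("--config")
    return parser


args = build_parser().parse_args(["--steps", "7,8", "--limit", "1000"])
selected = [STEP_NAMES[s] for s in args.steps]
```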