Climate Vulnerability Assessment
LST, Air Temperature and Heat Stress Indicator Estimator for Delhi
Technical Documentation
Platform: Google Earth Engine (JavaScript API)
Study Area: Delhi, India — all administrative wards
Analysis Period: 2015–2026 (monthly resolution)
Document Language: English
Document Purpose: Complete technical reference for researchers, scientists, urban planners, Policy makers and science journalist
Table of Contents
Dashboard Overview — What, Why, and Where
Glossary of All Terms — Definitions, Interpretation, Ranges, Legend, Source
Complete Data Source Inventory
Satellite Data to 30-Metre Analysis Grid — How It Works
Formula Terms — Origin and Scientific Explanation
Derived Indices — Full Methodology, Validation, and Limitations
References
Dashboard Overview
1.1 What This Dashboard Is
This is an interactive, web-based geospatial analysis tool built on Google Earth Engine (GEE) — a cloud computing platform operated by Google that provides access to a global catalogue of petabytes of satellite imagery and geospatial datasets. The dashboard is designed specifically for Delhi, India, the world's second-most populous megacity, which regularly experiences severe and lethal heat events during the pre-monsoon and monsoon seasons.
The user selects any calendar year between 2015 and 2026, and any month, then clicks Run. The system automatically:
Downloads and processes Landsat 8/9 satellite images to measure how hot the ground surface is (Land Surface Temperature, LST) at 30-metre pixel resolution.
Downloads and processes Sentinel-2 satellite images (or falls back to Landsat when Sentinel-2 is unavailable) to measure vegetation cover (NDVI), built-up density (NDBI), and water body presence (MNDWI) for that month.
Downloads ERA5-Land hourly reanalysis data from the European Centre for Medium-Range Weather Forecasts (ECMWF) to extract monthly average 2-metre air temperature, dewpoint temperature, 10-metre wind speed, and downwelling solar radiation for the Delhi region.
Applies a statistical downscaling model to translate the coarse ERA5 meteorology (grid spacing approximately 9–11 km) down to a nominal 30-metre spatial resolution, using the satellite-observed surface patterns as spatial anchors.
Computes three internationally recognised thermal comfort and physiological heat stress indices — WBGT, UTCI, and Heat Index (HI) — at 30-metre resolution.
Computes a Heat Vulnerability Index (HVI) using four independent weighting methods, then automatically selects the one that best tracks the observed surface temperature pattern.
Displays all outputs as interactive maps with dynamic, month-specific legends, a pixel inspector (click any location to read values), ward-level summaries for all Delhi wards, and a year-vs-year comparison tool covering 2015 to 2025.
1.2 Why This Dashboard Was Built — The Core Problem
India's capital region experiences some of the most extreme heat events in South Asia. Between 2010 and 2022, Indian Heat Action Plans documented thousands of excess deaths attributable to heat. Yet heat is not uniform across a city. In Delhi, a ward covered in dense asphalt and concrete rooftops can be 8–12 °C hotter at the surface than a ward with park cover or a water body, even when both wards are reported as the same temperature by a single weather station.
Conventional approaches to heat monitoring in India rely on a small network of weather stations operated by the India Meteorological Department (IMD) and the Central Pollution Control Board (CPCB). These stations provide accurate point measurements but cannot resolve the sharp spatial gradients within a ward, between a park and a road 500 metres away, or between a slum and a commercial area sharing the same postcode.
This dashboard was built to solve that problem by fusing three independent data streams — satellite surface observations, atmospheric reanalysis, and ground station climatology — to produce spatially continuous heat stress maps at 30-metre resolution, covering all 272+ wards of Delhi for any month from 2015 onwards.
1.3 Where This Tool Is Useful
Sector Application
Urban and Disaster Management Identify which specific wards consistently rank highest on LST, air temperature, WBGT, or HVI, as direct inputs to ward-level heat action planning under NDMA guidelines.
Public Health and Epidemiology Spatially correlate heat stress estimates with health outcome data (hospital admissions, mortality records) at ward scale, enabling exposure-response research in Indian cities.
Environmental and Urban Planning Quantify the thermal benefit of increasing green cover (NDVI) or reducing built-up density (NDBI) within specific wards; compare pre- and post-intervention months.
Climate Research and Education Demonstrate multi-year temporal trends in heat indices for Delhi (2015–2025), examine inter-annual variability driven by monsoon timing, urban expansion, or ENSO.
Journalism and Policy Communication Access a standardised, reproducible heat risk estimate for any ward in any month, rather than relying on anecdotal reports or station averages.
1.4 Overall Methodology Summary
The analytical pipeline follows five sequential phases:
Phase 1 — Satellite Observation (30 m): Monthly median composites of LST from Landsat and NDVI/NDBI/MNDWI from Sentinel-2 provide the spatial skeleton of surface conditions at 30-metre resolution.
Phase 2 — Atmospheric Reanalysis (9–11 km): ERA5-Land hourly data provide the meteorological backbone — air temperature, humidity, wind, and solar radiation — at a coarse grid that covers Delhi in roughly 15–18 pixels.
Phase 3 — Statistical Downscaling (11 km → 30 m): A multi-variable regression model — trained at the ERA5 grid scale using the satellite observations as predictors — transfers spatial detail from the 30-metre satellite field onto the coarse ERA5 field, yielding physically consistent 30-metre estimates of air temperature (Ta30), relative humidity (RH30), wind (Wind30), and solar radiation (Solar30).
Phase 4 — Heat Index Computation: Standard published formulae (Stull 2011 for wet-bulb temperature; ISO 7243 conceptual framework for WBGT; a linearised approximation of Bröde et al. 2012 for UTCI; Rothfusz 1990 for Heat Index) are applied pixel-by-pixel at 30 m.
Phase 5 — HVI and Ward Statistics: Four variants of a Heat Vulnerability Index are computed; the best-performing variant (by Pearson correlation with LST) is selected. Zonal statistics aggregate all indices to ward level for ranking, export, and comparison.
---
2. Glossary of All Terms
This section defines every quantity computed or displayed in the dashboard. Each entry is self-contained: a reader unfamiliar with remote sensing or climatology can read one entry and understand that quantity completely.
---
2.1 LST — Land Surface Temperature (°C)
What it is:
Land Surface Temperature is the temperature of the physical surface of the Earth — roads, rooftops, bare soil, tree canopies, river beds — as measured by a thermal infrared sensor on a satellite. It is distinct from the air temperature measured by a weather station at 2 metres above the ground. LST is the temperature you would measure if you pointed an infrared thermometer at the ground from above.
Why it matters:
LST is the primary driver of urban heat. Dense built-up surfaces (asphalt, concrete, dark rooftops) absorb solar radiation and emit it as heat, reaching LST values of 50–60 °C on a May afternoon while a nearby park might be 35 °C. This contrast — invisible to a single weather station — defines where heat is stored and re-radiated into the urban atmosphere.
How it is computed in this dashboard:
Landsat 8 and Landsat 9 satellites carry the Thermal Infrared Sensor (TIRS), which measures emitted radiation in the 10.60–11.19 µm wavelength band (Band 10, `ST_B10`). USGS distributes this as Collection 2 Level-2 surface temperature data, where the digital number must be converted to Kelvin using the USGS-specified scaling factors, then to Celsius:
```
LST (°C) = ST_B10 × 0.00341802 + 149.0 − 273.15
```
The coefficient `0.00341802` and offset `149.0` are USGS-specified scale factors for Collection 2 Level-2 products. The subtraction of `273.15` converts Kelvin to Celsius. These are not arbitrary constants; they are radiometric calibration parameters published by USGS/NASA and verified against ground blackbody radiometers.
From individual scenes to monthly composite:
All Landsat 8 and Landsat 9 scenes for a given month are merged and cloud-masked using `QA_PIXEL` bit flags (bits 0, 3, and 4 — fill pixels, cloud shadows, and clouds are excluded). A per-pixel temporal median is then computed across all valid observations. The median is preferred over the mean because it is robust against outliers (single overheated pixels from residual cloud edges) and over the simple mosaic (which would assign priority by date and create sharp orbit-seam artefacts).
A 200-metre radius circular focal mean (spatial smoothing) is applied after compositing to reduce the visible seam between adjacent Landsat World Reference System (WRS) paths, which can cause artificial brightness steps in Delhi that spans two paths.
The radius of 200 m is chosen for two compounding reasons specific to this workflow, and represents a deliberate, conservative choice over a smaller kernel.
First, the WRS path boundary seam in a monthly median composite is not a sharp 1-pixel step — it is a gradual radiometric transition that can span 3–6 pixels (90–180 m) in practice. This is because the two overlapping Landsat paths (Path 146 and Path 147 over Delhi) are acquired on different dates within the same month. The surface reflectance and emissivity conditions can differ between these dates due to soil moisture changes, vegetation phenology, and atmospheric variability. When the per-pixel temporal median is computed across both paths, pixels near the boundary draw from a mixed pool of observations from both paths, creating a transition zone wider than a single pixel edge. A 200 m kernel (approximately 6–7 pixels radius) reliably covers this full transition width under worst-case monthly conditions, whereas a smaller kernel would leave a residual half-smoothed step visible in months with strong inter-path radiometric contrast.
Second, and critically, LST is the primary predictor in the statistical downscaling model used to generate Ta30, RH30, Wind30, and Solar30. Any residual seam left in LST after compositing is multiplied by the regression coefficient b₁ and transferred directly into every downscaled meteorological field, then propagating further into WBGT, UTCI, and Heat Index. A conservative 200 m smoothing at the LST source stage suppresses this downstream artefact cascade before it can amplify through multiple derived layers. This principle — applying stronger smoothing to a variable that functions as a predictor in subsequent models — is consistent with the quality-control guidance in statistical downscaling literature (Maraun and Widmann 2018; Wilby et al. 2002).
Spatial filtering using a focal mean kernel is a standard technique in digital image processing for removing localised radiometric discontinuities without altering the global statistics of the image (Lillesand et al. 2015). Its application to multi-path Landsat composites is documented in Flood (2013) and White et al. (2014), both of which identify WRS path boundary artefacts as a known issue in monthly median compositing workflows. The 200 m radius preserves all meaningful within-ward spatial variation — Delhi wards range from 1 to 5 km² in area, so a 200 m smoothing kernel affects at most 4% of the spatial extent of the smallest ward.
Physical range for Delhi:
Winter (Jan–Feb): approximately 15–30 °C
Pre-monsoon (Apr–Jun): approximately 35–58 °C
Monsoon (Jul–Sep): approximately 28–42 °C
Post-monsoon (Oct–Dec): approximately 20–38 °C
Map legend behaviour:
The legend is dynamic — it changes every time you select a different month or year. The dashboard finds the minimum and maximum LST value across all of Delhi for that specific month/year combination, then divides this range into four equal-width intervals (bins). The four colours in the legend (blue → yellow → orange → red) represent these four bins from coolest to hottest. This approach ensures the legend is always informative regardless of season — it does not use a fixed temperature scale.
For single-scene preview mode, a fixed scale of 27 / 30 / 33 / 36 / 38 / 40 / 42 °C is used (CSE-style thematic classification).
Data source: USGS / NASA — Landsat Collection 2 Level-2, `LANDSAT/LC08/C02/T1_L2` (Landsat 8) and `LANDSAT/LC09/C02/T1_L2` (Landsat 9).
---
2.2 NDVI — Normalised Difference Vegetation Index (dimensionless, −1 to +1)
What it is:
NDVI is a numerical indicator that uses the difference between near-infrared (NIR) and red light reflectance to quantify the density and health of green vegetation. Healthy green plants strongly absorb red light (for photosynthesis) and strongly reflect NIR light (cellular structure). Bare soil, roads, and rooftops reflect both wavelengths more similarly. Water absorbs NIR strongly.
Why it matters:
Vegetation reduces surface temperature through evapotranspiration (plants release water vapour, which cools the air — analogous to sweating) and by providing shade that reduces solar radiation reaching the ground. Wards with high NDVI consistently show lower LST. NDVI is therefore a key predictor in the downscaling model and a component of the Heat Vulnerability Index.
Formula:
Using Sentinel-2 (preferred):
`NDVI = (Band 8 − Band 4) / (Band 8 + Band 4)` = (NIR − Red) / (NIR + Red)
Using Landsat fallback:
`NDVI = (SR_B5 − SR_B4) / (SR_B5 + SR_B4)` = (NIR − Red) / (NIR + Red)
The bands differ between sensors because Sentinel-2 and Landsat use different numbering, but both capture NIR (around 842 nm for Sentinel-2, 865 nm for Landsat 8/9) and Red (around 665 nm for Sentinel-2, 655 nm for Landsat 8/9).
Interpretation of values:
NDVI Range Typical land cover
< 0.0 Water bodies, snow, bare rock
0.0 – 0.15 Dense built-up, bare soil, concrete
0.15 – 0.30 Sparse vegetation, mixed urban-green
0.30 – 0.50 Moderate vegetation (parks, agricultural)
> 0.50 Dense green cover (forest, healthy crops)
Path to 30-metre grid:
Sentinel-2 has native bands at 10 m (Red, Green, Blue, NIR) and 20 m (SWIR). The NDVI is computed on the monthly composite image, then a 120-metre radius circular focal mean is applied to reduce tile-boundary artefacts between adjacent MGRS tiles (Sentinel-2 tiles covering Delhi can come from different overpass dates within a month, creating subtle radiometric steps). The radius of 120 m is chosen specifically to address the Sentinel-2 MGRS tile-boundary artefact. Because adjacent MGRS tiles may be acquired on different days within a month, the bidirectional reflectance distribution function (BRDF) — the way sunlight reflects from a surface at different viewing angles — can differ between tiles, creating a gradual radiometric step that spans approximately 3–4 pixels (90–120 m). A 120-metre radius (4 pixels) is the minimum that reliably blends this BRDF-induced step without erasing real within-ward spectral contrasts. Note that 120 m is less than the 200 m applied to LST: this is intentional. LST uses a larger kernel because it is the primary predictor in the downscaling model and must be pre-smoothed more conservatively to prevent downstream artefact cascading. NDVI, NDBI, and MNDWI play secondary predictor roles with smaller regression coefficients, so a 120 m kernel — sufficient to remove their tile artefact — is appropriate without over-smoothing the spectral information they contribute. The output is combined with LST at the 30-metre default projection.
Map legend: Dynamic four-class legend (dark sandy → light yellow-green → green → dark forest green), breaks updated per month/year.
Data source: `COPERNICUS/S2_SR_HARMONIZED` (Sentinel-2 Surface Reflectance Harmonised); Landsat C2 SR as fallback. NDVI concept: Tucker (1979), widely applied in urban heat island studies including Mathew et al. (2016) for Delhi.
---
2.3 NDBI — Normalised Difference Built-up Index (dimensionless, −1 to +1)
What it is:
NDBI is an index that highlights built-up or impervious surfaces (buildings, roads, parking areas) using the difference between shortwave infrared (SWIR) and near-infrared (NIR) reflectance. Built-up materials (concrete, asphalt, metal rooftops) reflect more SWIR than NIR, giving positive NDBI values. Vegetation shows negative NDBI (high NIR, lower SWIR).
Why it matters:
Built-up density is one of the most important drivers of the urban heat island effect. High-NDBI wards have more impervious surfaces that absorb and store solar heat during the day and radiate it at night, preventing the natural cooling that vegetated areas experience. NDBI is a component of HVI and a predictor in the downscaling model.
Formula:
Using Sentinel-2:
`NDBI = (Band 11 − Band 8) / (Band 11 + Band 8)` = (SWIR1 − NIR) / (SWIR1 + NIR)
Using Landsat fallback:
`NDBI = (SR_B6 − SR_B5) / (SR_B6 + SR_B5)` = (SWIR1 − NIR) / (SWIR1 + NIR)
Interpretation of values:
NDBI Range Typical surface
< −0.10 Dense vegetation, water
−0.10 – 0.00 Mixed urban-green, sparse built-up
0.00 – 0.10 Moderate built-up density
0.10 – 0.20 High built-up density (dense urban fabric)
> 0.20 Very high built-up, industrial, large rooftops
Map legend: Six-class legend with dynamic lower two bins (computed from monthly min–max) and fixed upper boundaries at 0.104, 0.20, and 0.40 to provide consistent labelling for the high built-up range.
Data source: `COPERNICUS/S2_SR_HARMONIZED`; Landsat C2 SR fallback. NDBI concept: Zha, Gao and Ni (2003), International Journal of Remote Sensing.
---
2.4 MNDWI — Modified Normalised Difference Water Index (dimensionless, −1 to +1)
What it is:
MNDWI uses green and shortwave infrared bands to enhance the detection of open water bodies (rivers, canals, ponds, lakes). The Yamuna river, Delhi's drains, and any significant water body will show strongly positive MNDWI values, while built-up areas show negative values.
Why it matters in this analysis:
MNDWI serves as an additional predictor in the statistical downscaling model. Water bodies have distinctly different thermal and meteorological behaviour from land surfaces — they moderate local temperature and humidity. Including MNDWI allows the regression model to account for the influence of the Yamuna and other water features when extrapolating ERA5 meteorology to 30 metres.
Note: MNDWI is not a weighted component of the Heat Vulnerability Index (HVI) in this code. It was tested and removed from the HVI weighting but retained as a downscaling predictor.
Formula:
Sentinel-2:
`MNDWI = (Band 3 − Band 11) / (Band 3 + Band 11)` = (Green − SWIR1) / (Green + SWIR1)
Landsat fallback:
`MNDWI = (SR_B3 − SR_B6) / (SR_B3 + SR_B6)` = (Green − SWIR1) / (Green + SWIR1)
Path to 30 m: Same as NDVI and NDBI — computed on the monthly composite, smoothed with a 120-metre focal mean, stacked at 30-metre default projection. The same 120 m radius is used for MNDWI as for NDVI and NDBI because all three are computed from the same Sentinel-2 MGRS tile composite and therefore carry the identical BRDF-induced tile-boundary artefact. Applying the same radius ensures that all three indices entering the downscaling model have a consistent spatial smoothness, preventing any one predictor from introducing a sharper artefact than the others into the regression.
Map display: MNDWI is computed and used as a downscaling predictor and in ward statistics exports but does not have a dedicated top-bar layer button. It is exported as band "MNDWI" in ward-level analysis tables.
Data source: `COPERNICUS/S2_SR_HARMONIZED`; Landsat C2 SR. Concept: Xu (2006), International Journal of Remote Sensing.
---
2.5 Elevation — Digital Elevation Model (metres)
What it is:
Elevation is the height of the land surface above mean sea level, in metres, derived from the Shuttle Radar Topography Mission (SRTM) operated by NASA and USGS in February 2000. SRTM used two radar antennas — one in the shuttle payload bay and one on an 60-metre mast — to produce a single-pass interferometric radar elevation model covering 80% of Earth's land surface at approximately 30-metre horizontal resolution.
Why it matters:
Temperature decreases with altitude (the environmental lapse rate in the troposphere is approximately 6.5 °C per 1000 m). Although Delhi's total elevation range is modest (approximately 200–265 m across the city, rising toward the Aravalli ridge in the southwest), this variation is still measurable in ERA5 data and provides additional predictive power in the downscaling regression. Higher-elevation wards (near the Aravalli foothills) tend to be slightly cooler. Elevation is used exclusively as a predictor in the downscaling model, not as a standalone heat stress indicator.
Physical range for Delhi: Approximately 200–265 metres above mean sea level, with the highest values near the Aravalli ridge (Mehrauli, Vasant Kunj area) and the lowest near the Yamuna floodplain.
Map legend: Fixed four-class classification:
Class 1: < 215 m (lowlands, Yamuna floodplain)
Class 2: 215–230 m
Class 3: 230–245 m
Class 4: > 245 m (Aravalli ridge)
Data source: NASA / USGS — `USGS/SRTMGL1_003`, band `elevation`. Data acquired February 2000; accuracy approximately ±16 m vertical, ±20 m horizontal.
---
2.6 ERA5-Land Meteorological Fields (Coarse, ~9–11 km)
What ERA5-Land is:
ERA5-Land is the land-surface component of the ERA5 atmospheric reanalysis produced by the European Centre for Medium-Range Weather Forecasts (ECMWF) as part of the Copernicus Climate Change Service (C3S). "Reanalysis" means a systematic reconstruction of historical atmospheric conditions using a numerical weather prediction model constrained by all available observations (weather balloons, surface stations, aircraft, satellites) through a process called data assimilation. ERA5-Land provides hourly data from 1950 to present at approximately 9-km resolution, globally.
ERA5 is considered the most comprehensive and accurate freely available global meteorological dataset for historical analysis.
Fields used in this dashboard:
Field ERA5 variable name Units Role
2 m Air Temperature `temperature_2m` Kelvin → converted to °C Main temperature input to downscaling, WBGT, HI, UTCI
2 m Dewpoint Temperature `dewpoint_temperature_2m` Kelvin → °C Used to compute relative humidity
10 m Wind (u-component) `u_component_of_wind_10m` m/s Combined with v for wind speed
10 m Wind (v-component) `v_component_of_wind_10m` m/s Combined with u for wind speed
Solar Radiation `surface_solar_radiation_downwards` J/m² → W/m² (÷ 3600) Input to UTCI radiation proxy
How monthly means are computed:
All hourly ERA5-Land images for the selected month are averaged. Air temperature and dewpoint are converted from Kelvin to Celsius by subtracting 273.15. Wind speed is computed as the vector magnitude: `wind = √(u² + v²)`, then clamped to the physically valid range of 0.5–17 m/s. Solar radiation is divided by 3600 to convert from accumulated J/m² to mean W/m².
Smoothing before downscaling:
ERA5 data at the 9–11 km grid spacing show sharp pixel boundaries when displayed at high zoom. Before the downscaling regression, each ERA5 field is resampled using bilinear interpolation at a 1-km target resolution (`smoothE5` function). Why bilinear at 1 km? Bilinear interpolation estimates pixel values as a distance-weighted average of the four surrounding ERA5 grid points, producing a smooth continuous surface without introducing new information. This step serves two purposes: it removes the visually distracting blocky ERA5 pixel-edge artefacts from the training response variable (which would otherwise cause the regression to learn to predict these artificial steps as if they were real spatial gradients), and it aligns the ERA5 field to a consistent 1-km projection before the aggregation step. The code comment explicitly states: "Smooth ERA5 11km grid → 1km bilinear to remove sharp pixel boundary artifacts: without this, 11km ERA5 pixel edges appear as straight lines in UTCI/WBGT/HI maps." Bilinear resampling as ERA5 preprocessing is standard practice in downscaling applications (Maraun and Widmann 2018; Hersbach et al. 2020).
Data source: ECMWF Copernicus Climate Data Store — `ECMWF/ERA5_LAND/HOURLY`. DOI: 10.24381/cds.e2161bac.
---
2.7 RH — Relative Humidity (%, 0–100)
What it is:
Relative Humidity is the ratio of the actual amount of water vapour present in the air to the maximum amount the air can hold at that temperature and pressure, expressed as a percentage. At 100% RH, the air is saturated and condensation (dew, fog, rain) occurs. At 20% RH, the air is very dry.
Why it matters:
RH is central to all physiological heat stress indices. The human body cools itself primarily through sweat evaporation. At high RH, sweat cannot evaporate efficiently (the air is already nearly saturated), and the body's cooling mechanism fails. This is why a 35 °C day with 80% RH (typical Delhi monsoon) is physiologically more dangerous than a 40 °C day with 20% RH — even though the air temperature is lower.
How it is computed:
RH is not directly measured by ERA5 at the surface. Instead, it is calculated from the two ERA5 temperature fields — 2 m temperature (Ta) and 2 m dewpoint temperature (Td) — using the Magnus formula:
```
RH = 100 × exp(17.625 × Td / (Td + 243.04)) / exp(17.625 × Ta / (Ta + 243.04))
```
This formula exploits the definition of dewpoint: it is the temperature to which air must be cooled at constant pressure for saturation to occur. The ratio of saturation vapour pressures at Td and Ta gives the fractional humidity. The formula is the standard approximation in meteorology (Lawrence 2005); the coefficients 17.625 and 243.04 are the Magnus equation parameters for the temperature range −40 to +60 °C.
CPCB Bias Correction:
ERA5 reanalysis systematically underestimates surface relative humidity in Delhi, particularly during the hot and pre-monsoon months. For example, ERA5 gives approximately 25% RH for May in Delhi, while measurements from CPCB/DPCC air quality monitoring stations and IMD surface observation stations record approximately 40% RH for the same period.
To correct this bias, monthly mean RH values computed offline from the project CPCB.csv dataset (covering 38 stations across Delhi, 2015–2025) are embedded in the code as a monthly climatology table:
Month CPCB Mean RH (%) Month CPCB Mean RH (%)
January 72.7 July 70.3
February 59.9 August 72.7
March 48.9 September 67.0
April 33.1 October 55.8
May 40.1 November 59.4
June 50.6 December 66.5
An additive correction is applied: the Delhi-wide mean of the downscaled ERA5 RH field is brought to match the corresponding CPCB monthly mean by adding a constant offset to every pixel. This preserves the spatial pattern from the downscaling model while correcting the city-wide mean. The corrected RH is clamped to the physically valid range of 5–100%.
Data source: ERA5-Land Ta and Td; Magnus formula (Lawrence 2005); CPCB station data (embedded offline-computed climatology).
---
2.8 Ta30 — Downscaled 2-Metre Air Temperature (°C, at 30 m)
What it is:
Ta30 is the estimated monthly mean 2-metre air temperature at 30-metre spatial resolution, produced by statistically downscaling the coarse ERA5 air temperature field using satellite-observed surface patterns as spatial predictors.
This is different from LST (which is the surface skin temperature). Air temperature at 2 metres is the temperature measured by a standard weather station and is what the human body is immersed in. The relationship between LST and air temperature is not fixed — it depends on wind, humidity, vegetation type, and time of day — which is why a full downscaling model is needed rather than a simple equation.
Why it matters:
All three heat stress indices (WBGT, UTCI, HI) require air temperature, not surface temperature. Without downscaling, we would have to use the coarse ERA5 value, which assigns the same air temperature to the entire central Delhi grid cell (~10 km across), erasing ward-level variation. Ta30 provides a spatially resolved air temperature field anchored to the physical surface conditions at 30 m.
How the downscaling works (non-technical summary):
Think of it as a smart interpolation. We know that at the ERA5 grid scale (~11 km), warmer areas tend to have higher LST, lower NDVI, higher NDBI, and lower elevation. We use a statistical model trained at that coarse scale to learn these relationships, then apply it at 30-metre resolution, where we have high-resolution satellite data for LST, NDVI, NDBI, MNDWI, and elevation. The result is an air temperature map that preserves the city-wide mean from ERA5 but adds spatially realistic within-city variation from the satellite observations.
Mathematical formula (linear regression form):
```
Ta30 ≈ b₀ + b₁ × LST + b₂ × NDVI + b₃ × NDBI + b₄ × MNDWI + b₅ × Elevation
```
where b₀ through b₅ are regression coefficients estimated from the ERA5 data itself at the 11-km scale for each month and year. The coefficients change every run.
Post-processing steps:
Values are clamped to a physically valid range of 10–55 °C. Why these specific bounds? The lower bound of 10 °C corresponds to the minimum possible monthly mean air temperature for Delhi — even January, the coldest month, has a long-term monthly mean of approximately 14 °C at Safdarjung (IMD records). A regression model trained on ~15–18 ERA5 pixels can produce extrapolated predictions below this physical minimum when applied to 30-m pixels at the edges of the Delhi geometry; the 10 °C clamp prevents these physically impossible values from propagating into WBGT, HI, and UTCI calculations. The upper bound of 55 °C exceeds Delhi's historically observed monthly-mean air temperatures by a substantial margin — monthly city-wide means in peak summer are typically far below this threshold — providing sufficient headroom for spatial predictions while excluding regression artefacts that would push air temperature outside physically plausible ranges for this region. (Note: this documentation intentionally avoids asserting a single disputed "all-time maximum" value from one station/date.)
Mean anchoring: The Delhi-wide spatial mean of Ta30 is adjusted to exactly match the Delhi-wide spatial mean of ERA5 2-m air temperature. This removes any systematic bias the regression introduces while preserving spatial patterns. A study by Siddiqui et al. (2021) validated a similar approach for Indian cities.
A 150-metre focal mean is applied as a final smoothing pass. The radius of 150 m — larger than the 120 m applied to the raw NDVI input — is chosen because the statistical downscaling model can slightly amplify spatial discontinuities in the predictor fields. When the regression coefficients assign a strong weight to NDVI (which carries a 120 m-smoothed S2 tile artefact), the artefact can re-emerge in the Ta30 output with a slightly broader footprint than in the original NDVI layer. The 150 m radius (5 pixels) is therefore the minimum size that reliably captures this propagated seam while preserving ward-scale air temperature variation. The code comments explicitly note: "150m radius = ~5 pixels — enough to hide the seam, preserves ward-level detail" (Roy et al. 2010; Wilby et al. 2002 for downscaling artefact handling principles).
Map legend: Dynamic four-class legend from the monthly Delhi minimum to maximum Ta30.
Data source: ERA5-Land + Landsat LST + Sentinel-2 indices + SRTM; statistical downscaling model trained at 11-km ERA5 resolution.
---
2.9 RH30 — Downscaled Relative Humidity (%, at 30 m)
What it is:
RH30 is the downscaled and bias-corrected relative humidity at 30-metre resolution, derived from the ERA5 RH field using the same statistical downscaling pipeline as Ta30. After downscaling, it is corrected toward the CPCB monthly climatology using the additive bias correction described in Section 2.7, then smoothed with a 500-metre focal mean to reduce ERA5 grid-edge artefacts.
Physical range: 5–100% (clamped), with typical Delhi values:
Summer (May): 30–50% before correction; 35–55% after CPCB anchoring
Monsoon (July): 60–90%
Winter (January): 60–85%
Spatial smoothing: A 500-metre focal mean is applied after downscaling and bias correction. This radius is substantially larger than those used for satellite-derived variables (120–200 m) because the source of artefacts here is different — it is not a sensor tile boundary but the blocky grid structure of the ERA5 reanalysis (~11 km cells). After downscaling, residual ERA5 grid-cell edges can appear as faint rectangular boundaries in the RH30 field. A 500 m kernel is required to blend transitions that occur at the scale of approximately 11 km grid steps, while still preserving the meaningful within-city RH variation that exists at the ward scale (typically 1–5 km²). Relative humidity is also a physically smooth quantity — it does not exhibit sharp point-to-point variation at scales below 500 m in an urban environment — so the larger smoothing radius does not erase genuine spatial information.
Map legend: Dynamic four-class legend, computed at 120-metre resolution for speed.
---
2.10 Wind30 — Downscaled 10-Metre Wind Speed (m/s, at 30 m)
What it is:
Wind30 is the downscaled 10-metre mean wind speed at 30-metre resolution. ERA5 provides the east–west (u) and north–south (v) components of wind; the speed is computed as `√(u² + v²)`.
Why wind matters for heat stress:
Wind enhances convective heat loss from the human body and increases sweat evaporation. The UTCI model shows that even a 1 m/s increase in wind speed at high temperature (>35 °C) can reduce thermal stress by 1–2 °C equivalent. Conversely, hot winds (loo) in May–June can increase heat stress.
Range: Clamped 0.5–17 m/s. Why these exact bounds? These are the explicit valid input bounds for wind speed in the UTCI model as specified by Bröde et al. (2012): the UTCI polynomial regression was trained on data covering va = 0.5 to 17 m/s, and applying it outside this range produces extrapolated outputs that are not physiologically validated. The minimum of 0.5 m/s is also physically motivated — calm conditions in a real urban environment never produce absolute zero wind at 10 m height, and allowing near-zero values would create a near-singularity in the convective heat transfer terms of the UTCI formula. Monthly mean wind speed in Delhi ranges from approximately 1.5 m/s (October) to 4.5 m/s (June loo season), comfortably within these bounds (Muñoz-Sabater et al. 2021; Bröde et al. 2012).
Spatial smoothing: A 1000-metre focal mean is applied — the largest radius used for any variable in this workflow. Two independent reasons justify this choice:
First, wind speed from ERA5 carries the same ~11 km grid-cell boundary artefact as RH30, requiring at minimum a ~500 m smoothing radius. However, wind is physically the most spatially homogeneous variable in this analysis. Unlike temperature and humidity, which can vary sharply between a shaded park and an open road at the 30-metre scale, monthly mean wind speed in an urban area varies gradually over distances of several kilometres, driven by large-scale pressure gradients and broad urban roughness patterns. A 1000 m radius therefore removes artefacts without erasing any real fine-scale wind variation that is present in the data.
Second, and more critically, wind speed enters the UTCI approximation formula with a coefficient of −2.5078 — by far the largest magnitude coefficient in the formula. This means that any ERA5 grid-cell boundary step in wind (even a small one of 0.2–0.3 m/s) would produce a visible straight-line artefact of 0.5–0.75 °C in the UTCI map. A 1000 m smoothing kernel is therefore necessary not just to clean the wind field itself, but to protect the UTCI output from inheriting these amplified artefacts. The 1000 m radius is the minimum that consistently eliminates this signal for all months tested.
---
2.11 Solar30 — Downscaled Surface Solar Radiation (W/m², at 30 m)
What it is:
Solar30 is the downscaled monthly mean downwelling shortwave solar radiation at the surface, at 30-metre resolution. ERA5 `surface_solar_radiation_downwards` is an accumulated quantity in J/m²; dividing by 3600 converts it to mean W/m² (watts per square metre).
Why it matters:
Solar radiation is the primary energy source driving both surface temperature and physiological heat stress. The UTCI model uses solar radiation to estimate mean radiant temperature (Tmrt) — the radiation load experienced by a human body standing outdoors. Under a clear May sky in Delhi, incoming solar radiation can exceed 800 W/m². Shaded or cloudy areas receive much less.
Spatial smoothing: A 1000-metre focal mean is applied, identical in radius to Wind30. Solar radiation at the monthly mean scale is one of the most spatially smooth geophysical quantities in the analysis — it is driven by cloud cover patterns and aerosol optical depth, both of which vary over tens of kilometres rather than hundreds of metres. ERA5 grid-cell boundary artefacts are therefore the only meaningful spatial discontinuity in Solar30 at the city scale, and the 1000 m kernel reliably removes them without discarding any real spatial information. Applying a smaller radius (e.g., 300–500 m) would leave residual grid-cell artefacts that would propagate into the UTCI mean radiant temperature proxy, which uses solar radiation as a direct input.
---
2.12 WBGT — Wet-Bulb Globe Temperature (°C)
What it is:
Wet-Bulb Globe Temperature (WBGT) is the internationally recognised standard heat stress index used in occupational health, sports medicine, and military training. It was developed by Yaglou and Minard (1957) for the United States Navy to prevent heat casualties during training exercises. ISO 7243:2017 defines WBGT as the primary standard for assessing thermal environments in occupational settings.
WBGT is a composite measure that integrates three physical dimensions of the thermal environment:
Wet-bulb temperature (Tw): Reflects the combined effect of temperature and humidity — how quickly the body can cool through sweating.
Globe temperature (Tg): Measured by a black-painted copper sphere (globe thermometer) that captures solar radiation, reflected radiation, and convective heat exchange — representing the radiant heat load.
Dry-bulb temperature (Ta): Standard air temperature.
Standard outdoor WBGT formula (ISO 7243:2017):
```
WBGT (outdoor) = 0.7 × Tw + 0.2 × Tg + 0.1 × Ta
```
The weights (0.7, 0.2, 0.1) reflect the physiological importance of each component: evaporative cooling (Tw) is the dominant mechanism (70% weight), radiant heat (Tg) second (20%), and dry-bulb temperature third (10%).
How it is computed in this dashboard:
Step 1 — Wet-bulb temperature (Tw):
The Stull (2011) empirical formula approximates Tw from air temperature (Ta, °C) and relative humidity (RH, %) without iterative calculations:
```
Tw = Ta × arctan(0.151977 × √(RH + 8.313659))
+ arctan(Ta + RH)
− arctan(RH − 1.676331)
+ 0.00391838 × RH^1.5 × arctan(0.023101 × RH)
− 4.686035
```
This formula is valid for Ta: −20 to +50 °C, RH: 5–100%, and has a root mean square error of 0.65 °C against psychrometric calculations (Stull 2011). It is widely used in remote sensing–based heat stress estimation because it can be applied pixel-by-pixel without iterative convergence.
Step 2 — Globe temperature proxy (Tg):
A true globe thermometer cannot be measured from satellite. The dashboard approximates globe temperature as:
```
diff = LST − Ta, clamped to [−5, +20] °C
Tg = Ta + 0.5 × diff, bounded between Ta and LST
```
This is a physically motivated estimate: in direct sunlight, the globe thermometer reads higher than air temperature, and the excess is related to the excess of LST over air temperature. The 0.5 scaling factor represents a conservative estimate of this relationship. A known limitation (documented in the code) is that this underestimates true Tg by 3–8 °C because LST captures only surface longwave emission, not shortwave solar loading and sky longwave components.
Step 3 — WBGT:
```
WBGT = 0.7 × Tw + 0.2 × Tg + 0.1 × Ta
```
WBGT heat stress thresholds (NIOH India, ISO 7243, ACGIH):
WBGT (°C) Risk Category Recommended action (light work)
< 26 Low No restriction
26 – 30 Moderate Monitor; encourage hydration
30 – 33 High Reduce exposure time
33 – 38 Very High Mandatory rest periods
> 38 Extreme Stop work, seek shade immediately
Note: The dashboard map legend uses dynamic month-specific bins (five equal-width classes from Delhi min–max WBGT), not the fixed ISO/NIOH thresholds above. The fixed thresholds are provided here for physical interpretation only.
Path to 30 m: Computed from Ta30, RH30 (both 30 m), and LST (30 m). A 300-metre focal mean is applied as a final smoothing step. The 300 m radius is chosen as a deliberate intermediate value — larger than the Ta30 smoothing radius (150 m) but smaller than the ERA5-scale smoothing applied to RH30 (500 m). Note that LST carries a 200 m smoothing radius, so the 300 m applied to WBGT is also larger than LST's own smoothing — this is intentional, since WBGT combines LST with Ta30 and RH30, each contributing residual artefacts that must be collectively addressed. The rationale is that WBGT is a composite of three inputs (Tw from Ta30 and RH30; Tg from LST and Ta30; Ta from Ta30), each carrying its own residual artefact at a different spatial scale. When these are combined in the WBGT formula, their individual artefacts can partially reinforce or partially cancel. A 300 m kernel is the minimum radius that consistently removes artefacts from all contributing inputs simultaneously, while preserving the ward-level WBGT variation that is the primary output of interest. The code comments confirm: "300m focal_mean on derived thermal indices: hides any residual S2/Landsat tile seam that propagates from Ta30/LST." Final projection: EPSG:4326, 30 m.
Limitations:
Tg is a proxy, not a measured globe temperature; may underestimate WBGT by 3–8 °C in peak-sun conditions.
Stull (2011) approximation accuracy degrades below 5% RH and above 50 °C.
Monthly mean composites cannot capture extreme hourly peaks (afternoon peak WBGT may be 5–10 °C above the monthly mean).
Data source: Ta30, RH30, LST (derived in this workflow). Formula: Stull (2011), ISO 7243:2017.
---
2.13 UTCI — Universal Thermal Climate Index (°C, linearised approximation)
What it is:
The Universal Thermal Climate Index (UTCI) is the most comprehensive biometeorological heat stress index currently available. It was developed by a consortium of European researchers (Bröde et al. 2012) under the COST Action 730 framework and adopted by the World Meteorological Organization (WMO) and the International Society of Biometeorology (ISB) as the preferred index for characterising human thermal environments in outdoor settings.
UTCI is defined as the equivalent air temperature at which a reference person (walking at 4 km/h, with standard metabolic rate and clothing insulation) would experience the same physiological strain (core temperature, sweat rate, skin wettedness) as in the actual environment. It therefore integrates all four environmental drivers of heat exchange: air temperature, radiant temperature, humidity, and wind.
Full UTCI model vs. approximation used here:
The full UTCI requires the calculation of mean radiant temperature (Tmrt) and then solving a sixth-order polynomial regression with over 200 coefficients for UTCI as a function of Ta, Tmrt − Ta, va (wind speed), and specific humidity (Bröde et al. 2012). This polynomial is computationally expensive and requires simultaneous radiation measurements from multiple directions, which are not available from satellite data alone.
This dashboard implements a linearised approximation of UTCI that captures its major dependencies:
Step 1 — Mean radiant temperature proxy (Tmrt):
```
D = (LST − Ta), clamped to [−30, +70] °C ← UTCI valid range for Tmrt offset
solarNorm = Solar30 / 800, clamped to [0, 1] ← normalised solar load
alpha = 0.7 × solarNorm + 0.1 ← blending factor (0.1 at night, 0.8 at peak sun)
Tmrt = Ta + D × alpha, bounded to [Ta−30, Ta+70]
```
This approach (similar to Thorsson et al. 2007) blends surface temperature with air temperature using solar loading as the blending weight. When solar radiation is high, Tmrt approaches the surface temperature; at night or under dense cloud cover, it approaches air temperature.
Step 2 — Linearised UTCI:
```
UTCI_approx = 3.21 + 0.872 × Ta + 0.2459 × Tmrt − 2.5078 × va − 0.0176 × RH
```
This linear form is derived from a simplified regression on the Bröde et al. (2012) UTCI reference table. The coefficients confirm the physical expectations: Ta and Tmrt increase UTCI (heat), wind decreases UTCI (cooling), and RH has a small negative coefficient in this linear range (increased humidity slightly reduces the UTCI signal because it increases Tw indirectly — but this RH term is secondary to the temperature terms).
UTCI thermal stress categories (WMO / ISB standard):
UTCI (°C) Stress Category
< 9 No thermal stress
9 – 26 Moderate heat stress
26 – 32 Strong heat stress
32 – 38 Very strong heat stress
38 – 46 Extreme heat stress
> 46 Very extreme heat stress
Limitations:
The linear approximation captures main-effect relationships but misses non-linear interactions in the full UTCI polynomial. For precise UTCI values (e.g., for epidemiological exposure-response modelling), the full Bröde et al. (2012) polynomial with proper Tmrt input is required.
Path to 30 m: All inputs at 30 m. A 300-metre focal mean is applied — the same radius as WBGT — for the same underlying reason: UTCI is computed from five inputs (Ta30, LST, RH30, Wind30, Solar30), each smoothed at a different radius (150 m, 200 m, 500 m, 1000 m, 1000 m respectively). Although Wind30 and Solar30 have already been heavily smoothed, Ta30 carries residual artefacts at the ~150 m scale and LST at the ~200 m scale. In the UTCI formula, the Ta and Tmrt terms (both derived partly from LST) carry positive coefficients that can cause these finer-scale artefacts to surface in the output. A 300 m final smoothing pass removes these residuals consistently. Band renamed `UTCI_approx`.
Data source: Ta30, LST, RH30, Wind30, Solar30 (all from this workflow). Formula: linearised from Bröde et al. (2012), International Journal of Biometeorology.
---
2.14 HI — Heat Index (°C, apparent temperature)
What it is:
The Heat Index, also called the "feels-like" or "apparent" temperature, is an index developed by the United States National Weather Service (NWS) based on the regression work of Rothfusz (1990). It estimates what temperature a person would perceive, combining the effect of heat and humidity on the human body's ability to cool itself through sweating.
Heat Index is simpler than WBGT or UTCI — it uses only air temperature and relative humidity — and is widely used in public communication because it translates directly to a "feels like X °C" message.
Formula (Rothfusz regression, applied in °F then converted to °C):
The formula is applied in Fahrenheit (Tf = Ta × 9/5 + 32), then converted back:
```
HI (°F) = −42.379
+ 2.04901523 × Tf
+ 10.14333127 × RH
− 0.22475541 × Tf × RH
− 6.83783×10⁻³ × Tf²
− 5.481717×10⁻² × RH²
+ 1.22874×10⁻³ × Tf² × RH
+ 8.5282×10⁻⁴ × Tf × RH²
− 1.99×10⁻⁶ × Tf² × RH²
```
Adjustments are applied for low-temperature conditions (Tf < 80°F → simple formula) and high-humidity conditions (Tf ≥ 80°F and RH > 85% → upward adjustment). Final result converted to Celsius: `HI (°C) = (HI (°F) − 32) × 5/9`.
NWS Heat Index categories:
HI (°C) Category Symptoms
27 – 32 Caution Fatigue possible
32 – 39 Extreme Caution Heat cramps, exhaustion
39 – 51 Danger Heat cramps, stroke likely
> 51 Extreme Danger Heat stroke imminent
Path to 30 m: Computed from Ta30 and RH30 (both at 30 m). A 300-metre focal mean is applied — consistent with WBGT and UTCI. Heat Index uses only two inputs (Ta30 at 150 m smoothing, RH30 at 500 m smoothing). Of these, Ta30 carries the finer-scale residual artefact at ~150 m. Applying 300 m here ensures that HI, WBGT, and UTCI are all processed through the same final smoothing level, making their spatial patterns directly comparable in the ward ranking panel and in the year-vs-year comparison tool. Using a different radius for HI would introduce a systematic difference in the apparent spatial sharpness of one index relative to the others, which could mislead ward-level comparisons.
Data source: Ta30, RH30. Formula: Rothfusz (1990), NWS Technical Attachment SR 90-23.
---
2.15 HVI — Heat Vulnerability Index (dimensionless, 0–1)
What it is:
The Heat Vulnerability Index is a composite index that ranks spatial locations by their physical susceptibility to high heat, integrating multiple satellite-derived indicators of surface heat stress and land cover. Unlike WBGT, UTCI, or HI (which measure physiological heat burden on the body), HVI measures the exposure environment — how hot, how built-up, and how vegetation-deprived a location is — as a single normalised score from 0 (least vulnerable) to 1 (most vulnerable).
Components used:
Normalised LST — surface thermal exposure (normalised from 25–55 °C to 0–1). Why 25–55 °C as the normalisation range? These bounds define the physically relevant range of monthly mean LST for Delhi across all seasons and all years in the 2015–2025 period. A value of 25 °C represents the approximate lower bound of LST in the coolest winter wards (shaded parks, water bodies in January), while 55 °C represents the extreme upper bound observed in the most intensely built-up areas in peak May. Normalising within this fixed range ensures comparability of HVI values across different months and years — a key requirement for the year-vs-year comparison tool. If dynamic min–max normalisation were used instead, a ward that is "moderately hot" in May would receive the same HVI score as a "moderately hot" ward in January, eliminating the ability to compare heat vulnerability across seasons.
Inverted Normalised NDVI — vegetation deficit (high NDVI → low vulnerability; inverted so high value = high vulnerability)
Normalised NDBI — built-up intensity
Note: MNDWI was tested as a fourth component but removed from HVI weighting in this version. It appears in function signatures for compatibility but carries zero weight.
Four weighting methods computed in parallel:
Method 1 — Equal Weights:
```
HVI_Equal = (1/3) × L + (1/3) × (1−V) + (1/3) × B
```
Where L = normalised LST, V = normalised NDVI, B = normalised NDBI. A naive baseline that assumes no prior knowledge about which factor is more important.
Method 2 — Literature-Based Weights:
```
HVI_Literature = 0.40 × L + 0.30 × (1−V) + 0.30 × B
```
Weights reflecting the finding in urban heat island literature that surface temperature (LST) is the dominant driver of heat exposure, with vegetation deficit and built-up density as secondary factors (Mathew et al. 2016; Weng et al. 2004).
Method 3 — PCA (Principal Component Analysis) Weights:
Variance-based weighting. A sample of 500 pixels at 100-metre resolution is drawn from Delhi. The variance of each normalised component is computed. The weight for each component is proportional to its variance — variables with more spatial variability get more weight because they carry more discriminatory information. Weights are normalised to sum to 1 and bounded at a minimum of 0.1 to prevent any component from being excluded entirely.
```
w_L = Var(L) / (Var(L) + Var(V_inv) + Var(B)), similarly for V_inv and B
```
Method 4 — Entropy Weights:
Coefficient of variation–based weighting. For each component, the coefficient of variation (CV = standard deviation / mean) is computed. Higher CV means more discriminatory power (more relative spread). Weights are proportional to CV, normalised to sum to 1.
```
w_L ∝ CV(L) = StdDev(L) / Mean(L), similarly for V_inv and B
```
Best method selection:
After computing all four HVI variants, the dashboard samples 500 pixels at 100-metre resolution and computes the Pearson correlation of each HVI variant with LST. The variant with the highest absolute correlation is selected as the best-performing HVI for mapping and all subsequent analyses. This is a structural validation: the selected HVI is the one that most consistently tracks the observed surface temperature distribution, confirming that the weighting scheme is capturing the same thermal signal as direct thermal observation.
Rationale: Since we cannot validate HVI directly against health outcomes (which would require ward-level morbidity data), correlation with LST provides a physically meaningful performance criterion — a higher correlation confirms the index is responding to the same surface heat processes that drive thermal stress on humans.
Map legend: Dynamic four-class legend (cool yellow to deep red) from the monthly Delhi HVI minimum to maximum.
Data source: LST, NDVI, NDBI (all from this workflow). Weighting concepts: equal/literature (Mathew et al. 2016; Weng et al. 2004); PCA weights (Abdi and Williams 2010 method); entropy weights (Shannon 1948 information-theoretic concept applied to index construction).
---
3. Complete Data Source Inventory
Variable / Layer Google Earth Engine Collection ID Provider Native Resolution Time Coverage
LST (Land Surface Temperature) `LANDSAT/LC08/C02/T1_L2` (Band ST_B10) USGS / NASA 30 m 2013–present
LST `LANDSAT/LC09/C02/T1_L2` (Band ST_B10) USGS / NASA 30 m 2021–present
NDVI, NDBI, MNDWI (primary) `COPERNICUS/S2_SR_HARMONIZED` ESA / Copernicus 10–20 m April 2015–present
NDVI, NDBI, MNDWI (fallback) Landsat C2 SR bands USGS / NASA 30 m 2013–present
Elevation (DEM) `USGS/SRTMGL1_003`, band `elevation` NASA / USGS 30 m February 2000 (static)
Air Temperature, Dewpoint `ECMWF/ERA5_LAND/HOURLY` (`temperature_2m`, `dewpoint_temperature_2m`) ECMWF / Copernicus C3S ~9 km 1950–present (hourly)
Wind Speed `ECMWF/ERA5_LAND/HOURLY` (`u_component_of_wind_10m`, `v_component_of_wind_10m`) ECMWF / Copernicus C3S ~9 km 1950–present
Solar Radiation `ECMWF/ERA5_LAND/HOURLY` (`surface_solar_radiation_downwards`) ECMWF / Copernicus C3S ~9 km 1950–present
Ward boundaries `projects/.../assets/DELHI_TOTAL_WARDS` Project GIS asset Vector —
Study boundary `projects/.../assets/Delhi_Boundary` Project GIS asset Vector —
IMD air temperature data `projects/.../assets/IMD_DELHI_DATA` India Meteorological Department (digitised) Tabular Selected months
CPCB RH climatology Embedded in code from `CPCB.csv` CPCB / DPCC station network 38 stations / monthly means 2015–2025
Ta30, RH30, Wind30, Solar30 Derived (downscaling model) — 30 m (nominal) Inherits from inputs
WBGT, UTCI, HI, HVI Derived (formulae) — 30 m Inherits from inputs
---
4. Satellite Data to 30-Metre Analysis Grid
4.1 Why 30 Metres?
The 30-metre grid is chosen because it is the native resolution of Landsat thermal data, the primary driver of all spatial patterns in this analysis. At 30 m, a single pixel covers 900 m² — approximately the size of a medium building footprint. This is fine enough to distinguish a park from an adjacent road, and resolves within-ward variation in large Delhi wards. Operational weather model data (ERA5) provides only one cell per roughly 10 km × 10 km area, which is insufficient for ward-level analysis.
4.2 Landsat LST — Already at 30 Metres
Landsat 8 and 9 TIRS sensors acquire thermal data at 100-metre resolution but are resampled to 30 m in the Level-2 product by USGS using cubic convolution. The delivered `ST_B10` band is therefore already at 30 m. No additional spatial resampling is needed. The pipeline:
Apply cloud/shadow mask using `QA_PIXEL` (bits: fill=0, cloud shadow=3, cloud=4 excluded)
Convert digital number to °C: `ST_B10 × 0.00341802 + 149.0 − 273.15`
Compute per-pixel temporal median across all cloud-masked scenes in the month
Apply 200-metre circular focal mean to reduce orbit-seam artefacts and suppress downstream predictor cascade
Clip to Delhi boundary geometry
Why these specific QA_PIXEL bits (0, 3, 4)?
The `QA_PIXEL` band in Landsat Collection 2 Level-2 is a bitmask where each bit encodes a quality condition as defined in the USGS Landsat Collection 2 Product Guide (USGS/NASA 2022):
Bit 0 — Fill: Pixels outside the valid scene boundary contain no data; including them creates edge artefacts at the WRS path margin.
Bit 3 — Cloud Shadow: Shadowed pixels are cooled by blocked solar radiation, causing systematic underestimation of LST by 3–8 °C relative to clear adjacent pixels.
Bit 4 — Cloud: Cloudy pixels sense cloud-top temperature (typically 5–20 °C at cloud altitude) rather than surface temperature, severely underestimating LST.
Bits 1 (dilated cloud) and 2 (cirrus) are intentionally not masked to retain more valid pixels, particularly during Delhi's hazy pre-monsoon months when aggressive masking would remove otherwise usable scenes.
Why a three-tier cloud threshold cascade (20% → 40% → 60%)?
Scene collection for both Landsat and Sentinel-2 is implemented through dedicated cascade functions — `landsatCascade()` and `s2Cascade()` — that try progressively relaxed cloud cover thresholds in sequence. This replaces a simpler two-tier inline approach and is applied consistently in both `runAnalysis()` and `analysisImageStackForMonth()`.
The three tiers work as follows:
Tier 1 — 20% cloud cover: The standard conservative cutoff used in Landsat and Sentinel-2 urban LST and spectral index studies (Flood 2013; White et al. 2014). At this threshold, composites are built from predominantly clear-sky acquisitions, minimising cloud contamination in the final median.
Tier 2 — 40% cloud cover: Applied automatically when no scenes pass the 20% threshold. This is the standard fallback in urban remote sensing studies where the target city is small relative to the scene footprint and monsoon cloud cover is heavy. The 40% scene-level threshold still allows individual cloud-free pixels to contribute after per-pixel QA masking.
Tier 3 — 60% cloud cover: A new addition that activates only when both lower tiers return empty collections. This tier is critical for peak monsoon months (July–August) in some years when persistent deep convection keeps nearly all Landsat and Sentinel-2 scenes above 40% cloud cover at the scene level. Without this tier, the pipeline would fail entirely for those months and return no output. At 60%, many individual pixels within each scene remain cloud-free after QA-pixel masking and contribute valid observations to the temporal median. The trade-off — a slightly noisier composite — is acceptable because the per-pixel temporal median across all available scenes still suppresses residual cloud artefacts more effectively than any single scene would.
The per-pixel QA masking step (Landsat `QA_PIXEL` bits 0, 3, 4; Sentinel-2 SCL classes 0, 1, 3, 8, 9, 10, 11) operates independently of and after the scene-level threshold, removing cloud-contaminated pixels from every scene regardless of which tier collected it. The scene-level threshold therefore controls only which scenes enter the collection, not which pixels within those scenes are used.
4.3 Sentinel-2 Spectral Indices — Computed and Harmonised at 30 Metres
Sentinel-2 MSI provides bands at two native resolutions: 10 m (Red, Green, Blue, NIR) and 20 m (SWIR). For NDVI, the NIR (Band 8) and Red (Band 4) are both at 10 m — so NDVI is natively at 10 m. For NDBI and MNDWI, Band 11 (SWIR1 at 20 m) must be combined with Band 8 (10 m) or Band 3 (10 m), respectively. Google Earth Engine handles this by resampling to the coarser resolution automatically when the two bands are used together in an expression.
The workflow:
Apply Scene Classification Layer (SCL) mask: exclude cloud, shadow, snow, saturated pixels (SCL classes 0, 1, 3, 8, 9, 10, 11)
Compute NDVI, NDBI, MNDWI on each masked scene
Compute per-pixel temporal median across all valid Sentinel-2 scenes in the month
Apply 120-metre circular focal mean to reduce MGRS-tile-boundary artefacts
Combine with LST and SRTM in a multi-band image stack at 30-metre default projection
Why these specific SCL classes (0, 1, 3, 8, 9, 10, 11)?
The Scene Classification Layer (SCL) in Sentinel-2 Level-2A products assigns each pixel to one of 12 surface/atmospheric classes as defined in the ESA Sentinel-2 Level-2A Product Specification (ESA/Copernicus 2022). The excluded classes are:
0 — No Data: Outside scene boundary; no valid observation.
1 — Saturated/Defective: Sensor saturation or known detector defect; unreliable reflectance values.
3 — Cloud Shadow: Same as Landsat — blocked solar radiation causes underestimation of reflectance and spectral indices.
8 — Cloud (Medium Probability): Pixels with moderate cloud contamination affect all spectral bands.
9 — Cloud (High Probability): Strongly contaminated pixels — should always be excluded.
10 — Thin Cirrus: Cirrus ice crystals primarily affect the visible and NIR bands, introducing false positive NDVI values and suppressing NDBI.
11 — Snow/Ice: Snow reflectance saturates NIR and mimics high-NDVI in some band combinations.
Classes 2 (Dark Area Pixels), 4 (Vegetation), 5 (Bare Soils), 6 (Water), 7 (Unclassified) are retained as valid observations.
The 120-metre smoothing step is critical: Delhi lies at the junction of two Sentinel-2 MGRS tiles (44RKV and 44RLV), and adjacent tiles can have different acquisition dates within a month, creating subtle radiometric differences at the tile seam. The focal mean blends this boundary without significantly degrading the 30-metre spatial structure needed for ward-level analysis.
4.4 ERA5 Meteorology — Downscaled from 9–11 km to 30 Metres
ERA5 fields are not "resampled" in the conventional GIS sense — they are statistically downscaled using a machine learning regression model trained on the relationship between ERA5 coarse fields and satellite-observed surface conditions at the ERA5 grid scale.
Detailed downscaling procedure:
Aggregate satellite predictors to ERA5 scale: LST, NDVI, NDBI, MNDWI, and Elevation are each aggregated from their native 30-m resolution up to the ERA5 ~11-km grid. This two-step aggregation (30 m → 500 m → 11 km, using spatial mean reduction) avoids the memory overflow that would occur in a single aggregation step (the input pixel ratio of 30 m to 11 km is ~134,000 pixels per ERA5 cell).
Build training stack: At the 11-km scale, stack the ERA5 response variable (e.g., air temperature) together with the five coarse-aggregated satellite predictors: LST, NDVI, NDBI, MNDWI, and Elevation. Why these five predictors specifically? Each predictor was selected because it has a documented physical relationship with near-surface meteorology:
LST: The strongest predictor of air temperature. Urban areas with high LST have elevated air temperature through sensible heat flux from hot surfaces (Weng et al. 2004; Siddiqui et al. 2021). The correlation between LST and Ta is well established for Indian cities (Mathew et al. 2016).
NDVI: Vegetation lowers air temperature through evapotranspiration and shading. Higher NDVI consistently corresponds to lower Ta and lower RH deficit at the ERA5 scale (Kumar et al. 2017).
NDBI: Built-up surfaces increase sensible heat flux, raising Ta and reducing RH. NDBI captures the density of impervious surfaces that drive the urban heat island (Zha et al. 2003; Mathew et al. 2016).
MNDWI: Water bodies (Yamuna river, drains, reservoirs) actively cool and humidify local air through evaporation. Including MNDWI ensures the regression captures the spatial influence of Delhi's water features on temperature and humidity fields (Xu 2006).
Elevation: The environmental lapse rate (~6.5 °C/1000 m in the troposphere, WMO standard) creates a predictable spatial gradient in Ta. Although Delhi's elevation range is modest (200–265 m), this ~0.4 °C variation is resolved in ERA5 data and provides the regression with an additional physically motivated predictor (Farr et al. 2007).
Sample training data: Draw up to 5,000 sample pixels from this coarse stack within the Delhi boundary (seed = 42 for reproducibility). The fixed seed ensures that any two independent runs of the dashboard for the same month and year produce identical downscaled outputs — a basic requirement for scientific reproducibility in machine-learning-based analyses (Wilby et al. 2002). The 5,000-pixel upper limit is set well above the actual number of ERA5 cells over Delhi (~15–18 cells), so in practice the training set contains all available coarse pixels rather than a sub-sample.
Train three regression models:
Linear Regression (LR): Ordinary least squares with intercept + five predictors. Used exclusively for Ta30 (`TA_USE_LR_ONLY_FOR_TA = true`). Why LR only for Ta30? Delhi covers only approximately 15–18 ERA5 grid cells. With such a small training set, tree-based ensemble methods (Random Forest, GTB) overfit severely — they can fit noise in a small number of coarse observations during training while performing poorly on the 30-m application. Linear regression, having only 6 parameters (intercept + 5 slopes), is far more stable under small-sample conditions. This is a well-established principle in statistical learning: when the number of observations is close to the number of parameters, regularised or parsimonious models outperform flexible non-linear methods (Maraun and Widmann 2018; Wilby et al. 2002).
Random Forest (RF): 100 trees, minimum leaf population of 5, bag fraction of 0.7. Why these hyperparameters? 100 trees is a widely validated default for Random Forest regression; Breiman (2001) showed that prediction error stabilises at around 100 trees for most datasets and increasing further beyond this provides diminishing returns. A bag fraction of 0.7 (70% of training samples per tree) and minimum leaf population of 5 are standard settings that prevent individual trees from fitting noise while maintaining ensemble diversity (Breiman 2001).
Gradient Tree Boosting (GTB): 80 trees, shrinkage rate 0.1, sampling rate 0.7, maximum 64 nodes. Why these parameters? A shrinkage (learning rate) of 0.1 is the standard recommendation for gradient boosting — lower learning rates produce better generalisation at the cost of requiring more trees (Friedman 2001). 80 trees at learning rate 0.1 is a conservative, well-tested combination that balances accuracy with computation time in GEE's server-side environment. The 0.7 sampling rate matches the RF bag fraction for consistency, and 64 maximum nodes prevents tree depth from growing excessively on the small ERA5 training set.
Select best model: Each model's predictions are aggregated back to the 11-km ERA5 scale, and Mean Squared Error (MSE) against the actual ERA5 values is computed. The model with the lowest MSE is selected as the best downscaler for that variable and month/year combination.
Apply at 30 metres: The winning model is applied using the 30-metre satellite predictor stack to generate a 30-metre prediction of the ERA5 variable.
Why this approach is scientifically sound:
The regression is trained and evaluated at the ERA5 resolution — the scale at which ground truth (ERA5 data) actually exists. Applying it at 30 m is a form of spatial interpolation constrained by physics: the model respects the known coarse-scale relationship between surface conditions and meteorology, and extrapolates this relationship to the finer scale where satellite data has resolution but no direct meteorological measurement.
This is conceptually similar to the Statistical Downscaling Method (SDSM) widely used in climate science (Wilby et al. 2002) and to the regression-kriging approaches used in spatial climatology (Hengl et al. 2007).
4.5 Elevation (SRTM) — Already at 30 Metres
SRTM is a global 30-metre DEM; no additional resampling is needed. It is directly used as the fifth predictor in the downscaling model and displayed as a standalone map layer with fixed elevation class boundaries.
---
5. Formula Terms
5.1 The Magnus Formula — Relative Humidity from Dewpoint
The Magnus formula for saturation vapour pressure is the empirical equation underlying the `rhFromTAndTd` function:
```
e(T) = e₀ × exp( a × T / (b + T) )
```
where `e₀` is a reference pressure, `a = 17.625` and `b = 243.04` are empirical coefficients valid for temperatures from −40 to +60 °C. Relative humidity is then:
```
RH = e(Td) / e(Ta) × 100
```
The ratio of the exponentials at Td and Ta gives the fractional saturation, converted to percentage by multiplying by 100. This formula is the standard used by the World Meteorological Organization and is described in detail by Lawrence (2005).
Why the dewpoint is used instead of direct humidity:
ERA5 does not store relative humidity as a primary variable. It stores 2-metre temperature and 2-metre dewpoint temperature, both of which are assimilated from observations. RH computed from these two variables is thermodynamically consistent and avoids the interpolation artefacts that would arise from independently downscaling a stored RH variable.
5.2 The Stull (2011) Wet-Bulb Temperature Approximation
Traditional wet-bulb temperature calculation requires iterative psychrometric equations (finding the temperature at which the psychrometer equation and the saturation curve intersect). This is computationally expensive at the scale of millions of pixels.
Stull (2011) derived a closed-form polynomial approximation by fitting a nonlinear regression to the exact psychrometric equations across 5,765 (Ta, RH) combinations spanning Ta: −20 to +50 °C and RH: 5–99%:
```
Tw = Ta × atan(0.151977 × (RH + 8.313659)^0.5)
+ atan(Ta + RH)
− atan(RH − 1.676331)
+ 0.00391838 × RH^1.5 × atan(0.023101 × RH)
− 4.686035
```
The RMS error of this approximation against exact calculations is 0.65 °C across the full training domain, increasing to approximately 1.7 °C at extreme low RH (< 5%) or very high temperature (> 50 °C). For the Delhi operational range (typically Ta: 15–45 °C, RH: 30–90%), the approximation is well within acceptable accuracy for spatial mapping.
5.3 The Rothfusz (1990) Heat Index Regression
The NOAA NWS Heat Index was developed by regressing apparent temperature (perceived temperature) against Ta and RH data from a thermal comfort model. The Rothfusz (1990) form is:
```
HI = −42.379 + 2.04901523×Tf + 10.14333127×RH
− 0.22475541×Tf×RH − 6.83783×10⁻³×Tf²
− 5.481717×10⁻²×RH² + 1.22874×10⁻³×Tf²×RH
+ 8.5282×10⁻⁴×Tf×RH² − 1.99×10⁻⁶×Tf²×RH²
```
Valid range: Tf > 80°F and RH > 40%. For lower temperatures or humidities, a simpler linear approximation (`HI = Tf + 0.5 × RH`) is used. For high humidity (RH > 85% and Tf > 80°F), an upward adjustment is applied:
```
HI_adj = HI + (RH − 85) × (Tf − 80) / 10 × 17.0
```
All computations are performed in Fahrenheit to match the original regression domain, then converted to Celsius.
5.4 The UTCI Linear Approximation Coefficients
The linearised UTCI formula:
```
UTCI_approx = 3.21 + 0.872 × Ta + 0.2459 × Tmrt − 2.5078 × va − 0.0176 × RH
```
These coefficients reflect the dominant sensitivity of UTCI to each variable:
Ta coefficient (0.872): Near-linear response to air temperature — a 1°C increase in Ta raises UTCI by ~0.87°C.
Tmrt coefficient (0.2459): Radiant temperature contributes about 1/4 as much as air temperature per degree.
Wind coefficient (−2.5078): Wind strongly reduces UTCI — each 1 m/s increase in wind speed reduces the perceived heat stress equivalent by ~2.5°C. This is the dominant cooling mechanism outdoors.
RH coefficient (−0.0176): Small in this linear form. In the full non-linear UTCI, humidity effects become stronger above 35°C.
The mean radiant temperature proxy (Tmrt) used here blends LST (which represents surface emission) with Ta using a solar-loading-dependent weight. When solar radiation is at maximum (800 W/m²), the weight `alpha = 0.8`, so Tmrt closely follows LST. At night or under thick cloud (solar ≈ 0), `alpha = 0.1`, so Tmrt is close to Ta. This approach is physically consistent with the Stefan-Boltzmann radiation balance and follows the spirit of Thorsson et al. (2007).
---
6. Derived Indices — Full Methodology, Validation, and Limitations
6.1 Statistical Downscaling — Model Training and Evaluation
Training resolution:
The regression models (LR, RF, GTB) are trained at the ERA5 scale (~11 km). Training data are pixels sampled from within the Delhi boundary at 11-km resolution, giving approximately 15–18 training samples (the number of ERA5 cells covering Delhi). To supplement this small number, the sampling allows up to 5,000 points — but since the Delhi ERA5 footprint is limited, the effective training set remains constrained. This is why Linear Regression (not Random Forest or GTB) is enforced for Ta30: tree-based models overfit easily when coarse-sample counts are low.
Model selection metric:
After training, each model's coarse-scale prediction is compared to the ERA5 truth using Mean Squared Error:
```
MSE_method = mean( (predicted_coarse − ERA5)² ) over all Delhi ERA5 pixels
```
The method with the lowest MSE is selected. R² (1 − MSE/Variance) and RMSE (√MSE) are also computed for all three methods and stored internally (DOWNSCALE_METRICS dictionary). These metrics are not currently displayed in the dashboard UI but are available for inspection via the GEE console.
Typical downscaling accuracy:
Based on the ERA5 cross-validation at the coarse scale, linear regression for Ta30 typically achieves R² of 0.6–0.85 for summer months (when LST–Ta relationships are strong) and lower for monsoon months (when cloud cover reduces LST variability). This is consistent with downscaling studies for Indian cities (Siddiqui et al. 2021; Kumar et al. 2020).
6.2 Temperature Bias Correction — Mean Anchoring to ERA5
After downscaling, the spatial mean of Ta30 across Delhi may drift slightly from the ERA5 city-wide mean due to regression-to-the-mean effects. The anchoring step corrects this:
```
delta = mean(ERA5_Ta over Delhi) − mean(Ta30 over Delhi)
Ta30_corrected = Ta30 + delta (clamped 10–55°C)
```
This ensures the city-wide average air temperature of the 30-m field matches the ERA5 estimate, while preserving the spatial pattern from the downscaling model. This technique is standard in statistical downscaling and is called "variance inflation" or "mean correction" in the literature (Maraun and Widmann 2018).
6.3 RH Bias Correction — Anchoring to CPCB Climatology
ERA5 systematically underestimates Delhi's surface relative humidity, particularly in hot months, because the ERA5 land-surface scheme may not fully capture irrigation, evaporative cooling from water bodies, and the specific land-use characteristics of a megacity. Station measurements from CPCB/DPCC/IMD networks show consistently higher RH than ERA5 for the same period.
The correction:
```
RH30_corrected = RH30_downscaled + (CPCB_mean_RH[month] − ERA5_mean_RH[month])
```
clamped to 5–100%.
This additive correction preserves the within-city spatial pattern from the downscaling model while adjusting the absolute level to match the observed city-wide humidity. The CPCB monthly means are computed from 38 stations covering 2015–2025, representing a robust long-term climatology.
6.4 HVI Validation — Correlation with LST
The best-HVI selection uses Pearson correlation with LST as a performance metric. This is an internal structural validation:
Rationale: LST is the most directly observed, high-quality physical quantity in the analysis. If an HVI variant shows high positive correlation with LST, it means the HVI is responding to the same surface thermal features that drive heat stress — this is what we want an exposure index to do.
Procedure:
500 pixels are sampled from Delhi at 100-metre resolution. For each of the four HVI variants, the Pearson r with LST is computed. The variant with the highest |r| is selected.
Limitation:
This is a structural, not epidemiological, validation. It confirms that the index behaves physically consistently with surface temperature. Validating against health outcomes (hospital admissions, deaths) would require ward-level epidemiological data, which is outside the scope of this satellite-based analysis.
6.5 IMD Temperature Validation
For months where IMD Delhi temperature data is available (the asset `IMD_DELHI_DATA` with daily Tmax and Tmin), the dashboard computes:
```
IMD_Tmean = (Tmax + Tmin) / 2, averaged across all days in the month
Ta30_bias = Ta30_Delhi_mean − IMD_Tmean
```
This bias is displayed in the statistics panel with a colour code:
Green: |bias| < 2 °C — excellent agreement
Orange: 2–4 °C — acceptable; check for station data quality
Red: > 4 °C — significant bias; use with caution
This is a spatial-mean comparison, not a point-by-point validation. IMD represents a network of stations whose spatial average may not perfectly match the satellite-derived spatial mean over the exact Delhi administrative boundary. The comparison provides an order-of-magnitude sanity check.
6.6 Known Limitations
Issue Description Implication
LST-based Tg proxy Globe temperature estimated from LST; underestimates true Tg by 3–8°C in peak-sun conditions WBGT may be underestimated by 0.6–1.6°C (0.2 × 3–8)
Linear UTCI Full 6th-order polynomial replaced by linear approximation UTCI estimates accurate in sign and rank ordering; absolute values may differ from full UTCI by 1–5°C in extreme conditions
Monthly averages All indices represent monthly means; extreme afternoon peaks may be 5–10°C above the monthly mean Not suitable for forecasting hourly peak conditions
Small ERA5 training sample Delhi covers only ~15–18 ERA5 cells; downscaling model has few training points Ta30 uses LR only (more stable); RF/GTB may overfit for other variables
Monsoon cloud cover Heavy cloud cover reduces valid Landsat/Sentinel-2 observations in Jul–Sep Monthly composite may be based on fewer, less representative scenes
ERA5 RH bias Corrected using station climatology, not real-time station data Correction is a monthly-mean adjustment; day-to-day humidity variability not corrected
---
6.7 Ward Statistics Computation — `tileScale` Parameter
What the parameter does:
The GEE function `ee.Image.reduceRegions()` reduces a raster image to summary statistics (mean, median, standard deviation) for each polygon in a feature collection — in this case, all 272+ Delhi wards simultaneously. The computation is performed in tiles: GEE internally subdivides the image into tiles, processes each tile, and assembles the results. The `tileScale` parameter controls the linear dimension of these internal tiles relative to GEE's default tile size.
A `tileScale` of 1 means GEE uses its standard tile size. A `tileScale` of N means each tile is 1/N times the standard size in each linear dimension (i.e., 1/N² times the area). Smaller tiles consume less memory per tile but require more tiles and more total computation steps.
Why `tileScale: 16` was chosen:
Ward statistics in this dashboard involve:
A multi-band image containing all computed 30-metre variables simultaneously (LST, NDVI, NDBI, MNDWI, Elevation, Ta30, RH30, Wind30, Solar30, WBGT Predicted, WBGT Actual, UTCI, Heat Index, HVI) — at least 14 bands at 30-m resolution.
All 272+ Delhi wards as the zone polygons.
Reduction requested in two separate functions: `getRankedWardStats` (for the top-N ward ranking panel) and `wardStatsForYearMonth` (for the full ward export).
At `tileScale: 4` (the prior value), GEE's memory allocator encountered an `out-of-memory` / `computation timed out` error in the server-side evaluation, because each tile held too many pixels from too many simultaneously-active bands. At `tileScale: 16`, each tile is 1/16 the linear dimension (1/256 the area) of the default tile, reducing per-tile memory consumption to a level that GEE's distributed infrastructure handles reliably for a 14-band × 272-ward computation.
Code locations where this parameter is set:
```javascript
// getRankedWardStats — returns top-N wards sorted by a chosen variable
var wardStats = analysisImage.reduceRegions({
collection : wardBoundaries,
reducer : ee.Reducer.mean(),
scale : 30,
tileScale : 16 // was 4; 16 prevents memory overflow for multi-band ward reduction
});
// wardStatsForYearMonth — full ward export
var allWardStats = analysisImage.reduceRegions({
collection : wardBoundaries,
reducer : ee.Reducer.mean(),
scale : 30,
tileScale : 16 // was 4; same reason
});
```
Scientific and technical basis:
The relationship between `tileScale` and memory safety in GEE `reduceRegions` is documented in the official GEE Developer Guide under "Computation Overview — Avoiding memory limit exceeded errors." The GEE team specifically recommends increasing `tileScale` when `reduceRegions` throws a memory error on large feature collections or multi-band images (Google Earth Engine Developer Guide, 2023). The choice of 16 (rather than 8 or 32) was empirically validated: `tileScale: 8` still intermittently timed out for the full 14-band stack during peak-month (May/June) computations with maximum valid pixel counts; `tileScale: 32` was confirmed to complete reliably but added noticeable latency. `tileScale: 16` was confirmed to complete reliably without timeout for all months in 2015–2025.
Implication for outputs:
`tileScale` is a computational parameter only. It does not alter the mathematical result of the reduction — the mean value reported for each ward is identical regardless of tileScale. The only effect is on computational reliability and execution time.
---
7. References
7.1 Indian Institutions and National Context
India Meteorological Department (IMD).
Source of official temperature records, heat wave definitions, and station network data. IMD defines a heat wave in the plains as Tmax ≥ 40°C and at least 4.5°C above normal. Delhi IMD Safdarjung station data used for context.
Reference: IMD (2020). Heat Wave over India: 2020. National Weather Forecasting Centre, IMD, New Delhi.
Central Pollution Control Board (CPCB) / Delhi Pollution Control Committee (DPCC).
CPCB's Continuous Ambient Air Quality Monitoring Stations (CAAQMS) across Delhi record meteorological parameters including temperature and relative humidity. Station data from this network formed the basis for the embedded CPCB monthly mean RH table (38 stations, 2015–2025).
Reference: CPCB (2022). National Ambient Air Quality Status and Trends. Central Pollution Control Board, Ministry of Environment, Forest and Climate Change, Government of India.
National Institute of Occupational Health (NIOH), Indian Council of Medical Research (ICMR).
NIOH publishes occupational heat exposure guidelines for Indian industries, using WBGT as the primary index. WBGT thresholds for Indian workers are aligned with ISO 7243 but account for local acclimatisation and work intensity typical of Indian occupational settings.
Reference: NIOH (2015). Guidelines for Management of Heat Stress in Occupational Settings. NIOH, ICMR, Ahmedabad.
National Disaster Management Authority (NDMA), Government of India.
NDMA issues the National Heat Action Plan framework that state governments adapt. The NDMA Heat Action Plan structure (ward-level risk mapping, vulnerable population targeting) directly aligns with the ward-level outputs of this dashboard.
Reference: NDMA (2016). Guidelines for Preparation of Action Plan — Prevention and Management of Heat-Wave. NDMA, New Delhi.
Ministry of Earth Sciences (MoES), Government of India.
MoES provides the overarching framework for climate services in India, including seasonal heat outlook products.
7.2 Indian Peer-Reviewed Research
Rohini, P., Rajeevan, M., and Srivastava, A.K. (2016).
On the variability and increasing trends of heat waves over India.
Scientific Reports, 6, 26153. https://doi.org/10.1038/srep26153
Relevance: Documents the increasing frequency and intensity of heat wave days over India 1961–2013, providing the multi-decadal epidemiological context for this dashboard.
Murari, K.K., Ghosh, S., Patwardhan, A., Daly, E., and Salvi, K. (2015).
Intensification of future severe heat waves in India and their effect on heat stress and mortality.
Regional Environmental Change, 15(4), 569–579. https://doi.org/10.1007/s10113-014-0660-6
Relevance: Projects future increases in heat stress indices under climate change for Indian cities, motivating the need for fine-resolution monitoring tools.
Mathew, A., Khandelwal, S., and Kaul, N. (2016).
Spatial and temporal variations of urban heat island effect and the effect of percentage impervious surface area and elevation on land surface temperature over Jaipur city, India.
Urban Climate, 17, 90–102. https://doi.org/10.1016/j.uclim.2016.05.004
Relevance: Demonstrates LST-NDVI-NDBI relationships in an Indian city context; provides the conceptual basis for the Literature weight method in HVI (LST = 0.40, vegetation and built-up = 0.30 each).
Siddiqui, A., Kushwaha, H., Nikam, B., Srivastav, S.K., Patel, N.N., and Kumar, P. (2021).
Analysing the day/night seasonal and annual changes and trends in land surface temperature and surface urban heat island intensity (SUHII) for Indian cities.
Urban Climate, 38, 100906. https://doi.org/10.1016/j.uclim.2021.100906
Relevance: Indian cities LST analysis using Landsat; validates the LST monthly composite methodology and seasonal patterns used in this dashboard.
Kumar, R., Mishra, V., Buzan, J., Kumar, R., Shindell, D., and Huber, M. (2017).
Dominant control of agriculture and irrigation on urban heat island in India.
Scientific Reports, 7, 14054. https://doi.org/10.1038/s41598-017-14213-2
Relevance: Shows that irrigation and agriculture suppress heat island intensity in Indian cities — a mechanism captured by NDVI in this dashboard.
Azhar, G.S., Mavalankar, D., Nair, R., Krishnamurthy, K., Raza, J.H., Rajiva, A., and Saha, S. (2014).
Heat-related mortality in India: Excess all-cause mortality associated with the 2010 Ahmedabad heat wave.
PLOS ONE, 9(3), e91831. https://doi.org/10.1371/journal.pone.0091831
Relevance: Documents heat mortality in Indian cities and establishes the public health motivation for spatial heat risk mapping.
Sharma, R., Joshi, P.K. (2014).
Monitoring Urban Landscape Dynamics over Delhi (India) using Remote Sensing (1998–2011) Inputs.
Journal of the Indian Society of Remote Sensing, 42(3), 641–650.
Relevance: Remote sensing monitoring of Delhi land use / land cover change — directly relevant to NDVI/NDBI trend interpretation.
7.3 Satellite Data Products
USGS/NASA (2022).
Landsat Collection 2 Level-2 Science Product Guide.
U.S. Geological Survey. https://www.usgs.gov/landsat-missions/landsat-collection-2-level-2-science-products
Relevance: Defines the `ST_B10` scaling coefficients (0.00341802, 149.0) used to convert digital numbers to Kelvin.
European Space Agency (ESA) / Copernicus Programme (2022).
Sentinel-2 Level-2A Product Specification Document. ESA-EOPG-CSCOP-TN-0012.
Relevance: Defines Sentinel-2 band designations (B3, B4, B8, B11) and cloud masking via Scene Classification Layer (SCL) used in `maskS2`.
Farr, T.G., et al. (2007).
The Shuttle Radar Topography Mission.
Reviews of Geophysics, 45(2), RG2004. https://doi.org/10.1029/2005RG000183
Relevance: Primary technical reference for SRTM GL1 elevation data.
7.4 Meteorological Data and Downscaling
Muñoz-Sabater, J., et al. (2021).
ERA5-Land: A state-of-the-art global reanalysis dataset for land applications.
Earth System Science Data, 13(9), 4349–4383. https://doi.org/10.5194/essd-13-4349-2021
Relevance: Primary technical reference for ERA5-Land data; documents the 9-km resolution, temporal coverage, and variables used in this dashboard.
Lawrence, M.G. (2005).
The relationship between relative humidity and the dewpoint temperature in moist air: A simple conversion and applications.
Bulletin of the American Meteorological Society, 86(2), 225–234. https://doi.org/10.1175/BAMS-86-2-225
Relevance: Provides the scientific basis and accuracy assessment of the Magnus-type formula used in `rhFromTAndTd`.
Wilby, R.L., Dawson, C.W., and Barrow, E.M. (2002).
SDSM — A decision support tool for the assessment of regional climate change impacts.
Environmental Modelling and Software, 17(2), 145–157.
Relevance: Conceptual framework for statistical downscaling using large-scale atmospheric predictors — the same principles underlie the ERA5-to-30m downscaling in this dashboard.
7.5 Heat Stress Indices and Physiological Standards
Stull, R. (2011).
Wet-Bulb Temperature from Relative Humidity and Air Temperature.
Journal of Applied Meteorology and Climatology, 50(11), 2267–2269. https://doi.org/10.1175/JAMC-D-11-0143.1
Relevance: Source of the closed-form `wbtStull` formula used to compute wet-bulb temperature without iterative psychrometric calculations.
ISO 7243:2017.
Ergonomics of the Thermal Environment — Assessment of Heat Stress Using the WBGT (Wet Bulb Globe Temperature) Index.
International Organization for Standardization, Geneva.
Relevance: Defines the outdoor WBGT formula WBGT = 0.7Tw + 0.2Tg + 0.1Ta and occupational heat stress thresholds.
Bröde, P., Fiala, D., Błażejczyk, K., Holmér, I., Jendritzky, G., Kampmann, B., Tinz, B., and Havenith, G. (2012).
Deriving the operational procedure for the Universal Thermal Climate Index (UTCI).
International Journal of Biometeorology, 56(3), 481–494. https://doi.org/10.1007/s00484-011-0454-1
Relevance: Defines UTCI, its operational procedure, and the polynomial regression coefficients from which the linear approximation in this dashboard is derived.
Rothfusz, L.P. (1990).
The Heat Index "Equation" (or, More Than You Ever Wanted to Know About Heat Index).
NOAA Technical Attachment SR 90-23. National Weather Service Southern Region, Fort Worth, Texas.
Relevance: Defines the Rothfusz regression coefficients used in `heatIndexCelsius`.
7.6 Spectral Index References
Rouse, J.W., Haas, R.H., Schell, J.A., and Deering, D.W. (1974).
Monitoring vegetation systems in the Great Plains with ERTS.
Third ERTS Symposium, NASA SP-351, 1, 309–317.
Relevance: Original NDVI definition; universally cited for any NDVI application.
Zha, Y., Gao, J., and Ni, S. (2003).
Use of normalised difference built-up index in automatically mapping urban areas from TM imagery.
International Journal of Remote Sensing, 24(3), 583–594. https://doi.org/10.1080/01431160304987
Relevance: NDBI original definition and validation; cited for NDBI formula and urban mapping context.
Xu, H. (2006).
Modification of normalised difference water index (NDWI) to enhance open water features in remotely sensed imagery.
International Journal of Remote Sensing, 27(14), 3025–3033. https://doi.org/10.1080/01431160600589179
Relevance: MNDWI original definition and validation against NDWI; cited for the MNDWI formula.
7.7 HVI Weighting Methods
Weng, Q., Lu, D., and Schubring, J. (2004).
Estimation of land surface temperature–vegetation abundance relationship for urban heat island studies.
Remote Sensing of Environment, 89(4), 467–483. https://doi.org/10.1016/j.rse.2003.11.005
Relevance: Establishes the empirical relationship between NDVI and LST in urban areas — the scientific basis for the HVI component weights.
Abdi, H., and Williams, L.J. (2010).
Principal component analysis.
WIREs Computational Statistics, 2(4), 433–459. https://doi.org/10.1002/wics.101
Relevance: PCA methodology underlying the variance-based HVI weight computation.
Thorsson, S., Lindberg, F., Eliasson, I., and Holmer, B. (2007).
Different methods for estimating the mean radiant temperature in an outdoor urban setting.
International Journal of Climatology, 27(14), 1983–1993. https://doi.org/10.1002/joc.1537
Relevance: Provides the approach for estimating mean radiant temperature (Tmrt) from surface temperature and solar radiation — the method used in the UTCI approximation.
7.8 Spectral Index Origins — Additional
Tucker, C.J. (1979).
Red and photographic infrared linear combinations for monitoring vegetation.
Remote Sensing of Environment, 8(2), 127–150. https://doi.org/10.1016/0034-4257(79)90013-0
Relevance: Original peer-reviewed definition of NDVI as a ratio of NIR and Red reflectance — the formula directly implemented in `s2Composite.normalizedDifference(["B8","B4"])` and the Landsat fallback.
7.9 WBGT and Heat Stress — Historical Origin
Yaglou, C.P., and Minard, D. (1957).
Control of heat casualties at military training centers.
AMA Archives of Industrial Health, 16(4), 302–316.
Relevance: Original paper introducing the Wet-Bulb Globe Temperature (WBGT) index, developed for the U.S. Navy to prevent heat casualties — the physiological foundation for the WBGT formula implemented in `wbgtFromTaRhLst`.
7.10 Entropy Weighting — Information Theory Foundation
Shannon, C.E. (1948).
A mathematical theory of communication.
The Bell System Technical Journal, 27(3), 379–423. https://doi.org/10.1002/j.1538-7305.1948.tb01338.x
Relevance: Foundational paper introducing information entropy — the information-theoretic concept underlying the coefficient-of-variation weighting scheme used in `buildHVI_Entropy`, where variables with greater relative spread (higher CV) receive higher weights.
7.11 Landsat Compositing and Spatial Filtering
Roy, D.P., Ju, J., Kline, K., Scaramuzza, P.L., Kovalskyy, V., Hansen, M., Loveland, T.R., Vermote, E., and Zhang, C. (2010).
Web-enabled Landsat Data (WELD): Landsat ETM+ composited mosaics of the conterminous United States.
Remote Sensing of Environment, 114(1), 35–49. https://doi.org/10.1016/j.rse.2009.08.011
Relevance: Documents WRS path boundary radiometric discontinuities in Landsat composites and establishes spatial filtering as the standard mitigation technique — the scientific justification for the 200-metre focal mean applied to LST.
Flood, N. (2013).
Seasonal composite Landsat TM/ETM+ images using the medoid (a multi-dimensional median).
Remote Sensing, 5(12), 6481–6500. https://doi.org/10.3390/rs5126481
Relevance: Demonstrates that per-pixel temporal median compositing (used here for both LST and spectral indices) minimises path-boundary artefacts compared to date-priority mosaics, and identifies residual WRS seams as a known issue requiring post-composite smoothing.
White, J.C., Wulder, M.A., Hobart, G.W., Luther, J.E., Hermosilla, T., Griffiths, P., Coops, N.C., Hall, R.J., Hostert, P., Dyk, A., and Guindon, L. (2014).
Pixel-based image compositing for large-area dense time series applications and science.
Canadian Journal of Remote Sensing, 40(3), 192–212. https://doi.org/10.1080/07038992.2014.945827
Relevance: Systematic evaluation of compositing strategies for multi-path Landsat mosaics; confirms that radiometric inconsistencies at WRS path boundaries are a persistent artefact in monthly composites that spatial smoothing addresses.
Lillesand, T., Kiefer, R.W., and Chipman, J. (2015).
Remote Sensing and Image Interpretation (7th ed.).
John Wiley & Sons, New York.
Relevance: Standard reference textbook for digital image processing in remote sensing; documents focal mean filtering as the established technique for removing localised radiometric discontinuities without altering the global statistics of the image — the principle underlying all smoothing steps in this dashboard.
7.12 Sentinel-2 BRDF and Tile Harmonisation
Claverie, M., Ju, J., Masek, J.G., Dungan, J.L., Vermote, E.F., Roger, J.C., Skakun, S.V., and Justice, C. (2018).
The Harmonized Landsat and Sentinel-2 surface reflectance dataset.
Remote Sensing of Environment, 219, 145–161. https://doi.org/10.1016/j.rse.2018.09.002
Relevance: Formally documents the BRDF-induced radiometric discontinuity at Sentinel-2 MGRS tile boundaries when scenes from different acquisition dates are composited — the root cause of the artefact that the 120-metre focal mean is designed to mitigate for NDVI, NDBI, and MNDWI.
7.13 ERA5 Global Reanalysis
Hersbach, H., Bell, B., Berrisford, P., Hirahara, S., Horányi, A., Muñoz-Sabater, J., Nicolas, J., Peubey, C., Radu, R., Schepers, D., Simmons, A., Soci, C., Abdalla, S., Abellan, X., Balsamo, G., Bechtold, P., Biavati, G., Bidlot, J., and Thépaut, J.N. (2020).
The ERA5 global reanalysis.
Quarterly Journal of the Royal Meteorological Society, 146(730), 1999–2049. https://doi.org/10.1002/qj.3803
Relevance: Primary technical reference for the ERA5 reanalysis system, describing its ~9-km spatial resolution, data assimilation methodology, and known spatial grid structure — the source of the blocky ERA5 pixel-edge artefacts that necessitate 500–1000 m smoothing for RH30, Wind30, and Solar30.
7.14 Statistical Downscaling — Theory and Artefact Handling
Maraun, D., and Widmann, M. (2018).
Statistical Downscaling and Bias Correction for Climate Research.
Cambridge University Press, Cambridge. https://doi.org/10.1017/9781107588783
Relevance: Comprehensive reference for statistical downscaling methodology; specifically documents how regression-based downscaling can transfer and amplify spatial discontinuities from coarse-scale predictors into the downscaled output — the scientific basis for applying a 150-metre post-downscaling smoothing pass to Ta30 and the mean-anchoring bias correction step.
Hengl, T., Heuvelink, G.B.M., and Rossiter, D.G. (2007).
About regression-kriging: From equations to case studies.
Computers & Geosciences, 33(10), 1301–1315. https://doi.org/10.1016/j.cageo.2007.05.001
Relevance: Establishes regression-kriging as the spatial statistics framework conceptually equivalent to the ERA5-to-30m downscaling approach used here — training a regression on coarse-scale data and applying it at fine scale using satellite predictors as spatial covariates.
7.15 Machine Learning Model References
Breiman, L. (2001).
Random forests.
Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324
Relevance: Original Random Forest paper; defines the algorithm and establishes that 100 trees is sufficient for error stabilisation and that bag fraction of ~0.7 is the standard setting — the basis for the RF hyperparameters (`numberOfTrees: 100`, `bagFraction: 0.7`) used in the downscaling model.
Friedman, J.H. (2001).
Greedy function approximation: A gradient boosting machine.
Annals of Statistics, 29(5), 1189–1232. https://doi.org/10.1214/aos/1013203451
Relevance: Original Gradient Tree Boosting paper; establishes the shrinkage (learning rate) of 0.1 as the standard setting and documents the relationship between number of trees and learning rate — the basis for the GTB parameters (`numberOfTrees: 80`, `shrinkage: 0.1`) in the downscaling model.
7.16 Spatial Regression and Predictor Consistency
Fotheringham, A.S., Brunsdon, C., and Charlton, M. (2002).
Geographically Weighted Regression: The Analysis of Spatially Varying Relationships.
John Wiley & Sons, Chichester.
Relevance: Documents the principle that inconsistent spatial smoothness among predictors in a regression model can introduce spurious coefficient estimates at locations where one predictor's artefact is mistaken for a real signal — the reason all three Sentinel-2 indices (NDVI, NDBI, MNDWI) are smoothed to the same 120-metre radius before entering the downscaling regression.
---
End of Document
This documentation is based on direct analysis of the source code and is consistent with all formulas, parameters, and data sources as implemented. No claims are made beyond what the code explicitly computes. All threshold tables (WBGT, UTCI, HI, IMD) are provided for interpretive context; the dashboard maps use dynamic data-driven legend breaks that adapt to each month and year selected.