Machine Learning
May 1, 2026 · 10 min read

Why I Used Physics to Make XGBoost Better at Solar Forecasting

Most ML forecasting tutorials start with the data and end with a model. This project started with a physics equation — and it made all the difference.

Machine Learning · Energy Forecasting · Python · XGBoost · Renewables · Physics · Energiewende

Dr. Deepak K. Pandey

Experimental Physicist & Data Science Specialist. Building physics-informed ML systems for industrial AI applications in the DACH region.


Germany's solar fleet generates up to 50 GW at peak. Forecasting it matters because every percentage point of error at midday costs real money on the balancing market. The standard approach is to throw weather features into a gradient boosting model and optimise RMSE. That works. But it leaves a lot of explainability on the table, and it forces the model to learn things it should already know.

Two-layer architecture: Physics (pvlib) + XGBoost residual learner → Calibrated UQ → FastAPI + Streamlit

The full pipeline: SMARD + Open-Meteo data → TimescaleDB → Physics layer (pvlib) + ML residual layer (XGBoost) → Calibrated P10/P50/P90 → API & Dashboard

The Residual Idea

pvlib is a Python library built by solar engineers to compute exactly how much energy a solar panel should produce given the sun's position, air mass, and atmospheric turbidity. It knows nothing about clouds. But it knows geometry, and geometry is reliable.

So instead of asking XGBoost to predict solar generation directly, I asked it to predict the residual:

residual = actual_solar - physics_prediction

The physics layer handles the easy part: geometry, seasonal patterns, diurnal curve. XGBoost handles only what physics cannot see — cloud cover, curtailments, aerosols, measurement noise. The result is a smaller, faster model that learns from a harder signal.
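The decomposition is easy to sketch. Everything below is illustrative, not the project's code: the elevation-only clear-sky function is a crude stand-in for pvlib's Ineichen model (the real pipeline would call pvlib.location.Location(...).get_clearsky(times) and map irradiance to fleet output), but it shows where the residual target comes from:

```python
import math
from datetime import datetime

def clearsky_ghi(lat_deg: float, when: datetime) -> float:
    """Crude clear-sky irradiance (W/m^2) from solar elevation alone.
    A stand-in for pvlib's Ineichen model -- real code would call
    pvlib.location.Location(lat, lon).get_clearsky(times) instead."""
    day = when.timetuple().tm_yday
    # Solar declination (Cooper's formula) and hour angle, in radians.
    decl = math.radians(23.45) * math.sin(2 * math.pi * (284 + day) / 365)
    hour_angle = math.radians(15 * (when.hour + when.minute / 60 - 12))
    lat = math.radians(lat_deg)
    sin_elev = (math.sin(lat) * math.sin(decl)
                + math.cos(lat) * math.cos(decl) * math.cos(hour_angle))
    return max(0.0, 1000.0 * sin_elev)  # zero at night, peak near solar noon

def residual_targets(actual_mw, physics_mw):
    """The training target for the ML layer: what physics missed."""
    return [a - p for a, p in zip(actual_mw, physics_mw)]
```

At inference time the two layers recombine: final forecast = physics prediction + predicted residual.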

Results comparison: Physics-only R²=0.78 vs Physics+XGBoost R²=0.92, MAE reduced by 60%

R² went from 0.78 (physics alone) to 0.92 (physics + residual learner). MAE dropped by 60%, from 3,856 MW to 1,552 MW.

The most interesting result: when I inspected XGBoost's feature importances, physics_pred was the top feature by a large margin. The model is primarily amplifying and correcting the physics signal, not ignoring it. That is exactly what you want from a physics-informed design.
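That check is worth automating. A tiny sketch, assuming the importances arrive as a feature-to-score mapping such as the one XGBoost's get_booster().get_score(importance_type="gain") returns; the 0.5 threshold is an arbitrary choice for illustration:

```python
def physics_dominates(importances, key="physics_pred", min_share=0.5):
    """Sanity check for the residual architecture: the physics baseline
    should carry most of the signal. `importances` maps feature name to
    importance score (e.g. XGBoost gain)."""
    total = sum(importances.values())
    return total > 0 and importances.get(key, 0.0) / total >= min_share
```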

Calibrated Uncertainty

A point forecast is not enough for grid operators. They need to know the range. I trained three separate quantile models (q10, q50, q90) and then applied split conformal prediction to add a distribution-free coverage guarantee.
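One common split-conformal recipe for quantile pairs (conformalized quantile regression) is easy to sketch. This is illustrative, not necessarily the exact variant used here: score each calibration point by how far it falls outside its [q10, q90] interval, then widen future intervals by the finite-sample (1 − α) quantile of those scores:

```python
import math

def conformal_margin(cal_y, cal_lo, cal_hi, alpha=0.10):
    """Split conformal for quantile intervals (CQR-style sketch).
    Returns the margin to add to the upper and subtract from the lower
    quantile so that new intervals carry a distribution-free
    ~(1 - alpha) coverage guarantee on exchangeable data."""
    # Nonconformity score: positive when y falls outside [lo, hi].
    scores = [max(lo - y, y - hi) for y, lo, hi in zip(cal_y, cal_lo, cal_hi)]
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # finite-sample correction
    return sorted(scores)[min(rank, n) - 1]
```

A negative margin means the raw quantile models were already conservative and the intervals can be tightened.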

Why point forecasts aren't enough: probabilistic P10/P50/P90 intervals with conformal prediction guarantee

Grid operators need calibrated intervals, not point estimates. Split conformal prediction provides a distribution-free coverage guarantee.

The P90 interval hit 0.869 empirical coverage on the test set, close to the 0.90 target. P50 was miscalibrated (0.28 observed vs 0.50 claimed) due to a seasonal distribution shift between the calibration and test sets. This is a known limitation of split conformal when the calibration window does not represent the test distribution — documented honestly in ADR-003.
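Checking empirical coverage like this is a one-line computation; a minimal sketch of the reliability check:

```python
def empirical_coverage(y, lo, hi):
    """Fraction of observations falling inside their [lo, hi] interval;
    compare against the nominal level (e.g. 0.90 for a P10-P90 band)."""
    hits = sum(1 for yi, l, h in zip(y, lo, hi) if l <= yi <= h)
    return hits / len(y)
```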

Forecasts are evaluated with CRPS (Continuous Ranked Probability Score), which rewards both calibration and sharpness simultaneously. A wide but well-calibrated interval scores worse than a tight and accurate one. Final CRPS: 514.6 MW.
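For a finite set of predicted quantiles, CRPS can be approximated as twice the average pinball (quantile) loss across levels — a standard identity, sketched here; the project's exact evaluation code may differ:

```python
def pinball(y, q_pred, tau):
    """Quantile (pinball) loss at level tau."""
    return tau * (y - q_pred) if y >= q_pred else (tau - 1) * (y - q_pred)

def crps_from_quantiles(y, quantile_preds):
    """Approximate CRPS as 2x the mean pinball loss over the forecast
    quantiles, e.g. quantile_preds = {0.1: p10, 0.5: p50, 0.9: p90}.
    Becomes exact as the quantile grid grows dense."""
    losses = [pinball(y, q, tau) for tau, q in quantile_preds.items()]
    return 2 * sum(losses) / len(losses)
```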

The Stack

The whole system runs with a single docker compose up --build. Three services start in order:

  • TimescaleDB — raw data storage, 26,000+ hourly records in hypertables
  • FastAPI — serves the forecasts
  • Streamlit — interactive dashboard with the P10/P50/P90 chart, physics decomposition, and reliability diagram
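A minimal compose file for that three-service layout might look like the sketch below. Service names, images, ports, and build paths are illustrative assumptions, not the project's actual configuration:

```yaml
services:
  db:
    image: timescale/timescaledb:latest-pg16
    environment:
      POSTGRES_PASSWORD: example        # use a secret in practice
    volumes:
      - tsdata:/var/lib/postgresql/data
  api:
    build: ./api                        # FastAPI forecast service
    depends_on:
      - db
    ports:
      - "8000:8000"
  dashboard:
    build: ./dashboard                  # Streamlit UI
    depends_on:
      - api
    ports:
      - "8501:8501"
volumes:
  tsdata:
```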

The project is managed as a 22-day research sprint, including 5 Architecture Decision Records, 3 weekly reports, 2 retrospectives, a risk register, and a public Kanban board — because good science needs a paper trail.

Key Takeaways

  • Physics + ML beats pure ML. A principled baseline reduces the learning problem to what matters.
  • The model validates itself. If physics_pred is XGBoost's top feature, you know the architecture is working as intended.
  • Calibration ≠ accuracy. CRPS and reliability diagrams reveal what RMSE hides — and honest reporting of miscalibration is more valuable than hiding it.
  • One command deployment. Real systems should start reliably. Docker Compose enforces this discipline.

The repo is currently private while final documentation is completed. If you'd like early access or want to discuss the methodology, reach out on LinkedIn or via the contact page.
