Inside Sunset Verify: Grading Our Sunset Forecasts

You opened your weather app last night and it promised a beautiful sunset how crepuscular rays form . This morning, it said nothing at all about whether that sunset actually arrived.

That silence is the industry standard. A forecast is a confident claim the forecast models behind that claim , and almost none of those claims are ever checked measuring forecast accuracy against the sky that showed up.

Sunset Verify is our refusal to keep that bargain. We grade our own sunset-color forecasts in public, the morning after, with a letter score any reader can pull up and argue with.

What Is Sunset Verify?

Sunset Verify is the part of the journal where Vesper holds itself accountable for a single falsifiable prediction: how vivid tonight's sunset will be. The afternoon before, we publish a color score; the next morning, we publish how close we got.

Sunset Verify grades every sunset-color forecast Vesper publishes against the sky that actually arrived. The afternoon prediction earns a public score the next morning — hit or miss, posted where any reader can see it.

This is not a confidence interval buried in a settings menu. The grade sits next to the forecast it judges, so the prediction and its report card travel together.

If you want the physics behind why one evening turns molten and the next stays gray, we covered that in why sunsets differ from one night to the next. Sunset Verify is the accountability layer that sits on top of that science.

Why No Other Weather App Grades Itself

The dominant pattern in weather software is prediction without reconciliation. The app tells you tomorrow's high, tomorrow's chance of rain, tomorrow's golden hour — and then tomorrow becomes today and the slate is wiped clean.

We think this is wrong, and we think the reason it persists is structural rather than technical. Nothing forces a weather app to check its own work, so almost none of them do.

Most weather apps never check their own forecasts because nobody makes them. A prediction with no scorecard can never be wrong out loud, which is comfortable for the app and useless for the reader.

Naming names: Dark Sky, Carrot, and Apple Weather all show a forecast and none of them show a track record. That is not a swipe at their engineering, which is often excellent — it is an observation about what they choose to make visible.

An unchecked forecast is a comfortable thing to ship. It can never be wrong out loud, it never produces an awkward number, and it never invites the reader to keep score.

Comfortable for the app is precisely the problem. A prediction you are never allowed to audit is a prediction you are being asked to take on faith.

Dimension	Typical weather app	Sunset Verify
Forecast shown	Yes	Yes
Forecast checked after the fact	No	Every night
Errors published	Never	Prominently
Scorecard visible to the reader	None	A through F

How We Score A Sunset Forecast

The forecast itself is a single number we call the color score, from 0 to 100, published in the afternoon. It estimates how saturated and dramatic the sky will be in the window around sunset.

That number is not a guess about the weather in general. It is a guess about a narrow set of conditions that have to line up near the horizon for color to happen at all.

We forecast a 0–100 color score the day before, then compare it to observed cloud height, humidity, and aerosols after dark. The gap becomes a letter grade, A through F, published the next morning.

After the sun is down, verification begins. We compare the prediction against what actually occurred, drawing on a few independent sources:

Cloud structure and altitude. Mid- and high-level clouds catch under-lighting and turn vivid, while a low marine layer or a perfectly clear dome both tend to kill the show — and each fails differently.
Humidity and aerosols. Moisture and particulate load shift the color temperature of the light and decide whether the reds deepen or wash out into haze.
Observed sky and reader photos. Satellite imagery tells us what the cloud deck did; reader-submitted frames tell us what it looked like from the ground, which is the only thing that ultimately matters.

The gap between the predicted color score and the observed result becomes a letter grade. An A means we called a vivid sky and got one, or called a flat sky and were right to; an F means we sent you to the window for nothing, or worse, kept you inside during the best light of the week.

That last failure mode matters more to us than the first. Missing a great sunset is the error we work hardest to drive toward zero, because it is the one that actually costs the reader something — a point we make in our guide to reading golden hour light.

What We Got Wrong — And Published Anyway

The feature only means anything if the bad grades stay up. So they stay up.

There are evenings when we forecast a 20 and the western sky detonates into crimson, and the next morning Sunset Verify says, plainly, that we blew it. We do not quietly recompute the score or soften the language once the photos come in.

This is the editorial discipline that makes the rest of the journal credible. A publication that will tell you when it was wrong about the sky is a publication you can believe when it says the morning is worth getting up for.

Our rule for Sunset Verify is simple: the grade goes up before we know whether it flatters us, and it stays up after we find out it doesn't.

The Design Tradeoffs We Actually Made

Building Sunset Verify meant choosing, repeatedly, between what flattered us and what served the reader. None of these calls were obvious, and a couple of them still get argued internally.

Here are the tradeoffs that shaped the feature:

Honesty versus polish. We could have graded ourselves on a generous curve to keep the average high, but a scorecard you tune in your own favor is marketing with a number stapled to it.
Defining "right" for a subjective thing. Sunset beauty is partly in the eye of the viewer, so we committed to a measurable proxy — color saturation and the illumination of mid-level cloud — and accept that it will occasionally disagree with how a night felt.
Where the grade lives. Putting the report card next to the forecast risks undermining the forecast, and we did it anyway, because burying the grade two taps away would have been a quieter way of not really shipping it.
The temptation to hedge. The single fastest way to raise our accuracy would be to predict a dull sky every night, and we ruled that out on day one.

The hardest tradeoff was honesty versus polish. We could have graded ourselves gently to protect the average, but a scorecard you tune in your own favor is just marketing with a number on it.

That last temptation deserves its own line, because it is the one that quietly destroys features like this.

We refuse to hedge forecasts to protect the score. Predicting a dull sky every night would lift our accuracy and make the feature worthless, so we forecast the sky we actually expect.

A forecast optimized to protect a score stops being a forecast and becomes a hedge. We would rather post an honest F on a night we misread than a safe C on a night we refused to commit.

What Public Accountability Does To A Forecast

The surprising part is what grading ourselves did to the forecasting itself. When every prediction is going to be scored in the morning, the afternoon work gets sharper.

You stop reaching for the comfortable hedge. You commit to a number, because a vague forecast and a confident one earn the same F when they are wrong, and only the confident one earns an A when it is right.

Publishing misses builds trust rather than spending it. A forecast you can audit is one you can believe; a forecast that hides its record is asking for faith it has not earned.

Readers feel this even if they never open the methodology. A forecast that keeps its own scorecard reads as a forecast made by someone who expects to be checked — which is exactly the posture we want every brief written in, as we explained in how Vesper writes a daily brief.

Trust is not built by being right every time, which no honest forecaster can promise. It is built by being checkable, and then surviving the check often enough to matter.

Why This Fits The Rest Of Vesper

Sunset Verify is not a gimmick bolted onto a weather app. It is the same conviction that runs through the whole journal, expressed as a number.

We built Vesper on the belief that a forecast should be worth reading, not just worth glancing at — the case we lay out in full in what makes weather worth reading. A take you can audit is the most honest form of a take worth reading.

If you want the longer version of why we started a weather publication willing to argue with itself in public, it lives in why we built Vesper. Sunset Verify is that philosophy with the stakes turned up, because here the reader gets to keep score.

Frequently Asked Questions

A few of the questions readers ask most often about how the feature works.

How does Sunset Verify score a sunset forecast?

We predict a 0–100 color score the afternoon before, then compare it to observed cloud height, humidity, and aerosols after sunset. The difference becomes a public A-to-F grade the next morning.

Why don't other weather apps grade their forecasts?

Because nothing forces them to. An unchecked forecast can never be publicly wrong, so most apps show a confident prediction and quietly move on to tomorrow.

Does showing your forecast errors hurt credibility?

The opposite. A forecast you can audit earns more trust than one that hides its record, and readers consistently tell us the misses are why they believe the hits.

Can Vesper game its own Sunset Verify score?

It could, by forecasting dull skies every night to keep accuracy high. We refuse to hedge that way, because a scorecard you tune in your own favor is worthless to the reader.

Why is sunset color so hard to forecast?

Color depends on mid-level cloud height, humidity, and aerosols lining up in a narrow window near the horizon. Small errors in any of them flip a forecast from vivid to gray.

Read Tonight's Forecast — And Tomorrow's Grade

Tonight's color score is already posted, and tomorrow morning it will carry a grade whether we earned it or not. That is the deal we made with you.

Open the Vesper journal this evening, read the sunset call, and come back in the morning to see how we did. If we were wrong, you will be the first to know, because we will be the ones to say so.

Inside Sunset Verify: Why We Grade Our Own Sunset Forecasts in Public