With the Western Cape drought now making international headlines on the BBC, the New York Times, and elsewhere, CSAG staff are receiving daily requests for interviews and information about the drought both locally and internationally. The world is now watching to see if Cape Town will be the first major city to actually run out of water. At the same time, we are also receiving more and more queries asking the question: “When will the winter rains start and how much rain will fall this winter?”. As we ease our way through the middle of summer, which interestingly included a slightly wetter than normal November by some measurements, people are starting to look to the autumn and wondering whether the annual winter rains will come early and save the city or, like last year, arrive late and push us into a day zero situation.
This is also the time of year when various seasonal forecast centers or groups are producing seasonal outlooks (forecasts) that include the upcoming winter months. Before now winter was too far into the future to even consider trying to predict what it might look like. And so we have forecasts from local seasonal prediction groups such as the South African Weather Service (SAWS), University of Pretoria (UP), as well as the usual range of international prediction groups such as NOAA/NCEP and the IRI (USA), ECMWF (EU), JAMSTEC (Japan). It is extremely tempting therefore to dive into all the available predictions and start interpreting, analyzing, comparing, and hoping. In fact, I could link to all the forecasts here to ease the process, but I’m not going to do that… yet.
Skill, skill, skill…
The key ingredient of any prediction is some measure of how skillful it is. Within climate science, and in particular within forecasting, the topic of skill and how you measure it could form (and probably has formed) the topic of complete PhD theses. I have been in more discussions/arguments about the word “skill” than I care to remember. There are many, many ways of evaluating and quantifying the skill of a forecast. For a nice overview of some of the methods, have a look at the IRI document on the topic.
But regardless of how skill is quantified, broadly speaking the various measures indicate whether we would even expect the prediction system (computer models based on physics or statistics, tosses of a coin, or combinations of all of these) to “get it right”. What “get it right” means is of course pretty key to the discussion (how wrong can you be while still being right?), and the range of skill measures reflects the range of meanings of “right”. So, for example, if the prediction is for a dry series of months ahead, the skill measure might indicate how often in the past, when the prediction has been for dry conditions, the prediction has been correct (i.e. it really turned out to be dry). Of course you could just toss a coin: heads means it’ll be wet and tails means it’ll be dry. On average, over enough seasons and with a normal coin, you’d end up being right 25% of the time (a 50% chance of it being wet (or dry) multiplied by a 50% chance of the coin landing on tails (or heads)). You’d actually do “better” by always predicting dry; that way you’d be right 50% of the time. Of course statistics tells us that you could easily be wrong several years in a row, and then people might stop listening to you (though interestingly, they often don’t in practice, but that’s another discussion). To do even better requires more information, and that is what seasonal forecast systems are trying to do.
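To make the arithmetic above concrete, here is a small Python sketch using an invented 50/50 wet/dry climate. It counts how often a coin-toss forecast calls a dry season and the season really is dry (the 50% × 50% multiplication in the text), and compares that with the always-predict-dry baseline:

```python
import numpy as np

rng = np.random.default_rng(1)

n_seasons = 100_000
# Hypothetical climate: each winter is dry or wet with equal (50%) chance.
dry = rng.random(n_seasons) < 0.5
# Coin-toss forecast: tails -> predict dry, heads -> predict wet.
coin_says_dry = rng.random(n_seasons) < 0.5

# Fraction of seasons where the coin called "dry" and it really was dry.
coin_dry_and_dry = np.mean(coin_says_dry & dry)

# Always predicting "dry" instead is correct whenever the season is dry.
always_dry_hits = np.mean(dry)

print(f"coin said dry and it was dry: {coin_dry_and_dry:.2f}")  # close to 0.25
print(f"always-dry hit rate:          {always_dry_hits:.2f}")   # close to 0.50
```

The numbers are only for the idealised 50/50 climate assumed here; real seasons are not a fair coin, which is exactly why more informative forecast systems are worth pursuing.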
How not to be wrong: probabilistic forecasts…
Many modern forecast systems don’t forecast an exact amount of rainfall (eg. 450mm or 60% of normal) or even a single possibility (wet or dry), but rather a statistical probability of the rainfall falling into one of a number of categories. Three categories have become the most commonly used: below normal, normal, and above normal, the so-called tercile (3 category) forecast. This is generally more useful than just wet or dry forecasts because many years aren’t particularly wet or dry; they are closer to normal, and it’s useful to know if that is going to happen. The categories are often constructed so as to be equally likely on average over 30+ years (depending on the forecast system), so if you had no other information about an upcoming period you could predict a 33% chance of below normal rainfall, a 33% chance of normal rainfall, and a 33% chance of above normal rainfall (though some systems use the 25% and 75% thresholds to define below normal and above normal). The important/interesting thing about tercile probability forecasts like these is that they cannot be wrong. Well, if one of the tercile categories was given a 0% probability then I guess it could be wrong if that category actually happened. But probabilities are generally never given as zero, and in that case the forecast can’t be wrong because it always indicates some probability of every possible outcome. Even if you only assigned a 10% probability to above normal rainfall (and 90% to normal or below normal) and it actually ends up raining more than normal, you can easily show that you did give that outcome some chance (a 10% probability) of happening!
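As a rough sketch of how tercile categories are constructed, the following Python snippet (using an invented 30-year rainfall record, not real data) splits the record at its 33rd and 67th percentiles so that each category is, climatologically, equally likely:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical 30-year record of winter rainfall totals in mm.
winter_totals = rng.gamma(shape=8.0, scale=40.0, size=30)

# Tercile thresholds: the 33.3rd and 66.7th percentiles of the record
# split it into three equally likely categories.
lower, upper = np.percentile(winter_totals, [100 / 3, 200 / 3])

def tercile_category(total_mm):
    """Classify a season as below normal, normal, or above normal."""
    if total_mm < lower:
        return "below normal"
    if total_mm > upper:
        return "above normal"
    return "normal"

# With no other information, each category has a ~33% climatological probability.
counts = {c: 0 for c in ("below normal", "normal", "above normal")}
for total in winter_totals:
    counts[tercile_category(total)] += 1
print(counts)  # 10 seasons in each category for this record
```

The thresholds themselves depend entirely on the reference record, which is why different forecast systems (with different baseline periods) can draw their category boundaries in slightly different places.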
So if a probabilistic forecast can’t be wrong, then how do we decide if it’s skillful or not? Again, there are lots of ways, but many skill measures (see above) look at which tercile is given the highest probability, or at least weight the highest probability tercile more heavily when evaluating how well a forecast does in a particular season. This is analogous to how people actually use such forecasts. We look first at which category is given the highest probability and, in some sense, consider that the forecast. The magnitude of the probability is typically considered analogous to how “confident” the forecast is in that category happening. In reality, it’s an indication of the strength of an anomaly (deviation away from normal) produced by the forecast system and is not an indication of confidence. A forecast can produce a very strong signal of abnormal conditions but have zero skill, in which case our confidence in the forecast should be very low. We only know how confident to be in the forecast by looking at the skill measure(s).
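A minimal sketch of this style of evaluation, using invented hindcast data rather than any real forecast system: take the highest-probability tercile as “the forecast” each year and count how often it matches the category that was actually observed:

```python
import numpy as np

rng = np.random.default_rng(0)

n_years = 40

# Hypothetical hindcast: each year the forecast assigns probabilities to the
# three terciles (rows sum to 1), and one category was actually observed
# (0 = below normal, 1 = normal, 2 = above normal).
forecast_probs = rng.dirichlet(alpha=[1, 1, 1], size=n_years)
observed = rng.integers(0, 3, size=n_years)

# "Hit" when the highest-probability tercile matches the observed category.
predicted = forecast_probs.argmax(axis=1)
hit_rate = np.mean(predicted == observed)

# Compare against the ~0.33 no-skill baseline of guessing a tercile at random.
print(f"hit rate: {hit_rate:.2f}")
```

Because the invented forecasts here are unrelated to the invented observations, the hit rate hovers around the 0.33 chance baseline, which is exactly the picture a no-skill forecast system produces.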
So can we say anything about the coming winter?
Well, unfortunately, if you do dig and manage to find evaluations of the skill of the various seasonal forecasts that cover the Western Cape region, you find that invariably there is essentially no skill. You need to read the details though, because the scores are not always obvious.
Warning: technically messy section ahead
For example, a ROC score (Relative Operating Characteristics) of 0.5 is zero skill, less than 0.5 is actually negative skill (the forecast is wrong more often than it is right), and only when you edge into the > 0.7 territory do you have a forecast that you might want to bet your own money on. Other scores, like the Heidke skill score, might give you values that need to be compared with “climatological probabilities” (pure chance). So a score of 0.4 is greater than the climatological odds of 0.333 (33%) and so indicates some skill, but a score of 0.2 is less than 0.333 and so indicates very low skill. Sorry, this is all very technical… it’s a bit of a minefield really, but it’s really, really important if critical decisions are going to be made.
And so, in (almost) conclusion, I have yet to find a seasonal forecast system that can demonstrate a sufficient (a subjective threshold) level of skill that would give me enough reason to bet anything on the probabilistic (or other) predictions for the coming winter rainfall season in the Western Cape. In fact, that is the reason we stopped providing our own seasonal forecast here at CSAG: we couldn’t demonstrate sufficient skill in our own forecast system. Given the complexities of actually finding and evaluating the skill of the available forecasts, we decided that not providing a forecast we don’t feel is usable for decision making is more defensible than providing forecasts that have no skill and risking people making significant decisions based on them.
And so we have to assume that anything could happen this winter. We could have a repeat of the last 3 years, or we could be “lucky” and have a very wet winter. But at the moment we have no scientific (climate science, that is) basis for saying which possibility is more likely. So, depending on the consequences of being wrong, decision makers need to decide which possibility they are going to plan for. My view is that, with regard to water supply, we need to be planning for another dry winter to the extent that that is possible… the consequences of not doing so are much larger than the consequences of preparing for a dry winter and it turning out to be normal or wet.
So why can we say anything about climate change then?
Many readers may be wondering why we (and others) seem okay talking about a possibly drier Western Cape on climate change time scales (by the 2040s) but are so hesitant to say anything with confidence about a few months from now. It’s a good question. The full answer would be better covered in another article, but briefly, the answer is that many dynamical climate models (which are based on physical principles like conservation of momentum and energy and Boyle’s law, not statistics… again, a topic for another article) seem to show a consistent response (reduced rainfall over the Cape region) to increased greenhouse gas (GHG) concentrations when simulating multiple decades of future daily weather conditions across the globe. There are also good climatological reasons, such as intensification of high pressure systems across the southern Atlantic ocean, that point towards the same conclusion. However, year to year variations in rainfall are not driven by GHG concentrations. For the north-east of South Africa, as well as many other places in the world, year to year variations are driven by ocean temperature patterns in the Pacific and other ocean basins. And for the Cape region? That’s the challenge: climate scientists have been working for years to figure out what drives our year to year variations in rainfall. There are weak links to a number of regional and global ocean temperature drivers, and to other regional atmospheric conditions, but all are very weak and unclear. It may be that (like other places in the world) the weather systems that bring our rain are just very random and unpredictable. We continue to pursue a driver that would hold a clue to being able to predict the next season, but at the moment, for predicting winter rainfall for the next few months in this region, we just don’t have anything substantive enough.
The actual forecasts – for the brave:
The South African Weather Service Seasonal Climate Watch presents probabilities for below normal or above normal, whichever has the highest probability. I like this presentation because it leaves areas blank/white where there is no confidence (low skill). I did have the methodology explained to me once but I can’t remember it off hand right now. Regardless, the Western Cape is almost completely blank in the rainfall forecast maps for all the upcoming 3 month periods, indicating no confidence/skill.
The South African Weather Service Long Range Forecast maps are pretty small and cover the whole world so are quite hard to read for a small region, but critically, the skill measures (verification) do not seem to be available so we can move on.
The IRI forecast (USA) can be found here and indicates a slightly increased probability (40% against a climatological average of 33%) of above normal rainfall for March through May but a slightly increased probability (also 40%) of below normal for April through June. Like the SAWS Climate Watch, white areas indicate no skill. The Western Cape is not white, suggesting there is some skill. The verification (skill) maps are quite comprehensive and overwhelming, but they are also global maps, so it’s quite hard to see detail around the Western Cape. The various skill measures do seem to indicate very low skill in the region, though. It’s a subjective decision of course, but personally I wouldn’t consider the forecast message very robust.
The ECMWF seasonal forecast indicates normal conditions, but I can’t seem to find skill measures except for near-surface temperature correlation measures, so again, not very useful.
Back in South Africa, the University of Pretoria based Seasonal Forecast Worx seems to be indicating very high probabilities of below normal rainfall (up to 70% against a climatological average of 25%, because in this case below normal is the driest 25% of seasons). However, even though the maps are quite hard to read, the ROC scores for below normal are low. Above 0.5, certainly, but still quite low. To be honest, I’m not quite sure what to do with this forecast. It is based on the North American Multi-Model Ensemble (NMME) forecasts, which don’t show similar predictions, so presumably the statistical post-processing done at the University of Pretoria is introducing this strong signal. Even so, with low skill levels, I wouldn’t consider this forecast very informative for winter rainfall in the Western Cape region.
Please note that this discussion is all in the context of predicting winter rainfall in the Western Cape (the City of Cape Town water supply catchment in particular). The story of skill and usefulness of forecasts is completely different for summer rainfall in other parts of the country, especially in the north-east, where seasonal forecasts have demonstrated skill and usefulness in informing decision making.
Hi Stefaan. Thanks for the question. While the forecast is independent of the outcome, the forecast “hit rate” is dependent on the combination of the forecast and the outcome. Hence the probabilities are multiplicative. And yes, the number of ensemble members in each category is exactly how the probabilities are generated. My point is that there is a difference between confidence as determined by the number of models that head in certain direction, and “confidence” in the accuracy of the forecast (skill). All the models in an ensemble can say the same thing and yet all be consistently wrong. Communicating the meanings of these different words is probably the key challenge of seasonal forecast communication!
Willem Stefaan Conradie
Thanks for the very interesting and thorough description.
I’m confused about the coin tossing forecast probabilities, though. Surely, you’d be right 50% of the time in the long term either way? The forecast is independent of the outcome, so there should be no reason to multiply probabilities? Also, just intuitively, you should be right equally often if you toss a coin or guess dry all the time? No?
Also, you state that “[t]he magnitude of the probability is typically considered analogous to how “confident” the forecast is in that category happening. In reality, its an indication of the strength of an anomaly (deviation away form normal) produced by the forecast system and is not an indication of confidence.” Do some large ensemble forecasts not use the number of ensemble members in each category to derive their forecast probabilities?