
I continue to check my little dashboard of San Francisco COVID-19 numbers each day to decide whether swapping air with strangers for inessential activities is probably-OK or invisibly taking a silly risk. When I refreshed that tab this morning, there was a pleasant jolt: test positivity %s were way down. It wasn't just that the most-recent number was lower than kinda-recent; three weeks' worth of numbers had fallen, as if someone took a scythe to the graph:
[Two graphs. Before: "wow that purple line is high up there". After: "oh wait actually that purple line is at safe, low levels".]

When I compared an old backup data file to the latest data file, I saw that there were 100s more tests counted per day, and only 0-1 of those newly-counted tests were positive.
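That kind of old-file-versus-new-file comparison can be sketched in a few lines of Python. This is only a rough illustration, not the script actually used here: the column names (`date`, `tests`, `pos`) are assumptions rather than the real SF.gov schema, and the numbers are made up to show the shape of the effect.

```python
import csv
import io


def load_snapshot(csv_text):
    """Map each date to (tests, positives) from one snapshot of the data file."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["date"]: (int(row["tests"]), int(row["pos"])) for row in reader}


def compare_snapshots(old, new):
    """For dates present in both snapshots, report what the newer file added."""
    report = {}
    for date in sorted(old.keys() & new.keys()):
        old_tests, old_pos = old[date]
        new_tests, new_pos = new[date]
        report[date] = {
            "added_tests": new_tests - old_tests,
            "added_pos": new_pos - old_pos,
            # Positivity as a percentage; None if a day has no tests at all.
            "old_pct": 100 * old_pos / old_tests if old_tests else None,
            "new_pct": 100 * new_pos / new_tests if new_tests else None,
        }
    return report


# Made-up numbers for illustration only, not real San Francisco data.
OLD = "date,tests,pos\n2023-06-01,200,20\n2023-06-02,250,20\n"
NEW = "date,tests,pos\n2023-06-01,500,20\n2023-06-02,550,21\n"

report = compare_snapshots(load_snapshot(OLD), load_snapshot(NEW))
for date, row in report.items():
    print(date, row)
```

With a real backup you would read the two files from disk instead of from strings. Lots of added tests with almost no added positives is the pattern that drags the whole positivity line down.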

Wondering what happened, I poked around the SF.gov data pages and bumped into a notice:

On 6/21/2023, we will be including additional COVID-19 testing records as part of our ongoing data cleaning efforts. Along with this release, we will also provide an update on the race or ethnicity information for the testing data. This update primarily involves records where race or ethnicity has been updated from Unknown or Other to a known category. Please note that implementing these changes may result in some fluctuations in the historical data.

So… eyeballing that graph suggests that about ⅓ of the June records needed data-cleaning. My hat's off to the data scientist who got that job. (I wonder why negative results were so much more likely to need data-cleaning than positive results. Maybe if someone gets a positive test result, they're more likely to stay in the testing office to interpret their handwritten scrawls on some form?)

(This wasn't even the only data-fluctuation incident this month. The California open data portal's COVID-detected-in-wastewater data file turned into a COVID-and-mpox-and-flu-and-others-detected-in-wastewater data file. Pulling the COVID data out is pretty simple, if you know what's going on. But there wasn't any change announcement or anything. So for a weekend I was scratching my head, wondering why my wastewater "COVID" data looked so weird, not realizing that my code was blithely averaging together COVID numbers and flu numbers and etc. numbers to get… uhm, something that looked pretty random.)
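For the curious, the wastewater mix-up and its fix look roughly like this in Python. It's a sketch under assumptions: the column names (`sample_date`, `pathogen`, `concentration`) and the pathogen labels are hypothetical stand-ins, not the California portal's actual schema, and the rows are invented.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def mean_by_date(rows, target=None):
    """Average concentration per date; if target is given, keep only that pathogen."""
    by_date = defaultdict(list)
    for row in rows:
        if target is not None and row["pathogen"] != target:
            continue
        by_date[row["sample_date"]].append(float(row["concentration"]))
    return {date: mean(values) for date, values in by_date.items()}


# Invented rows for illustration; column names and labels are hypothetical.
SAMPLE = """sample_date,pathogen,concentration
2023-06-01,covid,100
2023-06-01,flu,10
2023-06-01,mpox,1
2023-06-02,covid,120
2023-06-02,flu,30
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Cheap guard: complain if the file contains pathogens the code did not expect.
unexpected = {row["pathogen"] for row in rows} - {"covid"}
if unexpected:
    print("warning: file now also contains:", sorted(unexpected))

mixed = mean_by_date(rows)                       # the accidental everything-average
covid_only = mean_by_date(rows, target="covid")  # what was actually wanted
print(mixed)
print(covid_only)
```

The guard is the part worth keeping: a check on which categories show up in the file turns a silent schema change into a loud one.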

Anyhow, I'm glad I wasn't looking at just test positivity %s (or just sewer data) when deciding whether to go places. If you're looking at just one stream of data, then when it changes you're thinking "Did we cure all the COVIDs but nobody told me?" But when you notice that the other data streams haven't moved, you know to double-check your data sources.
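That cross-check can be made mechanical. Here's a minimal sketch, assuming each stream is summarized as a (before, after) pair with a nonzero "before"; the stream names, the numbers, and the 30% threshold are all hypothetical choices for illustration.

```python
def relative_change(before, after):
    """Fractional change from before to after (assumes before is nonzero)."""
    return (after - before) / before


def lone_movers(streams, threshold=0.3):
    """Return the one stream that moved more than threshold while the rest stayed put.

    Returns an empty list if no stream, or more than one stream, moved that much.
    """
    changes = {name: relative_change(b, a) for name, (b, a) in streams.items()}
    big = [name for name, change in changes.items() if abs(change) > threshold]
    return big if len(big) == 1 else []


# Hypothetical (before, after) summaries, not real data.
streams = {
    "test_positivity_pct": (9.0, 4.0),
    "wastewater": (100.0, 95.0),
    "hospitalizations": (40.0, 42.0),
}

suspects = lone_movers(streams)
print("double-check the source for:", suspects)
```

A stream that jumps all by itself is more likely a data-source change than a change in the world.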

Tags: programming

lahosken@gmail.com
