One of the things we looked at last time was IO.vala, a strange little pile of text processing. One thing I was curious about was what
get_month was actually being used for… As it turns out, we’re using it for timestamp parsing in our RSS and Atom parsers:
string input = dat.value; string date_strs = input.split (" "); if (date_strs.length < 5) break; string time_strs = date_strs.split (":"); if (time_strs.length < 3) break; new_item.time_published = new DateTime.utc ( int.parse (date_strs), get_month (date_strs), int.parse (date_strs), int.parse (time_strs), int.parse (time_strs), int.parse (time_strs));
This code sample led me on a bit of a mental journey:
- Ok, this really ought to be a shared parsing function instead of two identical code snippets.
GLib.DateTimecomes with a parsing function, why is this manual?
- Also, doesn’t
get_monthoperate on the english representation of the month? Does this work? Are people injecting english words into their timestamps?
I had to know the truth, so I did a little search through my personal Singularity installation’s database, and yes they do! But only in RSS. As it turns out, RSS and Atom use different formats for their timestamps, RFC 822 and RFC 3339 respectively. The former represents the month in English, but the latter uses a much more sane (for computers) format that
GLib.DateTime should be able to parse on its own. In other words, Singularity has never correctly parsed a timestamp in an Atom feed.
The next obvious step was to write a simple test. I took the Atom feed for this website and put a stripped-down copy in a string for my parser to read. The results:
assertion failed: (result.generator == "Jekyll") assertion failed: (result.link == "https://www.huguesross.net/feed.xml") assertion failed: (result.site_link == "https://www.huguesross.net/") assertion failed: (item.time_published == new DateTime.utc (2021, 12, 30, 0, 0, 0)) assertion failed: (item.time_updated == new DateTime.utc (2021, 12, 30, 0, 0, 0))
I don’t think it’s particularly controversial to say that a publicly-facing parser should have automated tests. And this is why! Timestamp parsing alone has been broken for over half a decade, and despite using Singularity daily I never noticed. I then set up a similar test for RSS. It’s slightly better off, but there are valid date formats that it can’t handle and some fields are missing altogether.
Among these missing fields, one interesting tidbit is the ‘last update’ field that both RSS and Atom feeds provide. I have a
last_update property on my
Feed object already but it stores the last time we checked for updates, not the last time the feed updated. Both of these are useful, so ideally I’d add another column to my database for it… however, this means updating the database schema. Given the current state of Singularity’s database queries, I don’t want to touch that yet. For now I’ve fixed some of the other immediate issues like parsing existing fields that weren’t being read. The rest will have to wait, but at least we now have dates displaying on items from Atom feeds. I also killed
get_month while I was at it, writing a more compliant RFC 822 date parser.
When I do return though, I think I’ll be putting in some additional tests for our parsers to jump through. I’m sure there are other issues lurking in that code, and I’d like to suss them out before they become serious.
Some Final Thoughts About Testing
I’ve had a long love-hate relationship with automated testing, historically leaning more on the ‘hate’ side of the fence.
Automated tests are excellent for specific domains, the example above was a great one because we had a description of How Things Should Work that we could verify, and the cost of not testing was also high. With arbitrary feeds pulled from the internet, we can mis-parse useful data or even fall prey to malicious inputs. Tests can also be a useful way to ensure that refactored systems mirror the old systems’ functionality. So what is there to dislike?
But so often, when testing is taught or pushed as a solution the examples are useless–if you look at this set of tests from Singularity’s codebase, you can see what I mean. Most of these tests are really just verifying that yes, if I pass a value in the value gets used. There are much easier ways to verify that sort of thing, and you can imagine how long it took to write all of these tests for 3 lines of code. Then you consider how many bugs come from developers making the wrong assumptions, and realize that those same developers are the ones writing the tests… well, the problem is apparent.
Examples like this keep coming up without addressing the obvious problems and it sucks. I see so people argue for 100% test coverage of codebases, or pushing practices like Test-Driven Development without considering the cost or benefits that any of these tests will bring. There’s a balance to be struck, and I think my aversion to tests comes from seeing so people way on the other side of that balance. It’s something I need to improve on myself though, not doing something at all because of bad arguments rather than actually evaluating my needs is childish, and it’s not the sort of behavior I want to foster.