This is the story about one of those bugs that show their ugly faces at the most inappropriate of times, seem impossible to pin down, and turn out to be very easy to fix once you get to the root of the problem. Usually after several hours of banging your head on the table, cursing at the computer, and doubting your own sanity.
You all know these bugs, right? After spending the first one or two hours digging helplessly through code I usually say „I know it’s something stupid.“ Because I do. Whenever you run into one of these obscure bugs it’s usually one or two lines of code with a stupid, stupid mistake. These bugs are usually easy to fix. But they’re hard to find. And they will make you feel like an idiot. Always.
Anyway, as for the most inappropriate of times, how does „two hours before the sprint demo“ sound? Like an amazingly inappropriate time for a bug to be found? You betcha. So last Friday it was two hours before the sprint demo and we were just trying to check all our features for one last time, when saving a new password didn’t work anymore.
As it happened I had stumbled upon the same bug the night before. However, after a couple of rounds of re-fetching from source control, re-compiling and checking the application logs, the bug just went away. It was gone. Unreproducible. There was nothing else for me to do than blaming it on the arbitrary ways that Perforce works at times. I obviously had been working with old versions and updating the files and rebuilding the solution fixed it. Case closed.
Until the next morning when the bug was there again. Nobody could explain it. Our tester had been running UI and functional tests for days in a row. We all had been using the feature from time to time and nobody had ever come across this bug. So another developer and me went about to try to fix this ugly bug as fast as possible in time for the sprint demo. In retrospective we should have thrown in the towels and asked to postpone the demo right away. But we thought it might work.
From the logs it seemed like a trigger on the database prevented saving a new password on the base that the expiry date for a password was smaller than its create date. So we pinned it down to some DateTime issue. We tweaked the code to make sure that the create date of the new password would always be greater than the old one’s expiry date. This didn’t exactly fit the trigger’s complaint, but it was the closest we came to finding an angle to start with.
Then my colleague stumbled upon another strange behavior that made it look like the new password was already associated with the login although it hadn’t yet been persisted to the database. So we tweaked the code a little bit more to make sure that we wouldn’t accidentally expire the new password.
Since the bug occurred on two forms we made sure that the changes were implemented for both of them, compiled the code and… voilà… the bug, it was gone. Still not overly confident, but at least glad that the problem seemed to have been fixed in time for the demo, we started a new build, packed up and headed to the conference room.
You want to guess what happened during the sprint demo? Naturally, the bug, it reappeared! It was back! Naturally the feature wasn’t quite accepted and our Scrum team, well… let’s say, we weren’t exactly thrilled.
Since the rest of the demo went fairly well, it was decided that the sprint would be dragged out for a few more days to give us time to fix the bug and a couple of small things. While we were talking about how to best proceed I tried to reproduce the bug on my local installation. You want to guess again? I couldn’t reproduce it. Now the local installation had exactly the same files than the build we used during the demo. We had witnessed the bug just about half and hour ago. Now it was gone again.
We spent the afternoon hunting down what might cause a bug that re- and disappears, going down various paths until we reached a point when we consulted the database guy for some NHibernate anomalies my colleague found. Then the moment of revelation, facepalms, hysterical laughter and relief came.
Whenever we set a new password we set its create date to the current time. However depending whether we were dealing with a completely new login or just a password change for an existing password the code executed was slightly different.
Now lacking an implementation to delete logins we were mostly working with existing logins. Deleting a login required logging on to the database and deleting it manually using SQL. Not a big deal, but one that interrupted the flow and was therefore avoided as long as it wasn’t absolutely necessary.
While we were working on fixing the bug we had concentrated on the part where a password is saved for an existing login. The bigger picture revealed what the real issue was. While we were using
DateTime.Now when setting the password for a new login, we used
DateTime.Now.ToUniversalTime() when changing the password for an existing login.
With our server settings,
DateTime.Now was one hour later than
DateTime.Now.ToUniversalTime(). Since we started our demo by proudly presenting that you could create a new login that meant that for exactly one hour we were unable to change the password, due to the simple fact that the database wouldn’t allow saving a password when its expiry date was earlier than its create date.
This also explained why the bug had just gone away so many times before. After one hour of time difference between the server’s time and universal time, the expiry date would be later than the create date and the database was happy. This was a bug that occurred for exactly one hour before retreating and not showing up until the next time you created a new login and tried to change its password within the same hour.
Now, as I said, the answer, once found, was amazingly easy and clear. However the path that led to the answer was pretty much paved with confusion. It also taught us the lesson to be very, very, very careful when dealing with timestamps. Because these things, they are small and look harmless, but they can make you go temporarily insane just the same.