1) The nagging feeling of something not being right
The first major mistake occurred during one of the first projects that I did from scratch after university. I was working for a small software house that specialised (at the time) in small MS Access apps. When I say small, I’m talking 5-10 users or single user desktop apps. It’s the kind of thing that Access is quite good at.
I was writing a budgetting app for a large insurance company. They wanted to be able to send something out to all of the branch managers, all of the regional managers and all of the provincial managers, so that they could capture their budget info for the next year. The main restriction that I had was that the app must not need to connect to the head office. The managers should be able to copy the app onto laptops and work from home with no internet/WAN access of any form (this was 1999). In addition, the various branch/regional/provincial offices must only have the budget data for themselves and any subsidiary offices.
So I created a second mdb file to be a template data file. It had only the budget line item data (stationary, rent, furniture, etc), copied from the main data file, but none of the offices’ actual budget values. Then, I wrote some rather complex DAO/VBA code that would create one mdb file per office by making a copy of the tempate database (because DAO doesn’t allow a file that’s in use to be copied), insert that office’s data into it, and then place that data file in a specific location so that it could be copied down to the branches.
I tested and I tested and the procedure worked perfectly. Finally the insurance company’s IT people called and said they’d finished making the required changes to the budget line items and percentages for the year and could I go in and babysit the process, since it was the first time. No problem! I went in, ran the process, checked that all the files were there (they were) and then went home. (Anyone see the mistake yet?)
All the way home I had a strange feeling that something was wrong, but I couldn’t place it.
Two days later the calls started. Branch managers complaining that their databases were missing certain line items or had incorrect percentages. I checked over the code but I couldn’t find a thing wrong. There was no way that the code could have caused those problems. I wrote up patches that would correct the data and sent those out. Still couldn’t find the cause. Then, on the third day of calls, the manager from the Botswana office called. He dabbled in IT in his spare time and he knew a fair bit about Access. While we were chatting he mentioned that a none of the changes that head office had told him had been done were in the database he had.
At that point, the other shoe dropped. Head office had made changes to the line item details in their master data file. I hadn’t transfered those changes to the template database, so when I created all of the branch databases, they had the old version of the line items.
Later that day I got to explain to 178 branch managers across three countries why they were going to have to completely redo any budgets that they’d done. To say I wasn’t popular is a rather large understatement
2) Test all assumptions
The second mistake was a fair bit more recent (3 or so years ago) and does involve SQL Server.
I was rewriting the core stored procs for a rather large banking system. The number of users and data were growing and the stored proc that was called for all data access was slow, (Actually, slow’s a compliment) and was very convoluted. Tables were being queried multiple times for the same data in different places and if statements were nested so deep it was impossible to figure out what was actually going to run. The procedure was like a tower built on sand. Very shaky and likely to fall over if touched.
So, for performance reasons, and so that we could make planned enhancements, I rewrote the proc, from scratch. Because of the critical nature of this proc (core proc for a system that ran the investment and corporate divisions of the bank) I was exceedingly careful and I tested and checked all assumptions, except one.
During the course of the rewrite/optimisation I ran across a temp table with a column called FilterGuid, type varchar(50). From where the table was used, I could tell that it contained values of primary keys for various tables, for use in IN statements. The system standards required that every table have a uniqueidentifier for the primary key. The varchar type was causing performance problems due to the implicit conversions and resultant index scans, and so I changed the data type from varchar(50) to uniqueidentifier.
Two weeks of extensive testing found no problems, so the rewritten proc, along with a bunch of subprocedures it called, went into production on a thursday evening.
Next morning within 5 min of me getting to my desk, there were several people there. A small portion of the system that controlled the interbank payments was not working, some payments were already almost two hours late, and the error was “Error converting varchar to uniqueidentifier” (For anyone not familiar with banking, there are strict time limits on the interbank payments and harsh fines if those time limits are exceeded)
It took very little investigation to reveal that the payments portion of the system was breaking the application standards. While the devs for that section claimed that they used guids, they actually used strings that looked like guids, if you didn’t look too closely that it. An example would be “18e21k37-2966-4d75-8jf6-1g08b828a143″.
It took about 2 hours to find, fix, test and deploy a fix. I’m just glad the bank didn’t take the penalties out of my salary, or I wouldn’t have been paid a cent for the rest of the time I worked there.
Moral of that story – test assumptions, don’t trust standards, and don’t believe developers when they say that they follow the standards.
Passing the baton on