Over the years, I’ve had a number of occasions to need to store information about when somebody worked on a certain job. Sometimes people will want to store the date the work was done and the number of hours the employee worked, where as other people will need to store the start time and the stop time (in other words, when the employee "punched in," and when the employee "punched out"). Often, the same customer will need to use both methods of recording time, depending upon the nature of the work recorded.
A first try
So here is a naïve attempt at creating metadata for this sort of data. I’m going to demonstrate this as a database table, but the issues involved are nearly identical when you create similar structures in code.
CREATE TABLE TimeRecords ( Id INTEGER NOT NULL PRIMARY KEY, PersonId INTEGER [...], WorkDate DATE, StartTime TIME, StopTime TIME, TotalHours NUMERIC(18, 4));
This might look OK at first glance, but it turns out to be fairly badly broken. Before I discuss why, I should clarify the data types involved.
What isn’t broken
One thing you might notice when looking at this metadata is that I’ve made most of the columns nullable. In my experience, it is very important to allow users to save work in progress, and to only do validations (such as requiring that the Person is filled in) when you actually try to do something important that the time record, such as submit it for payroll. For example, an employee could begin a time record when they "punch in" at the beginning of a shift, and finish the time record when they "punch out" at the end of a shift.
DATE and TIME in this DDL refer to the SQL Standard data types which contain, respectively, a date and a time only. The SQL Standard data type for a date and time in a single column is TIMESTAMP. Not all databases (or programming languages/environments) support such types. In many cases, you will have to do with only TIMESTAMP, or its equivalent, such as TDateTime in Delphi.
Note that TotalHours is NUMERIC, not TIME. It is important not to confuse a time span with a time of day; they are different units. In .NET, there is a TimeSpan type which is appropriate for this.
Problems with the naïve approach
Presuming that the user has filled in all of the required information for a time record, what would we expect to see in a typical row?
If the user is recording only the date and the number of hours worked, it seems pretty obvious: both the WorkDate and the TotalHours fields will be non-NULL, and the StartTime and StopTime columns will be NULL.
What if the user needs to record the start and stop times? Well, the WorkDate, StartTime and StopTime fields almost be non-NULL. But what about TotalHours? We could fill that in, but wouldn’t it be redundant with the information in StartTime and StopTime? Or, worse, could we even be sure that all three fields are kept "in sync?" So we might be tempted to leave that NULL, but then, what if we want to do a SUM in SQL? By filling in TotalHours, we can compute an employee’s total work time, even if some of the records are filled in with start times and stop times, but others were filled in with only the total hours worked.
|0||100||2009-01-29||1:00 PM||1:30 PM||0.5|
To compute the total hours on a record that has start and stop times, we might be tempted to subtract the start time from the stop time. But that doesn’t work. An employee could work a shift from 8 PM to 2 AM, yielding a negative result for this calculation.
We could make a rule that we add a day to the stop time before subtracting if it is less than the start time, but this presumes that the employee’s shift was less than 24 hours. Greater than 24 hour shifts may be uncommon, but they do happen; I once worked a 31 hour shift, fresh out of college.
What this means is that recording only the start and stop times without the start and stop dates does not give us sufficient information to compute the total hours worked in all cases, and makes it more difficult in many cases.
Some useful redundancy
Here’s another way to store this data. This is what I generally do. But I’ll forgive you if you’re skeptical when you see it. I’ll discuss some alternatives later.
CREATE TABLE TimeRecords ( Id INTEGER NOT NULL PRIMARY KEY, PersonId INTEGER [...], WorkDate DATE, StartTime TIMESTAMP, StopTime
, TotalHours NUMERIC(18, 4));
First, it is no longer a problem to compute TotalHours from the StopTime and StartTime, if they’re filled in. StopTime minus StartTime will always equal the correct total hours worked, when they are non-NULL. Also, it is no longer a problem to record a greater than 24 hour shift.
But isn’t there some redundancy here? WorkDate is always going to equal the date portion of StartTime, when StartTime is non-NULL. Somebody once suggested to me that it would be possible to just use the date portion of the StartTime TIMESTAMP for a "TotalHours-only"time record (and drop the WorkDate column altogether). The problem with this idea is that a TIMESTAMP always contains time information, whether or not you bother to fill in. If you assign only a date, then the time portion is presumed to be midnight. So there is no way to distinguish between a record where the user intended to fill in only total hours and one where the user intentionally filled in midnight as the StartTime but forgot to fill in the StopTime. That will be a problem when you validate the record prior to submitting it for payroll.
I found that the approach above works well for me, but those who find the duplication of one date completely untenable might be tempted to do something along the lines of this metadata:
CREATE TABLE TimeRecords ( Id INTEGER NOT NULL PRIMARY KEY, PersonId INTEGER [...], WorkDate DATE, StartTime TIME, StopTime
, TotalHours NUMERIC(18, 4));
There’s no redundancy in this version. You can store both time records with start times and stop times or time records with just elapsed times, and it’s easy to distinguish the two. Unfortunately, certain queries become more difficult. With the version which uses TIMESTAMP for the StartTime column, I can write a query like this to determine who was in the building at a particular point in time:
SELECT PersonId FROM TimeRecords WHERE :SomeParameter BETWEEN StartTime and StopTime
Records without start and stop times will be ignored by this query, since they do not contain enough information to answer the question. With a TIME instead of a TIMESTAMP data type for the StartTime column, I would have to use an expression in the where clause to write this query:
SELECT PersonId FROM TimeRecords WHERE :SomeParameter BETWEEN (
StartTime) and StopTime
That might well mean that my query can no longer be indexed.
Although my preferred schema has a small bit of redundancy, I find it significantly more usable.It is considerably less work to handle copying the date between the WorkDate and StartTime fields, and computing the TotalHours from the StartTime and StopTime, than it is to not have the date in either column.And not having dates in the StartTime and StopTime columns at all is just unusable.