Merging Overlapping Date Ranges

R

Robert Morley

Hi all,

Can I get somebody to verify this code for me...I *think* it works as is,
but I'm I'm tired and having a bad-programming day, so I'm not 100% sure.
What the code is *supposed* to do (and some of you may recognize this from
about a year ago when I was doing much the same thing) is to go through a
list of contacts for an account, each of which is active from StartDate to
EndDate, and merge any overlapping date ranges. If the date ranges don't
overlap, then they cannot be merged.

Basic concepts:
* EndDate is NULL if the contact is currently active (and should
therefore be treated as an infinitely high date).
* Any change in AccountID, TypeID, or ContactID constitutes a new
subset, and you cannot merge dates across subsets.
* Within each subset, the recordset is forcibly ordered by StartDate to
make the logic simpler.

Anyway, here's the code...if any of the above is unclear, please feel free
to ask me. And of course, if you find any bugs, please let me know! I
realize some of the code is less-than-optimal with all the EndDate handling,
but that was the least of my concerns at the time...once it works, I'll
worry about clarity and speed. :)

Oh, and the code as-posted is in "test mode" and won't actually change the
recordset, but if you're even thinking of having a look at this for me, I'm
sure you're smart enough to figure that out! :)

Thanks all!


Public Function MergeDates()
Const cMaxDate As Date = #6/6/2079# 'Max Date for SQL Server
smalldatetime
Dim rs As ADODB.Recordset
Dim strAccountID As String
Dim lngTypeID As Long
Dim lngContactID As Long
Dim dtMin As Date
Dim dtMax As Date
Dim lngDeletedTot As Long
Dim blnChanged As Boolean
Dim bm As Variant
Dim bm2 As Variant

lngDeletedTot = 0
Set rs = New ADODB.Recordset
With rs
.CursorLocation = adUseClient
.Open "SELECT * FROM acctTeam ORDER BY AccountID, TypeID, ContactID,
StartDate, COALESCE(EndDate, '2079-Jun-06') DESC"), DataProject.Connection,
adOpenStatic, adLockOptimistic, adCmdText
If Not .EOF Then
strAccountID = !AccountID.Value
lngTypeID = !TypeID.Value
lngContactID = !ContactID.Value
dtMin = !StartDate.Value
dtMax = Nz(!EndDate.Value, cMaxDate)
bm = .Bookmark
blnChanged = False
.MoveNext
End If

While Not .EOF
If (strAccountID = !AccountID.Value) And (lngTypeID =
!TypeID.Value) And (lngContactID = !ContactID.Value) And (!StartDate.Value -
1 <= dtMax) Then
If Nz(!EndDate.Value, cMaxDate) >= dtMax Then dtMax =
Nz(!EndDate.Value, cMaxDate)
blnChanged = True
Debug.Print "Would have deleted " & !UniqueID.Value,
!AccountID, !TypeID, !ContactID, !StartDate.Value, !EndDate.Value
' .Delete
lngDeletedTot = lngDeletedTot + 1
Else
If blnChanged Then
bm2 = .Bookmark 'Performs correctly in test scenario,
when delete has not actually occurred.
.Bookmark = bm
If (!StartDate.Value <> dtMin) Or (Nz(!EndDate.Value,
cMaxDate) <> dtMax) Then
Debug.Print "Would have updated " & !UniqueID.Value,
!AccountID.Value, !TypeID.Value, !ContactID.Value, !StartDate.Value,
!EndDate.Value; " to "; strAccountID, lngTypeID, lngContactID, dtMin,
IIf(dtMax = cMaxDate, Null, dtMax)
!StartDate.Value = dtMin
!EndDate.Value = IIf(dtMax = cMaxDate, Null, dtMax)
End If

.CancelUpdate
' .Update

If bm2 = Empty Then
.MoveLast
Else
.Bookmark = bm2
End If
blnChanged = False
End If
strAccountID = !AccountID.Value
lngTypeID = !TypeID.Value
lngContactID = !ContactID.Value
dtMin = !StartDate.Value
dtMax = Nz(!EndDate.Value, cMaxDate)
bm = .Bookmark
End If
.MoveNext
Wend
.Close
End With
Set rs = Nothing

MsgBox "Removed " & lngDeletedTot & " redundant records."
End Function
 
S

Sylvain Lafontaine

You should enclose this in a transaction because if records are deleted but
the update for the StartDate and EndDate cannot be made for one reason or
another, then you will lose information. Also, maybe using UpdateBatch
instead of Update will accelerate a little the performance.

There is a bug if the recordset finish by one or more records to delete, as
the required changes for the dates won't be stored: you must repeat the test
for blnChanged after the While loop.

Also, the tests for (!StartDate.Value <> dtMin) and (bm2 = Empty) seem
useless here and I don't understand why you take the precaution of sorting
EndDate in descending order but that's not a big deal.
 
R

Robert Morley

You should enclose this in a transaction because if records are deleted
but the update for the StartDate and EndDate cannot be made for one reason
or another, then you will lose information. Also, maybe using UpdateBatch
instead of Update will accelerate a little the performance.

Yeah, I'd considered that, but wanted to get the logic working before
worrying about transactions, which as I understand it, are a little
unpredictable in ADO (or is it only Access that's the problem?). Anyway,
I'll look at that on Monday, though it's probably not a HUGE concern.
There is a bug if the recordset finish by one or more records to delete,
as the required changes for the dates won't be stored: you must repeat the
test for blnChanged after the While loop.

Thanks, I'll look into that.
Also, the tests for (!StartDate.Value <> dtMin) and (bm2 = Empty) seem
useless here and I don't understand why you take the precaution of sorting
EndDate in descending order but that's not a big deal.

You're right; those are left over from previous versions of the code that
didn't work out. Told you I was having a bad day! Thank the gods (or in
this case, I guess "God" would be more appropriate) that it's a four-day
weekend and I can get some sleep! :)

Je te remerci, Sylvain!


Rob
 
R

Robert Morley

Apparently I was still tired when I responded, cuz I missed a few of your
points. :)
Also, maybe using UpdateBatch instead of Update will accelerate a little
the performance.

While performance is acceptable with the current single Updates, batch
updates were definitely on my list of things to switch to once the logic was
finalized.
I don't understand why you take the precaution of sorting EndDate in
descending order but that's not a big deal.

Not a biggie, but I figured it would improve performance and avoid
unnecessary updating in the following case (which happens surprisingly
frequently!):

Record 1: Start Date = 2005-Jan-01 End Date = <NULL>
Record 2: Start Date = 2005-Jan-01 End Date = 2005-Dec-31

This way, Record 1 will remain unchanged, since it has the broadest range of
the two, and Record 2 will be deleted. Had they been sorted the other way
around, there would have been both a delete and an update.



Rob
 
R

Robert Morley

Okay, here's the final "logic" version of the code. Does anybody (probably
meaning Sylvain <g>) see any further problems with it? I think you'll
agree, it's a little better looking that it was Thursday afternoon. :)

Also, I've got a question: if I change this to use UpdateBatch, is there
any need to deal with transactions, or is the entire UpdateBatch wrapped in
an internal transaction anyway? I looked in the documentation, and it
doesn't really specify whether it is or not.

Public Function MergeDates()
Const cMaxDate As Date = #6/6/2079# 'Max date for SQL Server
smalldatetime
Dim rs As ADODB.Recordset
Dim strAccountID As String
Dim lngTypeID As Long
Dim lngContactID As Long
Dim dtEnd As Date
Dim dtMax As Date
Dim lngDeletedTot As Long
Dim bm As Variant
Dim bm2 As Variant

lngDeletedTot = 0
Set rs = New ADODB.Recordset
With rs
.CursorLocation = adUseClient
.Open "SELECT * FROM acctTeam ORDER BY AccountID, TypeID, ContactID,
StartDate, COALESCE(EndDate, '2079-Jun-06') DESC", DataProject.Connection,
adOpenStatic, adLockOptimistic, adCmdText
If Not .EOF Then
strAccountID = !AccountID.Value
lngTypeID = !TypeID.Value
lngContactID = !ContactID.Value
dtEnd = Nz(!EndDate.Value, cMaxDate)
dtMax = dtEnd
bm = .Bookmark
.MoveNext
End If

While Not .EOF
If (strAccountID = !AccountID.Value) And (lngTypeID =
!TypeID.Value) And (lngContactID = !ContactID.Value) And (!StartDate.Value -
1 <= dtMax) Then
If Nz(!EndDate.Value, cMaxDate) >= dtMax Then dtMax =
Nz(!EndDate.Value, cMaxDate)
Debug.Print "Would have deleted " & !UniqueID.Value; " ";
!AccountID; " "; !TypeID; " "; !ContactID; " "; !StartDate.Value; " ";
!EndDate.Value
' .Delete
lngDeletedTot = lngDeletedTot + 1
Else
If dtEnd <> dtMax Then
bm2 = .Bookmark 'Performs correctly in test scenario,
when delete has not actually occurred.
.Bookmark = bm
Debug.Print "Would have updated " & !UniqueID.Value; "
"; !AccountID.Value; " "; !TypeID.Value; " "; !ContactID.Value; " ";
!StartDate.Value; " "; !EndDate.Value; "to "; IIf(dtMax = cMaxDate,
"<Null>", dtMax)
' !EndDate.Value = IIf(dtMax = cMaxDate, Null, dtMax)
' .Update
.Bookmark = bm2
End If
strAccountID = !AccountID.Value
lngTypeID = !TypeID.Value
lngContactID = !ContactID.Value
dtEnd = Nz(!EndDate.Value, cMaxDate)
dtMax = dtEnd
bm = .Bookmark
End If
.MoveNext
Wend
If dtEnd <> dtMax Then
.Bookmark = bm
Debug.Print "Would have updated " & !UniqueID.Value; " ";
!AccountID.Value; " "; !TypeID.Value; " "; !ContactID.Value; " ";
!StartDate.Value; " "; !EndDate.Value; "to "; IIf(dtMax = cMaxDate,
"<Null>", dtMax)
' !EndDate.Value = IIf(dtMax = cMaxDate, Null, dtMax)
' .Update
End If
.Close
End With
Set rs = Nothing

MsgBox "Removed " & lngDeletedTot & " redundant records."
End Function



Rob
 
S

Sylvain Lafontaine

Your code looks good and yes, even if you are using the update UpdateBatch
method, you have to deal with transactions. Transactions are associated
with the Connection object and have nothing to do with recordsets.
 
Top