Macro Optimization - 25,000+ Rows

Spy128Bit

I'm currently trying to optimize this macro for use with more than
30,000 rows. I thought jumping from one ID to the next would help,
but it still takes a considerable amount of time to run. Any thoughts
or tips are greatly appreciated. I have included a data sample below
to show how the data is currently laid out.

Sub Logic_Beta()
    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual
    Application.EnableEvents = False
    Dim i As Long
    Lastrow = Cells(Rows.Count, "A").End(xlUp).Row
    Start = ""
    Finish = ""
    Columns("K").ClearContents
    For i = 2 To Lastrow
        Start = Range("F" & i)
        Finish = Range("G" & i)
        ID = Range("B" & i)
        Z = 0
        Range("B1").Activate
        For j = 2 To Lastrow
            If Range("B" & j) = ID Then
                If j > 2 Then
                    Cells.Find(What:=ID, After:=ActiveCell, LookIn:=xlValues, _
                        LookAt:=xlPart, SearchOrder:=xlByColumns, _
                        SearchDirection:=xlNext, MatchCase:=False).Activate
                End If
                If Range("F" & j) < Start And Range("F" & j) < Finish And _
                   Range("G" & j) < Finish And Range("G" & j) > Start Then
                    Z = Z + ((Range("G" & j) - Start) * 1440)
                    GoTo NextLine
                End If
                If Range("F" & j) > Start And Range("F" & j) < Finish And _
                   Range("G" & j) > Finish And Range("G" & j) > Start Then
                    Z = Z + ((Finish - Range("F" & j)) * 1440)
                    GoTo NextLine
                End If
                If Range("F" & j) < Start And Range("F" & j) < Finish And _
                   Range("G" & j) > Finish And Range("G" & j) > Start Then
                    Z = Z + ((Finish - Start) * 1440)
                    GoTo NextLine
                End If
                If Range("F" & j) > Start And Range("F" & j) < Finish And _
                   Range("G" & j) < Finish And Range("G" & j) > Start Then
                    Z = Z + ((Range("G" & j) - Range("F" & j)) * 1440)
                    GoTo NextLine
                End If
                If Range("F" & j) = Start And Range("F" & j) < Finish And _
                   Range("G" & j) = Finish And Range("G" & j) > Start Then
                    Z = Z + ((Finish - Start) * 1440)
                    GoTo NextLine
                End If
NextLine:
            End If
        Next j
        Range("K" & i) = Z
    Next i
    Application.EnableEvents = True
    Application.Calculation = xlCalculationAutomatic
    Application.ScreenUpdating = True
End Sub

(In case the data isn't aligned properly.)
Columns:
A - Number
B - Who
C - RCID - Blank
D - TID - Blank
E - WID - Blank
F - Start Time
G - End Time

Number  Who   RCID  TID  WID  Start Time              End Time
A1 John 04/03/07 10:19:52 AM 04/03/07 10:22:12 AM
B2 John 04/03/07 10:26:15 AM 04/03/07 10:29:47 AM
C3 John 04/03/07 10:38:25 AM 04/03/07 10:51:37 AM
A2 John 04/03/07 10:52:20 AM 04/03/07 11:26:57 AM
B3 John 04/03/07 11:29:26 AM 04/03/07 11:38:11 AM
C4 John 04/03/07 11:55:36 AM 04/03/07 12:00:21 PM
A3 John 04/03/07 12:03:00 PM 04/03/07 12:05:28 PM
B4 John 04/03/07 12:06:22 PM 04/03/07 12:16:13 PM
C5 John 04/03/07 12:23:16 PM 04/03/07 12:35:03 PM
A4 John 04/03/07 12:50:34 PM 04/03/07 01:17:20 PM
B5 John 04/03/07 01:00:40 PM 04/03/07 01:05:59 PM
C6 John 04/03/07 01:13:57 PM 04/03/07 01:30:13 PM
A5 John 04/03/07 01:37:47 PM 04/03/07 01:43:08 PM
B6 John 04/03/07 01:47:22 PM 04/03/07 01:51:39 PM
C7 John 04/03/07 01:54:28 PM 04/03/07 02:08:48 PM
 
Spy128Bit

Changed to the recommended here.
use
if something1 then
elseif something2 then
end if

Took this out.
you have
Range("G" & j) > Start
in each test, it is redundant, make just one

Changed this as well.
you have
Range("F" & j) and Range("G" & j)
calculated in each test, make them fixed at the start of the loop, like
rj=Range("F" & j)
gj=Range("G" & j)

what is the purpose of
If j > 2 Then
Cells.Find(What:=ID, After:=ActiveCell, LookIn:=xlValues, LookAt _
:=xlPart, SearchOrder:=xlByColumns, SearchDirection:=xlNext,

The intended purpose of this was to have it jump to the next row
instead of cycling down one by one. The sample data is all for one
person, but the actual data may have 1,000 rows between rows with the
same ID. I had it reset to B1 each time the row changed to ensure all
the IDs were checked starting from the beginning. Sorry if it's a bit
of a mess. I'm learning as I go, have learned a great deal from the
samples posted by everyone, and am trying to piece this one together
myself.
 
sali

a few things:

instead of
if something1 then
goto nextline
end if
if something2 then
goto nextline
end if

use
if something1 then
elseif something2 then
end if

you have
Range("G" & j) > Start
in each test, it is redundant, make just one

you have
Range("F" & j) and Range("G" & j)
calculated in each test, make them fixed at the start of the loop, like
rj=Range("F" & j)
gj=Range("G" & j)
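
putting the two together, the inner test could look something like this
[just an untested sketch of the idea]:

rj = Range("F" & j)
gj = Range("G" & j)
If gj > Start Then
    If rj < Start And gj < Finish Then
        Z = Z + ((gj - Start) * 1440)
    ElseIf rj > Start And rj < Finish And gj > Finish Then
        Z = Z + ((Finish - rj) * 1440)
    ElseIf rj < Start And gj > Finish Then
        Z = Z + ((Finish - Start) * 1440)
    ElseIf rj > Start And rj < Finish And gj < Finish Then
        Z = Z + ((gj - rj) * 1440)
    ElseIf rj = Start And gj = Finish Then
        Z = Z + ((Finish - Start) * 1440)
    End If
End If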

what is the purpose of
If j > 2 Then
Cells.Find(What:=ID, After:=ActiveCell, LookIn:=xlValues, LookAt _
:=xlPart, SearchOrder:=xlByColumns, SearchDirection:=xlNext,
MatchCase:= _
False).Activate
 
sali

Changed to the recommended here.
The intended purpose of this was to have it jump to the next row
instead of cycling down one by one. The sample data is all for one
person, but the actual data may have 1,000 rows between rows with the
same ID. I had it reset to B1 each time the row changed to ensure all
the IDs were checked starting from the beginning. Sorry if it's a bit
of a mess. I'm learning as I go, have learned a great deal from the
samples posted by everyone, and am trying to piece this one together
myself.

learning is ok, nobody was born knowing.

but in this case, i see no purpose, just cpu wasting, because on the next
program lines you *explicitly* address ranges with range("B" & j) and
similar, so no "activate" makes any sense, it just slows things down.

if you want to use the built-in find/activate features, you need to examine
the *active* cell [retrieve its range] after doing the find, and not use a
flat for/next
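
something along these lines [untested sketch; assumes the IDs live in
column B]:

Dim c As Range, firstAddr As String
With Columns("B")
    Set c = .Find(What:=ID, LookIn:=xlValues, LookAt:=xlWhole)
    If Not c Is Nothing Then
        firstAddr = c.Address
        Do
            ' ... work with row c.Row here ...
            Set c = .FindNext(c)
        Loop While Not c Is Nothing And c.Address <> firstAddr
    End If
End With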

any speed improvement so far?

sometimes it is all a data structure/design matter. with badly
organized data, it is hard to have a lightspeed program.

keep us informed.
 
Spy128Bit

if you want to use the built-in find/activate features, you need to examine
the *active* cell [retrieve its range] after doing the find, and not use a
flat for/next

Not sure how this would be done or what you mean here. I did try to
just take out this find step altogether and noticed it actually ran a
little faster without it, so unless this new way would be a step up over
that I'll just leave it out for now.

any speed improvement so far?

A small improvement has been seen. It still takes upwards of 20
minutes for just a couple thousand records. I considered sorting the
data to shorten the loops, but doing so would eliminate the option of
allowing multiple sheets (one per day) as the list would always be
starting back at "A".

sometimes it is all a data structure/design matter. with badly
organized data, it is hard to have a lightspeed program.

Would sorting it make that big of a difference? One possible solution
I see for the multiple sheets would be to carry over the ID, search
once per sheet to find the one being worked, and move forward from
there.

keep us informed.

On a different note, I was considering the possibility of building
arrays and working with the data there. The idea would be to grab all
the rows for a single ID and loop the arrays against each other with
the statements you saw originally. Then once the output was put in
column "K" I would add a check to see if there was a value there and
skip that row if so. Thoughts on this - worth a try or way
overcomplicating it?
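
Roughly what I have in mind (untested, and the names are just
placeholders):

Dim v As Variant, k As Long, n As Long
v = Range("A2:G" & Lastrow).Value        ' one read into memory
Dim st() As Double, fn() As Double, rw() As Long
ReDim st(1 To UBound(v, 1))
ReDim fn(1 To UBound(v, 1))
ReDim rw(1 To UBound(v, 1))
For k = 1 To UBound(v, 1)
    If v(k, 2) = ID Then                 ' column B = Who
        n = n + 1
        rw(n) = k + 1                    ' worksheet row number
        st(n) = v(k, 6)                  ' column F = Start Time
        fn(n) = v(k, 7)                  ' column G = End Time
    End If
Next k
' ...then run the overlap tests against st()/fn() instead of Range() calls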
 
sali

consider you are doing a nested for/next, which means n^2 passes.
having 10,000 rows, that is 100 million passes. for 20 minutes, it gives
almost 100,000 passes/second, which really isn't bad.

a nested for/next loop is very unpleasant for processing. is there any
chance to serialize the data [instead of n^2, to have a*n processing]?

what is your software version, cpu and mem amount?

memory augmentation may help dramatically

have you monitored task manager and cpu utilization? is it 100%, with
excel taking most of it?
 
Spy128Bit

consider you are doing a nested for/next, which means n^2 passes.
having 10,000 rows, that is 100 million passes. for 20 minutes, it gives
almost 100,000 passes/second, which really isn't bad.

a nested for/next loop is very unpleasant for processing. is there any
chance to serialize the data [instead of n^2, to have a*n processing]?

Serialize the data? Can you explain this a little?

what is your software version, cpu and mem amount?

Excel 2003, Pentium D 3.0 GHz, 3 GB of RAM

memory augmentation may help dramatically

have you monitored task manager and cpu utilization? is it 100%, with
excel taking most of it?

Not yet, will try that back at work since it's a slower machine than
the one I have here.
 
sali

consider you are doing a nested for/next, which means n^2 passes.
having 10,000 rows, that is 100 million passes. for 20 minutes, it gives
almost 100,000 passes/second, which really isn't bad.

a nested for/next loop is very unpleasant for processing. is there any
chance to serialize the data [instead of n^2, to have a*n processing]?

Serialize the data? Can you explain this a little?

to reorganize the data so it doesn't need n^2 passes, but just a linear
a*n [though maybe that's not possible at this point]
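
for example [untested sketch]: if a copy of the data is sorted by column
B, each person's rows become one contiguous block, and the inner loop only
has to run inside that block of a rows instead of over all n:

Dim blockStart As Long, blockEnd As Long, r As Long
blockStart = 2
Do While blockStart <= Lastrow
    blockEnd = blockStart
    Do While blockEnd < Lastrow
        If Range("B" & blockEnd + 1) <> Range("B" & blockStart) Then Exit Do
        blockEnd = blockEnd + 1
    Loop
    For r = blockStart To blockEnd
        ' ... compare row r only against rows blockStart..blockEnd ...
    Next r
    blockStart = blockEnd + 1
Loop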
 
Jay

Hi Spy128Bit -

Could you briefly describe your objective with this code? What do you want
it to do? Typically, code speaks for itself, but your code does some things
that appear redundant (but may be perfectly appropriate). I just need a bit
more information before I can decide if I can provide any help. While you
and sali work on optimizing your existing code, maybe I could work on
alternatives.

One additional question to answer specifically: It appears that your code
will produce different values in column K for each instance of the same
person. In other words, if 'Joseph' appears multiple times throughout your
database, a different Z will be calculated each time because your code gets
the variables "Start" and "Finish" from the 'current record' and then
calculates Z relative to "Start" and "Finish". Is that correct?

Jay
 
Spy128Bit

Hi Spy128Bit -

Could you briefly describe your objective with this code? What do you want
it to do? Typically, code speaks for itself, but your code does some things
that appear redundant (but may be perfectly appropriate). I just need a bit
more information before I can decide if I can provide any help. While you
and sali work on optimizing your existing code, maybe I could work on
alternatives.

One additional question to answer specifically: It appears that your code
will produce different values in column K for each instance of the same
person. In other words, if 'Joseph' appears multiple times throughout your
database, a different Z will be calculated each time because your code gets
the variables "Start" and "Finish" from the 'current record' and then
calculates Z relative to "Start" and "Finish". Is that correct?

Jay

Jay

Jay,

The objective is to find the overlap in time for the same person.
Ex:
I worked 12:00 PM - 1:00 PM on 4/1
I worked 12:30 PM - 1:00 PM on 4/1

The value in "K" for the first line would be 90 (60 for the time worked
12:00-1:00 and 30 for the overlap from the second line).
The value in "K" on the second line would be 60 (30 for the time
worked 12:30-1:00 PM and 30 for the overlap from the previous line).

This is why the values in "K" are different for each line. What I
will be doing with the results is finding how productive someone is
based on their overlap.
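
Put another way, for each row i the value in "K" is the sum, over every
row j with the same person (including j = i, which contributes the row's
own worked time), of max(0, min(End_i, End_j) - max(Start_i, Start_j)) *
1440 minutes.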

If there's one thing I have seen on these boards, it is alternative
solutions. Anything you can offer would be great. I'm grateful for
the help I've received so far; it's helped a lot. I hope that
helps explain the code a little better. If there are still questions on
something specific, please let me know.

Thanks!
 
Jay

Hi Spy128Bit -

Thanks for the info. FYI: I'll be busy for the next 2-3 hours on some other
chores, but will take a look at your application then.
 
Spy128Bit

Hi Spy128Bit -

Thanks for the info. FYI: I'll be busy for the next 2-3 hours on some other
chores, but will take a look at your application then.
--
Jay

I'm actually looking at the option to do a unique value filter on
column "B" and building an array from it. The array could then be
used as a filter for the processing piece, working on the visible rows
only. I'm hoping that should get the processing down to fairly quick.
 
Jay

Yup. Filtering came to mind. Look into the SpecialCells method (with the
xlCellTypeVisible argument) to grab the visible cells after applying the
filter.

There are programmatic techniques to accumulate unique values, too. I
think they may be a bit faster than filtering, but I don't know if they'll
be substantially faster. Won't know until we try. Gotta go for now. Very
sorry for the delay.
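
For instance, something like this untested sketch (the sheet name, filter
field, and criteria are placeholders):

With Sheets("Data").Range("A1").CurrentRegion
    .AutoFilter Field:=2, Criteria1:="John"     ' filter column B on one name
    Dim rVis As Range
    On Error Resume Next                        ' no visible rows raises 1004
    Set rVis = .Offset(1, 0).Resize(.Rows.Count - 1) _
        .SpecialCells(xlCellTypeVisible)
    On Error GoTo 0
    If Not rVis Is Nothing Then Debug.Print rVis.Address
    .AutoFilter                                 ' clear the filter
End With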
 
Spy128Bit

Yup. Filtering came to mind. Look into the SpecialCells method (with the
xlCellTypeVisible argument) to grab the visible cells after applying the
filter.

There are programmatic techniques to accumulate unique values, too. I
think they may be a bit faster than filtering, but I don't know if they'll
be substantially faster. Won't know until we try. Gotta go for now. Very
sorry for the delay.

---
Jay

Ok, when the mind starts thinking I usually get myself in trouble with
the coding. How about a combination of http://www.rondebruin.nl/copy5.htm#all
to either create all the sheets (wow, fast) or using it to somehow
pull a selection to another sheet, process it, then return the results
on a final sheet, removing them from the data sheets as I go? Or stick
to what we've got?
 
sali

The objective is to find the overlap in time for the same person.
Ex:
I worked 12:00 PM - 1:00PM on 4/1
I worked 12:30 PM - 1:00PM on 4/1

maybe to further clarify, code it using a function "overlap" which returns
the number of minutes overlapped between two records. it is a general
overlap function, calculating whatever you check, just keep in mind that
segments must be oriented: first the start, then the end point.
calculate the number of minutes using the datediff() function, referencing
some "zero" date
--------------
Sub aaa()
    ' minutes elapsed since an arbitrary reference date
    m1 = DateDiff("n", reference_date, my_date1)
    ...
End Sub

Function overlap(p1s As Long, p1e As Long, p2s As Long, p2e As Long) As Long
    ' overlap in minutes between segments [p1s,p1e] and [p2s,p2e]
    ' note: assumes neither segment fully contains the other; for
    ' containment, cap the result at min(p1e, p2e) - max(p1s, p2s)
    If p1s < p2s Then
        overlap = p1e - p2s
    Else
        overlap = p2e - p1s
    End If
    If overlap < 0 Then overlap = 0
End Function
-------------
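
e.g., checking it against the 12:00-1:00 / 12:30-1:00 example from earlier
in the thread [hypothetical helper, not part of the macro]:

Sub demo_overlap()
    Dim ref As Date: ref = DateSerial(2007, 4, 1)   ' the "zero" reference date
    Dim s1 As Long, e1 As Long, s2 As Long, e2 As Long
    s1 = DateDiff("n", ref, #4/1/2007 12:00:00 PM#)
    e1 = DateDiff("n", ref, #4/1/2007 1:00:00 PM#)
    s2 = DateDiff("n", ref, #4/1/2007 12:30:00 PM#)
    e2 = DateDiff("n", ref, #4/1/2007 1:00:00 PM#)
    MsgBox overlap(s1, e1, s2, e2)                  ' shows 30
End Sub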
 
Spy128Bit

Ok, when the mind starts thinking I usually get myself in trouble with
the coding. How about a combination of http://www.rondebruin.nl/copy5.htm#all
to either create all the sheets (wow, fast) or using it to somehow
pull a selection to another sheet, process it, then return the results
on a final sheet, removing them from the data sheets as I go? Or stick
to what we've got?

Ok, will be posting the updated macro shortly. Doing a final couple
of tests with it, but it now does the above sequence. It takes the
data sheet and separates it into individual sheets, processes the
sheets (4 minutes!), combines it back into a master sheet, and
deletes the no-longer-necessary sheets.
 
Spy128Bit

Test Data: 34,431 rows
Running time: Start to finish... 4 minutes flat!
If I can work the unique autofilter into an array I'll see if it's
any faster on one sheet using only the visible rows than creating the
extra sheets. But for now the speed of it works perfectly. I greatly
appreciate all the help and suggestions in getting this to where it is
today. Thanks!

Sub Logic_All()
    Copy_With_AdvancedFilter_To_Worksheets
    Logic_Beta
    Master
    Application.DisplayAlerts = False
    For Each ws In ThisWorkbook.Worksheets
        If ws.Name = "Data" Or ws.Name = "LILO" Or ws.Name = "Control" Or _
           ws.Name = "MergeSheet" Then GoTo SkipSh
        ws.Delete
SkipSh:
    Next
    Application.DisplayAlerts = True
End Sub

Sub Logic_Beta()
    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual
    Application.EnableEvents = False
    Dim i As Long
    Dim sh As Worksheet
    For Each sh In Sheets
        sh.Activate
        If sh.Name = "Data" Or sh.Name = "LILO" Or sh.Name = "Control" Then _
            GoTo SkipSh
        LastRowSh = Cells.Find(What:="*", After:=[A1], _
            SearchOrder:=xlByRows, SearchDirection:=xlPrevious).Row
        Columns("K").ClearContents
        Range("K1").Value = "Difference"
        For i = 2 To LastRowSh
            Start = Range("F" & i)
            Finish = Range("G" & i)
            Z = 0
            For j = 2 To LastRowSh
                StartCk = Range("F" & j)
                FinishCk = Range("G" & j)
                'No possible overlap - skip the detailed tests
                If StartCk > Finish Or FinishCk < Start Then GoTo NextLine
                If StartCk < Start And StartCk < Finish And _
                   FinishCk < Finish Then
                    Z = Z + ((FinishCk - Start) * 1440)
                ElseIf StartCk > Start And StartCk < Finish And _
                       FinishCk > Finish Then
                    Z = Z + ((Finish - StartCk) * 1440)
                ElseIf StartCk < Start And StartCk < Finish And _
                       FinishCk > Finish Then
                    Z = Z + ((Finish - Start) * 1440)
                ElseIf StartCk > Start And StartCk < Finish And _
                       FinishCk < Finish Then
                    Z = Z + ((FinishCk - StartCk) * 1440)
                ElseIf StartCk = Start And StartCk < Finish And _
                       FinishCk = Finish Then
                    Z = Z + ((Finish - Start) * 1440)
                End If
NextLine:
            Next j
            Range("K" & i).Value = Z
        Next i
SkipSh:
    Next sh
    Application.EnableEvents = True
    Application.Calculation = xlCalculationAutomatic
    Application.ScreenUpdating = True
End Sub

Sub Master()
    Dim sh As Worksheet
    Dim DestSh As Worksheet
    Dim Last As Long
    Dim shLast As Long
    With Application
        .ScreenUpdating = False
        .EnableEvents = False
    End With
    'Delete the sheet "MergeSheet" if it exists
    Application.DisplayAlerts = False
    On Error Resume Next
    ThisWorkbook.Worksheets("MergeSheet").Delete
    On Error GoTo 0
    Application.DisplayAlerts = True
    'Add a worksheet with the name "MergeSheet"
    Sheets(1).Activate
    Set DestSh = ThisWorkbook.Worksheets.Add
    DestSh.Name = "MergeSheet"
    'Loop through all worksheets and copy the data to the DestSh
    For Each sh In ThisWorkbook.Worksheets
        If sh.Name = "Data" Or sh.Name = "LILO" Or sh.Name = "Control" Then _
            GoTo SkipSh
        If sh.Name <> DestSh.Name Then
            Last = LastRow(DestSh)
            shLast = LastRow(sh)
            'This example copies everything; if you only want to copy
            'values/formats, look at the example below the first example
            sh.Range(sh.Rows(2), sh.Rows(shLast)).Copy _
                DestSh.Cells(Last + 1, "A")
        End If
SkipSh:
    Next
    Application.Goto DestSh.Cells(1)
    With Application
        .ScreenUpdating = True
        .EnableEvents = True
    End With
End Sub

Function LastRow(sh As Worksheet) As Long
    On Error Resume Next
    LastRow = sh.Cells.Find(What:="*", _
                            After:=sh.Range("A1"), _
                            Lookat:=xlPart, _
                            LookIn:=xlFormulas, _
                            SearchOrder:=xlByRows, _
                            SearchDirection:=xlPrevious, _
                            MatchCase:=False).Row
    On Error GoTo 0
End Function

Sub Copy_With_AdvancedFilter_To_Worksheets()
    Dim CalcMode As Long
    Dim ws1 As Worksheet
    Dim WSNew As Worksheet
    Dim rng As Range
    Dim cell As Range
    Dim Lrow As Long
    Set ws1 = Sheets("Data") '<<< Change
    'Tip: you can also use a Dynamic range name,
    'http://www.contextures.com/xlNames01.html#Dynamic
    'or a fixed range like Range("A1:H1200")
    Set rng = ws1.Range("B1").CurrentRegion '<<< Change
    With Application
        CalcMode = .Calculation
        .Calculation = xlCalculationManual
        .ScreenUpdating = False
    End With
    With ws1
        rng.Columns(2).AdvancedFilter _
            Action:=xlFilterCopy, _
            CopyToRange:=.Range("IV1"), Unique:=True
        'This example filters on the first column in the range
        '(change this if needed).
        'The last two columns of the worksheet are used to build the
        'unique list and the CriteriaRange (you can't use this macro
        'if you use those columns).
        Lrow = .Cells(Rows.Count, "IV").End(xlUp).Row
        .Range("IU1").Value = .Range("IV1").Value
        For Each cell In .Range("IV2:IV" & Lrow)
            .Range("IU2").Value = cell.Value
            Set WSNew = Sheets.Add
            On Error Resume Next
            WSNew.Name = cell.Value
            If Err.Number > 0 Then
                MsgBox "Change the name of : " & WSNew.Name & " manually"
                Err.Clear
            End If
            On Error GoTo 0
            rng.AdvancedFilter Action:=xlFilterCopy, _
                CriteriaRange:=.Range("IU1:IU2"), _
                CopyToRange:=WSNew.Range("A1"), _
                Unique:=False
            WSNew.Columns.AutoFit
        Next
        .Columns("IU:IV").Clear
    End With
    With Application
        .ScreenUpdating = True
        .Calculation = CalcMode
    End With
End Sub
 
Jay

Hi Spy128Bit -

Good show. Your persistence paid off with the filtering approach.

Interestingly though, I ran your procedure(s) and couldn't get your reported
performance on my test data; no matter, but I'm curious why. It might be the
way I ginned up 32000+ records or some other reason. Below is my version.
It processes 32,400 records in 4 minutes and 15 seconds on my PC (2.66 Core 2
Duo, 2GB Ram, XL2003/WinXPPro-SP2). I expect our platforms would be fairly
similar in performance.

Could you run my procedure on your data and report back with the execution
time? It would be of general interest, but also it's an opportunity to
shake down performance concepts.
--
Jay


Option Base 1

Sub Logic_Beta_V2()
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
'V2 uses arrays and calculates overlap within the procedure (does
'not call the overlap function).
'This version processes 32,400 records in 4 minutes and 15 seconds.
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

startTime = Timer
Dim rngDB As Range
Dim lrow As Long
Dim uniqueWho As New Collection
Dim calc() As Variant

'Size up the data range
Set rngDB = Range("A1").CurrentRegion
lrow = rngDB.Rows.Count + rngDB.Row - 1
Set rngDB = rngDB.Offset(1, 0).Resize(rngDB.Rows.Count - 1, _
rngDB.Columns.Count)
Set rngWho = rngDB.Columns(2)

'Set the excel environment conditions
oldStatusBar = Application.DisplayStatusBar
Application.DisplayStatusBar = True
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
Application.EnableEvents = False

'1. Build a collection of unique names in 'Who' field
'using NewCollection technique; store in 'uniqueWho' collection
On Error Resume Next ' ignore any errors
For Each cl In rngWho.Cells 'rngDB.Columns(2).Cells
uniqueWho.Add cl.Value, cl.Value ' add the unique item
Next 'cl
On Error GoTo 0

'2. For each unique name in database...
For Each strWho In uniqueWho
iw = iw + 1 'counter for statusbar
idx = 0
ReDim calc(1 To rngDB.Rows.Count, 1 To 3) 'reinitialize used array
'Use With Block and DoLoop to search through entire 'Who' column
'to find all matching names. Load data from matching rows into into
'calc array.
With rngWho
Set h = .Find(strWho, LookIn:=xlValues, Lookat:=xlWhole)
If Not h Is Nothing Then
h_address1 = h.Address
Do
idx = idx + 1
calc(idx, 1) = h.Row 'worksheet row number
calc(idx, 2) = h.Offset(0, 4) 'start time
calc(idx, 3) = h.Offset(0, 5) 'finish time
Set h = .FindNext(h)
Loop While Not h Is Nothing And h.Address <> h_address1
End If
End With

'Cycle through calc array to calculate overlaps for current strWho
'and write results to activesheet one-by-one.
For i = 1 To idx
'i is the index for the 'base' start (calc(i,2)) and
'finish (calc(i,3)) times
Z = 0
'Then, calculate overlap for each matching record
'and accumulate in variable Z
For j = 1 To idx
'j is the index for the common start (calc(j,2)) and
'finish (calc(j,3)) times
If calc(i, 2) < calc(j, 2) Then
ovrlap = calc(i, 3) - calc(j, 2)
Else
ovrlap = calc(j, 3) - calc(i, 2)
End If
If ovrlap < 0 Then ovrlap = 0
ovrlap = ovrlap * 1440
Z = Z + ovrlap

Next j
Cells(calc(i, 1), 11) = Z 'store result in column K
Next i

Next 'strWho (next unique person)

endTime = Timer

Application.EnableEvents = True
Application.Calculation = xlCalculationAutomatic
Application.ScreenUpdating = True

'Formulate and display completion dialog message
wrapProcedure:
s = startTime: e = endTime
If Round((e - s) / 60, 1) > 0.9 Then
If Round((e - s) / 60, 1) < 2 Then mplural = "" Else mplural = "s"
If ((e - s) / 60 - Int((e - s) / 60)) * 60 <= 1 Then _
splural = "." Else splural = "s."
cTime = Int((e - s) / 60) & " minute" & mplural & " and " & _
Format(Round(((e - s) / 60 - Int((e - s) / 60)) * 60, 1), "##.#") & _
" second" & splural
Else
If e - s <= 1 Then splural = "." Else splural = "s."
cTime = Format(Round((e - s), 1), "##.#") & " second" & splural
If e - s < 0.1 Then cTime = "less than 0.1 second."
End If
MsgBox "Procedure completed successfully in " & cTime, vbInformation

End Sub
 
Spy128Bit

Hi Spy128Bit -

Good show. Your persistence paid off with the filtering approach.

Interestingly though, I ran your procedure(s) and couldn't get your reported
performance on my test data; no matter, but I'm curious why. It might be the
way I ginned up 32000+ records or some other reason. Below is my version.
It processes 32,400 records in 4 minutes and 15 seconds on my PC (2.66 Core 2
Duo, 2GB Ram, XL2003/WinXPPro-SP2). I expect our platforms would be fairly
similar in performance.

Could you run my procedure on your data and report back with the execution
time? It would be of general interest, but also it's an opportunity to
shake down performance concepts.

Our systems are pretty close but the results... jaw dropping. Using
your macro it took 27.1 *seconds*. I ran it twice to make sure
and verified our results are the same. I don't think WOW can cover it
enough. I will be spending a good bit of time looking over what
you've provided to learn from it. Yeah, WOW is all I have to say
right now. Ok, I need to get some air now. I can't thank you enough
for posting your version to test. I will be switching to yours
immediately. Thanks a ton! You've been an incredible help.
 
Jay

Hi Spy128Bit -

Great to hear about your results. Attacking a problem with different
approaches may be a bit redundant, but the payoff is that we get to learn
something about how specific techniques contribute to performance.

On the subject of performance, I'm now curious about the difference in
performance between our PC's. I'd be very interested in testing the same
data on both. I'm guessing that our data values are extremely different, but
at the same time, the procedure as constructed should be somewhat immune to
that. It would be valuable to understand this by running a side-by-side
comparison.

If you'd feel comfortable emailing a data worksheet along with the
processing time, I'd be grateful. It could be dummied-up if you have
sensitive data. Alternatively, I could email mine. It's up to you if you
want to do this and I understand if you don't; no explanation necessary. My
adrs is jc .sresearch a t hotmail_com (modify to look like a standard email
address).

One additional note. Credit to sali for the overlap function. It distilled
the overlap calculations down to a few short lines which were easily
incorporated into the procedure.
 
