Multiple Column Criteria which deletes all but Highest Value

S

ShagNasty

Sorry if in wrong Discussion group...

Spreadsheet with columns A thru K -- columns A, E, G, & J have employee data
(name, pay code, pay period, and hours respectively). Frequent pay
adjustments are made to many employee’s time during a given pay period. I
need to delete all records (rows), but the highest hour total, for the
employee, pay code, and pay period. MS Office SP3, Win XP, Approx 20k rows.

EmpName ENum JCode JDesc PayCode PDesc PayPeriod Hours
Emp A 0000A ABC ABC 055 OT 01/25/09 5
Emp A 0000A ABC ABC 055 OT 01/25/09 7.5
Emp A 0000A ABC ABC 065 ST 01/25/09 8
Emp A 0000A ABC ABC 065 ST 01/25/09 4
Emp B 0000B NBC NBC 055 OT 02/25/09 3
Emp B 0000B NBCN NBC 055 OT 02/25/09 5
Emp B 0000B NBC NBC 055 OT 02/25/09 16

I need to retain rows 3 (7.5), 4 (8), and 8 (16)

Thanks.. ShagNasty
 
J

JLatham

Ok, your description of columns used didn't match the example data, so I
wrote the code so that you can adapt it to your actual workbook/worksheet
layout. Just change the Const values (right side of = sign) to match your
reality.

I also suggest testing this on a copy of your workbook first, rather than
taking the chance that I've made a boo-boo that will destroy what you have.

To test the code:
Make a copy of your workbook. Close the original.
Press [Alt]+[F11] to open the VB Editor (VBE).
In the VBE choose Insert --> Module from its menu toolbar.
Copy the code below and paste it into the empty module presented to you.
Make changes to the Const values as required to properly define them.
Close the VBE.
Use Tools --> Macro --> Macros and select the RemoveExtraEntries macro in
the list and click the [Run] button. How long it takes depends on how many
entries you have. It actually does the work in 2 stages. First it finds the
hour values to be removed and sets them to zero. After testing all entries,
it goes back and deletes all rows with a 0 entry for the hours on a row.
Works correctly with the test data you provided.

Sub RemoveExtraEntries()
'by using these Const values
'you can change the code to
'work with a different layout
'on your sheet if you ever need to
Const mySheetName = "Sheet1" ' payroll data sheet name
Const ENumColumn = "B"
Const JCodeColumn = "C"
Const PCodeColumn = "E"
Const HrsColumn = "H"
'end of user redefinable Const values
Dim pSheet As Worksheet
Dim lastRow As Long
Dim offset2JCode As Integer
Dim offset2PCode As Integer
Dim offset2Hrs As Integer
Dim rOffset As Long
Dim maxHours As Single
Dim OLC As Long ' outer loop counter
Dim ILC As Long ' inner loop counter
Dim baseCell As Range

'initialize some values
Set pSheet = ThisWorkbook.Worksheets(mySheetName)
offset2JCode = Range(JCodeColumn & 1).Column - _
Range(ENumColumn & 1).Column
offset2PCode = Range(PCodeColumn & 1).Column - _
Range(ENumColumn & 1).Column
offset2Hrs = Range(HrsColumn & 1).Column - _
Range(ENumColumn & 1).Column
lastRow = pSheet.Range(ENumColumn & Rows.Count). _
End(xlUp).Row
Application.ScreenUpdating = False ' improve performance
For OLC = lastRow To 2 Step -1
Set baseCell = pSheet.Range(ENumColumn & OLC)
rOffset = -1
maxHours = baseCell.Offset(0, offset2Hrs)
For ILC = OLC - 1 To 2 Step -1
If baseCell.Offset(rOffset, 0) = baseCell And _
baseCell.Offset(rOffset, offset2JCode) = _
baseCell.Offset(0, offset2JCode) And _
baseCell.Offset(rOffset, offset2PCode) = _
baseCell.Offset(0, offset2PCode) Then
If baseCell.Offset(rOffset, offset2Hrs) > _
maxHours Then
maxHours = baseCell.Offset(rOffset, offset2Hrs)
baseCell.Offset(0, offset2Hrs) = 0 ' mark for delete
Else
baseCell.Offset(rOffset, offset2Hrs) = 0 ' mark for delete
End If
End If
rOffset = rOffset - 1
Next
Next
'now delete rows with 0 in the Hours column
For OLC = lastRow To 2 Step -1
If pSheet.Range(HrsColumn & OLC) = 0 Then
pSheet.Range(HrsColumn & OLC).EntireRow.Delete
End If
Next
Set baseCell = Nothing
Set pSheet = Nothing
End Sub
 
J

Joel

I had to sort on 4 columns (A, E, G, H). so I did 2 sorts. I sorted H in
descending order then put this formula in row 3. Row 3 will alway contain
the highest number of hours for the 1st employee since I sorted H in
descending order.

place in column IV the last column
"=IF(AND(A2=A3,E2=E3,G2=G3,H2>H3),""X"","""")"

The formula puts an X in the rows where the hours for the same employee
(paycode, pay period) is the same but the hours are less. Then I sort on
Column IV to bring the X's to the top of the worksheet. I then delete the
rows with the X's.

It is a slow process in excel to delete rows 1 at a time. It is much
quicker especially if you have a large number of rows to sort and do 1
delete. I have had files with 5,000 rows and the delete took a couple of
minues. This code will always work in a couple of seconds.



Sub DeleteRows()

'get Last Row of file
Lastrow = Range("A" & Rows.Count).End(xlUp).Row

'sort on 4 criteria so you ned to do 2 sorts
'first sort on Hours in decreasing time
'column H = Hours
Rows("1:" & Lastrow).Sort _
header:=xlYes, _
key1:=Range("H1"), _
order1:=xlDescending

'now sort on other columns
'column A employee
'column E Pay code
'column G pay period
Rows("1:" & Lastrow).Sort _
header:=xlYes, _
key1:=Range("A1"), _
order1:=xlAscending, _
key2:=Range("E1"), _
order2:=xlAscending, _
key3:=Range("G1"), _
order3:=xlAscending

'place x in column IV using a formula to indicate
'which lines to delete
'row 2 will never get deleted since it will alway be the highest # hours
Range("IV3").Formula = _
"=IF(AND(A2=A3,E2=E3,G2=G3,H2>H3),""X"","""")"
'copy formula down the column
Range("IV3").Copy _
Destination:=Range("IV3:IV" & Lastrow)

'replace formula with values
Range("IV3:IV" & Lastrow).Copy
Range("IV3:IV" & Lastrow).PasteSpecial _
Paste:=xlPasteValues

Rows("1:" & Lastrow).Sort _
header:=xlYes, _
key1:=Range("IV1"), _
order1:=xlDescending

'find last X
Set c = Columns("IV").Find(what:="X", _
LookIn:=xlValues, lookat:=xlWhole, searchdirection:=xlPrevious)

'delete rows
If Not c Is Nothing Then
Lastrow = c.Row
Rows("2:" & Lastrow).Delete
End If

'delete column IV
Columns("IV").Delete

End Sub
 
J

Joel

JLathamL deleting rows one at a time is extremely slow on large files.
iIrecommend filtering and deleting files with 1 delete command.

JLatham said:
Ok, your description of columns used didn't match the example data, so I
wrote the code so that you can adapt it to your actual workbook/worksheet
layout. Just change the Const values (right side of = sign) to match your
reality.

I also suggest testing this on a copy of your workbook first, rather than
taking the chance that I've made a boo-boo that will destroy what you have.

To test the code:
Make a copy of your workbook. Close the original.
Press [Alt]+[F11] to open the VB Editor (VBE).
In the VBE choose Insert --> Module from its menu toolbar.
Copy the code below and paste it into the empty module presented to you.
Make changes to the Const values as required to properly define them.
Close the VBE.
Use Tools --> Macro --> Macros and select the RemoveExtraEntries macro in
the list and click the [Run] button. How long it takes depends on how many
entries you have. It actually does the work in 2 stages. First it finds the
hour values to be removed and sets them to zero. After testing all entries,
it goes back and deletes all rows with a 0 entry for the hours on a row.
Works correctly with the test data you provided.

Sub RemoveExtraEntries()
'by using these Const values
'you can change the code to
'work with a different layout
'on your sheet if you ever need to
Const mySheetName = "Sheet1" ' payroll data sheet name
Const ENumColumn = "B"
Const JCodeColumn = "C"
Const PCodeColumn = "E"
Const HrsColumn = "H"
'end of user redefinable Const values
Dim pSheet As Worksheet
Dim lastRow As Long
Dim offset2JCode As Integer
Dim offset2PCode As Integer
Dim offset2Hrs As Integer
Dim rOffset As Long
Dim maxHours As Single
Dim OLC As Long ' outer loop counter
Dim ILC As Long ' inner loop counter
Dim baseCell As Range

'initialize some values
Set pSheet = ThisWorkbook.Worksheets(mySheetName)
offset2JCode = Range(JCodeColumn & 1).Column - _
Range(ENumColumn & 1).Column
offset2PCode = Range(PCodeColumn & 1).Column - _
Range(ENumColumn & 1).Column
offset2Hrs = Range(HrsColumn & 1).Column - _
Range(ENumColumn & 1).Column
lastRow = pSheet.Range(ENumColumn & Rows.Count). _
End(xlUp).Row
Application.ScreenUpdating = False ' improve performance
For OLC = lastRow To 2 Step -1
Set baseCell = pSheet.Range(ENumColumn & OLC)
rOffset = -1
maxHours = baseCell.Offset(0, offset2Hrs)
For ILC = OLC - 1 To 2 Step -1
If baseCell.Offset(rOffset, 0) = baseCell And _
baseCell.Offset(rOffset, offset2JCode) = _
baseCell.Offset(0, offset2JCode) And _
baseCell.Offset(rOffset, offset2PCode) = _
baseCell.Offset(0, offset2PCode) Then
If baseCell.Offset(rOffset, offset2Hrs) > _
maxHours Then
maxHours = baseCell.Offset(rOffset, offset2Hrs)
baseCell.Offset(0, offset2Hrs) = 0 ' mark for delete
Else
baseCell.Offset(rOffset, offset2Hrs) = 0 ' mark for delete
End If
End If
rOffset = rOffset - 1
Next
Next
'now delete rows with 0 in the Hours column
For OLC = lastRow To 2 Step -1
If pSheet.Range(HrsColumn & OLC) = 0 Then
pSheet.Range(HrsColumn & OLC).EntireRow.Delete
End If
Next
Set baseCell = Nothing
Set pSheet = Nothing
End Sub


ShagNasty said:
Sorry if in wrong Discussion group...

Spreadsheet with columns A thru K -- columns A, E, G, & J have employee data
(name, pay code, pay period, and hours respectively). Frequent pay
adjustments are made to many employee’s time during a given pay period. I
need to delete all records (rows), but the highest hour total, for the
employee, pay code, and pay period. MS Office SP3, Win XP, Approx 20k rows.

EmpName ENum JCode JDesc PayCode PDesc PayPeriod Hours
Emp A 0000A ABC ABC 055 OT 01/25/09 5
Emp A 0000A ABC ABC 055 OT 01/25/09 7.5
Emp A 0000A ABC ABC 065 ST 01/25/09 8
Emp A 0000A ABC ABC 065 ST 01/25/09 4
Emp B 0000B NBC NBC 055 OT 02/25/09 3
Emp B 0000B NBCN NBC 055 OT 02/25/09 5
Emp B 0000B NBC NBC 055 OT 02/25/09 16

I need to retain rows 3 (7.5), 4 (8), and 8 (16)

Thanks.. ShagNasty
 
J

JLatham

I agree, but went with that solution for a couple of reasons:
no added complexity/requirement to have the user do any filtering setup,
done in 2 phases so that if problems did arise, the potential to
troubleshoot after the 1st phase exists.

Joel said:
JLathamL deleting rows one at a time is extremely slow on large files.
iIrecommend filtering and deleting files with 1 delete command.

JLatham said:
Ok, your description of columns used didn't match the example data, so I
wrote the code so that you can adapt it to your actual workbook/worksheet
layout. Just change the Const values (right side of = sign) to match your
reality.

I also suggest testing this on a copy of your workbook first, rather than
taking the chance that I've made a boo-boo that will destroy what you have.

To test the code:
Make a copy of your workbook. Close the original.
Press [Alt]+[F11] to open the VB Editor (VBE).
In the VBE choose Insert --> Module from its menu toolbar.
Copy the code below and paste it into the empty module presented to you.
Make changes to the Const values as required to properly define them.
Close the VBE.
Use Tools --> Macro --> Macros and select the RemoveExtraEntries macro in
the list and click the [Run] button. How long it takes depends on how many
entries you have. It actually does the work in 2 stages. First it finds the
hour values to be removed and sets them to zero. After testing all entries,
it goes back and deletes all rows with a 0 entry for the hours on a row.
Works correctly with the test data you provided.

Sub RemoveExtraEntries()
'by using these Const values
'you can change the code to
'work with a different layout
'on your sheet if you ever need to
Const mySheetName = "Sheet1" ' payroll data sheet name
Const ENumColumn = "B"
Const JCodeColumn = "C"
Const PCodeColumn = "E"
Const HrsColumn = "H"
'end of user redefinable Const values
Dim pSheet As Worksheet
Dim lastRow As Long
Dim offset2JCode As Integer
Dim offset2PCode As Integer
Dim offset2Hrs As Integer
Dim rOffset As Long
Dim maxHours As Single
Dim OLC As Long ' outer loop counter
Dim ILC As Long ' inner loop counter
Dim baseCell As Range

'initialize some values
Set pSheet = ThisWorkbook.Worksheets(mySheetName)
offset2JCode = Range(JCodeColumn & 1).Column - _
Range(ENumColumn & 1).Column
offset2PCode = Range(PCodeColumn & 1).Column - _
Range(ENumColumn & 1).Column
offset2Hrs = Range(HrsColumn & 1).Column - _
Range(ENumColumn & 1).Column
lastRow = pSheet.Range(ENumColumn & Rows.Count). _
End(xlUp).Row
Application.ScreenUpdating = False ' improve performance
For OLC = lastRow To 2 Step -1
Set baseCell = pSheet.Range(ENumColumn & OLC)
rOffset = -1
maxHours = baseCell.Offset(0, offset2Hrs)
For ILC = OLC - 1 To 2 Step -1
If baseCell.Offset(rOffset, 0) = baseCell And _
baseCell.Offset(rOffset, offset2JCode) = _
baseCell.Offset(0, offset2JCode) And _
baseCell.Offset(rOffset, offset2PCode) = _
baseCell.Offset(0, offset2PCode) Then
If baseCell.Offset(rOffset, offset2Hrs) > _
maxHours Then
maxHours = baseCell.Offset(rOffset, offset2Hrs)
baseCell.Offset(0, offset2Hrs) = 0 ' mark for delete
Else
baseCell.Offset(rOffset, offset2Hrs) = 0 ' mark for delete
End If
End If
rOffset = rOffset - 1
Next
Next
'now delete rows with 0 in the Hours column
For OLC = lastRow To 2 Step -1
If pSheet.Range(HrsColumn & OLC) = 0 Then
pSheet.Range(HrsColumn & OLC).EntireRow.Delete
End If
Next
Set baseCell = Nothing
Set pSheet = Nothing
End Sub


ShagNasty said:
Sorry if in wrong Discussion group...

Spreadsheet with columns A thru K -- columns A, E, G, & J have employee data
(name, pay code, pay period, and hours respectively). Frequent pay
adjustments are made to many employee’s time during a given pay period. I
need to delete all records (rows), but the highest hour total, for the
employee, pay code, and pay period. MS Office SP3, Win XP, Approx 20k rows.

EmpName ENum JCode JDesc PayCode PDesc PayPeriod Hours
Emp A 0000A ABC ABC 055 OT 01/25/09 5
Emp A 0000A ABC ABC 055 OT 01/25/09 7.5
Emp A 0000A ABC ABC 065 ST 01/25/09 8
Emp A 0000A ABC ABC 065 ST 01/25/09 4
Emp B 0000B NBC NBC 055 OT 02/25/09 3
Emp B 0000B NBCN NBC 055 OT 02/25/09 5
Emp B 0000B NBC NBC 055 OT 02/25/09 16

I need to retain rows 3 (7.5), 4 (8), and 8 (16)

Thanks.. ShagNasty
 
J

Joel

There is a small problem with both my code and JLathem code that when an
employee has the same number of hours on two rows the code will leave both
rows. i think you only want one row left. Make this change in my code

from
"=IF(AND(A2=A3,E2=E3,G2=G3,H2>H3),""X"","""")"
to
"=IF(AND(A2=A3,E2=E3,G2=G3,H2>=H3),""X"","""")"
 
J

JLatham

HA! You're right and I even thought about that before writing the code. In
mine, the line that reads:
If baseCell.Offset(rOffset, offset2Hrs) > _
maxHours Then

should have been

If baseCell.Offset(rOffset, offset2Hrs) >= _
maxHours Then
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top