Sorting data into a new column

P

pbc444

I have three large sets of data (each column has up to 50,000 rows). The
data are names (text) not numbers. I would like to create a fourth data
set comprised of the names that are present in ALL three original
columns. Basically, if a name is on list one, two, AND three, then it
is placed on the fourth list. If the name is on just one or two of the
three, then it will not be placed on the fourth list. Can I complete
this sorting in Excel automatically? The datasets are too large to
manipulate manually. Thanks.
 
P

pbc444

Hi, thanks for the help. Here is an example that I hope will clarify.
Imagine three columns. In Column 1 there are three names (Peter, Paul,
Mary), with one name in each cell (A1, A2, A3). In Column 2, there are
four names (Paul, Mary, Luke, John) in cells B1, B2, B3, and B4,
respectively. Finally, in Column 3 there are three names (Paul, Mary,
James), in cells C1, C2, C3.

Given the above, I would like to sort the three lists into a new column
4, which would only contain that names that are present in all the other
columns. For this example, Column 4 would contain Paul and Mary in
cells D1 and D2, as those two names appear in Columsn 1-3.

Thanks again for the help. I really appreciate it.
 
D

Dave Peterson

Pick out the column with the shortest list of names (I'm gonna use A)

Then insert 4 columns next to that column (New columns B:E)

And say the other 3 columns shifted to the right to f:h.

Then in B2 (headers in row 1)
=isnumber(match(a2,f:f,0))

In C2:
=isnumber(match(a2,g:g,0))

In D2:
=isnumber(match(a2,h:h,0))

In E2:
=countif(b2:d2,true)

Drag all those formulas down the length of column A.

Let excel recalculate
(go get a cup or two of coffee with 50k rows!)

Select columns B:E
edit|copy
edit|paste special|values
(It'll make the next step quicker)

Select column E
data|filter|autofilter

Filter to show just the 3's (it's in column A and 3 appears 3 times in B, C and
D.

Select column A
edit|goto|special|visible cells only
edit|copy
and paste to a new worksheet.

That calculation could take a while. You may want to drag down in smaller
increments.
 
A

Alex

Here is another way. Assuming your data is in columnes A, B, and C and
that the first row is a column title of some kind, I got this to work.

In column D, put the following formula into D1. Drag it down to as far
as you think you will ever have data.

=IF(AND((COUNTIF(B:B,A2)+COUNTIF(C:C,A2))=2,A2<>""),A2,"")

Turn the first row titles into autofilter.

Go to the filter for column D and sleect "NonBlanks".

Voila!

Alex.
 
D

Dave Peterson

What happens if the name appears in column B twice, but 0 times in column C.

And where's the 4th column <vbg>?

=IF(AND((COUNTIF(B:B,A2)>0,COUNTIF(C:C,A2)>0,COUNTIF(d:d,A2)>0,A2<>""),A2,"")

But with giant lists (50000 rows), I've always seen =countif() much slower than
=match()

=if(and(isnumber(match(a2,b:b,0)),
isnumber(match(a2,c:c,0)),
isnumber(match(a2,d:d,0))),a2,"")

might be another approach.
 
A

Alex

I thought that excel could go to 32000+- rows anyway. I would be
looking to put this in access or something anyway.

Alex.
 
D

Dave Peterson

Excel can go 32000+. All the way to 65536 rows.
I thought that excel could go to 32000+- rows anyway. I would be
looking to put this in access or something anyway.

Alex.
 
P

pbc444

Thanks to everyone for the help. I actually ended up using an array to
extract common values from two lists, and then did an array of the
array output.

=IF(COUNTIF($A$1:$A$10,B1)>0,B1,"")

It nearly melted my laptop, but did work. I agree that data sets this
large shouldn't be in Excel.

--Peter
 
Top