searching a large database with a long list of search terms

J

joe_d_builder

I have a worksheet with a column with about 3000 rows of info in it. I
also have six other worksheets completely full (65536 each) that I need
to search through a column and then when I find a row that matches an
entry in one of those rows paste that row next to the correct number in
the 3000 entries. I don't know how to write macros, only simple
formulas. Is there an easy way to do this?

This is confusing so as an example here's the one 3000 row worksheet

aaa bob 123
bbb june 345
ccc fred 876
ddd mary 765

and I want to find all the values in the first colum (aaa, bbb, ccc,
ddd) that show up in here (each of the 65536 row worksheets)...

ddd toronto
zzz chicago
aaa new york
mmm boise
bbb portland
ddd miami

and end up with something that looks like this:

aaa bob 123 new york
bbb june 345 portland
ccc fred 876 NO ENTRY
ddd mary 765 miami

Except that I need to do this with !hundreds of thousands! of rows so
it can't take a super long time. Notice that when it couldn't find a
matching entry it put "NO ENTRY" in there. That's important because
there might be instances where the search term doesn't show up.

Thanks so much folks, I really appreciate it.

Joe
 
D

David

Hi,
Just a little clarification for myself and others that may look at this. The
sheet with 3000 entries, these are unique entries? And this same sheet is
where you want to write to? Just to add a term for clarification, this would
be the "main" sheet. You would want to look up all entries from this "main"
sheet and find them on the other 6 sheets, where you would be fetching back
to the "main" sheet the city?

The other 6 sheets, where the lookup is taking place, there are not
duplicate lookup values, ie aaa, bbb, ccc would only have a single entry
somewhere on the six other sheets and only one city associated with each
lookup value?
 
J

joe_d_builder

Yes, these are unique. They appear (most of them anyway) in the other
sheet just as you described.

I messed around with using the advanced filter function and was able to
filter the results but I wasn't able to copy over the data back to the
"main" sheet. Plus I had to do each individual worksheet by itself and
then copy all the results back to the main sheet (and then they weren't
tied in with the original unique data).

Joe
 
P

Pete_UK

Joe,

In your example you have the code "ddd" twice - I'm assuming this is a
typo, and have changed it to "eee" below.

Is the data on the 6 sheets sorted in some way, and if not can it be
sorted by the first column? You example data would then look like this:

aaa new york
bbb portland
ddd toronto
eee miami
mmm boise
zzz chicago

This will make searching through the data to find a match much quicker.

Pete
 
B

Biff

Max, I'm wondering if you did a full application test on this. (I didn't
look at your sample file)

6 sheets with 65536 rows of data, 3000 rows of lookup values and 6 columns
of formulas.

What kind of calc time did that take?

Biff

Max said:
Perhaps one play to try ..

Sample construct available at:
http://cjoint.com/?bfl6QazB5P
VLookUp_6Sheets_joe_d_builder.xls

Assume data in the 6 sheets are in cols A and B, from row1 down
(key col = col A, "city" in col B)

Rename the 6 sheets to be simply the numbers: 1,2,3,4,5,6
(The renaming of the sheetnames to the numbers 1 - 6
is to allow us to easily fill the extract formulas in Master)

Then in sheet: Master
where the data is in cols A to C, with the key col = col A
aaa bob 123
bbb june 345
etc

Put in D1, copy across 6 cols to I1, fill down as far as required:
=IF($A1="","",IF(ISNA(MATCH($A1,INDIRECT("'"&COLUMN(A1)&"'!A:A"),0)),"NO
ENTRY",VLOOKUP($A1,INDIRECT("'"&COLUMN(A1)&"'!A:B"),2,0)))

Cols D to I will extract the "city" returns from each of the 6 sheets (1 -
6)
[ Col D = returns from sheet: 1, .. col I = returns from sheet: 6 ]
Unmatched cases will return "NO ENTRY"
 
B

Biff

Easy there, Max!

I'm not "knocking" anything.

It was just a question out of my own curiosity.
Btw, do you do full testing on detailed posts
like this one before you respond ?

Yes, I do. That's one of the reasons I didn't want to "tackle" this one! I
didn't want to fill 6 sheets with data! I have a custom toolbar button with
macro that generates random numeric values that I use for testing in "huge"
blocks of cells. I need to get something that does the same thing but
generates random text strings.

Biff
 
B

Biff

I'm not sure, but if I'm the OP, I'd definitely appreciate/prefer to
receive
and try out any ideas/suggestions than not to receive any response to try.
Whether the suggestion(s) ultimately work or not when plugged into the
"real-world" application, of course, is another matter which could be
followed-up in further feedback within the thread or as a fresh post. Just
my views ..

I agree completely.

Thanks for the text string pointers. I'll definitely check those out!

Biff
 
P

Pete_UK

Max,

The OP didn't get back to me when I asked about sorting the reference
data beforehand. Can you sort your random data in the 6 sheets then
re-apply your formula to take advantage of this to see if there is a
big increase in speed? In theory, the binary search technique applied
if the data is sorted should make a massive difference to 6 * 65536
entries.

Pete
 
M

Max

It wasn't exactly a controlled experiment earlier, Pete <g>
I didn't save the testfile, and think I might have probably underestimated
the timing a little. I set the calc mode to manual, filled the data to 65k,
then filled the formulas to 3k, and left for an oxygen break. I ended up
with a double. When I came back, the fills and calcs were done. Probably
better to await the OP's feedback on whether it worked out ok for him over
there.
 
B

Biff

If anyone is still following this thread I'll do some tests and post the
results. Stay tuned!

Biff
 
M

Max

Biff,

What would have been your suggestion to the OP,
had not the "volume" got in the way ?
 
M

Max

.. Let me give this a try.

Joe,

Could you drop a line or two here
on how the try went for you over there ?

Did it work when you applied it on a copy of your actual file?
Roughly how long did the calcs take to complete?

Thanks
 
B

Biff

Here are the results of 3 tests:

Computer specs:

Pentium P4, 2.0 ghz, 256 Mb ram, WinXP (all service packs, all patches),
Excel 2002 (XP) (all service packs)
Other than the operating system, Excel is the only app running.

File configuration: (based on the OPs description)

7 sheets total, 1 summary, 6 data

Summary sheet(1): 3 columns x 3000 rows. Lookup values in column A, A1:A3000
Data sheets(6): 2 columns x 65536 rows

Test 1 (based on the reply from Max)

File size (wo/formulas) - 26.6 Mb
File size (w/formulas) - 27.5 Mb

This formula was copied to 6 columns x 3000 rows:

=IF($A1="","",IF(ISNA(MATCH($A1,INDIRECT("'"&COLUMN(A1)&"'!A:A"),0)),"NO
ENTRY",VLOOKUP($A1,INDIRECT("'"&COLUMN(A1)&"'!A:B"),2,0)))

I was unable to copy/drag in a single operation. When I tried, Excel
"froze-up". I had to use Task Manager to regain control. Tried twice and
Excel "froze" both times. I had to drag copy in increments of ~200 rows at a
time. I didn't time this but to copy to all 3000 rows took at least an hour.
(calculation was on automatic) After all formulas were copied:

Calc time (data sheets unsorted) ~6:45 (m:ss)
Calc time (data sheets sorted ascending) ~3:30 (m:ss)

Test 2

Deleted all the above formulas, reset the used range.

Used this array formula copied to 1 column x 3000 rows:

=IF(SUMPRODUCT(COUNTIF(INDIRECT("'"&WSlist&"'!A:A"),A1)),VLOOKUP(A1,INDIRECT("'"&INDEX(WSlist,MATCH(TRUE,COUNTIF(INDIRECT("'"&WSlist&"'!A:A"),A1)>0,0))&"'!A:B"),2,0),"")

After all formulas were copied:

File size (w/formulas) - 27.2 Mb
Calc time (data sheets sorted ascending) ~3:35 (m:ss)

Test 3

Deleted all the above formulas. Decided to try a monster nested IF formula
but I hit the nested function limit so I split the formula into 2 cells. I
cell formula did the lookup on sheets 2,3,4. The other cell formula did the
lookup on sheets 5,6,7.

=IF(NOT(ISERROR(VLOOKUP(A1,Sheet2!A:B,2,0))),VLOOKUP(A1,Sheet2!A:B,2,0),IF(NOT(ISERROR(VLOOKUP(A1,Sheet3!A:B,2,0))),VLOOKUP(A1,Sheet3!A:B,2,0),IF(NOT(ISERROR(VLOOKUP(A1,Sheet4!A:B,2,0))),VLOOKUP(A1,Sheet4!A:B,2,0),"")))

=IF(D1="",IF(NOT(ISERROR(VLOOKUP(A1,Sheet5!A:B,2,0))),VLOOKUP(A1,Sheet5!A:B,2,0),IF(NOT(ISERROR(VLOOKUP(A1,Sheet6!A:B,2,0))),VLOOKUP(A1,Sheet6!A:B,2,0),IF(NOT(ISERROR(VLOOKUP(A1,Sheet7!A:B,2,0))),VLOOKUP(A1,Sheet7!A:B,2,0),"No
Entry"))),"")

After all formulas were copied:

File size (w/formulas) - 28.2 Mb
Calc time (data sheets sorted ascending) ~1 second

I did not test using unsorted data sheets in tests 2 and 3.

Conclusion:

Sorting the data can speed up calc time significantly in "large" files. The
use of 1000's of volatile functions should be avoided at all costs! Monster
formulas aren't all bad!

Comments/suggestions welcome!

Biff
 
B

Biff

See my test results.

I probably would've gone with what I did in my test 2.

As you can see, the calc times were pretty much the same (what I would've
done versus what you did). I don't like to use a lot of helper cells if I
don't have to. The monster formula was the hands-down winner, but who
"likes" monster formulas? They tend to scare people away!

Biff
 
M

Max

Comments/suggestions welcome!

My compliments on your effort, Biff. Amazing.
Sorting the data can speed up calc time significantly in "large" files.
The use of 1000's of volatile functions should be avoided at all costs!
Monster formulas aren't all bad!

I was unable to copy/drag in a single operation.

FWIW, my m/c was able to fill 6 C x 3000 R, over here.
(no freezing)

Btw, think there was a slight interp difference in that I presumed what the
OP would like to have was to match & extract separate returns from each of
the 6 sheets for all the 3000 lines. Then he could assess further what he
wants done.
I didn't assume any "precedence" order in the 6 data sheets.
 
M

Max

I probably would've gone with what I did in my test 2.

... the super complex array <g> ?
filled down only in 1 col x 3000 rows
(and which also contains the same volatile INDIRECT ?)

2 observations:

1. The returns are different, ref explanation in my response to you earlier
(pasted below), because of different interps of what the OP wanted. My
suggestion simply lines up all the returns from the 6 sheets for each of the
3000 lines in 6 cols. Yours return results in 1 col, with an implicit
precedence order assumed in the 6 data sheets. Nothing wrong there, just
different interps.
Btw, think there was a slight interp difference in that I presumed what the
OP would like to have was to match & extract separate returns from each of
the 6 sheets for all the 3000 lines. Then he could assess further what he
wants done.
I didn't assume any "precedence" order in the 6 data sheets.
As you can see, the calc times were pretty much the same ..

2. Wondering whether the calc times would be any different if say, your
array formula were to be modified to return similarly as mine the results in
6 cols ?

Just some thoughts, Biff <g>
 
M

Max

Better to have 6 similar formulas pointing to each of the 6 sheets

What's meant is simply ..

In D1:
=IF($A1="","",IF(ISNA(MATCH($A1,'1'!$A:$A,0)),"NO
ENTRY",VLOOKUP($A1,'1'!$A:$B,2,0)))

In E1:
=IF($A1="","",IF(ISNA(MATCH($A1,'2'!$A:$A,0)),"NO
ENTRY",VLOOKUP($A1,'2'!$A:$B,2,0)))

and so on in F1 till I1 (the same formula essentially but with the
sheetnames changed accordingly to '3', '4', '5', and '6')

D1:I1 is copied down 3000 rows
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top