How to sort/update large excel db

C

ConfusedNovice

Hi - this is way out of my league - don't know what
questions to ask, not even sure I'll understand the
answers, but I've gotta try, & there's a short time frame.
Here's the situation:

I need to work on a large (>15,000) client list. Updates
to existing records, adding new records in various
categories, & generating grouped lists. Most of the
records have one or more blank fields, & need to stay that
way.

I can't figure out how to sort/filter/update/add, and
still preserve the blank fields. The data seems to shift
into the wrong records, screwing up the addresses when I
sort & update. And my computer hangs forever, then gives
truncated lists when I try to merge worksheets.

Please help me out with suggestions, preferably in short,
simple words. I've been struggling with this for almost 2
days now, & I'm totally frustrated. Thanks a lot!

Jane
 
F

Frank Kabel

Hi
select the entire data range (with all rows/columns) and use 'Data -
sort' to sort your data
 
I

icestationzbra

provide a layout of your sheet / data. that would help to analyse th
situation. some examples as to what is expected would also help, alon
with it
 
C

ConfusedNovice

Thanks!

Well, I have columns A-V, that's about 22 fields, with
normal name/address/phone/email, etc., plus codes
indicating type of client & status, various preferences,
dates first entered, & last modified (which I would like
to automate), & a free-form comments field. Each customer
record is a row. Some people have only last name & email,
others have info in many fields. Pretty much all text.

I've tried data sort, with some success in making lists by
state, zip code, status. The scarey part is adding data. I
tried adding or updating some records, & some of the last
names shuffled themselves on to the wrong records. The
errors I found were near records with empty cells in the
first name column. (Also, I'm wondering about the accuracy
of the existing records before this little project came to
me...)

Jane
 
D

Dave Peterson

If you sorted only A-F (say) and didn't sort G-V, then try to find a backup
version of your data.

Once you've saved the data in this mixed up manner, the only way to go back is
manually checking each one.

And make sure you select the whole range (all of the columns) when you do your
sort next time.
 
C

ConfusedNovice

Thanks - I'm working with copies, not the sacred master
file. I think I'm ok with sorting for now, probably
filters, too. (For basics, nothing fancy...) Selecting
everything seems to be the most important factor.

Here's the current issues:
I have some original records & a previous version of the
db. I suspect that some address scrambling already exists.
Can't figure out how to compare the 2 versions, nor how to
look for duplicate records within/between the files.

Also, I'd like to add/correct some records, but I'd prefer
to do them as a batch. I don't want to spend the time
manually searching on each one to verify it's not a
duplicate. How does the Merge locate records & decide
whether to change an existing or make a new one?

Almost ready to try macros, totally don't "get" the pivot
tables, successfully broke out several client lists. Not
the best way to learn this stuff.

Jane
 
D

Dave Peterson

You sorted by a key column (to get messed up).

If the entries were unique in that key column, you could use =vlookup() to
return the value for other fields in that row.

=Vlookup() requires that the key column be in the left most column of the table
(not always column A, but it wouldn't hurt if you put it there in the worksheet
that contains the original info.)

Then
=vlookup(j3,originalsheet!$a:$k,5,false)

If column J were the key column, then this will bring back the 5th column of the
original sheet (column E) when/if a match is found.

But if you find that you have tons of differences, you may find it easier to
start with the (copy of) the original and redo that work.
 
C

ConfusedNovice

Sorry about the time gap between posts, I had to go out of
town.

Are you saying that I can't ever use a key column as part
of a sort/filter, then update records in the resulting
subset, & finally return to the full database with updates
done & all records intact? If this is true, then how
*does* a person sort/filter the database to locate
particular records in need of edits & then fix only those
records? If I'm misunderstanding something, then please
set me straight, thanks!

I now have a couple more previous versions of the database
& a lot (but not nearly all) of various original paper
records. I've confirmed that there was pre-existing
corruption, & getting back to a clean db version is
unlikely. I want to undo damage done by others if
possible, do no further harm, & get the people a good
list.

This is a lot bigger than I expected, & I really apprecate
the help. I thought it would work to sort, filter, & edit.
If I'm back to square 1 or worse, I don't want to make any
more false assumptions.

Jane
 
D

Dave Peterson

I'm saying that if you have two "live" copies of the workbook, it'll be a pain
to merge one of them back into the other.

There's nothing built into excel that will force you to make sure your key
column entries are unique.

When I do this, I spend some time verifying that the data is at least
consistent--no additional entries with the same key were added. If you can
believe/verify this, then, yeah, you can do =vlookup()'s to merge the updated
values into the one real workbook.

It's just in my experience, avoiding this is usually better. (Just give the
file to someone else and tell them to return it when they're done--and put a
freeze on any other changes.)
 
C

ConfusedNovice

Still trying to understand what I've got. There is no
column with all unique entries. Every column has blank
cells someplace. On 2 of the earlier database versions,
the leftmost column (Column A) is set up as "Index
Column," but not all the records have an "Index" number in
this column - some of them are blank, & some have zeros in
Column A. The most recent master does not have Index
column, it starts with a Last Name field.

This brings me back to the question of how to prevent
names/values from slipping into the blank cells when
sorting & updating records. I have been reading a lot of
Excel info, but this is *not* my area at all, & I want to
fix these problems, not compound them. Should I be able to
identify key column(s)? If yes, how? And if no, is there
any safe way to sort when blanks exist in every column?
(Or can it be a Key & contain multiple blank cells?)

There are ~17,000 records, & when the wrong values move
into a blank cell, that record doesn't indicate that it
has been changed. (Probably because it was not opened?)

I'm seriously running out of time. I've got a pretty big
list of changes ready to enter directly on db records. I
also have isolated a few hundred recent entries by date,
verified them, & placed them in a separate worksheet.
(Hoping to append to a clean main list & sort into place.)
I don't think it's possible to verify every row.

Yes, no matter how you look at this, it's a pain! I'm not
really dumb, just a total rookie, & your help is
invaluable.
To summarize:
Whatsup with "Index Column"?
How to identify/designate Key columns?
Implications of having blank cells throughout sheet?
Fastest way to add/edit records if I go back to an older,
cleaner version of the database?

Thanks again!

Jane
 
D

Dave Peterson

Just to clarify, is your data laid out one record per row? You don't use
multiple rows to represent one "logical" record, do you?

If you do, then this will make it even more difficult.

But to match/merge two different worksheets, I would think that you'd need to
have a unique key/index into the data.

If you don't have this key, how do you match up your data to look for
differences?

I wouldn't put too much faith in having that unique key in the last name
column--but maybe you can use a couple of columns that could serve as that key.

For instance, if you had the lastname, firstname, address in 3 separate columns,
you could use a helper cell and do something like:
=A2&char(10)&b2&char(10)&c2
(The char(10) probably won't appear in your data and will serve as a field
delimiter--just in case there's records that might look identical when
combined.)

But even if you did this, you'd have to make sure that these fields weren't
changed. If any were changed, then you couldn't use them for a match between
worksheets.

To find out if the records in one worksheet are in the other (based on any
single column you can use--in your data or derived from your data):

=isnumber(match(a2,sheet1!a:a,0))
and drag down.
You'll see True if it appears on that other worksheet.

And use the equivalent to check the other worksheet.

Once you've found your Falses--you'll want to inspect them to see if they should
be cleansed--cleaned up to make them match.

You could end up fixing the data on either worksheet--just to find the matches.

Once you get rid of all the Falses, you can do the =vlookup() stuff to match
merge.

I'm not sure if all this work can be done on a quick schedule. This kind of
stuff usually means fix, check, fix, check....until you can't find any more
differences. Then you let someone else review it and it starts again.

Good luck,
 
Top