Trying to import several million rows- help!

D

-=Drew

All-

I have a gigantic file that I need to do some data mining into. It's about
70,000 pages of flat data records (my best guess- haven't made it to the
bottom of the file yet...it's 332 meg- :) ) Word can barely handle it (takes
over 30 mins), wordpad & notepad can't hang at all. I can't cut & paste when
I open the file in word, so severing the file into easily manageable sections
isnt' an option.

I'm trying to import it in a linebyline record format.
I tried the posted macro for importing more than 65K records- it works well
but taps out at 1.04 million records. Unfortunately, i need to import all of
them, and probably will end up needing about 3x that many to be imported (yes
friends, over 3 million records.)

I can't tell whether the error is a system resource error or if it's some
limitation in excel.


Any suggestions?

Your country owes the one who helps me a debt of gratitude.
-=Drew
 
F

Frank Kabel

Hi
suggestion: Don't use Excel for this :)
Try using a databae which is able to handle this amount of data.
Especially if you're going into serious data mining Excel is NOT the
right tool.
 
J

JE McGimpsey

-=Drew said:
I can't tell whether the error is a system resource error or if it's some
limitation in excel.

XL worksheets are limited to 65536 rows. You could use a macro to import
3M rows into 46 worksheets.
Any suggestions?

Use a database application. You've got a large database. Using a
spreadsheet to manage it doesn't make much sense.
 
A

Alan

-=Drew said:
All-

I have a gigantic file that I need to do some data mining into. It's
about 70,000 pages of flat data records (my best guess- haven't made
it to the bottom of the file yet...it's 332 meg- :) ) Word can
barely handle it (takes over 30 mins), wordpad & notepad can't hang
at all. I can't cut & paste when I open the file in word, so severing
the file into easily manageable sections isnt' an option.

I'm trying to import it in a linebyline record format.
I tried the posted macro for importing more than 65K records- it
works well but taps out at 1.04 million records. Unfortunately, i
need to import all of them, and probably will end up needing about 3x
that many to be imported (yes friends, over 3 million records.)

I can't tell whether the error is a system resource error or if it's
some limitation in excel.


Any suggestions?

Hi,

I won't re-iterate the sound advice that Excel is totally
inappropriate for what you are trying to achieve ... or perhaps I just
did.

However, in relation to splitting up a big Word document - that should
be possible.

Steps you could try (assuming you have a back up copy so no risk here
except wasting time):


1) Copy the file - Call the two parts 1.doc and 2.doc

2) Open 1.doc and delete approximately half the pages from the middle
to the end. Close and save.

3) Open 2.doc and delete approximately half the pages from the
beginning to the middle. Close and save.

4) Repeat (1) to (3) for each new file (1.doc and 2.doc) substituting
sensible names each time until your individual files are down to
manageable sizes.


HTH,

Alan.
 
L

Lady Layla

Why not do the data mining in a database environment? Word is definately not
made to do this type of function and the file is too large for Excel. Have you
tried Access?



: All-
:
: I have a gigantic file that I need to do some data mining into. It's about
: 70,000 pages of flat data records (my best guess- haven't made it to the
: bottom of the file yet...it's 332 meg- :) ) Word can barely handle it (takes
: over 30 mins), wordpad & notepad can't hang at all. I can't cut & paste when
: I open the file in word, so severing the file into easily manageable sections
: isnt' an option.
:
: I'm trying to import it in a linebyline record format.
: I tried the posted macro for importing more than 65K records- it works well
: but taps out at 1.04 million records. Unfortunately, i need to import all of
: them, and probably will end up needing about 3x that many to be imported (yes
: friends, over 3 million records.)
:
: I can't tell whether the error is a system resource error or if it's some
: limitation in excel.
:
:
: Any suggestions?
:
: Your country owes the one who helps me a debt of gratitude.
: -=Drew
 
H

hgrove

-=Drew wrote...
...
I know that Excel is not the optimal tool- my problem is that I
need to go step by step to get the data into a DB- ultimately a
SQL w/ Access Front End interface.

As I said before, I would love to chop it into sequential pieces
as it would make the data much easier to handle, but MS Word
takes about 3.5 hours to open this sucker, and the cut/paste
options are grey'ed out when i try to hack it into 10000 page
sections. (FYI- it's so big, I haven't had the patience to see how
long it is, but it's probably between 70,000-100,000 pages of
flat text records)

1. Can anyone recommend a more powerful text editor than
Word to try out? If i'm able to parse it into smaller files- i.e.
those of less than 1.048 million lines, then i'm in business.

Don't use any text editor. Don't use any nonbatch software at all. Thi
is a old fashioned DP task, and an old fashioned DP is best.

If you can use other software, consider downloading an installing th
GnuWin32 text utilities,

http://sourceforge.net/project/showfiles.php?group_id=23617&package_id=26492

which includes one named split which provides may options for splittin
any sort of file. Piece of cake splitting a large text file at eac
formfeed or ever N lines.
2. Can anyone recommend/write a better script to enter the
data into Excel?

If the file takes forever to load into Word, and Word doesn't let yo
do anything useful with it, Excel would be MUCH, MUCH worse. That said
you could use something like this.


Sub foo()
Const YOUR_FILENAME As String = "d:\tmp\deleteme.prn"
Const YOUR_LINES_PER_FILE As Long = 100

Dim ifd As Integer, ofd As Integer
Dim k As Long, n As Long, s As String

ifd = FreeFile
Open YOUR_FILENAME_HERE For Input As #ifd

Do While Not EOF(ifd)

Line Input #ifd, s
n = n + 1

If n Mod YOUR_LINES_PER_FILE = 1 Then
If k > 0 Then Close #ofd
k = k + 1
ofd = FreeFile
Open YOUR_FILENAME & Format(k, "\.0000") For Output As #ofd
End If

Print #ofd, s

Loop

Close #ofd
Close #ifd

End Su
 
J

Jamie Collins

-=Drew wrote ...
I know that Excel is not the optimal tool- my problem is that I need to go
step by step to get the data into a DB- ultimately a SQL w/ Access Front End
interface.

You may be able to use ADO to query the text file to get meaningful
in-memory *sets* of data:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnclinic/html/scripting03092004.asp

Row-by-row processing must surely be impractical. How long have you
forecasted it will take you to step through 3 million rows? Are you
planning to transform you data with 3 million cell formulas? Don't hit
F9 <g>.

Jamie.

--
 
A

Alan

-=Drew said:
All-

Thank you for the quick responses.
I know that Excel is not the optimal tool- my problem is that I need
to go step by step to get the data into a DB- ultimately a SQL w/
Access Front End interface.

As I said before, I would love to chop it into sequential pieces as
it would make the data much easier to handle, but MS Word takes about
3.5 hours to open this sucker, and the cut/paste options are grey'ed
out when i try to hack it into 10000 page sections. (FYI- it's so
big, I haven't had the patience to see how long it is, but it's
probably between 70,000-100,000 pages of flat text records)

1. Can anyone recommend a more powerful text editor than Word to try
out? If i'm able to parse it into smaller files- i.e. those of less
than 1.048 million lines, then i'm in business.
2. Can anyone recommend/write a better script to enter the data into
Excel?

What was the outcome o trying to use Access?

Did it work?

Alan.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top