Remove lines from text file prior to importing in Excel

A

Aaron.*

I've scoured the usual sources, but haven't found any existing code
that will accomplish what I need, so here's my situation.

I've got a large text file from my company's mainframe. It's around
120,000 lines, with each record from a particular database taking up
several lines (generally 8 to 16). Of those lines, only 4 of them are
needed for the output file I need to produce. Since there is an
import limit on Excel 2003, I'd like to remove the extraneous lines
prior to the actual opening of the file in Excel. It should be pretty
easy to isolate which lines to delete:

Example data:

RECORD1 11 DATA-01 DATA-25 DATA-49
RECORD1 14 DATA-02 DATA-26 DATA-50
RECORD1 17 DATA-03 DATA-27 DATA-51
RECORD1 20 DATA-04 DATA-28 DATA-52
RECORD1 31 DATA-05 DATA-29 DATA-53
RECORD1 34 DATA-06 DATA-30 DATA-54
RECORD1 37 DATA-07 DATA-31 DATA-55
RECORD1 40 DATA-08 DATA-32 DATA-56
RECORD2 11 DATA-09 DATA-33 DATA-57
RECORD2 14 DATA-10 DATA-34 DATA-58
RECORD2 17 DATA-11 DATA-35 DATA-59
RECORD2 20 DATA-12 DATA-36 DATA-60
RECORD2 31 DATA-13 DATA-37 DATA-61
RECORD2 34 DATA-14 DATA-38 DATA-62
RECORD2 37 DATA-15 DATA-39 DATA-63
RECORD2 40 DATA-16 DATA-40 DATA-64
RECORD3 11 DATA-17 DATA-41 DATA-65
RECORD3 14 DATA-18 DATA-42 DATA-66
RECORD3 17 DATA-19 DATA-43 DATA-67
RECORD3 20 DATA-20 DATA-44 DATA-68
RECORD3 31 DATA-21 DATA-45 DATA-69
RECORD3 34 DATA-22 DATA-46 DATA-70
RECORD3 37 DATA-23 DATA-47 DATA-71
RECORD3 40 DATA-24 DATA-48 DATA-72

Basically, the 9th and 10th characters of a row would determine
whether or not I keep the line or delete them.

I know how to do this in VBA (Basically a match using If
Mid(Cells(currow, 1), 9, 2) <> "[Value to Match]" And etc...), but I
can't figure out how to manipulate the text prior to importing to
avoid that 65,000 row limit.

Any help would be appreciated.

Thanks!
 
A

Aaron.*

I guess I asked too soon. I found some code on-line and was able to
put it together in a form that would work for my uses. Actually
pretty straight forward.

Sub DeleteExtraLines(SourceFile As String)
Dim TargetFile As String, tLine As String
Dim i As Long, F1 As Integer, F2 As Integer

TargetFile = "RESULT.TMP"
If Dir(SourceFile) = "" Then Exit Sub
If Dir(TargetFile) <> "" Then
On Error Resume Next
Kill TargetFile
On Error GoTo 0
If Dir(TargetFile) <> "" Then
MsgBox TargetFile & _
" already open, close and delete / rename the file and
try again.", _
vbCritical
Exit Sub
End If
End If
F1 = FreeFile
Open SourceFile For Input As F1
F2 = FreeFile
Open TargetFile For Output As F2
i = 1 ' line counter
Application.StatusBar = "Reading data from " & _
TargetFile & " ..."
While Not EOF(F1)
If i Mod 100 = 0 Then Application.StatusBar = _
"Reading line #" & i & " in " & _
TargetFile & " ..."
Line Input #F1, tLine
If Mid(tLine, 9, 3) = "009" Then 'This is where you enter the
clause for which lines you want to copy, in my case, any line where
the 9-11 characters are "009"
Print #F2, tLine
End If
i = i + 1
Wend
Application.StatusBar = "Closing files ..."
Close F1
Close F2
Kill SourceFile ' delete original file
Name TargetFile As SourceFile ' rename temporary file
Application.StatusBar = False
End Sub

Hope this helps someone else in the future.

I've scoured the usual sources, but haven't found any existing code
that will accomplish what I need, so here's my situation.
I've got a large text file from my company's mainframe.  It's around
120,000 lines, with each record from a particular database taking up
several lines (generally 8 to 16).  Of those lines, only 4 of them are
needed for the output file I need to produce.  Since there is an
import limit on Excel 2003, I'd like to remove the extraneous lines
prior to the actual opening of the file in Excel.  It should be pretty
easy to isolate which lines to delete:
Example data:
RECORD1 11 DATA-01 DATA-25 DATA-49
RECORD1 14 DATA-02 DATA-26 DATA-50
RECORD1 17 DATA-03 DATA-27 DATA-51
RECORD1 20 DATA-04 DATA-28 DATA-52
RECORD1 31 DATA-05 DATA-29 DATA-53
RECORD1 34 DATA-06 DATA-30 DATA-54
RECORD1 37 DATA-07 DATA-31 DATA-55
RECORD1 40 DATA-08 DATA-32 DATA-56
RECORD2 11 DATA-09 DATA-33 DATA-57
RECORD2 14 DATA-10 DATA-34 DATA-58
RECORD2 17 DATA-11 DATA-35 DATA-59
RECORD2 20 DATA-12 DATA-36 DATA-60
RECORD2 31 DATA-13 DATA-37 DATA-61
RECORD2 34 DATA-14 DATA-38 DATA-62
RECORD2 37 DATA-15 DATA-39 DATA-63
RECORD2 40 DATA-16 DATA-40 DATA-64
RECORD3 11 DATA-17 DATA-41 DATA-65
RECORD3 14 DATA-18 DATA-42 DATA-66
RECORD3 17 DATA-19 DATA-43 DATA-67
RECORD3 20 DATA-20 DATA-44 DATA-68
RECORD3 31 DATA-21 DATA-45 DATA-69
RECORD3 34 DATA-22 DATA-46 DATA-70
RECORD3 37 DATA-23 DATA-47 DATA-71
RECORD3 40 DATA-24 DATA-48 DATA-72
Basically, the 9th and 10th characters of a row would determine
whether or not I keep the line or delete them.
I know how to do this in VBA (Basically a match using If
Mid(Cells(currow, 1), 9, 2) <> "[Value to Match]" And etc...), but I
can't figure out how to manipulate the text prior to importing to
avoid that 65,000 row limit.
Any help would be appreciated.

Aaron,

As far as I am aware, there is no easy way of doing this, since the
import routines supplied with Excel do not allow for the exclusion of
specified records - only skipping fields.

I suspect that you will have to write your own VBA subroutine to
import the records selectively. That is what I did when I was faced
with a similar problem a couple of years ago.

If you need further information, I will gladly supply a code fragment
that you can modify to suit your own needs.

JAC- Hide quoted text -

- Show quoted text -
 
R

ryguy7272

Glad you got it working! I often post, and then find a solution right
afterward. Anyway, here's a generic solution for importing a large *.txt
file, for you for next time, or for anyone else reading this post:

Sub LargeFileImport()
'Dimension Variables
Dim ResultStr As String
Dim FileName As String
Dim FileNum As Integer
Dim Counter As Double
Dim mySht As Worksheet
'Ask User for File's Name
FileName = Application.GetOpenFilename
'Check for no entry
If FileName = "" Then End
'Get Next Available File Handle Number
FileNum = FreeFile()
'Open Text File For Input
Open FileName For Input As #FileNum
'Turn Screen Updating Off
Application.ScreenUpdating = False
'Create A New WorkBook With One Worksheet In It
Workbooks.Add Template:=xlWorksheet
'Set The Counter to 1
Counter = 1
'Loop Until the End Of File Is Reached
Do While Seek(FileNum) <= LOF(FileNum)
'Display Importing Row Number On Status Bar
Application.StatusBar = "Importing Row " & _
Counter & " of text file " & FileName
'Store One Line Of Text From File To Variable
Line Input #FileNum, ResultStr
'Store Variable Data Into Active Cell
If Left(ResultStr, 1) = "=" Then
ActiveCell.Value = "'" & ResultStr
Else
ActiveCell.Value = ResultStr
End If
'For xl95 change 65536 to 16384
If ActiveCell.Row = 65536 Then
'If On The Last Row Then Add A New Sheet
' and copy the header row from the last sheet
Set mySht = ActiveSheet
ActiveWorkbook.Sheets.Add
mySht.Cells(1, 1).EntireRow.Copy ActiveSheet.Cells(1, 1).EntireRow
Application.CutCopyMode = False
ActiveCell.Offset(1, 0).Select
Else
'If Not The Last Row Then Go One Cell Down
ActiveCell.Offset(1, 0).Select
End If
'Increment the Counter By 1
Counter = Counter + 1
'Start Again At Top Of 'Do While' Statement
Loop
'Close The Open Text File
Close
'Remove Message From Status Bar
Application.StatusBar = False
End Sub

--
Ryan---
If this information was helpful, please indicate this by clicking ''Yes''.


Aaron.* said:
I guess I asked too soon. I found some code on-line and was able to
put it together in a form that would work for my uses. Actually
pretty straight forward.

Sub DeleteExtraLines(SourceFile As String)
Dim TargetFile As String, tLine As String
Dim i As Long, F1 As Integer, F2 As Integer

TargetFile = "RESULT.TMP"
If Dir(SourceFile) = "" Then Exit Sub
If Dir(TargetFile) <> "" Then
On Error Resume Next
Kill TargetFile
On Error GoTo 0
If Dir(TargetFile) <> "" Then
MsgBox TargetFile & _
" already open, close and delete / rename the file and
try again.", _
vbCritical
Exit Sub
End If
End If
F1 = FreeFile
Open SourceFile For Input As F1
F2 = FreeFile
Open TargetFile For Output As F2
i = 1 ' line counter
Application.StatusBar = "Reading data from " & _
TargetFile & " ..."
While Not EOF(F1)
If i Mod 100 = 0 Then Application.StatusBar = _
"Reading line #" & i & " in " & _
TargetFile & " ..."
Line Input #F1, tLine
If Mid(tLine, 9, 3) = "009" Then 'This is where you enter the
clause for which lines you want to copy, in my case, any line where
the 9-11 characters are "009"
Print #F2, tLine
End If
i = i + 1
Wend
Application.StatusBar = "Closing files ..."
Close F1
Close F2
Kill SourceFile ' delete original file
Name TargetFile As SourceFile ' rename temporary file
Application.StatusBar = False
End Sub

Hope this helps someone else in the future.

I've scoured the usual sources, but haven't found any existing code
that will accomplish what I need, so here's my situation.
I've got a large text file from my company's mainframe. It's around
120,000 lines, with each record from a particular database taking up
several lines (generally 8 to 16). Of those lines, only 4 of them are
needed for the output file I need to produce. Since there is an
import limit on Excel 2003, I'd like to remove the extraneous lines
prior to the actual opening of the file in Excel. It should be pretty
easy to isolate which lines to delete:
Example data:
RECORD1 11 DATA-01 DATA-25 DATA-49
RECORD1 14 DATA-02 DATA-26 DATA-50
RECORD1 17 DATA-03 DATA-27 DATA-51
RECORD1 20 DATA-04 DATA-28 DATA-52
RECORD1 31 DATA-05 DATA-29 DATA-53
RECORD1 34 DATA-06 DATA-30 DATA-54
RECORD1 37 DATA-07 DATA-31 DATA-55
RECORD1 40 DATA-08 DATA-32 DATA-56
RECORD2 11 DATA-09 DATA-33 DATA-57
RECORD2 14 DATA-10 DATA-34 DATA-58
RECORD2 17 DATA-11 DATA-35 DATA-59
RECORD2 20 DATA-12 DATA-36 DATA-60
RECORD2 31 DATA-13 DATA-37 DATA-61
RECORD2 34 DATA-14 DATA-38 DATA-62
RECORD2 37 DATA-15 DATA-39 DATA-63
RECORD2 40 DATA-16 DATA-40 DATA-64
RECORD3 11 DATA-17 DATA-41 DATA-65
RECORD3 14 DATA-18 DATA-42 DATA-66
RECORD3 17 DATA-19 DATA-43 DATA-67
RECORD3 20 DATA-20 DATA-44 DATA-68
RECORD3 31 DATA-21 DATA-45 DATA-69
RECORD3 34 DATA-22 DATA-46 DATA-70
RECORD3 37 DATA-23 DATA-47 DATA-71
RECORD3 40 DATA-24 DATA-48 DATA-72
Basically, the 9th and 10th characters of a row would determine
whether or not I keep the line or delete them.
I know how to do this in VBA (Basically a match using If
Mid(Cells(currow, 1), 9, 2) <> "[Value to Match]" And etc...), but I
can't figure out how to manipulate the text prior to importing to
avoid that 65,000 row limit.
Any help would be appreciated.

Aaron,

As far as I am aware, there is no easy way of doing this, since the
import routines supplied with Excel do not allow for the exclusion of
specified records - only skipping fields.

I suspect that you will have to write your own VBA subroutine to
import the records selectively. That is what I did when I was faced
with a similar problem a couple of years ago.

If you need further information, I will gladly supply a code fragment
that you can modify to suit your own needs.

JAC- Hide quoted text -

- Show quoted text -

.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top