Ever wondered how to work with large text imports >256 columns?

L

Lars Uffmann

Since I found a rather nice solution to a problem I had, I wanted to
share it - as is:

I needed to import a rather large (ca. 50,000 rows) text file, which had
a good bit over 256 columns, but only needed a few of those columns in
an Excel sheet, then export most of the rest to a text file again, but
without *some* columns.

To "trick" Excel into doing the file reading for me, I used
Workbooks.OpenText but instead of giving it the true column separator
(tab), I told the function Tab:=False and Semicolon:=True (using a
separator that did not occur at all in my data).

That way, Excel read the whole file at once, into column 1 of a new
workbook. 1-liner instead of bothering to do all the file reading
yourself :)

Then, for the columns I needed in an Excel Worksheet, I wrote a little
VBA function to get those columns from a string that contains a whole
tab-separated (or other separator) row, as stored in column 1 now.

I then looped over the number of rows and assigned the return value of
my function to the cell content in column 1 of my (new) target
worksheet. Now you can use the Excel Range.TextToColumns function on
that sheet and you got the first part of the job done.

For the data I wanted to re-export, I wrote another function that saves
a text file *directly* from the imported worksheets column 1, by looping
over each row, re-assembling the line without the columns I wanted to
remove, and then saving to a text file.

Done. I processed 266 columns, 49195 rows, extracting the first 11
columns to an Excel Sheet, re-exporting all but 1 column to a text file,
on a 2.21 Dual Core System with 2GB RAM and WinXP SP2 within about 16
seconds, that includes opening the original text file (28.9MB) and
saving both the new Workbook and the re-export of the text without an
unwanted column.

For the code used in my case, see below if interested.

HTH someone!

Lars


VBA Code used (might need some adaptation for your needs):

' Function: getFirstXColumns
' Purpose: return the first colNumber tab-separated values from _
' strLine, including the trailing tabs
' Example Usage (returns "abc" & vbTab & "def" & vbTab):
' columnValues = getFirstXColumns ("abc" & vbTab & "def" & vbTab _
' & "ghi" & vbTab, 2)
Public Function getFirstXColumns(strLine As String, _
colNumber As Integer) As String
Dim col As Integer
Dim pos As Long

pos = 0
For col = 1 To colNumber
pos = InStr(pos + 1, strLine, vbTab)
If (pos = 0) Then
getFirstXColumns = strLine
Exit Function
End If
Next col

getFirstXColumns = Left(strLine, pos)
End Function

' Function: saveTextWithoutColumn
' Purpose: consider column 1 of ws a tab-separated table, determine _
' the position of the column indicated by colName and save the _
' table to the indicated file without that column
' Example Usage:
' saveTextWithoutColumn ws, "myfilename.txt", "unwanted column name"
Private Sub saveTextWithoutColumn(ws As Worksheet, file As String, _
colName As String)
Dim buffer As String
Dim col As Long, colNumber As Long
Dim pos As Long, endpos As Long
Dim lastRow As Long, row As Long
Dim fileHandle As Integer
Dim skipColStart As Long, skipColEnd As Long

buffer = ws.Cells(1, 1).value

pos = 1
endpos = 1
col = 0
Do While (endpos > 0) ' while a new col separator was found (vbTab)
endpos = InStr(pos, buffer, vbTab)
If (endpos > 0) Then
col = col + 1
If (Mid(buffer, pos, endpos - pos) = colName) Then
Exit Do
End If
Else
col = 0
End If
pos = endpos + 1
Loop
colNumber = col

' Determine the highest used row number in the imported data
lastRow = ws.Cells.Find("*", SearchOrder:=xlByRows, _
SearchDirection:=xlPrevious).row

fileHandle = FreeFile()

On Error Resume Next
Kill file
On Error GoTo 0

Open file For Binary Access Write As #fileHandle

' if-statement wrapped around whole loop in case of non-existant
' column for speed optimisation
If colNumber = 0 Then
For row = 1 To lastRow
' attach line break to line and write to file
buffer = ws.Cells(row, 1).value & lineBreak
Put #fileHandle, , buffer
Next row
Else
For row = 1 To lastRow
buffer = ws.Cells(row, 1).value ' get current line

' determine start and end positions of column to remove
pos = 0
For col = 1 To colNumber - 1
pos = InStr(pos + 1, buffer, vbTab)
If (pos = 0) Then Exit For
Next col
skipColStart = pos + 1
skipColEnd = InStr(skipColStart, buffer, vbTab)

' assemble & write line to file without column to be skipped
buffer = Left(buffer, skipColStart - 1) _
& Mid(buffer, skipColEnd + 1) & lineBreak
Put #fileHandle, , buffer
Next row
End If

Close #fileHandle
End Sub


Sub handleData(path As String, filename As String)
Dim wb As Workbook
Dim ws As Worksheet
Dim wsNew As Worksheet
Dim lastRow As Long
Dim row As Long

Workbooks.OpenText path & filename & ".txt", Origin:=xlMSDOS, _
DataType:=xlDelimited, Tab:=False, Semicolon:=True, _
DecimalSeparator:=".", ThousandsSeparator:=","

Set wb = Workbooks(filename & ".txt")
Set ws = wb.Worksheets(filename)

' Determine the highest used row number in the imported data
lastRow = ws.Cells.Find("*", SearchOrder:=xlByRows, _
SearchDirection:=xlPrevious).row

Set wsNew = wb.Worksheets.Add()
wsNew.Name = "my data"

For row = 1 To lastRow
' first 11 columns wanted in my case - might differ for you
wsNew.Cells(row, 1).value = getFirstXColumns( _
ws.Cells(row, 1).value, 11)
Next row

wsNew.Range(wsNew.Cells(1, 1), _
wsNew.Cells(lastRow, 1)).TextToColumns DataType:=xlDelimited, _
Tab:=True, DecimalSeparator:=".", ThousandsSeparator:=","

saveTextWithoutColumn ws, "myfilename.txt", "unwanted column name"

Application.DisplayAlerts = False
ws.Delete
Set ws = Nothing

wb.SaveAs path & wsNew.Name & ".xls", xlNormal
Application.DisplayAlerts = True
wb.Close

Set wsNew = Nothing
Set wb = Nothing
End Sub
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top