Determine delimiter in csv or txt file

K

Kieran H

To all,

I would like to be able to programatically determine the delimiter
within a txt or csv file so that the user is not faced with the Excel
import text wizard..

The files come from several sources and the delimiter is likely to be
tab, comma or semicolon and less likely a pipe or tilda.

I'm thinking I need to count the number of each potential delimiter in
a given number of rows probably excluding those that occur within
quoted text.

If you have any thoughts or know of any examples they'd be a great
help

Cheers

Kieran
 
C

Chip Pearson

The following code will open the file and test for probable delimiters. If
Split returns an array of more than one element when splitting using a
specific character as the delimiter, that character is a probable delimiter
of the text on a single line of the input file. Change FName to the name of
your file, and change PossibleDelimiters to include all the characters that
might be a delimiter of the data.

Sub AAA()
Dim FNum As Integer
Dim FName As String
Dim Ndx As Long
Dim InputLine As String
Dim Arr As Variant
Dim C As String
Dim PossibleDelimiters As String

PossibleDelimiters = ",;|~" & vbTab

FName = "C:\Test.txt" '<<< CHANGE
FNum = FreeFile
Open FName For Input Access Read As #FNum
Line Input #FNum, InputLine
Close #FNum
For Ndx = 1 To Len(PossibleDelimiters)
C = Mid(PossibleDelimiters, Ndx, 1)
Arr = Split(InputLine, C)
If IsArray(Arr) = True Then
If UBound(Arr) - LBound(Arr) + 1 > 1 Then
Debug.Print "Likely Delimiter: ", C, Asc(C)
End If
End If
Next Ndx
End Sub


--
Cordially,
Chip Pearson
Microsoft MVP - Excel
Pearson Software Consulting, LLC
www.cpearson.com
(email address is on the web site)
 
T

Tom Ogilvy

this looks for the most frequently occuring non letter, non number character

Sub ReadStraightTextFile()
Dim sStr As String
Dim LineofText As String
Dim schr As String
Dim s(0 To 127) As Long
Dim sepChar As String, sepMax As Long
Dim sName As Variant, i As Long
Dim bStop As Boolean, s1 As String
sName = Application.GetOpenFilename(FileFilter:="CSV Files (*.csv),*.csv")
If sName = False Then Exit Sub
Open sName For Input As #1
sStr = ""
i = 1
Do While Not EOF(1)
If i > 2 Then Exit Do
Line Input #1, LineofText
sStr = LineofText
i = i + 1
Loop
Close #1
bStop = False
For i = 1 To Len(sStr)
schr = Mid(sStr, i, 1)
If schr = """" Then bStop = Not bStop
If Not bStop Then
s(Asc(schr)) = s(Asc(schr)) + 1
s1 = s1 & schr
End If
Next
sepMax = 0
sepChar = ","
For i = 9 To 127
' exclude double quote and space
If i <> 34 and i <> 32 Then
schr = Chr(i)
If Not IsNumeric(schr) Then
If UCase(schr) = LCase(schr) Then
If s(i) > sepMax Then
sepChar = schr
sepMax = s(i)
End If
End If
End If
End If
Next

MsgBox asc(sepChar) & ": " & sepChar
End Sub

It would be easily defeated by odd patterns

!!!!!!!!!!!!!!!!,!!!!!!!!!!!,///////////

and some not so odd patterns
1/1/2007,1/2/2007,1/3/2007

but if you have some knowledge of the possibilities you might be able to
craft it to a reasonable approach.
 
R

RB Smissaert

Haven't timed it, but this might be a faster way of doing this:


Sub test()
MsgBox Asc(GuessDelimiter("C:\DelimTest.txt"))
End Sub


Function GuessDelimiter(strFile As String) As String

Dim i As Byte
Dim n As Long
Dim x As Long
Dim z As Byte
Dim strText As String
Dim arrDelimiters(1 To 5) As String

strText = OpenTextFileToString(strFile)

arrDelimiters(1) = ","
arrDelimiters(2) = ";"
arrDelimiters(3) = "|"
arrDelimiters(4) = "~"
arrDelimiters(5) = vbTab

For i = 1 To 5
If i = 1 Then
n = CountChar(arrDelimiters(i), strText)
Else
n = CountChar(arrDelimiters(i), strText, x)
End If
If n > x Then
x = n
z = i
End If
Next i

If z > 0 Then
GuessDelimiter = arrDelimiters(z)
End If

End Function


Function OpenTextFileToString(ByVal strFile As String) As String

Dim hFile As Long

'obtain file handle, open file and load into a string buffer
hFile = FreeFile

Open strFile For Input As #hFile

OpenTextFileToString = Input$(LOF(hFile), hFile)

Close #hFile

End Function


Function CountChar(strChar As String, _
strString As String, _
Optional lStopAtCount As Long = -1) As Long

Dim lPos As Long
Dim n As Long

lPos = InStr(1, strString, strChar, vbBinaryCompare)

If lPos = 0 Then
CountChar = 0
Exit Function
End If

If lStopAtCount = -1 Then
Do Until lPos = 0
lPos = InStr(lPos + 1, strString, strChar, vbBinaryCompare)
n = n + 1
Loop
Else
Do Until lPos = 0 Or n > lStopAtCount
lPos = InStr(lPos + 1, strString, strChar, vbBinaryCompare)
n = n + 1
Loop
End If

CountChar = n

End Function


You might make it a bit faster by altering the order of the delimiters in
arrDelimiters.


RBS
 
R

RB Smissaert

Yes, I can see it makes sense to only look at the first line of the file for
several reasons.

So, this would be better compared to my old function:

Function GuessDelimiter(strFile As String) As String

Dim i As Byte
Dim n As Long
Dim x As Long
Dim z As Byte
Dim hFile As Long
Dim strFirstLine As String
Dim arrDelimiters(1 To 5) As String

hFile = FreeFile

Open strFile For Input Access Read As #hFile
Line Input #hFile, strFirstLine
Close #hFile

arrDelimiters(1) = "|"
arrDelimiters(2) = "~"
arrDelimiters(3) = ";"
arrDelimiters(4) = vbTab
arrDelimiters(5) = ","

For i = 1 To 5
If i = 1 Then
n = CountChar(arrDelimiters(i), strFirstLine)
Else
n = CountChar(arrDelimiters(i), strFirstLine, x)
End If
If n > x Then
x = n
z = i
End If
Next i

If z > 0 Then
GuessDelimiter = arrDelimiters(z)
End If

End Function


RBS
 
K

Kieran H

Haven't timed it, but this might be a faster way of doing this:

Sub test()
MsgBox Asc(GuessDelimiter("C:\DelimTest.txt"))
End Sub

Function GuessDelimiter(strFile As String) As String

Dim i As Byte
Dim n As Long
Dim x As Long
Dim z As Byte
Dim strText As String
Dim arrDelimiters(1 To 5) As String

strText = OpenTextFileToString(strFile)

arrDelimiters(1) = ","
arrDelimiters(2) = ";"
arrDelimiters(3) = "|"
arrDelimiters(4) = "~"
arrDelimiters(5) = vbTab

For i = 1 To 5
If i = 1 Then
n = CountChar(arrDelimiters(i), strText)
Else
n = CountChar(arrDelimiters(i), strText, x)
End If
If n > x Then
x = n
z = i
End If
Next i

If z > 0 Then
GuessDelimiter = arrDelimiters(z)
End If

End Function

Function OpenTextFileToString(ByVal strFile As String) As String

Dim hFile As Long

'obtain file handle, open file and load into a string buffer
hFile = FreeFile

Open strFile For Input As #hFile

OpenTextFileToString = Input$(LOF(hFile), hFile)

Close #hFile

End Function

Function CountChar(strChar As String, _
strString As String, _
Optional lStopAtCount As Long = -1) As Long

Dim lPos As Long
Dim n As Long

lPos = InStr(1, strString, strChar, vbBinaryCompare)

If lPos = 0 Then
CountChar = 0
Exit Function
End If

If lStopAtCount = -1 Then
Do Until lPos = 0
lPos = InStr(lPos + 1, strString, strChar, vbBinaryCompare)
n = n + 1
Loop
Else
Do Until lPos = 0 Or n > lStopAtCount
lPos = InStr(lPos + 1, strString, strChar, vbBinaryCompare)
n = n + 1
Loop
End If

CountChar = n

End Function

You might make it a bit faster by altering the order of the delimiters in
arrDelimiters.

RBS










- Show quoted text -

This is excelent - Real world problems - Real world solutions

My thanks and respect

Cheers

Kieran
 
R

RB Smissaert

No trouble.
As you don't really have to check the whole file look at my other posting
and that of Chip Pearson.

RBS
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top