Parsing imported csv data

  • Thread starter PC_User via AccessMonster.com
  • Start date
P

PC_User via AccessMonster.com

I need to find a way to parse data that I'm importing when the data has a
comma "," delimiters. Sometimes the string has a comma in the name (i.e.
GenVec, Inc. or RealNetworks, Inc) and sometimes the number amount has commas
indicating thousands or millions. I have no control over the format of the
imported data since it comes from an external website. I'm using XML
functions to import such as:

Dim objXML As Object
Set objXML = CreateObject("MSXML2.ServerXMLHTTP")
Debug.Print "Company Name = " & Split(objXML.responseText, ",")(1)
Debug.Print "Last Trade Volume = " & Split(objXML.responseText, ",")(12)

I need a way to deal with these extra commas that are not data delimiters.
Can someone help me?
 
J

John W. Vinson

I need a way to deal with these extra commas that are not data delimiters.

That becomes an impossible request if there's nothing in the data that
distinguishes a "comma used as a data delimiter" from an otherwise identical
"comma used as part of a data item". Here are three numbers:

123,456,789,012,345

What three numbers do I mean? Can any program - or person - possibly tell
which commas serve which role???

Could you post an actual example of your data?
 
P

PC_User via AccessMonster.com

The data looks like this:

RNWK","RealNetworks, Inc","3/26/2010","4:00pm",4.83,-0.02,"-0.02 - -0.41%",4.
85,4.91,4.83,5.00,"4.83 - 5.00",398826,2.16,5.41,"2.16 - 5.41","NasdaqNM",4.
80,200,5.00,3,700,N/A,N/A,-1.638,0.00,N/A,"N/A","14-Feb-00",2.783,1.74,1.17,-
26.2M

and like this

"GNVC","GenVec, Inc.","3/26/2010","4:00pm",2.90,-0.05,"-0.05 - -1.69%",2.95,2.
94,2.87,3.0001,"2.87 - 3.0001",2353016,0.39,3.35,"0.39 - 3.35","NGM",2.88,15,
2.90,30,N/A,N/A,-0.189,0.00,N/A,"N/A","N/A",0.109,27.06,22.64,-16.9M

I had removed the code for the other numerical results like volume that had
commas, but I didn't remove the code for the symbol name. Once I resolve the
issue of parsing I can then put their code back into the module.
 
J

John W. Vinson

The data looks like this:

RNWK","RealNetworks, Inc","3/26/2010","4:00pm",4.83,-0.02,"-0.02 - -0.41%",4.
85,4.91,4.83,5.00,"4.83 - 5.00",398826,2.16,5.41,"2.16 - 5.41","NasdaqNM",4.
80,200,5.00,3,700,N/A,N/A,-1.638,0.00,N/A,"N/A","14-Feb-00",2.783,1.74,1.17,-
26.2M

and like this

"GNVC","GenVec, Inc.","3/26/2010","4:00pm",2.90,-0.05,"-0.05 - -1.69%",2.95,2.
94,2.87,3.0001,"2.87 - 3.0001",2353016,0.39,3.35,"0.39 - 3.35","NGM",2.88,15,
2.90,30,N/A,N/A,-0.189,0.00,N/A,"N/A","N/A",0.109,27.06,22.64,-16.9M

I had removed the code for the other numerical results like volume that had
commas, but I didn't remove the code for the symbol name. Once I resolve the
issue of parsing I can then put their code back into the module.

If that's actually what the data looks like, there are unpaired quotes in the
very first record, and complete inconsistancy in the use of quotemarks in
general. It looks a mess! The last entry -26.2M should be a text field in
quotes for example; you've got some N/A in quotes and some not; the string
4.80,200,5.00,3,700 could be 4.80; 200; 5.00; 3700 or it could be 4.80; 200;
5.00; 3; 700 (which is how Import will probably parse it).

Do you have any control over the contents? Could it be changed to put quotes
around all the field values? Doing so (if they're paired) should remove the
ambiguity, since a comma inside quotes should be seen as a character in the
string not a delimiter.
 
P

PC_User via AccessMonster.com

<< I have no control over the format of the imported data since it comes from
an external website. I'm using XML functions to import such as . . .>>

Thank you for your help.
 
K

kc-mass

Why don't you post a small block of data with everything in it. I believe
you can parse the data you've shown by a read file operation in which you
can either fix delimiters or write to a db table.
In your first example, RealNetworks acronym RNWK the preceding quotation
mark is missing.
Is that missing in the data or did it just get copied wrong?

Regards

Kevin
 
P

PC_User via AccessMonster.com

Yes, the first quote is missing. Please show me your parsing code.

Thanks,
PC
 
K

kc-mass

Is the quote missing in the original data or missing in your sample of the
data?

I also asked that you load a larger chuck of the data straight from the
source table.

Can you do that ? Maybe 25 stock sets?

I don't now have the code for you but if I can fully understand your data I
will create it.

Regards

Kevin
 
K

kc-mass

I think you need to rethink your participation in the news groups.

You won't spend your time answering simple, necessary questions
regarding your request but those in the groups are supposed to spend their
time
fathoming what your data might look like and what would solve YOUR problem.

Attitudes are attitudes and you own yours. I'm sure it serves you well!

Regards
 
P

PC_User via AccessMonster.com

NOTHING IS IMPOSSIBLE!!!!
==================================================
http://www.tek-tips.com/faqs.cfm?fid=3566
==================================================
Option Compare Database
Option Explicit

' read a csv file into a recordset
' can handle a first line with field names (e.g. a header)
' deals with quoted strings in csv data (e.g. "this is a test,,,,", this,is,a,
test
'
Function ImportCsvFile(FileName As String, DestRst As Recordset, ErrorMsg As
String, Optional HasHeaders As Boolean = False) As Long
On Error GoTo ImportCsvFileError
' open the source file
Dim InputFileHandle As Integer
InputFileHandle = FreeFile
Open FileName For Input As #InputFileHandle

' set the current character read from the file
Dim CurChar As String
CurChar = ""

' set the previous character read from the file
Dim PrevChar As String
PrevChar = ""

' indicate if the next character has already been 'read'
Dim ReadAhead As Boolean
ReadAhead = False

' store field names in a header
Dim ReadFieldNames(0 To 511) As String

' indicate if we are currently reading a header line
Dim ReadingHeaderLine As Boolean
ReadingHeaderLine = HasHeaders

' the current field (text between commas)
Dim CurField As String
CurField = ""

' indicate if we are inside a quoted field
Dim InQuote As Boolean
InQuote = False

' the current field number (index into the field names array *or* the
recordset)
Dim FieldNumber As Integer
FieldNumber = 0

' indicate if a field has been read (e.g. a comma or EOL has been
reached)
Dim SetField As Boolean
SetField = False

' indicate if a record should be added (e.g. EOL has been reached)
Dim AddRecord As Boolean
AddRecord = False

' indicate if a DestRst.Update method needs to be invoked
Dim NeedsUpdate As Boolean
NeedsUpdate = False

' indicate if a DestRst.AddNew method needs to be invoked
Dim NeedToAdd As Boolean
NeedToAdd = True

Do While Not EOF(InputFileHandle) ' Loop until end of file.
' sometimes we need to read ahead one character (e.g. for a "),
then find we want to put
' that character back into the input stream.
If Not ReadAhead Then
CurChar = Input(1, #InputFileHandle) ' Get one character.
End If
ReadAhead = False

Select Case CurChar
' handle quoted strings in the CSV data, allowing embedded commas
or quotes.
Case """"
If InQuote Then
If Not EOF(InputFileHandle) Then
CurChar = Input(1, #InputFileHandle)
If CurChar = """" Then
CurField = CurField & """"
Else
ReadAhead = True
InQuote = False
End If
Else
InQuote = False
End If
Else
InQuote = True
End If
' handle the comma character (End of Field, unless in a quoted
string)
Case ","
If InQuote Then
CurField = CurField & ","
Else
SetField = True
End If
' handle all other characters
' toss out any CR's, and treat LF's as end of line.
Case Else
If Asc(CurChar) <> 13 Then
If Asc(CurChar) = 10 Then
SetField = True
AddRecord = True
Else
CurField = CurField & CurChar
End If
End If
End Select
' either set a field name (if header), or set a field value
(based on field name in header, or field number)
If SetField Then
If NeedToAdd Then
DestRst.AddNew ' add a new record
NeedToAdd = False ' clear need to add
NeedsUpdate = True ' we do need to do an update before
doing another Add
End If
CurField = Trim(CurField)

If ReadingHeaderLine Then ' store field name
ReadFieldNames(FieldNumber) = CurField
Else
' only add fields that are non-zero-length
If Len(CurField) > 0 Then
If HasHeaders Then ' set field value (either from
name, or field number)
DestRst(ReadFieldNames(FieldNumber)) = CurField
Else
DestRst(FieldNumber) = CurField
End If
End If
End If
FieldNumber = FieldNumber + 1 ' bump field number
CurField = "" ' clear field for more data
SetField = False ' wait for a comma or EOL
End If

' if we hit EOL, Update any existing changes, and indicate we
need to add
' another record if we encounter more data
If AddRecord Then
If NeedsUpdate Then
DestRst.Update
NeedsUpdate = False
End If
NeedToAdd = True ' if we hit more data, do an .
AddNew
FieldNumber = 0 ' start at field 0
ReadingHeaderLine = False ' there can only be one header
line
AddRecord = False
DoEvents
End If

PrevChar = CurChar
Loop
If NeedsUpdate Then
DestRst.Update
End If
Close #InputFileHandle

ImportCsvFileExit:
Exit Function

ImportCsvFileError:
Resume
End Function

Sub TestCsvImport()
Dim ErrorMsg As String

Dim MyRst As Recordset
Set MyRst = CurrentDb.OpenRecordset("SomeTable")
ImportCsvFile "C:\SomeData.csv", MyRst, ErrorMsg, False
MyRst.Close
Set MyRst = Nothing
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top