Help! Importing .csv file (103 columns)

ifiaz

I have a .csv file that looks like this, with 309 rows containing a
single day's data.

|<----------------------- 103 Columns (FIXED)----------------->|
HSTATION_CD,HAGENT_NUM,AIRLINE_CD,...,FDEPT_ACTUAL_DT,LOADED_OOG,XLOADED_OOG
HKG,HAS,CX,...,10-JAN-2005 01:40:42,11,0
HKG,HAS,CX,...,10-JAN-2005 01:40:42,11,0
HKG,HAS,CX,...,10-JAN-2005 01:40:42,11,0
HKG,HAS,CX,...,10-JAN-2005 01:40:42,11,0

I need to do the following:

1) Generate a report based on sums of whole columns, counts of some of
the columns, etc. All the data I require are aggregates of some of the
columns.

2) Every day, I need to print the report together with a custom summary
written by me on top of the imported data for that day.

3) I need to be able to retrieve the aggregate data for any day of the
year together with the summary.

The stored data table that I require is...

01/01/05, Sum Of Col1, Count Of Col2 That Equals "CX", Custom Summary
Event Typed Out, etc.
02/01/05, Sum Of Col1, Count Of Col2 That Equals "CX", Custom Summary
Event Typed Out, etc.
03/01/05, Sum Of Col1, Count Of Col2 That Equals "CX", Custom Summary
Event Typed Out, etc.
.....
and so on.

I do not need to store the LARGE raw data that I obtain from the
.csv file, only the summary.

----

My failed attempts are:

Attempt 1:
I imported the whole .csv into a single table and it was too awkward to
manipulate the data. Also, I didn't know how I might go about storing
the data I require together with my custom comments in another table.

Attempt 2:
I used two tables. One that has two fields storing the ID and the 103
column names as values. Another table with 3 fields, that is linked to
the first table with the values as text (actually most of the data is
numbers, and times). I require to transpose the original data in order
to store it in the second table.

So, I tried to read the file line by line and insert into the table,
but this is quite slow. Even if I am successful in this, I don't know
how to store the data I require with my custom comments in another
table.

I want to know whether my direction is right. How should I go about
doing this?
Please give brief step-by-step instructions. Thanks for your time.
 
Ken Snell [MVP]

Bring the entire data set into an ACCESS table: (two fields: autonumber for
primary key, memo for the entire record). Then use queries to parse/append
the data to permanent tables that will store the data in a normalized
manner. Then run reports from those tables.
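
A minimal sketch of that first staging step, assuming a table named
tblRawLines with an AutoNumber ID and a Memo field named OriginalField,
and a placeholder file path (these names are illustrations, not from
the thread):

Sub LoadCsvIntoStagingTable()

    Dim db As DAO.Database
    Dim rst As DAO.Recordset
    Dim intFile As Integer
    Dim strLine As String

    Set db = CurrentDb
    Set rst = db.OpenRecordset("tblRawLines", dbOpenDynaset)

    intFile = FreeFile
    Open "C:\Data\daily.csv" For Input As #intFile
    Line Input #intFile, strLine              'skip the header row
    Do While Not EOF(intFile)
        Line Input #intFile, strLine
        If Len(Trim$(strLine)) > 0 Then
            rst.AddNew
            rst!OriginalField = strLine       'whole 103-column record in one Memo field
            rst.Update
        End If
    Loop
    Close #intFile

    rst.Close
    Set rst = Nothing

End Sub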
 
ifiaz

Ken said:
Bring the entire data set into an ACCESS table: (two fields: autonumber for
primary key, memo for the entire record).

I understand this part. Since you said it, this will be the best way to
go.
Then use queries to parse/append
the data to permanent tables that will store the data in a normalized
manner. Then run reports from those tables.


Could you elaborate more on this one?

Like what the other tables should look like, how I could form the
query, etc.

Thank you.
 
Ken Snell [MVP]

Hard for me to give you specific suggestions on the table structures for the
"permanent" residency of the data, as I have no idea what the data are nor
what you need to get from the data.

You'll need to identify which specific data elements you need to extract
from the "long record", and where those data elements are located in that
"long record". For example, you might need the data located at characters 35
through 49, characters 100 through 103, and characters 111 through 124. Your
query would parse the data from each record based on the location of the
data and put those data in the permanent table for the fields that are
related to those data.
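
As an illustration of that idea (the character positions are just the
example numbers above; PermanentTable, Element1-Element3, OriginalField
and TableOfOriginalData are placeholder names), such a parsing query
could use Mid():

INSERT INTO PermanentTable (Element1, Element2, Element3)
SELECT Mid([OriginalField], 35, 15),
       Mid([OriginalField], 100, 4),
       Mid([OriginalField], 111, 14)
FROM TableOfOriginalData;
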
--

Ken Snell
<MS ACCESS MVP>
 
ifiaz

Ken said:
Hard for me to give you specific suggestions on the table structures for the
"permanent" residency of the data, as I have no idea what the data are nor
what you need to get from the data.

You'll need to identify which specific data elements you need to extract
from the "long record", and where those data elements are located in that
"long record". For example, you might need the data located at characters 35
through 49, characters 100 through 103, and characters 111 through 124. Your
query would parse the data from each record based on the location of the
data and put those data in the permanent table for the fields that are
related to those data.
--

The long record is comma-delimited (variable length), so I presume I
should use 'Split' for each column I require in the query. Is that right?

There are about 50 columns I require, so do I have to add them one by one?

How do I get the results of a query added to a new table?

How can I keep adding records to the new table from the same query?

I do not want the same date to appear twice in the new table; how can
I avoid that? This might happen when someone runs the query more than
once, e.g.:
01/01/05, Summary1, Summary2, Comment1
02/01/05, Summary1, Summary2, Comment1
02/01/05, Summary1, Summary2, Comment1 <<< Shouldn't be added
 
Ken Snell [MVP]

Answers inline...

--

Ken Snell
<MS ACCESS MVP>



ifiaz said:
The long record is comma-delimited (variable length), so I presume I
should use 'Split' for each column I require in the query. Is that right?
Yes.



There are about 50 columns I require, so do I have to add them one by one?

I'm not sure I understand what you mean by adding them one by one. You
could use a public function that returns the desired "column" from the
long record, something like this:

Public Function ReturnTheFieldValue(strOriginalRecord As String, _
        lngColumnNumber As Long) As String
    Dim varFields As Variant
    'Split the comma-delimited record and return the requested column (1-based)
    varFields = Split(strOriginalRecord, ",")
    ReturnTheFieldValue = varFields(lngColumnNumber - 1)
End Function

Then you would use the function in an append query to get the correct value
you need for each column. You could run one query to do all 50 columns at
once (this example would put column 1 into field 1, column 5 into field 2,
column 10 into field 3, etc.):

INSERT INTO NewTableName
(Field1, Field2, Field3, (etc.), Field50)
SELECT ReturnTheFieldValue([OriginalField], 1),
ReturnTheFieldValue([OriginalField], 5),
ReturnTheFieldValue([OriginalField], 10),
(etc. for all of the desired 50 fields)
FROM TableOfOriginalData;


How do I get the results of a query added to a new table?

You can use a MakeTable query to create a new table from a query's results.
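
In SQL view, a make-table query is just a SELECT ... INTO; for example,
reusing the placeholder names from the append query above:

SELECT ReturnTheFieldValue([OriginalField], 1) AS Field1,
       ReturnTheFieldValue([OriginalField], 5) AS Field2,
       ReturnTheFieldValue([OriginalField], 10) AS Field3
INTO NewTableName
FROM TableOfOriginalData;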

How can I keep adding records to the new table from the same query?

You would create the table separate from a query. Then use an append query
(see above) to add new records to that table.

I do not want the same date to appear twice in the new table; how can
I avoid that? This might happen when someone runs the query more than
once, e.g.:
01/01/05, Summary1, Summary2, Comment1
02/01/05, Summary1, Summary2, Comment1
02/01/05, Summary1, Summary2, Comment1 <<< Shouldn't be added

You would need to set an index in the "new" table that requires the
combination of fields to be unique. Then, if the append query tried to
add duplicate records, they would be rejected.
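
The simplest form of that, assuming the date alone should never repeat
in the new table (the table and field names below are placeholders), is
a unique index created with a DDL query:

CREATE UNIQUE INDEX idxStatDate
ON NewTableName (StatDate);

If the append query is then run a second time for the same day, the
duplicate rows are rejected with a key-violation message instead of
being added.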
 
fiazidris


Fiaz writes:

After going through a lot of trials, I find it is more efficient to
import the whole 103 columns into a single table first and use a query
to get summary counts, sums, etc. on all columns.

If I do it any other way, it takes a very long time to finish and gets
too complicated.

The query looks like this:

Field: Field1, Field2, Field3....Field103, Field104
Total: Count, Count, Sum, .... Sum, Expression

As one of the columns is text and I require all the 'I' values in that
column to be counted, I added one more column, 'Field104', and used the
following expression in the query. Is this how one normally does it?

ExpressionOfFLT_TYPE_INTL:
DCount("[FLT_TYPE]","tbl00_BMSStat","[FLT_TYPE] = 'I'")

I will extract the data I require from the summary query and add it to
a new table with the date information using the following subroutine. I
haven't added the SQL to INSERT yet.

CODE:

Sub RunThroughQuery()

    Dim Db As DAO.Database
    Dim rst As DAO.Recordset
    Dim varColumns As Variant
    Dim myElement As Variant

    Set Db = CurrentDb
    Set rst = Db.OpenRecordset("qry02_BMS_OneDay")

    'Column numbers to be extracted from the one-row summary query
    varColumns = Array(9, 34, 35, 36, 27, 25, 34, 37, 102, 84, 38, 39, _
                       40, 31, 41, 51)

    For Each myElement In varColumns
        Debug.Print rst.Fields(myElement)
    Next

    rst.Close
    Set rst = Nothing

End Sub
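
One possible shape for the missing INSERT step, as a sketch
(tblDailyStats, StatDate and Val1-Val3 are placeholder names, and text
columns would additionally need quote delimiters around their values):

Sub AppendOneDaySummary(dtStatDate As Date)

    Dim Db As DAO.Database
    Dim rst As DAO.Recordset
    Dim strSQL As String

    Set Db = CurrentDb
    Set rst = Db.OpenRecordset("qry02_BMS_OneDay")

    'Build the INSERT from a few of the extracted columns (extend the
    'field and value lists as required); assumes the values are numeric
    strSQL = "INSERT INTO tblDailyStats (StatDate, Val1, Val2, Val3) " & _
             "VALUES (#" & Format(dtStatDate, "mm/dd/yyyy") & "#, " & _
             rst.Fields(9) & ", " & rst.Fields(34) & ", " & rst.Fields(35) & ");"
    Db.Execute strSQL, dbFailOnError

    rst.Close
    Set rst = Nothing

End Sub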


Please let me know if I am on the right track and any valuable advice
that you might have.
 
Ken Snell [MVP]

Your approach appears to be a good one.
--

Ken Snell
<MS ACCESS MVP>




 
fiazidris

Now I have set up five tables.

Table One: to store each day of the year 01/01/05, 02/01/05, and so on
STATDATE (datetime)

Table Two: to store the Object names that I want to keep values for
(about 15 objects)
OBJNAME (text)

Table Three: to store the Object values for each of the objects in
table two and for each of the dates of the year.
STATDATE, OBJNAME, OBJVALUE (number)

Some of the OBJVALUEs are text for some of the OBJNAMEs, so I have
another set of two tables, storing the OBJNAME in one and the OBJVALUE
as text in the other, with the same structure as Table Two and Table
Three above.

Relationships:
Table One [STATDATE] : 1 - Many : Table Three [STATDATE]
Table Two [OBJNAME] : 1 - Many : Table Three [OBJNAME]

Is my setup correct? Table One just stores unique dates of the year.

Table Three: STATDATE, OBJNAME pair must be unique
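
A DDL sketch of that design (each statement run as its own DDL query;
the field sizes are guesses, and the parallel pair of tables for text
values would look the same with OBJVALUE declared as TEXT):

CREATE TABLE TableOne
(STATDATE DATETIME CONSTRAINT PK_TableOne PRIMARY KEY);

CREATE TABLE TableTwo
(OBJNAME TEXT(50) CONSTRAINT PK_TableTwo PRIMARY KEY);

CREATE TABLE TableThree
(STATDATE DATETIME CONSTRAINT FK_Three_Date REFERENCES TableOne (STATDATE),
 OBJNAME TEXT(50) CONSTRAINT FK_Three_Obj REFERENCES TableTwo (OBJNAME),
 OBJVALUE DOUBLE,
 CONSTRAINT PK_TableThree PRIMARY KEY (STATDATE, OBJNAME));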

Please give me any valuable advice that you might have. Thanks.
If my explanation is not clear, please ask for clarification.
 
Ken Snell [MVP]

I don't believe I have sufficient knowledge about how you're using the data
so that I could give you completely meaningful/helpful comments about your
table setup.

I assume that TableOne is needed so that you can ensure that you will get a
record for every day in the year, even if there are no data for that day?
Using such a table, left-joined to the other data tables, is a good approach
for having a query always produce a record for each day in the year.

I don't know the use/purpose of TableTwo. Is it storing "names" of field
names that you'd use?

TableThree appears to store a lookup value for each record in TableTwo? Any
reason you can't combine the two tables? You can store number "strings" in a
text field, and you can convert the string back to a number within queries
or functions as needed.

I don't know why you need the last two tables just because you have
different data types. See my comment above.
 
ifiaz

Ken said:
I don't believe I have sufficient knowledge about how you're using the data
so that I could give you completely meaningful/helpful comments about your
table setup.

I assume that TableOne is needed so that you can ensure that you will get a
record for every day in the year, even if there are no data for that day?
Using such a table, left-joined to the other data tables, is a good approach
for having a query always produce a record for each day in the year.

I don't know the use/purpose of TableTwo. Is it storing "names" of field
names that you'd use?

The purpose of Table Two, with its single field "OBJNAME", is to store
the fixed column names I import.
TableThree appears to store a lookup value for each record in TableTwo? Any
reason you can't combine the two tables? You can store number "strings" in a
text field, and you can convert the string back to a number within queries
or functions as needed.

Table Three is to store the values for each of the objects in Table
Two.

Sample:
Rows of TableOne:
01/01/05
02/01/05
....
....

Rows of TableTwo: ObjectNames that indicate column numbers from the
.csv file
C025
C027
C031
C034
....
....

Rows of TableThree: First Two columns are foreign keys from table 1 and
table 2
01/01/05, C025, 50000
01/01/05, C027, 46105
....
....
10/01/05, C087, UA0862
10/01/05, C088, I
....
....

I thought, in future if I need to keep track of new objects, I could
add those objects in Table Two without changing the structure of the
tables. I believe your suggestion can work too, so what do you think?

Some of the items I need to keep track of, apart from what I import from
the .csv file, are long descriptions of a day's events, etc., in a Memo field.

Like,
DATE, FaultsOfTheDay, SystemStatusOfTheDay, etc.
I don't know why you need the last two tables just because you have
different data types. See my comment above.

Yes, for small fields I could store numbers and text as TEXT.
Does converting one to the other reduce efficiency much? Please
advise.
 
Ken Snell [MVP]

Comments inline...

--

Ken Snell
Table Three is to store the values for each of the objects in Table
Two.

Sample:
Rows of TableOne:
01/01/05
02/01/05
...
...

Rows of TableTwo: ObjectNames that indicate column numbers from the
.csv file
C025
C027
C031
C034
...
...

Rows of TableThree: First Two columns are foreign keys from table 1 and
table 2
01/01/05, C025, 50000
01/01/05, C027, 46105
...
...
10/01/05, C087, UA0862
10/01/05, C088, I
...
...

I thought, in future if I need to keep track of new objects, I could
add those objects in Table Two without changing the structure of the
tables. I believe your suggestion can work too, so what do you think?

I'm not 100% clear yet, but if it's ever possible for an object from
TableTwo to have more than one value, then the use of a TableThree is
probably the better approach.

Some of the items I need to keep track of, apart from what I import from
the .csv file, are long descriptions of a day's events, etc., in a Memo field.

Like,
DATE, FaultsOfTheDay, SystemStatusOfTheDay, etc.


Yes, for small fields I could store numbers and text as TEXT.
Does converting one to the other reduce efficiency much? Please
advise.

Converting is a fairly minor operation, and I would not expect to see any
effect on performance that is noticeable.
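
For example, if OBJVALUE were stored as text in a single combined table,
a query against it (here still called TableThree) can convert the values
back on the fly (Val() simply returns 0 for non-numeric entries such as
"UA0862"):

SELECT OBJNAME, Sum(Val([OBJVALUE])) AS TotalValue
FROM TableThree
GROUP BY OBJNAME;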
 
