Import CSV VBA need performance improvement guidance

A

arookie

Hi,

I am using the following code to bulk insert records. I need some
guidance to further improve the performance. As I need to import
200,000+ records in one pass.

For i = 3 To UBound(myArray)

Dim rst As New ADODB.Recordset
sSQL = "SELECT QUESTION_ID,ENROLLMENT_ID,CHOICE FROM
QUESTION_MARK WHERE QUESTION_ID = " & arrExamQuestions(0, i - 3) & "
AND ENROLLMENT_ID = " & lngEnrollmentID
rst.Open sSQL, CurrentProject.Connection, adOpenStatic,
adLockOptimistic, adCmdText

If rst.RecordCount = 0 Then
rst.Close
Dim SQL As String

SQL = "INSERT INTO QUESTION_MARK " & _
"(QUESTION_ID,ENROLLMENT_ID,CHOICE) " & _
"VALUES (" & arrExamQuestions(0, i - 3) & "," &
lngEnrollmentID & ",'" & myArray(i) & "')"
DoCmd.RunSQL SQL
Else
If (IsNull(rst!CHOICE.Value) Or Len(Trim(rst!
CHOICE.Value)) = 0) Then
rst!CHOICE = Nz(myArray(i), "")
End If
rst.Update
rst.Close
End If
Next i
 
J

John W. Vinson

Hi,

I am using the following code to bulk insert records. I need some
guidance to further improve the performance. As I need to import
200,000+ records in one pass.

For i = 3 To UBound(myArray)

Dim rst As New ADODB.Recordset
sSQL = "SELECT QUESTION_ID,ENROLLMENT_ID,CHOICE FROM
QUESTION_MARK WHERE QUESTION_ID = " & arrExamQuestions(0, i - 3) & "
AND ENROLLMENT_ID = " & lngEnrollmentID
rst.Open sSQL, CurrentProject.Connection, adOpenStatic,
adLockOptimistic, adCmdText

If rst.RecordCount = 0 Then
rst.Close
Dim SQL As String

SQL = "INSERT INTO QUESTION_MARK " & _
"(QUESTION_ID,ENROLLMENT_ID,CHOICE) " & _
"VALUES (" & arrExamQuestions(0, i - 3) & "," &
lngEnrollmentID & ",'" & myArray(i) & "')"
DoCmd.RunSQL SQL
Else
If (IsNull(rst!CHOICE.Value) Or Len(Trim(rst!
CHOICE.Value)) = 0) Then
rst!CHOICE = Nz(myArray(i), "")
End If
rst.Update
rst.Close
End If
Next i

OW. You're reading the external file, one record at a time, and running a
single record insert!? No wonder it's slow!

Any chance that you could just run three Append queries, one for each element
of the array? Link to the text file as an external "table" and run them all in
at once!
 
A

arookie

Hi John,

Can you please explain your idea in further details. Is it possible
for you to provide some VBA examples.

The following is the current import process.

*Import file format
StudentID, RegionID, Serial No., MCQ choices
1234567890, 3, 12345, A, B, C, D, A, B, C, A, B, C .... 70 choices
per student

1)Get file list from file open dialog.
2)Load the course exam structure in array. The array contains question
id's of each question
3)Insert the file information in FILE table
4)Open the CSV file and read each line. Sends the line to other
procedure.
5)Data validation of StudentID, RegionID, Serial No.
6)Check if studentID exist in enrollment table and then make sure
regionID matches.
8)Update enrollment record
7)Now start inserting Student's choice(A,B,C or D) in Question_Mark
table
- Check if the record already exist in Question_Mark table.
- If doesn't exist then insert new record
- If exist then check if Choice value exist, if doesn't exist
then insert choice value.

The process takes about 25 minutes for 1000 records but less than 10
minutes is my goal. Any ideas if I can improve the performance.

I was using linked tables from network file which caused the import
process to hang. I moved the tables to local Access front end to avoid
performance issues. I am not sure the issue was with Access 2007 and
linked tables from network file.

The following is my code for your reference.
======================================================================================
Option Compare Database

'Dim obExamQuestions As clsExamQuestions
Private Const CONST_QUESTION_TABLE As String = "QUESTION"

Private arrExamQuestions As Variant
Private QuestionCount As Integer


Public Function ImportMark(ByVal intYear As Integer, ByVal byteSession
As Byte, ByVal strCourseCode As String, ByVal strStatusCode As String,
ByVal strFiles As String)
On Error GoTo ImportMarks_Errors

ImportMark = False

Const ForReading = 1

Dim oFs, oFile As Object

Dim lngFileID As Long, sLine As String, intLineNumber As Integer

Dim arrayFiles As Variant

DoCmd.Hourglass False

'Set obExamQuestions = New clsExamQuestions

If LoadExam(byteSession, intYear, strCourseCode) = False Then
GoTo ImportMarks_Errors
End If

arrayFiles = Split(strFiles, ",")

For i = 0 To UBound(arrayFiles)

lngFileID = GetFileID(arrayFiles(i), intYear, byteSession,
strCourseCode)

Set oFs = CreateObject("Scripting.FileSystemObject")

If (oFs.FileExists(arrayFiles(i))) Then
Set oFile = oFs.OpenTextFile(arrayFiles(i), ForReading,
-1)
intLineNumber = 1
Do While oFile.AtEndOfStream <> True 'enumerate each line
in text file
sLine = oFile.ReadLine
Call ImportRecord(intYear, byteSession, strCourseCode,
sLine, intLineNumber, strStatusCode, lngFileID)
intLineNumber = intLineNumber + 1
Loop
oFile.Close
End If
Next i

Set oFile = Nothing
Set oFs = Nothing
ImportMark = True
Exit_ImportMarks_Errors:
DoCmd.Hourglass False
Exit Function
ImportMarks_Errors:
MsgBox Err.Numer & ":" & Err.Description
ImportMark = False
Resume Exit_ImportMarks_Errors

End Function
'Get all studentid, affiliate id for this session.

Private Sub ImportRecord(ByVal iYear As Integer, ByVal bSession As
Byte, ByVal sCourseCode As String, ByVal sLine As String, ByVal iRow
As Integer, ByVal sStatusCode As String, ByVal lngFileID As Long)
On Error GoTo Error_ImportRecord

Dim myArray As Variant
Dim i As Integer

myArray = Split(sLine, ",")

'remove white space
For i = 0 To UBound(myArray)
myArray(i) = Trim(myArray(i))
Next i

'Validate Data: Student ID
If Len(myArray(0)) = 0 Or Len(myArray(0)) > 10 Or Not
IsNumeric(myArray(0)) Then
Call LogError(sLine, iRow, lngFileID, "Invalid Student ID in
Column 1.")
Exit Sub
End If

'Validate Data: Affiliate ID
If Len(myArray(1)) = 0 Or Not IsNumeric(myArray(1)) Then
Call LogError(sLine, iRow, lngFileID, "Invalid Affiliate ID in
Column 2.")
Exit Sub
End If

'Validate Data: Serial Number
If Len(myArray(2)) = 0 Or Len(myArray(2)) <> 5 Or Not
IsNumeric(myArray(2)) Then
Call LogError(sLine, iRow, lngFileID, "Invalid Serial Number
in Column 3.")
Exit Sub
End If

'Validate Data: MCQ question count
If UBound(myArray) <> (QuestionCount + 2) Then
Call LogError(sLine, iRow, lngFileID, "Number of questions in
CSV file doesn't match with Question structure.")
Exit Sub
End If

Dim sSQL As String
sSQL = "SELECT
ENROLLMENT_ID,ENR_AFFILIATE_ID,IS_SAMPLE,SERIAL_NUMBER FROM ENROLLMENT
WHERE ENR_CLIENT_ID = '" & myArray(0) & "' AND COURSE_SESSION = " &
bSession & " AND COURSE_CODE ='" & sCourseCode & "' AND COURSE_YEAR ="
& iYear

Dim rs As New ADODB.Recordset
rs.Open sSQL, CurrentProject.Connection, adOpenKeyset,
adLockOptimistic, adCmdText
'Check if the student id exist in course offering's enrollment
records
If rs.RecordCount <> 1 Then
Call LogError(sLine, iRow, lngFileID, "Student ID doesn't
exist in course offering's enrollment records.")
Exit Sub
End If
'Check if Affiliate ID match
If Trim(rs!ENR_AFFILIATE_ID.Value) <> myArray(1) Then
Call LogError(sLine, iRow, lngFileID, "Enrolling Affiliate ID
doesn't match to course offering's enrollment records. The Affiliate
ID in Enrollment table is " & rs!ENR_AFFILIATE_ID.Value & "")
Exit Sub
End If

Dim lngEnrollmentID As Long
'Get EnrollmentID
lngEnrollmentID = rs!ENROLLMENT_ID.Value
'Update the Serial Number
rs!SERIAL_NUMBER = myArray(2)
'Update Status Code Is Sample to Yes
If sStatusCode = "S" Then
rs!IS_SAMPLE = "Y"
End If
rs.Update
rs.Close

'Process MCQ now
For i = 3 To UBound(myArray)

Dim rst As New ADODB.Recordset
sSQL = "SELECT QUESTION_ID,ENROLLMENT_ID,CHOICE FROM
QUESTION_MARK WHERE QUESTION_ID = " & arrExamQuestions(0, i - 3) & "
AND ENROLLMENT_ID = " & lngEnrollmentID
rst.Open sSQL, CurrentProject.Connection, adOpenStatic,
adLockOptimistic, adCmdText

If rst.RecordCount = 0 Then
rst.Close
Dim SQL As String

SQL = "INSERT INTO QUESTION_MARK " & _
"(QUESTION_ID,ENROLLMENT_ID,CHOICE) " & _
"VALUES (" & arrExamQuestions(0, i - 3) & "," &
lngEnrollmentID & ",'" & myArray(i) & "')"
DoCmd.RunSQL SQL
Else
If (IsNull(rst!CHOICE.Value) Or Len(Trim(rst!
CHOICE.Value)) = 0) Then
rst!CHOICE = Nz(myArray(i), "")
End If
rst.Update
rst.Close
End If
Next i
Set rst = Nothing
Set rs = Nothing
Exit_ImportRecord:
Exit Sub
Error_ImportRecord:
If Not rs Is Nothing Then
If rs.State = adStateOpen Then rs.Close
Set rs = Nothing
End If
If Not rst Is Nothing Then
If rst.State = adStateOpen Then rst.Close
Set rst = Nothing
End If

Call LogError(sLine, iRow, lngFileID, Error.Description)
MsgBox Error$
Resume Exit_ImportRecord
End Sub

Private Sub LogError(ByVal sLine As String, ByVal intRow As Integer,
ByVal lngFileID As Long, ByVal sError As String)
On Error GoTo Error_LogError
'change it to get file id and then udpate records.

' log error in table with all text field

Dim myArray As Variant
Dim rstError As New ADODB.Recordset
Dim i As Integer
Dim sChoices As String

myArray = Split(sLine, ",")
With rstError
.Open "IMPORT_ERROR_LOG", CurrentProject.Connection,
adOpenKeyset, adLockOptimistic, adCmdTable
.AddNew
!FILE_ID.Value = lngFileID
!CLIENT_ID.Value = myArray(0)
!AFFILIATE_ID.Value = myArray(1)
!SERIAL_NUMBER.Value = myArray(2)
!LINE_NUMBER.Value = intRow
!ERROR_DESC.Value = sError
For i = 3 To UBound(myArray)
sChoices = myArray(i) & "," & sChoices
Next i
!CHOICES = Left(sChoices, Len(sChoices) - 1)
.Update
.Close
End With
Exit_LogError:
Exit Sub
Error_LogError:
MsgBox Err.Number & ":" & Err.Description & vbCrLf _
& "Error happened while logging error. Rescan or reimport
the data." & vbCrLf _
& "Line: " & sLine & vbCrLf _
& "File: " & lngFileID, vbCritical + vbOKOnly
Resume Exit_LogError
End Sub
'================================FUNCTIONS==========================
Private Function GetFileID(ByVal strFile As String, iYear As Integer,
bSession As Byte, sCourseCode As String) As Long

Dim rst As New ADODB.Recordset

rst.Open "IMPORT_FILE", CurrentProject.Connection, adOpenDynamic,
adLockOptimistic, adCmdTable
rst.AddNew
rst!FILE_NAME.Value = Right(strFile, Len(strFile) -
InStrRev(strFile, "\"))
rst!FILE_PATH.Value = strFile
rst!COURSE_YEAR.Value = iYear
rst!COURSE_SESSION.Value = bSession
rst!COURSE_CODE.Value = sCourseCode
rst.Update
GetFileID = rst!FILE_ID
rst.Close
Set rst = Nothing
End Function

'-------------------------------------------
Private Function LoadExam(byteSession As Byte, intYear As Integer,
strCourseCode As String) As Boolean
'-------------------------------------------
On Error GoTo HandleError

Dim rs As New ADODB.Recordset
Dim sQry As String

LoadExam = False

sQry = "SELECT QUESTION_ID FROM " & CONST_QUESTION_TABLE & " WHERE
COURSE_SESSION = " & byteSession & " AND COURSE_CODE ='" &
strCourseCode & "' AND COURSE_YEAR =" & intYear & " ORDER BY
ORDER_NUMBER ASC"

rs.Open sQry, CurrentProject.Connection, adOpenStatic,
adLockReadOnly, adCmdText
With rs
QuestionCount = .RecordCount

If QuestionCount = 0 Then
MsgBox "No Questions found for this course offering.
Please define exam.", vbCritical
.Close
Set rs = Nothing
Exit Function
End If
arrExamQuestions = rs.GetRows(QuestionCount)
.Close

End With

Set rs = Nothing

LoadExam = True

Done:
Exit Function
HandleError:
If Not rs Is Nothing Then
If rs.State = adStateOpen Then rs.Close
Set rs = Nothing
End If
MsgBox "Error loading Exam Question Structure: " & vbCrLf &
Err.Description, vbCritical
Resume Done

End Function
======================================================================================
 
J

John W. Vinson

Hi John,

Can you please explain your idea in further details. Is it possible
for you to provide some VBA examples.

sorry... it's going to take a couple of weeks at least before I have time to
dig through 300+ lines of what looks like fairly complex code. As an unpaid
volunteer here, that's just a bit beyond what I'm comfortable committing to.

If you do indeed need to do all this code-based line-by-line validation then I
can't think of any good way to speed it up.
 
G

goroth

You could import the whole text file into a temp table in access.
1)Get file list from file open dialog.
2)Load the course exam structure in array. The array contains
question
id's of each question
-- I would import the whole text file into a temp table in access
DoCmd.TransferText acImportDelim, , "TEMPTABLE", "c:\temp.csv", -1

3)Insert the file information in FILE table
4)Open the CSV file and read each line. Sends the line to other
procedure.
5)Data validation of StudentID, RegionID, Serial No.

-- I would then compare my temp table to my main table.
Select * from TEMPTABLE where exists (select studentID from main)...

6)Check if studentID exist in enrollment table and then make sure
regionID matches.
8)Update enrollment record

(really no need to check if studentid is in temptable because the
where clause will take care of that for you)
Update main set main.enrollment = (select top 1 temptable.enrollment
from temptable where main.studentid = temptable.studentid)
 
A

arookie

This might be a good idea. I will give it a try. Thanks for the
suggestions.

Do you know whether in Access 2007, DAO performs better than ADO.
Somehow I have a feeling that ADO performance is slow in Access 2007
than Access 2003 version.

Thanks again.

Arookie.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top