Importing Text

G

glnbnz

I have a question I hope can be answered.
How can I change information from paragraph form to table form.

Example:
100001 Type: GAS Status: SP Op: CONTINENTAL RESOURCES, INC (170979)
PENICK, DOUGLAS #1 Bond: BS777800
Todd County Topo: ALLENSVILLE Carter Coord: 25 C 30 1240FNL 1520FEL
PI, NO RECORDS ON FILE

I would like to convert the above paragraph into a table.
Everything in CAPITAL letters would be its own field straight across.
This is a part of a large text document with hundreds of records.
Thank you
 
J

Jeff Boyce

One possibility might be to use Word to parse the separate "data elements"
into their own column in a (Word) table. After the data has been converted
to a Word table, and cleaned-up (probably will need this, unless all data is
EXACTLY the same), you could import that table into Access.

Regards

Jeff Boyce
Microsoft Office/Access MVP
 
G

glnbnz

Thank you for the advice, could you please send me in the right direction on
how to parse the data elements in Word?
 
J

John Nurick

What do you mean by "everything in CAPITAL letters would be its own
field"? I guess you want to parse the string of text into fields
somewhat along these lines:

RecordNumber: 100001
Type: GAS
Status: SP
Op: CONTINENTAL RESOURCES, INC (1780979) PENIC, DOUGLAS #1
Bond: BS777800
Todd County Topo: ALLENSVILLE
Carter Coord: 25 C 1240FNL 1520FEL, PI, NO RECORDS ON FILE

This probably won't be very difficult, but it's *essential* to
understand the task and the data before you start. Can you post half a
dozen or so sample records which between them illustrate the range of
variability in the structure and values - and for each of them also show
exactly what the result needs to be.
 
G

glnbnz

Yes I can. Here is 5 records. To be more specific all I really need is:
RecordNumber:
Type:
Status:
Op:
The rest of the record could be discarded.

100001 Type: GAS Status: SP Op: CONTINENTAL RESOURCES, INC (170979)
PENICK, DOUGLAS #1 Bond: BS777800
Todd County Topo: ALLENSVILLE Carter Coord: 25 C 30 1240FNL 1520FEL
PI, NO RECORDS ON FILE

10156 Type: SRI Status: AI Op: CONTINENTAL RESOURCES, INC (170979)
HOLLIMAN, S H #2 Depth: 2087 Bond: BS777800
Hopkins County Topo: NORTONVILLE Carter Coord: 12 I 25 2520FNL 2360FWL
SRI WELL, ALL RECORDS ON FILE, NO VIOLATIONS, TRANS FROM HAR-KEN OIL CO
10/16/97, TRANS FROM FARRAR OIL CO 9/26/01

10161 Type: OIL Status: PR Op: CONTINENTAL RESOURCES, INC (170979)
WILLIAM & DRAKE HEIRS #4 Depth: 1707 Bond: BS777800
Hopkins County Topo: NORTONVILLE Carter Coord: 8 I 25 170FSL 580FEL
OIL WELL, ALL RECORDS ON FILE, NO VIOLATIONS, TRANS FROM HAR-KEN OIL CO
10/16/97, TRANS FROM FARRAR OIL CO 9/26/01

10162 Type: OIL Status: PR Op: CONTINENTAL RESOURCES, INC (170979)
WILLIAM & DRAKE HEIRS #5 Depth: 2185 Bond: BS777800
Hopkins County Topo: NORTONVILLE Carter Coord: 13 I 25 1020FNL 590FEL
OIL WELL, ALL RECORDS ON FILE, NO VIOLATIONS, TRANS FROM HAR-KEN OIL CO
10/16/97, TWIN WELL TO PERMIT#89337, TRANS FROM FARRAR OIL CO 9/26/01

10362 Type: OIL Status: PR Op: CONTINENTAL RESOURCES, INC (170979)
HOLLIMAN, S H #3 Depth: 2114 Bond: BS777800
Hopkins County Topo: NORTONVILLE Carter Coord: 12 I 25 2950FNL 1500FWL
OIL WELL, ALL RECORDS ON FILE, NO VIOLATIONS, TRANS FROM HAR-KEN OIL CO

Thank you
 
J

John Nurick

Only needing those first few fields makes it a lot simpler. I'll assume that
the records are in a text file, with line breaks where they appear in your
sample data, and that the [Op] field extends to the end of the first line of
the record. If that's right, here's one way to go at it.

1) Import the text file into a new table. In the import wizard, select
Delimited, set the delimiter to the pipe character "|" and the text qualifier
to {none}.

2) The table will have just one field, probably called F1, with one record
for each line of the text file. Use a delete query with this criterion
Not Like "*Type: *Status:*"
to get rid of all the records that don't correspond to the first line of the
text records.

3) Go to table design view and add fields for RecordNumber, Type, Status and
Op.

4) Use an update query with expressions that extract the values from the
original field and put them in the new fields.

There are various ways of building the calculated fields. I'd probably use
my rgxExtract function (downloadable from
http://www.j.nurick.dial.pipex.com/Code/ )
in which case the expressions in the "update to" cell for the various fields
would be like this:

RecordNumber: rgxExtract([F1], "^\d+")

Type: rgxExtract([F1], "Type:\s+(.+)\s+Status:")

Status rgxExtract([F1], "Status:\s+(.+)\s+Op:")

Op: rgxExtract([F1], "Op:\s+(.+)$")

Alternatively you can use the standard string functions such as Instr() and
Mid() to locate "fixed points" such as "Type: " and " Status:" and return the
characters between them.
 
J

Jeff Boyce

Open Word. Highlight the data. Click Tables | Convert | Text to Tables.

Regards

Jeff Boyce
Microsoft Office/Access MVP
 
G

glnbnz

I am sorry but I need one more piece of information. I need the name that
appears right under the first line. The [Op] field actually extends all the
way to the beginning of the word [Bond:]

Thanks again

John Nurick said:
Only needing those first few fields makes it a lot simpler. I'll assume that
the records are in a text file, with line breaks where they appear in your
sample data, and that the [Op] field extends to the end of the first line of
the record. If that's right, here's one way to go at it.

1) Import the text file into a new table. In the import wizard, select
Delimited, set the delimiter to the pipe character "|" and the text qualifier
to {none}.

2) The table will have just one field, probably called F1, with one record
for each line of the text file. Use a delete query with this criterion
Not Like "*Type: *Status:*"
to get rid of all the records that don't correspond to the first line of the
text records.

3) Go to table design view and add fields for RecordNumber, Type, Status and
Op.

4) Use an update query with expressions that extract the values from the
original field and put them in the new fields.

There are various ways of building the calculated fields. I'd probably use
my rgxExtract function (downloadable from
http://www.j.nurick.dial.pipex.com/Code/ )
in which case the expressions in the "update to" cell for the various fields
would be like this:

RecordNumber: rgxExtract([F1], "^\d+")

Type: rgxExtract([F1], "Type:\s+(.+)\s+Status:")

Status rgxExtract([F1], "Status:\s+(.+)\s+Op:")

Op: rgxExtract([F1], "Op:\s+(.+)$")

Alternatively you can use the standard string functions such as Instr() and
Mid() to locate "fixed points" such as "Type: " and " Status:" and return the
characters between them.



glnbnz said:
Yes I can. Here is 5 records. To be more specific all I really need is:
RecordNumber:
Type:
Status:
Op:
The rest of the record could be discarded.

100001 Type: GAS Status: SP Op: CONTINENTAL RESOURCES, INC (170979)
PENICK, DOUGLAS #1 Bond: BS777800
Todd County Topo: ALLENSVILLE Carter Coord: 25 C 30 1240FNL 1520FEL
PI, NO RECORDS ON FILE

10156 Type: SRI Status: AI Op: CONTINENTAL RESOURCES, INC (170979)
HOLLIMAN, S H #2 Depth: 2087 Bond: BS777800
Hopkins County Topo: NORTONVILLE Carter Coord: 12 I 25 2520FNL 2360FWL
SRI WELL, ALL RECORDS ON FILE, NO VIOLATIONS, TRANS FROM HAR-KEN OIL CO
10/16/97, TRANS FROM FARRAR OIL CO 9/26/01

10161 Type: OIL Status: PR Op: CONTINENTAL RESOURCES, INC (170979)
WILLIAM & DRAKE HEIRS #4 Depth: 1707 Bond: BS777800
Hopkins County Topo: NORTONVILLE Carter Coord: 8 I 25 170FSL 580FEL
OIL WELL, ALL RECORDS ON FILE, NO VIOLATIONS, TRANS FROM HAR-KEN OIL CO
10/16/97, TRANS FROM FARRAR OIL CO 9/26/01

10162 Type: OIL Status: PR Op: CONTINENTAL RESOURCES, INC (170979)
WILLIAM & DRAKE HEIRS #5 Depth: 2185 Bond: BS777800
Hopkins County Topo: NORTONVILLE Carter Coord: 13 I 25 1020FNL 590FEL
OIL WELL, ALL RECORDS ON FILE, NO VIOLATIONS, TRANS FROM HAR-KEN OIL CO
10/16/97, TWIN WELL TO PERMIT#89337, TRANS FROM FARRAR OIL CO 9/26/01

10362 Type: OIL Status: PR Op: CONTINENTAL RESOURCES, INC (170979)
HOLLIMAN, S H #3 Depth: 2114 Bond: BS777800
Hopkins County Topo: NORTONVILLE Carter Coord: 12 I 25 2950FNL 1500FWL
OIL WELL, ALL RECORDS ON FILE, NO VIOLATIONS, TRANS FROM HAR-KEN OIL CO

Thank you
 
G

glnbnz

I am also having problems writing the update query. When I enter
RecordNumber: rgxExtract([F1], "^\d+")
I get a 'The expression you entered has an invalid . (dot) or ! operator or
invalid parentheses.
error

Thanks again

glnbnz said:
I am sorry but I need one more piece of information. I need the name that
appears right under the first line. The [Op] field actually extends all the
way to the beginning of the word [Bond:]

Thanks again

John Nurick said:
Only needing those first few fields makes it a lot simpler. I'll assume that
the records are in a text file, with line breaks where they appear in your
sample data, and that the [Op] field extends to the end of the first line of
the record. If that's right, here's one way to go at it.

1) Import the text file into a new table. In the import wizard, select
Delimited, set the delimiter to the pipe character "|" and the text qualifier
to {none}.

2) The table will have just one field, probably called F1, with one record
for each line of the text file. Use a delete query with this criterion
Not Like "*Type: *Status:*"
to get rid of all the records that don't correspond to the first line of the
text records.

3) Go to table design view and add fields for RecordNumber, Type, Status and
Op.

4) Use an update query with expressions that extract the values from the
original field and put them in the new fields.

There are various ways of building the calculated fields. I'd probably use
my rgxExtract function (downloadable from
http://www.j.nurick.dial.pipex.com/Code/ )
in which case the expressions in the "update to" cell for the various fields
would be like this:

RecordNumber: rgxExtract([F1], "^\d+")

Type: rgxExtract([F1], "Type:\s+(.+)\s+Status:")

Status rgxExtract([F1], "Status:\s+(.+)\s+Op:")

Op: rgxExtract([F1], "Op:\s+(.+)$")

Alternatively you can use the standard string functions such as Instr() and
Mid() to locate "fixed points" such as "Type: " and " Status:" and return the
characters between them.



glnbnz said:
Yes I can. Here is 5 records. To be more specific all I really need is:
RecordNumber:
Type:
Status:
Op:
The rest of the record could be discarded.

100001 Type: GAS Status: SP Op: CONTINENTAL RESOURCES, INC (170979)
PENICK, DOUGLAS #1 Bond: BS777800
Todd County Topo: ALLENSVILLE Carter Coord: 25 C 30 1240FNL 1520FEL
PI, NO RECORDS ON FILE

10156 Type: SRI Status: AI Op: CONTINENTAL RESOURCES, INC (170979)
HOLLIMAN, S H #2 Depth: 2087 Bond: BS777800
Hopkins County Topo: NORTONVILLE Carter Coord: 12 I 25 2520FNL 2360FWL
SRI WELL, ALL RECORDS ON FILE, NO VIOLATIONS, TRANS FROM HAR-KEN OIL CO
10/16/97, TRANS FROM FARRAR OIL CO 9/26/01

10161 Type: OIL Status: PR Op: CONTINENTAL RESOURCES, INC (170979)
WILLIAM & DRAKE HEIRS #4 Depth: 1707 Bond: BS777800
Hopkins County Topo: NORTONVILLE Carter Coord: 8 I 25 170FSL 580FEL
OIL WELL, ALL RECORDS ON FILE, NO VIOLATIONS, TRANS FROM HAR-KEN OIL CO
10/16/97, TRANS FROM FARRAR OIL CO 9/26/01

10162 Type: OIL Status: PR Op: CONTINENTAL RESOURCES, INC (170979)
WILLIAM & DRAKE HEIRS #5 Depth: 2185 Bond: BS777800
Hopkins County Topo: NORTONVILLE Carter Coord: 13 I 25 1020FNL 590FEL
OIL WELL, ALL RECORDS ON FILE, NO VIOLATIONS, TRANS FROM HAR-KEN OIL CO
10/16/97, TWIN WELL TO PERMIT#89337, TRANS FROM FARRAR OIL CO 9/26/01

10362 Type: OIL Status: PR Op: CONTINENTAL RESOURCES, INC (170979)
HOLLIMAN, S H #3 Depth: 2114 Bond: BS777800
Hopkins County Topo: NORTONVILLE Carter Coord: 12 I 25 2950FNL 1500FWL
OIL WELL, ALL RECORDS ON FILE, NO VIOLATIONS, TRANS FROM HAR-KEN OIL CO

Thank you




:

What do you mean by "everything in CAPITAL letters would be its own
field"? I guess you want to parse the string of text into fields
somewhat along these lines:

RecordNumber: 100001
Type: GAS
Status: SP
Op: CONTINENTAL RESOURCES, INC (1780979) PENIC, DOUGLAS #1
Bond: BS777800
Todd County Topo: ALLENSVILLE
Carter Coord: 25 C 1240FNL 1520FEL, PI, NO RECORDS ON FILE

This probably won't be very difficult, but it's *essential* to
understand the task and the data before you start. Can you post half a
dozen or so sample records which between them illustrate the range of
variability in the structure and values - and for each of them also show
exactly what the result needs to be.

On Wed, 14 Feb 2007 09:39:18 -0800, glnbnz

I have a question I hope can be answered.
How can I change information from paragraph form to table form.

Example:
100001 Type: GAS Status: SP Op: CONTINENTAL RESOURCES, INC (170979)
PENICK, DOUGLAS #1 Bond: BS777800
Todd County Topo: ALLENSVILLE Carter Coord: 25 C 30 1240FNL 1520FEL
PI, NO RECORDS ON FILE

I would like to convert the above paragraph into a table.
Everything in CAPITAL letters would be its own field straight across.
This is a part of a large text document with hundreds of records.
Thank you
 
J

John Nurick

As I said, that complicates things. One simple approach is to start by
editing the text file (using Word or a text editor) to get each record
onto a single line. If you haven't done this before, the idea is:

a) Make sure you know what separates the records in the text file.
Proably this is two successive linebreaks (in Word, two successive
paragraph marks).

b) Replace the record separator with something that doesn't appear
anywhere in the document. E.g. in Word you'd replace
^p^p
with
%%%

c) replace all remaining paragraph marks with spaces

d) finally replace the "%%%" with "^p" and save the result as a text
file. This should have each record on one line.

Now you can use the same technique I suggested before, except of course
for using a different expression for the Op field; you can model this on
the ones for the Type and Status fields.


I am also having problems writing the update query. When I enter
RecordNumber: rgxExtract([F1], "^\d+")
I get a 'The expression you entered has an invalid . (dot) or ! operator or
invalid parentheses.
error

Thanks again

glnbnz said:
I am sorry but I need one more piece of information. I need the name that
appears right under the first line. The [Op] field actually extends all the
way to the beginning of the word [Bond:]

Thanks again

John Nurick said:
Only needing those first few fields makes it a lot simpler. I'll assume that
the records are in a text file, with line breaks where they appear in your
sample data, and that the [Op] field extends to the end of the first line of
the record. If that's right, here's one way to go at it.

1) Import the text file into a new table. In the import wizard, select
Delimited, set the delimiter to the pipe character "|" and the text qualifier
to {none}.

2) The table will have just one field, probably called F1, with one record
for each line of the text file. Use a delete query with this criterion
Not Like "*Type: *Status:*"
to get rid of all the records that don't correspond to the first line of the
text records.

3) Go to table design view and add fields for RecordNumber, Type, Status and
Op.

4) Use an update query with expressions that extract the values from the
original field and put them in the new fields.

There are various ways of building the calculated fields. I'd probably use
my rgxExtract function (downloadable from
http://www.j.nurick.dial.pipex.com/Code/ )
in which case the expressions in the "update to" cell for the various fields
would be like this:

RecordNumber: rgxExtract([F1], "^\d+")

Type: rgxExtract([F1], "Type:\s+(.+)\s+Status:")

Status rgxExtract([F1], "Status:\s+(.+)\s+Op:")

Op: rgxExtract([F1], "Op:\s+(.+)$")

Alternatively you can use the standard string functions such as Instr() and
Mid() to locate "fixed points" such as "Type: " and " Status:" and return the
characters between them.



:

Yes I can. Here is 5 records. To be more specific all I really need is:
RecordNumber:
Type:
Status:
Op:
The rest of the record could be discarded.

100001 Type: GAS Status: SP Op: CONTINENTAL RESOURCES, INC (170979)
PENICK, DOUGLAS #1 Bond: BS777800
Todd County Topo: ALLENSVILLE Carter Coord: 25 C 30 1240FNL 1520FEL
PI, NO RECORDS ON FILE

10156 Type: SRI Status: AI Op: CONTINENTAL RESOURCES, INC (170979)
HOLLIMAN, S H #2 Depth: 2087 Bond: BS777800
Hopkins County Topo: NORTONVILLE Carter Coord: 12 I 25 2520FNL 2360FWL
SRI WELL, ALL RECORDS ON FILE, NO VIOLATIONS, TRANS FROM HAR-KEN OIL CO
10/16/97, TRANS FROM FARRAR OIL CO 9/26/01

10161 Type: OIL Status: PR Op: CONTINENTAL RESOURCES, INC (170979)
WILLIAM & DRAKE HEIRS #4 Depth: 1707 Bond: BS777800
Hopkins County Topo: NORTONVILLE Carter Coord: 8 I 25 170FSL 580FEL
OIL WELL, ALL RECORDS ON FILE, NO VIOLATIONS, TRANS FROM HAR-KEN OIL CO
10/16/97, TRANS FROM FARRAR OIL CO 9/26/01

10162 Type: OIL Status: PR Op: CONTINENTAL RESOURCES, INC (170979)
WILLIAM & DRAKE HEIRS #5 Depth: 2185 Bond: BS777800
Hopkins County Topo: NORTONVILLE Carter Coord: 13 I 25 1020FNL 590FEL
OIL WELL, ALL RECORDS ON FILE, NO VIOLATIONS, TRANS FROM HAR-KEN OIL CO
10/16/97, TWIN WELL TO PERMIT#89337, TRANS FROM FARRAR OIL CO 9/26/01

10362 Type: OIL Status: PR Op: CONTINENTAL RESOURCES, INC (170979)
HOLLIMAN, S H #3 Depth: 2114 Bond: BS777800
Hopkins County Topo: NORTONVILLE Carter Coord: 12 I 25 2950FNL 1500FWL
OIL WELL, ALL RECORDS ON FILE, NO VIOLATIONS, TRANS FROM HAR-KEN OIL CO

Thank you




:

What do you mean by "everything in CAPITAL letters would be its own
field"? I guess you want to parse the string of text into fields
somewhat along these lines:

RecordNumber: 100001
Type: GAS
Status: SP
Op: CONTINENTAL RESOURCES, INC (1780979) PENIC, DOUGLAS #1
Bond: BS777800
Todd County Topo: ALLENSVILLE
Carter Coord: 25 C 1240FNL 1520FEL, PI, NO RECORDS ON FILE

This probably won't be very difficult, but it's *essential* to
understand the task and the data before you start. Can you post half a
dozen or so sample records which between them illustrate the range of
variability in the structure and values - and for each of them also show
exactly what the result needs to be.

On Wed, 14 Feb 2007 09:39:18 -0800, glnbnz

I have a question I hope can be answered.
How can I change information from paragraph form to table form.

Example:
100001 Type: GAS Status: SP Op: CONTINENTAL RESOURCES, INC (170979)
PENICK, DOUGLAS #1 Bond: BS777800
Todd County Topo: ALLENSVILLE Carter Coord: 25 C 30 1240FNL 1520FEL
PI, NO RECORDS ON FILE

I would like to convert the above paragraph into a table.
Everything in CAPITAL letters would be its own field straight across.
This is a part of a large text document with hundreds of records.
Thank you
 
J

jazzmine

I have a question that I hope can be answered:
I have been having trouble importing a large text file in Access 2003. The
file I am importing is a .txt file. The import process seems to randomly
eliminate some rows and duplicate others, so that the total number of rows is
correct but the data is not. Each year should have 12 months associated with
it, but as you can see from this query, there are several years with an
incorrect number of months. I have tried importing this file into several
different Access 2003 databases, both with and without an import spec, to no
avail. The rows that are eliminated/duplicated are different each time I
import the file. However, I was able to import the file using Access 97 with
no problems. Additionally, I was able to import a .csv file of identical
proportions into my Access 2003 database without any trouble. The Access 97
workaround has allowed me to finish my project, but since this is a recurring
quarterly project I would like to fix this import problem for the future.
Thanks!
 
J

jazzmine

I have a question that I hope can be answered:
I have been having trouble importing a large text file in Access 2003. The
file I am importing is a .txt file. The import process seems to randomly
eliminate some rows and duplicate others, so that the total number of rows is
correct but the data is not. Each year should have 12 months associated with
it, but as you can see from this query, there are several years with an
incorrect number of months. I have tried importing this file into several
different Access 2003 databases, both with and without an import spec, to no
avail. The rows that are eliminated/duplicated are different each time I
import the file. However, I was able to import the file using Access 97 with
no problems. Additionally, I was able to import a .csv file of identical
proportions into my Access 2003 database without any trouble. The Access 97
workaround has allowed me to finish my project, but since this is a recurring
quarterly project I would like to fix this import problem for the future.
Thanks!
 
J

John Nurick

Usually this comes down to irregularities in the structure of the text
file - and as you say
there are several years with an
incorrect number of months.
this could be the cause.

Access 97 was more tolerant in this regard than Access 2003, which may
explain why you can import using the older version. Since that's the
case, the simplest work-round is to create a database in Access 97,
import the data into that, and then import from there into Access 2003.

Also, make sure that you have installed Office 2003 Service Pack 2; IIRC
the service packs fixed some import problems in the original Access
2003.
 
J

jazzmine

Hello Jeff

I had the user try this suggestion


One possibility might be to use Word to "parse" the separate "data elements"
into their own column in a (Word) table. After the data has been converted
to a Word table, and cleaned-up (probably will need this, unless all data is
EXACTLY the same), you could import that table into Access.



"Parse" - Open Word. Highlight the data. Click Tables | Convert | Text to
Tables.

and unfortunately this will not work because

The file I am working with has 360,000 rows. Word can only handle ~30,000
in its text-to-table feature, so it looks like this procedure doesn’t fit my
situation.


Any other ideas or suggestions would be very helpful.

Thanks,
Jazzmine
 
Top