What is the overhead of a text column in access?

F

Farhang Y.T.

Hi,
How can I calculate the maximum number of characters that I can insert in an
Access table (I'm using access 2003)? What is the overhead of text column in
Access?

I know that the documentation specifies that the max row size is 4k, which
should allow me to enter 4096 characters when Unicode Compression is set to
Yes.
However, the 4k size also includes the Column/Fields overheads.
So the more columns I have the less data I can actually enter.

For example, given a table with 180 Text columns (20chars each), I can enter
a maximum of 3258 characters --to do this I fill up 163 columns and the rest
are empty (zero length), any more and I get error 3047: Record too large.
3258 This is far less than 4096!

However, as the I reduce the number of fields, I am able to enter more date.
For example, given a table with 180 text columns (20 chars each), I can enter
3335 chars. And in a table with 16 Text columns (250 chars each) I can enter
3989 characters.
 
K

Klatuu

What are you trying to do? Is your question just theory or are you wanting to
do something?
If you can describe what you want to do, perhaps we can help with the how to
do it.

In any case, if you approach anywhere near the limit of the the number of
fields or the 4000 character limit (not 4K for 4096, but 4000), then your
table has a serious design flaw.
 
F

Farhang Y.T.

Dave,
Thanks for your reply. And thank you for clarifying 4000 chars vs. 4k.
What I need to do is to add a validation routine to a program that allows
users to create forms that are linked to a user defined table in a database.
The user decides the design of the table and I have no control over it.
I agree there is a design flaw but that is beyond my control. What I have to
do is to prevent the user from creating forms that can cause "Record too
large" errors (Error 3047).
Since the users can link to any tables that they like I need to be able to
calculate the maximum number of characters that can be handled.
 
K

Klatuu

okay, but I am not sure how that would be done.
If the users can create their own tables and forms, why not just let them
use Access rather than put another layer on top of it?
Sorry, but it doesn't reall seem realistic to me.
--
Dave Hargis, Microsoft Access MVP


Farhang Y.T. said:
Dave,
Thanks for your reply. And thank you for clarifying 4000 chars vs. 4k.
What I need to do is to add a validation routine to a program that allows
users to create forms that are linked to a user defined table in a database.
The user decides the design of the table and I have no control over it.
I agree there is a design flaw but that is beyond my control. What I have to
do is to prevent the user from creating forms that can cause "Record too
large" errors (Error 3047).
Since the users can link to any tables that they like I need to be able to
calculate the maximum number of characters that can be handled.
 
F

Farhang Y.T.

Allowing users to use Access directly is not an option because the program
I'm working on is part of a much bigger system.

What I have to do is add up the field sizes of all the fields that the user
has placed on the form and then add the overhead of all the fields in the
table and if the value is greater than 4000 then I'll pop a message box.

Ex: There is a table with 50 fields with a field size of 100 chars each.
The user adds 20 controls to the form. This will give me 20 * 100 = 2000.
Even with the overhead this value is way less than 4000 so no problem.
However, if the user adds 35 fields to the form then we'll end up with 3500
chars. In this case I have to consider the overhead and warn the user because
3500 + overhead = 4000+ chars.
 
A

a a r o n . k e m p f

I disagree.

Even MS Project comes close to breaking the 255 column limit.
SQL Server has a much brighter future-- JET is obsolete; and anyone
that tells you otherwise is trying to sell you somethnig you don't
need
 
A

a a r o n . k e m p f

and for the record, SQL Server now supports 'sparse columns' so it
would fit what you're trying to do TO A PERFECT FIT
 
B

Brendan Reynolds

The Access 97 Developer's Handbook had a list of the overhead used by each
data type. I'm afraid I've never seen an updated list. I'm really not sure
that what you describe is possible. The ADH says that the figures it gives
were derived from a Microsoft KB article which states that the numbers can
be used to calculate only an estimated record size. Even if the overheads
haven't changed in JET 4 - and they very well may have done - I don't think
you're ever going to be able to calculate the exact size.

With that caveat, here's the info from the ADH ...

Overhead includes ...

Seven bytes per record for record overhead

One byte variable-length column overhead for each text, memo, OLE and
hyperlink column.

One additional byte for every 256 bytes of the total space occupied by all
text, memo, OLE and hyperlink data type columns.

One byte fixed-column overhead for each yes/no, byte, integer, long integer,
counter, single, double, date/time, and replication ID column.

The figures were derived from KB article Q114215, "INF: Estimating Microsoft
Access Table Sizes". I haven't checked if that article is still on-line, it
*might* have been updated for JET 4, on the other hand, it's equally
possible that it may have been archived by now.

Good luck.

--
Brendan Reynolds

Farhang Y.T. said:
Allowing users to use Access directly is not an option because the program
I'm working on is part of a much bigger system.

What I have to do is add up the field sizes of all the fields that the
user
has placed on the form and then add the overhead of all the fields in the
table and if the value is greater than 4000 then I'll pop a message box.

Ex: There is a table with 50 fields with a field size of 100 chars each.
The user adds 20 controls to the form. This will give me 20 * 100 = 2000.
Even with the overhead this value is way less than 4000 so no problem.
However, if the user adds 35 fields to the form then we'll end up with
3500
chars. In this case I have to consider the overhead and warn the user
because
3500 + overhead = 4000+ chars.
 
A

a a r o n . k e m p f

I'd also mention this statistic:

a) text in jet (after Access 97) uses TWICE as much space as it should
because it's all unicode
b) so everything you do is TWICE as big as it should be
c) if you have an index on a text field, it is technically FOUR TIMES
as slow as it should be-- because it's not using optimal datatypes
(for the indexes, and for the base data).

With SQL Server you've got the choice between VarChar and NVarChar.
(ascii and unicode)

-Aaron
 
L

Larry Linson

As I have noted before, I avoid "engaging" Mr. Kempf, and do not even reply
unless his answers are so egregiously erroneous that correction is
imperative. That is the case with this post. I believe the quotations and
comments speak for themselves, so do not intend to respond if Mr. Kempf
chooses to argue.

a a r o n . k e m p f @ g m a i l . c o m said:
I'd also mention this statistic:
a) text in jet (after Access 97) uses TWICE as much
space as it should because it's all unicode
b) so everything you do is TWICE as big as it should be

See the Access Help, quoted (exact and unaltered) below, to refute this
claim. I particularly emphasize the following, extracted from that quote:
"Because the first byte of a Latin character- a character of a Western
European language such as English, Spanish, or German- is 0, Unicode
character representation does not affect how much storage space is required
for compressed data that consists entirely of Latin characters."
c) if you have an index on a text field, it is technically
FOUR TIMES as slow as it should be-- because it's not
using optimal datatypes (for the indexes, and for the
base data).

Mr. Kempf seems extremely knowledgeable about this area, considering that he
only would have had to open Access Help to refute his obvious
misunderstanding regarding Access and Unicode. I'm particularly impressed
that he should have such intimate knowledge of unpublished information to
know that indexed text fields are "FOUR TIMES as slow as it should be."
With SQL Server you've got the choice between
VarChar and NVarChar. (ascii and unicode)

In the case of data in character sets requiring 16 bits per character, one
has to use NVarChar in SQL Server to store them in Unicode or jump through a
few hoops to store them as Double Byte Character Sets. I would assume that
any database as capable as SQL Server would also have Unicode Compression
available.

Text and Memo Fields in Access are all variable length, so store only the
characters used, plus the overhead for the field.

Mr. Kempf's answer regarding Unicode is, at best, misleading, and, at worst,
simply wrong. The following is from the topic "About compressing the data
in a Text, Memo, or Hyperlink field (MDB) " in Access 2003 Help:

"Note The information in this topic applies only to a Microsoft Access
database (.mdb).
Microsoft Access 2000 or later uses the Unicode (Unicode: A character
encoding standard developed by the Unicode Consortium. By using more than
one byte to represent each character, Unicode enables almost all of the
written languages in the world to be represented by using a single character
set.) character-encoding scheme to represent the data in a Text (Text data
type: In a Microsoft Access database, this is a field data type. Text fields
can contain up to 255 characters or the number of characters specified by
the FieldSize property, whichever is less.), Memo (Memo data type: In a
Microsoft Access database, this is a field data type. Memo fields can
contain up to 65,535 characters.), or Hyperlink (Hyperlink data type: A data
type for an Access database field that stores hyperlink addresses. An
address can have up to four parts and is written using the following format:
displaytext#address#subaddress#.) field. Unicode represents each character
as two bytes, so the data in a Text, Memo, or Hyperlink field requires more
storage space than it did in Access 97 or earlier, where each character is
represented as one byte.

To offset this effect of Unicode character representation and to ensure
optimal performance, the default value of the Unicode Compression property
for a Text, Memo, or Hyperlink field is Yes. When a field's Unicode
Compression property is set to Yes, any character whose first byte is 0 is
compressed when it is stored and uncompressed when it is retrieved. Because
the first byte of a Latin character- a character of a Western European
language such as English, Spanish, or German- is 0, Unicode character
representation does not affect how much storage space is required for
compressed data that consists entirely of Latin characters.

In a single field, you can store any combination of characters that Unicode
supports. However, if the first byte of a particular character is not 0,
that character is not compressed.
Data in a Memo field is not compressed unless it requires 4,096 bytes or
less of storage space after compression. As a result, the contents of a Memo
field might be compressed in one record, but might not be compressed in
another record."

Larry Linson
Microsoft Office Access MVP
 
F

Farhang Y.T.

Thanks.
After doing a bunch of tests I came up with the following rule of thumb:

(Sum of Text Field Sizes) + (2 * number if Text Fields) + 150

The 150 is just an arbitrary number that I have chosen that will provide a
safety buffer for whatever unknown overhead.
 
L

Larry Linson

Farhang Y.T. said:
After doing a bunch of tests I came up with the following rule of thumb:

(Sum of Text Field Sizes) + (2 * number if Text Fields) + 150

The 150 is just an arbitrary number that I have chosen that will provide a
safety buffer for whatever unknown overhead.

Remember that Text and Memo Fields are variable-length and the length
defined is a maximum. That is, if you use maximum length in your
calcuation, you are calculating "worst-case" in regards to length.

Larry Linson
Microsoft Office Access MVP
 
F

Farhang Y.T.

Yes, I realize that.

--
Farhang


Larry Linson said:
Remember that Text and Memo Fields are variable-length and the length
defined is a maximum. That is, if you use maximum length in your
calcuation, you are calculating "worst-case" in regards to length.

Larry Linson
Microsoft Office Access MVP
 
T

Tony Toews [MVP]

Chris O'C via AccessMonster.com said:
For data pages that contain records, the overhead is as follows:

Chris

Where do you get those numbers from?

Tony
--
Tony Toews, Microsoft Access MVP
Please respond only in the newsgroups so that others can
read the entire thread of messages.
Microsoft Access Links, Hints, Tips & Accounting Systems at
http://www.granite.ab.ca/accsmstr.htm
Tony's Microsoft Access Blog - http://msmvps.com/blogs/access/
 
T

Tony Toews [MVP]

Chris O'C via AccessMonster.com said:
From studying data pages with a hex editor.

Ahhh. Thanks.

Where you aware of MDB Tools?

MDB Tools is a set of open source libraries and utilities to
facilitate exporting data from MS Access databases (mdb files) without
using the Microsoft DLLs. Thus non Windows OSs can read the data. Or,
to put it another way, they are reverse engineering the layout of the
MDB file.

mdbtools.sourceforge.net Download the file and open the HACKING file.

Tony
--
Tony Toews, Microsoft Access MVP
Please respond only in the newsgroups so that others can
read the entire thread of messages.
Microsoft Access Links, Hints, Tips & Accounting Systems at
http://www.granite.ab.ca/accsmstr.htm
Tony's Microsoft Access Blog - http://msmvps.com/blogs/access/
 
T

Tony Toews [MVP]

Chris O'C via AccessMonster.com said:
Holy smoke! Wish I'd had that info when I was digging for it a while back.
Well, confirms many of my findings with the hex editor.

You're welcome.

<smirk>All you had to do was ask in the newsgroups and I would've
posted that link. </smirk>

Tony (sorry, couldn't help myself. <whack>)
--
Tony Toews, Microsoft Access MVP
Please respond only in the newsgroups so that others can
read the entire thread of messages.
Microsoft Access Links, Hints, Tips & Accounting Systems at
http://www.granite.ab.ca/accsmstr.htm
Tony's Microsoft Access Blog - http://msmvps.com/blogs/access/
 
S

So Sorry For Poor Aaron

a a r o n . k e m p f @ g m a i l . c o said:
. . . call him a TROLL

Trolling may be THE one subject about which poor, poor, pitiful little aaron
knows anything.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top