As I have noted before, I avoid "engaging" Mr. Kempf, and do not even reply
unless his answers are so egregiously erroneous that correction is
imperative. That is the case with this post. I believe the quotations and
comments speak for themselves, so do not intend to respond if Mr. Kempf
chooses to argue.
a a r o n . k e m p f @ g m a i l . c o m said:
I'd also mention this statistic:
a) text in jet (after Access 97) uses TWICE as much
space as it should because it's all unicode
b) so everything you do is TWICE as big as it should be
See the Access Help, quoted (exact and unaltered) below, to refute this
claim. I particularly emphasize the following, extracted from that quote:
"Because the first byte of a Latin character- a character of a Western
European language such as English, Spanish, or German- is 0, Unicode
character representation does not affect how much storage space is required
for compressed data that consists entirely of Latin characters."
c) if you have an index on a text field, it is technically
FOUR TIMES as slow as it should be-- because it's not
using optimal datatypes (for the indexes, and for the
base data).
Mr. Kempf seems extremely knowledgeable about this area, considering that he
only would have had to open Access Help to refute his obvious
misunderstanding regarding Access and Unicode. I'm particularly impressed
that he should have such intimate knowledge of unpublished information to
know that indexed text fields are "FOUR TIMES as slow as it should be."
With SQL Server you've got the choice between
VarChar and NVarChar. (ascii and unicode)
In the case of data in character sets requiring 16 bits per character, one
has to use NVarChar in SQL Server to store them in Unicode or jump through a
few hoops to store them as Double Byte Character Sets. I would assume that
any database as capable as SQL Server would also have Unicode Compression
available.
Text and Memo Fields in Access are all variable length, so store only the
characters used, plus the overhead for the field.
Mr. Kempf's answer regarding Unicode is, at best, misleading, and, at worst,
simply wrong. The following is from the topic "About compressing the data
in a Text, Memo, or Hyperlink field (MDB) " in Access 2003 Help:
"Note The information in this topic applies only to a Microsoft Access
database (.mdb).
Microsoft Access 2000 or later uses the Unicode (Unicode: A character
encoding standard developed by the Unicode Consortium. By using more than
one byte to represent each character, Unicode enables almost all of the
written languages in the world to be represented by using a single character
set.) character-encoding scheme to represent the data in a Text (Text data
type: In a Microsoft Access database, this is a field data type. Text fields
can contain up to 255 characters or the number of characters specified by
the FieldSize property, whichever is less.), Memo (Memo data type: In a
Microsoft Access database, this is a field data type. Memo fields can
contain up to 65,535 characters.), or Hyperlink (Hyperlink data type: A data
type for an Access database field that stores hyperlink addresses. An
address can have up to four parts and is written using the following format:
displaytext#address#subaddress#.) field. Unicode represents each character
as two bytes, so the data in a Text, Memo, or Hyperlink field requires more
storage space than it did in Access 97 or earlier, where each character is
represented as one byte.
To offset this effect of Unicode character representation and to ensure
optimal performance, the default value of the Unicode Compression property
for a Text, Memo, or Hyperlink field is Yes. When a field's Unicode
Compression property is set to Yes, any character whose first byte is 0 is
compressed when it is stored and uncompressed when it is retrieved. Because
the first byte of a Latin character- a character of a Western European
language such as English, Spanish, or German- is 0, Unicode character
representation does not affect how much storage space is required for
compressed data that consists entirely of Latin characters.
In a single field, you can store any combination of characters that Unicode
supports. However, if the first byte of a particular character is not 0,
that character is not compressed.
Data in a Memo field is not compressed unless it requires 4,096 bytes or
less of storage space after compression. As a result, the contents of a Memo
field might be compressed in one record, but might not be compressed in
another record."
Larry Linson
Microsoft Office Access MVP