Distinct keyword in query

M

MoinJ

I have a query like 'Select distinct class_id from product where col1 like
'%...%''. Now, in this particular table i have around 400k records and the
class_id and col1 fields are indexed. This query takes me around 4-5 mins to
execute. Is there any other way to get me all the distinct class_id's?
 
B

Brendan Reynolds

You could try using GROUP BY, but I can't promise that it will be any
faster ...

SELECT class_id FROM product WHERE col1 LIKE '%...%' GROUP BY class_id

If this is a Jet query, using LIKE in this way with both leading and
trailing wildcards prevents Jet from using any index that may be present on
the col1 field. You may see a significant performance improvement if you can
modify the query in a way that would permit Jet to use an index. Jet can use
an index with a trailing wildcard, e.g. LIKE 'sometext%', or you may be able
to use BETWEEN or >= something AND <= somethingelse instead of LIKE.
 
D

Dirk Goldgar

MoinJ said:
I have a query like 'Select distinct class_id from product where col1
like '%...%''. Now, in this particular table i have around 400k
records and the class_id and col1 fields are indexed. This query
takes me around 4-5 mins to execute. Is there any other way to get me
all the distinct class_id's?

Are you using ANSI-compatible SQL, so that the '%' character matches
multiple characters? Normally, in an .mdb file and Jet SQL, the '*'
character is used for that. I only mention that in case you find you
don't get the results you expect.

The problem is with the leading wildcard character. That prevents the
database engine from using the index you have on col1. If the criterion
used just the trailing wildcard, then the index could be used.

If the query

SELECT DISTINCT class_id, col1 FROM PRODUCT

significantly reduces the number of records, then *maybe*

SELECT DISTINCT T.class_ID FROM
(SELECT DISTINCT class_id, col1 FROM PRODUCT) AS T
WHERE T.col1 LIKE '%[foo]%'

would help.

If that doesn't make things any better, then maybe the problem can be
refined somewhat. For example, if you are looking for a specific word
in a space-delimited or punctuated list of words, then maybe the field
can be parsed into words at the time the record is saved, and the
associated words can be saved, indexed, in a related table. In that
case you could write a query that searches the Keywords table and
returns only products that match the word you're looking for.
 
M

MoinJ

You could try using GROUP BY, but I can't promise that it will be any
faster ...

SELECT class_id FROM product WHERE col1 LIKE '%...%' GROUP BY class_id

I tried using GROUP BY too, but it did not make any difference.
If this is a Jet query, using LIKE in this way with both leading and
trailing wildcards prevents Jet from using any index that may be present on
the col1 field. You may see a significant performance improvement if you can
modify the query in a way that would permit Jet to use an index. Jet can use
an index with a trailing wildcard, e.g. LIKE 'sometext%'

I modified the query to LIKE 'sometext%' and i saw significant performance
improvent, but when i did something like LIKE '%sometext' the performance was
still bad.
or you may be able
to use BETWEEN or >= something AND <= somethingelse instead of LIKE.

I cannot use the above clauses as i still need to search for '%sometext%'
throughout the mdb in col1. Is is possible to use the above clauses to search
for text data?

Also a query like 'SELECT product_id,col1,col3 where col1 LIKE '%sometext%'
takes the equal amount of time to execute which has no DISTINCT keyword.
 
M

MoinJ

If running the query in the design window of access i do use '*', but since
i am running the SQL through VB, i gave u that syntax, so that is not the
problem.

If i use use just the trailing wildcard then it wont solve my purpose as i
want all the records from product where i find '%[foo]%' in the col1 field
from the entire mdb.

I tried using the query below too, but that did not make any difference.

Dirk Goldgar said:
MoinJ said:
I have a query like 'Select distinct class_id from product where col1
like '%...%''. Now, in this particular table i have around 400k
records and the class_id and col1 fields are indexed. This query
takes me around 4-5 mins to execute. Is there any other way to get me
all the distinct class_id's?

Are you using ANSI-compatible SQL, so that the '%' character matches
multiple characters? Normally, in an .mdb file and Jet SQL, the '*'
character is used for that. I only mention that in case you find you
don't get the results you expect.

The problem is with the leading wildcard character. That prevents the
database engine from using the index you have on col1. If the criterion
used just the trailing wildcard, then the index could be used.

If the query

SELECT DISTINCT class_id, col1 FROM PRODUCT

significantly reduces the number of records, then *maybe*

SELECT DISTINCT T.class_ID FROM
(SELECT DISTINCT class_id, col1 FROM PRODUCT) AS T
WHERE T.col1 LIKE '%[foo]%'

would help.

If that doesn't make things any better, then maybe the problem can be
refined somewhat. For example, if you are looking for a specific word
in a space-delimited or punctuated list of words, then maybe the field
can be parsed into words at the time the record is saved, and the
associated words can be saved, indexed, in a related table. In that
case you could write a query that searches the Keywords table and
returns only products that match the word you're looking for.

--
Dirk Goldgar, MS Access MVP
www.datagnostics.com

(please reply to the newsgroup)
 
B

Brendan Reynolds

Probably the greatest single factor influencing query performance is whether
the database engine can make use of an index/indexes or not - as your own
experience has demonstrated. If there really is no way that you can achieve
the result you require without using a leading wildcard, then I'm afraid you
will not get good performance.
 
D

Dirk Goldgar

MoinJ said:
If running the query in the design window of access i do use '*',
but since i am running the SQL through VB, i gave u that syntax, so
that is not the problem.

If i use use just the trailing wildcard then it wont solve my purpose
as i want all the records from product where i find '%[foo]%' in the
col1 field from the entire mdb.

As Brendan said, if you can't find a way to avoid the leading wildcard,
your query is always going to have to examine all records without
benefit of the index. With that many records, a full table scan is
going to be slow, and there's nothing you can do about it.

The only thing you can do about this (if you can) is restructure your
data so that you can search without the leading wildcard. If, for
example, you are using the wildcards to search in just one part of a
so-called "intelligent key" -- one key field that is made up of several
parts concatenated together, like a category code, a type code, and a
number -- then the answer is to decompose that field into separate
fields, each of which is indexed separately. Then your query can test
for equality on just one field.

But if that's not the nature of your data, there is really nothing I can
think of to improve your query performance.
 
M

MoinJ

I want to know whether the performance with MS Access 97 will remain the same
as compared to 2002? Performance wise which version is better? Please let me
know as i have to decide on which version to use.

Thanks.

Dirk Goldgar said:
MoinJ said:
If running the query in the design window of access i do use '*',
but since i am running the SQL through VB, i gave u that syntax, so
that is not the problem.

If i use use just the trailing wildcard then it wont solve my purpose
as i want all the records from product where i find '%[foo]%' in the
col1 field from the entire mdb.

As Brendan said, if you can't find a way to avoid the leading wildcard,
your query is always going to have to examine all records without
benefit of the index. With that many records, a full table scan is
going to be slow, and there's nothing you can do about it.

The only thing you can do about this (if you can) is restructure your
data so that you can search without the leading wildcard. If, for
example, you are using the wildcards to search in just one part of a
so-called "intelligent key" -- one key field that is made up of several
parts concatenated together, like a category code, a type code, and a
number -- then the answer is to decompose that field into separate
fields, each of which is indexed separately. Then your query can test
for equality on just one field.

But if that's not the nature of your data, there is really nothing I can
think of to improve your query performance.

--
Dirk Goldgar, MS Access MVP
www.datagnostics.com

(please reply to the newsgroup)
 
D

Dirk Goldgar

MoinJ said:
I want to know whether the performance with MS Access 97 will remain
the same as compared to 2002? Performance wise which version is
better? Please let me know as i have to decide on which version to
use.

That's really a new question, don't you think? In terms of query
performance, what you're really talking about is the difference between
Jet 3.5 and Jet 4.0, so maybe you should search the web for comparisons
between them. I don't really have any information on that subject, as
far as performance goes. Concerning this particular query, I don't
think there's going to be any significant difference, because the nature
of the query forces a full table scan of a large table. Even SQL Server
can't use normal indexes in a case like that, though it does have
support for full-text indexes (about which I know next to nothing).
 
Top