



Only output required fields


Fields that are not strictly needed in the query result should be left out. Many specialized queries are faster than a few universal ones. They are also more ergonomic: a user who needs a customer's phone number should not be confronted with data that is irrelevant at that moment.

Of course, this does not mean abandoning the generic approach and replacing variable parameters with literal criteria. Fortunately, there is no conflict here with maintainability or ease of use.


Specifically, this means:


  • No SELECT *
  • Hide criteria fields if there are no OR expressions


The content of a simple criteria field is the same in every record anyway, i.e. it carries no information. The category can instead be expressed in the name of the query or as an entry in a combo box that controls the criteria.
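As a sketch of the idea, with hypothetical table and field names:

```sql
-- Unfavorable: outputs every field of the table
SELECT * FROM Kunde;

-- Better: a specialized query that outputs only what is needed
SELECT KdName, KdTelefon
FROM Kunde;
```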


Output only required data records


What was said about fields applies even more to limiting the records that are output. Therefore:


  • Restrict the result set with Where


This is especially true when there are calculated fields: the fewer records for which a per-record calculation has to be carried out, the better.
How to set up WHERE conditions economically is described below.


Save queries


A saved query is significantly faster than SQL code that is dynamically generated in VBA; in my tests by up to 400%.

One might therefore assume that a saved query is also preferable as the RecordSource of a bound form. In principle, that is correct. That an SQL string entered as the RecordSource is nevertheless uncritical is simply because Access secretly creates invisible system queries for it, which are not even visible via View Hidden Objects or System Objects. The same applies to list boxes and combo boxes.

CurrentDb.QueryDefs or a look into the system table MSysObjects reveals such queries, which are easily recognizable by their prefixes:


~sq_fFormName or
~sq_cFormName~sq_cListName


When using older Access versions, check that such system queries have actually been created. If not, do without SQL strings as the record source and save them as queries yourself.
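Assuming read access to the system table is permitted, such system queries could be listed with a query along these lines (the prefix pattern follows the text above):

```sql
-- Lists hidden system queries that Access creates for SQL strings
-- used as RecordSource or RowSource
SELECT Name
FROM MSysObjects
WHERE Name Like '~sq_*';
```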


Only index fields for set-oriented operations


Fields used in a

JOIN

or in one of the clauses

WHERE, GROUP BY, ORDER BY

must be indexed. If such operations are carried out on non-indexed fields, Jet has to resort to sorting and join algorithms whose runtime grows disproportionately with increasing data volume.
In addition, such operations should avoid calculated fields, as these do not allow index usage.
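In Jet SQL, an index on such a field can be created with DDL, for example (table and field names are hypothetical):

```sql
-- Index for a field that appears in JOIN, WHERE, GROUP BY or ORDER BY
CREATE INDEX idxKdName
ON Kunde (KdName);
```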




Sorting

Sorting should be restricted to the minimum necessary. If a result set does not have to be sorted for further processing in code, sorting should be avoided entirely rather than specified out of habit.

Sorting that comes for free via the clustered index, as shown above, without any ORDER BY, performs best (around 500%).

Of course, this can only be used with relatively constant data, since new data records are not sorted in immediately.

Otherwise, sorting should only be carried out on index fields that also support the relevant order of the sorting fields, as described under index planning.

You don't have to worry much more about sorting, as there are hardly any other design options.




Design WHERE clauses


The following comparisons are unproblematic:


  • =
  • <
  • <=
  • >
  • >=
  • Like 'Text*'


All of these operators allow an existing index to be used. Regarding the Like operator, note that the index can be used up to the wildcard character, but not beyond it. The later the wildcard appears, the cheaper the comparison.
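Two sketches of the difference (table and field names assumed):

```sql
-- Index usable up to the wildcard
SELECT * FROM Kunde WHERE KdName Like 'Mei*';

-- Index not usable at all (leading wildcard), see below
SELECT * FROM Kunde WHERE KdName Like '*meier';
```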


The following are problematic:

  • <>
  • Like '*Text'

<> is executed as NOT = (see next point), and with Like '*…' an index is not used at all, continuing what was said above about Like.

If such filtering on trailing field contents appears systematically, the database is incorrectly normalized, and one should consider how the field can be split so that the data can be found as whole or leading field contents.





A NOT in a criterion reliably prevents the index from being used. This also includes the use of <>.

Such expressions should, if at all possible, be transformed using appropriate logical laws. Examples of this can be found below, along with the achievable runtime gains.
There is one important exception to this rule: NOT in conjunction with the EXISTS operator for subqueries is harmless.

Use OR and AND


Often criteria are combined with the logical operators OR and AND. Jet uses a particularly efficient method of evaluating logical expressions known as Rushmore.
For Jet to use this technique, the expressions must meet certain criteria. An article can be found under the keyword Rushmore in the online help; its content is briefly summarized here.


A simple expression has the form:

[Field] comparison operator expression

An optimizable expression has the form

[IndexedField] comparison operator expression

For example, [LastName] = 'Miller' if LastName is indexed.
The following operators are permitted: <, >, =, <=, >=, Between…And, Like, In


In the Access online help, <> is also listed here. This is wrong. With <>, not only is no Rushmore optimization carried out, the index is not used at all.

A complex expression is composed of simple expressions with AND and OR. The following combination options result for a two-valued complex expression:


  • Optimizable expression AND / OR Optimizable expression
  • Optimizable expression AND / OR Non-optimizable expression
  • Non-optimizable expression AND / OR Non-optimizable expression


Of course, by combining complex expressions, you can build further expressions of even greater complexity.

There are three levels of optimizability for a complex expression:


  • Fully optimizable
  • Can be partially optimized
  • Cannot be optimized

How these levels result from the combinations can be seen in the following table.



With the help of logical laws, expressions can be transformed so that they can be optimized, for example

NOT (x < 5 OR y > 3)  →  x >= 5 AND y <= 3


The first expression cannot be optimized despite the assumed indexes on x and y; the second can even be fully optimized.
There is a host of logical theorems for such transformations; the most important in practice, which every programmer should know in their sleep, are:


  • Distributive law of conjunction and disjunction
    A OR (B AND C) = (A OR B) AND (A OR C)
    A AND (B OR C) = (A AND B) OR (A AND C)
  • De Morgan's laws
    NOT (A AND B) = NOT A OR NOT B
    NOT (A OR B) = NOT A AND NOT B
  • Law of the double negation
    NOT NOT A = A


Criteria order


Occasionally one hears the perfectly plausible tip that restrictive criteria should be ordered so that the most restrictive come first. In principle this is correct, yet it is superfluous.
On the one hand, the restrictive effect of criteria depends on the database and is therefore not known to the developer in case of doubt.
Since, on the other hand, the database system keeps internal statistics, it can very well assess the optimal order of criteria and uses this as a basis regardless of the order in the query.
So you don't have to worry about this and in this case you can leave the thinking to Jet.


OR or IN


The general recommendation to use IN instead of OR is likewise obsolete, since Jet chooses the better variant on its own, independently of the SQL code.


Avoiding OR via UNION


Occasionally one stumbles over this tip, but in connection with other DB systems. It is not a good idea to adopt this procedure for Jet. In contrast to the systems that justify the tip, Jet, thanks to Rushmore, is quite capable of using an index with OR expressions and of optimizing them.
A query with OR is significantly faster than a union query with equivalent results.



BETWEEN 1 AND 5 is (minimally) better than >= 1 AND <= 5.

Avoid WHERE on calculated fields

Even the simplest calculations prevent index usage. You can exploit this effect for testing: use [NumberField] + 0 or [TextField] & '' to try out how an indexed field would behave without its index.
As a remedy, one can only try to transform the condition onto the indexed (!) fields that enter into the calculation.
This works in principle with clearly reversible functions.
If necessary, you also have to adapt the table design accordingly.
Suppose a table contains dimensions of cuboids whose length, width, height and volume are to be managed. In addition, the requirement that the volume should be used in WHERE clauses is known, but not length, width and height.
For example, instead of storing (l, w, h) in the table and computing V in a query, you would store (V, l, w) in the table and compute h in a query.
In the case of functions that are not clearly reversible, it can help to break them down into monotonic intervals and form their own conditions from them.
Assume we are looking for the records in which the square of a field x is greater than 4. The function x² is not uniquely invertible, but it can be broken down into the monotonic intervals −∞ to 0 and 0 to +∞.
So instead of (x * x) > 4, it is better to write x < -2 OR x > 2; starting from no index use at all, this improves things so considerably that even Rushmore optimization becomes possible.


WHERE with form reference


Obtaining a criteria value from a control on a form is unproblematic and, for ergonomic reasons, usually unavoidable.
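A typical sketch, with hypothetical form and control names:

```sql
-- Criteria value taken from a combo box on an open form
SELECT idKd, KdName
FROM Kunde
WHERE idKd = [Forms]![frmKunden]![cboKunde];
```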


Examples of optimization


All examples are based on the use of indexed fields.

Check for NULL
IsNull([Field])
→ [Field] Is Null
Speed advantage approx. 1500%

Check for not NULL
Not IsNull([Field])
→ with text: >= '' (attention: no space between the quotes)
→ with numbers: < 0 OR >= 0
Speed advantage approx. 700%

Check for inequality with a number
x <> 5 (NOT x = 5)
→ x < 5 OR x > 5
Speed advantage approx. 600%

Check for year
Year([Date]) = 2004
→ #1/1/2004# <= [Date] AND [Date] <= #12/31/2004#
→ [Date] BETWEEN #1/1/2004# AND #12/31/2004#
Speed advantage approx. 2700%

Check for 0, positive, negative with a calculated field
Fields a, b; calculated field x: a * b
x = 0 → a = 0 OR b = 0
x > 0 → (a > 0 AND b > 0) OR (a < 0 AND b < 0)
x < 0 → (a > 0 AND b < 0) OR (a < 0 AND b > 0)
Speed advantage approx. 700%

Check for NOT IN a set

NOT IN (3, 4, 5)
NOT IN (3, 4, 5) → NOT (x = 3 OR x = 4 OR x = 5)
(resolving the IN)
NOT (x = 3 OR x = 4 OR x = 5) → NOT x = 3 AND NOT x = 4 AND NOT x = 5
(De Morgan)
NOT x = 3 AND NOT x = 4 AND NOT x = 5 → x <> 3 AND x <> 4 AND x <> 5
(resolving NOT = into <>)
x <> 3 AND x <> 4 AND x <> 5 → (x < 3 OR x > 3) AND (x < 4 OR x > 4) AND (x < 5 OR x > 5)
(resolving <> into < and >)
Result: (x < 3 OR x > 3) AND (x < 4 OR x > 4) AND (x < 5 OR x > 5)
Gain approx. 500%
Another example of replacing NOT IN, where the criteria set is itself a result set, can be found in the Subqueries chapter.

Check for NOT in a range

NOT (x >= 3 AND x <= 12)
→ NOT x >= 3 OR NOT x <= 12
→ x < 3 OR x > 12

Result: x < 3 OR x > 12
Gain approx. 500%

Move the value check to the input variable
Fields x, y with calculated y: x * 0.12 + 24
(solving for x: x = (y - 24) / 12 * 100)
y > 100
→ x > 633
Gain approx. 3700%

Eliminate redundant conditions
(A = 3 AND B = 2 AND C = 5) OR (A = 3 AND B = 2 AND C = 12)
→ A = 3 AND B = 2 AND (C = 5 OR C = 12)
The first line often arises when conditions are entered in the graphical query designer.

Gain approx. 20%, with A, B, C all indexed.

If non-indexed and indexed fields are mixed, the transformation can in the best case even turn no index usage at all into Rushmore usage. The gain can therefore be greater.


Check for the beginning of a text
Left([Field], 1) = 'M'
→ [Field] Like 'M*'
→ [Field] >= 'M' AND [Field] < 'N'
Gain approx. 18900% for variant 3 against variant 1; 800% for variant 3 against variant 2





This topic is now much less important than in older Access versions, as the Jet expression service has improved noticeably over time. Nevertheless, some tips for optimizing arithmetic expressions follow, even though they are of little interest in modern versions.


Calculated fields

Text vs. number

Common data types, in descending order of speed:

  • Integer
  • Floating-point number / date
  • Text

Common operators, in descending order of speed:

  • +, -, *, \, Mod with (whole) numbers
  • / and &, as well as + with text; IIf (new versions)
  • Nz, IIf (old versions), functions provided by the expression service, such as Left, Mid, Right, Year, Month
  • Self-written VBA functions


Transforming expressions

Below are some examples that can be particularly useful in older versions. Check in each individual case what is faster under the given conditions.

Division by a number that could also be 0, where the return value should then be NULL:
IIf(x = 0, Null, y / x)
→ y / (x Or Null)
Do not use this expression if x can also take the value -1.

Negative numbers should be set to 0:
IIf(x < 0, 0, x)
→ (x > 0) * -x

Negative numbers should be set to NULL:

IIf(x < 0, Null, x)
→ ((x > 0) Or Null) * -x

The value 1 should yield 7, the value 2 should yield 12 (e.g. discount categories):
IIf(x = 1, 7, IIf(x = 2, 12, 0))
Better, a linear combination with truth values:
-(x = 1) * 7 - (x = 2) * 12
or, if it is certain that x can only be 1 or 2, a polynomial:
5 * x + 2
If the domain of x is {0, 1, 2}, then:
-x * (x - 8)

In general, n arbitrary conditions can always be expressed as an n-dimensional linear combination, and n conditions of the type "=" over a closed domain as a polynomial of at worst degree n-1.

Filling in separating spaces automatically:

IIf(IsNull([Title]), '', [Title] & ' ') & [FirstName] & ' ' & [LastName]
→ ([Title] + ' ') & [FirstName] & ' ' & [LastName]

In concatenation with &, the Null value behaves as a neutral element; in concatenation with +, it behaves as the dominant element.

Gender is coded with 1 (male) and 2 (female). From this, the letters 'm' or 'w' are to be generated:

IIf(x = 1, 'm', 'w')
→ Chr(10 * x + 99)

Gender is coded with m and w. The key figures 1 and 2 are to be generated from it:

IIf(x = 'm', 1, 2)
→ (Asc(x) - 99) / 10

Convert a NULL value in a string field to an empty string:

Nz([Field], '')
→ '' & [Field]

Resolve powers into multiplication:

x ^ 2
→ x * x

Avoid Exp and Log where possible:

Exp(2 * Log(x))
→ x * x

Alternative: calculation in the form
Arithmetic expressions in forms are not calculated faster than in queries, but if the query serves as the basis of a continuous form, screen rendering behaves considerably better with form-based calculations.


With current Access versions, the time required for calculations on field values within a record is usually not critical. Far worse blunders are possible, for example placing a condition on such a calculated field.



Domain functions


Instead of using a domain function (DLookup, DSum, etc.), the related table should be linked by means of a join, combined where necessary with the corresponding SQL aggregate functions, or a subquery should be constructed. Domain functions are, for a variety of reasons, typically the slowest of all ways to achieve a result.





Design JOINS
In many DB systems there is some optimization potential here. In modern Jet versions, however, a surprising amount is automatically optimized by the system.


The basic advice not to set up a join classically via WHERE, but with a real JOIN, is irrelevant for Jet queries.

Classic inner join
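The example might look like this (table and field names are assumptions):

```sql
-- Join formulated classically via the WHERE clause
SELECT A.*, B.*
FROM TableA AS A, TableB AS B
WHERE A.id = B.fiA;
```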



Explicit inner join
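The same join in explicit syntax (names assumed as above):

```sql
-- Join formulated with explicit JOIN syntax
SELECT A.*, B.*
FROM TableA AS A
INNER JOIN TableB AS B ON A.id = B.fiA;
```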



Both syntax variants are executed by Jet in the optimal form as INNER JOIN!
This is not meant as a plea for the old syntax (first example), but it shows how far Jet's optimization goes.



JOINS with conditions in a data field


In principle, there are two options: include additional conditions in the join condition or in a WHERE clause:






or very classic
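The three variants could be sketched as follows; A.Krit stands for an assumed, indexed criteria field:

```sql
-- 1) Additional condition inside the join condition
SELECT A.*, B.*
FROM TableA AS A
INNER JOIN TableB AS B ON A.id = B.fiA AND A.Krit = 5;

-- 2) Additional condition in a WHERE clause
SELECT A.*, B.*
FROM TableA AS A
INNER JOIN TableB AS B ON A.id = B.fiA
WHERE A.Krit = 5;

-- 3) Very classic: join and condition both in WHERE
SELECT A.*, B.*
FROM TableA AS A, TableB AS B
WHERE A.id = B.fiA AND A.Krit = 5;
```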



Indeed, all three variants are executed exactly the same:



It should be remembered once again that the A.Krit field must of course be indexed in order to achieve the result “using rushmore” or at least “using index”.

Jet always carries out the optimal procedure, first evaluating the restricting conditions for each table and only then joining the remaining records, regardless of how the condition is formulated.



JOINS with conditions in the link field


There is, however, optimization potential when joining with criteria on the link field of a 1:n relationship, probably the most common case in a database.
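The two variants compared here might look like this, for a customer table (1-side) and an order table (n-side) with assumed names:

```sql
-- Restriction on the link field of the n-side (foreign key)
SELECT K.*, B.*
FROM Kunde AS K
INNER JOIN Bestellung AS B ON K.idKd = B.fiKd
WHERE B.fiKd = 3;

-- Restriction on the link field of the 1-side (primary key)
SELECT K.*, B.*
FROM Kunde AS K
INNER JOIN Bestellung AS B ON K.idKd = B.fiKd
WHERE K.idKd = 3;
```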






Both queries deliver the same result; the second, with the restriction on the 1-side, does so 100 to 200% faster.
The execution plans of both queries are also quite different.



The after-the-fact explanation is easy once the results are known: on the 1-side a given value naturally occurs less often (namely once) than on the n-side, which makes the restriction more effective.



Table order

The picture here was somewhat mixed.
1) (A x B) x C



2) (A x C) x B



Both queries deliver the same result set and look exactly the same in the graphical design view.

In some cases Access rewrote the query variants to the same SQL syntax; in others the different bracketing was retained but the execution plans were identical; and in still others the execution plans, and with them the runtimes, differed for the two queries.

When and which behavior occurs seems primarily to depend on whether relationships with referential integrity exist between the fields used in the join.

Changing the bracketing, i.e. the order of the joins, is not possible in the graphical design view; it must be done in the SQL code.

As a rule of thumb for the order of the joins: bracket the join with the greater selectivity, i.e. the one that delivers the smaller result set, innermost, so that it is executed first.

Careful, trap: these are by no means necessarily the tables with the fewest records, but the pairs of tables with the fewest matches in the link field.

This is hard to assess without an empirical test, since the selectivity depends on the data distribution. So check the two queries that result from the middle table for the number of records returned.


In the example this would be A x B and A x C, since A is linked to the other two tables.

Note that Jet may not allow all orders when outer joins occur.

If all joins run through referential-integrity relationships, it seems one can rely on Jet itself finding the optimal join order. Intervening in the SQL code to influence the order is then unnecessary.
To be on the safe side, consult the ShowPlan output so that you know what is happening and need not rely on guesswork.


Comparison operators

If possible, use equi joins, i.e. those with = in the link expression.
Theta joins, i.e. comparisons such as

  • A.id < B.id
  • A.id <= B.id
  • A.id = B.id * 2

lead to a cross join without index usage.


INNER and OUTER joins

Outer joins are a little slower than inner joins. But since the two are not interchangeable, the outer join has to be accepted when the requirement demands it. At least the outer join can use indexes, so it by no means slips into the cross-join league.
However, if in an individual case you have used an outer join that, due to the data situation (required input), cannot deliver any other result than an inner join, you should replace it with the inner join.


JOIN or subquery

Legend has it that subqueries are often very slow and that a join is always the better choice. It is indeed true that there are many cases in which the join performs significantly better in a join-vs-subquery comparison. However, there are also constellations in which the opposite is true.
The question of when a subquery and when a join has advantages is dealt with in a separate chapter below.



Use index fields
Repetitiones non placent, says the Roman: repetitions do not please. But he also says: Quae nocent, docent: what hurts, teaches.
So once again: Groupings should only be made on indexed fields.


WHERE instead of HAVING
With WHERE, criteria are applied before the grouping; with HAVING, after it. From a purely logical point of view, HAVING therefore only makes sense on a field that first arises in the course of the grouping, through aggregation.

This should be taken to heart with other DB systems and in older Access versions; in newer versions, Jet thinks along: a nonsensical HAVING clause is simply translated to WHERE at execution time.

You can check the behavior with the ShowPlan and design your queries accordingly.
Usually, HAVING on non-aggregated fields does not come about on purpose, but through a peculiarity of the graphical query designer:

In a grouped query, an entry in the criteria row leads, nonsensically, to a HAVING clause, whereas in a non-grouped query it leads to a WHERE clause. The WHERE clause is hidden behind the "function" entry Where (condition) in the Total row.
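A sketch of the difference; table and field names are assumptions:

```sql
-- Nonsensical: non-aggregated criterion in HAVING
SELECT fiKd, Sum(Amount) AS Total
FROM Bestellung
GROUP BY fiKd
HAVING fiKd = 3;

-- Correct: filter before grouping with WHERE
SELECT fiKd, Sum(Amount) AS Total
FROM Bestellung
WHERE fiKd = 3
GROUP BY fiKd;
```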


Avoid subsequent groupings

Suppose you are grouping by customer number in order to calculate revenue totals from orders or the like.
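A sketch of such a grouping; the Amount field in the order table is an assumption:

```sql
-- Revenue total per customer number
SELECT fiKd, Sum(Amount) AS Revenue
FROM Bestellung
GROUP BY fiKd;
```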






Now the name of the customer should be displayed:
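The obvious path is to group on the name as well (sketch, names assumed):

```sql
-- Subsequent grouping on the name field
SELECT K.idKd, K.KdName, Sum(B.Amount) AS Revenue
FROM Kunde AS K
INNER JOIN Bestellung AS B ON K.idKd = B.fiKd
GROUP BY K.idKd, K.KdName;
```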






That is probably the most commonly chosen path. But since the name is already uniquely determined by the customer number, the subsequent grouping on it is not only pointless but even harmful, since groupings cause computational effort.

Better is:
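A sketch of the First variant; Min([KdName]) would work analogously:

```sql
-- Dependent field fetched with First instead of being grouped
SELECT K.idKd, First(K.KdName) AS KdName, Sum(B.Amount) AS Revenue
FROM Kunde AS K
INNER JOIN Bestellung AS B ON K.idKd = B.fiKd
GROUP BY K.idKd;
```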






First simply outputs the first value encountered. Since after grouping, for example on idKd = 3, all the associated name values are 'Donaubauer' anyway, it does not matter which of them is taken.

First is cheap precisely because the value sits right at the beginning and does not have to be searched for.
Min outputs the smallest value; since the values are identical, it does not matter either.

In this constellation you are spoilt for choice. This must not be generalized: normally First and Min have different effects, but if all the available values within a group are identical anyway, they do the same thing.

So which one should you choose?

If the dependent field, here the name, is not indexed, First is normally faster; if it is indexed, Min is faster.

The gain of First/Min compared to an additional GROUP BY is around 30% with little data and one dependent field; with large amounts of data and several dependent fields, it can add up to a lot.

It is best to try whether First or Min is faster in individual cases.


Group By in the join


You can significantly improve the previous result by putting the grouping on the correct side of the join. So not as above:






Doing the grouping on the n-side instead of on the 1-side brings an improvement of 40% to 60%.


Handling of aggregate functions


Count(*)
Deviating from the basic rule "avoid *" stated at the very beginning, for the aggregate function Count it holds that Count(*) is better than Count([Field]), since the Count(*) syntax has been specially optimized for Rushmore.

"Real" aggregates
... like Sum, Avg and First/Last do not benefit from an index. For a sum, for example, an index is irrelevant, since all values have to be added in any case, and thanks to the commutative law their order does not matter. Fields that are required only for such aggregates therefore do not have to be indexed.

“Fake” aggregates
... like Min, Max and Count benefit enormously. On an indexed field, the algorithm needed to determine the minimum, for example, reduces to simply reading the first index entry. Such fields should therefore be indexed.

Unique records
To get unique data records from a field with repeated values, there are two approaches:
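The two approaches might be written as follows (table and field names assumed):

```sql
-- Variant 1: GROUP BY
SELECT KdOrt FROM Kunde GROUP BY KdOrt;

-- Variant 2: DISTINCT
SELECT DISTINCT KdOrt FROM Kunde;
```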






The advantage of DISTINCT compared to GROUP BY is around 1400% here.



UNION queries


Avoid with E-R model
Experience has shown that around three quarters of all Union queries originate from an incorrect data model. Frequently seen requirements are, for example:

For a telephone list you want to combine fields from a customer table and a supplier table or addresses from a table for lecturers and a table for students for a circular or the like.

A correct model would, for example, keep a company table containing the data fields common to customers and suppliers, and attach the customer- and supplier-specific data in specialization tables via a 1:1/0 relationship each.

What is to be achieved by the union query in the faulty model is already available as a table in the correct model.

Of course, a clean model is a little more complex because customers can also be people. Since that is not our topic here, reference is made to my script for database normalization, which describes the correct table structure.

Gains of 1200% - 1500% could be measured as speed advantages through this measure.


For union queries that remain necessary even in a proper data model, it is better to use UNION ALL SELECT instead of UNION SELECT.

One has to know that a DISTINCT is built into the plain UNION; duplicate records are therefore eliminated from the result. If duplicates do not bother you, or cannot occur anyway given the data, UNION ALL switches off this automatism.

The speed advantage here is on the order of 80%.
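A sketch with assumed tables and fields:

```sql
-- UNION eliminates duplicate records (built-in DISTINCT)
SELECT KdName FROM Kunde
UNION
SELECT LfName FROM Lieferant;

-- UNION ALL skips the duplicate elimination
SELECT KdName FROM Kunde
UNION ALL
SELECT LfName FROM Lieferant;
```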




Subqueries

A subquery is a stand-alone query used as:

  • a comparison in a WHERE clause
  • the source of a single field, if it is unique
  • the data source in FROM (derived table; strictly speaking not a subquery in the narrower sense)



Advantages and disadvantages
The biggest hurdle is undoubtedly that those spoiled by the graphical query designer have to write text-based SQL. In addition, there are syntax elements specific to subqueries, such as EXISTS and the set predicates.

Once you have solved this purely organizational problem, by finally keeping your New Year's resolution to learn the complete SQL syntax by heart, little stands in the way of meaningful use.

One differentiates:

  • Correlated subqueries (with a reference to the main query): these are executed n times → observe all performance rules inside the subquery!
  • Uncorrelated subqueries (without a reference to the main query): these are executed only once and make their result available to all records of the main query

Some queries can only be solved with subqueries.



Transformation into JOINS


Correlated subqueries can usually also be formulated as a join, which is often the faster alternative. But this should not be assumed unchecked; there are enough cases in which it is the other way around. Verify in each individual case which solution is the cheapest.

Criterion in external table
One of the typical applications for subqueries is the filtering of a result set according to a criterion that is obtained from another table.



  • Process (idVrg, VrgBezeichnung, fiPrs)
  • Personnel (idPrs, PrsName, fiAbt)
  • Department (idAbt, AbtName)

with Process n:1 Personnel n:1 Department

Query: find all processes that belong to department 3. The information is contained as the foreign key fiAbt in Personnel; in general: "filter data from table A according to a criterion in table B that is linked to A".

With subquery in the WHERE clause and IN:
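A sketch with the tables from above:

```sql
-- Processes of department 3, subquery with IN
SELECT *
FROM Process
WHERE fiPrs IN
      (SELECT idPrs FROM Personnel WHERE fiAbt = 3);
```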



With subquery in the WHERE clause and set predicate:
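A sketch with the set predicate ANY:

```sql
-- Processes of department 3, subquery with = ANY
SELECT *
FROM Process
WHERE fiPrs = ANY
      (SELECT idPrs FROM Personnel WHERE fiAbt = 3);
```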



With subquery with EXISTS:
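A sketch with a correlated subquery:

```sql
-- Processes of department 3, correlated subquery with EXISTS
SELECT *
FROM Process
WHERE EXISTS
      (SELECT * FROM Personnel
       WHERE Personnel.idPrs = Process.fiPrs
         AND Personnel.fiAbt = 3);
```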



These typical subqueries can be formulated as a join as follows:
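A sketch of the join formulation:

```sql
-- Processes of department 3 as a join
SELECT Process.*
FROM Process
INNER JOIN Personnel ON Process.fiPrs = Personnel.idPrs
WHERE Personnel.fiAbt = 3;
```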



The join variant is around 200% faster than the subqueries.

At this point I would like to recall the alternative key structure, in which detail tables inherit all key information from all higher-level tables. With it, the query could be resolved without any join or subquery at all, since the department number, as part of the key, would already be contained in the process table.


Biggest / smallest
Another typical application for a subquery: Find the data record that has the largest (smallest) value in the criteria field.


  • Order (idBst, BstDatum, fiKd, ...)

Inquiry: Find the latest orders

Subquery with set predicate:


IMAGE page 53
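The missing example might look like this; the table name must be bracketed because ORDER is a reserved word:

```sql
-- Latest orders via set predicate
SELECT idBst, BstDatum, fiKd
FROM [Order]
WHERE BstDatum >= ALL (SELECT BstDatum FROM [Order]);
```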


To express this query without a subquery, you can use the TOP predicate:


IMAGE page 53
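A sketch of the TOP variant (the table name is bracketed because ORDER is a reserved word):

```sql
-- Latest orders via TOP 1 with descending sort
SELECT TOP 1 idBst, BstDatum, fiKd
FROM [Order]
ORDER BY BstDatum DESC;
```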


Note that TOP 1 by no means necessarily delivers only one record. If, in the example, several orders were placed on the last order date, all of these records are output with the field selection shown.

If TOP 1 were to be used on a non-unique field to determine the maximum as a return value, a SELECT DISTINCT TOP 1 would be required.

In the present example, however, it is precisely desirable that several records appear if several share the maximum value. The subqueries behave in the same way.

Subquery with aggregate:


IMAGE page 54
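A sketch of the aggregate variant:

```sql
-- Latest orders via subquery with aggregate function
SELECT idBst, BstDatum, fiKd
FROM [Order]
WHERE BstDatum = (SELECT Max(BstDatum) FROM [Order]);
```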


Here the subquery solution with >= ALL is clearly the slowest, followed by the TOP 1 solution. The subquery with the aggregate function is the undefeated winner.

This last query is 20,600% better than TOP 1 and 128,000% better than the query with the set predicate.

Now two more generic examples in which the pendulum swings significantly in favor of the subquery:

1: n master with detail ("customers with order")
Inner join with Distinct


IMAGE page 54
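A sketch; the customer table and its fields are assumptions:

```sql
-- Customers with at least one order, via join and DISTINCT
SELECT DISTINCT K.idKd, K.KdName
FROM Kunde AS K
INNER JOIN [Order] AS B ON K.idKd = B.fiKd;
```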


Exists with subquery


IMAGE page 54
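A sketch with EXISTS (same assumed names):

```sql
-- Customers with at least one order, via EXISTS
SELECT K.idKd, K.KdName
FROM Kunde AS K
WHERE EXISTS
      (SELECT * FROM [Order] AS B WHERE B.fiKd = K.idKd);
```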


In the test, the subquery turned out to be 2,300% up to 12,700% (!) faster. In addition, unlike SELECT DISTINCT, its result can be edited.

1: n master without detail ("customers without order")
Outer join with Where Is Null


IMAGE page 54
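A sketch of the outer-join variant (names assumed):

```sql
-- Customers without any order, via outer join
SELECT K.idKd, K.KdName
FROM Kunde AS K
LEFT JOIN [Order] AS B ON K.idKd = B.fiKd
WHERE B.fiKd Is Null;
```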


Not (!) Exists with subquery
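A sketch with NOT EXISTS (names assumed):

```sql
-- Customers without any order, via NOT EXISTS
SELECT K.idKd, K.KdName
FROM Kunde AS K
WHERE NOT EXISTS
      (SELECT * FROM [Order] AS B WHERE B.fiKd = K.idKd);
```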



Advantage of the second solution in the test: 1,500% to 17,000% (!!).
This can also serve as a replacement to avoid a subquery with NOT IN, such as:



One could certainly make general statements about the constellations in which solutions with or without a subquery are preferable. However, the rules in question are so complex and so riddled with exceptions, for whose occurrence there are of course further complex rules, that on the one hand this would go beyond the scope of this script and on the other hand simply testing the alternatives leads to the goal more quickly.

Therefore only a few non-binding rules of thumb, the applicability of which should be tested in each individual case.

1) Comparisons with the set predicate ALL, such as

x >= ALL (SELECT …)

are rather unfavorable, since all results of the subquery have to be compared.

The same applies to comparisons with

NOT IN (SELECT …)

2) Comparisons with the set predicate ANY, such as

x = ANY (SELECT …)

are cheaper, since they can be aborted after the first hit. Comparisons with

IN (SELECT …)

behave similarly.

3) Comparisons with a single aggregate value, such as

x = (SELECT Max(…) FROM …)

are favorable, since the comparison is made against only one value, which, on an indexed field, can moreover be found quickly by the aggregate functions Min/Max.

4) Comparisons using

EXISTS (SELECT …) and NOT EXISTS (SELECT …)

are usually extremely fast; the latter even despite the NOT.