Update documentation to explain use of EXISTS in SQL queries

Added an explanatory section to the "exports.rst" doc to clarify why to use an EXISTS subquery instead of a JOIN clause in SQL queries involving many-to-* relationships. This explanation includes sample SQL queries and results to illustrate the potential issue of duplicates with JOIN and count, and how EXISTS can help avoid this issue. Also updated the ".editorconfig" file for .rst files.
This commit is contained in:
Julien Fastré 2024-03-06 12:31:35 +01:00
parent c888b5b84f
commit 09578a775c
Signed by: julienfastre
GPG Key ID: BDE2190974723FCB
2 changed files with 130 additions and 0 deletions

View File

@ -23,3 +23,7 @@ max_line_length = 0
indent_size = 2
indent_style = space
[.rst]
ident_size = 3
ident_style = space

View File

@ -242,3 +242,129 @@ This is an example of the *filter by birthdate*. This filter asks some informati
Continue to explain the export framework
.. _main bundle: https://git.framasoft.org/Chill-project/Chill-Main
With many-to-* relationship, why should we set WHERE clauses in an EXISTS subquery instead of a JOIN ?
``````````````````````````````````````````````````````````````````````````````````````````````````````
As we described above, the doctrine builder is converted into a sql query. Let's see how to compute the "number of course
which count at least one activity type with the id 7". For the purpose of this demonstration, we will restrict this on
two accompanying period only: the ones with id 329 and 334.
Let's see the list of activities associated with those accompanying period:
.. code-block:: sql
SELECT id, accompanyingperiod_id, type_id FROM activity WHERE accompanyingperiod_id IN (329, 334) AND type_id = 7
ORDER BY accompanyingperiod_id;
We see that we have 6 activities for the accompanying period with id 329, and only one for the 334's one.
.. csv-table::
:header: id, accompany$iungperiod_id, type_id
990,329,7
986,329,7
987,329,7
993,329,7
991,329,7
992,329,7
1000,334,7
Let's calculate the average duration for those accompanying periods, and the number of period:
.. code-block:: sql
SELECT AVG(age(COALESCE(closingdate, CURRENT_DATE), openingdate)), COUNT(id) from chill_person_accompanying_period WHERE id IN (329, 334);
The result of this query is:
.. csv-table::
:header: AVG, COUNT
2 years 2 mons 21 days 12 hours 0 mins 0.0 secs,2
Now, we count the number of accompanying period, adding a :code:`JOIN` clause which make a link to the :code:`activity` table, and add a :code:`WHERE` clause to keep
only the accompanying period which contains the given activity type:
.. code-block:: sql
SELECT COUNT(chill_person_accompanying_period.id) from chill_person_accompanying_period
JOIN activity ON chill_person_accompanying_period.id = activity.accompanyingperiod_id
WHERE chill_person_accompanying_period.id IN (329, 334) AND activity.type_id = 7;
What are the results here ?
.. csv-table::
:header: COUNT
7
:code:`7` ! Why this result ? Because the number of lines is duplicated for each activity. Let's see the list of rows which
are taken into account for the computation:
.. code-block:: sql
SELECT chill_person_accompanying_period.id, activity.id from chill_person_accompanying_period
JOIN activity ON chill_person_accompanying_period.id = activity.accompanyingperiod_id
WHERE chill_person_accompanying_period.id IN (329, 334) AND activity.type_id = 7;
.. csv-table::
:header: accompanyingperiod.id, activity.id
329,993
334,1000
329,987
329,990
329,991
329,992
329,986
For each activity, a row is created and, as we count the number of non-null :code:`accompanyingperiod.id` columns, we
count one entry for each activity (actually, we count the number of activities).
So, let's use the :code:`DISTINCT` keyword to count only once the equal ids:
.. code-block::
SELECT COUNT(DISTINCT chill_person_accompanying_period.id) from chill_person_accompanying_period
JOIN activity ON chill_person_accompanying_period.id = activity.accompanyingperiod_id
WHERE chill_person_accompanying_period.id IN (329, 334) AND activity.type_id = 7;
Now, it works again...
.. csv-table::
:header: COUNT
2
But, for the average duration, this won't work: the duration which are equals (because the :code:`openingdate` is the same and
:code:`closingdate` is still :code:`NULL`, for instance) will be counted only once, which will give unexpected result.
The solution is to move the condition "having an activity with activity type with id 7" in a :code:`EXISTS` clause:
.. code-block:: sql
SELECT COUNT(chill_person_accompanying_period.id) from chill_person_accompanying_period
WHERE chill_person_accompanying_period.id IN (329, 334) AND EXISTS (SELECT 1 FROM activity WHERE type_id = 7 AND accompanyingperiod_id = chill_person_accompanying_period.id);
The result is correct without :code:`DISTINCT` keyword:
.. csv-table::
:header: COUNT
2
And we can now compute the average duration without fear:
.. code-block:: sql
SELECT AVG(age(COALESCE(closingdate, CURRENT_DATE), openingdate)) from chill_person_accompanying_period
WHERE chill_person_accompanying_period.id IN (329, 334) AND EXISTS (SELECT 1 FROM activity WHERE type_id = 7 AND accompanyingperiod_id = chill_person_accompanying_period.id);
Give the result:
.. csv-table::
:header: AVG
2 years 2 mons 21 days 12 hours 0 mins 0.0 secs