Update documentation to explain use of EXISTS in SQL queries

Added an explanatory section to the "exports.rst" doc to clarify why to use an EXISTS subquery instead of a JOIN clause in SQL queries involving many-to-* relationships. This explanation includes sample SQL queries and results to illustrate the potential issue of duplicates with JOIN and count, and how EXISTS can help avoid this issue. Also updated the ".editorconfig" file for .rst files.
2025-07-29 12:07:43 +00:00 · 2024-03-06 12:31:35 +01:00 · 2024-03-06 12:31:35 +01:00 · 09578a775c
commit 09578a775c
parent c888b5b84f
2 changed files with 130 additions and 0 deletions
--- a/.editorconfig
+++ b/.editorconfig
@ -23,3 +23,7 @@ max_line_length = 0
 indent_size = 2
 indent_style = space
 [.rst]
 ident_size = 3
 ident_style = space
--- a/docs/source/development/exports.rst
+++ b/docs/source/development/exports.rst
@ -242,3 +242,129 @@ This is an example of the *filter by birthdate*. This filter asks some informati
   Continue to explain the export framework
 .. _main bundle: https://git.framasoft.org/Chill-project/Chill-Main
 With many-to-* relationship, why should we set WHERE clauses in an EXISTS subquery instead of a JOIN ?
 ``````````````````````````````````````````````````````````````````````````````````````````````````````
 As we described above, the doctrine builder is converted into a sql query. Let's see how to compute the "number of course
 which count at least one activity type with the id 7". For the purpose of this demonstration, we will restrict this on
 two accompanying period only: the ones with id 329 and 334.
 Let's see the list of activities associated with those accompanying period:
 .. code-block:: sql
   SELECT id, accompanyingperiod_id, type_id FROM activity WHERE accompanyingperiod_id IN (329, 334) AND type_id = 7
       ORDER BY accompanyingperiod_id;
 We see that we have 6 activities for the accompanying period with id 329, and only one for the 334's one.
 .. csv-table::
   :header: id, accompany$iungperiod_id, type_id
   990,329,7
   986,329,7
   987,329,7
   993,329,7
   991,329,7
   992,329,7
   1000,334,7
 Let's calculate the average duration for those accompanying periods, and the number of period:
 .. code-block:: sql
   SELECT AVG(age(COALESCE(closingdate, CURRENT_DATE), openingdate)), COUNT(id) from chill_person_accompanying_period WHERE id IN (329, 334);
 The result of this query is:
 .. csv-table::
   :header: AVG, COUNT
   2 years 2 mons 21 days 12 hours 0 mins 0.0 secs,2
 Now, we count the number of accompanying period, adding a :code:`JOIN` clause which make a link to the :code:`activity` table, and add a :code:`WHERE` clause to keep
 only the accompanying period which contains the given activity type:
 .. code-block:: sql
   SELECT COUNT(chill_person_accompanying_period.id) from chill_person_accompanying_period
                 JOIN activity ON chill_person_accompanying_period.id = activity.accompanyingperiod_id
                 WHERE chill_person_accompanying_period.id IN (329, 334) AND activity.type_id = 7;
 What are the results here ?
 .. csv-table::
   :header: COUNT
   7
 :code:`7` ! Why this result ? Because the number of lines is duplicated for each activity. Let's see the list of rows which
 are taken into account for the computation:
 .. code-block:: sql
   SELECT chill_person_accompanying_period.id, activity.id from chill_person_accompanying_period
   JOIN activity ON chill_person_accompanying_period.id = activity.accompanyingperiod_id
   WHERE chill_person_accompanying_period.id IN (329, 334) AND activity.type_id = 7;
 .. csv-table::
   :header: accompanyingperiod.id, activity.id
   329,993
   334,1000
   329,987
   329,990
   329,991
   329,992
   329,986
 For each activity, a row is created and, as we count the number of non-null :code:`accompanyingperiod.id` columns, we
 count one entry for each activity (actually, we count the number of activities).
 So, let's use the :code:`DISTINCT` keyword to count only once the equal ids:
 .. code-block::
   SELECT COUNT(DISTINCT chill_person_accompanying_period.id) from chill_person_accompanying_period
   JOIN activity ON chill_person_accompanying_period.id = activity.accompanyingperiod_id
   WHERE chill_person_accompanying_period.id IN (329, 334) AND activity.type_id = 7;
 Now, it works again...
 .. csv-table::
   :header: COUNT
   2
 But, for the average duration, this won't work: the duration which are equals (because the :code:`openingdate` is the same and
 :code:`closingdate` is still :code:`NULL`, for instance) will be counted only once, which will give unexpected result.
 The solution is to move the condition "having an activity with activity type with id 7" in a :code:`EXISTS` clause:
 .. code-block:: sql
   SELECT COUNT(chill_person_accompanying_period.id) from chill_person_accompanying_period
   WHERE chill_person_accompanying_period.id IN (329, 334) AND EXISTS (SELECT 1 FROM activity WHERE type_id = 7 AND accompanyingperiod_id = chill_person_accompanying_period.id);
 The result is correct without :code:`DISTINCT` keyword:
 .. csv-table::
   :header: COUNT
   2
 And we can now compute the average duration without fear:
 .. code-block:: sql
  SELECT AVG(age(COALESCE(closingdate, CURRENT_DATE), openingdate)) from chill_person_accompanying_period
  WHERE chill_person_accompanying_period.id IN (329, 334) AND EXISTS (SELECT 1 FROM activity WHERE type_id = 7 AND accompanyingperiod_id = chill_person_accompanying_period.id);
 Give the result:
 .. csv-table::
   :header: AVG
   2 years 2 mons 21 days 12 hours 0 mins 0.0 secs