From 09578a775c95d746763d058ed679b13802a2cee7 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Julien=20Fastr=C3=A9?= Date: Wed, 6 Mar 2024 12:31:35 +0100 Subject: [PATCH] Update documentation to explain use of EXISTS in SQL queries Added an explanatory section to the "exports.rst" doc to clarify why to use an EXISTS subquery instead of a JOIN clause in SQL queries involving many-to-* relationships. This explanation includes sample SQL queries and results to illustrate the potential issue of duplicates with JOIN and count, and how EXISTS can help avoid this issue. Also updated the ".editorconfig" file for .rst files. --- .editorconfig | 4 + docs/source/development/exports.rst | 126 ++++++++++++++++++++++++++++ 2 files changed, 130 insertions(+) diff --git a/.editorconfig b/.editorconfig index a3e5a0fc1..bede621e3 100644 --- a/.editorconfig +++ b/.editorconfig @@ -23,3 +23,7 @@ max_line_length = 0 indent_size = 2 indent_style = space +[.rst] +ident_size = 3 +ident_style = space + diff --git a/docs/source/development/exports.rst b/docs/source/development/exports.rst index 3b01f9e0f..9447a0575 100644 --- a/docs/source/development/exports.rst +++ b/docs/source/development/exports.rst @@ -242,3 +242,129 @@ This is an example of the *filter by birthdate*. This filter asks some informati Continue to explain the export framework .. _main bundle: https://git.framasoft.org/Chill-project/Chill-Main + + +With many-to-* relationship, why should we set WHERE clauses in an EXISTS subquery instead of a JOIN ? +`````````````````````````````````````````````````````````````````````````````````````````````````````` + +As we described above, the doctrine builder is converted into a sql query. Let's see how to compute the "number of course +which count at least one activity type with the id 7". For the purpose of this demonstration, we will restrict this on +two accompanying period only: the ones with id 329 and 334. + +Let's see the list of activities associated with those accompanying period: + +.. code-block:: sql + + SELECT id, accompanyingperiod_id, type_id FROM activity WHERE accompanyingperiod_id IN (329, 334) AND type_id = 7 + ORDER BY accompanyingperiod_id; + +We see that we have 6 activities for the accompanying period with id 329, and only one for the 334's one. + +.. csv-table:: + :header: id, accompany$iungperiod_id, type_id + + 990,329,7 + 986,329,7 + 987,329,7 + 993,329,7 + 991,329,7 + 992,329,7 + 1000,334,7 + +Let's calculate the average duration for those accompanying periods, and the number of period: + +.. code-block:: sql + + SELECT AVG(age(COALESCE(closingdate, CURRENT_DATE), openingdate)), COUNT(id) from chill_person_accompanying_period WHERE id IN (329, 334); + +The result of this query is: + +.. csv-table:: + :header: AVG, COUNT + + 2 years 2 mons 21 days 12 hours 0 mins 0.0 secs,2 + +Now, we count the number of accompanying period, adding a :code:`JOIN` clause which make a link to the :code:`activity` table, and add a :code:`WHERE` clause to keep +only the accompanying period which contains the given activity type: + +.. code-block:: sql + + SELECT COUNT(chill_person_accompanying_period.id) from chill_person_accompanying_period + JOIN activity ON chill_person_accompanying_period.id = activity.accompanyingperiod_id + WHERE chill_person_accompanying_period.id IN (329, 334) AND activity.type_id = 7; + +What are the results here ? + +.. csv-table:: + :header: COUNT + + 7 + +:code:`7` ! Why this result ? Because the number of lines is duplicated for each activity. Let's see the list of rows which +are taken into account for the computation: + +.. code-block:: sql + + SELECT chill_person_accompanying_period.id, activity.id from chill_person_accompanying_period + JOIN activity ON chill_person_accompanying_period.id = activity.accompanyingperiod_id + WHERE chill_person_accompanying_period.id IN (329, 334) AND activity.type_id = 7; + +.. csv-table:: + :header: accompanyingperiod.id, activity.id + + 329,993 + 334,1000 + 329,987 + 329,990 + 329,991 + 329,992 + 329,986 + +For each activity, a row is created and, as we count the number of non-null :code:`accompanyingperiod.id` columns, we +count one entry for each activity (actually, we count the number of activities). + +So, let's use the :code:`DISTINCT` keyword to count only once the equal ids: + +.. code-block:: + + SELECT COUNT(DISTINCT chill_person_accompanying_period.id) from chill_person_accompanying_period + JOIN activity ON chill_person_accompanying_period.id = activity.accompanyingperiod_id + WHERE chill_person_accompanying_period.id IN (329, 334) AND activity.type_id = 7; + +Now, it works again... + +.. csv-table:: + :header: COUNT + + 2 + +But, for the average duration, this won't work: the duration which are equals (because the :code:`openingdate` is the same and +:code:`closingdate` is still :code:`NULL`, for instance) will be counted only once, which will give unexpected result. + +The solution is to move the condition "having an activity with activity type with id 7" in a :code:`EXISTS` clause: + +.. code-block:: sql + + SELECT COUNT(chill_person_accompanying_period.id) from chill_person_accompanying_period + WHERE chill_person_accompanying_period.id IN (329, 334) AND EXISTS (SELECT 1 FROM activity WHERE type_id = 7 AND accompanyingperiod_id = chill_person_accompanying_period.id); + +The result is correct without :code:`DISTINCT` keyword: + +.. csv-table:: + :header: COUNT + + 2 + +And we can now compute the average duration without fear: + +.. code-block:: sql + + SELECT AVG(age(COALESCE(closingdate, CURRENT_DATE), openingdate)) from chill_person_accompanying_period + WHERE chill_person_accompanying_period.id IN (329, 334) AND EXISTS (SELECT 1 FROM activity WHERE type_id = 7 AND accompanyingperiod_id = chill_person_accompanying_period.id); + +Give the result: + +.. csv-table:: + :header: AVG + + 2 years 2 mons 21 days 12 hours 0 mins 0.0 secs