13 KiB
Exports
Export is an important issue within the Chill software : users should be able to :
- compute statistics about their activity ;
- list "things" which are a part of their activities.
The [main bundle`_ provides a powerful framework to build custom queries with re-usable parts across differents bundles.
:local:
`The issue where this framework was discussed Provides some information about the pursued features and architecture.
Concepts
Some vocabulary: 3 "Export elements"
Four terms are used for this framework :
Exports provide some basic operation on the data. Two kinds of exports are available :
- computed data: it may be "the number of people," "the number of activities," "the duration of activities," ...
- list data: it may be "the list of people," "the list of activities," ...
Filters The filters create a filter on the data: it removes some information the user doesn't want to introduce in the computation done by the export.
Example of a filter: "people under 18-year-olds," "activities between the 1st of June and the 31st December," ...
Aggregators The aggregator aggregates the data into some group (some software use the term 'bucket').
Example of an aggregator: "group people by gender," "group people by nationality," "group activity by type," ...
Formatters
The formatters format the data into a :class:Symfony\Component\HttpFoundation\Response, which will be returned "as is" by the controller to the web client.
Example of a formatter: "format data as CSV", "format data as an ods spreadsheet," ...
Anatomy of an export
An export can be thought of as a sentence where each part of this sentence refers to one or multiple export elements. Examples :
Example 1: Count the number of people having at least one activity in the last month 12, and group them by nationality and gender, and format them in a CSV spreadsheet.
Here :
- count the number of people is the export part
- having at least one activity is the filter part
- group them by nationality is the aggregator part
- group them by gender is a second aggregator part
- format the date in a CSV spreadsheet is the formatter part
Note that :
- Aggregators, filters, exports and formatters are cross-bundle. Here the bundle activity provides a filter which is applied on an export provided by the person bundle ;
- Multiple aggregators or filters for one export may exist. Currently, only one export is allowed.
The result might be :
+-------------+--------+------------------+ | Nationality | Gender | Number of people | | Russian | Male | 12 | | Russian | Female | 24 | | France | Male | 110 | | France | Female| 150 |
Example 2: Count the average duration of an activity with type "meeting", which occurs between the 1st of June and the 31st of December, group them by week, and format the data in an OpenDocument spreadsheet.
Here :
- count the average duration of an activity is the export part
- activity with type meeting is a filter part
- activity which occurs between the 1st of June and the 31st of December is a filter
- group them by week is the aggregator part
- format the date in an OpenDocument spreadsheet is the formatter part
The result might be :
+-----------------------+----------------------+ | Week | Number of activities | | 2015-10 | 10 | | 2015-11 | 12 | | 2015-12 | 10 | | 2015-13 | 9 |
Authorization and exports
Exports, filters, and aggregators should not show data the user is not allowed to see within the application.
In other words, developers are required to take care of user authorization for each export.
There should be a specific role that grants permission to users who are allowed to build exports. For more simplicity, this role should apply on a center, and should not require special circles.
How does the magic work?
To build an export, we rely on the capacity of the database to execute queries with aggregate (i.e. GROUP BY) and filter (i.e. WHERE) instructions.
An export is an SQL query that is initiated by an export and modified by aggregators and filters.
Example: Count the number of people having at least one activity in the last month 12, and group them by nationality and gender
- The report initiates the query
SELECT count(people.*) FROM people
- The filter adds a where and join clause :
SELECT count(people.*) FROM people
RIGHT JOIN activity
WHERE activity.date IS BETWEEN now AND 6 month ago
- The aggregator "nationality" adds a GROUP BY clause and a column in the SELECT statement:
SELECT people.nationality, count(people.*) FROM people
RIGHT JOIN activity
WHERE activity.date IS BETWEEN now AND 6 month ago
GROUP BY nationality
- The aggregator "gender" does the same job as the nationality aggregator : it adds a GROUP BY clause and a column in the SELECT statement :
SELECT people.nationality, people.gender, count(people.*)
FROM people RIGHT JOIN activity
WHERE activity.date IS BETWEEN now AND 6 month ago
GROUP BY nationality, gender
Each filter, aggregator, and filter may collect parameters from the user through a form. This form is appended to the export form. Here is an example.
The screenshot shows the export form for CountPeople (Nombre de personnes). The filter by date of birth is checked (Filtrer par date de naissance de la personne), which triggers a subform, which is provided by the :class:Chill\PersonBundle\Export\Filter\BirthdateFilter. The other unchecked filter does not show the subform.
Two aggregators are also checked: by Country of birth (Aggréger les personnes par pays de naissance, the corresponding class is :class:Chill\PersonBundle\Export\Aggregator\CountryOfBirthAggregator, which also triggers a subform. The aggregator by gender (Aggréger les personnes par genre) is also checked, but there is no corresponding subform.
The Export Manager
The Export manager (:class:Chill\MainBundle\Export\ExportManager is the central class which registers all exports, aggregators, filters, and formatters.
The export manager is also responsible for orchestrating the whole export process, producing a :class:Symfony\FrameworkBundle\HttpFoundation\Request for each export request.
The export form step
The form step allows you to build a form, combining different parts of the module.
The building of forms is split into different subforms, where each one is responsible for rendering their part of the form (aggregators, filters, and export).
:scale: 40%
The formatter form step
The formatter form is processed after the user filled the export form. It is built the same way, but receives the data entered by the user on the previous step as parameters (i.e. export form). It may then adapt it accordingly (example: show a list of columns selected in aggregators).
Processing the export
This schema can explain the export process :
:scale: 40%
(Click to enlarge)
Export, formatters, and filters explained
Exports
This is an example of the CountPerson export :
- Line 36: the
getTypefunction returns a string. This string will be used to find the aggregtors and filters which will apply to this export. - Line 41: a simple description to help users understand what your export does.
- Line 46: The title of the export. A summary of what your export does.
- Line 51: The list of roles required to execute this export.
- Line 56: We initiate the query here...
- Line 59: We have to filter the query with centers the users checked in the form. We process the $acl variable to get all
Centerobjects in one array - Line 63: We create the query with a query builder.
- Line 74: We return the result but make sure to hydrate the results as an array.
- Line 103: return the list of formatter types which are allowed to be applied on this filter
Filters
This is an example of the filter by birthdate. This filter asks some information through a form (buildForm is not empty), and this form must be validated. To perform this validation, we implement a new Interface: Chill\MainBundle\Export\ExportElementValidatedInterface:
Continue to explain the export framework
With many-to-* relationship, why should we set WHERE clauses in an EXISTS subquery instead of a JOIN?
As we described above, the doctrine builder is converted into a sql query. Let's see how to compute the "number of course which count at least one activity type with the id 7". For the purpose of this demonstration, we will restrict this on two accompanying period only: the ones with id 329 and 334.
Let's see the list of activities associated with those accompanying period:
SELECT id, accompanyingperiod_id, type_id FROM activity WHERE accompanyingperiod_id IN (329, 334) AND type_id = 7 ORDER BY accompanyingperiod_id;
We see that we have 6 activities for the accompanying period with id 329, and only one for the 334's one.
| id | accompanyingperiod_id | type_id |
|---|---|---|
| 990 | 329 | 7 |
| 986 | 329 | 7 |
| 987 | 329 | 7 |
| 993 | 329 | 7 |
| 991 | 329 | 7 |
| 992 | 329 | 7 |
| 1000 | 334 | 7 |
Let's calculate the average duration for those accompanying periods, and the number of period:
SELECT AVG(age(COALESCE(closingdate, CURRENT_DATE), openingdate)), COUNT(id) from chill_person_accompanying_period WHERE id IN (329, 334);
The result of this query is:
| AVG | COUNT |
|---|---|
| 2 years 2 mons 21 days 12 hours 0 mins 0.0 secs | 2 |
Now, we count the number of accompanying period, adding a JOIN clause which make a link to the activity table, and add a WHERE clause to keep
only the accompanying period which contains the given activity type:
SELECT COUNT(chill_person_accompanying_period.id) from chill_person_accompanying_period JOIN activity ON chill_person_accompanying_period.id = activity.accompanyingperiod_id WHERE chill_person_accompanying_period.id IN (329, 334) AND activity.type_id = 7;
What are the results here?
| COUNT |
|---|
| 7 |
7 ! Why this result? Because the number of lines is duplicated for each activity. Let's see the list of rows which
are taken into account for the computation:
SELECT chill_person_accompanying_period.id, activity.id from chill_person_accompanying_period JOIN activity ON chill_person_accompanying_period.id = activity.accompanyingperiod_id WHERE chill_person_accompanying_period.id IN (329, 334) AND activity.type_id = 7;
| accompanyingperiod.id | activity.id |
|---|---|
| 329 | 993 |
| 334 | 1000 |
| 329 | 987 |
| 329 | 990 |
| 329 | 991 |
| 329 | 992 |
| 329 | 986 |
For each activity, a row is created and, as we count the number of non-null accompanyingperiod.id columns, we
count one entry for each activity (actually, we count the number of activities).
So, let's use the DISTINCT keyword to count only once the equal ids:
SELECT COUNT(DISTINCT chill_person_accompanying_period.id) from chill_person_accompanying_period JOIN activity ON chill_person_accompanying_period.id = activity.accompanyingperiod_id WHERE chill_person_accompanying_period.id IN (329, 334) AND activity.type_id = 7;
Now, it works again...
| COUNT |
|---|
| 2 |
But, for the average duration, this won't work: the duration which are equals (because the openingdate is the same and
closingdate is still NULL, for instance) will be counted only once, which will give unexpected result.
The solution is to move the condition "having an activity with activity type with id 7" in a EXISTS clause:
SELECT COUNT(chill_person_accompanying_period.id) from chill_person_accompanying_period WHERE chill_person_accompanying_period.id IN (329, 334) AND EXISTS (SELECT 1 FROM activity WHERE type_id = 7 AND accompanyingperiod_id = chill_person_accompanying_period.id);
The result is correct without DISTINCT keyword:
| COUNT |
|---|
| 2 |
And we can now compute the average duration without fear:
SELECT AVG(age(COALESCE(closingdate, CURRENT_DATE), openingdate)) from chill_person_accompanying_period WHERE chill_person_accompanying_period.id IN (329, 334) AND EXISTS (SELECT 1 FROM activity WHERE type_id = 7 AND accompanyingperiod_id = chill_person_accompanying_period.id);
Give the result:
| AVG |
|---|
| 2 years 2 mons 21 days 12 hours 0 mins 0.0 secs |