Compare commits


10 Commits

2 changed files with 241 additions and 5 deletions

README.md Normal file

@@ -0,0 +1,112 @@
CHILL - Data migration
======================
This repository contains an import script designed for an Excel template given to the client. The client fills in the Excel file, then the script inserts the data into the database.
The operation is semi-automatic: structuring the input data format considerably reduces the time spent on the import. However, a series of manual steps is still needed to prepare and insert the data correctly.
These steps are described here.
The client has filled in the template. The file must always be reviewed to spot any irregularities.
## 1. Prepare the csv files
The file is made up of several sheets; each one must be saved in csv format.
To prepare the files:
- clean the file so that the only header row left is the one with the English column names,
- add a control column at the end of each line, as a safeguard. For example a column 'endcol' in which every cell contains 'endrow',
- add double quotes when saving the csv,
- remove all line breaks and special characters.
```bash
# Example of replacements run on the csv files for one specific import.
# CR is assumed to hold a carriage return; the commands below rely on it:
$ CR=$(printf '\r')
$ sed -e :1 -e '$q' -e "/$CR\$/b" -e 'N;s/\n//;b1' < file.2.csv > file.3.csv
$ sed -e 's#"end"#"end"\n#g' < file.3.csv > file.4.csv
# Example for another import ($CR must sit outside the single quotes to be expanded):
$ cat file2.csv | sed -e 'N; s#_x000D_##g; s#\n##g; s/'"$CR"'//g' | tr "\n" " " > file3.csv
$ sed -e 's#"endcol"#"endcol"\n#g; s#"endrow"#"endrow"\n#g' < file3.csv > file4.csv
$ sed -e 's#^,##g; s#^ ##g' < file4.csv > file5.csv
```
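
The control column makes it possible to check a cleaned file before importing it. The following is a minimal sketch, not part of this repository: the function name `check_control_column` is hypothetical, and it assumes that after cleaning each record sits on a single line, that the header ends with `"endcol"` and that every data row ends with `"endrow"`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report records whose last field is not the control value.
# Assumes one record per line, a header ending in "endcol" and rows ending in "endrow".
check_control_column() {
    local file="$1"
    awk '
        { sub(/\r$/, "") }            # tolerate CRLF line endings
        /^[ \t]*$/ { next }           # ignore blank lines
        NR == 1 && $0 !~ /"endcol"$/ { printf "line 1: header does not end with \"endcol\"\n"; bad++ }
        NR > 1 && $0 !~ /"endrow"$/ { printf "line %d: row does not end with \"endrow\"\n", NR; bad++ }
        END { exit (bad > 0) }        # non-zero exit status when a record is broken
    ' "$file"
}
```

Calling `check_control_column file5.csv` prints one message per suspicious line and returns a non-zero status, which usually points to a line break left inside a cell.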
## 2. Insert the csv files into the database
Each csv sheet is inserted as a separate table in a new `import` schema. This gives:
- import.choix_personnes
- import.personnes
- import.choix_periodes
- import.periodes
This import can be done with tools such as `pgfutter`, but it can be temperamental depending on the file.
In my experience the best method is to do this step locally with phpstorm, then export the `import` schema with pg_dump before transferring it to the server.
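
As a possible command-line alternative to the phpstorm steps, the table definition can be derived from the csv header. This is only a sketch under assumptions: `csv_to_ddl` is a hypothetical helper, and it expects a simple header (comma-separated names, optionally double-quoted, with no comma or quote inside a name). Like the phpstorm settings, it declares every column as TEXT.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a CREATE TABLE statement with one TEXT column
# per field of the csv header. Assumes a simple header without embedded
# commas or quotes in the column names.
csv_to_ddl() {
    local file="$1" table="$2"
    head -n 1 "$file" | tr -d '\r"' | awk -F',' -v table="$table" '
        {
            printf "CREATE TABLE import.%s (\n", table
            for (i = 1; i <= NF; i++)
                printf "    \"%s\" TEXT%s\n", $i, (i < NF ? "," : "")
            print ");"
        }'
}
```

The output of `csv_to_ddl personnes.csv personnes` can be piped to `psql`, and the rows can then be loaded from a psql session with `\copy import.personnes FROM 'personnes.csv' WITH (FORMAT csv, HEADER true)`. In csv format, quoted empty values are loaded as empty strings rather than NULL, which matches the 'null value text' setting used in phpstorm as long as the file was saved with double quotes.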
### 2.a Steps in phpstorm
- If it does not exist, create the `import` schema; if it exists, make sure it contains no tables or data.
#### Import the csv into the db
- open the csv file > switch to the text tab > edit as table > set options:
  - check 'first row is header'
  - 'null value text': undefined (no null fields in the table, but empty text)
  - then > open table
- import to database > set options:
  - set target/schema: import
  - and table: same name as the csv
  - DDL: TEXT for all fields
  - then > import
#### Export as sql
- create an empty file `<client>-data.sql`
- from each table in the `import` schema:
  - copy the table's DDL into the file (make sure to add the `import.` prefix to each statement)
  - export data > extractor: SQL-insert-multirow > copy to clipboard
  - paste the data into `<client>-data.sql`
## 3. Import the 'import' schema on the server (safran)
- transfer the `<client>-data.sql` file to the server (with scp):
```bash
$ scp cyclo-data.sql debian@safran:~/data/tmp/
```
- back up the database into which the data will be inserted
```bash
debian@safran:~/bin$ bash backup_now_db.sh 5436 cycloprod
debian@safran:~/bin$ ls -l dump/ | tail -1
-rw-r--r-- 1 postgres postgres 234954230 Mar 15 10:40 20230315-104003_cycloprod.sql
```
- import the sql file into the target database: `$ sudo su postgres -c 'psql -p5436'`
```sql
postgres=# \c cycloprod
You are now connected to database "cycloprod" as user "postgres".
cycloprod=# \dt import.*
Did not find any relation named "import.*".
cycloprod=# CREATE SCHEMA import;
-- insertion
cycloprod=# \i '/home/debian/data/tmp/cyclo-data.sql'
-- check that the import schema is in place
cycloprod=# \dt import.*
            List of relations
 Schema |      Name       | Type  |  Owner
--------+-----------------+-------+----------
 import | choix_periodes  | table | postgres
 import | choix_personnes | table | postgres
 import | periodes        | table | postgres
 import | personnes       | table | postgres
(4 rows)
```
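
Beyond listing the tables, it can be worth checking that each import table holds as many rows as its csv file. The sketch below is an illustration only: `csv_row_count` and `compare_counts` are hypothetical helpers, the csv is assumed to hold one record per line after cleaning, and `psql` is assumed to be run by a user allowed to connect to the target database (for instance `postgres`).

```bash
#!/usr/bin/env bash
# Hypothetical helper: count the data rows of a cleaned csv
# (one record per line, header and blank lines excluded).
csv_row_count() {
    awk 'NR > 1 && NF { n++ } END { print n + 0 }' "$1"
}

# Hypothetical helper: compare a csv file with its table in the import schema.
# Assumes psql can reach the database with the given port and name.
compare_counts() {
    local file="$1" table="$2" port="$3" db="$4"
    local in_csv in_db
    in_csv="$(csv_row_count "$file")"
    # -A: unaligned output, -t: tuples only, so that only the number is printed
    in_db="$(psql -p "$port" -d "$db" -At -c "SELECT count(*) FROM import.${table};")"
    if [ "$in_csv" = "$in_db" ]; then
        echo "import.${table}: ${in_db} rows, matches the csv"
    else
        echo "import.${table}: ${in_db} rows in db, ${in_csv} in csv" >&2
        return 1
    fi
}
```

A call such as `compare_counts personnes.csv personnes 5436 cycloprod` would then flag a table whose import silently dropped or split rows.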
## 4. Run the migration script
This is done in the postgresql console, as the postgres user, while connected to the target database.
Run the blocks of the 'Up' section of the `sql/import.sql` script one step at a time.
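
To avoid copying blocks by hand, a single numbered block can be extracted from the script. This is a sketch, not the procedure used for the imports described here: `extract_block` is a hypothetical helper, and it assumes that every block of the 'Up' section starts with a comment of the form `-- 56. Title` and runs until the next `-- NN.` comment or a `-- ====` separator line.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one numbered block of the migration script.
# Assumes blocks start with a comment such as "-- 56. Link socialIssues to periods"
# and end at the next "-- NN. " comment or at a "-- ====" separator.
extract_block() {
    local file="$1" number="$2"
    awk -v n="$number" '
        # On each block header or separator, decide whether we enter the wanted block.
        /^-- [0-9]+\. / || /^-- =+/ { inside = ($0 ~ ("^-- " n "\\. ")) }
        inside { print }
    ' "$file"
}
```

One possible use is to wrap the block in an explicit transaction and stop at the first error: `{ echo 'BEGIN;'; extract_block sql/import.sql 56; echo 'COMMIT;'; } | psql -p 5436 -d cycloprod -v ON_ERROR_STOP=1`. If a statement fails, psql exits before `COMMIT` and the open transaction is rolled back.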
## Tips
- In phpstorm, to rename the schema so that things do not get mixed up, prefer 'Modify schema': 'Rename' performs replacements everywhere.


@@ -3,7 +3,12 @@
-- version v0.6 (== version canevas)
--
-- /!\ IMPORTANT
-- Before migrating (UP), the postal codes must be imported first !!!
-- * Before migrating (UP), the postal codes must be imported first !!!
-- * Adapt the default values
--     * centers: cf. 41 and 42
--     * socialIssues: cf. 56
--     * referrer: cf. 57
--     * scopes: cf. 58
--
@@ -315,6 +320,7 @@ INSERT INTO chill_person_household_composition (id, household_id, startdate, hou
FROM import.personnes
WHERE household_composition_type1 IS NOT NULL ;
-- 50. Prepare id mapping before insertion
ALTER TABLE import.periodes ADD column period_id BIGINT;
UPDATE import.periodes SET period_id = periodid
@@ -324,7 +330,8 @@ UPDATE import.periodes SET period_id = periodid
-- 51. Insert in chill_person_accompanying_period
INSERT INTO chill_person_accompanying_period (id, openingdate, closingdate, step, remark, intensity, createdby_id, createdat, updatedby_id, updatedat) SELECT
period_id, COALESCE(openingdate1, date(date_trunc('year', CURRENT_DATE))), closingdate1,
period_id,
COALESCE(openingdate1, date(date_trunc('year', CURRENT_DATE))), closingdate1,
'CONFIRMED', COALESCE(TRIM(remark), ''), intensity1,
(SELECT distinct(first_value(id) OVER(ORDER BY id)) FROM users), CURRENT_DATE,
(SELECT distinct(first_value(id) OVER(ORDER BY id)) FROM users), CURRENT_DATE
@@ -345,7 +352,7 @@ INSERT INTO chill_main_address (id, postcode_id, street, streetnumber, validFrom
ALTER TABLE import.choix_periodes ADD COLUMN address_location_id BIGINT;
UPDATE import.choix_periodes SET address_location_id = (SELECT max(id) FROM chill_main_address) WHERE street != '';
-- 54. Link period to person or temporary address location
-- 54. Link person or temporary address location to periods
UPDATE chill_person_accompanying_period acp
SET addresslocation_id = (SELECT address_location_id FROM import.choix_periodes WHERE address_location_id IS NOT NULL LIMIT 1)
FROM import.personnes pson JOIN import.periodes piod ON pson.id = piod.id
@@ -363,6 +370,96 @@ INSERT INTO chill_person_accompanying_period_location_history (id, period_id, st
FROM chill_person_accompanying_period acp
WHERE id NOT IN (SELECT period_id FROM chill_person_accompanying_period_location_history) AND step LIKE 'CONFIRMED' ORDER BY id;
-- 56. Link socialIssues to periods
INSERT INTO chill_person_accompanying_period_social_issues (accompanyingperiod_id, socialissue_id)
SELECT
DISTINCT ON (t.period_id) t.period_id,
COALESCE(
t.enfant_id,
t.parent_id,
1 -- default value ?
) AS socialissue_id
FROM (
SELECT p.period_id,
(SELECT id FROM chill_person_social_issue WHERE title::jsonb->>'fr' = icp.parent1::jsonb->>'fr' AND parent_id IS NULL) AS parent_id, icp.parent1,
(SELECT id FROM chill_person_social_issue WHERE title::jsonb->>'fr' = icp.enfant1::jsonb->>'fr' AND parent_id =
(SELECT id FROM chill_person_social_issue WHERE title::jsonb->>'fr' = icp.parent1::jsonb->>'fr' AND parent_id IS NULL)) AS enfant_id, icp.enfant1
FROM import.periodes p
JOIN import.choix_periodes icp ON p.acp_socialissues = icp.acp_socialissues
ORDER BY id) AS t;
-- 57. Link referrer to periods
UPDATE chill_person_accompanying_period acp
SET user_id = COALESCE(
(SELECT id FROM users WHERE users.username = ip.referrer),
1 -- default value ?
)
FROM import.periodes ip WHERE acp.id = ip.period_id;
--SELECT ip.id, (SELECT id FROM users WHERE users.username = ip.referrer) AS referrer_id, ip.referrer, acp.id as period_id, acp.user_id FROM chill_person_accompanying_period acp JOIN import.periodes ip ON ip.period_id = acp.id ORDER BY ip.id;
-- 58. Link scopes to periods
INSERT INTO accompanying_periods_scopes (accompanying_period_id, scope_id)
SELECT ip.period_id, COALESCE(
(SELECT id FROM scopes s WHERE ip.acp_scopes1::jsonb->>'fr' = s.name::jsonb->>'fr'),
(SELECT id from scopes s WHERE s.name::jsonb->>'fr' = 'tous') -- default value 'tous'
)
FROM import.periodes ip;
-- 59. Link origin to periods
UPDATE chill_person_accompanying_period acp SET origin_id =
(SELECT id FROM chill_person_accompanying_period_origin o WHERE o.label::jsonb->>'fr' = ip.origin1::jsonb->>'fr')
FROM import.periodes ip WHERE acp.id = ip.period_id;
--SELECT ip.id, ip.origin1, acp.id as period_id, acp.origin_id FROM chill_person_accompanying_period acp JOIN import.periodes ip ON ip.period_id = acp.id ORDER BY ip.id;
-- 60. Link jobs to periods
UPDATE chill_person_accompanying_period acp SET job_id =
(SELECT id FROM chill_main_user_job j WHERE j.label::jsonb->>'fr' = ip.job1::jsonb->>'fr')
FROM import.periodes ip WHERE acp.id = ip.period_id;
-- 61. Link administrative Location
-- (to be added to the csv)
-- 62. Add and link comments
INSERT INTO chill_person_accompanying_period_comment (id, accompanyingperiod_id, content, creator_id, createdat, updatedby_id, updatedat)
SELECT nextval('chill_person_accompanying_period_comment_id_seq'), period_id, comment1_content,
(SELECT user_id FROM chill_person_accompanying_period acp WHERE acp.id = ip.period_id), CURRENT_TIMESTAMP,
(SELECT user_id FROM chill_person_accompanying_period acp WHERE acp.id = ip.period_id), CURRENT_TIMESTAMP
FROM import.periodes ip WHERE ip.comment1_content != '';
INSERT INTO chill_person_accompanying_period_comment (id, accompanyingperiod_id, content, creator_id, createdat, updatedby_id, updatedat)
SELECT nextval('chill_person_accompanying_period_comment_id_seq'), period_id, comment2_content,
(SELECT user_id FROM chill_person_accompanying_period acp WHERE acp.id = ip.period_id), CURRENT_TIMESTAMP,
(SELECT user_id FROM chill_person_accompanying_period acp WHERE acp.id = ip.period_id), CURRENT_TIMESTAMP
FROM import.periodes ip WHERE ip.comment2_content != '';
INSERT INTO chill_person_accompanying_period_comment (id, accompanyingperiod_id, content, creator_id, createdat, updatedby_id, updatedat)
SELECT nextval('chill_person_accompanying_period_comment_id_seq'), period_id, comment3_content,
(SELECT user_id FROM chill_person_accompanying_period acp WHERE acp.id = ip.period_id), CURRENT_TIMESTAMP,
(SELECT user_id FROM chill_person_accompanying_period acp WHERE acp.id = ip.period_id), CURRENT_TIMESTAMP
FROM import.periodes ip WHERE ip.comment3_content != '';
INSERT INTO chill_person_accompanying_period_comment (id, accompanyingperiod_id, content, creator_id, createdat, updatedby_id, updatedat)
SELECT nextval('chill_person_accompanying_period_comment_id_seq'), period_id, comment4_content,
(SELECT user_id FROM chill_person_accompanying_period acp WHERE acp.id = ip.period_id), CURRENT_TIMESTAMP,
(SELECT user_id FROM chill_person_accompanying_period acp WHERE acp.id = ip.period_id), CURRENT_TIMESTAMP
FROM import.periodes ip WHERE ip.comment4_content != '';
INSERT INTO chill_person_accompanying_period_comment (id, accompanyingperiod_id, content, creator_id, createdat, updatedby_id, updatedat)
SELECT nextval('chill_person_accompanying_period_comment_id_seq'), period_id, comment5_content,
(SELECT user_id FROM chill_person_accompanying_period acp WHERE acp.id = ip.period_id), CURRENT_TIMESTAMP,
(SELECT user_id FROM chill_person_accompanying_period acp WHERE acp.id = ip.period_id), CURRENT_TIMESTAMP
FROM import.periodes ip WHERE ip.comment5_content != '';
-- 63. Link pinned comment to period
UPDATE import.periodes SET comment1_content = null WHERE comment1_content = '';
UPDATE import.periodes SET comment2_content = null WHERE comment2_content = '';
UPDATE import.periodes SET comment3_content = null WHERE comment3_content = '';
UPDATE import.periodes SET comment4_content = null WHERE comment4_content = '';
UPDATE import.periodes SET comment5_content = null WHERE comment5_content = '';
UPDATE chill_person_accompanying_period acp SET pinnedcomment_id =
(SELECT id FROM chill_person_accompanying_period_comment com WHERE com.accompanyingperiod_id = acp.id
AND com.content = COALESCE(ip.comment5_content, ip.comment4_content, ip.comment3_content, ip.comment2_content, ip.comment1_content)
LIMIT 1)
FROM import.periodes ip WHERE acp.id = ip.period_id;
-- ~~Link closingmotive~~ (to be removed from csv)
-- ========================================================================================= --
@@ -370,6 +467,35 @@ INSERT INTO chill_person_accompanying_period_location_history (id, period_id, st
-- DOWN
--
-- Undo 63.
UPDATE chill_person_accompanying_period acp SET pinnedcomment_id = null FROM import.periodes ip WHERE acp.id = ip.period_id;
UPDATE import.periodes SET comment1_content = '' WHERE comment1_content IS NULL;
UPDATE import.periodes SET comment2_content = '' WHERE comment2_content IS NULL;
UPDATE import.periodes SET comment3_content = '' WHERE comment3_content IS NULL;
UPDATE import.periodes SET comment4_content = '' WHERE comment4_content IS NULL;
UPDATE import.periodes SET comment5_content = '' WHERE comment5_content IS NULL;
-- Undo 62.
DELETE FROM chill_person_accompanying_period_comment com USING import.periodes ip WHERE com.accompanyingperiod_id = ip.period_id;
SELECT setval('chill_person_accompanying_period_comment_id_seq', (SELECT COALESCE(max(id), 1) FROM chill_person_accompanying_period_comment));
-- Undo 61.
-- Undo 60.
UPDATE chill_person_accompanying_period acp SET job_id = null FROM import.periodes ip WHERE acp.id = ip.period_id;
-- Undo 59.
UPDATE chill_person_accompanying_period acp SET origin_id = null FROM import.periodes ip WHERE ip.period_id = acp.id;
-- Undo 58.
DELETE FROM accompanying_periods_scopes acs USING import.periodes ip WHERE acs.accompanying_period_id = ip.period_id;
-- Undo 57.
UPDATE chill_person_accompanying_period acp SET user_id = null FROM import.periodes ip WHERE ip.period_id = acp.id;
-- Undo 56.
DELETE FROM chill_person_accompanying_period_social_issues asi USING import.periodes ip WHERE asi.accompanyingperiod_id = ip.period_id;
-- Undo 55.
DELETE FROM chill_person_accompanying_period_location_history history USING import.periodes ip WHERE history.period_id = ip.period_id;
SELECT setval('chill_person_accompanying_period_location_history_id_seq', (SELECT COALESCE(max(id), 1) FROM chill_person_accompanying_period_location_history));
@@ -543,8 +669,6 @@ ALTER TABLE import.periodes DROP COLUMN openingdate1;
ALTER TABLE import.periodes DROP COLUMN closingdate1;
-- -------------
-- tiers choices_list: civility kind profession category
-- =============
-- QUESTIONS