Month names are missing (01.10.2020)

Machine translation from German

After a change, the month names were missing from the calendars. Visitors who do not read German may also be interested in how I restored them. I hope the machine translation is both amusing and understandable. Here I show the essential parts of the scripts; the complete, documented scripts will soon be available in the description of the calendar software, but only in German.

The error

The identifiers of the language-dependent texts in the files DOCROOT/local/local.xml.LANG. have been changed, and the month names are among these texts. The basic calendar data files DOCROOT/kal/b/YEAR/COUNTRY.xml. and DOCROOT/kal/b/YEAR/a.xml.LANG. still contain the old identifiers, which must be replaced by the new ones. This is why the month names are missing from the calendars. (These file names really do end with a dot.)

I use a Perl script for this.

The old identifiers

The same old identifiers are used in every basic data file, so I read them from a single file, DOCROOT/kal/b/2020/de.xml.. The file contains the placeholders in the form <l:ph id="ID"/>. It holds the calendar data not only for the current year but also for December of the previous year and January of the following year. The data for each month usually contain the month-name placeholder more than once, so the code skips an ID that merely repeats the previous one. The following Perl code reads the old IDs.

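   use English qw(-no_match_vars);   # provides $INPUT_RECORD_SEPARATOR (= $/)

   my $oldids = [];
   my $fn   = "DOCROOT/kal/b/2020/de.xml.";
   my $verb = 1;                      # 1 = print progress information
   my $h;
   if (!open ($h, "<", $fn)) {
      print STDERR "Cannot open file \"$fn\"\n";
      return;
   }
   my $d;
   {
      local $INPUT_RECORD_SEPARATOR;  # slurp mode: read the whole file
      $d = <$h>;
      close $h;
   }
   # Collect the placeholder IDs in document order; skip an ID that
   # repeats the previous one (a month uses its placeholder several times).
   my $pid = "";
   my $id  = "";
   while ( $d =~ /<l:ph id="([a-z0-9]+)"\/>/g ) {
      $id = $1;
      if ($id ne $pid) {
         push (@$oldids, $id);
         $pid = $id;
      }
   }
   if ($verb) {
      my $i = 0;
      for $id (@$oldids) {
         print "$i $id\n";
         ++$i;
      }
   }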
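You can try the reading loop without the real files. The following is a minimal, self-contained sketch of the same idea, packed into a function. The sample string, its <m> elements and the IDs dz, ja and fe are invented for illustration and are not taken from the calendar data.

```perl
use strict;
use warnings;

# Collect the IDs of all placeholders <l:ph id="ID"/> in document order,
# skipping an ID that only repeats the one directly before it.
sub collect_ids {
    my ($text) = @_;
    my @ids;
    while ($text =~ /<l:ph id="([a-z0-9]+)"\/>/g) {
        push @ids, $1 unless @ids && $ids[-1] eq $1;
    }
    return @ids;
}

# Invented sample: December of the previous year, January with its
# placeholder used twice, then February.
my $sample = '<m><l:ph id="dz"/></m>'
           . '<m><l:ph id="ja"/><l:ph id="ja"/></m>'
           . '<m><l:ph id="fe"/></m>';

print join(" ", collect_ids($sample)), "\n";    # prints "dz ja fe"
```

The result still starts with the December entry. It is dropped later, when the old IDs are assigned to the month names.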
  

The new identifiers

The new IDs come from the file DOCROOT/local/local.xml.de., which contains the German texts in the form <t id="ID">TEXT</t>. I store the ID of every text as $newids -> {"TEXT"} = "ID", so that each month name can be looked up by its German text.

   use utf8;   # the script is UTF-8: "März" must be a character string like the decoded file text

   my $newids = {};
   $fn = "DOCROOT/local/local.xml.de.";
   $h  = undef;
   if (!open ($h, "<:encoding(utf-8)", $fn)) {
      print STDERR "Cannot open file \"$fn\"\n";
      return;
   }
   {
      local $INPUT_RECORD_SEPARATOR;
      $d = <$h>;
      close $h;
   }
   # <t id="cr">Januar</t>   ([^<]+ stops at the closing tag)
   while ( $d =~ /<t id="([a-z0-9]+)">([^<]+)<\/t>/g ) {
      $newids -> {$2} = $1;
   }
   my $names = [
      "Januar",
      "Februar",
      "März",
      "April",
      "Mai",
      "Juni",
      "Juli",
      "August",
      "September",
      "Oktober",
      "November",
      "Dezember",
   ];
   if ($verb) {
      binmode (STDOUT, ":encoding(utf-8)");
      my $nm;
      my $i = 0;
      for $nm (@$names) {
         ++$i;
         print "$i $nm ", $newids -> {$nm} // "(missing)", "\n";
      }
   }
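One pitfall is the lookup of "März". The file is read with the :encoding(utf-8) layer, so its texts are character strings. A literal "März" in a script saved as UTF-8 only matches them if the script declares use utf8. The sketch below shows the lookup on an inline excerpt. The IDs are invented, except cr for Januar, which follows the example comment in the script.

```perl
use strict;
use warnings;
use utf8;                              # string literals in this file are UTF-8
binmode STDOUT, ":encoding(UTF-8)";    # print "März" correctly

# Build TEXT => ID from entries of the form <t id="ID">TEXT</t>.
# [^<]+ stops at the start of the closing tag.
sub text_to_id {
    my ($xml) = @_;
    my %id_of;
    while ($xml =~ /<t id="([a-z0-9]+)">([^<]+)<\/t>/g) {
        $id_of{$2} = $1;
    }
    return \%id_of;
}

# Invented excerpt (cr for Januar follows the example comment in the script)
my $xml    = '<t id="cr">Januar</t><t id="cs">Februar</t><t id="ct">März</t>';
my $newids = text_to_id($xml);

for my $name ("Januar", "Februar", "März") {
    printf "%s -> %s\n", $name, $newids->{$name} // "(missing)";
}
```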
  

Assign old and new IDs

Now I assign the new IDs to the old IDs in the form $idmap -> {"OLD_ID"} = "NEW_ID".

The first entry in the list of old identifiers belongs to the month name of December of the previous year; I skip it. The next twelve entries are the identifiers of the month names from January to December, so the assignment is simple:

   my $idmap = {};
   shift @$oldids;                   # skip December of the previous year
   my $nm;
   while (@$names) {                 # January .. December, in calendar order
      $nm = shift @$names;
      $id = shift @$oldids;
      $idmap -> {$id} = $newids -> {$nm};
   }
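The pairing can be checked with shortened lists. This sketch is a non-destructive variant of the loop above, written as a function, and uses only two months and invented IDs. The last old ID stands for January of the following year and is simply left over.

```perl
use strict;
use warnings;

# Build OLD_ID => NEW_ID. The first old ID belongs to December of the
# previous year and is skipped; the following ones are paired with the
# month names in calendar order.
sub build_idmap {
    my ($old, $names, $newids) = @_;
    my @months = @$old[1 .. $#$old];    # drop December of the previous year
    my %idmap;
    for my $i (0 .. $#$names) {
        last if $i > $#months;          # fewer old IDs than month names
        my $new = $newids->{ $names->[$i] };
        $idmap{ $months[$i] } = $new if defined $new;
    }
    return \%idmap;
}

# Shortened, invented example: two months only.
my $idmap = build_idmap(
    [qw(dz ja fe ja)],                  # Dec (prev. year), Jan, Feb, Jan (next year)
    [qw(Januar Februar)],
    { Januar => "cr", Februar => "cs" },
);
print "$_ -> $idmap->{$_}\n" for sort keys %$idmap;
```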
  

Replace old IDs

Next I write a function that replaces the old IDs with the new IDs in a calendar data file. A regular expression finds all placeholders of the form <l:ph id="ID"/>; the old ID is its first capture group $1. A helper function returns the placeholder with the new ID, <l:ph id="NEWID"/>. An old ID that has no new ID assigned (i.e. anything other than a month name) is kept unchanged.

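   # Return the placeholder with the new ID; keep the old ID if none is assigned.
   my $subst = sub {
      my ($map, $old) = @_;
      my $new = $map -> {$old} // $old;
      return "<l:ph id=\"$new\"/>";
   };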
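Here is a minimal sketch of the replacement step applied to a string, using invented IDs. It passes the map explicitly instead of relying on outer variables. It also uses // (defined-or, Perl 5.10 or later), so an ID such as 0 is not mistaken for a missing entry.

```perl
use strict;
use warnings;

# Replace every <l:ph id="OLD"/> with <l:ph id="NEW"/> according to
# %$idmap. IDs without an entry (anything but a month name) stay as they are.
sub replace_ids {
    my ($text, $idmap) = @_;
    $text =~ s{<l:ph id="([a-z0-9]+)"/>}
              {'<l:ph id="' . ($idmap->{$1} // $1) . '"/>'}ge;
    return $text;
}

my $idmap = { ja => "cr", fe => "cs" };    # invented IDs
my $in    = '<l:ph id="ja"/> <l:ph id="xy"/> <l:ph id="fe"/>';
print replace_ids($in, $idmap), "\n";
# prints <l:ph id="cr"/> <l:ph id="xy"/> <l:ph id="cs"/>
```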
  

The function that processes a file takes two parameters: $in, the path of the input file with the old IDs, and $out, the path of the output file with the new IDs.

   my $proc_file = sub {
      my ($in, $out) = @_;
      print "$in -> $out\n" if $verb;
      $h = undef;
      if (!open ($h, "<:encoding(utf-8)", $in)) {
         print STDERR "Cannot open file \"$in\"\n";
         return;
      }
      {
         local $INPUT_RECORD_SEPARATOR;
         $d = <$h>;
         close $h;
      }
      # Replace every placeholder; /e evaluates the replacement as Perl code
      $d =~ s/<l:ph id="([a-z0-9]+)"\/>/$subst -> ($idmap, $1)/ge;
      $h = undef;
      if (!open ($h, ">:encoding(utf-8)", $out)) {
         print STDERR "Cannot open output file \"$out\"\n";
         return;
      }
      print $h $d;
      close $h;
   };
  

Process all files

All basic data files DOCROOT/kal/b/YEAR/COUNTRY.xml. and DOCROOT/kal/b/YEAR/a.xml.LANG. must be corrected. For the corrected files I use the path prefix DOCROOT/kal/bnew instead of DOCROOT/kal/b. I read the directory DOCROOT/kal/b and its YEAR subdirectories and process the basic data files:

   use File::Spec::Functions qw(catdir catfile);
   use File::Path qw(make_path);

   my $indir  = "DOCROOT/kal/b";
   my $outdir = "DOCROOT/kal/bnew";
   my ($dh, $sdh);
   my ($de, $sde);
   my $dp;
   my $od;
   if (!opendir ($dh, $indir)) {
      print STDERR "Cannot read directory \"$indir\"\n";
      return;
   }
   while (defined ($de = readdir ($dh))) {
      next unless $de =~ /^20\d\d$/;         # YEAR subdirectories only
      $dp = catdir ($indir, $de);
      next unless -d $dp;
      $sdh = undef;
      if (!opendir ($sdh, $dp)) {
         print STDERR "Cannot read directory \"$dp\"\n";
         next;
      }
      $od = catdir ($outdir, $de);
      make_path ($od);
      while (defined ($sde = readdir ($sdh))) {
         # COUNTRY.xml. and a.xml.LANG. (the names end with a dot)
         next unless $sde =~ /\.xml(?:\..+)?\.$/;
         $proc_file -> (catfile ($dp, $sde), catfile ($od, $sde));
      }
      closedir $sdh;
   }
   closedir $dh;
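Because the basic data files end with a dot, the file-name filter in the loop above is easy to misread. The following sketch applies the same pattern to a few sample names, all invented. Only names that contain .xml and end in a dot pass; names such as de.xml.gz or de.xml do not.

```perl
use strict;
use warnings;

# Same pattern as in the directory loop: ".xml", optionally followed by
# ".SOMETHING", and a dot at the very end of the name.
my $pattern = qr/\.xml(?:\..+)?\.$/;

# Invented sample names
for my $name ("de.xml.", "a.xml.en.", "de.xml.gz", "de.xml", "notes.txt.") {
    printf "%-10s %s\n", $name, ($name =~ $pattern ? "processed" : "skipped");
}
# de.xml. and a.xml.en. are processed; the other three names are skipped
```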
  

Gzip-compressed files

For each basic data file I also create a gzip-compressed copy with the suffix .gz (de.xml. becomes de.xml.gz). I use a bash script for this:

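   #!/bin/bash
   # Project root: the part of this script's path before /src/
   b=$(realpath "$0")
   b=${b%/src/*}
   dir=$b/docroot/kal/bnew
   shopt -s nullglob    # skip year directories without matching files
   for sd in "$dir"/*; do
      echo "subdir $sd"
      f=${sd#"$dir"/}
      [[ $f =~ ^20[0-9][0-9]$ ]] || continue
      [[ -d $sd ]] || continue
      for f in "$sd"/*xml*\. ; do
         echo "$f"
         gzip --best --stdout "$f" > "${f}gz"    # de.xml. -> de.xml.gz
      done
   done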