Mailing List Message of <rpm-devel>

Retro-fitting an encoding on rpm tag metadata.

From: Jeff Johnson <n3npq@mac.com>
Date: Mon 31 Mar 2008 - 18:30:39 CEST
Message-Id: <B85F9BF2-BE81-4983-89D4-619A7D9C6677@mac.com>

It's time (imho) to figure out a way to guarantee an encoding for
rpm tag metadata.

The conversion using iconv(3) is rather simple:

SYNOPSIS
        #include <iconv.h>

        size_t iconv(iconv_t cd,
                     char **inbuf, size_t *inbytesleft,
                     char **outbuf, size_t *outbytesleft);

The complexity and confusion come solely from guessing
what string to load into "fromcode" in:

SYNOPSIS
        #include <iconv.h>

        iconv_t iconv_open(const char *tocode, const char *fromcode);

(I'm assuming that "UTF-8" is the primary desired target encoding, but
there's also complexity and confusion choosing "tocode". No matter what,
the choices for "tocode" are orders of magnitude simpler).
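
As a minimal sketch of how the two calls above fit together, assuming
the source encoding is already known: the helper name to_utf8() is
hypothetical (it is not an rpm function), and the 4x output buffer is
an assumption that is generous for common encodings rather than a
guarantee for every converter.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of rpm): convert the NUL-terminated
 * string `in` from `fromcode` to UTF-8.
 * Returns a malloc'd NUL-terminated string, or NULL on any failure.
 */
char *to_utf8(const char *fromcode, const char *in)
{
    iconv_t cd = iconv_open("UTF-8", fromcode);
    if (cd == (iconv_t)-1)
        return NULL;                    /* fromcode unknown or unsupported */

    size_t inleft = strlen(in);
    size_t outsize = 4 * inleft + 1;    /* assumption: 4 output bytes per input byte suffice */
    char *out = malloc(outsize);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *inp = (char *)in;             /* iconv(3) wants char **; the input bytes are not modified */
    char *outp = out;
    size_t outleft = outsize - 1;       /* keep one byte for the terminator */

    /* iconv advances inp/outp and decrements inleft/outleft as it goes. */
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);

    if (rc == (size_t)-1) {             /* EILSEQ, EINVAL or E2BIG */
        free(out);
        return NULL;
    }
    *outp = '\0';
    return out;
}
```

Calling to_utf8("ISO-8859-1", s) is the easy part; everything
difficult is hidden in deciding what to pass as the first argument.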

There are 2 approaches to encoding conversions:

1) Guessing: try candidate encodings in turn, and use the first
conversion that succeeds.

      There are many flaws with this approach (as used by glib), but
      if the order of guessing is carefully chosen, then the approach
      is viable even if the converted result may sometimes be surprising.

2) Attaching an existing, known encoding to each of the elements
that are currently stored in RPMTAG_I18NTABLE.

      Using nicknames for encodings, with fully explicit encodings
      determined by secondary lookup, is essentially what glibc does
      with /usr/share/locale/aliases (which is one perfectly reasonable,
      but perhaps unportable, choice for choosing a source encoding).

      The primary issue with using RPMTAG_I18NTABLE contents as a hint
      for source encoding is that only Summary:, Group:, and
      %description use RPMTAG_I18NTABLE currently. That is not a
      show-stopper if an explicit encoding is specified through rpm
      configuration for all tags.
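
The two approaches can be sketched side by side. Everything named here
is hypothetical: try_convert(), convert_by_guessing(),
convert_by_locale(), the guess order, and the locale-to-encoding table
are illustrative assumptions, not rpm code or rpm configuration.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Try one conversion to UTF-8; returns a malloc'd string or NULL. */
static char *try_convert(const char *fromcode, const char *in)
{
    iconv_t cd = iconv_open("UTF-8", fromcode);
    if (cd == (iconv_t)-1)
        return NULL;
    size_t inleft = strlen(in);
    size_t outleft = 4 * inleft;        /* assumption: generous for common encodings */
    char *out = malloc(outleft + 1);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }
    char *inp = (char *)in, *outp = out;
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1) {
        free(out);
        return NULL;
    }
    *outp = '\0';
    return out;
}

/*
 * Approach 1: guess. The order is an assumption: UTF-8 first because it
 * rejects most non-UTF-8 input, ISO-8859-1 last because it accepts any bytes.
 */
static const char *const guess_order[] = { "UTF-8", "ISO-8859-1", NULL };

char *convert_by_guessing(const char *in, const char **used)
{
    for (size_t i = 0; guess_order[i] != NULL; i++) {
        char *out = try_convert(guess_order[i], in);
        if (out != NULL) {
            if (used != NULL)
                *used = guess_order[i]; /* report which guess succeeded */
            return out;
        }
    }
    return NULL;
}

/* Approach 2: secondary lookup from a locale nickname to an explicit encoding. */
struct locale_encoding {
    const char *locale;
    const char *encoding;
};

static const struct locale_encoding locale_table[] = {  /* illustrative entries only */
    { "C",  "US-ASCII"   },
    { "de", "ISO-8859-1" },
    { "fr", "ISO-8859-1" },
};

char *convert_by_locale(const char *locale, const char *in)
{
    size_t n = sizeof(locale_table) / sizeof(locale_table[0]);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(locale_table[i].locale, locale) == 0)
            return try_convert(locale_table[i].encoding, in);
    }
    return NULL;                        /* no encoding known for this locale */
}
```

Note how the guessing variant can surprise: any byte string that is
not valid UTF-8 still "succeeds" as ISO-8859-1, whether or not that
was the real source encoding. The table variant fails loudly instead,
at the cost of needing an entry for every locale in use.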

The amount of work involved in calling iconv(3) is trivial, so I will
likely implement both approaches in the next couple of weeks. Use
whichever conversion scheme works for you ...

But if there are other approaches to encoding conversion that are
desired, now would be a wonderful time to say something ...

73 de Jeff
Received on Mon Mar 31 18:31:37 2008