2011/4/5 Jeff Johnson <email@example.com>:
> On Apr 5, 2011, at 3:49 PM, Per Øyvind Karlsen wrote:
>> 2011/4/5 Jeff Johnson <firstname.lastname@example.org>:
>>> No way Jose!
>>> rpmbuild (and *.rpm metadata) can NOT have any encoding.
>>> Encoding is for DISPLAY, not for octets.
>>> Put unicode into package metadata at your own peril.
>>> Meanwhile -- without a means to specify encoding in metadata --
>>> rpm in C has *ONLY* 8 bit clean octets and the usual conventions
>>> for NUL terminated strings.
>>> Until there's a well defined means of specifying encoding for all
>>> tag strings -- and that's a fundamental design change to *.rpm packaging that
>>> likely will NEVER happen -- the problem simply CANNOT be fixed to meet naive
>>> luser expectations, and all attempts to "fix" anything
>>> are just doomed.
>>> C has octets, not utf8, and rpmdb strings are _NOT_ based on LC_ALL
>>> and other i18n/l10n conventions.
>>> You can of course put whatever garbage you wish into strings that
>>> will be stored as keys in an rpmdb, subject to all the usual
>>> GIGO conventions distros wish to inflict upon their customers.
>> Okay, my mistake anyways, I was looking into an issue with unicode strings,
>> then I specified the wrong locale when testing. I notice now that with a properly
>> specified locale, it accepts unicode characters.
> The test is way way feeble, but once the expectation starts, well
> there's nothing to do but solve the problem "correctly".
> What is broken -- by design -- is that *.spec recipes have multiple
> encodings, not a per-file encoding. And all hell starts to break
> loose in *.rpm packages when retrievals using keys pick up a per-key encoding.
> You tell me how to look up all possible encodings from a database without
> specifically tying an encoding to every possible tag.
>> Still though, using '%description -l', descriptions disappear.. :|
> C permits octets, not encodings. All possible encodings fit into octets
> with NUL terminated strings. The only thing that saves %description
> (which doesn't belong in *.rpm packages, another design issue that I don't
> feel like arguing about because you simply will NOT like the answer
> of specifying possibly hundreds of properly encoded %description's in
> a single *.spec using the full-blown form of the 4-tuple used for
> encoding on a per-tag basis. Package metadata will simply explode
> for no known purpose.
> And RPM_I18NSTRING_TYPE has been on death row all of this century,
> is carried along solely because PLD and a few other distros *still*
> insist on inserting translations into *.spec recipes directly.
> A data type that is sometimes an array, and sometimes a scalar dependent
> on the context of interpretation just isn't a useful data type.
> Nor is there any known/modern reason why all possible encodings MUST be carried in
> each and every package header in the year 2011. There's specspo and other means
> of %description et al distribution that are far far superior to RPM_I18NSTRING_TYPE pulled
> in from *.spec recipes. This was _NOT_ true back in 1998 when RPM_I18NSTRING_TYPE was devised.
Hm, okay, so something better obviously needs to be done.
As things currently stand though, is it supposed to be broken or...?
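
For reference, the octets-versus-encoding point quoted above can be seen in a
few lines of Python. This is only a sketch and has nothing to do with rpm's
actual code: the string and variable names are made up for illustration, and
plain bytes objects stand in for the NUL terminated C strings rpm stores.

```python
# Hypothetical illustration only; none of this is rpm code or the rpm API.
name = "Øyvind"

# The same text becomes different octets depending on the encoding chosen.
as_utf8 = name.encode("utf-8")          # b'\xc3\x98yvind'
as_latin1 = name.encode("iso-8859-1")   # b'\xd8yvind'

# A byte-wise comparison (what a key lookup on raw octets amounts to)
# sees two different keys, even though both mean the same text.
keys_match = as_utf8 == as_latin1

# The same octets display differently depending on the encoding the reader
# assumes, because the octets themselves carry no encoding information.
shown_latin1 = as_latin1.decode("iso-8859-1")
shown_latin2 = as_latin1.decode("iso-8859-2")

print(keys_match, shown_latin1, shown_latin2)
```

So without an encoding recorded next to the string, neither the lookup nor
the display can be made reliably "right", only consistent by convention.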
Received on Tue Apr 5 22:13:25 2011