RPM Community Forums

Mailing List Message of <rpm-users>

Re: high performance computing, HA and RPM5

From: Jeff Johnson <n3npq@mac.com>
Date: Mon 07 Dec 2009 - 17:15:21 CET
Message-id: <21295230-87CA-42F4-BA4E-D7FC21F81383@mac.com>

On Dec 7, 2009, at 10:56 AM, devzero2000 wrote:

>> 
>> So I suspect we differ in package vs. configuration management assumptions.
> I don't think so. I am pretty sure we agree.

Good.

>> 
>>>> 
>>>> Doing upgrades of multiple nodes is typically done by creating a new
>>>> system image, and then undertaking a reinstallation of the new system
>>>> image. This isn't as efficient as upgrading a package on a per-node basis
>>>> because new system images will contain redundant already installed
>>>> software. It's very hard to beat a reboot of a new system image located
>>>> on a distributed file system for KISS efficiency.
>>>> 
>>>> Tracking what system image is installed back to a specific PM database
>>>> that describes the installed software within the system image could
>>>> be done with a wrapper on rpm to choose 1-of-N rpmdb's to perform
>>>> detailed queries re files in the system image. But a flat file manifest
>>>> of what packages were installed in a system image is likely sufficient
>>>> for most purposes as well.
>>> But THIS makes the role of a package management system, call it RPM5
>>> or anything else, useless or worse.
>>> Are you sure?
>> 
>> Not sure about anything. What I described is based on an assumption
>> that physical images produced by a package manager are what could
>> be distributed. What is "THIS" and why is it "useless"?
> THIS, in my interpretation of your answer (wrong as it could be), is
> to say that a package management system is, speaking of HPC, useless.

Ah, I dinna mean to imply or say "useless". I was just trying
to describe that HPC is a different area of software distribution where
package management is (afaik) unused.
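
A minimal sketch of the "wrapper on rpm to choose 1-of-N rpmdb's" idea quoted above, assuming one rpmdb directory is kept per system image. The image names, the paths and the helper name are hypothetical; only `rpm --dbpath` and `rpm -qf` are real options, and whether `--dbpath` or `--root` fits better depends on how the images are laid out.

```python
import shlex

# Hypothetical mapping from system image name to the rpmdb directory that
# describes it. Names and paths are made up for illustration.
IMAGE_DBPATHS = {
    "compute-2009.11": "/srv/images/compute-2009.11/var/lib/rpm",
    "compute-2009.12": "/srv/images/compute-2009.12/var/lib/rpm",
}


def rpm_query_command(image, rpm_args):
    """Build an rpm command line that queries the rpmdb of one image."""
    try:
        dbpath = IMAGE_DBPATHS[image]
    except KeyError:
        raise ValueError("unknown system image: %s" % image)
    return ["rpm", "--dbpath", dbpath] + list(rpm_args)


# Which package in the December image owns this file?
cmd = rpm_query_command("compute-2009.12", ["-qf", "/usr/bin/mpirun"])
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The sketch only builds the command line; running it (for example with `subprocess.run`) is left out so that it works without rpm installed.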

>> 
>>>> 
>>>> A distributed PM (or system image) database using some RPC transport is
>>>> fairly simple. Since installed software is slowly changing, and mostly
>>> That is an opinion. System security patches are DAILY.
>>>> readonly after system images are created, the RPC performance
>>>> is likely not critical. Berkeley DB supplied sunrpc until db-4.8.24. Other
>>>> RPC transports onto Berkeley DB are no harder than sunrpc.
>>>> 
>>>> The above probably (imho) describes a reasonable architecture that scales efficiently
>>>> for maintaining software on most of the nodes in a HPC "cluster".
>>>> 
>>>> There's still a need for fault tolerance on the management server(s)
>>>> where images are resident and where images are produced that need
>>>> more than readonly access to databases. The management servers would
>>>> likely benefit from a replicated database (which Berkeley DB can
>>>> provide).
>>>> 
>>>> One can imagine an architecture using replicated databases across
>>>> all nodes, with full ACID transactional properties on not only the
>>>> database, but also with packages and files. But the complexity
>>>> cost, and the scaling to many nodes, likely has combinatorial
>>>> failures. There are other efficiencies, like multicast transport,
>>>> and a reliable message bus (like dbus) that would likely be needed
>>>> as well.
>>> As I replied, your answer seems to reiterate that a package management
>>> system is not useful in an HPC ENVIRONMENT. But I do not agree. This is
>>> because a package management system involves, or is a necessary
>>> substrate for, software distribution and patch management. But
>>> your last reply is interesting, although it deserves further
>>> investigation.
>> 
>> There's likely a further disagreement here in package vs patch management.
>> The one attempt I'm aware of to integrate patch management into RPM
>> (from SuSE) has been largely deprecated.
>> 
>> I can go into details re why I believe the SuSE patch management did not
>> succeed (there's nothing wrong with the patch into rpm), but basically
>>        Packages as containers for immutable files is where package management "works".
>> The corollary is that mutable files, either through configuration/patch management,
>> or for files that aren't contained in packages, don't work very well with RPM.
> Huh? A package management system that does not work with patches is a
> contradiction. But I am sure I have not understood your comment. Patch
> RPM, as attempted by SuSE some time ago, is largely different from
> the problem I want to discuss (BTW, "patch rpm" IMHO was aimed at
> something that deltarpm now solves better, as far as I know). I want
> to discuss only this: is a package management system necessary in an
> HPC env? If not, is it necessary to put on ALL the systems in the env a
> virtual provides for the "Requires" caps that a manually installed
> package requires (manually installed because, from what I have
> understood from your words, rpm is not the solution for doing this)?
> 

RPM does not currently "work" with binary patches. One can
certainly put binary patches in a *.rpm and attempt application
within a scriptlet. But one cannot change the (immutable) metadata
of an installed package, and so file verification after binary
patch application is lost. There's also no easy way to
identify that a binary patch is being applied to the
correct "before" file.
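
A minimal sketch of the missing "before" check, assuming the patch is shipped together with a digest of the file it expects to find. rpm does nothing like this for binary patches; the function names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def safe_to_patch(path, expected_before_digest):
    """True only if the on-disk file is the exact "before" file.

    expected_before_digest is a hypothetical value that would have to
    travel with the binary patch.
    """
    return sha256_of(path) == expected_before_digest
```

Even with such a check, the digest recorded in the installed package's metadata would still describe the unpatched file, so verification after the patch is applied would still report a mismatch.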

Yes, deltarpm is a better (because simpler and outside of RPM itself) approach
than patch rpm's to reducing the number of bytes that need to be transported.

But there are still issues with delta rpm's, which have to deconstruct
and uncompress an entire *.rpm in order to get plaintext that might
benefit from deltafication. At this point, deltarpm's are perhaps
_MORE_ rather than less complicated than patch rpm's would be.

But both patch and delta rpm's suffer from a combinatorial failure
matching up all possible before <-> after end-points for patch/delta
applications.
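
A minimal sketch of how fast the end-points grow, assuming every older release of a package may need a patch or delta to every newer one. The version strings are hypothetical.

```python
from itertools import combinations


def delta_endpoints(versions):
    """List every (before, after) pair, with versions given oldest first."""
    return list(combinations(versions, 2))


# Hypothetical release history of one package, oldest first.
versions = ["1.0-1", "1.0-2", "1.1-1", "1.2-1"]
pairs = delta_endpoints(versions)
print(len(pairs))  # 6, i.e. 4 * 3 / 2
```

With N releases there are N * (N - 1) / 2 such pairs per package, before multiplying by the number of packages and architectures.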

> Best Regards
>> 
>>>> 
>>>> hth random opinions from 5 minutes of thought about
>>> HPC, HA, shared storage and RPM probably require further reflection.
>>> IMHO they have not been mentioned in the past probably because
>>> many applications (user applications, not system ones) are
>>> installed manually, and the benefits of using a package management
>>> system for those applications have not been considered.
>> 
>> Again, please take my comments as a result of 5 minutes of thought.
>> There are other architectures, and additional implementations, that
>> would be needed for RPM to be successful in managing HPC software
>> installations.
>> 
>> For starters, there's little reason (aside from silly 32bit constraints
>> imposed on files and payloads with existing), why system image
>> or network appliance or DVD or ... management could not be done in
>> *.rpm. So far there's been little interest in attempting *.rpm
>> extensions in those directions, largely because of the known
>> 32bit limits.
> In 2009 ?

Yes, cpio in 2009 has the same flaws that it has always had.

But sure, other archive formats could be added, and tag types
can be changed to 64bits at some considerable incompatibility
cost with no clear (in 2009) benefit.
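
A minimal sketch of where the 32bit limit comes from on the payload side, assuming the cpio "newc" format, whose header stores the file size as 8 hexadecimal digits. The constant and function names are made up for illustration.

```python
# The newc cpio header stores each numeric field as 8 hex digits.
NEWC_FIELD_HEX_DIGITS = 8
MAX_NEWC_FILESIZE = 16 ** NEWC_FIELD_HEX_DIGITS - 1  # 4 GiB minus 1 byte


def fits_in_newc(size_bytes):
    """True if a file of this size can be described by a newc header."""
    return 0 <= size_bytes <= MAX_NEWC_FILESIZE


def encode_filesize(size_bytes):
    """Encode a file size the way the newc size field does."""
    if not fits_in_newc(size_bytes):
        raise OverflowError("file too large for a newc cpio header")
    return "%08X" % size_bytes
```

A system image or DVD payload larger than that simply has no valid size field in this format, which is why a different archive format or wider fields would be needed.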

73 de Jeff
>> 
>> 73 de Jeff
>> ______________________________________________________________________
>> RPM Package Manager                                    http://rpm5.org
>> User Communication List                             rpm-users@rpm5.org
>> 
Received on Mon Dec 7 17:16:13 2009