Q: Cluster File System Reliability

Post by nicc77 » Mon, 03 Sep 2007 05:50:50

Hi there,

I am rather new to cluster file systems, so let me state my
requirements, and hopefully the kind people here can point me in the
right direction.

Basically I want a file system spanning multiple physical hosts, but
mounted on a server as a single mount point - almost like having some
kind of RAID over various physical nodes. The idea is that if one node
fails, the server should still be able to continue with processing
(assume it's something like a database). I am hoping that when the
failed file server(s) come back online, the "RAID" should be rebuilt
on the fly, like a real RAID. This will be first prize.
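
As a rough illustration of the "RAID over nodes" idea above, here is a minimal Python sketch of XOR parity spread across hosts. The three data nodes, the single parity node and the block contents are all assumptions made up for the example; a real system would also have to deal with striping, rotating parity and network transport.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical storage nodes each hold one data block of a stripe;
# a fourth hypothetical node holds the parity block.
data_blocks = [b"node", b"data", b"here"]
parity = xor_blocks(data_blocks)

# Simulate losing the second node, then rebuild its block from the
# surviving data blocks plus the parity block.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
```

Because XOR is its own inverse, any single missing block can be recomputed from the others, which is what lets the array keep serving data and rebuild a node after it returns.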

If the above is not possible, or if the performance penalty is too
high, I will also settle for a real time mirroring solution, but with
one crucial requirement: the write on the remote file server (the
mirror) should be guaranteed - in other words, first ensure the data
is on the remote host before writing to the local host disc, and then
only return to the system. I know this will also have a very negative
impact on performance, but I am between a rock and a hard place with
all the requirements (fast file system, but never ever lose any

Perhaps my requirements cannot be met, but then I at least need some
pointer as to the next best thing.
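
The write ordering asked for in the mirroring case (remote first, then local, then return) might look roughly like the Python sketch below. It assumes the mirror is reachable as an ordinary mounted directory; the function names and the two temporary directories standing in for "remote" and "local" are made up for the example.

```python
import os
import tempfile


def durable_write(path, payload):
    """Write payload and ask the OS to push it to stable storage."""
    with open(path, "wb") as handle:
        handle.write(payload)
        handle.flush()              # drain Python's userspace buffer
        os.fsync(handle.fileno())   # ask the kernel to flush to the device


def mirrored_write(remote_dir, local_dir, name, payload):
    """Commit to the mirror first, then locally, and only then return."""
    durable_write(os.path.join(remote_dir, name), payload)  # mirror first
    durable_write(os.path.join(local_dir, name), payload)   # then local disk
    return True


# Two temporary directories stand in for the remote mirror and the local disk.
with tempfile.TemporaryDirectory() as remote_dir, \
        tempfile.TemporaryDirectory() as local_dir:
    mirrored_write(remote_dir, local_dir, "record.dat", b"billing record")
    with open(os.path.join(remote_dir, "record.dat"), "rb") as handle:
        mirror_copy = handle.read()
    with open(os.path.join(local_dir, "record.dat"), "rb") as handle:
        local_copy = handle.read()
```

If the remote write raises, the local write never happens, so the caller only sees success once both copies have been flushed. Whether a flushed write has really left the drive's own cache still depends on the hardware.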

Any help will be greatly appreciated. I will report back to the group in
months to come as to what eventually worked for us (or didn't work and

Thanks in advance.

PS: if anybody knows something about battery backups for hard drives,
and how to protect the hard drive cache in case of power failures etc,
I would also like to hear some of your thoughts/learnings/experiences.

Q: Cluster File System Reliability

Post by The Natura » Mon, 03 Sep 2007 07:30:42

Not much to ask...is it? :-)

I think proprietary systems exist, but not on Linux as such. You would
need to hack some code I think.

I can conceive of one Linux box running a custom daemon talking to
similar and intercepting network file requests and sharing them out
amongst the other machines..use synchronous writes to ensure data
integrity, and loads of RAM to cache files with..

You might use linux as a starting point, but you would be writing your
own code to run on it, I suspect.

The solution there is to have UPS and power monitoring to flush caches
and halt further access before the system is taken down.
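
A minimal Python sketch of that "flush caches and halt further access" step is below. The `WriteGate` class is hypothetical, and how the power-failure notification actually arrives (for example from a UPS monitoring daemon) is left out; the sketch only shows what could happen once it does.

```python
import io
import os
import threading


class WriteGate:
    """Accept writes until a power failure is reported, then refuse them."""

    def __init__(self):
        self._lock = threading.Lock()
        self._accepting = True

    def on_power_failure(self):
        # Called by whatever monitors the UPS (assumed, not shown here).
        with self._lock:
            self._accepting = False      # halt further access
            if hasattr(os, "sync"):
                os.sync()                # flush OS-level caches to disk

    def write(self, target, payload):
        with self._lock:
            if not self._accepting:
                raise RuntimeError("writes halted: running on battery")
            target.write(payload)


# An in-memory buffer stands in for the real disk in this demonstration.
gate = WriteGate()
disk = io.BytesIO()
gate.write(disk, b"before outage")
gate.on_power_failure()
try:
    gate.write(disk, b"after outage")
    rejected = False
except RuntimeError:
    rejected = True
```

The lock ensures no write is in progress while the flush runs, so everything accepted before the outage is covered by the flush.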

Rather than asking for a particular solution, how about describing the
exact problem you are trying to solve..like how much storage, and how
much throughput, and what level of integrity.

There may be other approaches that would work..


Q: Cluster File System Reliability

Post by Juha Laih » Mon, 03 Sep 2007 15:17:03

nicc777 < XXXX@XXXXX.COM > said:

This doesn't sound like what is commonly called a "cluster file system";
a "cluster file system" (to me, at least) means a set of shared storage
accessible by several computers concurrently (as local storage).
So, the actual storage media will be concurrently connected to
several hosts, and all hosts are able to concurrently access the
file system(s) created on the shared media.

What you describe sounds like something I've heard Google has developed
for their search data storage - but I haven't heard of such technology
being used anywhere else.

There are storage subsystems (disk racks) from various vendors with
various amounts of local battery-backed cache. As long as the write
to the cache completes, the disk subsystem will be in consistent
state (and when the power resumes, will flush any buffered data to
Wolf a.k.a. Juha Laiho Espoo, Finland

Q: Cluster File System Reliability

Post by nicc77 » Mon, 03 Sep 2007 17:51:35

On Sep 2, 12:30 am, The Natural Philosopher < XXXX@XXXXX.COM > wrote:

Ok - here goes...

I am trying to put together an architecture that can handle a large
amount of data processing in near real time (data of around 10TB per
day), but the processing of data should not be broken by any hardware
failures. Think of an accounting/billing system like you get in mobile
operators (or the telecoms industry in general) - if they have a
critical hardware failure in the billing modules one of two things
could happen: 1) you can make calls for free for a period of time; or
2) you can make no calls at all (system stops you). The two scenarios
depend on how the system is designed to handle situations where the
billing module stops working.

*** I am trying to prevent the billing system from breaking in the
first place. ***

That means a "cluster" with nodes in various physical locations,
working as a single image. I don't know if there is a term for this,
but the best I can come up with is an entire system looking/working
like a RAID5 disk system (maybe the term "grid" also kind of describes
it). Storage obviously makes a big part of the solution, as all the
data that is streamed to the system should be written to disk first to
guarantee that the nodes processing the data can retrieve the data
again (in case of a failure somewhere else in the processing system).
This "raw" data might also be archived from time to time when you are
collecting data for troubleshooting parts of your system - the
captured data can be "re-run" in a simulation environment where you
can recreate a day's worth of processing, but perhaps slowed down or
something while you debug (if that makes sense to anybody).
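
The "write to disk first, re-run later" idea above can be sketched in Python as an append-only journal. The file name, the record format (one newline-free record per line) and the helper names are assumptions for illustration only.

```python
import os
import tempfile


def append_record(journal_path, record):
    """Append one raw record and force it to disk before returning."""
    with open(journal_path, "ab") as journal:
        journal.write(record + b"\n")   # assumes records contain no newlines
        journal.flush()
        os.fsync(journal.fileno())


def replay(journal_path):
    """Yield the stored records in their original arrival order."""
    with open(journal_path, "rb") as journal:
        for line in journal:
            yield line.rstrip(b"\n")


# Capture a few hypothetical records, then re-run them from the journal.
records = [b"call-1", b"call-2", b"call-3"]
with tempfile.TemporaryDirectory() as workdir:
    journal_path = os.path.join(workdir, "raw.journal")
    for record in records:
        append_record(journal_path, record)
    replayed = list(replay(journal_path))
```

A processing node that fails midway can re-read the journal instead of asking the source to resend, and the same file can feed a slowed-down simulation run.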

of what I want (11g) - but it would be great if we can put a complete
"open source" solution together that many others can use. Not that I
have something against Oracle per se, but I have this thing against
vendor lock-in, and Oracle is rather good with that.

And yes - I do believe a lot of coding will have to take place.
Personally I am leaning toward something like MPI for the data
processing bits - but still, reliable storage is a problem for me. I
don't know how I can get a distributed (physically) filesystem that
looks and behaves like a single system image (or partition) that would
make for highly reliable data storage.

I hope that sheds some more light on what I want to achieve.




Q: Cluster File System Reliability

Post by The Natura » Mon, 03 Sep 2007 19:44:44

nicc777 wrote:

Is this a data base application?
I am not sure, but I think that if you are trying to spread a database across
multiple engines, this is something that people like e.g. Oracle have
addressed already.

Well yes, one always hopes..

MM. Again I think this is something that database people have addressed
but in a different way.

Google 'oracle cluster' or 'VAX cluster' and see what you come up with.

I am very hazy on exactly what is involved here, but I *think* that is a
way of having multiple transaction engines that synchronise with each
other. The application, rather than the hardware or OS software, does this.

I have to say in this case my first port of call would be to phone an
IBM salesman. My brother in law works for them in this sort of
area..might drop him an email..I know he deals with implementation of
massive database projects for very large companies. He wouldn't know
himself, but he probably knows a man who does..

Of all the software people I have worked with, Oracle get my vote for
total professionalism of approach. At least in the UK, them and IBM.
Both have chosen to be ultra high quality total support vendors to
pretty large installation user bases. That doesn't come free, and
largely you accept the lock-in as the price you pay for the quality you get.

I think you should be looking more at a multiple database/machine
system that looks like a single one.

In essence transactions are, I think, cached and distributed amongst

I will say one thing: the likelihood of - say - a heavily monitored
server farm going down (Oracle UK have about half an acre of machines,
mainly customer machines, in their facility) is a LOT less than the
likelihood of a WAN link going down.

Likewise the ability to concentrate 24x7 staff in a single location,
with the expertise and spares they need, is also great.

Plus the cost of building - yea even unto diesel generators - a fully
power conditioned air conditioned environment that is - if your paranoia
runs that deep - terrorist bomb proof..is a lot less than spreading it
all around.

What comes to my mind would be some sort of cluster of RAIDED machines -
proper hardware RAID, fully hot swappable - backed up by service
contracts, hot and cold running technicians and plenty of spares..and
built with e.g. twin Ethernet cards etc..for redundancy. The networking
layer itself needs attention..you will need a failover routing policy in
there as well. Or perhaps simply create two networks with two switches
and run some sort of dynamic routing on the boxes so they use whichever
network is available..
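
To make the two-network idea concrete, here is a small Python sketch that tries each network path in turn. Note that this is application-level failover, not the dynamic routing suggested above, and the addresses, port and `fake_connect` stand-in are invented for the example.

```python
import socket


def connect_first_available(endpoints, connect=socket.create_connection,
                            timeout=2.0):
    """Try each (host, port) in order; return the first successful link."""
    last_error = None
    for host, port in endpoints:
        try:
            return connect((host, port), timeout)
        except OSError as exc:
            last_error = exc        # remember the failure, try the next path
    raise ConnectionError("no network path reachable") from last_error


def fake_connect(address, timeout):
    """Stand-in connector that pretends the first network is down."""
    if address[0] == "10.0.0.10":
        raise OSError("primary network down")
    return "connected via " + address[0]


# The same hypothetical server is reachable over two separate networks.
paths = [("10.0.0.10", 5000), ("10.1.0.10", 5000)]
link = connect_first_available(paths, connect=fake_connect)
```

With the default `connect` argument the function would open a real TCP connection; the fake connector is only there so the example runs without a network.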

..but you still need to take machines down sometimes...more on that later.

You need multiply redundant internet connections from TRULY different
suppliers running down TRULY different bits of physical fibre. Coming
into your site by totally different physical routes..yep the digger
through the single point conduit scenario looms here..;-)

Unsurprisingly this is precisely the sort of facility that e.g. Oracle

So hosting at a place like that, even if you have a few of YOUR staff
permanently seconded to the site, is probably cost effective unless you
truly have megabucks to play with.

Now you have a hardware system that should take you through anything but
CPU type failure.

I have a feeling that it's possible to have twin engines hitting a single

Q: Cluster File System Reliability

Post by nicc77 » Tue, 04 Sep 2007 01:14:41

Thanks for all the advice - it really helps to hear someone else's
thoughts on these issues. I suppose there is only so much you can do,
and in the end it depends what the cost implication would be.

You have given me a lot to chew on, so let me be off then :)

Cheers and thanks again !

Q: Cluster File System Reliability

Post by nicc77 » Tue, 04 Sep 2007 01:16:08

Thanks for your thoughts.

Q: Cluster File System Reliability

Post by Alex » Thu, 06 Sep 2007 02:05:47

Take a look at http://www.yqcomputer.com/
Except [yet] for the resync part, it may be worth a try.

Hope it helps,

Q: Cluster File System Reliability

Post by Eric Beche » Sat, 15 Sep 2007 18:53:36

nicc777 wrote:
Try Coda!