NDMP backups I/O bottleneck?

Post by Mike » Thu, 18 Jan 2007 13:08:35


Hello,

Scheme:

10x NetApp filers, 2x 1 TB volumes
ADIC library with 6x LTO-2 drives
Brocade SilkWorm 3800 16-port FC switch
Veritas NetBackup with Shared Storage Option enabled

Backup server: IBM machine with 2x Intel Xeon and 8 GB memory, running
fc3. External storage with 14x 146 GB U320 10K RPM drives in RAID 5EE is
attached.

Problem:
The volumes on the NetApps have a lot of directories and small files. When
I start a backup (over NDMP), NetBackup mounts the tape and starts an NDMP
session between the NetApp and the tape drive. The backup takes forever
to complete.

This is a log from the netapp:
Jan 16 22:59:47 EST [ndmpd:106]: Message NDMP_FH_ADD_DIR sent
Jan 16 22:59:47 EST [ndmpd:106]: Message Header:
Jan 16 22:59:47 EST [ndmpd:106]: Sequence 15997
Jan 16 22:59:47 EST [ndmpd:106]: Timestamp 1169006386
Jan 16 22:59:47 EST [ndmpd:106]: Msgtype 0
Jan 16 22:59:47 EST [ndmpd:106]: Method 1796
Jan 16 22:59:47 EST [ndmpd:106]: ReplySequence 0
Jan 16 22:59:47 EST [ndmpd:106]: Error NDMP_NO_ERR
Jan 16 22:59:47 EST [ndmpd:106]: Number of directories: 4096
Jan 16 22:59:47 EST [ndmpd:106]: Message NDMP_FH_ADD_DIR sent
Jan 16 22:59:47 EST [ndmpd:106]: Message Header:
Jan 16 22:59:47 EST [ndmpd:106]: Sequence 15998
Jan 16 22:59:47 EST [ndmpd:106]: Timestamp 1169006387
Jan 16 22:59:47 EST [ndmpd:106]: Msgtype 0
Jan 16 22:59:47 EST [ndmpd:106]: Method 1796
Jan 16 22:59:47 EST [ndmpd:106]: ReplySequence 0
Jan 16 22:59:47 EST [ndmpd:106]: Error NDMP_NO_ERR
Jan 16 22:59:47 EST [ndmpd:106]: Number of directories: 4096

I'm getting a lot of those messages...
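
To put a number on "a lot", the log can be tallied offline. A minimal sketch in Python, assuming the ndmpd log has been saved as text in exactly the format shown above; `tally_fh_add_dir` is a made-up helper name, not part of any NetApp or NetBackup tool.

```python
import re

# Matches lines such as:
#   Jan 16 22:59:47 EST [ndmpd:106]: Message NDMP_FH_ADD_DIR sent
#   Jan 16 22:59:47 EST [ndmpd:106]: Number of directories: 4096
FH_ADD_DIR = re.compile(r"\[ndmpd:\d+\]: Message NDMP_FH_ADD_DIR sent")
DIR_COUNT = re.compile(r"\[ndmpd:\d+\]: Number of directories: (\d+)")


def tally_fh_add_dir(lines):
    """Return (message_count, directory_entry_count) for an iterable of log lines."""
    messages = 0
    entries = 0
    for line in lines:
        if FH_ADD_DIR.search(line):
            messages += 1
        match = DIR_COUNT.search(line)
        if match:
            entries += int(match.group(1))
    return messages, entries


# Two messages copied from the log excerpt above (headers omitted).
sample = """\
Jan 16 22:59:47 EST [ndmpd:106]: Message NDMP_FH_ADD_DIR sent
Jan 16 22:59:47 EST [ndmpd:106]: Number of directories: 4096
Jan 16 22:59:47 EST [ndmpd:106]: Message NDMP_FH_ADD_DIR sent
Jan 16 22:59:47 EST [ndmpd:106]: Number of directories: 4096
""".splitlines()

messages, entries = tally_fh_add_dir(sample)
print(messages, entries)
```

Feeding it an open log file instead of `sample` gives the totals for a whole backup window.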

This is the tape I/O log from the same NetApp:

CPU   NFS  CIFS  HTTP      Net kB/s      Disk kB/s    Tape kB/s  Cache
                           in    out    read  write   read write    age
44%  7659     0     0   4381  11951   16155     16      0     0      4
51%  6027     0     0   4357   9726   20571  12323      0     0      4
46%  6137     0     0   3098  11679   20951  10438      0     0      4
78%  5207     0     0   3369  11168   25557   9513      0  1843      4
93%  6318     0     0   4424   9623   24429  12163      0  3270      4
46%  5846     0     0   3410   8087   16164  14505      0   516      4
31%  5225     0     0   1910   7638   11612   6680      0     0      4
31%  5986     0     0   4416  11591   11960     24      0     0      4
30%  5746     0     0   4670   9569   12627      0      0     0      4
34%  7059     0     0   4049  12973    9670      8      0     0      4


As you can see here, I'm getting tape write performance of about 563
kB/s. LTO-2 can store data at 30 MB/s (I saw it!!)
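
The 563 figure is simply the mean of the tape write column over the ten samples above. A quick check in Python; the projection at the end is only a rough sketch that assumes a completely full 1 TB volume (taken as 10^9 kB) written at a constant rate, neither of which is stated in this post.

```python
# Tape write column (kB/s) from the ten samples above.
tape_write_kbs = [0, 0, 0, 1843, 3270, 516, 0, 0, 0, 0]

mean_kbs = sum(tape_write_kbs) / len(tape_write_kbs)
print(f"mean tape write: {mean_kbs:.1f} kB/s")

# Assumption: a full 1 TB volume, taken as 10**9 kB, written at a constant rate.
volume_kb = 10**9
days_at_observed = volume_kb / mean_kbs / 86400    # 86400 seconds per day
hours_at_30mbs = volume_kb / (30 * 1000) / 3600    # 30 MB/s = 30000 kB/s
print(f"at the observed rate: {days_at_observed:.1f} days")
print(f"at 30 MB/s:           {hours_at_30mbs:.1f} hours")
```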

Now, top output from the backup server:

[root@backup root]# top

23:00:23 up 7 days, 9:41, 2 users, load average: 7.56, 7.25, 7.15
214 processes: 213 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total    4.6%    0.0%    6.5%   0.0%     0.0%   88.4%    0.1%
           cpu00    4.5%    0.0%    5.3%   0.0%     0.0%   90.0%    0.0%
           cpu01    8.9%    0.0%   13.5%   0.0%     0.0%   77.4%    0.0%
           cpu02    3.5%    0.0%    2.5%   0.0%     0.0%   93.0%    0.7%
           cpu03    1.3%    0.0%    4.5%   0.3%     0.1%   93.4%    0.0%

It's clearly an I/O bottleneck here. The NetApp sends an NDMP packet with
directories, and since the backup is running in indexed mode (with
indexes

Post by Faeandar » Fri, 19 Jan 2007 11:34:38

On 16 Jan 2007 20:08:35 -0800, "Mike" < XXXX@XXXXX.COM > wrote:


How long does it take for the backup to complete? And what is the
average transfer rate once it gets to phase 4?

What I think you're seeing is the same issue every file system has
when you try to do a sequential dump of a large number of files and/or
directories. There is a lot of metadata that the server has to map
before it can start sending data to tape at full stream.

My guess is once you get to phase 4 dump you will see very good tape
stream performance.


Capture this information at phase 4.


All this is expected and normal when dealing with phases 1 through 3 of
a dump.


Have fewer files and directories per volume. Or use SnapMirror to
tape. Or skip tape altogether and go with something like Avamar or
Data Domain.

This issue has plagued people for decades and is not specific to
NetApp and certainly not NDMP.

NDMP is simply the command protocol, not a transfer or backup
protocol. NDMP is simply keeping a connection open so the filer can
tell the NDMP client (backup server) when it's ready to start sending
data to tape via the standard *nix dump command. Plus all the metadata
bits, of course.

~F

Post by Mooji » Fri, 26 Jan 2007 22:36:47

If your server is running Windows, try using datamover to determine how well
the server-to-storage interface performs. You'll need a demo license to use
the advanced dialog features, which is probably what you want
to do. Please be careful: datamover can talk to logical or physical LUNs.
Physical device I/O will destroy your data, so please use logical only.

Download from www.moojit.net


"Faeandar" < XXXX@XXXXX.COM > wrote in message
news: XXXX@XXXXX.COM ...

Post by Raju Mahala » Sat, 27 Jan 2007 02:02:19

Have you tried a disk storage pool between the backup server and tape? I am
not sure whether it works in NetBackup. We have the same issue: lots of
files on almost every NetApp volume, with a very small average file size.
We use Tivoli Storage Manager, so we configured LAN-based backup, which I
feel is better when there are lots of small files. The backup server first
backs up the files to a disk storage pool and, once the backup has
completed, moves them from the storage pool to tape offline, so the primary
NetApp filers don't stay busy all the time.
Please check whether a disk storage pool is possible in NetBackup.

Regards,
Raju

On Jan 17, 9:08 am, "Mike" < XXXX@XXXXX.COM > wrote:

Post by Curtis Preston » Sat, 27 Jan 2007 03:01:55

A lot of things can slow down an NDMP backup.

First, make sure you're using a locally attached tape drive, not
three-way NDMP (where backups are sent via IP to another system).
Three-way will work, but not if you want any performance. ;)

Second, the layout of the volume can really affect your performance. If
your volume consists of only a few spindles, or if they're all from the
same disk shelf, and so on, any of those can make a volume that won't
supply you with enough data to make that LTO-2 tape drive happy.

Third, the content of the filesystem really affects performance as well.
Small files are your enemy. The more files you have per GB, the slower
your performance will be.
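
Files per GB is easy to measure from any client that can see the volume. A rough sketch in Python, assuming the volume is mounted at a local path; `files_per_gb` is a hypothetical helper, not a NetApp or NetBackup tool, and walking millions of small files this way is itself slow.

```python
import os
import tempfile


def files_per_gb(root):
    """Walk root and return (file_count, total_bytes, files_per_GB)."""
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total += os.lstat(path).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
            count += 1
    density = count / (total / 1e9) if total else 0.0
    return count, total, density


# Demo on a throwaway directory: 100 files of 4000 bytes each.
with tempfile.TemporaryDirectory() as demo:
    for i in range(100):
        with open(os.path.join(demo, f"f{i}"), "wb") as fh:
            fh.write(b"x" * 4000)
    count, total, density = files_per_gb(demo)
    print(count, total, round(density))
```

Point it at the mount point of a volume instead of the demo directory to compare volumes against each other.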

Finally, if your NetApp is sending only, say, 15 MB/s, that isn't going
to stream that LTO-2. The LTO-2 ends up spending all its time
backhitching and shoeshining, and your 15 MB/s turns into only a few MB/s.
This means that the tape drive becomes the next bottleneck.
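
To see how a starved drive can end up slower than its source, here is a toy model in Python. It is only a sketch under stated assumptions: the 64 MB drive buffer and the reposition times are made-up illustrative values, not LTO-2 specifications or measurements, and `effective_rate` is a hypothetical helper. The 15 MB/s source rate is the example figure above, and 30 MB/s is the drive rate reported earlier in this thread.

```python
def effective_rate(source_mbs, drive_mbs, buffer_mb, reposition_s):
    """Toy estimate of sustained tape throughput in MB/s.

    Model: the drive writes at drive_mbs from a buffer of buffer_mb that the
    source refills at source_mbs. When the buffer runs dry the drive stops,
    repositions for reposition_s seconds, and restarts once the buffer is
    full again. The source stalls whenever the buffer is full.
    """
    if source_mbs <= 0:
        return 0.0
    if source_mbs >= drive_mbs:
        return float(drive_mbs)  # the drive never starves, so it streams
    write_s = buffer_mb / (drive_mbs - source_mbs)  # time to drain a full buffer
    written_mb = drive_mbs * write_s                # data put on tape per cycle
    refill_s = buffer_mb / source_mbs               # time to refill an empty buffer
    idle_s = max(reposition_s, refill_s)            # time the drive is not writing
    return written_mb / (write_s + idle_s)


# Illustrative only: the buffer size and reposition times are assumptions.
for reposition_s in (2, 10, 30):
    rate = effective_rate(source_mbs=15, drive_mbs=30, buffer_mb=64,
                          reposition_s=reposition_s)
    print(f"reposition {reposition_s:>2} s -> {rate:.1f} MB/s")
```

In this model throughput only drops below the source rate once repositioning takes longer than refilling the buffer; real drives and real NDMP dumps will behave differently.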

So, do what you can to lay out the volume well. As to the files, they
are what they are. There's probably not much you can do there. Finally, if
you want tape performance to exactly match what's coming out of the NetApp,
consider a virtual tape library. NDMP and VTLs are a match made in
heaven. You can give every filer as many virtual tape drives as it
needs to do its local backup, without actually purchasing dozens of tape
drives.

---
W. Curtis Preston
Author of O'Reilly's Backup & Recovery and Using SANs and NAS
VP Data Protection
GlassHouse Technologies


-----Original Message-----
From: XXXX@XXXXX.COM
[mailto: XXXX@XXXXX.COM ] On Behalf Of Raju
Mahala
Sent: Thursday, January 25, 2007 9:02 AM
To: XXXX@XXXXX.COM
Subject: Re: [C.A.S.] NDMP backups I/O bottleneck?

