[ntp:questions] ntp sanity limit kills ntp daily

Post by Brad Knowl » Sun, 22 May 2005 17:12:11



That's one I hadn't heard of before. I knew about the HZ and
ACPI issues, which are not unique to Linux, but hdparm settings seem
to me like something that probably would be unique to
Linux.

Does anyone have any more in-depth information on the issue of
hdparm settings causing massive loss of interrupts like this?

--
Brad Knowles, < XXXX@XXXXX.COM >

"Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety."

-- Benjamin Franklin (1706-1790), reply of the Pennsylvania
Assembly to the Governor, November 11, 1755

SAGE member since 1995. See < http://www.yqcomputer.com/ > for more info.
 
 
 


Post by Jan Ceulee » Sun, 22 May 2005 18:19:08

Brad Knowles wrote:

The only direct reference in the hdparm manpage is the following:

-u Get/set interrupt-unmask flag for the drive. A
setting of 1 permits the driver to unmask other
interrupts during processing of a disk interrupt,
which greatly improves Linux's responsiveness and
eliminates "serial port overrun" errors. Use this
feature with caution: some drive/controller combinations
do not tolerate the increased I/O latencies
possible when this feature is enabled, resulting in
massive filesystem corruption. In particular,
CMD-640B and RZ1000 (E)IDE interfaces can be unreliable
(due to a hardware flaw) when this option is
used with kernel versions earlier than 2.0.13.
Disabling the IDE prefetch feature of these interfaces
(usually a BIOS/CMOS setting) provides a safe
fix for the problem for use with earlier kernels.

I don't know how relevant the above cautionary statement still is in
kernels that are somewhat more recent than 2.0.13...

There is another reference to interrupts in the manpage where it deals
with IDE Block Mode, but my interpretation is that this merely reduces
the frequency of disk-related interrupts, without having any influence
over the masking of other interrupts.

Here you go:

-m Get/set sector count for multiple sector I/O on the
drive. A setting of 0 disables this feature. Multiple
sector mode (aka IDE Block Mode), is a feature
of most modern IDE hard drives, permitting the
transfer of multiple sectors per I/O interrupt,
rather than the usual one sector per interrupt.
When this feature is enabled, it typically reduces
operating system overhead for disk I/O by 30-50%.
On many systems, it also provides increased data
throughput of anywhere from 5% to 50%. Some
drives, however (most notably the WD Caviar
series), seem to run slower with multiple mode
enabled. Your mileage may vary. Most drives support
the minimum settings of 2, 4, 8, or 16 (sectors).
Larger settings may also be possible,
depending on the drive. A setting of 16 or 32
seems optimal on many systems. Western Digital
recommends lower settings of 4 to 8 on many of
their drives, due tiny (32kB) drive buffers and
non-optimized buffering algorithms. The -i flag
can be used to find the maximum setting supported
by an installed drive (look for MaxMultSect in the
output). Some drives claim to support multiple
mode, but lose data at some settings. Under rare
circumstances, such failures can result in massive
filesystem corruption.
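
The manpage's hint about MaxMultSect can be turned into a small helper. This is only a sketch: the function name max_mult_sect is made up for illustration, and it assumes that the identification output contains a field of the form "MaxMultSect=<number>", which may differ between hdparm versions.

```shell
#!/bin/sh
# Hypothetical helper: read `hdparm -i` style output on stdin and print
# the number that follows "MaxMultSect=" (assumed field format).
max_mult_sect() {
    sed -n 's/.*MaxMultSect=\([0-9][0-9]*\).*/\1/p'
}
```

Typical use would be something like `hdparm -i /dev/hda | max_mult_sect`, where /dev/hda is just an example device name, followed by an `hdparm -m` setting no larger than the value printed.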


Jan

 
 
 


Post by Allen McIn » Sun, 22 May 2005 22:16:08

> Does anyone have any more in-depth information on the issue of

Personal experience. When I burn a CD, my machine loses serial
interrupts, and NTP claims to have lost sync. When I complained about
this on c.p.t.ntp, someone suggested looking at hdparm, and sure enough
unmasking interrupts made the problem go away. Now, the clock loss was not
nearly as massive (say 5 minutes from burning 3-4 CDs), but I still think
it's something the OP should pursue.
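
For reference, checking and enabling the unmask flag might look roughly like the sketch below. The function names and the HDPARM override variable are invented for illustration, and the device name in the usage note is only an example. The manpage's warning quoted earlier in the thread still applies: on some drive/controller combinations this setting can cause filesystem corruption.

```shell
#!/bin/sh
# HDPARM can be pointed at another program (e.g. echo) for a dry run.
HDPARM="${HDPARM:-hdparm}"

# Print the current interrupt-unmask setting of a drive.
show_unmask() {
    "$HDPARM" -u "$1"
}

# Turn interrupt unmasking on for a drive (hdparm -u1).
enable_unmask() {
    "$HDPARM" -u1 "$1"
}
```

Run as root, `show_unmask /dev/hda` reports the current setting and `enable_unmask /dev/hda` switches it on. The setting does not survive a reboot, so it would have to be reapplied from a boot script.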
 
 
 


Post by Brian T. B » Wed, 15 Jun 2005 23:05:25

Silly rabbit: A $2 component means $40 in higher final product price.
That difference makes a BIG difference to the customer world.
Good, Cheap, and Marketable: Pick two.

Brian Brunner
XXXX@XXXXX.COM
(610)796-5838


Can someone explain why motherboard companies don't just
spend an extra $2 and install real quartz based clocks on
motherboards so all this nonsense about losing time would
be a non-issue?
ms

*******************************************************************
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please notify
the system manager.

This footnote also confirms that this email message has been swept
for the presence of computer viruses.

www.hubbell.com - Hubbell Incorporated
 
 
 


Post by Brad Knowl » Thu, 16 Jun 2005 05:24:38


Actually, it frequently is that bad. But that's only part of the problem.


There are problems with the quartz crystal oscillators. There
are BIOS problems. There are ACPI/APIC problems. Those can all be
lumped into the "hardware problems" category, although some of them
may be fixable with a BIOS update, a change to the ACPI
or APIC configuration, etc.


The OS and the application fall into the "software problems"
category. Again, some of them can be fixed, and others are not
practical to fix (i.e., it may not be possible to replace the OS with
something else and still run the necessary applications in the way
the customer wants).


The hardware problems are shared by all OSes using that same
hardware platform.

The software problems are unique to a given OS platform and the
software running on that platform. Unfortunately, as shipped by the
respective vendors, Linux frequently has more software problems in
this area than other OSes, but those should be correctable
by building a new kernel image with corrected values. For Windows,
there's little you can do to fix the underlying problems at the OS
level.

--
Brad Knowles, < XXXX@XXXXX.COM >
 
 
 


Post by Richard B. » Thu, 16 Jun 2005 20:23:41


Not bad enough to lose 1200 seconds during backups!!

Changing HZ from 1000 to 100 in the Linux kernel does not fix the problem;
it merely reduces the number of lost interrupts by ~90%. Losing only
120 seconds during backups instead of 1200 is just not that big a win!
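
The arithmetic behind those figures, as a small shell sketch. The function names are made up for illustration. The 1000-second value is ntpd's default panic threshold, which is the "sanity limit" in the subject line.

```shell
#!/bin/sh
# Seconds still lost after cutting the loss by a given percentage.
# $1 = seconds lost originally, $2 = reduction in percent.
remaining_loss() {
    echo $(( $1 * (100 - $2) / 100 ))
}

# Succeeds if an offset in whole seconds is beyond ntpd's default
# panic threshold of 1000 s, at which point ntpd gives up and exits.
exceeds_sanity_limit() {
    [ "$1" -gt 1000 ]
}
```

`remaining_loss 1200 90` prints 120. That is under the sanity limit, so ntpd would no longer exit, but the clock is still two minutes off after every backup.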
 
 
 


Post by Brian T. B » Thu, 16 Jun 2005 23:03:12

> Michael Ward < XXXX@XXXXX.COM > writes:



As has been discussed: this is only partly a hardware problem, and partly a software problem.

hwclock --hctosys

has been suggested. My opinion (get things done now, figure out who to blame and shoot later)
is that the backup script should start, at the beginning, and stop, at the end, a small script that does

while true; do hwclock --hctosys; sleep 1; done

This will keep the system time up-to-the-second, so ntp doesn't go berserk.

Also, the backup itself should be 'nice'd to permit the hwclock script to run, of course.
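
One way the pieces above might fit together is sketched below. The function names and the HWCLOCK and SYNC_INTERVAL variables are invented for illustration; only hwclock --hctosys, sleep and nice come from the suggestion itself. It needs to run as root so that hwclock can set the system time.

```shell
#!/bin/sh
HWCLOCK="${HWCLOCK:-hwclock}"          # override for a dry run
SYNC_INTERVAL="${SYNC_INTERVAL:-1}"    # seconds between clock copies

# Copy the hardware clock to the system clock over and over.
clock_sync_loop() {
    while true; do
        "$HWCLOCK" --hctosys
        sleep "$SYNC_INTERVAL"
    done
}

# Run a command at low priority while the sync loop runs in the
# background, then stop the loop and return the command's exit status.
run_with_clock_sync() {
    clock_sync_loop &
    sync_pid=$!
    status=0
    nice -n 19 "$@" || status=$?
    kill "$sync_pid" 2>/dev/null || true
    wait "$sync_pid" 2>/dev/null || true
    return "$status"
}
```

A backup would then be started as `run_with_clock_sync /path/to/backup-script`, where the path is a placeholder. A larger SYNC_INTERVAL trades accuracy during the backup for fewer hwclock calls.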


Brian Brunner
XXXX@XXXXX.COM
(610)796-5838

 
 
 


Post by Heiko Gers » Thu, 16 Jun 2005 23:59:55

My 2c :

Brian T. Brunner wrote:
[...]
[...]

I'm not sure if ntpd likes to find out another process is messing around
with the clock. My workaround would be:

- stop ntpd
- run backup
- get time back with ntpd -q -g or ntpdate
- start ntpd
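
The four steps could be wrapped up roughly as follows. This is a sketch under assumptions: the init script path /etc/init.d/ntpd varies between distributions, and the function name and the NTP_INIT and NTPD variables are invented for illustration. Only ntpd -q -g comes from the list above.

```shell
#!/bin/sh
NTP_INIT="${NTP_INIT:-/etc/init.d/ntpd}"   # assumed init script path
NTPD="${NTPD:-ntpd}"

# Stop ntpd, run the given command, set the clock once, restart ntpd.
# Returns the exit status of the command that was run.
backup_with_ntp_paused() {
    "$NTP_INIT" stop
    status=0
    "$@" || status=$?
    "$NTPD" -q -g          # set the time once, however far off, then exit
    "$NTP_INIT" start
    return "$status"
}
```

Usage would be `backup_with_ntp_paused /path/to/backup-script`, with the path as a placeholder. The clock is corrected and ntpd restarted even if the backup itself fails.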

However, if you need the correct time during your backup sessions (if
you need logfiles to be correct or whatever), I'd use Brian's small
script as an additional step. Fire it up after you stopped NTP and stop
it after the update has been completed.


Maybe this could slow down the whole show. I'd reduce the calls to
hwclock to once a minute (sleep 60), but it all depends on how accurate
your time needs to be during backup.

If you really depend on this, you should try different hardware.

Kind regards,
Heiko

[...]


Cool, now I know a secret ;-)


--
Meinberg radio clocks: 25 years of accurate time worldwide

MEINBERG Radio Clocks
www.meinberg.de

Stand alone ntp time servers and radio clocks based on GPS, DCF77 and
IRIG. Rackmount and desktop versions and PCI slot cards.