RHCE - Troubleshoot
The following demonstrates how collectl identifies the process reading/writing the most data to disk.
#Hammer the disk by writing 50 MB of data with dd
$dd if=/dev/urandom of=test bs=1k count=50000
#collectl identifies the dd process
#in top mode, sorted by iokb (total I/O KB); see collectl --showtopopts
$collectl -i2 --top iokb
TOP PROCESSES sorted by iokb (counters are /sec) 12:50:31
# PID  User PR PPID THRD S VSZ  RSS CP SysT UsrT Pct AccuTime RKB  WKB MajF MinF Command
 6861  root 18 6784    0 R  3M 572K  0 0.91 0.00  45  0:00.91   0 3680    0   97 dd
    1  root 15    0    0 S  2M 632K  0 0.00 0.00   0  0:28.21   0    0    0    0 init
    2  root RT    1    0 S   0    0  0 0.00 0.00   0  0:00.00   0    0    0    0 migration/0
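collectl's per-process RKB/WKB columns come from the kernel's task I/O accounting, so the raw counters can also be read straight from /proc/<PID>/io (a sketch; requires a kernel with CONFIG_TASK_IO_ACCOUNTING, which RHEL kernels enable):

```shell
# Read the cumulative I/O counters of a process. Substitute the PID
# under investigation (e.g. the dd process above) for "self", which
# here refers to the grep process doing the reading.
grep -E 'read_bytes|write_bytes' /proc/self/io
```

write_bytes is the counter collectl's WKB column is derived from; dividing the delta between two samples by the interval gives the per-second rate.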
If a Linux system can boot but hangs while starting a service, booting into a recovery runlevel can skip the service and give you a shell to troubleshoot.
If the system can't boot at all, boot from the rescue CD (the first installation media) and type linux rescue to get a shell to troubleshoot.
Red Hat Linux boot order
The BIOS ->MBR->Boot Loader->Kernel->/sbin/init->
/etc/inittab->
/etc/rc.d/rc.sysinit->
/etc/rc.d/rcX.d/ #where X is run level in /etc/inittab
runs the scripts prefixed with K (kill), then those prefixed with S (start)
Recovery runlevels
- runlevel 1
Executes up to /etc/rc.d/rc.sysinit and /etc/rc.d/rc1.d/.
Runlevel 1 is effectively identical to single-user mode: the last step switches to single-user mode, with only a few trivial scripts executed before that.
$ls /etc/rc.d/rc1.d/S*
/etc/rc.d/rc1.d/S02lvm2-monitor /etc/rc.d/rc1.d/S13cpuspeed
/etc/rc.d/rc1.d/S99single
- single
Executes up to /etc/rc.d/rc.sysinit.
- emergency
Does not execute /etc/rc.d/rc.sysinit.
Because rc.sysinit is not executed, the file system is mounted read-only. You need to run mount -o rw,remount / to remount it read-write.
The emergency runlevel is a Red Hat term; it is identical to init=/bin/sh on any Linux distribution.
How to go to a runlevel
In the grub menu, type a to append one of the following options to the kernel line:
1
single
emergency
init=/bin/sh
When CentOS hung while starting boot services, how to get to a shell without a rescue CD
RHCE Notes - Troubleshooting booting issue
Posted by honglus at 5:15 PM
Labels: Linux, Troubleshooting
So how can you gain shell access without a rescue CD? The answer is to append init=/bin/sh to the kernel line in the grub boot loader.
Let's review the Linux boot order:
The BIOS ->MBR->Boot Loader->Kernel->/sbin/init->
/etc/inittab->
/etc/rc.d/rc.sysinit->
/etc/rc.d/rcX.d/ #where X is run level in /etc/inittab
runs the scripts prefixed with K (kill), then those prefixed with S (start)
In Linux, the sysstat package installs tools such as sar and iostat, and also sets up a cron job to run sar periodically. The sar binary output files are in /var/log/sa or /var/log/sysstat.
These files are very useful for troubleshooting performance issues if you don't have a monitoring solution in place.
To visualize the data as graphs, you can use a generic plotting tool such as gnuplot, or a tool designed specifically for sar: kSar.
Visualize sar output with gnuplot
gnuplot can be installed directly online in most Linux distributions.
The file saved by the sar cron job is binary; convert it to ASCII format. The following example outputs CPU usage:
$LC_ALL=C sar -u -f /var/log/sa/sa27 | egrep '[0-9][0-9]:[0-9][0-9]:[0-9][0-9]' | sed '1s/^/#/' > sar-cpu.log
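Once converted to text, the log can also be summarized without any plotting tool. A minimal awk sketch, averaging the %user column over invented sample data (field positions assume the HH:MM:SS AM/PM layout above):

```shell
# The sample below mimics the layout of a converted sar -u log;
# the numbers are made up for illustration.
cat > /tmp/sar-cpu-sample.log <<'EOF'
#12:00:01 AM  CPU  %user  %nice  %system  %iowait  %steal  %idle
12:10:01 AM   all  10.00   0.00     5.00     1.00    0.00  84.00
12:20:01 AM   all  20.00   0.00     5.00     1.00    0.00  74.00
EOF
# Average the %user column (field 4: time, AM/PM, CPU, %user)
awk '!/^#/ { sum += $4; n++ } END { printf "avg %%user = %.1f\n", sum/n }' /tmp/sar-cpu-sample.log
# prints: avg %user = 15.0
```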
#Convert sar binary output to ascii for ksar; -A means include all counters
$LC_ALL=C sar -A -f /var/log/sa/sa27 >sar-all.log
It is very easy to view any counter once the sar output files are imported into kSar.
[root@ kSar-5.0.6]$./run.sh -help
[root@ kSar-5.0.6]$./run.sh -input /tmp/sar-all.log
The Theory:
A hugemem/PAE-enabled kernel is ONLY needed for RAM from 4 GB up to 64 GB; the generic 32-bit Linux kernel can see 4 GB without hugemem/PAE.
The Cause:
Why, then, can a 32-bit Linux sometimes not see all of its 4 GB of RAM? The answer lies in the physical RAM map provided by the BIOS.
The Analysis:
[32bit Linux]$dmesg | less
Linux version 2.6.16.60-0.21-default (geeko@buildhost) (gcc version 4.1.2 20070115 (SUSE Linux)) #1
Tue May 6 12:41:02 UTC 2008
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000ca000 - 00000000000cc000 (reserved)
BIOS-e820: 00000000000dc000 - 00000000000e4000 (reserved)
BIOS-e820: 00000000000e8000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000bfee0000 (usable)
BIOS-e820: 00000000bfee0000 - 00000000bfeff000 (ACPI data)
BIOS-e820: 00000000bfeff000 - 00000000bff00000 (ACPI NVS)
BIOS-e820: 00000000bff00000 - 00000000c0000000 (usable)
BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000fffe0000 - 0000000100000000 (reserved)
The interesting part of the physical RAM map is the last line, and the interesting number is 0000000100000000, which is the hex value of 4 GB.
Convert the values to decimal megabytes to make them more readable:
[32bit Linux]$ dmesg | awk --re-interval --non-decimal-data '/[0-9a-z]{16}/ { x1=sprintf ("%d",
"0x"$2) ; x2=sprintf ("%d", "0x"$4); printf "%d %s %d %s %d %s\n", x1/1024/1024, " - ",
x2/1024/1024, "=", (x2-x1)/1024/1024,$NF }'
0 - 0 = 0 (usable)
0 - 0 = 0 (reserved)
0 - 0 = 0 (reserved)
0 - 0 = 0 (reserved)
0 - 1 = 0 (reserved)
1 - 3070 = 3069 (usable)
3070 - 3070 = 0 data)
3070 - 3071 = 0 NVS)
3071 - 3072 = 1 (usable)
3584 - 3840 = 256 (reserved)
4076 - 4076 = 0 (reserved)
4078 - 4078 = 0 (reserved)
4095 - 4096 = 0 (reserved)
4096 - 5120 = 1024 (usable)
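As a quick sanity check of that boundary value, plain shell arithmetic (assuming a 64-bit shell) converts it to gigabytes:

```shell
# 0x100000000 is the e820 boundary from the dmesg output above
bytes=$((0x100000000))
echo "$bytes bytes = $((bytes / 1024 / 1024 / 1024)) GiB"
# prints: 4294967296 bytes = 4 GiB
```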
It is clear that the BIOS has reserved too many regions in the lower address space, pushing one 1 GB chunk of usable RAM above the 4 GB address boundary.
The Solution:
1) To release some reserved space and bring all usable regions below 4 GB, you might try disabling some devices in the BIOS. It is the best option, but might not be achievable; consult your hardware vendor.
2) Reinstall the system with a 64-bit kernel.
3) Install a hugemem/PAE kernel on the current 32-bit system. It is the last option because the hugemem/PAE kernel hurts performance due to the dynamic remapping of the three-level paging model.
I removed two old NICs and assigned two new NICs to a VMware VM (SLES 10), expecting the interfaces to be named eth0 and eth1, but they appeared as eth2 and eth3.
dmesg output revealed that eth0 was renamed to eth2 and eth1 was renamed to eth3 at some stage; it turned out udev rules renamed them.
Why?
30-net_persistent_names.rules had four entries: the first two recorded the MAC addresses of the two old NICs; the last two recorded the MAC addresses of the two current NICs. Upon matching the current MAC addresses, the udev rules renamed the interfaces to eth2 and eth3.
$cat /etc/udev/rules.d/30-net_persistent_names.rules
# This rules are autogenerated from /lib/udev/rename_netiface.
# But you can modify them, but make sure that you don't use an interface name
# twice. Also add such interface name rules only in this rules file. Otherwise
# rename_netiface will create wrong rules for new interfaces.
# It is safe to delete a rule, as long as you did not disable automatic rule
# generation. Only if all interfaces get a rule the renaming will work
# flawlessly. See also /etc/udev/rules.d/31-net_create_names.rules.
#
# Read /usr/share/doc/packages/sysconfig/README.Persistent_Interface_Names for
# further information.
#
# Use only a-z, A-Z and 0-9 for interface names!
SUBSYSTEM=="net", ACTION=="add", SYSFS{address}=="00:50:56:b7:6d:df",
IMPORT="/lib/udev/rename_netiface %k eth0"
SUBSYSTEM=="net", ACTION=="add", SYSFS{address}=="00:50:56:b7:0b:2c",
IMPORT="/lib/udev/rename_netiface %k eth1"
SUBSYSTEM=="net", ACTION=="add", SYSFS{address}=="00:50:56:b7:1a:26",
IMPORT="/lib/udev/rename_netiface %k eth2"
SUBSYSTEM=="net", ACTION=="add", SYSFS{address}=="00:50:56:b7:14:a6",
IMPORT="/lib/udev/rename_netiface %k eth3"
$ ls -l /sys/class/net/eth*/device
lrwxrwxrwx 1 root root 0 2010-08-26 09:46 /sys/class/net/eth0/device ->
../../../devices/pci0000:00/0000:00:11.0/0000:02:00.0
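The fix, then, is to clean up the rules file: delete the two stale entries and point the current NICs' MAC addresses (taken from the rules above) back at eth0/eth1, then reboot so the interfaces come up with the expected names. A sketch of the cleaned-up file:

```
# /etc/udev/rules.d/30-net_persistent_names.rules after cleanup:
# stale eth0/eth1 entries removed, current MACs mapped to eth0/eth1
SUBSYSTEM=="net", ACTION=="add", SYSFS{address}=="00:50:56:b7:1a:26", IMPORT="/lib/udev/rename_netiface %k eth0"
SUBSYSTEM=="net", ACTION=="add", SYSFS{address}=="00:50:56:b7:14:a6", IMPORT="/lib/udev/rename_netiface %k eth1"
```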
The open-files limit can be changed with the ulimit -n command, but that is only effective in the current shell session. To impose the limit on every new shell session, enable the PAM module pam_limits.so.
There are many ways to start a new shell session: login, sudo, and su; each needs pam_limits enabled in its PAM config file: /etc/pam.d/login, /etc/pam.d/sudo, or /etc/pam.d/su.
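For example, enabling pam_limits for login sessions means adding a session line like the following (a sketch; module paths and surrounding lines vary by distribution):

```
# /etc/pam.d/login (add alongside the existing session lines)
session    required     pam_limits.so
```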
/etc/security/limits.conf is the configuration file from which pam_limits.so reads its values.
e.g. Increase the max number of open files from 1024 to 4096 for the Apache web server, which is started as user apache (the "-" type applies the value as both the soft and hard limit):
apache  -  nofile  4096
pam_limits.so is a session PAM module, so the change becomes effective for each new session; no reboot is required.
Test ulimit.
You will be disappointed if you test open files directly in the shell with commands like tail -f, because the limit is imposed per process, and each tail -f starts a new process.
The following Perl script can open 10 files in a single process.
#!/usr/bin/perl -w
foreach $i (1..10) {
$FH="FH${i}";
open ($FH,'>',"/tmp/Test${i}.log") || die "$!";
print $FH "$i\n";
}
#Lower the limit to 8 to make the test easy to trigger
$ ulimit -n 8
$ ulimit -a | grep 'open files'
open files                      (-n) 8
The "Too many open files" error appeared halfway through creating the files:
$ ./testnfiles.pl
Too many open files at ./testnfiles.pl line 4
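The same behaviour can be shown in pure shell by lowering ulimit -n in a subshell and opening descriptors one by one inside that single process (a sketch; assumes bash, which continues after a failed exec redirection):

```shell
(
  ulimit -n 8
  count=0
  for fd in 3 4 5 6 7 8 9; do
    # each exec opens one more permanent descriptor in this shell process
    if eval "exec $fd>/tmp/ulimit_test_$fd"; then
      count=$((count+1))
    else
      echo "failed after opening $count files"
      break
    fi
  done
) 2>/dev/null
```

With the limit at 8 and stdin/stdout/stderr already occupying fds 0-2, the sixth open fails, matching the Perl test above.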
web server? (It is an obvious DNS issue, but I just want to demonstrate how strace can pinpoint the issue.)
#Print out the lines in question. It is clear that DNS timed out waiting for a response from DNS server 100.0.2.3; it tried four times (the remaining three timeouts are not included here), each taking 5 secs.
$awk '{ if ( NR > 125 && NR <= 136 ) {print "LINE#"NR, $0 } }' /tmp/trace.log
LINE#126
0.000000 [b7e601d1] stat64("/etc/resolv.conf", {st_dev=makedev(117, 0),
st_ino=50235, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8,
st_size=83, st_atime=2010/07/16-09:47:25, st_mtime=2010/07/16-09:45:02, st_ctime=2010/07/16-09:45:02}) = 0 <0.000000>
LINE#127
0.000000 [b7e2a0f1] gettimeofday({1279237645, 625155}, NULL) = 0 <0.000000>
LINE#128
0.000000 [b7e72402] socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 4 <0.000000>
LINE#129
0.000000 [b7e71f0c] connect(4, {sa_family=AF_INET, sin_port=htons(53),
sin_addr=inet_addr("100.0.2.3")}, 28) = 0 <0.000000>
LINE#130
0.000000 [b7e61e88] fcntl64(4, F_GETFL) = 0x2 (flags O_RDWR) <0.000000>
LINE#131
0.000000 [b7e61e88] fcntl64(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000000>
LINE#132
0.000000 [b7e2a0f1] gettimeofday({1279237645, 625155}, NULL) = 0 <0.000000>
LINE#133
0.000000 [b7e67296] poll([{fd=4, events=POLLOUT}], 1, 0) = 1 ([{fd=4,
revents=POLLOUT}]) <0.000000>
LINE#134
0.000000 [b7e7220c] send(4,
"B\262\1\0\0\1\0\0\0\0\0\0\3www\6google\3com\0\0\1\0\1"..., 32, MSG_NOSIGNAL) = 32 <0.000000>
LINE#135
0.000000 [b7e67296] poll([{fd=4, events=POLLIN}], 1, 5000) = 0 (Timeout)
<5.000076>
LINE#136
5.000076 [b7e2a0f1] gettimeofday({1279237650, 625231}, NULL) = 0 <0.000000>
Another way of doing this is to send a QUIT signal to the PID (kill -3 PID), but the thread dump will be directed to stdout, which can be viewed with cat /proc/PID/fd/1 | tee /tmp/dump.log; messages will keep being written to /proc/PID/fd/1 until the process is stopped, so it is useful for real-time debugging.
The following Java application example uses the gcore command in gdb. (Neither gcore nor kill -3 stops the process.)
Linux's default core file size is 0, which means core dumps are disabled. It needs to be changed to unlimited:
#ulimit -a | grep core
core file size          (blocks, -c) 0
#ulimit -c unlimited
#ulimit -a | grep core
core file size          (blocks, -c) unlimited
Firstly, find the virtual memory size of the process; the PID is 10008 in the following example.
# ps aux | egrep 'VSZ| 10008'
USER      PID %CPU %MEM     VSZ    RSS TTY  STAT START  TIME COMMAND
root    10008  0.4  8.3 2231312 660100 ?    Sl   Jun03 43:57 /opt/sunjdk6/bin/java
The process VSZ is 2.2 GB, which will be the size of the core file. Go to a dump directory that has more than 2.2 GB of free space.
$cd /var/tmp
Attach to the running process by PID.
$gdb --pid=10008
At the gdb prompt, enter the gcore command.
gdb>gcore
Wait a few minutes for the core file to be generated, then type quit to exit gdb and answer yes to detach from the process.
gdb>quit
The core file is generated:
$ ls -lh /var/tmp/core.10008
-rw-r--r-- 1 root root 2.2G Jun 10 11:59 /var/tmp/core.10008
$ file /var/tmp/core.10008
/var/tmp/core.10008: ELF 64-bit LSB core file AMD x86-64, version 1 (SYSV),
SVR4-style, from 'java'
[vmstat-style output, garbled beyond reconstruction in the original: free memory falls while buff/cache grow, and the in (interrupts/sec) column stays around 1,200 with high sy and wa percentages]
Let's run mpstat to show more detailed CPU usage; it showed the CPU was busy with interrupts.
# mpstat 2
Linux 2.6.18-92.el5 (centos-ks)         01/14/2010

02:03:50 AM  CPU  %user  %nice   %sys %iowait   %irq  %soft %steal  %idle   intr/s
02:04:04 AM  all   1.33   0.00  41.78    0.00   0.44   3.56   0.00  52.89  1015.56
02:04:06 AM  all   0.00   0.00   8.04   38.69  29.65  23.62   0.00   0.00  1326.63
02:04:08 AM  all   0.00   0.00   8.70   30.43  27.54  28.50   0.00   4.83  1327.54
02:04:10 AM  all   0.00   0.00   5.47   46.77  27.36  20.40   0.00   0.00  1280.10
02:04:12 AM  all   0.50   0.00   6.47   63.18  19.40  10.45   0.00   0.00  1183.08
02:04:14 AM  all   1.01   0.00   6.53   62.31  21.11   9.05   0.00   0.00  1190.95
02:04:16 AM  all   0.00   0.00   8.04   26.63  43.72  21.61   0.00   0.00  1365.83
02:04:18 AM  all   0.00   0.00   1.50    0.00   0.00   0.50   0.00  98.00  1006.50
Use sar to find out which interrupt number was the culprit. #9 was the highest, excluding the system timer interrupt #0.
# sar -I XALL 2 10
02:07:10 AM      INTR    intr/s
02:07:12 AM         0    992.57
02:07:12 AM         1      0.00
02:07:12 AM         2      0.00
02:07:12 AM         3      0.00
02:07:12 AM         4      0.00
02:07:12 AM         5      0.00
02:07:12 AM         6      0.00
02:07:12 AM         7      0.00
02:07:12 AM         8      0.00
02:07:12 AM         9    350.50
# cat /proc/interrupts
           CPU0
  0:  702980    XT-PIC  timer
  1:     439    XT-PIC  i8042
  2:       0    XT-PIC  cascade
  6:       2    XT-PIC  floppy
  8:       1    XT-PIC  rtc
  9:   14464    XT-PIC  acpi, eth2
 11:      12    XT-PIC  eth0
 12:     400    XT-PIC  i8042
 14:    6091    XT-PIC  ide0
 15:      22    XT-PIC  ide1
NMI:       0
LOC:  700623
ERR:       0
MIS:       0
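Interrupt 9 is shared by acpi and eth2, which points at the NIC. Ranking interrupts by count is just a sort over the counters file; the sketch below runs on an invented, abbreviated sample so it is self-contained (on a real box, read /proc/interrupts itself):

```shell
# Abbreviated /proc/interrupts-style sample (counts invented)
cat > /tmp/interrupts.sample <<'EOF'
           CPU0
  0:  702980   XT-PIC  timer
  9:   14464   XT-PIC  acpi, eth2
 14:    6091   XT-PIC  ide0
EOF
# Sort interrupt lines by count, busiest first
grep ':' /tmp/interrupts.sample | sort -k2 -rn
```

The timer (#0) always tops the list; the next entry is the interrupt worth investigating.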
Solution:
When the card transmits or receives a frame, the system must be notified of the event. If the card interrupts the system for each transmitted and received frame, the result is a high degree of processor overhead. To prevent that, Gigabit Ethernet provides a feature called Interrupt Coalescence: the card interrupts the system only after sending or receiving a batch of frames. Effective use of this feature can reduce system overhead and improve performance.
You can enable adaptive moderation (Adaptive RX / TX) to let the system choose values automatically, or set individual values manually.
An interrupt is generated by the card when either the frame counter or the timer counter is reached; a value of 0 means disabled.
RX, for example:
Timer counter in microseconds: rx-usecs / rx-usecs-irq
Frame counter: rx-frames / rx-frames-irq
# A sample output with default values.
# ethtool -c eth1
Coalesce parameters for eth1:
Adaptive RX: off TX: off
stats-block-usecs: 999936
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0
rx-usecs: 18
rx-frames: 6
rx-usecs-irq: 18
rx-frames-irq: 6
tx-usecs: 80
tx-frames: 20
tx-usecs-irq: 80
tx-frames-irq: 20
rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0
rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0
Alternative Workaround:
I couldn't configure Interrupt Coalescence because the virtual machine's NIC didn't support it, but as a workaround, increasing the MTU also decreases interrupts; ifconfig eth2 mtu 9000 resolved the issue. The MTU needs to be set on both peer hosts, and if they are not directly connected, make sure the switch supports jumbo frames.
You don't need to care about Interrupt Coalescence if CPU resources are abundant, but for high-load NFS/CIFS/iSCSI/NAS servers it is very useful.
- Solaris
List open files for all processes, then search the file for "port: 22".
$ ps -e -o pid | xargs pfiles > /tmp/pfiles.log
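To go from the matching "port: 22" line back to its owning process, an awk one-liner can carry the last seen PID header forward. A sketch over an invented, abbreviated pfiles-style sample (real pfiles output has more fields, but the PID-header-then-fd-details shape is the same):

```shell
# Invented, abbreviated pfiles-style output: a PID header line for each
# process, followed by indented per-fd socket details.
cat > /tmp/pfiles.sample <<'EOF'
1234:   /usr/lib/ssh/sshd
   3: S_IFSOCK mode:0666
        sockname: AF_INET 0.0.0.0  port: 22
5678:   /usr/sbin/nscd
   4: S_IFSOCK mode:0666
        sockname: AF_INET 0.0.0.0  port: 53
EOF
# Remember the last PID header; print it when the port line matches
awk '/^[0-9]+:/ { pid=$0 } /port: 22/ { print pid }' /tmp/pfiles.sample
# prints: 1234:   /usr/lib/ssh/sshd
```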