Solaris x86 / SPARC Boot Troubleshooting
Each SPARC based system has a PROM (programmable read-only memory) chip with a program
called the monitor. The monitor controls the operation of the system before the Solaris kernel is
available. When a system is turned on, the monitor runs a quick self-test procedure to check the
hardware and memory on the system. If no errors are found, the system begins the automatic
boot process.
Note
Some older systems might require PROM upgrades before they will work with the Solaris
system software. Contact your local service provider for more information.
SPARC: Boot Process
The following table describes the boot process on SPARC based systems.
Table 16-1 SPARC: Description of the Boot Process

Boot PROM
1. The PROM runs self-test diagnostics to verify the system's hardware and memory, then
begins the automatic boot process if no errors are found.
2. Then, the PROM loads the primary boot program, bootblk, whose purpose is to load the
secondary boot program (that is located in the ufs file system) from the default boot device.

Boot Programs
3. The bootblk program finds and executes the secondary boot program, ufsboot, and loads
it into memory.
4. After the ufsboot program is loaded, the ufsboot program loads the kernel.

Kernel Initialization
5. The kernel initializes itself and begins loading modules by using ufsboot to read the files.
When the kernel has loaded enough modules to mount the root (/) file system, the kernel
unmaps the secondary boot program and continues, using its own resources.
6. The kernel creates a user process and starts the /sbin/init process, which starts other
processes by reading the /etc/inittab file.

init
7. The /sbin/init process starts the run control (rc) scripts, which execute a series of other
scripts. These scripts (/sbin/rc*) check and mount file systems, start various processes, and
perform system maintenance tasks.
x86: Boot Process
The following table describes the boot process on x86 based systems.
Table 16-2 x86: Description of the Boot Process

BIOS
1. When the system is turned on, the BIOS runs self-test diagnostics to verify the system's
hardware and memory. The system begins to boot automatically if no errors are found. If
errors are found, error messages are displayed that describe recovery options. The BIOSes
of additional hardware devices are run at this time.
2. The BIOS boot program tries to read the first physical sector from the boot device. This
first disk sector on the boot device contains the master boot record, mboot, which is loaded
and executed. If no mboot file is found, an error message is displayed.

Boot Programs
3. The master boot record, mboot, which contains disk information needed to find the active
partition and the location of the Solaris boot program, pboot, loads and executes pboot.
4. The Solaris boot program, pboot, loads bootblk, the primary boot program, whose purpose
is to load the secondary boot program that is located in the ufs file system.
5. If there is more than one bootable partition, bootblk reads the fdisk table to locate the
default boot partition, and builds and displays a menu of available partitions. You have a
30-second interval to select an alternate partition from which to boot. This step occurs only
if there is more than one bootable partition present on the system.

Kernel Initialization
8. The kernel initializes itself and begins loading modules by using the secondary boot
program (boot.bin or ufsboot) to read the files. When the kernel has loaded enough modules
to mount the root (/) file system, the kernel unmaps the secondary boot program and
continues, using its own resources.
9. The kernel creates a user process and starts the /sbin/init process, which starts other
processes by reading the /etc/inittab file.

init
10. The /sbin/init process starts the run control (rc) scripts, which execute a series of other
scripts. These scripts (/sbin/rc*) check and mount file systems, start various processes, and
perform system maintenance tasks.
Extended Diagnostics: If diag-switch? and diag-level are set, additional diagnostics will
appear on the system console.
auto-boot?: If the auto-boot? PROM parameter is set, the boot process will begin. Otherwise,
the system will drop to the ok> PROM monitor prompt or (if sunmon-compat? and
security-mode are set) to the > security prompt.
The boot process will use the boot-device and boot-file PROM parameters unless diag-switch?
is set, in which case the boot process will use diag-device and diag-file.
bootblk: The OBP (Open Boot PROM) program loads the bootblk primary boot program from
the boot-device (or diag-device, if diag-switch? is set). If the bootblk is not present or
needs to be regenerated, it can be installed by running the installboot command after booting
from a CDROM or the network. A copy of the bootblk is available at
/usr/platform/`arch -k`/lib/fs/ufs/bootblk.
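For example, assuming the root slice is c0t0d0s0 (a placeholder; substitute your own device),
that recovery looks roughly like this:

ok boot cdrom -s
# installboot /usr/platform/`arch -k`/lib/fs/ufs/bootblk /dev/rdsk/c0t0d0s0
# reboot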
ufsboot: The secondary boot program, /platform/`arch -k`/ufsboot, is run. This program
loads the kernel core image files. If this file is corrupted or missing, a "bootblk: can't find
the boot program" or similar error message will be returned.
kernel: The kernel is loaded and run. For 32-bit Solaris systems, the relevant files are:
/platform/`arch -k`/kernel/unix
/kernel/genunix
For 64-bit Solaris systems, the relevant files are:
/platform/`arch -k`/kernel/sparcv9/unix
/kernel/genunix
As part of the kernel loading process, the kernel banner is displayed to the screen. This includes
the kernel version number (including patch level, if appropriate) and the copyright notice.
The kernel initializes itself and begins loading modules, reading the files with the ufsboot
program until it has loaded enough modules to mount the root filesystem itself. At that point,
ufsboot is unmapped and the kernel uses its own drivers. If the system complains about not
being able to write to the root filesystem, it is stuck in this part of the boot process.
The boot -a command single-steps through this portion of the boot process. This can be a useful
diagnostic procedure if the kernel is not loading properly.
/etc/system: The /etc/system file is read by the kernel, and the system parameters are set.
One of the parameters that can be set there is rootfs:, which specifies the file system type
for the root file system (ufs is the default).
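For illustration (the values are examples only, not recommendations; lines starting with an
asterisk are comments in /etc/system):

* sample /etc/system entries
rootfs:ufs
set maxusers=1024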
If the /etc/system file is edited, it is strongly recommended that a copy of the working file be
made to a well-known location. In the event that the new /etc/system file renders the system
unbootable, it might be possible to bring the system up with a boot -a command that specifies
the old file. If this has not been done, the system may need to be booted from CD or the network
so that the root file system can be mounted and the file edited.
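For example, a boot -a session prompts for each item in turn; the prompts and defaults below
are only illustrative and vary by release and platform, and etc/system.orig stands for whatever
saved copy was made. Pressing Enter at a prompt accepts the default shown in brackets.

ok boot -a
Enter filename [kernel/sparcv9/unix]:
Enter default directory for modules [/platform/sun4u/kernel /kernel /usr/kernel]:
Name of system file [etc/system]: etc/system.orig
root filesystem type [ufs]:
Enter physical name of root device [...]: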
kernel initialized: The kernel creates PID 0 ( sched). The sched process is sometimes called the
"swapper."
init: The kernel starts PID 1 (init). The init process reads the /etc/inittab file and starts
the processes defined there, including the run control (rc) scripts.
rc scripts: The rc scripts execute the files in the /etc/rc#.d directories. They are run by the
/sbin/rc# scripts, each of which corresponds to a run level.
Debugging can often be done on these scripts by adding echo lines to a script, either to print
an "I got this far" message or to print out the value of a problematic variable.
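For example, a temporary line like the following (the script name and variable are made up) can
be dropped into a suspect rc script and removed once the problem is found:

echo "S99myapp: got this far, CONFDIR=$CONFDIR" > /dev/console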
On Solaris 10, the svc.startd daemon (SMF) starts system services; in addition, svc.startd
executes the run control (rc) scripts for compatibility.
x86: Boot Files
In addition to the run control scripts and boot files, there are additional boot files that are
associated with booting x86 based systems.
Table 16-3 x86: Boot Files

File                          Description
/etc/bootrc
/boot
/boot/mdboot
/boot/mdbootbp
/boot/rc.d
/boot/solaris
/boot/solaris/boot.bin
/boot/solaris/boot.rc
/boot/solaris/devicedb
/boot/solaris/drivers
/boot/solaris/itup2.exe
/boot/solaris/machines        Obsolete directory.
/boot/solaris/nbp
/boot/solaris/strap.rc
/boot/strap.com
Solaris 10 boot process : SPARC
The boot process on the SPARC platform involves five phases. There is a slight difference
between the boot process of a SPARC based and an x86/x64 based Solaris operating system.
Boot PROM phase
1. The boot PROM runs the power-on self test (POST) to test the hardware.
2. The boot PROM displays the banner with the following information:
   Model type
   Processor type
   Memory
   Ethernet address and host ID
3. The boot PROM reads the PROM variable boot-device to determine the boot device.
4. The boot PROM reads the primary boot program (bootblk, stored in sectors 1 to 15 of the
boot disk) and executes it.
Boot program phase
1. bootblk loads the secondary boot program, ufsboot, from the boot device and executes it.
2. ufsboot loads the two kernel files, the platform-specific unix and the generic genunix.
3. ufsboot combines these two kernel files into one complete kernel and loads it into memory.
Kernel initialization phase
1. The kernel reads the /etc/system file and initializes itself.
2. The kernel loads modules, using ufsboot to read the files, until it has loaded enough
modules to mount the root file system; it then unmaps ufsboot and uses its own drivers.
svc.startd phase
1. After the kernel initializes, it starts the svc.startd daemon.
2. The svc.startd daemon executes the rc scripts in the /sbin directory based upon the run level.
rc scripts
Each run level has an associated script in the /sbin directory.
# ls -l /sbin/rc?
-rwxr--r--   3 root     sys         1678 Sep 20  2012 /sbin/rc0
-rwxr--r--   1 root     sys         2031 Sep 20  2012 /sbin/rc1
-rwxr--r--   1 root     sys         2046 Sep 20  2012 /sbin/rc2
-rwxr--r--   1 root     sys         1969 Sep 20  2012 /sbin/rc3
-rwxr--r--   3 root     sys         1678 Sep 20  2012 /sbin/rc5
-rwxr--r--   3 root     sys         1678 Sep 20  2012 /sbin/rc6
-rwxr--r--   1 root     sys         4069 Sep 20  2012 /sbin/rcS
Each rc script runs the corresponding /etc/rc?.d/K* and /etc/rc?.d/S* scripts. For example,
for run level 3, the following scripts will be executed by /sbin/rc3:
/etc/rc3.d/K*
/etc/rc3.d/S*
Note that the S and K are in capitals. Scripts starting with a lowercase s or k will be ignored,
which can be used to disable a script for that particular run level.
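For example (S99myapp is a hypothetical script), renaming a start script with a lowercase s
disables it for run level 3 without deleting it:

# mv /etc/rc3.d/S99myapp /etc/rc3.d/s99myapp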
Solaris 10 boot process : x86/x64
In the last post we saw the boot process of Solaris 10 on the SPARC platform. The boot process
on x86/x64 hardware is a bit different from the SPARC hardware, but it involves the same
five-phase boot process.
BIOS phase
1. The BIOS (Basic Input Output System) ROM runs the power-on self test (POST) to test the
hardware.
2. The BIOS tries to boot from the device mentioned in the boot sequence. (We can change this
by pressing F12 or F2.)
3. When booting from the boot disk, the BIOS reads the master boot program (mboot) in the
first sector and the FDISK table.
Boot program phase
1. mboot finds the active partition in FDISK table and loads the first sector containing GRUB
stage1.
2. GRUB stage1 in-turn loads GRUB stage2.
3. GRUB stage2 locates the GRUB menu file /boot/grub/menu.lst and displays GRUB main
menu.
4. Here user can select to boot the OS from partition or disk or network.
5. The GRUB commands in /boot/grub/menu.lst are executed to load a pre-constructed primary
boot archive into memory (see the sample menu.lst entry below).
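A typical menu.lst entry looks roughly like the following; the title, disk tuple, and paths are
illustrative and vary between installations:

title Solaris 10
root (hd0,0,a)
kernel /platform/i86pc/multiboot
module /platform/i86pc/boot_archive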
In /etc/inittab, the process keyword specifies the process to execute for the action keyword.
For example, /usr/sbin/shutdown -y -i5 -g0 is the process to execute for the action powerfail.
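A sketch of such an /etc/inittab entry (the id field p3 and the redirections are illustrative):

p3:s1234:powerfail:/usr/sbin/shutdown -y -i5 -g0 >/dev/msglog 2<>/dev/msglog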
Legacy Run Levels
A run level specifies the state in which specific services and resources are available to users.
Below are the run levels available in Solaris:

0      - System running the PROM monitor (ok> prompt).
s or S - Single-user mode with critical file systems mounted (a single user can access the OS).
1      - Single-user administrative mode with access to all file systems (a single user can
         access the OS).
2      - Multi-user mode. Multiple users can access the system; NFS and some other network
         related daemons do not run.
3      - Multi-user server mode. Multi-user mode with NFS and all other network resources
         available.
4      - Not implemented.
5      - Transitional run level. The OS is shut down and the system is powered off.
6      - Transitional run level. The OS is shut down and the system is rebooted to the default
         run level.
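The current run level can be checked with who -r (the output below is illustrative):

# who -r
   .       run-level 3  Mar 10 09:15     3      0  S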
svc.startd phase and rc scripts
These phases are identical to the SPARC boot process described above: the kernel starts the
svc.startd daemon, which executes the /sbin/rc? script for the current run level, and each
/sbin/rc? script in turn runs the corresponding /etc/rc?.d/K* and /etc/rc?.d/S* scripts.
The BIOS (yes, this is x86, not SPARC) initializes the CPU, memory, and platform hardware.
At the GRUB menu there are also options to edit the selected entry (press "e") or to drop to a
GRUB command line (press "c").
To boot into single-user mode, type -s at the end of the kernel line; pressing Enter brings you
back to the main menu.
The primary boot archive is a file system image containing kernel modules and data; it is loaded
into memory at this point.
The multiboot program reads the boot archive and assembles the core kernel modules in memory.
GRUB stage2 is installed in a reserved area of the fdisk partition; this is the core image of GRUB.
The GRUB menu file is /boot/grub/menu.lst.
The boot behavior can be modified using the eeprom command, which edits the file
/boot/solaris/bootenv.rc. See that file for more information.
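For instance (parameter names vary by platform; console is just one example):

# eeprom
(lists all boot parameters)
# eeprom console
(shows a single parameter)
# eeprom console=ttya
(sets it; on x86 the new value is written to /boot/solaris/bootenv.rc)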
Update a corrupt boot archive
Well, sooner or later you will have to do this, trust me :(
Boot the failsafe archive (or from installation media), let it mount the root file system on /a
(or mount it yourself), then rebuild the archive and reboot:
bootadm update-archive -R /a
umount /a
init 6
Good luck!
Tip: set up a cron job to run bootadm update-archive on a regular basis, and run it manually
after a system upgrade or patch install.
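A sketch of such a root crontab entry (the schedule and path are arbitrary; adjust to your
system):

30 3 * * 0 /sbin/bootadm update-archive >/dev/null 2>&1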
The primary boot archive contains the files below (if any of them is updated, rebuild the boot
archive with "bootadm update-archive"):
boot/solaris/bootenv.rc
boot/solaris.xpm
etc/dacf.conf
etc/devices
etc/driver_aliases
etc/driver_classes
etc/match
etc/name_to_sysnum
etc/path_to_inst
etc/rtc_config
etc/system
kernel
platform/i86pc/biosint
platform/i86pc/kernel
Installing GRUB
This is also something you may need to do; say you are mirroring two disks using SVM and want
to install GRUB on the second disk in case you need to boot from there. To install GRUB in the
master boot sector, run (replace c1t0d0s0 with your device if needed):
installgrub -fm /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t0d0s0
Actually, besides the primary boot archive, there is one more: the failsafe boot archive. It can
boot on its own, requires no maintenance, and is created during OS installation.
The following example shows how to install the boot block on a UFS root file system.
# installboot /usr/platform/sun4u/lib/fs/ufs/bootblk /dev/rdsk/c0t0d0s0
The following example shows how to install the boot block on a ZFS root file system.
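A rough equivalent for ZFS on Solaris 10 releases with ZFS root support (the device name is a
placeholder) would be:

# installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c0t0d0s0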
SPARC EXAMPLES
The ufs bootblock is in /usr/lib/fs/ufs/bootblk. To install
the bootblock on slice 0 of target 0 on controller 1, use:
example# /usr/sbin/installboot /usr/lib/fs/ufs/bootblk \
/dev/rdsk/c1t0d0s0
x86 EXAMPLES
The ufs bootblock is in /usr/lib/fs/ufs/pboot. To install
the bootblock on slice 2 of target 0 on controller 1, use:
example# /usr/sbin/installboot /usr/lib/fs/ufs/pboot \
/usr/lib/fs/ufs/bootblk /dev/rdsk/c1t0d0s2
FILES
/usr/platform/platform-name/lib/fs/ufs
Directory where ufs boot objects reside.
SPARC SYNOPSIS
installboot bootblk raw-disk-device
To install a ufs bootblock on slice 0 of target 0 on controller 1 of the platform where the
command is being run (Solaris 2.x / Solaris 2.8), use:
# installboot /usr/platform/sun4u/lib/fs/ufs/bootblk /dev/rdsk/cXtXdXsX
This error occurs when the default bootblock has been corrupted. To overcome this problem do
the following:
5. Make sure that the restoresymtable file does not exist. If it does, remove it.
Note:
* The location of the bootblk file may differ depending on the type of hardware platform, for
example /usr/platform/sun4u vs /usr/platform/sun4m.
* Not sure you really need to mount the partition? I guess it is to verify you have the right one.
SunOs-------SunOs---------SunOs-------------SunOs
OPTIONS
-h   Leave the a.out header on the bootblock when installed on disk.
EXAMPLE
To install the bootblocks onto the root partition on a Xylogics disk:
example% cd /usr/kvm/mdec
example% installboot -vlt /boot bootxy /dev/rxy0a
For an SD disk, you would use bootsd and /dev/rsd0a, respectively, in place of bootxy and /dev/rxy0a.
NOTE: The "/boot" is the boot file that resides in the root directory.
NOTE: Inside the /usr/kvm/mdec directory there must be bootsd (for SCSI devices) and bootfd
(for floppies); if these aren't there, installboot isn't going to work.
Example detail: the output of installboot -vl lists the blocks occupied by the boot program,
in this case the pairs 10 730, 10 740, 10 750, 10 760, 10 770, 10 780, 10 790, 10 7a0, 10 7b0,
10 7c0, 10 7d0, 10 7f0, 10 800.
Options:
-l Print out the list of block numbers of the boot program.
---- More Sunos ------- More Sunos ------- More Sunos ------- More Sunos ---
So you have some older Sunos 4.X.x dump images you want to put on
another machine.
# mount
/dev/sd0a on / type 4.2 (rw)
/dev/sd0g on /usr type 4.2 (rw)
diastolic:/systems/cs712a_dumpimages on /mnt type nfs (rw) <--- image
/dev/sd1a on /mnt2 type 4.2 (rw,noquota) <--- new disk
There is every little chance that one loses or rather forgets the root password of his Sun Solaris
servers. In the event this happens, there is a way out of it. Well, the way, and in fact the only
way, is to reset the password, as there is no way to recover it. Resetting the password involves
booting the server in single-user mode and mounting the root file system.
Of course, it is recommended that physical access to the server is restricted, so as to ensure
that there is no unauthorized access and that anyone who follows this routine is an authorized
person.
Boot the server with a Sun Solaris Operating System CD (I'm using a Solaris 10 CD, but it doesn't
really matter) or do a network boot from a JumpStart server, from the OBP OK prompt:
OK boot cdrom -s
or
OK boot net -s
This will boot the server from the CD or JumpStart server and launch single-user mode (no
password).
Mount the root file system (assume /dev/dsk/c0t0d0s0 here) onto /a
solaris# mount /dev/dsk/c0t0d0s0 /a
NOTE: /a is a temporary mount point that is available when you boot from CD or a JumpStart
server
Now, with the root file system mounted on /a, all you need to do is edit the shadow file and
remove the encrypted password for root.
solaris# vi /a/etc/shadow
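For illustration only (the hash below is made up), the root entry changes from the first form to
the second, with the password field emptied:

root:ab8xkPW31vX2c:6445::::::
root::6445::::::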
Now, exit the mounted file system, unmount the root file system, and reboot the system to
single-user mode, booting off the disk.
solaris# cd /
solaris# umount /a
solaris# init s
This should boot off the disk and take you to single-user mode. Press Enter at the prompt to
enter a password for root.
This should allow you to log in to the system. Once in, set the password and change to multi-user
mode.
NOTE: Single-user mode is only to ensure that the root user without a password is not exposed to
others, as it would be if the system were started in multi-user mode before a new password is set.
solaris# passwd root
solaris# reboot
This should do it. You should now be able to log on with the new password set for root.
Applies to:
Solaris Cluster - Version 3.0 to 3.3 [Release 3.0 to 3.3]
All Platforms
Symptoms
This document provides the basic steps to resolve the following failfast panics.
Failfast: Aborting because "<cluster daemon>" died
(The same panic string is repeated for each cluster daemon that can trigger it; the name of the
daemon that died, for example rgmd, appears in the message.)
Cause
Solaris Cluster node panics due to a cluster daemon exiting.
Solution
Why does it happen?
The panic message indicates that the cluster-specific daemon shown in the message has died. The
panic is a recovery action taken by the failfast mechanism of the cluster when it detects a
critical problem. Because those processes are critical and cannot be restarted, the cluster shuts
down the node using the failfast driver. Critical daemons are registered with the failfast (ff)
driver with a time interval. If a daemon does not report back to the failfast driver within the
registered time interval (e.g., 30 seconds), the driver will trigger a Solaris kernel panic.
Troubleshooting steps
To find the root cause of the problem, you need to find out why the cluster-specific daemon shown
in the messages died. The following steps describe how to identify the root cause.
1. Check the /var/adm/messages system log file for system or operating system errors
indicating that memory resources may have been limited, such as in the following example. If
those messages appear before the panic messages, the root cause is probably memory exhaustion,
since a process may dump core and die when the system is short of memory. If you find messages
indicating a lack of memory resources, you will need to find out why the system was low on
memory and/or swap and address it to avoid this panic. If a cluster daemon cannot fork a new
process or cannot allocate memory (malloc()), it will likely exit and trigger a panic.
For additional information, you can see the messages issued before the panic from a kernel core
file by using mdb. Check those messages as well as the /var/adm/messages file.
# cd /var/crash/`uname -n`
# echo "::msgbuf -v" | mdb -k unix.0 vmcore.0
2. Some bugs that cause cluster daemons to exit (and thus this panic) were fixed in the Solaris
Cluster core patches or the Core/Sys admin patches, so check whether your system still has an
old patch installed. Check the README of the patch installed on your machine for any relevant
bugs fixed between the installed patch and the latest one.
Solaris Cluster 3.x update releases and matching / including Solaris Cluster 3.x core
patches (Doc ID 1368494.1)
The following is a list of some bugs that could be causing this panic and their patches.
Note that this is not a comprehensive list; always check MOS for the most current bugs:
1. The failfast panic will generate a kernel core file; however, in general, it does
not help you find the reason why a process died. In most cases, when this panic happens,
a process dies after dumping an application core, and this application core file will
help you find the root cause. To collect an application core file, use the coreadm
command so that core files are uniquely named and are stored in a consistent place.
Run the following commands on each cluster node.
mkdir -p /var/cores
coreadm -g /var/cores/%f.%n.%p.%t.core \
-e global \
-e global-setid \
-e log \
-d process \
-d proc-setid
If you made modifications to coreadm, test to make sure that your settings are working and that
you can collect a core:
# ps -ef|grep rgmd
root 1829 1 0 Dec 24 ? 0:00 /usr/cluster/lib/sc/rgmd
root 1833 744 0 Dec 24 ? 3195:27 /usr/cluster/lib/sc/rgmd -z global
# gcore -g 1829
gcore: /var/cores/rgmd.clnode1.1829.1393620353.core dumped
This will leave rgmd running and only collects its core.
# ps -ef|grep rgmd
root 1829 1 0 Dec 24 ? 0:00 /usr/cluster/lib/sc/rgmd
How to Enable Method and/or System Coredumps for Cluster Methods That Timeout
(Doc ID 1019001.1)