Kernel Crash Dump
Introduction
A Kernel Crash Dump refers to a portion of the contents of volatile memory (RAM) that is copied to disk whenever the execution of the kernel is disrupted. The following events can cause a kernel disruption:
Kernel Panic
Non-Maskable Interrupts (NMI)
Machine Check Exceptions (MCE)
Hardware failure
Manual intervention
Kernel Crash Dump Mechanism
When a kernel panic occurs, the kernel relies on the kexec mechanism to quickly reboot a new instance of the kernel in a pre-reserved section of memory that had been allocated when the system booted (see below). This permits the existing memory area to remain untouched in order to safely copy its contents to storage.
Installation
The kernel crash dump utility is installed with the following command:
sudo apt-get install linux-crashdump
A reboot is then required.
Configuration
No further configuration is required to enable the kernel dump mechanism.
Verification
To confirm that the kernel dump mechanism is enabled, there are a few things to verify. First, confirm that the crashkernel boot parameter is present:
cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.2.0-17-server root=/dev/mapper/PreciseS-root ro crashkernel=384M-2G:64M,2G-:128M
The crashkernel parameter has the following syntax:
crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset]
range=start-[end]
'start' is inclusive and 'end' is exclusive.
So, for the crashkernel parameter found in /proc/cmdline, we would have:
crashkernel=384M-2G:64M,2G-:128M
The above value means:
if the RAM is smaller than 384M, then don't reserve anything (this is the "rescue" case)
if the RAM size is between 384M and 2G (exclusive), then reserve 64M
if the RAM size is 2G or larger, then reserve 128M
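The range rules above can be sketched as a small shell function. This is an illustration only, not the kernel's own parser: the helper names to_mib and crashkernel_reservation are hypothetical, RAM is passed in MiB, and only the K, M and G suffixes are handled.

```shell
#!/bin/sh
# Hypothetical helper: convert a size such as 384M or 2G to MiB.
to_mib() {
    case "$1" in
        *K) echo $(( ${1%K} / 1024 )) ;;
        *M) echo "${1%M}" ;;
        *G) echo $(( ${1%G} * 1024 )) ;;
        *)  echo $(( $1 / 1048576 )) ;;   # no suffix: treat as bytes
    esac
}

# Hypothetical helper: print the size selected by a ranged crashkernel
# value for a given amount of RAM (in MiB), or "none" if no range matches.
crashkernel_reservation() {
    spec=${1%%@*}          # drop the optional @offset
    ram_mib=$2
    old_ifs=$IFS
    IFS=,
    set -- $spec           # split the value into <range>:<size> entries
    IFS=$old_ifs
    for entry in "$@"; do
        range=${entry%%:*}
        size=${entry#*:}
        start=${range%%-*}
        end=${range#*-}    # empty when the range is open-ended
        # 'start' is inclusive
        if [ "$ram_mib" -lt "$(to_mib "$start")" ]; then continue; fi
        # 'end' is exclusive
        if [ -n "$end" ] && [ "$ram_mib" -ge "$(to_mib "$end")" ]; then continue; fi
        echo "$size"
        return 0
    done
    echo none
}

crashkernel_reservation "384M-2G:64M,2G-:128M" 1023   # prints 64M
```

With 256 as the second argument the function prints none, and with 2048 it prints 128M, which matches the three cases listed above.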
Second, verify that the kernel has reserved the requested memory area for the kdump kernel by running:
dmesg | grep -i crash
...
[ 0.000000] Reserving 64MB of memory at 800MB for crashkernel (System RAM: 1023MB)
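As a complementary check, which this guide does not itself describe, kernels built with kexec support expose /sys/kernel/kexec_crash_loaded, which reads 1 once a crash kernel has been loaded. The sketch below assumes that file is available. The function name kdump_status and its optional directory argument are hypothetical and exist only to keep the example easy to try.

```shell
#!/bin/sh
# Hypothetical helper: report whether a crash kernel is loaded.
# The optional argument overrides the sysfs directory (default /sys/kernel).
kdump_status() {
    sysfs_dir=${1:-/sys/kernel}
    loaded_file=$sysfs_dir/kexec_crash_loaded
    if [ ! -r "$loaded_file" ]; then
        echo "unknown"        # file missing: kexec support not exposed
        return 2
    fi
    if [ "$(cat "$loaded_file")" = "1" ]; then
        echo "loaded"
        return 0
    fi
    echo "not loaded"
    return 1
}

kdump_status || true
```

A result of "not loaded" on a system where the crashkernel parameter is present would suggest that memory was reserved but that the kdump kernel itself was never loaded into it.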
Testing the Crash Dump Mechanism
Testing the Crash Dump Mechanism will cause a system reboot. In certain situations, this can cause data loss if the system is under heavy load. If you want to test the mechanism, make sure that the system is idle or under very light load.
Verify that the SysRq mechanism is enabled by looking at the value of the /proc/sys/kernel/sysrq kernel parameter:
cat /proc/sys/kernel/sysrq
If a value of 0 is returned, the feature is disabled. Enable it with the following command:
sudo sysctl -w kernel.sysrq=1
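The value returned is not always 0 or 1. According to the kernel's SysRq documentation, any value greater than 1 is a bitmask that enables only some SysRq functions. A minimal sketch of how the value could be interpreted follows. The function name describe_sysrq is hypothetical, and it does not decode the individual bits.

```shell
#!/bin/sh
# Hypothetical helper: describe a kernel.sysrq value.
describe_sysrq() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "all functions enabled" ;;
        *) echo "bitmask $1: only selected functions enabled" ;;
    esac
}

# Describe the running system's setting when the file is readable.
if [ -r /proc/sys/kernel/sysrq ]; then
    describe_sysrq "$(cat /proc/sys/kernel/sysrq)"
fi
```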
Once this is done, you must become root, because simply prefixing the command with sudo is not sufficient: the output redirection is performed by the calling shell, not by the command that sudo runs. As the root user, issue the command echo c > /proc/sysrq-trigger. If you are connected over the network, you will lose contact with the system, so it is better to run the test from the system console. This has the advantage of making the kernel dump process visible.
A typical test output should look like the following:
sudo -s
[sudo] password for ubuntu:
# echo c > /proc/sysrq-trigger
[ 31.659002] SysRq : Trigger a crash
[ 31.659749] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 31.662668] IP: [<ffffffff8139f166>] sysrq_handle_crash+0x16/0x20
[ 31.662668] PGD 3bfb9067 PUD 368a7067 PMD 0
[ 31.662668] Oops: 0002 [#1] SMP
[ 31.662668] CPU 1
....
The rest of the output is truncated, but you should see the system rebooting, and somewhere in the log you will see the following line:
Begin: Saving vmcore from kernel crash ...
Once the system has rebooted, the kernel crash dump file can be found in the /var/crash directory:
ls /var/crash
linux-image-3.0.0-12-server.0.crash
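When scripting the test, the presence of a dump can be checked rather than read off the listing by eye. The following is a minimal sketch, assuming dump files carry the .crash suffix seen in the listing above. The function name list_crash_dumps and its optional directory argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print every *.crash file in a directory
# (default /var/crash); return non-zero when none is found.
list_crash_dumps() {
    crash_dir=${1:-/var/crash}
    found=1
    for f in "$crash_dir"/*.crash; do
        [ -e "$f" ] || continue   # an unmatched glob stays literal
        echo "$f"
        found=0
    done
    return "$found"
}

list_crash_dumps || echo "no crash dump found"
```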
Resources
Kernel Crash Dump is a vast topic that requires good knowledge of the Linux kernel. You can find more information on the topic here:
Analyzing Linux Kernel Crash (Based on Fedora, it still gives a good walkthrough of kernel dump analysis)