Smartmontools
Smartmontools talks to many hard disks to retrieve information and statistics about their condition, which notably helps diagnose a disk that is nearing the end of its life.
Basic commands
Enable SMART:
# smartctl --smart=on --offlineauto=on --saveauto=on /dev/hda
smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF ENABLE/DISABLE COMMANDS SECTION ===
SMART Enabled.
SMART Attribute Autosave Enabled.
SMART Automatic Offline Testing Enabled every four hours.
or, in the short form, which is easier to remember:
smartctl -s on -S on -o on /dev/hda
Check the attribute values:
# smartctl -A /dev/hda
smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   252   252   063    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   253   253   000    Old_age   Always       -       0
  5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always       -       0
  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline      -       0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0027   250   245   187    Pre-fail  Always       -       58790
  9 Power_On_Minutes        0x0032   251   251   000    Old_age   Always       -       714h+26m
 10 Spin_Retry_Count        0x002b   252   252   157    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x002b   252   252   223    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   253   253   000    Old_age   Always       -       83
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0032   253   253   000    Old_age   Always       -       31
195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always       -       4749
196 Reallocated_Event_Count 0x0008   253   253   000    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0008   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0008   199   199   000    Old_age   Offline      -       0
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   252   000    Old_age   Always       -       6
202 TA_Increase_Count       0x000a   253   252   000    Old_age   Always       -       0
203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always       -       2
204 Shock_Count_Write_Opern 0x000a   253   252   000    Old_age   Always       -       0
205 Shock_Rate_Write_Opern  0x000a   253   252   000    Old_age   Always       -       0
207 Spin_High_Current       0x002a   252   252   000    Old_age   Always       -       0
208 Spin_Buzz               0x002a   252   252   000    Old_age   Always       -       0
209 Offline_Seek_Performnce 0x0024   186   183   000    Old_age   Offline      -       0
 99 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
100 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
101 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
Pay attention mainly to the WHEN_FAILED column: a value other than "-" means the attribute has crossed its threshold, either now or in the past.
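As a rough sketch of that check, the shell function below filters smartctl -A output and keeps only the attribute rows whose WHEN_FAILED column is not "-". The name failed_attributes is hypothetical, and the function assumes the ten-column layout shown above, with WHEN_FAILED in the ninth column.

```shell
# Hypothetical helper: reads `smartctl -A` output on stdin and prints
# "ID NAME WHEN_FAILED" for every attribute that has failed now or in the past.
failed_attributes() {
    # Attribute rows start with a numeric ID and have at least 10 fields;
    # field 9 is WHEN_FAILED ("-" means the attribute never failed).
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 && $9 != "-" { print $1, $2, $9 }'
}
```

It could be used as smartctl -A /dev/hda | failed_attributes; an empty result means no attribute is flagged.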
Show more information:
# smartctl -a /dev/hda
On my SATA (non-IDE) disk:
# smartctl -d ata ... /dev/sda
Running self-tests:
smartctl -t short /dev/hda
smartctl -t long /dev/hda
Check the results:
smartctl -l selftest /dev/hda
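To spot problems in a long self-test log, the entries can be filtered. The sketch below is only an illustration: the name failed_selftests is hypothetical, and it assumes the usual log layout in which each entry starts with "#" followed by its number and a passed test carries the status "Completed without error".

```shell
# Hypothetical helper: reads `smartctl -l selftest` output on stdin and prints
# the log entries that did not complete cleanly.
failed_selftests() {
    # Assumption: log entries start with "#" followed by the entry number,
    # and a passed test has the status "Completed without error".
    awk '/^#[ ]*[0-9]+/ && !/Completed without error/'
}
```

Usage would be smartctl -l selftest /dev/hda | failed_selftests. Note that an entry for a test still in progress would also be printed, since it does not carry the "Completed without error" status.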
Bad blocks
Current hard disks apparently keep a reserve of unused blocks that can replace defective ones. The replacement happens when the defective block is written to (not before, to leave a chance of recovering the data stored in it).
The Bad block HOWTO listed in the Links section discusses this in more detail.
In the smartmontools output, the important items are:
- Reallocated_Sector_Ct: sectors reallocated over the life of the disk
- Current_Pending_Sector: sectors waiting to be reallocated as soon as possible
- Reallocated_Event_Count: (I am not sure what this counts; to be completed)
Example on a two-year-old disk (look at the RAW_VALUE column), after a full read/write badblocks run (all the defective blocks have been reallocated):
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
[...]
  5 Reallocated_Sector_Ct   0x0033   192   192   140    Pre-fail  Always       -       62
[...]
196 Reallocated_Event_Count 0x0032   182   182   000    Old_age   Always       -       18
197 Current_Pending_Sector  0x0012   200   199   000    Old_age   Always       -       0
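The three attributes listed above can be pulled out of a full smartctl -A listing with a small filter. This is a minimal sketch: the name reallocation_counters is hypothetical, and it assumes RAW_VALUE is the tenth column, as in the excerpt.

```shell
# Hypothetical helper: reads `smartctl -A` output on stdin and prints
# NAME=RAW_VALUE for the reallocation-related attributes.
reallocation_counters() {
    # IDs 5, 196 and 197 are the attributes discussed in the text;
    # field 10 is RAW_VALUE.
    awk 'NF >= 10 && ($1 == "5" || $1 == "196" || $1 == "197") { print $2 "=" $10 }'
}
```

Run as smartctl -A /dev/hda | reallocation_counters. For the two-year-old disk above, it would print Reallocated_Sector_Ct=62, Reallocated_Event_Count=18 and Current_Pending_Sector=0, one per line.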
Other examples
A dying disk (lots of disk errors), state after a partial badblocks run:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   001   001   051    Pre-fail  Always   FAILING_NOW 2144
  3 Spin_Up_Time            0x0007   100   091   021    Pre-fail  Always       -       2366
  4 Start_Stop_Count        0x0032   099   099   040    Old_age   Always       -       1152
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   200   200   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1305
 10 Spin_Retry_Count        0x0013   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0013   100   100   051    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1128
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   149   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0012   099   099   000    Old_age   Always       -       330
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0009   200   174   051    Pre-fail  Offline      -       0
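In this listing, Raw_Read_Error_Rate is flagged FAILING_NOW because its normalized VALUE (001) is at or below its THRESH (051). As a hedged illustration of that comparison, the sketch below prints the margin VALUE - THRESH for each Pre-fail attribute; the name prefail_margins is hypothetical and the column positions are assumed to match the listing above.

```shell
# Hypothetical helper: reads `smartctl -A` output on stdin and prints, for each
# Pre-fail attribute, its name and the margin VALUE - THRESH.
prefail_margins() {
    # Field 4 is VALUE, field 6 is THRESH, field 7 is TYPE.
    # A margin of zero or less means the attribute is at or below its threshold.
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 && $7 == "Pre-fail" { printf "%s %d\n", $2, $4 - $6 }'
}
```

With smartctl -A /dev/hda | prefail_margins, the smaller the margin, the closer the attribute is to being reported as failed.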
smartd
Remember to enable the smartd daemon: it will warn you when a problem is detected (an email is sent to root).
On Debian, uncomment this line in /etc/default/smartmontools:
#start_smartd=yes
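If you prefer not to edit the file by hand, the same change can be scripted. This is only a sketch: the name enable_smartd is hypothetical, and it assumes the line appears exactly as shown above, commented out with a leading "#".

```shell
# Hypothetical helper: uncomments the start_smartd line in the given file.
enable_smartd() {
    conf=$1
    # Write the edited copy to a temporary file, then copy it back over the
    # original so that the file keeps its owner and permissions.
    sed 's/^#[[:space:]]*start_smartd=yes/start_smartd=yes/' "$conf" > "$conf.tmp" &&
        cat "$conf.tmp" > "$conf" &&
        rm -f "$conf.tmp"
}
```

It would be run as root with enable_smartd /etc/default/smartmontools, after which the daemon still has to be (re)started; how to do that depends on the Debian release.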
Example of a notification email:
This email was generated by the smartd daemon running on:

   host name: bob
  DNS domain: centre.local
  NIS domain: (none)

The following warning/error was logged by the smartd daemon:

Device: /dev/hda, ATA error count increased from 10 to 11

For details see host's SYSLOG (default: /var/log/messages).

You can also use the smartctl utility for further investigation.
No additional email messages about this problem will be sent.
In the syslog:
Nov 7 01:27:20 bob kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Nov 7 01:27:20 bob kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=10997362, high=0, low=10997362, sector=10997359
Nov 7 01:27:20 bob kernel: ide: failed opcode was: unknown
Nov 7 01:27:20 bob kernel: end_request: I/O error, dev hda, sector 10997359
Nov 7 01:28:20 bob smartd[3499]: Device: /dev/hda, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 200 to 195
Nov 7 01:28:20 bob smartd[3499]: Device: /dev/hda, SMART Usage Attribute: 194 Temperature_Celsius changed from 99 to 97
Nov 7 01:28:20 bob smartd[3499]: Device: /dev/hda, ATA error count increased from 10 to 11
Nov 7 01:28:20 bob smartd[3499]: Sending warning via /usr/share/smartmontools/smartd-runner to root ...
Nov 7 01:28:21 bob smartd[3499]: Warning via /usr/share/smartmontools/smartd-runner to root: successful
However, I also got the following, this time without any notification:
Nov 7 09:58:20 bob smartd[3499]: Device: /dev/hda, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 193 to 185
Nov 7 09:58:20 bob smartd[3499]: Device: /dev/hda, SMART Usage Attribute: 194 Temperature_Celsius changed from 96 to 95
Nov 7 09:58:20 bob smartd[3499]: Device: /dev/hda, ATA error count increased from 11 to 13
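So I do not know exactly in which cases the email notification is sent. One likely explanation is the last line of the email above: by default smartd sends only one warning email per type of problem (the "-M once" behaviour), so a repeat of the same kind of event is logged but not mailed again.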
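One way to investigate is to list, from the syslog, the messages that were actually followed by a warning. The sketch below assumes, as in the log excerpts on this page, that the triggering smartd "Device:" message is the one logged immediately before the "Sending warning" line; the name smartd_warning_triggers is hypothetical.

```shell
# Hypothetical helper: reads syslog lines on stdin and prints, for each warning
# smartd sent, the smartd "Device:" message logged just before it.
smartd_warning_triggers() {
    awk '
        # Remember the most recent smartd device message.
        /smartd\[[0-9]+\]: Device:/ { last = $0 }
        # When a warning is sent, print the remembered message once.
        /smartd\[[0-9]+\]: Sending warning/ && last != "" { print last; last = "" }
    '
}
```

For instance, smartd_warning_triggers < /var/log/messages. Events that smartd logged without mailing, like the second one above, simply do not appear in the result.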
Other examples, on the second disk:
email:
Device: /dev/hdb, 1 Currently unreadable (pending) sectors
syslog:
Nov 12 01:28:20 bob smartd[3499]: Device: /dev/hda, SMART Usage Attribute: 194 Temperature_Celsius changed from 97 to 95
Nov 12 01:28:20 bob smartd[3499]: Device: /dev/hdb, 1 Currently unreadable (pending) sectors
Nov 12 01:28:20 bob smartd[3499]: Sending warning via /usr/share/smartmontools/smartd-runner to root ...
Nov 12 01:28:21 bob smartd[3499]: Warning via /usr/share/smartmontools/smartd-runner to root: successful
Nov 12 01:28:21 bob smartd[3499]: Device: /dev/hdb, SMART Prefailure Attribute: 7 Seek_Error_Rate changed from 100 to 200
Nov 12 01:28:21 bob smartd[3499]: Device: /dev/hdb, SMART Usage Attribute: 194 Temperature_Celsius changed from 101 to 100
email:
Device: /dev/hdb, 1 Offline uncorrectable sectors
syslog:
Nov 12 07:58:19 bob smartd[3499]: Device: /dev/hdb, 1 Currently unreadable (pending) sectors
Nov 12 07:58:19 bob smartd[3499]: Device: /dev/hdb, 1 Offline uncorrectable sectors
Nov 12 07:58:19 bob smartd[3499]: Sending warning via /usr/share/smartmontools/smartd-runner to root ...
Nov 12 07:58:19 bob smartd[3499]: Warning via /usr/share/smartmontools/smartd-runner to root: successful
smartctl:
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always       -       1
Links
- Official page
- Disk attribute reference
- Soyez Smart! at Lea-Linux
- Bad block HOWTO for smartmontools