Locating disk failure information with storcli64 and smartctl

Source: https://www.cnblogs.com/wangl-blog/archive/2019/05/09/10839635.html

How to locate a disk's physical slot and its system device name

From Lin.Wang

Section One : Introduction

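storcli is the successor to megacli. On Dell servers the equivalent tool is perccli, and its usage is identical.

smartctl reads the SMART information reported by a disk's controller chip.

lsscsi lists the system's SCSI information (its data relates to /proc/scsi/scsi). It is not covered in this document.

These are all common tools for inspecting disks, and they help when troubleshooting disk state and RAID controller problems.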

Section Two : Install package

Install storcli or perccli, then symlink the binary into /usr/bin/ so the command can be run directly:

ln -s /opt/MegaRAID/storcli/storcli64 /usr/bin/

ln -s /opt/MegaRAID/perccli/perccli64 /usr/bin/

Section Three : Step

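To get from the system device name /dev/sdf to the physical slot, the approach is as follows:

  1. Run perccli64 /c0/eall/sall show to list the physical drives:

    img-/c0/eall/sall

    The figure shows four JBOD disks. In the author's experience, JBOD disks are normally assigned device names before the RAID virtual drives, so here the JBOD disks occupy /dev/sda through /dev/sdd and the RAID virtual drives start at /dev/sde.

    DG stands for drive group and reflects the order in which the RAID groups were configured. The figure shows that 32:4 and 32:5 belong to the same drive group.

  2. Run perccli64 /c0/vall show to see how DG maps to VD:

img-/c0/vall

In the figure, DG/VD gives the correspondence between the RAID drive groups and the order of the volumes in the operating system. If the server has only RAID volumes, VD0 is usually /dev/sda, and so on. If the server also has JBOD disks, the RAID volumes are ordered after them. In this example VD0 = /dev/sde, so /dev/sdf is VD = 1, which corresponds to DG = 1.

Back in img-/c0/eall/sall, the drive with DG = 1 has DID = 6. DID is the device ID, a value that is used again later. Its slot number (Slt) is also 6, which is the seventh bay on the server (slots are counted from 0). This gives the physical position of /dev/sdf.

Conversely, starting from a fault LED seen on the server, the same mapping can be worked backwards to find the system device name.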
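The ordering rule used in these steps can be written down as a small calculation. The following Python sketch assumes that the author's rule of thumb holds on the host in question (JBOD disks take the first device names, followed by the virtual drives in VD order) and that there are at most 26 disks, so names stay within sda..sdz. The function names are hypothetical and exist only for illustration.

```python
import string

def vd_index_for_device(device, jbod_count):
    """Return the VD index for a /dev/sdX name, assuming JBOD disks come first.

    Hypothetical helper: it only encodes the rule of thumb from the text
    and handles single-letter names (sda..sdz).
    """
    name = device.rsplit("/", 1)[-1]
    if len(name) != 3 or not name.startswith("sd") or name[2] not in string.ascii_lowercase:
        raise ValueError(f"unsupported device name: {device}")
    position = string.ascii_lowercase.index(name[2])  # sda -> 0, sdb -> 1, ...
    if position < jbod_count:
        raise ValueError(f"{device} falls in the JBOD range, not a virtual drive")
    return position - jbod_count

def device_for_vd_index(vd, jbod_count):
    """Inverse direction: from a VD index back to the expected device name."""
    position = vd + jbod_count
    if not 0 <= position < 26:
        raise ValueError("position outside sda..sdz")
    return "/dev/sd" + string.ascii_lowercase[position]

print(vd_index_for_device("/dev/sdf", jbod_count=4))  # 1
print(device_for_vd_index(0, jbod_count=4))           # /dev/sde
```

Because the rule is only a rule of thumb, the result is a candidate mapping and still needs to be confirmed on the machine itself.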

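Note:

If the server has no JBOD disks and everything is RAID, the mapping shown by /c0/vall is enough to establish the relationship.

In practice you can also run perccli64 /c0/e32/s6 start locate and perccli64 /c0/e32/s6 stop locate to turn the drive's locate LED on and off, and so check whether the mapping is correct.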

Section Four : storcli/perccli Usage

Viewing controller information

perccli64 show ctrlcount: shows how many controllers (RAID cards) are present

perccli64 show: shows RAID controller information

[root@node-15 ~]# perccli64 show
Status Code = 0
Status = Success
Description = None

Number of Controllers = 1
Host Name = node-15.domain.tld
Operating System  = Linux3.10.0-327.20.1.es2.el7.x86_64

System Overview :
===============

------------------------------------------------------------------------
Ctl Model        Ports PDs DGs DNOpt VDs VNOpt BBU sPR DS EHS ASOs Hlth 
------------------------------------------------------------------------
  0 PERCH730Mini     8  16  11     0  11     0 Opt On  3  N      0 Opt  
------------------------------------------------------------------------

Ctl=Controller Index|DGs=Drive groups|VDs=Virtual drives|Fld=Failed
PDs=Physical drives|DNOpt=DG NotOptimal|VNOpt=VD NotOptimal|Opt=Optimal
Msng=Missing|Dgd=Degraded|NdAtn=Need Attention|Unkwn=Unknown
sPR=Scheduled Patrol Read|DS=DimmerSwitch|EHS=Emergency Hot Spare
Y=Yes|N=No|ASOs=Advanced Software Options|BBU=Battery backup unit
Hlth=Health|Safe=Safe-mode boot

The output shows a single RAID controller: Ctl 0, which is addressed as /c0.

storcli64 /c0 show

[root@node-15 ~]# perccli64 /c0 show
Generating detailed summary of the adapter, it may take a while to complete.

Controller = 0
Status = Success
Description = None

Product Name = PERC H730 Mini
Serial Number = 663021Z
SAS Address =  51866da066153000
PCI Address = 00:03:00:00
System Time = 01/10/2019 20:48:38
Mfg. Date = 06/17/16
Controller Time = 01/10/2019 12:44:21
FW Package Build = 25.4.0.0017
BIOS Version = 6.29.00.0_4.16.07.00_0x06120100
FW Version = 4.260.00-6259
Driver Name = megaraid_sas
Driver Version = 06.807.10.00-rh1
Current Personality = RAID-Mode
Vendor Id = 0x1000
Device Id = 0x5D
SubVendor Id = 0x1028
SubDevice Id = 0x1F49
Host Interface = PCI-E
Device Interface = SAS-12G
Bus Number = 3
Device Number = 0
Function Number = 0
Drive Groups = 11

TOPOLOGY :
========

---------------------------------------------------------------------------
DG Arr Row EID:Slot DID Type  State BT     Size PDC  PI SED DS3  FSpace TR 
---------------------------------------------------------------------------
 0 -   -   -        -   RAID1 Optl  N  931.0 GB dflt N  N   dflt N      N  
 0 0   -   -        -   RAID1 Optl  N  931.0 GB dflt N  N   dflt N      N  
 0 0   0   32:4     4   DRIVE Onln  N  931.0 GB dflt N  N   dflt -      N  
 0 0   1   32:5     5   DRIVE Onln  N  931.0 GB dflt N  N   dflt -      N  
 1 -   -   -        -   RAID0 Optl  N  931.0 GB dflt N  N   dflt N      N  
 1 0   -   -        -   RAID0 Optl  N  931.0 GB dflt N  N   dflt N      N  
 1 0   0   32:6     6   DRIVE Onln  N  931.0 GB dflt N  N   dflt -      N  
 2 -   -   -        -   RAID0 Optl  N  931.0 GB dflt N  N   dflt N      N  
 2 0   -   -        -   RAID0 Optl  N  931.0 GB dflt N  N   dflt N      N  
 2 0   0   32:7     7   DRIVE Onln  N  931.0 GB dflt N  N   dflt -      N  
 3 -   -   -        -   RAID0 Optl  N  931.0 GB dflt N  N   dflt N      N  
 3 0   -   -        -   RAID0 Optl  N  931.0 GB dflt N  N   dflt N      N  
 3 0   0   32:8     8   DRIVE Onln  N  931.0 GB dflt N  N   dflt -      N  
 4 -   -   -        -   RAID0 Optl  N  931.0 GB dflt N  N   dflt N      N  
 4 0   -   -        -   RAID0 Optl  N  931.0 GB dflt N  N   dflt N      N  
 4 0   0   32:9     9   DRIVE Onln  N  931.0 GB dflt N  N   dflt -      N  
 5 -   -   -        -   RAID0 Optl  N  931.0 GB dflt N  N   dflt N      N  
 5 0   -   -        -   RAID0 Optl  N  931.0 GB dflt N  N   dflt N      N  
 5 0   0   32:10    10  DRIVE Onln  N  931.0 GB dflt N  N   dflt -      N  
 6 -   -   -        -   RAID0 Optl  N  931.0 GB dflt N  N   dflt N      N  
 6 0   -   -        -   RAID0 Optl  N  931.0 GB dflt N  N   dflt N      N  
 6 0   0   32:11    11  DRIVE Onln  N  931.0 GB dflt N  N   dflt -      N  
 7 -   -   -        -   RAID0 Optl  N  931.0 GB dflt N  N   dflt N      N  
 7 0   -   -        -   RAID0 Optl  N  931.0 GB dflt N  N   dflt N      N  
 7 0   0   32:12    12  DRIVE Onln  N  931.0 GB dflt N  N   dflt -      N  
 8 -   -   -        -   RAID0 Optl  N  931.0 GB dflt N  N   dflt N      N  
 8 0   -   -        -   RAID0 Optl  N  931.0 GB dflt N  N   dflt N      N  
 8 0   0   32:13    13  DRIVE Onln  N  931.0 GB dflt N  N   dflt -      N  
 9 -   -   -        -   RAID0 Optl  N  931.0 GB dflt N  N   dflt N      N  
 9 0   -   -        -   RAID0 Optl  N  931.0 GB dflt N  N   dflt N      N  
 9 0   0   32:14    14  DRIVE Onln  N  931.0 GB dflt N  N   dflt -      N  
10 -   -   -        -   RAID0 Optl  N  931.0 GB dflt N  N   dflt N      N  
10 0   -   -        -   RAID0 Optl  N  931.0 GB dflt N  N   dflt N      N  
10 0   0   32:15    15  DRIVE Onln  N  931.0 GB dflt N  N   dflt -      N  
---------------------------------------------------------------------------

DG=Disk Group Index|Arr=Array Index|Row=Row Index|EID=Enclosure Device ID
DID=Device ID|Type=Drive Type|Onln=Online|Rbld=Rebuild|Dgrd=Degraded
Pdgd=Partially degraded|Offln=Offline|BT=Background Task Active
PDC=PD Cache|PI=Protection Info|SED=Self Encrypting Drive|Frgn=Foreign
DS3=Dimmer Switch 3|dflt=Default|Msng=Missing|FSpace=Free Space Present
TR=Transport Ready

Virtual Drives = 11

VD LIST :
=======

-------------------------------------------------------------
DG/VD TYPE  State Access Consist Cache Cac sCC     Size Name 
-------------------------------------------------------------
0/0   RAID1 Optl  RW     Yes     RWBD  -   OFF 931.0 GB      
1/1   RAID0 Optl  RW     Yes     RWBD  -   OFF 931.0 GB      
2/2   RAID0 Optl  RW     Yes     RWBD  -   OFF 931.0 GB      
3/3   RAID0 Optl  RW     Yes     RWBD  -   OFF 931.0 GB      
4/4   RAID0 Optl  RW     Yes     RWBD  -   OFF 931.0 GB      
5/5   RAID0 Optl  RW     Yes     RWBD  -   OFF 931.0 GB      
6/6   RAID0 Optl  RW     Yes     RWBD  -   OFF 931.0 GB      
7/7   RAID0 Optl  RW     Yes     RWBD  -   OFF 931.0 GB      
8/8   RAID0 Optl  RW     Yes     RWBD  -   OFF 931.0 GB      
9/9   RAID0 Optl  RW     Yes     RWBD  -   OFF 931.0 GB      
10/10 RAID0 Optl  RW     Yes     RWBD  -   OFF 931.0 GB      
-------------------------------------------------------------

Cac=CacheCade|Rec=Recovery|OfLn=OffLine|Pdgd=Partially Degraded|Dgrd=Degraded
Optl=Optimal|RO=Read Only|RW=Read Write|HD=Hidden|TRANS=TransportReady|B=Blocked|
Consist=Consistent|R=Read Ahead Always|NR=No Read Ahead|WB=WriteBack|
FWB=Force WriteBack|WT=WriteThrough|C=Cached IO|D=Direct IO|sCC=Scheduled
Check Consistency

Physical Drives = 16

PD LIST :
=======

----------------------------------------------------------------------------
EID:Slt DID State DG      Size Intf Med SED PI SeSz Model                Sp 
----------------------------------------------------------------------------
32:0      0 JBOD  -  185.75 GB SATA SSD N   N  512B INTEL SSDSC2BX200G4R U  
32:1      1 JBOD  -  185.75 GB SATA SSD N   N  512B INTEL SSDSC2BX200G4R U  
32:2      2 JBOD  -  185.75 GB SATA SSD N   N  512B INTEL SSDSC2BX200G4R U  
32:3      3 JBOD  -  185.75 GB SATA SSD N   N  512B INTEL SSDSC2BX200G4R U  
32:4      4 Onln  0   931.0 GB SATA HDD N   N  512B ST91000640NS         U  
32:5      5 Onln  0   931.0 GB SATA HDD N   N  512B ST91000640NS         U  
32:6      6 Onln  1   931.0 GB SATA HDD N   N  512B ST91000640NS         U  
32:7      7 Onln  2   931.0 GB SATA HDD N   N  512B ST91000640NS         U  
32:8      8 Onln  3   931.0 GB SATA HDD N   N  512B ST91000640NS         U  
32:9      9 Onln  4   931.0 GB SATA HDD N   N  512B ST91000640NS         U  
32:10    10 Onln  5   931.0 GB SATA HDD N   N  512B ST91000640NS         U  
32:11    11 Onln  6   931.0 GB SATA HDD N   N  512B ST91000640NS         U  
32:12    12 Onln  7   931.0 GB SATA HDD N   N  512B ST91000640NS         U  
32:13    13 Onln  8   931.0 GB SATA HDD N   N  512B ST91000640NS         U  
32:14    14 Onln  9   931.0 GB SATA HDD N   N  512B ST91000640NS         U  
32:15    15 Onln  10  931.0 GB SATA HDD N   N  512B ST91000640NS         U  
----------------------------------------------------------------------------

EID-Enclosure Device ID|Slt-Slot No.|DID-Device ID|DG-DriveGroup
DHS-Dedicated Hot Spare|UGood-Unconfigured Good|GHS-Global Hotspare
UBad-Unconfigured Bad|Onln-Online|Offln-Offline|Intf-Interface
Med-Media Type|SED-Self Encryptive Drive|PI-Protection Info
SeSz-Sector Size|Sp-Spun|U-Up|D-Down/PowerSave|T-Transition|F-Foreign
UGUnsp-Unsupported|UGShld-UnConfigured shielded|HSPShld-Hotspare shielded
CFShld-Configured shielded|Cpybck-CopyBack|CBShld-Copyback Shielded


BBU_Info :
========

----------------------------------------------
Model State   RetentionTime Temp Mode MfgDate 
----------------------------------------------
BBU   Optimal 0 hour(s)     38C  -    0/00/00 
----------------------------------------------
Viewing each disk's Device ID, Slot No., and Drive Group
[root@node-15 ~]# perccli64 /c0/eall/sall show
Controller = 0
Status = Success
Description = Show Drive Information Succeeded.


Drive Information :
=================

----------------------------------------------------------------------------
EID:Slt DID State DG      Size Intf Med SED PI SeSz Model                Sp 
----------------------------------------------------------------------------
32:0      0 JBOD  -  185.75 GB SATA SSD N   N  512B INTEL SSDSC2BX200G4R U  
32:1      1 JBOD  -  185.75 GB SATA SSD N   N  512B INTEL SSDSC2BX200G4R U  
32:2      2 JBOD  -  185.75 GB SATA SSD N   N  512B INTEL SSDSC2BX200G4R U  
32:3      3 JBOD  -  185.75 GB SATA SSD N   N  512B INTEL SSDSC2BX200G4R U  
32:4      4 Onln  0   931.0 GB SATA HDD N   N  512B ST91000640NS         U  
32:5      5 Onln  0   931.0 GB SATA HDD N   N  512B ST91000640NS         U  
32:6      6 Onln  1   931.0 GB SATA HDD N   N  512B ST91000640NS         U  
32:7      7 Onln  2   931.0 GB SATA HDD N   N  512B ST91000640NS         U  
32:8      8 Onln  3   931.0 GB SATA HDD N   N  512B ST91000640NS         U  
32:9      9 Onln  4   931.0 GB SATA HDD N   N  512B ST91000640NS         U  
32:10    10 Onln  5   931.0 GB SATA HDD N   N  512B ST91000640NS         U  
32:11    11 Onln  6   931.0 GB SATA HDD N   N  512B ST91000640NS         U  
32:12    12 Onln  7   931.0 GB SATA HDD N   N  512B ST91000640NS         U  
32:13    13 Onln  8   931.0 GB SATA HDD N   N  512B ST91000640NS         U  
32:14    14 Onln  9   931.0 GB SATA HDD N   N  512B ST91000640NS         U  
32:15    15 Onln  10  931.0 GB SATA HDD N   N  512B ST91000640NS         U  
----------------------------------------------------------------------------

EID-Enclosure Device ID|Slt-Slot No.|DID-Device ID|DG-DriveGroup
DHS-Dedicated Hot Spare|UGood-Unconfigured Good|GHS-Global Hotspare
UBad-Unconfigured Bad|Onln-Online|Offln-Offline|Intf-Interface
Med-Media Type|SED-Self Encryptive Drive|PI-Protection Info
SeSz-Sector Size|Sp-Spun|U-Up|D-Down/PowerSave|T-Transition|F-Foreign
UGUnsp-Unsupported|UGShld-UnConfigured shielded|HSPShld-Hotspare shielded
CFShld-Configured shielded|Cpybck-CopyBack|CBShld-Copyback Shielded
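When there are many drives, the table above is easier to query once it has been parsed. The sketch below is one possible way to do that in Python. It assumes the column order shown in this output (EID:Slt, DID, State, DG first) and uses a shortened copy of the rows as sample input; `parse_pd_list` is a made-up helper name, not part of perccli or storcli.

```python
def parse_pd_list(text):
    """Parse 'EID:Slt DID State DG ...' rows from perccli/storcli drive output."""
    drives = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with "<enclosure>:<slot>", for example "32:6".
        if len(fields) < 4 or ":" not in fields[0]:
            continue
        eid, _, slot = fields[0].partition(":")
        if not (eid.isdigit() and slot.isdigit() and fields[1].isdigit()):
            continue  # header or legend line
        drives.append({
            "eid": int(eid),
            "slot": int(slot),
            "did": int(fields[1]),
            "state": fields[2],
            # DG is "-" for JBOD or unconfigured drives.
            "dg": int(fields[3]) if fields[3].isdigit() else None,
        })
    return drives

SAMPLE = """\
EID:Slt DID State DG      Size Intf Med SED PI SeSz Model                Sp
32:3      3 JBOD  -  185.75 GB SATA SSD N   N  512B INTEL SSDSC2BX200G4R U
32:5      5 Onln  0   931.0 GB SATA HDD N   N  512B ST91000640NS         U
32:6      6 Onln  1   931.0 GB SATA HDD N   N  512B ST91000640NS         U
"""

drives = parse_pd_list(SAMPLE)
# Drives that belong to drive group 1, as in the /dev/sdf example.
print([d for d in drives if d["dg"] == 1])
```

Filtering on `dg` answers the question from Section Three (which slot and DID belong to DG 1), and counting rows with `state == "JBOD"` gives the number of JBOD disks.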

Note:

In the author's experience, JBOD disks are assigned device names before the RAID virtual drives.

Viewing the details of a specific disk
[root@node-15 ~]# perccli64 /c0/e32/s6 show all
Controller = 0
Status = Success
Description = Show Drive Information Succeeded.


Drive /c0/e32/s6 :
================

-------------------------------------------------------------------
EID:Slt DID State DG     Size Intf Med SED PI SeSz Model        Sp 
-------------------------------------------------------------------
32:6      6 Onln   1 931.0 GB SATA HDD N   N  512B ST91000640NS U  
-------------------------------------------------------------------

EID-Enclosure Device ID|Slt-Slot No.|DID-Device ID|DG-DriveGroup
DHS-Dedicated Hot Spare|UGood-Unconfigured Good|GHS-Global Hotspare
UBad-Unconfigured Bad|Onln-Online|Offln-Offline|Intf-Interface
Med-Media Type|SED-Self Encryptive Drive|PI-Protection Info
SeSz-Sector Size|Sp-Spun|U-Up|D-Down/PowerSave|T-Transition|F-Foreign
UGUnsp-Unsupported|UGShld-UnConfigured shielded|HSPShld-Hotspare shielded
CFShld-Configured shielded|Cpybck-CopyBack|CBShld-Copyback Shielded


Drive /c0/e32/s6 - Detailed Information :
=======================================

Drive /c0/e32/s6 State :
======================
Shield Counter = 0
Media Error Count = 46431               *** a clear problem: 46431 media errors have occurred ***
Other Error Count = 0
Drive Temperature =  31C (87.80 F)  
Predictive Failure Count = 126          *** 126 predictive failures ***
S.M.A.R.T alert flagged by drive = Yes


Drive /c0/e32/s6 Device attributes :
==================================
SN = 9XGA228L
Manufacturer Id = ATA     
Model Number = ST91000640NS
NAND Vendor = NA
WWN = 5000c500918f2f8a
Firmware Revision =     AA63
Raw size = 931.512 GB [0x74706db0 Sectors]
Coerced size = 931.0 GB [0x74600000 Sectors]
Non Coerced size = 931.012 GB [0x74606db0 Sectors]
Device Speed = 6.0Gb/s
Link Speed = 12.0Gb/s
NCQ setting = N/A
Write Cache = Enabled
Logical Sector Size = 512B
Physical Sector Size = 512B
Connector Name = 00 


Drive /c0/e32/s6 Policies/Settings :
==================================
Drive position = DriveGroup:1, Span:0, Row:0
Enclosure position = 0
Connected Port Number = 0(path0) 
Sequence Number = 2
Commissioned Spare = No
Emergency Spare = No
Last Predictive Failure Event Sequence Number = 95183    *** sequence number of the most recent predictive failure event: 95183 ***
Successful diagnostics completion on = N/A
SED Capable = No
SED Enabled = No
Secured = No
Cryptographic Erase Capable = No
Locked = No
Needs EKM Attention = No
PI Eligible = No
Certified = Yes
Wide Port Capable = No

Port Information :
================

-----------------------------------------
Port Status Linkspeed SAS address        
-----------------------------------------
   0 Active 12.0Gb/s  0x500056b33fefe586 
-----------------------------------------


Inquiry Data = 
5a 0c ff 3f 37 c8 10 00 00 00 00 00 3f 00 00 00 
00 00 00 00 20 20 20 20 20 20 20 20 20 20 20 20 
58 39 41 47 32 32 4c 38 00 00 00 00 04 00 20 20 
20 20 41 41 33 36 54 53 31 39 30 30 36 30 30 34 
53 4e 20 20 20 20 20 20 20 20 20 20 20 20 20 20 
20 20 20 20 20 20 20 20 20 20 20 20 20 20 10 80 
00 40 00 2f 00 40 00 02 00 02 07 00 ff 3f 10 00 
3f 00 10 fc fb 00 10 00 ff ff ff 0f 00 00 07 00 

Note:

The per-drive details show media errors, which indicates that this disk has a problem.
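The counters that matter in this output can be pulled out with a few regular expressions. The following Python sketch assumes the field labels appear exactly as in the output above. The helper names and the "suspect" rule are illustrative assumptions, not thresholds defined by the tool.

```python
import re

def parse_drive_state(text):
    """Extract the error counters from 'perccli64 /cX/eY/sZ show all' output."""
    patterns = {
        "media_errors": r"Media Error Count\s*=\s*(\d+)",
        "other_errors": r"Other Error Count\s*=\s*(\d+)",
        "predictive_failures": r"Predictive Failure Count\s*=\s*(\d+)",
    }
    state = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        state[key] = int(match.group(1)) if match else None
    alert = re.search(r"S\.M\.A\.R\.T alert flagged by drive\s*=\s*(\w+)", text)
    state["smart_alert"] = (alert.group(1) == "Yes") if alert else None
    return state

def looks_suspect(state):
    # Illustrative rule only: any non-zero media or predictive-failure count,
    # or a SMART alert, marks the drive for a closer look.
    return bool(state["media_errors"] or state["predictive_failures"] or state["smart_alert"])

SAMPLE = """\
Shield Counter = 0
Media Error Count = 46431
Other Error Count = 0
Predictive Failure Count = 126
S.M.A.R.T alert flagged by drive = Yes
"""

state = parse_drive_state(SAMPLE)
print(state, looks_suspect(state))
```

Running the same check over every slot is a quick way to spot other drives that deserve attention.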

Viewing how the virtual drives map to system device names
[root@node-15 ~]# perccli64 /c0/vall show
Controller = 0
Status = Success
Description = None


Virtual Drives :
==============

-------------------------------------------------------------
DG/VD TYPE  State Access Consist Cache Cac sCC     Size Name 
-------------------------------------------------------------
0/0   RAID1 Optl  RW     Yes     RWBD  -   OFF 931.0 GB      
1/1   RAID0 Optl  RW     Yes     RWBD  -   OFF 931.0 GB      
2/2   RAID0 Optl  RW     Yes     RWBD  -   OFF 931.0 GB      
3/3   RAID0 Optl  RW     Yes     RWBD  -   OFF 931.0 GB      
4/4   RAID0 Optl  RW     Yes     RWBD  -   OFF 931.0 GB      
5/5   RAID0 Optl  RW     Yes     RWBD  -   OFF 931.0 GB      
6/6   RAID0 Optl  RW     Yes     RWBD  -   OFF 931.0 GB      
7/7   RAID0 Optl  RW     Yes     RWBD  -   OFF 931.0 GB      
8/8   RAID0 Optl  RW     Yes     RWBD  -   OFF 931.0 GB      
9/9   RAID0 Optl  RW     Yes     RWBD  -   OFF 931.0 GB      
10/10 RAID0 Optl  RW     Yes     RWBD  -   OFF 931.0 GB      
-------------------------------------------------------------

Cac=CacheCade|Rec=Recovery|OfLn=OffLine|Pdgd=Partially Degraded|Dgrd=Degraded
Optl=Optimal|RO=Read Only|RW=Read Write|HD=Hidden|TRANS=TransportReady|B=Blocked|
Consist=Consistent|R=Read Ahead Always|NR=No Read Ahead|WB=WriteBack|
FWB=Force WriteBack|WT=WriteThrough|C=Cached IO|D=Direct IO|sCC=Scheduled
Check Consistency

Note:

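VD: generally taken as the drive's device order in the system. If there are only RAID volumes, VD = 0 is /dev/sda, VD = 1 is /dev/sdb, and so on. If there are JBOD disks, they are enumerated first: for example, if the JBOD disks run up to /dev/sdc, then VD0 is /dev/sdd, and so on.
DG: the order in which the drive groups were configured on the RAID controller.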
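The note can be combined with the DG/VD table to produce a list of expected device names. This Python sketch assumes the rule of thumb in the note (JBOD disks first, then virtual drives in VD order) and uses three rows of the table above as sample input. `parse_vd_list` and `expected_device_names` are hypothetical names chosen for this example.

```python
def parse_vd_list(text):
    """Return (dg, vd, raid_type) tuples from the 'DG/VD TYPE ...' table."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or "/" not in fields[0]:
            continue
        dg, _, vd = fields[0].partition("/")
        if dg.isdigit() and vd.isdigit():  # skips the "DG/VD" header line
            rows.append((int(dg), int(vd), fields[1]))
    return rows

def expected_device_names(rows, jbod_count):
    """Map each VD index to the device name the rule of thumb predicts."""
    names = {}
    for _dg, vd, _raid_type in rows:
        position = jbod_count + vd
        if position > 25:
            raise ValueError("more than 26 disks; names beyond sdz are not handled")
        names[vd] = "/dev/sd" + chr(ord("a") + position)
    return names

SAMPLE = """\
DG/VD TYPE  State Access Consist Cache Cac sCC     Size Name
0/0   RAID1 Optl  RW     Yes     RWBD  -   OFF 931.0 GB
1/1   RAID0 Optl  RW     Yes     RWBD  -   OFF 931.0 GB
10/10 RAID0 Optl  RW     Yes     RWBD  -   OFF 931.0 GB
"""

rows = parse_vd_list(SAMPLE)
print(expected_device_names(rows, jbod_count=4))
```

With four JBOD disks this predicts /dev/sde for VD0 and /dev/sdf for VD1, which matches the example in Section Three. The prediction is a starting point for checking, not a guarantee of how the kernel named the devices.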

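Commands for collecting RAID controller logs

storcli64 /c0 show time: shows the RAID controller's time

storcli64 /c0 show alilog logfile=node-x.alilog: collects the alilog, which includes all of the logs

storcli64 /c0 show all logfile=node-x.all.log: collects the full RAID controller information

storcli64 /c0 show badblocks: shows bad-block information for the disks

perccli64 /c0 show events filter=fatal: shows events at the fatal level, which surfaces all critical events and helps identify disk or RAID controller failures

perccli64 /c0 show cc: shows the consistency check settings. Redundant multi-disk levels such as RAID1 need consistency checks, while a single-disk RAID0 may not. Whether the check affects performance is not certain.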
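When the same set of logs has to be collected from several nodes, it can help to generate the command lines rather than type them. The Python sketch below only builds argument lists from the commands listed above and does not run anything. The function name, the `cli` parameter, and the idea of keying the log file names on a node name are assumptions made for this example.

```python
def log_collection_commands(node, controller=0, cli="storcli64"):
    """Build argv lists for the log commands listed above (nothing is executed)."""
    target = f"/c{controller}"
    return [
        [cli, target, "show", "time"],
        [cli, target, "show", "alilog", f"logfile={node}.alilog"],
        [cli, target, "show", "all", f"logfile={node}.all.log"],
        [cli, target, "show", "badblocks"],
        [cli, target, "show", "events", "filter=fatal"],
        [cli, target, "show", "cc"],
    ]

for argv in log_collection_commands("node-x"):
    print(" ".join(argv))
```

On a host where the CLI is installed, each list could be handed to `subprocess.run(argv, check=True)`. Passing `cli="perccli64"` gives the Dell variant, which the article describes as having identical usage.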

Section Five : Smartctl Get Error info of Disks

Common Commands Usage Description

--scan Scan for devices

--scan-open Scan for devices and try to open each device

-x, --xall Show all information for device

-a, --all Show all SMART information for device

-i, --info Show identity information for device

-d TYPE, --device=TYPE Specify device type to one of: ata, scsi, nvme[,NSID], sat[,auto][,N][+TYPE], usbcypress[,X], usbjmicron[,p][,x][,N], usbprolific, usbsunplus, marvell, areca,N/E, 3ware,N, hpt,L/M/N, megaraid,N, aacraid,H,L,ID, cciss,N, auto, test

-s VALUE, --smart=VALUE Enable/disable SMART on device (on/off)

-o VALUE, --offlineauto=VALUE(ATA) Enable/disable automatic offline testing on device (on/off)

-S VALUE, --saveauto=VALUE(ATA) Enable/disable Attribute autosave on device (on/off)

-H, --health Show device SMART health status

-c, --capabilities(ATA,NVMe) Show device SMART capabilities

-A, --attributes Show device SMART vendor-specific Attributes and values

-l TYPE, --log=TYPE Show device log. TYPE: error, selftest, selective, directory[,g|s], xerror[,N][,error], xselftest[,N][,selftest], background, sasphy[,reset], sataphy[,reset], scttemp[sts,hist], scttempint,N[,p], scterc[,N,M], devstat[,N], ssd, gplog,N[,RANGE], smartlog,N[,RANGE], nvmelog,N,SIZE

-t TEST, --test=TEST Run test. TEST: offline, short, long, conveyance, force, vendor,N, select,M-N, pending,N, afterselect,[on|off]

-X, --abort Abort any non-captive test on device

Get info for /dev/sdf

List all devices
[root@node-15 ~]# smartctl --scan
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/sdb -d scsi # /dev/sdb, SCSI device
/dev/sdc -d scsi # /dev/sdc, SCSI device
/dev/sdd -d scsi # /dev/sdd, SCSI device
/dev/sde -d scsi # /dev/sde, SCSI device
/dev/sdf -d scsi # /dev/sdf, SCSI device
/dev/sdg -d scsi # /dev/sdg, SCSI device
/dev/sdh -d scsi # /dev/sdh, SCSI device
/dev/sdi -d scsi # /dev/sdi, SCSI device
/dev/sdj -d scsi # /dev/sdj, SCSI device
/dev/sdk -d scsi # /dev/sdk, SCSI device
/dev/sdl -d scsi # /dev/sdl, SCSI device
/dev/sdm -d scsi # /dev/sdm, SCSI device
/dev/sdn -d scsi # /dev/sdn, SCSI device
/dev/sdo -d scsi # /dev/sdo, SCSI device
/dev/bus/0 -d megaraid,0 # /dev/bus/0 [megaraid_disk_00], SCSI device
/dev/bus/0 -d megaraid,1 # /dev/bus/0 [megaraid_disk_01], SCSI device
/dev/bus/0 -d megaraid,2 # /dev/bus/0 [megaraid_disk_02], SCSI device
/dev/bus/0 -d megaraid,3 # /dev/bus/0 [megaraid_disk_03], SCSI device
/dev/bus/0 -d megaraid,4 # /dev/bus/0 [megaraid_disk_04], SCSI device
/dev/bus/0 -d megaraid,5 # /dev/bus/0 [megaraid_disk_05], SCSI device
/dev/bus/0 -d megaraid,6 # /dev/bus/0 [megaraid_disk_06], SCSI device
/dev/bus/0 -d megaraid,7 # /dev/bus/0 [megaraid_disk_07], SCSI device
/dev/bus/0 -d megaraid,8 # /dev/bus/0 [megaraid_disk_08], SCSI device
/dev/bus/0 -d megaraid,9 # /dev/bus/0 [megaraid_disk_09], SCSI device
/dev/bus/0 -d megaraid,10 # /dev/bus/0 [megaraid_disk_10], SCSI device
/dev/bus/0 -d megaraid,11 # /dev/bus/0 [megaraid_disk_11], SCSI device
/dev/bus/0 -d megaraid,12 # /dev/bus/0 [megaraid_disk_12], SCSI device
/dev/bus/0 -d megaraid,13 # /dev/bus/0 [megaraid_disk_13], SCSI device
/dev/bus/0 -d megaraid,14 # /dev/bus/0 [megaraid_disk_14], SCSI device
/dev/bus/0 -d megaraid,15 # /dev/bus/0 [megaraid_disk_15], SCSI device

Note:

The earlier sections established that /dev/sdf has DID (device ID) 6 in perccli, which corresponds to the entry /dev/bus/0 -d megaraid,6.
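The lookup from a DID to the matching smartctl target can be automated by parsing the scan output. This Python sketch assumes the line format shown above (device path, then -d megaraid,N). The helper names are hypothetical, and the sample input is a shortened copy of the scan output.

```python
import re

def megaraid_targets(scan_output):
    """Map each megaraid id N to its (device, '-d' type) pair from `smartctl --scan`."""
    targets = {}
    for line in scan_output.splitlines():
        match = re.match(r"(\S+)\s+-d\s+(megaraid,(\d+))\b", line)
        if match:
            targets[int(match.group(3))] = (match.group(1), match.group(2))
    return targets

def smartctl_argv(did, targets, option="-i"):
    """Build a smartctl command line for the drive with the given DID."""
    device, dev_type = targets[did]
    return ["smartctl", option, "-d", dev_type, device]

SAMPLE = """\
/dev/sde -d scsi # /dev/sde, SCSI device
/dev/sdf -d scsi # /dev/sdf, SCSI device
/dev/bus/0 -d megaraid,5 # /dev/bus/0 [megaraid_disk_05], SCSI device
/dev/bus/0 -d megaraid,6 # /dev/bus/0 [megaraid_disk_06], SCSI device
"""

targets = megaraid_targets(SAMPLE)
print(" ".join(smartctl_argv(6, targets)))
```

The generated command uses the device path reported by the scan (/dev/bus/0), whereas the sessions in this article pass /dev/sdf together with -d megaraid,6.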

View the disk's identity information
[root@node-15 ~]# smartctl -i -d megaraid,6 /dev/sdf
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-3.10.0-327.20.1.es2.el7.x86_64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Constellation.2 (SATA)
Device Model:     ST91000640NS
Serial Number:    9XGA228L
LU WWN Device Id: 5 000c50 0918f2f8a
Add. Product Id:  DELL(tm)
Firmware Version: AA63
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Jan 11 11:28:46 2019 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
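View the disk's attributes

This output is generally used to check the parameters that indicate the disk's overall health.

The fields in the output below are:

  • ID: the attribute ID, usually a decimal or hexadecimal number between 1 and 255.
  • ATTRIBUTE_NAME: the attribute name defined by the drive manufacturer.
  • FLAG: the attribute handling flags (can be ignored).
  • VALUE: one of the most important columns in the table. It is the normalized value of the attribute, between 1 and 253, where 253 is the best case and 1 the worst. Depending on the attribute and the manufacturer, the initial VALUE may be set to 100 or 200.
  • WORST: the lowest VALUE recorded so far.
  • THRESH: the lowest value that WORST may reach before the disk is reported as FAILED. In other words, once WORST drops to THRESH or below, the attribute counts as failed.
  • TYPE: the attribute type (Pre-fail or Old_age). A Pre-fail attribute is a critical one that takes part in the disk's overall SMART health assessment (PASSED/FAILED). If any Pre-fail attribute fails, the disk should be regarded as about to fail. An Old_age attribute is a non-critical one (normal wear, for example) that does not by itself mean the disk is failing.
  • UPDATED: how often the attribute is updated. Offline means the attribute is updated when offline tests are run on the disk.
  • WHEN_FAILED: set to "FAILING_NOW" if VALUE is less than or equal to THRESH, to "In_the_past" if WORST is less than or equal to THRESH, and to "-" otherwise. With "FAILING_NOW", back up important files as soon as possible, especially if the attribute is of type Pre-fail. "In_the_past" means the attribute failed at some point but was fine when the test ran. "-" means the attribute has never failed.
  • RAW_VALUE: the raw value defined by the manufacturer, from which VALUE is derived.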
[root@node-15 ~]# smartctl -A -d megaraid,6 /dev/sdf  
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-3.10.0-327.20.1.es2.el7.x86_64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x010f   081   038   044    Pre-fail  Always   In_the_past 151546765
  3 Spin_Up_Time            0x0103   094   094   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       21
  5 Reallocated_Sector_Ct   0x0133   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   085   060   030    Pre-fail  Always       -       338813105
  9 Power_On_Hours          0x0032   079   079   000    Old_age   Always       -       18784
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       21
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       1710
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   069   053   045    Old_age   Always       -       31 (Min/Max 24/40)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       19
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       852
194 Temperature_Celsius     0x0022   031   047   000    Old_age   Always       -       31 (0 14 0 0 0)
195 Hardware_ECC_Recovered  0x001a   117   099   000    Old_age   Always       -       151546765
197 Current_Pending_Sector  0x0012   084   084   000    Old_age   Always       -       688
198 Offline_Uncorrectable   0x0010   084   084   000    Old_age   Offline      -       688
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       8093 (164 214 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       1870535293
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       1530387871
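The WHEN_FAILED rules described in the field list can be checked against the table by recomputing the column from VALUE, WORST, and THRESH. The Python sketch below does that for individual rows. It assumes the ten-column layout shown above and implements only the rules as stated in this article, not smartctl's internal logic. The function names are made up for this example.

```python
def when_failed(value, worst, thresh):
    """Recompute the WHEN_FAILED column from the rules in the field list."""
    if value <= thresh:
        return "FAILING_NOW"
    if worst <= thresh:
        return "In_the_past"
    return "-"

def parse_attribute_row(line):
    """Split one row of `smartctl -A` output into named fields."""
    fields = line.strip().split(None, 9)  # RAW_VALUE may itself contain spaces
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "value": int(fields[3]),
        "worst": int(fields[4]),
        "thresh": int(fields[5]),
        "type": fields[6],
        "when_failed": fields[8],
        "raw": fields[9],
    }

ROW_1 = "  1 Raw_Read_Error_Rate     0x010f   081   038   044    Pre-fail  Always   In_the_past 151546765"
ROW_190 = "190 Airflow_Temperature_Cel 0x0022   069   053   045    Old_age   Always       -       31 (Min/Max 24/40)"

for row in (parse_attribute_row(ROW_1), parse_attribute_row(ROW_190)):
    print(row["name"], when_failed(row["value"], row["worst"], row["thresh"]))
```

For Raw_Read_Error_Rate the current VALUE (81) is above THRESH (44) but WORST (38) is not, which is why the table reports In_the_past rather than FAILING_NOW.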
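View the disk's health assessment

Note:

The result below is PASSED, meaning the disk is still usable. However, one marginal attribute is listed with WORST < THRESH, TYPE Pre-fail, and WHEN_FAILED In_the_past, which suggests that this disk is likely to fail soon.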

[root@node-15 ~]# smartctl -H -d megaraid,6 /dev/sdf  
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-3.10.0-327.20.1.es2.el7.x86_64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Status not supported: ATA return descriptor not supported by controller firmware
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.
Please note the following marginal Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x010f   081   038   044    Pre-fail  Always   In_the_past 151546765
View the disk's error log
[root@node-15 ~]# smartctl -l error -d megaraid,6 /dev/sdf
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-3.10.0-327.20.1.es2.el7.x86_64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Error Log Version: 1
ATA Error Count: 46431 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 46431 occurred at disk power-on lifetime: 18640 hours (776 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  42 00 00 ff ff ff 4f 00  46d+15:15:32.968  READ VERIFY SECTOR(S) EXT
  42 00 00 ff ff ff 4f 00  46d+15:15:29.901  READ VERIFY SECTOR(S) EXT
  42 00 00 ff ff ff 4f 00  46d+15:15:26.825  READ VERIFY SECTOR(S) EXT
  42 00 00 ff ff ff 4f 00  46d+15:15:23.965  READ VERIFY SECTOR(S) EXT
  42 00 00 ff ff ff 4f 00  46d+15:15:20.905  READ VERIFY SECTOR(S) EXT

Error 46430 occurred at disk power-on lifetime: 18640 hours (776 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  42 00 00 ff ff ff 4f 00  46d+15:15:29.901  READ VERIFY SECTOR(S) EXT
  42 00 00 ff ff ff 4f 00  46d+15:15:26.825  READ VERIFY SECTOR(S) EXT
  42 00 00 ff ff ff 4f 00  46d+15:15:23.965  READ VERIFY SECTOR(S) EXT
  42 00 00 ff ff ff 4f 00  46d+15:15:20.905  READ VERIFY SECTOR(S) EXT
  42 00 00 ff ff ff 4f 00  46d+15:15:18.093  READ VERIFY SECTOR(S) EXT

Error 46429 occurred at disk power-on lifetime: 18640 hours (776 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  42 00 00 ff ff ff 4f 00  46d+15:15:26.825  READ VERIFY SECTOR(S) EXT
  42 00 00 ff ff ff 4f 00  46d+15:15:23.965  READ VERIFY SECTOR(S) EXT
  42 00 00 ff ff ff 4f 00  46d+15:15:20.905  READ VERIFY SECTOR(S) EXT
  42 00 00 ff ff ff 4f 00  46d+15:15:18.093  READ VERIFY SECTOR(S) EXT
  b0 da 00 00 4f c2 00 00  46d+15:15:17.838  SMART RETURN STATUS

Error 46428 occurred at disk power-on lifetime: 18640 hours (776 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  42 00 00 ff ff ff 4f 00  46d+15:15:23.965  READ VERIFY SECTOR(S) EXT
  42 00 00 ff ff ff 4f 00  46d+15:15:20.905  READ VERIFY SECTOR(S) EXT
  42 00 00 ff ff ff 4f 00  46d+15:15:18.093  READ VERIFY SECTOR(S) EXT
  b0 da 00 00 4f c2 00 00  46d+15:15:17.838  SMART RETURN STATUS
  2f 00 01 e0 00 00 40 00  46d+15:15:17.703  READ LOG EXT

Error 46427 occurred at disk power-on lifetime: 18640 hours (776 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  42 00 00 ff ff ff 4f 00  46d+15:15:20.905  READ VERIFY SECTOR(S) EXT
  42 00 00 ff ff ff 4f 00  46d+15:15:18.093  READ VERIFY SECTOR(S) EXT
  b0 da 00 00 4f c2 00 00  46d+15:15:17.838  SMART RETURN STATUS
  2f 00 01 e0 00 00 40 00  46d+15:15:17.703  READ LOG EXT
  42 00 00 ff ff ff 4f 00  46d+15:15:15.276  READ VERIFY SECTOR(S) EXT
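A long error log is easier to read as a summary. This Python sketch extracts the total error count, the logged error numbers with their power-on hours, and the distinct error kinds and LBAs. It assumes the line formats shown in the output above and uses a shortened excerpt as sample input. `summarize_error_log` is a hypothetical helper name.

```python
import re

def summarize_error_log(text):
    """Summarize `smartctl -l error` output: total count, logged errors, LBAs."""
    total = re.search(r"ATA Error Count:\s*(\d+)", text)
    errors = re.findall(r"Error (\d+) occurred at disk power-on lifetime: (\d+) hours", text)
    lbas = re.findall(r"Error: (\w+) at LBA = (0x[0-9a-fA-F]+)", text)
    return {
        "total_errors": int(total.group(1)) if total else None,
        "logged": [(int(number), int(hours)) for number, hours in errors],
        # Distinct (error kind, LBA) pairs, with the LBA converted from hex.
        "lbas": sorted({(kind, int(lba, 16)) for kind, lba in lbas}),
    }

SAMPLE = """\
ATA Error Count: 46431 (device log contains only the most recent five errors)
Error 46431 occurred at disk power-on lifetime: 18640 hours (776 days + 16 hours)
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
Error 46430 occurred at disk power-on lifetime: 18640 hours (776 days + 16 hours)
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
"""

print(summarize_error_log(SAMPLE))
```

In the log above, all five recorded errors are UNC errors reported at the same LBA within the same power-on hour, and the total of 46431 matches the Media Error Count reported by perccli for this drive.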
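Additional notes
  • If SMART is not enabled on the disk, it can be turned on with smartctl -s on followed by the device.
  • In general, if smartctl -i returns little information even though SMART support is reported as available and enabled, a self-test (-t short or -t long) may be needed before the data can be read. These tests do not destroy the data on the disk, but the offline test is generally not used on storage systems.
  • When collecting data, the -x or -a option gives more complete disk information.
  • smartctl can also be set up as a service, configured in /etc/smartmontools/smartd.conf. The author has not looked into this yet and will update the article when there are findings.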
