Talk:Self-Monitoring, Analysis and Reporting Technology
This is the talk page for discussing improvements to the Self-Monitoring, Analysis and Reporting Technology article. This is not a forum for general discussion of the article's subject.
WikiProject Computing: Start‑class, Low‑importance
The contents of the Threshold Exceeds Condition page were merged into Self-Monitoring, Analysis and Reporting Technology on January 24, 2007. For the contribution history and old versions of the redirected page, please see its history; for the discussion at that location, see its talk page.
Unnamed section
This article should include a warning about the obnoxious "SMART HDD" trojan. Because the trojan borrows a legitimate name, this article could lend it credibility. — Preceding unsigned comment added by 195.192.23.68 (talk) 14:53, 19 April 2012 (UTC)
Reallocated Sectors Count - mess
Whoever wrote this has mixed in a bunch of out-of-context explanations which together are nonsensical or contradictory. For instance, the drive itself has no concept of partitions. Or "failure of boot sector" - obviously this makes no difference after the sector is reallocated - but antique info from the days of floppy disks has crept into the explanation... —Preceding unsigned comment added by 203.45.103.88 (talk) 23:16, 9 February 2011 (UTC)
SMART Attributes List
Some descriptions of the SMART attributes are clearly incorrect. "Load" refers to the operation of moving the heads from parked to unparked, and to the number of times this happens, not to when the drive is seeking. GMR head amplitude refers to the signal from the read head, not to any movement. The "Read Channel Margin" description is content-free.
AAM and APM should be listed as follows:
AAM = Automatic Acoustic Management, APM = Advanced Power Management
Spin Retry Count
This description does not appear to correspond to actual data values in recent Western Digital and Seagate drives. Seagate posts 100-100-97-0-OK, which HDTune marks with a yellow (warning) bar, and WD posts 100-100-51-0-OK, which HDTune leaves unmarked. (Values correspond to Current-Worst-Threshold-Data-Status.) With respect to these two manufacturers, the description makes no sense. —The preceding unsigned comment was added by FUBARinSFO (talk • contribs) 00:39, 2 May 2007 (UTC).
Reallocated sectors
Please make it easier for me to **use** this information. Please enhance the Table entries.
Could we get practical and say: this is a down-counter, and when it reaches zero there is no way to deal with additional sectors whose read errors are too severe to be fixed with error-correcting codes.
In the discussion above the table, tell me: if there is no more space to absorb a sector needing reallocation, does my drive now pass errors up to the operating system's file system, which reports read errors and/or other file unavailability?
In the table, you can make room by erasing: "the more sectors that are reallocated, the more read/write speed will decrease". This is true. However, it hardly matters to a user who is suffering data loss, possible data corruption, and potential boot failure, which blocks access to everything.
My SMARTCTRL under WinXP or Knoppix shows
Reallocated_Sector_Ct
VALUE 1
WORST 1
THRESHOLD 63 (Please confirm that a number less than 63 is bad -- but which number?)
This "pre-failure" category of parameter is UPDATED "always".
WHEN_FAILED is "FAILING_NOW". How can I tell? Because VALUE 1 is less than THRESHOLD 63?
The "RA" column (I do not know what this is. Do you? RAW counts?) is 12. 12 is not 1 and 12 is not 63. I wonder what 12 is.
This page has not yet evolved into a practical guide and it still lacks an accessible exposition of the topic's salient points. Nevertheless, and IMHO, it is already far ahead of most pages and posts on the Internet. So let's not stop now! Jerry-va 01:27, 29 May 2006 (UTC)jerry-va
- The final column is RAW_VALUE, and 12 for the Reallocated_Sector_Ct attribute means that 12 bad sectors have been remapped. You should replace this disk soon, as that number will only rise, and the higher it gets, the more data you're going to lose. --Error28 12:03, 4 September 2006 (UTC)
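The rule applied in the readout above can be sketched in a few lines of Python. This is an assumption drawn from this thread, not an official specification: a "pre-fail" attribute is considered failing when its normalized VALUE has dropped to or below the manufacturer's THRESH; the attribute data shown is Jerry-va's example from above.

```python
# Sketch of the standard S.M.A.R.T. pre-fail rule as described in this
# thread (an assumption, not a spec): normalized values are scaled so
# that higher is healthier, and an attribute is flagged FAILING_NOW
# once its value reaches or drops below the vendor threshold.

def is_failing(value, threshold):
    """True when the normalized value is at or below the threshold."""
    return value <= threshold

# Jerry-va's readout from the discussion above:
realloc = {"name": "Reallocated_Sector_Ct",
           "value": 1, "worst": 1, "threshold": 63, "raw": 12}

print(is_failing(realloc["value"], realloc["threshold"]))  # → True (FAILING_NOW)
```

So VALUE 1 against THRESHOLD 63 is indeed why the tool reports FAILING_NOW; the raw value 12 (sectors actually remapped) is a separate, vendor-defined number.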
It comes down to money. If you have reallocated sectors you should replace your disk. In my experience you can go much longer with reallocated sectors on desktop drives; once reallocated sectors show up in laptops they increase fast. Josiah 20:02, 12 October 2007 (UTC)
"This is why, on modern hard disks, "bad blocks" cannot be found while testing the surface — all bad blocks are hidden in reallocated sectors. " I don't think that's quite accurate, based on my understanding. When you write to a bad sector, sure, it gets silently reallocated. If you read the sector and the data is bad but corrected by ECC, the drive should correct it and copy it to a reallocated sector. But if the data is uncorrectable, an error must be returned, since it would be unacceptable to return bogus data. Moreover, it must continue to give an error on a future read, until the sector is rewritten. So you can sometimes find bad sectors by reading the entire disk. "Testing the surface" is confusingly vague. 76.254.84.64 07:30, 31 October 2007 (UTC)
I find this table entry confusing as well -- It says in the table "A decrease in the attribute value indicates bad sectors", *but* the 'Better' column indicates that a decrease in this number is a *good* thing?? It seems like the arrow should be changed for this table entry. If this is really an indicator of the number of sectors potentially available for reallocation (in the event of a bad sector being detected), then it would make sense that a higher number is better, a decrease is bad. When using applications such as HD Tune, under the 'Health' tab it tells me that my particular drive has 100 as the current count for this field, and that 36 is the threshold. It does not seem to see any problem with this and it is telling me that it is OK -- so it seems to coincide... ChrisTracy (talk) 18:09, 15 May 2008 (UTC)
Temperature sensor
the section on temperature and temperature sensors is opinionated and somewhat incorrect / not up to date. all the hard drives from 1998+ include a temperature sensor. the reason: all modern hard drives use GMR (Giant Magneto Resistive) heads, which require very accurate temperature measurement to be able to read the data back (the difference between a 0 and a 1 readback is about the same order of magnitude as a 0.1 degrees Celsius change in the GMR head).
also, the temperature failure mode is not necessarily cumulative.
- my samsung 1999 drive (8 gb) may or may not have a T sensor, in any case it does not report about it in SMART. --145.253.2.236 (talk) 12:57, 12 July 2008 (UTC)
Curious sentence
SMART is a system used to kill the drive when the warranty is up —Clarknova 03:35, 28 February 2006 (UTC)
Removed the following curious sentence from "working modality".
- Manufacturing companies which claim to support S.M.A.R.T. but withhold specific sensor information on individual products include Seagate, [...]
[...], indeed! What the frip. - 194.89.3.244 17:56, 28 February 2006 (UTC)
- But it's true. They specifically withhold the information. Do the research.
Read Error Rate description incorrect
Elsewhere I have read that a high value for Read Error Rate is good, and the attribute value decreases as read error rate increase.
- That is simply incorrect. It's saying that the more errors the better. That's nonsense. —Preceding unsigned comment added by 90.5.11.225 (talk) 22:42, 1 November 2008 (UTC)
Consistent with this, the two SMART monitoring tools I have used alert the user when the Read Error Rate attribute value falls below a threshold.
This description deserves accuracy and careful explanation perhaps more than any other, since this attribute is so critical.
-- I think it means 'time between read errors'; the smaller the number, the higher the rate, but whether it's seconds, hours, or fortnights, I couldn't begin to guess.
- Perhaps a more logical definition would be 'no. of successful reads between errors'? --217.173.195.210 09:23, 14 August 2007 (UTC)
I agree that the description is incorrect. The higher the value the better. This isn't a "rate" per se - it should be regarded more as a score. All values are a max of 255 - most manufacturers see this as a 'percent good' - a value out of 100. —Preceding unsigned comment added by 64.7.157.226 (talk) 21:13, 14 October 2009 (UTC)
For quite a few manufacturers, this attribute is two 32-bit counters concatenated into a 64-bit value: one half is the number of read operations, the other the number of read errors. This means that something like "1 209 029 668", despite being quite large, is a good sign, since it means that your drive did over a billion read operations without error. "5 503 996 964" is quite good too, since it means that out of over a billion read operations there was only one error. 8720 errors out of over a billion reads could look like "37 453 323 850 788" and doesn't really mean much either: did they all occur within the last few weeks or months? Bad. Did they occur over the whole drive lifetime? Not the best, but no real reason to worry. On the other hand, "37 452 114 842 660" looks not far away, but means 8720 out of 21 540 reads had an error - really a reason to get rid of that drive. So in a sense everyone is right: higher can mean better, but only in the sense of "higher than some offset that encodes the actual number of errors". So unless you know how to decode that value for your specific drive, don't bother trying to make sense of the raw number at all (who knows, it could even be big-endian or signed).
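The split described above can be checked against the example numbers. This decoding (errors in the upper 32 bits, operations in the lower 32 bits) is the commenter's hypothesis for some vendors, not a documented format:

```python
# Sketch of the "two concatenated 32-bit counters" hypothesis above
# (vendor-specific, not a documented standard): the upper half of the
# 64-bit raw value is the error count, the lower half the read count.

def split_raw(raw):
    errors = raw >> 32              # upper 32 bits: read errors
    operations = raw & 0xFFFFFFFF   # lower 32 bits: read operations
    return errors, operations

for raw in (1_209_029_668, 5_503_996_964,
            37_453_323_850_788, 37_452_114_842_660):
    errors, ops = split_raw(raw)
    print(f"raw={raw}: {errors} errors in {ops} reads")
```

Running this reproduces the figures in the comment: 0 errors, then 1 error, then 8720 errors in ~1.2 billion reads, and finally 8720 errors in only 21 540 reads for the last value.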
Frustration with SMART
I'd like it if the table spelled out what the "good" and "bad" values of the attributes are.
- The general rule is that higher is better than lower, except in the case of temperature. The specific thresholds of "OK" and "failing" are up to the manufacturers to specify. Most of the numbers involved are arbitrary and defined separately by the manufacturers. GreenReaper 16:07, 24 August 2006 (UTC)
- This is definitely not true. Load cycles and such? The higher the better? If that's the case then a drive with a load count of 600,000+ that just failed for good is the healthiest drive you can have. —Preceding unsigned comment added by 90.5.11.225 (talk) 22:44, 1 November 2008 (UTC)
More frustration...
I also feel that nobody tells you some useful (average, maybe IQ 100, no IT degree) human-readable information about your harddisk. Something like "your harddisk /dev/hda is 2.8 years old; the probability that it will survive this month is 98% (suggested replacement value: 96%; suggested backup value: 99%)". But I'm confused by 1000 different values. How bad are they really? Where should the values be (see comment above)? It does not really help to make a business decision to replace or not to replace the drive. Can someone please shed some light into this? Can smartmontools developers please think of the CTO's business decision of replacing or not replacing a disk? And some useful information for the home user. THANKS -- Michael Janich 09:15, 31 July 2006 (UTC)
- Most hard disks fail within the first two years; if a disk doesn't fail within those years, it is a good idea to keep it for another 3 years.
If you plot the failure rate of hard disks, it starts off very high, reaches its lowest point around two years, and then slowly climbs back up to the rate at which it started. Hope that helps Hqduong 08:10, 5 December 2006 (UTC)
- I question that. You are saying that most HDDs fail within two years? That's scary. What's worse: it's simply nonsense. Perhaps those that fail will most often fail within the first two years; but that is definitely not what you wrote.
- Google published a study on hard disks that claimed (based on memory, not citation) that 1) only half the disks that failed had something significant in their SMART readouts, and 2) only half the disks with something significant in their SMART readings actually failed. So after all the hoopla, it may not be that useful after all. --Alvestrand 07:39, 19 March 2007 (UTC)
- Disraeli had something insightful to say about this.
- Hard Disk Sentinel software can display information in an understandable way. It gives a textual description about the hard disks, displays the real number of problems found so you can have some ideas about the real status instead of displaying just some numbers/values. Because thresholds + value pairs and T.E.C. dates are not really able to predict hard disk failures, this software uses a completely different method to detect and display real hard disk problems found on IDE/SATA/USB/SCSI hard disks. Works under Windows, DOS, Linux. —Preceding unsigned comment added by 87.229.50.242 (talk) 08:12, 4 June 2008 (UTC)
SMART and RAID
Any idea if SMART can still be used on HDD's included in a RAID array? --Evolve2k 05:06, 7 January 2007 (UTC)
- I have seen some motherboards with hardware RAID support/PCI RAID expansion cards that have a BIOS/firmware capable of retrieving and displaying SMART data. No idea if there's anything out there that lets you do this in software though. SMART is a very mysterious technology IMO. --86.138.51.21 08:20, 26 January 2007 (UTC)
- I'm building a RAID array with four Seagate ST3320620AS (7200.10 320GB) drives in it. Once I get the second pair of drives I'll let you guys know. Using NVIDIA MediaShield on a P5N32-E SLI Plus. I can also confirm that BE is definitely a temperature sensor on that drive, btw. 66.146.62.42 22:23, 10 May 2007 (UTC)
- Under Linux, with software RAID, the individual drives are still accessible, so their SMART data can be retrieved. Jrvz (talk) 00:00, 8 July 2008 (UTC)
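The software-RAID point above can be sketched with a couple of commands. These are illustrative command fragments only: the device names are placeholders, and the hardware pass-through syntax varies by controller and smartmontools version.

```shell
# Linux md software RAID keeps member disks visible as block devices,
# so smartctl can query each member directly (device names are examples):
cat /proc/mdstat            # list arrays and their member devices
smartctl -a /dev/sda        # query a member disk, not /dev/md0

# Behind some hardware RAID controllers, smartctl can pass commands
# through to individual disks with a controller-specific -d option, e.g.:
smartctl -a -d megaraid,0 /dev/sda
```

Whether the pass-through works depends entirely on the controller driver; for unsupported controllers the SMART data of individual members simply isn't reachable from the host.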
Merging in Threshold Exceeds Condition
Since the mergeto tag of the Threshold Exceeds Condition article says to discuss the subject here:
- Merge. My opinion. --Alvestrand 22:16, 13 January 2007 (UTC)
Background
According to the cited google study, SMART can predict about 40-60% of all drive failures, depending on the monitored attributes. The stated 30% taken from some FAQ might be too pessimistic here.
—The preceding unsigned comment was added by Michi cc (talk • contribs) 17:38, 21 April 2007 (UTC).
Attribute list is confusing
Some of the arrows in the attribute list don't appear to be correct. "Power On Hours" is marked with an up arrow--I would think that a *lower* number of operating hours would be considered better, not a higher one. Same thing with calibration retries. It's also not clear in many of the descriptions whether the values being referred to are the raw values, normalized values, worst values, threshold values, or something else, making the table even more unintelligible to someone unfamiliar to SMART. All of this should be made much more clear. ::Travis Evans 11:39, 16 June 2007 (UTC)
- As of today, these issues now appear to be largely improved. ::Travis Evans (talk) 14:38, 16 December 2007 (UTC)
Set Load/Unload Cycle Count with a down arrow - when the head unloads/reloads it creates wear on the servo, and the read/write head has a possibility of failure to load if it isn't loaded or unloaded completely —Preceding unsigned comment added by 64.228.219.208 (talk) 03:59, 11 October 2007 (UTC)
Contradictory statement about higher vs lower
Note that the attribute values are always mapped to the range of 1 to 253 in a way that means higher values are better.
This is then followed by a chart which describes whether it is better to have lower or higher values, seeming to contradict the above sentence. Can someone please clarify or correct? Ham Pastrami (talk) 19:20, 24 November 2007 (UTC)
- I believe that the reason for this apparent conflict is that the chart refers to the “raw” attribute values rather than the “normalized” ones. For normalized values, the statement in the article that higher numbers are “always” better is almost correct (I'll explain why I say “almost” in a moment), but the raw values can follow any rule that the drive manufacturer wants.
- The biggest problem with the article, I think, is that it doesn't explain clearly enough that there are actually several different values involved for each attribute. The chart is totally unclear about it. The chart is also problematic because some of the attributes the chart describes appear to function in a totally different (even the exact opposite) manner on certain drives.
- The statement “...The attribute values are always mapped ...in a way...that higher values are better” also isn't true in the strictest sense, because I know of some drives (such as mine) which actually indicate the normalized temperature value directly in Celsius (e.g., a value of 40 means 40°C), which means that for this attribute, higher values are actually worse. This is likely a rare exception, though.
- I may attempt to greatly clarify the article myself some time if I get a chance, but if anyone else wants to do it right now instead, feel free to go ahead and do so. ::Travis Evans (talk) 21:58, 5 December 2007 (UTC)
- Okay, I just edited the Attributes section in the hope that it will now make much more sense. ::Travis Evans (talk) 14:35, 16 December 2007 (UTC)
More info on Selftest specifications please
A good article. You can get SMART drives to initiate either a short or long self-test (managed by the drive itself). But what exactly does the SMART specification require a drive to do during these tests? Robin April 2008
Critical Attributes
The study at Google described in the Background section found four parameters strongly correlated with drive failure. The later table describes the SMART attributes, but even after reading the paper I don't see which attributes correspond to those four critical parameters. Here are the possibilities I see:
name in paper | SMART attribute # | SMART attribute name
---|---|---
scan errors | 1 | Read Error Rate
 | 187 | Reported Uncorrectable Errors
 | 201 | Soft Read Error Rate
 | 250 | Read Error Retry Rate
reallocations | 5 | Reallocated Sectors Count
offline reallocations | 198 | Offline Uncorrectable Sector Count
probational counts | 197 | Current Pending Sector Count
(I note that smartctl calls attribute 198 "Offline_Uncorrectable"[1][2].)
The discussion of the study and/or the table entries should be revised so the correspondence is clear.
Also, the table has six attributes highlighted as "critical". What's the justification for those six, as opposed to the four parameters noted in the Google study? Jrvz (talk) 14:08, 8 July 2008 (UTC)
- Um, since your Q is what the Google authors used, you should probably mail them. Unfortunately, the attributes don't even have the same meaning across disks, e.g. Seagate disks report a large value for raw read errors. Scan errors might be Attribute 7, Seek errors (this is probably related to sector/track not found problems and if nonzero, basically means the disk is mechanically dying); "Read Error Rate" is probably the number of problems found reading a sector, but in most harddisks, this should be named Serious Read Error, and it's pretty cumulative, not a rate. Minor read errors are normal and auto-corrected, some disks tell you their number, like my Atlas reports 80k read errors every boot to smartctl. Attributes 201 and 250 aren't even remotely standard. -- "Offline" reallocations.. well, it's in the smartctl man page you quoted, it's probably a sector that is dead but has not yet been remapped. Should be #198, yes. -- "Probational Counts", I am really guessing here, but it sounds either like the minor read errors or how often a sector was classified as "not so good". The number of write problems are counted in High_Fly_Writes and in Multi_Zone_Error_Rate, IIRC (don't ask me what multi zones got to do with that, and no, it's not a rate either I think). --88.74.187.45 (talk) 10:14, 24 January 2009 (UTC)
---
What is attribute 188?
smartd[2647]: Device: /dev/sda, SMART Usage Attribute: 188 Unknown_Attribute changed from 96 to 100 —Preceding unsigned comment added by 76.119.201.189 (talk) 03:44, 10 July 2008 (UTC)
The "scan error" is not a simple "read" error but a "reallocation sector" error, because they talk of "first reallocation". Furthermore, its raw value is normally zero. I think it is #187 (Reported Uncorrectable Errors). According to the Google report, the scan error count is the most critical value! Why don't you highlight it in the reference table? --93.148.74.16 (talk) 09:30, 16 February 2011 (UTC)
Isn't "007 Seek_error_rate" critical?
The description seems to indicate it is, but it's not highlighted. (Love the highlighting btw, I wish smartmontools did it too). --82.134.28.194 (talk) 08:24, 30 July 2008 (UTC)
Panterasoft HDD Health not 100% freeware (non-commercial only)
The main page on panterasoft claims HDD Health is freeware, but if you install it and open the help->about it explains it's only free for non-commercial use. There's a help option that takes you to:
http://www.panterasoft.com/orderlnk.php?no=25
Which redirects to a web store selling commercial-use licenses for $29.95
Feel free to verify. —Preceding unsigned comment added by 71.248.110.143 (talk) 14:58, 1 October 2008 (UTC)
Split: Comparison of S.M.A.R.T. tools
This section is getting fairly big and would be easier to maintain away from the main article. If nobody disagrees I will go ahead with it. --Hm2k (talk) 00:17, 30 October 2008 (UTC)
- This has now taken place. Comparison of S.M.A.R.T. tools --Hm2k (talk) 16:22, 30 October 2008 (UTC)
Requested move
- The following is a closed discussion of the proposal. Please do not modify it. Subsequent comments should be made in a new section on the talk page. No further edits should be made to this section.
The result of the proposal was Move Parsecboy (talk) 14:57, 7 November 2008 (UTC)
As per WP:NC, "Titles should be brief without being ambiguous". This technology is almost always referred to as S.M.A.R.T. and almost never as "Self-Monitoring, Analysis, and Reporting Technology". This move would lead to a shorter, more manageable title and less confusion for readers. --Hm2k (talk) 15:19, 30 October 2008 (UTC)
- Support: the expanded form is not known to users. 70.55.86.100 (talk) 08:33, 5 November 2008 (UTC)
- The above discussion is preserved as an archive of the proposal. Please do not modify it. Subsequent comments should be made in a new section on this talk page. No further edits should be made to this section.
Start_stop attribute
My Samsung Spinpoint drive (from 1999 or 2000) counts starts AND stops in this attribute. start +=1, stop +=1, simply losing power += 0. This means this attribute is basically useless to tell you something about the usage history of the drive. A SV with Powercycle = 1000 and Start_stop = 1902 (raw) could have been subjected to a lot of spinup and spindowns (maybe from powersaving), or it could simply have had a careful owner who always parked it with the "poweroff" command. I wonder how other harddisks handle this. --92.78.30.160 (talk) 19:56, 19 January 2009 (UTC)
SMART predicts 64% of failures?
The reference is this page, but where is that data from? Is it reputable? Also that page says 30%. Family Guy Guy (talk) 15:49, 30 March 2009 (UTC)
197 C5 Current Pending Sector Count
197 C5 Current Pending Sector Count
- Number of "unstable" sectors (waiting to be remapped). If the unstable sector is subsequently written or read successfully, this value is decreased and the sector is not remapped. Read errors on the sector will not remap the sector, it will only be remapped on a failed write attempt. This can be problematic to test because cached writes will not remap the sector, only direct I/O writes to the disk.
I am going to take out that last sentence, because I think it is wrong, and this is now getting quoted from this article, all over the internet.
I think the sentence is half-right. Only direct I/O writes let you know what happens. With a cached write, I am guessing that it will still potentially try to reallocate a sector on the "waiting" list, but that will happen after the cached write is initiated. If the remapping fails, the write will eventually fail, and the computer will get an error message -- but in some cases, the computer will have already assumed the write was OK.
In other words, with direct I/O you know immediately if there is a problem (as soon as you get a normal "completed" signal). With a cached write, you can't know if the proper remapping happened until after a delay, or after a "sync" or flush of the write cache.
I also question the use of the word "failed" in the previous sentence: "it will only be remapped on a failed write attempt". I question whether all drives would actually try to get a good write to a sector on the "pending" list, or just assume it is bad, and try to reallocate.
I'm just making all this up, based on a general understanding of computers. Someone who knows, and can find a good reference, should make the article more complete and accurate. -96.233.30.237 (talk) 23:02, 9 July 2009 (UTC)
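A hedged illustration of the direct-I/O point being debated above: on Linux, dd with oflag=direct bypasses the page cache, so a write to a pending sector must be serviced (and any remap attempted) before dd reports success. Everything here is a placeholder sketch: /dev/sdX and the seek offset must be replaced with real values, and writing to the wrong device or offset destroys data.

```shell
# CAUTION: placeholder device and offset -- this overwrites data.
# Overwrite one 512-byte sector with direct (uncached) I/O so the drive
# must attempt the write, and any remap of a pending sector, immediately:
dd if=/dev/zero of=/dev/sdX bs=512 count=1 seek=12345 oflag=direct conv=notrunc
sync
# Re-check whether the pending count dropped (or the reallocated count rose):
smartctl -A /dev/sdX | grep -i -e pending -e reallocated
```

With a cached write, by contrast, dd can return success before the drive has even attempted the physical write, which is exactly the ambiguity the comment above describes.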
Seagate's Raw Seek Error Rate attribute
I believe that Seagate's raw Seek Error Rate attribute stores the number of seek errors in the uppermost 16 bits, and the number of seeks in the lower 32 bits.
A drive begins life with a cooked value of 253 until it accumulates enough seeks for the data to be statistically significant, after which the cooked value starts off at 60. The cooked value then increases or decreases as errors appear.
The normalised attribute appears to follow a logarithmic pattern:
90% = < 1 error per 1000 million seeks
80% = < 1 error per 100 million
70% = < 1 error per 10 million
60% = < 1 error per million
50% = 10 errors per million
40% = 100 errors per million
30% = 1000 errors per million
I don't have any official confirmation for the above information. It is the result of my analyses of numerous SMART reports.
I have performed several tests in support of my hypothesis. These are described in Google's Usenet archives. I don't know whether they can be considered for inclusion in this article, possibly as references. They are certainly not authoritative, but I believe they will withstand scrutiny.
Edit: http://www.users.on.net/~fzabkar/HDD/Seagate_SER_RRER_HEC.html — Preceding unsigned comment added by 121.44.55.232 (talk) 06:54, 1 September 2012 (UTC)
121.44.138.74 (talk) 07:18, 21 August 2009 (UTC)
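The hypothesis above can be written out as a short sketch. To be clear about what is assumed: the bit layout (errors in the upper 16 bits of a 48-bit field, seeks in the lower 32 bits) and the logarithmic mapping are the poster's unofficial observations, and the raw value below is invented for illustration.

```python
import math

# Sketch of the unofficial hypothesis above for Seagate's raw Seek
# Error Rate: errors in the upper 16 bits of the 48-bit raw value,
# seeks in the lower 32 bits, with the cooked value following roughly
# 60 - 10*log10(errors per million seeks).

def decode_ser(raw):
    errors = raw >> 32            # upper 16 bits of the 48-bit field
    seeks = raw & 0xFFFFFFFF      # lower 32 bits
    return errors, seeks

def normalized_ser(errors, seeks):
    per_million = errors / seeks * 1_000_000
    return 60 - 10 * math.log10(per_million)

# Hypothetical raw value: 30 seek errors in 3 million seeks
errors, seeks = decode_ser((30 << 32) | 3_000_000)
print(f"{errors} errors in {seeks} seeks -> cooked ~{normalized_ser(errors, seeks):.0f}")
```

Checking the mapping against the table above: 10 errors per million gives 50, 100 per million gives 40, and 1 error per 1000 million seeks gives 90, so the formula reproduces every row listed.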
After watching my dying ST31000340AS Barracuda 7200.11 drive, I believe that the 6-byte field Seagate uses for the raw Seek Error Rate attribute stores the number of seek errors in the lower [0:23] bits, and the number of seeks in the uppermost [24:47] bits, both values big-endian. But I could not compute meaningful 'normalized' values similar to the previous poster's. My seek error number was really high; maybe it never gets reset?
184.99.101.172 (talk) 03:17, 15 June 2011 (UTC)
Raw Read Error Rate
This Seagate forum thread discusses the meaning and behaviour of Seagate's Raw Read Error Rate attribute:
http://forums.seagate.com/stx/board/message?board.id=ata_drives&message.id=8700
It may help dispel fears about the relatively high numbers displayed by Seagate SMART reports. The RRER numbers actually reflect sector counts, not error counts.
I have verified that my Fujitsu drive interprets this attribute in a similar manner, except that its numbers are much smaller.
Edit: http://www.users.on.net/~fzabkar/HDD/Seagate_SER_RRER_HEC.html
SandForce appears to use a similar scheme for their SMART error rates:
http://hddguardian.googlecode.com/svn/docs/Kingston%20SMART%20attributes%20details.pdf
—Preceding unsigned comment added by 121.44.138.74 (talk) 07:52, 21 August 2009 (UTC) It's amazing how far people will take spite against another manufacturer. Just so you folks know, the only other drives competing with Seagates are VelociRaptors, which are faster, but the difference is negligible. Extremely reliable drives with a really low failure rate, statistically irrefutable. —Preceding unsigned comment added by 24.17.18.249 (talk) 11:45, 20 April 2010 (UTC)
Seek Error Rate - thermal widening
The article refers to seek errors being the result of "thermal widening" of the platters. My understanding is that this was only an issue for stepper motor drives, or voice coil drives that had a separate servo surface. Modern drives use embedded servos, so the positioner will always be able to find a track, no matter how much the platter expands or contracts.
121.44.138.74 (talk) 08:08, 21 August 2009 (UTC)
Attribute 240
The product manual for Fujitsu MHY2xxxBH series drives identifies attribute 240 as "Transfer Error Rate":
"*If the device receives the reset during transferring the data, the transfer error is counted up."
http://www.msc-ge.com/download/itmain/datasheets/fujitsu/MHY2xxxBH.pdf 121.44.98.124 (talk) 09:19, 22 September 2009 (UTC)
- Thank you! Reliable information about SMART values is extremely hard to come by, this is helpful. -- intgr [talk] 02:50, 20 January 2010 (UTC)
Attributes 185 & 186 (WDC)
The following SMART attribute names were extracted from WDC's wdtler.exe. There are two attributes (185 & 186) that do not appear in the Wikipedia list.
WDTLER 1.03 Copyright (C) 2004-2006 Western Digital Corporation
Western Digital Time Limit Error Recovery Utility
http://zacuke.com/files/wdtler.ZIP
Raw Read Error Rate
Throughput Performance
Spin Up Time
Start/Stop Count
Re-allocated Sector Count
Read Channel Margin
Seek Error Rate
Seek Time Performance
Power-On Hours Count
Spin Retry Count
Drive Calibration Retry Count
Drive Power Cycle Count
Soft Read Error Rate
End to End Error Count
Head Stability - attribute 185 ?
Induced Op-Vibration Detection - attribute 186 ?
Reported Uncorrectable Errors
Command Time Out
High Fly Writes
Airflow Temperature
G-Sense Error Rate
Emergency Retract Count
Load/Unload Count
HDA Temperature
Hardware ECC Recovered
Relocation Event Count
Current Pending Sector Count
Offline Uncorrectable Sector Count
UltraDMA CRC Error Rate
Multi Zone Error Rate
Soft Read Error Rate
Data Address Mark Errors
Run Out Cancel
Soft ECC Correction
Thermal Asperity Rate
Flying Height
Spin High Current
Spin Buzz
Offline Seek Performance
Disk Shift
G-Sense Error Rate
Loaded hours
Load/Unload Retry Count
Load Friction
Load/Unload Cycle Count
Load-in Time
Torque Amplification Count
Power-Off Retract Count
GMR Head Amp
Temperature
Head Flying Hours
Read Error Retry Rate
121.44.85.40 (talk) 21:33, 2 October 2009 (UTC)
SMART Transmitters
I'd like to know whether SMART Transmitters used mainly in Oil and Gas plants can be categorized under this technology? I tried lately to search the internet for the meaning by SMART in such transmitters. These transmitters usually support the HART , FOUNDATION. ™ fieldbus, Modbus, and/or Profibus protocols.--Email4mobile (talk) 15:14, 18 October 2009 (UTC)
- SMART's protocols appear to be ATA & SCSI. However, the general idea is the same. Back in the 80s, I claimed no one would have a PC (that was silly): more useful was firmware to examine & control devices, and a workstation to analyze the data; or both firmware controls & processor embedded in a clothes iron, for example, to tell you when it is fully heated & warn you when it tips over; or an automobile, so mechanics could plug in a computer & analyze the problems. My medical thermometer beeps when it's fully heated. SMART gives us the 'raw' data from specific sensors on ATA & SCSI devices (another technology lets us adjust their speed): but where is the sophisticated analysis software? Also, I had no idea it worked on flashdrives; does it work on ATA optical drives? Geologist (talk) 21:14, 10 November 2009 (UTC)
More WDC SMART attributes
The following attributes were extracted from WD's wdidle3.exe utility, after unpacking it with UPX.
http://www.synology.com/support/faq_images/enu/wdidle3.zip
Raw Read Error Rate
Throughput Performance
Spin Up Time
Start/Stop Count
Re-allocated Sector Count
Read Channel Margin
Seek Error Rate
Seek Time Performance
Power-On Hours Count
Spin Retry Count
Drive Calibration Retry Count
Drive Power Cycle Count
Soft Read Error Rate
SATA Downshift Error Count
End to End Error Det/Corr Count
Head Stability
Induced Op-Vibration Detection
Reported Uncorrectable Errors
Command Time Out
High Fly Writes
Airflow Temperature
Shock Sense
Emergency Retract Cycle Count
Load/Unload Cycle Count
HDA Temperature
ECC on the Fly Count
Re-allocated Sector Event
Current Pending Sector Count
Offline Uncorrectable Sector Count
UltraDMA CRC Error Rate
Multi Zone Error Rate
Soft Read Error Rate
Data Address Mark Errors
Run Out Cancel
Soft ECC Correction
Thermal Asperity Rate
Flying Height
Spin High Current
Spin Buzz
Offline Seek Performance
Disk Shift
G-Sense Error Rate
Loaded hours
Load/Unload Retry Count
Load Friction
Load/Unload Cycle Count
Load-in Time
Torque Amplification Count
Power-Off Retract Count
GMR Head Amp
Temperature
Head Flying Hours
Total LBAs written
Total LBAs read
Read Error Retry Rate
Free Fall Sensor
121.44.79.8 (talk) 22:35, 22 October 2009 (UTC)
Implementations & Analysis
The above is a lot of information; but the value of each is just a raw datum. My Linux implementation doesn't log the rates of change in values, and it evaluates the implication of each individually: 'old', 'may fail soon'. Scientists would perform a cluster analysis of the above values, their rates, & their accelerations with time (using 'rcs', for example). Then they would examine the history of each cluster and label them 'dropped laptop', 'defective from factory', 'very old', 'bad RAM', &c.
Instead of 'may fail soon', I'm surprised we don't have both software & analyses by companies to draw upon, allowing much more informative messages. (What are companies doing with all the above data? They wouldn't collect it if they weren't using it.) I see no reason why companies haven't done this, and administrators of large servers & company LANs don't calculate the % chance of (or mean time to) failure described nicely above.
Also, do LAN administrators write scripts that collect these data during LAN backups? How do they decide when to replace a disk? Are there papers? Are such studies as the above proprietary?
This is a fine article, but these are some of the questions that people who look it up are probably seeking to answer. Geologist (talk) 20:51, 10 November 2009 (UTC)
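The rate-of-change analysis suggested above can be sketched in a few lines of Python. The attribute histories below are invented sample data, not readings from any real drive, and the rate limit is an arbitrary placeholder:

```python
# Sketch: flag a SMART attribute whose raw value is accelerating.
# The histories below are invented sample data (raw Reallocated_Sector_Ct
# readings taken at regular intervals), not from any real drive.

def first_differences(values):
    """Rates of change between successive samples."""
    return [b - a for a, b in zip(values, values[1:])]

def is_accelerating(history, rate_limit=2):
    """True if the most recent rate of change exceeds rate_limit."""
    rates = first_differences(history)
    return bool(rates) and rates[-1] > rate_limit

stable_drive = [0, 0, 0, 1, 1, 1]     # occasional remap: probably fine
failing_drive = [0, 1, 3, 8, 20, 45]  # remaps accelerating: worrying

print(is_accelerating(stable_drive))   # False
print(is_accelerating(failing_drive))  # True
```

This is only the first differences; the cluster analysis Geologist describes would also need second differences and labeled failure histories to train against.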
SSD & CompactFlash SMART attributes
http://www.hsgi.jp/documents/Delkin-Solid-State-SATA-Drive-Engineering-Specification.pdf
ID  Attribute
9   Power-On Hours
12  Power On Count
175 Program Fail Count (chip)
176 Erase Fail Count (chip)
177 Wear Leveling Count
178 Used Reserved Block Count (Chip)
179 Used Reserved Block Count (Total)
180 Unused Reserved Block Count (Total)
181 Program Fail Count (Total)
182 Erase Fail Count (Total)
183 Runtime Bad Block (Total)
187 Uncorrectable Error Count
195 ECC rate
198 Off-line Uncorrectable Error Count
199 CRC Error Count
http://www.arrowne.com/solid-state/pdf/STEC%20MACH4%20Datasheet.pdf
SLCFxGM4U(I)(-M) CompactFlash Card
ID | Name | Description | Type
1 | Raw Read Error | Count of raw data errors while reading data from media, including retry errors or uncorrectable data errors | Warranty
2 | Throughput Performance | Internally measured average and worst data transfer rate | Warranty
5 | Reallocated Sector Count | Count of reallocated blocks. In the case of the CF card, this is the count of reallocated or remapped blocks during normal operation from the grown defect table | Warranty
9 | Power On Hours | Number of hours elapsed in the power-on state | Advisory
12 | Power Cycle | Number of power-on events | Advisory
13 | Soft Read Error Rate | Number of corrected read errors reported to the operating system (SLC = 3 or more bits; MLC = 5 or more bits) | Advisory
100 | Erase/program cycles | Count of erase/program cycles for the entire card | Advisory
103 | Translation Table Rebuild | Power backup fault or internal error resulting in loss of system unit tables | Advisory
170 | Reserved Block Count | Number of reserved spares for bad block handling | Warranty
171 | Program Fail Count | Count of flash program failures | Advisory
172 | Erase Fail Count | Count of flash erase command failures | Advisory
173 | Wear Leveling Count | Worst-case erase count | Advisory
174 | Unexpected Power Loss | Counts the number of unexpected power loss events | Advisory
184 | End-to-end error detection | Tracks the number of end-to-end internal card data path errors that were detected | Warranty
187 | Reported Uncorrectable Errors | Number of uncorrectable errors reported at the interface | Advisory
188 | Command Timeout | Tracks the number of command timeouts, defined as an active command being interrupted | Advisory
194 | Temperature | Temperature of the base casting | Advisory
196 | Reallocation Event | Total number of remapping events during normal operation and offline surface scanning | Advisory
198 | Offline Surface Scan | Number of uncorrected errors that occurred during offline scan | Advisory
199 | UDMA CRC Error | Number of CRC errors during UDMA mode | Advisory
http://www.stec-inc.com/downloads/flash_datasheets/SLMPCIxGM4U_M_61000_05494.pdf
SLMPCIxGM4U-M mPCI-Express IDE Card
ID | Name | Description | Type
1 | Raw Read Error | Count of raw data errors while reading data from media, including retry errors or uncorrectable data errors | Warranty
2 | Throughput Performance | Internally measured average and worst data transfer rate | Warranty
5 | Reallocated Sector Count | Count of reallocated blocks. In the case of the mPCI-Express IDE Card, this is the count of reallocated or remapped blocks during normal operation from the grown defect table | Warranty
9 | Power On Hours | Number of hours elapsed in the power-on state | Advisory
12 | Power Cycle | Number of power-on events | Advisory
13 | Soft Read Error Rate | Number of corrected read errors reported to the operating system (SLC = 3 or more bits; MLC = 5 or more bits) | Advisory
100 | Erase/program cycles | Count of erase/program cycles for the entire card | Advisory
103 | Translation Table Rebuild | Power backup fault or internal error resulting in loss of system unit tables | Advisory
170 | Reserved Block Count | Number of reserved spares for bad block handling | Warranty
171 | Program Fail Count | Count of flash program failures | Advisory
172 | Erase Fail Count | Count of flash erase command failures | Advisory
173 | Wear Leveling Count | Worst-case erase count | Advisory
174 | Unexpected Power Loss | Counts the number of unexpected power loss events | Advisory
184 | End-to-end error detection | Tracks the number of end-to-end internal card data path errors that were detected | Warranty
187 | Reported Uncorrectable Errors | Number of uncorrectable errors reported at the interface | Advisory
188 | Command Timeout | Tracks the number of command timeouts, defined as an active command being interrupted | Advisory
194 | Temperature | Temperature of the base casting | Advisory
196 | Reallocation Event | Total number of remapping events during normal operation and offline surface scanning | Advisory
198 | Offline Surface Scan | Number of uncorrected errors that occurred during offline scan | Advisory
199 | UDMA CRC Error | Number of CRC errors during UDMA mode | Advisory
http://www.satron.at/pdf/NSSD_25_SATA.pdf
Serial ATA NSSD (NAND based Solid State Drive)
Attribute ID Numbers: Any nonzero value in the Attribute ID Number field indicates an active attribute. The device supports the following Attribute ID Numbers. Names marked with (*) indicate that the corresponding attribute value is a fixed value, present for compatibility.
ID  Attribute Name
0   Indicates that this entry in the data structure is not used *
1   Raw Read Error Rate *
3   Spin Up Time *
4   Spin Up Count *
5   Reallocated Sector Count *
7   Seek Error Rate *
8   Seek Time Performance *
9   Power-On Hours
10  Spin Retry Count *
11  LUL Retry Count *
12  Power On Count
184 Buffer CRC Count *
187 Uncorrectable Error Count
188 Command Time-out Error Count *
190 Air Flow Temperature *
191 Shock Count *
192 Emergency Retract *
193 LUL Count *
194 User Temperature *
195 ECC rate
197 Pending Sector Count *
198 Off-line Uncorrectable Error Count
199 CRC Error Count
200 Used Reserved Block Count
201 Program Fail Count
202 Erase Fail Count
203 Wear Leveling Count
121.44.19.141 (talk) 22:28, 25 October 2009 (UTC)
Attribute resetting
Some attributes may be useful to reset. "UDMA CRC error" is due to cable issues (damaged cable, bad shielding, bad PSU voltage ...). After replacing the cable or moving the HDD into another computer, we should reset that attribute. "Smart" idea, isn't it?
But it seems there is no way to reset any SMART attribute as easily as we would like.
1) Please add here other obvious examples of SMART attributes that are useful to reset when the working environment changes. 2) Please mention utilities that can reset them, or other hints (updating firmware, filling the platters with zeros, ...)
I realize there are bad/dishonest reasons to reset some attributes, e.g. the "total power on counter" and so on... I am not asking how to reset those.
Kind regards, LaPeche35, France —Preceding unsigned comment added by 213.56.248.115 (talk) 10:04, 18 November 2009 (UTC)
Dear LaPeche35 - You can reset SMART using specialized tools like HD Doctor. Most of those tools (if not all) are hardware-based and send custom vendor commands to put the HDD into service mode, which allows access to the servo layers.
Sorry to tell you, but SMART is stored in the manufacturer's servo area of the disk, and without putting the HDD into service mode no program will give you access, because the HDD itself will not give you access to the servo.
If you are not familiar with such tools and do not have the specialized hardware & software, I doubt you will simply reset SMART - unless, of course, you somehow send the vendor's SuperAdmin command to the HDD and put it in service mode; then you can do it. 95.48.133.106 (talk) 20:51, 20 May 2013 (UTC)
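Since the raw counters generally cannot be reset without such vendor tools, a monitoring script can instead record a baseline at the time the cable is replaced and report only the delta. A minimal sketch, with hypothetical counter values:

```python
# Sketch: instead of resetting UltraDMA_CRC_Error_Count, remember its
# value when the cable is replaced and report only new errors since then.
# All numbers below are hypothetical.

baseline = 0

def note_cable_replaced(current_raw):
    """Record the counter value at the moment the cable was swapped."""
    global baseline
    baseline = current_raw

def errors_since_replacement(current_raw):
    """CRC errors accumulated since the last cable replacement."""
    return current_raw - baseline

note_cable_replaced(37)              # 37 CRC errors with the old cable
print(errors_since_replacement(37))  # 0: clean slate
print(errors_since_replacement(39))  # 2 new errors: the new cable may be bad too
```

This achieves the practical goal of the reset (distinguishing old errors from new ones) without touching the drive's firmware.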
Attribute descriptions copied directly from other sources.
This article may contain text that is copied verbatim from other sources. For example, the description of the "Reallocated Sectors" attribute is identical word-for-word with the description at [3] (archived at [4]). I do not know if that page copied from the wikipedia article, or the other way around, or if both copied from the same other source. --24.190.224.244 (talk) 17:23, 18 February 2010 (UTC)
Deleted wrong citation
I've deleted the sentence below from the article. Though it would be informative if true, the cited page doesn't corroborate it:
Approximately 64% of failures can be predicted by S.M.A.R.T.<ref>[http://smartlinux.sourceforge.net/smart/faq.php?#2 How does S.M.A.R.T. work?]</ref> —Preceding unsigned comment added by 123.222.33.67 (talk) 06:46, 26 March 2010 (UTC)
Windows Software
Is there any good Windows software to access S.M.A.R.T.? I've found the linux smartd/smartctl tools to be very useful, but such things aren't for everyone. —Preceding unsigned comment added by 142.179.217.154 (talk) 01:21, 28 July 2010 (UTC)
- Smartmontools is available under Windows as well (conveniently with GSmartControl). You need to install some GTK package, I think. SpeedFan is another tool that reads SMART data. Lavalys' Everest does too. And here is a whole list of them: Comparison_of_S.M.A.R.T._tools --Echosmoke (talk) 02:34, 19 January 2011 (UTC)
- HDD Guardian is another tool that provides a Windows GUI for smartctl:
- http://code.google.com/p/hddguardian/ — Preceding unsigned comment added by 121.44.55.232 (talk) 07:07, 1 September 2012 (UTC)
Accessibility and readability improvements
I've just made a few changes according to this discussion about readability / visibility concerns for some graphics. I've also made a few accessibility improvements and simplified the syntax at the same time. The down icon doesn't show up just yet because of a large-scale software bug. Hopefully it will be fixed soon; we shouldn't remove it for that reason in the meantime. Yours, Dodoïste (talk) 21:38, 22 August 2010 (UTC)
- The image problem is solved now. :-) Dodoïste (talk) 15:05, 23 August 2010 (UTC)
- I think the up/down arrows should be different colors. Right now when you scroll across it, it's just a long trail of triangles. -- intgr [talk] 20:11, 23 August 2010 (UTC)
Checking SMART
Nowhere in the article is it said how to check/read the information stored by SMART ;). Please, somebody, expand the article. --Leonardo Da Vinci (talk) 10:41, 7 December 2010 (UTC)
- This is usually done with smartmontools and smartctl. See Comparison of S.M.A.R.T. tools.-96.233.20.116 (talk) 03:04, 30 April 2012 (UTC)
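For readers who want to consume such output in scripts, one row of `smartctl -A` output can be split into its fields. This is only a sketch: the sample line below is illustrative, and the exact column layout can vary between smartmontools versions:

```python
# Sketch: parse one attribute row from `smartctl -A` output.
# The sample line is illustrative, not from a real drive; smartmontools
# versions may format columns slightly differently.

SAMPLE = "  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0"

def parse_attribute(line):
    """Split one smartctl attribute line into a dictionary."""
    fields = line.split()
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "value": int(fields[3]),      # normalized value
        "worst": int(fields[4]),      # lowest normalized value seen
        "threshold": int(fields[5]),  # failure threshold
        "raw": fields[9],             # vendor-specific raw value
    }

attr = parse_attribute(SAMPLE)
print(attr["name"], attr["value"], attr["threshold"])
```

In practice `smartctl -A /dev/sda` would supply the lines; here the command output is stubbed with a string so the parsing logic stands alone.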
Attribute values, raw values, and thresholds
I'm looking at both a Western Digital and a Hitachi hard drive right now, and it seems clear that manufacturers have broad leeway with the SMART data displayed.
Some "real" values appear in the "raw value" field, such as Reallocated_Sectors_Ct, and the "value" field is false (displaying "100"). Some "real" values appear in the "value" field, such as Temperature, and the "raw value" is false or at least meaningless (displaying 159 billion +/- a few).
I suspect that the "Threshold" value for some attributes is a number that should not be exceeded (e.g. Reallocated_Sector_Ct), and for other attributes is a number that should always be exceeded (e.g. some kind of percentage-of-original-performance number).
So, to get useful information, I suggest looking at the "value" field to see if it's likely invalid (e.g. a round number like 100), then considering the "raw value" field, then deciding whether, for the attribute you are looking at, a lower-than or greater-than threshold makes the most sense.
And to get the very best information, the data probably should be gathered continuously over time so that sudden changes can be noted. --Scalveg (talk) 23:19, 17 May 2011 (UTC)
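The conventional reading of the threshold field (a normalized value at or below a nonzero threshold signals failure) can be sketched as follows; as the comments above note, vendors deviate from this, so treat it as a heuristic rather than a rule:

```python
# Sketch of the usual ATA interpretation: a normalized attribute value
# at or below its (nonzero) threshold indicates a failing attribute.
# Vendors deviate from this convention, so this is a heuristic only.

def attribute_failing(value, threshold):
    """True if the normalized value has dropped to or below the threshold."""
    return threshold != 0 and value <= threshold

print(attribute_failing(value=100, threshold=36))  # healthy
print(attribute_failing(value=30, threshold=36))   # failing
print(attribute_failing(value=100, threshold=0))   # threshold 0: never triggers
```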
SSD SMART attributes
Seems like the table should reflect some attributes used by SSDs. Some key ones on my Intel SSD seem to be:
- media wearout indicator -- % of drive lifetime left based on number of writes/number of rated writes
- host writes count -- number of sectors written from system's perspective, divided by 65536 (it seems)
- available reserved space -- % of reserved space (probably drops rapidly near end of drive life)
All that and more is in Intel's manual. And other controller makers must have their own.
There are also places that the article's wording, and SMART's, are wrong for SSDs -- e.g., smartctl is telling me about "spin-up time" and "number of attempts to compensate for platter speed variations" which clearly don't literally apply to the drive. Presumably we just don't care. Maybe it's worth a sentence or two on SSDs' failure mode (flash wearout, flaking controller logic) anyway. 173.164.250.233 (talk) 18:02, 1 June 2011 (UTC)
- Intel's own SSD documentation refers to "Spin Up Time" and states that it "Reports a fixed value of zero (0)".
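The host-writes arithmetic described above can be illustrated with a short sketch. The 65536-sector unit and the 512-byte sector size are assumptions taken from the comment ("it seems"), not a documented Intel formula:

```python
# Sketch: convert the Host_Writes raw value described above into bytes.
# Assumes the raw value counts units of 65536 sectors of 512 bytes each,
# as the comment suggests; this is an assumption, not Intel documentation.

SECTORS_PER_UNIT = 65536   # assumed raw-value granularity
BYTES_PER_SECTOR = 512     # assumed logical sector size

def host_writes_bytes(raw_value):
    """Approximate total bytes written by the host."""
    return raw_value * SECTORS_PER_UNIT * BYTES_PER_SECTOR

gib = host_writes_bytes(1000) / 2**30
print(f"about {gib:.1f} GiB written")
```

Under these assumptions each raw unit is 32 MiB, so a raw value of 1000 corresponds to roughly 31 GiB of host writes.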
Footnote #2 URL is invalid
Footnote #2's URL on Google.com is returning 404 Not Found. It is the article about the disk wear analysis. — Preceding unsigned comment added by Bmomjian (talk • contribs) 00:47, 29 December 2011 (UTC)
Different sections for SMART attributes
I think it is better to split the SMART attributes into more than a single table, because each vendor has its own attribute names (sometimes at the same ID as another vendor) with different meanings. This situation is even more marked now with the introduction of SSDs. I propose a table for the classical HDDs and a table for each SSD manufacturer. A sample of these pages can be seen here for classical HDDs, and then for SSD manufacturers Indilinx, Intel, JMicron, Micron, Samsung, SandForce, Satron, SMART and STEC. 95.232.243.104 (talk) 18:20, 17 January 2012 (UTC)
When is SMART support mandatory?
I was wondering this, and found a bit of info by skimming "SATA 1.0 NCITS 397-2005 (vol 1)", which is linked off AT Attachment. It seems that SMART support is optional for most devices (drives and controllers) that are PATA and SATA compliant; SMART support is required for those that support the PACKET feature, but otherwise the SMART feature set is optional. I thought I'd ask for feedback on my skimming, and for some info regarding which devices typically support PACKET. (Personal note: I bought a SATA controller and was disappointed to find that the drivers provided for Windows and MacOS don't support SMART, while the one for Linux does. An amusing contrast to the usual situation!)--Elvey (talk) 00:34, 29 February 2012 (UTC)
Run Out Cancel
Can someone confirm the description given by the article for attribute "203 Run Out Cancel" is correct? This link seems to suggest it's wrong: http://www.pcreview.co.uk/forums/smart-attribute-203-run-out-cancel-t3993570.html — Preceding unsigned comment added by 24.212.210.116 (talk) 06:01, 24 August 2012 (UTC)
- Agreed, I will remove it for the time being. This article seems to include quite a lot of assumptions and original research. Sadly there are almost no authoritative sources about SMART attributes. -- intgr [talk] 10:06, 24 August 2012 (UTC)
Corresponding SMART attributes for the Google study
The referenced Google paper only describes the SMART attributes descriptively but does not give the actual codes. Most of the corresponding codes are obvious and I have added them to the article, except for the two reallocation codes - does anyone have any further info on these (or feel like emailing the authors to confirm)? 121.45.215.68 (talk) —Preceding undated comment added 03:13, 1 December 2012 (UTC)
Load Cycle Count
The 0xC1 value states: "Many Linux installations write to the file system a few times a minute in the background".
Shouldn't this say something like Most systems access the file system of the OS installation […]? It is as much a Windows issue as a Linux (or other systems) issue. E.g. this rather entertaining thread.
There is also the case of cache misses, laptop mode etc. on Linux, where dirty pages are flushed to disk before it goes idle, and the cache can be set to a higher value so that writes to disk only happen every N minutes. Of course, the drawback is that one can lose N minutes of work.
Warumwarum (talk) 04:09, 6 March 2013 (UTC)
- This "Linux" bit is probably a reference to this saga -- it's not because Linux writes to the disk too frequently, but that some HDDs come with default power management settings that park the heads too eagerly. As I understand, the defaults were overridden by Windows under some conditions making Windows unaffected. But I could be wrong. -- intgr [talk] 16:55, 6 March 2013 (UTC)
- Yes, I'm not too sure about this either, but it seems to me it can be a bit misleading. There can be a gazillion reasons for disk activity, not only from the OS but from various software. Anti-virus, for one, can be a beast. However, in e.g. laptop mode, writes do not actually reach the disk immediately but are marked as dirty memory and written at timed intervals. I would believe Windows and others have the same type of feature. Reads, of course, always cause disk activity unless files are cached in memory. Unused memory is wasted memory etc., so Linux, and I guess most others, cache as much as possible in RAM. The Windows process viewer has, however, displayed this as available memory; under Linux this varies between tools, but the values are always distinguishable.
- So yes, the issue is perhaps turned on its head. Some disks have an over-eager park policy that, on some models, also disregards the activity time span and parks regardless. It seems to me, though this is not a help channel, that a more correct phrasing would be a warning/notice about this issue regardless of OS. More research to identify either case, whether patches are present, etc. is perhaps needed – but then it'll soon grow to an article or section of its own. Hopefully someone with authority on the subject will read this one day :P . Warumwarum (talk) 22:25, 7 March 2013 (UTC)
Concerning the guy's name being placed into the article
This edit is not appropriate without a third-party reliable source. His personal blogspot blog or an open wiki or a Google+ account, these are all self-published sources and don't cut it. Furthermore, if he made it while at Red Hat, Red Hat made it. Even if he made it completely on his own, if there aren't third-party reliable sources showing that this is relevant to the article then his name doesn't need to be mentioned there, especially in such a promotional way that it includes a link to his website in the body of the article. If this individual's contributions are so important, reliable sources will show this. If they don't, then his name doesn't need to be mentioned just because the software is; that's what wikilinks are for, to show more information about the relevant software. - Aoidh (talk) 20:40, 6 July 2014 (UTC)
David Zeuthen and GNOME Disks and S.M.A.R.T.
GNOME Disks states that its developer is David Zeuthen. Will you remove his name there? Xb2u7Zjzc32 (talk) 00:11, 7 July 2014 (UTC)
09 0x09 Power-On Hours (POH)
Please add information about the range of normal/healthy values -- in particular, for Power-On Hours and Loaded Hours, Head Flying Hours, etc.
- "9 Power-On Hours Count Estimated remaining lifetime, based on the time a device was powered on. The normalized value decreases over time, typically from 100 to 0. The Raw value shows the actual powered-on time, usually in hours."
- "By default, the total expected lifetime of a hard disk in perfect condition is defined as 5 years (running every day and night on all days). This is equal to 1825 days in 24/7 mode or 43800 hours." www.hdsentinel.com/help/en/54_pot.html
Until we can find real numbers, I'd suggest that under 1000 hours is quite young, past 10,000 hrs is mature-aging, and past 50,000 hrs getting very old (absent any other indications of ill health).
The key source:
- Failure Trends in a Large Disk Drive Population
(Eduardo Pinheiro, Wolf-Dietrich Weber and Luiz André Barroso, Google Inc., 1600 Amphitheatre Pkwy, Mountain View, CA 94043)
says that for the first year of continuous use a disk drive is "young" and fails less, and after the first year the drive is "old" and tends to fail at about the same annual rate, year after year, for at least the first five years. -96.233.19.191 (talk) 13:38, 10 July 2014 (UTC)
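The HDSentinel-style normalization quoted above (a linear decay over a 43800-hour, 5-year design life) can be sketched as follows; real firmware may use an entirely different formula, so this only illustrates the arithmetic in the quote:

```python
# Sketch of one possible Power-On Hours normalization, assuming a linear
# decay from 100 to 0 over a 43800-hour (5 years, 24/7) design life, as in
# the HDSentinel quote above. Actual firmware formulas are vendor-specific.

DESIGN_LIFE_HOURS = 43800  # 5 years * 365 days * 24 hours

def normalized_poh(hours):
    """Normalized Power-On Hours value under the linear-decay assumption."""
    remaining = max(0.0, 1.0 - hours / DESIGN_LIFE_HOURS)
    return round(100 * remaining)

print(normalized_poh(0))      # brand new
print(normalized_poh(10000))  # "mature-aging" per the suggestion above
print(normalized_poh(50000))  # past design life, clamped at 0
```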
Lack of common interpretation
This entire section has been lifted verbatim from LSoft Technologies's website (WP:CFAQ) (WP:NOR). Slatedorg (talk) 18:36, 29 October 2014 (UTC)
- @Slatedorg: I did some archeology on this and I believe you are mistaken. Most of this section dates back to February 2006. In August the sentence "Although an industry standard amongst most major hard drive manufacturers [...]" was added. Later in December the statement on Wikipedia about Compaq gets removed. Since this information was being incrementally edited on Wikipedia and both of these changes are also present on ntfs.com, it's much more likely the website copied it from Wikipedia instead. At that time, the Wikipedia section title was "Standards and Implementation" too, like you see on ntfs.com. The first archive.org copy from ntfs.com dates back to May 2008, much later than these changes were made on Wikipedia. -- intgr [talk] 22:45, 29 October 2014 (UTC)
Sticks and cards?
Is there SMART for simple Flash-memory like USB-Sticks or Memory-Cards? Next Question: Why not? --Itu (talk) 14:57, 20 May 2015 (UTC)
- Hello! Some of them seem to have S.M.A.R.T. capability, for example this USB flash drive. Why the majority doesn't? Well, perhaps because the manufacturers have to keep the prices as low as possible for such "expendable" devices, and adding S.M.A.R.T. capability costs money. However, please note that Wikipedia isn't a forum, so questions like this one might not be best suited for talk pages. — Dsimic (talk | contribs) 18:48, 20 May 2015 (UTC)
- Thanks for answering. That's not a forum question; it's of course within the scope of the article, and I still don't really understand the cost issue, since the normal controller has to log the state of wear leveling anyway. --Itu (talk) 23:52, 20 May 2015 (UTC)
- You're welcome. Unfortunately, I'm having trouble providing references that would back my assumption about cost being the main reason, but a quote from this paper might be a good starting point:
- In addition, an FTL must be implementable on the flash controller; while SSDs may contain 32-bit processors and megabytes of RAM, allowing sophisticated algorithms, some of the USB drives analyzed below use 8-bit controllers with as little as 5 KB of RAM.
- While it's possible to do small miracles with such microcontrollers, miracles and cheap high-volume products usually don't go together. :) For example, just testing the compatibility of a S.M.A.R.T. implementation would cost quite a lot. Also, manufacturers probably don't see the whole thing as a selling point for their USB flash drives, so they don't bother – I'd bet that even a vast majority of hard disk drives never see S.M.A.R.T. queries beyond what system firmware does at startup (excluding the HDDs used in servers and storage systems, of course). Why should manufacturers pay for implementing something in USB flash drives that only 1% (maybe 5%, but certainly not 80%) of end users is actually going to use? — Dsimic (talk | contribs) 02:47, 21 May 2015 (UTC)
Hitachi SMART attributes
The following attribute names were extracted from Hitachi's "HDD Firmware Update Tool for ATA Hard Disk Drives":
http://www.mcetech.com/optibay/utilities/HDDFT10.iso.zip
Raw Read Error Rate
Throughput Performance
Spin Up Time
Start/Stop Count
Reallocated Sector Count
Seek Error Rate
Seek Time Performance
Power-On Hours Count
Spin Retry Count
Device Power Cycle Count
G-Sense error rate
Power Off Retract Count
Load Cycle Count
Temperature
Reallocation Event Count
Current Pending Sector Count
Uncorrectable Sector Count
CRC Error Count
Spindle Running Current
SSM Error Count
Gross Seek Error Count
Gross Load Error Count
Gross SpinUp Error Count
Unexpected Error Count
Unlock/Mis Read Count
Disk Shift
G-Sensor Error Rate
Loaded Hours
Load Retry Count
Load Friction
Load Cycle Count
Load-in Time
Torque Amplification Count
Power-Off Retract Count
GMR Head Amplitude
Drive Temperature
121.44.60.234 (talk) 20:05, 21 May 2015 (UTC)
Explain ATA SMART attributes
Please, a little more explanation of the ATA SMART attributes would do the article good. For instance, "and a worst value, which represents the lowest recorded normalized value." What is the precise meaning of that? The lowest normalized value _ever recorded by any such device_, or the lowest value recorded by the particular device in question? And recorded by whom? The drive manufacturer?
I think even drive manufacturers don't have a real clue about that. E.g. my Western Digital WD5000BEVT reports "11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0", which would be nonsense. The drive didn't have calibration retries, but if the threshold is 0, then no calibration retries must ever occur; the very first one would make the drive FAIL. That would be consistent with the worst being 253, because 253 is the maximum. But how can it be that a drive with a raw value of 0 is rated "100"? A drive with 0 calibration retries would supposedly be better than one with the worst value recorded... Also, "a normalized value, which ranges from 1 to 253 (with 1 representing the worst case and 253 representing the best) and a worst value, which represents the lowest recorded normalized value. Depending on the manufacturer, a value of 100 or 200 will often be chosen as the initial normalized value" is contradictory. When talking about singular failure events like calibration retries, shouldn't a drive with zero of those be at the best normalized value, so that an initial value of 100 or 200 does not make sense? Confusing :-) 46.244.189.3 (talk) 10:08, 10 January 2016 (UTC)
- You're right that even drive manufacturers don't have a real clue about that. :) — Dsimic (talk | contribs) 18:15, 10 January 2016 (UTC)
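For what it's worth, the "worst" field is commonly understood as a per-drive low-water mark of the normalized value, which can be modeled as follows. This is a model of that common understanding, not any vendor's actual firmware:

```python
# Sketch: 'worst' as a per-drive low-water mark of the normalized value.
# This models the common understanding discussed above; actual firmware
# behavior is vendor-specific.

class Attribute:
    def __init__(self, initial=100):
        self.value = initial
        self.worst = initial  # lowest normalized value this drive has seen

    def update(self, new_value):
        self.value = new_value
        self.worst = min(self.worst, new_value)

a = Attribute(initial=100)
for v in [100, 97, 99, 95, 100]:
    a.update(v)
print(a.value, a.worst)  # current value recovered to 100; worst stays 95
```

Under this model the normalized value can recover (e.g. temperature cooling back down) while "worst" records the low point, which is why the two fields can differ on a healthy drive.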
Add column 'Critical' to table
It is not possible to sort the table Known ATA S.M.A.R.T. attributes by the Critical value in the current version, so I added a Critical column. I'd like to insert this into the current table. It looks like this:
--Soluvo (talk) 19:59, 1 February 2016 (UTC)
- Seems like no one has anything against it. I'll add the column to the table. --Soluvo (talk) 10:04, 3 February 2016 (UTC)
- @Soluvo: I took the liberty of removing the copied table from this talk page. In the future, feel free to follow WP:BOLD if you want to make changes to articles.
- I don't have anything against the column in particular, but a major problem with this table is that it's largely original research or poorly sourced. Who says that these attributes are critical? And even when citations are used, it doesn't take into account that vendors have somewhat differing implementations, especially what the "raw" field value contains; that's the problem with using so many primary sources. Strictly per WP:OR, WP:V and WP:RS, the whole table should be deleted. -- intgr [talk] 10:36, 3 February 2016 (UTC)
- @Intgr: Though I cannot verify that these attributes are actually critical, you now have at least the option to sort them by this attribute. --Soluvo (talk) 19:53, 9 February 2016 (UTC)
Table layout improvement?
The table is very cramped, but full of wasted space rather than information. The Description column needs as much space as it can get. I would suggest:
- get rid of the separate decimal and hex columns; there is plenty of room for them in one column with a break.
- Get rid of the "Critical" (nothing/yes) column; it is redundant, as critical parameters are all pink. This would prevent sorting on Critical, and I don't know how that affects users who rely on screen readers. If the column is left in, it could have a shorter heading to keep it narrow. I've just noticed that this column was added specifically to support sortability. At least the heading should be narrowed; maybe <small>Crit</small>?
- The "Better" column could be made narrower; my breaking header into Bet/ter is probably not the best way. The table would look like this; compare with the article as it stands today (ignore the content, slightly modified to illustrate layout only):
ID | Attribute name | Bet ter | Description
---|---|---|---
04 0x04 | Start/Stop Count | | A tally of spindle start/stop cycles. The spindle turns on, and hence the count is increased, both when the hard disk is turned on after having previously been turned entirely off (disconnected from the power source) and when the hard disk returns from having previously been put to sleep mode.
05 0x05 | Reallocated Sectors Count | | Count of reallocated sectors. When the hard drive finds a read/write/verification error, it marks that sector as "reallocated" and transfers data to a special reserved area (spare area). This process is also known as remapping, and reallocated sectors are called "remaps". The raw value normally represents a count of the bad sectors that have been found and remapped. Thus, the higher the attribute value, the more sectors the drive has had to reallocate. This allows a drive with bad sectors to continue operation; however, a drive which has had any reallocations at all is significantly more likely to fail in the near future. While primarily used as a metric of the life expectancy of the drive, this number also affects performance. As the count of reallocated sectors increases, the read/write speed tends to become worse because the drive head is forced to seek to the reserved area whenever a remap is accessed. If sequential access speed is critical, the remapped sectors can be manually marked as bad blocks in the file system in order to prevent their use.
06 0x06 | Read Channel Margin | | Margin of a channel while reading data. The function of this attribute is not specified.
193 0xC1 | Power-off Retract Count, Emergency Retract Cycle Count (Fujitsu), or Unsafe Shutdown Count fake entry | — | (Vendor specific raw value.) Rate of seek errors of the magnetic heads. If there is a partial failure in the mechanical positioning system, then seek errors will arise. Such a failure may be due to numerous factors, such as damage to a servo, or thermal widening of the hard disk. The raw value has different structure for different vendors and is often not meaningful as a decimal number.
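To make the attribute-reading discussion concrete, here is a minimal sketch of pulling the Reallocated Sectors Count (attribute 05) raw value out of smartmontools' `smartctl -A` text output. This assumes smartmontools' usual table layout (ID# in the first column, RAW_VALUE in the last); the sample output below is illustrative only, and the exact columns can vary by version.

```python
# Hypothetical sketch: extract the raw value of SMART attribute 5
# (Reallocated Sectors Count) from `smartctl -A` text output.
# Assumes the common smartmontools column layout; sample data is made up.

SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       8
"""

def reallocated_sectors(smartctl_output: str) -> int:
    """Return the raw value of attribute 5, or -1 if it is not present."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "5":
            return int(fields[-1])  # RAW_VALUE is the last column
    return -1

print(reallocated_sectors(SAMPLE))  # → 8
```

Any nonzero raw value here means the drive has already remapped sectors, which is the condition the description above flags as a predictor of failure.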
Pol098 (talk) 14:25, 5 April 2016 (UTC)
- I agree that the critical column is superfluous given that critical entries are already indicated in pink. -- ChamithN (talk) 15:23, 5 April 2016 (UTC)
- I added some relevant comments on rotated (vertical) column headings to save space to Template talk:Wikitable#Width bug?. Letters displayed unrotated but arranged vertically work properly, but there's a bug (error) in the code used to display rotated column headings—I know this isn't very clear, but a glance at the linked Template Talk will show what I mean. Pol098 (talk) 13:00, 21 July 2016 (UTC)
Revamping this article 161004
Guys, I am new to wiki, so please be gentle...
I could not help myself and modified a couple of the attributes because they indicated that a device was failing or about to fail simply because the attribute counted. This was simply not representative of today's technology.
I am working in an industry standards organization that includes 100% of the HDD industry and 70% of the SSD industry. We are creating a technical report that includes SMART attributes and definitions that are common to the industry. It will take us about 6 months to complete the project.
The current wiki table has a combination of observed definitions from source code, monitoring tools, and some user manuals. After reading several of the definitions, it is clear to me that these definitions do not match current industry usage very well. In addition to the basic attribute definitions, there is also marketing information and, in some cases, old operational information that does not apply to today's technology. I think this needs to be removed, or moved to another part of the article.
I would like to restructure the information a bit to show common usage and agreed definitions or at least clearly differentiate common definitions from vendor specific ones.
Comments?
CodeMastadon (talk) 00:46, 5 October 2016 (UTC)
Sandforce SSDs
Per [5], it appears that SMART attribute 198 "Uncorrectable Sector Count" in SandForce SF-2xxx based controllers is potentially misleading to most SMART software. Per the above link, attribute 198 is fixed at 120 until a sample size of between 10^10 and 10^12 'BitsRead' is achieved, at which point it reports a 'normalized value' between 38 and 120. And it resets at power cycle, to boot.
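The behaviour described above can be sketched as a small model: the normalized value stays pinned at 120 until enough bits have been read since power-on, then a clamped real value is reported. Note the exact pinning threshold within the 10^10-10^12 range and the clamping are assumptions for illustration, not vendor-documented behaviour.

```python
# Hypothetical model of the SandForce SF-2xxx attribute 198 reporting
# behaviour described above. The 10**10 cutoff is the low end of the
# sample window cited in the linked source; the clamp to 38-120 matches
# the reported range. Both are illustrative assumptions.

MIN_SAMPLE_BITS = 10**10  # lower bound of the 'BitsRead' sample window

def reported_value_198(bits_read_since_power_on: int, true_normalized: int) -> int:
    """Return what such a drive might report for attribute 198."""
    if bits_read_since_power_on < MIN_SAMPLE_BITS:
        return 120  # pinned: sample size too small (also the state after a power cycle)
    return max(38, min(120, true_normalized))  # clamp to the observed 38-120 range

print(reported_value_198(10**9, 50))   # → 120 (too few bits read; looks healthy)
print(reported_value_198(10**11, 50))  # → 50  (real value finally surfaces)
```

This is why the attribute misleads monitoring tools: a freshly power-cycled drive always looks perfect until a large volume of reads has accumulated.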