Tag Archive: intel chips



Surya R Praveen Win 8 crash

Researchers working at Microsoft have analyzed the crash data sent back to Redmond from over a million PCs. You might think that research data on PC component failure rates would be abundant given how long these devices have been in-market and the sophisticated data analytics applied to the server market — but you’d be wrong. According to the authors, this study is one of the first to focus on consumer systems rather than datacenter deployments.

What they found is fascinating. The full study is well worth a read; we’re going to focus on the high points and central findings. There are two limitations to the data collected that we need to acknowledge. First, the data set we’re about to discuss is limited to hardware failures that actually led to a system crash. Failures that don’t lead to crashes are not cataloged. Second, the data presented here is limited to hardware crashes, with no information on the relative frequency of software to hardware crashes.

CPU overclocking, underclocking, and reliability

When it comes to baseline CPU reliability, the team found that the chance of a CPU crashing within 5 days of Total Accumulated CPU Time (TACT) over an eight-month period was relatively low, at 1:330. Machines with a TACT of 30 days over the same eight months of real time have a higher failure rate, of 1:190. Once a hardware fault has appeared once, however, it’s 100x more likely to happen again, with 97% of machines crashing from the same cause within a month.
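To put those odds in more familiar terms, here is a minimal sketch that converts them to percentages. It assumes that "1:N" means a probability of 1/N and that "100x more likely" is a simple multiplier on the first-crash probability; the study may define both more carefully, and the variable names are ours.

```python
def one_in(n: int) -> float:
    """Treat odds quoted as "1:n" as a probability of 1/n (an assumption)."""
    return 1.0 / n

# First-crash probabilities quoted in the article.
p_5_days = one_in(330)    # machines with 5 days of TACT
p_30_days = one_in(190)   # machines with 30 days of TACT

# How much more likely a first CPU crash is at 30 days of TACT than at 5.
relative_increase = p_30_days / p_5_days

# Rough repeat-crash probability if "100x" is applied to the 5-day figure,
# capped at 1.0 because a probability cannot exceed certainty.
p_repeat_5_days = min(1.0, 100 * p_5_days)

print(f"5 days TACT:  {p_5_days:.2%} chance of a first CPU crash")
print(f"30 days TACT: {p_30_days:.2%} chance of a first CPU crash")
print(f"Ratio, 30 days vs 5 days: {relative_increase:.2f}x")
print(f"Approximate repeat-crash chance (5 days TACT): {p_repeat_5_days:.1%}")
```

Read this way, a first crash is rare in absolute terms, but a machine that has already crashed once sits in a very different risk bracket.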

Overclocking, underclocking, and the machine’s manufacturer all play a significant role in how likely a CPU crash is. Microsoft collected data on the behavior of CPUs built by Vendor A and Vendor B (no, they don’t identify which is which). Here’s the comparison chart, where Pr[1st] is the chance of the first crash, Pr[2nd|1] the chance of a second subsequent crash, Pr[3rd|2] the chance of a third failure. In this case, overclocking is defined as running the CPU more than 5% above stock.

Surya R Praveen AMD-vs-Intel

Are Intel chips just as good as AMD chips? At stock speeds, the answer is yes. Once you start overclocking, however, the two separate. CPU Vendor A’s chips are more than 20x more likely to crash at OC speeds than at stock, compared to CPU Vendor B’s processors, which are still 8x more likely to crash. The report notes that “After a failure occurs, all machines, irrespective of CPU vendor or overclocking, are significantly more likely to crash from additional machine check exceptions.” The team doesn’t break out overclocking failures by percentage above stock, but their methodology does prevent Turbo Boost/Turbo Mode from skewing results. Does overclocking hurt CPU reliability? Obviously, yes.

So what about underclocking? Turns out, that has a significant impact on CPU failures as well.

Surya R Praveen

As you can see, underclocking the CPU has a significant impact on failure rates. The impact on DRAM might seem puzzling, since the researchers only reference CPU speed as a determinant of underclocking, rather than any changes to DRAM clock rate. Our guess is that the sizable impact on DRAM is caused by a slower CPU alone rather than any hand-tuning of RAM clock, RAM latency, or integrated memory controller (IMC) speed. IMC behavior varies depending on CPU manufacturer and product generation in any case, while the size of the study guarantees that a sizable number of Intel Core 2 Duo chips without IMCs would still have been part of the sample data.

Laptops vs. desktops, OEM vs. white box

Ask enthusiasts what they think about systems built by Dell, HP, or any other big brand manufacturer, and you aren’t likely to hear much good. The data shows that systems from major vendors actually have fewer problems than those built by everyone else. The researchers identified the Top 20 computer OEMs as “brand names” and removed overclocked machines from the analysis of the data. Only failure rates within the first 30 days of TACT were considered among machines with at least 30 days of TACT. This is critical because brand name boxes have an average of 9% more TACT than white box systems, which implies that the computers are used longer before being replaced.

Surya R Praveen Brand-vs-whitebox

White box systems don’t come off looking very good in these comparisons. CPUs are significantly more likely to fail, as is RAM. Disk reliability remains unchanged.

How about laptops? The researchers admitted that they expected desktops to prove more reliable than laptops due to the rougher handling of mobile devices and the higher temperatures such systems must endure. What they found suggests that laptop hardware is actually more reliable than desktop equipment, despite the greater likelihood that mobile systems will be dropped, sat on, or eaten by a bear. Again, overclocked systems were omitted from the comparison.

Surya R Praveen Desktops-vs-Laptops

Desktops don’t come off looking very good here despite their sedentary nature. The team theorizes that the higher tolerances engineered into the CPU and DRAM, combined with better shock-absorbing capabilities in mobile hard drives, may be responsible for the lower failure rate. The difference between SSDs and HDDs was not documented.

More data needed

The limitations of the study are such that we can’t draw absolute conclusions from this data, but they suggest a need for better analysis tools and indicate that adopting certain technologies, like ECC, would help improve desktop reliability. It’s one thing to say that overclocking hurts CPU longevity; it’s something else to see that difference spelled out in data. The impact of underclocking was also quite surprising; this is the first study we’re aware of to demonstrate that running your CPU at a lower speed reduces the chance of a hardware error compared to stock.

The Microsoft team conducted the research as one step towards the goal of building operating systems and machines that are more tolerant of hardware faults. The fact that systems which throw these types of errors are far more likely to continue doing so strikes at the idea that such problems are random occurrences, as does much of the reliability information concerning DRAM.

The report throws doubt on a good deal of “conventional” wisdom and implies reliability is rather sorely lacking. More data is needed to determine why that is, and to correct the problem.

Source


Surya R Praveen CPU heatmap

Quanta, the largest contract manufacturer of laptops in the world, has sued AMD alleging that the CPU manufacturer sold it defective processors for use in NEC systems. According to court documents, AMD’s chips failed to meet heat tolerances and were unfit for use.

The company’s filing states that “Quanta has suffered significant injury to prospective revenue and profits.” The company seeks a jury trial with damages and also claims that AMD breached its warranty terms, negligently misrepresented the products in question, engaged in civil fraud, and interfered with the execution of a contract.

AMD, for its part, is viewing the situation with a rather dubious eye. “AMD is aware of no other customer reports of the alleged issues with the AMD chip that Quanta used, which AMD no longer sells,” said AMD spokesperson Mike Silverman. “In fact, Quanta has itself acknowledged to AMD that it used the identical chip in large volumes in a different computer platform that it manufactured for NEC without such issues.”

There are a few factors that make this story a bit odder than your usual he said/she said. It’s certainly possible that AMD built a defective batch of chips, shipped those chips to Quanta, and that Quanta used said processors in a particular run of NEC products. This would explain why the failure is isolated to a single group of machines. AMD and Intel do extensive defect testing, so whatever the flaw was, it’d be something that slipped past standard reviews.

 

Surya R Praveen Ghetto oven
But probably not in this oven. 

This is where Quanta’s allegations make less sense. Heat tolerance testing is a bog-standard part of the validation process for both AMD and Intel. Chips are extensively tested, including thermal tests. Repeated heating and cooling cycles can cause solder joints to break if they’re formed incorrectly, as happened to Nvidia in 2008, but Quanta’s allegations don’t mention this sort of problem.

Even once a CPU is in a product, there are extensive safeguards in place to ensure it doesn’t fry itself. Both AMD and Intel chips contain internal sensors that will first throttle, then deactivate a processor to prevent an overheat. CPUs, meanwhile, are pretty darn robust. Both Intel and AMD have confirmed in various conversations over the years that they target at least a seven-year life span — and that’s seven years of work at high temperatures.

It’s possible that AMD is trying to weasel out of paying Quanta, but based on what we know about microprocessor manufacture and validation, it doesn’t seem very likely. Laptop manufacturing margins are pretty thin to start with; it’s much easier to believe that Quanta went for an underperforming cooling solution than to argue that a particular batch of defective chips passed validation, ended up in a single NEC product, and then caused problems.

Source


Surya R Praveen Grim Reaper... slashes hard drive warranties...

The flood waters are receding and manufacturers are bringing factories back online, but the major HDD distributors haven’t issued much new guidance as far as when they expect to be fully operational again. The big news this past week came from Intel, which lowered its guidance on Q4 results by a billion dollars as a result of the HDD shortage.

AMD, however, claims to see no such problem. Speaking to MarketWatch, AMD CEO Rory Read stated that HDD supplies were “going pretty well” and that “in 1Q and 2Q, maybe you see some manifestations [of shortages,] but I wouldn’t bet against the supply chain. They’re very resilient.” Investment group Nomura also claims that Intel’s problems are bigger than the hard drive shortage, writing “we think weak sell-through is also contributing to the $1 billion shortfall. We see softness in China, continued demand for ARM-based more power-efficient devices, and low volumes for ultrabooks.”

AMD’s lack of trouble (assuming Read was being honest) may have more to do with the preparedness of the company’s primary vendors than any obfuscation on Intel’s part. As for Nomura’s claims, Intel’s ultrabook push is still in its infancy; the company has set an aggressive target for next year’s sales, but has made no claims about 2011. ARM chips aren’t replacing Intel chips in notebooks, and tablets aren’t eating into notebook sales.

So what is going on in the HDD market? Not much, though there’s a major short-term caveat to that statement. Here’s what prices look like.

Surya R Praveen HDD Prices

The Samsung Spinpoint F3 (in red above) is currently $149 at Newegg, but the site is offering $40 off with the use of promotion code EMCJHJE29. That brings the 1TB drive down to $109, and while that’s still nearly twice the price such drives were going for this past fall, it’s much cheaper than any other option we’ve seen in months. This deal expires on December 21, so if you plan to use it you’d best do so quickly.

Aside from the Spinpoint deal, prices have generally held steady. The 2TB Caviar Black is back down to $249 after rising as high as $279 just before Thanksgiving. Seagate’s 1TB Barracuda has fallen to $129, while the other drives haven’t budged. The Hitachi Touro Desk Pro saw a modest price increase up to $129, but the external USB 3.0 2TB drive remains a much better deal than any of its internal counterparts.
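For readers who want the price moves as percentages, here is a small sketch using only the dollar figures quoted above. The helper function and variable names are our own, and the article does not publish these percentage figures itself.

```python
def percent_change(old: float, new: float) -> float:
    """Signed percentage change going from old to new."""
    return (new - old) / old * 100.0

# Samsung Spinpoint F3 1TB: list price minus the promotion code discount.
spinpoint_list = 149.0
promo_discount = 40.0
spinpoint_sale = spinpoint_list - promo_discount

spinpoint_cut = percent_change(spinpoint_list, spinpoint_sale)

# 2TB Caviar Black: pre-Thanksgiving peak versus the current price.
caviar_cut = percent_change(279.0, 249.0)

print(f"Spinpoint F3 with promo code: ${spinpoint_sale:.0f} ({spinpoint_cut:.1f}%)")
print(f"Caviar Black 2TB vs. its peak: {caviar_cut:.1f}%")
```

The promotion works out to a cut of roughly a quarter off the Spinpoint's list price, while the Caviar Black has given back about a tenth of its peak price.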

Surya R Praveen Drive prices

We’ve supplemented our previous chart with a graph to show how current prices compare with historical levels. The VelociRaptor 600GB and Hitachi Touro Desk Pro are the only two drives whose prices aren’t running at least 50% higher than back in September. At present, analysts continue to predict that drive shipments will impact prices well into 2012 with levels only slowly returning to pre-flood points.

Seagate, Western Digital slash drive warranties

Both Seagate and WD win this year’s “Suspicious Timing” award for drastically slashing HDD warranties in the wake of the Thailand floods. In WD’s case, there’s at least an obvious reason: The company has stated it expects to take at least a $275 million hit against its Q4 results as a result of the flooding. Seagate, however, is the manufacturer least affected by the floods and the one expected to see the greatest revenue gain as a result of the shortages.

There’s also a difference in scope. Western Digital cut warranties on the lower end of its consumer products: Caviar Blue, Green, and Scorpio Blue drives will carry two-year warranties as of January 2, down from three years previously. Warranty lengths on Caviar/Scorpio Black drives and VelociRaptor drives remain at five years.

Seagate, on the other hand, is cutting warranties on its Constellation and Momentus XT families (down to three years, from five). The real slash is aimed at the Barracuda family, where the warranty period will now be just one year (down from five) as of December 31.

Surya R Praveen Thailand floods

Both manufacturers have cut warranties before and hidden behind vague explanations when they did it; Seagate currently claims that this move is being made “to be more consistent with those commonly applied throughout the consumer electronics and technology industries.”

The reality is simpler: This is a decision to protect profit margins by cutting down on RMA costs. In Western Digital’s case, it’s at least somewhat understandable. Seagate is further maximizing its already excellent market position by cutting product costs. Both companies deny that the Thailand floods have anything to do with this decision.

There’s an alternative, somewhat darker explanation for this behavior. It’s possible that Seagate and WD have slashed warranties because they’re worried about the reliability of the drive components they’re currently using. HDD component manufacturers were also swamped by the flooding; the drive motor company Nidec was particularly affected.

These circumstances created an enormous gray market for HDDs and their constituent parts. It’s absolutely possible that the major drive companies have cut warranties because they’re not as confident in the reliability of their hardware as they would be under normal circumstances.

Source