reliability notes

 

Reliability Engineering

2: principles and techniques

Realistically: focus on the most likely failures and on the actions to take in response to those failures.
 


Reliability is a team effort, including reliability engineers, quality engineers, test engineers, systems engineers, and design engineers.
 

Time conversion (hours / days / years):

    hours        days         years
    168          7            0.0192
    1,000        41.67        0.114
    1.00E+05     4,166.67     11.42
    1.00E+06     41,666.67    114.16

Address bits vs. number of locations (2^bits):

    address bits    locations
    4               1.60E+01
    7               1.28E+02
    8               2.56E+02
    9               5.12E+02
    10              1.02E+03
    20              1.05E+06
    30              1.07E+09
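A minimal sketch (Python) reproducing the conversions in the tables above; it assumes a 365-day year (8,760 hours), which matches the values in these notes:

    # Reproduce the time-conversion and address-bit tables above.
    HOURS_PER_DAY = 24
    HOURS_PER_YEAR = 24 * 365   # 8,760 h, 365-day year as used in the table

    def hours_to_days_years(hours):
        """Return (days, years) for a given number of hours."""
        return hours / HOURS_PER_DAY, hours / HOURS_PER_YEAR

    for h in (168, 1_000, 1e5, 1e6):
        days, years = hours_to_days_years(h)
        print(f"{h:>12,.0f} h = {days:>12,.2f} d = {years:>10,.4f} yr")

    # Address bits -> number of addressable locations (2**bits)
    for bits in (4, 7, 8, 9, 10, 20, 30):
        print(f"{bits:>2} address bits -> {2**bits:,} locations ({2**bits:.2E})")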



What is Reliability? Why is it Important?

4 things: probability, intended function, a specified period of time, and stated conditions.

What is Reliability? Reliability can be defined as the probability that an item will continue to perform its intended function without failure for a specified period of time under stated conditions.
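As a worked example of this "probability over time" idea, assuming for illustration a constant failure rate (exponential life distribution), the survival probability is R(t) = exp(-lambda * t); the failure rate and mission time below are made-up values:

    import math

    # Illustrative only: assumes a constant failure rate (exponential life distribution).
    fit = 100                     # assumed failure rate: 100 FIT = 100 failures per 1e9 device-hours
    lam = fit / 1e9               # failures per device-hour
    t = 10 * 8760                 # mission time: 10 years in hours (365-day years)

    reliability = math.exp(-lam * t)   # R(t) = P(no failure by time t)
    print(f"R(10 yr) at {fit} FIT = {reliability:.4f}")   # ~0.9913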

Why is Reliability Important? What is the Difference Between Quality and Reliability?

http://weibull.com/basics/reliability.htm

 



Why is Reliability Important?

Reputation. A company's reputation is very closely related to the reliability of its products. The more reliable a product is, the more likely the company is to have a favorable reputation.

Chain of reputation: to the customer, reliability is not experienced as a probability; one failure at a critical moment is not a probability, it is a failure.

Human psychology: 99 good experiences may carry no more weight than one bad one.


Competitive advantage: demonstrated reliability can be used to win business against less reliable competitors.



What is the Difference Between Quality and Reliability?

Quality: the device is good at time zero; established by sorting, selecting good dies, and passing functional tests.

Reliability: whether the device stays good over time; concerns time-dependent mechanisms such as HCI in MOSFETs, ILD degradation, and leakage.


A poorly manufactured unit can fail early even though the product design and the manufacturing process are sound.

 

Quality - Design of Experiments (DOE) and the Taguchi Approach

http://nutek-us.com/wp-doe.html

- Optimize product and process designs, study the effects of multiple factors (i.e., variables, parameters, ingredients, etc.) on performance, and solve production problems by objectively laying out the investigative experiments. (Overall application goals)

- Study the influence of individual factors on performance and determine which factors have more influence and which have less. You can also find out which factors should have tighter tolerances and which tolerances can be relaxed.

DOE using Taguchi approach attempts to improve quality which is defined as the consistency of performance. This can be done by moving the mean performance to the target as well as by reducing variations around the target. The prime motivation behind the Taguchi experiment design technique is to achieve reduced variation (also known as ROBUST DESIGN).
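As a rough illustration of "reducing variation around the target", a common Taguchi-style metric is the signal-to-noise (S/N) ratio; the sketch below (Python; the measurement values and factor settings are invented for illustration) computes the nominal-the-best S/N ratio for two candidate settings:

    import math
    import statistics as stats

    def sn_nominal_the_best(values):
        """Taguchi nominal-the-best S/N ratio: 10*log10(mean^2 / variance).
        Higher S/N means performance is more consistent relative to its mean."""
        mean = stats.mean(values)
        var = stats.variance(values)   # sample variance
        return 10 * math.log10(mean**2 / var)

    # Hypothetical measurements of the same response under two factor settings
    setting_a = [10.1, 9.8, 10.3, 9.9]   # similar mean, small spread
    setting_b = [10.5, 8.9, 11.2, 9.4]   # similar mean, larger spread

    print("S/N (A):", round(sn_nominal_the_best(setting_a), 2))
    print("S/N (B):", round(sn_nominal_the_best(setting_b), 2))
    # The setting with the higher S/N ratio is the more robust (lower-variation) choice.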

Further, the experimental data will allow you to determine:

 

Introduced by R. A. Fisher in England in the 1920s.

The Taguchi approach: Dr. Genichi Taguchi developed his DOE techniques in the late 1940s.

 

HTOL

The High Temperature Operating Life (HTOL) or steady-state life test is performed to determine the reliability of devices under operation at high temperature conditions over an extended period of time.  It consists of subjecting the parts to a specified bias or electrical stressing, for a specified amount of time, and at a specified high temperature (essentially just a long-term burn-in). from: http://www.siliconfareast.com/HTOL.htm
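HTOL hours are usually translated to use-condition hours with a thermal acceleration factor; a minimal sketch assuming the standard Arrhenius model (the activation energy and temperatures below are placeholder values, not from these notes):

    import math

    K_BOLTZMANN_EV = 8.617e-5   # Boltzmann constant in eV/K

    def arrhenius_af(ea_ev, t_use_c, t_stress_c):
        """Arrhenius thermal acceleration factor between use and stress temperatures."""
        t_use = t_use_c + 273.15       # convert Celsius to Kelvin
        t_stress = t_stress_c + 273.15
        return math.exp((ea_ev / K_BOLTZMANN_EV) * (1.0 / t_use - 1.0 / t_stress))

    # Placeholder example: Ea = 0.7 eV, 55 C use temperature, 125 C HTOL junction temperature
    af = arrhenius_af(0.7, 55, 125)
    print(f"Acceleration factor ~ {af:.0f}")
    # 1,000 HTOL hours then correspond to roughly 1,000 * AF equivalent use hours.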

 

Reliability Engineering topics: Reliability Modeling; Qualification Process; Failure Analysis (Package Failures, Die Failures)

 

reliability key words

JEDEC (Joint Electron Device Engineering Council) is the global leader in developing open standards for the microelectronics industry, with more than 4,000 volunteers representing nearly 300 member companies.

Reliability engineers are in many ways like soothsayers - they are expected to predict many things for the semiconductor company: how many failures from this and that lot will occur within x number of years, how much of this and that lot will survive after x number of years, what will happen if a device is operated under these conditions, etc. 

          

To many people, such questions seem overwhelmingly difficult to answer, half-expecting reliability engineers to demonstrate some supernatural powers of their own to come up with the right figures.

     

Fortunately for reliability engineers, they don't need any paranormal abilities to give intelligent responses to questions involving failures that have not yet happened.  All they need is a good understanding of statistics and reliability mathematics to be up to the task.

       

Reliability assessment, or the process of determining to a certain degree of confidence the probability of a lot being able to survive for a specified period of time under specified conditions, applies various statistical analysis techniques to analyze reliability data.  If properly done, a reliability prediction using such techniques will match the survival behavior of a lot, many years after the prediction was made.

       

A good understanding of life distributions is a must-have for every reliability engineer who expects to exercise sound reliability engineering judgment whenever the need for it arises.  A life distribution is simply a collection of time-to-failure data, or life data, graphically presented as a plot of the number of failures versus time.  It is just like any statistical distribution, except that the data involved are life data. 

     

By looking at the time-to-failure data or life distribution of a set of samples taken from a given population of devices after they have undergone reliability testing, the reliability engineer is able to assess how the rest of the population will fail in time when they are operated in the field.  Based on this reliability assessment, the company can make the decision as to whether it would be safe to release the lot to its customers or not, and what risks are involved in doing so.  

   

All new engineers in the semiconductor industry are acquainted with the bathtub curve, which represents the overall failure rate curve generally observed in a very large population of semiconductor devices from the time they are released to the time they all fail.  The bathtub curve has three components: the early life phase, the steady-state phase, and the wear-out phase.

  

The failure rate is highest at the beginning of the early life phase and the end of the wear-out phase. On the other hand, it is lowest and constant in the long steady-state phase at the middle part of the curve. Collectively, these phases make the curve look like a bathtub (where it obviously got its name).

        

The bathtub curve takes into account all possible failure mechanisms that the population will encounter.  Some failure mechanisms are more pronounced in the early life phase (such as early life dielectric breakdown), while others are more pronounced in the steady-state or wear-out phases. Failures that occur in the early life phase are known as infant mortality failures, and they are screened out in production by burn-in.
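A minimal numerical sketch of the bathtub shape (Python; the Weibull parameters are arbitrary illustration values): the overall hazard rate is modeled as the sum of a decreasing early-life hazard, a constant steady-state hazard, and an increasing wear-out hazard.

    # Illustrative bathtub curve: sum of three Weibull hazard rates
    #   h(t) = (beta/eta) * (t/eta)**(beta - 1)
    # beta < 1 -> decreasing (infant mortality), beta = 1 -> constant, beta > 1 -> increasing (wear-out).
    # Parameter values are arbitrary, chosen only to make the shape visible.

    def weibull_hazard(t, beta, eta):
        return (beta / eta) * (t / eta) ** (beta - 1)

    def bathtub_hazard(t):
        early = weibull_hazard(t, beta=0.5, eta=2_000)       # infant mortality
        steady = weibull_hazard(t, beta=1.0, eta=100_000)    # constant random failures
        wearout = weibull_hazard(t, beta=5.0, eta=80_000)    # wear-out
        return early + steady + wearout

    for hours in (10, 100, 1_000, 10_000, 50_000, 80_000, 100_000):
        print(f"{hours:>7} h : hazard = {bathtub_hazard(hours):.2e} failures/hour")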

    

In real life, it is not always practical to evaluate the failure or survival rate of a population of devices in terms of the bath tub curve.  Reliability assessments are often conducted to evaluate only the known weaknesses of a given lot or, if the lot has no known weaknesses, to determine if it is vulnerable to any of the critical failure mechanisms dreaded in the semiconductor industry today.     

        

Such reliability assessments are conducted by running a set of industry-standard reliability tests, generating life data along the way.  These life data are then analyzed according to what type of life distribution they fit.

             

There are four life distributions commonly used in semiconductor reliability engineering today: the normal, exponential, lognormal, and Weibull distributions.  Different failure mechanisms result in time-to-failure data that fit different life distributions, so it is up to the reliability engineer to select the life distribution that best models the failure mechanism of interest.
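A minimal sketch of fitting life data to one of these distributions, the Weibull (Python with SciPy; the time-to-failure values are invented for illustration):

    import numpy as np
    from scipy import stats

    # Hypothetical time-to-failure data (hours) from a life test; all units failed.
    ttf = np.array([410.0, 520.0, 610.0, 700.0, 780.0, 860.0, 950.0, 1100.0])

    # Fit a 2-parameter Weibull (location fixed at 0): returns shape (beta), loc, scale (eta).
    beta, loc, eta = stats.weibull_min.fit(ttf, floc=0)
    print(f"Weibull shape beta = {beta:.2f}, characteristic life eta = {eta:.0f} h")

    # Reliability at 500 h under the fitted model: R(t) = 1 - F(t)
    r500 = stats.weibull_min.sf(500, beta, loc=0, scale=eta)
    print(f"Estimated R(500 h) = {r500:.2f}")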

reliability, quality, fit, ppm, burn-in

 

 

 

MTTF, Acceleration Factor, and FIT

C: number of failures; Pa: defined such that 1 - Pa = confidence level
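A common way to turn these quantities into a FIT estimate is the chi-square upper confidence bound on a constant failure rate; a minimal sketch (Python with SciPy; the sample size, test time, and acceleration factor are placeholder values):

    from scipy.stats import chi2

    def fit_upper_bound(failures, devices, test_hours, accel_factor, confidence):
        """Upper-bound failure rate in FIT, assuming a constant failure rate.
        Chi-square estimate: lambda <= chi2(CL, 2C + 2) / (2 * equivalent device-hours)."""
        equiv_device_hours = devices * test_hours * accel_factor
        lam = chi2.ppf(confidence, 2 * failures + 2) / (2 * equiv_device_hours)
        return lam * 1e9   # failures per 1e9 device-hours

    # Placeholder example: 0 failures out of 231 devices, 1,000 h HTOL, AF = 78, 60% confidence
    print(f"{fit_upper_bound(0, 231, 1_000, 78, 0.60):.1f} FIT")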

 

 

 

 

 

 


 

 

HCI (hot carrier injection)

 

HTOL: chip-level reliability test, runs longer than burn-in (BI).

 

NBTI (negative bias temperature instability)

 

TDDB: time-dependent dielectric breakdown, a gate oxide degradation mechanism.

Breakdown field is ~8 MV/cm for thick oxides and increases to >10 MV/cm for thinner oxides.
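For context, the oxide field is just the voltage across the oxide divided by its thickness; a quick check (Python; the voltage and thickness values are illustrative assumptions, not from these notes):

    # Oxide electric field: E = V / t_ox, expressed in MV/cm.
    v_ox = 1.2        # assumed volts across the gate oxide
    t_ox_nm = 2.0     # assumed oxide thickness in nanometres

    t_ox_cm = t_ox_nm * 1e-7              # 1 nm = 1e-7 cm
    e_field_mv_per_cm = (v_ox / t_ox_cm) / 1e6
    print(f"E = {e_field_mv_per_cm:.1f} MV/cm")   # 6.0 MV/cm, below the ~8-10 MV/cm breakdown range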

 

FA: FIB (focused ion beam), voltage contrast

 

 

 

SER

 

 

Soft error rate

Soft error rate (SER) is the rate at which a device or system encounters or is predicted to encounter soft errors. It is typically expressed as either number of failures-in-time (FIT) or mean time between failures (MTBF). The unit adopted for quantifying failures in time is called FIT, equivalent to 1 error per billion hours of device operation. MTBF is usually given in years of device operation. To put it in perspective, 1 year MTBF is equal to approximately 114,077 FIT (approximately 1,000,000,000 / (24 × 365.25)).
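A minimal sketch of the FIT / MTBF conversion described above (Python):

    HOURS_PER_YEAR = 24 * 365.25   # 8,766 h, as in the approximation above

    def mtbf_years_to_fit(mtbf_years):
        """Convert MTBF in years to FIT (failures per 1e9 device-hours)."""
        return 1e9 / (mtbf_years * HOURS_PER_YEAR)

    def fit_to_mtbf_years(fit):
        """Convert FIT back to MTBF in years."""
        return 1e9 / (fit * HOURS_PER_YEAR)

    print(f"1 year MTBF = {mtbf_years_to_fit(1):,.0f} FIT")          # ~114,077 FIT
    print(f"1,000 FIT   = {fit_to_mtbf_years(1_000):,.1f} years MTBF")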

While many electronic systems have an MTBF that exceeds the expected lifetime of the circuit, the SER may still be unacceptable to the manufacturer or customer. For instance, many failures per million circuits due to soft errors can be expected in the field if the system does not have adequate soft error protection. The failure of even a few products in the field, particularly if catastrophic, can tarnish the reputation of the product and company that designed it. Also, in safety- or cost-critical applications where the cost of system failure far outweighs the cost of the system itself, a 1% chance of soft error failure per lifetime may be too high to be acceptable to the customer. Therefore, it is advantageous to design for low SER when manufacturing a system in high-volume or requiring extremely high reliability.

 

 

 

 

What is EM (electromigration) and what are its effects?