## Digital VLSI Testing

Prof. Santanu Chattopadhyay
Department of Electronics and Electrical Communication Engineering
Indian Institute of Technology, Kharagpur

## Lecture - 01 Introduction

Welcome to this course on digital VLSI testing. VLSI design, as we know, is one of the very important parts of system design, because an electronic system consists of IC chips, and these IC chips are fabricated by the VLSI manufacturing process. This process spans everything from specification to design and implementation. Of all these phases, testing is one of the most important, and in almost all the phases you will find some application of testing that is required to ensure the correctness of the system.

During this course we will primarily look into digital VLSI testing. Integrated circuit chips can broadly be divided into two categories, analog chips and digital chips, and of course there are some mixed-mode chips having both analog and digital components. Of these, digital VLSI testing is a very well-established domain. Over the years it has developed a lot: it has a good number of test strategies and algorithms, along with a clear understanding of the problems and how to address them. All of these have become part of digital VLSI testing.

On the other hand, analog testing is also important, because we need it to test analog chips, but it is not as developed as digital testing, and in many cases we take the help of digital testing techniques when testing analog chips. Mixed-mode chips have both analog and digital components, so we have to test both, and many times the algorithms used for digital testing are also used for analog testing. So, we will primarily focus on the digital testing part.

## Introduction to VLSI Testing

- Introduction
- Testing During VLSI Life Cycle
- Test Generation
- Fault Models
- Levels of Abstraction
- Overview of Test Technology

So, in today's lecture we will have an introduction to VLSI testing. We will start with a general introduction, and then we will see where exactly testing lies in the VLSI design life cycle, that is, at which phases we need testing. Then, for testing we need to apply some patterns: test patterns have to be applied to the circuit or chip being tested.

That requires techniques by which we can generate the stimulus for the test operation; these are known as test generation techniques. Then, in a VLSI chip there can be many types of defects. For example, a signal line may have been broken at some point while it was being drawn, it may be shorted with some other line, or the silicon itself may have had some impurity, as a result of which some defect has grown. There can be innumerable types of such defects, and possibly many of them are unknown even today. So, when we try to test a system against all those defects, it is very difficult to reason directly in terms of the defects themselves.

So, what we do instead is talk in terms of faults. A fault is the manifestation of a defect. If two lines are, say, shorted in the manufacturing process, that gives rise to something known as a bridging fault. Similarly, if a line is broken somewhere in between, then depending on the technology, the broken point may be treated as a logic high, a logic low, or maybe a high-impedance point. So, there can be many such types of faults that we can associate with these defects.

So, instead of talking in terms of defects, if we talk in terms of faults, it often makes the analysis simpler, and we can get a good set of algorithms by which those faults can be detected. These come under the broad heading of fault models. Then there are levels of abstraction: we may look at a system at the transistor level, which is the lowest one, or at the logic gate level, or at the RTL level, where combinations of logic gates form modules like adders, subtractors, multipliers and multiplexers, or even at a higher level, where we talk in terms of a system consisting of a number of such chips put together on a board.

So, we may talk in terms of the board level, or, if we integrate all those designs onto a single silicon die, we may get the entire PCB on a single chip, giving rise to the system-on-chip (SoC) type of architecture. The level at which we try to address these faults gives rise to the levels of abstraction. It may be easier to attack faults related to system behavior at a much higher level, whereas when we are talking about, say, individual lines being shorted or open, it may be better to work at a much lower level, such as the transistor level.

So, levels of abstraction will play a very important role in deciding which type of test technique we apply for a given purpose. There are several test technologies, and we will also have an overview of them.

(Refer Slide Time: 06:24)



So, this particular diagram shows how VLSI design has become more and more complex over the years. In the 1960s, it started with small scale integration (SSI), having 10 to 100 transistors per chip. Then, from the 60s to the 70s, it switched over to medium scale integration (MSI), with 100 to 1000 transistors per chip. Then came large scale integration (LSI), having 1000 to about 10000 or slightly more transistors per chip, and after that, as more transistors were incorporated onto a chip, we entered the era of VLSI. So, what has happened over the years? The number of transistors per chip has kept growing. As the VLSI manufacturing process and technology advance, individual transistors become smaller, so the same silicon area can accommodate more transistors. If we can accommodate more transistors, we can build more complex systems and put more functionality into them.

So, our IC chips are becoming smaller and smaller, and the systems we develop around them are also becoming smaller. This is captured by Moore's law, which says that the scale of ICs doubles every 18 months, that is, the number of transistors per unit area doubles every 18 months. This law has more or less been followed till today. Whether some new technology will come up in future to continue this trend or not is a question being debated by VLSI device manufacturers. We will not go into that, but whatever happens, it is accepted that the complexity of VLSI systems will continue to increase.
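To make the doubling rule concrete, the sketch below projects a transistor count forward, taking the 18-month doubling period quoted above as its default. The function name and the starting count are illustrative assumptions, not data from the lecture.

```python
def transistor_count(initial: float, years: float, doubling_months: float = 18.0) -> float:
    """Project a transistor count that doubles every `doubling_months` months."""
    doublings = years * 12.0 / doubling_months
    return initial * 2.0 ** doublings


# Hypothetical starting point: a 1000-transistor chip projected 15 years ahead.
print(transistor_count(1000, 15))
```

Fifteen years is ten 18-month periods, so the count grows by a factor of 2^10 = 1024.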

Now, this increase in complexity poses important challenges to the designers, because the system is now more complex; however, it also becomes a very important aspect for test engineers. If you have only, say, 100 transistors in the entire chip, we can possibly test it exhaustively: we can try to develop some test technique by which each of those 100 transistors is tested individually.

But if we try to do that when we have, say, lakhs (hundreds of thousands) of transistors on the chip, then the testing time itself becomes too high to be accommodated within the overall chip manufacturing time. In fact, about 60 to 70 percent of the time in the overall VLSI design life cycle is spent on testing. If testing is not given proper care, it may happen that at the end, when we test the system for correctness, we find that many of the chips are failing. That may affect the performance of the manufacturing process or lead to some loss in the profit of the company.
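A rough way to see why exhaustive testing stops scaling: a combinational block with n primary inputs has 2^n input combinations. The sketch below assumes, purely for illustration, that a tester applies 10^9 patterns per second (one per cycle at 1 GHz); the function and constant names are hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def exhaustive_test_seconds(n_inputs: int, patterns_per_second: float) -> float:
    """Time to apply all 2**n_inputs combinations to a combinational circuit."""
    return 2 ** n_inputs / patterns_per_second


for n in (10, 32, 64):
    t = exhaustive_test_seconds(n, 1e9)  # assumed 1e9 patterns per second
    print(f"{n} inputs: {t:.3g} s ({t / SECONDS_PER_YEAR:.3g} years)")
```

Under this assumption, 64 inputs already take roughly 585 years, which is why practical testing reasons in terms of modeled faults rather than every input combination.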

So, testing has many complexities, and in this course we will see how the complexity of testing has grown. Apart from the inherent challenge that the number of devices, or transistors, per chip has gone up, there are other issues. For example, the power consumption of the chip has gone up because so many devices are present, so in the testing phase we also need to address the power issue. We also need to address temperature issues: different portions of the chip heat up differently, and as we know, as the temperature of a VLSI chip increases, its delay behavior becomes unpredictable. If different parts of the chip are heated differently, their delay behavior will differ, so signals may reach some parts faster than others, which may lead to timing errors. These are the things that need to be taken care of. So, why is testing so important? Moore's law results from decreasing feature size.

(Refer Slide Time: 11:04)

## Importance of Testing

- Moore's Law results from decreasing feature size (dimensions)
  - from 10s of μm to 10s of nm for transistors and interconnecting wires
- Operating frequencies have increased from 100 kHz to several GHz
- Decreasing feature size increases probability of defects during manufacturing process
  - A single faulty transistor or wire results in faulty IC
  - Testing required to guarantee fault-free products



So, feature sizes have gone from tens of micrometers to tens of nanometers for transistors and interconnecting wires. This is what has happened over the years to keep pace with Moore's law: transistor sizes have gone from the micrometer to the nanometer range and are decreasing further, and interconnecting wires are also shrinking.

On the other hand, operating frequency has increased, from tens of kilohertz and 100 kilohertz to several gigahertz. This decreasing feature size increases the probability of defects during the manufacturing process. What we essentially mean is this: even assuming that the manufacturing process is flawless, the silicon wafer on which we make the design will have impurities, and that cannot be controlled. We cannot have a 100 percent pure silicon wafer.

So, as we put more and more devices onto the chip, the possibility of some of them becoming faulty increases, because the density is increasing. The probability of defects during the manufacturing process therefore increases, and a single faulty transistor or wire results in a faulty IC. If we talk in terms of the correctness of the IC, this is the extreme point: a single transistor failing or a single wire failing may have to be considered a failure of the IC. Of course, there are techniques such as fault-tolerant design, but we are not going into that.
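The effect of density on defects can be illustrated with a deliberately simplified yield model, which is an assumption here and not taken from the lecture: each of N devices fails independently with probability p, and, as stated above, any single failure makes the chip faulty. The fraction of fault-free chips is then (1 - p)^N.

```python
def chip_yield(p_defect: float, n_devices: int) -> float:
    """Fraction of fault-free chips under an independent-failure model.

    Assumes each device fails independently with probability p_defect and
    that a single faulty device makes the whole chip faulty.
    """
    return (1.0 - p_defect) ** n_devices


# Same hypothetical per-device defect probability, growing device counts.
for n in (10**3, 10**6, 10**9):
    print(n, round(chip_yield(1e-9, n), 4))
```

With the per-device probability held fixed, yield drops from essentially 100 percent at a thousand devices to about 37 percent at a billion.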

Apart from such techniques, this is true in general. So, testing is required to guarantee a fault-free product. We want the product to be fault free, and that has to be ensured by testing. Even if the manufacturing process were 100 percent correct, we could not guarantee that the fabricated chip is fault free, because of impurities in the silicon. In addition, there are defects in the manufacturing process, process variation, etc.; we can never make the manufacturing process 100 percent correct either.

(Refer Slide Time: 13:40)



So, testing becomes an important issue. There is a rule of ten, which says that the cost to detect a faulty IC increases by an order of magnitude as we move from device to PCB to system to field operation. When we have manufactured a transistor on the silicon die, we can test it. Say that transistor is faulty: we may think about replacing it by another transistor. There may be spare transistors to which we make the connection, ignoring the faulty one, so the problem can be rectified in the manufacturing process itself. Suppose we bypass that and take the device up to the PCB level. At the PCB level we have a number of IC chips mounted, and if we test at that level, the test is confined to chip-level granularity: a chip is either correct or faulty, and there is no question of using only the part of a chip that works correctly. That is how PCB-level testing can be done.

So, if we can detect some problem at the device level, we can rectify it with minimum effort. At the PCB level, we have to replace the chip, or, if we think about fault tolerance, we have to duplicate the entire chip, so it becomes costly. Now, coming to the system level, a system consists of a number of such PCBs. For example, if you open up the cabinet of a computer, you will see a number of PCBs mounted inside.

If something goes wrong there, we normally do not have provision even to change at the chip level, so we have to change at the PCB level: if the PCB is faulty, we change the PCB. That is how testing works at the system level. Then, once a system goes into the field, there are many systems which we need to test while they are in operation, particularly safety-critical systems. There we have to do the testing, and if the system fails at that point, possibly we need to replace it by another system altogether.
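The rule of ten can be written as a one-line cost model: if catching a faulty IC at the device level costs one unit, each later stage multiplies that cost by ten. The stage names follow the lecture; the unit cost and the function name are illustrative assumptions.

```python
STAGES = ("device", "PCB", "system", "field")


def detection_cost(stage: str, device_level_cost: float = 1.0) -> float:
    """Relative cost of catching a faulty IC at `stage` under the rule of ten."""
    return device_level_cost * 10 ** STAGES.index(stage)


for stage in STAGES:
    print(f"{stage:>6}: {detection_cost(stage):g}")
```

A fault that escapes all the way to field operation costs about a thousand times as much to catch as one found at the device level.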

So, the cost increases as we go from the lowest level of devices to field operation, and testing has to be performed at all these stages to minimize the cost or overhead. Testing is also used during manufacturing to improve yield. It may happen that in one production lot, many of the chips produced fail the testing process. That possibly points to a problem in the manufacturing process itself. So, to check for problems in the manufacturing process, we need to do testing at the manufacturing stage itself; this is known as failure mode analysis, which tells us what types of failures the manufacturing process is producing.
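The yield side of this can be sketched as a simple check over test results grouped by production lot: a lot whose pass rate falls well below the others hints at a problem in the manufacturing process and becomes a candidate for failure mode analysis. The lot IDs, the data, and the 90 percent threshold below are all hypothetical.

```python
def flag_suspect_lots(results: dict[str, list[bool]], min_yield: float = 0.9) -> list[str]:
    """Return the IDs of lots whose fraction of passing chips is below min_yield."""
    suspect = []
    for lot_id, passed in results.items():
        lot_yield = sum(passed) / len(passed)  # True counts as 1, False as 0
        if lot_yield < min_yield:
            suspect.append(lot_id)
    return suspect


# Hypothetical per-chip test outcomes: True means the chip passed.
results = {
    "lot_A": [True] * 95 + [False] * 5,
    "lot_B": [True] * 60 + [False] * 40,
}
print(flag_suspect_lots(results))
```

Here only `lot_B` (60 percent yield) is flagged; the threshold would in practice depend on the process and product.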

On the other hand, in field operation we also need testing for fault-free system operation. For example, safety-critical systems deployed for medical purposes need to be tested while they are in operation. So, during field operation also we need to ensure that the system is fault free, and if a problem is detected, it has to be repaired as soon as the fault is detected. There may be several strategies for that, depending on the system designer: some duplication, some replacement, or some bypass mechanism in the operation of the system.

(Refer Slide Time: 17:46)



So, the testing process looks like this. We have some circuit under test, which we normally call the CUT. This is the system we want to test: at the very lowest level it may be transistors and wires, and at the highest level it may be a system with some inputs and some outputs. To test this system, we apply some input pattern to the CUT; as a result, the circuit produces an output, and we need to check whether that response is correct or not. So, testing typically consists of applying a set of test stimuli. The circuit may have several types of faults in it, and a single input stimulus may not be sufficient for checking all of them; one stimulus may excite only, say, 2 or 3 faults in the circuit. So, to test for all the faults, we need a set of stimuli, and all of them have to be applied.

Depending on the type of faults that we are going to cover, or detect, the stimuli will also change; they depend on how they can excite those faults. These stimuli are applied to the CUT, and the output responses are analyzed. If a response is incorrect, the CUT is assumed to be faulty; if it is correct, the CUT is assumed to be fault free.

Now, there may be several catches in this. Suppose I apply, say, 100 test stimuli and all 100 pass, that is, the circuit produces the correct, desired output for every one of them. Does it mean that my circuit is absolutely correct, with no problem in it? That cannot be guaranteed. It only tells us that the types of faults which this set of stimuli could detect are not present in this system. That is the level of confidence this particular testing session gives us.
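The apply-and-compare loop, and its catch, can be sketched in a few lines. Here the CUT is modeled as a Python function, the expected responses come from a fault-free reference model, and the particular circuit (a 3-input majority function with one product term missing) is a hypothetical example.

```python
from itertools import product


def golden(a: int, b: int, c: int) -> int:
    """Fault-free reference model: 3-input majority function."""
    return (a & b) | (b & c) | (a & c)


def faulty_cut(a: int, b: int, c: int) -> int:
    """Hypothetical faulty CUT: the (a AND c) term is missing."""
    return (a & b) | (b & c)


def run_test(cut, reference, stimuli) -> bool:
    """Apply each stimulus and compare the response with the expected one; True means pass."""
    return all(cut(*s) == reference(*s) for s in stimuli)


partial_set = [(0, 0, 0), (1, 1, 0), (0, 1, 1), (1, 1, 1)]
full_set = list(product((0, 1), repeat=3))

print("partial set passes faulty CUT:", run_test(faulty_cut, golden, partial_set))
print("full set passes faulty CUT:", run_test(faulty_cut, golden, full_set))
```

The partial set passes the faulty circuit because none of its stimuli excites the missing term; only the input (1, 0, 1) does, and that input is in the full set.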

So, if we want a very high confidence about the correctness of the system, we need to apply a sufficiently large set of test stimuli.

(Refer Slide Time: 20:49)



But in the process, the time requirement for testing will go up significantly. In fact, if we plot the size of the stimulus set, that is, the number of test patterns we are applying, on one axis, and on the other axis how much confidence we can have about the correctness of the system, then it normally shows a behavior like this.

Say this is the 100 percent line. If you apply only a very small number of test patterns, the confidence you get is only this much; you may be able to say that you have tested for only about 10 percent of the faults, so you know the circuit is secure against those 10 percent of faults. As you try to get more and more confidence, you have to spend more time applying test patterns and analyzing the responses.
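One common way to put a number on this confidence is fault coverage: the fraction of modeled faults that the applied patterns detect. The sketch below tracks coverage as patterns are applied in order; the fault names and the faults each pattern detects are made-up data chosen to show the flattening of the curve.

```python
def cumulative_coverage(detects: list[set[str]], all_faults: set[str]) -> list[float]:
    """Fault coverage (detected faults / total faults) after each pattern, in order."""
    detected: set[str] = set()
    curve = []
    for faults_hit in detects:
        detected |= faults_hit & all_faults
        curve.append(len(detected) / len(all_faults))
    return curve


all_faults = {f"f{i}" for i in range(10)}
# Hypothetical: the set of faults each successive test pattern detects.
detects = [
    {"f0", "f1", "f2", "f3", "f4", "f5"},
    {"f4", "f5", "f6", "f7"},
    {"f1", "f7", "f8"},
    {"f2", "f8"},
    {"f9"},
]
curve = cumulative_coverage(detects, all_faults)
print(curve)
```

The first pattern alone reaches 60 percent, but the last 10 percent needs two more patterns, one of which adds nothing new.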

Accordingly, the cost of the system goes up. Suppose I have planned to produce some system, and the marketing team tells us that the product should come within, say, 6 months for maximum benefit. If the product can come within 6 months, the profit will be high; if it comes after 6 months, the profit may start decreasing. Whether it eventually becomes 0 depends on the product.

Now, if we want to put more time into testing, these 6 months will not be sufficient. We need to spend more time on it, and as a result the benefit or profit that we can get by marketing the product will decrease.

So, we need very efficient testing techniques, so that the product can be launched into the market within 6 months while, at the same time, we have very good confidence about the correctness of the system. Normally we see that for similar types of products with similar functionalities, the costs vary widely. One of the major reasons is that much less testing time is dedicated to the cheaper products; that saves time to market, and such products can be brought into the market at a much lower cost. But at the same time, the confidence in such a product will be low, because it has possibly not been tested thoroughly for all the functionalities of the system. So, testing is what gives us more confidence about the correctness of the system.



In the VLSI development life cycle, testing comes in at various stages. The VLSI design cycle starts with the specification of the design, where we state what we expect the system to do using some formal or informal technique. We may write it in a hardware description language such as Verilog, or in some other specification technique from which it is converted to a behavioral or structural description.

After the specification is given, it passes through a design process, which gives us the corresponding design; basically, it passes through some CAD tools from which we get the design. Now, at this point, what is the guarantee that the functionality I specified has been translated properly into the design? A typical example: suppose I want to design a counter, say a mod-16 counter. In the specification, we can write a behavior in which count starts at 0 and is incremented (count = count + 1) at every step, and once count has reached 15, the next increment resets it to 0. That way we can write it in some hardware description language.

Now, when this is converted into a counter design, it will possibly have four flip-flops and the associated gates realizing the same functionality. What is the guarantee that the design consisting of these flip-flops and gates really has the behavior given in the counter's specification? This is checked by the verification process.

The verification process tries to prove some properties of the specification on the design. If there is a design error, those proofs will fail, and when a proof fails, the verification tools can generate a counterexample showing why the property is false. This way, the verification phase takes care of design errors. So, design verification targets design errors, and corrections are made prior to fabrication.
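For a design as small as the mod-16 counter, the specification-versus-design check can be illustrated by an exhaustive equivalence check over all 16 states, reporting a counterexample on mismatch. This brute-force approach is only a toy substitute for the property-proving tools described above. The specification is the behavioral `(count + 1) mod 16`; the design models four flip-flops with synchronous binary counter toggle logic (bit i toggles when all lower-order bits are 1). All function names are hypothetical.

```python
def spec_next(count: int) -> int:
    """Behavioral specification: mod-16 up counter (0, 1, ..., 15, 0, ...)."""
    return (count + 1) % 16


def design_next(count: int) -> int:
    """Structural model: four flip-flops q0..q3 with toggle (T) logic."""
    q = [(count >> i) & 1 for i in range(4)]
    t = [1, q[0], q[0] & q[1], q[0] & q[1] & q[2]]  # toggle enable per flip-flop
    nxt = [qi ^ ti for qi, ti in zip(q, t)]
    return sum(bit << i for i, bit in enumerate(nxt))


def find_counterexample(design, spec, n_states: int = 16):
    """Return the first state where design and spec disagree, or None if they agree."""
    for state in range(n_states):
        if design(state) != spec(state):
            return state
    return None


print(find_counterexample(design_next, spec_next))
```

A buggy design that saturates at 15 instead of wrapping, such as `lambda c: min(c + 1, 15)`, yields the counterexample state 15.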

If we can detect an error in the design process itself, it can be corrected, and much effort, in terms of testing, can be saved. After the design phase is over, the design goes to the fabrication stage, and now no more correction to the design is possible. In fabrication, the individual circuits are made on the silicon wafer: we fabricate the transistors, the links between them, etc.

At that point, we need to check whether the transistors have been fabricated properly or not. As I mentioned, if we can detect at this early stage that some transistor on the wafer is faulty, we can remove that transistor from our connection pattern and maybe use a substitute transistor for the purpose. So, at the fabrication level we have to do testing, which is the wafer-level test. The basic problem with the wafer-level test is that at that point the wafers are exposed: they do not have any protection, such as packaging. So, they cannot tolerate voltage fluctuations, very high voltage, very high power, etc.

As a result, the wafer-level test is very delicate, and we need to take special care while doing it. But we cannot avoid this stage, because it is required to check the correctness of the individual devices that are fabricated. Once we have confidence that the transistors are all fabricated properly, the chip moves to the packaging stage, where it is bonded. At that point we also need to check whether the packaging has been done correctly, that is, whether the pins that are created are connected properly to the system or not. So, a test is also necessary at the packaging level. By the end of fabrication, then, the individual devices have been checked.

Now, at the packaging stage, the pins are attached, and then we check whether the packaging has been done properly, so at that stage the functionalities also have to be checked. After the chip has been packaged, we have to check the overall operation of the chip. This is done for quality assurance, to see whether the chip meets all the requirements, and it is done by the final testing. All these remaining tests, apart from verification, target manufacturing defects: the fabrication, packaging and final test stages all check for manufacturing defects.

As I said at the very beginning, a defect is a flaw or physical imperfection that can lead to a fault. For example, if two lines are shorted, we say that a bridge has been created between the two lines, and a bridge is one type of fault. Similarly, a line may get shorted with the power supply line VCC; we then say that the line is stuck-at-1, which is another type of fault, although the actual physical problem is the short between the power supply line and that particular line. Similarly, if a line is open, that is, there is a break at some point, it may be treated as permanently high, permanently low, or in a high-impedance state, depending on the technology. That is also a physical defect, but in terms of testing we will model it as a fault.
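Modeling defects as faults is what makes them tractable: a line shorted to VCC becomes a single "line stuck-at-1" in a netlist, which can be simulated directly. The sketch below uses a hypothetical two-gate circuit, y = (a AND b) OR c with internal line n1 = a AND b, injects a stuck-at value on a named line, and lists the input vectors whose output differs from the fault-free circuit, that is, the vectors that detect that fault.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Vector = Tuple[int, ...]


def simulate(vec: Vector, stuck: Optional[Dict[str, int]] = None) -> int:
    """Evaluate y = (a AND b) OR c; `stuck` forces named lines to 0 or 1."""
    stuck = stuck or {}

    def line(name: str, value: int) -> int:
        # A stuck-at fault overrides whatever value the line would carry.
        return stuck.get(name, value)

    a, b, c = line("a", vec[0]), line("b", vec[1]), line("c", vec[2])
    n1 = line("n1", a & b)
    return line("y", n1 | c)


def detecting_vectors(fault: Dict[str, int]) -> List[Vector]:
    """All input vectors for which the faulty output differs from the fault-free one."""
    return [v for v in product((0, 1), repeat=3) if simulate(v) != simulate(v, fault)]


print("n1 stuck-at-0:", detecting_vectors({"n1": 0}))
print("n1 stuck-at-1:", detecting_vectors({"n1": 1}))
```

Stuck-at-0 on n1 is detected only by (1, 1, 0): the vector must drive n1 to 1 to excite the fault and hold c at 0 so that the difference reaches y.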

So, we will continue in the next lecture on this.