September 17, 2024

COA (Computer Organization & Architecture)

Computer organization and Architecture
Share :

COA Question and Answer


(Suggest: Find the question by search page and keep refreshing the page for updated content)

Q 1. For the single bus organization, write the complete control sequence for the instruction showing all the six steps: Move (R1), R2 that i.e [R1] -> R2

Answer

the complete control sequence for the single bus organization to execute the Move (R1), R2 instruction:

Step 1: Instruction Fetch (IF)

  • The program counter (PC) is loaded with the address of the next instruction to be executed.
  • The instruction is fetched from memory and stored in the Instruction Register (IR).

Step 2: Instruction Decode (ID)

  • The instruction in the IR is decoded to determine the opcode and operand(s).
  • In this case, the opcode is “Move” and the operands are R1 and R2.

Step 3: Register Fetch (RF)

  • The contents of the R1 register are loaded onto the bus.

Step 4: ALU (Arithmetic Logic Unit) Operations

  • None required for this instruction.

Step 5: Data Transfer (DT)

  • The data on the bus is transferred to the R2 register.

Step 6: Register Write (RW)

  • The value in the R2 register is written back to memory or cache if necessary.
  • And that’s it! This sequence of steps will execute the Move (R1), R2 instruction in a single bus organization.



Q 2. A given processor requires 1000 cycles to perform a context switch and start an interrupt handler and 1500 number of cycles to switch back to the program that was running when the interrupt occurred, or 500 cycles to poll an I/O device. An I/O device attached to that processor makes 200 requests per second, each of which takes 12,500 cycles to resolve once the handler has been started. By default, the processor polls every 0.25 ms if it is not using interrupts

a. How many cycles per second does the processor spend handling I/O from the device if interrupts are used?

Answer. If interrupts are used, the processor does not need to poll for I/O device requests. Instead, when the device makes a request, the processor will spend 1000 cycles to perform a context switch and start an interrupt handler. Once the handler is started, the device’s request will take 12,500 cycles to resolve. After the handler is done, the processor will switch back to the program that was running before the interrupt occurred, which takes 1500 cycles.

Therefore, the total time spent on I/O handling per request is 1000 + 12,500 + 1500 = 15,000 cycles.

The I/O device makes 200 requests per second, so the total number of cycles per second spent on I/O handling is:

200 requests/second * 15,000 cycles/request = 3,000,000 cycles/second

Therefore, if interrupts are used, the processor spends 3,000,000 cycles per second handling I/O from the device.


b. How many cycles per second are spent on I/O if polling is used (include all polling attempts)? Assume the processor only polls during time slices ‘when user programs are not running, so do not include any context-switch time in your calculation

Answer. If polling is used, the processor will spend 500 cycles each time it polls the I/O device, regardless of whether there is a request or not. The device makes 200 requests per second, so the processor will need to poll the device at least 200 times per second to ensure that no requests are missed.

The processor polls every 0.25 ms, which is 250,000 cycles. Therefore, the number of cycles spent on each polling attempt is: 250,000 cycles * 1 polling attempt / 1 poll = 250,000 cycles/poll

To determine the total number of cycles spent on I/O handling per second, we need to consider the time spent on both polling and handling requests.

Assuming that the processor only polls during time slices when user programs are not running, and that each time slice is 10 ms long, the total number of polling attempts per second is: 1 time slice / 10 ms * 1000 ms / 1 s * 1 poll / 0.25 ms = 4000 polls/second

However, we need to account for the fact that the processor only needs to poll 200 times per second to handle all requests. Therefore, the number of extra polls per second is: 4000 polls/second – 200 requests/second = 3800 extra polls/second

The total number of cycles spent on polling is: 250,000 cycles/poll * 4000 polls/second = 1,000,000,000 cycles/second

The total number of cycles spent on handling requests is: 200 requests/second * 12,500 cycles/request = 2,500,000 cycles/second

The total number of cycles spent on extra polls is: 3800 extra polls/second * 500 cycles/poll = 1,900,000 cycles/second

Therefore, the total number of cycles per second spent on I/O handling if polling is used is: 1,000,000,000 cycles/second + 2,500,000 cycles/second + 1,900,000 cycles/second = 1,004,400,000 cycles/second


c. How often would the processor have to poll for polling to take as many cycles per second as interrupts.

Answer. To determine how often the processor would have to poll for polling to take as many cycles per second as interrupts, we can set the total number of cycles spent on I/O handling using polling equal to the number of cycles spent on I/O handling using interrupts:

1,004,400,000 cycles/second = 3,000,000 cycles/second

Solving for the polling frequency, we get:

polling frequency = 3,000,000 cycles/second / (250,000 cycles/poll + 12,500 cycles/request) = 11



Q 3. A nonpipelined processor has a clock rate of 25 GHz and an average CPI (cycles per instruction) of 4 An upgrade to the processor codices a five-stage pipeline However, due to internal pipeline delays, such as latch delay, the clock rate of the new processor has to be reduced to 2 GHz

a. What is the speedup achieved for a typical program?

Answer. For the non-pipelined processor, the execution time of a program can be calculated as:

Execution Time = CPU Clock Cycles x CPI / Clock Rate
Execution Time = CPI / Clock Rate = 4 / (25 x 10^9) = 0.16 ns

For the pipelined processor, the execution time can be calculated as:

Execution Time = Pipeline Depth x CPI / Clock Rate
Execution Time = 5 x 4 / (2 x 10^9) = 10 ns

Therefore, the speedup achieved for a typical program is:

Speedup = Execution Time (Old Processor) / Execution Time (New Processor)
Speedup = 0.16 / 10 = 0.016

So, the new processor is 62.5 times faster than the old processor.



b. What is the MIPS rate for each processor?

Answer. The MIPS rate for each processor can be calculated as:

MIPS = Clock Rate / CPI x 10^6

For the non-pipelined processor, the MIPS rate is:

MIPS = 25 x 10^9 / 4 x 10^6 = 6250

For the pipelined processor, the MIPS rate is:

MIPS = 2 x 10^9 / 4 x 10^6 = 500

Therefore, the non-pipelined processor has a MIPS rate of 6250 and the pipelined processor has a MIPS rate of 500.



Q 4. A microprocessor provides an instruction capable of moving a string of bytes from one area of memory to another. The fetching and initial decoding of the instruction takes 20 clock cycles. Thereafter, it takes 30 clock cycles to transfer each byte. The microprocessor is clocked at a rate of 20 GHZ

a. Determine the length of the instruction cycle for the case of a string of 32 bytes.

Answer

a. The length of the instruction cycle for the case of a string of 32 bytes is given by:

  • Time taken to initiate the transfer + Time taken to transfer all 32 bytes
    = 20 + (30 x 32)
    = 980 clock cycles
  • The duration of one clock cycle is given by:
    1 / (20 GHz)
    = 0.05 nanoseconds
  • Therefore, the length of the instruction cycle is:
    980 x 0.05 nanoseconds
    = 49 microseconds

So, the length of the instruction cycle for the transfer of 32 bytes is 49 microseconds.


b. What is the worst-case delay for acknowledging an interrupt if the instruction is non-interruptible?

Answer. If the instruction is non-interruptible, the worst-case delay for acknowledging an interrupt is the time taken to complete the current instruction cycle. In this case, the instruction cycle takes 980 clock cycles, which corresponds to a time of 49 microseconds. Therefore, the worst-case delay for acknowledging an interrupt is 49 microseconds.


c. Repeat part (b) assuming the instruction can be interrupted at the beginning of each byte transfer.

Answer. If the instruction can be interrupted at the beginning of each byte transfer, then the worst-case delay for acknowledging an interrupt is the time taken to complete the current byte transfer, which is 30 clock cycles or 1.5 microseconds. Since there are 32 bytes to transfer, there could be up to 32 interruptions, so the worst-case delay for acknowledging an interrupt would be 32 x 1.5 microseconds, which is 48 microseconds.



Q 5. Identify all data dependencies in the following code, assuming that we are using the 5-stage MIPS pipelined data path. Which dependencies can be resolved via forwarding?
Here $2 <- $5 + $4
add $2,$5,$4
add $4,$2,$5
sw $5,100($2)
add $3,$2,$4

Answer.

There are two types of data dependencies that can occur in a pipelined processor: data hazards and control hazards. Data hazards occur when an instruction depends on the result of a previous instruction that has not yet completed. Control hazards occur when the flow of instructions is altered, such as by a branch or jump instruction.

In the given code, there are three instructions that have data dependencies:

  1. The third instruction (sw) depends on the result of the first instruction (add). This is a RAW (read-after-write) data hazard.
  2. The fourth instruction (add) depends on the result of the first instruction (add). This is also a RAW data hazard.
  3. The fourth instruction (add) depends on the result of the third instruction (sw). This is a WAR (write-after-read) data hazard.

However, both of the RAW data dependencies can be resolved via forwarding. Forwarding, also known as bypassing, allows the result of an instruction to be forwarded directly to a subsequent instruction that depends on it, without waiting for the result to be written back to the register file.

Therefore, only the WAR data dependency cannot be resolved via forwarding, and the fourth instruction will need to wait until the third instruction completes its store operation before it can read the updated value of $2 from the register file.

Note that there are no control hazards in this code since there are no branch or jump instructions.



Q 6.
a. To calculate the time consumed in sequential access, we need to consider the time it takes to read each sector and the time it takes to move from one track to the next. Since we are reading 3000 sectors and each sector is 512 bytes, the total amount of data to be read is:

Answer. 3000 sectors * 512 bytes/sector = 1,536,000 bytes

The disk’s rotation speed is 15,000 rpm, which means it completes one revolution every: 1/15,000 rpm * 60 s/min = 0.004 ms

The average rotational delay is 2 ms, so the average time it takes for the disk to rotate to the desired sector is: 2 ms + 0.5 * 0.004 ms = 2.002 ms

The disk’s average seek time is 5 ms, so the average time it takes to move from one track to the next is: 5 ms

To read one sector, the total time required is the sum of the rotational delay, the seek time, and the time it takes to transfer the data: 2.002 ms + 5 ms + (512 bytes / (15000 rpm / 60 s/min) / 2) = 2.002 ms + 5 ms + 0.0001717 ms = 7.0021717 ms

Therefore, the time consumed in sequential access is: 3000 sectors * 7.0021717 ms/sector = 21.0065151 s


b. To calculate the time consumed in random access, we need to consider the time it takes to move the read/write head to the desired sector, as well as the time it takes to read the sector. Since the sectors are being accessed randomly, we need to factor in the average seek time for each access.

Answer. Assuming that the access to each sector is independent, the total time consumed in random access can be calculated as follows:

Average seek time for each sector: 5 ms
Average rotational delay for each sector: 2 ms + 0.5 * 0.004 ms = 2.002 ms
Time to transfer 512 bytes: (512 bytes / (15000 rpm / 60 s/min) / 2) = 0.0001717 ms
Total time for each sector: 5 ms + 2.002 ms + 0.0001717 ms = 7.0021717 ms
Total time for all 3000 sectors: 3000 sectors * 7.0021717 ms/sector = 21.0065151 s
Therefore, the time consumed in random access is also 21.0065151 s, which is the same as the time consumed in sequential access.



Q . A 4 way set associative cache has a block size of 8 bytes. The cache can accommodate a total of 2048 bytes. The non memory sure that is cacheable is 128K Bytes. Compute the following.

a. How many lines are there per set?

Answer.  The cache has a total of 2048 bytes (128 KB = 128 x 1024 bytes). Since the block size is 8 bytes, the number of blocks in the cache is: 2048 bytes / 8 bytes per block = 256 blocks

The cache is 4-way set associative, which means there are 4 blocks in each set. Therefore, the number of sets in the cache is: 256 blocks / 4 blocks per set = 64 sets

Since there are 64 sets and the cache is 4-way set associative, there are: 4 lines per set


b. Number of bits needed to address the main memory?

Answer. To address the main memory, we need to calculate the number of bits needed to represent the address space of the memory. The cache can accommodate 2048 bytes, which can be represented using 11 bits (2^11 = 2048). Since the block size is 8 bytes, the offset within each block can be represented using 3 bits (2^3 = 8). The remaining bits in the address are used to identify the set within the cache and the tag bits. Since there are 64 sets in the cache, we need 6 bits to represent the set index (2^6 = 64). The tag bits are the remaining bits in the address:

Number of bits needed = 11 bits for the byte offset + 6 bits for the set index + tag bits

Tag bits = Total bits in address – 11 bits for byte offset – 6 bits for set index
= 32 – 11 – 6
= 15 bits

Therefore, a total of 32 bits are needed to address the main memory.


c How many lines are there in cache?

The cache has a total of 256 blocks, each of size 8 bytes. Therefore, the total size of the cache is:

256 blocks x 8 bytes per block = 2048 bytes


d. How many sets are there in the cache ?

As calculated earlier, the cache is 4-way set associative with 64 sets. Therefore, there are 64 sets in the cache.


Q 8. A computer system contains separate data cache and instruction cache of 16K8 capacity each organized as direct mapped cache. The mess penalty for instruction cache is 100 ns and for the data cache is 200 ns The CPU clock rate is 200MHz. The miss rates are 0.8% for instruction cache and 3.4% for data cache and the data access occurs once for every 3 instructions The machine has a tine CPI of 1. Calculate the total penalty in terms of CPI.

Answer.

To calculate the total penalty in terms of CPI, we need to first calculate the effective access time (EAT) for each cache.

For the instruction cache, the miss penalty is 100 ns and the miss rate is 0.8%, so the EAT is:

EAT_Instruction = Hit Time + Miss Rate x Miss Penalty = 1 cycle + 0.008 x 100 ns = 1 cycle + 0.8 ns = 1.8 ns

For the data cache, the miss penalty is 200 ns and the miss rate is 3.4%, so the EAT is:

EAT_Data = Hit Time + Miss Rate x Miss Penalty = 1 cycle + 0.034 x 200 ns = 1 cycle + 6.8 ns = 7.8 ns

Since the data access occurs once for every 3 instructions, the average memory access time per instruction can be calculated as:

Memory Access Time per Instruction = (2/3) x EAT_Instruction + (1/3) x EAT_Data = (2/3) x 1.8 ns + (1/3) x 7.8 ns = 3.8 ns

Therefore, the total penalty in terms of CPI is:

Total Penalty in CPI = Memory Access Time per Instruction / Clock Cycle Time = 3.8 ns / (1 / 200 MHz) = 0.76 cycles

So the total penalty in terms of CPI is 0.76 cycles.


For More Updates Join Our Channels :