CPU Organizations
- At the core of every computer chip lies the CPU, where the arrangement and structure of registers play a
central role. In this unit, we'll dive deep into the intricacies of CPU design, specifically examining how
registers are organized and structured.
Types of CPU Organizations
- Single Accumulator Organization
- General Register Organization
- Stack organization
Single Accumulator Organization (Not Important)
- We are familiar with the general format of an instruction, which specifies how data is stored and
manipulated. An instruction typically consists of a mode field, an opcode, and an operand.
- Mode: Specifies the addressing mode.
- Opcode: Indicates the operation to be performed.
- Operand: Contains the address of the data (or, in some modes, the data itself).
This instruction format is employed in basic computer architectures.
- Now, let's delve into the Single Accumulator Organization: Basic computers often adopt the
single accumulator organization, where the accumulator serves as a dedicated register.
- Operations are executed within the Arithmetic Logic Unit (ALU), which resides inside the Central
Processing Unit (CPU). The CPU is directly linked to the register, ensuring high-speed data
transfer.
- The input data for the ALU is sourced from the accumulator register, and the result of the ALU
calculation is then stored back in the accumulator.
- Single Accumulator Organization is chosen when cost-effectiveness is a priority. By minimizing
the number of registers, we reduce the overall system cost.
- Registers are the fastest form of memory, and connecting them directly to the CPU ensures rapid
data exchange, contributing to high system performance. The more registers used, the higher the
cost. To achieve cost-effectiveness, a single accumulator register is employed.
- Definition of Single Accumulator Organization: In this architecture, a single
accumulator register is designated for arithmetic and logic operations. It optimizes cost by
limiting the number of registers, emphasizing efficiency in simple computing systems.
General Architecture of Single Accumulator Organization
- In this architecture, we have a Program Counter (PC) responsible for storing addresses, and it
is connected to the memory. Additionally, there is an Accumulator (AC) connected to the
Arithmetic Logic Unit (ALU).
- The data stored in memory is retrieved and transferred to the Instruction Register (IR), from
where it is moved to the Accumulator (AC) and subsequently processed in the Arithmetic Logic
Unit (ALU).
- The ALU and memory components are interconnected through a common bus, facilitating the exchange
of data between them.
This type of organization supports single-address instructions. As an example, consider the
instruction sequence for the operation C = A + B:
- LDA A: Load the content of memory location A into the Accumulator (AC).
- ADD B: Add the content of memory location B to the content of the Accumulator
(AC).
- STA C: Store the content of the Accumulator (AC) into the memory location
designated for C.
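The three-instruction sequence above can be modeled with a tiny Python sketch of an accumulator machine (the memory labels, values, and mnemonic spellings here are illustrative, not a real ISA):

```python
# Minimal sketch of a single-accumulator machine executing C = A + B.
# Memory labels "A", "B", "C" and their values are made-up for illustration.
memory = {"A": 5, "B": 7, "C": 0}
ac = 0  # the single accumulator register

def execute(instr, addr):
    global ac
    if instr == "LDA":    # load memory[addr] into AC
        ac = memory[addr]
    elif instr == "ADD":  # add memory[addr] to AC
        ac = ac + memory[addr]
    elif instr == "STA":  # store AC back into memory[addr]
        memory[addr] = ac

for instr, addr in [("LDA", "A"), ("ADD", "B"), ("STA", "C")]:
    execute(instr, addr)

print(memory["C"])  # 5 + 7 = 12
```

Note that every arithmetic step routes through the single accumulator, which is exactly the cost-saving constraint described above.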
General Register Organization
- To understand General Register Organization, it's essential to grasp the major components within
a CPU:
- Storage Components: These include registers and flip-flops, serving as
temporary storage for data.
- Execution Components: The Arithmetic Logic Unit (ALU) is responsible for
carrying out calculations and logical operations.
- Transfer Components: The bus facilitates the transfer of data between
storage and execution components.
- Control Component: The control unit oversees and directs the functioning of
other components within the CPU.
- Memory locations play a crucial role in storing various data types such as pointers, counters,
return addresses, temporary results, and partial products. However, accessing memory is a
time-consuming task. To enhance efficiency, intermediate values are stored in processor
registers. These registers are interconnected through a common bus system, allowing seamless
communication not only for direct data transfer but also for coordinating various
microoperations.
- Definition of General Register Organization: In computing, General Register
Organization refers to the systematic arrangement and utilization of registers within the CPU.
These registers serve as high-speed, temporary storage for data and play a vital role in
enhancing computational efficiency by minimizing the need for frequent memory access.
A Bus Organization for Seven CPU Registers
- The depicted bus organization features seven CPU registers, and its functionality is detailed as
follows:
- The output of each register is linked to two multiplexers (MUX), both of which play a crucial
role in transferring register data into the Arithmetic Logic Unit (ALU).
- Two buses, A and B, are utilized for data transfer. The selection lines in each multiplexer
determine whether to choose data from a register or from input data. Data is transmitted to the
ALU via buses A and B.
- The OPR (Operation) signal serves to define the type of operation to be executed by the ALU.
- The result of the operation conducted by the ALU can be directed to other units within the
system or stored in any of the processor registers.
- A decoder is employed to select the register where the result will be stored. The decoder
activates one of the register load inputs, specifying the destination register for storing the
result.
Example: Suppose we want to perform the operation R1 ← R2 + R3.
- To carry out this operation, the Control Unit generates the following signals (the control word).
Control Word
- A control word, designed for the aforementioned CPU organization, consists of four fields as
illustrated below:
- The three bits of SELA are dedicated to transferring the contents of a register onto
BUS A.
- The three bits of SELB are assigned to transferring the contents of a register onto BUS
B.
- The three bits of SELD are utilized for selecting a destination register. This
facilitates the decision of whether to store the result in a register or to transmit it
outside the ALU.
- The five bits of OPR define the type of operation to be performed by the ALU. This field
governs the arithmetic or logical operation executed by the ALU based on the specified
opcode.
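For the example R1 ← R2 + R3, the four fields can be packed into one 14-bit word. Below is a minimal Python sketch, assuming illustrative register codes (R1 = 001, R2 = 010, R3 = 011) and an assumed 5-bit ADD opcode; the field widths (3 + 3 + 3 + 5 bits) follow the text:

```python
# Sketch: packing the four control-word fields for R1 <- R2 + R3.
# Register codes and the ADD opcode are assumed values for illustration.
SELA = 0b010  # source A = R2
SELB = 0b011  # source B = R3
SELD = 0b001  # destination = R1
OPR = 0b00010  # assumed ALU code for ADD

# Layout: [ SELA(3) | SELB(3) | SELD(3) | OPR(5) ] = 14 bits total
control_word = (SELA << 11) | (SELB << 8) | (SELD << 5) | OPR
print(f"{control_word:014b}")  # 01001100100010
```

Reading the printed word left to right recovers the four fields: 010 (SELA), 011 (SELB), 001 (SELD), 00010 (OPR).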
Code for Register Selection
Operation Code for ALU
Stack Organization
- The memory of a CPU can be organized as a STACK, a structure where information is stored in a
Last-In-First-Out (LIFO) manner. This means that the item last stored is the first to be removed
or popped.
- To manage the items, a stack uses a stack pointer (SP) register. The stack pointer stores the
address of the last item in the stack, essentially pointing to the topmost element. Stack
operations involve two main actions:
- Insertion (Push): When an item is added to the stack, it is referred to
as insertion or a push operation.
- Deletion (Pop): When an item is removed from the stack, it is known as
deletion or a pop operation.
- There are two main types of stacks:
- Register Stack: Utilizes processor registers to create a stack
structure, enhancing speed and efficiency in certain operations.
- Memory Stack: Involves using dedicated memory locations to implement
the stack structure.
Register Stack
- When processor registers are organized in a stack-like fashion, it is termed a register
stack. The diagram above illustrates a 64-word register stack.
- The stack pointer (SP) contains the address of the topmost element in the stack.
- When the stack is empty, the EMPTY flag is set to 1, and when the stack is full, the FULL
flag is set to 1.
- The DR (Data Register) contains the data either being popped from or pushed into the stack.
- Additional benefits of a register stack include faster access times and reduced memory bus
contention, making it suitable for certain computing tasks requiring high-speed data
manipulation.
- For example, in the figure, three items (A, B, and C) are placed in the stack, with item C
at the top. Thus, the stack pointer (SP) holds the address of C (SP = 3).
PUSH Operation:
- When performing a PUSH operation to add an element (let's say E) to the stack, the following
steps are executed:
- Step 1: Increment the Stack Pointer (SP) by 1 so that it points to an
empty slot.
SP ← SP + 1 [Increment stack pointer]
- Step 2: Store the value of the Data Register (DR) at the address
pointed to by SP.
M[SP] ← DR [Write the item on top of the stack]
- Step 3: Check boundary conditions.
If (SP = 0) then (FULL ← 1) [Mark the stack as full]
EMPTY ← 0 [Mark the stack as not empty]
POP Operation:
- When performing a POP operation to remove an element from the stack, the following steps are
executed:
- Step 1: Retrieve the data from the address stored in the Stack Pointer
(SP) and store it in the Data Register (DR).
DR ← M[SP]
[Fetch item from the top of the stack]
- Step 2: Decrement the Stack Pointer (SP) by 1.
SP ← SP - 1 [Decrement stack pointer]
- Step 3: Check boundary conditions.
if (SP = 0) then (EMPTY ← 1) [Check if the stack is empty]
FULL ← 0 [Mark the stack as not full]
Memory Stack
- When primary memory (RAM) is organized in the form of a stack, it is referred to as a Memory
Stack.
- The Program Counter (PC) indicates the address of the next instruction in the program.
- The Address Register (AR) points to an array of data within the memory stack.
- The Stack Pointer (SP) identifies the top of the stack.
- In the illustrated figure, the initial value of SP is 4001, and the stack grows with
decreasing addresses. Consequently, the first item stored in the stack is at address 4000,
the second item at address 3999, and the last item at address 3000.
PUSH Operation
SP ← SP - 1
M[SP] ← DR
POP Operation
DR ← M[SP]
SP ← SP + 1
- The PUSH operation involves decrementing the Stack Pointer (SP) to allocate space for a new
item and storing the value from the Data Register (DR) at the address pointed to by the
updated SP.
- The POP operation retrieves the item from the top of the stack by copying the data from the
address indicated by SP to DR. Subsequently, SP is incremented to free up space in the
stack.
Addressing Modes
An instruction format is a collection of bits that defines the type of instruction, operands, and the
type of operation. The instruction format is represented by a rectangular box, and a basic instruction
format includes the following fields: Opcode, Mode, and Address.
- Opcode: Defines the type of operation to be performed, such as add, subtract,
complement, and shift.
- Address field: Defines the address of operands.
- Mode (or addressing mode) field: Defines the method by which operands are fetched,
modifying the address field of the instruction to determine the actual address of the data.
Addressing Modes:
- 1. Implied Addressing Mode: The zero-address instruction and all instructions using
the accumulator are implied-mode instructions. For example, the "complement accumulator" instruction
is implied-mode because the operand is in the accumulator.
- 2. Immediate Addressing Mode: In this mode, the operand is specified in the
instruction itself, having an operand field instead of an address field.
For example: ADD 10, 20.
- 3. Register Addressing Mode: Used when data is stored in processor registers, and
the address part of the instruction contains the address of the processor register.
For example: SUB R1, R2.
- 4. Register Indirect Addressing Mode: The instruction has the address of a
processor register, which contains the address of the operand in memory.
- 5. Direct Address Mode: The instruction has the address of a memory cell where the
data is stored, and the effective address is the address stored in the instruction.
- 6. Indirect Address Mode: The address field of the instruction specifies a memory
location that holds the address of the operand, so the effective address must first be read from memory.
- 7. Autoincrement or Autodecrement Address Mode: Used when fetching a series of
data, and the address part of the instruction gives the starting address, which is incremented or
decremented to fetch the next data from memory.
- 8. Relative Address Mode: The content of the program counter is added to the
address part of the instruction to obtain the effective address of data.
- 9. Indexed Addressing Mode: The content of an index register is added to the
address part of the instruction to obtain the effective address, useful for accessing data arrays in
memory.
- 10. Base Register Addressing Mode: Similar to indexed addressing mode, the content
of a base register is added to the address part of the instruction to obtain the effective address.
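Several of the modes above differ only in how the effective address (EA) is computed, which a short Python sketch can make concrete (all memory contents, register values, and the PC value here are made-up numbers for illustration):

```python
# Sketch: effective-address computation for a few addressing modes.
# All addresses and contents are hypothetical illustration values.
memory = {100: 500, 500: 42}   # memory[address] -> content
registers = {"R1": 100}        # processor register file
PC, index, base = 200, 5, 1000 # program counter, index reg, base reg
addr_field = 100               # the address field of the instruction

EA_direct = addr_field                 # direct: EA is the address field itself
EA_indirect = memory[addr_field]       # indirect: memory holds the EA -> 500
EA_reg_indirect = registers["R1"]      # register indirect: register holds the EA
EA_relative = PC + addr_field          # relative: PC + address field -> 300
EA_indexed = index + addr_field        # indexed: index reg + address field -> 105
EA_based = base + addr_field           # base register + address field -> 1100

operand = memory[EA_indirect]          # indirect mode fetches the operand: 42
```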
Data Transfer Instructions
Data transfer and manipulation are fundamental aspects of computer architecture, integral to the
execution of instructions within a computing system.
- These instructions are typically categorized into
three main types:
- Data Transfer Instructions
- Data Manipulation Instructions
- Program Control Instructions
Data Transfer Instructions
Data transfer instructions facilitate the movement of data from one location to another within the
computer system. These instructions are essential for controlling the flow of information,
including:
- Data transfer between memory and processor registers
- Data transfer between processor registers and input or output devices
- Data transfer between different processor registers
The table below presents a list of eight common data transfer instructions widely utilized across
various computer architectures:
Data Manipulation Instructions
Data manipulation instructions play a critical role in performing operations on data within a
computer system. These instructions can be broadly categorized into three types:
- Arithmetic Instructions
- Logical and Bit Manipulation Instructions
- Shift Instructions
Arithmetic Instructions
Arithmetic instructions encompass fundamental operations such as addition, subtraction,
multiplication, and division. The table below provides a list of typical arithmetic
instructions:
Logical and Bit Manipulation Instructions
Logical instructions are designed to perform binary operations on data stored in registers. These
instructions consider each bit of the operand individually. Here are some common logical and bit
manipulation instructions:
Shift Instructions
Shift instructions move bits within a register either to the left or right. Logical shifts insert
a 0 into the vacated end bit position. The table below illustrates various types of shift instructions:
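Logical shifts on an 8-bit register can be sketched in Python (the 0xFF mask models the fixed register width; the sample bit pattern is made-up):

```python
# Sketch: logical shifts on an 8-bit register value.
# Vacated bit positions are filled with 0; results are masked to 8 bits.
def shl(x):
    """Logical shift left by one; the high bit is discarded."""
    return (x << 1) & 0xFF

def shr(x):
    """Logical shift right by one; a 0 enters the high bit."""
    return (x >> 1) & 0xFF

r = 0b10110011
print(f"{shl(r):08b}")  # 01100110
print(f"{shr(r):08b}")  # 01011001
```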
Program Control Instructions
In a computer system, instructions are typically stored in successive memory locations, and the
execution of a program involves fetching instructions from these consecutive memory locations. As
each instruction is fetched, the program counter is incremented to contain the address of the next
instruction in sequence. Program control instructions play a crucial role in directing the flow of a
program and managing the execution process.
General program control instructions encompass a variety of operations that dictate the execution
flow. Some of these instructions are outlined in the table below:
Parallel Processing
- In older computers, only one instruction was executed at a time, leaving the ALU idle for much of the
time and preventing full utilization of processing capabilities. To address this inefficiency,
the concept of parallel processing was introduced.
- Parallel processing involves the simultaneous execution of multiple instructions, allowing for
concurrent data processing and faster execution times. Instead of processing instructions
sequentially, parallel processing techniques enable more efficient use of computing resources. For
example:
- While an instruction is being executed in the ALU, the next instruction can be read from
memory.
- A system may have two or more ALUs, capable of executing multiple instructions
simultaneously.
- Multiple processors may operate concurrently, enhancing overall system performance.
The primary purpose of parallel processing is to accelerate computer capabilities by leveraging
increased hardware resources.
- Parallel processing can be examined at various levels of complexity:
- At a lower level, the distinction between parallel and serial operations is based on the
type of registers used.
- Shift registers operate in a serial fashion, processing one bit at a time, while registers
with parallel load operate with all bits of the word simultaneously.
- At a higher level of complexity, parallel processing can involve a multiplicity of
functional units performing identical or different operations simultaneously.
The following diagram illustrates a processor with multiple functional units, showcasing the additional
components added to increase productivity and enable parallel processing:
Parallel processing can be classified in various ways, one of which is introduced by M.J. Flynn. Flynn's
classification divides computers into four major groups based on the sequence of instructions read from
memory and the operations performed in the instruction and data streams:
- Single Instruction Stream, Single Data Stream (SISD): In SISD architecture, a
single instruction stream is executed on a single data stream. This is the traditional von Neumann
architecture where one instruction is processed at a time.
- Single Instruction Stream, Multiple Data Streams (SIMD): SIMD architecture involves
the processing of a single instruction simultaneously on multiple data streams. This is commonly
seen in vector processors, where the same operation is applied to multiple data elements
concurrently.
- Multiple Instruction Streams, Single Data Stream (MISD): MISD architecture,
although rare in practice, involves multiple instruction streams operating on a single data stream.
This concept is not widely implemented due to its complexity and limited applicability.
- Multiple Instruction Streams, Multiple Data Streams (MIMD): MIMD architecture
allows for the simultaneous execution of multiple instruction streams on multiple data streams. This
is a versatile and widely used parallel processing architecture found in modern multi-core
processors and parallel computing systems.
Pipelining
- Pipelining involves dividing a process into several suboperations, with each suboperation associated
with a segment.
- The output of each segment is stored in a register, and this register information is passed to the
next segment, facilitating a continuous flow of data.
- Each segment operates independently, allowing for concurrent execution of all segments in the
pipeline.
- The term "pipelining" is derived from the sequential transfer of information from one segment to
another.
Example: Performing Ai * Bi + Ci; for i = 1 to 7.
Segment 1: R1 ← Ai, R2 ← Bi
Segment 2: R3 ← R1 * R2, R4 ← Ci
Segment 3: R5 ← R3 + R4
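With one clock cycle per segment, the three segments above can be simulated in a short Python sketch (the A, B, and C values are made-up; the update order within the loop models each segment latching the previous cycle's register contents):

```python
# Sketch of the 3-segment pipeline computing Ai*Bi + Ci for i = 1..7.
# Each clock cycle, segments are updated from the *previous* cycle's
# registers (segment 3 first, then 2, then 1), so all three overlap.
A = [1, 2, 3, 4, 5, 6, 7]
B = [7, 6, 5, 4, 3, 2, 1]
C = [1, 1, 1, 1, 1, 1, 1]

R1 = R2 = R3 = R4 = R5 = None
results = []

for clock in range(len(A) + 2):  # 7 items + 2 extra cycles to drain
    # Segment 3: R5 <- R3 + R4
    if R3 is not None:
        R5 = R3 + R4
        results.append(R5)
    # Segment 2: R3 <- R1 * R2, R4 <- Ci (values latched last cycle)
    R3, R4 = (R1 * R2, C[clock - 1]) if R1 is not None else (None, None)
    # Segment 1: R1 <- Ai, R2 <- Bi
    R1, R2 = (A[clock], B[clock]) if clock < len(A) else (None, None)

print(results)  # [8, 13, 16, 17, 16, 13, 8]
```

After the pipeline fills (two cycles), one result emerges per clock cycle, which is where the throughput gain comes from.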
Pipelining is an efficient technique that allows for the overlap of different stages of instruction
execution, thereby improving overall throughput. Each segment operates concurrently, enabling the
processor to handle multiple instructions simultaneously. This approach significantly enhances the speed
and efficiency of data processing in modern computer architectures.