───✱*.。:。✱*.:。✧*.。✰*.:。✧*.。:。*.。✱ ───
Parallelism & SIMD
- Single-core performance has significant limits
- Heat, energy, speed of light, etc.
 
 - Modern systems increase performance via parallelism
 - Parallelism allows multiple instructions or data elements to be processed at the same time
- Loosely comparable to async programming, except the overlap happens in hardware rather than in software scheduling
 
 
Forms of Parallelism
- Instruction-Level Parallelism (ILP)
- Multiple instructions are executed at the same time within one core
 
 - Data-Level Parallelism (DLP)
- Multiple data elements processed in parallel
 - Often implemented using SIMD
 
 
Flynn’s Taxonomy
| Category | Description | Example | 
|---|---|---|
| SISD | Single Instruction, Single Data | Classic uniprocessor | 
| SIMD | Single Instruction, Multiple Data | Vector instructions, GPUs | 
| MISD | Multiple Instruction, Single Data | Rare, mostly theoretical | 
| MIMD | Multiple Instruction, Multiple Data | Multicore CPUs, clusters | 

SIMD
- SIMD → Single Instruction, Multiple Data
 - One instruction operates on many data elements simultaneously
 - Common in vector processing and GPUs
 - Efficient for data-parallel tasks (image processing, matrix math, etc.)
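
The "one instruction, many data elements" idea can be sketched in plain Python. This is a conceptual model only: the interpreter still handles one element at a time, and the names `LANES`, `scalar_add`, and `simd_add` are made up for illustration. Each chunk of `LANES` elements stands in for what a single vector instruction would cover.

```python
# Conceptual model only: plain Python still processes one lane at a time.
# Real SIMD hardware would apply the add to all lanes with one instruction.
LANES = 4  # assumed vector width, e.g. four 32-bit values in a 128-bit register


def scalar_add(a, b):
    """SISD style: one add per element."""
    out = []
    for x, y in zip(a, b):
        out.append(x + y)
    return out


def simd_add(a, b, lanes=LANES):
    """SIMD style: walk the data in lane-sized chunks.

    Returns the result and the number of chunks, where each chunk
    stands in for one vector add instruction.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    out = []
    vector_ops = 0
    for i in range(0, len(a), lanes):
        va, vb = a[i:i + lanes], b[i:i + lanes]
        out.extend(x + y for x, y in zip(va, vb))
        vector_ops += 1
    return out, vector_ops
```

With 8 elements and 4 lanes, `simd_add` counts 2 vector operations where `scalar_add` performs 8 individual adds.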
 
SIMD in Hardware
- SIMD is implemented at the hardware level via
- Vector registers
 - Special instruction sets (SSE, AVX, NEON)
 
 - Data is packed into a single wide register
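
Packing can be imitated with ordinary integers. The sketch below assumes four 8-bit lanes inside one 32-bit word and uses the SWAR trick ("SIMD within a register") to add all lanes with a handful of integer operations. It is a software stand-in rather than a real vector instruction, and `pack`, `unpack`, and `packed_add` are hypothetical helper names.

```python
LANE_BITS = 8  # assumed lane width
LANES = 4      # four 8-bit lanes fit in one 32-bit word


def pack(values):
    """Pack four 8-bit values into one 32-bit word, lane 0 in the lowest byte."""
    if len(values) != LANES:
        raise ValueError("expected exactly 4 values")
    word = 0
    for i, v in enumerate(values):
        if not 0 <= v <= 0xFF:
            raise ValueError("each value must fit in 8 bits")
        word |= v << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> (i * LANE_BITS)) & 0xFF for i in range(LANES)]


def packed_add(a, b):
    """Add all four lanes at once; each lane wraps modulo 256."""
    # Add the low 7 bits of every lane. The carry out of bit 6 lands in
    # bit 7 of the same lane, so it never spills into the neighbouring lane.
    low = (a & 0x7F7F7F7F) + (b & 0x7F7F7F7F)
    # Top bit of each lane, added without carry (XOR).
    high = (a ^ b) & 0x80808080
    return low ^ high
```

For example, `unpack(packed_add(pack([1, 2, 3, 250]), pack([10, 20, 30, 10])))` gives `[11, 22, 33, 4]`: the last lane wraps around (260 mod 256) without disturbing the others.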
 
Pros
- Boosts performance for loop-based, data-heavy workloads
 
Cons
- Needs uniform (regular, contiguous) data access patterns to perform well
 - Poorly suited to irregular control flow (data-dependent branches)
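
One way to see why branches are awkward: all lanes share a single instruction, so a per-element `if` is typically handled by computing both sides for every lane and then selecting with a mask. The sketch below models that in plain Python; `scalar_branch` and `masked_select` are illustrative names, and the lists stand in for vector registers.

```python
def scalar_branch(xs):
    """Scalar version: each element takes its own branch."""
    out = []
    for x in xs:
        if x < 0:
            out.append(-x)
        else:
            out.append(x * 2)
    return out


def masked_select(xs):
    """SIMD-style version: no per-element branch.

    Both outcomes are computed for every lane, then a mask picks one.
    The work for the side that is not selected is wasted.
    """
    mask = [x < 0 for x in xs]        # one compare across all lanes
    then_vals = [-x for x in xs]      # computed for every lane
    else_vals = [x * 2 for x in xs]   # computed for every lane
    return [t if m else e for m, t, e in zip(mask, then_vals, else_vals)]
```

Both functions return the same values; the masked version simply pays for both branches on every element, which is why heavily branching code gains little from SIMD.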
 
Parallelism in I/O Context
- DMA and interrupts can overlap I/O and computation → implicit parallelism
- SIMD offers explicit data-level parallelism
- DMA + SIMD → high-throughput I/O and processing (video frames, audio streams, etc.)
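
A rough software analogy for the overlap described above, assuming a background thread plays the role of the device and a two-slot queue plays the role of double buffering. Real DMA moves data without involving the CPU; here a thread fakes it, and the frame contents and the names `read_frames` and `process_stream` are invented for the sketch.

```python
import queue
import threading


def read_frames(n_frames, frame_size, out_q):
    """Stand-in for a device filling buffers while the CPU is busy elsewhere."""
    for i in range(n_frames):
        frame = bytes([i % 256]) * frame_size  # fake frame data
        out_q.put(frame)  # blocks while both buffers are still full
    out_q.put(None)  # sentinel: no more frames


def process_stream(n_frames, frame_size=16):
    """Process frames as they arrive, overlapping 'I/O' with computation."""
    buffers = queue.Queue(maxsize=2)  # two slots, i.e. double buffering
    reader = threading.Thread(
        target=read_frames, args=(n_frames, frame_size, buffers)
    )
    reader.start()
    checksums = []
    while True:
        frame = buffers.get()
        if frame is None:
            break
        # The per-frame work; on real hardware this is where SIMD would help.
        checksums.append(sum(frame))
    reader.join()
    return checksums
```

While the main thread sums one frame, the reader can already be filling the next buffer, which is the same shape as DMA transferring frame N+1 while the CPU processes frame N.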

Types of Parallelism & CPU Cores
| Type | Description | Role of CPU Cores | 
|---|---|---|
| Instruction-Level (ILP) | Executes multiple instructions in a single core using pipelining, superscalar execution, etc. | Within a single core | 
| Data-Level (DLP/SIMD) | One instruction operates on multiple data elements (SIMD) | Within a single core using vector units | 
| Thread-Level (TLP) | Runs multiple independent threads at the same time | Each core can run one or more threads concurrently | 
| Process-Level | Multiple independent programs run in parallel | Each program uses one or more cores | 

True Parallelism
- The use of multiple CPU cores can enable true parallelism
 - Single-core CPUs can only simulate parallelism using context-switching
- Only one task can be running at a time
 
 - Multi-core CPUs can execute multiple threads or processes simultaneously
- True parallel execution
 
 - For instance, a quad-core CPU can run 4 independent tasks at once or use all 4 cores to split a larger task (such as video rendering)
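
Splitting one larger task across 4 cores might look like the sketch below. It assumes a sum of squares as a stand-in for the real workload and uses worker processes from the standard library's `concurrent.futures`; `chunk_ranges`, `sum_squares`, and `parallel_sum_squares` are hypothetical names.

```python
from concurrent.futures import ProcessPoolExecutor


def chunk_ranges(n, workers):
    """Split range(n) into `workers` contiguous (start, stop) pieces of near-equal size."""
    step, extra = divmod(n, workers)
    ranges, start = [], 0
    for w in range(workers):
        stop = start + step + (1 if w < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def sum_squares(bounds):
    """The work done by one core: sum of i*i over its own slice."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def parallel_sum_squares(n, workers=4):
    """Hand one slice to each worker process, then combine the partial sums."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_squares, chunk_ranges(n, workers)))
```

Call `parallel_sum_squares(...)` from under an `if __name__ == "__main__":` guard, since worker processes may re-import the script on some platforms. Processes are used rather than threads because, in standard CPython, threads generally do not run Python bytecode on several cores at once.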
 
Cores & Threads
- Modern operating systems and CPUs support multithreading
- With simultaneous multithreading (such as Intel Hyper-Threading), each core can handle 2 hardware threads
 
 - This means that a CPU with 4 physical cores may run 8 threads (logical cores)
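
The arithmetic above, plus a way to see what the current machine reports. A small sketch: `logical_cores` and `reported_logical_cores` are illustrative helpers, and 2 threads per core is an assumption that only holds on CPUs with two-way SMT. `os.cpu_count()` returns the number of logical CPUs, or `None` if it cannot be determined.

```python
import os


def logical_cores(physical_cores, threads_per_core=2):
    """Logical cores = physical cores x hardware threads per core (assumed 2)."""
    return physical_cores * threads_per_core


def reported_logical_cores():
    """Logical CPU count as seen by the OS; falls back to 1 if unknown."""
    return os.cpu_count() or 1
```

`logical_cores(4)` gives 8, matching the 4-core, 8-thread example. The standard library does not report the physical core count, so the two numbers cannot be compared without a third-party tool.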
 
───✱*.。:。✱*.:。✧*.。✰*.:。✧*.。:。*.。✱ ───