───✱*.。:。✱*.:。✧*.。✰*.:。✧*.。:。*.。✱ ───
Parallelism & SIMD
- Single-core performance has significant limits
- Heat, energy, speed of light, etc.
- Modern systems increase performance via parallelism
- Parallelism allows multiple instructions or data elements to be processed at the same time
- Loosely analogous to async programming, except the work actually runs simultaneously in hardware rather than being interleaved
Forms of Parallelism
- Instruction-Level Parallelism (ILP)
- Multiple instructions are executed within one core
- Data-Level Parallelism (DLP)
- Multiple data elements processed in parallel
- Often implemented using SIMD
Flynn’s Taxonomy
Category | Description | Example |
---|---|---|
SISD | Single Instruction, Single Data | Classic uniprocessor |
SIMD | Single Instruction, Multiple Data | Vector instructions, GPUs |
MISD | Multiple Instruction, Single Data | Rare, mostly theoretical |
MIMD | Multiple Instruction, Multiple Data | Multicore CPUs, clusters |
SIMD
- SIMD → Single Instruction, Multiple Data
- One instruction operates on many data elements simultaneously
- Common in vector processing and GPUs
- Efficient for data-parallel tasks (image processing, matrix math, etc.)
SIMD in Hardware
- SIMD is implemented at the hardware level via
- Vector registers
- Special instruction sets (SSE, AVX, NEON)
- Data is packed into a single wide register
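The idea of packing several elements into one wide register can be imitated in software. The sketch below is only an emulation (often called SWAR, "SIMD within a register"), not how SSE, AVX, or NEON are actually programmed: it packs four 8-bit values into one integer and adds all four lanes with a single addition. The names `pack`, `unpack`, and `packed_add` are hypothetical, chosen for illustration.

```python
LANES = 4        # assumed number of elements per "register"
LANE_BITS = 8    # assumed width of each element in bits
LANE_MASK = (1 << LANE_BITS) - 1

# Mask selecting the top bit of every lane, and mask selecting all other bits.
HIGH = sum((1 << (LANE_BITS - 1)) << (i * LANE_BITS) for i in range(LANES))
LOW = sum((LANE_MASK >> 1) << (i * LANE_BITS) for i in range(LANES))


def pack(values):
    """Pack LANES small integers into one wide integer (lane 0 in the low bits)."""
    if len(values) != LANES:
        raise ValueError(f"expected {LANES} values")
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a packed integer back into its LANES elements."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def packed_add(a, b):
    """Add every lane of a and b at once, wrapping modulo 256 per lane."""
    # Add the low 7 bits of each lane; any carry stays inside its own lane.
    low_sum = (a & LOW) + (b & LOW)
    # Fold in the top bit of each lane with XOR so no carry leaks into the next lane.
    return low_sum ^ ((a ^ b) & HIGH)


if __name__ == "__main__":
    a = pack([1, 2, 3, 250])
    b = pack([10, 20, 30, 10])
    print(unpack(packed_add(a, b)))
```

The last lane wraps around (250 + 10 exceeds 8 bits), which mirrors the wrap-around behavior of ordinary packed integer addition. Real vector units do this with dedicated per-lane adders instead of masking tricks.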
Pros
- Boosts performance for loop-based, data-heavy workloads
Cons
- Works best with uniform data access patterns
- Not suited for irregular control flow
Parallelism in I/O Context
• DMA and interrupts can overlap I/O and computation → implicit parallelism
• SIMD offers explicit data-level parallelism
• DMA + SIMD → very fast I/O & processing (video frames, audio streams, etc.)
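The overlap of I/O and computation can be sketched in software with a reader thread and a small buffer queue. This is only an analogy under stated assumptions: the reader thread stands in for a DMA engine filling buffers, the in-memory `source` stands in for a device, and `read_chunks` and `process_stream` are hypothetical names.

```python
import queue
import threading


def read_chunks(source, chunk_size, out_queue):
    """Producer: plays the role of the device/DMA engine filling buffers."""
    for start in range(0, len(source), chunk_size):
        out_queue.put(source[start:start + chunk_size])
    out_queue.put(None)  # sentinel: no more data


def process_stream(source, chunk_size=4):
    """Consumer: processes one buffer while the reader fills the next."""
    buffers = queue.Queue(maxsize=2)  # at most two buffers in flight (double buffering)
    reader = threading.Thread(target=read_chunks, args=(source, chunk_size, buffers))
    reader.start()

    total = 0
    while True:
        chunk = buffers.get()
        if chunk is None:
            break
        total += sum(chunk)  # the "computation" on the current buffer

    reader.join()
    return total


if __name__ == "__main__":
    print(process_stream(list(range(10))))
```

With real hardware the transfer runs without CPU involvement and an interrupt signals that a buffer is ready; here the bounded queue plays that signalling role.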
Types of Parallelism & CPU Cores
Type | Description | Role of CPU Cores |
---|---|---|
Instruction-Level (ILP) | Executes multiple instructions in a single core using pipelining, superscalar execution, etc. | Within a single core |
Data-Level (DLP/SIMD) | One instruction operates on multiple data elements (SIMD) | Within a single core using vector units |
Thread-Level (TLP) | Runs multiple independent threads at the same time | Each core can run one or more threads concurrently |
Process-Level | Multiple independent programs run in parallel | Each program uses one or more cores |
True Parallelism
- The use of multiple CPU cores can enable true parallelism
- Single-core CPUs can only simulate parallelism using context switching
- Only one task can be running at any given instant
- Multi-core CPUs can execute multiple threads or processes simultaneously
- True parallel execution
- For instance, a quad-core CPU can run 4 independent tasks at once or use all 4 cores to split a larger task (such as video rendering)
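Splitting one larger task across 4 workers could look roughly like the sketch below. It shows the partition, compute, and combine pattern; the workload (`sum_of_squares`) and the helper names are hypothetical. One caveat worth stating: in CPython, threads share the global interpreter lock, so for CPU-bound pure-Python work this illustrates the structure rather than a guaranteed speedup. `ProcessPoolExecutor` offers the same `map` interface and runs workers in separate processes.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Split data into `parts` contiguous chunks of nearly equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """The per-worker piece of the larger task."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=4):
    """Give each worker one chunk, then combine the partial results."""
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10))))
```

The chunks are independent of each other, which is what makes the task easy to spread over cores; work with dependencies between pieces needs more coordination.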
Cores & Threads
- Modern operating systems and CPUs support multithreading
- With simultaneous multithreading (such as Intel Hyper-Threading), each core can handle 2 threads
- This means that a CPU with 4 physical cores may run 8 threads (logical cores)
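The logical core count is what the operating system typically reports. A minimal sketch: `os.cpu_count()` returns the number of logical CPUs (or `None` if it cannot be determined), and the physical count is only estimated here by assuming 2 threads per core. That assumption does not hold on every CPU, and the helper names are hypothetical.

```python
import os


def logical_cores():
    """Number of logical CPUs reported by the OS (falls back to 1 if unknown)."""
    return os.cpu_count() or 1


def estimated_physical_cores(logical, threads_per_core=2):
    """Rough estimate only: assumes every core exposes `threads_per_core` threads."""
    return max(1, logical // threads_per_core)


if __name__ == "__main__":
    n = logical_cores()
    print("logical cores:", n)
    print("estimated physical cores:", estimated_physical_cores(n))
```

For the example in the notes, `estimated_physical_cores(8)` gives 4, matching a 4-core CPU with 2 threads per core.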
───✱*.。:。✱*.:。✧*.。✰*.:。✧*.。:。*.。✱ ───