───✱*.。:。✱*.:。✧*.。✰*.:。✧*.。:。*.。✱ ───

Parallelism & SIMD

  • Single-core performance has significant limits
    • Heat, energy, speed of light, etc.
  • Modern systems increase performance via parallelism
  • Parallelism allows multiple instructions or data elements to be processed at the same time
    • Similar in spirit to async programming, but done at the hardware level

Forms of Parallelism

  • Instruction-Level Parallelism (ILP)
    • Multiple instructions are executed at the same time within one core
  • Data-Level Parallelism (DLP)
    • Multiple data elements processed in parallel
    • Often implemented using SIMD

Flynn’s Taxonomy

Category  Description                          Example
--------  -----------------------------------  -------------------------
SISD      Single Instruction, Single Data      Classic uniprocessor
SIMD      Single Instruction, Multiple Data    Vector instructions, GPUs
MISD      Multiple Instruction, Single Data    Rare, mostly theoretical
MIMD      Multiple Instruction, Multiple Data  Multicore CPUs, clusters

SIMD

  • SIMD → Single Instruction, Multiple Data
  • One instruction operates on many data elements simultaneously
  • Common in vector processing and GPUs
  • Efficient for data-parallel tasks (image processing, matrix math, etc.)
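
A minimal sketch of the idea in Python, as a model only: plain Python does not issue real vector instructions, and the function names and the 4-lane width are assumptions made for illustration. It counts one modeled instruction per group of lanes to show why a SIMD-style loop needs fewer instructions than a scalar loop.

```python
def scalar_add(a, b):
    """SISD-style model: one add 'instruction' per pair of elements."""
    out = []
    steps = 0
    for x, y in zip(a, b):
        out.append(x + y)
        steps += 1
    return out, steps


def simd_add(a, b, lanes=4):
    """SIMD-style model: one 'instruction' adds up to `lanes` pairs at once."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    out = []
    steps = 0
    for i in range(0, len(a), lanes):
        # One modeled vector instruction covers elements i .. i+lanes-1.
        out.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        steps += 1
    return out, steps
```

With 8 elements and 4 lanes, the scalar model counts 8 instructions and the SIMD-style model counts 2, while both produce the same result.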

SIMD in Hardware

  • SIMD is implemented at the hardware level via
    • Vector registers
    • Special instruction sets (SSE, AVX, NEON)
  • Data is packed into a single wide register
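
To make "packed into a single wide register" concrete, here is a hedged sketch that treats a 32-bit integer as a stand-in for a vector register holding four 8-bit lanes. The helper names (`pack`, `unpack`, `packed_add`) are hypothetical, and the masking trick is a software technique (sometimes called SIMD within a register), not how SSE, AVX, or NEON are implemented internally.

```python
LANES = 4       # number of values packed into one word
LANE_BITS = 8   # width of each lane in bits


def pack(values):
    """Pack four 8-bit values into one 32-bit integer (lane 0 in the low byte)."""
    if len(values) != LANES:
        raise ValueError("expected exactly 4 values")
    word = 0
    for i, v in enumerate(values):
        if not 0 <= v <= 0xFF:
            raise ValueError("each value must fit in 8 bits")
        word |= v << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a 32-bit integer back into its four 8-bit lanes."""
    return [(word >> (i * LANE_BITS)) & 0xFF for i in range(LANES)]


def packed_add(a, b):
    """Add all four lanes with one pass; each lane wraps modulo 256."""
    # Add the low 7 bits of every lane so no carry crosses a lane boundary.
    low = (a & 0x7F7F7F7F) + (b & 0x7F7F7F7F)
    # Fold the top bit of each lane back in without carrying out of the lane.
    return low ^ ((a ^ b) & 0x80808080)
```

For example, `unpack(packed_add(pack([1, 2, 3, 4]), pack([10, 20, 30, 40])))` gives `[11, 22, 33, 44]`: four additions from one pass over a single packed word.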

Pros

  • Boosts performance for loop-based, data-heavy workloads

Cons

  • Works best with uniform data access patterns
  • Not suited for irregular control flow
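
One reason irregular control flow hurts: a single instruction cannot take a different branch per lane. A common workaround for simple branches is to compute a mask for every lane and then combine, so all lanes do the same work. The sketch below models that in Python; the function names are assumptions for illustration, and it applies to integer inputs.

```python
def clamp_branchy(xs):
    """Scalar style: each element takes its own branch."""
    out = []
    for x in xs:
        if x > 0:
            out.append(x)
        else:
            out.append(0)
    return out


def clamp_masked(xs):
    """SIMD style model: same operations on every lane, no per-element branch."""
    # Step 1: one compare across all lanes yields a 0/1 mask.
    mask = [int(x > 0) for x in xs]
    # Step 2: one multiply across all lanes keeps x where mask is 1, else 0.
    return [x * m for x, m in zip(xs, mask)]
```

Both functions return the same result, but only the masked form maps onto "one instruction, many data elements". Branches with large or very different bodies do not reduce to a mask this neatly.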

Parallelism in I/O Context

  • DMA and interrupts can overlap I/O and computation → implicit parallelism
  • SIMD offers explicit data-level parallelism
  • DMA + SIMD → extremely fast I/O & processing (video frames, audio streams, etc.)
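
The overlap of I/O and computation can be imitated in software. The following sketch is an analogy, not real DMA: a background thread stands in for the I/O hardware filling buffers, while the main thread processes frames that have already arrived. The frame contents, the two-slot queue, and the function names are all assumptions for illustration.

```python
import queue
import threading


def read_frames(n_frames, buffers):
    """Stand-in for the I/O side: delivers frames into the buffer queue."""
    for i in range(n_frames):
        frame = [i] * 4        # hypothetical 4-sample frame
        buffers.put(frame)     # blocks while both buffer slots are full
    buffers.put(None)          # sentinel: no more frames


def process_stream(n_frames):
    """Process frames while the reader keeps fetching the next ones."""
    buffers = queue.Queue(maxsize=2)   # two slots, similar to double buffering
    reader = threading.Thread(target=read_frames, args=(n_frames, buffers))
    reader.start()
    totals = []
    while True:
        frame = buffers.get()
        if frame is None:
            break
        totals.append(sum(frame))      # the "computation" on each frame
    reader.join()
    return totals
```

Calling `process_stream(3)` returns `[0, 4, 8]`. The per-frame `sum` is where a real pipeline would apply SIMD processing to each buffer.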

Types of Parallelism & CPU Cores

Type                     Description                                                                       Role of CPU Cores
-----------------------  --------------------------------------------------------------------------------  --------------------------------------------------
Instruction-Level (ILP)  Executes multiple instructions at once (pipelining, superscalar execution, etc.)  Within a single core
Data-Level (DLP/SIMD)    One instruction operates on multiple data elements                                Within a single core, using vector units
Thread-Level (TLP)       Runs multiple independent threads at the same time                                Each core can run one or more threads concurrently
Process-Level            Multiple independent programs run in parallel                                     Each program uses one or more cores

True Parallelism

  • The use of multiple CPU cores can enable true parallelism
  • Single-core CPUs can only simulate parallelism using context switching
    • Only one task can be running at a time
  • Multi-core CPUs can execute multiple threads or processes simultaneously
    • True parallel execution
  • For instance, a quad-core CPU can run 4 independent tasks at once or use all 4 cores to split a larger task (such as video rendering)
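
Splitting one larger task across cores can be sketched as follows. This is an illustrative example, not a rendering pipeline: the workload (a sum of squares) and the function names are assumptions. It uses worker processes rather than threads because, in standard CPython, threads do not run Python bytecode on several cores at once.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def chunk_ranges(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) pairs of near-equal size."""
    base, extra = divmod(n, parts)
    ranges = []
    start = 0
    for i in range(parts):
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def sum_squares(bounds):
    """Work done by one core: sum of i*i over its own slice."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def parallel_sum_squares(n, workers=None):
    """Give each worker process one slice, then combine the partial sums."""
    workers = workers or os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_squares, chunk_ranges(n, workers)))
```

On a quad-core machine, `parallel_sum_squares(1_000_000, workers=4)` would hand each core a quarter of the range. Call it from under an `if __name__ == "__main__":` guard, which process pools require on platforms that start workers by spawning a fresh interpreter.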

Cores & Threads

  • Modern operating systems and CPUs support multithreading
    • With simultaneous multithreading (such as Intel Hyper-Threading), each core can handle 2 hardware threads
  • This means that a CPU with 4 physical cores may run 8 threads (logical cores)
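
The arithmetic behind logical cores, as a small sketch. The helper `logical_cores` is hypothetical; `os.cpu_count()` is the standard-library call that reports how many logical CPUs the current machine has.

```python
import os


def logical_cores(physical_cores, threads_per_core=2):
    """Logical cores = physical cores x hardware threads per core."""
    return physical_cores * threads_per_core


# The example from the notes: 4 physical cores, 2 threads per core.
quad_core_threads = logical_cores(4)

# Logical CPUs on this machine (or None if it cannot be determined);
# the value differs from system to system.
detected = os.cpu_count()
```

Here `quad_core_threads` is 8, matching the 4-core, 8-thread case. Logical cores share the execution resources of their physical core, so 8 logical cores generally do not deliver twice the throughput of 4 physical ones.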

───✱*.。:。✱*.:。✧*.。✰*.:。✧*.。:。*.。✱ ───