High Performance Computing
Parallel programming and parallel algorithms
Live online class through
Batch details
Time : 11 PM - 12 AM (Night) IST
Days : Mon, Wed & Fri.
Starts on 23-Feb-2026
Fee : Rs. 24,000/- (cannot be paid in installments)
Duration : 10 to 12 months.

Programming languages used for teaching : C & C++
Prerequisite : You should not be afraid of pointers, pointers to pointers, or using a pointer as an array. You should also know classes, objects, encapsulation, inheritance, polymorphism (including virtual polymorphism), function templates, class templates, how abstract classes are used to impose guidelines, constructors, destructors and operator overloading.
How can you register for the course ?
Register for two free demo lectures without paying any fee.
The demo lectures will help you decide whether this course is for you.
Before joining a demo lecture, make sure you know the math behind 2D matrix multiplication.
After attending the demo lectures, you will have a window of two days to register for the course by paying the fee.
Register now
Live class audio recordings / code will be available for download through our website.
This helps students revise, fill in missed theory and document everything in a neat and tidy fashion.
If you are going to miss a class for a serious reason, tell us in advance so that we can provide you the audio / video recording.
Course Contents
 HPC v/s HFT and the HFC overlap
  1. What is HPC ?
  2. What is HFT ?
  3. How is HFC the intersection of both ?
  4. Can it be learned on one machine, or is a cluster required ?
 Minimizing Latency
  1. Understanding latency
  2. Measuring latency
  3. Is it always related to network programming ?
  4. Cache hit/miss
  5. Cache friendly data access to maximize cache hit
  6. Aligning Data
  7. Alternative to virtual polymorphism
  8. Optimizing DS, loops, function calls
  9. How exception handling slows you down
  10. Compiler optimization flags
  11. How dynamic memory allocation slows you down
  12. Avoiding dynamic memory allocations
  13. Creating memory pools
  14. Creating lock free data structures to avoid wait times
  15. Low latency logging
  16. Network programming
 Multi-threading
  1. Matrix multiplication
  2. Measuring processing time
  3. Understanding cache hit/cache miss
  4. Revised matrix multiplication
  5. Concurrency
  6. Creating threads
  7. Threaded matrix multiplication
  8. Lambdas
  9. Locks
  10. Lock guards
  11. Preventing deadlocks
  12. Condition variables
  13. Atomics
  14. Tasks and futures
  15. Synchronizing threads
  16. Communication between threads
  17. Creating lock based data structures
  18. Creating thread pools
  19. Creating lock free data structures
  20. Parallel Standard Template Library
  21. Execution policies
  22. Vectors in parallel
  23. for_each in parallel
  24. Load balancing
  25. Exception handling in parallel execution
  26. for_each_n and ranges
  27. Custom iterators in parallelism
  28. Synchronization
  29. Parallel data transformation using transform
  30. reduce and accumulate in parallel
  31. Sorting in parallel
  32. Searching in parallel
 Open Multi-Processing (OpenMP)
  1. OpenMP Directives
  2. Parallelize loops
  3. Implementing reduction
  4. Environment variables
  5. Parallel Regions
  6. Work sharing
  7. Decomposing data structures for parallelism
  8. Controlling / Removing data dependencies
  9. Synchronization
  10. Mutual exclusion
  11. Synchronizing events
  12. Communication between threads
  13. Thread affinity
  14. SIMD Vectorization
  15. GPU Offloading
 Compute Unified Device Architecture (CUDA)
  1. CPU v/s GPU
  2. Which GPU for learning CUDA ?
  3. Data v/s Task parallelism
  4. GPU Architecture
  5. Setting up development environment for CUDA Programming
  6. Parallel programming begins with SIMD
  7. Compilation
  8. Writing kernel function
  9. Measuring GPU processing time
  10. Thread / Block / Grid
  11. Organizing parallel threads
  12. Query GPU Information
  13. Error handling
  14. CUDA Memory model
  15. Asynchronous execution with streams/events
  16. Setting up launch configurations
  17. Designing parallel algorithms
  18. Reduction algorithm
  19. Sorting in parallel
  20. Profiling and optimizing code
  21. Unrolling loops
  22. Debugging techniques
  23. CUDA Streams
  24. Creating library for integration with other programming languages
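To give a flavor of the kernel-function and launch-configuration topics above, here is a hedged sketch of the classic vector-addition kernel. It assumes nvcc and a CUDA-capable GPU; the block size of 256 and the use of unified memory are illustrative choices, not the course's prescribed setup.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: one thread per element. blockIdx/threadIdx locate the thread
// inside the grid, giving it a unique global index into the arrays.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *a, *b, *c;
    // Unified memory: the runtime migrates pages between host and device.
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // Launch configuration: enough 256-thread blocks to cover n elements.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();  // kernel launches are asynchronous

    std::printf("c[0] = %f\n", c[0]);  // each element should be 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

The thread/block/grid hierarchy, error handling, streams and profiling listed above all build on this basic launch pattern.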
 What next ?
  1. HPC using distributed computing frameworks - An introduction
  2. How to get into the High Frequency Trading domain as a programmer ?
  3. What is FPGA, Verilog & VHDL ? - An introduction