High Performance Computing
Parallel programming and parallel algorithms
Live online class
Batch details
Time : 11 PM - 12 AM (Night) IST
Days : Mon, Wed & Fri.
Starts on 23-Feb-2026
Fee : Rs. 24,000/- (cannot be paid in installments)
Duration : 10 to 12 months.
Programming language that I will use to teach : C & C++
Prerequisite : You should not be afraid of pointers, pointers to pointers, or using a pointer as an array. You should be familiar with classes, objects, encapsulation, inheritance, polymorphism (including virtual/runtime polymorphism), function templates, class templates, how abstract classes are used to impose guidelines, constructors, destructors and operator overloading.
How can you register for the course?
Register for two demo lectures free of charge.
Demo lectures will help you decide whether this course is right for you.
Before joining a demo lecture, make sure you know the math behind 2D matrix multiplication.
After attending the demo lectures, you will have a window of two days to register for the course by paying the fee.
Register now
Live class audio recordings and code will be available for download through our website.
This helps students revise, write down missed theory, and document everything in a neat and tidy fashion.
If you are going to miss a class for a serious reason, inform us in advance so that we can provide you with the audio/video recording.
Course Contents
HPC v/s HFT and the HFC overlap
- What is HPC?
- What is HFT?
- How is HFC the intersection of both?
- Can it be learned on one machine, or is a cluster required?
Minimizing Latency
- Understanding latency
- Measuring latency
- Is it always related to network programming?
- Cache hit/miss
- Cache friendly data access to maximize cache hit
- Aligning Data
- Alternative to virtual polymorphism
- Optimizing DS, loops, function calls
- How exception handling slows you down
- Compiler optimization flags
- How dynamic memory allocation slows you down
- Avoiding dynamic memory allocations
- Creating memory pools
- Creating lock free data structures to avoid wait times
- Low latency logging
- Network programming
Multi-threading
- Matrix multiplication
- Measuring processing time
- Understanding cache hit/cache miss
- Revised matrix multiplication
- Concurrency
- Creating threads
- Threaded matrix multiplication
- lambdas
- Locks
- Lock guards
- Preventing deadlocks
- Condition variables
- Atomics
- Tasks and futures
- Synchronizing threads
- Communication between threads
- Creating lock based data structures
- Creating thread pools
- Creating lock free data structures
- Parallel Standard Template Library
- execution policies
- vectors in parallel
- for_each in parallel
- load balancing
- exception handling in parallel execution
- for_each_n and ranges
- custom iterators in parallelism
- synchronization
- parallel data transformation using transform
- reduce and accumulate in parallel
- sorting in parallel
- searching in parallel
Open Multi-Processing (OpenMP)
- OpenMP directives
- Parallelize loops
- Implementing reduction
- Environment variables
- Parallel Regions
- Work sharing
- Decomposing data structures for parallelism
- Controlling / Removing data dependencies
- Synchronization
- Mutual exclusion
- Synchronizing events
- Communication between threads
- Thread affinity
- SIMD Vectorization
- GPU Offloading
Compute Unified Device Architecture (CUDA)
- CPU v/s GPU
- Which GPU for learning CUDA?
- Data v/s Task parallelism
- GPU Architecture
- Setting up development environment for CUDA Programming
- Parallel programming begins with SIMD
- Compilation
- Writing kernel function
- Measuring GPU processing time
- Thread / Block / Grid
- Organizing parallel threads
- Query GPU Information
- Error handling
- CUDA Memory model
- Asynchronous execution with streams/events
- Setting up launch configurations
- Designing parallel algorithms
- Reduction algorithm
- Sorting in parallel
- Profiling and optimizing code
- Unrolling loops
- Debugging techniques
- CUDA Streams
- Creating library for integration with other programming languages
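Several of the CUDA topics above (writing a kernel, thread/block/grid organisation, launch configurations) fit into one minimal vector-add sketch. This is my own illustration, not course code, and it requires nvcc and an NVIDIA GPU to run:

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Minimal CUDA kernel: each thread computes one element of c = a + b.
// blockIdx/blockDim/threadIdx give each thread its global index.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];   // guard: the grid may overshoot n
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);    // unified memory: visible to CPU and GPU
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    int block = 256;                         // threads per block
    int grid  = (n + block - 1) / block;     // enough blocks to cover n
    vecAdd<<<grid, block>>>(a, b, c, n);     // launch configuration
    cudaDeviceSynchronize();                 // wait for the GPU to finish

    for (int i = 0; i < n; ++i) assert(c[i] == 3.0f);
    cudaFree(a); cudaFree(b); cudaFree(c);
}
```

The launch configuration `<<<grid, block>>>` is the thread/block/grid organisation listed above: 256 threads per block, with the grid rounded up so every element gets a thread.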
What next ?
- HPC using distributed computing frameworks - An introduction
- How to get into the High Frequency Trading domain as a programmer?
- What is FPGA, Verilog & VHDL ? - An introduction