CUDA Programming for Gaussian Splatting

March 10, 2026

Introduction

While working to better understand both C++ and CUDA, I have been implementing a simple KNN (k-nearest neighbors) algorithm using AABBs (axis-aligned bounding boxes) and Morton codes. This post, along with the code available on GitHub ('https://github.com/msaif00/simple_knn_cuda'), will be updated as I continue to implement the algorithm.
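
To make the AABB and Morton code idea concrete, here is a minimal host-side sketch. It is not taken from the repository: the names Vec3, AABB, computeAABB, quantize10, expandBits10 and mortonCode are hypothetical, and the choice of 10 bits per axis with x in the lowest interleaved bit is an assumption made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration only.
struct Vec3 { float x, y, z; };
struct AABB { Vec3 lo, hi; };

// Axis-aligned bounding box of a non-empty point set.
AABB computeAABB(const std::vector<Vec3>& pts) {
    assert(!pts.empty());
    AABB box{pts[0], pts[0]};
    for (const Vec3& p : pts) {
        box.lo.x = std::min(box.lo.x, p.x); box.hi.x = std::max(box.hi.x, p.x);
        box.lo.y = std::min(box.lo.y, p.y); box.hi.y = std::max(box.hi.y, p.y);
        box.lo.z = std::min(box.lo.z, p.z); box.hi.z = std::max(box.hi.z, p.z);
    }
    return box;
}

// Map v from [lo, hi] to an integer in [0, 1023] (10 bits, an assumed resolution).
inline std::uint32_t quantize10(float v, float lo, float hi) {
    float extent = hi - lo;
    float t = extent > 0.0f ? (v - lo) / extent : 0.0f;
    t = std::min(std::max(t, 0.0f), 1.0f);
    return static_cast<std::uint32_t>(t * 1023.0f);
}

// Spread the low 10 bits of v so that two zero bits follow each original bit.
inline std::uint32_t expandBits10(std::uint32_t v) {
    v &= 0x000003ffu;
    v = (v | (v << 16)) & 0x030000ffu;
    v = (v | (v << 8))  & 0x0300f00fu;
    v = (v | (v << 4))  & 0x030c30c3u;
    v = (v | (v << 2))  & 0x09249249u;
    return v;
}

// 30-bit Morton code: interleave the quantized x, y, z coordinates.
inline std::uint32_t mortonCode(const Vec3& p, const AABB& box) {
    std::uint32_t x = quantize10(p.x, box.lo.x, box.hi.x);
    std::uint32_t y = quantize10(p.y, box.lo.y, box.hi.y);
    std::uint32_t z = quantize10(p.z, box.lo.z, box.hi.z);
    return expandBits10(x) | (expandBits10(y) << 1) | (expandBits10(z) << 2);
}
```

Sorting points by such a code tends to place spatially close points near each other in memory, which is the property a KNN search can exploit.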

Breakdown of the steps to implement the project

Step 0: Implementing in CLion

I used CLion to implement the project because I like working with JetBrains IDEs. First, we create a project and select the C++ standard and CMake version. A simple test to confirm that the toolchain is correctly set up is to run the 'hello world' program.

Step 1: Hello world kernel

__global__ void hello_kernel() {
    int globalThread = threadIdx.x + blockIdx.x * blockDim.x;
    printf("Hello from RTX 2080 GPU, thread %d in block %d at global %d!\n",
        threadIdx.x, blockIdx.x, globalThread);
}

Step 2:

Introduction to CUDA memory and moving points between the host and device.

Step 3:

As a validation step, prior to implementing the KNN algorithm on the GPU, I implemented a brute-force CPU version of the algorithm (knn_cpu.cpp). For each of the randomly generated points, I compute the mean distance to its 3 nearest neighbors (tracked as squared distances, dist2, in the code), then repeat the same steps in the GPU version after copying the points to the GPU.

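// Review of the method used to update best[3] in knn_cpu.cpp.
// best[] holds the three smallest squared distances so far, in ascending order.
static inline void updateBest(float dist2, float best[3]) {
    for (int j = 0; j < 3; j++) {
        if (dist2 < best[j]) {
            // Insert dist2 here and carry the displaced value to the next slot.
            float t = best[j];
            best[j] = dist2;
            dist2 = t;
        }
    }
}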
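
As a rough illustration of how such a helper fits into the brute-force reference, the sketch below computes, for every point, the mean of the squared distances to its 3 nearest neighbors. It is not the contents of knn_cpu.cpp: Point3, updateBest3 and meanDist2ToNearest3 are hypothetical names, and averaging squared rather than plain distances is an assumption.

```cpp
#include <cassert>
#include <cfloat>
#include <cstddef>
#include <vector>

// Hypothetical point type for illustration only.
struct Point3 { float x, y, z; };

// Keep best[0] <= best[1] <= best[2] as the three smallest values seen so far.
static inline void updateBest3(float dist2, float best[3]) {
    for (int j = 0; j < 3; j++) {
        if (dist2 < best[j]) {
            float t = best[j];
            best[j] = dist2;
            dist2 = t;
        }
    }
}

// O(n^2) reference: mean squared distance from each point to its 3 nearest
// neighbors. Needs at least 4 points so every slot of best[] gets filled.
std::vector<float> meanDist2ToNearest3(const std::vector<Point3>& pts) {
    assert(pts.size() >= 4);
    std::vector<float> out(pts.size(), 0.0f);
    for (std::size_t i = 0; i < pts.size(); i++) {
        float best[3] = {FLT_MAX, FLT_MAX, FLT_MAX};
        for (std::size_t j = 0; j < pts.size(); j++) {
            if (i == j) continue;  // a point is not its own neighbor
            float dx = pts[i].x - pts[j].x;
            float dy = pts[i].y - pts[j].y;
            float dz = pts[i].z - pts[j].z;
            updateBest3(dx * dx + dy * dy + dz * dz, best);
        }
        out[i] = (best[0] + best[1] + best[2]) / 3.0f;
    }
    return out;
}
```

Because every pair of points is compared, this version is only practical for small inputs, which is enough for checking the GPU results against a known-good answer.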

Animation of SMEM

General Cpp notes on syntax and semantics

  1. Comments are denoted by '//'.
  2. The '#' symbol introduces a preprocessor directive, and '#define' defines a macro, which may be function-like. The preprocessor normally treats a newline as the end of the macro definition, so a multi-line macro needs the line-continuation character '\' at the end of each line. Ensure there are no trailing spaces after the '\'.
  3. The 'const' keyword is used to declare read-only entities.
  4. FLT_MAX is a preprocessor macro, defined in <cfloat>, that holds the maximum finite value representable by the float data type.
  5. The statement std::mt19937 rng(seed); declares a pseudo-random number generator (from <random>) that uses the Mersenne Twister algorithm and is initialized with a seed value.
  6. Header files such as "file.h" hold declarations of functions, classes and other identifiers, allowing those declarations to be shared and reused across multiple source files.
  7. Templates are used to define generic classes or functions. This allows us to write code once and use it for different types.
  8. '#pragma' is a preprocessor directive used to pass compiler-specific options, for example '#pragma unroll' in CUDA.
  9. 'restrict' is a standard keyword in C99 but only a compiler-specific extension in C++ (spelled '__restrict__' in CUDA and GCC). It hints to the compiler that, within the scope, the pointer is the only way the memory is accessed, so the compiler can optimize more aggressively because it may assume that no other pointer aliases that memory.
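
A few of the notes above can be seen together in one short sketch. The names CLAMP01, minOf, minFloat and randomUnitFloats are hypothetical and exist only for this example; nothing here is taken from the project code.

```cpp
#include <cassert>
#include <cfloat>
#include <cstddef>
#include <random>
#include <vector>

// Note 2: a function-like macro split over several lines. Every line except
// the last ends with '\' and nothing may follow the backslash.
#define CLAMP01(v)                 \
    ((v) < 0.0f ? 0.0f             \
                : ((v) > 1.0f ? 1.0f : (v)))

// Notes 3 and 7: a template written once for any comparable type T, taking
// its input by const reference so it cannot modify the caller's data.
template <typename T>
T minOf(const std::vector<T>& values, T sentinel) {
    T best = sentinel;
    for (const T& v : values) {
        if (v < best) best = v;
    }
    return best;
}

// Note 4: FLT_MAX as the starting value when searching for a minimum.
float minFloat(const std::vector<float>& values) {
    assert(!values.empty());
    return minOf(values, FLT_MAX);
}

// Note 5: a seeded Mersenne Twister gives a reproducible sequence.
std::vector<float> randomUnitFloats(std::size_t n, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    std::vector<float> out(n);
    for (float& v : out) {
        v = dist(rng);
    }
    return out;
}
```

Calling randomUnitFloats twice with the same seed returns the same values, which is convenient when comparing CPU and GPU results on identical input.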

Conclusion