Building a Neural Network Framework from Scratch
I use PyTorch every day at work. It’s incredible - but I’ve always wondered how it works under the hood. How does backward() actually compute gradients through a neural network?
So I built my own minimal deep learning framework called Synap. It’s written in C++ for performance, with Python bindings via pybind11. No external ML libraries - just raw tensor operations and automatic differentiation from scratch.
Why Build Another Framework?
This wasn’t about competing with PyTorch. That would be insane.
It was about understanding. I use loss.backward() without thinking, but the magic behind automatic differentiation always seemed mysterious. Building it myself was the best way to learn.
Plus, Andrej Karpathy’s micrograd inspired me. I wanted to do something similar but with C++ for the core operations - partly for performance, partly because I wanted to understand how to bridge C++ and Python.
What I Built
Synap gives you:
- Tensors - Multi-dimensional arrays with shape, stride, and offset
- Autograd - Reverse-mode automatic differentiation
- Neural networks - MLPs with layers and neurons
- Python API - Clean Python interface to C++ core
```python
import synap as sp
from synap import nn  # neural-network modules live in python/nn.py

# Create tensors
x = sp.Tensor([1, 4], requires_grad=True)
x.set_values([1, 2, 3, 4])
target = sp.Tensor([1, 1])
target.set_values([0.5])

# Build a simple MLP
model = nn.MLP(4, [8, 1])

# Forward pass
out = model(x)

# Compute loss
loss = sp.Tensor.mse(out, target)

# Backward pass - the magic happens here
loss.backward()
```
How Autograd Works
This was the most interesting part to implement.
The key insight is that you build a computation graph as you do operations. Each tensor knows what created it - its parents and the operation that produced it.
Here’s how multiplication works:
```cpp
// Forward: c = a * b (element-wise, for each index i)
out->data()[i] = a->data()[i] * b->data()[i];

// Backward: da += b * dout, db += a * dout
out->backward_fn_ = [out, a, b]() {
    // for each index i:
    a->grad->data()[i] += b->data()[i] * out->grad->data()[i];
    b->grad->data()[i] += a->data()[i] * out->grad->data()[i];
};
```
Each operation captures its own backward logic in a closure. When you call backward():
- Start with gradient = 1.0 at the loss
- Build topological order of the computation graph
- Traverse in reverse, calling each backward_fn
The topological sort is crucial - you need to process operations in the right order so gradients flow correctly from output to input.
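The same machinery can be sketched in a few lines of Python - a micrograd-style scalar version, not Synap's actual C++ code, but the closure-plus-topological-sort structure is identical:

```python
class Value:
    """Scalar autograd node: data, gradient, parents, and a backward closure."""
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # set by the op that creates this node

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # d(a*b)/da = b, d(a*b)/db = a; accumulate into the parents
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # 1. build a topological order of the graph ending at self
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        # 2. seed the output gradient, 3. traverse in reverse
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

a, b = Value(3.0), Value(4.0)
c = a * b
c.backward()
print(a.grad, b.grad)  # 4.0 3.0
```

The closure captures exactly the tensors it needs, so each op stays self-contained - the same design as the C++ lambda above.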
The C++ / Python Bridge
Using pybind11 was surprisingly pleasant. Here’s how I expose a tensor to Python:
```cpp
py::class_<Tensor>(m, "Tensor")
    .def(py::init<std::vector<size_t>, bool>())
    .def("set_values", &Tensor::set_values)
    .def("backward", &Tensor::backward)
    .def_property_readonly("shape", &Tensor::shape);
```
That’s it. A few lines and I can use C++ tensors from Python. The performance-critical math runs in C++, but the API feels Pythonic.
Training a Neural Network
The ultimate test - can Synap actually learn?
```python
# XOR dataset
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [[0], [1], [1], [0]]

model = nn.MLP(2, [4, 1])

for epoch in range(100):
    for xi, yi in zip(X, y):
        # Forward
        out = model(xi)
        loss = sp.Tensor.mse(out, yi)

        # Backward
        model.zero_grad()
        loss.backward()

        # Update weights
        for param in model.parameters():
            # gradient descent
            ...
```
And it works! After training, the MLP correctly learns XOR. The gradients flow, the weights update, the loss decreases. Magic? No - just math.
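As a sanity check on the recipe itself, here is the same XOR experiment in plain NumPy - not Synap, just the identical math with hand-derived gradients, so you can verify that this training loop really does drive the loss down:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# 2 -> 4 -> 1 MLP: tanh hidden layer, sigmoid output
W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros(4)
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr, first_loss = 0.5, None
for epoch in range(2000):
    # forward
    h = np.tanh(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    loss = np.mean((out - y) ** 2)
    if first_loss is None:
        first_loss = loss
    # backward: hand-derived gradients for MSE + sigmoid + tanh
    dout = 2 * (out - y) / len(X)
    dz2 = dout * out * (1 - out)
    dW2, db2 = h.T @ dz2, dz2.sum(0)
    dh = dz2 @ W2.T
    dz1 = dh * (1 - h ** 2)
    dW1, db1 = X.T @ dz1, dz1.sum(0)
    # gradient descent step
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"loss: {first_loss:.4f} -> {loss:.4f}")
```

Autograd automates exactly the "backward" block here - those six gradient lines are what the closures in the computation graph produce for you.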
What I Learned
- Autograd is elegant - The closure-based approach is clean: each operation encapsulates its own gradient logic.
- Dynamic graphs are easier - Synap rebuilds the graph each forward pass (like PyTorch eager mode). Static graphs (like TensorFlow 1.x) are more complex but allow more optimization.
- Broadcasting is tricky - Supporting [M,N] + [N] operations means carefully tracking how gradients should reduce back.
- C++/Python interop is practical - pybind11 made this surprisingly smooth.
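On the broadcasting point: when the forward pass broadcasts one operand, the backward pass has to sum the gradient back down to that operand's original shape. Here is a plain-NumPy sketch of that reduction (a common trick in autograd implementations, not Synap's actual code; the name `unbroadcast` is my own):

```python
import numpy as np

def unbroadcast(grad, shape):
    """Sum `grad` down to `shape`, undoing NumPy-style broadcasting."""
    # 1. collapse extra leading axes that broadcasting prepended
    while grad.ndim > len(shape):
        grad = grad.sum(axis=0)
    # 2. sum over axes that were size 1 in the original shape
    for axis, size in enumerate(shape):
        if size == 1 and grad.shape[axis] != 1:
            grad = grad.sum(axis=axis, keepdims=True)
    return grad

# [M,N] + [N]: the gradient w.r.t. the [N] operand sums over the M rows
b = np.ones(4)            # the [N] operand
dout = np.ones((3, 4))    # upstream gradient, shape [M, N]
db = unbroadcast(dout, b.shape)
print(db)  # [3. 3. 3. 3.]
```

Each row of the output saw the same copy of `b`, so its gradient contributions accumulate - which is why the reduction is a sum rather than an average.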
The Code
It’s all there if you want to explore. The C++ core is in src/synap/, the Python bindings in bindings.cpp, and the neural network modules in python/nn.py.
Set it up with CMake, build, and you can import synap in Python. Try creating tensors, doing operations, calling backward() - see the gradients flow!
What’s Next?
There’s plenty more to add:
- More operations (convolution, pooling)
- GPU support via CUDA
- Optimizers beyond SGD
- Batch training
But for now, I’m happy with what Synap demonstrates: the core ideas behind deep learning frameworks, implemented from scratch.
Victor