Building a GitHub Analyzer with Go and Python
Ever wanted to quickly understand what a GitHub repository looks like without cloning and exploring it yourself? I built a tool that does exactly that - paste any GitHub URL, and it analyzes the entire codebase, extracting complexity metrics, code structure, and language distribution.
The best part? It combines Go and Python in a way that showcases how to pick the right tool for each job.
The Idea
I was working on a project and needed to understand a large open-source codebase quickly. Manually exploring files and trying to understand the architecture was tedious.
Wouldn’t it be nice, I thought, if I could just paste a URL and get a report?
- How many files? How many functions?
- What’s the language breakdown?
- Where are the complex areas?
So I built GPR-Analyzer - a web tool that clones a repo and runs static analysis on it.
Architecture: Go + Python
Here’s why I chose this hybrid approach:
Go handles:
- Web server (HTTP routing)
- Repository cloning (using go-git)
- Executing the Python analyzer as a subprocess
- Parsing JSON responses
Python handles:
- File system traversal
- AST generation (using tree-sitter)
- Code analysis and metrics
The key design decision was keeping them decoupled. Go just executes Python and reads JSON from stdout. This means you could swap Python for anything - Ruby, Node.js, whatever - without touching the Go code beyond the command it runs, as long as the replacement prints the same JSON.
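To make that contract concrete, here is a minimal sketch of what the analyzer side could look like. It is not the actual GPR-Analyzer code: build_report, serialize_report, and the report fields are hypothetical names chosen for illustration. The only point is that the report goes to stdout as JSON and everything else goes to stderr.

```python
import json
import sys


def build_report(repo_path):
    # Placeholder analysis (hypothetical fields): a real analyzer would
    # walk the files and parse ASTs here.
    return {"path": repo_path, "files": 0, "functions": 0}


def serialize_report(report):
    # One JSON document on one line keeps the output easy to parse.
    return json.dumps(report, sort_keys=True)


def main(argv):
    if len(argv) != 2:
        print("usage: main.py <repo_path>", file=sys.stderr)
        return 2
    # Diagnostics go to stderr so stdout carries nothing but the JSON report.
    print(f"analyzing {argv[1]}", file=sys.stderr)
    print(serialize_report(build_report(argv[1])))
    return 0

# A script entry point would end with: sys.exit(main(sys.argv))
```

Any program that honors the same rule (JSON on stdout, exit code for success or failure) can take the analyzer's place.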
How It Works
Here’s the flow:
- You paste a URL like https://github.com/user/repo
- Go validates it’s a valid GitHub URL
- Go clones the repo to a temporary directory
- Go spawns Python to analyze it
- Python walks every file, parses ASTs, counts functions/classes
- Python outputs JSON
- Go parses JSON and sends it back to the browser
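The "walks every file" step and the language breakdown can be sketched roughly like this. The extension map and the list of skipped directories are assumptions for illustration, not the tool's real configuration, and a fuller version would also hand each file to the parser.

```python
import os
from collections import Counter

# Hypothetical extension-to-language map; a real tool would cover many more.
LANGUAGE_BY_EXTENSION = {
    ".py": "Python",
    ".go": "Go",
    ".js": "JavaScript",
    ".rb": "Ruby",
}
# Directories that are assumed not to be worth analyzing.
SKIPPED_DIRS = {".git", "node_modules"}


def language_breakdown(repo_path):
    """Count files per language under repo_path, based on file extension."""
    counts = Counter()
    for root, dirs, files in os.walk(repo_path):
        # Prune in place so os.walk does not descend into skipped directories.
        dirs[:] = [d for d in dirs if d not in SKIPPED_DIRS]
        for name in files:
            extension = os.path.splitext(name)[1].lower()
            language = LANGUAGE_BY_EXTENSION.get(extension)
            if language is not None:
                counts[language] += 1
    return dict(counts)
```

Calling language_breakdown on the temporary clone directory would give a dictionary of file counts per language that can go straight into the JSON report.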
The interface is dead simple - a text input and a submit button. That’s it.
The Interesting Parts
AST Analysis with tree-sitter
Tree-sitter is incredible. It generates precise Abstract Syntax Trees for code, which means you can actually understand code structure rather than just regex matching.
For example, counting functions in Python:
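def count_functions(node):
    # Count this node if its type names a function, then add the counts
    # from all of its children.
    count = 1 if 'function' in node.type.lower() else 0
    for child in node.children:
        count += count_functions(child)
    return count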
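It’s a simple recursive tree traversal. But because it only looks at node type names, the same code runs across 25+ languages! Node names do vary between grammars, so treat the substring match as a heuristic.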
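If you want to try the idea without installing tree-sitter, here is a small self-contained sketch. The Node class is a stand-in I made up that mimics only the two attributes the traversal needs (type and children), and the tree is built by hand. It counts classes as well as functions and uses an explicit stack rather than recursion, so very deep trees cannot hit Python's recursion limit.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    # Stand-in for a parsed syntax node: just a type name and child nodes.
    type: str
    children: list = field(default_factory=list)


def count_definitions(root):
    """Count function-like and class-like nodes using an explicit stack."""
    counts = Counter()
    stack = [root]
    while stack:
        node = stack.pop()
        kind = node.type.lower()
        if "function" in kind:
            counts["functions"] += 1
        elif "class" in kind:
            counts["classes"] += 1
        # Visit the children later; the order does not matter for counting.
        stack.extend(node.children)
    return counts


# Hand-built tree: one class containing a method, plus a top-level function.
tree = Node("module", [
    Node("class_definition", [Node("block", [Node("function_definition")])]),
    Node("function_definition"),
])
print(dict(count_definitions(tree)))
```

With a real parser the hand-built tree would be replaced by the root node of the parsed file, and the matching rules would probably need tuning per grammar.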
The Interface Between Languages
I kept the Go↔Python interface dead simple:
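# Python: write the report to stdout as JSON
print(json.dumps(report))

// Go: capture stdout only, so anything on stderr can't corrupt the JSON
cmd := exec.Command("python3", "analyzer/main.py")
output, err := cmd.Output()
if err != nil {
    log.Printf("analyzer failed: %v", err)
}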
No sockets, no HTTP between them. Just stdout. This makes debugging easy too - you can run the Python script manually and see exactly what it outputs.
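The same contract can be exercised from the calling side. The sketch below uses Python's subprocess module as a stand-in for the Go caller, purely so it can run on its own; CHILD_SOURCE is a made-up one-off script that plays the role of the analyzer, and run_analyzer is a hypothetical helper name. The detail worth noticing is that stdout and stderr are captured separately, so log lines never end up inside the JSON.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for the analyzer: logs to stderr, JSON to stdout.
CHILD_SOURCE = (
    "import json, sys\n"
    "print('analyzing...', file=sys.stderr)\n"
    "print(json.dumps({'files': 3, 'functions': 7}))\n"
)


def run_analyzer(command):
    """Run the analyzer command and return (parsed report, stderr text)."""
    result = subprocess.run(
        command,
        capture_output=True,  # keep stdout and stderr in separate buffers
        text=True,
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return json.loads(result.stdout), result.stderr


report, logs = run_analyzer([sys.executable, "-c", CHILD_SOURCE])
```

If the two streams were merged into one buffer, the "analyzing..." line would land in front of the JSON and parsing would fail.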
What I Learned
- Language mixing is practical - Using Go for orchestration and Python for analysis played to each language’s strengths
- tree-sitter is powerful - It’s not just for editors. Any code analysis tool can benefit from precise ASTs
- Simple interfaces win - The JSON-over-stdout pattern is nothing fancy, but it’s robust and easy to debug
What’s Missing
There’s a lot more I could add:
- Cyclomatic complexity metrics
- Coupling analysis between files
- Visualization of code structure
- Support for private repos
Maybe version 2?
Check It Out
The project is ready to use if you want to try it. Clone the repo, run ./run, and open http://localhost:8081. Paste any public GitHub URL and see what it finds.
It’s fascinating to see what these tools reveal about different codebases!
Victor