Engineering

Building GitCheck: Our Tech Stack and Architecture Decisions

December 28, 2024
15 min read
GitCheck Engineering Team

A deep dive into how we built a scalable GitHub analytics platform using Next.js, Prisma, and statistical algorithms.

Building a platform designed to analyze 100K+ GitHub profiles while maintaining sub-second response times required careful architectural decisions. Here's how we did it.

The Challenge

We needed to build a system that:

  • Fetches data from the GitHub API efficiently
  • Performs complex statistical calculations
  • Handles 10K+ requests per day
  • Maintains a 24-hour cache
  • Scales cost-effectively

Tech Stack Overview

Frontend

  • Next.js 14: App router, server components, streaming
  • React 18: Concurrent features, suspense
  • Tailwind CSS: Utility-first styling
  • Framer Motion: Smooth animations
  • TypeScript: Type safety throughout

Backend

  • Next.js API Routes: Serverless functions
  • Prisma ORM: Type-safe database access
  • PostgreSQL: Primary data store
  • Redis: Caching layer (planned)

Infrastructure

  • Vercel: Hosting and edge functions
  • Vercel Postgres: Managed database
  • GitHub API: Primary data source

Architecture Decisions

Server-Side First

Decision: Use server components by default

Reasoning:

  • Reduced client bundle size
  • Better SEO
  • Faster initial page load
  • Access to backend resources
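
Here's what that looks like in practice. A minimal sketch of a server-component page (the file layout, the prisma helper, and the Profile model are illustrative, not lifted from our codebase):

```tsx
// app/profile/[username]/page.tsx
// A server component: it runs only on the server, so it can query
// Postgres directly and ships no data-fetching code to the client.
import { prisma } from "@/lib/prisma";

export default async function ProfilePage({
  params,
}: {
  params: { username: string };
}) {
  // Direct database access from the component -- no client round trip.
  const profile = await prisma.profile.findUnique({
    where: { username: params.username },
  });

  if (!profile) return <p>Profile not found</p>;

  return (
    <main>
      <h1>{profile.username}</h1>
      <p>Score: {profile.score}</p>
    </main>
  );
}
```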

Smart Caching Strategy

Decision: 24-hour cache with background revalidation

Why 24 hours?

  • GitHub's API allows 5,000 authenticated requests per hour
  • Profile data rarely changes meaningfully within a single day
  • It's a good balance between freshness and cost
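
A minimal sketch of the cache path, assuming a Postgres-backed cache keyed by username (refreshProfile is a hypothetical helper that refetches from GitHub and upserts the row):

```ts
// lib/cache.ts -- 24-hour database-backed cache with background revalidation.
import { prisma } from "@/lib/prisma";
import { refreshProfile } from "@/lib/github"; // hypothetical helper

const TTL_MS = 24 * 60 * 60 * 1000; // the 24-hour window

export async function getProfile(username: string) {
  const cached = await prisma.profile.findUnique({ where: { username } });

  if (cached) {
    if (Date.now() - cached.updatedAt.getTime() >= TTL_MS) {
      // Stale: serve it immediately, refresh without blocking the response.
      // (Fire-and-forget work can be cut short on serverless platforms --
      // one reason a proper job queue is on our roadmap.)
      void refreshProfile(username);
    }
    return cached;
  }

  // Cache miss: we have to hit the GitHub API before we can respond.
  return refreshProfile(username);
}
```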

Statistical Engine

Decision: Server-side z-score normalization

Our z-score algorithm calculates how many standard deviations a developer's metrics are from the mean, then normalizes the result to a 0-100 scale. This provides fair comparisons across developers with different contribution styles.
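
A simplified sketch of that normalization (clamping at ±3 standard deviations is one reasonable way to map z-scores onto a 0-100 scale; the per-metric weighting is omitted here):

```ts
// Normalize a raw metric to a 0-100 score via z-score.
// ~99.7% of a normal distribution falls within +/-3 standard
// deviations, so clamping there loses almost no one.
function zScoreTo100(value: number, mean: number, stdDev: number): number {
  if (stdDev === 0) return 50; // no spread: everyone sits at the mean

  // How many standard deviations this value is from the mean.
  const z = (value - mean) / stdDev;

  // Clamp to [-3, 3], then map linearly onto [0, 100].
  const clamped = Math.max(-3, Math.min(3, z));
  return ((clamped + 3) / 6) * 100;
}

// Example: 1,200 stars against a mean of 400 and a std dev of 500
// gives z = 1.6, which maps to a score of ~76.7.
console.log(zScoreTo100(1200, 400, 500));
```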

Rate Limit Handling

Challenge: GitHub API limits (5000 req/hour)

Solution: Multi-layered approach

  1. Database caching (primary)
  2. Request queuing
  3. Rate limit monitoring
  4. Graceful degradation
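
For the monitoring layer, GitHub reports remaining quota in response headers, so every call doubles as a rate-limit probe. A sketch (the 100-request threshold is an illustrative cutoff):

```ts
// GitHub returns x-ratelimit-remaining and x-ratelimit-reset on every
// REST API response, so we can watch quota without extra requests.
async function fetchWithRateLimitCheck(url: string, token: string) {
  const res = await fetch(url, {
    headers: { Authorization: `Bearer ${token}` },
  });

  const remaining = Number(res.headers.get("x-ratelimit-remaining"));
  const resetAt = Number(res.headers.get("x-ratelimit-reset")); // unix seconds

  if (remaining < 100) {
    // Close to the 5,000/hour ceiling: prefer cached data until the
    // window resets instead of burning the last requests.
    console.warn(`Rate limit low: ${remaining} left, resets at ${resetAt}`);
  }

  return res;
}
```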

Data Model

Decision: Denormalized for read performance

Our database schema stores pre-calculated metrics like total stars, forks, and commit counts directly in the profile table. This denormalized approach trades storage space for query speed.
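
An illustrative Prisma model showing the idea (field names are representative, not our exact schema):

```prisma
// prisma/schema.prisma
// Aggregates are precomputed at ingest time, so the dashboard reads a
// single row instead of joining and summing across repo tables.
model Profile {
  id          String   @id @default(cuid())
  username    String   @unique
  score       Float
  percentile  Float
  totalStars  Int      // denormalized: summed across all repos
  totalForks  Int      // denormalized
  commitCount Int      // denormalized
  updatedAt   DateTime @updatedAt
}
```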

Why Denormalized?

  • Faster dashboard loads
  • Simpler queries
  • Fewer joins
  • Better caching

Performance Optimizations

Parallel Data Fetching

Instead of fetching data sequentially, we use Promise.all to fetch user data, repositories, and statistics in parallel, so total latency is roughly that of the slowest single request rather than the sum of all of them.
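
A sketch of the pattern (the gh helper is illustrative; the paths are GitHub's REST API endpoints):

```ts
const GITHUB = "https://api.github.com";

// Thin wrapper around fetch that adds auth and parses JSON.
async function gh(path: string): Promise<any> {
  const res = await fetch(`${GITHUB}${path}`, {
    headers: { Authorization: `Bearer ${process.env.GITHUB_TOKEN}` },
  });
  if (!res.ok) throw new Error(`GitHub API ${res.status} for ${path}`);
  return res.json();
}

// Fire all three requests concurrently and await them together.
async function fetchGitHubData(username: string) {
  const [user, repos, events] = await Promise.all([
    gh(`/users/${username}`),
    gh(`/users/${username}/repos?per_page=100`),
    gh(`/users/${username}/events/public`),
  ]);
  return { user, repos, events };
}
```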

Streaming Responses

For large datasets, we use React Suspense to stream content to users as it becomes available, improving perceived performance.
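
In the App Router this falls out of wrapping the slow part in a Suspense boundary (component names are illustrative; computeStats stands in for the slower aggregation):

```tsx
import { Suspense } from "react";
import { computeStats } from "@/lib/stats"; // hypothetical slow aggregation

// An async server component: streams into the page once its data resolves.
async function ProfileStats({ username }: { username: string }) {
  const stats = await computeStats(username);
  return <section>Score: {stats.score}</section>;
}

export default function ProfilePage({
  params,
}: {
  params: { username: string };
}) {
  return (
    <main>
      {/* The page shell and heading are sent to the browser immediately. */}
      <h1>{params.username}</h1>
      <Suspense fallback={<p>Crunching stats…</p>}>
        <ProfileStats username={params.username} />
      </Suspense>
    </main>
  );
}
```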

Edge Functions

We deploy serverless functions to edge locations globally, ensuring fast response times for users worldwide.
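
In Next.js, opting a route into the edge runtime is a one-line segment config (the route path here is illustrative):

```ts
// app/api/profile/[username]/route.ts
// One line moves this handler onto Vercel's edge runtime, so it runs
// in the region closest to the user instead of a single origin.
export const runtime = "edge";

export async function GET(
  request: Request,
  { params }: { params: { username: string } }
) {
  // ...look up the cached profile and return it...
  return Response.json({ username: params.username });
}
```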

Database Indexes

Strategic indexing on commonly queried fields like score, percentile, and username ensures fast lookups even as the database grows.
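
In Prisma these live on the model itself; extending the illustrative Profile model from earlier:

```prisma
model Profile {
  id         String @id @default(cuid())
  username   String @unique // the unique constraint doubles as an index
  score      Float
  percentile Float
  // ...other fields as above...

  @@index([score])      // leaderboard ordering (ORDER BY score DESC)
  @@index([percentile]) // percentile-range comparisons
}
```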

Scalability Considerations

Current Load

  • 10K+ unique profiles analyzed
  • 1K+ active users daily
  • 50K+ API requests/month
  • 99.9% uptime

Scaling Strategy

Vertical Scaling (Current):

  • Vercel Pro plan
  • Postgres connection pooling
  • Efficient queries

Horizontal Scaling (Future):

  • Redis for caching
  • Read replicas
  • CDN for static assets
  • Worker queues for heavy jobs

Monitoring and Observability

Metrics We Track

  1. Response times (p50, p95, p99)
  2. Error rates
  3. API rate limit usage
  4. Database query performance
  5. User engagement

Tools

  • Vercel Analytics
  • Database query logs
  • Custom logging
  • Error tracking

Cost Optimization

Current Costs

  • Hosting: ~$20/month (Vercel Pro)
  • Database: ~$25/month (Vercel Postgres)
  • API: $0 (GitHub's REST API is free within rate limits)
  • Total: ~$45/month

Optimization Strategies

  1. Aggressive caching
  2. Efficient queries
  3. Serverless architecture
  4. Static page generation

Lessons Learned

What Worked Well

  • Server components reduced complexity
  • Prisma made database work pleasant
  • Caching strategy solved rate limits
  • TypeScript caught bugs early
  • Vercel simplified deployment

What We'd Change

  • Add Redis earlier
  • Implement queue system sooner
  • More comprehensive error handling
  • Better monitoring from day one
  • API versioning strategy

Future Improvements

Short Term (Q1 2025)

  • Redis caching layer
  • Background job queue
  • Advanced analytics
  • API rate limit dashboard
  • Performance monitoring

Long Term (2025)

  • Real-time updates
  • ML-based predictions
  • Multi-language support
  • Mobile app
  • Enterprise features

Open Source

We believe in transparency. Check out:

  • Our statistical algorithms
  • Database schema
  • API documentation
  • Performance benchmarks

Conclusion

Building GitCheck taught us that:

  1. Simple architectures scale better
  2. Caching solves most problems
  3. TypeScript is worth it
  4. Measure everything
  5. Users care about speed

The tech stack matters, but architecture decisions matter more.

Related Topics

next.js · prisma · architecture · github api · scalability · tech stack