Documentation

Comprehensive technical documentation for GitCheck's GitHub analytics platform, scoring algorithms, and implementation details.

How It Works

GitCheck uses a sophisticated multi-stage pipeline to analyze GitHub profiles. The process combines GraphQL and REST API calls, caching strategies, and statistical analysis to deliver comprehensive developer insights.

Input Validation & Rate Limiting

A user submits a GitHub username through the homepage input. The system validates the input and checks rate limits (a sketch of the check follows this list):

• Honeypot field validation (bot detection)
• Request timing check (minimum 1 second after page load)
• IP-based rate limiting (5 requests per 15 minutes)
• Request interval enforcement (minimum 2 seconds between requests)
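
A minimal sketch of such a limiter, assuming an in-memory Map keyed by IP (names and store shape are illustrative, not the production code):

// Hypothetical in-memory rate limiter: 5 requests / 15 min, 2s spacing
const WINDOW_MS = 15 * 60 * 1000;
const MAX_REQUESTS = 5;
const MIN_INTERVAL_MS = 2000;
const hits = new Map(); // ip -> { timestamps: number[] }

function checkRateLimit(ip) {
  const now = Date.now();
  const entry = hits.get(ip) ?? { timestamps: [] };
  // Keep only requests inside the current 15-minute window
  entry.timestamps = entry.timestamps.filter(t => now - t < WINDOW_MS);
  const last = entry.timestamps[entry.timestamps.length - 1];
  if (last !== undefined && now - last < MIN_INTERVAL_MS) {
    return { allowed: false, reason: 'too-fast' };
  }
  if (entry.timestamps.length >= MAX_REQUESTS) {
    const retryAfter = Math.ceil((entry.timestamps[0] + WINDOW_MS - now) / 1000);
    return { allowed: false, reason: 'limit', retryAfter };
  }
  entry.timestamps.push(now);
  hits.set(ip, entry);
  return { allowed: true };
}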

Cache Check (24-Hour Window)

Before making expensive API calls, the system checks whether the profile was recently analyzed (a sketch follows this list):

• Query PostgreSQL for existing profile
• Check if scannedAt timestamp is within 24 hours
• If cached: Return existing data instantly (HTTP 200)
• If expired: Proceed to GitHub API calls
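
A sketch of the lookup, assuming a Prisma client and the Profile model documented later on this page:

import { PrismaClient } from '@prisma/client';
const prisma = new PrismaClient();

const CACHE_TTL_MS = 24 * 60 * 60 * 1000; // 24-hour window

async function getCachedProfile(username) {
  const profile = await prisma.profile.findUnique({ where: { username } });
  if (profile && Date.now() - profile.scannedAt.getTime() < CACHE_TTL_MS) {
    return profile; // fresh: return instantly, skip GitHub API calls
  }
  return null; // missing or expired: proceed to full analysis
}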

💡 Performance Benefit:

Cached responses are served in ~50ms vs. ~30-45 seconds for full analysis. This reduces GitHub API usage by 90%+ and provides instant results for repeated queries.

GitHub GraphQL Data Fetching

The system uses GitHub's GraphQL API to efficiently fetch repository data in a single request:

query($username: String!, $repoCount: Int!) {
  user(login: $username) {
    login
    name
    bio
    location
    company
    websiteUrl
    avatarUrl(size: 400)
    followers { totalCount }
    following { totalCount }
    organizations(first: 100) {
      nodes { login }
    }
    gists(first: 1) { totalCount }
    createdAt

    repositories(first: $repoCount,
                 orderBy: {field: STARGAZERS, direction: DESC},
                 ownerAffiliations: OWNER,
                 isFork: false,
                 privacy: PUBLIC) {
      totalCount
      nodes {
        name
        description
        url
        stargazerCount
        forkCount
        watchers { totalCount }
        primaryLanguage { name }
        languages(first: 10) {
          edges { size node { name } }
        }
        updatedAt
        createdAt
        licenseInfo { spdxId }
        openIssuesCount: issues(states: OPEN) { totalCount }
        closedIssuesCount: issues(states: CLOSED) { totalCount }
        openPRsCount: pullRequests(states: OPEN) { totalCount }
        mergedPRsCount: pullRequests(states: MERGED) { totalCount }
      }
    }
  }
}

📊 Data Retrieved:

  • Up to 100 repositories
  • Language statistics
  • Stars, forks, watchers
  • Issue/PR counts
  • Organizations

⚡ Optimization:

  • 1 GraphQL call vs. 100+ REST calls
  • Reduces latency by ~80%
  • Falls back to REST if GraphQL fails (see the sketch below)
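
A sketch of issuing the query above with a REST fallback; fetchUserData is illustrative, and the real fallback path replays many REST calls where only one is shown:

// `query` is the GraphQL document shown above; `token` is a server-side PAT
async function fetchUserData(query, username, token) {
  const res = await fetch('https://api.github.com/graphql', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${token}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ query, variables: { username, repoCount: 100 } }),
  });
  const json = await res.json();
  if (!res.ok || json.errors) {
    // Fallback: one of the REST calls the non-GraphQL path would make
    const rest = await fetch(`https://api.github.com/users/${username}`, {
      headers: { 'Authorization': `Bearer ${token}` },
    });
    return { source: 'rest', user: await rest.json() };
  }
  return { source: 'graphql', user: json.data.user };
}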

Contribution & Activity Analysis

Fetches detailed contribution history using REST API endpoints:

Commit Activity (REST API):

GET /users/{username}/events

Parses the last 365 days of push events to calculate total commits, streaks, and activity patterns.
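
One plausible shape for that aggregation (pagination and error handling omitted; payload.size is the number of commits in a push event):

async function getCommitsByDay(username, token) {
  const res = await fetch(
    `https://api.github.com/users/${username}/events?per_page=100`,
    { headers: { 'Authorization': `Bearer ${token}` } }
  );
  const events = await res.json();
  const commitsByDay = {};
  for (const event of events) {
    if (event.type !== 'PushEvent') continue;
    const day = event.created_at.slice(0, 10); // 'YYYY-MM-DD'
    commitsByDay[day] = (commitsByDay[day] ?? 0) + event.payload.size;
  }
  return commitsByDay;
}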

Pull Requests & Reviews:

GET /search/issues?q=author:{username}+type:pr
GET /search/issues?q=reviewed-by:{username}+type:pr

Aggregates contribution statistics across all public repositories
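
Because the Search API returns a total_count field, only one page per query is needed for the aggregate; a sketch:

async function getPRCounts(username, token) {
  const headers = { 'Authorization': `Bearer ${token}` };
  const authored = await fetch(
    `https://api.github.com/search/issues?q=author:${username}+type:pr&per_page=1`,
    { headers }
  ).then(r => r.json());
  const reviewed = await fetch(
    `https://api.github.com/search/issues?q=reviewed-by:${username}+type:pr&per_page=1`,
    { headers }
  ).then(r => r.json());
  return { totalPRs: authored.total_count, totalReviews: reviewed.total_count };
}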

Calculated Metrics:

• Current Streak: Consecutive active days (see the streak sketch after this list)
• Longest Streak: Historical maximum
• Avg Commits/Day: Activity intensity
• Most Active Day: Weekly pattern
• Weekend Activity: Work-life indicator
• Total Contributions: Lifetime commits
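
A sketch of how the two streak metrics could fall out of a per-day commit map; here the current streak is the run ending at the most recent active day, while the production logic may anchor it to today instead:

function computeStreaks(commitsByDay) {
  const days = Object.keys(commitsByDay).sort(); // 'YYYY-MM-DD', ascending
  let longest = 0, current = 0, prev = null;
  for (const day of days) {
    const gap = prev ? (Date.parse(day) - Date.parse(prev)) / 86_400_000 : null;
    current = gap === 1 ? current + 1 : 1; // a consecutive day extends the streak
    longest = Math.max(longest, current);
    prev = day;
  }
  return { currentStreak: current, longestStreak: longest };
}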

Statistical Score Calculation

Applies statistical algorithms to compute a normalized 0-100 developer score across four weighted components:

• Impact (35%): Stars, forks, followers
• Code Quality (30%): Repo health, maintenance
• Consistency (20%): Commits, streaks
• Collaboration (15%): PRs, reviews, orgs

📐 Scoring Methodology:

1. Raw Metric Calculation: Extract numerical values (e.g., total stars, commit count)
2. Z-Score Normalization: Compare to the population baseline (100K+ developers): z = (value - μ) / σ
3. Percentile Conversion: Use a 48-point lookup table to map z-scores to a 0-100 scale
4. Weighted Aggregation: Combine component scores using the defined weights (see the sketch after this list)
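
The four steps condensed into a sketch; the population statistics and weights are the documented values from this page, and zToPercentile stands in for the 48-point lookup described in the methodology section:

// Population baselines from the component sections below
const STATS = {
  impact:        { mean: 42,  std: 850 },
  codeQuality:   { mean: 4.8, std: 8.5 },
  consistency:   { mean: 387, std: 612 },
  collaboration: { mean: 28,  std: 78 },
};
const WEIGHTS = { impact: 0.35, codeQuality: 0.30, consistency: 0.20, collaboration: 0.15 };

function scoreProfile(raw, zToPercentile) {
  let finalScore = 0;
  const components = {};
  for (const key of Object.keys(WEIGHTS)) {
    const z = (raw[key] - STATS[key].mean) / STATS[key].std; // Step 2
    components[key] = zToPercentile(z);                      // Step 3
    finalScore += components[key] * WEIGHTS[key];            // Step 4
  }
  return { finalScore, components };
}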

🎯 Design Philosophy:

The scoring system uses population-based statistics rather than absolute thresholds. This ensures scores remain meaningful as GitHub evolves and prevents inflation over time. A score of 70 always means "better than 70% of developers," regardless of when it was calculated.

Database Storage & Caching

All computed metrics are stored in PostgreSQL with automatic cache invalidation:

• Profile data (score, percentile, component scores)
• Repository metrics (stars, forks, languages)
• Activity patterns (contributions, streaks)
• Timestamp: scannedAt for cache invalidation
• Scoring method: "fallback" (statistical) or "pro" (advanced)

💾 Database:

PostgreSQL on Neon (serverless, auto-scaling, SSL/TLS encrypted)

⏱️ Cache Policy:

24-hour TTL, instant invalidation available via manual refresh

Performance Metrics

• ~50ms: Cached response time
• 30-45s: Full analysis (uncached)
• 90%+: Reduction in GitHub API calls from caching

Scoring System v5.0

GitCheck uses a statistical scoring model based on z-score normalization and percentile ranking. The system compares each developer against a baseline population of 100,000+ GitHub users to provide meaningful, percentile-based scores.

Component Breakdown

Impact (35%)

Measures the reach and influence of a developer's work through community engagement metrics.

Formula:

rawImpact = totalStars + (totalForks × 2) + (totalWatchers × 0.5) + (followersCount × 0.1)

Metrics Used:

  • Repository stars (1x weight)
  • Repository forks (2x weight)
  • Repository watchers (0.5x weight)
  • Profile followers (0.1x weight)

Population Stats:

  • Mean (μ): 42 stars
  • Std Dev (σ): 850 stars
  • Median: 8 stars
  • 95th percentile: ~2,500 stars

💡 Why this matters:

Forks are weighted 2x because they indicate not just interest but actual usage and derivative work. Watchers show ongoing engagement. This component heavily favors maintainers of popular open-source projects.

Code Quality (30%)

Evaluates repository health, maintenance activity, and development best practices.

Formula:

repoActivityRate = totalRepos / accountAgeYears
maintenanceScore = avgRepoUpdateFrequency × issueResolutionRate
rawQuality = repoActivityRate × maintenanceScore × (1 + gistsCount/100)

Metrics Used:

  • Repositories per year of account age
  • Average repository size (codebase scale)
  • Gist count (snippet sharing)
  • Issue/PR management ratio

Population Stats:

  • Mean (μ): 4.8 repos/year
  • Std Dev (σ): 8.5 repos/year
  • Median: 2.3 repos/year
  • 95th percentile: ~18 repos/year

💡 Why this matters:

This component rewards consistent repository creation and maintenance. A developer with 5 well-maintained repos scores higher than one with 50 abandoned projects. Issue resolution and gist sharing indicate engagement with best practices.

Consistency (20%)

Tracks coding frequency, commit patterns, and sustainable development habits.

Formula:

commitsPerYear = totalCommits / accountAgeYears
streakBonus = Math.log10(currentStreak + 1) × 10
rawConsistency = commitsPerYear × (1 + streakBonus/100)

Metrics Used:

  • Commits per year (activity rate)
  • Current commit streak (days)
  • Longest streak achieved
  • Weekend activity percentage

Population Stats:

  • Mean (μ): 387 commits/year
  • Std Dev (σ): 612 commits/year
  • Median: 156 commits/year
  • 95th percentile: ~1,500 commits/year

💡 Why this matters:

Consistency indicates sustainable coding habits. The logarithmic streak bonus prevents over-optimization for daily commits while still rewarding regularity. This component favors developers who code steadily over time rather than in intense bursts.

Collaboration (15%)

Measures teamwork, code review participation, and open-source contributions.

Formula:

prQuality = totalPRs × (mergedPRs / totalPRs)
orgBonus = Math.log10(organizationsCount + 1) × 15
rawCollaboration = prQuality + totalReviews + orgBonus

Metrics Used:

  • Total pull requests created
  • PR merge rate (quality indicator)
  • Code reviews performed
  • Organization memberships

Population Stats:

  • Mean (μ): 28 PRs
  • Std Dev (σ): 78 PRs
  • Median: 12 PRs
  • 95th percentile: ~150 PRs

💡 Why this matters:

Collaboration skills are essential for professional development. High merge rates indicate quality contributions. Code reviews demonstrate mentorship and code quality awareness. Organization membership shows team participation.

Statistical Methodology

Step 1: Z-Score Normalization

Each raw component score is normalized using z-scores to compare against the population distribution:

z = (rawValue - populationMean) / populationStdDev

Where populationMean and populationStdDev are derived from analyzing 100,000+ GitHub profiles. Z-scores typically range from -3 to +5, with 0 representing average.

Step 2: Percentile Conversion (48-Point Lookup)

Z-scores are converted to percentiles using a 48-point interpolation table based on the standard normal distribution:

z = -3.0 → 0.13%
z = -2.0 → 2.28%
z = -1.0 → 15.87%
z = 0.0 → 50.00%
z = 1.0 → 84.13%
z = 2.0 → 97.72%
z = 3.0 → 99.87%
z = 5.0 → 99.99%

The system uses cubic interpolation between lookup points for precision at high percentiles (95-100), where small changes in z-score result in significant percentile differences.
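
A simplified sketch of the conversion, using linear interpolation between the anchor points listed above (the production table has 48 points and, per the text, uses cubic interpolation near the top of the range):

const Z_TABLE = [
  [-3.0, 0.13], [-2.0, 2.28], [-1.0, 15.87], [0.0, 50.00],
  [1.0, 84.13], [2.0, 97.72], [3.0, 99.87], [5.0, 99.99],
];

function zToPercentile(z) {
  if (z <= Z_TABLE[0][0]) return Z_TABLE[0][1];
  for (let i = 1; i < Z_TABLE.length; i++) {
    const [z1, p1] = Z_TABLE[i];
    if (z <= z1) {
      const [z0, p0] = Z_TABLE[i - 1];
      return p0 + ((z - z0) / (z1 - z0)) * (p1 - p0); // linear between points
    }
  }
  return Z_TABLE[Z_TABLE.length - 1][1]; // clamp above the top anchor
}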

Step 3: Weighted Aggregation

Component percentiles are combined using predefined weights to produce the final 0-100 score:

finalScore = (impact × 0.35) + (codeQuality × 0.30) + (consistency × 0.20) + (collaboration × 0.15)

Weights were determined through empirical analysis of what metrics best correlate with developer effectiveness and community recognition.

Step 4: Grading Scale

Final scores are mapped to letter grades for intuitive interpretation:

S: 95-100 (Elite), Top 5%
A: 85-94 (Excellent), Top 15%
B: 70-84 (Good), Top 30%
C: 55-69 (Average), Top 50%
D: 40-54 (Below Avg), Top 70%
F: 0-39 (Needs Work), Bottom 30%
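
The boundaries above as a small helper:

function toGrade(score) {
  if (score >= 95) return 'S'; // Elite, top 5%
  if (score >= 85) return 'A'; // Excellent, top 15%
  if (score >= 70) return 'B'; // Good, top 30%
  if (score >= 55) return 'C'; // Average, top 50%
  if (score >= 40) return 'D'; // Below average, top 70%
  return 'F';                  // Needs work, bottom 30%
}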

Example: Calculating a Score

Developer Profile:

• 5,000 total stars
• 1,200 forks
• 15 repositories (3 years old)
• 800 commits/year
• 45-day current streak
• 120 pull requests (90% merged)

Component Calculations:

Impact: rawImpact = 5000 + (1200 × 2) = 7400 → z = (7400 - 42) / 850 = 8.66 → 99.99%
Code Quality: rawQuality = 15 / 3 = 5.0 repos/year → z = (5 - 4.8) / 8.5 = 0.02 → 50.80%
Consistency: rawConsistency = 800 commits/year → z = (800 - 387) / 612 = 0.67 → 74.86%
Collaboration: rawCollab = 120 × 0.9 = 108 PRs → z = (108 - 28) / 78 = 1.03 → 84.85%

Final Score:

score = (99.99 × 0.35) + (50.80 × 0.30) + (74.86 × 0.20) + (84.85 × 0.15)
score = 35.00 + 15.24 + 14.97 + 12.73
Final Score: 77.94 / 100 (Grade: B)

This developer excels at impact (popular projects) and collaboration, but has average code quality metrics and good consistency.

Technical Architecture

Technology Stack

Frontend

Next.js 16.0.8 (Framework)

React framework with App Router, Server Components, and Turbopack for blazing-fast builds

React 19.2 (UI Library)

Latest React with Server Components, Suspense, and new React Compiler for automatic optimization

TypeScript 5 (Language)

Strict mode enabled for type safety and better developer experience

Tailwind CSS 4 (Styling)

Utility-first CSS with custom design system and responsive breakpoints

Framer Motion (Animation)

Production-ready animation library for smooth transitions and interactive UI elements

Backend

Next.js API Routes (API Layer)

Serverless API endpoints with automatic code splitting and edge runtime support

Prisma ORM 5.22 (Database)

Type-safe database client with migrations, schema management, and query builder

PostgreSQL on Neon (Database Host)

Serverless Postgres with auto-scaling, branching, and sub-second cold starts

GitHub API v4 (External API)

GraphQL API for efficient data fetching + REST API fallback for contribution data

Vercel (Deployment)

Edge network deployment with automatic HTTPS, previews, and performance analytics

Database Schema

model Profile {
  id                    String   @id @default(cuid())
  userId                String?  @unique
  username              String   @unique
  avatarUrl             String?
  bio                   String?
  location              String?
  company               String?
  blog                  String?
  hireable              Boolean  @default(false)

  // Core Metrics
  score                 Float?
  percentile            Int?
  totalCommits          Int      @default(0)
  totalRepos            Int      @default(0)
  totalStars            Int      @default(0)
  totalForks            Int      @default(0)
  totalPRs              Int      @default(0)
  mergedPRs             Int      @default(0)
  openPRs               Int      @default(0)

  // Activity Metrics
  currentStreak         Int      @default(0)
  longestStreak         Int      @default(0)
  averageCommitsPerDay  Float    @default(0)
  mostActiveDay         String?
  weekendActivity       Float    @default(0)

  // Social Metrics
  followersCount        Int      @default(0)
  followingCount        Int      @default(0)
  organizationsCount    Int      @default(0)
  gistsCount            Int      @default(0)

  // Collaboration Metrics
  totalIssuesOpened     Int      @default(0)
  totalReviews          Int      @default(0)
  totalContributions    Int      @default(0)
  totalWatchers         Int      @default(0)
  totalOpenIssues       Int      @default(0)

  // Repository Health
  averageRepoSize       Float    @default(0)
  accountAge            Float    @default(0)
  accountCreatedAt      DateTime?

  // Language Data (JSON)
  languages             Json     @default("{}")
  frameworks            Json     @default("{}")

  // Repository Data (JSON array)
  topRepos              Json     @default("[]")

  // Contribution Data (JSON array)
  contributions         Json     @default("[]")

  // Scoring Components (JSON)
  scoreComponents       Json?
  scoringMethod         String?  // "fallback" or "pro"
  scoreStrengths        String[]
  scoreImprovements     String[]

  // Cache Management
  scannedAt             DateTime @default(now())
  lastLanguageScan      DateTime?
  lastFrameworkScan     DateTime?
  lastOrgScan           DateTime?

  // Indexes for performance
  @@index([username])
  @@index([score])
  @@index([scannedAt])
}

🔑 Key Design Decisions:

  • CUID primary keys for distributed systems compatibility
  • JSON fields for flexible nested data (languages, repos, contributions)
  • Indexed username for fast lookups (most common query)
  • Indexed score for leaderboard sorting
  • scannedAt timestamp for cache invalidation logic

📈 Scalability Features:

  • Serverless Postgres auto-scales based on load
  • No foreign keys to avoid cross-table locking
  • Denormalized data (JSON) reduces joins
  • Partial indexes on frequently queried fields
  • Connection pooling via Prisma for edge functions

Data Flow Architecture

1. User Input: Client submits GitHub username via homepage form
2. API Route (/api/analyze-username): Rate limiting, validation, cache check
3. GitHub API (GraphQL + REST): Parallel data fetching of repos, commits, PRs, and activity
4. Score Calculation (/api/score): Statistical analysis with z-scores, percentiles, and weighted aggregation
5. Database Write (Prisma → PostgreSQL): Upsert profile with all metrics and set the scannedAt timestamp (see the sketch below)
6. Response & Redirect: Client redirects to dashboard and fetches via /api/profile
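
Step 5 as a minimal sketch, assuming the prisma client from the cache-check sketch earlier and passing only a subset of the metrics the real write includes:

async function saveProfile(username, metrics) {
  // metrics: e.g. { score, percentile, totalStars, totalForks, ... }
  await prisma.profile.upsert({
    where: { username },
    update: { ...metrics, scannedAt: new Date() }, // refresh the cache timestamp
    create: { username, ...metrics },              // scannedAt defaults to now()
  });
}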

Security & Bot Protection

IP-Based Rate Limiting

  • Maximum 5 requests per 15-minute window
  • Minimum 2-second interval between requests
  • Automatic IP extraction (supports proxies, Cloudflare)
  • In-memory store with auto-cleanup (5-minute intervals)
  • Returns 429 status with Retry-After header

Honeypot Bot Detection

  • Hidden input field invisible to humans
  • Hidden via CSS with opacity: 0 and absolute positioning
  • Bots auto-fill all fields and get caught (see the sketch below)
  • Timing validation (minimum 1s after page load)
  • Blocks 95%+ of automated submissions
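
A server-side sketch of both checks, using the _honeypot and _timestamp field names from the API reference below (validateAntiBot is an illustrative helper, not the actual implementation):

function validateAntiBot(body) {
  // The form renders a hidden <input name="_honeypot"> (opacity: 0, absolute
  // positioning); humans never see or fill it, naive bots fill everything
  if (body._honeypot !== '') {
    return { ok: false, status: 403, error: 'Bot detected - honeypot field filled' };
  }
  // _timestamp is set at page load; reject submissions under 1 second later
  if (!body._timestamp || Date.now() - body._timestamp < 1000) {
    return { ok: false, status: 403, error: 'Submitted too soon after page load' };
  }
  return { ok: true };
}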

API Key Protection

  • GitHub PAT stored in environment variables only
  • Never exposed to client-side code
  • Serverless functions run in isolated environments
  • Automatic rotation every 90 days (best practice)
  • Read-only permissions (no write access)

Database Security

  • SSL/TLS encrypted connections (required)
  • Connection string in environment variables
  • Prepared statements (Prisma prevents SQL injection)
  • No sensitive data stored (only public GitHub info)
  • Regular automated backups on Neon

API Reference

GitCheck provides REST API endpoints for analyzing GitHub profiles and retrieving cached data. All endpoints are serverless and deployed on Vercel's edge network.

POST /api/analyze-username

Analyzes a GitHub username and returns comprehensive developer metrics. Implements 24-hour caching and rate limiting.

Request Body:

{
  "username": "torvalds",
  "_honeypot": "",           // Must be empty (bot detection)
  "_timestamp": 1704067200000 // Page load time (timing validation)
}

Success Response (200 OK):

{
  "success": true,
  "cached": false,
  "profile": {
    "username": "torvalds",
    "score": 96.93,
    "percentile": 97,
    "totalStars": 223690,
    "totalForks": 60864,
    // ... additional metrics
  },
  "nextScanAvailable": "2026-01-15T20:11:50.546Z",
  "hoursRemaining": 23
}

Error Responses:

400 Bad Request (invalid input):
{ "error": "Username is required" }

403 Forbidden (bot detected):
{ "error": "Bot detected - honeypot field filled" }

429 Too Many Requests (rate limit exceeded):
{ "error": "Rate limit exceeded", "retryAfter": 300 }

404 Not Found (user doesn't exist):
{ "error": "GitHub user not found" }

Example Usage (JavaScript):

const response = await fetch('/api/analyze-username', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    username: 'torvalds',
    _honeypot: '',
    _timestamp: Date.now() - 2000 // 2 seconds ago
  })
});

const data = await response.json();

if (data.success) {
  console.log(`Score: ${data.profile.score}/100`);
  if (data.cached) {
    console.log(`Cached data, next scan in ${data.hoursRemaining}h`);
  }
}

GET /api/profile

Retrieves cached profile data for a given username. Fast endpoint (~50ms) for displaying dashboard data.

Query Parameters:

GET /api/profile?username=torvalds

Success Response (200 OK):

{
  "user": { "plan": "FREE" },
  "profile": {
    "username": "torvalds",
    "score": 96.93,
    "percentile": 97,
    "scoreComponents": {
      "impact": { "score": 99.99, "weight": 35, "source": "statistical" },
      "codeQuality": { "score": 38.37, "weight": 30, "source": "statistical" },
      "consistency": { "score": 72.96, "weight": 20, "source": "statistical" },
      "collaboration": { "score": 69.50, "weight": 15, "source": "statistical" }
    },
    "scoringMethod": "fallback",
    "totalStars": 223690,
    "totalRepos": 10,
    "languages": { "C": 98, "Rust": 0.3, "Shell": 0.4 },
    "topRepos": [ /* array of repository objects */ ],
    "contributions": [ /* array of contribution data */ ],
    // ... all profile fields
  }
}

Example Usage (JavaScript):

const response = await fetch('/api/profile?username=torvalds');
const data = await response.json();

console.log(`Score: ${data.profile.score}/100`);
console.log(`Impact: ${data.profile.scoreComponents.impact.score}%`);
console.log(`Total Stars: ${data.profile.totalStars.toLocaleString()}`);

GET /api/global-rank

Calculates a user's global ranking position among all analyzed profiles. Real-time calculation using Prisma aggregation.

Query Parameters:

GET /api/global-rank?username=torvalds

Success Response (200 OK):

{
  "rank": 3,
  "totalProfiles": 1247,
  "percentile": 99.76,
  "score": 96.93
}

Calculation Logic:

// Count profiles with higher scores
const higherScores = await prisma.profile.count({
  where: { score: { gt: userScore } }
});

// Rank is 1-based
const rank = higherScores + 1;

// Calculate percentile
const percentile = ((totalProfiles - rank + 1) / totalProfiles) * 100;

Example Usage (JavaScript):

const response = await fetch('/api/global-rank?username=torvalds');
const { rank, totalProfiles, percentile } = await response.json();

console.log(`Rank #${rank} of ${totalProfiles}`);
console.log(`Top ${percentile.toFixed(2)}% globally`);

Rate Limiting Policy

Analysis Endpoint Limits:

  • 5 requests per 15-minute window
  • 2-second minimum interval between requests
  • 24-hour cache per username
  • Returns 429 when exceeded

Read Endpoint Limits:

  • No rate limit on /api/profile
  • No rate limit on /api/global-rank
  • Cached responses served instantly
  • Optimized for dashboard rendering

💡 Best Practices:

If implementing a client, respect the cache TTL and avoid repeated analysis requests. Use the /api/profile endpoint for displaying data, which has no rate limits and ~50ms response time.

Frequently Asked Questions

How accurate are the scores?

Scores are statistically accurate relative to our baseline population of 100,000+ developers. The system uses z-score normalization, which means a score of 70 always represents "better than 70% of developers" regardless of when it was calculated. However, scores reflect GitHub activity patterns, not developer skill, work ethic, or professional competence.

Why is my score lower/higher than expected?

The scoring system weighs impact (35%) most heavily. If you maintain popular open-source projects with many stars and forks, you'll score higher. Conversely, having many private repositories or working on closed-source projects won't increase your score since GitCheck only analyzes public data.

Common reasons for lower scores: few public repositories, low star count, infrequent commits, or a new GitHub account (account age affects several metrics).

How often can I re-analyze my profile?

Profiles are cached for 24 hours to reduce GitHub API usage and prevent abuse. After 24 hours, you can request a fresh analysis. The cache warning on your dashboard shows the next available scan time.

Does GitCheck access private repositories?

No. GitCheck only analyzes publicly available GitHub data. We never access private repositories, never require OAuth authentication, and never store sensitive information. All data comes from GitHub's public API endpoints.

How can I improve my score?

Focus on the four component areas: Impact (create valuable open-source projects that earn stars), Code Quality (maintain repositories consistently, close issues), Consistency (commit regularly, build streaks), and Collaboration (contribute PRs, do code reviews, join organizations).

Can I remove my profile from the database?

Yes. Contact us via GitHub issues on our repository or through the homepage contact information. We'll honor deletion requests within 7 days. Note that all data stored is already publicly available on GitHub.

Why does analysis take 30-45 seconds?

Full analysis requires fetching data from multiple GitHub API endpoints (repos, commits, PRs, contributions), calculating statistical metrics, and writing to the database. We use GraphQL to optimize this, but GitHub's API has inherent latency. Cached responses are served in ~50ms.

How is global ranking calculated?

Global ranking counts how many profiles in our database have a higher score than yours. If your score is 85.5 and 42 profiles score higher, your rank is #43. Percentile is calculated as: ((totalProfiles - rank + 1) / totalProfiles) × 100

Is this an official GitHub product?

No. GitCheck is an independent analytics platform that uses GitHub's public API. We are not affiliated with, endorsed by, or sponsored by GitHub, Inc.