Performance Testing with Grafana k6

This document outlines the comprehensive performance testing setup for Fast-Crawl using Grafana k6, including local development workflows and CI/CD integration with strict timeout safeguards.

Overview

Fast-Crawl includes a robust performance testing suite built with Grafana k6 that validates the performance and reliability of all API endpoints. The testing framework emphasizes safety first with strict timeout controls to prevent infinite runs and resource exhaustion.

Key Features

  • ⏱️ Timeout Safety: Every test runs for at most 15 seconds, enforced with explicit --vus and --duration flags
  • 🔄 CI/CD Integration: Automated performance testing in GitHub Actions on every pull request
  • 🐳 Local Docker Builds: Tests run against actual PR code changes, not registry images
  • 📊 Comprehensive Metrics: Response times, error rates, and custom performance indicators
  • 🛡️ CI-Compatible: Graceful handling of external service dependencies in CI environments

Test Coverage

The performance testing suite covers all API endpoints with optimized configurations:

Endpoint      Duration   Virtual Users   Focus Area
/v1/health    10s        10 VUs         Health check responsiveness
/v1/search    15s        5 VUs          Search aggregation performance
/v1/scrap     12s        3 VUs          Content scraping efficiency
/v1/results   10s        8 VUs          Results processing speed
Combined      15s        5 VUs          Mixed workload simulation
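
The combined test approximates a mixed workload by having each iteration hit one endpoint chosen at random. A minimal sketch of that pattern (the route weighting and the search query parameter are illustrative, not the suite's exact mix):

javascript
import http from 'k6/http';
import { sleep } from 'k6';

const BASE_URL = __ENV.BASE_URL || 'http://localhost:3000';

export const options = { vus: 5, duration: '15s' };

// Illustrative weighting: cheap endpoints appear more often than
// expensive ones to mimic a realistic traffic shape.
const ROUTES = [
  '/v1/health',
  '/v1/health',
  '/v1/results',
  '/v1/search?q=test', // query parameter is a placeholder
];

export default function () {
  const path = ROUTES[Math.floor(Math.random() * ROUTES.length)];
  http.get(`${BASE_URL}${path}`);
  sleep(1); // pace iterations so VUs don't spin in a tight loop
}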

Local Development

Prerequisites

  1. Install k6: Follow the official installation guide
  2. Start Fast-Crawl: Ensure the server is running locally:
    bash
    bun run start

Running Tests

The project includes convenient npm scripts for running performance tests:

bash
# Run all endpoints in a combined test (recommended)
npm run load-test

# Run individual endpoint tests
npm run load-test:health      # Health endpoint only
npm run load-test:search      # Search endpoint only  
npm run load-test:scrap       # Scraping endpoint only
npm run load-test:results     # Results endpoint only

Custom Test Execution

For advanced testing scenarios, you can run k6 directly with custom parameters:

bash
# Custom virtual users and duration (max 15s)
k6 run --vus 8 --duration 12s k6/health-test.js

# Different base URL (for testing different environments)
BASE_URL=http://localhost:8080 k6 run --vus 5 --duration 10s k6/combined-test.js

# Disable summary for CI-style output
k6 run --vus 5 --duration 10s --no-summary k6/search-test.js

GitHub Actions Workflow

The performance testing workflow (performance-test.yml) runs automatically on:

  • Push to main/develop branches
  • Pull requests to main
  • Manual trigger (workflow_dispatch)

Workflow Steps

  1. Code Checkout: Retrieves the latest code from the PR/branch
  2. Local Docker Build: Builds the Docker image from current code using docker build -t fast-crawl:test .
  3. Service Startup: Starts the containerized application with test environment variables
  4. k6 Setup: Installs Grafana k6 using grafana/setup-k6-action@v1
  5. Health Check: Waits for the service to be ready before running tests (see the sketch after this list)
  6. Load Testing: Executes all test suites with proper timeout flags
  7. Cleanup: Stops and removes test containers

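Step 5's readiness gate can also be mirrored inside the test scripts. A minimal sketch of a k6 setup() hook that aborts the run early when the service never comes up (the /v1/health path comes from this suite; the rest is illustrative):

javascript
import http from 'k6/http';

const BASE_URL = __ENV.BASE_URL || 'http://localhost:3000';

// setup() runs once before the load phase; throwing here aborts the
// whole run instead of burning the full duration against a dead service.
export function setup() {
  const res = http.get(`${BASE_URL}/v1/health`);
  if (res.status !== 200) {
    throw new Error(`service not ready: /v1/health returned ${res.status}`);
  }
}
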
Why Local Docker Builds?

The workflow builds Docker images locally instead of pulling from registries to ensure:

  • Testing Actual Changes: Performance tests validate the exact code in the PR
  • No Registry Dependencies: Eliminates failures due to missing or outdated images
  • Consistency: Same build process across development and CI environments

Safety Features & Timeout Handling

Mandatory Timeout Protection

All k6 commands include explicit timeout parameters to prevent infinite execution:

bash
# ✅ SAFE: Explicit duration and virtual users
k6 run --vus 10 --duration 10s k6/health-test.js

# ❌ UNSAFE: Could run indefinitely
k6 run k6/health-test.js
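
Note that CLI flags take precedence over any options defined inside the script, so explicit --vus and --duration act as a hard cap even if a script's own configuration is wrong.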

Configuration Safeguards

Each test script includes built-in safety configurations:

javascript
export const options = {
  duration: '10s',          // Maximum test duration
  vus: 10,                  // Virtual user count
  thresholds: {
    http_req_duration: ['p(95)<500'],  // Performance thresholds
    http_req_failed: ['rate<0.1'],     // Error rate limits
  },
};
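
When any threshold fails, k6 exits with a non-zero status code, so a performance regression automatically fails the CI job.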

Resource Management

  • Virtual User Limits: Optimized per endpoint complexity (3-10 VUs)
  • Sleep Delays: Built-in sleep() calls pace each VU to avoid overwhelming the API (see the sketch after this list)
  • Memory Limits: Container resource constraints in CI environment
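
The sleep delays look like this inside a test script; a minimal sketch (the one-second pacing value is illustrative):

javascript
import http from 'k6/http';
import { sleep } from 'k6';

const BASE_URL = __ENV.BASE_URL || 'http://localhost:3000';

export default function () {
  http.get(`${BASE_URL}/v1/health`);
  sleep(1); // each VU pauses between iterations instead of looping flat out
}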

Performance Thresholds

Each endpoint has specific performance criteria that must be met:

Health Endpoint (/v1/health)

  • Response Time: 95% of requests < 200ms
  • Error Rate: < 10%
  • Availability: Should always respond successfully

Search Endpoint (/v1/search)

  • Response Time: 95% of requests < 2.5s
  • Error Rate: < 80% (CI-compatible due to external API dependencies)
  • Throughput: Handle concurrent search aggregation

Scrap Endpoint (/v1/scrap)

  • Response Time: 95% of requests < 4s
  • Error Rate: < 30% (Playwright browser automation)
  • Resource Usage: Efficient content extraction

Results Endpoint (/v1/results)

  • Response Time: 95% of requests < 800ms
  • Error Rate: < 10%
  • Processing Speed: Fast results manipulation

CI Environment Considerations

External Service Dependencies

Some endpoints depend on external services (Google, Bing) that may not be available in CI:

  • Error Rate Tolerance: Higher error-rate thresholds (80%) for the search and scrap endpoints
  • Environment Variables: DISABLE_EXTERNAL_APIS=true is set for CI test runs
  • Graceful Degradation: Tests validate the API's response structure even when external calls fail (sketched below)
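
The graceful-degradation idea can be expressed as checks on response shape rather than upstream success. A minimal sketch (the query parameter and check names are illustrative, not the suite's actual assertions):

javascript
import http from 'k6/http';
import { check } from 'k6';

const BASE_URL = __ENV.BASE_URL || 'http://localhost:3000';

export default function () {
  const res = http.get(`${BASE_URL}/v1/search?q=test`);
  // Upstream engines may be unreachable in CI; the API itself should
  // still answer with well-formed JSON rather than hang or crash.
  check(res, {
    'got a response': (r) => r.status !== 0,
    'body is valid JSON': (r) => {
      try { JSON.parse(r.body); return true; } catch (e) { return false; }
    },
  });
}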

Firewall Restrictions

The CI environment may block certain domains:

  • k6's telemetry endpoint (stats.grafana.org) is blocked
  • The anonymous usage report can be disabled with --no-usage-report (or K6_NO_USAGE_REPORT=true); --no-summary only suppresses the end-of-test summary output
  • Local testing remains unaffected

Test Scripts Architecture

Individual Test Files

Each endpoint has a dedicated test file with optimized configurations:

k6/
├── health-test.js      # Health endpoint (10s, 10 VUs)
├── search-test.js      # Search endpoint (15s, 5 VUs)
├── scrap-test.js       # Scraping endpoint (12s, 3 VUs)
├── results-test.js     # Results endpoint (10s, 8 VUs)
├── combined-test.js    # Mixed workload (15s, 5 VUs)
└── README.md           # Test-specific documentation

Custom Metrics

Tests include custom metrics for detailed performance insights:

javascript
import http from 'k6/http';
import { Trend } from 'k6/metrics';

const customMetric = new Trend('endpoint_response_time');

export default function () {
  const response = http.get('http://localhost:3000/v1/endpoint'); // placeholder path
  customMetric.add(response.timings.duration); // record per-request latency
}
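
A Trend registered this way can also be gated like any built-in metric by referencing its name in options.thresholds, e.g. endpoint_response_time: ['p(95)<500'].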

Troubleshooting

Common Issues

Tests timeout in CI

bash
# Solution: Verify duration flags are set correctly
k6 run --vus 5 --duration 10s script.js

High error rates in CI

  • Expected for endpoints with external dependencies
  • Check if DISABLE_EXTERNAL_APIS=true is set
  • Verify thresholds are appropriate for CI environment

Docker build failures

bash
# Debug locally:
docker build -t fast-crawl:test .
docker run -d -p 3000:3000 fast-crawl:test

Service not ready

  • Workflow includes 30-second timeout for service startup
  • Check container logs: docker logs fast-crawl-test

Debugging Performance Issues

  1. Run tests locally with verbose output:

    bash
    k6 run --vus 1 --duration 5s --http-debug k6/search-test.js
  2. Check application logs during test execution:

    bash
    docker logs -f fast-crawl-test
  3. Monitor resource usage:

    bash
    docker stats fast-crawl-test

Customizing Tests

Adding New Endpoints

  1. Create a new test file in k6/ directory
  2. Follow the existing patterns for timeout safety (see the skeleton after this list)
  3. Add npm script in package.json
  4. Include in GitHub Actions workflow
  5. Update this documentation
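
A skeleton for step 1 that follows the timeout-safety pattern of the existing scripts (file name, endpoint path, and threshold values are placeholders):

javascript
// k6/example-test.js (placeholder file name)
import http from 'k6/http';
import { check, sleep } from 'k6';

const BASE_URL = __ENV.BASE_URL || 'http://localhost:3000';

export const options = {
  duration: '10s', // stay within the 15s safety ceiling
  vus: 5,
  thresholds: {
    http_req_duration: ['p(95)<1000'],
    http_req_failed: ['rate<0.1'],
  },
};

export default function () {
  const res = http.get(`${BASE_URL}/v1/example`); // placeholder path
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1);
}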

Modifying Thresholds

Update the options.thresholds object in test files:

javascript
export const options = {
  thresholds: {
    http_req_duration: ['p(95)<1000'],    // 95% under 1s
    http_req_failed: ['rate<0.05'],       // 5% error rate
    custom_metric: ['avg<500'],           // Custom threshold
  },
};

Environment-Specific Configuration

Use environment variables for different testing scenarios:

javascript
const BASE_URL = __ENV.BASE_URL || 'http://localhost:3000';
const ERROR_THRESHOLD = __ENV.CI ? 0.8 : 0.1; // Higher tolerance in CI
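
Because options is evaluated once in k6's init context, the derived value can be templated straight into a threshold expression; a minimal sketch:

javascript
const ERROR_THRESHOLD = __ENV.CI ? 0.8 : 0.1;

export const options = {
  thresholds: {
    // CI-aware error tolerance, interpolated into the threshold string
    http_req_failed: [`rate<${ERROR_THRESHOLD}`],
  },
};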

Best Practices

  1. Always Use Timeouts: Include --duration flag in all k6 commands
  2. Optimize Virtual Users: Balance load testing with resource constraints
  3. Monitor External Dependencies: Account for third-party service availability
  4. Test Realistic Scenarios: Use representative data and request patterns
  5. Document Changes: Update thresholds and configurations as the API evolves