DuckDB-WASM Tutorial: Complete Guide to Browser-Based SQL Analytics

LakeClient Team · 9 min read

Learn how to use DuckDB-WASM for lightning-fast SQL analytics directly in your browser. Complete tutorial with examples and best practices.


DuckDB-WASM brings the power of analytical databases directly to your browser. In this comprehensive tutorial, you'll learn how to leverage this revolutionary technology for fast, private data analytics.

What is DuckDB-WASM?

DuckDB-WASM is a WebAssembly port of DuckDB, an in-process analytical database. It enables:

  • Near-native SQL performance in browsers
  • Zero-setup analytics - no servers required
  • Privacy-first processing - data never leaves your device
  • Full SQL support including complex joins and window functions

Getting Started with DuckDB-WASM

Installation

First, let's get DuckDB-WASM integrated into your project. There are two main approaches depending on your setup: a CDN import for quick prototyping, or an npm installation for production applications.

For web applications, you can load DuckDB-WASM straight from a CDN. Note that the package is distributed as ES modules, so it must be imported from a module script rather than a classic script tag:

<script type="module">
  import * as duckdb from 'https://cdn.jsdelivr.net/npm/@duckdb/duckdb-wasm@latest/+esm';
</script>

For production applications or when using a build system, install via npm to get better dependency management and version control:

npm install @duckdb/duckdb-wasm

Basic Setup

Now we'll initialize DuckDB in your application. This setup creates a database instance that runs in a Web Worker, which prevents blocking your main thread during heavy data processing operations.

The initialization process downloads the WebAssembly module and sets up the worker communication. This happens once per session and enables all subsequent database operations.

import * as duckdb from '@duckdb/duckdb-wasm';

// Initialize DuckDB
const JSDELIVR_BUNDLES = duckdb.getJsDelivrBundles();
const bundle = await duckdb.selectBundle(JSDELIVR_BUNDLES);
const worker = await duckdb.createWorker(bundle.mainWorker);
const logger = new duckdb.ConsoleLogger();
const db = new duckdb.AsyncDuckDB(logger, worker);
await db.instantiate(bundle.mainModule);

Loading and Querying Data

Working with CSV Files

One of DuckDB-WASM's most powerful features is its ability to directly query CSV files without any preprocessing. This is perfect for analyzing spreadsheet exports, log files, or any tabular data.

The following example shows how to register a CSV file in memory and query it using standard SQL. The read_csv_auto function automatically detects column types and handles common formatting issues.

// Register a CSV file
await db.registerFileText('data.csv', csvContent);

// Query the data
const conn = await db.connect();
const result = await conn.query(`
  SELECT * FROM read_csv_auto('data.csv')
  WHERE column1 > 100
  ORDER BY column2 DESC
  LIMIT 10
`);

console.table(result.toArray());

Parquet File Support

Parquet files are columnar data files that offer excellent compression and query performance. DuckDB-WASM can read Parquet files directly from URLs, making it easy to analyze large datasets stored in cloud storage or data lakes.

This approach is incredibly powerful for analytics because you can query multi-gigabyte datasets without downloading the entire file: DuckDB fetches only the columns and row ranges you actually need.

// Load from URL
const result = await conn.query(`
  SELECT customer_id, SUM(amount) as total_spent
  FROM 'https://example.com/sales.parquet'
  GROUP BY customer_id
  HAVING total_spent > 1000
`);

JSON Data Processing

Modern applications often work with JSON data from APIs or NoSQL databases. DuckDB-WASM includes built-in JSON functions that let you extract and query nested data structures using familiar SQL syntax.

This is particularly useful for analyzing API responses, configuration files, or semi-structured data where you need to extract specific fields for analysis.

// Query JSON data directly (assumes each row exposes a JSON column named `data`)
const result = await conn.query(`
  SELECT json_extract_string(data, '$.name') as name,
         json_extract(data, '$.age') as age
  FROM read_json_auto('users.json')
  WHERE json_extract_string(data, '$.active') = 'true'
`);

Advanced Analytics Examples

Time Series Analysis

Time series analysis is essential for understanding trends and patterns in data over time. DuckDB's window functions make it easy to calculate moving averages, growth rates, and other temporal metrics.

This example calculates a 7-day moving average, which smooths out daily fluctuations to reveal underlying trends. This technique is commonly used in financial analysis, sales forecasting, and performance monitoring.

-- Calculate moving averages
SELECT 
  date,
  value,
  AVG(value) OVER (
    ORDER BY date 
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) as moving_avg_7d
FROM timeseries_data
ORDER BY date;

Cohort Analysis

Cohort analysis tracks groups of users over time to understand retention patterns and customer lifecycle behavior. This is crucial for SaaS businesses, e-commerce, and any application where you need to measure user engagement.

The following query groups customers by their first purchase month and tracks how many return in subsequent months. This reveals retention patterns and helps identify the most valuable customer segments.

-- Customer cohort analysis
WITH first_purchase AS (
  SELECT customer_id, MIN(order_date) as cohort_month
  FROM orders
  GROUP BY customer_id
),
cohort_data AS (
  SELECT 
    DATE_TRUNC('month', f.cohort_month) as cohort_month,
    DATE_DIFF('month', DATE_TRUNC('month', f.cohort_month), DATE_TRUNC('month', o.order_date)) as period_number,
    COUNT(DISTINCT o.customer_id) as customers
  FROM first_purchase f
  JOIN orders o ON f.customer_id = o.customer_id
  GROUP BY 1, 2
)
SELECT * FROM cohort_data
ORDER BY cohort_month, period_number;

Performance Optimization Tips

Memory Management

When working with large datasets in the browser, memory management becomes critical. Streaming results prevents memory overflow by processing data in manageable chunks rather than loading everything at once.

This streaming approach is essential when dealing with datasets larger than available browser memory. It allows you to process millions of rows efficiently.

// Use streaming for large datasets
const stream = await conn.send(`
  SELECT * FROM large_table
  WHERE conditions
`);

for await (const batch of stream) {
  // Process batch
  console.log(batch.toArray());
}

Query Optimization

  1. Use column pruning: Select only needed columns
  2. Apply filters early: Use WHERE clauses to reduce data
  3. Leverage indexing: Sort data by query columns
  4. Batch operations: Group multiple queries when possible
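As a quick illustration of the first two tips, compare a naive full scan with a pruned, filtered one (the table and column names below are illustrative, not from a real dataset):

```sql
-- Naive: scans every column and filters nothing
-- SELECT * FROM read_csv_auto('sales.csv');

-- Better: select only the columns you need and filter as early as possible,
-- so DuckDB can push the predicate down into the scan
SELECT product_name, amount
FROM read_csv_auto('sales.csv')
WHERE order_date >= DATE '2024-01-01'
ORDER BY amount DESC
LIMIT 100;
```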

Real-World Use Cases

Sales Dashboard

Building a sales dashboard demonstrates how DuckDB-WASM can power real-time analytics interfaces. This function processes sales data to generate key metrics that businesses need to track performance and make decisions.

The queries calculate monthly revenue trends and identify top-performing products, two essential metrics for any sales organization. The results can be fed directly into charting libraries for visualization.

async function buildSalesDashboard(salesData) {
  await db.registerFileText('sales.csv', salesData);
  const conn = await db.connect();
  
  // Monthly revenue
  const revenue = await conn.query(`
    SELECT 
      DATE_TRUNC('month', order_date) as month,
      SUM(amount) as revenue
    FROM read_csv_auto('sales.csv')
    GROUP BY month
    ORDER BY month
  `);
  
  // Top products
  const products = await conn.query(`
    SELECT 
      product_name,
      COUNT(*) as orders,
      SUM(amount) as revenue
    FROM read_csv_auto('sales.csv')
    GROUP BY product_name
    ORDER BY revenue DESC
    LIMIT 10
  `);
  
  return { revenue: revenue.toArray(), products: products.toArray() };
}

Log Analysis

Server log analysis is crucial for monitoring application health and identifying issues. DuckDB-WASM can process log files directly in the browser, enabling real-time monitoring without sending sensitive log data to external services.

This query analyzes HTTP status codes to identify error patterns. The results show both absolute counts and percentages, helping you quickly spot problematic trends.

// Analyze web server logs
const logAnalysis = await conn.query(`
  SELECT 
    status_code,
    COUNT(*) as requests,
    ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2) as percentage
  FROM read_csv_auto('access.log', 
    columns = {
      'timestamp': 'VARCHAR',
      'ip': 'VARCHAR', 
      'status_code': 'INTEGER',
      'response_time': 'DOUBLE'
    })
  WHERE status_code >= 400
  GROUP BY status_code
  ORDER BY requests DESC
`);

Integration with Popular Frameworks

React Component

Integrating DuckDB-WASM into React applications enables powerful client-side analytics with modern UI frameworks. This component demonstrates the complete lifecycle - initialization, data loading, and query execution.

The component uses React hooks to manage the database connection and results state. This pattern ensures proper cleanup and provides a responsive user interface for data analysis.

import { useState, useEffect } from 'react';
import * as duckdb from '@duckdb/duckdb-wasm';

function DataAnalyzer({ csvData }) {
  const [results, setResults] = useState([]);
  const [db, setDb] = useState(null);
  
  useEffect(() => {
    async function initDB() {
      const JSDELIVR_BUNDLES = duckdb.getJsDelivrBundles();
      const bundle = await duckdb.selectBundle(JSDELIVR_BUNDLES);
      const worker = await duckdb.createWorker(bundle.mainWorker);
      const logger = new duckdb.ConsoleLogger();
      const database = new duckdb.AsyncDuckDB(logger, worker);
      await database.instantiate(bundle.mainModule);
      setDb(database);
    }
    initDB();
  }, []);
  
  const runQuery = async (sql) => {
    if (!db) return;
    
    await db.registerFileText('data.csv', csvData);
    const conn = await db.connect();
    const result = await conn.query(sql);
    await conn.close();
    setResults(result.toArray());
  };
  
  return (
    <div>
      <button onClick={() => runQuery("SELECT * FROM read_csv_auto('data.csv') LIMIT 100")}>
        Load Data
      </button>
      <table>
        {/* Render results */}
      </table>
    </div>
  );
}

Best Practices

Security Considerations

  • Validate user input: Sanitize SQL queries to prevent injection
  • Resource limits: Set memory and execution time limits
  • File access: Restrict file system access in production
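One way to apply the first rule is to never splice raw user input into SQL. The sketch below validates the user-controlled pieces before they ever reach a query string; `ALLOWED_TABLES`, `quoteIdent`, and `buildTopNQuery` are hypothetical helpers for illustration, not part of the duckdb-wasm API:

```javascript
// Sketch: allowlist user-selectable identifiers and coerce numeric inputs,
// instead of interpolating raw user input into SQL.
const ALLOWED_TABLES = new Set(['sales', 'orders', 'users']);

function quoteIdent(name) {
  // Double-quote an identifier, escaping any embedded double quotes.
  return '"' + String(name).replace(/"/g, '""') + '"';
}

function buildTopNQuery(table, limit) {
  if (!ALLOWED_TABLES.has(table)) {
    throw new Error(`Unknown table: ${table}`);
  }
  const n = Number.parseInt(limit, 10);
  if (!Number.isInteger(n) || n < 1 || n > 1000) {
    throw new Error('limit must be an integer between 1 and 1000');
  }
  return `SELECT * FROM ${quoteIdent(table)} LIMIT ${n}`;
}
```

For user-supplied values (as opposed to identifiers), prefer prepared statements over string interpolation where your setup allows it.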

Error Handling

Robust error handling is essential when dealing with user-generated SQL queries or unpredictable data sources. DuckDB-WASM provides detailed error messages that help identify syntax errors, data type mismatches, and other common issues.

This error handling pattern catches common problems and provides user-friendly error messages instead of raw technical details.

try {
  const result = await conn.query(userQuery);
  return result.toArray();
} catch (error) {
  if (error.message.includes('syntax error')) {
    throw new Error('Invalid SQL syntax');
  }
  throw new Error('Query execution failed');
}

Testing Strategies

Testing data analytics code requires verifying both query logic and performance characteristics. This example shows how to set up unit tests for DuckDB-WASM operations using popular testing frameworks.

The test verifies that analytical functions like moving averages produce expected results. This approach ensures your analytics logic remains correct as you add new features.

// Unit test example
describe('DuckDB Analytics', () => {
  let db, conn;
  
  beforeAll(async () => {
    // Initialize test database
    db = await createTestDB();
    conn = await db.connect();
  });
  
  test('should calculate moving average', async () => {
    const result = await conn.query(`
      SELECT AVG(value) OVER (ROWS 3 PRECEDING) as ma
      FROM test_data
    `);
    expect(result.toArray()).toHaveLength(10);
  });
});

Troubleshooting Common Issues

Memory Errors

If you encounter out-of-memory errors:

  1. Reduce batch size: Process data in smaller chunks
  2. Optimize queries: Use LIMIT and WHERE clauses
  3. Increase memory: Configure WASM memory limits
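The first two tips can be combined in a small helper that pages through a result with LIMIT/OFFSET instead of materializing everything at once. This is a sketch under assumptions: `queryInChunks` and `processRows` are hypothetical names, and `conn` is any connection exposing a DuckDB-style `query(sql)` method:

```javascript
// Sketch: process a large table in fixed-size pages to bound memory use.
async function queryInChunks(conn, table, processRows, chunkSize = 10000) {
  let offset = 0;
  for (;;) {
    const result = await conn.query(
      `SELECT * FROM ${table} LIMIT ${chunkSize} OFFSET ${offset}`
    );
    const rows = result.toArray();
    if (rows.length === 0) break; // no more data
    processRows(rows);            // handle one manageable batch
    offset += chunkSize;
  }
}
```

Note that OFFSET pagination rescans from the start of the table on each page; for very large tables, the streaming API shown earlier is usually the better tool.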

Performance Issues

For slow queries:

  1. Profile queries: Use EXPLAIN to understand execution plans
  2. Optimize data types: Use appropriate column types
  3. Consider partitioning: Split large datasets
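For the first tip, prefix any query with EXPLAIN to inspect its plan, or EXPLAIN ANALYZE to also run it and report per-operator timings (the query below is illustrative):

```sql
-- Show the query plan without executing the query
EXPLAIN
SELECT customer_id, SUM(amount) AS total
FROM read_csv_auto('sales.csv')
GROUP BY customer_id;

-- Execute the query and report where time is actually spent
EXPLAIN ANALYZE
SELECT customer_id, SUM(amount) AS total
FROM read_csv_auto('sales.csv')
GROUP BY customer_id;
```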

Conclusion

DuckDB-WASM opens up new possibilities for browser-based analytics. With its SQL compatibility, high performance, and privacy-first approach, it's perfect for:

  • Interactive dashboards that run entirely client-side
  • Data exploration tools without server requirements
  • Privacy-sensitive applications in healthcare and finance
  • Embedded analytics in web applications

Ready to get started? Try LakeClient for a complete DuckDB-WASM analytics platform, or implement your own solution using the techniques covered in this tutorial.


Want to learn more about browser-based analytics? Explore our other tutorials on privacy-first analytics and browser-based SQL processing.
