
Analyzing data from Query History export

You can use Query History export to bring Query History data to an external monitoring solution for further analysis.

In this recipe, we will show how to export Query History data to Amazon S3 and then analyze it with Cube by reading the data from S3 with DuckDB.

Configuration

Below is a Vector configuration for exporting Query History data to Amazon S3 and also outputting it to the console of the Vector agent in your Cube Cloud deployment.

In the example below, we are using the aws_s3 sink to export Query History data to the cube-query-history-export-demo bucket in Amazon S3, but you can use any other storage solution that Vector supports.

[sinks.aws_s3]
type = "aws_s3"
inputs = [ "query-history" ]
bucket = "cube-query-history-export-demo"
region = "us-east-2"
compression = "gzip"

[sinks.aws_s3.auth]
access_key_id = "$CUBE_CLOUD_MONITORING_AWS_ACCESS_KEY_ID"
secret_access_key = "$CUBE_CLOUD_MONITORING_AWS_SECRET_ACCESS_KEY"

[sinks.aws_s3.encoding]
codec = "json"

[sinks.aws_s3.healthcheck]
enabled = false

[sinks.my_console]
type = "console"
inputs = [ "query-history" ]
target = "stdout"
encoding = { codec = "json" }

You’d also need to set the following environment variables in the Settings → Environment variables page of your Cube Cloud deployment:

CUBE_CLOUD_MONITORING_AWS_ACCESS_KEY_ID=your-access-key-id
CUBE_CLOUD_MONITORING_AWS_SECRET_ACCESS_KEY=your-secret-access-key
CUBEJS_DB_DUCKDB_S3_ACCESS_KEY_ID=your-access-key-id
CUBEJS_DB_DUCKDB_S3_SECRET_ACCESS_KEY=your-secret-access-key
CUBEJS_DB_DUCKDB_S3_REGION=us-east-2

Data modeling

Below is an example data model for analyzing Query History data exported to a bucket in Amazon S3. The data is read directly from S3 using DuckDB.

With this data model, you can run queries that aggregate data by dimensions such as status, environment_name, or api_type, and calculate metrics such as count, total_duration, or avg_duration:

cubes:
  - name: requests
    sql: |
      SELECT
        *,
        api_response_duration_ms / 1000 AS api_response_duration,
        EPOCH_MS(start_time_unix_ms) AS start_time,
        EPOCH_MS(end_time_unix_ms) AS end_time
      FROM read_json_auto('s3://cube-query-history-export-demo/**/*.log.gz')

    dimensions:
      - name: trace_id
        sql: trace_id
        type: string
        primary_key: true

      - name: deployment_id
        sql: deployment_id
        type: number

      - name: environment_name
        sql: environment_name
        type: string

      - name: api_type
        sql: api_type
        type: string

      - name: api_query
        sql: api_query
        type: string

      - name: security_context
        sql: security_context
        type: string

      - name: cache_type
        sql: cache_type
        type: string

      - name: start_time
        sql: start_time
        type: time

      - name: end_time
        sql: end_time
        type: time

      - name: duration
        sql: api_response_duration
        type: number

      - name: status
        sql: status
        type: string

      - name: error_message
        sql: error_message
        type: string

      - name: user_name
        sql: "SUBSTRING(security_context::JSON ->> 'user', 3, LENGTH(security_context::JSON ->> 'user') - 4)"
        type: string

    segments:
      - name: production_environment
        sql: "{environment_name} IS NULL"

      - name: errors
        sql: "{status} <> 'success'"

    measures:
      - name: count
        type: count

      - name: count_non_production
        description: |
          Counts all non-production environments. See for details:
          https://cube.dev/docs/product/workspace/environments
        type: count
        filters:
          - sql: "{environment_name} IS NOT NULL"

      - name: total_duration
        type: sum
        sql: "{duration}"

      - name: avg_duration
        type: number
        sql: "{total_duration} / {count}"

      - name: median_duration
        type: number
        sql: "MEDIAN({duration})"

      - name: min_duration
        type: min
        sql: "{duration}"

      - name: max_duration
        type: max
        sql: "{duration}"

    pre_aggregations:
      - name: count_and_durations_by_status_and_start_date
        measures:
          - count
          - min_duration
          - max_duration
          - total_duration
        dimensions:
          - status
        time_dimension: start_time
        granularity: hour
        refresh_key:
          sql: SELECT MAX(end_time) FROM {requests.sql()}
          every: 10 minutes
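If queries against this cube fail, it can be useful to first check that DuckDB can read the exported files outside of Cube. Below is a minimal sketch of such a check in the DuckDB CLI, assuming the same bucket and region as in the configuration above and placeholder credentials that you would replace with your own:

-- Illustrative sanity check, run in a DuckDB shell outside of Cube.
-- Substitute your own credentials; the bucket and region match the example above.
INSTALL httpfs;
LOAD httpfs;
SET s3_region = 'us-east-2';
SET s3_access_key_id = 'your-access-key-id';
SET s3_secret_access_key = 'your-secret-access-key';

-- Count exported Query History records per status
SELECT status, COUNT(*) AS requests
FROM read_json_auto('s3://cube-query-history-export-demo/**/*.log.gz')
GROUP BY status;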

Result

You can now run queries against the requests cube in Playground.
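For instance, the following illustrative query, written here in the SQL API syntax rather than Playground's visual query builder, aggregates request counts and average response durations by status per day:

-- An illustrative query against the requests cube via the SQL API;
-- any of the dimensions and measures defined above can be used instead.
SELECT
  status,
  DATE_TRUNC('day', start_time) AS start_date,
  MEASURE(count) AS requests_count,
  MEASURE(avg_duration) AS avg_duration
FROM requests
GROUP BY 1, 2
ORDER BY 2 DESC;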
