Skip to Content
DocumentationData modelingRecipesAverages and percentiles

Calculating averages and percentiles

Use case

We want to understand the distribution of values for a certain numeric property within a dataset. We’re used to average values and intuitively understand how to calculate them. However, we also know that average values can be misleading for skewed  distributions which are common in the real world: for example, 2.5 is the average value for both (1, 2, 3, 4) and (0, 0, 0, 10).

So, it’s usually better to use percentiles . Parameterized by a fractional number n = 0..1, where the n-th percentile is equal to a value that exceeds a specified ratio of values in the distribution. The median  is a special case: it’s defined as the 50th percentile (n = 0.5), and it can be casually thought of as “the middle” value. 2.5 and 0 are the medians of (1, 2, 3, 4) and (0, 0, 0, 10), respectively.

Data modeling

Let’s explore the data in the users cube that contains various demographic information about users, including their age:

[ { "users.name": "Abbott, Breanne", "users.age": 52 }, { "users.name": "Abbott, Dallas", "users.age": 43 }, { "users.name": "Abbott, Gia", "users.age": 36 }, { "users.name": "Abbott, Tom", "users.age": 39 }, { "users.name": "Abbott, Ward", "users.age": 67 } ]

Calculating the average age is as simple as defining a measure with the built-in avg type.

Calculating the percentiles would require using database-specific functions. However, almost every database has them under names of PERCENTILE_CONT and PERCENTILE_DISC, Postgres  and Snowflake  included. For BigQuery , you’d need to use the APPROX_QUANTILES function.

cubes: - name: users # ... measures: - name: avg_age type: avg sql: age - name: median_age type: number sql: PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY age) - name: p95_age type: number sql: PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY age)
cube("users", { measures: { avg_age: { sql: `age`, type: `avg` }, median_age: { sql: `PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY age)`, type: `number` }, p95_age: { sql: `PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY age)`, type: `number` } } })

Result

Using the measures defined above, we can explore statistics about the age of our users.

[ { "users.avg_age": "52.3100000000000000", "users.median_age": 53, "users.p95_age": 82 } ]

For this particular dataset, the average age closely matches the median age, and 95% of all users are younger than 82 years.

Source code

Please feel free to check out the full source code  or run it with the docker-compose up command. You’ll see the result, including queried data, in the console.

Was this page useful?