In the dynamic world of embedded analytics, where end-users demand instantaneous results, the story of achieving sub-second query performance is not just a technical guideline, but a requirement for user satisfaction. The advanced capabilities of Cube Cloud can address the challenges of slow queries and build faster, more responsive reporting for your users. Take a look.
The Challenge of Slow Queries: A Tale of Optimization
Imagine an analytics application, integral to a business where users are frustrated with charts that load slowly. The journey begins with understanding the root causes of these delays. Our guide identifies two primary scenarios: queries that are not pre-aggregated and those that are.
- The Queue Graph and Low Concurrency: In cases where queries are not pre-aggregated, we delve into the queue graph. This graph is a revealing tale of concurrency, where each bar represents a query. Sometimes,the reason for long-running queries is the queue creating bottlenecks. The solution? Increasing concurrency in Cube and the database, and considering the use of pre-aggregations.
This is an example of very low concurrency (this example is set to 2 concurrent queries to the DB).
Each bar is a query. When the bar is solid it means it spent processing time in the DB. In this example, the first two queries in the queue are two long-running queries that hold the queue until they are done. You can see with arrows when the queries finished, a pending query starts executing.
- Handling Pre-Aggregated Queries: When queries are already pre-aggregated, the story takes a different turn. The focus shifts to the amount of data being queried and its granularity. For queries spanning over two months or multiple years, adjusting the granularity to days/weeks or weeks/months, respectively, becomes crucial. Additionally, managing the number of partitions and ensuring they are optimally sized (not exceeding 50 MBs for 2 cube store workers) is key. This might involve increasing the number of Cube Store Workers if you want to keep partition size large or reducing partition granularity and removing unused dimensions to decrease size of partition.
Lastly, please take into account that we recommend using at most five partitions with a single query. It’s ok to have more than five partitions built but querying more than five negatively affects performance.
Enhancing Performance with Indexes: The Next Chapter in Our Quest for Speed After navigating the intricacies of handling pre-aggregated queries, our journey towards achieving sub-second query performance introduces a new, powerful ally: the use of indexes. In the realm of Cube Cloud, indexes are not just tools, but catalysts that can significantly boost query speed. They act like signposts, guiding the system to retrieve data more efficiently and swiftly.
Incorporating indexes into our strategy comes with a clear directive. These indexes, when applied correctly, can drastically reduce the time taken to process queries, especially in scenarios where large datasets are involved. It's akin to finding a shortcut in a labyrinth; suddenly, paths that seemed long and winding are now quick and direct.
The art of using indexes effectively involves understanding which columns to index based on the queries frequently executed. This decision is not arbitrary but grounded in the patterns of data access and retrieval specific to each application.
For those embarking on this phase of optimization, Cube Cloud provides comprehensive guidance on harnessing the power of indexes to enhance query performance. By visiting Using Indexes in Cube Cloud, developers and data engineers can unlock deeper insights into this crucial aspect of query optimization.
In summary, indexes stand as a testament to the continuous quest for performance excellence in the world of embedded analytics. They represent the fine-tuning of an already robust system, ensuring that end users experience the joy of instant data visualization, making the wait for chart loading a relic of the past.
Overcoming Pre-Aggregation Build Failures: A Path to Stability
Another chapter in our story addresses the challenges of pre-aggregation build failures. This issue often arises from resource constraints when multiple pre-aggregations or partitions are built simultaneously.
The strategy here is to lower the CUBEJS_SCHEDULED_REFRESH_CONCURRENCY, thus reducing the concurrent building of pre-aggregations and partitions. Additionally, ensuring that queries hit the correct pre-aggregation and optimizing them based on memory limits and the EXPLAIN ANALYZE form crucial parts of the solution.
Conclusion: A Story of Success
Transforming this guide into a narrative illustrates how Cube Cloud can be effectively utilized to achieve sub-second query performance in an embedded analytics environment. It's a journey that starts with identifying bottlenecks, extends through optimizing query processes and managing resources efficiently, and ends with providing users the swift, seamless experience they desire.
By following these guidelines, developers and data professionals can write their own success stories, turning slow, cumbersome analytics into a tale of speed and efficiency with Cube Cloud.