Materialised views is a new feature recently made available in BigQuery. A view represents a virtual table constructed with a SQL statement. Views have the advantage that they can be queried just like regular tables, but can be as complex as needed. They can include any type of join, subqueries or any other SQL statements.
On the other hand, one big disadvantage of using views is that they run the SQL query every time the view is queried. Depending on the size of the tables and the number of view invocations, this can prove costly both in terms of resources and time. In order to bypass this, BigQuery now offers the possibility of creating materialised views. Just like the name suggests, these are views where the query results persist.
In order to create a materialised view the BigQuery web interface can be used. The statement for creating them is:
CREATE MATERIALIZED VIEW project.dataset.view AS SELECT … FROM project.dataset.table
At the moment, materialised views suffer from a series of limitations. First of all, no joins or unnests can be used inside their queries. Secondly, both they and the original tables they are based on have to reside in the same dataset. In addition, no more than 20 materialised views can be created using a table. Some other operations which are not yet implemented are loading data into or exporting data from them.
In spite of these limitations, materialised views can prove very advantageous in certain scenarios. One such example are queries involving aggregations of the data. In this case, better performance can be achieved by making the results persistent. BigQuery guarantees that the materialised view will always present up-to-date results, as data refresh mechanisms are implemented in order to reflect any changes made to the original table.
Furthermore, if the query is using aggregations on top of joins, materialised views can still prove useful. In this case, it is worth investigating if the order of the operations can be changed. Moving aggregations to materialised views, and only joining them later, can bring a valuable increase in time performance.
Let’s consider for example the query below. Here, the first set of aggregations are performed inside the materialised view. Afterwards, the view is used for joining with the countries table and the final averages are computed. With this approach, the sums persist and only the averages are computed each time the query is run.
CREATE MATERIALIZED VIEW project.dataset.view AS SELECT country_code, category, SUM(cost) as category_cost FROM table GROUP BY country_code, category; SELECT country_name, AVG(category_cost) as country_cost FROM countries as c JOIN view as v ON c.country_code = v.country_code GROUP BY country_name;
If you need help in setting up materialised views or with BigQuery in general, feel free to contact us and we will do our best to find a solution together!