Overview of BigQuery analytics  |  Google Cloud (2024)

This document describes how BigQuery processes queries and provides an overview of several features that are useful for data analytics.

BigQuery is optimized to run analytic queries on large datasets, scanning terabytes of data in seconds and petabytes in minutes. Understanding its capabilities and how it processes queries can help you maximize your data analysis investments.

Analytic workflows

BigQuery supports several data analysis workflows:

  • Ad hoc analysis. BigQuery uses GoogleSQL, the SQL dialect in BigQuery, to support ad hoc analysis. You can run queries in the Google Cloud console or through third-party tools that integrate with BigQuery.

  • Geospatial analysis. BigQuery uses geography data types and GoogleSQL geography functions to let you analyze and visualize geospatial data. For information about these data types and functions, see Introduction to geospatial analytics.

  • Machine learning. BigQuery ML uses GoogleSQL queries to let you create and execute machine learning (ML) models in BigQuery.

  • Business intelligence. BigQuery BI Engine is a fast, in-memory analysis service that lets you build rich, interactive dashboards and reports without compromising performance, scalability, security, or data freshness.

Queries

The primary unit of analysis in BigQuery is the SQL query. BigQuery has two SQL dialects: GoogleSQL and legacy SQL. GoogleSQL is the preferred dialect. It supports SQL:2011 and includes extensions that support geospatial analysis or ML.

The following sections describe how BigQuery supports and runs data queries.

Data sources

BigQuery lets you query the following types of data sources:

  • Data stored in BigQuery. You can load data into BigQuery for analysis. You can also generate data by using data manipulation language (DML) statements or by writing query results into a table. You can query data stored in single-region or multi-region locations, but you cannot run a query against multiple locations, even if one is a single-region location and the other is the multi-region location that contains it. For more information, see Locations, reservations, and jobs.

  • External data. You can query various external data sources, such as other Google Cloud storage services (like Cloud Storage) or database services (like Spanner or Cloud SQL). For information about how to set up connections to external sources, see Introduction to external data sources.

  • Multi-cloud data. You can query data that's stored in other public clouds such as AWS or Azure. For information on how to set up connections to Amazon S3 or Azure blob storage, read an introduction to BigQuery Omni.

  • Public datasets. If you don't have your own data, you can analyze any of the datasets that are available in the public dataset marketplace.
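The single-location rule for data stored in BigQuery can be checked before a query is submitted. The following is a minimal sketch, not part of any BigQuery library: the function name, the dataset names, and the case-insensitive comparison of location strings are all assumptions made for illustration.

```python
def require_single_location(dataset_locations: dict[str, str]) -> str:
    """Return the shared location, or raise if the datasets span several.

    `dataset_locations` maps a dataset name to its location string.
    A multi-region and a region inside it still count as two locations.
    """
    if not dataset_locations:
        raise ValueError("no datasets given")
    # Assumption: location strings are compared case-insensitively.
    locations = {loc.strip().lower() for loc in dataset_locations.values()}
    if len(locations) != 1:
        raise ValueError(f"datasets span multiple locations: {sorted(locations)}")
    return locations.pop()


# Hypothetical dataset names and locations.
same = require_single_location({"sales": "US", "events": "us"})
print(same)
```

A call that mixes a multi-region location with a single-region location, for example `{"sales": "US", "events": "us-east4"}`, raises `ValueError` in this sketch.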

Query jobs

Jobs are actions that BigQuery runs on your behalf to load data, export data, query data, or copy data.

When you use the Google Cloud console or the bq tool to perform one of these jobs, a job resource is automatically created, scheduled, and run. You can also programmatically create a load, export, query, or copy job. When you create a job programmatically, BigQuery schedules and runs the job for you.

Because jobs can potentially take a long time to complete, they run asynchronously and can be polled for their status. Shorter actions, such as listing resources or getting metadata, are not managed by a job resource.
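Polling an asynchronous job can be sketched as a simple loop. The example below is an illustration only: `wait_for_job` and the `get_state` callable are hypothetical stand-ins for whatever client call returns the job's state, and the state names PENDING, RUNNING, and DONE follow the job resource in the REST API. A fake state sequence is used so that the sketch runs without a BigQuery project.

```python
import time
from typing import Callable


def wait_for_job(get_state: Callable[[], str],
                 poll_interval_s: float = 1.0,
                 timeout_s: float = 60.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll `get_state` until it returns "DONE" or the timeout elapses."""
    waited = 0.0
    while True:
        state = get_state()
        if state == "DONE":
            return state
        if waited >= timeout_s:
            raise TimeoutError(f"job still {state} after {waited:.0f}s")
        sleep(poll_interval_s)
        waited += poll_interval_s


# Simulated job: the states are made up and no real waiting happens.
states = iter(["PENDING", "RUNNING", "RUNNING", "DONE"])
final_state = wait_for_job(lambda: next(states), sleep=lambda _s: None)
print(final_state)
```

A DONE state only means that the job has finished. Whether it succeeded is reported separately in the job's status, so a real caller would also check for an error result.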

Types of queries

After you load your data into BigQuery, you can query the data using one of the following query job types:

  • Interactive query jobs. By default, BigQuery runs interactive (on-demand) query jobs as soon as possible.
  • Batch query jobs. With these jobs, BigQuery queues each batch query on your behalf and then starts the query when idle resources are available, usually within a few minutes.

You can run interactive or batch query jobs by using the following methods:

  • Compose and run a query in the Google Cloud console.
  • Run the bq query command in the bq command-line tool.
  • Programmatically call the jobs.query or jobs.insert method in the BigQuery REST API.
  • Use the BigQuery client libraries.
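For the REST API route, the difference between an interactive and a batch query job comes down to one field in the request body. The following sketch builds a jobs.insert body as a plain dictionary. The helper name `build_query_job_body` is hypothetical, and the sketch stops short of sending the request, which would also need authentication and a project ID.

```python
import json


def build_query_job_body(sql: str, batch: bool = False) -> dict:
    """Build a jobs.insert request body for a GoogleSQL query job."""
    return {
        "configuration": {
            "query": {
                "query": sql,
                # False selects GoogleSQL rather than legacy SQL.
                "useLegacySql": False,
                # INTERACTIVE runs as soon as possible; BATCH waits for idle resources.
                "priority": "BATCH" if batch else "INTERACTIVE",
            }
        }
    }


body = build_query_job_body("SELECT 1 AS x", batch=True)
print(json.dumps(body, indent=2))
```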

Saved and shared queries

BigQuery lets you save queries and share queries with others.

When you save a query, it can be private (visible only to you), shared at the project level (visible to specific principals), or public (anyone can view it). For more information, see Work with saved queries.

How BigQuery processes queries

Several processes occur when BigQuery runs a query:

  • Execution tree. When you run a query, BigQuery generates an execution tree that breaks the query into stages. These stages contain steps that can run in parallel.

  • Shuffle tier. Stages communicate with one another by using a fast, distributed shuffle tier that stores intermediate data produced by the workers of a stage. When possible, the shuffle tier leverages technologies such as a petabit network and RAM to quickly move data to worker nodes.

  • Query plan. When BigQuery has all the information that it needs to run a query, it generates a query plan. You can view this plan in the Google Cloud console and use it to troubleshoot or optimize query performance.

  • Query monitoring and dynamic planning. Besides the workers that perform the work of the query plan itself, additional workers monitor and direct the overall progress of work throughout the system. As the query progresses, BigQuery might dynamically adjust the query plan to adapt to the results of the various stages.

  • Query results. When a query is complete, BigQuery writes the results to persistent storage and returns them to the user. This design lets BigQuery serve cached results the next time that query is run.

Query concurrency and performance

The performance of queries that are run repeatedly on the same data can sometimes vary by milliseconds. Performance variances can occur because of the shared nature of the BigQuery environment, or because BigQuery dynamically adjusts the query plan while the query runs. For a typical busy system where many queries run concurrently, BigQuery uses several processes to smooth out variances in query performance:

  • BigQuery runs many queries in parallel, so there's rarely a need to queue queries.

    In busy systems, queues are a major source of less-predictable performance because it's unclear how long a query might sit in the queue. The time a query is in the queue can depend more on other queries that are running or are in the queue than upon the qualities of the query itself.

  • As queries start and finish, BigQuery redistributes resources fairly between new and running queries. This process ensures that query performance doesn't depend on the order in which queries are submitted but rather on the number of queries run at a given time.
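The effect of fair redistribution can be illustrated with a toy model. This is a deliberately simplified sketch and not BigQuery's actual scheduler: it assumes a fixed pool of capacity split equally among whatever queries are running, and the function name, query IDs, and capacity figure are made up.

```python
def fair_shares(total_capacity: float, running: list[str]) -> dict[str, float]:
    """Split capacity equally among running queries (toy model)."""
    if not running:
        return {}
    share = total_capacity / len(running)
    return {query_id: share for query_id in running}


running = ["q1"]
alone = fair_shares(2000, running)

running.append("q2")      # a second query starts
shared = fair_shares(2000, running)

running.remove("q1")      # the first query finishes
after = fair_shares(2000, running)

print(alone, shared, after)
```

In this model each query's share depends only on how many queries are running at that moment, not on which one was submitted first, which mirrors the behavior described above.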

Query optimization

After the query is complete, you can view the query plan in the Google Cloud console. You can also request execution details by using the INFORMATION_SCHEMA.JOBS* views or the jobs.get REST API method.

The query plan includes details about query stages and steps. These details can help you identify ways to improve query performance. For example, if you notice a stage that writes a lot more output than other stages, it might mean that you need to filter earlier in the query.
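Spotting a stage that writes far more output than the others can be automated once the plan has been fetched. The sketch below assumes that the plan is available as a list of dictionaries with `name`, `recordsRead`, and `recordsWritten` keys, as in the query plan section of a jobs.get response, where the counts arrive as strings. The helper name and the sample stages are made up for illustration.

```python
def stages_by_output(query_plan: list[dict]) -> list[tuple[str, int, int]]:
    """Return (name, records_read, records_written), largest output first."""
    rows = []
    for stage in query_plan:
        rows.append((
            stage["name"],
            int(stage.get("recordsRead", 0)),     # counts arrive as strings
            int(stage.get("recordsWritten", 0)),
        ))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Made-up plan: the join stage writes far more rows than it reads.
sample_plan = [
    {"name": "S00: Input", "recordsRead": "1000000", "recordsWritten": "50000"},
    {"name": "S01: Join+", "recordsRead": "50000", "recordsWritten": "9000000"},
    {"name": "S02: Output", "recordsRead": "9000000", "recordsWritten": "120"},
]

ranked = stages_by_output(sample_plan)
for name, read, written in ranked:
    print(f"{name:12s} read={read:>9d} written={written:>9d}")
```

A stage at the top of this ranking that writes many more records than it reads is a candidate for filtering earlier in the query.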

For more information about the query plan and query optimization, see the following resources:

  • To learn more about the query plan and see examples of how the plan information can help you to improve query performance, see Query plan and timeline.
  • For more information about query optimization in general, see Introduction to optimizing query performance.

Query monitoring

Monitoring and logging are crucial for running reliable applications in the cloud. BigQuery workloads are no exception, especially if your workload has high volumes or is mission critical. BigQuery provides various metrics, logs, and metadata views to help you monitor your BigQuery usage.

For more information, see the following resources:

  • To learn about monitoring options in BigQuery, see Introduction to BigQuery monitoring.
  • To learn about audit logs and how to analyze query behavior, see BigQuery audit logs.

Query pricing

BigQuery offers two pricing models for analytics:

  • On-demand pricing. You pay for the data scanned by your queries. You have a fixed query-processing capacity for each project, and your cost is based on the number of bytes processed.
  • Capacity-based pricing. You purchase dedicated query-processing capacity.

For information about the two pricing models and to learn more about making reservations for capacity-based pricing, see Introduction to reservations.
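Because on-demand cost is based on bytes processed, a rough estimate is a single multiplication. The sketch below assumes that the rate is quoted per TiB and takes it as a parameter; the rate used in the example is a placeholder, not a published price. It ignores free-tier allowances, minimum charges, and rounding rules.

```python
TIB = 2 ** 40  # bytes in one tebibyte


def estimate_on_demand_cost(bytes_processed: int, price_per_tib: float) -> float:
    """Estimate on-demand query cost from bytes processed."""
    if bytes_processed < 0:
        raise ValueError("bytes_processed must be non-negative")
    return bytes_processed / TIB * price_per_tib


# Placeholder rate for illustration only; check the current pricing page.
cost = estimate_on_demand_cost(5 * TIB, price_per_tib=5.0)
print(f"estimated cost: {cost:.2f}")
```

The bytes-processed figure can come from the estimate that the SQL editor shows before a query runs, or from job statistics after it completes.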

Quotas and query cost controls

BigQuery enforces project-level quotas on running queries. For information on query quotas, see Quotas and limits.

To control query costs, BigQuery provides several options, including custom quotas and billing alerts. For more information, see Creating custom cost controls.

Data analytics features

BigQuery supports both descriptive and predictive analytics. To query your data directly to answer some statistical questions, you can use the Google Cloud console. To visually explore the data, such as for trends and anomalies, you can use tools like Tableau or Looker that integrate with BigQuery.

BigQuery Studio

BigQuery Studio helps you discover, analyze, and run inference on data in BigQuery with the following features:

  • A robust SQL editor that provides code completion, query validation, and estimation of bytes processed.
  • Embedded Python notebooks built using Colab Enterprise. Notebooks provide one-click Python development runtimes, and built-in support for BigQuery DataFrames.
  • A PySpark editor that lets you create stored Python procedures for Apache Spark.
  • Asset management and version history for code assets such as notebooks and saved queries, built on top of Dataform.
  • Assistive code development in the SQL editor and in notebooks, built on top of Gemini generative AI (Preview).
  • Dataplex features for data discovery, and data profiling and data quality scans.
  • The ability to view job history on a per-user or per-project basis.
  • The ability to analyze saved query results by connecting to other tools such as Looker and Google Sheets, and to export saved query results for use in other applications.

To use BigQuery Studio, follow the instructions at Enable BigQuery Studio for asset management. This process enables the following APIs:

  • The Compute Engine API: required to execute Python functions in your project.
  • The Dataform API: required to store code assets, for example notebook files.
  • The Vertex AI API: required to execute Colab Enterprise Python notebooks in BigQuery.

BigQuery ML

BigQuery ML lets you use SQL in BigQuery to perform machine learning (ML) and predictive analytics. For more information, see Introduction to BigQuery ML.

Analytics tools integration

In addition to running queries in BigQuery, you can analyze your data with various analytics and business intelligence tools that integrate with BigQuery, such as the following:

  • Looker. Looker is an enterprise platform for business intelligence, data applications, and embedded analytics. The Looker platform works with many datastores including BigQuery. For information on how to connect Looker to BigQuery, see Using Looker.

  • Looker Studio. After you run a query, you can launch Looker Studio directly from BigQuery in the Google Cloud console. Then, in Looker Studio you can create visualizations and explore the data that's returned from the query. For information about Looker Studio, see Looker Studio overview.

  • Connected Sheets. You can also launch Connected Sheets directly from BigQuery in the console. Connected Sheets runs BigQuery queries on your behalf either upon your request or on a defined schedule. Results of those queries are saved in your spreadsheet for analysis and sharing. For information about Connected Sheets, see Using Connected Sheets.

Third-party tool integration

Several third-party analytics tools work with BigQuery. For example, you can connect Tableau to BigQuery data and use its visualization tools to analyze and share your analysis. For more information on considerations when using third-party tools, see Third-party tool integration.

ODBC and JDBC drivers are available and can be used to integrate your application with BigQuery. The intent of these drivers is to help users leverage the power of BigQuery with existing tooling and infrastructure. For information on the latest release and known issues, see ODBC and JDBC drivers for BigQuery.

Libraries for pandas, such as pandas-gbq, let you interact with BigQuery data in Jupyter notebooks. For information about this library and how it compares with using the BigQuery Python client library, see Comparison with pandas-gbq.

You can also use BigQuery with other notebooks and analysis tools. For more information, see Programmatic analysis tools.

For a full list of BigQuery analytics and broader technology partners, see the Partners list on the BigQuery product page.

What's next

  • For links to sample code and technical reference guides for common analytics use cases, see Smart analytics reference patterns.
  • For an introduction and overview of supported SQL statements, see Introduction to SQL in BigQuery.
  • To learn about the GoogleSQL syntax used for querying data in BigQuery, see Query syntax in GoogleSQL.
  • For information about reading the query explain plan, see Using the query plan explanation.
  • To learn how to schedule a recurring query, see Scheduling queries.