Create a Google BigQuery Data Source
Private preview
Google BigQuery is available to select accounts. Reach out to your Immuta representative for details.
Requirements
- CREATE_DATA_SOURCE Immuta permission
- Google BigQuery roles:
- roles/bigquery.metadataViewer on the source table (if managed at that level) or dataset
- roles/bigquery.dataViewer (or higher) on the source table (if managed at that level) or dataset
- roles/bigquery.jobUser on the project
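As a rough illustration of the project-level role above, the sketch below grants roles/bigquery.jobUser with gcloud. The helper names `iam_member` and `grant_job_user` are hypothetical, and the sketch assumes that service account emails end in gserviceaccount.com while any other address is a regular user account. It also assumes an authenticated gcloud CLI whose caller may edit the project's IAM policy.

```shell
# Hypothetical helper: pick the IAM member prefix from the email address.
# Assumption: service account emails end in "gserviceaccount.com".
iam_member() {
  case "$1" in
    *gserviceaccount.com) printf 'serviceAccount:%s\n' "$1" ;;
    *)                    printf 'user:%s\n' "$1" ;;
  esac
}

# Hypothetical helper: grant roles/bigquery.jobUser on a project.
# Usage: grant_job_user PROJECT_ID EMAIL
grant_job_user() {
  gcloud projects add-iam-policy-binding "$1" \
    --member="$(iam_member "$2")" \
    --role="roles/bigquery.jobUser"
}
```

For example, `grant_job_user project123 analyst@example.com` would bind the role to `user:analyst@example.com`; both values are placeholders. The table- or dataset-level roles are managed separately and are not covered by this sketch.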
Prerequisites
Create a Google Cloud service account for creating Google BigQuery data sources
Google BigQuery data sources in Immuta must be created using a Google Cloud service account rather than a Google Cloud user account. This service account must be separate from the one you created when configuring the Google BigQuery integration. If you do not already have such a service account for the Google Cloud project, create one with privileges to view and run queries against the tables you are protecting.
You have two options to create the required Google Cloud service account:
Create a service account using the Google Cloud web console
- Using the Google Cloud documentation, create a service account with the following roles:
- BigQuery User
- BigQuery Data Viewer
- Using the Google Cloud documentation, generate a service account key for the account you just created.
Create a service account using gcloud
- Copy the script below and update the SERVICE_ACCOUNT, PROJECT_ID, and IMMUTA_GCP_KEY_FILE values.
- SERVICE_ACCOUNT is the name for the new service account.
- PROJECT_ID is the project ID for the Google Cloud project that is integrated with Immuta.
- IMMUTA_GCP_KEY_FILE is the path to a new output file for the private key.
- Run the script below in the gcloud command line. This script is a template; change values as necessary:

    # Fill these out
    # Please use .json extension for key
    export SERVICE_ACCOUNT=datasource-account
    export PROJECT_ID=project123
    export IMMUTA_GCP_KEY_FILE=~/GCP_${SERVICE_ACCOUNT}_key.json

    # Create service account for creating data sources
    gcloud iam service-accounts create "${SERVICE_ACCOUNT}" --project "${PROJECT_ID}"

    # Generate keyfile
    gcloud iam service-accounts keys create "${IMMUTA_GCP_KEY_FILE}" \
      --iam-account="${SERVICE_ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com"

    # Allow account to execute queries
    gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
      --member="serviceAccount:${SERVICE_ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com" \
      --role=roles/bigquery.user

    # Allow account to view data
    gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
      --member="serviceAccount:${SERVICE_ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com" \
      --role=roles/bigquery.dataViewer

    echo "If something went wrong and you want to delete the service account, run:"
    echo "gcloud iam service-accounts delete ${SERVICE_ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com --project ${PROJECT_ID}"
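After running the script, it can be useful to confirm that both role bindings took effect. The sketch below is one way to do that; `sa_email` and `list_sa_roles` are hypothetical helper names, and the query uses the `--flatten`, `--filter`, and `--format` options of `gcloud projects get-iam-policy`. It assumes an authenticated gcloud CLI with permission to read the project's IAM policy.

```shell
# Hypothetical helper: build the service account email the same way the
# script above does (NAME@PROJECT_ID.iam.gserviceaccount.com).
sa_email() {
  printf '%s@%s.iam.gserviceaccount.com\n' "$1" "$2"
}

# Hypothetical helper: print the project-level roles bound to the account.
# Usage: list_sa_roles SERVICE_ACCOUNT PROJECT_ID
list_sa_roles() {
  gcloud projects get-iam-policy "$2" \
    --flatten="bindings[].members" \
    --filter="bindings.members:$(sa_email "$1" "$2")" \
    --format="value(bindings.role)"
}
```

Calling `list_sa_roles "${SERVICE_ACCOUNT}" "${PROJECT_ID}"` should list roles/bigquery.user and roles/bigquery.dataViewer if the bindings were applied; roles granted at the dataset or table level would not appear in this project-level listing.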
Register data sources in Immuta
Required Google BigQuery roles
Ensure that the user creating the data source has these Google BigQuery roles:
- roles/bigquery.metadataViewer on the source table (if managed at that level) or dataset
- roles/bigquery.dataViewer (or higher) on the source table (if managed at that level) or dataset
- roles/bigquery.jobUser on the project
- Click the + button in the top-left corner of the screen and select New Data Source.
- Select the Google BigQuery tile in the Data Platform section.
- Complete these fields in the Connection Information box:
- Account Email Address: Enter the email address of a user with access to the dataset and tables. This is the account created in the Google BigQuery configuration guide.
- Project: Enter the name of the project that has been integrated with Immuta.
- Dataset: Enter the name of the dataset with the tables you want Immuta to ingest.
- Upload a BigQuery Key File in the modal. Note that the account in the key file must match the account email address entered in the previous step.
- Click the Test Connection button. If the connection is successful, a check mark and successful connection notification will appear and you will be able to proceed. If an error occurs when attempting to connect, the error will be displayed in the UI. In order to proceed to the next step of data source creation, you must be able to connect to this data source using the connection information that you just entered.
- Decide how to virtually populate the data source by selecting one of the options:
- Create sources for all tables in this database: This option will create data sources and keep them in sync for every table in the dataset. New tables will be automatically detected and new Immuta views will be created.
- Schema / Table: This option will allow you to specify tables or datasets that you want Immuta to register.
- Provide basic information about your data source to make it discoverable to users.
- Enter the SQL Schema Name Format to be the SQL name that the data source exists under in Immuta. For BigQuery, the schema will be the BigQuery dataset. The format must include a schema macro, but you may personalize it using lowercase letters, numbers, and underscores. It can have up to 255 characters.
- Enter the Schema Project Name Format to be the name of the schema project in the Immuta UI. This is an Immuta project that will hold all of the metadata for the tables in a single dataset.
- When selecting Create sources for all tables in this database and monitor for changes, you may personalize this field as you wish, but it must include a schema macro to represent the dataset name.
- When selecting Schema/Table, this field is pre-populated with the recommended project name and you can edit freely.
- Select the Data Source Name Format, which will be the format of the name of the data source in the Immuta UI.
- <Tablename>: The Immuta data source will have the same name as the original table.
- <Schema><Tablename>: The Immuta data source will have both the dataset and original table name.
- Custom: This is a template you create to make the data source name. You may personalize this field as you wish, but it must include a tablename macro. The case of the macro will apply to the data source name (i.e., <Tablename> will result in "Data Source Name," <tablename> will result in "data source name," and <TABLENAME> will result in "DATA SOURCE NAME").
- Enter the SQL Table Name Format, which will be the format of the name of the table in Immuta. It must include a table name macro, but you may personalize the format using lowercase letters, numbers, and underscores. It may have up to 255 characters.
- When selecting the Schema/Table option, you can opt to enable schema monitoring by selecting the checkbox in this section. This step will only appear if all tables within a server have been selected for creation.
- Optional Advanced Settings:
- Column Detection: To enable, select the checkbox in this section. This setting monitors when remote tables' columns have been changed, updates the corresponding data sources in Immuta, and notifies data owners of these changes. See schema projects overview to learn more about column detection.
- Data Source Tags: Adding tags to your data source allows users to search for the data source using the tags and allows governors to apply global policies to the data source. Note that if schema detection is enabled, any tags added now will also be added to the tables that are detected.
- Click the Edit button in the Data Source Tags section.
- Begin typing in the Search by Tag Name box to select your tag, and then click Add.
- Click Create to save the data source(s).
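Because the account in the key file must match the Account Email Address entered in the connection fields, a quick local check before uploading can catch a mismatch early. The sketch below assumes the key file is the JSON key generated for the service account and that it contains a `client_email` field; `key_file_email` and `key_matches_account` are hypothetical helper names, and the extraction is a simple text match rather than a full JSON parser.

```shell
# Hypothetical helper: print the "client_email" value from a JSON key file.
# Assumption: the field appears as "client_email": "<address>" in the file.
key_file_email() {
  sed -n 's/.*"client_email"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Hypothetical helper: exit status 0 only when the key file's client_email
# equals the address you plan to enter as the Account Email Address.
# Usage: key_matches_account KEY_FILE EMAIL
key_matches_account() {
  [ "$(key_file_email "$1")" = "$2" ]
}
```

For example, `key_matches_account "${IMMUTA_GCP_KEY_FILE}" "datasource-account@project123.iam.gserviceaccount.com" && echo "match"` prints `match` only when the two addresses are identical; the email shown is a placeholder.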
Next steps
With data sources registered in Immuta, your organization can now start:
- building global subscription and data policies to govern data.
- creating projects to collaborate.