To identify columns in Google Cloud Platform's BigQuery table that only contain null values, you would need to examine each column individually using SQL queries.
For a single column, you can use a query like this:
SELECT COUNT(*)
FROM `project.dataset.table`
WHERE column_name IS NOT NULL
This will return a count of non-null values in the column. If the count is 0, then all values in the column are null.
However, if you have numerous columns and want to check all of them, you can leverage the INFORMATION_SCHEMA
view to dynamically generate and execute the queries. Here's an example:
DECLARE columns ARRAY<STRING>;
DECLARE queries STRING;
DECLARE i INT64 DEFAULT 0;
SET columns = (
SELECT ARRAY_AGG(column_name)
FROM `projectname.dataset.INFORMATION_SCHEMA.COLUMNS`
WHERE table_name = 'tablename'
);
SET queries = (
SELECT STRING_AGG("SELECT '" || column_name || "' AS column_name, COUNT(*) AS non_nulls FROM `projectId.dataset.tablename` WHERE " || column_name || " IS NOT NULL", ' UNION ALL ')
FROM UNNEST(columns) AS column_name
);
EXECUTE IMMEDIATE """
CREATE TEMPORARY TABLE results AS
SELECT
column_name,
non_nulls
FROM (
""" || queries || """
);
""";
EXECUTE IMMEDIATE """
SELECT column_name
FROM results
WHERE non_nulls = 0;
""";