Bigquery Arrays

Melody Xu
4 min readAug 27, 2019

--

I was testing a few things about Bigquery arrays with my coworker Reese. The table below was created by her.

Create an array in Bigquery

Reese’s way

My way

They are essentially the same. Union all is ‘,’ and array() is []. I am just lazier to type the words, plus I do not like having select inside select. I prefer the logic to be shown in the simplest way in the code.

The output table looks like this

Array_to_string and to_json_string

The next thing we tried is to make the table shown as string, not array.

We name the table we just created data. So the simplest way Reese used is below

select name, TO_JSON_STRING(pets), sum(pageviews)
from data
group by 1,2

The output is below

Updated with Mosha Pasumansky ‘s comment

Another way to make it string format is by using

select name, format(“%t”,pets)pets,pageviews
from data

And the results look like below

Array Aggregation

Next trying to combine the arrays

select data.name, array_agg(p)
from data, unnest(pets) as p
group by 1

select data.name, array_agg(p)
, array_agg(pageviews) as page_views

from data, unnest(pets) as p
group by 1

If adding pageviews to the array_agg, the pageviews were populated for each row. So it is kind of duplicated.

If you do a sum for the pageviews, you will get the sum of all the duplicated rows.

select data.name, array_agg(p)
, sum(pageviews) as pageviews
from data, unnest(pets) as p
group by 1

So currently the only way we can think of to get the data we want is by doing to_json_string.

Query from JSON string

I was reading Bigquery’s documentation about json_extract and it has this example.

SELECT JSON_EXTRACT(json_text, '$.class.students[0]') AS first_student
FROM UNNEST([
'{"class" : {"students" : [{"name" : "Jane"}]}}',
'{"class" : {"students" : []}}',
'{"class" : {"students" : [{"name" : "John"}, {"name": "Jamie"}]}}'
]) AS json_text;
+-----------------+
| first_student |
+-----------------+
| {"name":"Jane"} |
| NULL |
| {"name":"John"} |
+-----------------+

I tried the same on my pets dataset above and it did not work. Turned out it is because my json string is in a list. To make it work, the syntax should be

select pets, json_extract_scalar(pets,’$[0].name’) as name

How to keep rows with empty arrays

Reese and I recently found that if using unnest directly, the rows with empty arrays will not appear. The solution is to left join unnest. To demonstrate, first we created a table with empty arrays.

create table array_test (
some_field string,
some_other_field array<struct<a string, b string>>
) as (
select ‘a’ some_field, null some_other_field union all
select ‘b’ some_field, [] some_other_field union all
select ‘c’ some_field, [struct(‘:)’ as a, ‘:(‘ as b), struct(‘:D’ as a, ‘>:(‘ as b)]
)

Now we use unnest directly

SELECT
some_field,
a
FROM
array_test,
UNNEST(some_other_field)

It will only give you the none empty row

Now if we use left join unnest below

SELECT
some_field,
a
FROM
array_test
LEFT JOIN
UNNEST(some_other_field)

The results include the rows with empty arrays.

Resources:

Array_agg: https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#array_agg

Array_to_string: https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays#converting-arrays-to-strings

Array_to_string: https://cloud.google.com/bigquery/docs/reference/standard-sql/array_functions#array_to_string

To_json_string: https://cloud.google.com/bigquery/docs/reference/standard-sql/json_functions#to_json_string

Json_extract and json_extract_scalar:

https://cloud.google.com/bigquery/docs/reference/standard-sql/json_functions#to_json_string

--

--

Melody Xu
Melody Xu

Responses (1)