We’ve worked with a lot of customers who write SQL on a regular basis, and from time to time they come to us with SQL questions that we thought would be interesting to share with others!
Today, one of our customers came to us with a question: they were having trouble unnesting a string/array into rows in Amazon Redshift.
Question - Splitting a comma-delimited string into rows in Redshift
Suppose we have a books table in Amazon Redshift that contains this data:

tags (varchar)
-----
A
A, B
C, D
E
And we want to split/unnest/explode it so that it becomes this:

tag
-----
A
A
B
C
D
E
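If you want to follow along, a minimal setup might look like this (the table and column names are taken from the example above; the DDL itself is my own sketch, not from the original question):

```sql
-- Hypothetical sample data matching the example above.
create table books (tags varchar(255));

insert into books (tags) values
  ('A'),
  ('A, B'),
  ('C, D'),
  ('E');
```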
In Postgres, we can simply use:

select TRIM( UNNEST( STRING_TO_ARRAY(tags, ',') ) ) from books;
But how do we do this in Amazon Redshift (which doesn’t support unnest)?
String to Array in Amazon Redshift
The solution (or workaround) is to split the string into multiple parts:
with NS AS (
  select 1 as n union all
  select 2 union all
  select 3 union all
  select 4 union all
  select 5 union all
  select 6 union all
  select 7 union all
  select 8 union all
  select 9 union all
  select 10
)
select TRIM(SPLIT_PART(B.tags, ',', NS.n)) AS tag
from NS
inner join books B ON NS.n <= REGEXP_COUNT(B.tags, ',') + 1
The above query gives you the exact same results as using unnest in Postgres.

Here, NS (number sequence) is a CTE that returns a list of numbers from 1 to 10. You have to make sure the maximum number in the sequence is at least as large as the maximum number of tags in a single row, so add more numbers to the list depending on your data.
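If hand-writing all those UNION ALLs gets tedious, one possible variation (my own sketch, not from the original post) is to cross-join a small digit list with itself, which yields 100 numbers from 10 lines:

```sql
-- Sketch: build numbers 1..100 by cross-joining a 10-row digit list with itself.
-- Table and column names (books, tags) follow the example above.
with digits as (
  select 1 as d union all select 2 union all select 3 union all select 4 union all
  select 5 union all select 6 union all select 7 union all select 8 union all
  select 9 union all select 10
),
NS as (
  select (tens.d - 1) * 10 + ones.d as n
  from digits tens cross join digits ones
)
select TRIM(SPLIT_PART(B.tags, ',', NS.n)) as tag
from NS
inner join books B on NS.n <= REGEXP_COUNT(B.tags, ',') + 1;
```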
Similar problem in MySQL
A similar approach works if you’re using MySQL, though you have to change up the syntax/functions a little bit:
select TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(B.tags, ',', NS.n), ',', -1)) as tag
from (
  select 1 as n union all
  select 2 union all
  select 3 union all
  select 4 union all
  select 5 union all
  select 6 union all
  select 7 union all
  select 8 union all
  select 9 union all
  select 10
) NS
inner join books B ON NS.n <= CHAR_LENGTH(B.tags) - CHAR_LENGTH(REPLACE(B.tags, ',', '')) + 1
A few notes:
- MySQL (before version 8.0) doesn’t support CTEs, so we write NS as a subquery instead
- MySQL doesn’t have SPLIT_PART, so we use 2 nested SUBSTRING_INDEX calls to achieve the same thing
- MySQL doesn’t have REGEXP_COUNT, so we have to work around a bit to count the number of , in the string: compare the string’s length with its length after all commas are removed.
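To see how the nested SUBSTRING_INDEX calls work, here is a quick illustration on a literal string (my own example, not from the original question):

```sql
-- SUBSTRING_INDEX(str, ',', n) keeps everything up to the n-th comma;
-- a second call with -1 then keeps only the last piece of that prefix.
select SUBSTRING_INDEX('A, B, C', ',', 2);
-- 'A, B'
select SUBSTRING_INDEX(SUBSTRING_INDEX('A, B, C', ',', 2), ',', -1);
-- ' B'  (note the leading space, which is why the full query wraps it in TRIM)
```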
What if I have a Redshift JSON array instead?
Okay, what if your tags column is actually a string that contains a JSON array?

tags (string with json)
-----
["A"]
["A", "B"]
["C", "D"]
["E"]
One approach is to just treat them as plain strings and apply the same SPLIT_PART technique above with some string cleaning (removing the quotes and square brackets). Or you can use Redshift’s JSON array functions:
with NS AS (
  select 1 as n union all
  select 2 union all
  select 3 union all
  select 4 union all
  select 5 union all
  select 6 union all
  select 7 union all
  select 8 union all
  select 9 union all
  select 10
)
select TRIM(JSON_EXTRACT_ARRAY_ELEMENT_TEXT(B.tags, NS.n - 1)) AS val
from NS
inner join books B ON NS.n <= JSON_ARRAY_LENGTH(B.tags)
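If you’d rather take the plain-string route mentioned above, the cleaning step could look something like this (a sketch; the REPLACE chain is my own, not from the original post):

```sql
-- Strip brackets and quotes first, then reuse the SPLIT_PART approach.
with NS as (
  select 1 as n union all select 2 union all select 3 union all select 4 union all
  select 5 union all select 6 union all select 7 union all select 8 union all
  select 9 union all select 10
),
cleaned as (
  select REPLACE(REPLACE(REPLACE(tags, '[', ''), ']', ''), '"', '') as tags
  from books
)
select TRIM(SPLIT_PART(B.tags, ',', NS.n)) as tag
from NS
inner join cleaned B on NS.n <= REGEXP_COUNT(B.tags, ',') + 1;
```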
Working with arrays in Redshift and MySQL is usually a pain, but most of the time there’s a creative workaround that achieves the same purpose!

I’ve compiled the list of examples with sample data here for your reference (you can copy and paste them directly into your SQL terminal).