adam-gligor monologue about stuff, brain dump

Rows to columns using sub-queries in postgres

Converting rows to columns using sub-queries in postgres.

Scenario

Given two entities users and roles, each entity has an identifier and a name and users can have one or more roles (say up to 3).

How can this be modeled in postgres then queried to produce a list of users and associated roles.

Example.

Input

  • roles: reader and writer
  • users: John and Mary. John is reader and Mary is reader and writer.

Output

row name roles
1 John reader
2 Mary reader, writer

Option 1

First the obvious, using many to many relationship. This involves setting up three tables

CREATE TABLE public.users (id integer, name character varying)
CREATE TABLE public.roles (id integer, name character varying)
CREATE TABLE public.user_roles (user_id integer, role_id integer)

and the query

SELECT * 
FROM users 
JOIN user_roles ON user.id = user_role.user_id
JOIN roles ON role.id = user_role.role_id

which needs one further aggregation to get one row per user.

Option 2

Postgres supports array type and related sub-queries in the select clause, so is also possible to store the roles in an array column.

The db model.

CREATE TABLE public.users (id integer, name character varying, role_ids integer[], CONSTRAINT pk_users PRIMARY KEY (id))

CREATE TABLE public.roles (id integer, name character varying, CONSTRAINT pk_roles PRIMARY KEY (id))

fill some test data in.

INSERT INTO roles (id, name) SELECT generate_series(1,5000), 'role' || md5(random()::text)

INSERT INTO users (id, name, role_ids) 
SELECT *, 'role' || md5(random()::text), ARRAY[floor(1+ random() * 1000), floor(1+ random() * 1000)]  FROM generate_series(1,5000)

Then the query making use of a sub-query in the select clause. This time the result is already in the expected format, each user is one row and the roles names are also included.

SELECT
*, (SELECT ARRAY(SELECT name FROM roles WHERE id = ANY(u.role_ids))) AS "role_names"
FROM users AS u WHERE id IN (1,10,100,1000)

which returns role_names as array -or-

SELECT
*, (SELECT string_agg(name, ',') FROM roles WHERE id = ANY(u.role_ids)) as "role_names"
FROM users AS u WHERE id IN (1,10,100,1000)

which returns role_names as a concatenated string.

The execution plan for this query looks as follows

1.Index Scan using pk_users on public.users as u (...)
  Index Cond: (u.id = ANY ('{1,10,100,1000}'::integer[]))
  2.Result (...)
    3.Index Scan using pk_roles on public.roles as roles (...)
      Index Cond: (roles.id = ANY (u.role_ids))

And it requires two index scans to fulfill.