While experimenting with building a simple Twitter clone in Postgres, I found that I needed a way to parse hashtags and mentions from tweets like:
#example tweet - #testing with @postgresql
Imagine that we have a table called tweets
defined with the following structure:
CREATE TABLE tweets (
id uuid PRIMARY KEY NOT NULL DEFAULT uuid_generate_v4(),
post text NOT NULL,
hashtags text[] NOT NULL DEFAULT '{}',
mentions text[] NOT NULL DEFAULT '{}'
);
Wouldn't it be nice if the hashtag and mention tokens were automatically parsed when we INSERT
posts into the tweets
table like the following:
INSERT INTO tweets (post)
VALUES ('#example tweet - #testing with @postgresql');
SELECT * FROM tweets;
id | post | hashtags | mentions
---------------------------------------+--------------------------------------------+-------------------+--------------
e133820e-7329-4852-b40b-6e9b7e2fa69d | #example tweet - #testing with @postgresql | {example,testing} | {postgresql}
(1 row)
It turns out that this is pretty easy to achieve with Postgres! First we need to define a function to parse tokens from content and return an array of text.
CREATE FUNCTION parse_tokens(content text, prefix text)
RETURNS text[] AS $$
DECLARE
regex text;
matches text;
subquery text;
captures text;
tokens text[];
BEGIN
regex := prefix || '(\S+)';
matches := 'regexp_matches($1, $2, $3) as captures';
subquery := '(SELECT ' || matches || ' ORDER BY captures) as matches';
captures := 'array_agg(matches.captures[1])';
EXECUTE 'SELECT ' || captures || ' FROM ' || subquery
INTO tokens
USING LOWER(content), regex, 'g';
IF tokens IS NULL THEN
tokens = '{}';
END IF;
RETURN tokens;
END;
$$ LANGUAGE plpgsql STABLE;
Let's test it out by parsing hashtags from a tweet:
SELECT parse_tokens('#example tweet - #testing with @postgresql', '#');
tokens
-------------------
{example,testing}
(1 row)
Parsing mentions from a tweet is just as simple:
SELECT parse_tokens('#example tweet - #testing with @postgresql', '@');
tokens
--------------
{postgresql}
(1 row)
Now that our parse_tokens
function is working, we need to define some triggers to parse hashtags and mentions when a tweet record is inserted or updated.
CREATE TRIGGER parse_hashtags
BEFORE INSERT OR UPDATE ON tweets
FOR EACH ROW EXECUTE PROCEDURE parse_hashtags_from_post();
CREATE FUNCTION parse_hashtags_from_post()
RETURNS trigger AS $$
BEGIN
NEW.hashtags = parse_tokens(NEW.post, '#');
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER parse_mentions
BEFORE INSERT OR UPDATE ON tweets
FOR EACH ROW EXECUTE PROCEDURE parse_mentions_from_post();
CREATE FUNCTION parse_mentions_from_post()
RETURNS trigger AS $$
BEGIN
NEW.mentions = parse_tokens(NEW.post, '@');
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
Now when we create or update tweets the hashtags
and mentions
fields are automatically updated! This whole process was pretty fun and interesting to get working. I look forward to attempting to push even more logic down into Postgres!