While experimenting with building a simple Twitter clone in Postgres, I found that I needed a way to parse hashtags and mentions from tweets like:
#example tweet - #testing with @postgresql
Imagine that we have a table called tweets defined with the following structure:
CREATE TABLE tweets (  
  id        uuid PRIMARY KEY NOT NULL DEFAULT uuid_generate_v4(),
  post      text NOT NULL,
  hashtags  text[] NOT NULL DEFAULT '{}',
  mentions  text[] NOT NULL DEFAULT '{}'
);
Wouldn't it be nice if the hashtag and mention tokens were automatically parsed when we INSERT posts into the tweets table like the following:
INSERT INTO tweets (post)  
VALUES ('#example tweet - #testing with @postgresql');
SELECT * FROM tweets;
                   id                  |                     post                   |      hashtags     |   mentions
---------------------------------------+--------------------------------------------+-------------------+--------------
 e133820e-7329-4852-b40b-6e9b7e2fa69d  | #example tweet - #testing with @postgresql | {example,testing} | {postgresql}
(1 row)
It turns out that this is pretty easy to achieve with Postgres! First we need to define a function to parse tokens from content and return an array of text.
CREATE FUNCTION parse_tokens(content text, prefix text)  
  RETURNS text[] AS $$
    DECLARE
      regex text;
      matches text;
      subquery text;
      captures text;
      tokens text[];
    BEGIN
      regex := prefix || '(\S+)';
      matches := 'regexp_matches($1, $2, $3) as captures';
      subquery := '(SELECT ' || matches || ' ORDER BY captures) as matches';
      captures := 'array_agg(matches.captures[1])';
      EXECUTE 'SELECT ' || captures || ' FROM ' || subquery
      INTO tokens
      USING LOWER(content), regex, 'g';
      IF tokens IS NULL THEN
        tokens = '{}';
      END IF;
      RETURN tokens;
    END;
  $$ LANGUAGE plpgsql STABLE;
Let's test it out by parsing hashtags from a tweet:
SELECT parse_tokens('#example tweet - #testing with @postgresql', '#');
       tokens
-------------------
 {example,testing}
 (1 row)
Parsing mentions from a tweet is just as simple:
SELECT parse_tokens('#example tweet - #testing with @postgresql', '@');
    tokens
--------------
 {postgresql}
 (1 row)
Now that our parse_tokens function is working, we need to define some triggers to parse hashtags and mentions when a tweet record is inserted or updated.
CREATE TRIGGER parse_hashtags  
  BEFORE INSERT OR UPDATE ON tweets
  FOR EACH ROW EXECUTE PROCEDURE parse_hashtags_from_post();
CREATE FUNCTION parse_hashtags_from_post()  
  RETURNS trigger AS $$
    BEGIN
      NEW.hashtags = parse_tokens(NEW.post, '#');
      RETURN NEW;
    END;
  $$ LANGUAGE plpgsql;
CREATE TRIGGER parse_mentions  
  BEFORE INSERT OR UPDATE ON tweets
  FOR EACH ROW EXECUTE PROCEDURE parse_mentions_from_post();
CREATE FUNCTION parse_mentions_from_post()  
  RETURNS trigger AS $$
    BEGIN
      NEW.mentions = parse_tokens(NEW.post, '@');
      RETURN NEW;
    END;
  $$ LANGUAGE plpgsql;
Now when we create or update tweets the hashtags and mentions fields are automatically updated! This whole process was pretty fun and interesting to get working. I look forward to attempting to push even more logic down into Postgres!