Sean Huber

Awesome Awesomeness

Sean Huber — Tue, 23 Jun 2015 23:58:00 GMT

I just came across this curated list of awesome curated lists! So meta, I love it!

Use the timestamptz shorthand for time zones in Postgres

Sean Huber — Fri, 22 May 2015 01:22:00 GMT

The following CREATE TABLE statements are equivalent:

CREATE TABLE users (  
  id          serial PRIMARY KEY,
  username    text NOT NULL,
  created_at  timestamptz
);

CREATE TABLE users (  
  id          serial PRIMARY KEY,
  username    text NOT NULL,
  created_at  timestamp WITH TIME ZONE
);

Sure, it only saves us a few words, but now we're aware of it for future use!

Read more about Postgres Date/Time types if you're interested!

Porting ActiveRecord "soft delete" behavior to Postgres

Sean Huber — Thu, 21 May 2015 23:25:00 GMT

Let's try to implement the behavior of libraries like paranoia and acts_as_paranoid in Postgres!

tldr - jump to the summary below!

We'll make it possible to restore deleted user accounts! Let's assume that we already have an existing users table with a few records.

CREATE TABLE users (  
  id        serial PRIMARY KEY,
  username  text NOT NULL
);

INSERT INTO users (username) VALUES ('sean'), ('sam'), ('doug');

Now let's SELECT * FROM users to see what we're working with.

 id | username 
----+----------
  1 | sean
  2 | sam
  3 | doug
(3 rows)

The first thing we can do is add a deleted_at timestamptz column to users. We should also add an index for records with deleted_at IS NULL since those are the ones that we'll want to scope most of our queries to.

ALTER TABLE users ADD COLUMN deleted_at timestamptz;

CREATE INDEX not_deleted ON users (id) WHERE deleted_at IS NULL;

Let's SELECT * FROM users again to see the changes.

 id | username | deleted_at 
----+----------+------------
  1 | sean     | 
  2 | sam      | 
  3 | doug     | 
(3 rows)

Now that we've got our new column, let's create a TRIGGER that intercepts DELETE statements and sets deleted_at instead of actually removing the record.

CREATE TRIGGER soft_delete_user  
  BEFORE DELETE ON users
  FOR EACH ROW EXECUTE PROCEDURE soft_delete();

Let's check out the implementation of soft_delete(). Note that the function has to exist before the CREATE TRIGGER statement above will run.

CREATE FUNCTION soft_delete()  
  RETURNS trigger AS $$
    DECLARE
      command text := ' SET deleted_at = current_timestamp WHERE id = $1';
    BEGIN
      EXECUTE 'UPDATE ' || TG_TABLE_NAME || command USING OLD.id;
      RETURN NULL;
    END;
  $$ LANGUAGE plpgsql;

There are a few interesting things here to note:

TG_TABLE_NAME - this variable contains the name of the table or view that initiated the TRIGGER call. In our case this variable evaluates to users. This allows us to reuse this same soft_delete function as a trigger on multiple tables! Check out the other special trigger variables.
current_timestamp - this returns the current timestamp with time zone. Check out the other built in date/time functions.
RETURN NULL - this instructs Postgres to do nothing instead of deleting the row.

Let's DELETE FROM users WHERE id = 1 and then SELECT * FROM users to test!

 id | username |          deleted_at           
----+----------+-------------------------------
  2 | sam      | 
  3 | doug     | 
  1 | sean     | 2015-05-21 15:33:50.164675-07
(3 rows)

Soft delete works! Now we can restore a deleted record by setting its deleted_at back to NULL, e.g. UPDATE users SET deleted_at = NULL WHERE id = 1.

But hold on, there are a couple of issues with this implementation!

We can't actually DELETE records since the TRIGGER prevents it! What if we want to permanently remove users that were deleted over a year ago?
We'll probably need to add WHERE deleted_at IS NULL conditions to multiple user related queries since we want to interact with the active ones most of the time.

We can solve these issues by creating a view!

CREATE VIEW users_without_deleted AS  
  SELECT * FROM users WHERE deleted_at IS NULL;

Let's check out SELECT * FROM users_without_deleted to see if it works.

 id | username | deleted_at 
----+----------+------------
  2 | sam      | 
  3 | doug     | 
(2 rows)

Success! The user with id = 1 does not exist in the results!

Most of the time we want to work with active users, not ALL users including the deleted ones. For convenience, let's rename our table and view.

ALTER TABLE users RENAME TO users_with_deleted;  
ALTER VIEW users_without_deleted RENAME TO users;

Now we can SELECT from users and not have to worry about adding WHERE deleted_at IS NULL all over the place! Since it's a simple view, we can INSERT, UPDATE, and DELETE to users as well!

Let's move our TRIGGER from our users_with_deleted table over to the users view instead. This allows us to soft delete when working with users and hard delete when working with users_with_deleted!

DROP TRIGGER soft_delete_user ON users_with_deleted;

CREATE TRIGGER soft_delete_user  
  INSTEAD OF DELETE ON users
  FOR EACH ROW EXECUTE PROCEDURE soft_delete();

Let's review what we've got using SELECT * FROM users_with_deleted.

 id | username |          deleted_at           
----+----------+-------------------------------
  2 | sam      | 
  3 | doug     | 
  1 | sean     | 2015-05-21 15:33:50.164675-07
(3 rows)

We should be able to soft delete from the users view.

DELETE FROM users WHERE id = 2;

SELECT * FROM users_with_deleted;

 id | username |          deleted_at           
----+----------+-------------------------------
  2 | sam      | 2015-05-21 15:34:30.164675-07
  3 | doug     | 
  1 | sean     | 2015-05-21 15:33:50.164675-07
(3 rows)

Now let's try to hard delete from the users_with_deleted table.

DELETE FROM users_with_deleted WHERE id = 2;

SELECT * FROM users_with_deleted;

 id | username |          deleted_at           
----+----------+-------------------------------
  3 | doug     | 
  1 | sean     | 2015-05-21 15:33:50.164675-07
(2 rows)

Success! When we SELECT * FROM users there should only be one record!

 id | username |          deleted_at           
----+----------+-------------------------------
  3 | doug     | 
(1 row)

Thaigo brought up a great question in the comments below:

What if we want to add a UNIQUE index to username, but only for the non-deleted ones?

If someone deletes their account, we need to allow them to recreate a new account in the future. This means that we can't solve it with a regular unique index like the following because the old deleted user still actually exists in the table.

CREATE UNIQUE INDEX unique_username ON users_with_deleted (username);

Instead, we can solve this issue by using a partial index!

CREATE UNIQUE INDEX unique_username ON users_with_deleted (username) WHERE deleted_at IS NULL;

Now the index is only applied to non-deleted users!

tldr

Here's a summary of what we ended up with:

CREATE TABLE users_with_deleted (  
  id          serial PRIMARY KEY,
  username    text NOT NULL,
  deleted_at  timestamptz
);

CREATE INDEX not_deleted ON users_with_deleted (id) WHERE deleted_at IS NULL;

CREATE UNIQUE INDEX unique_username ON users_with_deleted (username) WHERE deleted_at IS NULL;

CREATE VIEW users AS  
  SELECT * FROM users_with_deleted WHERE deleted_at IS NULL;

CREATE FUNCTION soft_delete()  
  RETURNS trigger AS $$
    DECLARE
      command text := ' SET deleted_at = current_timestamp WHERE id = $1';
    BEGIN
      EXECUTE 'UPDATE ' || TG_TABLE_NAME || command USING OLD.id;
      RETURN NULL;
    END;
  $$ LANGUAGE plpgsql;

CREATE TRIGGER soft_delete_user  
  INSTEAD OF DELETE ON users
  FOR EACH ROW EXECUTE PROCEDURE soft_delete();

If you enjoyed this, check out how to port ActiveRecord validations and counter cache behavior to Postgres as well!

Porting ActiveRecord "counter cache" behavior to Postgres

Sean Huber — Thu, 21 May 2015 04:01:00 GMT

Let's try to mimic the following belongs_to relationship with counter_cache behavior in Postgres!

tldr - jump to the summary below!

class Comment < ActiveRecord::Base  
  belongs_to :post, counter_cache: true
end

class Post < ActiveRecord::Base  
end

First we need to create the relevant tables and columns.

CREATE TABLE posts (  
  id              serial PRIMARY KEY,
  title           text NOT NULL,
  body            text NOT NULL,
  comments_count  integer NOT NULL DEFAULT 0
);

CREATE TABLE comments (  
  id       serial PRIMARY KEY,
  post_id  integer NOT NULL REFERENCES posts(id),
  body     text NOT NULL
);

Then let's INSERT a sample post.

INSERT INTO posts (title, body) VALUES  
  ('My first post!', 'Sample content.');

Now we can SELECT * FROM posts to see what we're working with.

 id |     title      |      body       | comments_count 
----+----------------+-----------------+----------------
  1 | My first post! | Sample content. |              0
(1 row)

Since we don't have any comments yet, the comments_count is already correctly set to 0. We need to define a TRIGGER to increment or decrement the counter cache for a few different cases:

when a new comment is added with INSERT
when a comment is removed with DELETE
when a comment's post_id is changed with UPDATE

CREATE TRIGGER update_post_comments_count  
  AFTER DELETE OR INSERT OR UPDATE ON comments
  FOR EACH ROW EXECUTE PROCEDURE counter_cache('posts', 'comments_count', 'post_id');

Our trigger relies on functions that we still need to define. (When running this for real, the functions must be created before the CREATE TRIGGER statement, as in the summary below.) Let's start by creating a function called increment_counter to handle the actual increment and decrement updates.

CREATE FUNCTION increment_counter(table_name text, column_name text, id integer, step integer)  
  RETURNS VOID AS $$
    DECLARE
      table_name text := quote_ident(table_name);
      column_name text := quote_ident(column_name);
      conditions text := ' WHERE id = $1';
      updates text := column_name || '=' || column_name || '+' || step;
    BEGIN
      EXECUTE 'UPDATE ' || table_name || ' SET ' || updates || conditions
      USING id;
    END;
  $$ LANGUAGE plpgsql;

If we call increment_counter('posts', 'comments_count', 99, 1) then it executes the following SQL statement:

UPDATE posts SET comments_count = comments_count + 1 WHERE id = 99;

Now we just need to define the counter_cache function that ties our TRIGGER and increment_counter behavior together.

CREATE FUNCTION counter_cache()  
  RETURNS trigger AS $$
    DECLARE
      table_name text := quote_ident(TG_ARGV[0]);
      counter_name text := quote_ident(TG_ARGV[1]);
      fk_name text := quote_ident(TG_ARGV[2]);
      fk_changed boolean := false;
      fk_value integer;
      record record;
    BEGIN
      IF TG_OP = 'UPDATE' THEN
        record := NEW;
        EXECUTE 'SELECT ($1).' || fk_name || ' != ' || '($2).' || fk_name
        INTO fk_changed
        USING OLD, NEW;
      END IF;

      IF TG_OP = 'DELETE' OR fk_changed THEN
        record := OLD;
        EXECUTE 'SELECT ($1).' || fk_name INTO fk_value USING record;
        PERFORM increment_counter(table_name, counter_name, fk_value, -1);
      END IF;

      IF TG_OP = 'INSERT' OR fk_changed THEN
        record := NEW;
        EXECUTE 'SELECT ($1).' || fk_name INTO fk_value USING record;
        PERFORM increment_counter(table_name, counter_name, fk_value, 1);
      END IF;

      RETURN record;
    END;
  $$ LANGUAGE plpgsql;

There are some interesting things here to note:

We're using some special trigger variables!
TG_ARGV - this variable contains an array of the arguments that the trigger function was called with. Trigger functions do not support declaring named parameters!
TG_OP - contains the name of the operation that fired the trigger e.g. DELETE, INSERT, or UPDATE.
NEW and OLD - NEW holds the new version of the row for INSERT and UPDATE statements. OLD holds the previous version of the row for UPDATE and DELETE statements. Since our trigger supports INSERT, UPDATE, and DELETE, we need to make sure that we're referring to the correct object.
Only triggers fired by UPDATE statements have both NEW and OLD objects present.
RETURN record - we return the affected record, either NEW or OLD in this case. Postgres ignores the return value of a row-level AFTER trigger, so this is mostly for consistency.
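
The branching in counter_cache can be easier to follow outside of plpgsql. Here is a small Ruby model of the same decision logic. It is only an illustration: the method name counter_adjustments and its keyword arguments are made up for this sketch, and nothing here talks to Postgres.

```ruby
# Models which counter updates one row-level trigger firing would make.
# Returns a list of [foreign_key_value, step] pairs.
# (Hypothetical helper for illustration, not part of the SQL above.)
def counter_adjustments(op, old_fk: nil, new_fk: nil)
  # Only an UPDATE can change the foreign key.
  fk_changed = op == :update && old_fk != new_fk

  adjustments = []
  # The old parent loses a comment on DELETE or when the key changes.
  adjustments << [old_fk, -1] if op == :delete || fk_changed
  # The new parent gains a comment on INSERT or when the key changes.
  adjustments << [new_fk, 1] if op == :insert || fk_changed
  adjustments
end

counter_adjustments(:insert, new_fk: 1)            # => [[1, 1]]
counter_adjustments(:delete, old_fk: 1)            # => [[1, -1]]
counter_adjustments(:update, old_fk: 1, new_fk: 2) # => [[1, -1], [2, 1]]
counter_adjustments(:update, old_fk: 1, new_fk: 1) # => []
```

An UPDATE that leaves post_id alone produces no adjustments, which is why editing a comment's body doesn't touch any counter.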

Let's INSERT some comments to see if the comments_count increments correctly for the post.

INSERT INTO comments (post_id, body) VALUES  
  (1, 'This is a comment!'),
  (1, 'And this is another comment!'),
  (1, 'One more comment!');

If we SELECT * FROM posts again we should see an updated comments_count.

 id |     title      |      body       | comments_count 
----+----------------+-----------------+----------------
  1 | My first post! | Sample content. |              3
(1 row)

Success! Now if we delete the first comment then it should decrement the post's comments_count as well!

DELETE FROM comments WHERE id = 1;

SELECT * FROM posts;

 id |     title      |      body       | comments_count 
----+----------------+-----------------+----------------
  1 | My first post! | Sample content. |              2
(1 row)

Let's try changing a comment's post_id to make sure it handles that case correctly. First we'll need to add a second post.

INSERT INTO posts (title, body) VALUES  
  ('Another post!!', 'Sample content.');

Then let's SELECT * FROM posts and SELECT * FROM comments to review our data.

 id |     title      |      body       | comments_count 
----+----------------+-----------------+----------------
  1 | My first post! | Sample content. |              2
  2 | Another post!! | Sample content. |              0
(2 rows)

 id | post_id |             body             
----+---------+------------------------------
  2 |       1 | And this is another comment!
  3 |       1 | One more comment!
(2 rows)

If we UPDATE comments SET post_id = 2 WHERE id = 2 then we should see comments_count change for both posts!

SELECT * FROM posts;

 id |     title      |      body       | comments_count 
----+----------------+-----------------+----------------
  1 | My first post! | Sample content. |              1
  2 | Another post!! | Sample content. |              1
(2 rows)

SELECT * FROM comments;

 id | post_id |             body             
----+---------+------------------------------
  2 |       2 | And this is another comment!
  3 |       1 | One more comment!
(2 rows)

Awesome! It's pretty cool that we can define "macro" like triggers that are reusable across tables!

tldr

Here's a summary of what we ended up with:

CREATE TABLE posts (  
  id              serial PRIMARY KEY,
  title           text NOT NULL,
  body            text NOT NULL,
  comments_count  integer NOT NULL DEFAULT 0
);

CREATE TABLE comments (  
  id       serial PRIMARY KEY,
  post_id  integer NOT NULL REFERENCES posts(id),
  body     text NOT NULL
);

CREATE FUNCTION increment_counter(table_name text, column_name text, id integer, step integer)  
  RETURNS VOID AS $$
    DECLARE
      table_name text := quote_ident(table_name);
      column_name text := quote_ident(column_name);
      conditions text := ' WHERE id = $1';
      updates text := column_name || '=' || column_name || '+' || step;
    BEGIN
      EXECUTE 'UPDATE ' || table_name || ' SET ' || updates || conditions
      USING id;
    END;
  $$ LANGUAGE plpgsql;

CREATE FUNCTION counter_cache()  
  RETURNS trigger AS $$
    DECLARE
      table_name text := quote_ident(TG_ARGV[0]);
      counter_name text := quote_ident(TG_ARGV[1]);
      fk_name text := quote_ident(TG_ARGV[2]);
      fk_changed boolean := false;
      fk_value integer;
      record record;
    BEGIN
      IF TG_OP = 'UPDATE' THEN
        record := NEW;
        EXECUTE 'SELECT ($1).' || fk_name || ' != ' || '($2).' || fk_name
        INTO fk_changed
        USING OLD, NEW;
      END IF;

      IF TG_OP = 'DELETE' OR fk_changed THEN
        record := OLD;
        EXECUTE 'SELECT ($1).' || fk_name INTO fk_value USING record;
        PERFORM increment_counter(table_name, counter_name, fk_value, -1);
      END IF;

      IF TG_OP = 'INSERT' OR fk_changed THEN
        record := NEW;
        EXECUTE 'SELECT ($1).' || fk_name INTO fk_value USING record;
        PERFORM increment_counter(table_name, counter_name, fk_value, 1);
      END IF;

      RETURN record;
    END;
  $$ LANGUAGE plpgsql;

CREATE TRIGGER update_post_comments_count  
  AFTER INSERT OR UPDATE OR DELETE ON comments
  FOR EACH ROW EXECUTE PROCEDURE counter_cache('posts', 'comments_count', 'post_id');

If you enjoyed this, check out how to port ActiveRecord validations and soft delete behavior to Postgres as well!

Porting ActiveRecord validations to Postgres

Sean Huber — Wed, 20 May 2015 06:52:00 GMT

Let's try to port some ActiveRecord validations to Postgres using constraints!

validates_presence_of

class User < ActiveRecord::Base  
  validates_presence_of :email
end

ALTER TABLE users ALTER COLUMN email SET NOT NULL;  
ALTER TABLE users ADD CONSTRAINT email_presence CHECK (char_length(email) > 0);

validates_uniqueness_of

class User < ActiveRecord::Base  
  validates_uniqueness_of :email
end

ALTER TABLE users ADD CONSTRAINT email_uniqueness UNIQUE (email);

class User < ActiveRecord::Base  
  validates_uniqueness_of :email, case_sensitive: false
end

ALTER TABLE users ALTER COLUMN email TYPE citext;  
ALTER TABLE users ADD CONSTRAINT email_uniqueness UNIQUE (email);

class User < ActiveRecord::Base  
  validates_uniqueness_of :email, scope: :account_id
end

ALTER TABLE users ADD CONSTRAINT email_uniqueness UNIQUE (email, account_id);

validates_numericality_of

class User < ActiveRecord::Base  
  validates_numericality_of :age, greater_than_or_equal_to: 18
end

ALTER TABLE users ADD CONSTRAINT age_numericality check (age >= 18);

class User < ActiveRecord::Base  
  validates_numericality_of :age, equal_to: 50
end

ALTER TABLE users ADD CONSTRAINT age_numericality check (age = 50);

class User < ActiveRecord::Base  
  validates_numericality_of :age, odd: true
end

ALTER TABLE users ADD CONSTRAINT age_numericality check (age % 2 != 0);

class User < ActiveRecord::Base  
  validates_numericality_of :age, even: true
end

ALTER TABLE users ADD CONSTRAINT age_numericality check (age % 2 = 0);

validates_inclusion_of

class User < ActiveRecord::Base  
  validates_inclusion_of :age, in: [1, 2, 3]
end

ALTER TABLE users ADD CONSTRAINT age_inclusion check (age IN (1, 2, 3));

class User < ActiveRecord::Base  
  validates_inclusion_of :age, in: 18..25
end

ALTER TABLE users ADD CONSTRAINT age_inclusion check (age BETWEEN 18 AND 25);

class User < ActiveRecord::Base  
  validates_inclusion_of :age, in: (1..100).step(2)
end

ALTER TABLE users ADD CONSTRAINT age_inclusion check (age BETWEEN 1 AND 100 AND age % 2 = 1);
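
Before translating a stepped range into a constraint, it helps to look at which values the range actually contains. A quick check in plain Ruby (no Rails needed; the variable name allowed is just for this sketch):

```ruby
# Expand the stepped range from the validation above into an array.
allowed = (1..100).step(2).to_a

allowed.first          # => 1
allowed.last           # => 99
allowed.size           # => 50
allowed.all?(&:odd?)   # => true
```

So this particular range is simply the odd integers from 1 to 99, which means a constraint only has to test a bound and a remainder rather than enumerate every value.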

validates_exclusion_of

class User < ActiveRecord::Base  
  validates_exclusion_of :age, in: [1, 2, 3]
end

ALTER TABLE users ADD CONSTRAINT age_exclusion check (age NOT IN (1, 2, 3));

class User < ActiveRecord::Base  
  validates_exclusion_of :age, in: 18..25
end

ALTER TABLE users ADD CONSTRAINT age_exclusion check (age NOT BETWEEN 18 AND 25);

class User < ActiveRecord::Base  
  validates_exclusion_of :age, in: (1..100).step(2)
end

ALTER TABLE users ADD CONSTRAINT age_exclusion check (NOT (age BETWEEN 1 AND 100 AND age % 2 = 1));

validates_length_of

class User < ActiveRecord::Base  
  validates_length_of :password, minimum: 6, maximum: 32
end

ALTER TABLE users ADD CONSTRAINT password_length CHECK (char_length(password) BETWEEN 6 AND 32);

validates_format_of

class User < ActiveRecord::Base  
  validates_format_of :email, with: /\A([^@\s]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})\Z/i
end

ALTER TABLE users ADD CONSTRAINT email_format CHECK (email ~* '\A([^@\s]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})\Z');

class User < ActiveRecord::Base  
  validates_format_of :email, with: /.+@.+/, allow_nil: true
end

ALTER TABLE users ADD CONSTRAINT email_format CHECK (email IS NULL OR email ~* '.+@.+');

class User < ActiveRecord::Base  
  validates_format_of :email, with: /.+@.+/, allow_blank: true
end

ALTER TABLE users ADD CONSTRAINT email_format CHECK (email = '' OR email ~* '.+@.+');

class User < ActiveRecord::Base  
  validates_format_of :email, with: /.+@.+/, allow_blank: true, allow_nil: true
end

ALTER TABLE users ADD CONSTRAINT email_format CHECK (email IS NULL OR email = '' OR email ~* '.+@.+');

custom validations

class User < ActiveRecord::Base  
  validate :validate_email_domain

  def domain
    email.to_s.split('@').last
  end

  private

  def validate_email_domain
    if banned_domains.include?(domain)
      errors.add(:email, 'domain is invalid')
    end
  end

  def banned_domains
    %w(gmail.com live.com)
  end
end

CREATE OR REPLACE FUNCTION validate_email_domain()  
  RETURNS trigger
AS $$  
  DECLARE
    banned text[];
    domain text;
  BEGIN
    banned := ARRAY['gmail.com', 'live.com'];

    FOREACH domain IN ARRAY banned LOOP
      IF lower(NEW.email) LIKE ('%@' || domain) THEN
        RAISE EXCEPTION 'Invalid email domain %', domain;
      END IF;
    END LOOP;

    RETURN NEW;
  END;
$$ language plpgsql;

CREATE TRIGGER email_domain_validation  
  BEFORE INSERT OR UPDATE ON users
  FOR EACH ROW EXECUTE PROCEDURE validate_email_domain();

Caveats

You'll need to convert these constraint exceptions into user friendly error messages within your applications. The exceptions contain the name of the failed constraint like email_format so a common way to accomplish this is to map the name to a hash containing messages like Your email domain #{domain.inspect} has been banned.
Postgres returns the first raised exception that is encountered. This means that you can't get a list of ALL failing validations when calling save, only one at a time as each error is fixed.
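
Here's a minimal Ruby sketch of that name-to-message mapping. It assumes the error text from Postgres is already available as a string. The constant CONSTRAINT_MESSAGES, the method friendly_error, and the message wording are all made up for illustration; the constraint names come from the examples above.

```ruby
# Hypothetical lookup table from constraint names to user-facing messages.
CONSTRAINT_MESSAGES = {
  "email_presence"   => "Email can't be blank",
  "email_uniqueness" => "Email has already been taken",
  "email_format"     => "Email is invalid"
}.freeze

# Pulls the quoted constraint name out of a Postgres error message and
# looks up a friendly message, falling back to a generic one when the
# message names no constraint or the name is unknown.
def friendly_error(pg_message)
  name = pg_message[/constraint "([^"]+)"/, 1]
  CONSTRAINT_MESSAGES.fetch(name, "Something went wrong, please try again")
end

friendly_error('duplicate key value violates unique constraint "email_uniqueness"')
# => "Email has already been taken"
friendly_error('new row for relation "users" violates check constraint "email_format"')
# => "Email is invalid"
```

Messages raised with RAISE EXCEPTION in a trigger function don't carry a constraint name, so they would hit the fallback here and need their own handling.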

If you enjoyed this, check out how to port ActiveRecord counter cache and soft delete behavior to Postgres as well!

Postgres treats table rows like composite types

Sean Huber — Wed, 20 May 2015 05:49:00 GMT

Let's create a basic users table with sample data:

CREATE TABLE users (  
  id    serial PRIMARY KEY,
  name  text
);

INSERT INTO users (name) VALUES ('Bob'), ('Tom'), ('Sam');

Then SELECT * FROM users to see what we're working with:

 id | name
----+------
  1 | Bob
  2 | Tom
  3 | Sam

Now check out what happens when we SELECT users FROM users:

  users
----------
 (1,Bob)
 (2,Tom)
 (3,Sam)

That's pretty interesting! Postgres returns a tuple containing all of the columns for each row in users.

Case insensitive UNIQUE constraints in Postgres

Sean Huber — Tue, 19 May 2015 07:05:00 GMT

Adding UNIQUE constraints to tables in Postgres is very easy!

Imagine we have the following table:

CREATE TABLE users (  
  id     uuid PRIMARY KEY NOT NULL DEFAULT uuid_generate_v4(),
  email  text
);

If we want to ensure that each user has a unique email we simply add:

ALTER TABLE users ADD CONSTRAINT email_unique UNIQUE (email);

Let's try it out by inserting some data:

INSERT INTO users (email) VALUES ('test@example.com');  
INSERT INTO users (email) VALUES ('test@example.com');

ERROR:  duplicate key value violates unique constraint "email_unique"  
DETAIL:  Key (email)=(test@example.com) already exists.

But there's a problem, the UNIQUE constraint is case sensitive!

INSERT INTO users (email) VALUES ('test@example.com');  
INSERT INTO users (email) VALUES ('TEST@example.com');

SELECT * from users;

                  id                  |      email       
--------------------------------------+------------------
 ccfcddd2-bdc5-4cf4-9475-4171960e6262 | test@example.com
 431308b4-8df8-44c9-bed4-7c44cf4e1ec1 | TEST@example.com
(2 rows)

Unfortunately, Postgres does not allow us to define a unique constraint using LOWER like:

ALTER TABLE users ADD CONSTRAINT email_unique UNIQUE (LOWER(email));

ERROR:  syntax error at or near "("

But don't worry there's a workaround! Instead of using the text data type, we can use the citext (case insensitive text) type! First we need to enable the citext extension:

CREATE EXTENSION IF NOT EXISTS citext;

Then we'll need to change the email data type in the users table:

ALTER TABLE users ALTER COLUMN email TYPE citext;

Now our existing UNIQUE constraint should be case insensitive!

INSERT INTO users (email) VALUES ('test@example.com');  
INSERT INTO users (email) VALUES ('TEST@example.com');

ERROR:  duplicate key value violates unique constraint "email_unique"  
DETAIL:  Key (email)=(TEST@example.com) already exists.

Another solution is to add a UNIQUE INDEX instead of a CONSTRAINT like:

CREATE UNIQUE INDEX email_unique_idx on users (LOWER(email));

Parsing tags from text content in Postgres

Sean Huber — Tue, 19 May 2015 05:59:00 GMT

While experimenting with building a simple Twitter clone in Postgres, I found that I needed a way to parse hashtags and mentions from tweets like:

#example tweet - #testing with @postgresql

Imagine that we have a table called tweets defined with the following structure:

CREATE TABLE tweets (  
  id        uuid PRIMARY KEY NOT NULL DEFAULT uuid_generate_v4(),
  post      text NOT NULL,
  hashtags  text[] NOT NULL DEFAULT '{}',
  mentions  text[] NOT NULL DEFAULT '{}'
);

Wouldn't it be nice if the hashtag and mention tokens were automatically parsed when we INSERT posts into the tweets table like the following:

INSERT INTO tweets (post)  
VALUES ('#example tweet - #testing with @postgresql');

SELECT * FROM tweets;

                   id                  |                     post                   |      hashtags     |   mentions
---------------------------------------+--------------------------------------------+-------------------+--------------
 e133820e-7329-4852-b40b-6e9b7e2fa69d  | #example tweet - #testing with @postgresql | {example,testing} | {postgresql}
(1 row)

It turns out that this is pretty easy to achieve with Postgres! First we need to define a function to parse tokens from content and return an array of text.

CREATE FUNCTION parse_tokens(content text, prefix text)  
  RETURNS text[] AS $$
    DECLARE
      regex text;
      matches text;
      subquery text;
      captures text;
      tokens text[];
    BEGIN
      regex := prefix || '(\S+)';
      matches := 'regexp_matches($1, $2, $3) as captures';
      subquery := '(SELECT ' || matches || ' ORDER BY captures) as matches';
      captures := 'array_agg(matches.captures[1])';

      EXECUTE 'SELECT ' || captures || ' FROM ' || subquery
      INTO tokens
      USING LOWER(content), regex, 'g';

      IF tokens IS NULL THEN
        tokens = '{}';
      END IF;

      RETURN tokens;
    END;
  $$ LANGUAGE plpgsql STABLE;

Let's test it out by parsing hashtags from a tweet:

SELECT parse_tokens('#example tweet - #testing with @postgresql', '#');

   parse_tokens
-------------------
 {example,testing}
(1 row)

Parsing mentions from a tweet is just as simple:

SELECT parse_tokens('#example tweet - #testing with @postgresql', '@');

 parse_tokens
--------------
 {postgresql}
(1 row)
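
For comparison, here's roughly the same tokenizing logic as a Ruby sketch: lowercase the content, capture the non-whitespace run after the prefix, and sort the results. It's an approximation rather than an exact port. Regexp.escape is an addition that the SQL version doesn't have, and Ruby's sort order may differ from the database collation for non-ASCII text.

```ruby
# Ruby approximation of the plpgsql parse_tokens function above.
def parse_tokens(content, prefix)
  # Escape the prefix so characters like "#" are matched literally.
  regex = /#{Regexp.escape(prefix)}(\S+)/
  # scan returns one single-element array per match; flatten and sort them.
  content.downcase.scan(regex).flatten.sort
end

parse_tokens('#example tweet - #testing with @postgresql', '#')
# => ["example", "testing"]
parse_tokens('#example tweet - #testing with @postgresql', '@')
# => ["postgresql"]
parse_tokens('no tokens here', '#')
# => []
```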

Now that our parse_tokens function is working, we need to define some triggers to parse hashtags and mentions when a tweet record is inserted or updated.

CREATE FUNCTION parse_hashtags_from_post()  
  RETURNS trigger AS $$
    BEGIN
      NEW.hashtags = parse_tokens(NEW.post, '#');
      RETURN NEW;
    END;
  $$ LANGUAGE plpgsql;

CREATE TRIGGER parse_hashtags  
  BEFORE INSERT OR UPDATE ON tweets
  FOR EACH ROW EXECUTE PROCEDURE parse_hashtags_from_post();

CREATE FUNCTION parse_mentions_from_post()  
  RETURNS trigger AS $$
    BEGIN
      NEW.mentions = parse_tokens(NEW.post, '@');
      RETURN NEW;
    END;
  $$ LANGUAGE plpgsql;

CREATE TRIGGER parse_mentions  
  BEFORE INSERT OR UPDATE ON tweets
  FOR EACH ROW EXECUTE PROCEDURE parse_mentions_from_post();

Now when we create or update tweets the hashtags and mentions fields are automatically updated! This whole process was pretty fun and interesting to get working. I look forward to attempting to push even more logic down into Postgres!

Silently drop everything in Postgres

Sean Huber — Sun, 17 May 2015 03:47:00 GMT

Sometimes when working in Postgres I like to reset my development database by deleting everything in it without dropping the actual database itself.

An easy way to achieve this is to just drop all database schema(s) with the CASCADE option. By default, everything lives in the public schema.

DROP SCHEMA "public" CASCADE;

Postgres will output notices as it drops things with CASCADE. These messages can be suppressed by changing the log level temporarily.

SET client_min_messages TO WARNING;  
DROP SCHEMA "public" CASCADE;  
SET client_min_messages TO NOTICE;

Reading from the filesystem with Postgres

Sean Huber — Sun, 17 May 2015 03:41:00 GMT

Let's try to make the following SQL statement work:

SELECT file.read('/tmp/test.txt');

We can start by creating the file /tmp/test.txt with the following contents:

Hello PostgreSQL!

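The next step is to create a simple plpgsql function named file.read! Since the name is schema-qualified, a schema named file has to exist first (for example via CREATE SCHEMA file).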

CREATE FUNCTION file.read(file text)  
  RETURNS text AS $$
    DECLARE
      content text;
    BEGIN
      content := 'Static content for now!';
      RETURN content;
    END;
  $$ LANGUAGE plpgsql VOLATILE;

This initial placeholder function just returns static content so that we can make sure things are working properly so far.

SELECT file.read('/tmp/test.txt');

          read
------------------------
 Static content for now!
(1 row)

Now we just need to fill in the function with logic to read from the filesystem!

CREATE OR REPLACE FUNCTION file.read(file text)
  RETURNS text AS $$
    DECLARE
      content text;
      tmp text;
    BEGIN
      file := quote_literal(file);
      tmp := quote_ident(uuid_generate_v4()::text);

      EXECUTE 'CREATE TEMP TABLE ' || tmp || ' (content text)';
      EXECUTE 'COPY ' || tmp || ' FROM ' || file;
      EXECUTE 'SELECT content FROM ' || tmp INTO content;
      EXECUTE 'DROP TABLE ' || tmp;

      RETURN content;
    END;
  $$ LANGUAGE plpgsql VOLATILE;

Now we should see the contents of the file that we created!

SELECT file.read('/tmp/test.txt');

          read
------------------------
 Hello PostgreSQL!
(1 row)

OK... so what's going on here? Let's break it down!

We begin by declaring a couple of variables using the text data type.

content - stores the contents of the file read from the filesystem
tmp - a unique string used as the name of a temporary table

Then we move into the BEGIN block and set a couple of variables.

file - the quoted file name that the function was called with
tmp - the quoted unique name for a temporary table

Once we have our variables all set up, we EXECUTE some SQL to read the file contents.

First we create a temporary table named tmp with a single field named content
Then we use PostgreSQL's COPY command to read the contents of file into the tmp table
Once the data has been imported, we SELECT the contents from the tmp table and insert it INTO the local content variable
Finally we DROP the tmp table since we don't need it anymore

This was pretty interesting to get working! PostgreSQL has some other ways to read files from the filesystem.

file_fdw - a read-only foreign data wrapper for filesystem access built on the COPY command
pg_read_file - this function is restricted to superusers and only allows files within the database cluster directory and the log_directory to be accessed
pgsql-fio - an extension for basic file system functions

Another interesting challenge could be to write a file.write function to save content to a file!

Using the uuid data type in Postgres

Sean Huber — Sun, 17 May 2015 00:24:00 GMT

It's incredibly simple to use the uuid data type in PostgreSQL!

First we need to enable the uuid-ossp extension by executing the following SQL:

CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

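Now the uuid_generate_v4() function is available for us to use! (The uuid data type itself is built into Postgres; the extension provides the generator functions.) Let's try using a uuid as our primary key in a table.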

CREATE TABLE users (  
  id   uuid PRIMARY KEY NOT NULL DEFAULT uuid_generate_v4(),
  name text NOT NULL
);

Notice that we're generating default ids by calling the uuid_generate_v4() function.

Let's test it out by inserting a record.

INSERT INTO users (name) VALUES ('Sean Huber');

We can verify that it worked with a simple SELECT * FROM users!

                  id                  |    name    
--------------------------------------+------------
 cf5c79cd-8a8d-4afd-8091-ef93f662a44d | Sean Huber
(1 row)

Enumerable#each_with_object

Sean Huber — Sat, 16 May 2015 23:45:00 GMT

The Enumerable#each_with_object method behaves very similarly to Enumerable#inject! There are only a couple of differences!

The block arguments are reversed

The inject method passes the block arguments in the following order:

The accumulator which is initially the object that inject was called with
The value/object for each member in the Enumerable that inject was called on

For the example below, subtotal is the accumulator and item is the value/object.

@order.line_items.inject(0) do |subtotal, item|
  subtotal + (item.quantity * item.price)
end

The arguments are the same with each_with_object but just in the reverse order!
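Here's a small side-by-side sketch of the two argument orders. The line_item struct and its sample values are made up for this example and just stand in for @order.line_items.

```ruby
# Hypothetical line items standing in for @order.line_items.
line_item = Struct.new(:name, :quantity, :price)
items = [line_item.new('pen', 2, 3), line_item.new('pad', 1, 5)]

# inject: the accumulator comes first, then the item.
subtotal = items.inject(0) do |sum, item|
  sum + (item.quantity * item.price)
end

# each_with_object: the item comes first, then the object being built up.
names = items.each_with_object([]) do |item, list|
  list << item.name
end

p subtotal # => 11
p names    # => ["pen", "pad"]
```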

The initial object is always returned

While inject allows you to perform some logic within a block and return a new accumulator object with each iteration, the each_with_object method always returns the initial object that it was called with.

This is useful in cases where the initial object is mutated each iteration.

quantities = LineItem.for_orders_today.each_with_object({}) do |item, hash|
  hash[item.name] ||= 0
  hash[item.name] += item.quantity
end

Compare that to the inject equivalent!

quantities = LineItem.for_orders_today.inject({}) do |hash, item|
  hash[item.name] ||= 0
  hash[item.name] += item.quantity
  hash
end
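To see the difference in return values without a database, here's a runnable sketch. It assumes a made-up sold_item struct in place of LineItem.for_orders_today.

```ruby
# Hypothetical records standing in for LineItem.for_orders_today.
sold_item = Struct.new(:name, :quantity)
items = [sold_item.new('pen', 2), sold_item.new('pad', 1), sold_item.new('pen', 3)]

# each_with_object returns the hash no matter what the block returns.
with_object = items.each_with_object({}) do |item, hash|
  hash[item.name] ||= 0
  hash[item.name] += item.quantity
end

# inject passes the block's return value on as the next accumulator,
# so the block has to end with the hash.
with_inject = items.inject({}) do |hash, item|
  hash[item.name] ||= 0
  hash[item.name] += item.quantity
  hash
end

# Reassigning the block argument doesn't mutate the initial object,
# so each_with_object is a poor fit for plain sums.
total = items.each_with_object(0) { |item, sum| sum += item.quantity }

p with_object                # => {"pen"=>5, "pad"=>1}
p with_object == with_inject # => true
p total                      # => 0
```

The last case is why each_with_object shines with hashes and arrays but not with numbers: an Integer can't be changed in place, so the initial 0 comes back untouched.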

Return random records in Postgres

Sean Huber — Sat, 16 May 2015 01:45:00 GMT

Lately I've been experimenting with building a simple "twitter" app in Postgres by pushing as much logic into the database layer as possible. This includes data validation/sanitization, triggers, and functions.

I needed a way to seed the database with sample data which sometimes required me to specify foreign keys to existing records. I ended up writing a couple plpgsql functions to make this easier.

CREATE FUNCTION random.record(table_name text, exclude uuid DEFAULT uuid_generate_v4())  
  RETURNS record AS $$
    DECLARE
      record record;
    BEGIN
      EXECUTE 'SELECT * FROM ' || table_name || ' WHERE id != $1 ORDER BY random() LIMIT 1'
      INTO record
      USING exclude;

      RETURN record;
    END;
  $$ LANGUAGE plpgsql VOLATILE;

The function above returns a random record from any specified table. It can be called with something like SELECT random.record('users').

It optionally accepts an id to exclude in case you're running a query like:

INSERT INTO followers (follower_id, user_id)  
SELECT id as follower_id, random.id('users', id) as user_id  
FROM users;

Use this other function if you only need the id: SELECT random.id('users').

CREATE FUNCTION random.id(table_name text, exclude uuid DEFAULT uuid_generate_v4())  
  RETURNS uuid AS $$
    DECLARE
      record record;
    BEGIN
      record := random.record(table_name, exclude);
      RETURN record.id;
    END;
  $$ LANGUAGE plpgsql VOLATILE;

These functions expect your table's primary key to be a uuid called id. You may need to tweak them to match your specific schema if you're using integer or some other primary key column name.
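If the plpgsql is hard to follow, the behavior of both functions can be sketched in Ruby against an in-memory array. This is only an illustration: the users array, random_record, and random_id are names made up for the sketch, and nothing here talks to Postgres.

```ruby
require 'securerandom'

# Hypothetical in-memory stand-in for a table whose primary key is a uuid named id.
users = Array.new(3) { |i| { id: SecureRandom.uuid, username: "user#{i}" } }

# Mirrors random.record: a random row whose id differs from `exclude`.
# The default is a freshly generated uuid, which matches no existing row.
def random_record(rows, exclude = SecureRandom.uuid)
  rows.reject { |row| row[:id] == exclude }.sample
end

# Mirrors random.id: only the id of the random row (nil if there is none).
def random_id(rows, exclude = SecureRandom.uuid)
  random_record(rows, exclude)&.fetch(:id)
end

# Like the INSERT INTO followers query: pick someone to follow other than yourself.
follower = users.first
followed = random_id(users, follower[:id])

puts followed != follower[:id]
```

Generating a throwaway uuid as the default for exclude is the same trick the SQL functions use: it keeps the WHERE clause simple because the default never matches a real row.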

Check out how to use uuids in PostgreSQL if you're not already!

I wish this was valid Ruby

Sean Huber — Fri, 06 Feb 2015 17:38:00 GMT

Valid Ruby (the only valid line of code in this post)

before_validation(on: :create) { contact.email ||= email }

Drop braces and use a stabby proc

before_validation(on: :create) -> contact.email ||= email

Drop parentheses around the hash arguments

before_validation on: :create -> contact.email ||= email

Shorthand for `{ :symbol => :symbol }` key/value pairs

before_validation on:create -> contact.email ||= email

Maybe drop the getter/setter name duplication somehow

before_validation on:create -> contact:||= email  
before_validation on:create -> contact .= email

That last one seems a little weird but would be an interesting syntax. I really like the condensed hash arguments syntax though. Maybe one day we'll see stuff like this in Ruby!
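For comparison, here's roughly how close valid Ruby gets today. The before_validation method below is a made-up stand-in that just returns its arguments so we can inspect them; it is not the Rails implementation.

```ruby
# Hypothetical stand-in for a Rails-style callback macro; it returns what it
# was given so we can look at it.
def before_validation(options = {}, &block)
  [options, block]
end

# The braces form from the top of this post.
options, callback = before_validation(on: :create) { :assigned }

# A stabby lambda works too, as long as it's passed as the block with &.
lambda_options, lambda_callback = before_validation({ on: :create }, &-> { :assigned })

# on: :create is already shorthand for { :on => :create }.
puts options == { :on => :create }
puts callback.call == lambda_callback.call
```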

Docker

Sean Huber — Fri, 14 Nov 2014 09:43:00 GMT

tldr; Let's get started

I've heard a lot about Docker over the past year but haven't really dived into it much until recently. So far the experience has been amazing! I've found the technology to be very easy and fun to work with. I believe Docker is going to be incredibly popular in the near future and I'd like to get you excited about it by showing you what I've learned so far.

What is Docker?

Docker is a platform for building, shipping, and running distributed applications.

It uses lightweight, portable runtimes called containers to package up an application with all of its dependencies, including the operating system, native packages, and any other libraries or plugins required to run it.

Containers are pretty similar to virtual machines. The main difference is that Docker allows containers to share the same Linux kernel as the host system that they're running on. Containers only need to provide the packages and dependencies that are not already available on the host system. This greatly reduces the size of an application and provides a significant boost to performance allowing containers to be booted up in mere milliseconds.

Snapshots of containers called images allow applications to be distributed amongst developer laptops, production data centers, or any cloud service provider, running and behaving exactly the same in all environments.

Why should I use Docker?

It makes it incredibly easy for developers to create, manage, and deploy production ready application containers
It allows developers to work in an environment that perfectly mirrors production, reducing the introduction of bugs to live sites
It dramatically speeds up productivity by allowing new developers to quickly spin up a local development environment that behaves exactly the same as the rest of the team's
It encourages the decoupling of application concerns by breaking apart monolithic code bases into smaller and more focused components or services
It provides the flexibility to allow containers that run completely different frameworks, programming languages, or operating systems to seamlessly work and communicate with each other on the same host system

How do I use Docker?

Let's learn by example and integrate Docker with a Rails application!

We'll use a pretty common stack consisting of:

Ubuntu 14.04
Ruby 2.1.2
Rails 4.1.7
Redis 2.8.17
PostgreSQL 9.3.5

Let's get started

Read on or feel free to jump around to the different sections using the links below.

Install Docker

We need a Linux host to run Docker since containers share the host's kernel.

I'm not running a Linux distribution for development so let's spin up a host VM by either:

Booting one up with Vagrant.
Using the lightweight boot2docker Linux distribution

Boot up Ubuntu 14.04 with `Vagrant`

First we'll need to define a Vagrantfile so let's add one to the root of our application. A box called ubuntu/trusty64 running Ubuntu 14.04 already exists so we can just use that.

VAGRANTFILE_API_VERSION = "2"

Vagrant.configure VAGRANTFILE_API_VERSION do |config|
  config.vm.box = "ubuntu/trusty64"
end

Then let's open our terminal and ssh to the new Ubuntu image by calling vagrant up then vagrant ssh.

Install the docker.io package with apt-get

The apt-get and docker commands both require sudo so let's change to the root user since we'll be running the docker command frequently.

The root password is vagrant.

vagrant@ubuntu-trusty-64:~$ su
Password:

Now let's update and install the docker.io package.

root@ubuntu-trusty-64:/vagrant# apt-get update && apt-get install docker.io

After Docker installs we can use docker build to create a new image.

Use the lightweight `boot2docker` Linux distribution

The boot2docker distro was made specifically for running Docker containers. It currently supports both OSX and Windows hosts.

If you're using homebrew you can just brew install boot2docker docker to install both. Then call boot2docker init && boot2docker start.

Build images with a `Dockerfile`

New images are defined in a file called Dockerfile. This file is a simple list of statements that define the commands required to run a container. Let's add a Dockerfile to the root of our application next to our Vagrantfile.

Inherit from a base image with `FROM`

Docker images remind me a lot of classes in object oriented programming.

Just like classes, Docker images are inheritable.

All images start with a base image. Like Vagrant, there are existing official images for Ubuntu so let's use that as our base.

FROM ubuntu:14.04

The FROM statement is the only thing required to build a Dockerfile image. Let's try building it from our terminal.

root@ubuntu-trusty-64:/vagrant# docker build .
Sending build context to Docker daemon  2.56 kB
Sending build context to Docker daemon
Step 0 : FROM ubuntu:14.04
5506de2b643b: Pulling dependent layers
511136ea3c5a: Download complete
d497ad3926c8: Download complete
ccb62158e970: Download complete
e791be0477f2: Download complete
3680052c0f5c: Download complete
22093c35d77b: Download complete
22093c35d77b: Download complete
 ---> 5506de2b643b
Successfully built 5506de2b643b

Success! We just built our first Docker image called 5506de2b643b!

Wait... what's all that noise about pulling and downloading dependent layers?

Fetch images from the Docker Hub with `docker pull`

Like RubyGems for Ruby, Docker has a registry for images called Docker Hub.

This public Docker registry allows developers from all over the world to share open source images that run all kinds of different services and applications!

There are even official images for running popular projects like Ubuntu, Redis, PostgreSQL, Nginx, Node.js, and Jenkins.

When we called docker build just now, Docker noticed that the ubuntu:14.04 image didn't exist on our local system, so it searched the Docker Hub and automatically downloaded the official one for us!

Our application actually requires Redis and PostgreSQL as well so let's try pulling the official images for those from the Docker Hub registry!

root@ubuntu-trusty-64:/vagrant# docker pull redis:2.8.17
Pulling repository redis
3ce54e911389: Download complete
511136ea3c5a: Download complete
...snip...
6a8a6a35a96b: Download complete
28fdd31ac753: Download complete

Wow, that was easy! Now for PostgreSQL!

root@ubuntu-trusty-64:/vagrant# docker pull postgres:9.3.5
Pulling repository postgres
746b819f315e: Download complete
511136ea3c5a: Download complete
...snip...
ec77bb5a53d3: Download complete
165394769d57: Download complete

Let's use docker images to print out a list of all images on our system so far.

root@ubuntu-trusty-64:/vagrant# docker images
REPOSITORY    TAG        IMAGE ID          CREATED        VIRTUAL SIZE
redis         2.8.17     3ce54e911389      3 days ago         110.7 MB
postgres      9.3.5      746b819f315e      4 days ago         212.9 MB
ubuntu        14.04      5506de2b643b      3 weeks ago        197.8 MB

Notice that the ubuntu:14.04 image has the same image ID as the one that we built earlier! This is because our Dockerfile just inherits FROM ubuntu:14.04 and doesn't add any additional behavior! Docker is smart enough to know that it doesn't need to create a whole new image in this case; it can just use ubuntu:14.04.

Let's try starting up a new container with this 5506de2b643b Ubuntu image.

Start a container from an image with `docker run`

Containers can be started by simply specifying an image id.

root@ubuntu-trusty-64:/vagrant# docker run -i -t 5506de2b643b /bin/bash
root@b58db262db27:/#

Whoa, did you see how fast that started up!?

Now we're in our new container with its own prompt root@b58db262db27:/#

Anything we do or change in the container will not affect the host system.

Modifications made to the container's filesystem stay with that container and are never written back to the image, so the next container we start from the image begins with a clean slate. It is an incredibly lightweight runtime that can be stopped and thrown away whenever we're done with it.

When we started the container we specified a couple options:

-i keeps STDIN open even if we're not attached to the container
-t allocates a pseudo-tty

For interactive processes (like a shell) we typically want a tty and persistent standard input (STDIN), so we'll use -i -t together in most interactive cases.

We also specified a couple of arguments:

5506de2b643b is the ID of the image that we want to run
/bin/bash is the command that we want to execute on start up

This container is not very useful since it doesn't actually provide any additional functionality. Let's exit back out to our Vagrant host and add some packages that are required by our application to our Dockerfile.

Install required packages with `apt-get`

We need to install native dependencies for:

Precompiling assets - nodejs
Pulling packages and gems - git
Generating image thumbnails - imagemagick
Connecting to PostgreSQL - postgresql-client and libpq-dev
Parsing XML documents with Nokogiri - libxml2-dev and libxslt-dev
Building and installing Ruby with Rbenv - build-essential, wget, libreadline-dev, libssl-dev, libyaml-dev

We can use apt-get commands to install these packages in our Dockerfile by declaring RUN statements.

FROM ubuntu:14.04

RUN apt-get update && \
    apt-get -y install \
               build-essential \
               git \
               imagemagick \
               postgresql-client \
               nodejs \
               libpq-dev \
               libreadline-dev \
               libssl-dev \
               libyaml-dev \
               libxml2-dev \
               libxslt-dev \
               wget && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

Notice that we run apt-get clean and rm -rf to clean up after ourselves once the packages are installed.

Docker commits changes as layers to an image every time it processes commands like RUN in Dockerfile build steps.

Like a Git repository, if we add files to an image and then delete them in future RUN commands, the image's history still contains the files in previous layers. If we're not careful, we could accidentally add bloat to images with leftover files.
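Here's a toy model of why cleaning up in the same RUN step matters. It's only a sketch: the layer struct, image_size, the sizes, and the big.tar.gz file name are all made up, and this is not how Docker stores layers internally.

```ruby
# Toy model only: each layer records the bytes it adds to the image.
layer = Struct.new(:instruction, :bytes_added)

# An image's size is the sum of everything its layers added.
def image_size(layers)
  layers.sum(&:bytes_added)
end

# Download and delete in separate RUN steps: the download stays in the first layer.
separate = [
  layer.new('RUN wget big.tar.gz', 100),
  layer.new('RUN rm big.tar.gz', 0)
]

# Download and delete in one RUN step: the layer is committed after the cleanup.
combined = [
  layer.new('RUN wget big.tar.gz && rm big.tar.gz', 0)
]

p image_size(separate) # => 100
p image_size(combined) # => 0
```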

Let's try building our Dockerfile with these new changes!

root@ubuntu-trusty-64:/vagrant# docker build .
Sending build context to Docker daemon 10.75 kB
Sending build context to Docker daemon
Step 0 : FROM ubuntu:14.04
 ---> 5506de2b643b
Step 1 : RUN apt-get update && apt-get -y install build-essential git imagemagick postgresql-client nodejs libpq-dev libreadline-dev libssl-dev libyaml-dev libxml2-dev libxslt-dev wget && apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
 ---> Running in d3ea35797f02
Ign http://archive.ubuntu.com trusty InRelease
Ign http://archive.ubuntu.com trusty-updates InRelease
Ign http://archive.ubuntu.com trusty-security InRelease
Ign http://archive.ubuntu.com trusty-proposed InRelease
Get:1 http://archive.ubuntu.com trusty Release.gpg [933 B]
Get:2 http://archive.ubuntu.com trusty-updates Release.gpg [933 B]
Get:3 http://archive.ubuntu.com trusty-security Release.gpg [933 B]
...snip...
Processing triggers for sgml-base (1.26+nmu4ubuntu1) ...
Processing triggers for libgdk-pixbuf2.0-0:amd64 (2.30.7-0ubuntu1) ...
 ---> ea8c31ce96ce
Removing intermediate container d3ea35797f02
Successfully built ea8c31ce96ce

Sweet! We've got a new Docker image called ea8c31ce96ce. Let's try it out!

root@ubuntu-trusty-64:/vagrant# docker run -it ea8c31ce96ce /bin/bash
root@0d67a0fd8284:/# which git
/usr/bin/git
root@0d67a0fd8284:/# which psql
/usr/bin/psql

We've got Git and PostgreSQL! Just to make sure that everything is isolated to this container, let's exit back out to our host and check for those commands.

root@ubuntu-trusty-64:/vagrant# which git
root@ubuntu-trusty-64:/vagrant# which psql

The git and psql commands don't exist on our host system, awesome!

Now that we've got our packages installed, let's try to get Ruby 2.1.2 working!

Install Ruby with `ENV`, `rbenv`, and `ruby-build`

First we'll install rbenv and ruby-build by cloning them under /.rbenv.
Next we'll configure rbenv with ENV and run the ruby-build installation script.
Then we'll install Ruby version 2.1.2.
Finally we'll install bundler since we'll need it to install gem dependencies.

Let's update our Dockerfile to perform these actions.

RUN github="https://github.com/sstephenson" && \
    git clone --depth=1 $github/rbenv.git /.rbenv && \
    git clone --depth=1 $github/ruby-build.git /.rbenv/plugins/ruby-build

ENV PATH /.rbenv/bin:/.rbenv/shims:$PATH
ENV RBENV_ROOT /.rbenv

RUN /.rbenv/plugins/ruby-build/install.sh && \
    echo 'eval "$(rbenv init -)"' >> /.bashrc && \
    echo "gem: --no-rdoc --no-ri" >> /.gemrc

RUN version="2.1.2" && \
    rbenv install $version && \
    rbenv global $version

RUN gem install bundler && \
    rbenv rehash

Notice that we used a new statement called ENV. This simply sets an environment variable. Existing environment variables can be referenced in ENV declarations just like we did with PATH.

Let's build our new image and try it out!

root@ubuntu-trusty-64:/vagrant# docker build .
Sending build context to Docker daemon 11.26 kB
Sending build context to Docker daemon
Step 0 : FROM ubuntu:14.04
 ---> 5506de2b643b
Step 1 : RUN apt-get update && apt-get -y install build-essential git imagemagick postgresql-client nodejs libpq-dev libreadline-dev libssl-dev libyaml-dev libxml2-dev libxslt-dev wget && apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
 ---> Using cache
 ---> ea8c31ce96ce
 ...snip...
 Successfully built 8a8f8ad165b7

Whoa, check it out, it skipped the apt-get update step this time!

Docker is smart enough to know that it doesn't need to re-execute commands like RUN unless the command has changed since the last time it was evaluated.

In this case, our list of packages to install didn't change so Docker just used its cached copy.
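A rough way to picture the build cache is a lookup keyed on the parent image plus the instruction text. The sketch below is a toy model under that assumption; the build method and the hashing scheme are made up for illustration, and the real cache has more rules than this.

```ruby
require 'digest'

# Toy model only: an image id is derived from the parent id plus the
# instruction text, and a cache hit means that id was built before.
def build(instructions, cache)
  parent = ''
  instructions.map do |instruction|
    id = Digest::SHA256.hexdigest(parent + instruction)[0, 12]
    status = cache.key?(id) ? :cached : :executed
    cache[id] = true
    parent = id
    [instruction, status]
  end
end

cache = {}
first = build(['FROM ubuntu:14.04', 'RUN apt-get update',
               'RUN gem install bundler', 'WORKDIR /app'], cache)
second = build(['FROM ubuntu:14.04', 'RUN apt-get update',
                'RUN gem install rails', 'WORKDIR /app'], cache)

p first.map(&:last)  # => [:executed, :executed, :executed, :executed]
p second.map(&:last) # => [:cached, :cached, :executed, :executed]
```

Notice that WORKDIR /app runs again in the second build even though its text didn't change. Its parent changed, so everything after the first modified step gets rebuilt.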

Let's try out the updated 8a8f8ad165b7 image by starting up an irb console!

root@ubuntu-trusty-64:/vagrant# docker run -it 8a8f8ad165b7 irb
irb(main):001:0> RUBY_VERSION
=> "2.1.2"

Awesome! Now we've got Ruby 2.1.2 and all of the other system level dependencies that are required to run our application!

Now, let's figure out how we can add our Rails code into the container so we can actually run our application.

Mount volumes in containers with `docker run -v`

Docker allows containers to mount directories on the host system so that we can share files with it. This allows us to share our Rails application in the current directory with a container!

First we'll need to update our Dockerfile to support a VOLUME for our Rails app.

FROM ubuntu:14.04

...snip...

RUN version="2.1.2" && \
    rbenv install $version && \
    rbenv global $version

VOLUME /app

This simple VOLUME command lets Docker know that our image expects /app to be a shared directory that lives somewhere on the host system. Let's try it out by building a new image from these changes.

root@ubuntu-trusty-64:/vagrant# docker build .
...snip...
Step 6 : RUN version="2.1.2" && rbenv install 2.1.2 && rbenv global 2.1.2
 ---> Using cache
 ---> 8a8f8ad165b7
Step 7 : VOLUME /app
 ---> Running in a999838c4841
 ---> e4d5a879c353
Removing intermediate container a999838c4841
Successfully built 59778b8a0ce2

Now let's try mounting our Rails app from the current directory at /app.

root@ubuntu-trusty-64:/vagrant# docker run -it -v $PWD:/app 59778b8a0ce2 /bin/bash
root@64c5e7b2ba49:/# ls -la /app
drwxr-xr-x  15 1000  1000    510 Nov 14 13:33 .git
-rw-r--r--   1 1000  1000    706 Nov 11 11:19 .gitignore
-rw-r--r--   1 1000  1000      6 Nov  5 15:56 .ruby-version
-rw-r--r--   1 1000  1000   1015 Nov 12 01:31 Dockerfile
-rw-r--r--   1 1000  1000   1338 Nov 11 11:18 Gemfile
-rw-r--r--   1 1000  1000   8556 Nov 11 11:30 Gemfile.lock
-rw-r--r--   1 1000  1000    249 Nov  5 15:56 Rakefile
drwxr-xr-x  20 1000  1000    680 Nov  5 15:56 app
drwxr-xr-x  76 1000  1000   2584 Nov 12 01:01 bin
drwxr-xr-x  25 1000  1000    850 Nov 10 22:33 config
-rw-r--r--   1 1000  1000    154 Nov  5 15:56 config.ru
drwxr-xr-x   5 1000  1000    170 Nov 10 13:24 db
drwxr-xr-x  26 1000  1000    884 Nov 10 13:46 lib
drwxr-xr-x   3 1000  1000    102 Nov 11 11:59 log
drwxr-xr-x  10 1000  1000    340 Nov 10 23:49 public
drwxr-xr-x  18 1000  1000    612 Nov  7 14:25 spec
drwxr-xr-x   3 1000  1000    102 Nov 11 11:59 vendor

Sweet, now we've got our Rails code mounted under the /app directory!

Any changes that a container makes to mounted volumes will be shared and reflected on the host system as well.

Now let's try installing gem dependencies with bundle install and writing them to the vendor/bundle directory.

We'll need to configure Nokogiri to use the system libraries for libxml2 first.

root@ubuntu-trusty-64:/vagrant# docker run -it -v $PWD:/app 59778b8a0ce2 /bin/bash
root@f44fde88a15d:/# cd /app
root@f44fde88a15d:/app# bundle config build.nokogiri --use-system-libraries --with-xml2-include=/usr/include/libxml2
root@f44fde88a15d:/app# bundle install --path vendor/bundle
Don't run Bundler as root. Bundler can ask for sudo if it is needed, and installing your bundle as root will break this application for all non-root users on this machine.
Fetching gem metadata from https://rubygems.org/...........
Resolving dependencies...
Installing rake 10.3.2
Installing i18n 0.6.11
...snip...
Installing turbolinks 2.5.2
Installing uglifier 2.5.3
Your bundle is complete!
It was installed into ./vendor/bundle

Great, all of the gem dependencies installed successfully! Let's check our host system to make sure that vendor/bundle was created and actually has contents.

root@f44fde88a15d:/app# exit
root@ubuntu-trusty-64:/vagrant# ls -la vendor/bundle
total 0
drwxr-xr-x 1 vagrant vagrant 102 Nov 16 06:23 .
drwxr-xr-x 1 vagrant vagrant 102 Nov 16 06:23 ..
drwxr-xr-x 1 vagrant vagrant 102 Nov 16 06:23 ruby

Looks like bundle installed the gems to vendor/bundle as expected!

To verify, let's start up a new container and try to bundle install again.

root@ubuntu-trusty-64:/vagrant# docker run -it -v $PWD:/app 59778b8a0ce2 /bin/bash
root@837a7a749b19:/# cd /app
root@837a7a749b19:/app# bundle install
Don't run Bundler as root. Bundler can ask for sudo if it is needed, and installing your bundle as root will break this application for all non-root users on this machine.
Using rake 10.3.2
Using i18n 0.6.11
...snip...
Using turbolinks 2.5.2
Using uglifier 2.5.3
Your bundle is complete!
It was installed into ./vendor/bundle

Success! Bundler skips Installing all of our gems and informs us that it's Using the existing versions from vendor/bundle instead!

Notice that we didn't have to configure Nokogiri again! Since the bundle config command saves settings to .bundle/config (which is mounted on our host system) this configuration can persist and be shared between containers.

Since we keep calling cd /app when we first run the container, let's make that the default working directory.

Set the working directory with `WORKDIR`

The Dockerfile WORKDIR directive simply sets the working directory to the specified value for all commands that run after it. Let's try adding it to our Dockerfile.

WORKDIR /app

Now we can rebuild it and try it out!

root@ubuntu-trusty-64:/vagrant# docker build .
...snip...
Successfully built 9075d0fd41f5
root@ubuntu-trusty-64:/vagrant# docker run -it -v $PWD:/app 9075d0fd41f5 /bin/bash
root@130321bd404c:/app# pwd
/app

Sweet, now we don't have to cd /app all the time!

It's a little annoying having to copy paste the image ids like 9075d0fd41f5 when we want to run a container. There's got to be an easier way!

Tag images with `docker tag` or `docker build -t`

Since we've already built the 9075d0fd41f5 image for our Rails application, let's try tagging it as example-app so we can refer to it by that instead.

root@ubuntu-trusty-64:/vagrant# docker tag 9075d0fd41f5 example-app
root@ubuntu-trusty-64:/vagrant# docker images
REPOSITORY      TAG         IMAGE ID         CREATED          VIRTUAL SIZE
example-app     latest      9075d0fd41f5     26 minutes ago   554.7 MB
<none>          <none>      e4d5a879c353     2 hours ago      550.7 MB
<none>          <none>      4ecd2ce76f10     46 hours ago     550.7 MB
...snip...
<none>          <none>      73ab02dbc5df     46 hours ago     437.1 MB
<none>          <none>      047c19d8acb0     2 days ago       397.9 MB
redis           2.8.17      3ce54e911389     5 days ago       110.7 MB

Sweet, Docker will now allow us to refer to our image as example-app. Let's give it a shot!

root@ubuntu-trusty-64:/vagrant# docker run -it example-app /bin/bash
root@fa627a840e57:/app#

Great, it worked as expected! We can also tag an image when it's built by specifying the -t flag.

root@ubuntu-trusty-64:/vagrant# docker build -t example-app .

That's pretty convenient! We can run that command whenever we update our Dockerfile and the example-app image will always stay up to date!

Did you notice all of those images from the docker images output earlier? Those are left over from each of our previous docker build calls. Since we don't really need them anymore, let's try removing them.

Remove images with `docker rmi`

If your docker images list starts to fill up, you can remove them by their name or IMAGE ID.

root@ubuntu-trusty-64:/vagrant# docker rmi 4ecd2ce76f10
Deleted: 4ecd2ce76f10611d6f0f6a31653f9245414d198c92bbc265886bb1c79152a06c

We've got a bunch of images to delete but it's pretty tedious and annoying to have to list out all of those image ids. Here are a couple of commands to make removing multiple images easier.

To delete all untagged images

docker rmi $(docker images | grep "^<none>" | awk '{print $3}')

To delete all images

docker rmi $(docker images -q)
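If the grep and awk pipeline for untagged images looks cryptic, here's the same filtering idea sketched in Ruby. The output text is sample data shaped like docker images output (untagged images list their repository and tag as <none>); in practice it would come from running the command.

```ruby
# Sample text shaped like `docker images` output.
output = <<~TEXT
  REPOSITORY      TAG         IMAGE ID         CREATED          VIRTUAL SIZE
  example-app     latest      9075d0fd41f5     26 minutes ago   554.7 MB
  <none>          <none>      e4d5a879c353     2 hours ago      550.7 MB
  <none>          <none>      4ecd2ce76f10     46 hours ago     550.7 MB
  redis           2.8.17      3ce54e911389     5 days ago       110.7 MB
TEXT

# Same idea as grep "^<none>" | awk '{print $3}': keep the rows whose
# repository is <none> and take the third whitespace-separated column.
untagged_ids = output.lines
                     .select { |line| line.start_with?('<none>') }
                     .map { |line| line.split[2] }

p untagged_ids # => ["e4d5a879c353", "4ecd2ce76f10"]
```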

Now that we've got our Docker images ready, let's figure out how to allow running containers to communicate with each other.

Run containers in the background with `docker run -d`

Let's try booting up our Redis image to see if it works.

root@ubuntu-trusty-64:/vagrant# docker run -t redis:2.8.17
[1] 17 Nov 08:32:30.476 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
                _._
           _.-``__ ''-._
      _.-``    `.  `_.  ''-._           Redis 2.8.17 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._
 (    '      ,       .-`  | `,    )     Running in stand alone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 1
  `-._    `-._  `-./  _.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |           http://redis.io
  `-._    `-._`-.__.-'_.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |
  `-._    `-._`-.__.-'_.-'    _.-'
      `-._    `-.__.-'    _.-'
          `-._        _.-'
              `-.__.-'

[1] 17 Nov 08:32:30.485 # Server started, Redis version 2.8.17
[1] 17 Nov 08:32:30.487 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
[1] 17 Nov 08:32:30.487 * The server is now ready to accept connections on port 6379

Uh oh, this process never ends so our shell is stuck! We'll have to run this container in the background using the -d flag. Let's hit ctrl+c to get our prompt back on the host system.

root@ubuntu-trusty-64:/vagrant# docker run -d redis:2.8.17
9e3b945adf4e1e7fc18c8429a2f8696a7bec0b387d088279b2fda8fff306182a

Docker allows us to daemonize a container process with -d so it runs in the background.

The docker run command printed out the id of the container that it created. Let's try starting a background container running PostgreSQL as well.

root@ubuntu-trusty-64:/vagrant# docker run -d postgres:9.3.5
8170b29887b793dce20fa63a1b0424e1edbf4eafea9ec294192c194915c6a213

That was pretty easy! Let's figure out a way to list out the names and ids of the currently running containers just to make sure.

List running containers with `docker ps`

Simply calling docker ps will list out all currently running containers.

root@ubuntu-trusty-64:/vagrant# docker ps
CONTAINER ID        IMAGE               COMMAND                CREATED              STATUS              PORTS               NAMES
8170b29887b7        postgres:9.3.5      /docker-entrypoint.s   About a minute ago   Up About a minute   5432/tcp            dreamy_lalande
9e3b945adf4e        redis:2.8.17        /entrypoint.sh redis   6 minutes ago        Up 5 minutes        6379/tcp            distracted_leakey
e83564543aed        redis:2.8.17        /entrypoint.sh redis   8 minutes ago        Up 8 minutes        6379/tcp            naughty_wozniak

Wait... why do we have two Redis containers running? One of them was from earlier when we didn't start the container in the background with -d! We detached from the process but we didn't actually stop the container.

Stop containers with `docker stop`

The docker stop command accepts the id of a container and stops it. Pretty straightforward!

Let's try stopping the old Redis container.

root@ubuntu-trusty-64:/vagrant# docker stop e83564543aed
e83564543aed

Looks like it worked! Let's run docker ps again just to make sure.

root@ubuntu-trusty-64:/vagrant# docker ps
CONTAINER ID        IMAGE               COMMAND                CREATED             STATUS              PORTS               NAMES
8170b29887b7        postgres:9.3.5      /docker-entrypoint.s   11 minutes ago      Up 11 minutes       5432/tcp            dreamy_lalande
9e3b945adf4e        redis:2.8.17        /entrypoint.sh redis   16 minutes ago      Up 16 minutes       6379/tcp            distracted_leakey

The old Redis container is no longer running! Now let's try connecting our running background services to our Rails application.

Link containers with `docker run --link`

TODO: Demonstrate naming and linking containers for Redis and PostgreSQL.

root@ubuntu-trusty-64:/vagrant# docker run -v $PWD:/app --link 8170b29887b7:db --link 9e3b945adf4e:redis example-app bundle exec rake db:create:all

Name containers with `docker run --name`

TODO: Demonstrate naming the Redis and PostgreSQL containers.

root@ubuntu-trusty-64:/vagrant# docker run -d --name redis redis:2.8.17
root@ubuntu-trusty-64:/vagrant# docker run -d --name db postgres:9.3.5

Set environment variables with `docker run -e`

TODO: Demonstrate setting DB_ADAPTER, DB_USER, DB_PASS, and REDIS_URL.

root@ubuntu-trusty-64:/vagrant# docker run -v $PWD:/app --link db:db --link redis:redis -e "DB_ADAPTER=postgis" -e "DB_USER=docker" -e "DB_PASS=docker" -e "REDIS_URL=redis://redis" example-app bundle exec rake db:create:all

List all containers with `docker ps -a`

TODO: Demonstrate listing all containers, including stopped, with docker ps -a.

Remove containers with `docker rm` and `docker run --rm`

TODO: Demonstrate removing stopped containers with docker rm. Demonstrate shortcut for removing multiple containers at once. Auto remove containers after its process ends with docker run --rm.

Start up multiple containers with `Fig`

TODO: Demonstrate spinning up a full stack with Rails, Redis, and PostgreSQL.

Run Docker within Docker by mounting `/var/lib/docker`

TODO: Demonstrate managing Docker images and containers from within a running container.

TODO

compare Dockerfile definitions to classes in programming
- they're inheritable
- they're only supposed to "do" one thing, in this case, perform a single task
- dependencies are passed into containers when initializing
- fig is like a dependency injection framework for provisioning docker containers
explain how Docker containers are super lightweight
- they only run one process
- they only contain the dependencies required to run that single process
- they are supposed to be "stateless", services and data volumes run in their own containers
explain how to run containers
- naming containers with --name
- creating volumes with -v example
- mounting volumes with --volumes-from
explain data-only volumes and how to mount them
explain how to commit running containers
explain how to push images to the Docker Hub
explain how private Docker image registries work
explain how the build cache works for the ADD, COPY, and RUN instructions
explain how we can create a package of our application
explain how to import and export images
- compressing tar archives with gzip
- exporting to and from S3

