[Courses] [Postgresql] Week 4, here we go: More SQL

Wed Dec 12 21:27:31 UTC 2007

Sorry I'm a bit late again. Anyway, here's this weeks lesson and assignments

==== Week 4 ====

This week we'll talk a bit more about managing databases and tables,
talk about normalization of databases, and getting data in and out of a
database.

**Databases and Tables**

So, you know how to create databases and tables. Let's delve into both
of those in more detail this week.

First, a bit about PostgreSQL's commands. A lot of commands that you
might want to do inside psql, you can also do outside. PG has a createdb
command:
    $ createdb [options] database_name [description]
is equivalent to
    => CREATE DATABASE database_name with ... ;

See: http://www.postgresql.org/docs/8.2/static/manage-ag-createdb.html

Options include things like template. Templates are one of the most
useful things in Postgresql. There are two databases that exist no
matter what, when you initialize pg. 'template0' and 'template1'.
template0 is a database that can never be modified. template1 is a copy
of template0, and is used as the default template when creating a new
database. It's basically empty. However, you can modify template1, and
you can also create databases using a template of any other database.
Thus cloning a database (full of data) is trivially easy.

Although one caviat is that no-one can be connected to a database in
order for it to be used as a template. So you have to basically take the
database offline in order to do this. If you can't, there are other
methods of cloning a db, just more steps.

To destroy a database, use:
    $ dropdb [options] database_name or => DROP DATABASE database_name

Be very careful when using this command - because everything is
**//gone//** ... forever!

To destroy tables, use DROP TABLE table_name; inside of psql. Again, be
careful!

**Normalization of Databases**

This relates specifically to relational databases. What are relational
databases? Well, once upon a time, databases were all just individual
"flat" tables. In order, for instance, to put information about
individual items someone bought, you'd have to put all of that info into
one big table. Now, we have tables that can relate to one another, so
that data can be logically arranged.

But when you've got this kind of database, it's really important that
you follow some rules. And those rules are called database
normalization. You don't **have** to follow these rules, but nine times
out of ten, if you don't, you'll be very, very sorry.

Each rule is called a "normal form" (don't ask me, I have no idea how
they came up with this.)

The "first normal form" is just that the table does not allow null rows,
or duplicate rows. This is generally taken care of by including a
primary key. This field (generally one) will contain a unique key
(usually an integer), will not allow null values in this field, and will
(generally) be automatically generated.

Here's one way to do this in postrgres:

    => CREATE TABLE people (
    -> people_id serial NOT NULL PRIMARY KEY,
    ...

This creates a field with serial data type (autonumbered integer), that
does not allow null values, and is a primary key.

You can then have, for instance, a second table, which relates people to
things:

    => CREATE TABLE things (
    -> thing_id serial NOT NULL PRIMARY KEY,
    -> FOREIGN KEY people_id REFERENCES people,
    ...
Check out http://www.postgresql.org/docs/8.2/static/sql-createtable.html
for details on this. I don't want to go too deeply here - I don't want
to lose people.

For the rest of the normal forms, check out the wikipedia article:
http://en.wikipedia.org/wiki/Database_normalization
It's almost impossible to translate that into english - look carefully
at the examples, they will make sense.

Basically, the purpose of normalization is to make it as easy as
possible to find data when you need to, as simply as possible. And to
make reduce possibilities of error on data entry or updates.

OK, so recreate testtable1, except use the following as the first line:
     test=> create table testtable2 (
     test(> userid serial NOT NULL PRIMARY KEY,
     test(> firstname varchar(20),
     ....
     test(> code char(2) );

     NOTICE:  CREATE TABLE will create implicit sequence
"testtable2_userid_seq" for serial column "testtable2.userid"
     NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
"testtable2_pkey" for table "testtable2"
     CREATE TABLE

Aha! What happened? First off, serial is not really a data type (I told
you.) It actually is a super sekret code for "create this thing called a
sequence". Adding the primary key made an index.

**Getting data into your tables**

OK, so you've created at least one table. Let's get some data into it.

    => INSERT INTO testtable1 (firstname,lastname,... , code) VALUES
('michelle','murrain',...,'qw');

Now, if you are lazy, you can just do this:

    => INSERT INTO testable1 VALUES ('michelle','murrain',...,'qw');

But actually, that's not really being lazy, because you have to make
sure that you include the fields **in the right order**. Also, if you
want to insert partial data, using the first command is better - just
specify the fields you want to use.

So, how to get data out?

    => SELECT ALL FROM testtable1;

That will give you all of the fields, and all of the rows of a table.
That's useful if there aren't many rows, or aren't many fields. That's
not likely to be the case with working databases.

    => SELECT userid,lastname,city FROM testtable1;

That will return only those three fields, but will return all of the rows.

    => SELECT userid,lastname,address FROM testtable1 WHERE city='Atlanta';

Much more useful, yes? Check out
http://www.postgresql.org/docs/8.2/static/sql-select.html for the full
skinny on SELECT. It's rich!

Assignments:

   - What's a sequence?
   - What's an index?
   - Insert about 20 records into your database
   - Try out some different kinds of selects. Can you sort by city? By
name?
   - OK, time to work on your project. Draft a schema, and share it with
the group! How many tables are there? How are they related?

Extra credit assignments
   - You want to know how much people in London spent. How'd you do that?
   - You don't want to see anyone in your results from Zurich. How to do
that?
   - Make sure your project tables are normalized

-- 
Michelle Murrain
Coordinator, Nonprofit Open Source Initiative
http://nosi.net

Skype: pearlbear
AIM: pearlbear0