The Case for Left Joins

March 6, 2010

When working with educational data, the student is going to be the center of nearly every database query.  From there we might attach test records, classes or involvement in programs.  Often times programmers accomplish this with what is known as a “full inner join”.  A method by which relevant records from both database tables are only shown when there is a definite match.  This is the most basic of SQL calls and usually the first one learned.  It goes a little something like this:

select * from students,
assessments as a3,
assessments as a4
where
student.id = a3.studentid
and student.id = a4.studentid
and a3.grade = 3 and a4.grade = 4

The above database query would return all of the students and assessment scores for students who have 3rd and 4th grade assessments.  After the “where” reserved word I’ve specified how to relate the table to the student and which grade I’m expecting.  A major downfall of this query is that it won’t  show me students who don’t have both assessments.    If a student has a 3rd grade test but no 4th grade test he just won’t be in the results. Nor will students with no tests.  Most often educators and teachers are going to want to see their entire class list with holes where students haven’t taken assessments.  For this kind of information a different type of query is required, known as a “left outer join”:

select * from students
left join assessments as a3
on (students.id = a3.studentid and a3.grade = 3)
left join assessments as a4
on (students.id = a4.studentid and a4.grade = 4)

A query in this format will pull all of the students and then fill in the assessment data where available, leaving blanks where students have no data.  This is a much more useful and intuitive way to display student information.  Students don’t appear to be “missing” from class rosters, and staff can quickly locate scores or even identify students who didn’t receive the assessment.

Clarity and Modularity of Code

I feel that code making use of left joined tables is easier to read.  Instead of parsing through the where clause to find the ways in which tables are joined together and limited, I can quickly find the same information in the on clause immediately following the introduction of the table.

Because the pieces are grouped together so well, it becomes easier to write a program that will build the SQL statement.  I can start with the student table and then just ask if any other modules have left join information to include.  An assessment module might provide the left join statements for including 3rd and 4th grade tests.  A Student Groups module might include the left join that specifies whether or not the student is ELL.  Thanks to the format, these can all happen independently of each other.

Flexibility

The left join also allows us some additional flexibility with regards to filtering the results. By appending some simple where clauses to the example SQL statement above we can find out some quick and useful information.

where a3.id is null

will show us students who don’t have a 3rd grade score but any 4th grade scores they may have.

where a3.id is not null

will show us only those students who have 3rd grade scores.

Of course, we could mix and match a3 is null and a4 is not null and so on to find all kinds of groups including

where a3.id is not null and a4.id is not null

which simulates the original non-left joined query above.

This level of flexibility is one of the main reasons I prefer to use left joins.  I can quickly provide a list of students who were enrolled in a given class and who have not yet taken an assessment so that they can be brought in to test.

Throughout the OSIMS system I’ll be making near exclusive use of the left outer join format.

Leave a Reply