Pulling the Best Assessment Record

March 25, 2010

During the course of a school year, it may be necessary to assess students multiple times.  This information can be very useful for monitoring the progress of students, but it also causes problems when you simply want to know the best scores that a group of students have posted.  Getting the best record for an assessment often means a two part process of identifying the best records and then pulling those records together for display. In many information systems there are whole routines set up which go through the data student by student every night to identify and flag the best score for each student.

Maybe it’s just me, but this seems like a really inefficient method of finding the information.  What if you want to enter scores and get a report today, rather than waiting for a process to run?  What if you want to know the best score between October and December instead of the best score for the entire year which the system flagged?  I’m exploring a method to give schools realtime, flexible access to to best records.

Why you can’t just use MAX

Assessments records are really a collection of scores also called traits.  A student may score 100 overall, 130 in estimation and 95 in measurement all on the same test.  The next time he may get 110, 100, and 105 respectively.  If I use the SQL max function on each trait I would get a mix of the two test records that would look something like 110,130,105.  While this is the best he performed in each trait, it’s not indicative of any one attempt, they both got jumbled up together.

A better method would be to use max on the overall trait, somehow identify the record and them come back and pull all the traits for that record.  Even that has it’s issues within SQL and requires at least two queries against the database.

More Left Joins

I’ve spoken before about my love of left joins for querying student data.  When dealing with a class list, it simply makes sense to have all the students represented once, whether they have data or not.

What I’ve found is that with a more creative use of left joins I can readily get the best record in a number of situations.  I’ll start with the students, left join assessment records, and then left join the assessments again to force the best record to show.

In pseudo code it might look like this:

select stu.*, a1.*
from students stu
left join assessments a1 on ( ... define the assessments I'm looking for ...)
left join assessments a2 on ( ... exact same as the definition of a1 above ... and a2.overall>a1.overall)
where a2.id is null

So what have I done here?  I’ve told the SQL engine that I want two assessment records and that a2 should be “better” than a1 in the overall trait.  Then I’ve used a bit of reverse logic to say that I only want records where a2 doesn’t exist.  That would effectively mean cases in which there is no record better than a1.

A little Deeper

There is a danger that the two assessments may have identical overall scores, which would cause the system to return two records for any given student.  To address this, I actually append the record ID number to the overall score before comparing the records.  Since the IDs are unique by definition, I’ m guaranteed that one record will be “higher” than the other.

In practice the on statement code that does the c0mparision actually looks like this:

... and ( concat(lpad(a2.overall,10,0),lpad(a2.id,10,0)) > concat(lpad(a1.overall,10,0),lpad(a1.id,10,0)) )

The padding makes certain that the two records are compared on even grounds.  In the event of a tie in overall score the record with the higher ID number wins, which tends to favor more recent records, but that should be acceptable in most cases.

Inherent Flexibility

Unlike many existing systems that flag records ahead of time, this left join approach allows me to change at any time the range of records I’m examining.  For example, I might start out looking for the best Math tests regardless of grade level, then limit them to 5th grade scores, then go even further and ask for the best scores on 5th grade math tests given between October 5th and November 5th in Aardvark Elementary.  There’s no recalculating and reflagging required because this single SQL call can handle any of these requests by simply changing the ON clauses related to the assessment records.

Other Uses

I’m sure there are dozens of uses for this type of query.  I’ve found it useful for pulling the best record on any number of criterion, including the performance level, any of the traits, and even the “Most Recent” record within any time frame.

Leave a Reply