Tuesday, January 19, 2010

Nested CTEs

Had my first need for nested CTEs today - for a table with prices by object and another with adjustment factors by object, I needed to multiply the prices by all factors occurring after the price date. This could be done in the C++ code I was working on via an ugly for loop for each price date, but I knew there had to be a better, set-based way within SQL.

The answer was nested Common Table Expressions (CTEs), which Microsoft officially says are not supported, but which can actually be implemented by defining multiple CTEs (with a comma between each) and using the first one in the second.

For this application, the first CTE orders the object ids and adjustment factors and adds row numbers. An ORDER BY would have sufficed to order everything, but in order to avoid inefficient subqueries in the next CTE, I needed consistent row numbering. The second CTE aggregates the adjustment factors for each date by recursing down the ordered factors for each object and multiplying them as it goes.

This solution condenses all the selection logic into one set of SQL statements, is compact, fairly readable, and has a better complexity profile than the for loop alternative.

(Table and column names changed to protect confidentiality.)

WITH AdjFactorsOrdered (ObjectId, AdjDate, AdjFactor, ord) AS
(SELECT ObjectId, AdjDate, AdjFactor, ROW_NUMBER() OVER (PARTITION BY ObjectId ORDER BY AdjDate desc)
FROM vwAdjFactors
WHERE AdjDate > '2009-01-01'
AND AdjDate <= '2010-01-01'
AND ISNULL(AdjFactor, 1) != 1
AND SubTypeId != 1
AdjFactorsCombined (ObjectId, AdjDate, AdjFactor, ord) AS
(SELECT ObjectId, AdjDate, AdjFactor, ord
FROM AdjFactorsOrdered
WHERE ord = 1
SELECT mul.ObjectId, mul.AdjDate, base.AdjFactor * mul.AdjFactor as AdjFactor, mul.ord
FROM AdjFactorsCombined base
INNER JOIN AdjFactorsOrdered mul
ON mul.ord = base.ord + 1
AND mul.ObjectId = base.ObjectId
SELECT ObjectId, AdjDate, AdjFactor
FROM AdjFactorsCombined
ORDER BY ObjectId, AdjDate

Thursday, January 14, 2010

Daily WTF: SSIS edition

With apologies to Alex P of the real Daily WTF, here's my latest legacy code annoyance:

We have a variety of data imports that employ multiple SSIS packages in order to break the import logic into manageable, consistent stages. So far, so good. Except that a few particularly intrepid former team members of mine preferred to construct parent packages that called child packages, complete with variable passing to maintain state awareness, directory location consistency, dynamic environment configuration, etc. instead of just calling them from the job management system in successive tasks.

Oh, and none of these packages are in source control, they're just scattered about the file system like leaves in an ill-kept yard.

All of this leads me to my current exercise in SSIS surgery - I needed to change the behavior of one of these packages, so I decided to update it to our modern standards of SSIS behavior and move it to a new location. This one has a final step that calls the next package, a process I wasn't quite prepared to rework, so I wanted to simply update the directory in which it expects to find the next package. "Hopefully it does it on the fly," I thought, "since all the related packages are kept in the same directory and it could just pull the working directory at runtime..."

No such luck.

Instead, the package uses a variable to hold the path of the next package, which is configured via an expression that pulls from another variable, which pulls from another variable, which pulls from a package configuration, which pulls from a registry string. Which I don't have on my machine.

Got that?

Registry Key -> Package Configuration -> Variable A -> Variable B -> Variable C -> Script Task

I've got news for you, former developer: it's called IO.Directory.GetCurrentDirectory().