NYC DBA: 2010

Friday, December 17, 2010

Back in Laptopistan

At Port City Java in Capitol Hill today, just up the street from Peregrine, where I was last week, and there's a fair concentration of laptop users, but certainly not the packed house that there was at Peregrine. I'm guessing that's correlated to Port City's 3 stars on Yelp vs Peregrine's 4. But the WiFi is fast and there are some comfy chairs. Judgement on coffee still forthcoming.

One of the difficulties of working remotely in a shared environment is phone calls. I had a meeting today which is often cancelled, so I was hoping that it would be again today. But it wasn't, so I had to fire up Skype (which, thankfully, worked fine here), dial up the conference line, and participate in the call, wincing every time I had to speak from fear of offending the people around me. Fortunately, there was only one other person besides my girlfriend sitting nearby, so I wasn't as concerned as I would have been, but in a tighter space like Peregrine, it would be impossible to take a phone call without bothering people. The alternative would be to go outside or sit in one's car, both of which I've done in the past, but it's below freezing outside and my car is more than a block away, making neither of those options particularly palatable.

I suppose that in a true co-working space, phone calls would be expected and unavoidable, so both my concern and others' irritation would be lower. But I have yet to identify a suitable co-working facility in DC, and it's nice to get to change scenery each time I go to a new cafe, so for now I'll just live with the minor difficulties and keep hunting for the best WiFi and coffee.

Thursday, December 16, 2010

More notes on Resolver One

Still generally liking Resolver One, but there are some limitations that are bothering me and associated requests I'd like to make.

1) Sorting!

How is there no sorting in place yet? They implemented CHITEST before sorting-in-place!?

2) Assignment

Some more "magic" assignment methods would be welcome. Something like

myCellRange = myListOfLists # aka 2D array

myRow = myList

That seems like a straightforward thing to implement, but instead I have to iterate over both and assign each value manually.

3) Row and Column Numbers

It would be very helpful if the code editor had a display with row and column numbers at the bottom.

Friday, December 10, 2010

Notes from Laptopistan

Credit to David Sax of the NYT for the name. Trying the working mode advocated in the article and occupying a spot in a cafe once a week or so. Clearly it requires some technique, however...

This afternoon, I worked out of Peregrine Espresso in Capitol Hill. It's a nice place, with good tea and coffee, so I guess I shouldn't have been surprised that it would be full of laptoppers. Every seat was full when I walked in. Fortunately, one person was packing up, so I got the table he vacated, but then gave it up to some people who were trying to have a conversation in favor of a seat at a bar. But after about an hour my back ached, and I didn't like fighting for elbow space with the people on either side. Plus, it's distracting to have two bright laptop screens on either side of you - I kept glancing over at each without meaning to, and alternated between feeling superior and jealous when noting that they were not doing any actual work.

Additional gear for the cafe survival kit for next time: headphones and an extension cord.

Thursday, December 2, 2010

Resolver One CellRange oddities

I've started working on a project in which we're using Resolver One as a dashboard. The workbook consumes multiple data sources (services, DB, YAML files, etc.), collates and formats, and may eventually do other cool stuff like kick off simulations or generate charts. But we're still in the early stages, so I'm just fleshing out the configuration routines and table displays.

What I've found so far is that Resolver One is a cool package, but seems to have some weak points. The documentation is okay, but the layout seems less than intuitive, and the volume of documentation is somewhat lacking in comparison to The Enterprise Spreadsheet app. That's to be expected, so it doesn't bother me much.

The thing that does bother me is an apparent code deficiency in combination with my inability to log into their support forums to ask questions. I assume the forum issue is temporary and will be resolved (ha) shortly, but the code issue seems like a bug.

Basically, my problem is that CellRanges don't act Pythonically, or even correctly, it would seem. For one, you can't do this:

cellrange.HeaderRow = worksheet.HeaderRow

Why not? That seems like the most logical (and Pythonic) operation you could perform. I actually couldn't set the HeaderRow through any means, no matter what sort of value I passed in.

I had a second issue, but I just determined that it was user error and fixed it. Will continue this post if I come across something else...

Monday, October 18, 2010

Connection Manager Attributes Changed error in SSIS can be resolved by cleaning up your query

Just resolved a funny issue that was making me crazy for about 15 minutes. In the "Use results of an SQL query" window in the SQL Query a Lookup task in SSIS 2005, I pasted in a query I had written in SQL Management Studio 2008 R2. The formatting looked a little funny, with the little squares in place of some line breaks and tabs all over the place, but I ignored it and clicked through to the Columns tab. At the bottom of the tab, where errors and warnings appear, I got this:

Connection Manager attributes have changed. The mappings of previous columns are no longer valid.

The message stayed there even after I changed the OLE DB connection manager, added and removed columns to the lookup, and clicked every available button in the Lookup Transformation Editor. And it prevented me from clicking OK to finalize and save the transformation. I decided that I should look through the code for any bad syntax or other anomalies, but it was difficult with the screwy formatting, so I "washed" it by cutting and pasting it into Notepad++ and then back into SSIS.

And guess what? Not only did the code look better, the warning message went away. So if you're seeing that warning, try cleaning up the formatting of your SQL query.

Tuesday, October 12, 2010

Python scratchpad entry

Simplest syntax I've found for converting dates to datetimes, given a date d:

datetime.combine(d, time())

I've got a couple other blog entries in the pipeline, but I've had blog apathy lately and haven't finished any of them. Hopefully soon.

Wednesday, March 10, 2010

FILLFACTOR is really not that complicated

I keep coming across code that explicitly sets the FILLFACTOR option on an index for no good reason. FILLFACTOR is not complicated - it leaves some percentage of index pages empty in order to prevent page splits and resulting fragmentation and low page density during inserts into the middle of a key range. It can be a useful option iff:

- the table will actually get rows inserted into it!

- there is no monotonically increasing index key (like a primary key identity column)

- the author is intimately familiar with the usage pattern of the table and knows how to balance the additional I/O required to read the extra pages with the I/O saved by preventing page splits

Otherwise you're just taking an automatic IO performance hit of n% (where n is 100-(FILLFACTOR)) for no good reason.

Particularly ridiculous is the code that I've found which does something like this:

CREATE TABLE #tempstuff (foo int, bar varchar(20))

INSERT INTO #tempstuff SELECT foo, bar FROM dbo.sometable

CREATE INDEX ix_reallyNotUseful ON #tempstuff (foo, bar) WITH FILLFACTOR = 90

First off, make the index clustered. If this table will not see any more inserts, there's no reason to leave it as a heap and then have to create surrogate keys in order to build the non-clustered index. Might as well physically order it by the index keys and gain the additional performance of the clustered index.

Second, the table is not getting any more inserts after this. Why would you care about leaving extra space on the pages? Set the FILLFACTOR to 100!

This has been a public service rant from your irritable neighbor database engineer.

Tuesday, January 19, 2010

Nested CTEs

Had my first need for nested CTEs today - for a table with prices by object and another with adjustment factors by object, I needed to multiply the prices by all factors occurring after the price date. This could be done in the C++ code I was working on via an ugly for loop for each price date, but I knew there had to be a better, set-based way within SQL.

The answer was nested Common Table Expressions (CTEs), which Microsoft officially says are not supported, but which can actually be implemented by defining multiple CTEs (with a comma between each) and using the first one in the second.

For this application, the first CTE orders the object ids and adjustment factors and adds row numbers. An ORDER BY would have sufficed to order everything, but in order to avoid inefficient subqueries in the next CTE, I needed consistent row numbering. The second CTE aggregates the adjustment factors for each date by recursing down the ordered factors for each object and multiplying them as it goes.

This solution condenses all the selection logic into one set of SQL statements, is compact, fairly readable, and has a better complexity profile than the for loop alternative.

(Table and column names changed to protect confidentiality.)

WITH AdjFactorsOrdered (ObjectId, AdjDate, AdjFactor, ord) AS

(SELECT ObjectId, AdjDate, AdjFactor, ROW_NUMBER() OVER (PARTITION BY ObjectId ORDER BY AdjDate desc)

FROM vwAdjFactors

WHERE AdjDate > '2009-01-01'

AND AdjDate <= '2010-01-01'

AND ISNULL(AdjFactor, 1) != 1

AND SubTypeId != 1

AdjFactorsCombined (ObjectId, AdjDate, AdjFactor, ord) AS

(SELECT ObjectId, AdjDate, AdjFactor, ord

FROM AdjFactorsOrdered

WHERE ord = 1

UNION ALL

SELECT mul.ObjectId, mul.AdjDate, base.AdjFactor * mul.AdjFactor as AdjFactor, mul.ord

FROM AdjFactorsCombined base

INNER JOIN AdjFactorsOrdered mul

ON mul.ord = base.ord + 1

AND mul.ObjectId = base.ObjectId

)

SELECT ObjectId, AdjDate, AdjFactor

FROM AdjFactorsCombined

ORDER BY ObjectId, AdjDate

Thursday, January 14, 2010

Daily WTF: SSIS edition

With apologies to Alex P of the real Daily WTF, here's my latest legacy code annoyance:

We have a variety of data imports that employ multiple SSIS packages in order to break the import logic into manageable, consistent stages. So far, so good. Except that a few particularly intrepid former team members of mine preferred to construct parent packages that called child packages, complete with variable passing to maintain state awareness, directory location consistency, dynamic environment configuration, etc. instead of just calling them from the job management system in successive tasks.

Oh, and none of these packages are in source control, they're just scattered about the file system like leaves in an ill-kept yard.

All of this leads me to my current exercise in SSIS surgery - I needed to change the behavior of one of these packages, so I decided to update it to our modern standards of SSIS behavior and move it to a new location. This one has a final step that calls the next package, a process I wasn't quite prepared to rework, so I wanted to simply update the directory in which it expects to find the next package. "Hopefully it does it on the fly," I thought, "since all the related packages are kept in the same directory and it could just pull the working directory at runtime..."

No such luck.

Instead, the package uses a variable to hold the path of the next package, which is configured via an expression that pulls from another variable, which pulls from another variable, which pulls from a package configuration, which pulls from a registry string. Which I don't have on my machine.

Got that?

Registry Key -> Package Configuration -> Variable A -> Variable B -> Variable C -> Script Task

I've got news for you, former developer: it's called IO.Directory.GetCurrentDirectory().

NYC DBA