Tuesday, September 18, 2012

sqlcmd swallows errors by default

I'm sure this is well-known out in SQL land, but I keep forgetting about it, so I'm writing this short post as a way to cement the knowledge in my brain: sqlcmd, the current command line utility for running T-SQL and replacement for OSQL, does not return a DOS ERRORLEVEL > 0 by default, even when the command being executed fails.

You need to pass the -b flag to raise an ERRORLEVEL 1 when a real error (SQL error severity > 10) occurs, which is crucial when running batch jobs in job systems that report status based on DOS ERRORLEVELs.

Thursday, July 12, 2012

pyodbc autocommits when using "with"

Discovered an interesting variation in pyodbc behavior today.  A coworker asked me if he needed to commit explicitly when using pyodbc to execute a stored procedure.  I told him he did, but he protested that his code worked fine without it.  Here's a paraphrased version:

def db_exec(qry):
    with pyodbc.connect(cnxn_str) as db:

I tested it, and what do you know?  It works.  But the following does not:

db = pyodbc.connect(cnxn_str)

In this case, the results of proc will be rolled back automatically when closing the connection.  So evidently the way that the with handler is coded includes an autocommit.

Wednesday, June 20, 2012

Stripping time from GETDATE(), 2008 edition

Stripping the time from GETDATE() to get the date at 00:00:00 is a common practice, and has been accomplished by various methods in the past, including CASTing to VARCHAR and back, using DATEADD and DATEDIFF, and using FLOOR after casting to INT.  It should be less necessary these days since the introduction of the DATE data type, but sometimes it's still a useful comparison trick.

And now because of DATE there's a simpler way to do the conversion: CAST(@dt as DATE).  But is this any faster?

The short answer is no, but it doesn't seem to be any slower either.  I ran a quick test:

declare @dtest datetime
declare @i int = 0, @j int = 0
declare @start datetime

set @start = GETDATE()

while @i < 1000000
    set @dtest = DATEADD(dd, 0, DATEDIFF(dd, 0, GETDATE()))
    set @i = @i + 1
print datediff(ms, @start, getdate())
set @start = GETDATE()

while @j < 1000000
    set @dtest = CAST(GETDATE() as DATE)
    set @j = @j + 1

print datediff(ms, @start, getdate())

In half a dozen runs, the two versions never varied by more than a handful of ms.  Feel free to try on your own and let me know if you see anything different.

Friday, March 30, 2012

Another hazard of Laptopistan - slow WiFi

This is hardly unexpected, but one of the side effects of everyone using Starbucks as a de facto workplace and internet cafe is that their WiFi is heavily stressed. People watching online video, using Remote Desktop (like me), probably even BitTorrent. I don't know what sort of routers they put in these coffee shops, but they often seem to be unequal to the load they're handling.

This morning I'm in a Starbucks, and my connection went from perfectly fine, at 8:30 AM before anyone was here, to barely usable now that I'm surrounded by laptops. One more reason for actual workspace.

Friday, March 16, 2012

Sliding Window partitioning

Implemented my first "sliding window" partitioned table. Wound up being easier than I expected, especially since the table will only have 2 partitions for now - Current and Archive.

Most sliding window schemes I've seen out there on the intertubes make use of a date as the partition key and update the sliding window based on the current date. In this case, the data comes in somewhat sporadically, so although it's date-based (a month key), I made the partitioning data-driven instead of date-driven. So for starters we have a RANGE LEFT scheme like this:

Partition Month Key
Current -
Archive 88

Say we're currently on month 90. So months 89 and 90 are in Current, and 1-88 are in Archive. When the sliding window proc runs, it does essentially this:

SELECT @NewBoundary = MAX(monthKey)-2 FROM tbPartitionedFoo
SELECT @CurBoundary = CAST(prv.value as smallint) FROM sys.partition_functions AS pf
JOIN sys.partition_range_values as prv
ON prv.function_id = pf.function_id
WHERE pf.name = 'partfn_foo'

(If there was more than one partition range value, a MIN() or something would be needed to grab the appropriate value.)

IF @NewBoundary > @CurBoundary

So when we start getting monthKey 91 data and the maintenance proc runs, it will do this:

IF 89 > 88

Now we have 3 partitions:

Partition Month Key
Current -
temp 89
Archive 88


And now we're back to 2:

Partition Month Key
Current -
Archive 89

Monday, March 5, 2012

Python Gets Things Done

A quick example of why I've grown to love Python.

My girlfriend had a laboriously scanned document in two pieces that needed to be combined. I had a .NET command line utility that I wrote a couple years ago that does just that, but because one of the scanned pages was a different size, it got cut off in the resulting output file. So I could either try to dig up some docs on the library I used, play around with Intellisense in Visual Studio, or see if there was a Python library that might work.

2 minutes of Googling later I had pyPDF, and maybe 10 minutes after that, I had reset the mediaBox on the outsized page, written it to a new pdf file, and emailed it off. Q.E.D.

DATEs aren't INTs, at least not any more

I updated a database column today to utilize the SQL 2008 DATE type, upgrading from a SMALLDATETIME which always had a 00:00:00 hh:mm:ss component. I was fairly certain that this would not break any existing code, so I did not perform an exhaustive code search. My mistake.

A function broke with the following message:

Msg 206, Level 16, State 2, Line 1
Operand type clash: date is incompatible with int

"Strange," I thought. "How would an int be getting used as a date?"

Of course there's the basic method of datetime construction from a string: CAST("2012-01-01" as datetime), but that doesn't involve any INTs. Did someone try to use a similar INT (20120101) in place of the string? No, because that would cause an arithmetic overflow...

Well, it turns out that there's another way of using INTs as DATETIMEs:

select cast(CAST('2012-01-10' as datetime) as int)

Apparently this doesn't work any more:

select cast(CAST('2012-01-10' as date) as int)

And this is what was being performed in the function in question:

CASE WHEN io_end > 40947 THEN 1 ELSE 0 END

Which promptly broke when I changed the column to a DATE. So let this be a warning: even something as innocuous as a DATETIME to DATE change needs QA and code search.

Thursday, February 23, 2012

Laptopistan is getting crowded

This is purely anecdotal, of course, but on the occasions when I've ventured out from my home office to get some fresh air or avoid being pestered by my dog, I've seen more and more people working remotely. Your average Starbucks is completely overrun with laptop-facing contractors, remote workers, and students these days, as are most other public areas with seating and free WiFi. Getting a table all to yourself is becoming rare, and forget about finding an open power outlet!

This says to me that there is a serious hole in the co-working space market. Assuming these Starbucks campers order an average of a $4 latte per day (some get black coffee, which is cheaper, but some add a muffin), and the cost of a dedicated desk at Affinity Lab is on the order of $900, that works out to a price differential of $800/mo. Surely there must be a point somewhere in the middle for which one could add certain beneficial services while keeping the cost enough to attract customers.

Starbucks, after all, does not exactly inspire loyalty - the seating is functional, but rarely comfortable; there are never enough outlets; the coffee is okay but somewhat expensive; the WiFi is often slow; the environment is noisier than a worker would prefer; and the venues themselves hardly encourage hard work. If I could pay $200/mo for a similarly casual but better-outfitted space in which I could work, chat, and drink coffee, I would jump at the chance. I've been trying to check out DCIOLab, but they haven't emailed me back.

Makes me think there's a definite business opportunity here, but certainly a risky one. Timing is important - is NOW the inflection point in density of remote workers, or a year from now? You can quickly go broke renting a commercial space for a year with not enough customers. Is there enough consistency in the office support requirements of the proposed customers? A good coffee pot or two (or a Chemex or Aeropress, maybe) is definitely necessary, but a copier? Fax machine? Phones? A conference room?

These are ways to differentiate a workspace from a Starbucks, but I think they may raise the cost too much and get too little use. I don't need any of those things besides the coffee-making apparatus. In fact, all I want is coffee, power, internet, good light, and preferably high ceilings. Could one provide those things, pack people in as tightly as they do at Starbucks, and make money charging $200/mo?

Thursday, February 16, 2012

Query plan troubleshooting

I was troubleshooting a medium-complexity query problem this morning in which the query execution within an app was orders of magnitude slower than the execution with the same parameters within Management Studio. I knew this could be due to different cached plans, but couldn't remember why.

Fortunately, Erland Sommerskog had written a detailed and helpful article explaining why this happens: http://www.sommarskog.se/query-plan-mysteries.html

And indeed, the app was using ODBC and therefore the results of sys.dm_exec_plan_attributes showed ARITHABORT OFF, while my instance of Management Studio had it set to ON, therefore generating a separate cached plan. I'm going to update my settings in SSMS to set this off by default, especially since it appears that it has no effect with ANSI_WARNINGS set to on, other than to foil attempts to troubleshoot slow procs.

Thursday, January 19, 2012

Refreshing Intellisense in SQL Management Studio 2008

If you're like me, you create and drop a lot of tables while developing, and you may be getting annoyed with SQL Server Management Studio because it fills in the names of old tables while you're typing unless you hit Escape. Turns out that refreshing the Intellisense cache is as easy as


Wednesday, January 18, 2012

If your DATE comes through pyodbc as a unicode, it's probably the driver

File this under "should have been obvious." I had a test script break today because I was trying to do date math on an object that wound up being a unicode instead of a date. I had gotten the variable value from a pyodbc query against a SQL 2008 database table with a DATE column. I knew that DATETIME columns came in as datetime.datetime objects, but the DATE column came in as a unicode, which seemed strange.

Turns out it's the ODBC driver. Demo:

In [78]: cnxn = pyodbc.connect("DSN=local;")

In [79]: crsr = cnxn.execute("select cast('2012-01-01' as date) as bar")

In [80]: r = crsr.fetchone()

In [81]: type(r.bar)
Out[81]: unicode

Then, when switching to SQLNCLI10:

In [96]: cnxn = pyodbc.connect("Driver={SQL Server Native Client 10.0};Server=localhost;Trusted_Connection=Yes;")

In [97]: crsr = cnxn.execute("select cast('2012-01-01' as date) as bar")

In [98]: r = crsr.fetchone()

In [99]: type(r.bar)
Out[99]: datetime.date

Tuesday, January 17, 2012

Faster, thinner, lighter

It's almost time to retire my trusty old Dell E4300. The warranty just expired, which means that if Dell has optimized their average component MTBF vs warranty span correctly, the laptop will implode shortly. I replaced the original hard drive with a 2nd-gen Sandforce SSD a year ago, and it's made a big difference, but I'm ready for a little more speed. And, with the advent of the "Ultrabook" form factor, for even less weight and size. I've decided I want one.

At CES this year, there were plenty of ultrabooks on display - Engadget's roundup has a few, more on Anandtech). And most of them seemed to get at least some part of the formula right, but I don't think any of them have quite grabbed me yet. Maybe I'm being picky, but I've decided that my top priorities are:

- 1600 x 900 resolution
- Thunderbolt port (for docking)
- decent battery life (5 hrs+)
- light weight

I'm sick of my current 1280 x 800 screen - the drop in my productivity from 2 big 1920 x 1080 screens down to the tiny laptop screen is painfully apparent. 1366 x 768 is arguably worse, since I really need those vertical lines of resolution. And although I move around a lot with a laptop, which is why I want a good battery and light weight, I also use it as my primary workstation, so I need to be able to dock it and use my big monitors. In fact, a laptop that would support 3 screens instead of 2 like my current Dell would be preferable. Few of the ultrabooks seem to have this capability, but at least a Thunderbolt port would make it theoretically possible.

Other requirements are pretty normal. The number of USB ports matters, but not hugely, since most peripherals I use at home while connected to a dock. I do use my SD card reader pretty often, so it would be nice to keep that. A decent keyboard, preferably backlit, would be useful.

Requirements in mind, I took a look at the current crop of ultrabooks just demoed at CES, and... was totally disappointed. Nothing had all the features I wanted. The closest ones were:

Samsung Series 9
- no Thunderbolt but both HDMI and DisplayPort connections
- 1600 x 900 resolution (on what I hear is a great screen)
- light weight (2.5 lbs)

Sony Vaio Z
- Thunderbolt, sorta, via a proprietary connector as usual (thanks a lot, Sony)
- 1600 x 900
- light weight
- ungodly price ($1900+)

Here's my spreadsheet comparing the current crop of ultrabooks and their cousins using my completely biased and proprietary scoring system. It's a definite work in progress and the scoring may change without warning. If something comes along that scores above a 3, I'll probably buy it, but right now, none of these are worth the money.

Wednesday, January 4, 2012

More fun with SSIS type conversions

Just finished painfully debugging a strange SSIS issue about which I could find no documentation anywhere on the web, so I'm noting it here.

I had changed a type within a large unwieldy data flow from DT_R4 (float) to DT_R8 (double), since the smaller data type was causing some weirdness around the least significant figures on some option prices. This seemed to work in testing, but I got a cryptic error about a conditional operation failing during a derived column transformation when I deployed it to Production. I opened the package, added an error redirection for the component and a data viewer, and sure enough, it was failing on a transformation that utilized some of the columns I had modified.

I couldn't find any reason why the conditional would fail, since it was faithfully spitting out BOOLs, but the resulting transformation split the float into Integer and Decimal by casting the value into a WSTR, 20 and then finding the decimal point. After trying a dozen other things, I tried adjusting the size of the WSTR cast to a WSTR, 40. That worked.

Note that the values that were failing would not even come close to 40 characters (ex: 110.68), but apparently the possibility of running into an overflow breaks something in the SSIS runtime.

So, a word to the wise: if casting from float (DT_R4 or DT_R8) to Unicode string (WSTR), make sure you leave enough room to cast any possible value.