Friday, May 2, 2008
Haven't posted in a while because I was in Russia for a few weeks checking out the smog, vodka, and beautiful women. Now I'm back, and I have a quick non-database item to relate.
Today I was working on scheduling the execution of some updater software from a small data vendor of ours. As it turns out, they have no spec or unified framework for their error handling and messages, so I had to work out on my own what sorts of errors could happen. One thing the updater does is log on to the vendor's server via FTP and download new files, so naturally I wanted to block that connection to see what would happen.
The easiest way would have been to just unplug my network connection, but the app needs some files that live on the network, so that was out.
I tried to block the program with Windows Firewall, but we've got a bunch of Group Policy stuff applied there, which complicated things. Even when I unchecked my local firewall exception for the app, and then also the exception for FTP, the application connected to the vendor's FTP server without any trouble.
Next I went out and found a third-party app that could block ports. The one I settled on after a quick browse of some security sites was Ghost Personal Firewall. Simple, straightforward, free, quick installation - it suited my needs perfectly. And it worked great: it took two minutes to download, install (no reboot needed), and configure a rule to block all traffic on port 21. This had the desired effect of breaking the app and producing an error log.
So consider this an unsolicited endorsement of Ghost's firewall. Served my purpose just fine, and it's going into my security toolkit for future app testing.
Thursday, March 27, 2008
Returning rows from a stored procedure with pyodbc
I'll start with the executive summary for those of you short on time or who don't care about the background:
When executing multiple statements within a SQL Server stored procedure, including a SELECT statement that you expect to return rows to a rowset via pyodbc, it is necessary to SET NOCOUNT ON.
Backstory:
I ran into a new issue with pyodbc yesterday - an inability to retrieve rows returned by a SQL Server stored procedure. The proc in question was executing a number of queries, writing log messages as rows to a table variable, and then returning the log via a select against the table variable.
However, when I executed the proc from a pyodbc connection, no rows came back.
This was a fairly complex proc that caused changes in the database when executed and I didn't want to run it repeatedly, so I mocked up "spTest" as follows:
create proc spTest as
select 'a' as msg
union all select 'b'
go
When executed from Python:
import pyodbc

db = pyodbc.connect(CNXNSTRING, True)  # second argument turns on autocommit
rows = db.execute("exec spTest")
for row in rows:
    print row
This returned rows. So I added a table variable:
create proc spTest2 as
declare @msgs table (msg varchar(255))
insert into @msgs
select 'a' as msg
union all select 'b'
select * from @msgs
go
This, being more similar to the actual proc, did not work. I changed a few things, tried a temp table instead of a table variable, etc., but didn't get any better results. Then my esteemed colleague Pijus asked if it was possible that pyodbc was reading some other channel of information from SQL that was obscuring the results. Aha!
SQL Server emits rowcounts by default. I'm not sure exactly where these turn up in the ODBC world, but I know they do get handed to the ODBC driver, and that extra channel of information was presumably what was confusing pyodbc. So I added SET NOCOUNT ON, and voila, the final select returns results regardless of how many other operations occur within the procedure.
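For completeness, here's the mock proc with the fix applied - this is my own sketch (named spTest3 so it doesn't collide with the versions above), and the only new line is the SET NOCOUNT ON:
create proc spTest3 as
set nocount on
declare @msgs table (msg varchar(255))
insert into @msgs
select 'a' as msg
union all select 'b'
select * from @msgs
go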
Wednesday, March 12, 2008
Case Sensitive GROUP BY
This is fairly trivial, but it took a few minutes to work out, so I figured I'd post it. Someone on my team needed to perform a case-sensitive GROUP BY on a table with a case-insensitive collation. I had done case-sensitive SELECTs before by "casting" the field in question to an alternate collation, so I tried that, but couldn't get it to work at first. Of course, had I thought about it, I would have realized that the same COLLATE clause needs to be applied in both the GROUP BY and the SELECT list.
Ex:
create table foo
(bar varchar(20))
insert into foo select 'Banana'
insert into foo select 'banana'
insert into foo select 'Banana'
select * from foo
select
bar COLLATE SQL_Latin1_General_CP1_CS_AS
, count(*)
from foo
group by bar COLLATE SQL_Latin1_General_CP1_CS_AS
Alternatively, one could use Microsoft's recommended method and convert the field to binary and then back. I haven't performance-tested the two methods, so I'm not sure which is faster. I leave that as an exercise for the reader. ;)
select convert(varchar(20),x.bar), x.ct
from
(select
convert(binary(20), bar) as bar
, count(*) as ct
from foo
group by convert(binary(20),bar)
) as x
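For the sample data above, both versions should return the same two groups (row order isn't guaranteed without an ORDER BY):
banana 1
Banana 2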
Monday, March 10, 2008
INSERT problems via pyodbc
This is really stupid, but I just wasted about an hour of dev time trying to figure out why my INSERT statements weren't working from a Python script using pyodbc even though SELECTs were fine. As it turns out, they were getting rolled back because I wasn't explicitly committing the transactions and hadn't set the AUTOCOMMIT option to true for the connection.
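For anyone else who lands here, a minimal sketch of the two fixes - CNXNSTRING is the same connection-string placeholder I used in the earlier post:
import pyodbc

# Fix 1: commit explicitly after the INSERTs
db = pyodbc.connect(CNXNSTRING)
db.execute("insert into foo select 'Banana'")
db.commit()

# Fix 2: turn autocommit on for the whole connection
db = pyodbc.connect(CNXNSTRING, autocommit=True)
db.execute("insert into foo select 'Banana'")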
So if you happen to search for "pyodbc INSERT problems" (as I did), hopefully you'll stumble across this (as opposed to nothing, which is what I found) and slap yourself in the head (as I did).
SQL Server CLR Remote Debugging
My CLR stored procs have matured to the point that they're being deployed, which is to say that the bugs are now insidious rather than blatant. It's a royal pain to make changes, deploy the project locally, run the post-deploy correction script to fix the decimal precision on everything (see previous post), debug any issues, and then do the whole deployment again to the dev server. It's much easier to just debug it straight on the server.
Fortunately, remote debugging is a straightforward process. Predictably, Microsoft's docs (here's one of many) on the subject are somewhat convoluted, although this msdn blog post helped.
All you really need to do is install the Remote Debugging Monitor (msvsmon.exe) on the server and run it under the same login you use to connect to SQL. It comes with its own standalone installer on the Visual Studio CD; just look for the Remote Debugger directory and run the setup app found there. Then you can run it either manually or as a service.
Next, place a breakpoint in your code somewhere and deploy it to the server. I've had trouble with my test scripts when deploying/debugging all at once - basically, they hang without ever running - so what I've started doing is this:
1) Start without debugging (Ctrl-F5) to deploy to the remote server
2) Run my decimal parameter correction script
3) Attach to Process in Visual Studio to sqlservr.exe on the remote server
4) Execute the proc or UDF in a SQL Management Studio window
Then I can step through the code, look at locals, and generally debug to my heart's content.
Friday, February 22, 2008
SqlDecimal truncation oddity
I've been working with all sorts of SQL CLR integration lately - User Defined Types (which are a bad idea), stored procs, and UDFs. In general, I'm pretty happy with the stored procedures, although I've run up against a number of irritating (but understandable) limitations, like the inability to dynamically load libraries and the requirement that only SQL data types be passed in and out of the CLR stored procs. But I haven't really broken any new ground, so I haven't written about it.
Today's issue is also not ground-breaking, but it is odd, so I thought I'd mention it. I wrote a function to convert a date into a decimal. Better that you not ask why. Here's the code:
using System;
using System.Data.SqlTypes;

public partial class UserDefinedFunctions
{
    [Microsoft.SqlServer.Server.SqlFunction]
    public static SqlDecimal fnTimeSeriesCalcFracYrFromDateS(SqlDateTime inputDT)
    {
        DateTime dt = (DateTime)inputDT;
        // note: 365 / 12 is integer division, so each month counts as 30 days
        double temp = (dt.Year + (Math.Round(((double)dt.Month * (365 / 12) + (double)dt.Day) * 1000000 / 365) / 1000000));
        SqlDecimal temp2 = (SqlDecimal)temp;
        return temp2;
    }
};
Obviously I could have dispensed with almost all of that by casting the data types within a single line, but I broke it out for debugging purposes.
Here's the funny part: when placing a breakpoint at the return statement, here are the values of the variables:
inputDT {1/2/2008 12:00:00 AM} System.Data.SqlTypes.SqlDateTime
dt {1/2/2008 12:00:00 AM} System.DateTime
temp 2008.087671 double
temp2 {2008.0876710000000} System.Data.SqlTypes.SqlDecimal
All as they should be. But the output of the function is 2008.
Why? I have no idea. I'm guessing it has something to do with the SqlDecimal declaration of the function, because when I changed it to SqlDouble, it returns 2008.087671 as it should. Anybody know why this is happening?
UPDATE: Figured it out. Visual Studio deploys the UDF without specifying a precision and scale, which causes the scale to be set to 0. One can circumvent this by dropping and re-creating the function manually within SQL Server and declaring the decimal data type with the desired precision and scale (in this case, (10,6)).
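For reference, the manual re-create looks something like this - the assembly name (TimeSeriesFunctions here) is a placeholder for whatever your project actually deploys as:
if object_id('dbo.fnTimeSeriesCalcFracYrFromDateS') is not null
    drop function dbo.fnTimeSeriesCalcFracYrFromDateS
go
create function dbo.fnTimeSeriesCalcFracYrFromDateS (@inputDT datetime)
returns decimal(10,6)
as external name TimeSeriesFunctions.UserDefinedFunctions.fnTimeSeriesCalcFracYrFromDateS
go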
However, this is annoying because it turns a one-button deploy/debug process into a multi-step process, since one can't use the test script functionality in Visual Studio to run a script with multiple "GOs" to drop and re-create the function and then run it. The workaround I devised was this:
1) Deploy the function to SQL via Visual Studio.
2) Drop and re-create the function manually in SQL Management Studio.
3) Set a breakpoint in VS.
4) Attach VS to the sqlservr.exe process.
5) Run the function. It should hit the breakpoint, allowing you to debug when it has the correct variable type settings.
But it would be nice to get back to where one press of F5 deployed and debugged the whole thing. Maybe I can figure out a way to set up a custom deploy script from VS...
UPDATE 2: When attempting to deploy the CLR functions after manually dropping them and re-creating them with the desired precision on the decimal return types, you get a Deploy error like the following:
The assembly module 'fnTimeSeriesCalcFracYrFromDateS' cannot be re-deployed because it was created outside of Visual Studio. Drop the module from the database before deploying the assembly.
To get around this, you can write a SQL script to drop the functions if they exist, save it into your project directory, and use the Pre-build event command line on the Build Events tab of the project properties to execute that drop script before each deploy. One less step per build!
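As a sketch, the pre-build event can be as simple as a sqlcmd call against a drop script saved in the project directory (server, database, and file name here are placeholders):
sqlcmd -S devserver -d MyDatabase -E -i "$(ProjectDir)DropClrFunctions.sql"
where DropClrFunctions.sql contains an object_id-guarded DROP FUNCTION for each CLR function, like the one shown above.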
Next I have to figure out how to get the post-build to execute the script to drop and recreate the functions with the proper precision.
Monday, January 28, 2008
Arggh
I wrote this nice little piece of code to eliminate tons of redundant rows from a 70-million row table, tested it in a vacuum in Dev, put it in the Integration environment for 3 weeks... and had it blow up spectacularly when applied to Production this weekend because of triggers on the table that I hadn't accounted for. Stupid legacy code!
There's no real content to this story, I'm just venting. The only moral is TEST YOUR CODE.