Tuesday, August 23, 2011

SSIS, MSDTC, and SQL Server clusters

Note to self: when receiving the following error from an SSIS package, the first thing to check is the Authentication setting for MSDTC.

Description: The SSIS Runtime has failed to enlist the OLE DB connection in a distributed transaction with error 0x8004D00E "The transaction has already been implicitly or explicitly committed or aborted".

The security setting should be set to No Authentication Required; Mutual Authentication Required does not appear to work with clusters. This is especially significant if, like me, you have a non-clustered instance in QA and a clustered one in Prod, and you have been beating your head against the wall trying to figure out why MSDTC, with all the same settings, works in the former but not the latter.
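As a quick sanity check, you can force a distributed transaction from T-SQL and see whether the enlistment error reproduces. A minimal sketch, assuming a working linked server exists ([SomeLinkedServer] is a placeholder, not a real name):

```sql
-- Force a distributed transaction to verify the MSDTC configuration.
-- Querying a linked server inside an explicit distributed transaction
-- makes SQL Server enlist via MSDTC.
BEGIN DISTRIBUTED TRANSACTION;

SELECT TOP 1 name
FROM [SomeLinkedServer].master.sys.databases;

COMMIT TRANSACTION;
```

If MSDTC is misconfigured, this should fail with an enlistment error similar to the one above, without involving SSIS at all.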

Friday, July 1, 2011

The Query Did Not Run

Why thank you, Excel, that's a very helpful error message. Except I was watching the query execute on SQL Server, and it DOES run. Perhaps there could be a different problem?

The actual, exact text of the error is

The query did not run, or the database table could not be opened.

The sheet in question ran a stored procedure that had been working fine, albeit very slowly. A coworker tuned the proc, and happened to use a temp table to accumulate the results of multiple queries instead of UNIONing them, which produced a better execution plan for each. However, despite being 4x faster, the proc stopped running from Excel, throwing the error message above.

At first I thought there might be a problem with the final DROP #table statement in the proc, so I removed that. No help. I adjusted various properties of the database connection from Excel. No change. I fiddled with the table setup in Excel. Nothing.

Then I remembered that I had some difficulties with pyodbc a while back that were resolved by setting NOCOUNT ON in the query. I can't recall the exact cause, but it was something about the driver treating the first "rows affected" count returned as the query result, which of course it isn't in a proc that populates a temp table with a few queries and then selects from it as the final step. Anyway, I set NOCOUNT ON at the beginning of the proc and OFF at the end, and voila, it worked. So try that if Excel isn't executing your stored proc correctly.
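For reference, the pattern that ended up working looks roughly like this (the proc, table, and column names here are made up for illustration):

```sql
CREATE PROCEDURE dbo.uspGetReportData  -- hypothetical name
AS
BEGIN
    -- Suppress the "(N row(s) affected)" messages that the INSERTs below
    -- would otherwise emit; some clients (Excel, pyodbc) mistake the first
    -- of those messages for the query's result.
    SET NOCOUNT ON;

    CREATE TABLE #results (ID INT, val VARCHAR(50));

    INSERT INTO #results SELECT ID, val FROM dbo.SourceA;
    INSERT INTO #results SELECT ID, val FROM dbo.SourceB;

    -- The final SELECT is now the only result set the client sees.
    SELECT ID, val FROM #results;

    SET NOCOUNT OFF;
END
```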

Tuesday, June 21, 2011

So you've got a problem, and you want to solve it using SQL_VARIANT

Now you've got a bigger problem, because using SQL_VARIANT for anything is a terrible idea. It's a memory and storage hog, it carries awkward indexing restrictions, it produces inconsistent and unpredictable behavior in code, and it's not supported by many client libraries. Don't use it.

Of course, you may not have a choice, if some esteemed predecessor has implemented database structures with it. In that case, you should definitely read up on it, but I just wanted to throw out one particularly useful function that I encountered yesterday while tearing my hair out trying to solve a problem with our resident SQL_VARIANT infection.

CREATE TABLE dbo.tbFoo (
    ID INT IDENTITY(1,1),
    bar SQL_VARIANT
);

That function would be SQL_VARIANT_PROPERTY(). The primary use of this function is to get the BaseType of the SQL_VARIANT column or variable, as in:

SELECT SQL_VARIANT_PROPERTY(bar, 'BaseType') FROM dbo.tbFoo

This will let you know whether your variable or column is actually being used as an int, varchar, nvarchar, etc. There are other properties that can be checked as well using this function; essentially anything that sp_help would usually give you.
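Besides BaseType, the documented properties include Precision, Scale, TotalBytes, Collation, and MaxLength, so you can inspect a value fairly thoroughly in one pass:

```sql
SELECT bar,
       SQL_VARIANT_PROPERTY(bar, 'BaseType')   AS BaseType,
       SQL_VARIANT_PROPERTY(bar, 'MaxLength')  AS MaxLength,   -- max length in bytes
       SQL_VARIANT_PROPERTY(bar, 'Precision')  AS Prec,
       SQL_VARIANT_PROPERTY(bar, 'Scale')      AS Scale,
       SQL_VARIANT_PROPERTY(bar, 'Collation')  AS Collation,
       SQL_VARIANT_PROPERTY(bar, 'TotalBytes') AS TotalBytes   -- bytes actually stored
FROM dbo.tbFoo;
```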

Knowing the base type can be enormously helpful. For example, you may encounter code that joins the table to itself to compare SQL_VARIANT values directly:

SELECT * FROM dbo.tbFoo f1
INNER JOIN dbo.tbFoo f2
ON f1.bar = f2.bar
AND f1.ID < f2.ID

Now say you have dozens of processes inserting into this table, and they're not particularly standardized. Say one of them inserts non-unicode strings (VARCHAR) while another inserts unicode strings (NVARCHAR).

INSERT INTO dbo.tbFoo SELECT CAST('A1B2C3' as varchar)
INSERT INTO dbo.tbFoo SELECT CAST('A1B2C3' as nvarchar)

If you had CAST them in your query, they would be comparable. Without the cast, they are not: SQL_VARIANT comparisons rank values by data type family before comparing values, and varchar (the character family) is in a different family than nvarchar (the Unicode family), so the two rows above will never match, and you won't get the row back that you expected.
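A sketch of the same self-join with explicit casts, which makes the varchar and nvarchar rows comparable again:

```sql
SELECT *
FROM dbo.tbFoo f1
INNER JOIN dbo.tbFoo f2
    ON CAST(f1.bar AS NVARCHAR(4000)) = CAST(f2.bar AS NVARCHAR(4000))
    AND f1.ID < f2.ID;
```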

The best fix is to avoid writing this code, and be sure to CAST your SQL_VARIANTs into the type you're expecting to use, but if you need to restore functionality to this type of code, you can update your SQL_VARIANT column base type.

UPDATE dbo.tbFoo SET bar = CAST(bar as NVARCHAR(4000)) WHERE SQL_VARIANT_PROPERTY(bar, 'BaseType') = 'varchar'

(Note the explicit length: CAST to a bare NVARCHAR defaults to 30 characters and will silently truncate anything longer.)

All this may or may not help you with your SQL_VARIANT problems, as there's plenty of unexplored territory even within the issues I've touched upon here. But it would have helped me yesterday. Good luck.

Wednesday, May 11, 2011

The Rare, Simple SQL Wait Fix

A "power user" reported some slowness on one of our big servers today while it was being pummeled by about 60 simultaneous executions of the same script. The wait type was PAGELATCH_UP, which I vaguely recalled to mean something about allocation contention. The Microsoft Waits and Queues tuning guide confirmed that, so I checked the wait resources. They all started with 2:6 and 2:9, so it definitely had to do with tempdb.

This server is pretty beefy, a 4-socket quad-core with Xeon 7350s, but tempdb only had 8 files. I know that the rule of thumb of 1 file per core is no longer quite so hard and fast, but I figured it probably wouldn't hurt here. Created an extra 8 files, equalized the file sizes on the current ones, and had the user kick off the process again.
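The changes were along these lines (the file names, path, and sizes here are illustrative, not the actual values from that server):

```sql
-- Equalize the sizes of the existing files...
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 4096MB);

-- ...then add more data files of the same size, so the proportional-fill
-- algorithm spreads allocations evenly across all of them.
ALTER DATABASE tempdb ADD FILE
    (NAME = tempdev9, FILENAME = 'T:\tempdb9.ndf', SIZE = 4096MB);
```

Equal sizing matters: SQL Server fills files proportionally to their free space, so one oversized file would keep absorbing most allocations and defeat the purpose of adding files.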

No waits! Or at least, no PAGELATCH_UP waits. Some SOS_SCHEDULER_YIELDs and CXPACKETs, but I took that to mean that we had successfully shifted the bottleneck off of allocations and onto CPU, where it should be.

It's rare that 5 minutes of configuration change can effect a significant gain in process speed, but it's pretty satisfying when it happens.

Friday, April 1, 2011

In SSIS, memory corruption error == bad parameter passing

Had a strange situation with an SSIS package that would occasionally produce the following error message from an Execute SQL task:

"Attempted to read or write protected memory. This is often an indication that other memory is corrupt."

After several rounds of machine reboots, file checking, and messing with the connection manager, I finally checked the actual query being executed (I know, way to work backwards). It had a parameter being passed in that must have been hand-entered, because it was lacking the namespace, i.e.

ParamName1

instead of

User::ParamName1.

Corrected that, and everything worked fine. So the error message is thoroughly misleading, but if you investigate the query execution, you may find a bad parameter name.


Wednesday, March 16, 2011

More Resolver One nice-to-haves

Working on Resolver One stuff again, and came up with a few more things I'd like to see.

I've been having problems moving some of my IronPython code out of my worksheet and into an IronPython module, but I assume that's due to my own lack of understanding of the CLR; the external modules seem to get hung up on the import statements and not get any further.

Which leads me to my first request: an easy way to reload a workbook. At the moment, any time I make a change in an external file, I can't just recalculate and have Resolver One re-import the module; I have to close the workbook (which generally means the entire program) and re-open it. It would be nice to have a "Reload from disk" command or something similar that does a thorough reinitialization of the environment.

The second thing is something I just posted about on the Resolver forums, so I'm hopeful that I'll come upon a solution via that line of inquiry, but basically, I want SSL support in Resolver. I was trying to connect to a gmail server and use starttls() from smtplib, but couldn't do it because of the lack of SSL support in IronPython 2.6.0, which I believe is what's under the hood in Resolver One.
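In CPython this would be a routine dance; here's a sketch of roughly what I was attempting (the function name and server details are illustrative):

```python
import smtplib

def send_via_gmail(user, password, recipients, msg):
    # starttls() upgrades the connection to TLS; it is the call that fails
    # under IronPython 2.6.0, which lacks an ssl module.
    server = smtplib.SMTP("smtp.gmail.com", 587)
    server.ehlo()
    server.starttls()
    server.ehlo()
    server.login(user, password)
    server.sendmail(user, recipients, msg)
    server.quit()
```

On IronPython 2.6.0, the starttls() call errors out because there's no SSL support underneath it, which is exactly the wall I hit.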

I'm using Resolver for a couple projects now, so here's hoping that development continues and that I can see some of these things working in the future.

Friday, March 11, 2011

Context switching is more expensive when bandwidth is limited

This post is just made up of more idle musings and could really be shortened to a tweet, but while on the road this week with access to the office via questionable and/or slow internet connections, I've noticed that context switching, both literal and figurative, is much more expensive with low bandwidth.

This applies across all sorts of situations. I get a lot of IMs from co-workers, for example. Normally I don't notice their impact much, since I have two big monitors' worth of screen real estate and can switch back and forth between IM windows and coding or testing quickly. But with less screen space (1280 x 800 instead of 3840 x 1080), the IM windows take up more room and I have to move them around frequently, which eats up time.

Also, with slow internet connections, actually switching between windows is sometimes difficult, as redrawing a bigger chunk of the screen causes the whole connection to freeze for as long as several minutes at a time. And as upstream bandwidth is always more limited than downstream, sometimes my typing or clicks don't get transmitted for extended periods of time. It's very frustrating, to the point where I'm planning to get a mobile internet solution of some kind very soon. The travel has also cost me quite a few mobile phone minutes, as I can't use VOIP over the sketchy internet connections.

In other remote working news, I've noticed two startups in the space mentioned recently on TechCrunch: LiquidSpace and LooseCubes. I'm hoping that one or both will help provide the co-working experience I've been seeking and not finding in DC (and potentially elsewhere).