Let me start by saying that this is not a post about concurrent BULK INSERTs into a single table or SELECTs during a BULK INSERT, both of which the SQL CAT has covered pretty effectively, along with many other questions about bulk data loading. My problem is with concurrent BULK INSERTs into different tables on the same database.
Basically, there are two different data sets, one large and one small, that get BULK INSERTed into two different tables in the same DB on different schedules. Like Alan's bicycle in Cryptonomicon, every so often the two schedules will align and the chain will fall off, or more accurately, the big BULK INSERT will hold up the small BULK INSERT. There doesn't seem to be a reason for the SQL engine to block one with the other; they're not hitting the same table, so locking should not be a problem, and the machine in question is a dual quad-core, so SOS_SCHEDULER_YIELDs are not prevalent.
OK, if it's not the SQL engine, it's probably a shared resource issue. Sure enough, an inspection of the wait types during a concurrent execution of these inserts revealed a lot of PAGEIOLATCH_SH, indicating a disk problem. But this database is in Simple recovery mode, so it shouldn't be doing a lot of logging on a bulk-logged operation like BULK INSERT. We should just be writing dirty pages to memory and then checkpointing at our leisure, right? Why are the disks getting hit so hard? Unless we don't have enough memory to write those dirty pages...
Which a perfmon counter log revealed to be precisely the case. The box only has 3.5 GB of RAM, has a number of other applications running, and shows high levels of paging and low Page Life Expectancy. The lack of memory available to SQL means that the larger dataset being BULK INSERTed won't fit into RAM, so checkpointing starts almost immediately and blocks the smaller BULK INSERT from completing until the slow local disks can finish with the bigger one. The easiest and cheapest solution will almost certainly be to add RAM, if only I can talk Systems Engineering into digging some up.
Wednesday, June 3, 2009
Wednesday, April 22, 2009
Performance of LIKE vs = in SQL queries
Got a question from a coworker yesterday about the relative performance of LIKE vs = in a SQL query, and had to think about it a bit. He wanted to know whether there was a performance penalty to using LIKE if there were no wildcards. In other words, if this:
SELECT * FROM tbFoo WHERE colA LIKE 'bar'
would perform equivalently to this:
SELECT * FROM tbFoo WHERE colA = 'bar'
My knee-jerk response was that the = would be faster, but I thought about it and realized that the query optimizer would actually see them as the same thing. A check of the query plans against a quickly-created tbFoo confirmed it. So that's what I told him.
Except that I realized a moment later that there was a major caveat - the query optimization depends on how the statement is parameterized. If it's purely ad hoc SQL and being compiled at run-time, then the statements are equivalent, but if there's going to be any plan re-use, either by including the statement in a stored proc or preparing it and executing via sp_executesql, the LIKE will impose a significant penalty.
This is because the optimizer doesn't know at compile time whether the parameter to the LIKE operator will contain a wild card, so it can't use the more specific optimization (including index selection) that it could with an = operator. So if you mostly pass parameters without wildcards, you will be executing a suboptimal query plan. Keep this in mind when designing your query!
---
I realized after writing this that this topic has been covered in more depth and with more eloquence by Erland Sommarskog and any number of other SQL MVPs and bloggers, but this post is already written, so I'm releasing it into the wild anyway.
SELECT * FROM tbFoo WHERE colA LIKE 'bar'
would perform equivalently to this:
SELECT * FROM tbFoo WHERE colA = 'bar'
My knee-jerk response was that the = would be faster, but I thought about it and realized that the query optimizer would actually see them as the same thing. A check of the query plans against a quickly-created tbFoo confirmed it. So that's what I told him.
Except that I realized a moment later that there was a major caveat - the query optimization depends on how the statement is parameterized. If it's purely ad hoc SQL and being compiled at run-time, then the statements are equivalent, but if there's going to be any plan re-use, either by including the statement in a stored proc or preparing it and executing via sp_executesql, the LIKE will impose a significant penalty.
This is because the optimizer doesn't know at compile time whether the parameter to the LIKE operator will contain a wild card, so it can't use the more specific optimization (including index selection) that it could with an = operator. So if you mostly pass parameters without wildcards, you will be executing a suboptimal query plan. Keep this in mind when designing your query!
---
I realized after writing this that this topic has been covered in more depth and with more eloquence by Erland Sommarskog and any number of other SQL MVPs and bloggers, but this post is already written, so I'm releasing it into the wild anyway.
Notepad++ 5.3.1 breaks the Compare plugin
I use Notepad++ all the time for quick editing of SQL, Python, XML, and all sorts of other types of files, and when I opened it today, it told me there was an update available, so I updated to 5.3.1 .
Then this afternoon when I went to compare two SQL files using the Compare plugin, I couldn't find it, and spent a good 5 minutes combing through the menus before I realized it was really gone. Arggh!
I use the Compare tool all the time, so I immediately downgraded to version 5.2, where it still works. So just a note of caution if you use Compare in Notepad++, 5.3.1 appears to break it - hopefully they'll get that straightened out soon.
Then this afternoon when I went to compare two SQL files using the Compare plugin, I couldn't find it, and spent a good 5 minutes combing through the menus before I realized it was really gone. Arggh!
I use the Compare tool all the time, so I immediately downgraded to version 5.2, where it still works. So just a note of caution if you use Compare in Notepad++, 5.3.1 appears to break it - hopefully they'll get that straightened out soon.
Friday, March 13, 2009
pyodbc doesn't seem to like BLOBs
I tried to pull a Binary Large OBject out of a SQL Server database yesterday via pyodbc, and it blew up.
In [4]: c = pyodbc.connect('DRIVER={SQL Server};SERVER=servername;')
In [5]: r = c.execute('select Blob from dbo.tbDoc where DocID = 8')
In [6]: row = r.fetchone()
assertion failed: getdata.cpp(171)
cbRead == 0 || pch[cbRead-element_size] != 0
And crashed my IPython shell, to boot.
Python's DB API spec (aka PEP 249) doesn't appear to specify exactly how to handle BLOBs, but it does mention them, so I presume they're supposed to be supported. I wound up grabbing this particular BLOB in a C# app instead, but this seems like it could be an issue in the future, so if anyone knows what's going on, please let me know. Otherwise I'll dig into this issue next time I have some spare time or a pressing need to manipulate BLOBs in Python.
UPDATE: I wasn't on the latest build of pyodbc, so I updated to 2.1.3. After updating, I was able to fetch the binary stream from that row. Hooray!
Although when attempting to print it out, I got some weird behavior - my PC's internal speaker started beeping like mad and the window stopped responding. As my colleague said:
P.V. [1:57 PM]:
What the hell was stored in that blob?
Ira Pfeifer [1:58 PM]:
just a hex stream
P.V. [1:58 PM]:
This is like something from Gibson.
Strange...
In [4]: c = pyodbc.connect('DRIVER={SQL Server};SERVER=servername;')
In [5]: r = c.execute('select Blob from dbo.tbDoc where DocID = 8')
In [6]: row = r.fetchone()
assertion failed: getdata.cpp(171)
cbRead == 0 || pch[cbRead-element_size] != 0
And crashed my IPython shell, to boot.
Python's DB API spec (aka PEP 249) doesn't appear to specify exactly how to handle BLOBs, but it does mention them, so I presume they're supposed to be supported. I wound up grabbing this particular BLOB in a C# app instead, but this seems like it could be an issue in the future, so if anyone knows what's going on, please let me know. Otherwise I'll dig into this issue next time I have some spare time or a pressing need to manipulate BLOBs in Python.
UPDATE: I wasn't on the latest build of pyodbc, so I updated to 2.1.3. After updating, I was able to fetch the binary stream from that row. Hooray!
Although when attempting to print it out, I got some weird behavior - my PC's internal speaker started beeping like mad and the window stopped responding. As my colleague said:
P.V. [1:57 PM]:
What the hell was stored in that blob?
Ira Pfeifer [1:58 PM]:
just a hex stream
P.V. [1:58 PM]:
This is like something from Gibson.
Strange...
Monday, February 2, 2009
Weird SQLCMD issue
I've got a server that runs the open source Job Scheduler to execute scheduled tasks, and I encountered a strange problem with SQLCMD today. Basically, a very simple script fails to execute:
2009-02-02 13:52:01.224 [info] C:\Program Files\scheduler>sqlcmd -S TestServer1 -E -i "c:\temp\test.sql"
2009-02-02 13:52:01.224 [info] SCHEDULER-915 Process event
2009-02-02 13:52:01.224 [ERROR] SCHEDULER-280 Process terminated with exit code 1 (0x1)
This script executes without a problem on the command line, and
echo %errorlevel%
after its execution returns 0. I thought this might be an authentication issue, but switching to SQL authentication and entering the credentials explicitly on the command line produced the same result.
The strangest part of the whole thing is that switching from SQLCMD to OSQL makes the script work fine. I'd rather use SQLCMD, but I don't have time to investigate the minute difference in output that's causing JS to barf on it, so I guess it's back to the old reliable OSQL. At least until it's no longer supported.
2009-02-02 13:52:01.224 [info] C:\Program Files\scheduler>sqlcmd -S TestServer1 -E -i "c:\temp\test.sql"
2009-02-02 13:52:01.224 [info] SCHEDULER-915 Process event
2009-02-02 13:52:01.224 [ERROR] SCHEDULER-280 Process terminated with exit code 1 (0x1)
This script executes without a problem on the command line, and
echo %errorlevel%
after its execution returns 0. I thought this might be an authentication issue, but switching to SQL authentication and entering the credentials explicitly on the command line produced the same result.
The strangest part of the whole thing is that switching from SQLCMD to OSQL makes the script work fine. I'd rather use SQLCMD, but I don't have time to investigate the minute difference in output that's causing JS to barf on it, so I guess it's back to the old reliable OSQL. At least until it's no longer supported.
Thursday, January 22, 2009
SSIS transactions, or Another Reason To Hate Windows Firewall
I created an SSIS package today to archive a bunch of tables from a source server into one table on a destination server and then drop them on the source. I wanted the process to be idempotent, such that the tables would either be fully archived and dropped or the whole thing rolled back. Thus the process could be re-run if it failed in the middle without fear of duplicating data or prematurely dropping tables.
The obvious solution was a transaction. I set the TransactionOption property on my ForEach loop to Required and ran the package. However, MSDTC was not cooperating, and I got this failure:

With the following error message: "The transaction has already been implicitly or explicitly committed or aborted."
I checked MSDTC on both servers involved, and they each seemed to be set up correctly, albeit with slightly different settings. MSDTC on my machine was also configured correctly. Then I remembered that my desktop has Windows Firewall running. Don't ask me why, it wasn't my idea, but I can't turn it off.
So I added an exception to Windows Firewall for c:\windows\system32\msdtc.exe (port 135) and the package started working. Hooray!

Except that when I killed it midway through to simulate a network or SSIS host failure, it rolled back the whole thing. I wanted the ForEach loop to commit each table load and drop as it went so that I wouldn't have to start from scratch if the package failed after a few tables. So I created a Sequence container inside the ForEach container, set the TransactionOption to Required for that container, moved the Transfer and Drop tasks inside it, and set the TransactionOption to Supported for the ForEach.
The obvious solution was a transaction. I set the TransactionOption property on my ForEach loop to Required and ran the package. However, MSDTC was not cooperating, and I got this failure:

With the following error message: "The transaction has already been implicitly or explicitly committed or aborted."
I checked MSDTC on both servers involved, and they each seemed to be set up correctly, albeit with slightly different settings. MSDTC on my machine was also configured correctly. Then I remembered that my desktop has Windows Firewall running. Don't ask me why, it wasn't my idea, but I can't turn it off.
So I added an exception to Windows Firewall for c:\windows\system32\msdtc.exe (port 135) and the package started working. Hooray!

Except that when I killed it midway through to simulate a network or SSIS host failure, it rolled back the whole thing. I wanted the ForEach loop to commit each table load and drop as it went so that I wouldn't have to start from scratch if the package failed after a few tables. So I created a Sequence container inside the ForEach container, set the TransactionOption to Required for that container, moved the Transfer and Drop tasks inside it, and set the TransactionOption to Supported for the ForEach.
Wednesday, January 7, 2009
sql_variant causes SSIS metadata validation error
Just a quick note about an error that stumped me for a minute.
I was setting up a Lookup task in SSIS 2005 using the results of a SQL query as the lookup, but I kept getting the following error when attempting to create the column mappings:
TITLE: Microsoft Visual Studio
------------------------------
Error at Load Position Limit Data [Lookup [49]]: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x8000FFFF.
Error at Load Position Limit Data [Lookup [49]]: The call to Lookup transform method, ReinitializeMetadata, failed.
------------------------------
ADDITIONAL INFORMATION:
Exception from HRESULT: 0xC02090F0 (Microsoft.SqlServer.DTSPipelineWrap)
This didn't seem like a very helpful error, until I thought about what ReinitializeMetadata probably did, and then realized that my query selected a sql_variant column without casting it. Casting the sql_variant to a more appropriate data type (varchar, in this case) resolved the error.
I was setting up a Lookup task in SSIS 2005 using the results of a SQL query as the lookup, but I kept getting the following error when attempting to create the column mappings:
TITLE: Microsoft Visual Studio
------------------------------
Error at Load Position Limit Data [Lookup [49]]: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x8000FFFF.
Error at Load Position Limit Data [Lookup [49]]: The call to Lookup transform method, ReinitializeMetadata, failed.
------------------------------
ADDITIONAL INFORMATION:
Exception from HRESULT: 0xC02090F0 (Microsoft.SqlServer.DTSPipelineWrap)
This didn't seem like a very helpful error, until I thought about what ReinitializeMetadata probably did, and then realized that my query selected a sql_variant column without casting it. Casting the sql_variant to a more appropriate data type (varchar, in this case) resolved the error.
Subscribe to:
Posts (Atom)
