
Wednesday, October 5, 2011

Loading Fixed-Width Files with BCP / Bulk Insert

I had to craft some format files to load a couple of fixed-width files using the SQL Server BULK INSERT / bcp tools, and ran into some gaps in the documentation while getting them to work.

So let me state this explicitly: when loading a file with no line breaks, you do not need a terminator for any field in your format file. Just specify every field's length, set each field's xsi:type to CharFixed, and the whole thing should stream in. For example:

<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <RECORD>
    <FIELD ID="1" xsi:type="CharFixed" LENGTH="12"/>
    <FIELD ID="2" xsi:type="CharFixed" LENGTH="3"/>
    <FIELD ID="3" xsi:type="CharFixed" LENGTH="5"/>
  </RECORD>
  <ROW>
    <COLUMN SOURCE="1" NAME="series" xsi:type="SQLCHAR"/>
    <COLUMN SOURCE="2" NAME="pool" xsi:type="SQLCHAR"/>
    <COLUMN SOURCE="3" NAME="deal" xsi:type="SQLCHAR"/>
  </ROW>
</BCPFORMAT>
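
For reference, pointing BULK INSERT at the data file with that format file looks something like this - the table name and paths here are placeholders, not anything from a real system:

-- table name and file paths are placeholders
BULK INSERT dbo.FixedWidthStage
FROM 'C:\data\source.txt'
WITH (FORMATFILE = 'C:\data\sample.xml');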

You can still mix and match - have a terminator for the last field in each line, for example - but if your file has zero line breaks, you don't need one.
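
For instance, if the same file did end each record with a CRLF, only the last field would change. A sketch, reusing the made-up layout from above:

<RECORD>
  <FIELD ID="1" xsi:type="CharFixed" LENGTH="12"/>
  <FIELD ID="2" xsi:type="CharFixed" LENGTH="3"/>
  <!-- last field is terminated instead of fixed-width -->
  <FIELD ID="3" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="5"/>
</RECORD>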

Also, the easiest way I found to skip columns was OPENROWSET(BULK). Just make sure you select the column names from the ROW section of the format file, not the field IDs from the RECORD section.

For example, to pull just series and deal while skipping pool:

SELECT series, deal
FROM OPENROWSET(BULK 'source.txt',
     FORMATFILE = 'sample.xml') AS x;
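
And if you want to land that skipped-column result in a table, the same rowset works as the source of an INSERT ... SELECT. A quick sketch - the target table dbo.SeriesDeal is made up for illustration:

-- dbo.SeriesDeal is a hypothetical target table
INSERT INTO dbo.SeriesDeal (series, deal)
SELECT series, deal
FROM OPENROWSET(BULK 'source.txt',
     FORMATFILE = 'sample.xml') AS x;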

The documentation covers that part a bit better, but I wanted to write it down for my own sake, since I'm sure I'll have forgotten how I did it by the next time I use BULK INSERT a few years from now.

P.S. Gotta throw in a plug for tohtml.com here for making my code actually paste into Blogger and look decent to boot.

Wednesday, June 3, 2009

Conflicting BULK INSERTs

Let me start by saying that this is not a post about concurrent BULK INSERTs into a single table or SELECTs during a BULK INSERT, both of which the SQL CAT has covered pretty effectively, along with many other questions about bulk data loading. My problem is with concurrent BULK INSERTs into different tables in the same database.

Basically, there are two different data sets, one large and one small, that get BULK INSERTed into two different tables in the same database on different schedules. Like Alan's bicycle in Cryptonomicon, every so often the two schedules align and the chain falls off - or, more accurately, the big BULK INSERT holds up the small one. There doesn't seem to be a reason for the SQL engine to block one with the other: they're not hitting the same table, so locking shouldn't be an issue, and the machine in question is a dual quad-core, so SOS_SCHEDULER_YIELD waits aren't prevalent either.

OK, if it's not the SQL engine, it's probably contention on a shared resource. Sure enough, inspecting the wait types during a concurrent run of these inserts turned up a lot of PAGEIOLATCH_SH, which points at disk. But this database uses the Simple recovery model, so a minimally logged operation like BULK INSERT shouldn't be generating much log activity. We should just be writing dirty pages to memory and then checkpointing at our leisure, right? Why are the disks getting hit so hard? Unless we don't have enough memory to hold all those dirty pages...
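
For what it's worth, those waits are easy to watch from a DMV while the two loads are running. A minimal sketch - the filter on command is just one way to narrow it down:

-- show what currently executing requests are waiting on
SELECT session_id, command, wait_type, wait_time, blocking_session_id
FROM sys.dm_exec_requests
WHERE command = 'BULK INSERT';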

Which a perfmon counter log revealed to be precisely the case. The box has only 3.5 GB of RAM, runs a number of other applications, and shows high levels of paging and low Page Life Expectancy. The shortage of memory available to SQL Server means the larger data set being BULK INSERTed won't fit in RAM, so checkpointing starts almost immediately, and the smaller BULK INSERT can't complete until the slow local disks have finished with the bigger one. The easiest and cheapest fix will almost certainly be to add RAM, if only I can talk Systems Engineering into digging some up.
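
If you'd rather check from inside SQL Server than set up a perfmon log, the same counter is exposed through a DMV. A sketch - the LIKE on object_name is just to accommodate named instances:

-- Page Life Expectancy in seconds, from inside SQL Server
SELECT object_name, counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page life expectancy'
  AND object_name LIKE '%Buffer Manager%';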