Had a convoluted replication problem this morning that turned out to be caused by mirroring. Users noticed a lack of data on a replicated server that was quickly traced back to replication failing to deliver commands. The Log Reader Agent noted the following:
Replicated transactions are waiting for next Log backup or for mirroring partner to catch up
There was nothing wrong with log backups, so mirroring was the next thing to be checked. It was suspended, and attempting to resume it failed. Checking the Mirroring Monitor yielded nothing helpful, but the SQL Server error log on the host server showed the following:
Date x/xx/xxxx x:xx:xx AM
Log SQL Server (Current - x/xx/xxxx x:xx:00 AM)
Source spid34s
Message
'TCP://foo.bar.com:5022', the remote mirroring partner for database 'SomeDB', encountered error 3624, status 1, severity 20. Database mirroring has been suspended. Resolve the error on the remote server and resume mirroring, or remove mirroring and re-establish the mirror server instance.
On the mirrored server, there were additional errors, including a stack dump, but this was the most relevant:
Date x/xx/xxxx x:xx:xx AM
Log SQL Server (Current - x/xx/xxxx x:xx:00 AM)
Source spid39s
Message
Error: 3624, Severity: 20, State: 1.
Date x/xx/xxxx x:xx:xx AM
Log SQL Server (Current - x/xx/xxxx x:xx:00 AM)
Source spid39s
Message
SQL Server Assertion: File:, line=807 Failed Assertion = 'result == LCK_OK'. This error may be timing-related. If the error persists after rerunning the statement, use DBCC CHECKDB to check the database for structural integrity, or restart the server to ensure in-memory data structures are not corrupted.
Assume replication does not depend on mirroring. Server A is both the replication publisher and mirror source. It replicates rows to Server B and mirrors to Server C. Mirroring breaks at time t0. Rows continue to replicate, and are marked as replicated on Server A.
Server A fails over to Server C at time t1. Replication picks up from there with Server C as published. Server C republishes the rows between t0 and t1 to Server B because it does not have a record of those rows having been replicated.
Makes sense now, right?
Replicated transactions are waiting for next Log backup or for mirroring partner to catch up
There was nothing wrong with log backups, so mirroring was the next thing to be checked. It was suspended, and attempting to resume it failed. Checking the Mirroring Monitor yielded nothing helpful, but the SQL Server error log on the host server showed the following:
Date x/xx/xxxx x:xx:xx AM
Log SQL Server (Current - x/xx/xxxx x:xx:00 AM)
Source spid34s
Message
'TCP://foo.bar.com:5022', the remote mirroring partner for database 'SomeDB', encountered error 3624, status 1, severity 20. Database mirroring has been suspended. Resolve the error on the remote server and resume mirroring, or remove mirroring and re-establish the mirror server instance.
On the mirrored server, there were additional errors, including a stack dump, but this was the most relevant:
Date x/xx/xxxx x:xx:xx AM
Log SQL Server (Current - x/xx/xxxx x:xx:00 AM)
Source spid39s
Message
Error: 3624, Severity: 20, State: 1.
---
Date x/xx/xxxx x:xx:xx AM
Log SQL Server (Current - x/xx/xxxx x:xx:00 AM)
Source spid39s
Message
SQL Server Assertion: File:
Severity 20 == bad. This basically comes down to corruption on the mirror target, which irreparably broke mirroring. This, in turn, broke replication.
So why, the technical user asked, does replication between business-critical Production servers depend on mirroring to a disaster recovery environment? I had to think about that one, but there's a good reason.
Assume replication does not depend on mirroring. Server A is both the replication publisher and mirror source. It replicates rows to Server B and mirrors to Server C. Mirroring breaks at time t0. Rows continue to replicate, and are marked as replicated on Server A.
Server A fails over to Server C at time t1. Replication picks up from there with Server C as published. Server C republishes the rows between t0 and t1 to Server B because it does not have a record of those rows having been replicated.
Makes sense now, right?