T-SQL Tuesday #114 – The SQL Puzzle Party

There were times when I tried to look for puzzles to solve, especially the T-SQL puzzles (what happened to the T-SQL Challenge site?). Now I don’t. Life is challenging as it is, especially if you work with SQL Server and really try to understand what’s going on.

So rather than coming up with some contrived problem for you to solve as part of this edition of T-SQL Tuesday (thank you, Matthew McGiffen), I will share something that surprised me only last week. And yes, I have solved it already, and will be blogging more about it soon, so no, there is no big prize for solving my production issue here 😉

Here is the scenario

There is a table that stores millions of records. It has a primary key, a date when a record was processed, a bit column indicating whether it was processed or not, and some text fields that are used for something, but in our example, it’s just data that takes space on pages.

There is also an application which uses nHibernate to generate a T-SQL query that retrieves one (just one at a time) record from that table where IsProcessed = 0. There are 10-50 records like that at peak times, in a table which holds tens of millions of records, so making it very, very fast should be easy with a tiny little covering filtered index. Well… it turns out, SQL Server prefers to scan the clustered index instead.

Have a look

The challenge setup

use tempdb
go
drop table if exists dbo.LongProcessingTable
if not exists(select 1 from sys.tables where name = 'LongProcessingTable')
create table dbo.LongProcessingTable (
     Id int not null identity primary key
    ,ProcessedOn datetime2 null
    ,IsProcessed bit null
    ,SomeData nvarchar(1024) not null
)

-- just some text to fill up the space on pages
declare @sometext nvarchar(1024) = (
select string_agg(convert(char(1),name), '')
from sys.all_objects
)

-- create just 100k records with some random date values
-- at this time all records are marked as processed
insert into dbo.LongProcessingTable(ProcessedOn, IsProcessed, SomeData)
select top(100000)
     dateadd(second, -abs(checksum(a.object_id, b.object_id)%10000), getdate())
    ,1
    ,@sometext
from sys.all_objects a
cross join sys.all_objects b

-- now mark 10 rows as not processed
update d set IsProcessed = 0, ProcessedOn = null
from (
    select top (10) *
    from dbo.LongProcessingTable d
    order by ProcessedOn desc
) d
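
Before moving on, a quick sanity check (my addition, not part of the original setup) should show 100,000 rows in total and exactly 10 of them marked as not processed:

-- expect 100k rows in total and 10 rows with IsProcessed = 0
select
     count(*) as TotalRows
    ,sum(case when IsProcessed = 0 then 1 else 0 end) as NotProcessedRows
from dbo.LongProcessingTable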

Now the query:

declare @IsProcessed bit = 0

select top(1) Id, SomeData
from dbo.LongProcessingTable
where IsProcessed = @IsProcessed

The above query comes from the application and cannot be changed. It is what it is. And to help you start, here is the index I thought would work, but doesn’t.

create index IX_LongProcessingTable_NotProcessedYet
on dbo.LongProcessingTable(IsProcessed) include (SomeData)
where IsProcessed = 0
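
To see what the optimizer actually does with it, look at the actual execution plan of the query above, or ask the usage stats DMV whether the new index has ever been touched. This is just a quick check I like to run, not part of the original puzzle:

-- has the filtered index ever been used for a seek or a scan?
select i.name, s.user_seeks, s.user_scans, s.user_lookups
from sys.indexes i
left join sys.dm_db_index_usage_stats s
    on s.database_id = db_id()
    and s.object_id = i.object_id
    and s.index_id = i.index_id
where i.object_id = object_id('dbo.LongProcessingTable')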

The index gets ignored and the server scans the clustered index instead.
Of course, somebody discovered this earlier. I wasn’t all that surprised to find that Erik Darling had blogged about it in 2015, 2017 and 2018; he even says ‘IT IS KNOWN’… well, it wasn’t known to me. But even now, with that knowledge, I still cannot change the query, so what can I do? How do I make this query more efficient without changing it, and without creating a covering index on the whole table, which can hold hundreds of GB of data, just to get one row?
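
If you want to see the root cause in action before diving in: the plan built for a local variable has to be safe for any value of @IsProcessed, including 1, and a filtered index on IsProcessed = 0 cannot answer that, so the optimizer will not match it. Here is a quick demonstration (my addition, and explicitly not a valid answer to the challenge, since both variants change the query text):

-- with a literal the optimizer can match the filtered index
select top(1) Id, SomeData
from dbo.LongProcessingTable
where IsProcessed = 0

-- with the variable it can too, but only when it is allowed to sniff the value
-- at execution time, e.g. via OPTION (RECOMPILE)
declare @IsProcessed bit = 0

select top(1) Id, SomeData
from dbo.LongProcessingTable
where IsProcessed = @IsProcessed
option (recompile)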

If you are still reading… well, enjoy the challenge. I will follow up with a few comments and a couple of my attempts at solving the problem later this month (hopefully).

T-SQL Tuesday #111 – Why?

This post is a part of T-SQL Tuesday #111, which invites the SQL community to answer Andy Leonard’s question of Why? Why do we do what we do? Here is my answer.

Why?

My job titles typically include words like ‘database’, ‘DBA’, ‘SQL’, ‘Server’, but that’s just a corporate technicality, really. If we met and you asked me who I am, though, the most likely answer you would get is that I am an eXplorer.

An eXplorer! Perhaps it is not the most obvious answer, especially in a professional, office setting. It took me quite a while to realise it myself. One day a colleague called me the most intrinsically curious person he had ever met. I’m not entirely sure how it was meant, but I took it as a compliment. And then I realised that he might be onto something.

I want to know! How are things made? How do they work? Why this and not the other way? I want to know what the limits are and what is beyond them. What is out there?

And being like that, I suppose, I could be a mechanical engineer or a car mechanic. But those professions don’t appeal to me at all! Too obvious. Too practical. Too palpable. In a sense too easy, while at the same time too much hard work! I’m drawn to problems that require building complex abstractions in the mind, understanding the rules, but at the same time being creative.

That curiosity and the need for the not-so-obvious has got me to where I am now. Professionally it started with games on the ZX Spectrum. To understand how games are made, I found an interest in programming. Programming led to an interest in how computers are made. Assembly programming back in the day was such fun and a way to explore the hardware. You cannot really see the registers in the CPU or the bits in memory, but if you got it all right, it was possible to write your own keyboard drivers, or modify the VGA memory and push the limits of what seemed possible. From there I got into networks, switches, routers, servers. Somehow bits, ones and zeros, travel from computer to computer and make stuff happen. It is all difficult to see, so abstractions and mental models had to be created at every level of the OSI model. Database engines were the next obvious choice, as they appear to be the things people I worked with found difficult and obscure, while at the same time being the place where hardware and software, infrastructure and algorithms come together.

And then it stopped there. At least professionally. Sometimes I wonder if the Peter Principle applies to personal development too. Are databases (and the data they contain) a problem difficult enough for me? Have I stopped because I reached my own level of incompetence? Maybe. Or maybe not. Maybe it’s just the fact that there are limits to time and space, and to what one can do within those constraints.

Time is an important factor, as the same forces, the curiosity and the taste for mental gymnastics, shaped my other interests too. I speak 4 languages and I continuously study more (not very successfully, I must say). I studied music. Bassoon, to be exact. I still spend a lot of time playing all sorts of instruments, experimenting with algorithmic music, sometimes composing. On a day off I go caving or cave diving in tight, muddy Welsh sumps. Not for the pleasure of scuba diving. There is no pleasure in it. Being submerged in freezing cold muddy water, finding my way mostly by touch in a complex three-dimensional space. No, ‘fun’ is definitely not the word, but that is one of the last frontiers that doesn’t involve space flight. There are still places on Earth no human has ever been to before. There are still places for which there are no maps, places which are beyond known limits. I go there to see what is out there, to eXplore, to be the first.

So perhaps the real answer to the question of why I do what I do, why I am a DBA or a data professional, is that nobody I know is paying people to be cave divers, and while the consequences of getting things wrong are different, the risks and problems faced are almost exactly the same. Yes, the way I see them, they are almost exactly the same. The same traits that make me a reasonable cave diver make me a decent production DBA too, but that’s another story.