Weeknote 4

This week I spent my time trying to dig myself out from under a virtual pile of emails. I was trying to make a good start to reach new levels of productivity, but with so many things in progress, that was never going to happen. Maybe next week?

And so, I didn’t get to do anything interesting with data this week. I didn’t build any digital services either. But not all was lost. I got to experience digital public services first-hand – truly demoralising! I also managed to do some coding with Woody Zuill and Kevin Meadows.


We need to make public digital services better. 

That is probably true of many of them, but especially in healthcare. We talk a lot about accessibility and user-centric design. In healthcare, they talk about delivering patient-centric care. So how is it possible to create such a monstrosity at the intersection of the two worlds?

Not only are the services not very accessible, but they are also used in strange ways. Do you want to register for a GP appointment? You can only do it online, and only between 8:00 and 8:35am. You arrive at the surgery for one appointment but need to schedule another. No, it is not possible to do it at the reception. It has to be done online. But there is no mobile reception inside, so you must go outside in the rain to book an appointment. Who comes up with these schemes?

Eventually, I ended up in a phlebotomy waiting room. There were some twenty chairs and just three of us patients. The receptionist went on a rant about patients not turning up on time, or even on the right day. It is not his fault, he says; the patients don’t read or cannot understand the emails. Fair enough. It might not be his fault. But who is responsible for communication that appears to work for only one side of the dialogue?

Human-Computer Interactions with Acute Disability

The problem is even more serious. In services like these, relating to healthcare, standard accessibility just isn’t cutting it. We design and test with users who are long-term sufferers of various conditions. People like that are used to their conditions, have adapted, and know the conventions we can use to help them use our services.

But people who need medical help often have a sudden onset of the ailment. They can be distressed, distracted, not at their best. I regularly see people in such situations and see how modern, accessible technology fails us. Last week I was put through the mill myself and nearly lost all my faith in both digital services and healthcare simultaneously.

And I don’t think it is only me. I heard somebody make a very pertinent joke this week on the radio. The Government Communications Headquarters, the GCHQ, used to recruit code breakers by posting cryptic puzzles in the newspapers. Now they don’t have to. They simply offer jobs to people who manage to get a GP appointment. 

It shouldn’t be like that. We must be able to do something!

Good digital services matter

Digital services and products can be a matter of life and death. Why is it that software with a distinctly 1990s feel is used in emergency services? If a healthcare professional practised 1990s medicine, they would probably lose their professional registration. But if they use 1990s IT systems to deliver patient care, that’s OK.

Let’s do something about it!

Rant Over. Promise.

Let’s talk about Software Teaming

It used to be known as Mob Programming: a practice where a group of people develop software using a single computer. I had tried it before, and I really like pair programming, so when an opportunity presented itself, I couldn’t say no. On Friday, I joined Woody Zuill, Kevin Meadows and two others for a 90-minute session of Software Teaming. If you haven’t heard those names before, check out their book Software Teaming: A Mob Programming, Whole-Team Approach.

Of the two other people on the call, one had never programmed before, and the other had done a little, but not in the language we had chosen. Nevertheless, by the end of the session, we were programming effectively together. Everybody had a go at coding. Everybody learnt something.

I learnt a few things myself. First of all, there is cyber-dojo.org. Don’t know it? Check it out! Second, I had never done pair programming right – not in the strict sense promoted by Woody and Kevin. They say that for this to make sense, the idea has to cross from one head to another before it enters the machine. And so the driver does only what they are told by the navigator. Watching somebody code and making an occasional comment or suggestion is not enough. The navigator designs the solution, and the driver executes it. They compare it to being a passenger and a driver in a cab. It really works. I want to do more of it.

A year in review (2022)

For me, relaxing is the most stressful thing. Time passes, and nothing gets done. With an attitude like that, you probably wouldn’t be surprised to learn that I’m interested in personal productivity. Well… I have some data inclinations too, and some charts to show you.

The above is my first full year (January to December) as a civil servant. I started in the bottom left corner in week one, with some tasks in my backlog, and progressed steadily through the year to the top right corner in week 51. Of course, there are some variations from week to week, but the thicker lines show that things are surprisingly constant on a (six-week) average.

It also shows me there is a limit to how much I can do per week. In weeks 1-13 and 19-26, I managed ten tasks a week. In April, week 14, I tried harder and got up to seventeen, but that fizzled out by mid-May, week 19.

Looking at this graph before my first job anniversary in July, I realised the blue and green lines were pretty straight and divergent. It meant I was always putting more on my backlog than I could accomplish, which was not sustainable. So I decided to try harder. Once more, I reached deep into my bag of personal productivity hacks to up my performance.

You can see that from July, I managed sixteen tasks per week, 160% of my earlier norm, but still, it wasn’t enough to match the demand, and so in mid-August, I started being more selective in the commitments I took on. Finally, the gap between the blue and green lines started to narrow, and things were going well until December!

Why is it so hard?

Ultimately, my decision to commit to less work made a real difference to the length of my backlog. It was obvious, and yet it was the last thing I tried. Why?

I find it difficult to say no to opportunities to help others or to collaborate. Believe it or not (and I know with my occasionally blunt demeanour, it might take a leap of faith), I get out of bed to make a difference. And the only way to do that at scale is by collaborating and helping each other. And so I say “yes, I will help you” more often than I should.

But despite that, second on the list of the most stressful things people say to me, just after “relax”, is starting conversations with “I know you are very busy, but…”. How many opportunities to collaborate have I lost because people think I’m too busy to talk? Too busy, or worse, too important, to see if their problem is something I can help with or not? Every communication that starts with “I know you are very busy” reminds me of all the forever-lost opportunities to collaborate. And so I respond, as calmly as I can, that I will always find the time for a chat, even if I cannot help.

I’ll have to do something about it next year, but first, let’s go back to the data to see what else we can learn. My ambition for 2023 is to bend that green line up a bit more and find a way to do even more.

More Charts

The view above – weekly snapshots – shows more clearly that last summer’s effort to reduce the number of new commitments was working well until the beginning of December. What happened there? We had our internal conference. I had three days of in-person meetings that inspired me to take on ridiculous amounts of new things.

What the two views we looked at so far don’t show well is what really matters: how many things I can do a week, and how many things I start but cannot complete – the Waiting category. Those are the things I start, but hit a wall. I have to meet with somebody, get somebody’s opinion, get approval and so on. Every item like that means context switching and so lost effort and time.

Now, after excluding the things that are in the backlog, the picture is more compelling. We can clearly see the two times I decided to do more. The first, in April, week 14, lasted a few weeks.

During Weeks 22-23, I was off and then, after coming back, had a lot of requests to do things which I tried to start as quickly as possible. This resulted in a lot of tasks in progress and a lot of waiting, but not much was done. That is what led to frustration and yet another attempt to increase productivity in June – weeks 27 and 28.

I worked hard on reducing the work in progress and the tasks in waiting. Still, I managed a reasonably constant level of completed tasks. But then, in the fourth quarter, I tried to do even more, and the only thing I achieved was the waiting queue going past sixty.

Sixty tasks I have started but cannot complete. I’m waiting for somebody else, but still, they are my responsibility. They still take my time, even if just to check if I can progress them or not!

It’s an unbelievable waste of time.

My plan for 2023? To change that, and to achieve even more with less effort and less strain.

But first, a couple of days in the mountains to… relax! Scary! Although I’m sure I’ll find something to do so as not to relax properly. That would be too stressful. I’ll do some reading, some thinking, and hopefully find the much-needed new energy for civil servicing in 2023 at never-before-seen levels of productivity.

Happy New Year!

T-SQL Tuesday #114 – The SQL Puzzle Party

There were times when I tried to look for puzzles to solve, especially T-SQL puzzles (what happened to the T-SQL Challenge site?). Now I don’t. Life is challenging as it is, especially if you work with SQL Server and really try to understand what’s going on.

So rather than coming up with some contrived problem for you to solve as part of this edition of T-SQL Tuesday (thank you, Matthew McGiffen), I will share something that surprised me only last week. And yes, I have solved it already and will be blogging more about it soon, so no, there is no big prize for solving my production issue here 😉

Here is the scenario

There is a table that stores millions of records. It has a primary key, a date when a record was processed, a bit column indicating whether it was processed or not, and some text fields that are used for something, but in our example it’s just data that takes up space on pages.

There is also an application which uses nHibernate to generate a T-SQL query that retrieves one record (just one at a time) from that table where IsProcessed = 0. There are 10-50 records like that at peak times, in a table which holds tens of millions of records, so making the query very, very fast should be easy with a tiny little covering filtered index. Well… it turns out SQL Server prefers to scan the clustered index instead.

Have a look

The challenge setup

use tempdb
go

-- recreate the demo table from scratch
drop table if exists dbo.LongProcessingTable
create table dbo.LongProcessingTable (
Id int not null identity primary key
,ProcessedOn datetime2 null
,IsProcessed bit null
,SomeData nvarchar(1024) not null
)

-- just some text to fill up the space on pages
declare @sometext nvarchar(1024) = (
select string_agg(convert(char(1),name), '')
from sys.all_objects
)

-- create just 100k records with some random date values
-- at this time all records are marked as processed
insert into dbo.LongProcessingTable(ProcessedOn, IsProcessed, SomeData)
select top(100000)
dateadd(second, -abs(checksum(a.object_id, b.object_id)%10000), getdate())
,1
,@sometext
from sys.all_objects a
cross join sys.all_objects b

-- now mark 10 rows as not processed
update d set IsProcessed = 0, ProcessedOn = null
from (
select top (10) *
from dbo.LongProcessingTable d
order by ProcessedOn desc
) d

Now the query:

declare @IsProcessed bit = 0

select top(1) Id, SomeData
from dbo.LongProcessingTable
where IsProcessed = @IsProcessed

The above query comes from the application and cannot be changed. It is what it is. And to help you start, here is the index I thought would work, but doesn’t.

create index IX_LongProcessingTable_NotProcessedYet
on dbo.LongProcessingTable(IsProcessed) include (SomeData)
where IsProcessed = 0

The index gets ignored and the server goes for the table scan instead. Of course, somebody discovered this earlier. I wasn’t all that surprised to find that Erik Darling blogged about it in 2015, 2017 and 2018; he even says ‘IT IS KNOWN’… well, it wasn’t to me. But even now, with that knowledge, I still cannot change the query, so what can I do? How do I make this query more efficient without changing it, and without creating a covering index on the whole table, which can contain hundreds of GB of data, just to get one row?
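If you want to reproduce the behaviour before theorising, here is a quick sketch (assuming the setup above): compare the plan you get with a literal predicate against the one you get with a variable.

set statistics io on

-- with a literal predicate the optimiser can match the filtered index
select top(1) Id, SomeData
from dbo.LongProcessingTable
where IsProcessed = 0

-- with a variable the plan has to be safe for any value,
-- so the filtered index is ignored in favour of the clustered index scan
declare @IsProcessed bit = 0
select top(1) Id, SomeData
from dbo.LongProcessingTable
where IsProcessed = @IsProcessed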

If you are still reading… well, enjoy the challenge. I will follow up with a few comments and a couple of my attempts at solving the problem later this month (hopefully).


gMSA and Docker – Lessons Learnt

In the last two posts (here and here) I documented how I use gMSAs to connect services running in Docker containers on Windows to SQL Server using domain authentication. In the end it was very simple, but there are things I wish I had known when I started; it would have saved me a lot of time. Here is an attempt to document the lessons learnt.

Versions are important!

While in the end I was able to make it work on Windows Server 2016, 1803, 2019 and 1809, I wasted some time trying to make it work with Docker 17.06. Unsuccessfully. Docker 18.09.1 and 18.09.2 worked every time.

There are some reports of intermittent problems with specific OS updates breaking stuff, like the one here, but I wasn’t able to reproduce them. I wonder if the update changes something else that is causing the problems; in other words, is it a problem with the update itself or with the update process?

The Set-AdServiceAccount

From the beginning I set out to try gMSA authentication on multiple VMs, following blog posts which all included some use of the Set-AdServiceAccount PowerShell command (from `RSAT-AD-PowerShell`). I could not make it work on more than one VM at a time. I thought I was going mad! The problem (and the clue) is in the name. Set. It is not add, not modify. So when I was doing something like this

Set-AdServiceAccount -Identity MyService `
  -PrincipalsAllowedToRetrieveManagedPassword DH2019A$

it was setting the principals allowed to retrieve the managed password for MyService to the DH2019A VM. As expected. But, not as expected, it was also removing the privilege from all the other VMs I had granted that permission to before. With no warnings.
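In hindsight, one way to grant an additional host without clobbering the existing grants is to read the current list first and append to it. A sketch (DH2019B is a hypothetical second host):

# read the current principals, append the new host, then write the full list back
$gmsa = Get-AdServiceAccount -Identity MyService `
  -Properties PrincipalsAllowedToRetrieveManagedPassword
$principals = @($gmsa.PrincipalsAllowedToRetrieveManagedPassword) + 'DH2019B$'
Set-AdServiceAccount -Identity MyService `
  -PrincipalsAllowedToRetrieveManagedPassword $principals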

It is probably the most worrying part of using gMSAs for service authentication in production, as I plan to do. All it would take is one sysadmin running a command like that to break all the services, potentially on all Docker hosts. To mitigate this, we have decided to grant the permissions through a domain group to which we will add the Docker hosts. That way there should be no need to run this command when scaling out.
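A sketch of that setup (the DockerHosts group and its location match the distinguished name used in the next post; DH2019B is hypothetical):

# create a group for the docker hosts and grant the permission once, to the group
New-AdGroup -Name DockerHosts -GroupScope Global -Path 'CN=Computers,DC=sqlgmsa,DC=local'
Add-AdGroupMember -Identity DockerHosts -Members DH2019A$
Set-AdServiceAccount -Identity MyService `
  -PrincipalsAllowedToRetrieveManagedPassword DockerHosts
# scaling out is then just adding the new host to the group
# (the host may need a reboot to pick up its new group membership)
Add-AdGroupMember -Identity DockerHosts -Members DH2019B$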

Misleading Get- and Test-AdServiceAccount

Understanding the above problem with Set-AdServiceAccount was made much harder by my misunderstanding of how Get-AdServiceAccount and Test-AdServiceAccount work.

If you are a domain admin, Get-AdServiceAccount will always return the details of the gMSA if it exists. So it is of no use for checking whether a specific gMSA can be used on a given host.

If the gMSA was previously installed, Test-AdServiceAccount will return true regardless of whether the host account has permission to retrieve the password or not. That permission is necessary for gMSA authentication to work.

So with that in mind, neither command is fit for checking whether a specific host has the permissions it needs to use a gMSA. I was not able to find anything better than attempting to install the account again with Install-AdServiceAccount. It will either install it again, or display an error message indicating that the computer has no permission to retrieve the password.
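A check along these lines should do the job (a sketch, using the MyService account from elsewhere in these posts):

# crude but effective: try to (re)install the gMSA and treat
# a failure as "this host cannot retrieve the password"
try {
    Install-AdServiceAccount -Identity MyService -ErrorAction Stop
    Write-Output 'This host can use the gMSA.'
}
catch {
    Write-Output "This host cannot use the gMSA: $($_.Exception.Message)"
}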

Remove-AdServiceAccount

This does not remove a previously installed gMSA from the local host. It removes the gMSA from the domain!

The SSPI context error.

If you try to use domain authentication from a service running on a Docker host which has no permission to retrieve the gMSA password, you will get a fairly generic error telling you that the SSPI context couldn’t be created.

There are scores of blog posts and MSDN documents explaining how to troubleshoot the many SSPI context errors. Not a single one I found mentions any problems with gMSAs. I learnt a lot about SSPI and how it really works, just to eventually realise that everything was fine and I had to look for problems somewhere else.

There is no localhost

When you run a standalone container, you can access it from the same host on localhost. By default a nat network is used, and it allows communication on the host’s IP. When moving to swarm mode (using docker service create, not just docker create), the ingress network is used by default, and localhost is not available. You have to use the public IP address of the Docker swarm. There are a lot of blog posts on how to define your own overlay or bridge networks. I suppose they all work on Linux, but on Windows, when using an overlay network you cannot use host IPs (so no localhost), and you cannot create bridge networks at all.

PS C:\> docker network create -d bridge bnet
Error response from daemon: could not find plugin bridge in v1 plugin registry: plugin not found

I was able to make it work when publishing the port directly on the host using --publish published=8001,target=80,mode=host instead of the shorter -p 8001:80 (the ports, obviously, may be different), but I don’t think that’s a configuration I’d be using, so… I simply gave up. I use the public IP and everything works.
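Put together, that host-mode variant would be something like this (the image is the test one from the companion post; the ports are arbitrary):

docker service create `
  --publish published=8001,target=80,mode=host `
  michalporeba/sqlgmsatest:1809nano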

Security considerations

To install a gMSA on a host which has permission to read the gMSA’s password, you don’t need any extra permissions. It appears that anybody with access to PowerShell on the host can do Install-AdServiceAccount -Identity MyService. There are no restrictions on which credential spec file can be used for which service either. This means that if somebody has access to the Docker host, they can create a new service using any gMSA to which the host itself has permissions.

gMSA name length limit

The Group Managed Service Account’s name is limited to 15 characters. Not a big deal, but it messed up a carefully agreed naming strategy, and in the end I have vowel-less service names.

New-CredentialSpec silently overwrites existing files.
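A simple guard against that (a sketch; the file location is the default one mentioned in the next post, and the names match the examples there):

# refuse to overwrite an existing credential spec file
$specPath = 'C:\ProgramData\docker\CredentialSpecs\MyService.json'
if (Test-Path $specPath) {
    throw "Credential spec already exists: $specPath"
}
New-CredentialSpec -Name MyService -AccountName MyService `
  -Domain (Get-AdDomain -Current LocalComputer)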

Active Directory, Windows Containers in Swarm Mode and SQL Server

In my previous post I explained how I was able to connect from Windows containers running on Docker to a SQL Server cluster on the network using domain authentication (with gMSAs) rather than SA logins and passwords.

gMSAs in docker swarm mode

After I got the containers using Group Managed Service Accounts working on a single Docker host, I went on to try the same in swarm mode. My plan was to simply replace the docker run -d part of the command creating the container with docker service create, but it turns out that it is not that simple, especially if you don’t have a lot of experience with swarm mode. It is also worth noting that I had a lower success rate than when I was experimenting with standalone containers. I was able to make it work on all the same Windows versions (2016, 1803, 2019, 1809), but only when using Docker 18.09, and not on the 17.06 which was on the image I used for the 1803 tests.

Demo setup

Similarly to the previous post, I have tested this on a range of operating systems and Docker versions, but what I want to show here is how it worked on Windows 2019 with Docker 18.09.

[screenshot: gMSA_Docker_Service_1]

To make it a bit more exciting (and because of how Docker Swarm works), this time I will be testing the service from a web browser. To start with, it doesn’t work. That is because there is no service listening on port 8101.

[screenshot: gMSA_Docker_Service_1_web]

Creating a service

To create a service, the docker service create command is used. Compared to docker create, some parameters are different; for example, there is no --security-opt that I used in the previous post, and --credential-spec is used instead. But first things first. Let’s just create a service using the michalporeba/sqlgmsatest:1809nano image with minimal configuration and see what happens.

docker service create -p 8101:80 michalporeba/sqlgmsatest:1809nano

[screenshot: gMSA_Docker_Service_2]

The -p 8101:80 makes the service available on port 8101 using the default ingress network. No errors, the service is running, it has converged, so let’s try to connect to it!
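That first attempt, from the host itself, would look something like this in PowerShell (the api/info endpoint comes from the test image):

Invoke-WebRequest -UseBasicParsing http://localhost:8101/api/info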

[screenshot: gMSA_Docker_Service_2_WebRequest]

And here is the first surprise. localhost doesn’t work. That’s a swarm thing, and although it is possible to publish ports in host mode, that is not how I would be running it in production, so I will just open the ports and connect to the service externally using a web browser.

[screenshot: gMSA_Docker_Service_2_web]

OK, so the api/info call was successful. The service from the michalporeba/sqlgmsatest:1809nano image is running and responding. So the next task is to use it to query the TestDB database on my test instance DB.sqlgmsa.local.

[screenshot: gMSA_Docker_Service_2_web2]

Adding the gMSA

Not authorized! But who? The NT AUTHORITY\ANONYMOUS LOGON. That is because, despite the Docker host being a member of the sqlgmsa.local domain, the container running the service is not. To fix it, exactly as in the case of a standalone container, a Group Managed Service Account has to be created and installed, and a credential spec file created.

# Create gMSA
New-AdServiceAccount -Name MyService -DNSHostName sqlgmsa.local `
  -PrincipalsAllowedToRetrieveManagedPassword "Domain Controllers", "Domain Admins", "CN=DockerHosts,CN=Computers,DC=sqlgmsa,DC=local" `
  -KerberosEncryptionType AES128, AES256

# Install it
Install-AdServiceAccount -Identity MyService

# Import the module to manage Credential Specs
Import-Module .\PsModules\CredentialSpec.psm1

# And create a spec file for MyService
New-CredentialSpec -Name MyService -AccountName MyService `
-Domain (Get-AdDomain -Current LocalComputer)

[screenshot: gMSA_Docker_Service_3]

Consistency is everything, isn’t it? -Name on New-AdServiceAccount, -Identity on Install-AdServiceAccount and -AccountName on New-CredentialSpec all refer to the same thing, the gMSA name, and have to match. The -Name parameter on the New-CredentialSpec command, on the other hand, controls the name of the JSON file containing the credential spec. The filename can be anything and doesn’t need to match the account name, but I find it easier if it does. The existing credential spec files can be found in C:\ProgramData\docker\CredentialSpecs\ or by using the Get-CredentialSpec command from the CredentialSpec.psm1 module.

The next step is to use the newly created credential spec file when creating the service. The --security-opt parameter is not supported when creating a service, and --credential-spec has to be used instead.

docker service create -p 8102:80 `
  --credential-spec file://MyService.json `
  michalporeba/sqlgmsatest:1809nano

[screenshot: gMSA_Docker_Service_4]

The new service now runs on port 8102 and should use the new MyService identity. Let’s see.

[screenshot: gMSA_Docker_Service_4_web]

Almost there! Login failed for user SQLGMSA\MyService$. That’s good; it means the correct identity has been picked up, so the last thing to do is to create the login on the SQL Server.
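A minimal sketch of that last step (assuming the TestDB database from earlier; read access is an assumption about what the test query needs):

-- create a login for the gMSA (note the $ suffix on the account name)
create login [SQLGMSA\MyService$] from windows
go
use TestDB
go
-- map the login to a database user and grant it read access
create user [SQLGMSA\MyService$] for login [SQLGMSA\MyService$]
alter role db_datareader add member [SQLGMSA\MyService$]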

[screenshot: gMSA_Docker_Service_5_sql]

And now, as if by magic

[screenshot: gMSA_Docker_Service_5_web]

The test web service, written in C# using .NET Core, is hosted in a Docker container running on a Windows host, and queries a SQL Server database using domain authentication.