Semantically speaking – the weeknote

Last week was rather intense. I spent it at the Knowledge Graph Conference (KGConf) at Cornell Tech on Roosevelt Island, New York.

Around 2010 I first got interested in graphs and all things semantic. The ideas of Web 3.0 (not the blockchain kind, the semantic web), HTML5, and the Resource Description Framework (RDF) were compelling. They promised to address what was wrong with the internet and the data of the previous decades. So, enthusiastically, I started reading, learning, and experimenting. That enthusiasm lasted a couple of years, but gradually it all started to look like solutions in search of problems, a bunch of fascinating but rather impractical and unnecessary ideas.

That was then. Fast forward a decade, and I joined the Civil Service. Within a few months, the semantic web was back on my mind. When I first looked at it, the ideas and the solutions were simply too big for my professional world of small startups and digital agencies. They were solving problems which were not mine to solve. But now? I am part of an extra-large, very federated and rather anarchic organisation with an ambition to use the information it holds well, for the public good. So many data challenges around me appear to be screaming: semantic web, linked data, graph databases, RDF, and knowledge graphs. Look at my Search for a Better Search if you need a detailed example.

I have been talking about this for a year now. I have presented some ideas in several cross-government fora, including Data Connect 22. As a result, I found some interest, a few fellow enthusiasts, and more than a healthy dose of suspicion towards the ideas. And this is surprising, because a decade earlier we were the pioneers, the early adopters of this technology. Back then, in the early 2010s, we were publishing public information as linked data. What happened? Unlike me back then, the Civil Service had, and still has, the problems this technology was designed to solve.

I thought I might not be in the right circles, not connected enough, that this was only my opinion. But during the conference I discussed these topics with experts from many countries and organisations. Those who remember the early 2010s remember the British involvement and are as puzzled as I am about how it all fizzled out.

What happened? Is it connected to the Government Digital Service (GDS) revolution that started at about the same time? Or is it just a coincidence? It isn’t because the problems have disappeared or because the solutions are not worth pursuing. Last week I saw how various private and public organisations apply these technologies to get more out of their data and to collaborate better across organisational boundaries. I’m sure they would help us solve our problems and achieve our ambitions too!

So let’s do it. Let’s at least start the discussion. Unlike ten years ago, we have Digital, Data and Technology (DDaT) functions across government. We have new data professions, like my own – data architecture. So far, as a profession, we have been focusing on relational data modelling and futile attempts at data centralisation. Let’s at least explore the possibility of a decentralised (federated, not blockchained), semantic data architecture for the government and the public good. It is more complex than what we are used to. But I’m sure it will be worth it. I know we can do it.

If you agree or disagree, let’s talk. If you are or were involved in things like that in the Civil Service, please reach out. A discussion is needed. Contact me here, on Twitter, LinkedIn, wherever.

Back to the weeknote. The Knowledge Graph Conference I attended was much better than I expected, and not only because of the content – which was good. It was a much-needed opportunity to reflect on my ideas about open, linked data in the public domain. I met experts, practitioners, and enthusiasts of the technology from around the world. I learnt about successes but also about the challenges ahead. I will likely use these contacts to help us all progress the applications of structured graph knowledge in the public domain.

Note 7

I don’t find it easy to keep up with weeknotes. There is too much stuff I’d like to write about and too little time to do it. I thought I’d do month-notes, but I missed that deadline too, so let’s call it a note, perhaps note number seven: the highlights.

Over the last couple of months, I have attended a few conferences and a few meetups. The biggest one was SQLBits, where I delivered three sessions – nothing technical, just some Welsh lessons, as the event was in Wales this year. Nothing big, but enough to remind me how much fun it is, so I have just submitted a session for DATA:Scotland in September.

The SQLBits event was full of great sessions. That was not a surprise. What was unexpected was the quality of the non-technical sessions I saw. It was my first time at SQLBits since I stopped working with Microsoft data technologies, so I had no reason to chase the latest technical detail. Instead, I focused on the “hallway track” and the less technical presentations. For me, it was the best SQLBits experience ever. My favourite was a session on building high-performance data teams by Richard Campbell, followed closely by a live Knee Deep in Tech podcast recording with Alexander and Heini.

Listening to those serious podcasters (Richard runs .NET Rocks) reminded me that I wanted to do something like that myself. And now I think I’ve got the right topic for it too, but I will write a separate post about it soon. For now, back to conferences and meetups.

I attended two interesting meetups in London (I’ve been there too often recently): Neo4j and OpenUK. The latter was especially interesting, with a presentation by Andy Piper on open-source contributions. There were also plenty of like-minded open-source enthusiasts to discuss code reuse in the Civil Service with. Something interesting will come out of this chance meeting, I’m sure. And I will be there on the 23rd of May to listen and talk about building open-source communities.

The return of in-person events is very welcome. Online conferences and meetups are great, but they are missing something. With our current technology, there is no chance for spontaneous, free and fast-flowing, random, person-to-person interactions. A situation like when you overhear half a sentence between bites of pizza and think: that might be interesting! What are they talking about!? And seconds later, you are part of that conversation.

It’s a topic that has been on my mind for quite some time. It feels like we are settling into a form of “hybrid” working, which has become familiar. But is it the right way? Do the benefits really outweigh the costs? I will explore it more soon. But for now, I have signed up for a few more in-person events to keep testing the theory!

Finally, I made it to 40 last week. Having achieved my life’s ambition, I can relax now!

Weeknote 4

This week I spent my time trying to dig myself out from under a virtual pile of emails. I was trying to make a good start to reach new levels of productivity, but with so many things in progress, that was never going to happen. Maybe next week?

And so, I didn’t get to do anything interesting with data this week. I didn’t build any digital services either. But not all was lost. I got to use digital public services – a truly demoralising experience! I managed to do some coding with Woody Zuill and Kevin Meadows too.

We need to make public digital services better. 

That is probably true of many of them, but especially in healthcare. We talk a lot about accessibility and user-centric design. In healthcare, they talk about delivering patient-centric care. So how is it possible to create such a monstrosity at the intersection of the two worlds?

Not only are the services not very accessible, but they are also used in strange ways. Do you want to register for a GP appointment? You can only do it online, and only between 8:00 and 8:35am. You get to the place with an appointment, but you need to schedule another one. No, it is not possible to do it at the reception. It has to be done online. But there is no mobile reception inside, so you must go outside in the rain to book an appointment. Who comes up with these schemes?

Eventually, I ended up in a phlebotomy waiting room. There were some twenty chairs and just three of us patients. The receptionist went on a rant about patients not turning up on time or even on the right day. It is not his fault, he says, the patients don’t read or cannot understand the emails. Fair enough. It might not be his fault. But who is responsible for a failed communication which appears to be working for only one side of the dialogue?

Human-Computer Interactions with Acute Disability

The problem is even more serious. In services like these, related to healthcare, standard accessibility just isn’t cutting it. We design and test with users who are long-term sufferers of various conditions. They are used to their conditions, have adapted, and know the conventions we can use to help them use our services.

But people who need medical help often experience a sudden onset of their ailment. They can be distressed, distracted, not at their best. I regularly see people in such situations and see how modern, accessible technology fails us. Last week I was put through the mill myself and nearly lost all my faith in both digital services and healthcare at the same time.

And I don’t think it is only me. I heard somebody make a very pertinent joke this week on the radio. The Government Communications Headquarters, the GCHQ, used to recruit code breakers by posting cryptic puzzles in the newspapers. Now they don’t have to. They simply offer jobs to people who manage to get a GP appointment. 

It shouldn’t be like that. We must be able to do something!

Good digital services matter

Digital services and products can be a matter of life and death. Why is it that software with a distinctly 1990s feel is used in emergency services? If a healthcare professional used 1990s medicine, they would probably lose their professional registration. But if they use 1990s IT systems to deliver patient care, that’s OK. 

Let’s do something about it!

Rant Over. Promise.

Let’s talk about Software Teaming

It used to be known as Mob Programming: a practice where a group of people develops software using a single computer. I had tried it before, and I really like pair programming, so when an opportunity presented itself, I couldn’t say no. On Friday, I joined Woody Zuill, Kevin Meadows and two others for a 90-minute session of Software Teaming. If you haven’t heard those names before, check out their book Software Teaming: A Mob Programming, Whole-Team Approach.

Besides Woody, Kevin and me, there were two more people on the call: one person who had never programmed before and somebody who had done a little, but not in the language we had chosen. Nevertheless, by the end of the session, we were programming effectively together. Everybody had a go at coding. Everybody learnt something.

I have learnt a few things myself. First of all, there is cyber-dojo.org. Don’t know it? Check it out! Second, I have never done pair programming right – not in the strict sense promoted by Woody and Kevin. They say that for this to make sense, the idea has to cross from one head to another before it enters the machine. And so the driver does only what they are told by the navigator. Watching somebody code and offering an occasional comment or suggestion is not enough. The navigator designs the solution, and the driver executes it. They compare it to being a passenger and a driver in a cab. It really works. I want to do it more.

A year review (2022)

For me, relaxing is the most stressful thing. Time passes, and nothing gets done. With an attitude like that, you probably wouldn’t be surprised to learn that I’m interested in personal productivity. Well… I have some data inclinations too, and some charts to show you.

The chart above shows my first full year (January to December) as a civil servant. I started in the bottom left corner on week one, with some tasks in my backlog, and progressed steadily through the year to the top right corner in week 51. Of course, there are some variations from week to week, but the thicker lines show that things are surprisingly constant on a (six-week) average.

It also shows me there is a limit to how much I can do per week. In weeks 1-13 and 19-26, I managed ten tasks a week. In April, week 14, I tried harder and got up to seventeen, but that fizzled out by mid-May, week 19.

Looking at this graph before my first job anniversary in July, I realised the blue and green lines were pretty straight and divergent. It meant I was always putting more on my backlog than I could accomplish, so that was not sustainable. So I decided to try harder. Once more, I reached deep into my bag of personal productivity hacks to up my performance.

You can see that from July I managed sixteen tasks per week, 160% of my earlier norm, but still, it wasn’t enough to match the demand, and so in mid-August I started being more selective in the commitments I took on. Finally, the gap between the blue and green lines started to narrow, and things were going well until December!

Why is it so hard?

Ultimately, my decision to commit to less work made a real difference to the length of my backlog. It was obvious, and yet it was the last thing I tried. Why?

I find it difficult to say no to opportunities to help others or to collaborate. Believe it or not (and I know with my occasionally blunt demeanour, it might take a leap of faith), I get out of bed to make a difference. And the only way to do it at scale is by collaborating and helping each other. And so I say “yes, I will help you” more often than I should.

But despite that, second on the list of the most stressful things people say to me, just after “relax”, is starting a conversation with “I know you are very busy, but…”. How many opportunities to collaborate have I lost because people think I’m too busy to talk? Too busy, or worse, too important, to see whether their problem is something I can help with or not? Every communication that starts with “I know you are very busy” reminds me of all the forever-lost opportunities to collaborate. And so I respond, as calmly as I can, that I will always find the time for a chat, even if I cannot help.

I’ll have to do something about it next year, but first, let’s go back to the data to see what else we can learn. My ambition for 2023 is to bend that green line up a bit more and find a way to do even more.

More Charts

The view above – weekly snapshots – shows more clearly that last summer’s effort to reduce the number of new commitments was working well until the beginning of December. What happened there? We had our internal conference. I had three days of in-person meetings that inspired me to take on ridiculous amounts of new things.

What the two views we looked at so far don’t show well is what really matters: how many things I can do a week, and how many things I start but cannot complete – the Waiting category. Those are the things I start, but hit a wall. I have to meet with somebody, get somebody’s opinion, get approval and so on. Every item like that means context switching and so lost effort and time.

Now, after excluding the things that are in the backlog, the picture is more compelling. We can clearly see the two times when I decided to do more. In April (week 14), it lasted a few weeks.

During Weeks 22-23, I was off and then, after coming back, had a lot of requests to do things which I tried to start as quickly as possible. This resulted in a lot of tasks in progress and a lot of waiting, but not much was done. That is what led to frustration and yet another attempt to increase productivity in June – weeks 27 and 28.

I worked hard on reducing the work in progress and the tasks in waiting, and still maintained a reasonably constant level of completed tasks. But then, in the fourth quarter, I tried to do even more, and the only thing I achieved was the waiting queue going past sixty.

Sixty tasks I have started but cannot complete. I’m waiting for somebody else, but still, they are my responsibility. They still take my time, even if just to check if I can progress them or not!

It’s an unbelievable waste of time.

My plan for 2023? To change it and do even more with less effort. Achieve more with less strain.

But first, a couple of days in the mountains to… relax! Scary! Although I’m sure I’ll find something to do so as not to relax properly. That would be too stressful. I’ll do some reading, some thinking, and hopefully something to find the much-needed new energy for civil servicing in 2023 at never-before-seen levels of productivity.

Happy New Year!

T-SQL Tuesday #114 – The SQL Puzzle Party

There were times when I tried to look for puzzles to solve, especially the T-SQL puzzles (what happened to the T-SQL Challenge site?). Now I don’t. Life is challenging as it is, especially if you work with SQL Server and really try to understand what’s going on.

So rather than coming up with some contrived problem for you to solve as part of this edition of T-SQL Tuesday (thank you, Matthew McGiffen), I will share something that surprised me only last week. And yes, I have solved it already, and I will be blogging more about it soon, so, no, there is no big prize for solving my production issue here 😉

Here is the scenario

There is a table that stores millions of records. It has a primary key, a date when a record was processed, a bit column indicating whether it was processed or not, and some text fields that are used for something, but in our example they are just data that takes up space on pages.

There is also an application which uses NHibernate to generate a T-SQL query that retrieves one record (just one at a time) from that table where IsProcessed = 0. There are 10-50 records like that at peak times, in a table which holds tens of millions of records, so making it very, very fast should be easy with a tiny little covering filtered index. Well… it turns out SQL Server prefers to scan the clustered index instead.

Have a look

The challenge setup

use tempdb
go
drop table if exists dbo.LongProcessingTable
if not exists(select 1 from sys.tables where name = 'LongProcessingTable')
create table LongProcessingTable (
     Id int not null identity primary key
    ,ProcessedOn datetime2 null
    ,IsProcessed bit null
    ,SomeData nvarchar(1024) not null
)

-- just some text to fill up the space on pages
declare @sometext nvarchar(1024) = (
    select string_agg(convert(char(1), name), '')
    from sys.all_objects
)

-- create just 100k records with some random date values
-- at this time all records are marked as processed
insert into dbo.LongProcessingTable(ProcessedOn, IsProcessed, SomeData)
select top(100000)
     dateadd(second, -abs(checksum(a.object_id, b.object_id)%10000), getdate())
    ,1
    ,@sometext
from sys.all_objects a
cross join sys.all_objects b

-- now mark 10 rows as not processed
update d set IsProcessed = 0, ProcessedOn = null
from (
    select top (10) *
    from dbo.LongProcessingTable d
    order by ProcessedOn desc
) d

Now the query:

declare @IsProcessed bit = 0

select top(1) Id, SomeData
from dbo.LongProcessingTable
where IsProcessed = @IsProcessed

The above query comes from the application and cannot be changed. It is what it is. And to help you start, here is the index I thought would work, but doesn’t.

create index IX_LongProcessingTable_NotProcessedYet
on dbo.LongProcessingTable(IsProcessed) include (SomeData)
where IsProcessed = 0

The index gets ignored and the server goes for the table scan instead.
Of course, somebody discovered this earlier. I wasn’t all that surprised that Erik Darling had blogged about it – in 2015, 2017 and 2018, it turns out – and he even says ‘IT IS KNOWN’… well, it wasn’t known to me. But even now, with that knowledge, I still cannot change the query, so what can I do? How do I make this query more efficient without changing it, and without creating a covering index on the whole table, which can hold hundreds of GB of data, just to get one row?
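
In case it helps, here is my current reading of the cause (an assumption on my part, not the promised follow-up): because @IsProcessed is a variable, the optimizer has to build a plan that is safe for any value, including IsProcessed = 1, so it cannot commit to the filtered index. The two variations below are only there to confirm that diagnosis – both change the query, which is exactly what I cannot do in production.

-- Illustration only: both variants change the query, which the application does not allow.

-- 1) With a literal predicate the optimizer can prove the query matches
--    the filtered index definition, so the filtered index gets used.
select top(1) Id, SomeData
from dbo.LongProcessingTable
where IsProcessed = 0

-- 2) With OPTION (RECOMPILE) the variable's value is known at compile time,
--    so the filtered index is back in play as well.
declare @IsProcessed bit = 0

select top(1) Id, SomeData
from dbo.LongProcessingTable
where IsProcessed = @IsProcessed
option (recompile)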

If you are still reading… well, enjoy the challenge. I will follow up with a few comments and a couple of my attempts at solving the problem later this month (hopefully).