When good is the enemy of the perfect

I’ve been working with Microsoft Teams Premium for a couple of weeks and it has got me worried. Part of the Teams Premium bundle is the ability to summarise a meeting recording, giving you an AI-generated set of collapsible bullet points of the key things that were discussed. It’s neat, clever, and…a little underwhelming, particularly when compared to the promise of the Microsoft marketing videos.

From the meetings that I have recorded, the ‘AI notes’ seem to lose a certain essence of some parts of the discussion, and the suggested follow-up tasks often don’t make sense. Aside from the often hilarious Teams chat response suggestions that have been in the product for a while now[1], Teams Premium is my first encounter with the new large language model-based AI products from Microsoft. What concerns me is not how poor the product is today, but how close to perfect it is going to get over time.

I think the most worrisome aspect of AI systems in the short term is that we will give them too much autonomy without being fully aware of their limitations and vulnerabilities.
(Melanie Mitchell, Artificial Intelligence: A Guide for Thinking Humans)

I see the output of these AI large language models on a spectrum. At one end, the tools may spit out complete and utter garbage, perhaps not even words. Their uselessness would be obvious to everyone who uses them. At the other end, the AI could output a perfect response (or summary of a meeting, in the case of Teams Premium) every single time. The problem lies in the middle, and gets worse the closer the system is to being consistently perfect: good but still checked (green), good enough that we mostly stop checking (yellow), and so good that we never check at all (red).

Right now, I think that the quality of the Teams Premium ‘AI notes’ feature sits somewhere in the green area. It’s good and useful a lot of the time. For example, I can scan the notes and check whether a topic was mentioned. If that topic isn’t in the notes, it doesn’t mean that it wasn’t discussed; I’d have to watch the video back to check that the AI didn’t miss it. If the meeting was very important and needed to be formally minuted, I would still rely on the video.

As the product improves over time, we’ll move out of the green zone and into the yellow. At this point, I may consciously or subconsciously decide to stop routinely verifying the AI-generated output. It’s good enough, most of the time. Again, if a meeting is really important, I may watch the video.

The real danger comes in the red zone. Here, the AI output is superb most of the time, so much so that I never check it. I rely on the summary even for my important meeting minutes. But it’s not quite at the ‘completely perfect’ end of the spectrum. Occasionally it will trip up. Something will get missed — maybe one meeting in a hundred — and perhaps that something is critical to the conversation we’ve had. Perhaps it will attribute a comment to the wrong person, or miss the nuance of a discussion which was important to get exactly right. We may only find out that the AI produced flawed output for this meeting when an incident arises down the line.

This isn’t a concern about AI getting ‘too good’ and becoming ‘sentient’ in a general sense.[2] It’s more that we have decided to stop thinking, that we have handed control of some part of our workflow over to the AI and no longer verify its output. For me personally, one bad output every 100 recorded meetings might be tolerable. But if we scale this across a large organisation where hundreds or thousands of meetings take place every day, we’re going to have problems.

Baldur Bjarnason explores this in his book The Intelligence Illusion:

I mentioned two of [the flaws] before, automation and anchoring biases. We, as human beings, have a strong tendency to trust machines over our own judgement. This kills people, as it’s been a major problem in aviation. Anchoring bias comes from our tendency to let the initial perceptions, thoughts, and ideas set the context for everything that follows. AI adds a third issue: anthropomorphism. Even the smartest people you know will fall for this effect as large language models are incredibly convincing. These biases combined lead people to feel even more confident in the AI’s work and believe that it’s done a better job than it has.

We’re using the AI tools for cognitive assistance. This means that we are specifically using them to think less. In every other industry this dynamic inevitably triggers our automation bias and compromises our judgement of the work done by the tools. We use the assistant to think less, so we do.

These models are incredibly fluent and—as we saw at the start of this book—are consistently presented by their vendors as near-AGI. This triggers our instinct towards anthropomorphism, making us feel like we have a fully human-level intelligence assisting us, creating an intelligence illusion that again hinders our ability to properly assess the work it’s doing for us.

AI-generated meeting summaries in Teams Premium are a useful starting point for thinking about this technology. There’s no user input beyond hitting the ‘record’ button during a meeting, and everyone with a Teams Premium licence gets access to exactly the same summary. The scope for getting something wrong is limited to how good or bad the summary of the meeting is. So far, so harmless. But Microsoft 365 Copilot will be arriving soon, vastly expanding the problem space with its interactive, prompt-driven approach. Where on the ‘useless to perfect’ spectrum will it land? What if just being ‘very good’ isn’t good enough?


  1. Superb Teams suggested chat responses.

  2. The more I learn, the less I’m worried about General AI being a problem any time soon. 

Get off of my cloud

David Heinemeier Hansson’s blog post on his company’s move from the cloud back to their own servers is an interesting read:

I also think that there are probably some companies that have such high variance in their loads that renting makes sense. If you only need a plough thrice a year, it doesn’t make much sense keeping it in the barn unused for the remaining 363 days.

Perhaps this will raise the profile of more nuanced conversations about choosing the right application and organisational architecture, beyond a binary choice of cloud versus on-premises for everything.

One of the best articulations I’ve seen of how a cloud environment can work for an application is Troy Hunt’s explanation from 2018 of how he optimised haveibeenpwned.com using Cloudflare Workers and Azure Functions:

It’s costing me 2.6c per day to support 141M monthly queries of 517M records.

Just taking an application that is running on your own hardware and dumping it into the cloud is highly unlikely to yield any cost benefit unless it is re-architected and optimised to take advantage of the features of the cloud platform.

Hunt wrote a follow-up post last year, outlining how he suddenly received an eye-watering Azure bill after breaching a file size limit for a Cloudflare cache on his service. So even when you’re optimised, you need to be highly aware of the limits that apply to your setup, and to have early-warning alarms in place to catch anything that goes wrong.

Cost is a major factor in determining a cloud versus on-premises architecture but there are other considerations too. Finding the right people with the right skillsets to run your own infrastructure is not trivial.

Hopefully the post won’t signal the start of a cyclical movement to and from the cloud, like the others that we already have in ‘enterprise’ technology — outsourcing/insourcing, offshoring/onshoring, centralisation/decentralisation etc.

Given Hansson’s company runs a product called HEY, since reading his post I haven’t been able to get this earworm out of my head. I suspect that this will be on my ‘internal hi-fi’ quite a lot over the next few years.

In a world of AI, PSTN is dead

I’ve been thinking about this article all day. A woman received a call from her distraught, crying daughter, who said she had been kidnapped. The kidnappers then told the woman the terrifying things they would do to her daughter, unless she paid them a ransom. She called her daughter’s phone and found that she was actually fine — the ‘kidnappers’ had generated her voice using AI technology. It’s absolutely terrifying.

I think that calls over the public switched telephone network (PSTN) are now effectively dead. The traditional telephone system runs on a protocol where you can call any number in the world and start a conversation without ever having to verify who you are. In a world of AI, this is going to be unsustainable. I am guessing that rogue calls like the one in the article will proliferate, much as phishing and spam emails did. However, unlike with email, there are no audio equivalents of the telltale clues that let you reason that a message is fake.

Perhaps in the future we’ll need to use another protocol, one that uses authentication. Before picking up an incoming call, there should be a simple indicator showing that the call is authentic. We already have caller ID, but this is trivial to spoof. The solution needs to be so simple that everyone can understand it, and it needs to work seamlessly without anyone needing to jump through hoops to enable it.

We already authenticate to our smartphones and other devices via our biometric data — our faces or fingerprints — or by entering a passcode. Perhaps whatever system we use for voice calling can use this to ‘prove’ that a call is coming from an authenticated source. When a call comes in, it could have a red/amber/green rating along these lines (a toy sketch of the logic follows the list):

  • 🟢 Green: The caller is in your contacts and they have authenticated themselves using their device biometrics or passphrase within the last n minutes.
  • 🟡 Amber: The caller has authenticated themselves with their device in the last n minutes, but is not in your contacts.
  • 🔴 Red: All bets are off, proceed with caution.
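
Purely as a thought experiment, the rating could be computed from just two facts about the caller. Everything in this sketch (the data structure, the field names and the threshold n) is invented to illustrate the idea:

    from dataclasses import dataclass

    @dataclass
    class IncomingCall:
        caller_id: str
        device_authenticated: bool   # unlocked via biometrics or passcode
        minutes_since_auth: float    # time elapsed since that authentication

    def rate_call(call: IncomingCall, contacts: set, n: float = 10.0) -> str:
        # Green: recently authenticated and a known contact.
        # Amber: recently authenticated but not in your contacts.
        # Red: no recent authentication; all bets are off.
        recent = call.device_authenticated and call.minutes_since_auth <= n
        if recent and call.caller_id in contacts:
            return "green"
        if recent:
            return "amber"
        return "red"

    # A known contact who unlocked their phone two minutes ago rates green.
    print(rate_call(IncomingCall("alice", True, 2.0), {"alice"}))

The hard part, of course, is everything the sketch waves away: how the device attests to the network that the authentication really happened.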

Taking it further, we would possibly need to have something in place that allows people to verify themselves. So, not only has Caller X’s device validated that they have unlocked it recently, but Caller X is definitely the person they claim to be. On my Mastodon profile, I can provide evidence that I am who I say I am by adding a link to my website and then adding code to my site that Mastodon can read. I’ve then ‘proved’ that this is my site because I was able to edit the code there. It looks like this:

My Mastodon profile on indieweb.social
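
Under the hood, this is Mastodon’s rel="me" link verification: my profile links to my site, and my site links back to the profile with a rel="me" attribute. A rough sketch of the check a client might perform, with placeholder URLs:

    from urllib.request import urlopen
    from html.parser import HTMLParser

    class RelMeFinder(HTMLParser):
        # Collects the href of every <a> or <link> tag carrying rel="me".
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            a = dict(attrs)
            if tag in ("a", "link") and "me" in (a.get("rel") or "").split():
                self.links.append(a.get("href"))

    def site_links_back(site_url, profile_url):
        # Fetch the claimed website and confirm that it links back to
        # the Mastodon profile with rel="me".
        html = urlopen(site_url).read().decode("utf-8", "replace")
        finder = RelMeFinder()
        finder.feed(html)
        return profile_url in finder.links

    print(site_links_back("https://example.com", "https://indieweb.social/@example"))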

I’m not sure a mechanism like this would go far enough to ‘prove’ I am who I claim to be when I call a new number for the first time, but perhaps there are other ways of doing something similar. Ideally with a solution that wouldn’t rely on a centralised verification/validation service.

It’s been interesting to read that Adobe, Arm, the BBC, Intel, Microsoft, Sony and Truepic are collaborating as the Coalition for Content Provenance and Authenticity. They are focused on providing evidence of the provenance of content, i.e. being able to trace back to the source of an image, a video or a document as well as what has been done to it since it was created. I don’t think this helps with real-time generated content such as fake voice and video.

I’ve heard people talk about the fact that authentication via voice, such as that used in telephone banking, is now completely broken because of AI. But I think the problem goes much deeper than this. Without authentication, how will you ever know who you are really speaking to?

Side-effects of technological change

This fascinating question was posed in a school governor webinar that I attended today, as we covered the services offered by the NSPCC:

With mobile phones being far more popular than landline phones, are children finding it more difficult to access Childline? What are their options?

I’d never thought about this before. It’s an example of a side-effect of technological change that I hadn’t considered. Like many other people I know, we ditched our home ‘land line’ phone some time ago. Fortunately, as one avenue of communication has closed down, others have opened up; children are now able to contact the service via other methods such as live chat and email.

When I started work almost 25 years ago, the big London train stations all used split-flap displays for their departure boards, like this one:

At some point they were replaced with digital dot matrix displays, which themselves have recently been superseded (at Euston at least) by new full-colour high-definition dashboards. A side effect of getting rid of the split-flap displays is that there is no longer any noise as they update, forcing people to keep looking at them as opposed to doing something else whilst listening for the audio cues.

I wouldn’t want to give up the benefits of new technologies — cheap mobile phone plans and information-rich dashboards in these two cases — but it’s interesting to see these side effects and note that not all of the progress is completely positive.

📚 New Dark Age: Technology and the End of the Future

For some reason my rate of reading has been very slow this year. This may explain the feeling I had when I finished New Dark Age: Technology and the End of the Future by James Bridle — that it hadn’t made a big impression on me. Looking back over the 150 highlights I made as I read the book, I think I am mistaken. Bridle covers a lot of ground, and I can see in the highlights the origins of ideas that have been buzzing around in my head over the past couple of months.

The fascinating premise of the book is that the more technology seeps into our world, the less we understand about it — we enter a collective ‘dark age’ of understanding. This is a paradox given that we now have greater access to knowledge than at any time in the past. It made me think of something else I read or heard — perhaps from Alain de Botton — that modern knowledge work is now largely invisible. You can stand in the middle of an office full of people and not be able to simply see or understand what everyone is doing. This wasn’t true back in the days when computers were human. Scaling this notion up from the level of a single office to our whole society, the premise still holds true.

It was fascinating to read about the SSEC, a working computer that went on show in the window of premises next to IBM’s headquarters in Manhattan. It’s a perfect metaphor for us not being able to see what the technology is doing:

[…] the IBM Selective Sequence Electronic Calculator (SSEC), installed in New York in 1948, refused such easy reading. It was called a calculator because in 1948 computers were still people, and the president of IBM, Thomas J. Watson, wanted to reassure the public that his products were not designed to replace them. […] The SSEC was installed in full view of the public inside a former ladies’ shoe shop next to IBM’s offices on East Fifty-Seventh Street, behind thick plate glass. […] To the crowds pressed up against the glass, even with the columns in place, the SSEC radiated a sleek, modern appearance. It took its aesthetic cues from the Harvard Mark I, which was designed by Norman Bel Geddes, the architect of the celebrated Futurama exhibit at the 1939 New York World’s Fair. It was housed in the first computer room to utilise a raised floor, now standard in data centres, to hide unsightly cabling from its audience […] after the first couple of weeks, the machine was largely taken up by top secret calculations for a programme called Hippo, devised by John von Neumann’s team at Los Alamos to simulate the first hydrogen bomb. Programming Hippo took almost a year, and when it was ready it was run continuously on the SSEC, twenty-four hours a day, seven days a week, for several months. The result of the calculations was at least three full simulations of a hydrogen bomb explosion: calculations carried out in full view of the public, in a shopfront in New York City, without anyone on the street being even slightly aware of what was going on.

Bridle asserts that we have mistaken the collection of masses of data for increased information and knowledge, but this is misplaced. The more data we have, the harder it is to make sense of it:

And so we find ourselves today connected to vast repositories of knowledge, and yet we have not learned to think. In fact, the opposite is true: that which was intended to enlighten the world in practice darkens it. The abundance of information and the plurality of worldviews now accessible to us through the internet are not producing a coherent consensus reality, but one riven by fundamentalist insistence on simplistic narratives, conspiracy theories, and post-factual politics. It is on this contradiction that the idea of a new dark age turns: an age in which the value we have placed upon knowledge is destroyed by the abundance of that profitable commodity, and in which we look about ourselves in search of new ways to understand the world.

With the rapid deployment of large language models and other types of artificial intelligence, this issue is probably going to get worse. People are working on trying to understand why generative AI works as it does; as I learned recently, the history of AI contains a substantial amount of trial and error.

It was also shocking to me to read that the mass surveillance that came to light through the Edward Snowden revelations a decade ago has been collectively shrugged off and continues to this day:

Ultimately, the public appetite for confronting the insane, insatiable demands of the intelligence agencies was never there and, having briefly surfaced in 2013, has fallen off, wearied by the drip-drip of revelation and the sheer existential horror of it all. We never really wanted to know what was in those secret rooms, those windowless buildings in the centre of the city, because the answer was always going to be bad. Much like climate change, mass surveillance has proved to be too vast and destabilising an idea for society to really get its head around.

And this is despite there being evidence that this kind of mass surveillance doesn’t work very well:

Studies have repeatedly shown that mass surveillance generates little to no useful information for counterterrorism offices. In 2013, the President’s Review Group on Intelligence and Communications Technologies declared mass surveillance ‘not essential to preventing attacks’, finding that most leads were generated by traditional investigative techniques such as informants and reports of suspicious activities.

I think that people don’t understand, or don’t care, enough about surveillance. When I tell people that I have Siri turned off on my Apple devices, that I won’t have an Amazon Alexa or Google Home ‘smart speaker’ in my house, and wouldn’t install a Ring doorbell, I sound like a tin-foil-hat-wearing crazy person. But I’m really not keen on everything I say being recorded, stored on some random servers somewhere and made available to engineers who work at the company that owns them.

I’ve also been thinking about how our 1990s-era visions of the Internet being a democratising, distributed force have not played out like that at all. The tendency of both IT services and infrastructure has been to move towards monopolies and oligopolies. And when regulations arrive, the incumbents are the beneficiaries; they are able to respond to the regulations and implement any required changes with their deep pockets. Conversely, the price of entry for new companies may then be too high. The rising tide of the proliferation of technology into everything doesn’t lift all boats equally.

Technology is in fact a key driver of inequality across many sectors. The relentless progress of automation–from supermarket checkouts to trading algorithms, factory robots to self-driving cars–increasingly threatens human employment across the board. There is no safety net for those whose skills are rendered obsolete by machines; and even those who programme the machines are not immune. As the capabilities of machines increase, more and more professions are under attack, with artificial intelligence augmenting the process. The internet itself helps shape this path to inequality, as network effects and the global availability of services produces a winner-takes-all marketplace, from social networks and search engines to grocery stores and taxi companies. The complaint of the Right against communism–that we’d all have to buy our goods from a single state supplier–has been supplanted by the necessity of buying everything from Amazon. And one of the keys to this augmented inequality is the opacity of technological systems themselves.

It’s a fascinating read. I was already some way through the book before realising that there is an updated edition available. I haven’t been able to find out what has changed with this new version, but I am sure it will only have enhanced what is already a very good book.

SomaFM on Sonos integration — beta

Great email from SomaFM today on setting up their radio stations as a Sonos service. I’ve never found anything better to work to than their Groove Salad station and have been tuning in — on and off — for over twenty years.

To get set up:

In the Sonos mobile app, look at “about my system” under Setting->System. Make a note of this.

Now, add a custom service by opening a web browser to http://[your sonos IP]:1400/customsd.htm

Then fill in the form with the following:

SID: 255 (or any other number in range 1-253 if you’ve added other custom integrations before)
Service Name: SomaFM Beta
Service Endpoint URL: https://sonos.somafm.com/
(make sure it starts with https:// or it won’t work)
Polling Interval: 10 seconds
Authentication SOAP policy: Anonymous

Click the ‘Submit’ button. You’ll get acknowledgement that the custom integration was added.

Now, in your Sonos app, browse the list of available content providers, and ‘SomaFM Beta’ should appear. Add it, and try to play a SomaFM channel. It should start playing without any of the annoying TuneIn ads.

It would be even better if the songs scrobbled to Last.FM as they played, but I haven’t seen this on any Sonos radio service so far.

Trying to understand how ChatGPT works

I finally got around to reading the Stephen Wolfram essay on What Is ChatGPT Doing … and Why Does It Work? Despite being written in relatively simple terms, the article still pushed the boundaries of my comprehension. Parts of it landed on my brain like an impressionist painting.

Things that stuck out for me:

  • In order to improve the output, a deliberate injection of randomness is required, controlled by a parameter called ‘temperature’, which means that ‘lower-probability’ words get added as text is generated. Without this, the output seems to be “flatter”, “less interesting” and doesn’t “show any creativity”. (A toy sketch of temperature sampling appears at the end of this list.)
  • Neural networks are better at more complex problems than at simple ones. Doing arithmetic via a neural network-based AI is very difficult as there is no sequence of operations as you would find in a traditional procedural computer program. Humans can do lots of complicated tasks, but we use computers for calculations because they are better at doing this type of work than we are. Now that plugins are available for ChatGPT, it can itself ‘use a computer’ in a similar way that we do, offloading this type of traditional computational work.
  • Many times, Wolfram says something along the lines of “we don’t know why this works, it just does”. The whole field of AI using neural networks seems to be trial and error, as the models are too complex for us to fathom and reason about.

Particularly over the past decade, there’ve been many advances in the art of training neural nets. And, yes, it is basically an art. Sometimes—especially in retrospect—one can see at least a glimmer of a “scientific explanation” for something that’s being done. But mostly things have been discovered by trial and error, adding ideas and tricks that have progressively built a significant lore about how to work with neural nets.

  • People do seem to be looking at the output from ChatGPT and then quickly drawing conclusions of where things are headed from a ‘general intelligence’ point of view. As Matt Ballantine puts it, this may be a kind of ‘Halo effect’, where we are projecting our hopes and fears onto the technology. However, just because it is good at one type of task — generating text — doesn’t necessarily mean that it is good at other types of tasks. From Wolfram’s essay:

But there’s something potentially confusing about all of this. In the past there were plenty of tasks—including writing essays—that we’ve assumed were somehow “fundamentally too hard” for computers. And now that we see them done by the likes of ChatGPT we tend to suddenly think that computers must have become vastly more powerful—in particular surpassing things they were already basically able to do […]

But this isn’t the right conclusion to draw. Computationally irreducible processes are still computationally irreducible, and are still fundamentally hard for computers—even if computers can readily compute their individual steps. And instead what we should conclude is that tasks—like writing essays—that we humans could do, but we didn’t think computers could do, are actually in some sense computationally easier than we thought.

  • So my last big takeaway is that — maybe — human language is much less complex than we thought it was.
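
On the first point above about randomness: a minimal sketch of what temperature sampling looks like in practice, assuming we have a vector of raw next-token scores (logits) from the model. The function name and values are mine, not Wolfram’s:

    import numpy as np

    def sample_next_token(logits, temperature=0.8):
        # Dividing the logits by the temperature sharpens the distribution
        # when temperature < 1 and flattens it when temperature > 1,
        # letting 'lower-probability' words through.
        scaled = np.asarray(logits, dtype=float) / temperature
        probs = np.exp(scaled - np.max(scaled))  # numerically stable softmax
        probs /= probs.sum()
        return np.random.choice(len(probs), p=probs)

As the temperature approaches zero, the choice collapses to the single most likely word every time, which is what produces the “flatter”, “less interesting” text; Wolfram notes that a value of around 0.8 seems to work best for essay text.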

My friction-filled information workflow

Every 18 months or so I find myself feeling that my personal information workflow is working against me. Sometimes I end up diving into an inevitably fruitless quest to find an application that could be ‘the answer to everything’.

Last year I thought that some of the friction might have been coming from where I am able to access each application that I use. In my personal life I have an iPhone, an iPad and a MacBook, but at work I use a Windows laptop. I always prefer web applications as they can, in theory, be accessed from anywhere. However, it’s difficult to find web apps that have all of the features that I want.

My whiteboard from December 2021 trying to work all of this out.

Mapping out each of the applications was useful; it made me realise that I could move my old documents and notes archive in Evernote over to OneNote, saving money on a subscription. After wrestling with the migration over a few days, that was that. Things got busy and I didn’t look at my personal workflow again. Until now.

After getting ‘the itch’ again, this time I’ve tried to map out exactly what my current personal workflow looks like, regardless of where the applications are accessible. Here is the resulting mess:

My workflow, such as it is, today.

I haven’t decided where to go from here. What I do know is that I need to ponder this for a bit before making any changes. Experience tells me that the problems I have (or feel that I have) are less about the applications and more about the purposeful habits that I need to form.

Some disorganised thoughts:

  • There is still definitely an issue with where I can access each of the components from. Every time I need to switch devices, there is friction.
  • Apps that are super secure — i.e. those that encrypt data locally before sending it to the application’s cloud storage — do exist, but at the moment they feel like using a cheese grater to shave your legs. Yes, I could use Standard Notes everywhere, but the friction of working with it is much higher than the friction of being tied to my Apple devices to use Ulysses.
  • Some of the apps are replacements for each other in theory, but not in practice.
    • Readwise Reader can keep YouTube videos I want to watch later, but they then become slightly less accessible if I am sitting down to watch them in front of a TV.
    • Readwise Reader can also accept RSS feeds, but at the moment the implementation is nowhere near as good as Feedbin. I tried it by exporting my OPML file of feed subscriptions and importing it into Reader, but when it didn’t work for me I had to painstakingly back out my RSS subscriptions one by one.
  • I’m still searching for a good way to curate my reading backlog. I estimate that I have over 1,000 ebooks[1], hundreds of physical books, hundreds of PDFs and nearly 9,000 articles saved to my ‘read later’ app. I’ve already done the maths to work out that even if I live to a ripe old age, there is not enough time left to get through all of the books that I’ve bought. As Ben Thompson has been saying: in an age of abundance, the most precious and valuable thing becomes attention. I have lists of all my books in Dynalist, but still rely on serendipity when it’s time to pick up another one to read.
  • I need to work out the best way to distinguish between the things I have to do versus the things I want to do. Not that these are absolutes; the amount of things that I absolutely, positively have to do is probably minimal. I might save a YouTube video that would be super helpful for my job right now, and want to prioritise this above others that I have saved for broader learning or entertainment. What’s the easiest way to distinguish them and be purposeful about what I pick up next?
  • Similarly, where should a list of ‘check out concept x’ tasks go? These aren’t really ‘tasks’. When is the right time to pick one of these up?
  • I’m finding that using Kanban for projects is much easier than long lists of tasks in a to-do app. At work we use Planview AgilePlace (formerly known as LeanKit) which, from what I can tell, is the most incredible Kanban tool out there; if you can imagine the swimlanes, you can probably draw them in AgilePlace. But it’s difficult to justify the cost of $20/month for a personal licence. I’m using Trello for now.
  • Needing to look at different apps to decide what to do next is a problem. But how much worse is it than using one app and changing focus between project views and task views?
  • Are date-based reminders (put the bins out, clean the dishwasher, replace the cycle helmet, stain the garden fence) a different class of tasks altogether? Are they the only things that should be put in a classic ‘to do’ tool?
  • One of the main sticking points of my current workflow is items hanging around for too long in my capture tools (Drafts and Dynalist) when they should be moved off somewhere else. Taking the time to regularly review any of these lists is also a key practice. Sometimes I haven’t decided what I want to do with a thing so it doesn’t move on anywhere, which is also a problem. I need to get more decisive the first time I capture a thing.
  • Document storage is a lost art. After I drew the diagram above, I’ve consolidated all of my cloud documents onto one platform — OneDrive — but now need to go through and file what’s there.

I know that there are no right answers. However, now that I can see it all, hopefully I can start to work out some purposeful, meaningful changes to how I manage all of this stuff. I’m going to make sure that I measure twice, cut once.


  1. The consequence of slowly building up a library as Kindle books were discounted. Aside from checking the Kindle Daily Deal page, I’ve largely stopped now. Looking back, I don’t think this was a great strategy. It seems much better to be mindful about making a few deliberate purchases, paying full price for books from authors I like.

Another view on The A.I. Dilemma

Interesting to read Nick Drage’s riposte to The A.I. Dilemma, which I watched a few weeks ago. I agree with his points about the presentation’s lack of citations and extreme interpretations, which, when scrutinised, do the subject a disservice.

The presentation is worth watching just to see what they get away with. And because the benefits and threats of AI are worth considering and adapting to, and especially because the presenters are so right in encouraging us to think about the systemic changes taking place and who is making those changes, but I’m really not sure this presentation helps anyone in that endeavor.

This isn’t to say that the topics raised are not important ones. I’m currently a third of the way through listening to a very long podcast interview between Lex Fridman and Eliezer Yudkowsky on “Dangers of AI and the End of Human Civilization”. Both of them know infinitely more about the topic than I do. It’s very philosophical, questioning whether we’d know if something had become ‘sentient’ in a world where the progress of AIs is gradual in a ‘boiling frogs’ sense. The way they talk about GPT-4 and the emergent properties of transformers in particular makes it sound like even the researchers aren’t fully sure of how these systems work. Which is interesting to me, a complete layperson in this space.

📚 Book summaries — with and without AI

This is an excellent blog post on working with ChatGPT to generate insightful book summaries. It’s long, but it covers a lot of ground in terms of what the technology does well and what it struggles with right now. Jumping to the conclusion, it seems that you get much better results if you feed the tool with your own notes first; it isn’t immediately obvious that the model doesn’t have access to (or hasn’t been trained on) the contents of a particular book.
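
The notes-first approach is straightforward to sketch with the OpenAI Python client as it existed at the time of writing. The model name, the prompt and the highlights file are all illustrative, not taken from the post:

    import openai

    openai.api_key = "YOUR_API_KEY"

    # Feed the model your own highlights, rather than asking it to
    # summarise a book it may never have been trained on.
    notes = open("book_highlights.txt").read()

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "You summarise books using only the reader's own highlights."},
            {"role": "user",
             "content": "Summarise the key themes in these highlights:\n\n" + notes},
        ],
    )
    print(response.choices[0].message.content)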

When I finish a book that I’ve enjoyed, I like to write a blog post about it. It’s this process of writing which properly embeds the book into my memory. It also gives me something that I can refer back to, which I often do. As I read, I make copious highlights — and occasionally, notes — which all go into Readwise. If the book has captured my imagination, I start writing by browsing through these highlights. Any that seem particularly important, or make or support a point that I want to make somewhere in the write-up, get copied into a draft blog post. From there I try to work out what I’m really thinking. I love this process. It takes a lot of effort, but the end result can be super satisfying.

The summary that I’ve shared most often is A Seat at The Table by Mark Schwartz, which seems to pop up in conversations at work all the time. Going back to my own blog post is a great way to refresh my memory on the key points and to continue whatever conversation I happen to be in.

My favourite write-up is Hitman by Bret Hart. I picked the book up this time last year as a holiday read. I had no idea it would have such a big impact on me, bringing back lots of childhood memories and getting me thinking about the strange ways in which the rise of the Internet has changed our world. Getting my thoughts in order after I put the book down was incredibly satisfying.

Using ChatGPT or another Large Language Model to generate a book summary for me defeats the point. The process of crafting a narrative, in my head and then on a digital page, is arguably more valuable than the output. Getting a tool to do this for me could be a shortcut to a write-up, but at the expense of me learning and growing from what I’ve read.

It’s all AI, all the time

All my feeds seem to be full of reflections on the inevitability of the changes that will soon be brought about by artificial intelligence. Having spent time thinking about this at length last week, it may be my cognitive biases kicking in, but I’m pretty sure it’s not just me noticing these posts more.

Ton Zijlstra has an interesting view on today’s corporations as ‘slow AI’, and how they are geared to take advantage of digital AI:

…‘Slow AI’ as corporations are context blind, single purpose algorithms. That single purpose being shareholder value. Jeremy Lent (in 2017) made the same point when he dubbed corporations ‘socio-paths with global reach’ and said that the fear of runaway AI was focusing on the wrong thing because “humans have already created a force that is well on its way to devouring both humanity and the earth in just the way they fear. It’s called the Corporation”. Basically our AI overlords are already here: they likely employ you. Of course existing Slow AI is best positioned to adopt its faster young, digital algorithms. It as such can be seen as the first step of the feared iterative path of run-away AI.

Daniel Miessler conceptualises Universal Business Components, a way of looking at and breaking down the knowledge work performed by white-collar staff today:

Companies like Bain, KPMG, and McKinsey will thrive in this world. They’ll send armies of smiling 22-year-olds to come in and talk about “optimizing the work that humans do”, and “making sure they’re working on the fulfilling part of their jobs”.

So, assuming you’re realizing how devastating this is going to be to jobs, which if you’re reading this you probably are—what can we do?

The answer is mostly nothing.

This is coming. Like, immediately. This, combined with SPQA architectures, is going to be the most powerful tool business leaders have ever had.

When I first heard about the open letter published in late March calling on AI labs to pause their research for six months, I immediately assumed it was a ploy by those who wanted to catch up. In some cases, it might have been — but I now feel much more inclined to take the letter and its signatories at face value.

The hallucinations of AI creators

Naomi Klein, writing in The Guardian:

The former Google CEO Eric Schmidt summed up the case when he told the Atlantic that AI’s risks were worth taking, because “If you think about the biggest problems in the world, they are all really hard – climate change, human organizations, and so forth. And so, I always want people to be smarter.”

According to this logic, the failure to “solve” big problems like climate change is due to a deficit of smarts. Never mind that smart people, heavy with PhDs and Nobel prizes, have been telling our governments for decades what needs to happen to get out of this mess: slash our emissions, leave carbon in the ground, tackle the overconsumption of the rich and the underconsumption of the poor because no energy source is free of ecological costs.

The reason this very smart counsel has been ignored is not due to a reading comprehension problem, or because we somehow need machines to do our thinking for us. It’s because doing what the climate crisis demands of us would strand trillions of dollars of fossil fuel assets, while challenging the consumption-based growth model at the heart of our interconnected economies. The climate crisis is not, in fact, a mystery or a riddle we haven’t yet solved due to insufficiently robust data sets. We know what it would take, but it’s not a quick fix – it’s a paradigm shift. Waiting for machines to spit out a more palatable and/or profitable answer is not a cure for this crisis, it’s one more symptom of it.

The whole article is an excellent read. I’d love us to move to a Star Trek-like future where everyone has what they need and the planet isn’t burning. But — being generous to the motives of AI developers and those with a financial interest in their work — there’s an avalanche of wishful thinking that the market will somehow get us there from here.

Increasingly obscured future

I recently watched this video from the Center for Humane Technology. At one point during the presentation, the presenters stop and ask everyone in the audience to join them in taking a deep breath. There is no irony. Nobody laughs. I don’t mind admitting that at that point I wanted to cry.

Back in the year 2000, I can remember exactly where I was when I read Bill Joy’s article in Wired magazine, Why the Future Doesn’t Need Us. I was in my first year of work after I graduated from university, commuting to the office on a Northern Line tube train, totally absorbed in the text. The impact of the article was massive — the issue of Wired that came out two months later contained multiple pages dedicated to emails, letters and faxes that they had received in response:

James G. Callaway, CEO, Capital Unity Network: Just read Joy’s warning in Wired – went up and kissed my kids while they were sleeping.

The essay even has its own Wikipedia page. The article has been with me ever since, and I keep coming back to it. The AI Dilemma video made me go back and read it once again.

OpenAI released ChatGPT at the end of last year. I have never known a technology to move so quickly to being the focus of everyone’s attention. It pops up in meetings, on podcasts, in town hall addresses, in webinars, in email newsletters, in the corridor. It’s everywhere. ‘ChatGPT’ has already become an anepronym for large language models (LLMs) as a whole — artificial intelligence models designed to understand and generate natural language text. As shown in the video, it is the fastest growing consumer application in history. A few months later, Microsoft announced Copilot, an integration of the OpenAI technology into the Microsoft 365 ecosystem. At work, we watched the preview video with our eyes turning into saucers and our jaws on the floor.

Every day I seem to read about new AI-powered tools. You can use plain language to develop Excel spreadsheet formulas. You can accelerate your writing and editing. The race is on to work out how we can use the technology. The feeling is that we have to do it — and have to try to do it before everybody else does — so that we can gain some competitive advantage. It is so compelling. I’m already out of breath. But something doesn’t feel right.

My dad left school at 15. But his lack of further education was made up for by his fascination with the world. His interests were infectious. As a child I used to love it when we sat down in front of the TV together, hearing what he had to say as we watched. Alongside David Attenborough documentaries on the natural world and our shared love of music through Top of The Pops, one of our favourite shows was Tomorrow’s World. It was fascinating. I have vivid memories of sitting there, finding out about compact discs and learning about how information could be sent down fibre optic cables. I was lucky to be born in the mid-1970s, at just the right time to benefit from the BBC Computer Literacy Project which sparked my interest in computers. When I left school in the mid-1990s, I couldn’t believe my luck that the Internet and World Wide Web had turned up as I was about to start my adult life. Getting online and connecting with other people blew my mind. In 1995 I turned 18 and felt I needed to take some time off before going to university. I landed on my feet with a temporary job at a telecommunications company, being paid to learn HTML and to develop one of the first intranet sites. Every day brought something new. I was in my element. Technology has always been exciting to me.

Watching The AI Dilemma gave me the complete opposite feeling to those evenings I spent watching Tomorrow’s World with my dad. As I took the deep breaths along with the presenters, I couldn’t help but think about my two teenage boys and what the world is going to look like for them. I wonder if I am becoming a luddite in my old age. I don’t know; maybe. But for the first time I do feel like an old man, with the world changing around me in ways I don’t understand, and an overwhelming desire to ask it to slow down a bit.

Perhaps it is always hard to see the bigger impact while you are in the vortex of a change. Failing to understand the consequences of our inventions while we are in the rapture of discovery and innovation seems to be a common fault of scientists and technologists; we have long been driven by the overarching desire to know that is the nature of science’s quest, not stopping to notice that the progress to newer and more powerful technologies can take on a life of its own. —[Bill Joy, Why The Future Doesn’t Need Us]

I’ve had conversations about the dangers of these new tools with colleagues and friends who work in technology. My initial assessment was that the threat posed to an organisation carries the same risks as any other way for confidential data to accidentally leak out onto the Internet. Company staff shouldn’t be copying and pasting swathes of internal text or source code into a random web tool, e.g. asking the system for improvements to what they have written, as they would effectively be giving the information away to the tool’s service provider, and potentially anyone else who uses that tool in the future. This alone is a difficult problem to solve. For example, most people do not understand that email isn’t a guaranteed safe and secure mechanism for sending sensitive data. Even if they do think about this, their need to get a thing done can outweigh any security concerns. Those of us with a ‘geek mindset’ who believe we are good at critiquing new technologies, treading carefully and pointing out the flaws are going to be completely outnumbered by those who rush in and start embracing the new tools without a care in the world.

The AI Dilemma has made me realise that I’ve not been thinking hard enough. The downside risks are much, much greater. Even if we do not think that there will soon be a super intelligent, self-learning, self-replicating machine coming after us, we are already in an era where we can no longer trust anything we see or hear. Any security that relies on voice matching should now be considered to be broken. Photographs and videos can’t be trusted. People have tools that can give them any answer, good or bad, for what they want to achieve, with no simple or easy way for a responsible company to filter the responses. We are giving children the ability to get advice from these anthropomorphised systems, without checking how the systems are guiding them. The implications for society are profound.

Joy’s article was concerned with three emerging threats — robotics, genetic engineering and nanotech. Re-reading the article in 2023, I think that ‘robotics’ is shorthand for ‘robotics and AI’.

The 21st-century technologies—genetics, nanotechnology, and robotics (GNR)—are so powerful that they can spawn whole new classes of accidents and abuses. Most dangerously, for the first time, these accidents and abuses are widely within the reach of individuals or small groups. They will not require large facilities or rare raw materials. Knowledge alone will enable the use of them. —[Bill Joy, Why The Future Doesn’t Need Us]

The video offers “3 Rules of Technology”:

  1. When you invent a new technology, you uncover a new class of responsibilities [— think about the need to have laws on ‘the right to be forgotten’ now that all of our histories can be surfaced via search engines; the need for this law was much less pronounced before we were all online]
  2. If the tech confers power, it starts a race [— look at how Microsoft, Google et al have been getting their AI chatbot products out into the world following the release of ChatGPT, without worrying too much about whether they are ready or not]
  3. If you do not coordinate, the race ends in tragedy.

It feels like the desire to be the first to harness the power and wealth from utilising these new tools is completely dominating any calls for caution.

Nearly 20 years ago, in the documentary The Day After Trinity, Freeman Dyson summarized the scientific attitudes that brought us to the nuclear precipice:

“I have felt it myself. The glitter of nuclear weapons. It is irresistible if you come to them as a scientist. To feel it’s there in your hands, to release this energy that fuels the stars, to let it do your bidding. To perform these miracles, to lift a million tons of rock into the sky. It is something that gives people an illusion of illimitable power, and it is, in some ways, responsible for all our troubles—this, what you might call technical arrogance, that overcomes people when they see what they can do with their minds.” —[Bill Joy, Why The Future Doesn’t Need Us]

Over the years, what has stuck in my mind the most from Joy’s article is how the desire to experiment and find out can override all caution (emphasis mine):

We know that in preparing this first atomic test the physicists proceeded despite a large number of possible dangers. They were initially worried, based on a calculation by Edward Teller, that an atomic explosion might set fire to the atmosphere. A revised calculation reduced the danger of destroying the world to a three-in-a-million chance. (Teller says he was later able to dismiss the prospect of atmospheric ignition entirely.) Oppenheimer, though, was sufficiently concerned about the result of Trinity that he arranged for a possible evacuation of the southwest part of the state of New Mexico. —[Bill Joy, Why The Future Doesn’t Need Us]

There is some hope. We managed to limit the proliferation of nuclear weapons to a handful of countries. But developing a nuclear weapon is a logistically difficult process. Taking powerful software and putting it out in the world — not so much.

The new Pandora’s boxes of genetics, nanotechnology, and robotics are almost open, yet we seem hardly to have noticed. Ideas can’t be put back in a box; unlike uranium or plutonium, they don’t need to be mined and refined, and they can be freely copied. Once they are out, they are out. —[Bill Joy, Why The Future Doesn’t Need Us]

The future seems increasingly obscured to me, with so much uncertainty. As the progress of these technologies accelerates, I feel less and less sure of what is just around the corner.

Implications of the Twitter layoffs on all technology departments?

I’ve been pondering: does the fact that Twitter is still functioning set expectations for business executives, who will think it’s fine to slash a technology budget and still expect core services to remain? Will they be asking “what were all these IT staff doing all day”?

We know that the service is creaking and has some major problems, but the headline is that Musk slashed the workforce from 7,500 to 1,800 and it is still chugging along months later.

Internal blogs — an organisational hack

I love this Mastodon thread from Simon Willison:

Here’s an organizational hack I’ve used a few times which I think more people should try: run your own personal engineering blog inside your organization

You can use it as a place to write about projects you are working on, share TILs about how things work internally, and occasionally informally advocate for larger changes you’d like to make

Crucially: don’t ask for permission to do this! Find some existing system you can cram it into and just start writing

Systems I’ve used for this include:

  • a Slack channel, where you post long messages, maybe using the Slack “posts” feature
  • Confluence has a blog feature which isn’t great but it’s definitely Good Enough
  • A GitHub repo within your organization works fine too, you can write content there in Markdown files

One thing to consider with this: if you want your content to live on after you leave the organization (I certainly do) it’s a good idea to pick a platform for it that’s not likely to vanish in a puff of smoke when the IT team shuts down your organizational accounts

That’s one of the things I like about Confluence, Slack and private GitHub repos for this

The most liberating thing about having a personal internal blog is that it gives you somewhere to drop informal system documentation, without making a commitment to keep it updated in the future

Unlike out-of-date official documentation there’s no harm caused by a clearly dated blog post from two years ago that describes how the system worked at that point in time

I thoroughly endorse this. I’ve been setting up blogs and internal communication channels at all of the organisations I have worked at over the past few years. We’ve recently started an internal blog for our team using a ‘community’ on Viva Engage (formerly Yammer) as it is the only ready-made platform that has reach across the whole company. At the moment we are still talking into the void, but these things take time.

Microsoft 365 used to offer a blogging facility on your ‘Delve profile’, but this was squirrelled away on the web and was tied to your account; it wouldn’t be widely visible and would disappear when you left the company. That facility now seems to have gone away. We tried using SharePoint, but it felt a bit like using a cheese grater to shave your legs — it would do the job, but not without a lot of pain.

There is so much value in working out loud, but I’ve never had much success in persuading other people to start posting their thoughts in blog form. The closest thing we have is internal Teams posts, which team members do write and which do look like blogs — they may have a title, there’s some content and then there is a thread of comments. Perhaps these are easier to write because the audience is limited to a few known colleagues. We’ll keep experimenting.

Sad Mac

My five-year-old MacBook Pro has started to play up again. I had Apple replace the battery late last year. Now when I turn it on it works for a couple of minutes before the screen goes black and the touchpad stops giving feedback, despite the battery being charged. Plugging it in brings it back after a minute or so. This is the problem that made me schedule a battery replacement in the first place.

For five years I’ve had a MacBook Pro at home and a couple of different well-specced Lenovo ThinkPads for work (X280, T14s). I have to say that I much prefer working on the ThinkPads. Windows in its current guise is excellent and rarely causes me any issues. There’s a lot to love about Apple products and the integration between devices, but I have never fallen in love with my Mac.

🎶 100,000 scrobbles

I reached the milestone of 100,000 scrobbles on Last.FM today. Every time a song plays on my hi-fi at home, or on my Spotify account when I am out and about, it gets logged on the service. I love that I have all of this data about my listening habits. It’s fascinating to see all of those song plays displayed graphically and look back on what I’ve been listening to.

From Scatter.FM. Those plays in the early hours are intriguing!

My top artists and top albums from the Last.FM site

Last.FM used to be a big deal back in the day but has faded into semi-obscurity. As my listening habits have moved back towards physical and downloaded media I’ve had to compensate by using different tools to get things logged:

  • I buy music from Bandcamp and download the lossless files which I like to listen to on my iPhone. Eavescrob does a great job of logging things played on my iPhone’s native music app (although you have to remember to open it after a listening session).
  • Discographic integrates with my physical music collection that I have catalogued in Discogs and lets me log an album play with a swipe.
  • I recently discovered Finale which has a myriad of useful features, such as listening to what’s playing around you now, Shazam-style, and logging it for you.
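
All of these tools end up submitting the same thing: a scrobble, which is just an artist, a track title and a timestamp sent to the Last.FM API. A minimal sketch using the pylast library, with placeholder credentials and an arbitrary track:

    import time
    import pylast

    network = pylast.LastFMNetwork(
        api_key="YOUR_API_KEY",
        api_secret="YOUR_API_SECRET",
        username="your_username",
        password_hash=pylast.md5("your_password"),
    )

    # Log a single play; the timestamp records when the song started.
    network.scrobble(
        artist="Boards of Canada",
        title="Dayvan Cowboy",
        timestamp=int(time.time()),
    )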

It’ll be interesting to see how quickly I log the next 100,000. Will it take another 12 years?

Thoughts on WB-40 at OpenUK

The latest episode of the excellent WB-40 podcast is filled with a series of interesting interviews from the recent OpenUK Open Source Software thought leadership event. The conversations are wide-ranging and well worth a listen.

One of the discussions noted that open source software development was resilient in the face of the COVID-19 pandemic, given that contributors worked remotely and asynchronously in the first place. This got me thinking about Automattic, the company behind WordPress. They are remote-first, with staff spread all around the world. Last year, Matt Mullenweg, Automattic’s founder, appeared on the Postlight Podcast where he enthused me with his passion for all things open source:

…WordPress is actually not the most important thing in the world to me, open source is. […] essentially a hack to get competitors to work together and sort of create a shared commons of knowledge and functionality in the case of software, that something getting bigger, it becomes better, where with most proprietary solutions, when something gets bigger, it becomes worse or becomes less aligned with its users. Because the owners of WordPress are its users. And […] the sort of survival rate of proprietary software like they’re all evolutionary dead ends, the very long term, that might be 20, 30, 40 years. But it’s all going to move to open source because that’s where all the incentives are. I think even a company like Microsoft, being now one of the largest open source contributors in the world, is astounding, and something that I think most people wouldn’t have predicted 10 or 20 years ago, but I believe it’s actually inevitable.

Another interview covered the concept of a 'software bill of materials', where applications come with a breakdown of the components that they use. Driven by the US Government's Cybersecurity and Infrastructure Security Agency (CISA), the goal is for organisations that use specific software to quickly identify where they may be exposed to security vulnerabilities in the underlying components. For open source projects that have not published this information, there are automated tools such as It-Depends that go some way towards discovering these dependencies.
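
To make the idea concrete, here's a toy sketch of how an organisation might check its exposure using an SBOM. It assumes a CycloneDX-style JSON file with a top-level `components` array; the 'known vulnerable' versions are invented for illustration, and a real check would query a vulnerability database rather than a hard-coded set:

```python
import json

# Invented (name, version) pairs, purely for illustration
KNOWN_VULNERABLE = {
    ("log4j-core", "2.14.1"),
    ("openssl", "1.0.1f"),
}

def find_exposures(sbom_path):
    """Return the (name, version) pairs in the SBOM that match known issues."""
    with open(sbom_path) as f:
        sbom = json.load(f)
    return [
        (c.get("name"), c.get("version"))
        for c in sbom.get("components", [])
        if (c.get("name"), c.get("version")) in KNOWN_VULNERABLE
    ]

for name, version in find_exposures("sbom.json"):
    print(f"Exposed via vulnerable component: {name} {version}")
```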

There is often an argument that open source software is safer than closed source, proprietary software. The idea is that open source software has more eyes on it, and therefore more people with the ability to discover, report and fix critical security defects. I wonder, though, whether this only holds once a sufficiently large community has built up around a product, leaving less popular projects more exposed to undiscovered vulnerabilities or deliberately rogue code.

Ben Higgins and Ted Driggs from ExtraHop appeared on an episode of the Risky Business podcast last year to take the 'software bill of materials' idea one step further. They advocate for a 'bill of behaviours', where software is supplied with details of what its users can expect (e.g. external and internal network destinations, and a list of ports and how they are used), published in a format that common security products can understand. I love this idea and hope it gains traction. Driggs gave an update on the podcast in February about how the initiative is going.
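
No standard format for this exists as far as I know, so the sketch below is purely hypothetical: the vendor declares the outbound connections its software should make, and a monitoring tool flags anything outside that declaration. Every name and field here is invented for illustration:

```python
# Hypothetical 'bill of behaviours' published by a software vendor:
# the set of (destination, port) pairs users should expect to see.
DECLARED_OUTBOUND = {
    ("api.example-vendor.com", 443),      # product API over HTTPS
    ("updates.example-vendor.com", 443),  # update checks
}

def is_declared(destination, port):
    """Would this outbound connection match the vendor's declaration?"""
    return (destination, port) in DECLARED_OUTBOUND

# A security product could then flag any observed traffic outside the declaration
observed = [("api.example-vendor.com", 443), ("198.51.100.7", 6667)]
for destination, port in observed:
    if not is_declared(destination, port):
        print(f"Undeclared behaviour: {destination}:{port}")
```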

BadSAM

Last night my phone scared the bejesus out of me.

At the end of July I requalified as a First Aider. As soon as I received my certificate I uploaded it to the GoodSAM app to reactivate my account. GoodSAM is a service where anyone who is suitably qualified can be alerted as a first responder if an ambulance or paramedic can’t get to a location quickly enough.

The concept is brilliant. The app is terrible.

Last night’s alert did its job. It was loud and made me jump out of my skin. Once I’d worked out what was going on, the app showed me where the incident was and asked whether I wanted to accept or reject it. I accepted, and was then shown a chat window with an address, age and sex of the person who was in trouble. As I put my shoes on and grabbed my CPR face mask I saw another person message on the chat to say that they were on their way. I guessed that they could see my messages so I said I was also heading there, but every time I wrote on the chat I got an automatic response to say that the chat wasn’t monitored.

I quickly found the house. Fortunately, the casualty was stood in her doorway, talking on the phone and already feeling better. About five minutes later another GoodSAM first responder turned up — the same person from the chat who had said that he was on his way. The casualty felt very upset about having called out a few volunteers to her house; the main job was therefore to reassure her that it was absolutely fine and that we were glad to help. What we couldn't tell her was whether anyone else would be coming to see her. We don't work for the ambulance service, and nothing in the app gave us any clue as to how to find out whether a medical professional was on their way. She called her sister to reassure her that she was feeling better, and then passed the phone to me; I felt like a wally as I was unable to tell her sister whether anyone would or wouldn't be coming.

What the app did show me was this:

GoodSAM buttons

This user interface is terrible. I pressed the button marked 'On scene'. The text then changed to 'No longer on scene', but the button didn't change appearance. I couldn't make it out — did the words indicate a status, with the app showing that I was now no longer on the scene, or did I need to press the button to tell people that I was no longer on the scene? There was no other indicator anywhere in the app to say that a message had been sent to anyone. (And what on earth does 'Show Metronom' mean? I didn't press it to find out.)

After fifteen minutes or so, two paramedics arrived. The two of us explained that we were GoodSAM first responders, which was the last interaction that we had with anybody. Baffled, we walked away from the scene complaining about how rubbish the GoodSAM app was. The other responder said that he thought up to two people get alerted for any given incident, but this seemed to be guesswork based on experience rather than something he knew for sure.

I checked the app again a while later and found something under the ‘Reports’ tab. It also showed a whole bunch of unread notifications from my previous stint as a first responder about five years ago:

Reports? Alerts? Feedback?

Swiping left on the latest alert gave me a form to complete, which included a plethora of fields whose labels didn't make any sense to me. I had to answer questions such as whether the casualty lived, died, or was still being treated. How would I know, if I had left the scene an hour ago?

Outcome. I’ve no idea which of the bottom three options was the correct one to pick.

What happens with this report? Who gets it and reviews it? I have no idea, and the app offers no clues.

I now have a chat thread called ‘Organisational messages’, which is another example of how confusing the application is. The messages that I and the other responder sent are no longer visible, but some messages from 2018 are. It’s so random.

What will now disappear from my device?

I love that this app exists and that it allows me to put my first aid skills to good use. I am sure that it has saved lives by getting skilled first aiders to casualties quickly. I just don’t understand how the interface can be so dreadful, and how it hasn’t improved in all the years that it has been available. NHS Trusts are paying to use this service, but I am not sure they are aware of how awful the experience is.

Superfans and marginal customer acquisition costs

Ben Thompson has been running some superb interviews for subscribers of his Stratechery newsletter. His recent interview with Michael Nathanson of the MoffettNathanson research group is no exception.

I loved this insight about how companies are valued when they are relatively new and growing quickly. The maths may be over-optimistic, because it underestimates the cost of acquiring marginal customers after the initial rapid rise:

Ben Thompson: …a mistake a lot of companies make is they over-index on their initial customer. The problem is when you’re watching a company, that customer wants your product really bad, they’ll jump through a lot of hoops, they’ll pay a high price to get it. Companies build lifetime value models and derive their customer acquisition costs numbers from those initial customers and then they say, “These are our unit costs”, and those unit costs don’t actually apply when you get to the marginal customer because you end up spending way more to acquire them than you thought you would have.

Michael Nathanson: That’s my question to Disney, which is, and I think you wrote this — your first 100 million subs, look at the efficiency of how you built Disney+, it was a hot knife through butter. But now to get the next 100 million subs, what are you going to do? You’re going to add sports, do entertainment, more localized content. My question to Disney is, is it better just to play the super-fan strategy where you know your fans are going to be paying a high ARPU [average revenue per user] and always be there, or do you want to, like Netflix, go broader? I don’t have an answer, but I keep asking management, “Have you done the math?”
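
To make the unit economics concrete, here's a toy model with invented numbers: the figures from early adopters look great, but the marginal customer tells a very different story:

```python
# Invented numbers, purely to illustrate the LTV/CAC point above
LIFETIME_VALUE = 300  # expected revenue per subscriber over their lifetime

cohorts = {
    "early adopters": 50,       # cheap to acquire: the 'hot knife through butter'
    "marginal customers": 250,  # the next wave costs far more to win over
}

for label, cac in cohorts.items():
    ratio = LIFETIME_VALUE / cac
    margin = LIFETIME_VALUE - cac
    print(f"{label}: LTV/CAC = {ratio:.1f}, margin per customer = ${margin}")

# early adopters: LTV/CAC = 6.0, margin per customer = $250
# marginal customers: LTV/CAC = 1.2, margin per customer = $50
```

A model calibrated on the first cohort would overstate the value of the second by a wide margin, which is exactly the trap described above.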