I think that calls over the public switched telephone network (PSTN) are now effectively dead. The traditional telephone system lets you call any number in the world and start a conversation without ever having to verify who you are. In a world of AI, this is going to be unsustainable. I am guessing that rogue calls like the one in the article will become far more common, in a similar way to the rise of phishing and spam emails. However, unlike with email, there is no audio equivalent of checking a message for tell-tale clues that it is fake.
Perhaps in the future we’ll need to use another protocol, one that uses authentication. Before picking up an incoming call, there should be a simple indicator showing that the call is authentic. We already have caller ID, but this is trivial to spoof. The solution needs to be so simple that everyone can understand it, and it needs to work seamlessly without anyone needing to jump through hoops to enable it.
We already authenticate to our smartphones and other devices via our biometric data — our faces or fingerprints — or by entering a passcode. Perhaps whatever system we use for voice calling can use this to ‘prove’ that a call is coming from an authenticated source. When a call comes in, it could have a red/amber/green rating along these lines:
- 🟢 Green: The caller is in your contacts and they have authenticated themselves using their device biometrics or passphrase within the last n minutes.
- 🟡 Amber: The caller has authenticated themselves with their device in the last n minutes, but is not in your contacts.
- 🔴 Red: All bets are off, proceed with caution.
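To make the idea concrete, here is a minimal sketch of how a phone might pick the rating. Everything in it is hypothetical: the `IncomingCall` fields, the `rate_call` function and the five-minute default for n are my own assumptions, and it takes for granted that the network can somehow attest to when the caller last unlocked their device, which is the hard part.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Set


class Rating(Enum):
    GREEN = "green"
    AMBER = "amber"
    RED = "red"


@dataclass
class IncomingCall:
    # Hypothetical fields: no real telephony API is being modelled here.
    caller_id: str
    # Seconds since the caller last unlocked their device, or None if
    # nothing could attest to a recent authentication.
    seconds_since_auth: Optional[float]


def rate_call(call: IncomingCall, contacts: Set[str],
              max_age_minutes: float = 5.0) -> Rating:
    """Map an incoming call to a red/amber/green rating."""
    recently_authenticated = (
        call.seconds_since_auth is not None
        and 0 <= call.seconds_since_auth <= max_age_minutes * 60
    )
    if not recently_authenticated:
        # No recent proof of authentication: all bets are off,
        # even if the number is in the address book.
        return Rating.RED
    if call.caller_id in contacts:
        return Rating.GREEN
    return Rating.AMBER
```

Note that a known contact who has not authenticated recently still comes out red in this sketch, since a number in your address book is exactly what a spoofer would want to imitate.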
Taking it further, we would possibly need to have something in place that allows people to verify their identity: not only has Caller X’s device confirmed that it was unlocked recently, but the person calling really is Caller X. On my Mastodon profile, I can provide evidence that I am who I say I am by adding a link to my website and then adding code to my site that Mastodon can read. I’ve then ‘proved’ that this is my site because I was able to edit the code there.
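For anyone curious about the mechanics: as I understand it, Mastodon's check relies on the website containing a link back to the profile that carries a `rel="me"` attribute. The sketch below shows roughly what that check might look like against a page's HTML. The class and function names are my own, the URLs in the usage note are placeholders, and it deliberately skips fetching the page over the network, so it is an illustration of the idea and not Mastodon's actual implementation.

```python
from html.parser import HTMLParser


class RelMeLinkFinder(HTMLParser):
    """Collects the href of every <a> or <link> tag whose rel includes "me"."""

    def __init__(self):
        super().__init__()
        self.rel_me_hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        # rel can hold several space-separated values, e.g. "nofollow me".
        rel_values = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "me" in rel_values and href:
            # Ignore a trailing slash so both URL spellings match.
            self.rel_me_hrefs.append(href.rstrip("/"))


def page_links_back(page_html: str, profile_url: str) -> bool:
    """Return True if the page has a rel="me" link pointing at profile_url."""
    finder = RelMeLinkFinder()
    finder.feed(page_html)
    return profile_url.rstrip("/") in finder.rel_me_hrefs
```

Calling `page_links_back(html, "https://example.social/@alice")` returns `True` only when `html` contains something like `<a rel="me" href="https://example.social/@alice">`. A plain link without `rel="me"` does not count.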
I’m not sure a mechanism like this would go far enough to ‘prove’ I am who I claim to be when I call a new number for the first time, but perhaps there are other ways of doing something similar. Ideally, the solution wouldn’t rely on a centralised verification/validation service.
It’s been interesting to read that Adobe, Arm, the BBC, Intel, Microsoft, Sony and Truepic are collaborating as the Coalition for Content Provenance and Authenticity. They are focused on providing evidence of the provenance of content, i.e. being able to trace back to the source of an image, a video or a document as well as what has been done to it since it was created. I don’t think this helps with real-time generated content such as fake voice and video.
I’ve heard people talk about the fact that authentication via voice, such as that used in telephone banking, is now completely broken because of AI. But I think the problem goes much deeper than this. Without authentication, how will you ever know who you are really speaking to?