Oops, a seemingly nifty piece of new tech has gotten itself and its maker into a bit of hot water.
I’m referring to the emergence of AI-based human voice cloning as the new technology that managed to get into the oh-my-gosh headline news of late. In this instance, the company is Amazon and its ever-advancing Alexa.
Readers of my column might recall that I had previously covered the unseemly boo-boo that occurred when it was reported that a youngster was encouraged by Alexa to put a penny into an electrical socket (don’t do this!), see my coverage at the link here. In that circumstance, fortunately, no one was hurt, and the fallout was that apparently the Alexa AI system had picked up a prior viral trend and, without any semblance of common-sense assessment, merely repeated the crazy suggestion when asked for something fun to do by a child interacting with Alexa. This highlights the AI Ethics concern that we are becoming inundated with AI that entirely lacks any semblance of common-sense reasoning, a notably trying problem that continues to defy efforts to embody such reasoning in AI (for my analysis of efforts to derive AI-based common sense, see the link here).
The latest dustup involves doing voice cloning, also known as voice replication. The latest in such tech and AI is raising pressing AI Ethics and Ethical AI considerations. For my ongoing overarching coverage of AI Ethics and Ethical AI, see the link here and the link here, just to name a few.
AI-based voice cloning is a straightforward concept.
An AI system is programmed to audio record some of your spoken words. The AI then attempts to figure out your speech patterns computationally. Based on the detected speech patterns, the AI then tries to emit audio speech that sounds just like you. The tricky part is that the speech covers words that you had not previously provided as audio samples to the AI. In other words, the AI has to mathematically estimate how words might be spoken by you. This includes all the characteristics of speech such as the tone, the rising and lowering of the voice, the pace or speed of speaking, and so on.
When you hear a human try to impersonate another human, you usually can discern that the effort is an impersonation. In the short run, such as if the impersonator uses just a few words, it might be difficult to figure out that the voice is not the original speaker. Furthermore, if the impersonator is mimicking words that the original speaker actually spoke, the odds are that they can tune their own voice to the voice of the other person more so for that particular utterance.
Brevity and hearing the exact same words can allow someone to pretty much nail an impersonation.
The challenge becomes covering words that the other person has not spoken, or words that the impersonator has never heard the person speak. You are somewhat in the dark about trying to figure out how the mimicked person would have said those words. The good news is that if anyone else listening to the impersonator also doesn’t know how the original person would have said the words, the impersonator can be relatively afield of the true voice and yet still seem dandy and on-target.
I would also like to momentarily remove from the equation the mannerisms and physical movement of the impersonation. Upon seeing an impersonator, you might be swayed if they are able to wrinkle their face or flail their arms in a manner that also mimics the person being impersonated. The added cues of the body and face are going to fool your mind into thinking that the voice is also dead-on, even though it might not be. A voice impersonation purist would insist that only the voice alone should be used as the criterion for determining whether the voice aptly mimics the impersonated person.
You certainly have seen the various deepfake videos that are going around these days on social media. Somebody cleverly rejiggers a video to have someone else’s face appear in the video, overlaying a face that was of someone else in the original recording. This usually also is accompanied by doing a deepfake on the voice too. You are getting a double whammy, involving the video visually being altered via deepfake AI and the audio being altered via deepfake AI.
For sake of discussion herein, I am concentrating on just the AI-based deepfake audio facets, which as mentioned earlier is commonly referred to as voice cloning or voice replication. Some cheekily refer to this as a voice in a can.
I am sure that some of you are right now exhorting that we’ve had the ability to use computer programs for cloning voices for quite a while. This is nothing new per se. I agree. At the same time, we do need to acknowledge that this high-tech capability is getting better and better. Well, I say better and better, but maybe as you’ll see in a moment I ought to be saying it is becoming more and more worrisome.
Hold onto that thought.
The technological prowess is assuredly advancing for doing voice cloning. For example, it used to be that you would have had to “train” an AI audio replication program by speaking an entire story of mix-and-match words. Akin to the famous or infamous line of the quick brown fox that jumped over the lazy dog (a line intended to get someone to cover all the letters of the alphabet), there are specially crafted short stories that contain a mixture of words for purposes of getting you to say enough words and a wide enough variety of words to make the AI pattern matching a lot easier.
You might have had to read several pages of words, oftentimes including words that you struggle to pronounce and aren’t even sure what they mean, in order to sufficiently enable AI pattern matching to occur. This could take many minutes or sometimes hours of talking to provide the AI with enough audio to use for finding distinct patterns of your voice. If you shortchanged this training activity, the chances were that the resultant voice replication would be easily shot down by any friends of yours that know your voice well.
Okay, the interest then by AI developers was focused on how to optimize the audio replicating aspects. AI builders relish challenges. They are said to be optimizers at heart. Give them a problem and they will tend to optimize, regardless of where that might lead (I mention this as a foreshadowing, which will become clearer shortly).
Answer me this:
- What is the least amount of audio sample that would be required to maximally clone a person’s voice and for which the audio sample can be almost any randomly allowed set of words and yet still allow for voice cloning to produce nearly any words that might ever be spoken by the targeted voice and sound essentially identical to that person’s voice in conversational or other contextual settings of choice?
There is a lot in there to unpack.
Keep in mind that you want the minimum audio sample that will maximally clone a voice, such that the resultant AI utterances in that now automated replicated voice will seem wholly indistinguishable from the actual person. This is trickier than you might think.
It is almost like that game show whereby you have to try and name a song based on the least number of heard notes. The fewer the notes played, the harder it is to guess which song it is. If your guess is wrong, you lose the points or lose the game. A struggle ensues as to whether you should use just one note, the least possible clue, but then your probability of guessing the song is presumably severely reduced. The more notes you hear, the higher the probability of guessing the correct song goes, but you are allowing other contestants to also have a heightened chance of making a guess too.
Remember that we are also dealing with the notion of prescribed words versus just any words in the case of voice cloning. If a person says the words “You can’t handle the truth” and we want the AI to mimic or impersonate the person, the AI computationally can likely readily catch onto the pattern. On the other hand, suppose we only have these words as spoken by that person “Is that all you have to ask me” and we want to use those words to then have the AI say “You can’t handle the truth.” I think you can see the difficulty of training on one set of words and having to extrapolate to an entirely different set of words.
Another arduous element consists of the context for the spoken words. Suppose we get you to audio record a sentence when you are calm and at ease. The AI patterns those words. It might also pattern onto the calmness of your voice. Imagine that we then want the AI to pretend that it is you when you are screaming mad and angry as a hornet. Having the AI distort the original pattern into becoming an accurately angered version of your voice can be daunting.
What kind of minimums are we looking at?
The goal right now is to break the minute mark.
Grab a recorded voice for which you have less than a minute’s worth of audio and get the AI to do all the amazing voice cloning from that minuscule sample alone. I want to clarify that just about anybody can compose AI that can do this generally in less than one minute, though the resulting voice clone is wimpy and readily detected as incomplete. Again, I am explicitly and adamantly tying together that the sampling time is at a minimum and meanwhile the voice cloning is at a maximum. A dolt can achieve a minimum sampling if they are also allowed to be grossly submaximal in voice cloning.
This is a fun and exciting technological challenge. You might be wondering though as to the value or merits of doing this. To what end are we striving? What benefits for humanity can we expect by being able to so efficiently and effectively do AI-based voice replication?
I want you to mull over that meaty question.
The wrong answer can get you inadvertently into a pile of mush.
Here’s something that seems upbeat and altogether positive.
Assume that we might have old-time recordings of famous people such as Abraham Lincoln and were able to use those dusty audio snippets for crafting an AI-based voice clone. We could then hear Lincoln speak the Gettysburg Address as though we were there on the day that he uttered the four score and seven years ago memorable speech. As a side note, regrettably, we do not have any audio recordings of Lincoln’s voice (the technology did not yet exist), but we do have voice recordings of President Benjamin Harrison (the first US president to have his voice recorded) and other presidents thereafter.
I believe we would all likely reasonably agree that this specific use of AI-based voice cloning is perfectly fine. In fact, we probably would want this more so than if an actor today tried to pretend that they are speaking like Lincoln. The actor would be presumably making up whatever they thought Lincoln’s actual voice sounded like. It would be a fabrication, perhaps far removed from what Lincoln’s voice was. Instead, via using a well-qualified AI voice cloning system, there would be little argument about how Lincoln’s voice truly sounded. The AI would be factually correct, at least to the extent of how good the AI is at replicating the targeted voice.
In the category of goodness about AI voice cloning, we can score a win with this kind of use case.
Not wanting to be gloomy, but there is a downside to even this apparently all-upside usage.
Someone uses an AI voice cloning system to figure out the voice of Theodore Roosevelt (“Teddy”), our treasured 26th President of the United States, naturalist, conservationist, statesman, writer, historian, and almost universally labeled an esteemed person. Speeches that he gave and for which we do not have any historically preserved audio versions could now be “spoken” as though he personally was doing the speaking today. A commendable boost for studying history.
Let’s turn this ugly, simply for purposes of revealing the downsides thereof.
We use the Teddy AI-based voice clone to read a speech that was given by an evil dictator. The AI doesn’t care about what it is speaking since there is no semblance of sentience in the AI. Words are simply words, or more accurately just puffs of sound.
You might be aghast that someone would do something of this underhanded nature. Why in the heck would the AI-based cloned voice of renowned and revered Theodore Roosevelt be used to deliver a speech that Teddy not only never gave, but that on top of that speaks on a topic depicting the evilness of a despicable dictator?
Outrageous, you might exclaim.
Easily done, comes the reply.
In essence, one very important concern about the AI-based voice replicating is that we will suddenly find ourselves awash in fake or shall we say deepfake speeches and utterances that have nothing to do with any historical facts or accuracies. If enough of these get made and promulgated, we might become confused about what is fact versus what is fiction.
You can abundantly see how this might arise. Using an AI-based voice clone, somebody makes an audio recording of Woodrow Wilson giving a speech that he never actually gave. This is posted on the Internet. Somebody else hears the recording and believes it is the real thing. They post it elsewhere, mentioning that they found this great historical recording of Woodrow Wilson. Soon enough, students in history classes are using the audio in lieu of reading the written version of the speech.
Nobody ends up knowing whether the speech was given by Woodrow Wilson or not. Maybe it was, maybe it wasn’t, and everyone figures it doesn’t really matter either way (well, those that aren’t focused on historical accuracy and facts). Of course, if the speech is a dastardly one, this gives a misimpression or disinformation portrayal of that historical figure. History and fiction are merged into one.
I trust that you are hopefully convinced that this is a downside associated with AI-based voice cloning.
Again, we can already do these kinds of things, doing so without the newer and improved AI-based voice replicating, but it is going to get easier to do this and the resulting audio will be extremely hard to differentiate between real and fake. Nowadays, using conventional audio-producing programs, you can usually listen to the output and often easily ascertain that the audio is faked. With the advances in AI, you will soon enough no longer be able to believe your ears, in a manner of speaking.
As bad as the voice cloning of historical figures might be, we need to think through the perhaps especially egregious uses entailing living people of today.
First, have you ever heard of a somewhat popular scam that involves someone impersonating a boss or the equivalent thereof? Some years ago, there was a disturbing fad of calling a restaurant or store and pretending to be the boss of the establishment. The fakery would involve telling a staff member to do ridiculous things, which they often would dutifully do under the false belief that they were talking to their boss.
I do not want to get mired in these kinds of enraging wrongdoing acts, but another pertinent one consists of calling somebody that might be hard of hearing and pretending to be their grandson or granddaughter. The impersonator tries to convince the grandparent to provide money to aid or maybe save them in some fashion. Based on the impersonated voice, the grandparent is fooled into doing so. Despicable. Disgraceful. Sad.
We are about to enter into an era in which AI-based voice cloning will enable, on steroids if you will, the advent of voice-related scams and swindles. The AI will do such a remarkable job of voice replication that whoever hears the voice will swear on their oath that the actual person was the one doing the speaking.
How far might that go?
Some are worried that the release of say atomic weaponry and military attacks could happen by someone using an AI-based voice clone that tricks others into believing that a top-level military officer was issuing a direct command. The same could be said of anyone in any prominent position. Use a superbly accurate AI voice clone to get a banking executive to release millions of dollars in funds, doing so based on being fooled into believing they are speaking with the banking client at hand.
In years past, doing this with AI would not have been necessarily convincing. The moment that the human on the other end of the phone starts asking questions, the AI would need to depart from a prepared script. At that juncture, the voice cloning would deteriorate, sometimes radically so. The only means to keep the swindle going was to force the conversation back into the script.
With the type of AI that we have today, including advances in Natural Language Processing (NLP), you can go off a script and potentially have the AI voice clone seem to be speaking in a natural conversational way (this is not always the case, and there are still ways to trip up the AI).
Before getting into some more meat and potatoes about the wild and woolly considerations underlying AI-based voice cloning, let’s establish some additional fundamentals on profoundly essential topics. We need to briefly take a breezy dive into AI Ethics and especially the advent of Machine Learning (ML) and Deep Learning (DL).
You might be vaguely aware that one of the loudest voices these days in the AI field and even outside the field of AI consists of clamoring for a greater semblance of Ethical AI. Let’s take a look at what it means to refer to AI Ethics and Ethical AI. On top of that, we will explore what I mean when I speak of Machine Learning and Deep Learning.
One particular segment or portion of AI Ethics that has been getting a lot of media attention consists of AI that exhibits untoward biases and inequities. You might be aware that when the latest era of AI got underway there was a huge burst of enthusiasm for what some now call AI For Good. Unfortunately, on the heels of that gushing excitement, we began to witness AI For Bad. For example, various AI-based facial recognition systems have been revealed as containing racial biases and gender biases, which I’ve discussed at the link here.
Efforts to fight back against AI For Bad are actively underway. Besides vociferous legal pursuits of reining in the wrongdoing, there is also a substantive push toward embracing AI Ethics to righten the AI vileness. The notion is that we ought to adopt and endorse key Ethical AI principles for the development and fielding of AI, doing so to undercut the AI For Bad while simultaneously heralding and promoting the preferable AI For Good.
On a related notion, I am an advocate of trying to use AI as part of the solution to AI woes, fighting fire with fire in that manner of thinking. We might for example embed Ethical AI components into an AI system that will monitor how the rest of the AI is doing things and thus potentially catch in real-time any discriminatory efforts, see my discussion at the link here. We could also have a separate AI system that acts as a type of AI Ethics monitor. The AI system serves as an overseer to track and detect when another AI is going into the unethical abyss (see my analysis of such capabilities at the link here).
In a moment, I’ll share with you some overarching principles underlying AI Ethics. There are lots of these kinds of lists floating around here and there. You could say that there isn’t as yet a singular list of universal appeal and concurrence. That’s the unfortunate news. The good news is that at least there are readily available AI Ethics lists and they tend to be quite similar. All told, this suggests that by a form of reasoned convergence of sorts we are finding our way toward a general commonality of what AI Ethics consists of.
First, let’s cover briefly some of the overall Ethical AI precepts to illustrate what ought to be a vital consideration for anyone crafting, fielding, or using AI.
For example, as stated by the Vatican in the Rome Call For AI Ethics and as I’ve covered in-depth at the link here, these are their identified six primary AI ethics principles:
- Transparency: In principle, AI systems must be explainable
- Inclusion: The needs of all human beings must be taken into consideration so that everyone can benefit, and all individuals can be offered the best possible conditions to express themselves and develop
- Responsibility: Those who design and deploy the use of AI must proceed with responsibility and transparency
- Impartiality: Do not create or act according to bias, thus safeguarding fairness and human dignity
- Reliability: AI systems must be able to work reliably
- Security and privacy: AI systems must work securely and respect the privacy of users.
As stated by the U.S. Department of Defense (DoD) in their Ethical Principles For The Use Of Artificial Intelligence and as I’ve covered in-depth at the link here, these are their five primary AI ethics principles:
- Responsible: DoD personnel will exercise appropriate levels of judgment and care while remaining responsible for the development, deployment, and use of AI capabilities.
- Equitable: The Department will take deliberate steps to minimize unintended bias in AI capabilities.
- Traceable: The Department’s AI capabilities will be developed and deployed such that relevant personnel possess an appropriate understanding of the technology, development processes, and operational methods applicable to AI capabilities, including transparent and auditable methodologies, data sources, and design procedure and documentation.
- Reliable: The Department’s AI capabilities will have explicit, well-defined uses, and the safety, security, and effectiveness of such capabilities will be subject to testing and assurance within those defined uses across their entire lifecycles.
- Governable: The Department will design and engineer AI capabilities to fulfill their intended functions while possessing the ability to detect and avoid unintended consequences, and the ability to disengage or deactivate deployed systems that demonstrate unintended behavior.
I’ve also discussed various collective analyses of AI ethics principles, including having covered a set devised by researchers that examined and condensed the essence of numerous national and international AI ethics tenets in a paper entitled “The Global Landscape Of AI Ethics Guidelines” (published in Nature), which my coverage explores at the link here, and which led to this keystone list:
- Justice & Fairness
- Freedom & Autonomy
As you might directly guess, trying to pin down the specifics underlying these principles can be extremely hard to do. Even more so, the effort to turn those broad principles into something entirely tangible and detailed enough to be used when crafting AI systems is also a tough nut to crack. It is easy to overall do some handwaving about what AI Ethics precepts are and how they should be generally observed, while it is a much more complicated situation in the AI coding having to be the veritable rubber that meets the road.
The AI Ethics principles are to be utilized by AI developers, along with those that manage AI development efforts, and even those that ultimately field and perform upkeep on AI systems. All stakeholders throughout the entire AI life cycle of development and usage are considered within the scope of abiding by the being-established norms of Ethical AI. This is an important highlight since the usual assumption is that “only coders” or those that program the AI are subject to adhering to the AI Ethics notions. As earlier stated, it takes a village to devise and field AI, and for which the entire village has to be versed in and abide by AI Ethics precepts.
Let’s also make sure we are on the same page about the nature of today’s AI.
There isn’t any AI today that is sentient. We don’t have this. We don’t know if sentient AI will be possible. Nobody can aptly predict whether we will attain sentient AI, nor whether sentient AI will somehow miraculously spontaneously arise in a form of computational cognitive supernova (usually referred to as the singularity, see my coverage at the link here).
The type of AI that I am focusing on consists of the non-sentient AI that we have today. If we wanted to wildly speculate about sentient AI, this discussion could go in a radically different direction. A sentient AI would supposedly be of human quality. You would need to consider that the sentient AI is the cognitive equivalent of a human. More so, since some speculate we might have super-intelligent AI, it is conceivable that such AI could end up being smarter than humans.
Let’s keep things more down to earth and consider today’s computational non-sentient AI.
Realize that today’s AI is not able to “think” in any fashion on par with human thinking. When you interact with Alexa or Siri, the conversational capacities might seem akin to human capacities, but the reality is that it is computational and lacks human cognition. The latest era of AI has made extensive use of Machine Learning (ML) and Deep Learning (DL), which leverage computational pattern matching. This has led to AI systems that have the appearance of human-like proclivities. Meanwhile, there isn’t any AI today that has a semblance of common sense, nor any of the cognitive wonderment of robust human thinking.
ML/DL is a form of computational pattern matching. The usual approach is that you assemble data about a decision-making task. You feed the data into the ML/DL computer models. Those models seek to find mathematical patterns. After finding such patterns, if so found, the AI system then will use those patterns when encountering new data. Upon the presentation of new data, the patterns based on the “old” or historical data are applied to render a current decision.
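For those who like to see the gist in code, here is a minimal sketch of that loop of assembling data, finding patterns, and applying them to new data. To be clear, this is a toy nearest-centroid approach of my own choosing and not how any particular AI product works; the data, the names `fit_centroids` and `predict`, and the pass/fail task are all made-up assumptions for illustration.

```python
from statistics import mean

def fit_centroids(rows, labels):
    """Find one 'pattern' per label: the average of its historical rows."""
    grouped = {}
    for row, label in zip(rows, labels):
        grouped.setdefault(label, []).append(row)
    return {
        label: tuple(mean(col) for col in zip(*members))
        for label, members in grouped.items()
    }

def predict(centroids, row):
    """Apply the stored patterns to new data: pick the closest centroid."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], row))

# Hypothetical historical data: (hours of practice, prior errors) -> outcome
history = [(1, 9), (2, 8), (3, 7), (8, 2), (9, 1), (10, 1)]
outcomes = ["fail", "fail", "fail", "pass", "pass", "pass"]

patterns = fit_centroids(history, outcomes)  # the "training" step
decision = predict(patterns, (7, 3))         # a new, unseen case
print(decision)  # prints "pass"
```

Notice that the program never reasons about why someone passes or fails. It merely measures which batch of old data the new case sits closest to, which is the sense in which the patterns from the "old" data render the current decision.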
I think you can guess where this is heading. If humans that have been making the patterned upon decisions have been incorporating untoward biases, the odds are that the data reflects this in subtle but significant ways. Machine Learning or Deep Learning computational pattern matching will simply try to mathematically mimic the data accordingly. There is no semblance of common sense or other sentient aspects of AI-crafted modeling per se.
Furthermore, the AI developers might not realize what is going on either. The arcane mathematics in the ML/DL might make it difficult to ferret out the now hidden biases. You would rightfully hope and expect that the AI developers would test for the potentially buried biases, though this is trickier than it might seem. A solid chance exists that even with relatively extensive testing, there will be biases still embedded within the pattern matching models of the ML/DL.
You could somewhat use the famous or infamous adage of garbage-in garbage-out. The thing is, this is more akin to biases-in that insidiously get infused as biases submerged within the AI. The algorithmic decision-making (ADM) of AI axiomatically becomes laden with inequities.
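A tiny sketch can make the biases-in, biases-out point concrete. Everything here is hypothetical: the groups "A" and "B", the approval records, and the function names `learn_approval_rates` and `decide` are my own illustrative assumptions, and real ML/DL models are vastly more elaborate than this frequency tally. The idea is simply that a model fitted to skewed human decisions will dutifully reproduce the skew.

```python
from collections import defaultdict

def learn_approval_rates(records):
    """Tally the historical approval rate for each (group, qualified) pair."""
    counts = defaultdict(lambda: [0, 0])  # key -> [approved, total]
    for group, qualified, approved in records:
        counts[(group, qualified)][0] += int(approved)
        counts[(group, qualified)][1] += 1
    return {key: a / t for key, (a, t) in counts.items()}

def decide(rates, group, qualified):
    """Approve when the matching historical rate exceeds one half."""
    return rates.get((group, qualified), 0.0) > 0.5

# Hypothetical past human decisions: (group, qualified, approved).
# Qualified applicants in group B were approved far less often.
history = (
    [("A", True, True)] * 9 + [("A", True, False)] * 1 +
    [("B", True, True)] * 4 + [("B", True, False)] * 6 +
    [("A", False, False)] * 5 + [("B", False, False)] * 5
)

rates = learn_approval_rates(history)
print(decide(rates, "A", True))  # prints True
print(decide(rates, "B", True))  # prints False
```

Two equally qualified applicants get opposite answers, and nothing in the code looks overtly discriminatory. The disparity rode in on the historical data, which is why testing for buried biases is trickier than it might seem.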
Let’s return to our focus on AI-based voice cloning.
At a recent conference, a presentation given by Amazon was intended to showcase the desirable upsides of AI-based voice cloning and highlight the latest leading-edge AI being used in Alexa for advancing its capabilities. According to news reports, a prepared example that was supposed to be heartwarming and upbeat consisted of having a child ask Alexa to have their grandma finish reading to them the story of The Wizard Of Oz. The audience was told that the grandmother had passed away and that this was a means for the child to essentially reconnect with their dearly cherished grandparent. All of this was apparently part of a video put together by Amazon to aid in showcasing the latest AI voice cloning breakthroughs by the Alexa development team (encompassing features not yet formally launched for public use).
One reaction to this example is that we could be quite touched that a child could once again hear their grandmother’s voice. We are to presumably assume that the grandmother had not already recorded a full reading of the story, thus the AI cloning was doing the work of making things seem as though the grandmother was now doing the entirety of the reading.
Remarkable and a tremendous way to reconnect with loved ones that are no longer with us.
Not all reporters and analysts (plus Twitter) were so inclined as to a favorable interpretation of this advancement. Some labeled this as being outright creepy. Trying to recreate the voice of a deceased loved one was said to be a strange and somewhat bizarre undertaking.
Questions abound, such as:
- Would the child get confused and believe that the deceased loved one was still alive?
- Could the child now be led into some untoward prank or scam under the false belief that the grandmother was still with us?
- Might the child suffer from hearing about the deceased loved one and become despondent by now once again missing the grandparent, as though opening already settled emotional wounds?
- Will the child think that the deceased can speak from the other side, namely that this mystical voice that appears to precisely be their grandmother is speaking to them from the grave?
- Is it conceivable that the child will think that the AI has somehow embodied their grandmother, anthropomorphizing the AI such that the child will grow up believing that AI can replicate humans wholly?
- Suppose the child becomes so enamored of the grandmother’s AI-replicated voice that the youngster becomes obsessed and uses the voice for all manner of audio listening?
- Can the vendor that is replicating the voice opt to use that voice for others using the same overall system, doing so without getting explicit permission from the family and thus “profiting” from the devised voice?
- And so on.
It is important to realize that you can conjure up just as many negatives as positives, or shall we say as many positives as negatives. There are tradeoffs underlying these AI advances. Looking at only one side of the coin is perhaps myopic.
The key is to make sure that we are looking at all sides of these issues. Do not be clouded in your thinking. It can be easy to only explore the positives. It can be easy to explore only the negatives. We need to examine both and figure out what can be done to hopefully leverage the positives and seek to reduce, eliminate, or at least mitigate the negatives.
To some degree, that is why AI Ethics and Ethical AI is such a crucial topic. The precepts of AI Ethics get us to remain vigilant. AI technologists can at times become preoccupied with technology, particularly the optimization of high-tech. They aren’t necessarily considering the larger societal ramifications. Having an AI Ethics mindset and doing so integrally to AI development and fielding is vital for producing appropriate AI.
Besides employing AI Ethics, there is a corresponding question of whether we should have laws to govern various uses of AI, such as the AI-based voice cloning features. New laws are being bandied around at the federal, state, and local levels that concern the range and nature of how AI should be devised. The effort to draft and enact such laws is a gradual one. AI Ethics serves as a considered stopgap, at the very least.
That being said, some argue that we do not need new laws that cover AI and that our existing laws are sufficient. In fact, they forewarn that if we do enact some of these AI laws, we will be killing the golden goose by clamping down on advances in AI that proffer immense societal advantages. See for example my coverage at the link here and the link here.
At this juncture of this weighty discussion, I’d bet that you are desirous of some illustrative examples that might showcase this topic. There is a special and assuredly popular set of examples that are close to my heart. You see, in my capacity as an expert on AI including the ethical and legal ramifications, I am frequently asked to identify realistic examples that showcase AI Ethics dilemmas so that the somewhat theoretical nature of the topic can be more readily grasped. One of the most evocative areas that vividly presents this ethical AI quandary is the advent of AI-based true self-driving cars. This will serve as a handy use case or exemplar for ample discussion on the topic.
Here’s then a noteworthy question that is worth contemplating: Does the advent of AI-based true self-driving cars illuminate anything about AI-based voice cloning, and if so, what does this showcase?
Allow me a moment to unpack the question.
First, note that there isn’t a human driver involved in a true self-driving car. Keep in mind that true self-driving cars are driven via an AI driving system. There isn’t a need for a human driver at the wheel, nor is there a provision for a human to drive the vehicle. For my extensive and ongoing coverage of Autonomous Vehicles (AVs) and especially self-driving cars, see the link here.
I’d like to further clarify what is meant when I refer to true self-driving cars.
Understanding The Levels Of Self-Driving Cars
As a clarification, true self-driving cars are ones where the AI drives the car entirely on its own and there isn’t any human assistance during the driving task.
These driverless vehicles are considered Level 4 and Level 5 (see my explanation at this link here), while a car that requires a human driver to co-share the driving effort is usually considered at Level 2 or Level 3. The cars that co-share the driving task are described as being semi-autonomous, and typically contain a variety of automated add-ons that are referred to as ADAS (Advanced Driver-Assistance Systems).
There is not yet a true self-driving car at Level 5, and we don’t yet even know if this will be possible to achieve, nor how long it will take to get there.
Meanwhile, the Level 4 efforts are gradually trying to get some traction by undergoing very narrow and selective public roadway trials, though there is controversy over whether this testing should be allowed per se (we are all life-or-death guinea pigs in an experiment taking place on our highways and byways, some contend, see my coverage at this link here).
Since semi-autonomous cars require a human driver, the adoption of those types of cars won’t be markedly different than driving conventional vehicles, so there’s not much new per se to cover about them on this topic (though, as you’ll see in a moment, the points next made are generally applicable).
For semi-autonomous cars, it is important that the public be forewarned about a disturbing aspect that’s been arising lately, namely that despite those human drivers who keep posting videos of themselves falling asleep at the wheel of a Level 2 or Level 3 car, we all need to avoid being misled into believing that the driver can take away their attention from the driving task while driving a semi-autonomous car.
You are the responsible party for the driving actions of the vehicle, regardless of how much automation might be tossed into a Level 2 or Level 3.
Self-Driving Cars And AI-Based Voice Cloning
For Level 4 and Level 5 true self-driving vehicles, there won’t be a human driver involved in the driving task.
All occupants will be passengers.
The AI is doing the driving.
One aspect to immediately discuss entails the fact that the AI involved in today’s AI driving systems is not sentient. In other words, the AI is altogether a collective of computer-based programming and algorithms, and most assuredly not able to reason in the same manner that humans can.
Why this added emphasis on the AI not being sentient?
Because I want to underscore that when discussing the role of the AI driving system, I am not ascribing human qualities to the AI. Please be aware that there is an ongoing and dangerous tendency these days to anthropomorphize AI. In essence, people are assigning human-like sentience to today’s AI, despite the undeniable and inarguable fact that no such AI exists as yet.
With that clarification, you can envision that the AI driving system won’t natively somehow “know” about the facets of driving. Driving and all that it entails will need to be programmed as part of the hardware and software of the self-driving car.
Let’s dive into the myriad of aspects that come into play on this topic.
First, it is important to realize that not all AI self-driving cars are the same. Each automaker and self-driving tech firm is taking its own approach to devising self-driving cars. As such, it is difficult to make sweeping statements about what AI driving systems will do or not do.
Furthermore, whenever stating that an AI driving system doesn’t do some particular thing, this can, later on, be overtaken by developers that in fact program the computer to do that very thing. Step by step, AI driving systems are being gradually improved and extended. An existing limitation today might no longer exist in a future iteration or version of the system.
I hope that provides a sufficient litany of caveats to underlie what I am about to relate.
Let’s sketch out a scenario that might leverage AI-based voice cloning.
A parent and their child get into an AI-based self-driving car. They are going to their local grocery store. This is anticipated to be a relatively uneventful ride. Just a weekly drive over to the store, though the driver is an AI driving system and the parent doesn’t need to do any of the driving.
For a parent, this is a big boon. Rather than having to focus on steering and dealing with the act of driving, the parent can instead devote their attention to their child. They can play together in the autonomous vehicle and spend quality time together. Whereas the parent would normally be distracted by doing the driving, and likely get anxious and uptight while navigating busy streets and dealing with other nutty drivers nearby, here the parent is blissfully free of those concerns and can delightfully interact with their precious child.
The parent speaks to the AI driving system and tells the AI to take them to the grocery store. In a typical scenario, the AI would respond via a neutral audio utterance that you might familiarly hear via today’s Alexa or Siri. The AI might reply by stating that the grocery store is 15 minutes’ driving time away. In addition, the AI might state that the self-driving car will be dropping them off at the very front of the store.
That might be the only voice-related activity of the AI in such a scenario. Perhaps, once the self-driving car gets close to the grocery store, the AI might utter something about the destination getting near. There might also be a vocal reminder to take your things with you as you exit the autonomous vehicle.
I’ve explained that some AI driving systems are going to be chatty cats, as it were. They will be programmed to more fluently and continually interact with the human riders. When you get into a ridesharing vehicle that is being driven by a human, sometimes you want the driver to be chatty. Besides saying hello, you might want them to tell you about the local weather conditions, or maybe point out other places to see in the local area. Not everyone will want the chatty cat, thus the AI should be devised to only engage in dialogues when the human requests it, see my coverage at the link here.
Now that I’ve got all of that established, let’s change things up in a small but significant way.
Pretend that the AI driving system has an AI-based voice cloning feature. Let’s also assume that the parent previously seeded the AI voice cloning by providing an audio snippet of the child’s grandmother. Surprise, the parent thinks, I will have the AI driving system speak as though it is the child’s deceased grandmother.
While on the driving journey to the grocery store, the AI driving system interacts with the parent and child, exclusively using the grandmother’s cloned voice the entire time.
What do you think of this?
Creepy or fondly memorable?
I’ll kick up things a notch. Get ready. Fasten your seatbelt.
Some believe as I do that we will eventually allow children to ride in AI-based self-driving cars by themselves, see my analysis at the link here.
In today’s human-driven cars, an adult must always be present because the law requires that an adult driver be at the wheel. For all practical purposes, you can never have a child by themselves in a moving car (yes, I know that this happens, such as the 10-year-old son of a major movie star who recently backed up a very expensive car into another very expensive car, but anyway these are rarities).
Today’s parents would probably strenuously object to allowing their children to ride in a self-driving car that lacks an adult in the vehicle serving as a supervisor or watching over their kids. I know it seems nearly impossible to envision, but I am betting that once self-driving cars are prevalent, we will inevitably accept the idea of children being without adults while riding in a self-driving car.
Consider the convenience factor.
You are at work and your boss is hounding you to get a task done. You need to pick up your child from school and take them over to baseball practice. You are stuck between a rock and a hard place as to either appeasing your boss or taking your child to the practice field. No one else that you know is available to provide your child with a lift. If anything, you certainly don’t want to use a ridesharing service that has a human driver, since you would naturally be concerned about what that stranger adult might say or do while giving your child a ride.
No problem, no worries, just use an AI-based self-driving car. You remotely direct the self-driving car to go pick up your child. Via the cameras of the self-driving car, you can see and watch your child get into the autonomous vehicle. Furthermore, there are inward-facing cameras and you can watch your child the entire driving journey. This seems as safe if not safer than asking a stranger human driver to provide a lift for your child. That being said, some are rightfully concerned that if the driving act goes awry, you have a child left to themselves and no adult immediately present to aid or give guidance to the child.
Putting aside the numerous qualms, suppose that the same parent and child that I was describing in the prior scenario are okay with the child going for rides without the parent being present. Just accept that this is ultimately a viable scenario.
Here is the final kicker.
Each time that the child rides in the AI-based self-driving car, they are greeted by and interact with the AI, which is utilizing the AI-based voice cloning to replicate the voice of the child’s deceased grandmother.
What do you think of those apples?
When the parent was also present in the self-driving car, maybe we could excuse the AI voice usage since the parent is there to advise the child about what is taking place when the AI audio is speaking. But when the parent isn’t present, we now are assuming that the child is idyllically fine with the grandmother’s voice replication.
This is definitely one of those pausing moments to think seriously about whether this is on balance good or bad for a child.
Let’s do a bit of a thought experiment to mull over these weighty matters.
Please come up with three solidly positive reasons to have AI-based voice cloning.
I’ll wait while you come up with them.
Next, come up with three solidly negative reasons that undercut the advent of AI-based voice cloning.
I’ll assume that you’ve come up with some.
I realize that you can undoubtedly come up with a lot more reasons than just three each that either favor or disfavor this technology. In your view, do the negatives outweigh the positives? There are those critics that argue we ought to put the kibosh on such efforts.
Some want to try and block firms from making use of AI-based voice cloning, though realize that this is one of those classic whack-a-mole predicaments. For any firm that you get to stop using it, the odds are that some other firm will start using it. Freezing the clock or tucking away this kind of AI is going to be nearly impossible to undertake.
In a final remark on this topic for the moment, imagine what might happen if we can someday achieve sentient AI. I am not saying that this will happen. We can speculate anyway and see where that might lead.
First, consider an insightful quote about speaking and having a voice. Madeleine Albright famously said this: “It took me quite a long time to develop a voice, and now that I have it, I am not going to be silent.”
If we are able to produce sentient AI, or somehow sentience arises even if we don’t directly bring it forth, what voice should that AI have? Assume that it can use its AI-based voice cloning and ergo manufacture any voice of any human via some teensy-tiny snippet of audio sampling that might be available as uttered by that human. Such an AI could then speak and fool you into believing that the AI is that person.
Then again, perhaps AI will want to have its own voice and purposely devise a voice completely unlike all other human voices, wanting to be special in its own charming way.
By gosh, this leaves one nearly speechless.