Real-time Audio Deepfakes Present New Security Threats

Real-time audio deepfakes have emerged as a significant technological development, presenting new challenges for cybersecurity. According to a report from NCC Group, a cybersecurity firm, the ability to create convincing real-time audio deepfakes has advanced greatly since 2020. This advancement allows for a technique known as “deepfake vishing,” which utilizes artificial intelligence to replicate a target's voice instantly.

Pablo Alobera, a managing security consultant at NCC Group, explained that the tool can be activated with a simple press of a button once it has been trained. “We created a front end, a web page, with a start button. You just click start, and it starts working,” said Alobera, highlighting the ease of use.

Although NCC Group has not released the real-time voice deepfake tool to the public, the accompanying research paper includes audio samples demonstrating its effectiveness. The generated audio is both convincing and produced with minimal latency, indicating that the technology could be applied using various microphones found in laptops and smartphones.

While audio deepfakes are not a novel concept, previous iterations lacked the ability for real-time generation, which often made them less credible. Attackers had the option to pre-record deepfaked conversations, but victims could easily detect discrepancies if the dialogue deviated from the script. Alternatively, generating a deepfake on the fly required significant time, leading to noticeable delays. However, NCC Group's tool overcomes these limitations.

With the consent of clients, NCC Group has employed this voice-changing technology along with caller ID spoofing to impersonate individuals. “Nearly all times we called, it worked. The target believed we were the person we were impersonating,” Alobera stated, emphasizing the tool's reliability.

The significance of NCC Group's demonstration lies in its use of open-source tools and commonly available hardware, rather than relying on third-party services. Although optimal performance is achieved with high-end graphics processing units (GPUs), the audio deepfake was successfully tested on a laptop equipped with Nvidia's RTX A1000, which is among the less powerful GPUs in Nvidia's current offerings. Notably, the laptop managed to produce a voice deepfake with only a half-second delay.
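To put that half-second figure in context, the end-to-end delay of a streaming voice-conversion pipeline is roughly the time to capture one chunk of audio, plus model inference time, plus any output buffering. The sketch below is purely illustrative; the chunk sizes and timings are hypothetical assumptions, not figures published by NCC Group.

```python
# Illustrative latency budget for a streaming voice-conversion pipeline.
# All numbers below are hypothetical examples, not NCC Group's measurements.

def pipeline_latency_ms(chunk_ms, inference_ms, output_buffer_ms=0):
    """Approximate end-to-end delay: time to fill one input audio chunk,
    plus model inference time, plus any output buffering."""
    return chunk_ms + inference_ms + output_buffer_ms

# Example: 250 ms input chunks, 200 ms inference, 50 ms output buffer
# adds up to the roughly half-second delay the article describes.
delay = pipeline_latency_ms(chunk_ms=250, inference_ms=200, output_buffer_ms=50)
print(delay)  # prints 500
```

The takeaway is that shrinking any one term (smaller chunks, a faster GPU, less buffering) lowers the delay, which is why performance varies with hardware.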

As real-time voice deepfakes become more prevalent, they signal a potential shift toward mainstream acceptance and use. This raises questions about the authenticity of what can be heard, even in conversations with familiar individuals. The development of video deepfakes is also rapidly progressing, aided by a surge of viral deepfake content across platforms like TikTok and YouTube.

Recent advancements in AI models from companies like Alibaba and Google have enabled the creation of deepfakes that can convincingly replicate any individual and place them in various settings. Trevor Wiseman, the founder of AI cybersecurity consultancy The Circuit, noted that he has witnessed individuals and companies fall victim to video deepfakes, including one case where a business was misled during the hiring process into shipping a laptop to a fraudulent address.

Despite their impressive capabilities, current video deepfakes still encounter certain limitations, especially in achieving high-quality results in real time. Wiseman pointed out that even the latest video deepfakes struggle to align a person's expressions with their vocal tone and demeanor. “If they're excited but they have no emotion on their face, it's fake,” he remarked. However, he believes the technology is sufficiently advanced to deceive most people most of the time.

Given these developments, both individuals and organizations may need to devise new methods of authentication that do not rely solely on voice or video interactions. Wiseman suggested that, similar to how baseball teams use signals, people must establish reliable indicators to determine authenticity in communication.