With just a 90-second audio clip, we used AI to bring her loved one back to life…
By OpenTaskAI

Author: 数字生命卡兹克

The story goes like this.

At the end of November last year, I wrote an article about Kimi's long text. I said I gave Kimi my 100 articles, and then created a "digital life" of my own.

To be honest, that was a bit of clickbait at the time. The thing was still very far from a real "digital life"; it was more about making a splash.

However, a comment that followed completely broke me.

The content of the comment is as follows:

Reader asked: Can you resurrect my loved one? I long for such a digital life to be with me, as I cannot move on from a life without him. I've been searching everywhere, thank you!

卡兹克 replied: Please DM me, and I'll help you "resurrect" them.

When I read that comment, I felt a lump in my throat.

I've always been searching for the significance of AI. Many times, AI is not just a tool for improving efficiency; it should be able to do other more meaningful things.

Like love.

So I replied to her immediately. It wasn't about money or anything like that; at that moment, I had only one very simple thought:

To fulfill her wish.

The next morning, we added each other as contacts.

Reader: Hello, may I ask your surname? Reading the few simple sentences you wrote, to be honest, I cried a lot. I don't know what to say, but really, thank you!

In the subsequent conversation, I learned that her lover was named Lao D (a pseudonym used here for privacy protection). He passed away suddenly in an accident, and at that time, she was away and didn't even get to say a proper goodbye. For the past few months, she has been living with regret and self-blame, hoping to find a way to continue feeling Lao D's presence and warmth.

And also to let their two children feel some of their father's tangible companionship and love.

It instantly brought tears to my eyes.

It's precisely because of love that humans are so unique.

Being rational again, creating a real digital life requires one very important thing first:

Data.

Ideally, I would have helped her create a digital person she could converse with in real time, but data was a big issue.

To create a convincing digital person, we need text (articles and chat records written during his lifetime), audio (clean speech recordings), and video (clean footage, preferably with plenty of movement).

However, the objective reality is that many people (especially men) do not leave behind a video dataset that could be used to create a digital person.

Her lover was the same: he was introverted and left behind no video material. It was hard to find even a single video of him alone.

We had to settle for the next best thing: not a video, but a "digital life" that could converse by voice, much like a phone call.

The text dataset was resolved quickly; after all, people leave behind a lot of written material, whether chat records, essays, or social media posts.

But we hit a snag with the collection of the audio dataset.

The only usable clean audio we had was 90 seconds long.

Anyone who has dabbled in open-source TTS knows that with traditional methods, 90 seconds of data is practically useless. For example, with Bert-VITS2, you need an hour of data to achieve decent results.

Therefore, I could only pin my hopes on a large voice model: something GPT-like that takes the 90 seconds of voice data as a prompt and clones the voice through few-shot learning.
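For readers who want that reasoning in concrete form, here is a minimal sketch of the decision, not the pipeline actually used in this story. The one-hour threshold is an assumption borrowed loosely from the Bert-VITS2 figure above, and the function names (`clip_duration_seconds`, `choose_cloning_route`, `make_silent_wav`) are hypothetical helpers invented for illustration. It only measures how much clean audio is available and picks a route; it does no voice cloning itself.

```python
import io
import wave

# Assumption: about an hour of clean speech is needed before traditional
# fine-tuning is worth attempting (loosely based on the figure in the text).
FINE_TUNE_MIN_SECONDS = 60 * 60


def clip_duration_seconds(wav_file) -> float:
    """Return the duration of a PCM WAV file (a path or a file-like object)."""
    with wave.open(wav_file, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def choose_cloning_route(total_seconds: float) -> str:
    """Pick a route based only on how much clean audio is available."""
    if total_seconds >= FINE_TUNE_MIN_SECONDS:
        return "fine-tune"
    # Too little data to train on: use the clip as a prompt for a large voice model.
    return "few-shot-prompt"


def make_silent_wav(seconds: float, sample_rate: int = 16000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence, used here as a stand-in clip."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    clip = make_silent_wav(90)
    seconds = clip_duration_seconds(clip)
    print(seconds, choose_cloning_route(seconds))
```

With a 90-second clip, the sketch lands on the prompt-based route, which is the situation described here.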

But a large voice model isn't something I could build on my own; it far exceeds my knowledge and capabilities. I could only go around asking friends in my AI voice circles whether they had the resources or technology to offer some support.

So, in December last year, within my limited network, I visited a few companies working in this direction. But to my surprise, the models in this area...weren't as mature as I expected.

Their models were either still being refined, or too slow in synthesis, or the timbre wasn't right, or the emotions were too flat...

I didn't want to make do with whatever was available just to fulfill some promise; I really wanted to create something with great effect, something that wouldn't break her immersion, something that could let her feel her lover's affection...

But I really had no choice; after searching through all my contacts, I still felt it wasn't right...

By mid-December, I could only apologize to her and say:

Please wait a bit longer, wait for AI technology to advance, I'm truly sorry.

That wait lasted two months.

At the end of January, I was chatting casually with a friend from MiniMax, talking about some AI industry gossip, and the topic of large voice models came up inadvertently. I mentioned this story again.

Then, she sent me this message:

卡兹克's friend: We also have a large voice model. Do you want to send me the audio, and I'll give it a try for you?

To be honest, I was a bit surprised at the time because, in my memory, they never seemed to have any voice products.

But giving it a try wouldn't hurt, considering I had already tried so many other companies. Trying one more wouldn't make a difference, so I gave them the 90 seconds of audio material.

A day later, just when I had almost forgotten about it, they sent over a demo.

For privacy reasons, I can't share a screen recording to let you hear how closely the voice resembles Lao D's, or how accurately it was recreated.

First, this AI entity on the Hailuo Wenwen (海螺问问) platform is private, visible only to her and me, and not set up for public interaction, so sharing a screen recording wouldn't be appropriate. Second, I want to protect her and Lao D's privacy and avoid any disturbance to them.

However, I still want to express my feelings with a sentence:

Thank you, AI Expert, for helping me fulfill my promise, and for making her dream come true.

In the end, I finally fulfilled her wish.

After obtaining her consent, I wrote down this story.

"Coco" says there are three deaths:

The moment you stop breathing, you die from a biological perspective.

The moment your funeral is held, you die from a sociological perspective.

The moment the last person who remembers you dies, you experience the ultimate death.

Death is not a final goodbye.

Being forgotten is.

AI can carry these enduring memories across time, making them even more profound.

Remember me, keep our love alive, I'll never fade away.

Let AI continue, so we may never part.

Reposted from WeChat Official Account: 数字生命卡兹克


OpenTaskAI is a global marketplace that connects AI freelancers and business needs. Our mission is to enable more people to achieve self-worth with AI tools.
