+++
title = "AIGC"
tags = ["AI"]
draft = false
+++

Image

Stable Diffusion

sd-pipeline

  • text_encoder: Stable Diffusion uses CLIP, but other diffusion models may use other encoders such as BERT
  • tokenizer: must match the one used by the text_encoder model
  • scheduler: the algorithm that progressively adds noise to the image during training and controls the denoising steps at inference
  • unet: the model that predicts the noise to remove at each step, producing the denoised latent representation of the image
  • vae: autoencoder module that we’ll use to decode latent representations into real images
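
As a rough sketch of how the five components above map onto code, the snippet below loads each one separately with the `diffusers` and `transformers` libraries. The checkpoint ID `CompVis/stable-diffusion-v1-4`, the choice of `PNDMScheduler`, and the helper name `load_components` are assumptions for illustration; any checkpoint that stores each component in a subfolder of the same name should load the same way.

```python
# Sketch only: assumes `diffusers` and `transformers` are installed and that the
# checkpoint stores each component in a subfolder named after it.
COMPONENT_NAMES = ("tokenizer", "text_encoder", "scheduler", "unet", "vae")


def load_components(repo_id="CompVis/stable-diffusion-v1-4"):
    """Return a dict mapping each component name to its loaded object."""
    # Imported lazily so the heavy dependencies are only needed when called.
    from diffusers import AutoencoderKL, PNDMScheduler, UNet2DConditionModel
    from transformers import CLIPTextModel, CLIPTokenizer

    classes = {
        "tokenizer": CLIPTokenizer,        # text -> token ids
        "text_encoder": CLIPTextModel,     # token ids -> text embeddings
        "scheduler": PNDMScheduler,        # noise schedule / denoising steps
        "unet": UNet2DConditionModel,      # predicts noise in latent space
        "vae": AutoencoderKL,              # decodes latents into images
    }
    return {
        name: classes[name].from_pretrained(repo_id, subfolder=name)
        for name in COMPONENT_NAMES
    }
```

Calling `load_components()` downloads the model weights on first use, so it is best run once and the result reused.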

Tutorial

Install

conda create --name=ai python=3.10.9

sudo apt install nvidia-cuda-toolkit
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install transformers
# quoting the extras works in any shell (zsh would otherwise need `noglob`)
pip3 install "diffusers[torch]"
pip install -U xformers
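
After running the commands above, a quick check can confirm which packages are importable in the active environment. This is a minimal sketch that uses only the standard library; the helper name `check_packages` is hypothetical, and the list assumes each import name matches its pip name.

```python
import importlib.util

# Import names of the packages installed above (assumed to match the pip names).
PACKAGES = ["torch", "torchvision", "torchaudio", "transformers", "diffusers", "xformers"]


def check_packages(names):
    """Map each top-level module name to True if it can be found, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(PACKAGES).items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

The check only looks the modules up without importing them, so it stays fast even when `torch` is present.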

Colab

Stable Diffusion web UI

References

Torch version

pip install torch==1.12.1 torchvision==0.13.1

export CUDA_VISIBLE_DEVICES=-1
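
The same setting can be applied from inside a Python script, provided the variable is set before any CUDA-using library is imported. The helper name `hide_gpus` below is hypothetical; this is a sketch rather than something the web UI requires.

```python
import os


def hide_gpus():
    """Hide all CUDA devices from libraries imported after this call."""
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


hide_gpus()
# Import torch (or any other CUDA library) only after this point; with no
# visible devices, torch.cuda.is_available() is expected to return False.
```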

Command line arguments

--skip-torch-cuda-test
--upcast-sampling
--no-half-vae
--no-half
--use-cpu interrogate|all
--num_cpu_threads_per_process=6

--precision full

--nowebui
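
The flags above are usually combined into one launch command. The sketch below assembles a CPU-only invocation from them; the script name `launch.py`, the helper `cpu_launch_command`, and this particular grouping of flags are assumptions, so check the web UI's own documentation before relying on it.

```python
import shlex

# Flags taken from the list above; grouping them into a "CPU-only" preset is
# an assumption made for illustration.
CPU_ONLY_FLAGS = [
    "--skip-torch-cuda-test",
    "--precision", "full",
    "--no-half",
    "--use-cpu", "all",
]


def cpu_launch_command(script="launch.py", extra=()):
    """Return the argument list for a CPU-only launch (script name is assumed)."""
    return ["python", script, *CPU_ONLY_FLAGS, *extra]


# Print the command as a single shell-safe string.
print(shlex.join(cpu_launch_command()))
```

Further flags from the list, such as `--nowebui`, can be passed through `extra`.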

Models

  • Anime
    • Anything
    • Counterfeit
    • Dreamlike Diffusion
  • Realistic
  • 2.5D
    • Never Ending Dream
    • Protogen
    • Guofen3

LoRA

Sites

References

Video

GitHub - stable-diffusion-videos

Voice

Preprocess

demucs

GitHub - demucs

Install

pip install -U demucs

Usage

# split the track into two stems: vocals and everything else
demucs --two-stems=vocals myfile.mp3
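
For batch preprocessing it can be convenient to call the same command from Python. This is a minimal sketch that shells out to the `demucs` executable; the helper names `demucs_command` and `separate` are hypothetical, and it assumes `demucs` is on the `PATH`.

```python
import subprocess


def demucs_command(audio_path, stem="vocals"):
    """Build the same command line as above for an arbitrary input file."""
    return ["demucs", f"--two-stems={stem}", audio_path]


def separate(audio_path, stem="vocals"):
    """Run demucs and raise CalledProcessError if it exits with an error."""
    subprocess.run(demucs_command(audio_path, stem), check=True)
```

For example, `separate("myfile.mp3")` runs the command shown above.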

Ultimate Vocal Remover

GitHub - UVR5

audio-slicer

audio-slicer

Install

pip install librosa
pip install soundfile

Usage

python slicer2.py audio [--out OUT] [--db_thresh DB_THRESH] [--min_length MIN_LENGTH] [--min_interval MIN_INTERVAL] [--hop_size HOP_SIZE] [--max_sil_kept MAX_SIL_KEPT]
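
To make the `--db_thresh` option more concrete: a silence-based slicer compares the RMS level of each short frame, expressed in dB, against a threshold and treats quieter frames as silence. The sketch below illustrates that comparison in plain Python. It is not the code in `slicer2.py`; the function names and the `-40` default are assumptions for illustration.

```python
import math


def rms_db(samples):
    """RMS level of a frame in dB relative to full scale (samples in [-1, 1])."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def is_silent(samples, db_thresh=-40.0):
    """True if the frame's RMS level falls below the threshold."""
    return rms_db(samples) < db_thresh
```

A frame with amplitude 0.5 sits at about -6 dB and counts as sound, while one with amplitude 0.001 sits at -60 dB and counts as silence under a -40 dB threshold.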

TTS

Edge-TTS

GitHub - edge-tts

Install

pip install edge-tts

Usage

edge-tts --text "Hello, world!" --write-media hello.mp3 --write-subtitles hello.vtt

# edge-playback needs the mpv player, e.g. `brew install mpv`
edge-playback --text "Hello, world!"

# list voices
edge-tts --list-voices

# play with voice
edge-playback --voice zh-CN-shaanxi-XiaoniNeural --text "你好,世界"
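
The package can also be used as a Python library. The sketch below wraps its `Communicate` class in a small coroutine; the helper names `synthesize` and `main` are hypothetical, and the code assumes `edge-tts` is installed and the machine has network access.

```python
import asyncio


async def synthesize(text, out_path, voice="zh-CN-shaanxi-XiaoniNeural"):
    """Write synthesized speech for `text` to `out_path` as an MP3 file."""
    # Imported lazily; requires `pip install edge-tts` and network access.
    import edge_tts

    communicate = edge_tts.Communicate(text, voice)
    await communicate.save(out_path)


def main():
    """Example entry point: same text and voice as the command above."""
    asyncio.run(synthesize("你好,世界", "hello.mp3"))
```

Calling `main()` writes `hello.mp3` to the current directory; pass a different `voice` from `edge-tts --list-voices` to change the speaker.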

Bark

GitHub - Bark

Coqui-TTS

Voice Conversion

SoVITS

RVC

Install env

conda install -c conda-forge gcc
conda install -c conda-forge gxx
conda install ffmpeg cmake
conda install pytorch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 pytorch-cuda=11.8 -c pytorch -c nvidia

Install requirements

pip install -r requirements.txt

Download models

cd tools
python download_models.py

Models

Tutorials

Other Projects

Speech Recognition (Speech To Text)

SpeechRecognition

Whisper

Wake Word Listener

References

Rendering

NeRF: Neural Radiance Fields

3D

AI Clone

Commercial:

Open source:

Others

Problems