AIGC | YZCHEN.SPACE


Image

Stable Diffusion

sd-pipeline

text_encoder: Stable Diffusion uses CLIP, but other diffusion models may use other encoders such as BERT
tokenizer: must match the one used by the text_encoder model
scheduler: the algorithm that progressively adds noise to the image during training and, at inference, controls how that noise is removed step by step
unet: the model that predicts the noise in the latent representation at each denoising step
vae: autoencoder module that we’ll use to decode latent representations into real images
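
As a rough sketch of how these five pieces show up in code: the Hugging Face diffusers `StableDiffusionPipeline` exposes each one as an attribute with the same name. `describe_components` is a hypothetical helper written for this note, not part of the library, and the model ID you pass in is your own choice.

```python
# The five components listed above, as attribute names on a diffusers pipeline.
COMPONENTS = ("text_encoder", "tokenizer", "scheduler", "unet", "vae")


def load_pipeline(model_id):
    """Load a Stable Diffusion pipeline (needs diffusers, transformers and torch)."""
    # Imported lazily because diffusers is a heavy dependency.
    from diffusers import StableDiffusionPipeline

    return StableDiffusionPipeline.from_pretrained(model_id)


def describe_components(pipe):
    """Map each component name to the class that implements it (hypothetical helper)."""
    return {name: type(getattr(pipe, name)).__name__ for name in COMPONENTS}
```

Calling `describe_components(load_pipeline(model_id))` with the ID of a Stable Diffusion checkpoint you can access should list a CLIP text model and tokenizer next to the UNet, VAE and scheduler classes. The first call downloads several gigabytes of weights.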

Tutorial

Install

conda create --name=ai python=3.10.9

sudo apt install nvidia-cuda-toolkit
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install transformers
noglob pip3 install diffusers["torch"]
pip install -U xformers
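
After the install, a quick check can confirm that PyTorch imports and sees the GPU. This is a minimal sketch; `torch_report` is a hypothetical helper name, and it only uses `torch.__version__` and `torch.cuda.is_available()`.

```python
import importlib.util


def torch_report():
    """Summarise the PyTorch install without failing when torch is missing."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False, "version": None, "cuda_available": False}

    import torch

    return {
        "installed": True,
        "version": torch.__version__,
        # False here usually means a CPU-only wheel or a driver/CUDA mismatch.
        "cuda_available": bool(torch.cuda.is_available()),
    }


print(torch_report())
```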

Colab

How to Use Stable Diffusion to Generate Images

Stable diffusion web-ui

References

Torch version

pip install torch==1.12.1 torchvision==0.13.1

export CUDA_VISIBLE_DEVICES=-1

Command line arguments

--skip-torch-cuda-test
--upcast-sampling
--no-half-vae
--no-half
--use-cpu interrogate|all
--num_cpu_threads_per_process=6

--precision full

--nowebui
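
With `--nowebui` the server exposes only its HTTP API, so images are requested with a POST to `/sdapi/v1/txt2img`. The sketch below uses only the standard library. It assumes the API listens on port 7861 (the default I believe `--nowebui` uses) and that the sampler name matches what your web UI version reports; adjust both if they differ. The function names are my own.

```python
import base64
import json
import urllib.request

# Assumption: server started with --nowebui on its default port.
API_URL = "http://127.0.0.1:7861/sdapi/v1/txt2img"


def build_payload(prompt, negative_prompt="", steps=20, cfg_scale=7.0,
                  sampler_name="DPM++ 2M Karras", width=512, height=512):
    """Assemble the JSON body for a txt2img request."""
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "steps": steps,
        "cfg_scale": cfg_scale,
        "sampler_name": sampler_name,
        "width": width,
        "height": height,
    }


def decode_images(body):
    """The API returns images as base64 strings under the "images" key."""
    return [base64.b64decode(item) for item in body.get("images", [])]


def txt2img(payload, url=API_URL, timeout=300):
    """POST the payload and return the generated images as raw bytes."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return decode_images(json.load(response))
```

A typical call would be `images = txt2img(build_payload("lineart, monochrome"))`, followed by writing `images[0]` to a file opened in binary mode.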

Models

Anime
- Anything
- Counterfeit
- Dreamlike Diffusion
Realistic
- Deliberate
- Realistic Vision
- LOFI
- ChilloutMix
2.5D
- Never Ending Dream
- Protogen
- Guofen3

Lora

Ghibli Style
- Base model: Anything v4.5/AbyssOrangeMix2
- VAE: OrangeMixs
- Prompt: <lora:ghibli_style_offset:1>
Anime Lineart
- Base model: Anything v4.5/AbyssOrangeMix2
- VAE: OrangeMixs
- Sampler: DPM++ 2M Karras - 20 steps - CFG: 7
- Prompt: lineart, monochrome, <lora:animeoutlineV4_16:1>
- Negative embedding: EasyNegative, badhandv4
Colorwater
- Weight: 0.8 to 1
- CFG: 3 to 6
- Prompt: <lora:try2:1>
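
In the web UI a LoRA is activated from the prompt with a tag of the form `<lora:name:weight>`. A small helper can build these prompts so the weight is easy to vary; `lora_tag` and `build_prompt` are hypothetical names used only for this sketch.

```python
def lora_tag(name, weight=1.0):
    """Format a LoRA reference in the <lora:name:weight> prompt syntax."""
    # The :g format drops a trailing ".0", so 1.0 is written as 1.
    return f"<lora:{name}:{weight:g}>"


def build_prompt(keywords, loras):
    """Join plain keywords and (name, weight) LoRA pairs into one prompt string."""
    parts = list(keywords) + [lora_tag(name, weight) for name, weight in loras]
    return ", ".join(parts)
```

For the Anime Lineart entry above, `build_prompt(["lineart", "monochrome"], [("animeoutlineV4_16", 1)])` reproduces the listed prompt, and passing `0.8` instead of `1` lowers the LoRA's influence.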

Sites

original
Github - diffusers
Models
- civitAI
  - CivitAI - BreakDro
  - CivitAI - BreakDomain
- HuggingFace - diffusers gallery
  - Hugging Face - andite/anything-v4.0
  - Hugging Face - gsdf/Counterfeit-V3.0
Gallery
Prompts
Art styles
- Google - Art Movements
- Google - Artists

References

Video

Github - stable-diffusion-videos

Voice

Preprocess

demucs

Github - demucs

Install

pip install -U demucs

Usage

# only separate vocals
demucs --two-stems=vocals myfile.mp3
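
To batch this from a script, the same command can be run through `subprocess`. The sketch assumes demucs writes to its default location, `separated/<model>/<track>/`, with `htdemucs` as the default model and `vocals.wav` / `no_vocals.wav` as the two output files; check these against your installed version. The helper names are my own.

```python
import subprocess
from pathlib import Path


def demucs_command(audio_path, stem="vocals"):
    """Build the same CLI call as above as an argument list."""
    return ["demucs", f"--two-stems={stem}", str(audio_path)]


def expected_outputs(audio_path, stem="vocals", model="htdemucs", out_dir="separated"):
    """Where the two stems should land (assumed default layout)."""
    track_dir = Path(out_dir) / model / Path(audio_path).stem
    return track_dir / f"{stem}.wav", track_dir / f"no_{stem}.wav"


def separate(audio_path, stem="vocals"):
    """Run demucs and return the expected (stem, accompaniment) paths."""
    subprocess.run(demucs_command(audio_path, stem), check=True)
    return expected_outputs(audio_path, stem)
```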

Ultimate Vocal Remover

Github - UVR5

A new era of music production: exploring an advanced music separation model with a graphical interface (Ultimate Vocal Remover 5)

audio-slicer

Install

pip install librosa
pip install soundfile

Usage

python slicer2.py audio [--out OUT] [--db_thresh DB_THRESH] [--min_length MIN_LENGTH] [--min_interval MIN_INTERVAL] [--hop_size HOP_SIZE] [--max_sil_kept MAX_SIL_KEPT]
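
To make the roles of `--db_thresh`, `--min_interval` and `--hop_size` concrete, here is a simplified pure-Python illustration of silence-based slicing. It is not the algorithm in `slicer2.py`: it ignores `--min_length` and `--max_sil_kept`, and the default values are chosen for illustration only.

```python
import math


def frame_db(samples, start, end):
    """RMS level of samples[start:end] in dBFS (floats in [-1, 1])."""
    frame = samples[start:end]
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20 * math.log10(max(rms, 1e-10))


def slice_on_silence(samples, sr, db_thresh=-40.0, min_interval_ms=300, hop_ms=10):
    """Return (start, end) sample ranges of the non-silent chunks.

    A cut is made only where the level stays below db_thresh for at least
    min_interval_ms; shorter pauses stay inside a chunk.
    """
    hop = max(1, sr * hop_ms // 1000)
    min_silent_frames = max(1, min_interval_ms // hop_ms)
    # Classify every hop-sized frame as silent or not.
    silent = [
        frame_db(samples, i, min(i + hop, len(samples))) < db_thresh
        for i in range(0, len(samples), hop)
    ]

    chunks, chunk_start, run = [], None, 0
    for idx, is_silent in enumerate(silent):
        if is_silent:
            run += 1
            # A long enough silence closes the chunk where the silence began.
            if run == min_silent_frames and chunk_start is not None:
                chunks.append((chunk_start * hop, (idx - run + 1) * hop))
                chunk_start = None
        else:
            if chunk_start is None:
                chunk_start = idx
            run = 0
    if chunk_start is not None:
        chunks.append((chunk_start * hop, len(samples)))
    return chunks
```

Raising `db_thresh` treats quieter passages as silence and produces more cuts, while raising `min_interval_ms` keeps short pauses inside a chunk.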

TTS

Edge-TTS

Github - edge-tts

Install

pip install edge-tts

Usage

edge-tts --text "Hello, world!" --write-media hello.mp3 --write-subtitles hello.vtt

# player required, `brew install mpv`
edge-playback --text "Hello, world!"

# list voices
edge-tts --list-voices

# play with voice
edge-playback --voice zh-CN-shaanxi-XiaoniNeural --text "你好，世界"
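
The package can also be used from Python. A minimal sketch, assuming the `edge_tts.Communicate` class and its async `save` method behave as in the project README; the wrapper names are my own.

```python
import asyncio


async def synthesize(text, voice, out_path):
    """Save `text` spoken by `voice` to `out_path` (needs `pip install edge-tts`)."""
    # Imported lazily so this module loads even without the package.
    import edge_tts

    communicate = edge_tts.Communicate(text, voice)
    await communicate.save(out_path)
    return out_path


def synthesize_sync(text, voice, out_path):
    """Blocking wrapper for scripts that are not already running an event loop."""
    return asyncio.run(synthesize(text, voice, out_path))
```

For example, `synthesize_sync("你好，世界", "zh-CN-shaanxi-XiaoniNeural", "hello.mp3")` writes the same audio that the playback command above would play. It needs network access, since synthesis happens on Microsoft's service.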

Bark

Github - Bark

Coqui-TTS

Voice Conversion

SoVITS

Soft-VC: Soft Speech Units for Improved Voice Conversion
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
SoVITS: Soft-VC + VITS
so-vits-svc: Singing Voice Conversion based on Soft-VC + VITS
- Github - so-vits-svc
- Github - so-vits-svc-fork
GPT-SoVITS

RVC

Github - Retrieval-based-Voice-Conversion-WebUI

Install env

conda install -c conda-forge gcc
conda install -c conda-forge gxx
conda install ffmpeg cmake
conda install pytorch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 pytorch-cuda=11.8 -c pytorch -c nvidia

Install requirements

pip install -r requirements.txt

Download models

cd tools
python download_models.py

Models

HuggingFace - SpliceGirl

Tutorials

Other Projects

Speech Recognition (Speech To Text)

SpeechRecognition

Github - SpeechRecognition

Whisper

Github - Whisper
- Github - Whisper Mic
- Github - whisper.cpp
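
A minimal sketch of transcribing a file with the `openai-whisper` Python package, using its `load_model` and `transcribe` calls. The model size `"base"` and the timestamp formatting helpers are assumptions made for this example.

```python
def transcribe(audio_path, model_name="base"):
    """Run Whisper on a file (needs `pip install openai-whisper` and ffmpeg)."""
    # Imported lazily because it pulls in torch.
    import whisper

    model = whisper.load_model(model_name)
    return model.transcribe(audio_path)


def format_timestamp(seconds):
    """Render seconds as MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    minutes, ms = divmod(total_ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(result):
    """Render result["segments"] as "[start -> end] text" lines."""
    return [
        f"[{format_timestamp(seg['start'])} -> {format_timestamp(seg['end'])}] "
        f"{seg['text'].strip()}"
        for seg in result.get("segments", [])
    ]
```

The full transcript is in `result["text"]`; `format_segments(result)` gives one line per segment, which is handy for building subtitles.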

Wake Word Listener

References

Github - voicebook

Rendering

NeRF: Neural Radiance Fields

3D

AI Clone

Commercial:

Open source:

Others

Problems

stable-diffusion-webui - LoRA cannot be loaded in API mode
- Workaround: copy the call to modules.script_callbacks.before_ui_callback() into def api_only()
