Stable Diffusion

From the Metalab Wiki, the open center for meta-disciplinary magicians and technical-creative enthusiasts.


TBD
Metalab
anlumo
Talk
0
planning
Open Source AI Image Generation
Last updated: 04.04.2023


Stable Diffusion Introduction

Stable Diffusion (SD) is a new open source project for generating and manipulating images from text prompts. It runs locally on the user's computer, provided it has a powerful GPU. Even the training itself can be done with open source tools (and a lot of compute power).

anlumo has spent a significant amount of time learning the Ways of the Prompt Artist™ and wonders whether anybody else would be interested in learning about this completely new field.

What Can Stable Diffusion Do?

Stable Diffusion is a collection of tools for AI art generation. Its capabilities include the following:

  • Text-to-image: Enter a text prompt (positive and negative) and generate a low-res image from it.
  • Image-to-image: Take an image as input and modify it based on a text prompt. This can be used for style transfer for example, or taking the composition of another image for a new creation. (example)
  • Inpainting: Same as image-to-image, but only modify a part of the image. This can be used to add or remove details in images, for example. (example)
  • Outpainting: Same as image-to-image, but extending an existing image instead. For example, if you have an image of the upper half of a person, you can add the lower half or add more of the environment (based on the text prompt).
  • ControlNet: Applicable to any of the above. Take a reference image, extract some property from it, such as the pose of a person or a depth map, and nudge the AI to generate one of the above outputs using this extra information (example). This can also be used in text-to-image to convert a pencil sketch into a photorealistic image, for example (example).
  • Upscaling of images: This can increase the resolution of an image by adding details that weren't in the original image (like individual strands of hair). Usually this is used to increase the low resolution output of the techniques above to usable resolutions.
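
To make the "positive and negative" prompt pair from the text-to-image item a bit more concrete, here is a minimal sketch of classifier-free guidance, the technique commonly used to combine the two prompts at each denoising step. The function name `guided_noise` and the default scale of 7.0 are assumptions made for illustration. Real implementations operate on GPU tensors rather than Python lists.

```python
from typing import List, Sequence


def guided_noise(
    negative_pred: Sequence[float],
    positive_pred: Sequence[float],
    cfg_scale: float = 7.0,  # illustrative default, not a recommendation
) -> List[float]:
    """Combine two noise predictions with classifier-free guidance.

    negative_pred: noise estimate conditioned on the negative prompt
                   (or on an empty prompt if none was given).
    positive_pred: noise estimate conditioned on the positive prompt.
    cfg_scale:     how far to move from the negative prediction
                   towards (and beyond) the positive one.
    """
    if len(negative_pred) != len(positive_pred):
        raise ValueError("predictions must have the same length")
    # Start at the negative prediction and step along the direction
    # that leads from it to the positive prediction.
    return [
        neg + cfg_scale * (pos - neg)
        for neg, pos in zip(negative_pred, positive_pred)
    ]


if __name__ == "__main__":
    # Toy values standing in for one tiny slice of a noise tensor.
    print(guided_noise([0.0, 1.0], [1.0, 1.0], cfg_scale=2.0))
```

With a scale of 1.0 the result equals the positive prediction. Larger values push the result further away from whatever the negative prompt describes.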

There have been some recent attempts at applying these capabilities to video as well. Here is a YouTube Short demonstrating this (Full video of the end result).

So, SD is much more capable than commercial offerings like Midjourney. However, it also has far more knobs to adjust and settings to optimize. Hence the idea for this talk.
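
To give an impression of the kind of knobs involved, here is a small sketch that bundles a few settings commonly found in SD front ends. The class `GenerationSettings`, its field names and its defaults are hypothetical and not taken from any particular tool. The multiple-of-8 check assumes the usual SD setup, where the image is processed in a latent space at one eighth of the output resolution.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GenerationSettings:
    """Hypothetical bundle of common text-to-image settings."""

    prompt: str
    negative_prompt: str = ""
    steps: int = 20         # number of denoising steps
    cfg_scale: float = 7.0  # how strongly the prompt steers the result
    seed: int = -1          # in this sketch, -1 means "pick a random seed"
    width: int = 512
    height: int = 512

    def problems(self) -> List[str]:
        """Return human-readable issues; an empty list means all is fine."""
        issues = []
        if not self.prompt.strip():
            issues.append("prompt is empty")
        if self.steps < 1:
            issues.append("steps must be at least 1")
        for name, value in (("width", self.width), ("height", self.height)):
            # Assumption: sizes must map cleanly onto the 8x smaller latent.
            if value <= 0 or value % 8 != 0:
                issues.append(f"{name} must be a positive multiple of 8")
        return issues


if __name__ == "__main__":
    settings = GenerationSettings(prompt="a cat wearing a hat", width=500)
    print(settings.problems())
```

Collecting the settings in one place like this makes it easier to record which combination produced a given image, which matters once you start tuning several of them at once.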

Plan

anlumo's plan would be to hold an interactive talk showing off how to install and use SD on a local machine.

Doing a workshop was considered, but the problem is that SD requires a beefy graphics card, which is hard to provide for a room full of people. If somebody has a gaming notebook with a 3000 or 4000 series GPU, they're free to participate.

Outline:

  • Basics/Introduction
  • Installation
  • Text-to-Image generation
  • Embeddings
  • Upscaling
  • Image-to-Image generation
  • Inpainting
  • ControlNet

Rating

This interactive talk shall be rated PEGI 16.

Non-Technical Aspects

Of course there's a lot to discuss about AI art replacing artists, legal aspects and related issues. These things can be discussed as part of this interactive talk, but anlumo would like to avoid having the whole time taken over by them. Also, he thinks that direct insight into the techniques used to create AI art can have a deep impact on opinions, so this could be more of a starting point for forming an opinion than a clash between differing ones.

Interested?

The whole point of this page is to gauge interest, so if you'd like to participate, please add your name to this list:

If a sufficient number of people appear on this list, a date and time will be discussed/announced.