dangeng / visual_anagrams
Code for the paper "Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models"
Daniel Geng, Inbum Park, Andrew Owens
This repo contains code to generate visual anagrams and other multi-view optical illusions. These are images that change appearance or identity when transformed, such as by a rotation, a color inversion, or a jigsaw rearrangement. Please read our paper or visit our website for more details.
Create a conda env by running:
conda env create -f environment.yml
and then activate it by running:
conda activate visual_anagrams
Our method uses DeepFloyd IF, a pixel-based diffusion model. We do not use Stable Diffusion because latent diffusion models cause artifacts in illusions (see our paper for more details).
Before using DeepFloyd IF, you must accept its usage conditions. To do so, log in to Hugging Face by running:
python huggingface_login.py
and entering your Hugging Face Hub access token when prompted. If asked "Add token as git credential? (Y/n)", you can respond with n.
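If you prefer to authenticate from your own script or notebook instead, the step boils down to the standard Hugging Face Hub login. Here is a minimal sketch; the actual contents of huggingface_login.py may differ:

```python
# Minimal sketch of the login step; the repo's huggingface_login.py
# may differ, but the goal is the same: store a Hugging Face access
# token so the DeepFloyd IF weights can be downloaded.
from huggingface_hub import login

# With no arguments this prompts for your access token (and asks the
# "Add token as git credential?" question); you may also pass token="...".
login()
```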
To generate 90-degree rotation illusions, run:
python generate.py --name rotate_cw.village.horse --prompts "a snowy mountain village" "a horse" --style "an oil painting of" --views identity rotate_cw --num_samples 10 --num_inference_steps 30 --guidance_scale 10.0
Here is a description of useful arguments:
- name: Name for the illusion. Samples will be saved to ./results/{name}.
- prompts: A list of prompts for the illusion.
- style: Optional style prompt to prepend to each of the prompts. For example, this could be "an oil painting of". Saves some writing.
- views: A list of views to use. Must match the number of prompts. For a list of views, see the get_views function in visual_anagrams/views/__init__.py (and the sketch just after this list).
- num_samples: Number of illusions to sample.
- num_inference_steps: Number of diffusion denoising steps to take.
- guidance_scale: Guidance scale for classifier-free guidance.
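To check which view names are valid, you can also call get_views yourself. The sketch below assumes get_views takes a list of view-name strings and returns one view object per name; see visual_anagrams/views/__init__.py for the actual signature and the full set of names:

```python
# Sketch: look up the view objects behind a --views combination.
# Assumes get_views(view_names) returns one view object per name;
# see visual_anagrams/views/__init__.py for the real signature.
from visual_anagrams.views import get_views

view_names = ["identity", "rotate_cw"]  # must match the number of prompts
views = get_views(view_names)
for name, view in zip(view_names, views):
    print(f"{name}: {type(view).__name__}")
```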
To animate the above two-view illusion, run:
python animate.py --im_path results/rotate_cw.village.horse/0000/sample_256.png --metadata_path results/rotate_cw.village.horse/metadata.pkl
Here is a description of useful arguments:
- im_path: The path to your illusion.
- metadata_path: The path to metadata about your illusion, which is saved by generate.py. Overrides the options below.
- view: Name of the view (see the example command after this list). For a list of views, see the get_views function in visual_anagrams/views/__init__.py.
- prompt_1: Prompt for the original image. You can add \n characters here for line breaks.
- prompt_2: Same as prompt_1, but for the transformed image.
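For example, to animate an illusion while passing the view and prompts by hand instead of through the metadata file, an invocation might look like the following. This is an illustrative command built from the arguments above; the \n escapes may need adjusting for your shell:

python animate.py --im_path results/rotate_cw.village.horse/0000/sample_256.png --view rotate_cw --prompt_1 "an oil painting of\na snowy mountain village" --prompt_2 "an oil painting of\na horse"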
Choosing prompts for illusions can be fairly tricky and unintuitive. Here are some tips:
"a photo of"
tend to be harder as the constraint of realism is fairly difficult (but this doesn't mean they can't work!)."an oil painting of"
seem to do better because there's more freedom to how it can be depicted and interpreted."houseplants"
or "wine and cheese"
or "a kitchen"
"an old man"
or "marilyn monroe"
tend to be good subjects.Flipping illusion:
python generate.py --name flip.campfire.man --prompts "an oil painting of people around a campfire" "an oil painting of an old man" --views identity flip --num_samples 10 --num_inference_steps 30 --guidance_scale 10.0
Jigsaw illusions:
python generate.py --name jigsaw.houseplants.marilyn --prompts "houseplants" "marilyn monroe" --style "an oil painting of" --views identity jigsaw --num_samples 10 --num_inference_steps 30 --guidance_scale 10.0
Inner circle illusions:
python generate.py --name inner.einstein.marilyn --prompts "albert einstein" "marilyn monroe" --style "an oil painting of" --views identity inner_circle --num_samples 10 --num_inference_steps 30 --guidance_scale 10.0
Color inversion illusions:
python generate.py --name negate.landscape.houseplants --prompts "a landscape" "houseplants" --style "a lithograph of" --views identity negate --num_samples 10 --num_inference_steps 30 --guidance_scale 10.0
Patch permutation illusions:
python generate.py --name patch.lemur.kangaroo --prompts "a lemur" "a kangaroo" --style "a pencil sketch of" --views identity patch_permute --num_samples 10 --num_inference_steps 30 --guidance_scale 10.0
Pixel permutation illusions:
python generate.py --name pixel.duck.rabbit --prompts "a duck" "a rabbit" --style "a mosaic of" --views identity pixel_permute --num_samples 10 --num_inference_steps 30 --guidance_scale 10.0
Skew illusions:
python generate.py --name skew.tudor.skull --prompts "a tudor portrait" "a skull" --style "an oil painting of" --views identity skew --num_samples 10 --num_inference_steps 30 --guidance_scale 10.0
Three view illusions:
python generate.py --name threeview.waterfall.teddy.rabbit --prompts "a waterfall" "a teddy bear" "a rabbit" --style "an oil painting of" --views identity rotate_cw rotate_ccw --num_samples 10 --num_inference_steps 30 --guidance_scale 10.0
Views are derived from the base class BaseView. If you want to write your own view, you can see many examples of these transformations in views.py.
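As a rough illustration, a custom view could look something like the sketch below, which implements a vertical flip (its own inverse). The import path and the view/inverse_view method names are assumptions modeled on the existing views, so check BaseView and views.py for the actual interface:

```python
# Sketch of a custom view: a vertical flip, which is its own inverse.
# The import path and the view()/inverse_view() signatures below are
# assumptions modeled on the existing views; check BaseView in
# visual_anagrams/views before relying on this.
import torch
from visual_anagrams.views.views import BaseView  # import path is an assumption


class VerticalFlipView(BaseView):
    def view(self, im):
        # Apply the transform to a (C, H, W) image tensor: flip top-to-bottom.
        return torch.flip(im, dims=[-2])

    def inverse_view(self, noise):
        # Undo the transform on the noise estimate; a flip is self-inverse.
        return torch.flip(noise, dims=[-2])
```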
Additionally, if your view can be implemented as a permutation of pixels, you can probably get away with just saving a permutation array to disk and passing it to the PermuteView class. See permutations/make_inner_rotation_perm.py and get_view() in views.py for an example of this.
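As a rough sketch, such a permutation array can be built and saved with NumPy along the following lines. The block-shuffle permutation and the file name here are purely illustrative, and how the array is handed to PermuteView is an assumption, so follow permutations/make_inner_rotation_perm.py and get_view() for the exact format:

```python
# Sketch: build and save a pixel-permutation array for a 64x64 image by
# shuffling 8x8 blocks. The permutation and the file name are illustrative;
# see permutations/make_inner_rotation_perm.py for the format that
# PermuteView / get_view() actually expect.
import numpy as np

size, block = 64, 8
idx = np.arange(size * size).reshape(size, size)

# Split the index grid into 8x8 blocks, shuffle the blocks, and reassemble.
blocks = idx.reshape(size // block, block, size // block, block)
blocks = blocks.transpose(0, 2, 1, 3).reshape(-1, block, block)
rng = np.random.default_rng(0)
rng.shuffle(blocks)
blocks = blocks.reshape(size // block, size // block, block, block)
perm = blocks.transpose(0, 2, 1, 3).reshape(size * size)

np.save("permutations/my_block_perm_64.npy", perm)  # hypothetical file name
```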