odak.learn.perception
Defines a number of different perceptual loss functions, which can be used to optimise images where gaze location is known.
BlurLoss
BlurLoss implements two different blur losses. When blur_source is set to False, it implements blur_match, which tries to match the source input image to a blurred version of the target image. When blur_source is set to True, it implements blur_lowpass, which matches a blurred version of the input image to the blurred target image, so that only the low frequencies of the source image are matched to the low frequencies of the target.
The interface is similar to other pytorch loss functions, but note that the gaze location must be provided in addition to the source and target images.
Source code in odak/learn/perception/blur_loss.py
__call__(image, target, gaze=[0.5, 0.5])
Calculates the Blur Loss.
Parameters:
- image – Image to compute loss for. Should be an RGB image in NCHW format (4 dimensions).
- target – Ground truth target image to compute loss for. Should be an RGB image in NCHW format (4 dimensions).
- gaze – Gaze location in the image, in normalized image coordinates (range [0, 1]) relative to the top left of the image.
Returns:
- loss (tensor) – The computed loss.
Source code in odak/learn/perception/blur_loss.py
__init__(device=torch.device('cpu'), alpha=0.2, real_image_width=0.2, real_viewing_distance=0.7, mode='quadratic', blur_source=False, equi=False)
Parameters:
- alpha – Parameter controlling foveation: larger values mean bigger pooling regions.
- real_image_width – The real width of the image as displayed to the user. Units don't matter as long as they are the same as for real_viewing_distance.
- real_viewing_distance – The real distance of the observer's eyes to the image plane. Units don't matter as long as they are the same as for real_image_width.
- mode – Foveation mode, either "quadratic" or "linear". Controls how pooling regions grow as you move away from the fovea. We got best results with "quadratic".
- blur_source – If True, blurs the source image as well as the target before computing the loss.
- equi – If True, run the loss in equirectangular mode. The input is assumed to be an equirectangular format 360 image. The settings real_image_width and real_viewing_distance are ignored. The gaze argument is instead interpreted as gaze angles, and should be in the range [-pi,pi]x[-pi/2,pi/2].
Source code in odak/learn/perception/blur_loss.py
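As a quick orientation, the sketch below shows one way BlurLoss might be used, based only on the interface documented above. The import path, the random test images and the optimisation details are illustrative assumptions, not a verbatim recipe from the odak documentation.

```python
import torch
from odak.learn.perception import BlurLoss  # assumed import path

# Random stand-ins for a source image being optimised and its target (RGB, NCHW).
image = torch.rand(1, 3, 256, 256, requires_grad=True)
target = torch.rand(1, 3, 256, 256)

loss_function = BlurLoss(
    alpha=0.2,
    real_image_width=0.2,
    real_viewing_distance=0.7,
    blur_source=False  # blur_match behaviour
)
loss = loss_function(image, target, gaze=[0.4, 0.6])  # gaze in normalized [0, 1] coordinates
loss.backward()  # assuming the returned loss is a differentiable scalar tensor
```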
display_color_hvs
Source code in odak/learn/perception/color_conversion.py
__call__(input_image, ground_truth, gaze=None)
Evaluates an input image against a ground truth target image for a given gaze of a viewer.
Source code in odak/learn/perception/color_conversion.py
__init__(resolution=[1920, 1080], distance_from_screen=800, pixel_pitch=0.311, read_spectrum='tensor', primaries_spectrum=torch.rand(3, 301), device=torch.device('cpu'))
Parameters:
- resolution – Resolution of the display in pixels.
- distance_from_screen – Distance from the screen in mm.
- pixel_pitch – Pixel pitch of the display in mm.
- read_spectrum – Spectrum of the display. Default is 'default', which is the spectrum of the Dell U2415 display.
- spectrum_data_root – Path to the folder containing the spectrum data of the display.
- device – Device to run the code on. Defaults to CPU.
Source code in odak/learn/perception/color_conversion.py
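The following sketch illustrates how the documented constructor and methods might fit together; the import path, image sizes and random inputs are assumptions for illustration only.

```python
import torch
from odak.learn.perception import display_color_hvs  # assumed import path

display = display_color_hvs(
    resolution=[1920, 1080],
    distance_from_screen=800,
    pixel_pitch=0.311,
    device=torch.device('cpu')
)

image = torch.rand(1, 3, 128, 128)         # display primaries, NCHW
ground_truth = torch.rand(1, 3, 128, 128)

lms = display.primaries_to_lms(image)      # primaries [B x P x H x W] -> LMS
loss = display(image, ground_truth)        # compare the two images for a (default) gaze
```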
cone_response_to_spectrum(cone_spectrum, light_spectrum)
Internal function to calculate the cone response for a particular light spectrum.
Parameters:
- cone_spectrum – Spectrum vs. wavelength tensor [2, 300].
- light_spectrum – Spectrum vs. wavelength tensor [2, 300].
Returns:
- response_to_spectrum (float) – Response of the cone to the light spectrum [1 x 1].
Source code in odak/learn/perception/color_conversion.py
construct_matrix_lms(l_response, m_response, s_response)
Internal function to construct the 3x3 RGB-to-LMS conversion matrix from the given cone response spectra.
Parameters:
- *_response – Cone response spectrum tensor (normalised response vs wavelength).
Returns:
- lms_image_tensor (tensor) – 3x3 LMSrgb tensor.
Source code in odak/learn/perception/color_conversion.py
construct_matrix_primaries(l_response, m_response, s_response)
Internal function to construct the 3x3 primaries-to-LMS conversion matrix from the given cone response spectra.
Parameters:
- *_response – Cone response spectrum tensor (normalised response vs wavelength).
Returns:
- lms_image_tensor (tensor) – 3x3 LMSrgb tensor.
Source code in odak/learn/perception/color_conversion.py
display_spectrum_response(wavelength, function)
Internal function to provide the light spectrum response at a particular wavelength.
Parameters:
- wavelength – Wavelength in nm [400...700].
- function – Display light spectrum distribution function.
Returns:
- ligth_response_dict (float) – Display light spectrum response value.
Source code in odak/learn/perception/color_conversion.py
initialize_cones_normalised()
Internal function to initialize normalised L, M, S cones as normal distributions with given sigma and mu values.
Returns:
- l_cone_n (tensor) – Normalised L cone distribution.
- m_cone_n (tensor) – Normalised M cone distribution.
- s_cone_n (tensor) – Normalised S cone distribution.
Source code in odak/learn/perception/color_conversion.py
initialize_random_spectrum_normalised(dataset)
Internal function to initialize a normalised light spectrum by fitting a combination of three Gaussian curves to the given dataset using L-BFGS.
Parameters:
- dataset – Spectrum value against wavelength.
- peakspectrum –
Returns:
- light_spectrum (tensor) – Normalised light spectrum function.
Source code in odak/learn/perception/color_conversion.py
initialize_rgb_backlight_spectrum()
Internal function to initialize the backlight spectrum for the color primaries.
Returns:
- red_spectrum (tensor) – Normalised backlight spectrum for the red color primary.
- green_spectrum (tensor) – Normalised backlight spectrum for the green color primary.
- blue_spectrum (tensor) – Normalised backlight spectrum for the blue color primary.
Source code in odak/learn/perception/color_conversion.py
lms_to_primaries(lms_color_tensor)
Internal function to convert an LMS image to primaries space.
Parameters:
- lms_color_tensor – LMS data to be transformed to primaries space [Bx3xHxW].
Returns:
- primaries (tensor) – Primaries data transformed from LMS space [BxPxHxW].
Source code in odak/learn/perception/color_conversion.py
primaries_to_lms(primaries)
Internal function to convert primaries space to LMS space.
Parameters:
- primaries – Primaries data to be transformed to LMS space [BxPxHxW].
Returns:
- lms_color (tensor) – LMS data transformed from primaries space [BxPxHxW].
Source code in odak/learn/perception/color_conversion.py
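To make the shapes concrete, here is a hedged round-trip sketch through primaries_to_lms and lms_to_primaries; the import path, constructor defaults and tensor sizes are assumptions for illustration.

```python
import torch
from odak.learn.perception import display_color_hvs  # assumed import path

display = display_color_hvs(device=torch.device('cpu'))
primaries = torch.rand(1, 3, 64, 64)          # [B x P x H x W] display primaries
lms = display.primaries_to_lms(primaries)     # primaries -> LMS
recovered = display.lms_to_primaries(lms)     # LMS -> primaries
print(primaries.shape, lms.shape, recovered.shape)
```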
second_to_third_stage(lms_image)
This function turns second stage [L, M, S] values into third stage [(L+S)-M, M-(L+S), (M+S)-L] values. Equations are taken from Schmidt et al., "Neurobiological hypothesis of color appearance and hue perception", 2014.
Parameters:
- lms_image – Image data in LMS space (second stage).
Returns:
- third_stage (tensor) – Image data at the third stage.
Source code in odak/learn/perception/color_conversion.py
to(device)
Utilization function for setting the device.
Parameters:
- device – Device to be used (e.g., CPU, CUDA, OpenCL).
color_map(input_image, target_image, model='Lab Stats')
Internal function to map the color distribution of an image onto another image. Reference: Color transfer between images, Reinhard et al., 2001.
Parameters:
- input_image – Input image in RGB color space [3 x m x n].
- target_image – Target image in RGB color space [3 x m x n] providing the reference color distribution.
Returns:
- mapped_image (Tensor) – Input image with the color distribution of the target image [3 x m x n].
Source code in odak/learn/perception/color_conversion.py
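A minimal sketch of how color_map might be called, assuming it is exposed at the module level; the images here are random placeholders and the 3 x m x n layout (no batch dimension) follows the documentation above.

```python
import torch
from odak.learn.perception import color_map  # assumed import path

input_image = torch.rand(3, 256, 256)   # image whose colors will be remapped
target_image = torch.rand(3, 256, 256)  # image providing the reference color distribution

# Map the color statistics of target_image onto input_image.
mapped_image = color_map(input_image, target_image, model='Lab Stats')
```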
hsv_to_rgb(image)
Definition to convert HSV color space to RGB color space. Mostly inspired by https://kornia.readthedocs.io/en/latest/_modules/kornia/color/hsv.html
Parameters:
- image – Input image in HSV color space [k x 3 x m x n] or [3 x m x n]. Image(s) must be normalized between zero and one.
Returns:
- image_rgb (tensor) – Output image in RGB color space [k x 3 x m x n] or [1 x 3 x m x n].
Source code in odak/learn/perception/color_conversion.py
lab_to_srgb(image)
Definition to convert LAB color space to sRGB color space.
Parameters:
- image – Input image in LAB color space [3 x m x n].
Returns:
- image_srgb (tensor) – Output image in sRGB color space [3 x m x n].
Source code in odak/learn/perception/color_conversion.py
linear_rgb_to_rgb(image, threshold=0.0031308)
Definition to convert linear RGB images to RGB color space. Mostly inspired by https://kornia.readthedocs.io/en/latest/_modules/kornia/color/rgb.html
Parameters:
- image – Input image in linear RGB color space [k x 3 x m x n] or [3 x m x n]. Image(s) must be normalized between zero and one.
- threshold – Threshold used in calculations.
Returns:
- image_linear (tensor) – Output image in RGB color space [k x 3 x m x n] or [1 x 3 x m x n].
Source code in odak/learn/perception/color_conversion.py
linear_rgb_to_xyz(image)
Definition to convert linear RGB color space to CIE XYZ color space. Mostly inspired by Rochester IT Color Conversion Algorithms (https://www.cs.rit.edu/~ncs/color/).
Parameters:
- image – Input image in linear RGB color space [k x 3 x m x n] or [3 x m x n]. Image(s) must be normalized between zero and one.
Returns:
- image_xyz (tensor) – Output image in XYZ (CIE 1931) color space [k x 3 x m x n] or [1 x 3 x m x n].
Source code in odak/learn/perception/color_conversion.py
rgb_2_ycrcb(image)
Converts an image from RGB colourspace to YCrCb colourspace.
Parameters:
- image – Input image. Should be an RGB floating-point image with values in the range [0, 1]. Should be in NCHW format [3 x m x n] or [k x 3 x m x n].
Returns:
- ycrcb (tensor) – Image converted to YCrCb colourspace [k x 3 x m x n] or [1 x 3 x m x n].
Source code in odak/learn/perception/color_conversion.py
rgb_to_hsv(image, eps=1e-08)
Definition to convert RGB color space to HSV color space. Mostly inspired by https://kornia.readthedocs.io/en/latest/_modules/kornia/color/hsv.html
Parameters:
- image – Input image in RGB color space [k x 3 x m x n] or [3 x m x n]. Image(s) must be normalized between zero and one.
Returns:
- image_hsv (tensor) – Output image in HSV color space [k x 3 x m x n] or [1 x 3 x m x n].
Source code in odak/learn/perception/color_conversion.py
rgb_to_linear_rgb(image, threshold=0.0031308)
Definition to convert RGB images to linear RGB color space. Mostly inspired by https://kornia.readthedocs.io/en/latest/_modules/kornia/color/rgb.html
Parameters:
- image – Input image in RGB color space [k x 3 x m x n] or [3 x m x n]. Image(s) must be normalized between zero and one.
- threshold – Threshold used in calculations.
Returns:
- image_linear (tensor) – Output image in linear RGB color space [k x 3 x m x n] or [1 x 3 x m x n].
Source code in odak/learn/perception/color_conversion.py
srgb_to_lab(image)
Definition to convert sRGB color space to LAB color space.
Parameters:
- image – Input image in sRGB color space [3 x m x n].
Returns:
- image_lab (tensor) – Output image in LAB color space [3 x m x n].
Source code in odak/learn/perception/color_conversion.py
xyz_to_linear_rgb(image)
Definition to convert CIE XYZ color space to linear RGB color space. Mostly inspired by Rochester IT Color Conversion Algorithms (https://www.cs.rit.edu/~ncs/color/).
Parameters:
- image – Input image in XYZ (CIE 1931) color space [k x 3 x m x n] or [3 x m x n]. Image(s) must be normalized between zero and one.
Returns:
- image_linear_rgb (tensor) – Output image in linear RGB color space [k x 3 x m x n] or [1 x 3 x m x n].
Source code in odak/learn/perception/color_conversion.py
ycrcb_2_rgb(image)
Converts an image from YCrCb colourspace to RGB colourspace.
Parameters:
- image – Input image. Should be a YCrCb floating-point image with values in the range [0, 1]. Should be in NCHW format [3 x m x n] or [k x 3 x m x n].
Returns:
- rgb (tensor) – Image converted to RGB colourspace [k x 3 x m x n] or [1 x 3 x m x n].
Source code in odak/learn/perception/color_conversion.py
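Since several of these conversions are inverses of each other, a round-trip sketch can double as a sanity check. The import paths are assumptions, and the tolerances are only indicative.

```python
import torch
from odak.learn.perception import (
    rgb_to_linear_rgb,
    linear_rgb_to_xyz,
    xyz_to_linear_rgb,
    linear_rgb_to_rgb,
    rgb_2_ycrcb,
    ycrcb_2_rgb
)  # assumed import paths

image = torch.rand(1, 3, 64, 64)  # RGB, normalized to [0, 1], NCHW

# RGB -> linear RGB -> XYZ and back again.
linear = rgb_to_linear_rgb(image)
xyz = linear_rgb_to_xyz(linear)
rgb_back = linear_rgb_to_rgb(xyz_to_linear_rgb(xyz))

# RGB -> YCrCb -> RGB.
ycrcb = rgb_2_ycrcb(image)
rgb_again = ycrcb_2_rgb(ycrcb)

print(torch.allclose(image, rgb_back, atol=1e-4),
      torch.allclose(image, rgb_again, atol=1e-4))
```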
make_3d_location_map(image_pixel_size, real_image_width=0.3, real_viewing_distance=0.6)
Makes a map of the real 3D location that each pixel in an image corresponds to, when displayed to a user on a flat screen. Assumes the viewpoint is located at the centre of the image, and the screen is perpendicular to the viewing direction.
Parameters:
- image_pixel_size – The size of the image in pixels, as a tuple of form (height, width).
- real_image_width – The real width of the image as displayed. Units are not important, as long as they are the same as those used for real_viewing_distance.
- real_viewing_distance – The real distance from the user's viewpoint to the screen.
Returns:
- map (tensor) – The computed 3D location map, of size 3xWxH.
Source code in odak/learn/perception/foveation.py
make_eccentricity_distance_maps(gaze_location, image_pixel_size, real_image_width=0.3, real_viewing_distance=0.6)
Makes a map of the eccentricity of each pixel in an image for a given fixation point, when displayed to a user on a flat screen. Assumes the viewpoint is located at the centre of the image, and the screen is perpendicular to the viewing direction. Output is in radians.
Parameters:
- gaze_location – User's gaze (fixation point) in the image. Should be given as a tuple with normalized image coordinates (ranging from 0 to 1).
- image_pixel_size – The size of the image in pixels, as a tuple of form (height, width).
- real_image_width – The real width of the image as displayed. Units are not important, as long as they are the same as those used for real_viewing_distance.
- real_viewing_distance – The real distance from the user's viewpoint to the screen.
Returns:
- eccentricity_map (tensor) – The computed eccentricity map, of size WxH.
- distance_map (tensor) – The computed distance map, of size WxH.
Source code in odak/learn/perception/foveation.py
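The sketch below shows how these eccentricity and distance maps might be generated for a 1080p image viewed at 60 cm, based on the parameter list above. The import path and the viewing geometry are assumptions.

```python
from odak.learn.perception import make_eccentricity_distance_maps  # assumed import path

eccentricity_map, distance_map = make_eccentricity_distance_maps(
    gaze_location=(0.5, 0.5),        # fixation at the centre of the image
    image_pixel_size=(1080, 1920),   # (height, width) in pixels
    real_image_width=0.3,            # 30 cm wide display
    real_viewing_distance=0.6        # viewed from 60 cm
)
# eccentricity_map is in radians; distance_map shares the units of real_image_width.
```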
make_equi_pooling_size_map_lod(gaze_angles, image_pixel_size, alpha=0.3, mode='quadratic')
This function is similar to make_equi_pooling_size_map_pixels, but instead returns a map of LOD levels to sample from to achieve the correct pooling region areas.
Parameters:
- gaze_angles – Gaze direction expressed as angles, in radians.
- image_pixel_size – Dimensions of the image in pixels, as a tuple of (height, width).
- alpha – Parameter controlling the extent of foveation.
- mode – Foveation mode (how pooling size varies with eccentricity). Should be "quadratic" or "linear".
Returns:
- pooling_size_map (tensor) – The computed pooling size map, of size HxW.
Source code in odak/learn/perception/foveation.py
make_equi_pooling_size_map_pixels(gaze_angles, image_pixel_size, alpha=0.3, mode='quadratic')
This function makes a map of pooling sizes in pixels, similarly to make_pooling_size_map_pixels, but works on 360 equirectangular images. Input images are assumed to be in equirectangular form, i.e. if you consider a 3D viewing setup where y is the vertical axis, the x location in the image corresponds to rotation around the y axis (yaw), ranging from -pi to pi, and the y location in the image corresponds to pitch, ranging from -pi/2 to pi/2.
In this setup real_image_width and real_viewing_distance have no effect.
Note that rather than a 2D image gaze location in [0,1]^2, the gaze should be specified as gaze angles in [-pi,pi]x[-pi/2,pi/2] (yaw, then pitch).
Parameters:
- gaze_angles – Gaze direction expressed as angles, in radians.
- image_pixel_size – Dimensions of the image in pixels, as a tuple of (height, width).
- alpha – Parameter controlling the extent of foveation.
- mode – Foveation mode (how pooling size varies with eccentricity). Should be "quadratic" or "linear".
Source code in odak/learn/perception/foveation.py
make_pooling_size_map_lod(gaze_location, image_pixel_size, alpha=0.3, real_image_width=0.3, real_viewing_distance=0.6, mode='quadratic')
This function is similar to make_pooling_size_map_pixels, but instead returns a map of LOD levels to sample from to achieve the correct pooling region areas.
Parameters:
- gaze_location – User's gaze (fixation point) in the image. Should be given as a tuple with normalized image coordinates (ranging from 0 to 1).
- image_pixel_size – The size of the image in pixels, as a tuple of form (height, width).
- real_image_width – The real width of the image as displayed. Units are not important, as long as they are the same as those used for real_viewing_distance.
- real_viewing_distance – The real distance from the user's viewpoint to the screen.
Returns:
- pooling_size_map (tensor) – The computed pooling size map, of size WxH.
Source code in odak/learn/perception/foveation.py
make_pooling_size_map_pixels(gaze_location, image_pixel_size, alpha=0.3, real_image_width=0.3, real_viewing_distance=0.6, mode='quadratic')
Makes a map of the pooling size associated with each pixel in an image for a given fixation point, when displayed to a user on a flat screen. Follows the idea that pooling size (in radians) should be directly proportional to eccentricity (also in radians). Assumes the viewpoint is located at the centre of the image, and the screen is perpendicular to the viewing direction. Output is the width of the pooling region in pixels.
Parameters:
- gaze_location – User's gaze (fixation point) in the image. Should be given as a tuple with normalized image coordinates (ranging from 0 to 1).
- image_pixel_size – The size of the image in pixels, as a tuple of form (height, width).
- real_image_width – The real width of the image as displayed. Units are not important, as long as they are the same as those used for real_viewing_distance.
- real_viewing_distance – The real distance from the user's viewpoint to the screen.
Returns:
- pooling_size_map (tensor) – The computed pooling size map, of size WxH.
Source code in odak/learn/perception/foveation.py
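For a concrete feel of how pooling size grows with eccentricity, here is a hedged sketch using make_pooling_size_map_pixels with the defaults listed above; the import path and the chosen figures are assumptions.

```python
from odak.learn.perception import make_pooling_size_map_pixels  # assumed import path

pooling_map = make_pooling_size_map_pixels(
    gaze_location=(0.5, 0.5),
    image_pixel_size=(1080, 1920),
    alpha=0.3,
    real_image_width=0.3,
    real_viewing_distance=0.6,
    mode='quadratic'
)
# Pooling sizes (in pixels) are smallest near the gaze location and grow towards the periphery.
print(pooling_map.min(), pooling_map.max())
```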
make_radial_map(size, gaze)
Makes a simple radial map where each pixel contains the distance in pixels from the chosen gaze location.
Parameters:
- size – Dimensions of the image.
- gaze – User's gaze (fixation point) in the image. Should be given as a tuple with normalized image coordinates (ranging from 0 to 1).
Source code in odak/learn/perception/foveation.py
MetamericLoss
The MetamericLoss class provides a perceptual loss function. Rather than exactly matching the source image to the target, it tries to ensure that the source is a metamer of the target image.
Its interface is similar to other pytorch loss functions, but note that the gaze location must be provided in addition to the source and target images.
Source code in odak/learn/perception/metameric_loss.py
__call__(image, target, gaze=[0.5, 0.5], image_colorspace='RGB', visualise_loss=False)
Calculates the Metameric Loss.
Parameters:
- image – Image to compute loss for. Should be an RGB image in NCHW format (4 dimensions).
- target – Ground truth target image to compute loss for. Should be an RGB image in NCHW format (4 dimensions).
- image_colorspace – The current colorspace of your image and target. Ignored if the input does not have 3 channels. Accepted values: RGB, YCrCb.
- gaze – Gaze location in the image, in normalized image coordinates (range [0, 1]) relative to the top left of the image.
- visualise_loss – Shows a heatmap indicating which parts of the image contributed most to the loss.
Returns:
- loss (tensor) – The computed loss.
Source code in odak/learn/perception/metameric_loss.py
__init__(device=torch.device('cpu'), alpha=0.2, real_image_width=0.2, real_viewing_distance=0.7, n_pyramid_levels=5, mode='quadratic', n_orientations=2, use_l2_foveal_loss=True, fovea_weight=20.0, use_radial_weight=False, use_fullres_l0=False, equi=False)
Parameters:
- alpha – Parameter controlling foveation: larger values mean bigger pooling regions.
- real_image_width – The real width of the image as displayed to the user. Units don't matter as long as they are the same as for real_viewing_distance.
- real_viewing_distance – The real distance of the observer's eyes to the image plane. Units don't matter as long as they are the same as for real_image_width.
- n_pyramid_levels – Number of levels of the steerable pyramid. Note that the image is padded so that both height and width are multiples of 2^(n_pyramid_levels), so setting this value too high will slow down the calculation a lot.
- mode – Foveation mode, either "quadratic" or "linear". Controls how pooling regions grow as you move away from the fovea. We got best results with "quadratic".
- n_orientations – Number of orientations in the steerable pyramid. Can be 1, 2, 4 or 6. Increasing this will increase runtime.
- use_l2_foveal_loss – If True, all pixels that have a pooling size of 1 pixel in the largest scale use a direct L2 loss against the target rather than pooling over pyramid levels. In practice this gives better results when the loss is used for holography.
- fovea_weight – A weight to apply to the foveal region if use_l2_foveal_loss is set to True.
- use_radial_weight – If True, applies a radial weighting when calculating the difference between the source and target stats maps. This weights stats closer to the fovea more than those further away.
- use_fullres_l0 – If True, stats for the lowpass residual are replaced with blurred versions of the full-resolution source and target images.
- equi – If True, run the loss in equirectangular mode. The input is assumed to be an equirectangular format 360 image. The settings real_image_width and real_viewing_distance are ignored. The gaze argument is instead interpreted as gaze angles, and should be in the range [-pi,pi]x[-pi/2,pi/2].
Source code in odak/learn/perception/metameric_loss.py
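The hedged sketch below shows one way MetamericLoss might drive a simple image optimisation loop, following the documented __call__ signature. The import path, optimiser settings and random images are assumptions rather than a prescribed workflow.

```python
import torch
from odak.learn.perception import MetamericLoss  # assumed import path

target = torch.rand(1, 3, 512, 512)
estimate = torch.rand(1, 3, 512, 512, requires_grad=True)

loss_function = MetamericLoss(alpha=0.2, real_image_width=0.2, real_viewing_distance=0.7)
optimizer = torch.optim.Adam([estimate], lr=0.01)

for step in range(100):
    optimizer.zero_grad()
    loss = loss_function(estimate, target, gaze=[0.5, 0.5])  # gaze in normalized coordinates
    loss.backward()
    optimizer.step()
```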
MetamericLossUniform
Measures metameric loss between a given image and a metamer of the given target image. This variant of the metameric loss is not foveated; it applies uniform pooling sizes to the whole input image.
Source code in odak/learn/perception/metameric_loss_uniform.py
__call__(image, target, image_colorspace='RGB', visualise_loss=False)
Calculates the Metameric Loss.
Parameters:
- image – Image to compute loss for. Should be an RGB image in NCHW format (4 dimensions).
- target – Ground truth target image to compute loss for. Should be an RGB image in NCHW format (4 dimensions).
- image_colorspace – The current colorspace of your image and target. Ignored if the input does not have 3 channels. Accepted values: RGB, YCrCb.
- visualise_loss – Shows a heatmap indicating which parts of the image contributed most to the loss.
Returns:
- loss (tensor) – The computed loss.
Source code in odak/learn/perception/metameric_loss_uniform.py
__init__(device=torch.device('cpu'), pooling_size=32, n_pyramid_levels=5, n_orientations=2)
Parameters:
- pooling_size – Pooling size, in pixels. For example, 32 will pool over 32x32 blocks of the image.
- n_pyramid_levels – Number of levels of the steerable pyramid. Note that the image is padded so that both height and width are multiples of 2^(n_pyramid_levels), so setting this value too high will slow down the calculation a lot.
- n_orientations – Number of orientations in the steerable pyramid. Can be 1, 2, 4 or 6. Increasing this will increase runtime.
Source code in odak/learn/perception/metameric_loss_uniform.py
gen_metamer(image)
Generates a metamer for an image, following the method described in the referenced paper. This function can be used on its own to generate a metamer for a desired image.
Parameters:
- image – Image to compute metamer for. Should be an RGB image in NCHW format (4 dimensions).
Returns:
- metamer (tensor) – The generated metamer image.
Source code in odak/learn/perception/metameric_loss_uniform.py
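A brief sketch of generating a uniform metamer with gen_metamer, assuming the class is importable from odak.learn.perception; the pooling size and the random input image are placeholders.

```python
import torch
from odak.learn.perception import MetamericLossUniform  # assumed import path

loss_function = MetamericLossUniform(pooling_size=32, n_pyramid_levels=5, n_orientations=2)
image = torch.rand(1, 3, 256, 256)           # RGB image in NCHW format
metamer = loss_function.gen_metamer(image)   # uniformly pooled metamer of the input
```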
MetamerMSELoss
The MetamerMSELoss class provides a perceptual loss function. It generates a metamer for the target image, and then optimises the source image to match this target image metamer.
Please note this is different from MetamericLoss, which optimises the source image to be any metamer of the target image.
Its interface is similar to other pytorch loss functions, but note that the gaze location must be provided in addition to the source and target images.
Source code in odak/learn/perception/metamer_mse_loss.py
__call__(image, target, gaze=[0.5, 0.5])
Calculates the Metamer MSE Loss.
Parameters:
- image – Image to compute loss for. Should be an RGB image in NCHW format (4 dimensions).
- target – Ground truth target image to compute loss for. Should be an RGB image in NCHW format (4 dimensions).
- gaze – Gaze location in the image, in normalized image coordinates (range [0, 1]) relative to the top left of the image.
Returns:
- loss (tensor) – The computed loss.
Source code in odak/learn/perception/metamer_mse_loss.py
__init__(device=torch.device('cpu'), alpha=0.2, real_image_width=0.2, real_viewing_distance=0.7, mode='quadratic', n_pyramid_levels=5, n_orientations=2, equi=False)
Parameters:
- alpha – Parameter controlling foveation: larger values mean bigger pooling regions.
- real_image_width – The real width of the image as displayed to the user. Units don't matter as long as they are the same as for real_viewing_distance.
- real_viewing_distance – The real distance of the observer's eyes to the image plane. Units don't matter as long as they are the same as for real_image_width.
- n_pyramid_levels – Number of levels of the steerable pyramid. Note that the image is padded so that both height and width are multiples of 2^(n_pyramid_levels), so setting this value too high will slow down the calculation a lot.
- mode – Foveation mode, either "quadratic" or "linear". Controls how pooling regions grow as you move away from the fovea. We got best results with "quadratic".
- n_orientations – Number of orientations in the steerable pyramid. Can be 1, 2, 4 or 6. Increasing this will increase runtime.
- equi – If True, run the loss in equirectangular mode. The input is assumed to be an equirectangular format 360 image. The settings real_image_width and real_viewing_distance are ignored. The gaze argument is instead interpreted as gaze angles, and should be in the range [-pi,pi]x[-pi/2,pi/2].
Source code in odak/learn/perception/metamer_mse_loss.py
gen_metamer(image, gaze)
Generates a metamer for an image, following the method described in the referenced paper. This function can be used on its own to generate a metamer for a desired image.
Parameters:
- image – Image to compute metamer for. Should be an RGB image in NCHW format (4 dimensions).
- gaze – Gaze location in the image, in normalized image coordinates (range [0, 1]) relative to the top left of the image.
Returns:
- metamer (tensor) – The generated metamer image.
Source code in odak/learn/perception/metamer_mse_loss.py
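To show how this gaze-aware variant might be used, here is a hedged sketch based on the documented interface; the import path, input images and gaze values are assumptions.

```python
import torch
from odak.learn.perception import MetamerMSELoss  # assumed import path

loss_function = MetamerMSELoss(alpha=0.2, real_image_width=0.2, real_viewing_distance=0.7)

image = torch.rand(1, 3, 256, 256, requires_grad=True)
target = torch.rand(1, 3, 256, 256)

# The loss internally generates a metamer of the target and measures MSE against it.
loss = loss_function(image, target, gaze=[0.5, 0.5])
loss.backward()

# gen_metamer can also be called directly to inspect the target's metamer.
metamer = loss_function.gen_metamer(target, gaze=[0.5, 0.5])
```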
RadiallyVaryingBlur
The RadiallyVaryingBlur class provides a way to apply a radially varying blur to an image. Given a gaze location and information about the image and foveation, it applies a blur that will achieve the proper pooling size. The pooling size is chosen to appear the same at a range of display sizes and viewing distances, for a given alpha parameter value. For more information on how the pooling sizes are computed, please see link coming soon.
The blur is accelerated by generating and sampling from MIP maps of the input image.
This class caches the foveation information, which means that if it is run repeatedly with the same foveation parameters, gaze location and image size (e.g. in an optimisation loop), it won't recalculate the pooling maps.
If you are repeatedly applying blur to images of different sizes (e.g. a pyramid), for best performance use one instance of this class per image size.
Source code in odak/learn/perception/radially_varying_blur.py
blur(image, alpha=0.2, real_image_width=0.2, real_viewing_distance=0.7, centre=None, mode='quadratic', equi=False)
Apply the radially varying blur to an image.
Parameters:
- image – The image to blur, in NCHW format.
- alpha – Parameter controlling foveation: larger values mean bigger pooling regions.
- real_image_width – The real width of the image as displayed to the user. Units don't matter as long as they are the same as for real_viewing_distance. Ignored in equirectangular mode (equi==True).
- real_viewing_distance – The real distance of the observer's eyes to the image plane. Units don't matter as long as they are the same as for real_image_width. Ignored in equirectangular mode (equi==True).
- centre – The centre of the radially varying blur (the gaze location). Should be a tuple of floats containing normalised image coordinates in the range [0,1]. In equirectangular mode this should be yaw and pitch angles in [-pi,pi]x[-pi/2,pi/2].
- mode – Foveation mode, either "quadratic" or "linear". Controls how pooling regions grow as you move away from the fovea. We got best results with "quadratic".
- equi – If True, run the blur function in equirectangular mode. The input is assumed to be an equirectangular format 360 image. The settings real_image_width and real_viewing_distance are ignored. The centre argument is instead interpreted as gaze angles, and should be in the range [-pi,pi]x[-pi/2,pi/2].
Returns:
- output (tensor) – The blurred image.
Source code in odak/learn/perception/radially_varying_blur.py
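As a usage sketch, the blur might be applied as below; the no-argument constructor, the import path and the example values are assumptions based only on the interface documented above.

```python
import torch
from odak.learn.perception import RadiallyVaryingBlur  # assumed import path

blur = RadiallyVaryingBlur()          # assumed to need no constructor arguments
image = torch.rand(1, 3, 512, 512)    # NCHW input

blurred = blur.blur(
    image,
    alpha=0.2,
    real_image_width=0.2,
    real_viewing_distance=0.7,
    centre=(0.5, 0.5),   # gaze location in normalised image coordinates
    mode='quadratic'
)
# Repeated calls with the same parameters, gaze and image size reuse the cached pooling maps.
```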
SpatialSteerablePyramid
This implements a real-valued steerable pyramid where the filtering is carried out spatially (using convolution), as opposed to multiplication in the Fourier domain. It has a number of optimisations over previous implementations that increase efficiency, but introduce some reconstruction error.