[Up to date with V-1.0.1+ stable]
This page lists all the functions available in the CIANNA Python interface.
As new versions are released, we will provide direct links to the versions of this page that correspond to the different version releases.
All these functions are declared in the src/python_module.c file, which should always be the ultimate reference for each sub-version of CIANNA. Default values specified in the API documentation are either set in the Python interface or in the corresponding low-level C function.
CIANNA.init(in_dim, in_nb_ch, out_dim, bias=0.1, b_size=8, comp_meth="C_CUDA", network=nb_networks-1,
dynamic_load=1, mixed_precision="off", inference_only=0, no_logo=0, adv_size=30)
Create and initialize a network object with all the parameters that must be set before using any other function. The various dimensional parameters (in_dim, in_nb_ch, out_dim, b_size) must be set in advance so CIANNA can construct properly sized arrays when calling dataset or layer construction functions.
Note: Most of the time, the number of neurons/filters in the last layer must be set to the same value as out_dim, but there are cases where the target size differs from the output size, for example when using a YOLO output.
Note: This value will be preferred over any bias value set for the first layer.
Note: The batch size is not an architectural parameter and can be changed at will for a given network. For example, a network trained with a batch size of 64 can be used in inference with a batch size of 8 to reduce the network memory footprint.
Note: For workflows where multiple networks must be used, the network id value is used to identify on which one the function is applied.
Note: Networks declared within the same script must be initialized using the same compute method and mixed precision type.
Note: In practice, dynamic loading decouples the dataset size from the available GPU memory. It is also better suited for dynamic augmentation with frequent changes in the training dataset. The latency added by the frequent movement of batches from the host to the GPU memory is usually negligible. In edge cases of fixed small datasets and very fast (latency-dominated) networks, setting dynamic load to off might improve the overall performance.
CIANNA.free_network(network=0)
[V-1.0.1+] Free all network components, including all individual layers.
Note: Individual layer destruction is not supported at the API level.
CIANNA.create_dataset(dataset, size, input, target, network=nb_networks-1, silent=0)
Create and initialize a dataset object, create the required low-level C arrays, and fill them using the provided Numpy arrays. The function handles data movement from host to GPU if needed based on parameters specified in the init function.
Note: The memory usage during the execution of this function is up to twice the size of the dataset. The Numpy arrays can be deleted after the function call, as their content has been copied into low-level C arrays. If the compute method is "C_CUDA" and a mixed-precision type is set, the low-level C arrays will be of this type both on the host and on the GPU, reducing the memory requirements of the datasets.
Note: Data should always be 2D Numpy arrays regardless of the real input dimension. The data layout for a dense input is [N_examples, N_neurons], with the fastest index on the right. For convolutional inputs, the format is [N_examples, N_channels, Depth, Height, Width], with the last 4 dimensions flattened to form a 2D array of shape [N_examples, N_channels*Depth*Height*Width]. This means that for a typical RGB image that follows the [Height, Width, N_channels] layout, each channel must be flattened using the row-major convention and appended to the input one after the other to obtain the expected data layout.
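As an illustration of the layout described above, here is a minimal NumPy sketch (the image shape and values are made up) that converts one [Height, Width, N_channels] image into a row of a CIANNA-style 2D input array:

```python
import numpy as np

# Hypothetical 4x5 RGB image in the usual [Height, Width, N_channels] layout
img = np.arange(4 * 5 * 3).reshape(4, 5, 3)

# Reorder to [N_channels, Height, Width], then flatten in row-major order:
# each channel is flattened and appended one after the other
row = np.transpose(img, (2, 0, 1)).reshape(-1)

# A dataset of N such examples is then a 2D array [N_examples, C*H*W]
dataset_input = row[np.newaxis, :].astype("float32")  # shape (1, 60)
```

The same transposition applies per-slice for 3D (depth) inputs, with Depth becoming the slowest spatial index.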
CIANNA.delete_dataset(dataset, network=nb_networks-1, silent=0)
Free all the low-level C arrays of the dataset and reset the dataset object.
CIANNA.swap_data_buffers(dataset, network=nb_networks-1)
Swap the dataset properties and low-level array pointers between a dataset and its "buffer" version.
Note: The "DATA_buf" versions of the different datasets are used to parallelize the data loading or augmentation process with network training on a previously loaded dataset. Data are not moved; only the pointers to "DATA" and "DATA_buf" are exchanged.
| Steps | Thread A      | Thread B      |
|-------|---------------|---------------|
| 0     | Load DATA     | /             |
| 1     | Train on DATA | Load DATA_buf |
| 2     | Swap DATA/Buf | Join          |
| 3     | Loop from 1   | /             |
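The table above can be sketched in plain Python with the standard threading module; load_chunk and the history append are hypothetical stand-ins for the CIANNA dataset creation and CIANNA.train() calls, so only the double-buffer pattern itself is illustrated:

```python
import threading

# Stand-in for dataset loading/augmentation (hypothetical, not a CIANNA call)
def load_chunk(chunk_id):
    return [chunk_id] * 4  # pretend freshly augmented data

def loader(chunk_id, out):
    out.append(load_chunk(chunk_id))  # Thread B fills DATA_buf

active = load_chunk(0)  # step 0: Thread A loads DATA
history = []

for step in range(1, 4):
    out = []
    t = threading.Thread(target=loader, args=(step, out))
    t.start()                  # Thread B: load DATA_buf
    history.append(active[0])  # Thread A: "train" on DATA (stand-in)
    t.join()                   # join Thread B before swapping
    active = out[0]            # equivalent of CIANNA.swap_data_buffers()
```

With CIANNA, the training thread would call CIANNA.train() while the loader thread prepares the next dataset, and the swap replaces the pointer exchange performed here.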
CIANNA.dense(nb_neurons, activation="RELU", bias=None, prev_layer=nb_layers-1, drop_rate=0.0,
strict_size=0, init_fct="xavier", init_scaling=None, network=nb_networks-1)
Create and initialize a layer object of type "dense". Most of the layer properties are set automatically based on its position in the network and the surrounding layers, but some parameters can be customized at the function call.
Note: The layers are sequentially added to the network. If it is the first layer, it is assumed to be connected to the input, and if it is declared after another layer, it will take the previous layer as its input. This process also takes into account the type of successive layers. For example, setting a "dense" layer after a "conv" or a "pool" layer will automatically flatten the previous output to be used as input for the present "dense" layer. The last layer to be declared (or the highest layer id) is assumed to be the output layer of the network.
Note: Most activation types allow additional arguments in the string. For example, our ReLU implementation accepts leaking and saturation parameters. Using "RELU_S800.0_L0.05" (default) as activation corresponds to a saturation limit of 800.0 and a leaking factor of 0.05 below 0 and above the saturation value of the RELU. Dedicated functions that take keyword arguments and return the corresponding formatted string are described in this section.
Note: When using dropout weights averaging for inference, the layer checks the dropout value of the "previous" layer and scales its weights accordingly.
Note: By default, if the following conditions are fulfilled:
- Strict size is set to 0 (default)
- The compute method is set to "C_CUDA"
- A mixed precision type that supports Tensor Core acceleration is used
- The specified number of neurons is a multiple of 8
Then, when constructing the network architecture, the number of neurons will be automatically reduced by 1, so nb_neurons+1 is a multiple of 8 that accommodates the bias. This automatic change ensures that all the operations have the right shape to use TC acceleration.
/!\ The strict size flag is mostly useful at the last layer, for which the number of neurons is an architectural constraint and should not be modified.
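The automatic size adjustment above can be summarized by a small helper. This is a sketch of the documented rule, not CIANNA code: the mixed-precision check is simplified to "anything but off", and the "FP16C_FP32A" default is only a placeholder value:

```python
def effective_nb_neurons(nb_neurons, strict_size=0, comp_meth="C_CUDA",
                         mixed_precision="FP16C_FP32A"):
    """Neuron count actually allocated, per the rule described above (sketch)."""
    tensor_core_capable = (comp_meth == "C_CUDA" and mixed_precision != "off")
    if not strict_size and tensor_core_capable and nb_neurons % 8 == 0:
        # Reduced by 1 so that nb_neurons + 1 (accounting for the bias)
        # remains a multiple of 8, preserving Tensor Core friendly shapes
        return nb_neurons - 1
    return nb_neurons
```

For example, requesting 256 neurons would yield 255 allocated neurons, while setting strict_size=1 keeps 256.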
CIANNA.conv(f_size=[...], nb_filters, stride=[...], padding=[...], int_padding=[...], activation="RELU",
bias=None, prev_layer=nb_layers-1, input_shape=[...], drop_rate=0.0, init_fct="xavier",
init_scaling=None, network=nb_networks-1)
Create and initialize a layer object of type "conv". Most of the layer properties are set automatically based on its position in the network and the surrounding layers, but some parameters can be customized at the function call.
Note: There is presently no automatic option to preserve the spatial size between the input and the output of the layer. A padding carefully chosen with respect to the filter size and the stride must be used to preserve dimensionality. The network behavior is not guaranteed if the input cannot be fully decomposed into an integer number of convolution regions. This will raise a warning but will not stop the execution.
Note: Missing dimensions can be automatically handled, allowing for less verbose code. For example, if the input dimension is 2D and only one dimension is specified in f_size, then the filter is assumed to be square. Values for irrelevant dimensions are ignored.
Note: Internal padding is usually set to increase the spatial dimensionality in output, for example when constructing auto-encoders. For now, it is the only way to achieve this in a translation equivariant way with CIANNA because no "un-pooling" layer is provided.
Note: The "SMAX" activation is only supported if the spatial size has been reduced to 1, in which case the normalization is performed over the channels.
Note: This parameter is useful for reconstructing a spatially constrained space from a flat dense output, which can be useful for GAN, auto-encoders, etc.
CIANNA.pool(p_size=[...], stride=[...], padding=[...], prev_layer=nb_layers-1, drop_rate=0.0,
p_type="MAX", activation="LIN", p_global=0, network=nb_networks-1)
Create and initialize a layer object of type "pool". Most of the layer properties are set automatically based on its position in the network and the surrounding layers, but some parameters can be customized at the function call.
Note: The network behavior is not guaranteed if the input cannot be fully decomposed in an integer number of convolution regions. This will raise a warning but will not stop the script.
Note: Missing dimensions can be automatically handled, like for CIANNA.conv(). In addition, a default pooling size of 2x2x1 is defined, following various common architectures.
Note: Pooling layers can have a drop rate for their output, but they have no weight to scale if the previous layer drop rate is not zero. So, adding a pooling layer after a layer of any type with dropout is prohibited.
Note: In most cases, a linear activation is recommended for pooling layers. Still, it is possible to use other activation functions to build networks ending with a global pooling or to allow some specific operation ordering. The "SMAX" activation is only supported if the spatial size has been reduced to 1, in which case the normalization is performed over the channels.
CIANNA.norm(normalization, activation="LIN", prev_layer=nb_layers-1, group_size=8, set_off=0, network=nb_networks-1)
Create and initialize a layer object of type "norm". Most of the layer properties are set automatically based on its position in the network and the surrounding layers, but some parameters can be customized at the function call.
Note: Normalization layers are only supported between "conv" or "pool" layers. It is also recommended to only have dropout in the network after the last normalization layer.
Note: Only Group-Normalization is supported at the moment. Still, it can emulate Layer-Normalization by setting the group size to the number of channels and Instance-Normalization by setting the group size to 1.
Note: It is recommended to have a non-linear activation function on the "conv" layer preceding the normalization layer and let the normalization layer be linear. Still, a non-linear activation function can be set for the normalization layer to allow specific architectures reproduction, but then the activation function of the previous layer should be set to linear.
Note: The save format is stable from V-0.9.3.4 to V-1.0.1+. This might change in the experimental version (V-1.1+) in order to accommodate new functionalities or to solve some performance issues. Any change to the save format will be notified in the patch note of each version. Some format conversion tools might be provided at some point for major version changes to avoid retraining networks that would benefit from performance updates with the latest versions.
CIANNA.save(file, network=nb_networks-1, bin=0)
Save the network model in its current state in a file using customized ASCII or binary formats.
Note: The save file contains all the architectural information required to reload the network without any configuration file.
Note: The binary format results in much smaller files, which are faster to load, but it is very sensitive to any save format change. Saving to the ASCII format is slower but provides direct access to the weight values to be used with other frameworks and would allow manual conversion in case of a format update. The ASCII format can also be useful to perform on-the-fly light changes (e.g., activation limits, drop_rate, etc.) to a saved model without retraining it from scratch. For a given version of CIANNA, it should always be possible to convert from ASCII to binary or binary to ASCII by chaining a CIANNA.load() and a CIANNA.save() call with different format flags.
CIANNA.load(file, iteration, network=nb_networks-1, nb_layers=0, nb_skip_layers=0, bin=0)
Load a network model from a file using a custom ASCII or binary format.
Note: The save file contains all the architectural information required to load the network without any configuration file.
Note: When loading a pre-trained network for transfer-learning on a new task or fine-tuning on a new dataset, this value can be reset to 0.
Note: This parameter allows loading only the first part of a network model. For example, the first layers from a network trained on a given task (e.g., classification) can be used in a new task (e.g., detection). New layers can be added after calling the CIANNA.load() function, which means they are added sequentially at the end of the network structure using random weight initialization. We plan on adding more loading functionalities, for example, loading a given subset of layers, which should allow more flexibility (remove layers, aggregate complex networks, extract a subpart of network models, etc.). These operations can already be done manually by editing ASCII-formatted save files.
Note: The shape of the first loaded layer is compared to either the network input or the last existing layer in the architecture before appending the rest of the save file. Combined with the nb_layers argument, this allows loading only subparts of saved models. Multiple save files can be loaded successively to aggregate model subparts.
CIANNA.print_arch_tex(path, file_name, size=1, in_size=1, f_size=1, out_size=1, stride=1, padding=1,
in_padding=0, activation=0, bias=0, dropout=0, param_count=0, network=nb_networks-1)
Write the current network architecture in a LaTeX-formatted table (output file_name.tex) and compile it into a .pdf. This function allows easy integration into papers/reports using a standard table format to help comparisons. The columns to use can be selected through keyword flags.
CIANNA.train(nb_iter, learning_rate, end_learning_rate=0.0, control_interv=1, momentum=0.0, lr_decay=0.0,
weight_decay=0.0, confmat=0, save_every=0, save_bin=0, network=nb_networks-1, shuffle_gpu=0,
shuffle_every=1, TC_scale_factor=1.0, silent=0)
High-level interface to a custom training procedure. This function will handle all the aspects of the training on the current "TRAIN" dataset, including data movement, gradient optimization, network forward and back propagation loops, saving the model, etc.
Note: This function loops over all the self-constructed batches of the "TRAIN" dataset and monitors the error on the "VALID" dataset. The "TEST" dataset is never used in this function.
While a number of iterations can be specified, advanced training procedures (e.g., for dynamic augmentation, dynamic data loading, multiple networks co-training, etc.) can usually be constructed by invoking this function for a single iteration in a loop with other operations on the datasets or other networks. All the modifications applied to the model are preserved between calls, and a global iteration value is updated so every element of the training procedure that depends on it evolves appropriately between calls.
This function was specifically designed to run in parallel in the main Python thread while other threads handle data movement or data augmentation. See the dynamic augmentation example.
Note: The scaling is applied to the output layer loss, the scaled gradient is propagated through the network, and the weight update correction is downscaled accordingly. This allows propagation in a larger numerical range to prevent numerical overflow while still performing updates of the proper scale. A too-low scaling will lose information, but a high scaling has no negative impact on information propagation unless an upper overflow occurs. This scaling is set in the training function to be refined/scheduled through multiple calls.
CIANNA.forward(saving=1, drop_mode="AVG_MODEL", no_error=0, repeat=1, network=nb_networks-1, silent=0)
High-level interface to a custom inference and output saving procedure. This function will handle all the aspects of the inference on the current "TEST" dataset, including data movement, network forward loop, output saving, etc.
Note: This function can be used to save the inference output on the validation dataset during training by initializing the "TEST" dataset object with the same content as "VALID". This function is often used to perform the final prediction on the underlying test dataset. When deploying a trained model, this function allows inference on unlabelled data. For this, the data must be loaded into the "TEST" dataset object and associated with a zero-filled target of the appropriate dimension. Notably, this is the only function that allows selecting the behavior of the dropout between model averaging and MC-dropout.
Note: Like the dataset input, the output is always a 2D array. The output data layout for dense outputs is [N_examples, output_neurons], with the fastest index on the right. For convolutional/pool outputs, the format is [N_examples, N_channels, Depth, Height, Width], with the last 4 dimensions flattened to [N_examples, N_channels*Depth*Height*Width].
Note: When using "AVG_MODEL", the result for a given input is deterministic at a given numerical resolution (excluding CUDA/cuBLAS variability between specific hardware). In contrast, when using "MC_MODEL", results vary and should be used to draw posterior probabilities with the repeat parameter. The multiple inferences for the same mini-batch are made in a row, considering the layer outputs to be identical for all layers before the first dropping layer. This helps save computation time, as most of the inference time is concentrated in the first layers, which are devoid of dropout. Due to the previous point, the output data order for "MC_MODEL" depends on the batch size, with the predictions for a given batch being contiguous and repeated N times before changing to the next batch. The output data layout is of the following format with the fastest index on the right: [nb_batch, repeat, batch_size, flat_output_size]. If the output is convolutional, the [flat_output_size] follows the data layout exposed at the beginning of the present function. Remember that in the ASCII output format, the result is a 2D array with all additional dimensions flattened following the previous data layout. Using a dataset whose size is a multiple of the selected batch size is recommended to perform post-process statistics more efficiently.
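The "MC_MODEL" output ordering described above can be recovered with a plain NumPy reshape. The shapes and the random values below are made up for illustration:

```python
import numpy as np

nb_batch, repeat, batch_size, flat_output_size = 4, 16, 8, 5

# Flat 2D output as saved by the forward pass (random stand-in values)
raw = np.random.rand(nb_batch * repeat * batch_size, flat_output_size)

# Recover the [nb_batch, repeat, batch_size, flat_output_size] ordering
mc = raw.reshape(nb_batch, repeat, batch_size, flat_output_size)

# Posterior mean and scatter per example, back in dataset order
mean = mc.mean(axis=1).reshape(nb_batch * batch_size, flat_output_size)
std = mc.std(axis=1).reshape(nb_batch * batch_size, flat_output_size)
```

This is why a dataset size that is a multiple of the batch size makes post-processing simpler: the reshape works without trimming a partially filled last batch.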
Note: This flag is especially useful when deploying a trained network to perform real-time inference, which requires frequent calls to the present function.
CIANNA.perf_eval(network=nb_networks-1)
Display a table with the numerical performance of each layer in microseconds for one input. Forward and backpropagation performance are evaluated separately. This function should be called after a CIANNA.train() or CIANNA.forward() call so it can measure compute time.
Note: The displayed time averages up to 1000 performance evaluations per layer. The computing time is not updated after the 1000 measurements are reached.
CIANNA.linear(None)
Create a string always equal to "LIN".
Note: Equivalent to no activation.
CIANNA.relu(saturation=800.0, leaking=0.05)
Create a string of the following format "RELU_S[saturation]_L[leaking]"
Note: Corresponds to a custom leaky ReLU with saturation. In practice, the activation is the identity between 0 and the saturation value, and is linear with a slope equal to the leaking factor both above the saturation value and below 0.
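A minimal sketch of the string such a helper produces; the exact float formatting used by CIANNA may differ, so this only illustrates the documented "RELU_S[saturation]_L[leaking]" pattern:

```python
def relu(saturation=800.0, leaking=0.05):
    # Builds the documented "RELU_S[saturation]_L[leaking]" activation string
    return f"RELU_S{saturation}_L{leaking}"
```

With the default values, this yields the "RELU_S800.0_L0.05" string quoted in the dense layer notes.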
CIANNA.logistic(saturation=6.0, beta=1.0)
Create a string of the following format "LOGI_S[saturation]_B[beta]"
Note: The logistic derivative approaches 0 toward the two edges. This behavior can result in no gradient propagation for very contrasted activations, even if the error is large. To circumvent this behavior, we added a customized saturation value for the weighted sum before it goes through the sigmoid activation. With the default values, saturation=6 and beta=1, the activation stays in the range of [0.00247,0.99753].
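The quoted range can be checked numerically. The sketch below assumes the saturation is a symmetric clamp of the weighted sum to [-saturation, +saturation] before the sigmoid, which is what the stated [0.00247, 0.99753] interval implies:

```python
import math

def logistic(x, saturation=6.0, beta=1.0):
    # Weighted sum clamped to [-saturation, +saturation] before the sigmoid
    x = max(-saturation, min(saturation, x))
    return 1.0 / (1.0 + math.exp(-beta * x))

lower = logistic(-1e9)  # ~0.00247, i.e. 1/(1+e^6)
upper = logistic(+1e9)  # ~0.99753
```

The clamp guarantees a non-vanishing gradient at the activation bounds, which is the purpose stated above.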
CIANNA.softmax(None)
Create a string always equal to "SMAX".
CIANNA.YOLO(None)
Create a string always equal to "YOLO".
CIANNA.set_yolo_params(nb_box=0, nb_class=0, nb_param=0, max_nb_obj_per_image=0, prior_size=[...],
prior_noobj_prob=[...], error_scales=[...], slopes_and_maxes=[...],
param_ind_scales=[...], IoU_limits=[...], fit_parts=[...],
IoU_type="GIoU", strict_box_size=0, prior_dist_type="SIZE", fit_dim=0,
rand_startup=64000, rand_prob_best_box_assoc=0.0, rand_prob = 0.0,
min_prior_forced_scaling=0.0, class_softmax=0, diff_flag=0, error_type="natural",
no_override=0, raw_output=0, network=nb_networks-1)
Configure all the non-architectural parameters for a YOLO network. This function must be invoked before starting to declare/load the network architecture. The function returns the number of required filters for the YOLO output with the provided configuration.
Note: This function is mandatory for any YOLO network to operate properly, but it can be called with no argument when deploying a saved YOLO network for inference only. Many parameters have automated default values that are context-dependent (e.g., the default IoU_limits depends on the IoU_type). The output log should be looked at carefully for any doubt on which values were used by CIANNA.
Note: When using a YOLO network, the output and target sizes differ. The output is (N+1)-dimensional (with N being the number of input dimensions), with the last dimension representing all the box parameters per output grid element. In contrast, the target dimension that must be set in CIANNA.init() depends on the parameters with the following formula: [1+max_nb_obj_per_image*(7+nb_param+diff)]. In practice, for all YOLO datasets, the provided target should be a 2D array with the number of lines equal to the number of images in the dataset, and each row representing all the boxes contained in the corresponding image in the form [Nb_box, box_1_elements, box_2_elements, ..., 0_fill], with zero filling to preserve a unique size for all image box lists. For each box, the elements are in the form [box_class_int, xmin, ymin, zmin, xmax, ymax, zmax, diff_flag]. The box_class element is an integer id starting from 1 (i.e., ranging from 1 to nb_class), always present even if nb_class=0. The 3D box coordinates are always mandatory regardless of the actual dimensionality of the problem (see the priors setting for details), and the diff_flag for each target is only present if the diff_flag keyword is set to 1.
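A sketch of the target-row construction described above, with made-up values for a 2D problem (zmin=0 and zmax=1, as recommended in the priors note) and no extra parameters or difficult flags:

```python
import numpy as np

nb_images, max_nb_obj_per_image, nb_param, diff = 1, 3, 0, 0
row_size = 1 + max_nb_obj_per_image * (7 + nb_param + diff)  # = 22

targets = np.zeros((nb_images, row_size), dtype="float32")

# Two boxes in image 0; class ids start at 1
boxes = [
    (1, 10.0, 12.0, 0.0, 30.0, 40.0, 1.0),  # class, xmin, ymin, zmin, xmax, ymax, zmax
    (2, 50.0, 50.0, 0.0, 60.0, 70.0, 1.0),
]
targets[0, 0] = len(boxes)
for i, b in enumerate(boxes):
    targets[0, 1 + i * 7 : 1 + (i + 1) * 7] = b
# The unused third box slot stays zero-filled, preserving a unique row size
```

The resulting 2D array is what would be passed as the target argument of CIANNA.create_dataset() for a YOLO network.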
Note A: This parameter should be considered mandatory. The default value of 0 is set so this function can be invoked for inference in a deployment mode. In such a case, the parameter value will be loaded from the save file.
See note A
See note A
See note A. This parameter does not impact the number of boxes predicted in a given image.
Note: If the dimensionality of this array is smaller than the input dimensionality, the priors are considered equal in all the dimensions. This would automatically result in square/cubic priors. Internally, any prior not set for a given dimension is set to 1 pixel (up to 3D for now). When training a 2D YOLO network, the target should account for the third dimension with z_min = 0 and z_max = 1 for a target size of 1 pixel in the third dimension. If not set, some computations on the target size (IoU, length, surface, volume) will produce a 0 output, preventing the network from learning anything!
See note A
Note: This scaling should be inversely proportional to the density of objects of a given size regarding their distribution over the various size priors. This parameter balances the many updates toward an objectness target of 0 with the few informative updates toward a positive objectness target.
Note: The proper scaling values are problem and architecture-dependent.
Note: These scalings help emphasize some specific regression parameters. Default 1.0 for each parameter.
Note: This parameter also impacts the objectness score as it is used as the target value for the best target/prediction match during training. Changing the training IoU association requires a change in the objectness thresholds used in any post-process selection.
Note: This process will compute the distance between each target and the box priors in the box-size multi-dimensional space. If the strict_box_size association is set, only the N boxes with the closest prior can be associated with the target in a given sub-grid element. This condition is only circumvented during the initial random startup association.
Note B: This helps prevent cases where all the objects are only detected by a small subset of the available priors because they were never close enough to be considered. This is especially useful for small priors and crowded contexts.
Note: This parameter affects box comparison for strict_box_size and rand_prob_best_box_assoc.
Note: This parameter can be useful in cases where the objects always span a full axis of the input but for which some class or regression parameter information lies in this dimension. Other situations might require a manual setup of fit_dim (pretraining, etc.).
Note: This parameter helps to get all the boxes in a regime where they can detect something.
See note B
See note B
See note B
Note: For example, in 2D with the smallest prior set to 12x12 and min_prior_forced_scaling set to 2.0, all targets with a surface smaller than 12x12x2 = 288 (approximately a 17x17 target square box) are forced to be associated with the smallest prior, regardless of their distance to any other prior in the list. This can be set independently or in addition to strict_box_size_association.
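The worked example above, checked numerically:

```python
import math

smallest_prior = (12, 12)
min_prior_forced_scaling = 2.0

# Targets below this surface are forced onto the smallest prior
surface_limit = smallest_prior[0] * smallest_prior[1] * min_prior_forced_scaling  # 288
equiv_square_side = math.sqrt(surface_limit)  # ~16.97, i.e. roughly a 17x17 box
```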
See note B
Note: Using softmax classification strongly impacts the classification loss scale, so it should be manually adjusted in error_scales.
Note: A "difficult" object is only considered for positive reinforcement. If a prediction is confident enough and overlaps sufficiently with the "difficult" target, then the prediction is updated according to the target. If the previous condition is not fulfilled, the "difficult" target acts only on "good but not best" matches. In IoU_limits, the user defines the two thresholds, one in objectness and the second one in IoU.
Note: Flagging targets as "difficult" can have different meanings. In most cases, it is based on the fact that part of an object is visible, but assessing its class might be difficult. For cases where dynamic augmentation is used on input images, it might be interesting to flag as "difficult" the objects for which the surface is under a given threshold and/or close to the edges. For example, the target boxes can be refined after augmentation regarding the fraction of their original surface that remains in the image: less than 25%, the target is removed; between 25% and 50%, the target is kept but flagged as "difficult"; and above 50%, the target is kept. In practice, the "difficult" flagging is up to the user when constructing the training sample.
Note: Different "difficult" flags can be used for each object to further refine this process and account for different causes that make the object difficult to detect. If set to 1, a detected "difficult" object updates all the box parameters. If set to 2, a detected "difficult" object updates all but the class and extra parameters. If set to 3, a detected object updates only the probability and objectness score. All "difficult" objects are accounted for the good but not best association.
Note: The available association refinement processes, cascading loss thresholds, and "difficult" flagging all help YOLO networks converge more efficiently and in a more balanced and stable way. However, these processes have the side effect of breaking the relatively "monotonic" loss decrease that is usually expected. In practice, the association refinements mean that it is not always the current best prediction that gets associated with a given target, and the cascading loss and "difficult" flagging mean that more and more elements get added to the loss as the network gets better at representing all the aspects of the detection/classification/characterization. Overall, this results in periodic increases/oscillations in the apparent loss (hence "cascading loss") that might be confused with problematic behavior or over-training by an uninformed user. To circumvent this, the "natural" error display was implemented to almost restore the expected monotonic error decrease. Still, the "complete" error monitoring can be very useful for properly analyzing the training loss dynamic with respect to all the previously cited refinements and for identifying if the hyper-parameters must be adjusted.
Note: Having some of the YOLO parameters written in the save files is useful for deployment without providing a configuration file. Still, it might be interesting to be able to change some parameters or limits during training or when trying to train on a new task from a pre-trained network, hence the no override option.
Note: When using the "raw" format, the output ordering provides the grid element coordinate of the box that should be used in addition to the grid element relative coordinate to compute box centers.
CIANNA.set_fit_parts(position=1, size=1, probability=1, objectness=1, classes=1, parameters=1)
Use explicit keywords to create an array of integers formatted in the proper order to be given to the CIANNA.set_yolo_params() function with the "fit_parts" keyword.
Note: For all parameters, there are 3 possible flags. 1: The corresponding loss part is fitted normally; 0: The corresponding loss part has a constant target corresponding to the center of the activation interval, -1: the corresponding loss part is not fitted at all, and the resulting output is manually set to 0 but is unconstrained internally.
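A sketch of the array such a call produces, assuming the keyword order maps directly to the array order (which is what "the proper order" above implies):

```python
def set_fit_parts(position=1, size=1, probability=1, objectness=1,
                  classes=1, parameters=1):
    # 1: fit normally; 0: constant mid-activation target;
    # -1: not fitted, output forced to 0 (unconstrained internally)
    return [position, size, probability, objectness, classes, parameters]
```

For example, a detection-only setup with no class or extra-parameter fitting could use set_fit_parts(classes=-1, parameters=-1).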
CIANNA.set_error_scales(position=2.0, size=2.0, probability=1.0, objectness=2.0, classes=1.0, parameters=1.0)
Use explicit keywords to create an array of scaling values for each part of the YOLO loss in the proper order to be given to the CIANNA.set_yolo_params() function with the "error_scales" keyword.
CIANNA.set_IoU_limits(good_IoU_lim, low_IoU_best_box_assoc, min_prob_IoU_lim, min_obj_IoU_lim,
min_class_IoU_lim, min_param_IoU_lim, diff_IoU_lim, diff_obj_lim)
Use explicit keywords to create an array of threshold values for the different association refinement processes in the proper order to be given to the CIANNA.set_yolo_params() function with the "IoU_limits" keyword.
Note: Most of these parameters are part of what we call the "cascading loss" process. It refers to the fact that some box properties are not fitted until the prediction overlaps sufficiently with the target. This does not affect the position and size prediction, so the network can always improve regarding raw detection. This helps ensure that the network can sufficiently identify the object features in the image before trying to predict the corresponding class or the regression parameters. In most cases, this cascading loss process speeds up and stabilizes the training, and in some cases, it even improves the prediction results (especially for the regression parameters).
Note: The IoU and objectness values are expressed regarding the chosen IoU association function in the CIANNA.set_yolo_params(). For example, IoU only spans from 0 to 1, while GIoU and DIoU are in the interval -1 to 1. This also affects objectness values, even if it always spans from 0 to 1. What can be considered a good prediction objectness threshold depends on the IoU function. From this, it is natural that the default threshold values depend on the selected IoU function. Default values are provided for each parameter in the following order: IoU, GIoU, DIoU, DIoU2.
Note: For "difficult" objects, the IoU and objectness conditions are combined, meaning that the object must be both properly detected and predicted with sufficient confidence. Still, the "good but not best" condition can apply to a "difficult" object, for example, a good prediction but with low confidence, or if multiple box priors properly detect the "difficult" object.
CIANNA.set_sm_single(slope=1.0, fmin, fmax)
Use explicit keywords to create an array of one slope and two extremum values. This function can be used to set up any of the individual YOLO loss parts in the CIANNA.set_slopes_and_maxes() function.
CIANNA.set_slopes_and_maxes(position=[...], size=[...], probability=[...],
objectness=[...], classes=[...], parameters=[...])
Use explicit keywords to create a 2D array of slopes and extremum values in the proper order to be given to the CIANNA.set_yolo_params() function with the "slopes_and_maxes" keyword. Each of the arguments of the present function can be filled with an array returned by the CIANNA.set_sm_single() function.
Note: The position is activated with a sigmoid, and the slope can be seen as the beta parameter. The result is the position as a fraction of the current grid element.
Note: The size is computed with a linear activation, but the corresponding box size is obtained with the following formula: b_s = p_s*exp(a), where b_s is the box size in one dimension, p_s is the prior size of the current box, and a is the activation of the neuron. The extremum values are used to cap the activation BEFORE the exponential.
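The size decoding above, as a sketch; the default capping interval used here is an assumption for illustration, not CIANNA's actual default:

```python
import math

def box_size(prior_size, activation, a_min=-1.0, a_max=1.0):
    # The activation is capped BEFORE the exponential, as described above
    a = max(a_min, min(a_max, activation))
    return prior_size * math.exp(a)  # b_s = p_s * exp(a)
```

A zero activation thus returns exactly the prior size, and the caps bound how far the predicted box can shrink or grow relative to its prior.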
Note: The probability is computed with a sigmoid activation, and the slope can be seen as the beta parameter. This probability is independent of the IoU, so it only represents the confidence that a given box overlaps an object, not the quality of this intersection.
Note: The objectness is computed with a sigmoid activation, and the slope can be seen as the beta parameter. This value depends on the IoU, so it is a combined score of the quality of the intersection between the prediction and the underlying target and of the probability that the prediction actually corresponds to a target.
Note: The class score can be computed either with a sigmoid activation or a softmax, depending on what was set in CIANNA.set_yolo_params(). This class score should be seen as conditional, representing the predicted class of the detected object IF the box corresponds to an object. This means that, for a box with a predicted objectness of zero, the class is unconstrained and can be high for a given class without any real meaning. In practice, multiplying the class score with the objectness score to filter out detections after inference works quite well, as it mixes the probability of being an object, the quality of the box, and the confidence of the network in its classification.
Note: This is a full addition to the detection formalism. Each box can predict an arbitrary number of additional parameters in the form of a regression (e.g., flux, angles, ellipsoid dimensions, ...). Note that these parameters are strongly affected by the corresponding cascading loss IoU limit, as they should not contribute to the loss until the object is already partly identified. Without the cascading loss, trying to fit additional parameters might be too difficult/noisy for the network to learn anything.