CIANNA-related publications

A Guided Unconditional Diffusion Model to Synthesize and Inpaint Radio Galaxies from FIRST, MGCLS and Radio Zoo

We present a masked guided approach for a denoising diffusion probabilistic model (DDPM) trained to generate and inpaint realistic radio galaxy images. We train the DDPM using the FIRST radio galaxy catalog, the Radio Galaxies Zoo and cutouts of the MGCLS catalog. We compared different statistical distributions to make sure that our unconditional approach produces morphologically realistic galaxies, offering a data-driven method to supplement existing radio datasets and support the development of machine learning applications in radio astronomy.

Potevineau et al. 2026 (arXiv:2601.07485)
YOLO-CIANNA: Galaxy detection with deep learning in radio data: II. Winning the SKA SDC2 using a generalized 3D-YOLO network

As the scientific exploitation of the Square Kilometre Array (SKA) approaches, there is a need for new advanced data analysis and visualization tools capable of processing large high-dimensional datasets. In this study, we aim to generalize the YOLO-CIANNA deep learning source detection and characterization method for 3D hyperspectral HI emission cubes. We present the adaptations we made to the regression-based detection formalism and the construction of an end-to-end 3D convolutional neural network (CNN) backbone. We then describe a processing pipeline for applying the method to simulated 3D HI cubes from the SKA Observatory Science Data Challenge 2 (SDC2) dataset. The YOLO-CIANNA method was originally developed and used by the MINERVA team that won the official SDC2 competition. Despite the public release of the full SDC2 dataset, no published result has yet surpassed MINERVA's top score. In this paper, we present an updated version of our method that improves our challenge score by 9.5%. The resulting catalog exhibits a high detection purity of 92.3%, best-in-class characterization accuracy, and contains 45% more confirmed sources than concurrent classical detection tools. The method is also computationally efficient, processing the full ~1TB SDC2 data cube in 30 min on a single GPU. These state-of-the-art results highlight the effectiveness of 3D CNN-based detectors for processing large hyperspectral data cubes and represent a promising step toward applying YOLO-CIANNA to observational data from SKA and its precursors.

Cornu et al. 2026 (A&A V707, A203)
Identification of molecular line emission using convolutional neural networks

Context. Complex organic molecules (COMs) are found to be abundant in various astrophysical environments, particularly toward star-forming regions, where they are observed both toward protostellar envelopes as well as shocked regions. The emission spectrum, especially that of heavier COMs, might consist of up to hundreds of lines, where line blending hinders the analysis. However, identifying the molecular composition of the gas that leads to the observed millimeter spectra is the first step toward a quantitative analysis. Aims. We have developed a new method based on supervised machine learning to recognize spectroscopic features of the rotational spectrum of molecules in the 3 mm atmospheric transmission band for a list of species including COMs, with the aim of obtaining a detection probability. Methods. We used local thermodynamic equilibrium (LTE) modeling to build a large set of synthetic spectra of 20 molecular species, including COMs with a range of physical conditions typical for star-forming regions. We successfully designed and trained a convolutional neural network (CNN) that provides detection probabilities of individual species in the spectra. Results. We demonstrate that the CNN model we developed has a robust performance to detect spectroscopic signatures from these species in synthetic spectra. We evaluated its ability to detect molecules according to the noise level, frequency coverage, and line-richness, as well as to test its performance for an incomplete frequency coverage with high detection probabilities for the tested parameter space, with no false predictions. Finally, we applied the CNN model to obtain predictions on observational data from the literature toward line-rich hot core-like sources, where the detection probabilities remain reasonable, with no false detections. Conclusions. We demonstrate the use of CNNs in facilitating the analysis of complex millimeter spectra both on synthetic spectra, along with the first tests performed on observational data. Further analyses on its explainability, as well as calibration using a larger observational dataset, will help improve the performance of our method for future applications.

Kessler et al. 2025 (A&A V704, A324)
YOLO-CIANNA: Galaxy detection with deep learning in radio data: I. A new YOLO-inspired source detection method applied to the SKAO SDC1

Context. The upcoming Square Kilometer Array (SKA) will set a new standard regarding data volume generated by an astronomical instrument, which is likely to challenge widely adopted data-analysis tools that scale inadequately with the data size. Aims. The aim of this study is to develop a new source detection and characterization method for massive radio astronomical datasets based on modern deep-learning object detection techniques. For this, we seek to identify the specific strengths and weaknesses of this type of approach when applied to astronomical data. Methods. We introduce YOLO-CIANNA, a highly customized deep-learning object detector designed specifically for astronomical datasets. In this paper, we present the method and describe all the elements introduced to address the specific challenges of radio astronomical images. We then demonstrate the capabilities of this method by applying it to simulated 2D continuum images from the SKA observatory Science Data Challenge 1 (SDC1) dataset. Results. Using the SDC1 metric, we improve the challenge-winning score by +139% and the score of the only other post-challenge participation by +61%. Our catalog has a detection purity of 94% while detecting 40–60% more sources than previous top-score results, and exhibits strong characterization accuracy. The trained model can also be forced to reach 99% purity in post-process and still detect 10–30% more sources than the other top-score methods. It is also computationally efficient, with a peak prediction speed of 500 images of 512×512 pixels per second on a single GPU. Conclusions. YOLO-CIANNA achieves state-of-the-art detection and characterization results on the simulated SDC1 dataset and is expected to transfer well to observational data from SKA precursors.

Cornu et al. 2024 (A&A V690, A211)
SKA Science Data Challenge 2: analysis and results

The Square Kilometre Array Observatory (SKAO) will explore the radio sky to new depths in order to conduct transformational science. SKAO data products made available to astronomers will be correspondingly large and complex, requiring the application of advanced analysis techniques to extract key science findings. To this end, SKAO is conducting a series of Science Data Challenges, each designed to familiarize the scientific community with SKAO data and to drive the development of new analysis techniques. We present the results from Science Data Challenge 2 (SDC2), which invited participants to find and characterize 233 245 neutral hydrogen (H I) sources in a simulated data product representing a 2000 h SKA-Mid spectral line observation from redshifts 0.25-0.5. Through the generous support of eight international supercomputing facilities, participants were able to undertake the Challenge using dedicated computational resources. Alongside the main challenge, 'reproducibility awards' were made in recognition of those pipelines which demonstrated Open Science best practice. The Challenge saw over 100 participants develop a range of new and existing techniques, with results that highlight the strengths of multidisciplinary and collaborative effort. The winning strategy - which combined predictions from two independent machine learning techniques to yield a 20 per cent improvement in overall performance - underscores one of the main Challenge outcomes: that of method complementarity. It is likely that the combination of methods in a so-called ensemble approach will be key to exploiting very large astronomical data sets.

Hartley et al. 2023 (MNRAS V523, I2)
3D extinction mapping of the Milky Way using Convolutional Neural Networks: Presentation of the method and demonstration in the Carina Arm region

Context. Several methods have been proposed to build 3D extinction maps of the Milky Way (MW), most often based on Bayesian approaches. Although some studies employed machine learning (ML) methods in part of their procedure, or to specific targets, no 3D extinction map of a large volume of the MW solely based on a Neural Network method has been reported so far. Aims. We aim to apply deep learning as a solution to build 3D extinction maps of the MW. Methods. We built a convolutional neural network (CNN) using the CIANNA framework, and trained it with synthetic 2MASS data. We used the Besançon Galaxy model to generate mock star catalogs, and 1D Gaussian random fields to simulate the extinction profiles. From these data we computed color-magnitude diagrams (CMDs) to train the network, using the corresponding extinction profiles as targets. A forward pass with observed 2MASS CMDs provided extinction profile estimates for a grid of lines of sight. Results. We trained our network with data simulating lines of sight in the area of the Carina spiral arm tangent and obtained a 3D extinction map for a large sector in this region (l=257−303 deg, |b|≤5 deg), with distance and angular resolutions of 100 pc and 30 arcmin, respectively, and reaching up to ∼10 kpc. Although each sightline is computed independently in the forward phase, the so-called fingers-of-God artifacts are weaker than in many other 3D extinction maps. We found that our CNN was efficient in taking advantage of redundancy across lines of sight, enabling us to train it with only 9 sightlines simultaneously to build the whole map. Conclusions. We found deep learning to be a reliable approach to produce 3D extinction maps from large surveys. With this methodology, we expect to easily combine heterogeneous surveys without cross-matching, and therefore to exploit several surveys in a complementary fashion.

Cornu et al. 2022 (arXiv:2201.05571)
A neural network-based methodology to select young stellar object candidates from IR surveys

Context. Observed young stellar objects (YSOs) are used to study star formation and characterize star-forming regions. For this purpose, YSO candidate catalogs are compiled from various surveys, especially in the infrared (IR), and simple selection schemes in color-magnitude diagrams (CMDs) are often used to identify and classify YSOs. Aims: We propose a methodology for YSO classification through machine learning (ML) using Spitzer IR data. We detail our approach in order to ensure reproducibility and provide an in-depth example on how to efficiently apply ML to an astrophysical classification. Methods: We used feedforward artificial neural networks (ANNs) that use the four IRAC bands (3.6, 4.5, 5.8, and 8 μm) and the 24 μm MIPS band from Spitzer to classify point source objects into CI and CII YSO candidates or as contaminants. We focused on nearby (≲1 kpc) star-forming regions including Orion and NGC 2264, and assessed the generalization capacity of our network from one region to another. Results: We found that ANNs can be efficiently applied to YSO classification with a contained number of neurons (∼25). Knowledge gathered on one star-forming region has shown to be partly efficient for prediction in new regions. The best generalization capacity was achieved using a combination of several star-forming regions to train the network. Carefully rebalancing the training proportions was necessary to achieve good results. We observed that the predicted YSOs are mainly contaminated by under-constrained rare subclasses like Shocks and polycyclic aromatic hydrocarbons (PAHs), or by the vastly dominant other kinds of stars (mostly on the main sequence). We achieved above 90% and 97% recovery rate for CI and CII YSOs, respectively, with a precision above 80% and 90% for our most general results. We took advantage of the great flexibility of ANNs to define, for each object, an effective membership probability to each output class. Using a threshold in this probability was found to efficiently improve the classification results at a reasonable cost of object exclusion. With this additional selection, we reached 90% and 97% precision on CI and CII YSOs, respectively, for more than half of them. Our catalog of YSO candidates in Orion (365 CI, 2381 CII) and NGC 2264 (101 CI, 469 CII) predicted by our final ANN, along with the class membership probability for each object, is publicly available at the CDS. Conclusions: Compared to usual CMD selection schemes, ANNs provide a possibility to quantitatively study the properties and quality of the classification. Although some further improvement may be achieved by using more powerful ML methods, we established that the result quality depends mostly on the training set construction. Improvements in YSO identification with IR surveys using ML would require larger and more reliable training catalogs, either by taking advantage of current and future surveys from various facilities like VLA, ALMA, or Chandra, or by synthesizing such catalogs from simulations.

Cornu and Montillaud 2021 (A&A V647, A116)