INFORTECH Day 2024 Programme Schedule

May 29th, 2024, 9:00-17:00
De Vinci building, room Mirzakhani (1st floor)
Université de Mons
Avenue Maistriau 15, Mons, Belgium

Organisers: Tom Mens, Bruno Quoitin, Quentin De Coninck

9:00 Welcome

9:15 Opening words: Tom Mens (president of INFORTECH)

9:30 Keynote 1: Stefano Zacchiroli, “Building Blocks for a Safe(r) Open Source Software Supply Chain: Reproducible Builds and Software Heritage”

Bio. Stefano is a full professor of computer science at Télécom Paris (Polytechnic Institute of Paris). His current research interests span digital commons, open source software engineering, computer security, and the software supply chain. He is co-founder of Software Heritage, the largest public archive of software source code. He has been a Debian developer since 2001, where he served as Debian project leader from 2010 to 2013, and a member of the Reproducible Builds steering committee. He is a former board director of the Open Source Initiative (OSI) and recipient of the 2015 O’Reilly Open Source Award.

Abstract. Securing the software supply chain, in particular when it comes to its free/open source software (FOSS) components, is all the rage now. Applied researchers, industry consortia, and practitioners alike are trying out a variety of approaches looking for the ones that will stick. In this talk we will review two building blocks for a safe(r) FOSS supply chain that are seeing significant adoption. On the one hand, Reproducible Builds enables downstream users of FOSS products, whose source code they trust, to establish trust in binary versions of the same products built by untrusted third parties. On the other hand, Software Heritage has assembled the largest public archive of software source code and version control system information, providing traceability at the scale of public source code with strong integrity guarantees. We will review the state of the two projects, their synergy, related research leads, and how they can help us build the safe(r) FOSS supply chain we all need.

10:20 Coffee Break

10:40 Technical Session 1 (chair: Bruno Quoitin)

10:40 Natarajan CHIDAMBARAM (Software Engineering Lab – FS) “A Bot Identification Tool Based on Activities in GitHub”
Abstract. Social coding platforms like GitHub enable software repository maintainers and their contributors to use automated agents, known as bots, to perform error-prone and repetitive tasks. Existing bot identification approaches tend to have some practical limitations. Some of them are restricted to specific subsets of activities (e.g., committing, or commenting in issues and pull requests) and most of them require retrieving a substantial amount of data to identify bots, making them difficult to use at scale. In this talk, I will propose a new model and associated tool to distinguish bots from humans in GitHub. This tool considers a wider range of activity types, uses as little data as possible and executes quickly to provide a prediction without compromising on the model performance.

10:58 Youness HOURRI (Software Engineering Lab – FS) “On the impact of bots in collaborative software development communities”
Abstract. The integration of automation tools such as bots and apps has transformed the landscape of open source software development in GitHub, yet their complex interactions within these ecosystems remain poorly understood. This study aims to fill this knowledge gap by deploying advanced data mining techniques to scrutinize the activities of both bots and human contributors across numerous GitHub repositories. By analyzing interactions among thousands of accounts that generate significant monthly activity, this research leverages machine learning algorithms to discern patterns and trends that define different contributor roles. The outcomes include a classification system that categorizes contributors into distinct profiles, illuminating their unique roles and impacts within the open source community. This analysis not only deepens our understanding of the natural organizational structures within these communities but also highlights the pivotal roles that bots play. The research progresses towards developing sophisticated recommendation systems that propose optimal bot integrations for specific projects and offer insights into managing community dynamics effectively. This presentation will outline the initial findings and discuss their potential to enhance workflow automation in open source software projects, paving the way for more collaborative and efficient development practices.

11:16 Sukanya PATRA (Big Data and Machine Learning Lab – FS) “Detecting Abnormal Operations in Concentrated Solar Power Plants from Irregular Sequences of Thermal Images”
Abstract. Concentrated Solar Power (CSP) plants store energy by heating a storage medium with an array of mirrors that focus sunlight onto solar receivers atop a central tower. Operating at extreme temperatures exposes solar receivers to risks such as freezing, deformation, and corrosion. These problems can cause operational failures, leading to downtime or power generation interruptions, and potentially extensive equipment damage if not promptly identified, resulting in high costs. We study the problem of anomaly detection (AD) in sequences of thermal images collected over a span of one year from an operational CSP plant. These images are captured at irregular intervals ranging from one to five minutes throughout the day by infrared cameras mounted on solar receivers. Our goal is to develop an AD method to extract useful representations from high-dimensional thermal images, that is also robust to the temporal features of the data. This includes managing irregular intervals with temporal dependence between images, as well as accommodating non-stationarity due to a strong daily seasonal pattern. An additional challenge includes the coexistence of low-temperature anomalies resembling low-temperature normal images from the start and the end of the operational cycle alongside high-temperature anomalies. We first evaluate state-of-the-art deep anomaly detection methods for their performance in deriving meaningful image representations. Then, we introduce a forecasting-based AD method that predicts future thermal images from past sequences and timestamps via a deep sequence model. This method effectively captures specific temporal data features and distinguishes between difficult-to-detect temperature-based anomalies.

11:34 Otmane AMEL (ILIA – FPMS) “Multimodal Fusion for Dangerous Action Recognition in Railway Construction Sites”
Abstract. The growing demand for advanced tools to ensure safety in railway construction projects highlights the need for systems that can smoothly integrate and analyze numerous data modalities, such as multimodal learning algorithms. The latter, inspired by the human brain’s ability to integrate many sensory inputs, has emerged as a promising field in artificial intelligence. In light of this, there has been a rise in research on multimodal fusion approaches, which have the potential to outperform standard unimodal solutions. However, the integration of multiple data sources presents significant challenges to be addressed. This study provides a comparative analysis of multimodal fusion approaches, highlighting the importance of robust fusion strategies and modality encoders. Through our analysis, we spotlight the obstacles faced in the domain, particularly for RGB-D dangerous action recognition, and the fusion method adapted for capturing complex cross-modal interactions.

11:52 Zainab OUARDHIRI (ILIA – FPMS) “FuDensityNet: Fusion-Based Density-Enhanced Network for Occlusion Handling”
Abstract. In the pursuit of robust computer vision systems, occlusion handling remains a pivotal challenge, particularly in dynamic and cluttered environments. This work introduces a refined approach to occlusion handling in computer vision, leveraging a new voxelization method and depth inference from 2D imagery to address the complexities of occluded environments. By adopting a density-aware voxelization, we significantly reduce the model’s memory usage while maintaining critical detail in occluded regions. The depth estimation technique mitigates the need for 3D sensors, using 2D images to reconstruct detailed 3D point clouds. Practical application of these methods is demonstrated through a ZED 2 camera, highlighting the model’s enhanced performance in real-world occlusion scenarios. While promising, further development is aimed at perfecting the occlusion detection capabilities and validating the model against diverse occlusion challenges.

12:10 Lunch with sandwiches

13:00 Keynote 2: Adnan Shahid, “Wireless Foundation Models”

Bio. Professor Adnan Shahid is affiliated with the Internet Technology and Data Science Lab (IDLab) at Universiteit Gent, Belgium, and IMEC. Within IDLab’s intelligent Wireless Networking (iWINe) group, he leads the ‘AI/ML for Wireless’ subgroup. His research interests include machine learning and AI for wireless communications and networks, decentralized learning, radio resource management, the Internet of Things, 5G/6G networks, localization, and connected healthcare.

Abstract. By its nature, generative AI has the potential to introduce a degree of autonomy to various tasks and fields, including content generation, autonomous vehicles, game development, code generation, scientific research, content curation, and more. Wireless networks are no exception, as the utilization of generative AI can enable self-evolving wireless networks that can adjust, reconfigure, and optimize their functions according to specific network conditions and user demands. The integration of generative AI into wireless networks will fundamentally transform the way wireless networks are designed and operated today. To be precise, Large Language Models (LLMs), a subfield of generative AI, are envisioned to give rise to self-evolving networks. These networks, powered by multi-modal LLMs trained on various wireless data, including RF signals, images, sound, radar, and more, can be fine-tuned to perform several downstream tasks such as beam management, resource management, power management, modulation selection, and others. This innovation will lead to the development of a Wireless Foundation Model, eliminating the need for dedicated AI models for each task and paving the way for the realization of artificial general intelligence (AGI)-enabled wireless networks.

13:50 Technical Session 2 (chair: Quentin De Coninck)

13:50 Hassan ONSORI DELICHEH (Software Engineering Lab – FS) “Security Issues in GitHub Actions”
Abstract. Collaborative practices have revolutionised the software development process, enabling distributed teams to seamlessly work together. Social coding platforms have integrated CI/CD automation workflows, with GitHub Actions emerging as a prominent automation ecosystem for GitHub repositories. While automation brings efficiency, it also introduces security challenges, often related to software supply chain attacks and workflow misconfigurations. We outline the security issues associated with the software supply chain of GitHub Actions workflows, most notably their reusable Actions and their dependencies. We also explore the security risks associated with misconfigurations of repositories and workflows, such as poor permission management, command injection, and credential exposure. To mitigate these risks we suggest practical remediations, including dependency and security monitoring, pinning Actions, strict access control, verified creator practices, secret scanning tools, raising awareness, and training. In doing so, we provide valuable insights on the need to integrate security seamlessly into the automated collaborative software development processes.

14:08 Dyna Soumhane OUCHEBARA (Service d’intelligence artificielle – FS) “Deep learning based vulnerability detection in source code”
Abstract. Deep learning based vulnerability detection (or prediction) refers to the use of deep learning techniques to identify potential vulnerabilities within source code, in order to guide the security review process. Over the past two decades, a plethora of methods have emerged, evolving alongside the progress in machine learning. This presentation will first give a brief review of the evolution of deep learning based vulnerability detection techniques and discuss key obstacles in this domain and current research directions. Then, we will focus on the usage of a particular type of deep learning model, namely Transformers and pre-trained models, which are the most recently explored techniques for vulnerability prediction.

14:26 Aqeel AHMED (Service de télécommunication – FS) “Deep learning based LoRa device identification using Radio frequency fingerprinting”
Abstract. LoRa has gained popularity as the de facto physical platform for the Internet of Things. Given its ability to allow communication at low power and long range, it is suitable for various IoT applications such as smart homes, smart cities, smart agriculture and environmental monitoring. However, security is still a major concern for low-cost IoT devices. One of the main security aspects is distinguishing legitimate from malicious devices in the network. Recently, radio frequency fingerprinting has gained the attention of researchers in the area of device identification. Radio frequency fingerprints of a device are specific hardware features that cannot be cloned or altered. In our work, we are exploring the use of deep learning based radio frequency fingerprinting to identify a LoRa device in the network. To this end, an existing dataset is used to reproduce a part of the work already done by other researchers. These results will act as a baseline for our future research in this direction.

14:44 [Lightning Talk] Ewan GENCSEK (Service d’électromagnétisme et de télécommunication – FPMS) “Tag-based physical layer authentication”

14:55 Sédrick STASSIN (ILIA – FPMS) “Explaining through Transformer Input Sampling”
Abstract. Vision Transformers are becoming more and more the preferred solution to many computer vision problems, which has motivated the development of dedicated explainability methods. Among them, perturbation-based methods offer an elegant way to build saliency maps by analyzing how perturbations of the input image affect the network prediction. However, those methods suffer from the drawback of introducing outlier image features that might mislead the explainability process, e.g. by affecting the output classes independently of the initial image content. To overcome this issue, this paper introduces Transformer Input Sampling (TIS), a perturbation-based explainability method for Vision Transformers, which computes a saliency map based on perturbations induced by a sampling of the input tokens. TIS utilizes the natural property of Transformers which permits a variable input number of tokens, thereby preventing the use of replacement values to generate perturbations. Using standard models such as ViT and DeiT for benchmarking, TIS demonstrates superior performance on several metrics including Insertion, Deletion, and Pointing Game compared to state-of-the-art explainability methods for Transformers.

15:13 Coffee Break

15:35 Technical Session 3 (chair: Tom Mens)

15:35 Atharva AWARI (Mathematics and Operations Research Lab) “Coordinate-Descent Algorithm for Nonlinear Matrix Decomposition with ReLU-function”
Abstract. Nonlinear Matrix Decompositions (NMD) solve the following problem: Given a data matrix X, find low-rank factors W and H such that X ≈ f(WH), where f is an element-wise nonlinear function. In this talk, we focus on the case when f is the rectified linear unit (ReLU) activation, that is, when f(z) = max(0, z), which is referred to as ReLU-NMD. All state-of-the-art algorithms for ReLU-NMD have been designed to solve a reformulation of ReLU-NMD. It turns out that this reformulation leads to a non-equivalent problem, and hence to suboptimal solutions. We propose a coordinate-descent algorithm designed to solve ReLU-NMD directly. This allows us to compute more accurate solutions, with smaller error. This is illustrated on synthetic and real-world datasets.
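
Illustration (not part of the talk): the ReLU-NMD model stated in this abstract, X ≈ max(0, WH), can be sketched in a few lines of plain Python. The factors W and H below are hypothetical toy values chosen for illustration, and the squared Frobenius error is assumed as the measure of fit; the coordinate-descent algorithm itself is not reproduced here.

```python
def matmul(W, H):
    """Plain-Python product of W (m x r) and H (r x n)."""
    return [[sum(w * h for w, h in zip(row, col)) for col in zip(*H)] for row in W]


def relu(M):
    """Element-wise f(z) = max(0, z)."""
    return [[max(0.0, z) for z in row] for row in M]


def relu_nmd_error(X, W, H):
    """Squared Frobenius error between X and max(0, WH) (assumed fit measure)."""
    approx = relu(matmul(W, H))
    return sum((x - a) ** 2 for rx, ra in zip(X, approx) for x, a in zip(rx, ra))


# Hypothetical rank-1 factors: W is 3 x 1, H is 1 x 3.
W = [[1.0], [-1.0], [2.0]]
H = [[1.0, -2.0, 3.0]]

# Build a sparse nonnegative X that the model reproduces exactly.
X = relu(matmul(W, H))
error = relu_nmd_error(X, W, H)
```

Here X has rank 2 (its first two rows are linearly independent), yet it is recovered with zero error from rank-1 factors, which is the kind of saving the nonlinearity makes possible for sparse nonnegative data.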

15:53 Giovanni SERAGHITI (Mathematics and Operations Research Lab) “Accelerated Algorithms for Nonlinear Matrix Decomposition with the ReLU function”
Abstract. In this contribution I present a new problem in low-rank matrix factorization, namely Nonlinear Matrix Decomposition (NMD): given a sparse nonnegative matrix, find a low-rank approximation that recovers the original matrix by the application of an element-wise nonlinear function. I will focus on the so-called ReLU-NMD, where the nonlinear function is the rectified linear unit (ReLU) activation. First, I will provide a brief overview of the motivations and possible interpretations of the model, supported by theoretical examples. I will explain the idea behind ReLU-NMD and how nonlinearity can be exploited to get a low-rank approximation of given data. Then, I will stress the connection with neural networks and I will present some of the existing approaches developed to tackle ReLU-NMD. Furthermore, I will introduce two new algorithms: (1) Aggressive Accelerated NMD, which uses an adaptive Nesterov extrapolation to accelerate an existing algorithm, and (2) Three-Block NMD, which parametrizes the low-rank approximation in two factors and leads to a significant reduction in the computational cost. Finally, I will illustrate the effectiveness of the proposed algorithms on synthetic and real-world data sets, providing some possible applications.

16:11 Maxime MANDERLIER (Service de Management de l’Innovation) “Enhancing Language Learning Recommendations: Integrating Large Language Model Embeddings in Graph Neural Networks”
Abstract. Our study explores the integration of Large Language Model (LLM) embeddings with Graph Neural Networks (GNNs) to revolutionize content recommendations in the context of foreign language learning. Our approach harnesses the nuanced understanding of language offered by LLMs, specifically using their embeddings to enrich the features within a GNN framework. This integration aims to address the challenges of accurately matching learners with content that is not only relevant to their interests but also appropriate for their proficiency level. By conducting extensive experiments, we demonstrate that our method significantly outperforms traditional and advanced recommendation systems, with an increase of up to 18% in Normalized Discounted Cumulative Gain at rank 5. This improvement underscores the potential of combining LLM embeddings with GNNs in creating a more responsive and effective recommendation system. The study affirms the versatility of LLM embeddings in enhancing the recommendation process, paving the way for their application in diverse learning contexts beyond language acquisition. Our findings offer fresh insights into the innovative use of LLMs and GNNs, marking a significant step forward in tailoring educational content to individual learner needs.
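
Illustration (not part of the talk): for readers unfamiliar with the metric quoted in this abstract, the following is a minimal sketch of Normalized Discounted Cumulative Gain at rank k, assuming the common linear-gain definition with a log2 position discount. The function names and the relevance scores in the usage note are hypothetical; the abstract does not say which variant the authors used.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k items (linear gain, log2 discount)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=5):
    """DCG of the given ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

For example, `ndcg_at_k([3, 2, 1, 0, 0])` equals 1.0 because the items already appear in decreasing order of relevance, while a ranking that places relevant items lower, such as `[0, 1, 3, 2, 0]`, scores strictly between 0 and 1.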

16:29 [Lightning Talk] Moad HANI (ILIA) “Modeling neurodegenerative and neurovascular disorders utilizing longitudinal data and temporal learning approaches”
Abstract. Neurodegenerative diseases present a significant challenge in clinical management, given their multifactorial nature and diverse progression trajectories. Leveraging artificial intelligence holds promise in unraveling the intricate dynamics of these diseases, potentially uncovering concealed patterns and identifying biomarkers crucial for prognostication. Such advancements could revolutionize patient care by enabling tailored treatment strategies, consequently alleviating the socioeconomic burden associated with disease management. Parkinson’s disease, a prominent neurodegenerative disorder characterized by a spectrum of motor and non-motor symptoms, serves as a compelling focal point for investigation. Currently, clinicians rely on conventional clinical scores to monitor disease progression, encompassing various physiological systems due to Parkinson’s multi-system nature. Our research seminar aims to elucidate the significance of standardization in clinical data analysis, particularly in predicting patient subtypes and stages through the application of time-modelling techniques. By establishing a standardized framework, clinicians can harness existing data alongside sophisticated time-modelling methodologies to anticipate patient trajectories accurately. Our proposed approach offers a robust framework capable of automating patient trajectory clustering, thereby facilitating the alignment of new patients with learned trajectory patterns. Central to this framework is the integration of cost-effective feature selection methodologies and continuous time Markov-chains, ensuring efficient utilization of available resources. Notably, there exists a paucity of tools for systematically evaluating the efficacy of machine learning systems concerning conventional clinical modalities vis-à-vis the incorporation of more resource-intensive modalities such as imaging or genomic data. 
Furthermore, a critical gap lies in methodological frameworks that comprehensively address the combined challenges of temporal variability and complex inter- and intra-patient heterogeneity. We endeavor to bridge these gaps, advancing our understanding of disease progression modeling and paving the way for personalized treatment strategies tailored to individual patient profiles.

16:40 Xavier LESSAGE (ILIA – FPMS) “Automatic anomalies detection and labeling from complete mammographies: a retrospective study”
Abstract. Mass localization in mammography is a critical task for early detection and effective treatment of breast cancer, a prevalent health concern worldwide. Computer Aided Diagnosis systems (CADx) can assist radiologists in their difficult diagnosis task, and play a key role in detecting abnormalities and treating breast cancer. In this paper we propose an innovative method for automated labelling and localization of mammographic masses and microcalcifications. The aim of our method is to detect the presence of masses and microcalcifications in the mammogram. Hence, we propose to use the YOLO framework to locate tumours in a dataset of complete mammograms. Malignant masses or microcalcifications are usually annotated and analysed by a medical expert. As there are only a few medical experts devoted to this annotation in each hospital, this task becomes too time-consuming. Benign tumours, on the other hand, are nine times more numerous on average, and require far too much time from radiologists to annotate them manually. Our innovative method consists of automatically extracting all abnormalities, whether benign or malignant, from the full mammogram. Our experiments were carried out with a dataset from a Belgian hospital (HELORA) thanks to a retrospective study containing 800 malignant images and 90 000 benign images classified in two directories, positive and negative respectively. Addressing the challenge posed by the abundance of tumours relative to the limited availability of expert annotators, our approach demonstrates proficiency in reducing the time burden on radiologists.