Thursday, October 24th 2024
|
|
Short Course
Bruno Veloso Assistant Professor at FEP - University of Porto and Senior Researcher at INESC TEC
Abstract:
Knowledge Discovery from Data Streams (KD-DS) refers to the process of extracting valuable insights and patterns from continuously flowing data in real time. With the rapid proliferation of sensors,
social media, financial transactions, and IoT devices, vast amounts of data are generated at an unprecedented pace, making traditional batch processing methods inefficient for handling such dynamic
environments. KD-DS addresses this challenge by employing algorithms and techniques designed for incremental learning, adaptability, and real-time analysis. Core tasks include stream classification,
clustering, regression, and anomaly detection, often under constraints of limited memory, processing power, and the need for timely results. Techniques like sliding windows, data summarization, and
concept drift detection are essential to manage evolving patterns over time. KD-DS has found applications in various fields, including cybersecurity, financial monitoring, and smart city
infrastructure, where real-time decision-making is crucial.
|
Friday, October 25th 2024
09:15-09:45
|
|
Luís Meira-Machado - President of the Portuguese Statistical Society
Abstract: The opening session of the Workshop on Statistics and Data Science 2024 will be led by the President of the Portuguese
Statistical Society. In addition to offering a warm welcome to all participants and officially launching the event, the session will highlight the significance of the European Statistics Day,
celebrated on October 20th. This day underscores the vital role of statistics in society, aligning with the workshop's focus on advancing knowledge in Statistics, Applied Probability, and
Operational Research.
|
09:45-10:25
|
|
Adelaide Freitas - Department of Mathematics and Center for Research and Development in Mathematics and Applications (CIDMA), University of Aveiro, Portugal
Abstract: Clustering and Disjoint Principal Component Analysis (CDPCA) is a constrained Principal Component Analysis aimed at a
simultaneous clustering of objects along a set of centroids and a partitioning of variables along a set of sparse and disjoint components. In this talk, we focus our discussion on the performance of
CDPCA on some datasets when the Alternating Least Squares algorithm is used to estimate the parameters of the CDPCA model (joint work with Maurizio Vichi and Márcia Sartori).
|
11:05-11:30
|
|
Daniel Tinoco - PhD program in Mathematics, Center of Mathematics, University of Minho
Abstract: This presentation will introduce an innovative approach to classical linear regression, allowing model computation in distributed settings with privacy-preserving measures for federated
data. It will also extend the methodology to encompass a generalized linear model that maintains these properties while accommodating more diverse data distributions.
|
11:30-11:55
|
|
F. Catarina Pereira, A. Manuela Gonçalves, and Marco Costa - MAP-PDMA - PhD in Applied Mathematics, Centre of Mathematics, Department of Mathematics, University of Minho
Abstract: Climate change has increased the frequency of droughts, affecting water management, especially in agriculture [1]. This work
focuses on improving short-term forecasts of maximum temperature, a key variable in the evapotranspiration process. By using state-space models and the Kalman filter, the accuracy of forecasts 1 to 6 days
ahead has been improved [2]. This presentation will explain how these models work and demonstrate their potential for real-time weather prediction.
|
11:55-12:20
|
|
Ana Moreira and Susana Faria - PhD program in Mathematics, Centre of Mathematics, Department of Mathematics, University of Minho
Abstract: Finite Mixture Regression (FMR) models provide a flexible tool for modelling data that arise from a heterogeneous population,
where the relationship between the dependent and explanatory variables varies across different subpopulations. Given the often large number of explanatory variables in such models, variable
selection becomes critically important. Traditional methods can be computationally demanding, so penalty-based methods like the Least Absolute Shrinkage and Selection Operator (LASSO), Adaptive
LASSO (ALASSO), and Relaxed LASSO (RLASSO) have been developed. This study compares the performance of these methods in selecting the most relevant explanatory variables in mixtures of linear
regression models. Extensive simulation analyses demonstrate that the ALASSO method shows superior overall performance.
|
14:15-14:55
|
|
Maria Eduarda Silva - Faculty of Economics, University of Porto and LIAAD INESC TEC
Abstract: Data from Google Trends (GT) have been gaining importance as predictors of economic indicators because they overcome delays in data availability and allow timely forecasting. Such data have emerged in the literature
as alternative predictors of macroeconomic outcomes, such as the unemployment rate, offering timeliness, public availability, and no cost. This talk introduces extensive daily GT data to develop a
framework to nowcast monthly unemployment rates in a real-time data availability environment, resorting to Mixed Data Sampling (MIDAS) regressions. Portugal is chosen as a use case for the
methodology since the extraction of GT data requires the selection of keywords, which is culture-dependent. The nowcasting period spans 2019 to 2021, which includes the coronavirus
outbreak. The results show that daily GT predictors via MIDAS lead to accurate and timely information on the unemployment rate and are particularly effective in dealing with the external shock
from COVID-19, showing accuracy gains even when compared to nowcasts obtained from typical monthly GT data via traditional ARIMAX models.
|
14:55-15:35
|
|
Soraia Pereira - Centre of Statistics and its Applications of the University of Lisbon (CEAUL), Department of Mathematics, University of Minho
Abstract: Modeling spatial datasets characterized by an excess of zeros and extreme values poses significant challenges in statistical analysis. Such data structures are common in ecological
studies, particularly in marine biology, where the presence or absence of a species and rare high-density occurrences can impact population management strategies. This study introduces a
hierarchical Bayesian geostatistical model designed to effectively handle these complexities in the context of sardine egg density along the Portuguese coast.
|
16:15-16:55
|
|
Rosemeire Leovigildo Fiaccone - Professora Titular, Departamento de Estatística, Instituto de Matemática e Estatística, Universidade Federal da Bahia
Abstract: Count data are present in the most diverse research areas and are usually analyzed through the traditional Poisson and Negative
Binomial regression models. There are other possible models depending on the research question and the behavior of this type of data. The idea is to show some statistical modeling techniques for
count data through the Bell probability distribution, which has a single parameter and accommodates overdispersion, a phenomenon commonly observed in this type of
data.
|
16:55-17:20
|
|
Cecília Martins - PhD program in Mathematics, Center of Mathematics, University of Minho
Abstract: Fractional Differential Equations (FDEs) are essential tools for modelling complex systems in science and engineering, enabling a more precise representation of processes characterised
by non-local and memory-dependent behaviours. We propose the Neural FDE, a novel deep neural network framework that adjusts an FDE to the dynamics of data. Owing to its time dependence, the Neural FDE
can be used for irregularly sampled data, unlike traditional neural networks. The numerical results suggest that, despite being more computationally demanding, it can effectively be applied to
learn complex dynamical systems.
|
|