Short-course (Link: Information on how to attend the meeting will be available soon.)
Thursday, November 4th 2021
|
|
Short-Course
Abstract:
This short course presents the fundamental ideas of Machine Learning and Data Mining.
We will present the basic tasks, algorithms, and methods. The course syllabus includes the following topics:
machine learning tasks, distance-based algorithms, Naive Bayes, decision trees, neural networks,
support vector machines, and ensemble methods. Illustrative examples use KNIME and R.
João Gama and Bruno Veloso
LIAAD-INESC TEC, Universidade do Porto
|
09:15-09:30
|
|
Welcome session
|
09:30-12:30
|
|
A light Introduction to Machine Learning - Part I
|
14:30-17:30
|
|
A light Introduction to Machine Learning - Part II
|
Friday, November 5th 2021
Link: Information on how to attend the meeting will be available soon.
10:25-10:30
|
|
Online Reception
|
10:30-11:15
|
|
Marta Sestelo (Dep. Statistics and O.R., SiDOR Group, University of Vigo)
Abstract:
Survival analysis includes a wide variety of methods for analyzing time-to-event data. One basic but important
goal in survival analysis is the comparison of survival curves between groups. Several nonparametric methods
have been proposed in the literature to test for the equality of survival curves for censored data. When the
null hypothesis of equality of curves is rejected, leading to the clear conclusion that at least one curve is different, it
can be of interest to ascertain whether the curves can be grouped or whether they all differ from each other.
An R package (clustcurv) has been developed to determine such groups, with
automatic selection of their number. The applicability of the proposed method is illustrated using real data.
Keywords: Log-rank Test; Multiple Survival Curves; Number of Groups; Survival Analysis; Cluster; R package
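For readers unfamiliar with the log-rank test named in the keywords, the following is a minimal Python sketch of the classical two-sample version for right-censored data. It is not the clustcurv grouping procedure; the function name `logrank_test` and its arguments are hypothetical, chosen only for illustration, and the sketch assumes at least one event time with more than one subject at risk.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-sample log-rank test for right-censored data.

    times_*  : observed times (event or censoring) for each group
    events_* : 1 if the event was observed, 0 if the time is censored
    Returns (chi-square statistic with 1 df, p-value).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    # Only distinct times at which an event occurred contribute to the test.
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Subjects still at risk just before time t, by group.
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)
        # Events occurring exactly at time t, by group.
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n  # expected events in group A under H0
        if n > 1:
            # Hypergeometric variance of the group A event count at time t.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    stat = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

A small p-value suggests that the two survival curves differ; deciding how several curves should then be grouped is the question the abstract addresses.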
|
11:30-12:15
|
|
Iúri Correia (Faculty of Sciences of the University of Lisbon, University of St. Andrews; Greenland Institute for Natural Resources)
Abstract:
The barren-ground caribou (Rangifer tarandus groenlandicus) and the muskox (Ovibos moschatus) are two species present in West Greenland
and have been important for the human population. Their importance spans from cultural traditions and subsistence consumption to recreational
and commercial harvesting. Long-term monitoring of these species is important to optimize management strategies. Distance Sampling refers to a
set of techniques widely used in Biology to estimate density and abundance of the observed objects of interest. After fitting a detection function model,
this information was used to fit a Density Surface Model. Here, density can be spatially represented as a function of additional covariates.
The resulting prediction maps seem to be consistent with previous studies and the specialist's knowledge.
Keywords: abundance, density surface modelling/spatial density modelling, distance sampling, generalized additive model, Greenland
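As an illustration of the detection function step mentioned in the abstract, here is a minimal Python sketch for a line transect with a half-normal detection function g(x) = exp(-x^2 / (2 sigma^2)) and untruncated perpendicular distances. The choice of a half-normal model, the function name `halfnormal_density`, and its arguments are assumptions made for this example; the abstract does not state which detection function was fitted, and the density surface model itself is not covered here.

```python
import math

def halfnormal_density(distances, line_length):
    """Estimate density from line-transect perpendicular distances.

    distances   : perpendicular distances of detected objects from the line
    line_length : total transect length, in the same units as the distances
    Returns (density, sigma, effective strip half-width).
    """
    n = len(distances)
    # Maximum likelihood estimate of sigma for an untruncated half-normal.
    sigma = math.sqrt(sum(x * x for x in distances) / n)
    # Effective strip half-width: integral of g(x) from 0 to infinity.
    esw = sigma * math.sqrt(math.pi / 2)
    # Both sides of the line are surveyed, hence the factor 2.
    density = n / (2 * esw * line_length)
    return density, sigma, esw
```

For example, `halfnormal_density([3.0, 4.0], 10.0)` returns the estimated number of objects per unit area, together with the fitted scale and effective strip half-width.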
|
12:15-13:00
|
|
Clarice Demétrio (Departamento de Ciências Exatas, ESALQ, Universidade de São Paulo, Brasil)
Abstract:
Mixed models have become important in analyzing the results of experiments, particularly those that require more complicated models
such as those that involve longitudinal data. A method for deriving the terms in a mixed model, described by Brien and Demétrio (2009)
[J. Agric. Biol. and Env. Stat., 14, 253-80], will be presented. It extends the method described by Brien and Bailey
[J. Roy. Stat. Soc., Series B, 68 (2006): 571-609] to explicitly identify terms for which autocorrelation and smooth trend, arising from
longitudinal observations, need to be incorporated in the model. At the same time we retain the principle that the model used
should include, at least, all the terms that are justified by the randomization. This is done by dividing the factors into sets,
called tiers, based on the randomization and determining the nesting and crossing relationships between factors. To illustrate the method,
a mixed model for the randomized complete block design with longitudinal observations is outlined. The mixed model analysis
of data from a three-phase experiment to investigate the effect of time of refinement on Eucalyptus pulp from four different sources
is also described. For this example, cubic smoothing splines are used to describe differences in the trend over time and
unstructured covariance matrices between times are found to be justified.
Keywords: Analysis of variance; Longitudinal experiments; Mixed models; Multiphase experiments; Multitiered experiments; Repeated measures.
|
14:30-15:15
|
|
Baltazar Nunes (Departamento de Epidemiologia, Instituto Nacional de Saúde Dr. Ricardo Jorge (INSA); Escola Nacional de Saúde Pública, Universidade NOVA de Lisboa)
Abstract: Statistical and epidemiological methods play an essential role in producing information for the public health decision process.
They allow the collection, analysis, reporting and interpretation of data necessary to inform public health officials in the decision-making process,
enabling the diagnosis of the situation, the selection of the most adequate measures, and the monitoring and evaluation of their impact.
In this talk, we will present the studies and methodologies that the Department of Epidemiology of INSA has implemented to
inform decision makers during the COVID-19 public health emergency. These include epidemiological surveillance of incidence
and transmissibility, population contacts and mobility, hospitalizations and deaths; the development of specific epidemiological
studies such as seroprevalence surveys, studies of the impact of non-pharmaceutical interventions (NPI), and vaccine effectiveness studies; and finally the use of
mathematical models to answer "what if" questions related to the implementation or lifting of NPI and the impact of the vaccination plan.
|
15:30-16:15
|
|
M. Cristina Miranda (ISCA and CIDMA, Universidade de Aveiro, CEAUL, Universidade de Lisboa)
Abstract: Hurricanes, floods, heatwaves, high rates of flu cases, peaks or troughs in the stock market - these are some examples of real-life
situations in which the occurrence of extreme values may deeply affect human lives. Most frequently, observations of this type appear in clusters.
Fitting a suitable distribution to the data makes it possible to predict, prevent, and take measures that help to deal with the potential
consequences of extreme events. The development of Extreme Value Theory (EVT) was driven by hydrology, but today several authors
also focus on applications in finance and insurance. With clusters of extreme observations, it is still possible to obtain the limiting
distribution of the maximum with EVT, as long as some adequate local dependence conditions hold. In that case, it is necessary to estimate the
extremal index. In this talk, recent (and older) proposals of extremal index estimators are presented. A set of real data is used to illustrate
their application using available packages in R.
Keywords: Extreme Value Theory, Dependence, Extremal index, Estimation.
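To make the idea of an extremal index estimator concrete, the following Python sketch implements the classical runs (declustering) estimator: exceedances of a threshold are grouped into clusters, and the estimate is the number of clusters divided by the number of exceedances. This is only one of the older proposals and is not necessarily among those covered in the talk; the function name `runs_extremal_index` and the parameters `threshold` and `run_length` are hypothetical and must be chosen by the analyst.

```python
def runs_extremal_index(series, threshold, run_length):
    """Runs estimator of the extremal index.

    Two exceedances belong to different clusters when they are separated
    by at least `run_length` consecutive observations at or below the
    threshold. The estimate is (number of clusters) / (number of exceedances).
    """
    exceed_idx = [i for i, x in enumerate(series) if x > threshold]
    if not exceed_idx:
        raise ValueError("no exceedances above the threshold")

    clusters = 1  # the first exceedance always opens a cluster
    for prev, cur in zip(exceed_idx, exceed_idx[1:]):
        # cur - prev - 1 non-exceedances lie between the two exceedances.
        if cur - prev > run_length:
            clusters += 1
    return clusters / len(exceed_idx)
```

Values close to 1 indicate that exceedances tend to occur in isolation, while smaller values indicate stronger clustering of extremes; the reciprocal of the index can be read as an approximate mean cluster size.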
|
16:15-17:00
|
|
Conceição Ribeiro (CEAUL and University of Algarve)
Abstract: The study of how real events evolve over time or across space can be of great importance in defining
measures to improve the welfare of populations. This work aims to extend the analysis of these events and use spatial and temporal models
that characterize the trend at the spatial level and at the temporal level. In other words, it seeks to understand whether patterns have changed
over the years and across regions. To achieve this aim, hierarchical Bayesian models are used, implemented with the INLA methodology
through the R-INLA package.
Keywords: Spatial; Temporal; Hierarchical Bayesian models; INLA
|
|