Comparing the behavior of two or more populations with respect to a particular outcome is a common
problem in many research fields. The usual approach of performing a hypothesis test, computing the
related p-value, and taking the corresponding decision has led to an over-simplification of the
underlying problem, which is one of the starting points for the so-called replication crisis. Reporting
statistics which provide adequate measures of the size of the observed difference is always advisable.
Besides, the correct interpretation of these measures is fundamental for having an adequate knowledge
of the real state of the art of the considered topic. In this work, we discuss different measures for
summarizing the differences between two populations in both the complete and the right-censored
scenarios. The common link of these measures is the well-known area under the receiver operating
characteristic (ROC) curve. We study the behavior of both parametric and non-parametric estimators
through Monte Carlo simulations and in different real-world problems. In addition, we discuss the
implications of these measures based on both the study design and the employed estimator. Measures
based on the area under the ROC curve provide a useful and easy-to-interpret metric which can facilitate
the description of the observed difference between populations, although the causal interpretation of
these measures is still a challenging problem which depends on strong assumptions.
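As an illustration of the link with the area under the ROC curve, the following is a minimal sketch of the usual non-parametric (Mann-Whitney type) estimator of P(X < Y) + 0.5 P(X = Y) for complete, uncensored data. The function name and the toy samples are hypothetical, and the right-censored scenario discussed in the abstract is not covered.

```python
from itertools import product


def auc_mann_whitney(x, y):
    """Estimate P(X < Y) + 0.5 * P(X = Y) from two complete samples.

    x: observations from the first population (e.g. controls)
    y: observations from the second population (e.g. cases)
    """
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    score = 0.0
    # Compare every observation of x with every observation of y.
    for xi, yj in product(x, y):
        if xi < yj:
            score += 1.0
        elif xi == yj:
            score += 0.5  # ties count as one half
    return score / (len(x) * len(y))


# Hypothetical toy data, for illustration only.
controls = [1.0, 2.0, 3.0, 4.0]
cases = [2.5, 3.5, 4.5, 5.5]
print(auc_mann_whitney(controls, cases))
```

A value of 0.5 indicates no separation between the two samples, while values close to 0 or 1 indicate that one population tends to take systematically larger values than the other.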
In recent decades, the world has been confronted with the consequences of global warming; however, this phenomenon is not reflected equally in every part of the globe. Thus, warming must be monitored on a more regional or local scale. This work analyzes long-term monthly time series of air temperatures in three Portuguese cities: Lisbon, Oporto, and Coimbra. We propose a periodic state-space framework, associated with a suitable version of the Kalman filter, which allows monthly warming rates to be estimated while taking into account seasonal behavior and serial correlation. Results for the monthly mean of the daily midrange temperature time series show that warming rates differ from month to month.
Several Genomics procedures, such as microarrays or proteomics, require thousands of hypothesis tests to be carried out simultaneously on correlated entities with an unknown correlation structure. Moreover, the number of variables is often much larger than the sample size. The question of interest may be, for example, the identification of differentially expressed genes, given two different biological/clinical conditions. The high dimensionality of the data poses serious challenges to conventional Statistics, and over the last 50 years several multi-dimensional statistical inference procedures have been developed to address the problem. In this presentation, we will give an overview of some of these procedures and look at some practical examples of application.
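To make the multiple testing problem concrete, here is a minimal sketch of one widely used procedure, the Benjamini-Hochberg step-up rule for controlling the false discovery rate. The abstract does not say which procedures the talk covers, so this choice is an assumption, and the p-values are hypothetical. The rule is known to control the false discovery rate under independence or certain forms of positive dependence, which may not hold for the unknown correlation structures mentioned above.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Step-up rule: find the largest rank k such that the k-th smallest
    p-value is <= alpha * k / m, then reject the k smallest p-values.
    """
    m = len(p_values)
    # Indices of the p-values sorted in increasing order.
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for eight genes, for illustration only.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(p, alpha=0.05))
```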
Censored and truncated data appear in a number of applications, including astronomy, epidemiology and survival analysis. Censored data report incomplete inter-event times because individuals are lost to follow-up or because the follow-up period is limited. Random truncation occurs when only lifetimes falling within a given time interval (which varies from individual to individual) can be observed. Most of the papers dealing with truncation have been confined to one-sided (left or right) truncation, and the theory needed for a number of applications with doubly truncated data is still missing. Nonparametric and semiparametric estimators of the distribution function of the lifetime have been proposed under the assumption of independence between the truncation times and the lifetime. In some contexts this independence may not be acceptable, and an extension of the nonparametric estimator was introduced to accommodate possible dependence between the lifetime and the truncation times. We illustrate the use of the proposed curve estimators on a different real data set.
In this talk I will give a brief introduction to branching processes, with particular emphasis on the discrete-time ones, known in the literature as Bienaymé-Galton-Watson processes. I will present several applications in medical and/or biological contexts and describe how they can be used to solve important problems arising in such contexts.
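A classical quantity for a Bienaymé-Galton-Watson process is its extinction probability, which is the smallest non-negative fixed point of the offspring probability generating function and can be approximated by iterating that function starting from 0. The sketch below assumes a single ancestor and an offspring distribution with finite support; the function names and the example distributions are hypothetical and not taken from the talk.

```python
def pgf(probs, s):
    """Probability generating function f(s) = sum_k probs[k] * s**k."""
    return sum(p * s**k for k, p in enumerate(probs))


def extinction_probability(probs, tol=1e-12, max_iter=100_000):
    """Approximate the extinction probability of a process with one ancestor.

    probs[k] is the probability that an individual has k offspring.
    The iteration q <- f(q), started at q = 0, increases towards the
    smallest non-negative solution of q = f(q).
    """
    q = 0.0
    for _ in range(max_iter):
        q_next = pgf(probs, q)
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q


# Hypothetical supercritical offspring law (mean 1.25): extinction is not certain.
print(extinction_probability([0.25, 0.25, 0.5]))
# Hypothetical subcritical offspring law (mean 0.5): extinction is certain.
print(extinction_probability([0.5, 0.5]))
```

In the critical case (offspring mean exactly 1) the same iteration still converges to 1, but very slowly, so the tolerance-based stopping rule may return a value noticeably below 1.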
Modelling Heart Rate Variability (HRV) data has become important for clinical applications and as a research tool. These data exhibit long memory and time-varying conditional variance (volatility). In HRV, volatility is traditionally estimated by recursive least squares combined with short memory AutoRegressive (AR) models. This work considers a parametric approach based on long memory Fractionally Integrated AutoRegressive Moving Average (ARFIMA) models with heteroscedastic errors. To model the heteroscedasticity, nonlinear Generalized Autoregressive Conditionally Heteroscedastic (GARCH), Exponential GARCH (EGARCH) and GJR-GARCH models are considered. The EGARCH and GJR-GARCH models are necessary to capture empirical characteristics of conditional volatility such as clustering and asymmetry in the response, usually called leverage in the time series literature. The ARFIMA-GARCH type models are used to capture and remove long memory and to characterize conditional volatility in 24-hour HRV recordings provided by PhysioNet: five from normal subjects, five from heart failure patients and five from atrial fibrillation patients.
In psychotherapy, an intervention protocol is followed for a maximum of 20 weeks. During the sessions, several variables are recorded to study the progress of clients, covering alliance, symptoms and biological measures. Some clients may drop out of therapy for different reasons. In this seminar we will talk about statistical models that jointly model the evolution of the response variables of interest, taking into account information on the time to dropout due to causes related to therapy.
In orthopedics, internal and external fixation systems are increasingly applied in the treatment of bone fractures, both accidental fractures and those surgically induced through osteotomies. These devices have a very wide range of applications, notably bone stabilization, elongation and transport. Osteosynthesis plates are placed and fixed in a surgical context, and their positioning on the bone depends on the experience and judgment of the surgeon. In addition, several surgeons apply a second plate, reporting better clinical results. However, biomechanical studies leading to a better understanding of plate placement, and of the possible advantages of using a second plate, are still scarce. The present work compares the use of one and two osteosynthesis plates in the femur, as well as their best positioning, with emphasis on the distribution of contact pressures. Concepts from geostatistics are used to produce pressure distribution maps over the fracture region.
I will present the research I am currently working on, whose main objective is to understand the relationship between natural changes and human-induced changes in a given ecosystem, in this specific case the Pantanal of Cáceres/Mato Grosso, Brazil, as well as how these changes take place. In this context, the research in which I take part is based on Multivariate Alteration Detection of land use in the Pantanal of Cáceres/MT and aims to contribute methodologically to obtaining results that can support the development of scenarios of land-use change due to human intervention. The aim is thus to support preventive actions and ecosystem restoration through a systemic methodology, backed by multivariate analysis, in the context of developing new methodologies and of environmental planning at local and regional scales. In our research we use satellite images, such as LANDSAT, CBERS, MODIS and RapidEye, to apply the change detection methodology as a process for identifying differences in the state of an object or phenomenon by observing it at different dates or times. In general, the methodology is based on two main stages: 1) pre-processing and processing of the images and 2) use of the multivariate change detection technique. The methodology is evaluated visually and numerically. In view of the above, the main motivation for undertaking the postdoctoral internship is the importance of exploring new research methodologies to promote the improvement of teaching at an international level.
Cancer survival analysis is of major importance in the evaluation of the cancer care provided to populations. International comparison of cancer survival probabilities should take into account differences in the age structure of the patient populations, since survival from cancer is often age dependent. This is usually achieved through direct age-standardization using a common age distribution standard such as the International Cancer Survival Standards. Direct age-standardization requires the estimation of survival for each age group. Often, the extreme age groups (youngest or oldest, depending on the cancer) are sparse and their net survival estimates are either very unstable or even impossible to obtain a few years after diagnosis.
Net survival, the survival that would be observed in the absence of causes of death not related to the disease under study, can be estimated using the Pohar-Perme estimator or a modelling approach. If the model is correctly specified, both methods should produce the same estimate. When age is considered as a continuous variable and the excess hazard is modelled with flexible functions (e.g. splines), the net survival of each individual can be finely predicted for any time since diagnosis. The net survival of a given age group is obtained as the mean of the individual net survivals of the subjects in this age group. Although a flexible modelling approach is used, the net survival estimate of each age group depends on the observed number of subjects in the group as well as on their observed age distribution. This again leads to unstable net survival estimates when the data are sparse, even if the model allows exact individual net survivals to be predicted smoothly. Age group-specific estimates given by the non-parametric Pohar-Perme estimator are also very unstable on such datasets.
An alternative approach to the estimation of age-standardized net survival would be to predict survival (model-based) for a reference age in each age group or for a reference age instead of averaging the individuals' survival.
The main aim of this study was to evaluate and compare methods for the estimation of age-standardized net survival when data are sparse. We compared three different approaches: two model-based estimators of survival and the non-parametric estimator proposed by Pohar-Perme. In the first model-based approach, net survival was estimated by averaging individual survivals within each age group. In the second, survival was estimated at a reference age in each age group. A flexible parametric model on the log hazard scale was used to model the excess hazard. We compared the three approaches empirically on small randomly selected samples from a large simulated dataset under different scenarios of dependence on age and year of diagnosis.
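The direct age-standardization step referred to in this abstract amounts to a weighted average of age group-specific net survival estimates, with weights taken from a standard age distribution. The sketch below only illustrates that arithmetic: the age groups, weights and survival values are hypothetical placeholders, not the International Cancer Survival Standards weights and not results from the study.

```python
def age_standardized_survival(survival_by_group, weights):
    """Directly age-standardized survival as a weighted average.

    survival_by_group: dict mapping age group -> net survival estimate
    weights: dict mapping the same age groups -> standard population weight
    """
    if set(survival_by_group) != set(weights):
        raise ValueError("survival estimates and weights must cover the same age groups")
    total = sum(weights.values())
    # Dividing by the total lets the weights be given as shares or as counts.
    return sum(weights[g] * survival_by_group[g] for g in weights) / total


# Hypothetical standard weights and age-specific estimates, for illustration only.
weights = {"15-44": 0.1, "45-54": 0.2, "55-64": 0.3, "65-74": 0.2, "75+": 0.2}
survival = {"15-44": 0.9, "45-54": 0.8, "55-64": 0.7, "65-74": 0.6, "75+": 0.5}
print(age_standardized_survival(survival, weights))
```

Because every age group enters the sum, a single unstable or missing group-specific estimate affects the standardized figure, which is the difficulty with sparse extreme age groups that the abstract describes.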
We formulate two models for the Caprine Arthritis Encephalitis virus disease (CAEV), a disease first reported in 1974 that mainly affects goats [1]. Disease symptoms include arthritis, pneumonia, mastitis, encephalitis and encephalomyelitis, hence the name. The disease places an economic burden on breeders because infected goats are more vulnerable to further pathologies and produce less milk.
Several viral strains cause this pathology, all belonging to the Small Ruminant Lentivirus (SRLV) group. These are members of the genus Lentivirus of the family Retroviridae [3]. They are called lentiviruses because they develop very slowly over time: clinical signs appear only after several years of incubation. The most common of the 5 SRLV genotypes are genotypes A and B, with well-known associated diseases. Genotype B is pathogenic and can be transmitted both vertically and horizontally, through the blood or the saliva of infectious adult goats. The lentivirus genotype E can only be transmitted vertically. Its prototype is named the Roccaverano strain, after the place where it was first discovered. Goats infected by this genotype do not harm the herds.
We present and investigate a basic CAEV system modeling just the genotype B situation, and a further one in which both strains are present. The models allow only the endemic, the genotype E-free and the disease-free equilibria, connected via transcritical bifurcations. Eradication of the pathogenic genotype is possible by reversing the policy currently used by farmers to combat the spread of this disease.
Studies with longitudinally measured outcomes are often plagued by missing data due to patients withdrawing before completing the measurement schedule. Dropout occurs when the sequence of longitudinal measurements on a patient terminates prematurely. Often the reasons for dropout are informative or non-ignorable. However, the standard methods for analysing longitudinal outcome data assume that missingness is non-informative and also ignore the reasons for dropout, which could result in a biased comparison between the covariate groups.
We propose a joint model that consists of a linear mixed effects sub-model for the longitudinal outcome and cause-specific hazard sub-models for competing reasons of dropout, linked together by latent processes. The proposed method is studied in simulations and applied to the MAGNETIC trial, the largest randomised placebo-controlled study to date comparing the addition of nebulised magnesium sulphate to standard treatment in acute severe asthma in children. The reasons for dropout are sometimes clearly known and recorded during the MAGNETIC trial, but in many instances these reasons are unknown or unclear. We explore the impact of the MAGNETIC dropout process on the evaluation of the treatment effect, and jointly model the longitudinal outcome, the Asthma Severity Score, and the informative dropout process to incorporate the information regarding the reasons for dropout by treatment group.
Estimating the size of a specific population is a crucial problem in several areas. For example, in ecology it is very relevant to estimate the abundance of a population of wild animals. In computer engineering this methodology can be used to estimate the number of errors in a piece of computer software. These methods are also widely used in epidemiology and the social sciences. The basic idea of ratio regression is to consider ratios of neighbouring count probabilities, which can be estimated by ratios of observed frequencies. After fitting an appropriate regression model, the frequency of zero counts is computed by projecting the model backwards. Applications of this idea will be illustrated through simulation studies with which we assess the performance of the approach described. From these studies it is possible to conclude that using a validation sample not only increases the efficiency of the estimation process but also helps to obtain a more precise final estimate of the unobserved identifications. A zero-inflated model was also considered because of suspected unobserved identifications in the data, beyond those predicted by the models that do not consider any inflation.
This project is joint work with a United Kingdom government agency, the Animal and Plant Health Agency (APHA). The theory will be applied to a public health setting. In particular, the case study to be analysed is directly related to Salmonella infections in birds that produce eggs for human consumption. Salmonella infection in humans is a serious public health problem in Europe, and the most common source of infection is thought to be the consumption of contaminated eggs. It is therefore very important to identify the sites where eggs infected by Salmonella strains are produced, so that measures can be taken to prevent their consumption by the public.
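The ratio regression idea summarised in this abstract can be sketched as follows, under simplifying assumptions that are not stated in the abstract: the ratios of neighbouring frequencies are scaled as (x + 1) f(x+1) / f(x), so that they are constant under a Poisson model, their logarithm is modelled as a straight line in x, and the line is fitted by unweighted least squares. The function name and the frequencies are hypothetical, and the validation sample and zero-inflated extensions are not covered.

```python
import math


def estimate_zero_frequency(freq):
    """Estimate the unobserved frequency of zero counts by ratio regression.

    freq: dict mapping a count x >= 1 to its observed frequency f_x > 0.
    Fits log((x + 1) * f_{x+1} / f_x) = a + b * x by ordinary least squares
    and projects the line back to x = 0, where the ratio equals f_1 / f_0.
    """
    if 1 not in freq:
        raise ValueError("the frequency of count 1 is required")
    xs = sorted(x for x in freq if x >= 1 and x + 1 in freq)
    if len(xs) < 2:
        raise ValueError("at least two ratios of neighbouring counts are needed")
    ys = [math.log((x + 1) * freq[x + 1] / freq[x]) for x in xs]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    # Predicted ratio at x = 0 is exp(intercept) = f_1 / f_0.
    return freq[1] / math.exp(intercept)


# Hypothetical frequencies of units identified exactly 1, 2, ..., 5 times.
observed = {1: 270, 2: 272, 3: 180, 4: 91, 5: 36}
f0 = estimate_zero_frequency(observed)
print(f0, f0 + sum(observed.values()))  # hidden units and estimated population size
```

In practice the regression would usually be weighted to account for the very different variances of the log-ratios; the unweighted fit here is only meant to show the backward projection step.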
Joint models for longitudinal and survival data are well-established statistical models in biostatistics. The need to account for repeated measurements that are not missing at random in longitudinal studies is the motivation for these models. Under classical statistical inference, based on maximizing the likelihood function, it is necessary to integrate out a high-dimensional vector of random effects, which makes these models computationally difficult to implement. An alternative approach is to use Bayesian inference, sampling from the posterior distribution using MCMC techniques. Rue et al. (2009), however, proposed an Integrated Nested Laplace Approximation (INLA) approach that approximates the posterior marginals rather than sampling from the posterior distribution. This makes the computation much faster and easier to implement. In this work we propose to fit joint models with this approach in a Bayesian inference setting.
International cooperation and sharing good practices are effective tools to enhance the knowledge triangle of Education, Research, and Innovation. This brief overview is aimed at OR/MS Education subjects, with a focus on the exploitation of international projects and associated results, and addresses: the "OR & Big Data" developments within the knowledge triangle; the preliminary results and prospects from the "European Study on OR/MS Education"; an overview of the Erasmus Intensive Programme "Optimization and DSS for SC"; and other works such as "Lego on My Decision" or the Fulbright program on "Support to Students", which was mainly directed at first-year students. Because the interactive exploitation of web-based information is useful, the audience is invited to bring equipment with an internet connection.
With the primary motivation of contributing to the understanding of the progression of breast cancer within the Portuguese population, we propose a statistical model with more complex assumptions than the traditional analysis. The main objective of the analysis performed is to develop a joint model for longitudinal data (repeated measurements over time of a tumour marker) and survival data (time to the event of interest) of patients with breast cancer, with death from breast cancer as the event of interest. The data analysed comprise information on 540 patients and 50 variables, collected from medical records of the Hospital. We first conducted an independent survival analysis in order to understand the possible risk factors for death from breast cancer in these patients, followed by an independent longitudinal analysis of the tumour marker carcinoembryonic antigen (CEA) to identify risk factors related to the increase in its values. For the survival analysis we used the Cox proportional hazards model and the Royston-Parmar flexible parametric model. Generalized linear mixed effects models were applied to study the longitudinal progression of the tumour marker. After the independent survival and longitudinal analyses, we took into account the expected association between the progression of the tumour marker values and the patients' survival and, as such, proceeded with a joint modelling of these two processes to infer the association between them, adopting the random effects methodology. Results indicate that the longitudinal progression of CEA is significantly associated with the probability of survival of these patients. We also conclude that, as the independent analyses return biased estimates of the parameters, it is necessary to consider the relationship between the two processes when analysing breast cancer data.
Meta-analysis
Pedro M Teixeira
Núcleo de Saúde Comunitária, Instituto de Investigação em Ciências da Vida e Saúde (ICVS), Universidade do Minho
Date: 26/03/2015 | 14:00
Room: Biblioteca do DMA (Sala EC0.31) - Azurém, UM
Abstract
Meta-analysis is a term used to designate a set of methods that make it possible to combine quantitative results from several studies to produce an overall summary of the empirical knowledge available on a given topic. In a meta-analysis, one studies the variation of an effect size measure obtained in independent analyses. Fixed-effects and random-effects models will be considered, as well as the study of heterogeneity in assessing the quality of the statistical evidence.
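As a rough illustration of the fixed-effects and random-effects models and of the heterogeneity measures mentioned in this abstract, the sketch below pools study-level effect sizes by inverse-variance weighting. The between-study variance is estimated with the DerSimonian-Laird method of moments, which is an assumption here since the abstract does not name an estimator; the function name and the input values are hypothetical.

```python
def meta_analysis(effects, variances):
    """Inverse-variance pooling of effect sizes from independent studies.

    effects: list of study effect sizes
    variances: list of their within-study variances (all > 0)
    Returns the fixed-effect estimate, Cochran's Q, the DerSimonian-Laird
    between-study variance tau2, the random-effects estimate and I^2.
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching inputs")
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q measures the dispersion of the effects around the pooled value.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    random_effect = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"fixed": fixed, "Q": q, "tau2": tau2, "random": random_effect, "I2": i2}


# Hypothetical effect sizes and variances from three studies.
print(meta_analysis([0.0, 0.5, 1.0], [0.04, 0.04, 0.04]))
```

When tau2 is estimated as zero the two pooled estimates coincide; otherwise the random-effects weights are more nearly equal across studies than the fixed-effect weights.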
In this work we consider longitudinal studies where several subjects are observed repeatedly over time for one response variable of interest. In this context, longitudinal models make it possible to distinguish between-subject from within-subject variability. Moreover, we consider the case where the response variable of interest is not Gaussian, for example proportions, counts or categories. In this case, generalized linear models deal with variables whose distribution belongs to the exponential family. We therefore look into longitudinal generalized linear models, which combine these two theories, and give examples of fitting these models to real data.
Early detection of influenza outbreaks is a challenging issue in disease surveillance. There have been several proposals for triggering an outbreak alert as early and as accurately as possible. One approach is Markov Switching Models, in which a latent variable takes one of two possible values, representing the epidemic and non-epidemic states at each time (and location, for spatio-temporal models), and one of two models with different structures is selected according to the value of the latent variable. Martinez-Beneito et al. (2008) proposed a Markov Switching Model for the detection of influenza outbreaks in which the observations were the differenced rates. This helps to distinguish the epidemic state even at low rates. Given that influenza dispersion is related to climate variables and that the disease spreads from person to person, a spatio-temporal extension of this model is a natural improvement, in which data from nearby locations help to detect the epidemic state. The spatial and temporal relations may be modeled through Gaussian Markov random fields. The Bayesian paradigm makes it easy to estimate the posterior distribution of all the parameters of the model. In particular, the posterior distribution of the latent variables of the Markov Switching Model is the decision tool for assessing the risk of an epidemic.
Occupational stress in health professionals has been a topic of debate and research in the last few years, showing that nurses are a professional class particularly exposed to high levels of stress, with severe effects on the individual's health. However, there is limited research on Portuguese nurses concerning the characteristics that underlie the expression of psychological distress. This study explores the role of personal and professional variables in nurses' mental health status. To determine the risk factors associated with clinical symptomatology of distress, a logistic regression analysis was conducted. Personal variables were introduced in the first two steps and professional variables in the last step. Due to their significance in previous analyses, we considered gender, having or not having a hobby, and practising or not practising physical exercise as sociodemographic variables, and the type of workplace as the professional variable. The final model reveals that physical exercise and having a hobby are important predictors of nurses' clinical symptoms. Although further investigation is needed, the findings highlight the importance of leisure activities for nurses' mental health status and the need to promote healthy lifestyles.
Structured Additive Regression (STAR) models make it possible to handle a wide variety of covariates while simultaneously exploring possible spatial and temporal correlations. In this seminar we will present the results of STAR modelling for two medical data sets. The first comprises 212 517 records of women who took part in the Breast Cancer Screening Programme of the central region of Portugal over the last 20 years, and the second includes information on all confirmed TB (tuberculosis) cases notified in Portugal between 2000 and 2010.
This work discusses the use of panel data models in age estimation methods. In particular, it extends them to the combination of dental and skeletal maturation indicators in a longitudinal sample of French-Canadian children between 7 and 15 years of age, for legal purposes. The choice between a fixed effects model and a random effects model is based on the specific assumptions of each model.
In Brazil, the mortality and incidence of cardiovascular diseases and diabetes are poorly known. Moreover, cohort studies are still scarce, especially among healthy adults in the large metropolitan areas. ELSA is a cohort of about 15 thousand employees of public higher education and research institutions, aged between 35 and 74, in six large Brazilian cities. Applying statistical techniques appropriate for this type of data is essential to understand the magnitude of the problem.
We introduce an extension of the Efron-Petrosian NPMLE for the case where the lifetime and the truncation times may be dependent. The proposed estimator is constructed on the basis of a copula function which represents the dependence structure between the lifetime and the truncation times. Two different iterative algorithms to compute the estimator in practice are introduced, and their performance is explored through an intensive Monte Carlo simulation study.