Seminar Schedule: Year 2017

Friday, 1st of December 2017, Time: 12:00

Juan José Egozcue, Departament d'Enginyeria Civil i Ambiental, Universitat Politècnica de Catalunya, Barcelona, Spain.

Tuesday, 26th of September 2017, Time: 12:30

Georg Heinze, Center for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna, Austria.

Logistic regression with rare events: problems and solutions

Friday, 7th of July 2017, Time: 12:00

Mike Campbell, University of Sheffield, Sheffield, United Kingdom.

The CONSORT statement for Pilot Trials: What is different?

Friday, 16th of June 2017, Time: 12:30

Tim Robinson, Department of Statistics, University of Wyoming, USA.
A Case Study in Data Science: Mobilizing Data for the Monitoring of Native Grass Composition in the Prairie Pothole Region

Friday, 9th of June 2017, Time: 12:30

Jesús Igor Heberto Barahona Torres, Instituto de Matemáticas, Universidad Nacional Autónoma de México, México.

¿Quién escribió la historia de la conquista de la Nueva España? ¿Es correcta la hipótesis de Duverger? Un estudio comparativo basado análisis estadístico de textos

Friday, 26th of May 2017, Time: 12:30

Justin Silverman, Department of Molecular Genetics and Microbiology, Duke University, NC, USA.

Scalable count-compositional models for microbiome time-series data

Wednesday, 22nd of March 2017, Time: 14:00

Haavard Rue, King Abdullah University of Science and Technology, Saudi Arabia.

Penalising model component complexity: A principled practical approach to constructing priors

Friday, February the 10th 2017, Time: 12:30

Nihan Acar Denizli (Mimar Sinan Güzel Sanatlar Üniversitesi, Istanbul, Turkey)

Functional Linear Regression Models for Scalar Responses on Remote Sensing Data: An Application to Oceanography

Top

Some thoughts about the parameter space

SPEAKER: Juan José Egozcue
LANGUAGE: Spanish
PLACE: Building C5, Aula C5016, Campus Nord, UPC (see map)
DATE: Friday, 1st of December 2017
ABSTRACT: The sample space and its structure (operations and metrics if any) is the first step in any statistical modelling. In Bayesian statistics, parameters of the observational model are assumed to be random. Their sample space is the parameter space and it has to be taken as a sample space of random parameters. As such, its structure is a key point, for instance, to determine which are suitable mean values and variability of parameters. Even the mode is a characteristic that depends on the parameter space. This structure has been seldom taken into account and a real space structure is implicitly assumed. For instance, after a posterior simulation of parameters using some MCMC procedure, the simulated values are averaged or marginalized without any further consideration.

As an example of a parameter space which can hardly be considered as a real space, the estimation of multinomial probabilities is reviewed. Specifically, posterior mean and mode. Additionally, predictive marginalization is also visited.

ABOUT THE SPEAKER: Dr. Juan José Egozcue studied Physics, oriented to Geophysics and Meteorology, at the University of Barcelona (Spain). He obtained his PhD in the same university with a dissertation on maximum entropy spectral analysis (1982). In 1978 he got a position at the Escuela de Ingeniería de Caminos, Canales y Puertos de la Universidad Politécnica de Cataluña (UPC), Barcelona, Spain, teaching several topics on Applied Mathematics. In 1983 he started teaching Probability and Statistics. He became Associate Professor in 1985, and Full Professor in 1989, at the UPC, where he has been vice-chancellor of the university (1986-1988) and Chair of the Department of Applied Mathematics III (1992-1998). He retired from UPC in January, 31, 2016 and become Professor Emérito (UPC) in September, 1st, 2016. His research activities are presently centered in two lines: estimation of natural hazards using Bayesian methods, specially applied to seismic, rainfall and ocean wave hazards; and statistical analysis of compositional data, with special emphasis in the geometry of the sample space, the simplex.

He started the research on compositional data analysis about 2000 in cooperation with Prof. Dr. Vera Pawlowsky-Glahn. Presently, he is President of the Association for Compositional Data (CoDa-Association). The main contributions to the field of compositional data follow below, and for more info on his research see http://futur.upc.edu/JuanJoseEgozcueRubi.

Pawlowsky-Glahn, V. and J. J. Egozcue: Geometric Approach to Statistical Analysis on the Simplex, Stochastic Environmental Research and Risk Assessment, 15, 5, 384-398, 2001.

Egozcue, J. J., V. Pawlowsky-Glahn, G. Mateu-Figueras and C. Barceló-Vidal: Isometric logratio transformations for compositional data analysis,
Mathematical Geology, 35, 3, 279-300, 2003.

Egozcue, J. J. and V. Pawlowsky-Glahn: Groups of parts and their balances in compositional data analysis, Mathematical Geology, 37, 7, 799-832, 2005.

Pawlowsky-Glahn, V., J. J. Egozcue and R. Tolosana-Delgado: Modeling and Analysis of Compositional Data, John Wiley & Sons, Chichester, UK, 272pp., ISBN: 9781118443064, 2015

Top

Logistic regression with rare events: problems and solutions

SPEAKER: Georg Heinze
LANGUAGE: English
PLACE: Building C5, Aula C5016, Campus Nord, UPC (see map)
DATE: Tuesday, 26th of September 2017
ABSTRACT: We discuss problems arising in the analysis of logistic regression models with sparse data sets, considering the case where interest lies in both prediction and effect estimation. The first problem is separation, where perfect separability of the outcomes in a data by the covariates set hampers effect estimation. Another problem is low accuracy despite a high sample size. Several approaches to deal with these problems have been proposed, e.g. methods based on bias reduction such as Firth's logistic regression, or the use of weakly informative priors in a Bayesian framework. Because of many advantageous properties, Firth's logistic regression has become a standard approach for the analysis of binary outcomes with small samples. It is implemented in the R package logistf and in many other statistical software systems. Whereas it reduces the bias in maximum likelihood estimates of coefficients, bias towards one-half is introduced in the predicted probabilities. The stronger the imbalance of the outcome, the more severe is the bias in the predicted probabilities. We propose a simple modification of Firth's logistic regression resulting in unbiased predicted probabilities. While this method introduces a little bias in the regression coefficients, this is compensated by a decrease in their mean squared error. We demonstrate the properties of our proposed method in a comparative simulation study, including also other methods, and exemplify its use with real data. Joint work with Rainer Puhr, Mariana Nold, Lara Lusa, Angelika Geroldinger. Reference: Puhr et al, Firth's logistic regression with rare events: accurate effect estimates AND predictions? Statistics in Medicine 2017 Jun 30;36(14):2302-2317, doi: 10.1002/sim.7273. The slides of this seminar are available here.

ABOUT THE SPEAKER: Georg Heinze is head of the Section for Clinical Biometrics at the Center for Medical Statistics, Informatics and Intelligent Systems of the Medical University of Vienna since 2015. He received his PhD Degree in Statistics from the University of Vienna in 1998, and was appointed Associate Professor in 2004. Primary research focuses on biostatistical modeling strategies for prediction and estimation of effects of exposures on outcomes, particularly when sample sizes are small or outcome events are rare. Secondary focus is the re-use of health data for medical research, particularly when sample sizes are very large, as in nationwide studies on health insurance claims. For more information please visit the website https://cemsiis.meduniwien.ac.at/en/kb

Top

The CONSORT statement for Pilot Trials: What is different?

SPEAKER: Mike Campbell
LANGUAGE: English
PLACE: FME, Edifici U, Sala de Juntes. Campus Sud, Universitat Politècnica de CatalunyaPau Gargallo, 5, 02028 Barcelona
DATE: Friday 7th of July, 2017. Time: 12:00
ABSTRACT: The CONSORT statements about the reporting of randomised controlled trials (RCTs) are now established as an aid to authors, editors and reviewers, to enable RCTs to be reported properly. They make it easier for readers and for those conducting a meta analysis to decide whether the conclusions reported in the study are indeed valid. It has been shown that CONSORT has improved the reporting of RCTs. Pilot trials and preliminary studies are conducted to assess whether a proposed trial can be carried out. Recent reviews have shown that their reporting is poor, and this is particularly unfortunate, since they could be crucial in enabling other authors to design better trials. Over the past five years a team from the UK and Canada have been working on a revision of the CONSORT for Pilot Trials and this was published earlier this year. The talk will describe what pilot and feasibility trials are, the process of revising a CONSORT statement and what the new statement contains.

ABOUT THE SPEAKER: Mike Campbell is Emeritus Professor of Medical Statistics at the University of Sheffield. He is interested in pilot trials, cluster trials, sample size calculations and teaching statistics and is is a co-author of "How to design, analyse and report cluster randomised trials in Medicine and Health related research", "Statistics at Square One", and "Sample size tables for clinical studies".

Top

A Case Study in Data Science: Mobilizing Data for the Monitoring of Native Grass Composition in the Prairie Pothole Region

SPEAKER: Tim Robinson
LANGUAGE: English
PLACE: Building C5, Aula C5016, Campus Nord, UPC (see map)
DATE: Friday, 16th of June 2017. Time: 12:30
ABSTRACT: The Prairie Pothole Region (PPR) is a 715,000 km2 area in North America that is filled with wetlands, hills, and lakes formed by glaciers as they melted and moved through the area more than 10,000 years ago (https://en.wikipedia.org/wiki/Prairie_Pothole_Region). Over the last century, much of the acreage in the PPR has been converted to agricultural use. It is estimated that nearly half of the potholes have been drained for agricultural use and in some areas, nearly 90% of the potholes have disappeared. Effective management for conservation requires data informed decision making and subsequent policy development within a complex system. For data to be of value for conservation efforts, thought must be placed into the workflow required to effectively mobilize the data to decision makers. This talk not only describes the workflow process being used to mobilize data for decision making for conservation managers but also argues that the described workflow is the essence of data science and hence is generalizable to a wide variety of applications. As part of this talk, I will also discuss the importance of placing careful thought into the sampling design (selection of sampling units as well as a power analysis for trend detection) used for monitoring native prairie health objectives. Anyone with a basic understanding of statistics and an interest in mobilizing data for decision making is encouraged to attend.

ABOUT THE SPEAKER: Tim Robinson is a professor at the Statistics Dept. University of Wyoming. He got his B.S. in mathematics and Psychology from James Madison University in 1989 and his M.a. and PhD Degrees in Statistics from the Virginia Polytechnic Institute in 1994 and 1997 respectively. Main research interests: Design of Experiments, Response Surface methodology, Categorical Data Analysis and applications in engineering, medicine and the environment. For more information on his research, please visit the web page http://www.uwyo.edu/statistics/facultyandstaff/robinson_tim.html

Top

¿Quién escribió la historia de la conquista de la Nueva España? ¿Es correcta la hipótesis de Duverger? Un estudio comparativo basado análisis estadístico de textos

SPEAKER: Jesús Igor Heberto Barahona Torres
LANGUAGE: Spanish
PLACE: Building C5, Aula C5016, Campus Norte, UPC (see map)
DATE: Friday 9th of June, 2017. Time: 12:30
ABSTRACT: Existen dos principales textos que narran la conquista de la Nueva España: "Las Cartas de Relación", escritas por Hernán Cortes y "Historia verdadera de la conquista de la Nueva España.", de la autoría de Bernal Díaz del Castillo. Por otra parte, en su libro "Crónicas de la Eternidad", Duverger proporciona evidencia histórica para afirmar que tales textos fueron escritos por el mismo autor: Hernán Cortes. Este trabajo hace una comparación de ambos textos sobre la base de métodos estadísticos mulivariantes. Para responder a las preguntas planteadas, calculamos los estadísticos descriptivos básicos, los índices de variedad y riqueza del vocabulario, así como el análisis de correspondencias.

ABOUT THE SPEAKER: El Dr. Igor Barahona concluyó la maestría en sistemas de calidad y productividad en el ITESM, Campus Monterrey. En 2013 se graduó del programa de Doctorado en Estadística e Investigación Operativa por la Universidad Politécnica de Catalunya. En 2014 realizó una estancia Posdoctoral en la Universidad de Manchester, RU. Ha sido consultor en desarrollo analítico para diferentes empresas en México y España. Es autor del libro "The level of adoption of analytical tools" y de seis artículos en revistas indizadas. Sus líneas de investigación se enfocan en la aplicación de métodos estadísticos orientados en investigar diferentes fenómenos sociales, la sensometría estadística de alimentos y bebidas; y el análisis bases de datos textuales de gran envergadura (big-data). Actualmente es parte del programa Cátedras-CONACYT, adscrito al Instituto de Matemáticas de la UNAM, sede Cuernavaca. Para mas información consulte su página web personal http://www.matem.unam.mx/fsd/igor

Top

Scalable count-compositional models for microbiome time-series data

SPEAKER: Justin Silverman
LANGUAGE: English
PLACE: Building C5, Aula C5016, Campus Norte, UPC (see map)
DATE: Friday 26th of May, 2017. Time: 12:30
ABSTRACT: Within the biomedical community there is an increasing recognition of the importance that host-associated microbes play in both human health and disease. Moreover, there has been much excitement over the insights that can be obtained from longitudinal measurements of these microbial communities; however, due to statistical limitations appropriate models have been lacking. Host microbiota are typically measured using high-throughput DNA sequencing which results in counts for different species. Relative abundances, assumed compositional, are then estimated from these counts. In addition, due to technological limitations, the total number of counts per sample is often small compared to the distribution of species' relative abundances. This leads to datasets with many zero or small counts. With such data, maximum likelihood estimates of sample proportions are biased and models that incorporate the sampling variability are essential. In this seminar I will explore these as well as other statistical challenges that arise in modeling microbiota time-series. I will also discuss a novel framework that addresses these challenges by combining multinomial-normal-on-the-simplex dynamic linear models with recent advances in Markov Chain Monte Carlo simulations to enable scalable Bayesian inference for microbiome time-series. Using real datasets, I will explore some of the insights that these models have provided into the forces that drive microbial dynamics.

ABOUT THE SPEAKER: Justin Silverman holds a B.S. degree in Physics and Biophysics from the John Hopkins University. He is currently PhD candidate in Computational Biology and Bioinformatics at Duke University. His PhD concerns the design and evaluation of microbiome-based therapeutics through statistical modeling. His research interests are Diet and human health, host-associated microbiota, design of diagnostics and therapeutics, complex systems, machine learning, geometric approaches to data analysis.

Top

Penalising model component complexity: A principled practical approach to constructing priors

SPEAKER: Haavard Rue
LANGUAGE: English
PLACE: FME, Edifici U, Aula: S05, Campus Sud, UPC, Pau Gargallo, 5, 02028 Barcelona (veure mapa).
DATE: Wednesday 22nd of March, 2017. Time: 14:00
ABSTRACT: Setting prior distributions on model parameters is the act of characterising the nature of our uncertainty and has proven a critical issue in applied Bayesian statistics. Although the prior distribution should ideally encode the users' uncertainty about the parameters, this level of knowledge transfer seems to be unattainable in practice and applied statisticians are forced to search for a "default" prior. Despite the development of objective priors, which are only available explicitly for a small number of highly restricted model classes, the applied statistician has few practical guidelines to follow when choosing the priors. An easy way out of this dilemma is to re-use prior choices of others, with an appropriate reference. In this talk, I will introduce a new concept for constructing prior distributions. We exploit the natural nested structure inherent to many model components, which defines the model component to be a flexible extension of a base model. Proper priors are defined to penalise the complexity induced by deviating from the simpler base model and are formulated after the input of a user-defined scaling parameter for that model component, both in the univariate and the multivariate case. These priors are invariant to reparameterisations, have a natural connection to Jeffreys' priors, are designed to support Occam's razor and seem to have excellent robustness properties, all which are highly desirable and allow us to use this approach to define default prior distributions. Through examples and theoretical results, we demonstrate the appropriateness of this approach and how it can be applied in various situations, like random effect models, spline smoothing, disease mapping, Cox proportional hazard models with time-varying frailty, spatial Gaussian fields and multivariate probit models. Further, we show how to control the overall variance arising from many model components in hierarchical models. This is onging work within the www.r-inla.org project, with Daniel Simpson (Bath), Andrea Riebler, Geir-Arne Fuglstad (NTNU) and Sigrunn Sorbye (Univ. of Tromso) and others.

ABOUT THE SPEAKER: Haavard Rue is currently Professor at the Division for Computer, Electrical and Mathematical Science and Engineering of the King Abdullah University of Science and Technology in Saudi Arabia, and has been Professor at the Department of Mathematical Sciences of the Norwegian University of Science and Technology (NTNU) in Trondheim, Norway, for many years. He has given 15 invited plenum and keynote talks at larger conferences. He has directed 25 PhD students, and served as associate editor for several journals, among others the Journal of the Royal Statistical Society, Series B, Scandinavian Journal of Statistics and the Annals of of Statistics. He is principal investigator of several large Norwegian research projects, and has taught over 25 courses and lectures on "Bayesian computing with INLA" around the world, and has (co-)authored well over 150 research articles in scientific journals. His main research interests are Computational Bayesian statistics, Bayesian methodology for priors, sensitivity and robustness, Integrated Nested Laplace Approximations (INLA), Gaussian Markov random fields, Models for dependent data, Stochastic partial differential equations for spatial modeling and Bayesian statistical models for extreme data scales. For more information on his research see his personal web page https://www.kaust.edu.sa/en/study/faculty/haavard-rue

Top

Functional Linear Regression Models for Scalar Responses on Remote Sensing Data: An Application to Oceanography

SPEAKER: Nihan Akar
LANGUAGE: English
PLACE: Edificio C5, Aula C5016, Campus Norte, UPC (ver mapa)
DATE: Friday, February the 10th, 2017. Time: 12:30
ABSTRACT: Remote Sensing (RS) data obtained from satellites are a type of spectral data which consist of reflectance values recorded at different wavelengths. This type of data can be considered as a functional data due to the continous
structure of the spectrum. The aim of this study is to propose Functional Linear Regression Models (FLRMs) to analyze the turbidity in the coastal zone of Guadalquivir estuary from satellite data. With this aim different types of FLRMs for scalar response have been used to predict the amount of Total Suspended Solids (TSS) on RS data and their
results have been compared.

(Treball conjunt amb Pedro Delicado, Gülay Basarir i Isabel Caballero)

THE SPEAKER: Nihan Acar Denizli is a Research Assistant in the Department of Statistics of Mimar Sinan Fine Arts University, Istanbul, Turkey. In her PhD she worked on functional linear models under the supervision of Gülay Basarir (MSGSU, Istanbul, Turkey) and Pedro Delicado (UPC, Barcelona, Spain). She obtained her Phd Degree in November 2016. Her research areas include Functional Data Analysis (FDA), Linear Models and Multivariate Statistical Analysis. She is collobrating as statistician in the international "Big Data Sjögren Project".

Top