Programme And Abstracts For Monday 11th Of December

Keynote: Monday 11th 9:10 098 Lecture Theatre (260-098)

R In Times Of Growing User Base And Data Sizes

Simon Urbanek
AT&T Labs

Abstract: R has historically been used mainly on single machines, with the analyst performing both analysis and visualization locally. However, the flexible abstraction of graphics in R and its extensibility make R a great tool to be used remotely and across large clusters. The sizes of datasets as well as the popularity of R have created a demand for extending R’s capabilities beyond a single machine. In this talk we will illustrate how R can be used by many users in a collaborative open-source RCloud environment to share data analyses, visualizations and results openly. The design also allows scaling across many instances. At the same time this environment can be combined with distributed computing to scale not only with the number of users but also with the size of datasets. In the second part of the talk we will show several approaches to using R very efficiently for Big Data analytics at scale, leveraging the Hadoop ecosystem. We will start with hmr, a faster way to use the map/reduce framework from R; introduce ROctopus, which allows us to perform arbitrary operations on large data without the constraints of a map/reduce framework; and show a general framework for developing and using models in R that can leverage distributed systems. We will illustrate the use of these approaches on a real dataset and a large cluster.

Programme And Abstracts For Monday 11th Of December

R In Times Of Growing User Base And Data Sizes

Robust Principal Expectile Component Analysis

Effect Of Area Level Deprivation On Body Mass Index: Analysis Of NZ Health Surveys

Calendar-Based Graphics For Visualising People’s Daily Schedules

Nonparametric Test For Volatility In Clustered Multiple Time Series

IGESS: A Statistical Approach To Integrating Individual Level Genotype Data And Summary Statistics In Genome Wide Association Studies

Author Name Identification For Evaluating Research Performance Of Institutes

A Computational Tool For Detecting Copy Number Variations From Whole Genome And Targeted Exome Sequencing

Clustering Using Nonparametric Mixtures And Mode Identification

Bayesian Curve Fitting For Discontinuous Function Using Overcomplete Representation With Multiple Kernels

Estimation Of A Semiparametric Spatiotemporal Models With Mixed Frequency

LSMM: A Statistical Approach To Integrating Functional Annotations With Genome-Wide Association Studies

A Study Of The Influence Of Articles In The Large-Scale Citation Network

Estimating Links Of A Network From Time To Event Data

Estimation Of A High-Dimensional Covariance Matrix

Innovative Bayesian Estimation In The von Mises Distribution

Evidence Of Climate Change From Nonparametric Change-Point Analysis

Joint Analysis Of Individual Level Genotype Data And Summary Statistics By Leveraging Pleiotropy

An Advanced Approach For Time Series Forecasting Using Deep Learning

Genetic Map Estimation Using Hidden Markov Models In The Presence Of Partially Observed Information

A Simple Method For Grouping Patients Based On Historical Doses

Semiparametric Mixed Analysis Of Covariance Model

Adaptive False Discovery Rate Regression With Application In Integrative Analysis Of Large-Scale Genomic Data

Structure Of Members In The Organization To Induce Innovation: Quantitatively Analyze The Capability Of The Organization

Vector Generalized Linear Time Series Models

Local Canonical Correlation Analysis For Multimodal Labeled Data

A Practitioners Guide To Deep Learning For Predictive Analytics On Structured Data

Clustering Of Research Subject Based On Stochastic Block Model

Zen And The aRt Of Workflow Maintenance

Canonical Covariance Analysis For Mixed Numerical And Categorical Three-Way Three-Mode Data

Variable Selection Algorithms

Estimating Causal Structures For Continuous And Discrete Variables

Incorporating Genetic Networks Into Case-Control Association Studies With High-Dimensional DNA Methylation Data

Adaptive Model Checking For Functional Single-Index Models

Mobile Learning In Teaching Bioinformatics For Medical Doctors

On Optimal Group Testing Designs: Prevalence Estimation, Cost Considerations, And Dilution Effects

The Use Of Bayesian Networks In Grape Yield Prediction

Pattern Prediction For Time Series Data With Change Points

Test For Genomic Imprinting Effects On The X Chromosome

Fluctuation Reduction Of Value-At-Risk Estimation And Its Applications

E-Learning Courses On Introductory Statistics Using Interactive Educational Tools

Estimation Of Animal Density From Acoustic Detections

Mixed Models For Complex Survey Data

Regression With Random Effects For Analysing Correlated Survival Data: Application To Disease Recurrences

Genetic Predictors Underlying Long-Term Cognitive Recovery Following Mild Traumatic Brain Injury

Bayesian Structure Selection For Vector Autoregression Model

Three-Dimensional Data Visualization Education With Virtual Reality

Talk Data To Me

Smooth Nonparametric Regression Under Shape Restrictions

Elastic-Band Transform: A New Approach To Multiscale Visualization

Meta-Analytic Principal Component Analysis In Integrative Omics Application

Flight To Relative Safety: Learning From A No-Arbitrage Network Of Yield Curves Model Of The Euro Area

Bayesian Analyses Of Non-Homogeneous Gaussian Hidden Markov Models

Robustness Of Temperature Reconstruction For Past 500 Years

Nonparametric Causal Inference By The Kernel Method

A Unified Regularized Group PLS Algorithm Scalable To Big Data

Evaluation Of Spatial Cluster Detection Method Based On All Geographical Linkage Patterns

Scoring Rules For Prediction And Classification Challenges

Meta-Analysis With Symbolic Data Analysis And Its Application For Clinical Data

Real-Time Transit Network Modelling For Improved Arrival Time Predictions

Visualization And Statistical Modeling Of Financial Big Data

Sparse Group-Subgroup Partial Least Squares With Application To Genomic Data

Genetic Approach And Statistical Approach For Association Study On DNA Data

Modeling Of Document Abstraction Using Association Rule Based Characterization

Bayesian Static Parameter Inference For Partially Observed Stochastic Systems

Bayesian Survival Analysis Of Batsmen In Test Cricket

Covariate Discretisation On Big Data

BIG-SIR A Sliced Inverse Regression Approach For Massive Data

Symbolic Data Analytical Approach To Unauthorized-Access Logs

My Knee Still Hurts; The Statistical Pathway To The Development Of A Clinical Decision Aid
