Big data and statistics: A statistician’s perspective

Authors

  • David Rossell, University of Warwick (United Kingdom).

DOI:

https://doi.org/10.7203/metode.0.3590

Keywords:

Big Data, statistics, case studies, pitfalls, challenges

Abstract

Big Data brings unprecedented power to address scientific, economic and societal issues, but it also amplifies the risk of certain pitfalls. These include relying on purely data-driven approaches that disregard the phenomenon under study, aiming at a moving target, ignoring critical data collection issues, summarizing or preprocessing the data inadequately and mistaking noise for signal. We review some success stories and illustrate how statistical principles can help obtain more reliable information from data. We also touch upon current challenges that require active methodological research, such as strategies for efficient computation, integration of heterogeneous data, extending the underlying theory to increasingly complex questions and, perhaps most importantly, training a new generation of scientists to develop and deploy these strategies.

Author Biography

David Rossell, University of Warwick (United Kingdom).

Professor at the Department of Statistics, University of Warwick (United Kingdom).

Published

2015-04-16

How to Cite

Rossell, D. (2015). Big data and statistics: A statistician’s perspective. Metode Science Studies Journal, (5), 143–149. https://doi.org/10.7203/metode.0.3590

Section

The digits of science. Statistics as scientific tool