Data Competitions
Kaggle. A series of data competitions in which teams compete to model the data and predict the future. A USD500 prize for each competition.

Ready for Teaching
OzDASL. A library of data sets for teachers of statistics in Australia and New Zealand. Gordon Smyth, Walter and Eliza Hall Institute of Medical Research.

Data and Story Library. DASL (pronounced "dazzle") is an online library of datafiles and stories that illustrate the use of basic statistical methods. Stories are classified according to statistical methods and major topics of interest. Well organized. Perhaps the best single source of data sets for teaching an introductory class. DASL Project, Cornell University.

Datasets for Learning Biostatistical Modeling Techniques. Includes the Titanic survival data set. Frank Harrell, Vanderbilt Medical Center.

JSE Data Archive. Journal of Statistics Education archive of data sets for teaching.

Electronic Dataset Service. Data sets classified by statistical methodology. Trina Hosmer, University of Massachusetts.

Statistics UCLA. Jan de Leeuw, University of California, Los Angeles.

Case Studies. Several dozen case studies with questions and hints.

Data Sets. Select data by subject area, data from textbooks and consulting projects.

Time Series Data Library. Over 500 data sets. Robert Hyndman, Monash University.

Xtreme Database. Currently contains 18 datasets. Michael Thomas, Universitaet-Gesamthochschule, Siegen.

WWS509 Generalized linear models. Data sets for a graduate course at Princeton University. Germán Rodríguez, Princeton University.

Textbooks

Statistics UCLA: Data Sets. Jan de Leeuw, University of California Los Angeles.

Statistics by Freedman, Pisani, and Purves.

Applied Statistical Methods: for Business, Economics and the Social Sciences by W L Carlson and B Thorne.

Applied Statistics by David Cox and E J Snell.

Handbook of Small Data Sets by Hand, Daly, Lunn, McConway and Ostrowski. See also the review by Elizabeth Coates, Sheffield Hallam University. Local copy.

The Basic Practice of Statistics by David Moore.

University of Wisconsin Data Archive. Douglas Bates, University of Wisconsin.

Statistics for Experimenters by Box, Hunter and Hunter.

Probability and Statistics for Engineering and the Sciences by J Devore.

Analysis of Messy Data by Milliken and Johnson.

Practical Data Analysis for Designed Experiments by Yandell.

Analysis of Longitudinal Data by Diggle, Liang and Zeger. University of Lancaster.

Applied Linear Regression by Sanford Weisberg.

Applied Regression Analysis: a Research Tool by John Rawlings. North Carolina State University.

Applied Statistics for Engineers and Physical Scientists by Hogg and Ledolter. Cornell University.

Case Studies in Biometry by Nicholas Lange et al.

Casebook for a First Course in Statistics and Data Analysis by Samprit Chatterjee, Mark Handcock and Jeffrey Simonoff. Case studies and data files suitable for an introductory statistics course. Includes additional case studies not in the book. New York University. Local copy.

Engineering Statistics by Montgomery, Runger and Hubele. Arizona State University.

Forecasting: Methods and Applications by Makridakis, Wheelwright and Hyndman.

Fundamentals of Biostatistics by Bernard Rosner.

Generalized Linear Models by Peter McCullagh and John Nelder. University of Chicago.

Introduction to the Practice of Statistics by David Moore and George McCabe.

Modern Applied Statistics with S-Plus by Bill Venables and Brian Ripley. S-Plus functions and data sets. Oxford University.

Nonlinear Regression Analysis by Douglas Bates and Donald Watts.

Pattern Recognition and Neural Networks by Brian Ripley, Oxford University.

Regression with Graphics by Lawrence Hamilton.

Statistical Methods in Cancer Research by Norman Breslow and N. E. Day. University of Washington.

Statistical Modelling in GLIM by Murray Aitkin, Dorothy Anderson, Brian Francis and John Hinde. Datasets and macros. See first the table of contents of available data sets. Statlib. Local copy.

Stat Labs: Mathematical Statistics Through Applications by Deb Nolan and Terry Speed, University of California, Berkeley.

TIMESLAB by Joe Newton.

Larger or Less Processed
Data: a Collection of Problems from many Fields for the Student and Research Worker by D.F. Andrews and A.M. Herzberg. Data from the book. Some data sets are classics. Many others do not yield to standard analyses. Statlib. Also available by ftp from the University of Toronto or UCLA, including the whole collection as a compressed tar file.

Dr B's Wide World of Web Data. Links to hundreds of data sets, organized by subject matter. John Behrens, Arizona State University.

Graphics Data Expositions. Data for the bi-annual data expositions of the Statistical Graphics Section of the ASA.

Journal of Applied Econometrics Data Archive. Data from JAE articles accepted after January 1994. Queen's University.

Multilevel Data Sets. For teaching and training in multilevel model methods.

Peter J Diggle Data Sets. Geostatistical and Spatial point pattern data sets. Peter Diggle, University of Lancaster.

SPSS Data Sets. Data sets for SPSS and SYSTAT, and a selection of other public data sets. SPSS Inc.

Statistical Reference Datasets. The purpose of this project is to improve the accuracy of statistical software by providing reference datasets with certified computational results that enable the objective evaluation of statistical software. NIST.

Statlib

Datasets. Main StatLib data archive.

Breakfast Cereal Data. From the 1993 Graphics Expo.

Case Studies in Biometry. Data diskette for the book by Nicholas Lange, Louise Ryan, Lynne Billard, David Brillinger, Loveday Conquest, Joel Greenhouse. Wiley, 1994.

Data Expositions. Data sets used for the annual ASA Statistical Graphics and Computing Data Expositions.

Disease Data. From the 1991 Statistics in Public Health Surveillance Exposition.

JASA Data. Contributed datasets from articles published in the Journal of the American Statistical Association.

King Crab Data. A large but patchy data set. Although the topic is in principle an interesting one, my students have had trouble assembling any useful data set from the various files associated with this project. 1990 Data Expo.

Oscillator Time Series. From the 1993 Graphics Expo.

UCI Machine Learning Repository. Over 100 datasets from large to small. Christopher Merz, University of California, Irvine.

University of Wisconsin Data Archive. Data sets from masters exams and several books, including Box, Hunter & Hunter; Devore; Milliken & Johnson's Analysis of Messy Data; Yandell's Practical Data Analysis for Designed Experiments. Douglas Bates, University of Wisconsin.

Workshop on Smoothing Applications, UBC June 1999: Data Sets. A collection of substantial, interesting and well documented data sets to be used by speakers at the workshop in 1999. Nancy Heckman, University of British Columbia.

Sources of Raw Data
Australian Bureau of Meteorology

Australian Historical Financial Data. Reserve Bank of Australia.

Australian Social Science Data Archive.

Council of European Social Science Data Archives. Provides a clickable map of social science data archives all over the world, and an integrated data catalogue for social science data archives.

Documents Center. An excellent index to government statistical data on the Web, both United States and international, maintained by the Documents Center of the University of Michigan.

Data Zoo. California coastal data collection programs. Organized by experiment, instrument type and geographical region. Center for Coastal Studies, University of California, San Diego.

Economic and Social Research Council Data Archive. Largest collection of accessible computer-readable data in the social sciences and humanities in the UK.

LDEO Climate Data Catalog. Earth science data, primarily oceanographic and atmospheric datasets. Columbia University.

NZ Social Science Research Data and Information Services Centre. Contains 33 social science datasets. Signed approval is required to access many of them.

Minnesota River Basin Data Center.

Project Gutenberg. Full text online for a huge number of books, including such things as the World Factbook. Major public domain books or classics for which copyright has expired are likely to be here.

US Census Reference. The complete US Census on one CD. GeoLytics Inc.

VIMS Pier Ambient Monitoring Data. Local conditions on the York River at Gloucester Point, VA. You can download water parameters and meteorological variables measured at 6-minute intervals for the past 10 days, or view graphs of the same variables for the current and past years. Virginia Institute of Marine Science, College of William & Mary, Gloucester Point, VA.

Other Lists of Links
More data sources. Julian Faraway, University of Michigan.