Check out this great data visualization from New York designer Matt Daniels. He analyzed 85 rappers to find which emcee has the largest vocabulary. The dataset was drawn from the first 35,000 lyrics of each artist. It's interactive, so be sure to visit the page for more information on each artist.
Screenshot from http://poly-graph.co/vocabulary.html
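For anyone curious how a comparison like this could be computed, here is a minimal Python sketch that counts the distinct words among the first 35,000 words of a text. The function name `unique_word_count` and the tokenization rule (lowercase runs of letters and apostrophes) are assumptions made for illustration; the visualization's own method may differ.

```python
import re


def unique_word_count(lyrics: str, limit: int = 35_000) -> int:
    """Count the distinct words among the first `limit` words of `lyrics`."""
    # Assumed tokenization: lowercase the text, then treat each run of
    # letters and apostrophes as one word.
    words = re.findall(r"[a-z']+", lyrics.lower())
    # Keep only the first `limit` words so every artist is compared
    # on the same amount of text.
    return len(set(words[:limit]))


# Made-up sample text, not real lyrics.
sample = "the cat and the hat and the bat"
print(unique_word_count(sample))           # distinct words in the whole sample
print(unique_word_count(sample, limit=4))  # distinct words in the first 4 words
```

Capping every artist at the same number of words keeps the comparison fair, since a longer catalog would otherwise tend to contain more distinct words.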
Alkek Library has access to several databases specifically for browsing data sets. More are listed here, but the databases below may be especially useful for this course.
Data Citation Index:
Part of the Web of Knowledge, the Data Citation Index fully indexes a significant number of the world's leading data repositories of critical interest to the scientific community, including over two million data studies and datasets.
Data Planet Statistical Datasets:
Interactive database of statistics that enables users to create tables, maps, and figures from a variety of data sources covering banking, criminal justice, education, energy, food and agriculture, government, health, housing and construction, industry and commerce, labor and employment, natural resources and environment, income, cost of living, stocks, transportation, and more. Data holdings for the United States are significant, with some data available at state, county, or local geographies. International data, available at the country level, include population, food and agriculture, labor, trade, and more. Data are organized by subject and source.
Vast archive of social science data. The data can be used with statistical software such as SAS, SPSS, and Stata. Thematic categories include census data, community and urban studies, conflict, aggression, economic behavior, education, leadership, geography, health care, legal systems, mass political behavior, and organizational behavior.
IMF eLibrary - Data:
Cross-search four sets of statistics. International Financial Statistics: all aspects of international and domestic finance, with history to 1948. Direction of Trade: value of exports and imports between countries and their trading partners, with history to 1980. Balance of Payments: international economic transactions data and International Investment Position, with history to 1960. Government Finance Statistics: budgetary and extra-budgetary financial operations data of governments, with history to 1990.
Need to browse data? Here's a list of several resources to search for interesting datasets.
ANES Data Center:
American National Election Studies provides data from its own surveys on voting, public opinion, and political participation and seeks to explain election outcomes.
Corruption Perception Index:
The Internet Center for Corruption Research provides empirical and experimental data on corruption and studies on reform. Has data tables as well as an interactive world map.
Housed at the Institute for Quantitative Social Science (IQSS) at Harvard, hosts the world's largest collection of social science research.
Hadoop Illuminated - Publicly available big datasets:
Get pointers for working with large data sets, links to generic repositories, geographic, government, and web data.
Human Development Reports:
The Human Development Index (HDI) is a measure of achievement in the basic dimensions of human development across countries.
A listing of demographic data resources provided by the Minnesota Population Center. Some data require registration and agreement to usage licenses, but all information is free.
Open Data Network:
Browse data in Finance, Public Safety, Infrastructure, Education, Transportation, Politics, and more.
Pew Research Center Datasets:
Non-partisan public opinion polling, demographic research, content analysis, and other data-driven social science research.
A global registry of research data repositories from different disciplines.
Roper Center Archives:
Specializes in data from public opinion surveys.
World We Want Trends:
Data visualizations, activity maps, and more. Data from the United Nations Survey for a Better World.
Created by Mark Edward Phillips, part of the UNT Data Repository
Nelson Mandela Twitter Dataset:
"This dataset contains Twitter JSON data for several Twitter search queries that were collected the week following the death of Nelson Mandela on December 5, 2013 using the twarc (https://github.com/edsu/twarc) package that makes use of Twitter's search API. A total of 10,678,479 Tweets make up the combined dataset"
Stand With Wendy Twitter Dataset:
"This dataset contains Twitter JSON data for several Twitter search queries collected the week following the filibuster by Wendy Davis in the Texas Senate related to Senate Bill 5, using the twarc (https://github.com/edsu/twarc) package that makes use of Twitter's search API. A total of 560,954 Tweets make up the combined dataset."
Yes All Women Twitter Dataset:
"This dataset contains Twitter JSON data for several Twitter search queries that were collected around the #YesAllWomen Twitter "conversation" between May 25, 2014 and June 8, 2014 using the twarc (https://github.com/edsu/twarc) package that makes use of Twitter's search API. A total of 2,805,763 Tweets and 34,532 images make up the combined dataset. "
ICWSM (International Conference on Web & Social Media) Dataset Sharing Service:
The link goes to the 2015 datasets, but multiple years are available. You must agree to and submit the Dataset Usage Agreement, following the directions at the bottom of the page.
JSON downloads of daily Twitter streams.
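Several of the Twitter datasets above are described as Twitter JSON collected with twarc. As a starting point for exploring files like these, here is a minimal Python sketch that counts hashtags. It assumes one tweet object per line and an `entities.hashtags` list whose items have a `text` field; check the actual files before relying on that layout. The function name `count_hashtags` and the sample records are made up for illustration.

```python
import json
from collections import Counter
from typing import Iterable


def count_hashtags(lines: Iterable[str]) -> Counter:
    """Tally hashtags across tweets stored as one JSON object per line."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        # Assumed layout: tweet["entities"]["hashtags"] is a list of
        # objects that each carry the hashtag in a "text" field.
        for tag in tweet.get("entities", {}).get("hashtags", []):
            counts[tag["text"].lower()] += 1
    return counts


# Made-up sample records, not taken from any of the datasets above.
sample_lines = [
    '{"id": 1, "entities": {"hashtags": [{"text": "YesAllWomen"}]}}',
    '{"id": 2, "entities": {"hashtags": [{"text": "yesallwomen"}, {"text": "Example"}]}}',
    '{"id": 3, "entities": {"hashtags": []}}',
]
print(count_hashtags(sample_lines).most_common(2))
```

Because the function reads one line at a time, an open file handle can be passed in place of the sample list, which avoids loading a multi-million-tweet file into memory all at once.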