
Research Data Management

Data Analysis

What is Data Analysis?

Data Analysis is the process of systematically applying statistical and/or logical techniques to describe and illustrate, condense and recap, and evaluate data. The procedure helps reduce the risks inherent in decision-making by providing useful insights and statistics, often presented in charts, images, tables, and graphs.

Types of Data Analysis

  • Descriptive Analysis

Descriptive analysis involves summarizing and describing the main features of a dataset. It focuses on organizing and presenting the data in a meaningful way, often using measures such as mean, median, mode, and standard deviation. It provides an overview of the data and helps identify patterns or trends.
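As a minimal sketch, the measures named above can be computed with Python's built-in `statistics` module. The scores below are hypothetical, made up only for illustration.

```python
import statistics

# Hypothetical sample: exam scores for ten students (illustrative data only)
scores = [72, 85, 90, 66, 85, 78, 92, 85, 70, 88]

summary = {
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    "stdev": statistics.stdev(scores),  # sample standard deviation (n - 1 denominator)
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

Here the mean is 81.1 while the median and mode are both 85, which already hints that a few low scores pull the average down.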

  • Inferential Analysis

Inferential analysis aims to make inferences or predictions about a larger population based on sample data. It involves applying statistical techniques such as hypothesis testing, confidence intervals, and regression analysis. It helps generalize findings from a sample to a larger population.
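One common inferential technique, a confidence interval for a population mean, can be sketched as follows. This assumes the normal (z) approximation; for small samples like this hypothetical one, a t-based interval (for example from SciPy or R) is usually preferred. `mean_confidence_interval` is a hypothetical helper, not a library function.

```python
import math
import statistics
from statistics import NormalDist

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for a population mean.

    Assumption: uses the normal (z) approximation, which is most reasonable
    for larger samples; small samples usually call for a t-based interval.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return mean - z * standard_error, mean + z * standard_error

# Hypothetical sample of measurements (illustrative only)
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.0]
low, high = mean_confidence_interval(sample)
print(f"95% CI for the population mean: ({low:.2f}, {high:.2f})")
```

The interval generalizes from the sample to the population: a higher confidence level produces a wider interval.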

  • Exploratory Data Analysis (EDA)

EDA focuses on exploring and understanding the data without preconceived hypotheses. It involves visualizations, summary statistics, and data profiling techniques to uncover patterns, relationships, and interesting features. It helps generate hypotheses for further analysis.

  • Diagnostic Analysis

Diagnostic analysis aims to explain why particular outcomes or behaviors occurred by investigating the factors or variables that contribute to them. Techniques such as regression analysis, ANOVA (Analysis of Variance), or correlation analysis are commonly used in diagnostic analysis; note that a correlation on its own does not establish cause and effect.

  • Predictive Analysis

Predictive analysis involves using historical data to make predictions or forecasts about future outcomes. It utilizes statistical modeling techniques, machine learning algorithms, and time series analysis to identify patterns and build predictive models. It is often used for forecasting sales, predicting customer behavior, or estimating risk.

  • Prescriptive Analysis

Prescriptive analysis goes beyond predictive analysis by recommending actions or decisions based on the predictions. It combines historical data, optimization algorithms, and business rules to provide actionable insights and optimize outcomes. It helps in decision-making and resource allocation.
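To illustrate the optimization step, here is a toy resource-allocation sketch. It assumes the predicted returns already came from a predictive model; the options, costs, and returns are hypothetical, and `best_allocation` is a hypothetical helper. Exhaustive search is only practical for a handful of options; larger problems would use a dedicated optimization solver.

```python
from itertools import combinations

# Hypothetical options: (name, cost, predicted return); numbers are illustrative only
OPTIONS = [
    ("email campaign", 20, 35),
    ("social ads", 50, 80),
    ("trade show", 70, 95),
    ("loyalty program", 40, 60),
]

def best_allocation(options, budget):
    """Try every subset of options and keep the one with the highest total
    predicted return whose total cost fits within the budget."""
    best_subset, best_return = (), 0
    for k in range(1, len(options) + 1):
        for subset in combinations(options, k):
            cost = sum(option[1] for option in subset)
            total_return = sum(option[2] for option in subset)
            if cost <= budget and total_return > best_return:
                best_subset, best_return = subset, total_return
    return [option[0] for option in best_subset], best_return

print(best_allocation(OPTIONS, budget=100))
```

With a budget of 100, the recommended action is to fund "social ads" and "loyalty program" (total predicted return 140).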

Data Analysis Methods

  • Qualitative Data Analysis

The qualitative data analysis method works with non-numerical data such as words, symbols, pictures, and observations, and generally does not rely on statistics. The most common qualitative methods include:

  1. Content Analysis, for analyzing behavioral and verbal data.
  2. Narrative Analysis, for working with data culled from interviews, diaries, and surveys.
  3. Grounded Theory, for developing explanations (theories) of a given event inductively, by systematically studying one or more cases rather than starting from a preconceived hypothesis.

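In content analysis, researchers often assign codes to passages and then count how often each code occurs. The sketch below shows only that counting step, assuming a hypothetical keyword-based codebook; in practice, coding is usually done or checked by human coders, often in tools such as NVivo.

```python
import re
from collections import Counter

# Hypothetical codebook: each code is matched by a small set of keywords
CODEBOOK = {
    "cost": {"price", "expensive", "cost", "afford"},
    "access": {"library", "online", "access", "available"},
}

def code_frequencies(responses, codebook):
    """Count how many responses mention each code (at most once per response)."""
    counts = Counter()
    for response in responses:
        words = set(re.findall(r"[a-z]+", response.lower()))
        for code, keywords in codebook.items():
            if words & keywords:
                counts[code] += 1
    return counts

# Hypothetical interview excerpts (illustrative only)
responses = [
    "The textbook was too expensive to afford.",
    "I mostly use the library's online access.",
    "Online access is convenient but the price is high.",
]
print(code_frequencies(responses, CODEBOOK))
```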
  • Quantitative Data Analysis

Quantitative data analysis, also known as statistical data analysis, collects raw data and processes it into numerical data. Common quantitative analysis methods include:

  1. Hypothesis Testing, for assessing whether the evidence in a data set supports a given hypothesis or theory about a population or demographic group.
  2. Mean, or average, which summarizes a subject's overall trend by dividing the sum of a list of numbers by the number of items on the list.
  3. Sample Size Determination, for deciding how many observations to draw from a larger group of people. The sample is then analyzed, and its results are treated as representative of the entire group.
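As one example of sample size determination, the standard formula for estimating a population proportion is n = z² · p(1 − p) / E², where E is the desired margin of error. The sketch below assumes a large population and the normal approximation; `sample_size_for_proportion` is a hypothetical helper.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(margin_of_error, confidence=0.95, expected_p=0.5):
    """Minimum sample size to estimate a population proportion.

    Assumes a large population and the normal approximation.
    expected_p = 0.5 is the conservative choice because it maximizes p * (1 - p).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z ** 2 * expected_p * (1 - expected_p) / margin_of_error ** 2
    return math.ceil(n)  # round up, since a fractional respondent is not possible

print(sample_size_for_proportion(0.05))  # prints 385
```

A survey aiming for a ±5% margin of error at 95% confidence therefore needs about 385 respondents; tightening the margin increases the required sample quickly.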

Data Visualization

Data visualization is the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data. Data visualization is commonly used for idea generation and illustration, helping teams and individuals convey data more effectively to colleagues and decision-makers. Frequently used types of data visualizations are described below.

Bar Chart

A bar chart is one of the most commonly used ways to present quantitative data. It is simple to create and to understand, and it works best when comparing data across different categories. A bar chart usually has a few categories arranged along the x- or y-axis, with each category's value shown as a bar (horizontal) or a column (vertical). The length of each bar represents its value.
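A charting library such as matplotlib would normally draw this chart. To keep the sketch dependency-free, the hypothetical helper below prints a text bar chart instead, which makes the core rule visible: each bar's length is proportional to its value. The survey counts are made up for illustration.

```python
def text_bar_chart(data, width=40):
    """Return a horizontal bar chart as text: one row per category,
    with bar length proportional to the category's value."""
    largest = max(data.values())
    label_width = max(len(label) for label in data)
    rows = []
    for label, value in data.items():
        bar = "#" * round(value / largest * width)
        rows.append(f"{label:<{label_width}} | {bar} {value}")
    return "\n".join(rows)

# Hypothetical survey counts: preferred analysis tool (illustrative only)
responses = {"Excel": 42, "R": 18, "Python": 30, "SPSS": 12}
print(text_bar_chart(responses, width=20))
```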

Bar Chart of Race & Ethnicity in New York (2015)


Datawheel, CC0, via Wikimedia Commons

Pie Chart

A pie chart is used to display the proportions of a whole. These charts are useful for showing percentages. When making a pie chart, please note:

  • All portions should add up to a total of 100%.
  • The size of each portion should be proportional to its value.
  • Avoid using too many categories (slices).
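The guidelines above can be applied before plotting by converting raw counts into percentage shares. As one common approach (an assumption, not the only convention), the hypothetical helper below merges the smallest categories into an "Other" slice so the chart stays readable.

```python
def pie_shares(counts, max_slices=5):
    """Convert raw counts to percentage shares that sum to 100.

    Categories beyond the largest (max_slices - 1) are merged into "Other"
    so the chart does not have too many slices.
    """
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    if len(ordered) > max_slices:
        kept = ordered[: max_slices - 1]
        other = sum(value for _, value in ordered[max_slices - 1 :])
        ordered = kept + [("Other", other)]
    return {label: 100 * value / total for label, value in ordered}

# Hypothetical category counts (illustrative only)
shares = pie_shares({"A": 50, "B": 30, "C": 10, "D": 5, "E": 3, "F": 2}, max_slices=4)
print(shares)
```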

Line Chart

A line chart is a type of chart used to show information that changes over time. It is made of data points connected by straight line segments and has two axes: the horizontal x-axis, which usually represents time, and the vertical y-axis, which represents the measured value.

Scatter Plot

A scatter plot displays values for (typically) two variables in a data set, with each observation plotted as a single point.

Scatter plots help show whether there is a relationship between two variables. A trend line, when added, summarizes the overall direction of that relationship.

Scatter Plots | A Complete Guide to Scatter Plots

Yi, M. (2019). A complete guide to scatter plots. Retrieved February 25, 2021.

Box Plot

A box plot or boxplot is a method for graphically depicting groups of numerical data through their quartiles: the box spans the first to the third quartile, with a line marking the median. Box plots may also have lines extending from the boxes (whiskers) indicating variability outside the upper and lower quartiles.
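The numbers a box plot draws can be computed directly. This sketch assumes Tukey's convention (whiskers reach the most extreme points within 1.5 × IQR of the box) and the "inclusive" quartile method of `statistics.quantiles`; other tools may use different quartile conventions and produce slightly different values. `box_plot_summary` is a hypothetical helper and the data are made up.

```python
import statistics

def box_plot_summary(values):
    """Quartiles, whisker ends, and outliers as drawn in a Tukey-style box plot."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range: the height of the box
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }

# Hypothetical measurements with one unusually large value
print(box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30]))
```

Here the value 30 falls beyond the upper fence, so a box plot would show it as a separate point rather than extending the whisker to it.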

Michelson experiment (1881)

Data Analysis and Visualization Resources and Tools

The Research Data Services department assists with:

  • Finding open-access tools for analyzing and visualizing data
  • Identifying which types of visualization best fit your needs
  • Tips for creating your visualization
  • Guidance on interpreting data analysis results

Other Resources on Campus

 

What is R? What is RStudio?

R is more of a programming language than just a statistics program. It is “a language for data analysis and graphics.” You can use R to create, import, and scrape data from the web; clean and reshape it; visualize it; run statistical analysis and modeling operations on it; text and data mine it; and much more.

RStudio is a user interface for working with R. It is called an Integrated Development Environment (IDE): a piece of software that provides tools to make programming easier. RStudio acts as a sort of wrapper around the R language. 


Install R and RStudio

R and RStudio are two separate pieces of software:

  • R is a programming language that is especially powerful for data exploration, visualization, and statistical analysis
  • RStudio is an integrated development environment (IDE) that makes using R easier. In this course we use RStudio to interact with R.

Windows

  • Download R from the CRAN website.
  • Run the .exe file that was just downloaded.
  • Go to the RStudio download page.
  • Under Installers, select RStudio x.yy.zzz - Windows Vista/7/8/10 (where x, y, and z represent version numbers).
  • Double-click the file to install it.
  • Once it’s installed, open RStudio to make sure it works and you don’t get any error messages.

macOS

  • Download R from the CRAN website.
  • Select the .pkg file for the latest R version.
  • Double-click the downloaded file to install R.
  • It is also a good idea to install XQuartz (needed by some packages).
  • Go to the RStudio download page.
  • Under Installers, select RStudio x.yy.zzz - Mac OS X 10.6+ (64-bit) (where x, y, and z represent version numbers).
  • Double-click the file to install RStudio.
  • Once it’s installed, open RStudio to make sure it works and you don’t get any error messages.

Open Access e-book: Grolemund, G. (2014). Hands-on programming with R: Write your own functions and simulations. O'Reilly Media, Inc.

Python

Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation. Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly procedural), object-oriented and functional programming. It is often described as a "batteries included" language due to its comprehensive standard library.

- from Wikipedia


NVivo

NVivo is a software program used for qualitative and mixed-methods research. Specifically, it is used for the analysis of unstructured text, audio, video, and image data, including (but not limited to) interviews, focus groups, surveys, social media, and journal articles.

NVivo 14 - Lumivero

Tableau

With Tableau, users can upload data from spreadsheets, cloud-based data management software, and online databases and merge them to identify trends, filter databases, and forecast outcomes. Users can also drag and drop information to transform data and instantly create charts and visualizations. A free trial of Tableau is available.

Power BI

Microsoft’s Power BI software provides business intelligence and data analytics tools to clean and transform data, merge data from different sources, and perform grouping, clustering, and forecasting to find patterns in the data. Power BI can be started for free.

Google Charts

Google’s free Charts software creates customizable charts, maps, and diagrams from imported datasets.