
Research Data Management

Need help?

Our team is here to support you.

Book a Consultation

Location: ALK 452

We assist with:

  • Using open-access tools for data analysis and visualization
  • Offering guidance on interpreting data analysis results

Data Analysis

Data Analysis is the process of systematically applying statistical and/or logical techniques to describe and illustrate, condense and recap, and evaluate data. The procedure helps reduce the risks inherent in decision-making by providing useful insights and statistics, often presented in charts, images, tables, and graphs.

Different Types of Data Analysis

1. Descriptive Analysis

Descriptive analysis involves summarizing and describing the main features of a dataset. It focuses on organizing and presenting the data in a meaningful way, often using measures such as mean, median, mode, and standard deviation. It provides an overview of the data and helps identify patterns or trends.
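
As a minimal sketch of the measures named above, the following uses Python's standard `statistics` module. The `scores` list is made-up data for illustration only.

```python
import statistics

# Hypothetical exam scores, invented for illustration only.
scores = [72, 85, 85, 90, 68, 77, 85, 93]

mean_score = statistics.mean(scores)      # arithmetic average
median_score = statistics.median(scores)  # middle value of the sorted list
mode_score = statistics.mode(scores)      # most frequent value
std_dev = statistics.stdev(scores)        # sample standard deviation

print(f"mean={mean_score}, median={median_score}, mode={mode_score}, sd={std_dev:.2f}")
```

Comparing the mean with the median is a quick way to notice skew: when a few extreme values pull the mean away from the median, the median is often the better summary.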

2. Inferential Analysis

Inferential analysis aims to make inferences or predictions about a larger population based on sample data. It involves applying statistical techniques such as hypothesis testing, confidence intervals, and regression analysis. It helps generalize findings from a sample to a larger population.
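
One way to picture a confidence interval is the sketch below, which estimates a 95% interval for a population mean from a sample. It assumes a normal approximation (a z-value of about 1.96); for a sample this small, a t-distribution would give a somewhat wider interval. The `sample` values are invented for illustration.

```python
import math
import statistics

# Hypothetical sample of measurements, invented for illustration only.
sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0, 5.1, 4.9]

n = len(sample)
sample_mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / math.sqrt(n)

# z-value for a two-sided 95% interval (about 1.96).
# Assumption: normal approximation rather than a t-distribution.
z = statistics.NormalDist().inv_cdf(0.975)

lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error
print(f"95% CI for the mean: ({lower:.3f}, {upper:.3f})")
```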

3. Exploratory Data Analysis (EDA)

EDA focuses on exploring and understanding the data without preconceived hypotheses. It involves visualizations, summary statistics, and data profiling techniques to uncover patterns, relationships, and interesting features. It helps generate hypotheses for further analysis.

4. Diagnostic Analysis

Diagnostic analysis aims to understand the cause-and-effect relationships within the data. It investigates the factors or variables that contribute to specific outcomes or behaviors. Techniques such as regression analysis, ANOVA (Analysis of Variance), or correlation analysis are commonly used in diagnostic analysis.
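
As an illustrative sketch of correlation analysis, the function below computes the Pearson correlation coefficient by hand. The function name `pearson_r` and the data are hypothetical. Note that a strong correlation shows that two variables move together; on its own it does not establish that one causes the other.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: hours studied and exam score for six students.
hours = [1, 2, 3, 4, 5, 6]
score = [52, 58, 61, 70, 74, 79]

r = pearson_r(hours, score)
print(f"r = {r:.3f}")
```

A value of r near 1 or -1 indicates a strong linear relationship, and a value near 0 indicates little or no linear relationship.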

5. Predictive Analysis

Predictive analysis involves using historical data to make predictions or forecasts about future outcomes. It utilizes statistical modeling techniques, machine learning algorithms, and time series analysis to identify patterns and build predictive models. It is often used for forecasting sales, predicting customer behavior, or estimating risk.
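
A very simple predictive model is a straight line fitted to past values and extended one step forward. The sketch below fits that line by ordinary least squares; the helper `fit_line` and the sales figures are hypothetical, and real forecasting usually needs more data and a check of how well the model fits.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical monthly sales figures, invented for illustration only.
months = [1, 2, 3, 4, 5, 6]
sales = [100, 110, 120, 130, 140, 150]

slope, intercept = fit_line(months, sales)
forecast_month_7 = slope * 7 + intercept
print(f"forecast for month 7: {forecast_month_7:.1f}")
```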

6. Prescriptive Analysis

Prescriptive analysis goes beyond predictive analysis by recommending actions or decisions based on the predictions. It combines historical data, optimization algorithms, and business rules to provide actionable insights and optimize outcomes. It helps in decision-making and resource allocation.

Data Analysis Methods

1. Qualitative Data Analysis

The qualitative data analysis method derives data via words, symbols, pictures, and observations. This method doesn’t use statistics. The most common qualitative methods include:

  • Content Analysis, for analyzing behavioral and verbal data.
  • Narrative Analysis, for working with data culled from interviews, diaries, and surveys.
  • Grounded Theory, for developing causal explanations of a given event by studying and extrapolating from one or more past cases.

2. Quantitative Data Analysis

Also known as statistical data analysis methods, quantitative approaches collect raw data and process it into numerical data. Quantitative analysis methods include:

  • Hypothesis Testing, for assessing the truth of a given hypothesis or theory for a data set or demographic.
  • Mean (average), which determines a subject’s overall trend by dividing the sum of a list of numbers by the number of items on the list.
  • Sample Size Determination, which establishes how large a sample must be taken from a larger group so that the results of analyzing it can be considered representative of the entire body.
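
To make sample size determination concrete, the sketch below uses a common textbook formula for estimating a mean: n = (z × σ / E)², rounded up, where σ is an assumed population standard deviation and E is the desired margin of error. The function name `required_sample_size` and the input values are hypothetical.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, margin_of_error, confidence=0.95):
    """Sample size needed to estimate a mean within +/- margin_of_error."""
    # Two-sided z-value for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin_of_error) ** 2)

# Hypothetical inputs: assumed standard deviation of 15,
# desired margin of error of 3, at 95% confidence.
n = required_sample_size(sigma=15, margin_of_error=3)
print(n)
```

Halving the margin of error roughly quadruples the required sample, which is why very precise estimates are expensive to obtain.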

Data Visualization and Its Types

Data visualization is the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data. Data visualization is commonly used for idea generation and illustration, so as to help teams and individuals convey data more effectively to colleagues and decision-makers. Frequently used types of data visualizations are listed in the following tabs.

Data Visualization Example

A bar chart is one of the most commonly used forms to present quantitative data. It is simple to create and to understand. It is best used when comparing data from different categories. A bar chart usually has a few categories ordered along the x or y axis, with each value expressed as a bar (horizontal) or a column (vertical). The length of each bar represents its value.

Bar Chart of Race & Ethnicity in New York (2015)

Datawheel, CC0, via Wikimedia Commons
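
The idea that the length of a bar encodes its value can be sketched without any plotting library. The example below draws a text-only bar chart; the helper `text_bar_chart` and the category counts are hypothetical.

```python
# Hypothetical category counts, invented for illustration only.
counts = {"Category A": 12, "Category B": 7, "Category C": 3}

def text_bar_chart(data):
    """Return one line per category; the number of '#' marks equals the value."""
    width = max(len(label) for label in data)
    return [f"{label.ljust(width)} | {'#' * value} {value}" for label, value in data.items()]

chart_lines = text_bar_chart(counts)
print("\n".join(chart_lines))
```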

A pie chart is used to display the proportions of a whole. These charts are useful for percentages. When making a pie chart, please note:

  • All portions should add up to a total of 100%.
  • Sizes of the portions should represent their value.
  • Avoid using too many variables.

Pie Chart Example
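
Before drawing a pie chart, it can help to convert raw counts into percentages and confirm that they total 100%. The sketch below does that with made-up survey responses.

```python
# Hypothetical survey responses, invented for illustration only.
responses = {"Yes": 45, "No": 30, "Undecided": 25}

total = sum(responses.values())
percentages = {label: 100 * count / total for label, count in responses.items()}

for label, pct in percentages.items():
    print(f"{label}: {pct:.1f}%")

# The slices of a pie chart must account for the whole.
assert abs(sum(percentages.values()) - 100) < 1e-9
```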

A line chart is a type of chart used to show information that changes over time. We plot line charts using several points connected by straight lines. The line chart comprises two axes: the horizontal axis is known as the x-axis and the vertical axis as the y-axis.

Line Chart Example

A scatter plot is a type of plot or diagram that displays the values of, typically, two variables for a set of data.

Scatter plots show whether there is a relationship between two variables. The trend line shows the central tendency of the data.

Scatter Plots | A Complete Guide to Scatter Plots

Yi, M. (2019). A complete guide to scatter plots. Retrieved February 25, 2021.

A box plot or boxplot is a method for graphically depicting groups of numerical data through their quartiles. Box plots may also have lines extending from the boxes indicating variability outside the upper and lower quartiles.

Michelson experiment (1881)
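
The numbers behind a box plot can be computed directly. The sketch below finds the quartiles with `statistics.quantiles` and flags points beyond 1.5 times the interquartile range, which is one common convention for where the whiskers end (an assumption here, since the text above does not specify one). The `values` list is invented for illustration.

```python
import statistics

# Hypothetical measurements, invented for illustration only.
values = [2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 30]

# Quartiles: the box spans Q1 to Q3, with a line at the median.
q1, median, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Assumed convention: points beyond 1.5 * IQR from the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print(f"Q1={q1}, median={median}, Q3={q3}, outliers={outliers}")
```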

Data Analysis and Visualization Resources and Tools

Research Data Services Department

Need help with data analysis or visualization? Our team is here to support you.

Schedule an appointment

Location: ALK 452

We assist with:

  • Using open-access tools for data analysis and visualization
  • Identifying which types of visualization best fit your needs
  • Providing tips on creating your visualization
  • Offering guidance on interpreting data analysis results

Other Resources on Campus

Open Access E-books

 

What is R? What is RStudio?

R is more of a programming language than just a statistics program. It is “a language for data analysis and graphics.” You can use R to create, import, and scrape data from the web; clean and reshape it; visualize it; run statistical analysis and modeling operations on it; text and data mine it; and much more.

RStudio is a user interface for working with R. It is called an Integrated Development Environment (IDE): a piece of software that provides tools to make programming easier. RStudio acts as a sort of wrapper around the R language.


Install R and RStudio

R and RStudio are two separate pieces of software:

  • R is a programming language that is especially powerful for data exploration, visualization, and statistical analysis.
  • RStudio is an integrated development environment (IDE) that makes using R easier. In this guide we use RStudio to interact with R.

Windows

  • Download R from the CRAN website.
  • Run the .exe file that was just downloaded
  • Go to the RStudio download page
  • Under Installers select RStudio x.yy.zzz - Windows Vista/7/8/10 (where x, y, and z represent version numbers)
  • Double click the file to install it
  • Once it’s installed, open RStudio to make sure it works and you don’t get any error messages.

macOS
  • Download R from the CRAN website.
  • Select the .pkg file for the latest R version
  • Double click on the downloaded file to install R
  • It is also a good idea to install XQuartz (needed by some packages)
  • Go to the RStudio download page
  • Under Installers select RStudio x.yy.zzz - Mac OS X 10.6+ (64-bit) (where x, y, and z represent version numbers)
  • Double click the file to install RStudio
  • Once it’s installed, open RStudio to make sure it works and you don’t get any error messages.

Open Access e-book: Grolemund, G. (2014). Hands-on programming with R: Write your own functions and simulations. O'Reilly Media, Inc.

What is Python?

Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation. Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly procedural), object-oriented, and functional programming. It is often described as a “batteries included” language due to its comprehensive standard library.

- from Wikipedia


Installation Guide

Mac (macOS)

  • Go to the Python Downloads page for macOS.
  • Download the latest macOS installer (universal2 or Intel/Apple Silicon as appropriate).
  • Double-click the downloaded .pkg file and follow the on-screen instructions.
  • After installation, open Terminal and type python3 --version to confirm installation.
  • Optional: Install a package manager like Homebrew and run brew install python for easier updates.

Windows

  • Go to the Python Downloads page for Windows.
  • Click the latest Windows installer (64-bit) and run the downloaded .exe file.
  • Important: Check the box that says "Add Python to PATH" before clicking Install Now.
  • After installation, open Command Prompt and type python --version to confirm installation.
  • You can also manage Python versions using Miniconda or pyenv-win.
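
On either platform, a short script is another way to confirm which interpreter is running once installation is complete. This is a minimal sketch; saving it as a file such as `check_version.py` (a hypothetical name) and running it with `python3` on macOS or `python` on Windows should print the installed version.

```python
import sys

# Report the version of the interpreter that is running this script.
version = sys.version_info
print(f"Python {version.major}.{version.minor}.{version.micro}")

# The commands in this guide assume Python 3.
is_python3 = version.major >= 3
```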

Open Access e-book: Introduction to Python Programming - OpenStax

NVivo

NVivo 14 - Lumivero

NVivo is a software program used for qualitative and mixed-methods research. Specifically, it is used for the analysis of unstructured text, audio, video, and image data, including (but not limited to) interviews, focus groups, surveys, social media, and journal articles.


Installation Guide

Mac (macOS)

  • Go to the NVivo official website.
  • Click Free Trial or Buy Now to access the download options (you may need to create a Lumivero account).
  • Choose the macOS installer and download the .dmg file.
  • Double-click the downloaded file and drag the NVivo icon into your Applications folder.
  • Open NVivo from Applications, sign in with your Lumivero account, and activate your license or trial key.

Windows

  • Go to the NVivo official website.
  • Select Free Trial or Buy Now to access the download options (login or create a Lumivero account if required).
  • Download the Windows installer (.exe file) and run it.
  • Follow the on-screen instructions to complete installation.
  • After installation, launch NVivo, sign in with your Lumivero account, and activate using your license or trial key.

Tableau

With Tableau, users can upload data from spreadsheets, cloud-based data management software, and online databases and merge them to identify trends, filter databases, and forecast outcomes. Users also can drag and drop information to transform data and instantly create charts and visualizations. Start your free trial of Tableau here

Power BI

Microsoft’s Power BI software provides business intelligence and data analytics tools to clean and transform data, merge data from different sources, and perform grouping, clustering, and forecasting to find patterns in the data. Start Power BI for free here

Google Charts

Google’s free Charts software creates customizable charts, maps, and diagrams from imported datasets.

AntConc

AntConc is a freeware, multiplatform tool for carrying out corpus linguistics research, introducing corpus methods, and doing data-driven language learning. It runs on any computer running Microsoft Windows (built on Win 10), macOS (built on macOS Catalina), and Linux (built on Linux Mint). It is developed in Python and Qt, using PyInstaller to generate executables for the different operating systems. It uses SQLite as the underlying database.