What is Data Analysis?
Data analysis is the process of systematically applying statistical and/or logical techniques to describe and illustrate, condense and recap, and evaluate data. The procedure helps reduce the risks inherent in decision-making by providing useful insights and statistics, often presented in charts, images, tables, and graphs.
Descriptive analysis involves summarizing and describing the main features of a dataset. It focuses on organizing and presenting the data in a meaningful way, often using measures such as mean, median, mode, and standard deviation. It provides an overview of the data and helps identify patterns or trends.
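As a minimal sketch of the measures named above, the following Python snippet computes them with the standard library's statistics module. The exam scores are hypothetical values chosen only for illustration.

```python
import statistics

# Hypothetical sample: exam scores for ten students (illustrative values only).
scores = [72, 85, 90, 85, 64, 78, 85, 93, 70, 78]


def describe(values):
    """Return common descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


summary = describe(scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Note that statistics.stdev returns the sample standard deviation; statistics.pstdev would be the choice if the data covered the whole population.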
Inferential analysis aims to make inferences or predictions about a larger population based on sample data. It involves applying statistical techniques such as hypothesis testing, confidence intervals, and regression analysis. It helps generalize findings from a sample to a larger population.
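To illustrate one of these techniques, here is a rough sketch of a confidence interval for a population mean. It assumes the normal (z) approximation, which is a simplification: for small samples a t-distribution is usually preferred. The sample values and the function name mean_confidence_interval are hypothetical.

```python
import math
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the population mean.

    Uses the normal (z) approximation; for small samples a t-distribution
    would give a somewhat wider interval.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return mean - margin, mean + margin


# Hypothetical sample of 12 measurements drawn from a larger population.
sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0, 5.1, 4.9, 5.2, 4.8]
low, high = mean_confidence_interval(sample)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

Raising the confidence level (for example to 0.99) widens the interval, which reflects the trade-off between certainty and precision when generalizing from a sample.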
Exploratory data analysis (EDA) focuses on exploring and understanding the data without preconceived hypotheses. It involves visualizations, summary statistics, and data profiling techniques to uncover patterns, relationships, and interesting features. It helps generate hypotheses for further analysis.
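A simple form of data profiling can be sketched in a few lines of Python. The records, column names, and the profile function below are hypothetical; the idea is only to show the kind of per-column overview (missing values, distinct values, range) that often starts an exploration.

```python
# Hypothetical survey records; None marks a missing answer.
records = [
    {"age": 34, "city": "Leeds", "income": 42000},
    {"age": 29, "city": "York", "income": None},
    {"age": None, "city": "Leeds", "income": 38500},
    {"age": 41, "city": "Hull", "income": 51000},
]


def profile(rows):
    """Summarize each column: missing count, distinct count, and range."""
    report = {}
    for column in rows[0].keys():
        present = [row[column] for row in rows if row[column] is not None]
        report[column] = {
            "missing": len(rows) - len(present),
            "distinct": len(set(present)),
            "min": min(present, default=None),
            "max": max(present, default=None),
        }
    return report


for column, stats in profile(records).items():
    print(column, stats)
```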
Diagnostic analysis aims to understand the cause-and-effect relationships within the data. It investigates the factors or variables that contribute to specific outcomes or behaviors. Techniques such as regression analysis, ANOVA (Analysis of Variance), or correlation analysis are commonly used in diagnostic analysis.
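As an example of correlation analysis, the sketch below computes the Pearson correlation coefficient by hand. The advertising and sales figures are hypothetical, and pearson_r is an illustrative helper rather than a library function.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical data: weekly advertising spend and units sold.
ad_spend = [10, 20, 30, 40, 50]
units_sold = [12, 25, 31, 46, 52]

r = pearson_r(ad_spend, units_sold)
print(f"r = {r:.3f}")
```

A coefficient near 1 or -1 indicates a strong linear association, but a strong correlation on its own does not establish that one variable causes the other.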
Predictive analysis involves using historical data to make predictions or forecasts about future outcomes. It utilizes statistical modeling techniques, machine learning algorithms, and time series analysis to identify patterns and build predictive models. It is often used for forecasting sales, predicting customer behavior, or estimating risk.
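The idea of forecasting from historical data can be sketched with the simplest possible model, a straight line fitted by ordinary least squares. The monthly sales figures and the fit_line helper are hypothetical, and real forecasting would normally involve validation and more careful modeling.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical historical data: sales for months 1 through 6.
months = [1, 2, 3, 4, 5, 6]
sales = [100, 108, 121, 129, 142, 150]

slope, intercept = fit_line(months, sales)
forecast = slope * 7 + intercept  # predicted sales for month 7
print(f"Forecast for month 7: {forecast:.1f}")
```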
Prescriptive analysis goes beyond predictive analysis by recommending actions or decisions based on the predictions. It combines historical data, optimization algorithms, and business rules to provide actionable insights and optimize outcomes. It helps in decision-making and resource allocation.
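A toy sketch of this idea: given predicted profits (which in practice would come from a predictive model), search for the set of actions that maximizes predicted profit while respecting a business rule. The campaigns, costs, profits, and budget below are all hypothetical, and exhaustive search is used only because the example is tiny.

```python
from itertools import combinations

# Hypothetical campaigns: (name, cost, predicted_profit).
campaigns = [
    ("email", 2000, 3500),
    ("search_ads", 5000, 9000),
    ("billboard", 7000, 9500),
    ("social", 3000, 5200),
]
BUDGET = 10000  # business rule: total spend may not exceed this amount


def best_plan(options, budget):
    """Exhaustively search for the affordable subset with the highest profit."""
    best, best_profit = (), 0
    for size in range(1, len(options) + 1):
        for subset in combinations(options, size):
            cost = sum(c for _, c, _ in subset)
            profit = sum(p for _, _, p in subset)
            if cost <= budget and profit > best_profit:
                best, best_profit = subset, profit
    return [name for name, _, _ in best], best_profit


plan, predicted_profit = best_plan(campaigns, BUDGET)
print("Recommended campaigns:", plan)
print("Predicted profit:", predicted_profit)
```

For larger problems, dedicated optimization methods would replace the brute-force search, but the structure (predictions in, constrained recommendation out) stays the same.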
The qualitative data analysis method derives data via words, symbols, pictures, and observations. This method doesn’t use statistics. The most common qualitative methods include:
Quantitative data analysis methods, also known as statistical data analysis methods, collect raw data and process it into numerical data. Quantitative analysis methods include:
Data visualization is the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data. Data visualization is commonly used for idea generation and illustration, helping teams and individuals convey data more effectively to colleagues and decision-makers. Frequently used types of data visualizations are described below.
Bar Chart
A bar chart is one of the most commonly used forms to present quantitative data. It is simple to create and to understand, and it is best used when comparing data from different categories. The categories are arranged along the x or y axis, and each value is drawn as a bar (horizontal) or a column (vertical). The length of each bar represents its value.
Bar Chart of Race & Ethnicity in New York (2015)
Datawheel, CC0, via Wikimedia Commons
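To make the idea of bar length encoding a value concrete, here is a small plain-text sketch in Python. It deliberately avoids any plotting library; the text_bar_chart function and the category counts are hypothetical.

```python
def text_bar_chart(data, width=40):
    """Render {category: value} as horizontal bars scaled to `width` characters."""
    largest = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        # Bar length is proportional to the value; the largest value fills `width`.
        bar = "#" * round(value / largest * width)
        lines.append(f"{label.ljust(label_width)} | {bar} {value}")
    return "\n".join(lines)


# Hypothetical counts per category (illustrative values only).
fruit_sales = {"Apples": 50, "Bananas": 30, "Cherries": 10}
print(text_bar_chart(fruit_sales, width=20))
```

A charting tool does the same scaling, only with drawn rectangles and labeled axes instead of characters.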
Pie Chart
A pie chart is used to display the proportions of a whole. These charts are useful for percentages. When making a pie chart, please note:
Line Chart
A line chart is a type of chart used to show information that changes over time. It is plotted as a series of points connected by straight lines. A line chart has two axes: the horizontal axis is the x-axis, and the vertical axis is the y-axis.
Scatter Plot
A scatter plot is a type of plot or diagram that displays values for (typically) two variables in a set of data.
Scatter plots show whether there is a relationship between two variables. A trend line, when added, summarizes the overall direction of that relationship.
Yi, M. (2019). A complete guide to scatter plots. Retrieved February 25, 2021.
Box Plot
A box plot (or boxplot) is a method for graphically depicting groups of numerical data through their quartiles. Box plots may also have lines extending from the boxes (whiskers) indicating variability outside the upper and lower quartiles.
Michelson experiment (1881)
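The numbers behind a box plot can be computed directly. The sketch below assumes the common convention that whiskers extend to the most extreme values within 1.5 times the interquartile range of the box (other conventions exist), and it uses hypothetical data, not the Michelson measurements.

```python
import statistics


def box_plot_stats(values):
    """Quartiles, whisker ends, and outliers using the 1.5 * IQR convention."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Hypothetical measurements with one unusually large value.
measurements = [2, 4, 4, 5, 6, 7, 8, 9, 30]
stats = box_plot_stats(measurements)
for name, value in stats.items():
    print(f"{name}: {value}")
```

Different software may compute quartiles slightly differently, so the box edges can vary a little between tools for the same data.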
The Research Data Services department assists with:
Other Resources on Campus
R is more of a programming language than just a statistics program. It is “a language for data analysis and graphics.” You can use R to create, import, and scrape data from the web; clean and reshape it; visualize it; run statistical analysis and modeling operations on it; text and data mine it; and much more.
RStudio is a user interface for working with R. It is called an Integrated Development Environment (IDE): a piece of software that provides tools to make programming easier. RStudio acts as a sort of wrapper around the R language.
Install R and RStudio
R and RStudio are two separate pieces of software:
.exe file that was just downloaded
.pkg file for the latest R version
Open Access e-book: Grolemund, G. (2014). Hands-on programming with R: Write your own functions and simulations. O'Reilly Media, Inc.
Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation. Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly procedural), object-oriented and functional programming. It is often described as a "batteries included" language due to its comprehensive standard library.
- from Wikipedia
NVivo is a software program used for qualitative and mixed-methods research. Specifically, it is used for the analysis of unstructured text, audio, video, and image data, including (but not limited to) interviews, focus groups, surveys, social media, and journal articles.
With Tableau, users can upload data from spreadsheets, cloud-based data management software, and online databases and merge them to identify trends, filter databases, and forecast outcomes. Users can also drag and drop information to transform data and instantly create charts and visualizations.
Google’s free Charts software creates customizable charts, maps, and diagrams from imported datasets.