Description | Possible Marks | Wtg (%) | Word Count | Due date
Assignment 2 Written Practical Report | 100 marks | 15% Weighting | 1500 | 27/08/18
Modules 1–5 are particularly relevant for this assignment. Assignment 2 relates to the course objectives 1, 2 and 4:
1. demonstrate applied knowledge of people, markets, finances, technology and management in a global context of business intelligence practice (data warehouse design, data mining process, data visualisation and performance management) and resulting organisational change and how these apply to implementation of business intelligence in organisation systems and business processes
2. identify and solve complex organisational problems creatively and practically through the use of business intelligence and critically reflect on how evidence based decision making and sustainable business performance management can effectively address real world problems
4. demonstrate the ability to communicate effectively in a clear and concise manner in written report style for senior management with correct and appropriate acknowledgment of main ideas presented and discussed.
Note: you must use RapidMiner Studio for Task 2 and Tableau Desktop for Task 3 in this Assignment 2. Failure to do so may result in Task 2 and/or Task 3 not being marked and zero marks being awarded.
Note carefully the University policy on Academic Misconduct such as plagiarism, collusion and cheating. Your Assignment 2 submission is automatically submitted to and checked in Turnitin for academic integrity when you submit it via the course study Assignment 2 submission link. Any instances of Academic Misconduct will be detected and dealt with under the USQ Academic Integrity Procedures. If proven, Academic Misconduct may result in failure of an individual assessment, failure of the entire course, or exclusion from a University program or programs.
Assignment 2 consists of three main tasks and a number of sub tasks.
Task 1 Organisational culture and data driven decision making (Worth 30 Marks)
Conduct desktop research by critically reviewing relevant literature on organisational culture and data driven decision making. Drawing on the relevant literature, write a short essay on the impact of organisational culture on data driven decision making that addresses two sub tasks:
Task 1.1) Provide a concise definition of organisational culture and of data driven decision making (about 250 words).
Task 1.2) Explain how organisational culture could affect the adoption and use of data driven decision making in organisations transitioning to a data driven decision making paradigm (about 500 words).
Task 2 Exploratory Data Analysis and Linear Regression Analysis (Worth 35 Marks)
Carefully study the Data Dictionary for California Housing Data Set (See Table 1) and accompanying description of each variable in the housing.csv data set. It is important you understand this data set as it is used for Task 2 and Task 3 in Assignment 2.
Table 1 Data Dictionary for California Housing Data Set
Variable | Description | Unit
longitude | longitude of location | numeric
latitude | latitude of location | numeric
housingMedianAge | median age of housing at location | yrs
totalRooms | number of rooms at location | integer
totalBedrooms | number of bedrooms at location | integer
population | number of individuals living at location | integer
households | number of independent households at location | integer
medianIncome | median income of households | 10K$
medianHouseValue | median value of housing at location | 1K$
oceanProximity | proximity to ocean (NEAR BAY, NEAR OCEAN, 1H OCEAN, INLAND, ISLAND) | string
Note: You should conduct some desktop research on the residential real estate market and the key drivers of house values so that you can fully understand and interpret the key findings of the exploratory data analysis (EDA) and the key results of the Linear Regression Models for the housing.csv data set in Task 2, and the visual presentation of this data in Task 3. The following resources are a good starting point:
https://planningtank.com/real-estate/factors-influencing-house-prices
http://www.homeguru.com.au/house-prices
Task 2.1) Conduct an exploratory data analysis (EDA) of the housing.csv data set using the RapidMiner Studio data mining tool.
Provide the following for Task 2.1:
(i) A screen capture of your final EDA process and a brief description of that process.
(ii) A summary of the key results of your exploratory data analysis in Table 2.1 Results of Exploratory Data Analysis for housing.csv.
(iii) A discussion of the key results presented in Table 2.1, with a rationale for selecting the top 5 variables for predicting house values and, in particular, their relationship with house values, drawing on the results of the EDA and relevant literature (about 250 words).
Table 2.1 should include the key characteristics of each variable in the housing.csv data set, such as minimum and maximum values, average, standard deviation, most frequent value (mode), missing values and invalid values.
Hint: The Statistics Tab and the Chart Tab in RapidMiner Studio provide a lot of descriptive statistical information and the ability to create useful charts such as bar charts and scatterplots for the EDA. You might also like to run some correlations and/or chi-square tests, as appropriate for the housing.csv data set, to determine which variables contribute most to predicting house values.
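The kinds of statistics Table 2.1 asks for, and the correlation check mentioned in the hint, can be illustrated with a minimal Python sketch. This is only an aid to understanding what the numbers mean; the assignment itself must be completed in RapidMiner Studio. The sample rows and the helper names (load_rows, numeric_summary, pearson) are assumptions made up for this illustration and are not taken from housing.csv.

```python
import csv
import io
import math
import statistics

# Toy rows for illustration only; these values are NOT taken from housing.csv.
SAMPLE_CSV = """medianIncome,totalBedrooms,medianHouseValue,oceanProximity
8.3,129,452.6,NEAR BAY
7.2,,358.5,NEAR BAY
5.6,190,341.3,NEAR BAY
3.8,280,226.7,INLAND
2.1,213,140.0,INLAND
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def numeric_summary(rows, column):
    """Return min, max, mean, stdev, mode and missing count for one column."""
    raw = [row[column] for row in rows]
    values = [float(v) for v in raw if v.strip() != ""]
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),   # sample standard deviation
        "mode": statistics.mode(values),     # first value if all are distinct (Python 3.8+)
        "missing": len(raw) - len(values),   # empty cells count as missing
    }


def pearson(rows, x_col, y_col):
    """Pearson correlation between two numeric columns, skipping missing pairs."""
    pairs = [
        (float(r[x_col]), float(r[y_col]))
        for r in rows
        if r[x_col].strip() and r[y_col].strip()
    ]
    mean_x = statistics.mean(x for x, _ in pairs)
    mean_y = statistics.mean(y for _, y in pairs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x, _ in pairs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for _, y in pairs))
    return cov / (spread_x * spread_y)


if __name__ == "__main__":
    rows = load_rows(SAMPLE_CSV)
    print(numeric_summary(rows, "totalBedrooms"))
    print(pearson(rows, "medianIncome", "medianHouseValue"))
```

A correlation close to +1 or -1 between a variable and medianHouseValue suggests a strong linear relationship, which is one possible basis for shortlisting predictors.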
Task 2.2) Build a Linear Regression model for predicting house value using a RapidMiner data mining process, an appropriate set of data mining operators and a reduced set of variables from the housing.csv data set as determined by your exploratory data analysis in Task 2.1.
Provide the following for Task 2.2:
(i) A screen capture of your Final Linear Regression Model process and a brief description of that process.
(ii) A table named Table 2.2 Results of Final Linear Regression Model for housing.csv.
(iii) A discussion of the results of the Final Linear Regression Model for the housing.csv data set, drawing on the key outputs (coefficients, standardised coefficients, t-statistic values, p-values, significance levels etc.) for predicting house values and on relevant supporting literature on the interpretation of a Linear Regression Model (about 250 words).
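To make the regression outputs listed above more concrete, the following is a minimal sketch of ordinary least squares with a single predictor, written in plain Python. It is an illustration under stated assumptions, not a substitute for the RapidMiner process the task requires: the function name simple_ols and the toy numbers are hypothetical, the model in Task 2.2 would use several predictors, and the p-value is not computed here because it requires the t distribution with n - 2 degrees of freedom.

```python
import math
import statistics


def simple_ols(xs, ys):
    """Fit y = intercept + slope * x by least squares and report key outputs."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three paired observations")

    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx                      # unstandardised coefficient
    intercept = mean_y - slope * mean_x

    # Sum of squared residuals and the standard error of the slope.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    se_slope = math.sqrt(sse / (n - 2) / sxx)

    return {
        "intercept": intercept,
        "slope": slope,
        # Standardised coefficient: slope rescaled by the spread of x and y.
        "std_coefficient": slope * math.sqrt(sxx / syy),
        "se_slope": se_slope,
        "t_statistic": slope / se_slope,   # compare against t with n - 2 df
        "r_squared": 1 - sse / syy,
    }


if __name__ == "__main__":
    # Toy numbers for illustration only; NOT taken from housing.csv.
    result = simple_ols([1, 2, 3, 4], [2, 4, 5, 8])
    for name, value in result.items():
        print(f"{name}: {value:.4f}")
```

The unstandardised coefficient is expressed in the units of the variables, whereas the standardised coefficient allows predictors measured on different scales to be compared; a large absolute t-statistic indicates that the coefficient is unlikely to be zero.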
Include all appropriate RapidMiner outputs such as RapidMiner Processes, Graphs and Tables that support the key aspects of exploratory data analysis and linear regression model analysis of the housing.csv data set in your Assignment 2 report.
Note that you need to export the RapidMiner Processes and Graphs from RapidMiner using the File/Print/Export Image option and include them in the Task 2 section where relevant or in Appendix 2 of the Assignment 2 report.
Task 3 Tableau Desktop View of California House Prices data set (Worth 20 marks)
Task 3.1) Create a Tableau Text Table or Graph view that displays house values, proximity to ocean and other relevant data using the data set housing.csv. Comment on (1) the process of preparing a Text Table or Graph view using Tableau Desktop and (2) the key trends and patterns that are apparent in the Text Table/Graph you created for the presentation of the housing.csv data set (about 100 words).
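As a rough illustration of what a text table of house values by ocean proximity summarises, the sketch below averages a value per category in plain Python. It is only a way to sanity-check the idea behind the view; the view itself must be built in Tableau Desktop. The sample pairs and the helper names (mean_value_by_category, format_text_table) are assumptions invented for this example and are not drawn from housing.csv.

```python
import statistics
from collections import defaultdict

# Toy (oceanProximity, medianHouseValue) pairs; illustrative, NOT from housing.csv.
SAMPLE = [
    ("NEAR BAY", 452.6),
    ("NEAR BAY", 358.5),
    ("INLAND", 226.7),
    ("INLAND", 140.0),
    ("NEAR OCEAN", 300.0),
]


def mean_value_by_category(pairs):
    """Group values by category and return the mean of each group."""
    groups = defaultdict(list)
    for category, value in pairs:
        groups[category].append(value)
    return {category: statistics.mean(values) for category, values in groups.items()}


def format_text_table(summary):
    """Render the summary as plain-text rows, highest mean first."""
    lines = ["oceanProximity | mean medianHouseValue"]
    for category in sorted(summary, key=summary.get, reverse=True):
        lines.append(f"{category} | {summary[category]:.2f}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_text_table(mean_value_by_category(SAMPLE)))
```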
Task 3.2) Create a Geomap Graph view that displays house values and other relevant data using housing.csv. Comment on (1) the process of preparing a Geomap Graph view using Tableau Desktop and (2) the key trends and patterns that are apparent in the Geomap Graph created for visual presentation of the housing.csv data set. Note that this house values data set is drawn from the State of California, USA, so you need to modify the Map Menu Edit Locations options in Tableau so that you can plot the latitude and longitude coordinates of house locations on a Tableau Geomap Graph (about 150 words).
Note: you need to copy the two Text Table / Graph views you have created in Tableau using the Worksheet Menu Copy or Export Image option and include them in the Task 3 section where relevant or in Appendix 3 of the Assignment 2 report.
Report presentation, writing style and referencing (Worth 15 marks)
Your Assignment 2 must be presented in report format, written in an appropriate style and supported where required with appropriate in-text references using Harvard Referencing Style.
Your assignment 2 report must be structured in report format as follows:
Cover/Title Page for Assignment 2
Table of Contents
Body of report – main sections and subsections for Assignment 2 tasks and sub tasks, so Task 1 will be a main heading with appropriate sub headings for each sub task such as Task 1.1, Task 1.2 etc.
Task 2 …
Task 3….
List of References
List of Appendices
You must submit two files for Assignment 2:
1. Assignment 2 Report for Tasks 1, 2 and 3 in Word document format with extension .docx
2. Tableau packaged workbook with the extension .twbx containing the two required Text Table / Graph views for Task 3
You must use the following file naming convention:
1. Student_no_Student_name_CIS8008_Ass2.docx
2. Student_no_Student_name_CIS8008_Ass2.twbx
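The naming convention above can be sketched as a small helper that builds both file names. This is a hedged illustration only: the function name submission_filename is hypothetical, the student number and name in the usage example are placeholders, and joining the parts of a name with underscores is an assumption, since the brief does not say how spaces in names should be handled.

```python
import re

COURSE_CODE = "CIS8008"
ASSIGNMENT_TAG = "Ass2"
ALLOWED_EXTENSIONS = ("docx", "twbx")


def submission_filename(student_no, student_name, extension):
    """Build a file name of the form Student_no_Student_name_CIS8008_Ass2.<ext>."""
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"extension must be one of {ALLOWED_EXTENSIONS}")
    # Assumption: whitespace inside the name is replaced with underscores.
    name_part = re.sub(r"\s+", "_", student_name.strip())
    return f"{student_no}_{name_part}_{COURSE_CODE}_{ASSIGNMENT_TAG}.{extension}"


if __name__ == "__main__":
    # Placeholder student details, for illustration only.
    for ext in ALLOWED_EXTENSIONS:
        print(submission_filename("0000000000", "Jane Doe", ext))
```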
You must use Harvard referencing style. Harvard referencing resources:
Install a bibliographic referencing tool such as EndNote, which integrates with your word processor (http://www.usq.edu.au/library/referencing/endnote-bibliographic-software), or alternatively use an online citation tool such as Zotero or Cite This For Me.
USQ Library guide on how to reference correctly using the Harvard referencing system: https://www.usq.edu.au/library/referencing/harvard-agps-referencing-guide