Description: Assignment 2 Written and Practical Report
Marks out of: 100
Weighting: 30%
Word count: 3000
Due date: 24/04/15
The key frameworks and concepts covered in modules 1–5 are particularly relevant for this assignment. Assignment 2 relates to the specific course learning objectives 1, 2 and 4 and associated MBA program learning goals and skills: Global Content, Problem solving, Critical thinking, and Written Communication at level 3:
1. demonstrate applied knowledge of people, markets, finances, technology and management in a global context of business intelligence practice (data warehouse design, data mining process, data visualisation and performance management) and resulting organisational change and how these apply to implementation of business intelligence in organisation systems and business processes
2. identify and solve complex organisational problems creatively and practically through the use of business intelligence and critically reflect on how evidence based decision making and sustainable business performance management can effectively address real world problems
4. demonstrate the ability to communicate effectively in a clear and concise manner in written report style for senior management with correct and appropriate acknowledgment of main ideas presented and discussed.
Assignment 2 consists of three main tasks and a number of sub tasks.
Task 1 (Worth 40 marks) consists of the following sub tasks
The sinking of the Titanic is a famous event. You may find it useful to research the facts surrounding the sinking to inform your understanding of the problem and your interpretation of your data analysis of the factors determining the survival of passengers. Use the data mining tool RapidMiner to conduct an exploratory analysis of the titanic_train.csv data set, which is provided on the course study desk Assignment 2 folder link, and then build a simple predictive model of Survival on the Titanic using a Decision Tree.
a) You need to identify five key variables that contribute most to determining the survival rate of passengers on the ill-fated Titanic on its maiden voyage. Note you should also refer to the data dictionary provided with the titanic3_train.csv file, which describes each of the variables and their range of values. (Hint: an exploratory analysis should be based on summary statistics, histograms, crosstab tables and scatterplots of individual variables and the relationship between individual variables and the target variable survived. Which variables are correlated with the target variable survived and with other variables?) You might also need to consider reformatting some of the variables to facilitate the next stage of analysis of the titanic3_train.csv and titanic3_score.csv data sets using a Decision Tree. (Hint: you will need to convert the survival variable to a nominal variable with the values Yes = 1, No = 0 in titanic_train.csv.) See Data Mining for the Masses Chapters 3 and 4 for guidance on Exploratory Data Analysis using RapidMiner.
Discuss each of your five top predictor variables and, more generally, the results of your exploratory data analysis in RapidMiner. Explain how you dealt with missing data and unusual data, informed by relevant supporting literature on the survival rate of passengers on the Titanic. Your discussion should include appropriate statistical analysis results, such as graphs and results tables from your exploratory data analysis in RapidMiner, along with some supporting references on predictive model building and interpretation using Decision Trees in data mining (about 600 words).
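The recoding hint and the crosstab idea in sub task (a) can be illustrated outside RapidMiner with a minimal Python sketch. This is only an illustration under stated assumptions: the sample rows are hypothetical (they are not taken from titanic_train.csv), and the helper names `recode_survived`, `crosstab` and `missing_count` are invented for this example. The assignment itself still requires the analysis to be carried out in RapidMiner.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows for illustration only; the real data is in
# titanic_train.csv on the course study desk.
SAMPLE_CSV = """pclass,survived,sex,age
1,1,female,29
1,0,male,
3,0,male,22
3,1,female,
2,1,female,30
3,0,male,40
"""


def recode_survived(value):
    """Map the 0/1 coding from the data dictionary to a nominal label."""
    return {"0": "No", "1": "Yes"}[value.strip()]


def load_rows(handle):
    """Read CSV rows and add a nominal 'survived_label' column."""
    rows = []
    for row in csv.DictReader(handle):
        row["survived_label"] = recode_survived(row["survived"])
        rows.append(row)
    return rows


def crosstab(rows, variable, target="survived_label"):
    """Count rows for each (variable value, target value) pair."""
    return Counter((row[variable], row[target]) for row in rows)


def missing_count(rows, variable):
    """Count rows where the variable is blank."""
    return sum(1 for row in rows if not row[variable].strip())


rows = load_rows(io.StringIO(SAMPLE_CSV))
table = crosstab(rows, "sex")
for (sex, label), count in sorted(table.items()):
    print(f"{sex:<6} {label:<3} {count}")
print("missing age:", missing_count(rows, "age"))
```

To try the same steps on the real file, `io.StringIO(SAMPLE_CSV)` could be replaced with an opened file handle, assuming the column names match the data dictionary.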
The following table lists the data dictionary for the data set titanic_train.csv. (Note: titanic_score.csv is the same as titanic_train.csv but does not contain any values for the target variable survived, which is referred to as a label variable in RapidMiner.)
Variable     Description
pclass       Passenger Class (1 = 1st class; 2 = 2nd class; 3 = 3rd class)
survived     Survived (0 = No; 1 = Yes)
name         Name
Sex          Sex
Age          Age
sibsp        Number of Siblings/Spouses Aboard
parch        Number of Parents/Children Aboard
ticket       Ticket Number
fare         Passenger Fare
cabin        Cabin
embarked     Port of Embarkation (C = Cherbourg; Q = Queenstown; S = Southampton)
boat         Lifeboat
body         Body Identification Number
home.dest    Home/Destination
SPECIAL NOTES:
Pclass is a proxy for socio-economic status (SES): 1st ~ Upper; 2nd ~ Middle; 3rd ~ Lower.
Age is in Years; Fractional if Age is less than One (1). If the Age is Estimated, it is in the form xx.5.
Fare is in Pre-1970 British Pounds (£). Conversion Factors: £1 = 20s = 240d and 1s = 12d.
With respect to the family relation variables (i.e. sibsp and parch) some relations were ignored. The following are the definitions used for sibsp and parch.
Sibling: Brother, Sister, Stepbrother, or Stepsister of Passenger Aboard Titanic
Spouse: Husband or Wife of Passenger Aboard Titanic (Mistresses and Fiancées Ignored)
Parent: Mother or Father of Passenger Aboard Titanic
Child: Son, Daughter, Stepson, or Stepdaughter of Passenger Aboard Titanic
Other family relatives excluded from this study include cousins, nephews/nieces, aunts/uncles, and in-laws. Some children travelled only with a nanny, therefore parch=0 for them. As well, some travelled with very close friends or neighbours in a village, however, the definitions do not support such relations.
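The fare conversion factors in the special notes can be checked with a short calculation. The sketch below assumes the standard pre-decimal relations (£1 = 20 shillings, 1 shilling = 12 pence, so £1 = 240 pence). The function names and the example amount are hypothetical and are not taken from the data set.

```python
from fractions import Fraction

SHILLINGS_PER_POUND = 20
PENCE_PER_SHILLING = 12
PENCE_PER_POUND = SHILLINGS_PER_POUND * PENCE_PER_SHILLING  # 240


def to_decimal_pounds(pounds, shillings=0, pence=0):
    """Convert a pounds/shillings/pence amount to exact decimal pounds."""
    total_pence = (
        pounds * PENCE_PER_POUND + shillings * PENCE_PER_SHILLING + pence
    )
    return Fraction(total_pence, PENCE_PER_POUND)


def from_decimal_pounds(amount):
    """Split decimal pounds into (pounds, shillings, pence), to the nearest penny."""
    total_pence = round(Fraction(str(amount)) * PENCE_PER_POUND)
    pounds, rest = divmod(total_pence, PENCE_PER_POUND)
    shillings, pence = divmod(rest, PENCE_PER_SHILLING)
    return pounds, shillings, pence


# Illustrative amount only: 7 pounds 5 shillings.
print(float(to_decimal_pounds(7, 5, 0)))
print(from_decimal_pounds(7.25))
```

Using `Fraction` keeps the arithmetic exact, which avoids small floating-point errors when a decimal fare is split back into shillings and pence.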
STORY BEHIND THE DATA: This dataset is based on the Titanic Passenger List edited by Michael A. Findlay, originally published in Eaton & Haas (1994) Titanic: Triumph and Tragedy, Patrick Stephens Ltd, and expanded with the help of the internet community.
b) Build a model for predicting the survival of passengers on the Titanic using a decision tree in RapidMiner and the two data sets, titanic3_train.csv and titanic3_score.csv (see Chapter 10 of the Data Mining for the Masses textbook for guidance on Decision Trees in RapidMiner). Then present and discuss the results of your Decision Tree analysis, including a diagram showing your final Decision Tree. Comment on the relative predictive strength of this model and on what you believe are the most significant variables that determined whether a passenger on the Titanic survived or not. Include some supporting references on using Decision Trees in data mining (about 400 words).
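As background for commenting on which variables are most significant in a decision tree, the following minimal Python sketch computes information gain, one common split criterion used by decision tree learners. It is an illustration under stated assumptions, not a description of how RapidMiner is configured in this course: the eight rows are deliberately artificial, and the function names are invented for this example.

```python
import math
from collections import Counter

# Artificial toy rows (sex, pclass, survived) for illustration only.
TOY_ROWS = [
    {"sex": "female", "pclass": "1", "survived": "Yes"},
    {"sex": "female", "pclass": "2", "survived": "Yes"},
    {"sex": "female", "pclass": "3", "survived": "Yes"},
    {"sex": "female", "pclass": "3", "survived": "No"},
    {"sex": "male", "pclass": "1", "survived": "No"},
    {"sex": "male", "pclass": "2", "survived": "No"},
    {"sex": "male", "pclass": "3", "survived": "No"},
    {"sex": "male", "pclass": "3", "survived": "Yes"},
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, variable, target="survived"):
    """Reduction in target entropy after splitting the rows on one variable."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[variable], []).append(row[target])
    weighted = sum(
        len(group) / len(rows) * entropy(group) for group in groups.values()
    )
    return base - weighted


gain_sex = information_gain(TOY_ROWS, "sex")
gain_pclass = information_gain(TOY_ROWS, "pclass")
print(f"gain(sex)    = {gain_sex:.4f}")
print(f"gain(pclass) = {gain_pclass:.4f}")
```

In this toy sample, splitting on sex reduces uncertainty about survival while splitting on pclass does not, so a tree using this criterion would split on sex first. The real data may rank the variables differently.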
Task 2 (Worth 25 marks) consists of the following two sub tasks
Big data is a hot topic and is generating enormous interest in industry and academia. However, there is no agreement on the definition of this term, and the application of big data analytics in practice is currently more hype than reality. Your task is twofold:
a) Research and critically review the current literature available on the Internet and in academic journals and conferences, and provide a comprehensive definition and description of the term ‘Big Data’ that is underpinned and supported by the reference literature (approx 500 words).
b) Research and critically review the current literature available on the Internet and in academic journals and conferences, and provide a comprehensive discussion describing one specific application of Big Data analytics in an industry sector. Emphasise how, in this specific application, Big Data analytics is providing business value to organisations in this industry sector (approx 1000 words).
Your discussion and analysis here should be underpinned by an appropriate level of in-text referencing using the Harvard referencing style.
Task 3 (Worth 25 marks) consists of the following sub tasks
Using the Excel file SalesSuperstore.xlsx, provided on the course study desk Assignment 2 folder link, and Tableau Desktop 8.3, produce the following four reports with appropriate accompanying graphs, each based on a Tableau workbook sheet view. Briefly comment on each report in about 125 words in terms of what trends and patterns are apparent in each report.
The SalesSuperstore.xlsx file contains the following dimensions and information:
1. Customer Name, Customer Segment
2. Location - Region, State, City, Zipcode
3. Product Category, Sub Category, Product Name, Product Container, Unit Price
4. Order Information
5. Shipping Information
6. Sales Information
7. Profit
a) Create a report and accompanying graph using Tableau that shows a trend analysis for sales by Product Category over the years 2009 to 2012 and comment on key trends and patterns apparent in this report (125 words approx)
b) Create a report and accompanying graph using Tableau that shows for each Product Category Average Profit and Total Sales for each month over the years 2009 to 2012 and comment on key trends and patterns apparent in this report (125 words approx)
c) Create a geographical map presentation using Tableau that shows graphically the relative size of Product Sales by City within each State for the year 2012 and comment on key trends and patterns in this report (125 words approx)
d) Create a report and accompanying graph using Tableau that shows Unit Prices, Sales and Profit for technology-based Product Sub Categories for each month over the years 2009 to 2012 and comment on key trends and patterns in this report (125 words approx)
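The aggregation behind a report such as (b), Total Sales and Average Profit per Product Category per month, can be sketched in Python as a sanity check on what the Tableau view should show. This is a minimal sketch under stated assumptions: the field names `order_date`, `category`, `sales` and `profit` and all row values are hypothetical, since the brief lists only the groups of information in SalesSuperstore.xlsx, not its exact column names.

```python
from collections import defaultdict
from datetime import date

# Hypothetical order rows for illustration only; not taken from SalesSuperstore.xlsx.
ORDERS = [
    {"order_date": date(2009, 1, 5), "category": "Technology", "sales": 100.0, "profit": 20.0},
    {"order_date": date(2009, 1, 20), "category": "Technology", "sales": 300.0, "profit": 40.0},
    {"order_date": date(2009, 2, 3), "category": "Furniture", "sales": 250.0, "profit": -10.0},
    {"order_date": date(2012, 12, 1), "category": "Furniture", "sales": 50.0, "profit": 5.0},
]


def monthly_summary(orders):
    """Total sales and average profit per (category, year, month)."""
    buckets = defaultdict(list)
    for order in orders:
        when = order["order_date"]
        buckets[(order["category"], when.year, when.month)].append(order)
    summary = {}
    for key, group in buckets.items():
        summary[key] = {
            "total_sales": sum(o["sales"] for o in group),
            "average_profit": sum(o["profit"] for o in group) / len(group),
        }
    return summary


summary = monthly_summary(ORDERS)
for key in sorted(summary):
    print(key, summary[key])
```

Note that sales are summed while profit is averaged per order row, which mirrors the distinction between Total Sales and Average Profit in the report description.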
Your assignment 2 report must be structured as follows, which is similar to the report structure detailed in Summers & Smith 2010:
Cover page for assignment 2 report
1. Title Page
2. Table of Contents
3. Body of report – main sections and subsections for each assignment 2 task and sub task
   3.1 Task 1 (a main heading with appropriate sub headings for each sub task)
   3.2 Task 2 …
   3.3 Task 3 …
4. List of References
5. List of Appendices
You need to submit two files when you submit Assignment 2:
1. Your Assignment 2 Report for Tasks 1, 2 and 3 in Word document format with the extension .docx
2. Your Assignment 2 Task 3 as a Tableau packaged workbook with the extension .twbx
Use the following file naming convention:
1. Student_no_Student_name_CIS8008_Ass2.docx
2. Student_no_Student_name_CIS8008_Ass2.twbx
Online Assignment submission
All assignments must be submitted electronically via the course study desk Assignment 2 submission link. Submitted documents are subject to automated checking for plagiarism and collusion by Turnitin.
Note carefully University policy on plagiarism, collusion and cheating. If any of these occur they will be found and dealt with.
Harvard referencing resources
Install a reference tool (for example, EndNote) which integrates with your word processor. These tools are a great help for referencing and citing sources in your assignments. For more information on how to get EndNote, visit the following webpage: http://www.usq.edu.au/library/referencing/endnote-bibliographic-software.
Study the referencing techniques in the Communication skills handbook (Summers & Smith 2010). The USQ Librarian has compiled the following resources on how to reference correctly using the Harvard referencing system. Make use of these excellent resources if you are unsure how to reference correctly.
Library Harvard Referencing Guide: http://www.usq.edu.au/library/referencing/harvardagps-referencing-guide