Q1 ML for Malware Analysis 25 Points
In the Week 7 lecture, we showed example code that trains three machine learning models and measures their performance on feature vectors extracted from 200 binaries, of which 50% are malware and 50% are benignware. You can find the code and the data at https://drive.google.com/drive/folders/142NMRSTifttezfPqwTkf6dlg-VWOrdaY?usp=drive_link
Download the Jupyter notebook and the test.csv file from the link above, and run the code either on Google Colab or with an Anaconda installation on your own machine.
(i) Open the test.csv file in Excel or another spreadsheet program. Rows 2 through 101 are labeled as malware (see the last column), and rows 102 through 201 are labeled as benignware. The dataset is small enough to eyeball quickly. You will find some features (columns) whose values in malware rows differ sharply from their values in benign rows. Such features are useful for distinguishing malware from benignware. You will also find features whose values are similar whether or not the row is labeled malware. Such features are useless for classification. Name 3 features that are useful for classification and 3 that are not.
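The eyeballing described in (i) amounts to comparing a feature's typical value in each class. The sketch below is a minimal illustration of that idea. It uses made-up toy rows and hypothetical feature names (`feature_a`, `feature_b`, `feature_c`), not the contents of test.csv. For each feature it computes the absolute gap between the malware mean and the benign mean.

```python
# Minimal sketch: compare each feature's mean across the two classes.
# ASSUMPTION: the rows below are invented toy values, not test.csv, and the
# feature names are hypothetical placeholders.
from statistics import mean

rows = [
    # (feature_a, feature_b, feature_c, label)
    (9.0, 1.0, 5.0, "malware"),
    (8.5, 2.0, 5.1, "malware"),
    (8.8, 1.5, 4.9, "malware"),
    (1.0, 1.2, 5.0, "benign"),
    (1.5, 1.8, 5.2, "benign"),
    (0.8, 1.6, 4.8, "benign"),
]
feature_names = ["feature_a", "feature_b", "feature_c"]

def class_mean_gap(rows, index):
    """Return |mean over malware rows - mean over benign rows| for one column."""
    mal = [r[index] for r in rows if r[-1] == "malware"]
    ben = [r[index] for r in rows if r[-1] == "benign"]
    return abs(mean(mal) - mean(ben))

gaps = {name: class_mean_gap(rows, i) for i, name in enumerate(feature_names)}

# Print features from most to least separated between the classes.
for name, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean gap = {gap:.2f}")
```

A raw mean gap depends on each feature's scale. With real data, you would want to normalize the gap, for example by the feature's standard deviation, before comparing features against each other.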

(ii) Explain the need for feature selection in 3-4 sentences. In other words, once we have extracted the features, why not use all of them? Why do we need to select a subset?

(iii) In the code, we computed feature correlations and generated a heatmap of all pairwise correlations. However, we selected only those features whose correlation with the labels has a high absolute value. In 2-3 sentences, explain in your own words why this selection criterion makes sense.
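The selection rule in (iii) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the notebook's code: the data is toy data, labels are encoded as 1 for malware and 0 for benign, and a hypothetical cut-off keeps the top `k` features. Ranking uses the absolute value because a strongly negative correlation is just as informative as a strongly positive one.

```python
# Minimal sketch of correlation-based feature selection.
# ASSUMPTIONS: toy data (not test.csv), labels encoded as 1 = malware and
# 0 = benign, and a hypothetical top-k cut-off.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def select_by_label_correlation(columns, labels, k):
    """Rank features by |corr(feature, label)| and keep the top k names."""
    scores = {name: abs(pearson(vals, labels)) for name, vals in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores

labels = [1, 1, 1, 0, 0, 0]
columns = {
    "feature_a": [9.0, 8.5, 8.8, 1.0, 1.5, 0.8],  # high in malware rows
    "feature_b": [0.1, 0.2, 0.1, 0.9, 0.8, 0.9],  # low in malware rows (negative corr.)
    "feature_c": [5.0, 5.1, 4.9, 5.0, 5.2, 4.8],  # roughly label-independent
}
selected, scores = select_by_label_correlation(columns, labels, k=2)
print(selected, {name: round(s, 3) for name, s in scores.items()})
```

Both `feature_a` and `feature_b` are kept, even though `feature_b` is negatively correlated with the label. `feature_c` scores near zero and is dropped.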

(iv) In the code, we selected 7 of the 23 extracted features. In your own words (no more than 2-3 sentences), state what might explain why some of the machine learning models still achieved high accuracy, precision, and recall even after that many features were removed.

(v) We kept only those features that are highly correlated with the labels, but there are other ways to reduce the number of features. In 2-3 sentences, explain one possible alternative method for feature selection.
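One alternative filter method is to rank features by their mutual information with the label. This technique is swapped in here for illustration; the source does not mention it. Unlike Pearson correlation, mutual information also captures non-monotonic relationships. The sketch below assumes the features are already discrete (or binned) and uses invented toy data. The first feature has zero correlation with the label yet determines it completely.

```python
# Minimal sketch of mutual-information-based feature ranking, an alternative
# filter method to label correlation.
# ASSUMPTIONS: features are discrete (or already binned); toy data is invented.
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """I(X; Y) in bits for two equal-length sequences of discrete values."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        p_joint = count / n
        mi += p_joint * log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi

labels = [1, 1, 1, 1, 0, 0, 0, 0]
# Malware rows take values 0 or 2 and benign rows take 1: the Pearson
# correlation with the label is 0, but the value fully determines the label.
non_monotonic = [0, 2, 0, 2, 1, 1, 1, 1]
# Alternates independently of the label: carries no information about it.
noise = [0, 1, 0, 1, 0, 1, 0, 1]

print(mutual_information(non_monotonic, labels))  # 1.0 bit
print(mutual_information(noise, labels))          # 0.0 bits
```

Other common alternatives include wrapper methods such as recursive feature elimination and embedded methods such as L1-regularized models.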

Q2 ML for Intrusion Detection 25 Points
In the Week 8 lecture, we showed how to train ML models on network packet data for intrusion detection. You can find the data and the code at https://drive.google.com/drive/folders/1BrX2QtYvTZiBIKYVrn64phV4dbqsDDpn?usp=sharing
(i) The example code shown in Week 8 demonstrated the scapy library. In your own words, describe how scapy was used. (Hint: the rest of the code uses pcap files as its data source and does not use scapy. Think about how the pcap files might have been collected.)

(ii) In your own words, explain what the flows constructed from the packets in a pcap file represent.
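A flow is usually a group of packets that share the same 5-tuple: source IP, destination IP, source port, destination port, and protocol. Per-flow statistics are then computed over each group. The sketch below assumes several things. The packets are already parsed into simple records; in practice a library would read them from the pcap file. Flows are treated as bidirectional. The three per-flow features are illustrative only and are not necessarily the ones the Week 8 notebook computes. The IP addresses come from the RFC 5737 documentation ranges.

```python
# Minimal sketch: group packets into bidirectional flows by 5-tuple and
# compute a few per-flow features.
# ASSUMPTIONS: packets are pre-parsed records (not read from a real pcap),
# and the features chosen here are illustrative only.
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    ts: float     # capture timestamp in seconds
    src: str
    dst: str
    sport: int
    dport: int
    proto: str    # e.g. "TCP" or "UDP"
    length: int   # bytes on the wire

def flow_key(p):
    """Order the endpoints so that both directions map to the same flow."""
    lo, hi = sorted([(p.src, p.sport), (p.dst, p.dport)])
    return (lo, hi, p.proto)

def build_flows(packets):
    groups = defaultdict(list)
    for p in packets:
        groups[flow_key(p)].append(p)
    flows = {}
    for key, pkts in groups.items():
        times = [p.ts for p in pkts]
        flows[key] = {
            "packets": len(pkts),
            "bytes": sum(p.length for p in pkts),
            "duration": max(times) - min(times),
        }
    return flows

packets = [
    Packet(0.00, "10.0.0.5", "198.51.100.7", 51000, 443, "TCP", 60),
    Packet(0.05, "198.51.100.7", "10.0.0.5", 443, 51000, "TCP", 1500),
    Packet(0.10, "10.0.0.5", "198.51.100.7", 51000, 443, "TCP", 52),
    Packet(0.20, "10.0.0.5", "203.0.113.53", 53000, 53, "UDP", 80),
]
flows = build_flows(packets)
for key, feats in flows.items():
    print(key, feats)
```

Real flow extractors also end flows on idle timeouts and on TCP FIN or RST. This sketch does not handle either case.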

(iii) In 2-3 sentences, explain how the flows are labeled as benign or malicious in the example code shown in Week 8.
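One common way to label flows is to check whether either endpoint belongs to a known attacker host. Another is to label every flow by the capture it came from, for example a benign-only pcap versus an attack pcap. The sketch below shows the first approach under an explicit assumption: the attacker address set is hypothetical. Check the Week 8 notebook for the rule it actually uses.

```python
# Minimal sketch: label flows by matching endpoints against known attackers.
# ASSUMPTION: ATTACKER_IPS is hypothetical (RFC 5737 documentation range);
# this is one common approach, not necessarily the Week 8 notebook's rule.
ATTACKER_IPS = {"192.0.2.66"}

def label_flow(src_ip, dst_ip, attacker_ips=ATTACKER_IPS):
    """Return 'malicious' if either endpoint is a known attacker, else 'benign'."""
    if src_ip in attacker_ips or dst_ip in attacker_ips:
        return "malicious"
    return "benign"

flow_endpoints = [
    ("10.0.0.5", "198.51.100.7"),
    ("192.0.2.66", "10.0.0.5"),
]
flow_labels = [label_flow(src, dst) for src, dst in flow_endpoints]
print(flow_labels)  # ['benign', 'malicious']
```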

(iv) In the example code shown in Week 8, we use PCA to transform the feature vectors. We then plot the first two components of the transformed vectors to visualize the data in 2-D. Research what PCA does and explain in 2-3 sentences why PCA is useful.
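The 2-D projection in (iv) can be sketched with NumPy. The steps are: center the data, eigendecompose its covariance matrix, and project onto the eigenvectors with the largest eigenvalues. This is a minimal illustration on synthetic data that stands in for the flow feature vectors. The notebook may instead use a library implementation such as scikit-learn's `sklearn.decomposition.PCA`.

```python
# Minimal PCA sketch with NumPy.
# ASSUMPTION: synthetic data stands in for the real flow feature vectors.
import numpy as np

def pca(X, n_components=2):
    """Return (projected data, explained-variance ratio of the kept components)."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)         # ascending order for symmetric input
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]                 # columns = principal directions
    projected = X_centered @ components
    ratio = eigvals[order] / eigvals.sum()
    return projected, ratio

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))                 # 2 hidden factors
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 10))  # 10 correlated features
Z, ratio = pca(X, n_components=2)
print(Z.shape, ratio.round(3))
```

The 10 features here are driven by only 2 hidden factors. As a result, the first two components retain almost all of the variance, and `Z` can be scatter-plotted in 2-D with little loss of information.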
