Recent Question/Assignment
Data Analyst Candidate Exercise
The following exercises are designed to represent some of the technical tasks associated with a Data Analyst at Comscore. These are meant both to orient a candidate to our dataset and to understand a candidate’s reasoning and analysis, keeping in mind that the rational steps that lead to a solution are often as important as the solution itself. This test may take between 1 to 2 hours to complete.
Section I – Introduction & Graph Interpretation
1. Put these search terms into the two most logical groups:
a) Bank of America
b) Savings accounts
c) Wireless phones
d) Verizon wireless phones
e) Travelocity
f) Cruise lines
g) Chase checking accounts
2. The following graphs show the volume of unique people (Y-axis) visiting a given website during a given timeframe (X-Axis). For each graph, select the description that best fits the data, and give a brief explanation.
Seasonality:
a) A graph displaying seasonality within the retail industry
b) A graph displaying seasonality within the travel industry
c) Neither
Graph 1: ____
Please explain your reasoning:
Graph 2 : ____
Please explain your reasoning:
Graph 3 : ____
Please explain your reasoning:
3. The following graph represents two websites owned by the same company. In the space below, give a reasonable explanation for the activity shift seen after October 2010.
4. The following chart shows the average time per session (in minutes) of a visit to the following three websites: Google.com, Yahoo.com and YouTube.com. Do these results make sense? Please explain why or why not?
Section II – Technical Example
Regular expressions, or regex, are used in computing to match strings of text in a powerful and efficient manner. At Compete, an analyst uses regex to match on URL strings and search keywords to help recognize a consumer’s behavior. For example, if wanted to match on the brand “Hoover” we would want to write a regular expression that would also match on “Hoover vacuums” and “Hoover vacuum cleaner”, but probably not “Hoover Dam”.
While alphanumeric characters retain their literal meaning within regex, many other characters take on a functional meaning. For a more in-depth look at regex, feel free to consult the web – including the regular expression Wikipedia page (http://en.wikipedia.org/wiki/Regular_expression).
Some examples of regular expressions which we will be using in this quiz are as follows:
A. The expression {.*} is used to represent any number of any characters (both alphanumeric or whitespace). It’s actually the combination of two characters that have a special meaning:
. which represents any single character
* which represents a given character repeated N times, where N is 0 or higher
When combined they represent any possible permutation of N characters. For example
a.*c will match “abc”, “abbc”, “ac”, “a man wrote the letter c”
B. The expression {.} is used to represent the period character. Since the character {.} has a special meaning in regex, if we want to represent the literal text character we use the expression {.}
amazon.com will match “amazon.com”
C. The expression {^} represents the beginning of a string, while the {$} represents the end. For example
^rain will match “rained”, but not “train” since expression requires a matching string to begin with “rain”
rain$ will match “train”, but not “rained” for similar reasons
^rain$ will only match the exact string “rain”
D. The expression {(…)}, and {|}
In regex, there is the ability to group characters together. There are several uses for this, however for our purposes here we will only focus on grouping when paired with the Boolean OR {|} operator. For example, an expression such as a(b|c)d looks for the following sequence in a string: the letter a, then the letter b or c, then the letter d – which would match either “abd” or “acd”
I drive a (Prius|Hummer) would match both “I drive a Prius” and “I drive a Hummer”
but would NOT match the string “I drive a (Prius|Hummer)”
Using the explanation above, or an outside resource on regular expressions, answer the questions below.
Exercises:
5. Which of the below text strings match the following regular expression [MARK ALL THAT MATCH]:
^.*wal.*mart$
A. “wal-mart”
B. “walmart”
C. “walmart coupons”
D. “shop at wal-mart”
E. NONE OF THESE MATCH
6. Which of the below text strings match the following regular expression [MARK ALL THAT MATCH]:
^.*autotrader.com/(f|s)yc.*$
A. “tradein.autotrader.com”
B. “www.autotrader.com/f|syc/index”
C. “sell.autotrader.com/syc/syc_center”
D. “www.autotrader.com/fyc/index.jsp”
E. NONE OF THESE MATCH
7. In the space below, compose a regular expression that will meet ALL of the below requirements:
WILL MATCH the following strings:
1. www.hulu.com/watch/218620/law-and-order-special-victims-unit#
2. www.hulu.com/watch/367084/law-and-order-criminal-intent#
3. www.hulu.com/watch/119706/law-and-order-uk#
And WILL NOT MATCH:
1. www.hulu.com/browse/movies
2. www.hulu.com/watch/102253/magnum-pi#
3. www.hulu.com/watch/138166/law-and-order#
Regular Expression:
Section III – Datamining Example
tags tag_groups
COLUMN_NAME DATA_TYPE PK COLUMN_NAME DATA_TYPE PK
tag_no integer YES tag_group_no integer YES
tag_id varchar(255) NO name varchar(255) NO
description text NO description text NO
date_created timestamp NO date_created timestamp NO
created_by_user_no integer NO created_by_user_no integer NO
short_name varchar(10) NO tag_group_owner_no integer NO
name varchar(50) NO tag_mega_group_no integer NO
tag_group_no integer NO denominator_tag_no integer NO
display_case varchar(32) NO tag_group_sort_order integer NO
formula text NO group_type varchar(16) NO
tag_sort_order integer NO notes text NO
from_date date NO tag_group_id varchar(48) NO
to_date date NO
should_renormalize boolean NO
locked_weight_date date NO
should_weight boolean NO
notes text NO
tag_group_owners
COLUMN_NAME DATA_TYPE PK
tag_group_owner_no integer YES
tag_group_owner_id varchar(32) NO
name varchar(128) NO
date_created timestamp NO
created_by varchar(64) NO
1. Create a script that will return the “name” field of each table below when the “tag_no” of interest are: 46391,46392,46393,46397,46398,46399,46400,46401,46402,46403,46404,46405,46406,46407,46408,46409,46410,46411,46412,46413,46414,46415,46416,46417,46418.
2. Write a SQL query to check if there are duplicated rows in the “tags” table in Q1?
Section IV - Open Ended Question
The purpose of this final section is to give you the opportunity to demonstrate your approach toward answering a logic question, and assess your ability to reach a reasonable, justified conclusion. The focus is not on whether your answer is “correct”, but rather to identify the steps you took to reach your conclusions. You may choose any way you see fit to answer the question; however, strong answers are clear and logically reasonable, developed around two or more relevant ideas, and use simple arithmetic to arrive at the result. Please note your answers should contain a good argument with a clear line of reasoning, rather than a simple formula.
1. How many printed books are purchased on Amazon.com each month by customers in the U.S. and what are the total sales that are generated?
a. Key assumptions made to frame the question
b. The process to arrive at a result
c. Result