Lab 4
Goals of this Lab:
Make sure everyone is in a (4 person) team with a team name
Please enter your team in bCourses. On the menu on the left, select "People" and then the "Groups" tab. You should be able to enter your group's name and invite members into it. Each member must accept the invitation to complete group registration.
Make sure your team has a tentative project plan
Use this lab to refine and reach consensus on your high-level project plan. Get input from the staff about your ideas. Make sure that your idea:
- Involves data that you can access with the quantity and quality needed
- Involves interesting questions, ideally with some outside stakeholders (someone other than you is interested in the answer).
- Involves non-trivial data analysis (this is almost always true).
Please check with a staff member (Prof or GSI) before proceeding past this stage.
Start on goal definition for your project
Start refining your project definition:
- How much data do you think you need, and how will you get it? What tools will you use (e.g. SQL, HDFS, OpenRefine Lab, Pandas)?
- What specific questions do you want to explore? Spend some time on this point: it's the key to executing a good project. There are usually some obvious questions you can answer with a dataset, but what are the non-obvious ones? Try to come up with a long list (e.g. 10 questions) and then winnow it down to the most interesting ones.
- What kinds of analysis do you *think* you will do? This is a guess, but it's good to have a target.
- What machine learning methods and what tools do you think you will use?
Please consult with the staff about this part too. You don't need to be checked off to proceed, however.
Ideally: Acquire, and start exploring your dataset
The second milestone in your project will be Exploratory Data Analysis on your dataset(s). It's good to start on this now, even though that assignment isn't due for a while. The reason is that the data itself will often show interesting patterns and *suggest* interesting questions to explore. The data may also turn out to be less suitable for the questions you picked above than you expected, and that won't be clear until you start exploring it.
- Get hold of a small sample of data
- Parse and clean it if needed
- Try to compute some descriptive statistics on your data
- Try visualizing properties of features and combinations (e.g. scatter plots for pairs of features).
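The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the inline CSV, its column names (`height_cm`, `weight_kg`, `age`), and the helper names `load_clean_rows`, `describe`, and `pearson` are all hypothetical stand-ins for your own data and code. It uses only the standard library; with Pandas you would more likely reach for a DataFrame and its built-in summaries and plots. The pairwise correlation is a numeric stand-in for the scatter plots mentioned above, not a replacement for looking at them.

```python
import csv
import io
import math
import statistics

# Hypothetical sample: a few rows standing in for your real dataset.
SAMPLE_CSV = """height_cm,weight_kg,age
170,65,34
182,80,41
,72,29
165,58,n/a
175,70,38
190,95,45
"""


def load_clean_rows(text):
    """Parse CSV text and keep only rows where every field is a valid number."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        try:
            rows.append({name: float(value) for name, value in record.items()})
        except (TypeError, ValueError):
            continue  # skip rows with missing or non-numeric values
    return rows


def describe(rows):
    """Return count, mean, sample standard deviation, min, and max per column."""
    summary = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        summary[name] = {
            "count": len(values),
            "mean": statistics.mean(values),
            "stdev": statistics.stdev(values),
            "min": min(values),
            "max": max(values),
        }
    return summary


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread = math.sqrt(
        sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    )
    return cov / spread if spread else float("nan")


if __name__ == "__main__":
    rows = load_clean_rows(SAMPLE_CSV)
    for column, stats in describe(rows).items():
        print(column, stats)
    names = list(rows[0])
    # Correlation for every pair of features, mirroring a grid of scatter plots.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson([row[a] for row in rows], [row[b] for row in rows])
            print(f"corr({a}, {b}) = {r:.3f}")
```

Dropping every row with a missing value is the simplest cleaning policy, but it can throw away a lot of data; whether to drop, impute, or flag such rows is a decision worth discussing within your team.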
Fill out the Lab 4 responses here.