
13-databases

Before

  • Read the Tidy Data paper on structuring data. Optionally also check out the corresponding slides and presentation video. [paper] [github] [slides] [video]
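One messy-data problem the Tidy Data paper discusses is column headers that are values rather than variable names. The usual fix is to "melt" those columns into rows. Below is a minimal pure-Python sketch of that step. The data are made up, and the `melt` helper is a hypothetical illustration (pandas users would normally reach for `DataFrame.melt` instead):

```python
def melt(rows, id_col, var_name, value_name):
    """Turn every non-id column into its own (id, variable, value) row."""
    tidy = []
    for row in rows:
        for col, val in row.items():
            if col == id_col:
                continue  # the identifier stays as a column, not a row
            tidy.append({id_col: row[id_col], var_name: col, value_name: val})
    return tidy


# Messy: the years 1999 and 2000 are stored as column headers (made-up counts).
wide = [
    {"country": "A", "1999": 10, "2000": 12},
    {"country": "B", "1999": 30, "2000": 35},
]

# Tidy: one row per (country, year) observation, one column per variable.
tidy = melt(wide, id_col="country", var_name="year", value_name="cases")
for row in tidy:
    print(row)
```

Each printed row is one observation, for example `{'country': 'A', 'year': '1999', 'cases': 10}`. This long shape is also the shape a relational table expects.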

Optional:

  • Go through Zed Shaw's work-in-progress Learn SQL the Hard Way, which covers even more SQL with SQLite than we'll do in class.

Questions

  • Consider thinking of multinomial Naive Bayes likelihood probabilities as coefficients on word dummy features. How are they similar or different as compared with logistic regression coefficients?
  • How can binary classifiers be used for multiclass problems? That is, if a technique only gives a probability of "yes" vs. "no" (for some question) how can you use the technique for questions with more than two possible answers?
  • How do K Nearest Neighbors, Naive Bayes, and linear models compare in terms of model interpretability? How/when could this inform model choices?
  • What are the drawbacks of "tidy data"? When would it not be a good idea to keep data in a "tidy" format?
  • What other thoughts, comments, concerns, and questions do you have? What's on your mind?

During

Application presentation.

Question review.

Slides on databases.

Lab on SQL, with data pre-populated.

SQL lab on using SQLite with your own data.
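To get your own data into SQLite, one route is Python's standard-library `sqlite3` and `csv` modules. The sketch below is one possible approach, not the lab's prescribed one. It assumes a small CSV with `country`, `year`, and `cases` columns. The inline CSV text and the table name `counts` are hypothetical stand-ins for your own file and schema:

```python
import csv
import io
import sqlite3

# Stand-in for your own file; with a real file, use open("yourfile.csv", newline="").
csv_text = """country,year,cases
A,1999,10
A,2000,12
B,1999,30
B,2000,35
"""

# ":memory:" keeps the database in RAM; pass a filename such as "mydata.db" to persist it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counts (country TEXT, year INTEGER, cases INTEGER)")

# Parse the CSV and convert numeric columns before inserting.
reader = csv.DictReader(io.StringIO(csv_text))
rows = [(r["country"], int(r["year"]), int(r["cases"])) for r in reader]
conn.executemany("INSERT INTO counts VALUES (?, ?, ?)", rows)
conn.commit()

# Total cases per country, largest first.
query = """
SELECT country, SUM(cases) AS total
FROM counts
GROUP BY country
ORDER BY total DESC
"""
results = conn.execute(query).fetchall()
for country, total in results:
    print(country, total)

conn.close()
```

The `?` placeholders let `sqlite3` handle quoting, which is safer than building SQL strings by hand. For quick one-off loads, the `sqlite3` command-line shell's `.mode csv` and `.import` commands are an alternative.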

On the structured side of the spectrum, this table summarizes much of the landscape of data structures, formats, and software:

Structure   Format      Software                Servers
Tabular     CSV, etc.   most; SQLite            MySQL, PostgreSQL, etc.
Nested      JSON, XML   rjson, lxml, etc.       web, etc.
Graph       various     networkx, Gephi, etc.   Neo4j, etc.
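The three rows of that table can describe the same information. The sketch below starts from a nested JSON document, splits it into two flat tables in the style a relational database would use, and then reads one of those tables as a graph. The people, cities, and friendships are made up. To stay dependency-free, the graph is a plain dict of sets rather than a networkx object:

```python
import json

# Nested: one JSON object per person, with a list of friends inside it (made-up data).
nested_text = """
[
  {"name": "ann", "city": "Boston", "friends": ["bob", "cat"]},
  {"name": "bob", "city": "Denver", "friends": ["ann"]},
  {"name": "cat", "city": "Boston", "friends": []}
]
"""
people = json.loads(nested_text)

# Tabular: one row per person. The nested list does not fit in a flat row,
# so it moves to a second table below.
person_rows = [(p["name"], p["city"]) for p in people]

# The friends lists become their own two-column table (an edge list).
friend_rows = [(p["name"], f) for p in people for f in p["friends"]]

# Graph: the same edge list read as an adjacency mapping.
graph = {p["name"]: set() for p in people}
for a, b in friend_rows:
    graph[a].add(b)
    graph[b].add(a)  # treat friendship as undirected

print(person_rows)
print(friend_rows)
print(sorted(graph["ann"]))
```

Going from nested to tabular required a second table to hold the list-valued field. That is the same move a relational schema makes. Questions like "who is two steps away from cat?" are more natural to ask of the graph form.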

After

Optional: