Machine Learning Brain Dump
https://www.youtube.com/channel/UCQALLeQPoZdZC4JNUboVEUg
https://www.youtube.com/watch?v=IpGxLWOIZy4
Linear Regression - Home Prices by Size
- Gradient Descent
- Least squares
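A minimal sketch of the idea (toy numbers, not from the video): fit price = w*size + b by stepping both parameters down the gradient of the least-squares error.

```python
import numpy as np

# Made-up training data: house size (scaled) vs. price ($1000s).
sizes = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
prices = np.array([150.0, 200.0, 250.0, 300.0, 350.0])

w, b = 0.0, 0.0   # slope and intercept, start anywhere
lr = 0.01         # learning rate (step size)

for _ in range(5000):
    error = (w * sizes + b) - prices
    # Gradients of the mean squared (least-squares) error w.r.t. w and b.
    grad_w = 2 * np.mean(error * sizes)
    grad_b = 2 * np.mean(error)
    w -= lr * grad_w   # step downhill: gradient descent
    b -= lr * grad_b

print(f"price ~ {w:.1f} * size + {b:.1f}")   # ~100 * size + 50 on this data
```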
Naive Bayes - Email Spam Classifier
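A sketch of a bag-of-words spam filter with scikit-learn; the emails and labels are made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny invented corpus; a real filter trains on thousands of emails.
emails = ["win money now", "cheap money win", "meeting at noon",
          "lunch meeting tomorrow", "win a cheap prize", "project meeting notes"]
labels = [1, 1, 0, 0, 1, 0]   # 1 = spam, 0 = ham

vec = CountVectorizer()                  # bag-of-words counts
X = vec.fit_transform(emails)
clf = MultinomialNB().fit(X, labels)     # P(word | class) combined via Bayes' rule

print(clf.predict(vec.transform(["win cheap money"])))   # -> [1] (spam)
```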
Decision Tree - Recommending Apps
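A toy decision tree with scikit-learn; the users, features, and app labels are invented, just to show how the splits work.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up users: [age, uses_social_media] -> which app to recommend.
X = [[15, 1], [17, 1], [25, 0], [32, 0], [40, 1], [13, 1]]
y = ["TikTok", "TikTok", "Spreadsheet", "Spreadsheet", "News", "TikTok"]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["age", "uses_social_media"]))
print(tree.predict([[16, 1]]))   # -> ['TikTok']
```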
Logistic Regression - Slicing Data Set in Two - Admission to University
- Gradient descent
- Log-loss Function (assigns a large value to mis-classified points and a small value to correctly classified points)
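A quick illustration of why that works: a confident correct prediction costs almost nothing, a confident wrong one costs a lot.

```python
import numpy as np

def log_loss(y, p):
    """Cross-entropy: -log(p) for positive labels, -log(1 - p) for negatives."""
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

print(-np.log(0.99))   # ~0.01: confident, correct point barely adds error
print(-np.log(0.01))   # ~4.6:  confident, mis-classified point adds a lot

y = np.array([1, 1, 0])
p = np.array([0.95, 0.30, 0.05])   # a model's predicted probabilities
print(log_loss(y, p))              # mostly driven by the 0.30 miss
```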
Neural Network - Intersecting Multiple Logistic Regressions to Isolate a Section
- Linear Optimization - Find the line that maximizes the distance to the boundary points
- Support Vector Machine - Cuts dataset in two at the best “line”
- Kernel Trick - Find a function that gives high values for red points and low values for green points, or vice-versa
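A sketch of the kernel trick with scikit-learn: two concentric rings that no straight line can split, which the RBF kernel separates easily (synthetic data, my own construction).

```python
import numpy as np
from sklearn.svm import SVC

# Circular data: inner ring = class 0, outer ring = class 1. No straight line
# separates them, so the RBF kernel lifts the points into a space where a
# separating "line" (hyperplane) exists.
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 200)
radii = np.concatenate([rng.uniform(0, 1, 100), rng.uniform(2, 3, 100)])
X = np.c_[radii * np.cos(angles), radii * np.sin(angles)]
y = np.array([0] * 100 + [1] * 100)

clf = SVC(kernel="rbf").fit(X, y)   # try kernel="linear" to watch it fail
print(clf.score(X, y))              # ~1.0
```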
K-Means Clustering - Define clusters and centroids (pizza parlors) when you know how many clusters you want
- Pick one random point for each cluster you have
- Assign each house to the parlor closest to it
- Move the parlor to the center of its assigned houses
- Repeat previous two steps until parlor doesn’t move
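A minimal NumPy sketch of the four steps above, with made-up house locations:

```python
import numpy as np

rng = np.random.default_rng(1)
houses = rng.uniform(0, 10, size=(30, 2))   # made-up house locations
k = 3

# Step 1: pick one random house per cluster as the initial parlor.
parlors = houses[rng.choice(len(houses), size=k, replace=False)]

for _ in range(100):
    # Step 2: assign each house to its closest parlor.
    dists = np.linalg.norm(houses[:, None, :] - parlors[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)
    # Step 3: move each parlor to the center of its assigned houses
    # (leave it in place if it lost all its houses).
    new_parlors = np.array([
        houses[nearest == j].mean(axis=0) if np.any(nearest == j) else parlors[j]
        for j in range(k)
    ])
    # Step 4: stop once no parlor moves.
    if np.allclose(new_parlors, parlors):
        break
    parlors = new_parlors

print(parlors)   # final parlor (centroid) locations
```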
Hierarchical (agglomerative) Clustering
- Define what is “too far apart”
- Find the smallest distance between two houses where at least one is still unclustered, and merge them into a cluster
- If the distance is “too far”, stop clustering
Linkage Methods
- Ward - Tends to look for spherical clusters, very cohesive inside and extremely differentiated from other groups. Another nice characteristic is that the method tends to find clusters of similar size. It works only with the Euclidean distance.
- Complete - Links clusters using their furthest observations, that is, their most dissimilar data points. Consequently, clusters created using this method tend to be comprised of highly similar observations, making the resulting groups quite compact.
- Average - Links clusters using the average distance between all pairs of their points, ignoring their boundaries. The method creates larger groups than the complete method. In addition, the clusters can have different sizes and shapes, contrary to Ward’s solutions. Consequently, this average, multipurpose approach sees successful use in the biological sciences.
Distance Metrics
- Euclidean (euclidean or l2) - In a map, the shortest distance between two points
- Manhattan (manhattan or l1) - Calculated by summing the absolute value of the difference between the dimensions. Think of moving along one axis and then the other, like a car driving along city blocks
- Cosine (cosine) - A good choice when there are many variables and you worry that some may not be significant. Cosine distance reduces noise by taking the shape of the variables, more than their values, into account. It tends to associate observations that have the same maximum and minimum variables, regardless of their actual values.
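A small SciPy sketch tying the last three sections together: pick a distance metric, pick a linkage method, and cut the tree where clusters get "too far apart" (toy coordinates, and the cutoff is my own choice):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

houses = np.array([[0, 0], [0, 1], [1, 0], [8, 8], [8, 9], [9, 8.5]])

# Pairwise distances: try "euclidean" (l2), "cityblock" (Manhattan/l1), "cosine".
d = pdist(houses, metric="euclidean")

# Linkage method: "ward", "complete", or "average" (ward requires euclidean).
Z = linkage(d, method="ward")

# "Too far apart": cut the tree at distance 4; merging stops beyond that.
print(fcluster(Z, t=4, criterion="distance"))   # e.g. [1 1 1 2 2 2]
```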
https://www.youtube.com/watch?v=BR9h47Jtqyw
Gradient descent error function
- Probability function
- Activation function – every point in the domain gets mapped to somewhere between 0 and 1 in the probability function
  - f(x) = 1/(1+e^(-x)) – sigmoid function
  - Maximum Likelihood
  - Sum of -log of probabilities – lower is less error
Combining Regions
- Add probabilities from each area (weighted as desired), map via the activation function to get combined probability
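A sketch of that combination (the two lines, weights, and bias are arbitrary choices, not from the video):

```python
import numpy as np

def sigmoid(z):                          # activation: maps any score into (0, 1)
    return 1 / (1 + np.exp(-z))

# Two linear models (two lines slicing the plane), entirely made up here.
def score1(x, y): return 2 * x + y - 1
def score2(x, y): return -x + 3 * y + 2

def combined_probability(x, y, w1=1.0, w2=1.0, bias=-1.5):
    p1 = sigmoid(score1(x, y))           # probability from region 1
    p2 = sigmoid(score2(x, y))           # probability from region 2
    # Weighted sum of the two probabilities, squashed again by the activation.
    return sigmoid(w1 * p1 + w2 * p2 + bias)

print(combined_probability(0.5, 0.5))    # one combined probability in (0, 1)
```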
Neural Network - 29:00
Deep Neural Network – Neural network w/ multiple hidden layers - 31:00
https://www.youtube.com/watch?v=2-Ol7ZB0MmU
Friendly introduction to Convolutional Neural Networks & Image Recognition
- Mapping / \ X and O shapes from raw pixels to recognized characters
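A minimal NumPy sketch of the core CNN operation: slide a small filter over the image and take dot products. This toy filter lights up on the "\" stroke of an X (image and filter are my own toy example):

```python
import numpy as np

# A 5x5 "image" of an X drawn in 1s on a 0 background.
X_img = np.array([
    [1, 0, 0, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 0, 0, 0, 1],
])

# A 3x3 filter that responds to a "\" diagonal stroke.
kernel = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
])

def convolve2d(img, k):
    """Valid cross-correlation: slide the filter and take dot products."""
    h = img.shape[0] - k.shape[0] + 1
    w = img.shape[1] - k.shape[1] + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i + k.shape[0], j:j + k.shape[1]] * k)
    return out

print(convolve2d(X_img, kernel))   # strongest responses along the "\" diagonal
```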
https://www.youtube.com/watch?v=UNmqTiOnRfg
Friendly introduction to Recurrent Neural Networks
- Perfect roommate – Apple pie, Burger, Chicken
- Vector / matrix math
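A NumPy sketch of the roommate example as I understand it from the video: with a fixed cooking cycle, predicting tomorrow's dish is one matrix multiply on a one-hot vector of today's dish.

```python
import numpy as np

FOODS = ["apple pie", "burger", "chicken"]

# Column i encodes which dish follows FOODS[i] in the cycle.
transition = np.array([
    [0, 0, 1],   # row "apple pie": it follows chicken
    [1, 0, 0],   # row "burger":    it follows apple pie
    [0, 1, 0],   # row "chicken":   it follows burger
])

today = np.array([1, 0, 0])            # one-hot: apple pie
tomorrow = transition @ today          # vector/matrix math does the "memory"
print(FOODS[int(tomorrow.argmax())])   # -> "burger"
```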
https://www.youtube.com/watch?v=aDW44NPhNw0
Machine Learning: Testing and Error Metrics
- K-Fold Cross Validation - 5:00
- High Recall (medical diagnosis – better to send a healthy person for more tests) vs. High Precision (spam – better to miss a spam)
- F1 Score = Harmonic Mean of precision (x) and recall (y) = 2xy/(x+y)
- Fβ score – if β is small (<1), it weighs towards precision; if large (>1), towards recall
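A tiny helper showing how β shifts the balance (the precision/recall numbers are arbitrary):

```python
def f_beta(precision, recall, beta=1.0):
    """F1 = 2xy/(x+y), the harmonic mean; beta < 1 favors precision, beta > 1 favors recall."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(f_beta(0.9, 0.5))             # F1   ~0.643
print(f_beta(0.9, 0.5, beta=0.5))   # F0.5 ~0.776, leans toward precision
print(f_beta(0.9, 0.5, beta=2.0))   # F2   ~0.549, leans toward recall
```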
- Overfitting / underfitting - 27:00
  - Under – error due to bias – too simple – Bad train, Bad test
  - Over – error due to variance – too specific – Great train, Bad test
- Grid Search Cross Validation - 41:00
- Kernel and Gamma Parameters and Hyperparameters - 42:00
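A sketch of grid search over kernel and gamma with 5-fold cross validation baked in (scikit-learn; iris is just a stand-in dataset, and the parameter grid is arbitrary):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Hyperparameters to sweep; cv=5 runs 5-fold cross validation per combination.
params = {"kernel": ["rbf", "linear"], "gamma": [0.01, 0.1, 1.0], "C": [0.1, 1, 10]}
search = GridSearchCV(SVC(), params, cv=5)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```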
dbpedia.org – Wikipedia, turned into structured knowledge
WordNet. Because WordNet.
https://github.com/dair-iitd/OpenIE-standalone
https://www.w3.org/TR/rdf11-primer/#section-triple
https://en.wikipedia.org/wiki/Semantic_triple
https://www.parson-europe.com/en/knowledge-base/430-knowledge-modeling.html
- RDF / Turtle
- RDFS / RDF Schema
- OWL / OWL DL
- Hierarchies
- Taxonomies
- Ontologies
- SPARQL
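A small rdflib sketch of triples and SPARQL (the example.org people and foaf predicates are the usual toy vocabulary, not taken from any of the links above):

```python
from rdflib import Graph

# A few subject-predicate-object triples in Turtle syntax.
ttl = """
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://example.org/alice> foaf:name "Alice" ;
                           foaf:knows <http://example.org/bob> .
<http://example.org/bob>   foaf:name "Bob" .
"""

g = Graph()
g.parse(data=ttl, format="turtle")

# SPARQL pattern-matches against the triples.
q = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name WHERE { ?person foaf:knows ?friend . ?friend foaf:name ?name . }
"""
for row in g.query(q):
    print(row.name)   # -> Bob
```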
https://github.com/aoldoni/tetre
https://github.com/machinalis/iepy – REALLY __ LOOK AT THIS ONE
https://gate.ac.uk/
https://www.youtube.com/playlist?list=PLO0_lNc5k9lJQnH9CsyXMIBevoZDmYYj0
https://www.cs.waikato.ac.nz/ml/weka/
https://en.wikipedia.org/wiki/Orange_(software)
jupyter notebook
https://www.toptal.com/machine-learning/machine-learning-theory-an-introductory-primer
- Sigmoid Function
https://www.youtube.com/channel/UCYO_jab_esuFRV4b17AJtAw
https://www.youtube.com/watch?v=aircAruvnKk
- Multilayer Perceptron / Plain Vanilla Neural Network
- Sigmoid function squishes outputs to between 0 and 1 – ReLU(a) = max(0, a) is the newer / “better” way
- Bias for inactivity
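A minimal forward pass showing both activations and the bias terms (random weights and toy layer sizes, my own construction):

```python
import numpy as np

def relu(a):                      # the newer activation: max(0, a)
    return np.maximum(0, a)

def sigmoid(a):                   # the classic squashing activation
    return 1 / (1 + np.exp(-a))

rng = np.random.default_rng(0)
x = rng.normal(size=3)            # one input with 3 features

W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # hidden layer: 4 neurons
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # output layer: 1 neuron

# A negative bias would raise the bar for a neuron to "fire" (zeros here).
hidden = relu(W1 @ x + b1)
output = sigmoid(W2 @ hidden + b2)
print(output)                     # a value in (0, 1)
```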
https://www.youtube.com/watch?v=IHZwWFHWa-w
- Cost function
- Back-propagation – making the adjustments towards gradient descent to minimize the cost function
- Doing this finds a local minimum
- Multilayer perceptron – 80’s and 90’s technology
https://www.youtube.com/watch?v=Ilg3gGewQ5U
- Back-propagation
- Stochastic gradient descent – drunk guy stumbling down the hill - 10:00
https://www.youtube.com/watch?v=tIeHLnjs5U8
- Backpropagation calculus
- Super freakin useful.
- Grant (3b1b) on Khan
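A single-neuron sketch of the chain rule at work (my own toy numbers): cost C = (pred - y)^2, pred = sigmoid(w*x + b), and each step moves w and b down the gradient.

```python
import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

x, y = 1.5, 1.0          # one training pair
w, b, lr = 0.1, 0.0, 0.5

for _ in range(200):
    z = w * x + b
    pred = sigmoid(z)
    # Chain rule: dC/dw = dC/dpred * dpred/dz * dz/dw
    dC_dpred = 2 * (pred - y)
    dpred_dz = pred * (1 - pred)
    w -= lr * dC_dpred * dpred_dz * x   # dz/dw = x
    b -= lr * dC_dpred * dpred_dz       # dz/db = 1

print(pred)   # creeps toward y = 1.0
```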
Roadmaps
https://medium.com/@thisismetis/roadmap-how-to-learn-machine-learning-in-6-months-7c501889b545
https://www.quora.com/What-is-the-best-roadmap-to-learn-machine-learning-using-Python