
XCS224N Lecture 1

NLP with Deep Learning: Lecture 1 - Introduction & Word Vectors

These notes cover key concepts from XCS224N, including human language, word meaning, and the word2vec algorithm, along with the foundational mathematical models used in NLP.

Human Language and Word Meaning

Human language is inherently complex due to its social nature. People interpret and construct language based on context, making it challenging for computers to understand and generate. Despite this complexity, deep learning has enabled impressive advancements in modeling language, specifically in representing word meaning using vectors.

Word2Vec Algorithm

One key breakthrough in NLP is the word2vec algorithm, which...
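
As a rough illustration of the idea, here is a minimal skip-gram-style sketch in plain NumPy, assuming a toy corpus and a full-softmax objective; the variable names (W_in, W_out) and hyperparameters are illustrative, not from the course.

```python
# Toy skip-gram: learn word vectors by predicting each context word
# from the center word. Illustrative sketch, not the course's code.
import numpy as np

corpus = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, d, window, lr = len(vocab), 8, 2, 0.05

rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, d))   # center-word ("input") vectors
W_out = rng.normal(scale=0.1, size=(V, d))  # context-word ("output") vectors

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

for epoch in range(50):
    for pos, center in enumerate(corpus):
        c = idx[center]
        for off in range(-window, window + 1):
            ctx = pos + off
            if off == 0 or ctx < 0 or ctx >= len(corpus):
                continue
            o = idx[corpus[ctx]]
            scores = W_out @ W_in[c]   # similarity of the center word to every word
            p = softmax(scores)        # predicted distribution over context words
            err = p.copy()
            err[o] -= 1.0              # gradient of the cross-entropy loss w.r.t. scores
            grad_out = np.outer(err, W_in[c])
            grad_in = W_out.T @ err
            W_out -= lr * grad_out
            W_in[c] -= lr * grad_in

print("vector for 'fox':", W_in[idx["fox"]])
```

In practice word2vec avoids the full softmax over the vocabulary with tricks such as negative sampling or the hierarchical softmax.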

Read more...

Building Micrograd

The spelled-out intro to neural networks and backpropagation: building micrograd

I watched and summarized a lecture that walks through building Micrograd, an automatic differentiation engine and neural network built from scratch. The content covers how backpropagation works, how derivatives are computed, and how neural networks perform gradient-based optimization.

Neural Networks and Backpropagation

Neural networks are essentially functions that map inputs (data) to outputs (predictions). The key to training these networks lies in optimizing their parameters (weights and biases) so that the predictions match the target values as closely as possible. The process of tuning these weights relies...
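
To make this concrete, here is a stripped-down sketch in the spirit of micrograd, assuming only addition and multiplication; the lecture's version adds more operations and builds neuron, layer, and MLP classes on top of the same idea.

```python
# Minimal reverse-mode autodiff sketch (illustrative, not the lecture's exact code).
class Value:
    """A scalar that remembers how it was produced, so the chain rule
    can be applied backwards through the computation graph."""

    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad           # d(a+b)/da = 1
            other.grad += out.grad          # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad   # d(a*b)/da = b
            other.grad += self.data * out.grad   # d(a*b)/db = a
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse order.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# Example: a tiny "prediction" w*x + b; the gradients say how to nudge w and b.
w, x, b = Value(2.0), Value(3.0), Value(1.0)
loss = w * x + b
loss.backward()
print(w.grad, x.grad, b.grad)   # 3.0 2.0 1.0
```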

Read more...

Understanding Transformers

Summary of Attention in Transformers, Visually Explained

I watched and summarized the video Attention in Transformers, Visually Explained. The video breaks down the attention mechanism, a critical component of transformers, widely used in modern AI systems.

What is Attention?

Transformers, introduced in the paper Attention is All You Need, are designed to predict the next token in a sequence. The key innovation is the attention mechanism, which adjusts word embeddings based on the surrounding context. Initially, each token is associated with a high-dimensional vector, known as an embedding. But without attention, these embeddings lack context.

For example, the...
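
As a rough sketch of the mechanism (assuming a single attention head, random weights, and NumPy; the dimensions are illustrative, not from the video):

```python
# Scaled dot-product attention for one head (illustrative sketch).
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n_tokens, d_model, d_head = 4, 16, 8

X = rng.normal(size=(n_tokens, d_model))   # context-free token embeddings
W_q = rng.normal(size=(d_model, d_head))   # query projection
W_k = rng.normal(size=(d_model, d_head))   # key projection
W_v = rng.normal(size=(d_model, d_head))   # value projection

Q, K, V = X @ W_q, X @ W_k, X @ W_v
scores = Q @ K.T / np.sqrt(d_head)         # how relevant each token is to each other token
weights = softmax(scores, axis=-1)         # each row sums to 1
contextualized = weights @ V               # embeddings adjusted by the surrounding context
print(contextualized.shape)                # (4, 8)
```

Each row of weights says how much every other token should influence that token's updated representation.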

Read more...

The Future of LLMs

Predictions on the Future of Large Language Models: Insights from Andrej Karpathy

In a recent conversation, Andrej Karpathy, a leading figure in AI, made several fascinating predictions about the future of Large Language Models (LLMs) and AI technology. Here are some of the key takeaways:

1. Synthetic Data as the Future

Karpathy believes that synthetic data generation will be crucial for the development of future LLMs. As we near the limits of internet-sourced data, generating synthetic, diverse, and rich data will become the main way to push models forward. He warns, however, of “data collapse,” where...

Read more...

3Blue1Brown: dot product

Dot Product: Key Insights

Numerical Definition

The dot product of two vectors is the sum of the products of their corresponding components:

\( \mathbf{v} \cdot \mathbf{w} = \sum_{i=1}^{n} v_i w_i \)

For example:

\( [1, 2] \cdot [3, 4] = 1 \cdot 3 + 2 \cdot 4 = 11 \)
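
The same computation in NumPy (illustrative, not from the video):

```python
import numpy as np

v = np.array([1, 2])
w = np.array([3, 4])

# Sum of the componentwise products: 1*3 + 2*4
print(np.dot(v, w))                          # 11
print(sum(vi * wi for vi, wi in zip(v, w)))  # 11
```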

Geometric Interpretation

The dot product can be seen as the length of the projection of one vector onto the other, scaled by the length of the vector being projected onto:

\( \mathbf{v} \cdot \mathbf{w} = \|\mathbf{v}\| \, \|\mathbf{w}\| \cos(\theta) \)

Where \( \theta \) is the angle between the vectors. It’s positive if they point in the...
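
As a quick check that the algebraic and geometric formulas agree, here is a small NumPy example using the same vectors as above; the angle is computed independently from each vector's direction.

```python
import numpy as np

v = np.array([1.0, 2.0])
w = np.array([3.0, 4.0])

algebraic = np.dot(v, w)                                  # 1*3 + 2*4 = 11
theta = np.arctan2(v[1], v[0]) - np.arctan2(w[1], w[0])   # angle between the two vectors
geometric = np.linalg.norm(v) * np.linalg.norm(w) * np.cos(theta)

print(algebraic)            # 11.0
print(round(geometric, 6))  # 11.0
```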

Read more...