regression grind

a dump of my plan + notes for studying for my finals for a class that i should be doing well in but am not, because apparently i'm just not good at math and stats. might be the most i've studied for a class ever in my life

go through

  • notes
  • hw
  • quizzes
  • pq1, q1
  • pq2, q2
  • pfinals

prediction

  • distribution of SSE (and sigma_hat^2)
  • E(SSE)
  • show y_hat is independent of the residuals
  • distribution of beta_hat
  • log reg: why use the logit? issues with the linear model
  • explain what h_ii is
  • why 0 < h_ii < 1
  • what is stud(e_i)?
  • PRESS
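
the standard linear-model results most of these questions point at, collected as a quick reference (my notation: p = number of parameters including the intercept, H = X(X'X)^{-1}X' the hat matrix):

```latex
\frac{SSE}{\sigma^2} \sim \chi^2_{n-p}
\;\Rightarrow\; E(SSE) = (n-p)\,\sigma^2,
\qquad \hat\sigma^2 = MSE = \frac{SSE}{n-p}

\hat\beta = (X^\top X)^{-1} X^\top y \;\sim\; N\!\left(\beta,\; \sigma^2 (X^\top X)^{-1}\right)

% \hat{y} = Hy and e = (I - H)y are uncorrelated (independent under normality)
% because \mathrm{Cov}(\hat{y}, e) = \sigma^2 H(I - H) = 0

\mathrm{stud}(e_i) = \frac{e_i}{\sqrt{MSE\,(1 - h_{ii})}},
\qquad h_{ii} = [H]_{ii} \in [0, 1]
```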

Outliers

  • outlier in x: leverage h_ii > 3p/n
  • outlier in y: discrepancy, |studentized e_i| > t(n-1-p, 1-a/2)
  • both: influence, Cook's distance > 4/n
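
a minimal statsmodels sketch of all three diagnostics on simulated stand-in data (the dataset and variable names are assumptions, not from the class):

```python
# hedged sketch: leverage, studentized residuals, and Cook's distance via statsmodels
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(50, 2)))   # n = 50, p = 3 (incl. intercept)
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=50)

fit = sm.OLS(y, X).fit()
infl = fit.get_influence()
n, p = X.shape

leverage = infl.hat_matrix_diag                  # h_ii -> outlier in x if > 3p/n
studentized = infl.resid_studentized_external    # compare |t_i| to t(n-1-p, 1-a/2)
cooks_d = infl.cooks_distance[0]                 # influence -> flag if > 4/n

print("high leverage:", np.where(leverage > 3 * p / n)[0])
print("influential:  ", np.where(cooks_d > 4 / n)[0])
```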

multicollinearity

problems: inflated SEs

checks:

  1. coefficients swing / change sign despite a significant F-test
  2. correlation matrix
  3. VIF (see the sketch after the solutions list)

solution

  1. drop
  2. feature engineer
  3. regularized regression
  4. dimensionality reduction
  5. partial least squares
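
a minimal sketch of check #3 (VIF), with two deliberately collinear columns (data simulated; names assumed):

```python
# hedged sketch: variance inflation factors with statsmodels
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
df = pd.DataFrame({
    "x1": x1,
    "x2": x1 + 0.05 * rng.normal(size=100),  # nearly collinear with x1
    "x3": rng.normal(size=100),
})
X = sm.add_constant(df)

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing x_j on the other
# predictors; a common rule of thumb flags VIF > 10 (some use > 5)
for i, col in enumerate(X.columns):
    print(col, variance_inflation_factor(X.values, i))  # x1, x2 blow up; x3 stays ~1
```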

heteroskedasticity

  • coefficient estimates are still unbiased

detection

  • residual plot

problem: OLS is no longer BLUE -> wrong SE(beta) and CI/PI widths

solution

  1. log / square-root transform
  2. boxcox
  3. robust SE
  4. WLS
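
a minimal statsmodels sketch of solutions 3 and 4 (robust SEs and WLS); the data and the assumed variance model (Var(e_i) proportional to x_i^2) are illustrative only:

```python
# hedged sketch: robust (HC) standard errors and weighted least squares
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(1, 10, 200)
y = 2 + 3 * x + rng.normal(scale=x, size=200)   # error sd grows with x
X = sm.add_constant(x)

ols = sm.OLS(y, X).fit()
robust = ols.get_robustcov_results(cov_type="HC3")  # same betas, corrected SEs
print("robust SEs:", robust.bse)

wls = sm.WLS(y, X, weights=1.0 / x**2).fit()    # weights = 1/Var(e_i), up to a constant
print("WLS SEs:   ", wls.bse)
```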

if e_i shows a nonlinear pattern, use nonparametric regression (kNN, moving average)

non-normal

  • still BLUE
  • but no valid inference

detection

  • histogram
  • qq plot
  • test for normality: shapiro

for a normal distribution: skewness = 0 (third moment), kurtosis = 3 (fourth moment)

  • omnibus K^2 test (want a high p-value, i.e. fail to reject normality)
  • JB test

problems: unreliable t-tests, wrong CI/PI
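
a quick scipy sketch of the checks above, run on (simulated stand-in) residuals:

```python
# hedged sketch: Shapiro-Wilk, D'Agostino K^2 omnibus, and Jarque-Bera on residuals
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
resid = rng.normal(size=200)          # stand-in for model residuals

print(stats.shapiro(resid))           # Shapiro-Wilk
print(stats.normaltest(resid))        # omnibus K^2: combines skewness + kurtosis
print(stats.jarque_bera(resid))       # JB: also built from skewness + kurtosis
# high p-values => fail to reject normality, which is what valid inference needs
```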

false assumption of linearity

  • transform y -> may introduce heteroskedasticity where errors were homoskedastic
  • transform x -> nice when the only problem is non-linearity
  • transform both

Model selection, underfitting (omitted predictors): biased coefs + predictions (under/overestimate), overestimated sigma^2

extra vars: still unbiased, but MSE has fewer degrees of freedom, wider CIs and lower power

overfitting (multicollinearity): inflated SEs for coefs, possibly rank-deficient design

adjusted R^2 = 1 - MSE/MST = 1 - [SSE/(n-p)] / [SST/(n-1)], which takes into account the "cost" of losing DF

Mallows Cp

  • identify subsets where Cp is near k+1, where k is the number of predictors
  • this means bias is small
  • if none are near k+1, a predictor is missing
  • if several are, choose the model with the smallest Cp
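
the formula behind the k+1 rule (with p = k+1 parameters including the intercept):

```latex
C_p = \frac{SSE_p}{MSE_{\text{full}}} - (n - 2p),
\qquad E(C_p) \approx p = k + 1 \ \text{when the subset model has negligible bias}
```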

AIC, BIC

  • estimates the information lost by a model
  • trades off goodness of fit vs simplicity, penalized by the number of model params (p)
  • larger penalty term in BIC than AIC: ln(n)p vs 2p
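
for reference, the two criteria side by side (L-hat is the maximized likelihood):

```latex
\mathrm{AIC} = 2p - 2\ln\hat{L},
\qquad \mathrm{BIC} = p\ln(n) - 2\ln\hat{L}
% same fit term; BIC's penalty \ln(n)\,p exceeds AIC's 2p once n > e^2 \approx 7.4
```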

PRESS

  • a modified SSE: uses the predicted value for the ith obs from a model fit on the data excluding that point
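
written out, with the leverage shortcut that makes it computable from a single fit (no refitting per point):

```latex
\mathrm{PRESS} = \sum_{i=1}^{n} \left( y_i - \hat{y}_{(i)} \right)^2
             = \sum_{i=1}^{n} \left( \frac{e_i}{1 - h_{ii}} \right)^2
```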

10/9/2024

data @ grammarly

Because of the routines we follow, we often forget that life is an ongoing adventure...and the sooner we realize that, the quicker we will be able to treat life as art: to bring all our energies to each encounter, to remain flexible enough to notice and admit when what we expected to happen did not happen. We need to remember that we are created creative and can invent new scenarios as frequently as they are needed.

– Maya Angelou

went to a grammarly fireside chat with a few directors in their data org

what you look for when hiring

  • base technical skills: sql, python, data manipulation (least important)
  • is this person going to make our business successful? is grammarly going to grow the user base, make money
  • people who understand business problems, can frame problems and solve them, has applied acumen
  • create experiments that can drive business

recommendations for early data career

  • find right mentors, strong set of leaders and peers, that will teach you more than anything else
  • exposure to multiple aspects of problem solving, not just technical stuff, but cross-collaborate with stakeholders, how to report status, how to link your output with business impact, communication skills
  • look into the tech stack of the company, work on the latest stuff
  • company size, startup vs large company? structure is important

day2day for new hire on data platforms

  • ingestion side, infra to bring data in
  • data governance, compliance
  • data eng: data modeling, transformations
  • analytics & ML: data cleanliness and quality and availability
  • everyone is a strong SWE, systems engineer (infra), data engineer (DE)
  • strong generalist SWE
  • databricks

big vs small company

  • think about impact vs prestige
  • big: prestige (name brand + density of talent)
    • mentoring on best practices, how to function these huge machines
  • small: impact
    • have stories that your work directly move X millions of revenue
  • find a company that can give you impact (bullet points on resume) + prestige

role of data science evolving (skills?)

  • understanding the business context, the space you're working in, the elements you need to solve that
  • less on fundamentals of the models, more on the application
  • foundational statistics and knowledge intuition
  • when things don't go well, really understanding exp design, causality, foundations of statistics will matter the most

AI companies

  • robust system to measure quality of AI, human eval, responsible AI metrics
  • subject matter experts on staff, people that can inform the models, everything that AI does is informed by human choices, who is making the choices?
  • what does the data infra look like? is that company investing in the data? serious test on companies investment on AI

most innovative + impactful project

  • orchestration framework for potential accounts to reach out to for sales team (ml, experimentation, llms to produce content)
  • 50b events a day, making informed decisions from this
  • suggestion quality, what is quality?

grammarly in llm era

  • huge advantage in user context across different services
  • lots of potential to innovate, supercharged efforts
  • creates interesting data for feedback mechanism for marketing

some interesting work i encountered on their engineering blog


migraine was back in the morning and i couldn't do anything at all besides cook lunch. acetaminophen kicked in 3 hours after i took it at 11am and i could finally function for most of today. pretty cool that i got to meet alumni from cohort 11. when i graduate, i'll be meeting future students too. and i'll be saying the same things and giving them advice.

10/8/2024

migraines

"Nothing is absolute. Everything changes, everything moves, everything revolves, everything flies and goes away." – Frida Kahlo

i had a really bad migraine today again, where i just feel really nauseous and dizzy. i walk slower. i'm perpetually thirsty. i feel sick down to my bones. when i try to focus on something, i can feel my brain compressing. not really sure why this is happening. it's been like this since i got my braces readjusted. it's probably that. i was told by my nurse it should be better this time though. hope this won't last for another 16.5 months.

my productivity and overall output have dramatically decreased. this is not ideal, especially during finals week. i'm only 22, about to turn 23. i shouldn't be this tired every day, getting headaches and feeling sick so often. there are old people in their 70s with more energy than me. the elderly in church have more youthful energy. what can i do to feel better? maybe it's sleep. i need to get 8 hours. i've been getting 6 these days. time for bed. nothing is going into my brain today.

10/7/2024

world communion sunday

went back to the presbyterian church for their special combined service. cantonese church services are fun. i like it when i get to learn both chinese/cantonese and about the bible at the same time.

main takeaway from the sermon is to reflect on the authenticity of my faith. to not just say yes with my lips and then do another thing. to examine my heart, not just my actions. God's grace is extended to all, and this grace transforms our hearts and minds. faith without action is DEAD (James 2:17), but from true faith, actions and good works ensue.

they had so much food. had conversations with old people. talked about malaysia and different cultures through food. talked about the pains you get when you start getting older. the members here have known each other for decades, and it made me miss home. i realize i won't get to experience growing old with a church. if i do achieve my dreams and immigrate, i'll be moving from church to church. it made me think about where i'll grow roots, where i'll settle down in the future, and how that shapes my life as i grow older.

weather was so hot it was worse than malaysia. so glad it's getting colder from now on. really made me appreciate the fact that i'm in SF, where the weather is basically perfect, and how much weather affects well-being.

tried to study but kept context switching between paper trail, the ray embeddings project, fixing the regression homework, and other stuff.

can't wait for finals to be over. studying for exams is one of the most time-consuming, stress-inducing, mentally exhausting tasks, with little ROI depending on how you learn and how the exams are structured. i suppose you learn the value of deeply understanding concepts, having high precision in your responses, and being able to perform under pressure.

10/6/2024

meet joe black

decided to rewatch meet joe black for the 3rd or 4th time? a 3 hour movie that's slow, but has beautiful writing and score. you feel a lot of the emotions of the characters. they have this coffee shop scene that i really like.

a few thoughts

  • prime brad pitt and his blonde hair style
  • claire forlani's eyes and sophistication
  • anthony hopkins' presence and authority throughout the movie
  • the way he describes about his wife
  • joe and his obsession with peanut butter
  • the jamaican woman in the hospital
  • “eryting goin to be irie now”
  • claire's realization of who joe is while they're hugging each other at the party
  • death and taxes
  • him saying goodbye to his daughter

many more thoughts but it's late. overall this movie made me think about how mysterious life is, and the shortness of it. how all your achievements and work and material gains reduce down to nothing but the relationships you have, the kindness you've shown to others, and the good work you've done.

to live a long, peaceful life, to have experienced love, to have worked hard and suffered to build a legacy, to go through the ups and downs with family and friends, to be so lucky that you can wake up one day and say "I don't want anything more". that is true happiness.


some quotes

Joe Black: I don't care Bill. I love her.
William Parrish: How perfect for you - to take whatever you want because it pleases you. That's not love.
Joe Black: Then what is it?
William Parrish: Some aimless infatuation which, for the moment, you feel like indulging - it's missing everything that matters.
Joe Black: Which is what?
William Parrish: Trust, responsibility, taking the weight for your choices and feelings, and spending the rest of your life living up to them. And above all, not hurting the object of your love.
Joe Black: So that's what love is according to William Parrish?
William Parrish: Multiply it by infinity, and take it to the depth of forever, and you will still have barely a glimpse of what I'm talking about.
Joe Black: Those were my words.
William Parrish: They're mine now.

William Parrish: Love is passion, obsession, someone you can't live without. I say, fall head over heels. Find someone you can love like crazy and who will love you the same way back. How do you find him? Well, you forget your head, and you listen to your heart. And I'm not hearing any heart. Cause the truth is, honey, there's no sense living your life without this. To make the journey and not fall deeply in love, well, you haven't lived a life at all. But you have to try, cause if you haven't tried, you haven't lived.

William Parrish: Yeah, I certainly hope so. Yeah. I loved Susan from the moment she was born, and I love her now and every minute in between. And what I dream of is a man who will discover her, and that she will discover a man who will love her, who is worthy of her, who is of this world, of this time, and has the grace, compassion and fortitude to walk beside her as she makes her way through this beautiful thing called life.

Jamaican Woman: It nice it happen to you. Like you come to the island and had a holiday. Sun didn't burn you red-red, just brown. You sleep and no mosquito eat you. But the truth is, it bound to happen if you stay long enough. So take that nice picture you got in your head home with you, but don't be fooled. We lonely here mostly too. If we lucky, maybe, we got some nice pictures to take with us

William Parrish: I thought I was going to sneak away tonight. What a glorious night. Every face I see is a memory. It may not be a perfectly perfect memory. Sometimes we had our ups and downs. But we're all together, and you're mine for a night. And I'm going to break precedent and tell you my one candle wish: that you would have a life as lucky as mine, where you can wake up one morning and say, "I don't want anything more." Sixty-five years. Don't they go by in a blink?

William Parrish: It's hard to let go, isn't it?
Joe Black: Yes it is, Bill.
William Parrish: And that's life... what can I tell you.

10/5/2024

apostles' creed

I believe in God, the Father almighty, creator of heaven and earth. I believe in Jesus Christ, his only Son, our Lord, who was conceived by the Holy Spirit and born of the virgin Mary. He suffered under Pontius Pilate, was crucified, died, and was buried; he descended to hell. The third day he rose again from the dead. He ascended to heaven and is seated at the right hand of God the Father almighty. From there he will come to judge the living and the dead.

I believe in the Holy Spirit, the holy catholic* church, the communion of saints, the forgiveness of sins, the resurrection of the body, and the life everlasting. Amen.

*that is, the true Christian church of all times and all places

notes on the apostles' creed from Pemuda (young adult service)

  • hell and hades is different
  • catholic means universal, not roman catholic

Sheol/Hades = the realm of the unbelieving dead, a temporary place where they await resurrection

Gehenna = the permanent and final place of judgment for the lost

prophecy of david -> claim of Peter at Pentecost: Ps 16:9-10 -> Acts 2:29-31

Hades

  • rich man in torment, looked up and saw far away, Lazarus by the side of Abraham
  • there is a gulf in Hades, split into two
  • Luke 16:19-23

Jesus and the criminal

  • Today you will be with me in paradise (Luke 23:42-43)

Jesus preached in Hades

  • He went and preached (proclaimed) to the spirits in prison, who formerly were disobedient
  • 8 souls were saved
  • spirits in prison = unbelievers in Noah's time (a place for unbelievers)
  • 1 Peter 3:18-20

imprisoned spirits?

  • angels who did not keep their positions of authority but abandoned their proper dwelling (Jude 6)
  • "For if God did not spare angels when they sinned, but sent them to hell, putting them in chains of darkness to be held for judgement" (2 Pet 2:4)

Hell is Tartarus, the deepest pit of Hades. not all demons are there. some fallen angels who cohabited with women are there

conclusion

  • Jesus really died and went to the realm of the dead
  • he proclaimed victory over death and the coming judgement; he did not preach salvation to the unbelievers in Hades

old man banging on the roof for almost an hour last night made me groggy and sleepy the entire day. semester is almost ending. next week is finals. then it'll start all over again. there are so many things i want to do

wrote a bunch of things that came to my mind that I want to achieve during the car iq seminar series.

  1. PyTorch, implement model architectures from scratch
  2. Genesis - KG of bible
    • how to parse data? by book
    • verse by verse chunking - or chapter? Both?
    • use chroma db? pg vector?
    • what prompt to use for node and edge creation
    • can one line have multiple head nodes?
    • visualize embeddings on nomic
    • query with GraphRAG? vs normal RAG
    • try contextual embeddings
    • can embeddings find references to OT verses from NT?
    • how can this be used for bible studies?
  3. LLM agents course
    • write an article per course
    • code implementation for each paper -> mega repo called agents
    • build working agents
    • take notes of papers
  4. Google AI chrome hackathon
    • clippy: a context-grabbing extension that tracks what you're doing on the web and summarizes it; it accumulates over time in a vector db so you don't have to manually tag or organize

hard to figure out what to focus on. hopefully the meta internship narrows down what i have to focus on, helps me realize what i'm interested in and truly want to learn, and forces me to upskill in pytorch and deep learning and ml.

10/4/2024

japanese beef curry

migraine was really bad again today. is it a lack of quality sleep? stress? braces? dehydration? screens? so many potential causes. i got home, just laid on the floor, and was knocked out for an hour.

first time cooking japanese beef curry today with my mueller 6 qt enameled cast iron dutch oven. the entire process took 3 hours end-to-end. taste-wise, the flavor could be better, and my beef wasn't soft enough. i think it needs curry powder and bay leaves. and i shouldn't use sirloin for the beef, but chuck roast.

spoke to someone working at stripe as a DA.

a few takeaways below.

3 components of work

  • define key metrics to track
  • running AB tests, interpret results, provide product strategy and recommendations
  • deep dive analysis, sizing opportunity, i.e. forecasting cost for next year, causal inference

writing culture at stripe

  • always write a doc when making a decision, get opinions from others, have a review session
  • compared to traditional powerpoint culture, no need to spend time on making things look nice

stripe vs MSFT

  • stripe is growing, still smaller than FAANG, fast moving culture, cares more about startups and smaller business
  • microsoft is large, things move slower. more solid structure for doing things
  • at stripe, you just do things, there's good and bad side
  • MSFT has a lot of usage data, how people use office and teams to run analysis to better serve users. ex: how to acquire more seats (more subscriptions for office products)

stripe DS vs DA

  • DS runs more ML models than DA
  • DS would deploy a model
  • MLEs would do even more engineering type of work, setup envs, MLOps
  • experiments are fun, you can learn something totally different from what you expected

experimentation at stripe

  • coordinate with eng on where to put code
  • do upfront analysis for how many samples are needed and which metrics to track
  • AA testing to ensure the experiment setup is right, so you don't waste time collecting bad data
  • it's more manual; microsoft is more streamlined for running experiments

experimentations are moments of truths

  • experiment results gives you learnings for product roadmap
  • i.e. results showed cost savings but also a decrease in conversion rate; run further tests to decide an action plan. do you care more about cost or the customer?

advice on career

  • look at people ahead of you and where they landed
  • gather data points
  • it's your choice in the end
  • you have to try, sometimes you never know, you have to do it and see if you like it or not
  • it's a luck thing to find what you love

10/3/2024

ray summit day 2

Samsara

what? IoT for operations

mission: improve the safety, efficiency, and sustainability of the operations that power the global economy.

applications

  • video-based safety: dash cams
  • telematics: sensor data from vehicles, real time gps, routing, maintenance
  • workforce apps: compliance and training
  • connected equipment: location tracking, utilization
  • site visibility: remote visibility, proactive alerting

-> ai insights

ex:

  • driver performance
    • score, distance driven, total behaviours, % over speed limit
  • safety inbox
    • harsh events / speeding events
    • review dashcams of drivers that had accidents, coach the driver
  • seatbelt detection
  • drowsiness detection

Recursion

what? biotech company with >50 PB proprietary biological, chemical, and patient-centric data

in drug discovery, everything is a model (scientific)

each model is a proxy built by scientific experts

data -> database -> ds analyze data

ai-based drug discovery, we model (ML) all the models (scientific)

cell images are information rich and inexpensive and scalable

runs >2.2M experiments per week

the idea is you dose diseased cells with increasing levels of concentration of medicine, and observe the effects

images are analyzed by masked autoencoders (MAE)

their paper: Masked Autoencoders are Scalable Learners of Cellular Morphology (code)

why use this? it's not about generating image of cells

masked autoencoding: self-supervised generative AI

reason: the intermediate layer between encoder and decoder carries a lot of useful information that can be used for downstream tasks

  • MAE embedding: how images are different from control and perturbed cells

25.7% increase in expressed gene knockouts detected with the new model

with this approach, we discover NEW models of biology

instacart

core customers is family

answering the question "what's for dinner?"

#1 chatbot

  • expected: how to eat healthier, meal plan
  • reality: customer support, edit order, order details

it turned into a customer support chatbot

#2 search

ask instacart

soccer snacks for 20 kids

#3 catalog generation

today instacart has 85k stores, billions of items

bought one of every item, took photos, sent them off to extract attributes, put them on the site. they've done this for millions of items

catalog augmentation today:

take product information + images -> LLM -> catalog

this is 10x cheaper

#4 internal productivity

Ava to democratize AI use across company

  • internal chat platform with prompt library, access to latest models
  • built into slack
  • internal slack room to train employees on genAI

lessons from AI boom

  • invest in AI usage across your company
  • make tooling usable by your whole team

future of instacart: even more personalized down to nutrition labels

kevin weil

  • how is prod management different at openai
    • depends on culture of company, twitter is consensus culture
    • sam is visionary, pushes us to think big, but leaves room to build the right things
    • the technology floor is not fixed at openai; it's doing what computers could never do before
    • you can't quite see it coming, you can kind of see it through the mist, you often don't know until the model is baked
    • the way you build a product for an 80% correct thing is different from a 99% correct thing
  • strategy for developer facing roadmap
    • philosophy: more AI = better for the world
    • bring new capabilities, multimodal voice API, distillation
    • more intelligence, cheaper, faster, safer
    • gpt-4o mini costs 1% of what gpt-3 cost when it launched, and it's massively smarter and safer; in <2 years that's a 99% reduction in cost
    • the cheaper AI is, the more problems we can solve
  • open source models
    • from mission perspective: getting more AI into hands of people
    • open source is a great strategy
  • competing with cloud providers
    • microsoft: openai
    • google: more direct competitor
    • amazon: anthropic
    • competition makes all of us better, users get better models
    • up to openai to take more risks in the product
  • o1 use cases
    • let model think for 5 hours, days or months for hard research questions
  • move from AI answering questions for you -> AI doing things for you
    • not 5 minute tasks, but 5 hour tasks
    • even mundane and efficiency-focused things take reasoning, to take in a complex world and act on it
    • reasoning will be a big part of the future
  • it will be a different way of building product, things will become async
  • build the things that just don't quite work today, and in 6 months, you'll be ahead of everyone else
  • if you're building something and you're afraid of someone's next model launch, you may not be building the right thing
  • if you're building something where you're looking forward for the next model launch, that's the right place to be
  • ai consumer product monetization
    • social media was ads, what about AI? subscriptions?
    • legal: $1000/hr for 6 hours to write a document can be done in 5 minutes with o1 for $3 of API credit
    • how do we share value of creation, bringing AI to the rest of the world that may not be able to pay
  • planning chatgpt roadmap analogies
    • think of systems like another human
    • people don't just blurt out CoT or go mute for 60 seconds, people give periodic thoughts when they're thinking, how to reflect this in the product?
    • the way people write and talk have discrepancies, written and spoken English are different languages
    • build neediness in voice mode to get the conversation going
    • PMs shape personality of models
  • richer responses
    • model responses need to be richer, now it's a lot of back and forth of text
    • it needs to be more naturally integrated with voice and video
    • how long will chat interface hold?
  • left/right leaning, politics
    • on 50/50 topics, models should not take a stance
    • model spec: this is what the stance we expect our models to follow
    • if the model is not behaving in a natural way, two reasons
      • not following spec: fix the bug
      • you disagree with the spec: a debate we can have
  • who writes the specs?
    • we employ writers to get its emotions right
    • twitter: "no matter how many smart people you have within your walls, there are way more smart people outside your walls"
    • at openai, the philosophy is iterative deployment: when it comes to safety and societal aspects, getting models out there and progressively exposing them to broader groups of people is how we slowly drive positive change
    • model spec is public so it receives feedback from people around the world
  • future
    • personalized tutoring for kids
    • we are more eval-limited than intelligence-limited
    • applying datasets that are not public
    • making the model really good at specific things

petabyte scale embedding generation

embedding intro

  • vectorized numerical representation of information
    • closeness in vector space = data itself is similar
  • powers search, classification, chatbots, RAG-based apps
  • involve model inference

offline embedding generation

  • 3 characteristics: huge data volume (text, audio, video), predictable workload (optimize for throughput), heterogeneous compute
  • various input data formats
    • raw text
    • semi-structured: JSON, JSONL
    • structured data: parquet, Avro
    • table format: iceberg
    • import ray.data -> ray.data.read_datasource(CustomerDataSource())
  • preprocessing on CPU
    • preprocessing steps - tokenization, image resizing, audio/video decoding
    • filtering - conditional based filtering
    • special step - customized preprocessing logic in python
    • you can scale each step independently by providing more CPU cores
  • partial data preprocessing
    • gpu-based preprocessing (in batches)
      • ex: image transformation, transformation workload that works better for GPU
    • embedding model inferences: run embedding model
  • compute resources utilization
    • with ray.data: read data in mini batches on CPU, do preprocessing, transfer through the object store buffer, move to GPU, run embedding model inference
    • resource utilization could be low if any step becomes a bottleneck, mainly GPU preprocessing and embedding model inference
    • when a batch finishes, there should be another batch already loaded in memory
  • two jobs
    • #1 pure CPU job, transformation -> intermediate data storage
    • #2 embedding generation job -> vector db
    • why? ray data is not great at handling heterogeneous compute, so split into two
    • intermediate data is formatted in a way for job #2 to consume, so GPU utilization is high
    • problems
      • two jobs to maintain
      • orchestration solution needs to be introduced
      • wasted I/O time
  • takeaways
    • flexibility
      • flexible input types and output sources
      • custom preprocessing logics
      • support different embedding models
      • support different accelerators
    • high performance
      • scale each step independently
      • stream execution to avoid GPU idle time (key for high throughput)
    • scalability
      • scale to thousands of CPU and GPU nodes
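
a minimal ray.data sketch of the streaming CPU-preprocess -> GPU-embed pattern described above; the input path, model choice, and batch/actor counts are all illustrative assumptions, not from the talk:

```python
# hedged sketch: offline embedding generation as a streamed ray.data pipeline
import numpy as np
import ray

def preprocess(batch: dict) -> dict:
    # CPU stage: tokenization/cleaning would go here; scaled by adding CPU cores
    batch["text"] = np.array([t.strip().lower() for t in batch["text"]])
    return batch

class Embedder:
    # GPU stage: one model per actor, reused across batches to keep GPUs busy
    def __init__(self):
        from sentence_transformers import SentenceTransformer  # assumed model lib
        self.model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")

    def __call__(self, batch: dict) -> dict:
        batch["embedding"] = self.model.encode(list(batch["text"]))
        return batch

ds = (
    ray.data.read_parquet("s3://bucket/corpus/")   # assumed input location/format
    .map_batches(preprocess, batch_size=1024)      # CPU step, scales independently
    .map_batches(Embedder, batch_size=256,
                 num_gpus=1, concurrency=4)        # pool of 4 GPU actors, streamed
)
ds.write_parquet("s3://bucket/embeddings/")        # or push to a vector db
```

the streaming executor keeps the next batch staged in memory while the GPU works, which is the "avoid GPU idle time" point from the takeaways.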

online embedding generation with ray

why online?

in a RAG app, we get a user query, run preprocessing steps, generate an embedding, use this embedding to run similarity search against the embeddings from offline generation, pass the results to the model of choice, and return a response

requirements

  • generate in real time with high availability and low latency
  • input can be in many formats
  • preprocessing involves multiple steps / model inference
  • processing step involves CPUs and GPUs
  • easy to use, validate, and iterate

real time processing

  • RayServe hosts a live endpoint for a real time query
  • hosts multiple routes with different HTTP methods
  • ability to specify a full suite of configurations

highly available and efficient autoscaling

  • global control service fault tolerance for graceful recovery from failures
  • default and custom health checks
  • configurable replica step per deployment

complex logic and heterogeneous cluster

  • support for model composition
  • each of models/deployments can scale independently
  • support for heterogeneous RayCluster
  • ex: text sanitization and translation before embedding a text query
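
a minimal Ray Serve sketch of such an endpoint; the model, route, and autoscaling numbers are illustrative assumptions:

```python
# hedged sketch: a real-time embedding endpoint with Ray Serve
from ray import serve
from starlette.requests import Request

@serve.deployment(
    autoscaling_config={"min_replicas": 1, "max_replicas": 8},  # scales per deployment
    ray_actor_options={"num_gpus": 0},  # set to 1 for a GPU-backed replica
)
class EmbedQuery:
    def __init__(self):
        from sentence_transformers import SentenceTransformer  # assumed model lib
        self.model = SentenceTransformer("all-MiniLM-L6-v2")

    async def __call__(self, request: Request) -> dict:
        payload = await request.json()
        # sanitization / translation steps could compose here as other deployments
        return {"embedding": self.model.encode(payload["text"]).tolist()}

app = EmbedQuery.bind()
# serve.run(app, route_prefix="/embed")  # then POST {"text": "..."} to /embed
```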

rewatched lectures, watched office hour recordings, wrote out my notes twice, was on campus from 3:30 p.m. till 10 p.m. i put more effort into this class than i did for the last quiz; hope that reflects in the exam. i don't know why i got so anxious for this test. it's the lingering effect of blanking on the last quiz that is spurring fear-based motivation in me to get all the details right, to fully understand everything without skipping any details or taking any shortcuts. this might be the most i've studied for any subject. i don't know why i'm struggling so much with this material. might be my weak foundation in math.

10/2/2024

ray summit day 1

a dump of notes i took for ray summit day 1

ion stoica

  • mid 2000s : big data and classical ML with hadoop and spark
  • mid 2010s: deep learning and RL. GPUs started to become indispensable
  • mid 2020s: GenAI

5 trends

  1. Scale
    • growing 5x every year
    • cost of training 10x every 2 years
    • this happens in inference too: the o1 model takes 10s of seconds and more context
  2. massive unstructured data
    • text, audio, image, video
  3. sophisticated post-training
    • pruning & distillation
  4. AI powering the AI stack
    • ai is used to optimize model development
  5. compound AI and agentic system
    • involves 100s of models
    • exploring LLM based intelligent agents paper

these trends spur innovation in

  • hardware accelerators by NVDA, AMD, Aws, Intel, etc.
  • GPU pods (clusters)

CPU-centric -> accelerator centric world

  • AI clouds (lambda, aws)
  • Frameworks (SGL, VLLM, TensorRT-LLM)
  • Tools for monitoring
  • Models (hugging face)

engineers spend time writing yaml files and troubleshooting kubernetes

we need a software engine

  • support any ml workload
  • any data types and model architecture
  • fully utilize any accelerators
  • scale from laptop to thousands of GPUs
  • abstract away complexity of infra from end developer
  • serve as flexible and unifying platform for entire AI ecosystem

AI compute engine

3 core problems

  • managing compute resources
    • autoscaling, spot instance support, hardware failure handling
  • managing data
    • distributed object store, shared memory, futures, optimized data movement (NCCL, RDMA)
  • executing workloads
    • scheduling, fault tolerance, management of stateless and stateful tasks, dynamic and compiled graphs

case studies

  • instacart: training on 100x more data
  • niantic: cut LOC by 85%
  • canva: cut cloud cost by 50%

announcements

default execution in ray

  • dynamic memory allocation
  • expensive copy GPU-to-CPU memory
  • expensive transfer over slow network
  • pass args and references

solution: compiled graphs

  • create and compile a static graph to execute repeated tasks
  • pre-allocate static buffers; reuse them in many places
  • no need to pass args and result references
  • direct GPU-to-GPU transfer

ray data

unstructured data is the fastest growing use case

they require mixed CPU and GPU compute

ray handles

  • streaming ingest
  • last-mile preprocessing
  • ingest for training

spark and hadoop are CPU-based and work best on structured, tabular data

AI workloads are GPU-centric and require unstructured data

amazon cut cost by 82% moving from spark to ray data, cutting $120 mil a year

runway

runway ML is focused on world modeling with visual data.

many aspects of the world are not captured through language; it's a lossy representation. using video data, their gen-3 alpha has emergent capabilities of understanding physics, like how liquid flows and water splashes, even though it was not trained on that

they referenced Scalable Diffusion Models with Transformers which sparked SOTA image generation

other research papers on their website

gen-3 alpha challenges

  • sample sizes are orders of magnitude greater than for language models; network challenges; need to handle communication/computation overlap well
  • data preprocessing challenges, dataset curation, quality of data is important

science is about modeling the distribution

art is going out of distribution

the more you can model reality, the more accurate a distribution of the world you can build, and the further you can go out of distribution.

the future involves AI in filmmaking; they've hosted an AI film festival and support professionals working on AI-augmented film projects with the hundred film fund

marc andreessen

  • it took 70-80 years to prove AI was possible: ImageNet (2012) -> self-driving -> transformers (2017)
  • ai systems = new kinds of computers
  • traditional computers are deterministic systems, you always get the same output. ai systems are probabilistic computers
  • there's a fine line between hallucination and creation, people hallucinate too
  • why AI is better now? moore's law provided compute power and internet provided the data
  • are people in the future going to use video or photo editing software? or will they just speak to get what they want the computers to do
  • adding AI to your product is like adding flour to a baked cake, it doesn't really work. if you want to build a good product, the flour has to be in the recipe
  • the "bullet point no. 6" phenomenon: a 5-year-old company adding AI as one more bullet on their 5-bullet-point slides
  • biotech: challenge of data, gathering all human genome data, in china it is fine, but in the US it is illegal
  • ai and geopolitics: in the mid 2010s, AI and autonomy became the third offset in the military. the 1st was nuclear, the 2nd was maneuver warfare (advances like GPS).
  • DARPA self-driving challenge was in 2005. once you talk the pilot out of the plane, you can do all kinds of things
  • ukraine war: russians have guys in tanks, whereas ukrainians have autonomous drones and javelins
  • iranian war: the USA using millions of dollars of tomahawk missiles to destroy drones costing only thousands of dollars. it's like a slippage of time; these technologies are in the same era
  • strongest military force: who has the best technology and money
  • will you still have human soldiers at risk in planes and submarines in 20 years?
  • having two kinds of conversations in D.C.
    • tuesday conversations: US vs CHINA, what can SV do to advance technology
    • thursday conversations: focused on the US, technology is freaking us out, we need to regulate and shut down, we have to slow down, etc.
  • why has technology gone political?
    • it's all our fault
    • the dog that caught the bus: it latches onto the tailpipe and just keeps being dragged across the street. we are the dog
    • people in hollywood are freaked out about AI being able to generate full movies
    • people are going on strike against automation and technology
  • can AI level up this discourse?
    • computing used to be a 30 mil technology that slowly trickled down
    • today AI is released to everyone; there's a general uplift of intelligence for everyone, access to intelligence at their fingertips
  • open source
    • people in California are lobbying to slow down, existential threat
    • EU implemented a stifling blanket of regulation
  • robotics
    • the whole history of AI and robotics was you get the low-level stuff first: robots to pack your suitcase and clean, then robots that can play songs and draw
    • today it's the reverse: AI can be creative, but we don't have the robotics yet
    • Unitree in china has a huge supply chain of robotics
    • robotics is very close; we might be a few years from humanoids gathering data like tesla cars
  • who will be the AI winners
    • you need a strategy
    • a lot of questions are on the economic side, where the value is going to be?
    • are LLMs going to be a question of who has the best model? that's what happened to google search
    • or is it a race to the bottom, where intelligence is like selling rice? anyone can use an open-source model, anyone can buy GPUs to get the same result.
      • google paper: anyone who has the same data can get the same results
      • evidence: price per token cost has dropped 100x in the last year
    • having full competitive open source changes things, llama models release changed things
  • NVIDIA gpu
    • the other argument: they have a huge profit lead, which draws competition and startups who want a piece of the pie
    • developing chips from scratch might do better than GPUs, which were originally made for graphics
    • NVIDIA might do well for 5 years, and then other competitors take over
  • advice for founders
    • the big thing is that it rarely makes sense to just start a company and go searching for an idea
    • it's usually deep domain experts who've been in an industry for 5-15 years, deep in the trenches, trying to figure out better ways to solve problems
    • how to operate in a rapidly changing environment: always be running experiments
    • doing a new thing is always scary, what if it doesn't work?
    • run the smallest experiment, smallest customer segment, learn as you go, without having huge downsides to risk

some thoughts from talking to people at booths

questions to ask people at sponsor booths

  • what does your company do?
    • follow up questions if possible
  • who are your competitors? what makes you different?
  • who are your customers?
  • what are some interesting use cases you've seen for your product? success stories?
  • what is the future roadmap?
  • if consumer product, do you personally use it? have you built anything interesting?
  • how long have you been there?
  • what gets you excited about the company?

thank them for answering your questions, get swag, move on.

10/1/2024

ds at apple

I’m very solitary, that’s all ... I can’t dismiss it. Inside, I’m very much in communication with a lot of people and things who absolutely don’t know I’m in communication with them.

– Jean-Luc Godard

teeth pain is super distracting and uncomfortable. it makes me irritable and moody. i can't enjoy my food. the pain is always there.

the best time for the gym is after 7 pm. most people are already leaving, and i have a fixed ~45 minute window for my gym time. the best time on sunday is after church.

worried about the LR test. i have ray summit to attend the next two days. based on today's session there seems to be a lot to catch up on. i'm still unsure about everything. the cost of not paying more attention in class, of skipping class for the meta interview that one day, of not being more curious and asking questions, and of not doing the homework properly is being paid right now in more anxiety, more time, and more effort. i find studying for cs a lot easier compared to statistics classes. am i just bad at math? how am i a statistics major?


spoke to a DS manager at Apple and here are a few takeaways

  • for internship
    • first step is build trust and relationship with team, get familiar with them, the scope, the projects, the problems. don't be shy to ask questions. talk to everyone.
    • second is seek alignment. ensure you're aligned with your mentor's goal for the role. they do not like surprises. ask "am i on the right track to deliver on X?"
  • on culture
    • meta culture is move fast. when you're blocked, don't just stay stuck; seek help. if you're waiting on an answer, solve a different problem in the meantime.
    • apple culture: move fast in a thorough way. high quality and standards for products and features. looks for creativity, for people who are proactive and can innovate.
  • interviewing
    • read the job description clearly, you don't need to satisfy all requirements, there are must-haves and nice-to-haves
    • interviewing is a mutual selection, an equal partnership
    • think of your interviewer as your colleague, what kind of questions will you ask them and vice versa
  • on how to solve problems
    • as a junior, you don't have to worry about finding the best solution for a problem
    • what matters more is you have in your mind good reasons for why you chose method A, B or C
    • even if all of those methods fail, you learned that they are not good solutions
    • your learning process demonstrates your ability to pivot to different solutions.
    • as you grow in your career, you accumulate experience to know the right pieces of the puzzle
  • biggest mistake
    • not asking why when you're assigned a task

9/30/2024
