The race for the first truly open base model fully equivalent to GPT-3/4 is on! The LLaMA release proved that an excellent model can be published and run on consumer-grade hardware (see llama.cpp), but its research license prohibits companies from using it, and all its variants (the Alpacas, Vicunas, Koalas, et al.), in their work. So there is a lot of interest in and desire for a *true* open-source LLM that can be used commercially (with better customization, fine-tuning, and privacy than closed-source LLM APIs).
The previous main contenders were Eleuther's GPT-J and GPT-Neo, Google's FLAN (137B) and PaLM (540B), and BigScience's BLOOM (176B). But as far as we know, Databricks is the first to release a high-quality, commercially licensed LLM that not only runs on affordable hardware, but can be customized to your specifications/desired style with a simple Databricks notebook: only $30 for 30 minutes on one machine!
Mike Conover tells the story of how a small team of applied AI engineers convinced Ali Ghodsi and 5,000 of their colleagues to pitch in, building the first open-source, instruction-following LLM fine-tuned on a human-generated instruction dataset and licensed for research and commercial use. He also answered our questions on other recent open-source LLM projects, Cerebras-GPT and RedPajama, though we recorded before Stability AI's StableLM release.
Stay until the end for an Easter egg with AI Drake!
Recorded in a beautiful studio in San Francisco.
The full transcript is below the fold.
* Mike Conover on LinkedIn and Twitter
* Alessio's GPT email assistant
* Open models
* Reflexion, Recursive Criticism and Improvement (RCI)
* Lightning Round
* AI Product: Google Maps
* AI People: EleutherAI, Hugging Face's Stas Bekman
* AI Prediction: LLaMA in gaming, AI twins (Drake), Perplexity
* Startup Requests: LLMOps/benchmarks, trail mapping
* [00:00:21] Meet Mike Conover
* [00:03:10] Dolly 1.0
* [00:04:18] Making Dolly
* [00:06:12] Dolly 2.0
* [00:09:28] Gamifying instruction tuning
* [00:11:36] Summarization - thumbnails for language
* [00:15:11] CICERO and geopolitical AI agents
* [00:17:09] Data sets and intentional design
* [00:21:44] Biological basis of artificial intelligence
* [00:23:27] Training your own LLM
* [00:28:21] You probably don't need big models
* [00:29:59] Good LLM use cases
* [00:31:33] Dolly is $30 on Databricks
* [00:36:06] Databricks open source
* [00:37:31] LLMOps and prompt tooling
* [00:42:26] "I'm a Sheets Maxi"
* [00:44:19] AI and workplace productivity
* [00:47:02] OpenAssistant
* [00:47:41] Cerebras-GPT
* [00:51:35] RedPajama
* [00:54:07] Why Dolly > OpenAI GPT
* [00:56:19] Open source licenses for AI models
* [00:57:09] Why the open source model?
* [00:58:05] Moving the model
* [01:00:34] Learning in simulation
* [01:01:28] Why model reflection and self-criticism work
* [01:03:51] Lightning Round
[00:00:00] Hi everyone. Welcome to the Latent Space podcast. This is Alessio, Partner and CTO in Residence at Decibel Partners. I'm joined by my co-host swyx, writer and editor of Latent Space. Welcome, Mike.
[00:00:21] Introducing Mike Conover
[00:00:21] Hey, great to be here. So, yes.
[00:00:23] We try to research our guests thoroughly, so you don't have to introduce yourself.
[00:00:27] But then we also ask you to fill in the blanks. So you're currently a software engineer at, uh, Databricks. Uh, but you got your PhD in complex systems analysis from Indiana University Bloomington, where you did some, uh, cluster analysis of Twitter, which I thought was interesting.
[00:00:43] Yes. Uh, if you're interested in how information flows through social networks, or in me, I highly recommend people look into it, however you'd describe it. Then you went to LinkedIn to work on homepage news relevance, then SkipFlag, an enterprise knowledge-graph company that was acquired by Workday, where you became Director of Machine Learning, and now you're at Databricks.
[00:01:06] That's the short biography, so we can go through it a little bit, step by step. But, uh, tell people something that's not on your LinkedIn.
[00:01:12] About me? So, since I worked at LinkedIn, this is basically how new hires introduce themselves there, so I have an answer ready. Um, well, I like backpacking on backcountry trails.
[00:01:25] Okay. I, you know, I think the kind of total responsibility that comes with that clears the mind. And I think what I really like about machine learning engineering and the topology of high-dimensional spaces comes out when you think of contour maps of terrain.
[00:01:44] You know, a contour map is a two-dimensional projection of a three-dimensional space, which is very similar to looking at a visualization of information and trying to make sense of it. You have local awareness of your surroundings: the outline of a ridge you see, or a basin you could walk into where you think there's a stream,
[00:02:04] and you connect that to the projection you see on the map. I think it's very physically demanding. It's intellectually demanding. And natural beauty is a big part of it; you usually spend time with your friends, and I just love that. I love that these are camping trips. Uh, overnight trips. Yeah. Yeah.
[00:02:21] Camping. I, I also hunt, you know, I, uh, archery, uh, backcountry hunting, but yeah. You know, sometimes it's as simple as: let's take a walk in the woods and see where it goes.
[00:02:32] Oh yeah. Have you ever thought about doing those trips in the, uh, Australian outback? For example, where do people go?
[00:02:40] I'm
[00:02:40] a mountain person. I'm a mountain person. I love the mountains. I like fly fishing. I like hiking. Yeah. Like, the wilderness there seems beautiful, but I think 8 of the 10 deadliest snake species live in Australia. So, like, uh, yeah, you're fine. You're fine. Yeah. Yeah.
[00:02:52] Yes. So all the lessons are from actual mountaineering
[00:02:55] and machine-learning hiking.
[00:02:56] Good one. It's very similar to gradient descent. Yes, my friend. Well, yes, I've noticed that. Yeah, I don't know. It's like the path of least resistance.
[00:03:10] Dolly 1.0
[00:03:10] Great. So Dolly: you know, in the last three weeks you've gone from a brand-new project at Databricks to one of the most popular open-source projects.
[00:03:19] So on March 24th you released Dolly 1.0. It's a 6-billion-parameter model based on GPT-J 6B, and you used the Alpaca training set to fine-tune it. The first question is: why did you start with GPT-J and not LLaMA, which is where everyone else started
[00:03:34] at the time? Yeah, well, I mean, well, you know, we talked a little before the show, but LLaMA is hard to get.
[00:03:40] We requested the model weights but haven't heard anything back. And you know, I think our experience with, uh, Dolly's original email alias, before the weights came out on Hugging Face, hundreds of people were asking for them; I think it's easy for an inbox like that to just go unanswered.
[00:03:56] Hmm. So, I mean, there's the practical consideration of not having the LLaMA weights, but most importantly, I think it's more interesting if anyone can make one. Right. Well, I think, um, I've used the GPT-J model in the past and I knew it was high quality from a grammaticality point of view.
[00:04:15] So, I think it was a reasonable choice. Uh huh. Yeah.
[00:04:18] Making Dolly
[00:04:18] Yes. Maybe we should also get into why you started working on Dolly. Uh, you've been at Databricks for about a year. Uh huh. Was it, like, a top-down directive? Was it your idea? Let's see, uh,
[00:04:31] what happened? I have been working on NLP and language understanding for some time.
[00:04:36] I mean, since SkipFlag, back in 2016, 2017. You know, how information flows through human networks is a long-standing interest of mine. We did a hack day project that I showed to our CEO, you know, right when ChatGPT came out; it was a developer-experience integration.
[00:05:02] I felt, as a user, this should exist. I want this. Uh huh. We should build this. It doesn't even have to be ours. I mean, for us, uh, our leadership team has been on this journey for about 10 years, probably longer than Databricks has existed. And they're still so hungry. It's crazy watching these people in action, you know, running that hard this far into a marathon.
[00:05:23] He said: great, do it. Go build it. So, you know, we'd been working full time on language-model infrastructure and optimization. We, you know, uh, just started building; we'd been working on this kind of technology for a few months already.
[00:05:46] We built Dolly partly on top of that foundation. Uh, we reused a lot of existing code that we'd built over the last few quarters.
[00:05:58] Just to clarify, is this an internal stack, or is the data sourced externally?
[00:06:02] A lot of it is things that we've open-sourced, you know. I mean, it's by no means the whole stack, but it is some of the core components. Okay. Yes.
[00:06:12] Dolly 2.0
[00:06:12] It only took 19 days to go from 1.0 to 2.0. Yes. So 2.0 is 12 billion parameters, and you based it on EleutherAI's Pythia model family.
[00:06:23] I think the biggest change is that instead of using the model-generated Alpaca instruction set, which has its limitations, you got a brand new, uh, human-created training dataset written by Databricks employees. So I want to talk about how you did that. You know, did you just go around saying: hey guys, I'd like you to take a day and come up with some instructions?
[00:06:47] Or did people volunteer?
[00:06:50] Yeah, again, I mean, our founding team, they see this, I mean, as much as anyone you talk to, new founders or people trying to work in this space. Like, our executives are enthusiastic; they see the neon light of the future and will confidently lead Databricks
[00:07:12] into that world. So Ali emails the whole company, like, twice a day: do it, do it. You know, we put together, you know, a set of InstructGPT-type tasks, you know, content generation, open QA, closed QA, paraphrasing, whatever, and we basically put these together in Google Sheets.
[00:07:34] You know, how can we build this as quickly as possible? We saw that, you know, the Alpaca trick works surprisingly well. Oddly enough, it's not obvious: you know, for GPT-J or even LLaMA, you know, trained on hundreds of billions of tokens, this whisper of new data, you know, kind of moves the parameters, the tensors, into a new part of the state space.
[00:08:02] I think, you know, my background relates a bit to statistical physics, and I think it's a bit of a phase transition. Uh huh. Like ice and water. The difference between the two is very, very small, but they couldn't be more different. So Ali kept pestering, like, a huge email list.
[00:08:21] Well, thousands of people. And, uh, it worked. The other thing is, you know, it's thanks to our people: people who see this moment and want to be a part of something. And I think there's just passion and enthusiasm. So it was easier than you'd think.
[00:08:37] The answers; so you put some of the answers in the blog post.
[00:08:40] Yes. And they're very long. Because one of the questions is: how do you light a campfire? Yeah. And the answer is four paragraphs.
[00:08:46] Really, I think, yes, really. Yeah. I think part of it is that, because of the rapid adoption of these technologies, you have hundreds of millions of people, you know, who knows what the numbers are,
[00:08:58] on ChatGPT. People are educated about what to expect from these tools and have opinions. Well, I mean, you know, a lot of the answers are written in the style you'd expect from these assistants. And I'd like to talk a little about how the questions were posed, because this is really relevant to our enterprise customers: how the composition of the dataset qualitatively shapes the resulting behavior of a model fine-tuned on it.
[00:09:28] Gamifying instruction tuning
[00:09:28] You know, you look at a dataset like Flan, which is a very, very large dataset, I mean over a thousand tasks. Well, it's, you know, one of the gold-standard instruction-tuning datasets; we'll talk about evaluation later, but many of the answers are synthesized and very short. You know, it's like emitting the positive or negative sentiment of a sentence, you know, a one-word judgment.
[00:09:52] So it's very multitask. I mean, with thousands of different kinds of tasks represented somewhat unevenly, the model can't overfit to any one particular behavior; it has to compress a lot, and that's not bad. So I think you end up being able to interpolate between different kinds of behavior that way.
[00:10:12] But there's also the question of when you predict the end-of-sequence token. Especially in instruction tuning: if your training completions are short, our empirical observation is that the fine-tuned model produces shorter results. So there's "how do you start a campfire" answered as a narrative, a thoughtful, human-sounding description.
[00:10:36] I think it takes demonstrations like that to get that behavior out of the model. You have, you have a leaderboard of, uh, who contributed the most.
[00:10:43] What, uh, fun gamification shenanigans happened?
[00:10:46] Well, the thing is, you know, I think you can just ask people to help you. Uh, you know, like, some people always go overboard, and, sure.
[00:10:55] Yes. Okay, so you definitely see long-tailed distributions. I think I looked at the Open Assistant work last night, and, I mean, don't quote me on this, but about 12 people made up 10% of the total responses, which is great; it's just a fact of human systems that activity follows long-tailed distributions.
[00:11:12] Yes, yes, that's right. So it's not surprising. We see it to some extent in our dataset, but not the way you would if you opened it up to the whole internet. So, I think people are motivated colleagues. Yeah. Doing the right thing, you know, that's, you know, our thing too.
[00:11:29] Like, hopefully it's actually useful and not just a publicity stunt. I think people have figured that out.
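Mike's point above, that a model fine-tuned on short completions learns to emit the end-of-sequence token early, comes down to how instruction/response pairs are rendered into training text. A minimal sketch, assuming an Alpaca-style template (the exact template and the `<|endoftext|>` EOS token here are illustrative, not Databricks' actual pipeline):

```python
# Illustrative sketch: rendering one human-written instruction/response
# pair for supervised fine-tuning. The model learns *when to stop* from
# where the EOS token appears, which is why long narrative responses in
# the training data yield longer completions at inference time.

EOS = "<|endoftext|>"  # end-of-text token used by GPT-J / Pythia tokenizers

PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n{response}"
)

def format_example(instruction: str, response: str) -> str:
    """Render one training record, terminated by EOS."""
    return PROMPT_TEMPLATE.format(instruction=instruction, response=response) + EOS

record = format_example(
    "How do I start a campfire?",
    "Gather dry tinder, kindling, and fuel wood. Build a small teepee of "
    "tinder and kindling, light the tinder, and add larger wood as it catches.",
)
print(record.endswith(EOS))
```

If every `response` here were a one-word sentiment label, the EOS token would always follow within a token or two of the response marker, and the fine-tuned model would learn exactly that brevity.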
[00:11:36] Summarization - thumbnails for language
[00:11:36] Was there a task
[00:11:37] that was particularly difficult to get data for? Like a good summary?
[00:11:41] Oh, because it's long, uh, it takes thinking. You know, you have to synthesize. Naming the entities all over a Wikipedia paragraph is something I can do while watching TV, but summarization is like writing a term paper.
[00:11:59] Yes, it's difficult. Yeah, and there may be more structure to it, like, uh, how much new signal each record introduces into the model from an information-theoretic perspective. I suspect summarization really is a demanding task that doesn't come easily. We're developing our own view of this, and I don't have a definitive answer on how it works, because it's still an open research question for the company.
[00:12:27] Yes. Well, I, you know, I'd just argue, I think compression becomes more important the more content AI creates. Because to ship things we expand them a little, and then we want to see them contracted again when we spend, uh,
[00:12:41] attention on them. Right. I mean, gossiping a little, I think every company has too much material.
[00:12:48] You think about something like, uh, a PRD, like, or, you know, a product requirements document, you know. Rational people want a zoomable language: you want to see the high-level structure of something, and then be able to get details on demand, just like you pan or zoom in an information visualization.
[00:13:09] I was talking with, well, the head of AI at Notion, who, you know, you probably know, is a pretty amazing person, about what this kind of idea, a thumbnail for language, looks like. Because your visual cortex is built in such a way, it's evolutionarily highly conserved, that it can see something and perceive its essence.
[00:13:28] That's what makes thumbnails work for images. Like, I think you'll be talking to the, uh, Lexica guys soon. You can look at a grid of images and immediately feel: oh, these are moody cyberpunk scenes. Uh huh. What is that for language? Maybe it exists, maybe it doesn't.
[00:13:52] Maybe so. Stop me if I get too far out here. But you can see clothing as a technology that shapes our physiology. Right. Like our phenotype, our phenotypic expression: we were once covered in hair. We developed this technology, fire falls into the same category, and our bodies changed over the long course of human history.
[00:14:15] Hmm. It's possible that the way the visual cortex is evolutionarily conserved to perceive things quickly dictates how we process information. I have no idea what the equivalent is for language. It would be like reading tons of samples from different models and seeing how they perform as we move along the loss curve.
[00:14:34] That makes
[00:14:34] sense. I mean, when you compare images and text, with text you don't really have peripheral vision. You know, when you look at an image, you focus on the most important thing first, and then you start expanding to see the rest. Yeah. Since text is a single modality, the loading density is uniform.
[00:14:49] Nothing stands out when you see a wall of text, whereas in an image something always jumps out first. Yeah. So I don't have an answer either. I mean, I'm really curious whether the word
[00:14:58] cloud, what, but, that's the joke, right? Wait for it. Yeah, that's like the punch line.
[00:15:06] You have to have that
[00:15:06] ready, you know, you, for your Twitter
[00:15:08] work. I have traveled through some word clouds in my time.
[00:15:11] CICERO and geopolitical AI agents
[00:15:11] Well, you know, I was wondering about this too: what excites you most about artificial intelligence? For example, where do you see the greatest potential? One thing I'm thinking about is letting agents negotiate ongoing geopolitical issues.
[00:15:31] So if you look at the CICERO work from Meta, can you summarize it for those who haven't seen it? So I mean, you know, I don't want to misrepresent other people's work, but, um, my understanding is that Diplomacy is, um, a turn-based negotiation game, a bit like Risk, where you all make decisions simultaneously and try to convince people that you will or won't do something.
[00:15:56] And, uh, the paper is co-authored with a top Diplomacy player, and Meta built a system that is very, very capable at this negotiation game. It's conceivable that nation-states could find a game-theoretically optimal, in practice non-exploitable, steady state. Uh huh. You know, when you think about a lot of large-scale geopolitical disputes where human mediators can't find a compromise, maybe an AI can propose terms where you think: yeah, actually, that works for me.
[00:16:36] Hmm. And to your point about saccades and attention generally and how the visual cortex actually works: the idea that great writers, by saying something a certain way, activate a particular structure in your brain and set off a chemical cascade. Maybe we could design systems where everyone compresses very long documents to maximize information transfer, and the thumbnails might look like that.
[00:17:04] Yeah, maybe it's all emoji. I have no idea.
[00:17:09] Datasets and intentional design
[00:17:09] Obviously datasets are one of the big things about Dolly. Yeah. But you've talked about how some of these capabilities were discovered rather than designed. Maybe talk about the process of getting to Dolly, how much of it was experimentation.
[00:17:22] So, my good friend Jacob Burke has this insight that building AI is not like designing a jet turbine, where you formulate a plan from a working model of aerodynamics and execute it. Uh huh. I think what we're seeing across AI is discovery. You know, this instruction-following behavior that we see in Dolly doesn't exist in the base model.
[00:17:53] You know, it will complete text; you know, it's a very strong base model, but it will just continue your prefix like random pages on the internet. What we at Databricks, and the Alpaca community too, discovered is that you can break them out of that and get qualitatively different behavior. It's not exactly by design.
[00:18:13] I mean, design means you have an intention and then you see it realized. But we don't get to choose what emerges from those parameters. So my question is: what other capabilities are lurking in these models, right? GPT-J is two years old. What else can it do? Isn't that surprising?
[00:18:36] It could be. I think, you know, specifically, that's why the Pythia suite is so cool; you know, a lot of credit to them. With that in mind, I think it may take the research community some time to figure out what to do with the artifacts they've created.
[00:18:54] But it's basically a matrix of checkpoints across model sizes, I mean from 70 million up to 12 billion parameters, which is the basis for Dolly 2.0. And then checkpoints all along training, every, I mean, 2 billion tokens or so. Um, and I think the Pythia suite is trained on, like, three or four hundred billion tokens, so the smallest models are probably undertrained.
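The checkpoint matrix Mike describes is directly accessible: Pythia publishes intermediate training checkpoints as git revisions on the Hugging Face Hub, so the "how does behavior emerge during training" question can be studied by loading any mid-training snapshot. A sketch (loading the model itself needs the `transformers` library and network access):

```python
# Pythia exposes intermediate checkpoints as Hub revisions named by
# optimizer step; loading one looks like this (requires `transformers`
# and a network connection):
#
#   from transformers import AutoModelForCausalLM
#   model = AutoModelForCausalLM.from_pretrained(
#       "EleutherAI/pythia-70m", revision="step3000")
#
# The main run saves a checkpoint every 1000 steps, 143 in total:
revisions = [f"step{n}" for n in range(1000, 144000, 1000)]
print(len(revisions), revisions[0], revisions[-1])
```

Sweeping a probe (say, an instruction prompt) across these revisions is one concrete way to watch a capability appear over the course of training.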
[00:19:18] Have you seen RedPajama? I think RedPajama came out this morning. They replicated the LLaMA training dataset. So that's 1.2 trillion tokens. Uh, I mean, you know, that's a separate topic, but we took a close look at what it takes to reproduce the LLaMA dataset. It's, like, non-trivial.
[00:19:35] I mean, you pull Common Crawl and then you mostly deduplicate it and, you know, filter it for quality. For the Common Crawl portion, LLaMA fits a model to predict whether a page in Common Crawl is likely to be cited by Wikipedia. So it's a way of saying: I don't want a list of phone numbers, or, like, ads.
[00:19:58] That's a lot of work. Anyway, with Pythia, I think we can start asking questions across this matrix of checkpoint, size, and depth. We have these different model scales. How does behavior emerge over the course of training? At another level, you know, maybe it stops being a discovery process.
[00:20:22] Maybe we get more intentional. For example: I want to elicit instruction following, I want summarization, I want closed-form question answering. Those are the only things that matter to me. How much data do I need to generate or buy, and how many parameters do I need to solve that compression problem? Maybe it becomes more deterministic, but right now it feels like we're just trying things and seeing if they work, which is very different from a lot of engineering disciplines.
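The LLaMA-style quality filter Mike describes, a classifier predicting whether a page looks like something Wikipedia would reference, can be sketched in miniature. The tiny corpus below is invented purely for illustration; the real pipeline trains on millions of pages:

```python
# Illustrative sketch of a Wikipedia-reference quality filter for
# Common Crawl pages: a linear classifier over TF-IDF features.
# All example texts and labels here are made up.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

wiki_like = [  # pages Wikipedia plausibly references -> keep (label 1)
    "The study, published in Nature, measured glacial retreat over forty years.",
    "The treaty was signed in 1648, ending the Thirty Years War.",
]
junk = [       # boilerplate/spam pages -> filter out (label 0)
    "call now 555-0199 555-0123 best deals best deals best deals",
    "click here click here subscribe win a prize click here",
]

vec = TfidfVectorizer()
X = vec.fit_transform(wiki_like + junk)
clf = LogisticRegression().fit(X, [1, 1, 0, 0])

page = "Researchers published their findings in a peer-reviewed journal."
score = clf.predict_proba(vec.transform([page]))[0, 1]  # P(keep)
keep = bool(score > 0.5)  # pages below threshold are dropped from the corpus
```

The production-scale version of this idea also handles deduplication and language identification before the quality model ever runs, which is a large part of why reproducing the dataset is, as Mike says, non-trivial.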
[00:20:51] I'm curious: does this reflect your experience?
[00:20:54] I think we did a whole episode about, uh, things like loss curves and model size with Varun from Exafunction. I feel like when the Chinchilla paper came out, a lot of teams looked at their own training runs and thought: we were just throwing darts.
[00:21:07] Right. Now it's like, it went from
[00:21:10] 1.2 to, uh, 1.7 tokens, uh, you know, per, uh, per parameter. And, uh, now we're doing it all over again at
[00:21:16] 20 tokens per parameter. It's exciting, but also, like, you know, I'm an engineer and a hacker; like, I'm not a scientist, you know, although, as much as I respect, I do respect the craft. It's also very exciting to work on something that nobody really understands yet, because it's an opportunity to create knowledge.
[00:21:41] That's part of why this area is so exciting.
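The tokens-per-parameter figures in the exchange above are easy to sanity-check. GPT-3's roughly 175B parameters trained on roughly 300B tokens gives the ~1.7 figure, and Chinchilla's ~20 tokens per parameter implies a much larger token budget for a model of Dolly 2.0's size:

```python
# Back-of-envelope check on the tokens-per-parameter figures above.
gpt3_params = 175e9   # ~175B parameters
gpt3_tokens = 300e9   # ~300B training tokens
print(round(gpt3_tokens / gpt3_params, 1))  # → 1.7 tokens per parameter

# Chinchilla-style budget (~20 tokens/param) for a 12B model,
# the size of Dolly 2.0:
chinchilla_tokens = 20 * 12e9
print(f"{chinchilla_tokens / 1e9:.0f}B tokens")  # → 240B tokens
```

That gap, roughly an order of magnitude more data for the same parameter count, is exactly the "throwing darts" realization Alessio describes.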
[00:21:44] Biological basis of artificial intelligence
[00:21:44] You're moving toward, uh, understanding the development of AI advances, uh, through biological analogies. Uh huh. So, in a sense, we're speed-running evolution with training. Yes. So in a way, it's just finding things naturally and putting some epoxy on them.
[00:22:02] Yes, that part is intuitive to me. But I think it's counterintuitive to estimate how differently artificial life might evolve from
[00:22:12] biological life. Yeah. Richard Dawkins has a toy model called Biomorphs. What, uh, no, I've never heard of that. Yes, I think it dates back to the eighties.
[00:22:25] So it's a very old-fashioned demo of the idea. But the premise is these little bugs rendered as vector art. How they're drawn is, you know, parameterized, right? So some have long antennae, some have wider bodies, some have 10 legs and some have 4.
[00:22:46] The basic approach is a genetic algorithm, where you take a subset of the parameters and recombine them. As a user, you see a three-by-three grid and you click whichever one you like. That's the fitness function; then they recombine again, and you get a new grid of nine, some of which are mutated.
[00:23:05] So the fitness function is your perception of aesthetic beauty; it's the environmental pressure. I think something like RLHF, where you have this preference-learning task, is a little different from pure next-token prediction. What does synthetic life look like, and how are our preferences reflected in it?
[00:23:23] I think it's a very interesting area. Okay, so,
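The Biomorphs loop Mike describes is small enough to sketch directly. Here a genome of integer "genes" stands in for the drawing parameters, and a random pick stands in for the human's aesthetic click; gene names and counts are invented for illustration:

```python
# Minimal sketch of a Dawkins-style Biomorphs selection loop:
# the selected parent spawns nine mutated offspring (the 3x3 grid),
# and the "user's click" (here random.choice) is the fitness function.
import random

random.seed(0)
N_GENES = 8  # e.g. leg count, antenna length, body width, ... (illustrative)

def mutate(genome):
    """Return a copy of the genome with one gene nudged by ±1."""
    child = list(genome)
    i = random.randrange(N_GENES)
    child[i] += random.choice([-1, 1])
    return child

def next_generation(parent):
    """Nine offspring of the selected parent, each slightly mutated."""
    return [mutate(parent) for _ in range(9)]

genome = [0] * N_GENES
for _ in range(10):                  # ten rounds of selection
    grid = next_generation(genome)   # the 3x3 grid shown to the user
    genome = random.choice(grid)     # stand-in for the aesthetic judgment

print(genome)
```

The RLHF analogy is that the human click is the only fitness signal: selection pressure comes entirely from preference, not from any explicit objective over the genome.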
[00:23:27] Training your own LLM
[00:23:27] Dolly's release inspired many people. Obviously Databricks benefits: partly out of goodwill, but also to promote Databricks' capabilities. Uh, for companies with their own datasets that want to do something similar, uh, how, what should they think about as
[00:23:43] they work on this?
[00:23:44] Honestly, it's probably not about advertising our capabilities. I mean, you know, we are exercising our capabilities, but I really think it helps everybody if we can define some of the actions a reasonable team would take when building technology like this. A good understanding of what needs to be done is: make it useful, not just fun.
[00:24:08] So, you know, one of the classic examples we had in the original Dolly was writing a love letter, Edgar Allan Poe style. Yeah. And that was cool and whimsical. You know, I, I don't remember the details, but it's something like: "I can't stop thinking about you", you know, a very, very gothic, uh, quite, uh, moody letter that has nothing to do with a corporate context.
[00:24:39] Right. So, you know, that's really neat, but if I don't have to buy training data for whimsical gothic letters in the style of Edgar Allan Poe, and I can choose how to invest my token budget instead, that's going to be very useful for a lot of companies. So, you know, that's one of those things. We're trying to get a little more clarity; we talked a little about how different tasks require compression in a generalized way. You know, if you think about it, the parameters compress both language and world knowledge.
[00:25:15] The question is: for a given model size, how many high-quality demonstrations do you need to get a really usable, solid QA bot? So as we build these solutions, I want to see how categories of behavior, or datasets in the instruction-tuning setup, relate to the resulting behavior. I think that's going to define the playbook for startups and enterprises, letting you, uh, move economically.
[00:25:44] It's also about evaluation. So one of the things we talked about before we started recording was using the EleutherAI benchmarks for evaluation, I think the eval harness, and, you know, a bunch of other batteries to run your model through. But the first statistics we looked at when we built the first version of Dolly are on our repo, so you can check them yourself.
[00:26:08] The fine-tuned Dolly model has almost the same benchmark results as the base GPT-J model, but the qualitative behavior of the model is quite different. So I think there needs to be a better way to measure desired behavior, especially in these business contexts: is this a good summary? How can I determine that without asking a human?
[00:26:37] Maybe it's something like training a reward model, where you, you know, learn preferences, and then, you know, you have an active-learning approach where you only pull in a human for the cases the model is most unsure about; a kind of human in the loop.
[00:26:52] Could that be RLHF?
[00:26:54] I mean, it could be. That, I mean, that's not my area of expertise. You know, that's also something where we're trying to, uh, dig a little deeper into the applicability of RLHF; like, I just want to publish on it. Uh huh. You know, my understanding is that it's kind of hard to do online and it requires a lot of labeling.
[00:27:14] So from an active-learning standpoint, uh, my thinking is more like: you have a reward model that you train, based on the human judgments of my employees or crowd workers, for, say, summarization or closed-form question answering. Then you actually choose new examples to show humans that are close to the decision boundary, where the model is most likely to be confused.
[00:27:38] Like "I'm really not sure", not things far from the decision boundary. And, I actually think there's still room for the old tricks in terms of value creation over, say, the next 18 to 36 months. You know, not everything has to be generative AI to be very valuable and very useful.
[00:27:56] Maybe, maybe these zero-shot models and prompting will eat it all up. But the existing set of techniques will probably stay valuable; you know, you don't have to, you know, achieve room-temperature fusion to, you know, create value in the world, at least for another year and a half.
[00:28:20] You know, like
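The "choose examples close to the decision boundary" idea Mike describes is classic uncertainty sampling. A minimal sketch, given a pool of unlabeled examples and a model's probability estimates (the probabilities below are made up for illustration):

```python
# Sketch of uncertainty sampling for active learning: route the examples
# whose predicted probability is nearest 0.5 (most uncertain) to human
# labelers, instead of labeling the whole pool.
def select_for_labeling(pool, probs, k=2):
    """Return the k examples whose predicted probability is nearest 0.5."""
    ranked = sorted(zip(pool, probs), key=lambda xp: abs(xp[1] - 0.5))
    return [x for x, _ in ranked[:k]]

pool = ["summary A", "summary B", "summary C", "summary D"]
probs = [0.97, 0.52, 0.08, 0.45]  # model's P("good summary") per example

print(select_for_labeling(pool, probs))  # → ['summary B', 'summary D']
```

Examples the model already scores near 0 or 1 are skipped, which is what keeps the labeling budget, the expensive human-in-the-loop part, small.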
[00:28:21] You probably don't need a big model
[00:28:21] I'm just, I'm just spelling it out for people who are trying to read between the lines. Maybe leaving some crumbs. So, of course, when you say techniques, you don't just mean prompts.
[00:28:29] Oh, I mean even things like named entity recognition, yeah, like classic NLP stuff, you know, like supervised learning. I mean multi-class classification.
[00:28:37] I have a customer support ticket. I want to know whether it should be labeled P0. That's it; it's not a complicated problem to solve, but there's still value in models that can deeply understand the nature of something without necessarily generating language. Because, let's say, inference is very expensive and slow right now.
[00:29:04] Hmm. You know, unless you're really serious about it. And, I think, one of the things I'm excited about at Databricks is that our inference stack is very, very fast. It's much faster than if you take a naive approach. That leads to a qualitatively, like, very different way of approaching these models.
[00:29:22] If generating a sample takes 1800 milliseconds instead of 30 or 40 seconds, you can do much more research and understand their behavior much better. You know, it's a very exciting thing. But if you need to spend your compute budget efficiently, there are thousands of possible things you could do, and you can really only do so much in a day.
[00:29:45] Ranking them with classic machine learning models is very valuable. And I, I expect you'll see an ecosystem of tools that aren't necessarily just agents chatting with agents. Maybe I'm wrong. Like, I, I don't know. We'll see.
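The "classic NLP" ticket-triage example above needs no generative model at all. A minimal sketch of multi-class priority classification; the tickets and labels here are invented, and a real system would train on thousands of historical tickets:

```python
# Sketch of supervised multi-class ticket triage with classic ML:
# bag-of-words features plus Naive Bayes, no text generation involved.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

tickets = [  # illustrative historical tickets with priority labels
    "Production cluster is down, all jobs failing",
    "Cannot log in, password reset email never arrives",
    "Feature request: dark mode for the dashboard",
    "Typo on the pricing page",
]
labels = ["P0", "P1", "P3", "P3"]

vec = CountVectorizer()
clf = MultinomialNB().fit(vec.fit_transform(tickets), labels)

new_ticket = ["Entire production environment is down"]
print(clf.predict(vec.transform(new_ticket))[0])
```

Per Mike's point about inference cost, a model like this classifies thousands of tickets per second on a CPU, which is why it still earns its place next to LLMs for high-volume ranking and routing.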
[00:29:59] Good LLM use cases
[00:29:59] Going back to the evolutionary point, I think people think that the generative AI part is the most branching part of the tree to explore.
[00:30:09] So that's what they're focusing on. But like you said, we'll probably stop at some point and say, oh, what we have is just as good. Let's tie everything together and enjoy using it, instead of trying to make this one model do everything.
[00:30:22] Yes. Yes, although there are some things that only generative models can do, beyond classification.
[00:30:28] I really think, I think one of the reasons we see so much enterprise value at Databricks is that you can say, without any training, given this customer support ticket, give me a summary of the key issues raised in it. Then, simply by changing that prefix, say: write a well-crafted response that addresses these issues in our company's tone and voice.
[00:30:53] Imagine having a model tuned to the tone and voice of your, uh, your support team. Historically, each of those problems would have taken six to eight weeks to build, plus a reasonable machine learning team. And honestly, the second one, the response generation, I'm not sure you could do at all without generative techniques.
[00:31:13] Now your sales manager can do that. You know, the things that will make us look silly in retrospect are just that: it's orders of magnitude cheaper to do it with prompting. Maybe, like, the price of inference isn't trivial, but look at all the time we just saved. I don't know.
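The prefix-swapping pattern described above can be sketched as follows. This is a minimal illustration, not Databricks code; `call_llm` is a hypothetical stand-in for whatever model endpoint you use, and here it just echoes the prompt so the example is self-contained.

```python
# Sketch of the "change only the prefix" pattern: the same ticket text is
# reused with two different instruction prefixes, one for summarization
# and one for drafting a reply. call_llm is a hypothetical placeholder.

SUMMARIZE_PREFIX = (
    "Given this customer support ticket, summarize the key issues raised:"
)
REPLY_PREFIX = (
    "Given this customer support ticket, write a well-crafted reply that "
    "addresses the issues in our company's tone and voice:"
)

def call_llm(prompt: str) -> str:
    # Placeholder: a real implementation would call a hosted or local model.
    return f"<completion for: {prompt[:40]}...>"

def build_prompt(prefix: str, ticket: str) -> str:
    # The instruction prefix is the only thing that changes between tasks.
    return f"{prefix}\n\n{ticket}"

ticket = "My export job fails with a timeout after the latest upgrade."
summary = call_llm(build_prompt(SUMMARIZE_PREFIX, ticket))
reply = call_llm(build_prompt(REPLY_PREFIX, ticket))
```

The design point is that one model covers both tasks; only the natural-language prefix differs.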
[00:31:33] Dolly is $30 on Databricks
[00:31:33] We'll see. Well,
[00:31:34] I'm always interested in, uh, the economics of these things. Uh, one of the headline numbers for Dolly was the $30 training cost. How did you arrive at that number? It's much lower than people expected, so let's dig deeper
[00:31:50] Sure, happy to. Well, so, you know, we trained the original Dolly on A100s, and one of the cool things is that we're doing all of this on a Databricks cluster, right?
[00:32:00] So this works out of the box on Databricks. I think if you're doing your own full pre-training run, you know, a trillion tokens, you might need a slightly different configuration. You have to think about things like networking and cluster topology in the data center in a tighter way than with Spark clusters.
[00:32:23] But for distributed fine-tuning across multiple nodes, the Databricks stack works out of the box. I was glad to find that.
[00:32:32] So you built the whole thing on an already well-tuned stack
[00:32:34] Right. You know, maybe it's not perfect, but it's pretty good. I think the original Dolly was just one node, so you spin up a single node with eight A100s, and I think the out-of-the-box price from the cloud provider was about $30.
[00:32:55] I think the actual number is probably less than $30. It took less than an hour to train this thing, on 50,000 records, the Alpaca dataset, 50,000 records. Right.
[00:33:04] And you published the notebook, so people can look
[00:33:07] I'll share the notebook. I'm in no danger of making this up.
[00:33:11] Yes. No no no. I'm not, I am
[00:33:12] I know you don't doubt it. I'm just, I'm leaving room for people to say: hey, $30,
[00:33:17] one hour, come on. Yes, it is crazy. And, I mean, I'm sure you're not implying otherwise, but it's crazy that you can just try it.
[00:33:28] You know, if you have $30, you can spin up a machine and train this thing. I keep coming back to the idea that it's a phase transition. It's amazing that you can say, hey, given a corpus of millions of instruction pairs, you can
[00:33:50] spend $10,000, which is still orders of magnitude less than the cost of pre-training this thing, and get this behavior. People would say, yeah, that sounds right: if you have an afternoon, you can do this. But that wasn't a given; we didn't know it was true.
[00:34:08] I think for the most part it's libraries like DeepSpeed. You know, DeepSpeed is a library that gives you a lot of different options for handling models that don't fit in memory. It helps increase the effective batch size; it can, for example, put part of the model on one GPU and other parts on another GPU, and then, with a local group of devices, gather the gradients and more or less aggregate them across those devices, so you get efficient batching, or actual sharding of different submodules of the model across the GPUs.
[00:34:43] It's all in the notebook. The model we train doesn't fit on any single device, so you have to distribute the model across the training GPUs. You know, it's an amazing time: that technology is free and open source, and the Microsoft team and, you know, the Hugging Face team have made it easy.
[00:35:04] Something that would genuinely have required a PhD two years ago. So the level of effort and the capex were significantly lower than I expected. Yeah.
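For concreteness, a DeepSpeed configuration of the kind described above looks roughly like this. This is an illustrative sketch, not the actual Dolly training settings; ZeRO stage 3 is the mode that shards parameters, gradients, and optimizer state across GPUs so the model no longer has to fit on any single device.

```python
# Illustrative DeepSpeed ZeRO-3 config (assumed values, not Dolly's exact
# settings). Stage 1 shards optimizer state, stage 2 adds gradients,
# stage 3 adds the model parameters themselves.
ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "gradient_accumulation_steps": 4,   # raises the effective batch size
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu"},  # spill state to host RAM
        "overlap_comm": True,            # overlap communication with compute
    },
}

def effective_batch(cfg: dict, world_size: int) -> int:
    # Effective batch = micro batch * accumulation steps * number of GPUs.
    return (cfg["train_micro_batch_size_per_gpu"]
            * cfg["gradient_accumulation_steps"]
            * world_size)
```

On an eight-GPU node, this hypothetical config would give an effective batch size of 256 while keeping per-GPU memory pressure low.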
[00:35:17] And you, you're sort of co-developing this because you also happen to be working on infrastructure optimization
[00:35:21] Yeah, I mean, it's kind of, uh, you know, it's actually a separate effort at Databricks, making sure that we have a great user experience and all the resources our users need.
[00:35:37] You push a button and get a, uh, get a Spark cluster. And I think, looking at a world where everyone is using GPUs on Databricks, we have to make sure we're operating as efficiently as possible, so Databricks is a very cost-effective place to train and serve these models.
[00:35:55] I think you have to do both. And I think companies that do both effectively, well, they're going to create a lot of value in the market.
[00:36:06] Databricks open source
[00:36:06] Yes. You mentioned Spark; obviously, you know, the founders of Databricks created Spark at Berkeley. And then, starting from an open source project, you started thinking about enterprise use cases
[00:36:18] and eventually built the whole platform. Yes. You also have a lot of great open source projects like MLflow and Delta Lake. Yes, um, things like that. How do you see the current state of MLOps? And when you think about LLMOps, what's needed? Of course, some of these models could well be the Spark of a new generation.
[00:36:39] What do you think the infrastructure requirements are that you might consider building for?
[00:36:44] Yeah, I mean, um, let me take the open source question first. I mean, you know, Databricks has done a lot and released a lot of technology into the public domain, and a reasonable person might ask whether you should
[00:37:00] treat it as IP to be hoarded instead. Time and again, though, the story is: together we succeed, and more is better. When you create a new category, people rush in to fill it with ideas and use cases, and that, that's really powerful. It's a good thing, and it's good for the community.
[00:37:21] I think Dolly is a natural extension of that dynamic and I think it reflects our founders' tastes and beliefs about markets and technology
[00:37:31] LLMOps and prompt tooling
[00:37:31] As for LLMOps, it's not a phrase that rolls off the tongue; we'll need something better than that. But let's come back to what that tooling means for text.
[00:37:43] Hmm. One thing my team does a lot right now is pass generated samples back and forth to each other. Okay? Because the existing evaluation benchmarks don't capture the behaviors we actually care about. So we often have a battery of reference prompts, say 50 to 100.
[00:38:03] Write a love letter to Edgar Allan Poe. Give me a list of things to keep in mind when planning a backcountry backpacking trip, and see whether it produces sensible backpacking recommendations. You see, as you move the model through the loss curve during instruction tuning, behaviors emerge, and when you finally evaluate qualitatively, the model shows a lot of different failure modes. What I want, based on these traces, is to be able to say: this model, on this instruction-tuning dataset, produces shorter completions.
[00:38:40] Or: this one generates made-up answers, or, you know, this one could solve all those problems. I don't know if you've seen nat.dev. Uh huh. You must be thinking: I want to run inference in parallel on a set of queries and compare and contrast, like those tools do, especially with a fast inference layer. And that's where I think Databricks has a lot of opportunity: value is created by serving models, and by measuring a model's behavior as it changes over time, not just by subjecting it to quantitative
[00:39:19] benchmarks, but also qualitative, subjective benchmarks, plus human feedback in the loop. Imagine every time I write a model checkpoint, every thousand steps, I send it to the annotation team and get back a hundred human judgments. How much human feedback does it take to reach statistical significance?
[00:39:43] But I think that's it: each of them is a different lens on the behavior of the model. Quantitative, qualitative, and then human, uh, human-level feedback. Someone will build a product that does these things well, in a user-friendly form, that's fast and, uh, meets the specific needs of AI developers.
[00:40:04] I think that company is going to be very successful; I hope it's Databricks. Alright.
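The reference-battery workflow could be sketched like this: run a fixed prompt set against each checkpoint and track simple properties, such as completion length, so shifts like "this dataset produces shorter completions" become visible. `generate` is a hypothetical stub for real model inference.

```python
# Sketch of a prompt reference battery: run the same prompts against each
# checkpoint and record simple metrics alongside the raw completions.
# generate() is a hypothetical stand-in for real model inference.

PROMPT_BATTERY = [
    "Write a love letter to Edgar Allan Poe.",
    "List things to keep in mind when planning a backcountry backpacking trip.",
]

def generate(checkpoint: str, prompt: str) -> str:
    # Placeholder: real code would load the checkpoint and sample from it.
    return f"[{checkpoint}] completion for: {prompt}"

def run_battery(checkpoint: str):
    # One row per prompt: the completion plus a cheap proxy metric.
    rows = []
    for prompt in PROMPT_BATTERY:
        completion = generate(checkpoint, prompt)
        rows.append({
            "prompt": prompt,
            "completion": completion,
            "length": len(completion.split()),
        })
    return rows

def mean_length(rows) -> float:
    # Tracking this across checkpoints surfaces drift in completion length.
    return sum(r["length"] for r in rows) / len(rows)
```

Comparing `mean_length(run_battery("step-1000"))` against later checkpoints is the quantitative half; reading the stored completions side by side is the qualitative half.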
[00:40:10] Teasing what could be built
[00:40:11] Interesting. You know, without making forward-looking statements: as a logical person, would you want to build that? Uh huh. Yes.
[00:40:19] Yes. I would. And I happen to work for a company.
[00:40:21] Yes. So let's go down this path a little, because I've been working on it for a while. Sure. Have you come across PromptLayer? That's one of the prompt-tooling companies. And then I think Humanloop does a bit of this, but it's not the core of their product, is it?
[00:40:34] Prompt tooling? Right. Well, I'm glad you dropped that reference, because he reached out to me and I watched his demo video, and yes, isn't that in this space? I think a lot of people could benefit from it. But the reason I haven't done anything in that direction is that I can do it in a spreadsheet.
[00:40:51] Like, all you have to do is, yeah.
[00:40:53] In a spreadsheet you can, but I mean, text editing in Google Sheets is hard, isn't it? I mean, hmm. What's missing, you know? Oh, well, the text-editing experience there: it's like trying to wrap those cells, okay, and you have to double-click to enter edit mode.
[00:41:12] And I think they struggle with large records, so the spreadsheet gets slow. What you really want, and this isn't specifically about how Google Sheets falls short, that's not really my point, is a connection to a persistent underlying data source.
[00:41:34] Because right now I have a bunch of spreadsheets I'm managing. Do they live on Google Drive, with some junk in there, or on my local machine? Should I email them around? Can I lock records so they can't be annotated again later? How do I collect multiple reviews from different people?
[00:41:50] How are summary statistics computed over those ratings? Look, I like the first version of things; there's a certain charm to it. You know, keep it simple, right? Sure. The way I talk about it with my coworkers is: it's as if we were emailing each other signed printouts, PDFs and DocuSign didn't exist yet, and nobody realizes what a funny dance they're doing.
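The aggregation step just mentioned, multiple reviews per record plus summary statistics, is exactly the part a spreadsheet handles awkwardly. A rough sketch of what a purpose-built tool would compute (the ratings here are invented for illustration):

```python
from collections import defaultdict
from statistics import mean

# Each rating: (record_id, annotator, score). Invented example data
# standing in for reviews collected from several people.
ratings = [
    ("r1", "alice", 4), ("r1", "bob", 5),
    ("r2", "alice", 2), ("r2", "bob", 3), ("r2", "carol", 2),
]

def summarize(ratings):
    # Group scores by record, then report mean score and review count,
    # i.e. the summary statistics a spreadsheet makes you hand-build.
    by_record = defaultdict(list)
    for record_id, _annotator, score in ratings:
        by_record[record_id].append(score)
    return {rid: {"mean": mean(scores), "n": len(scores)}
            for rid, scores in by_record.items()}
```

A real tool would add the things the speaker lists: record locking, per-annotator views, and a persistent store underneath.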
[00:42:16] I see. I also solve this with Google Sheets. I'm sure there's probably something better; I have Stockholm syndrome.
[00:42:26] "I'm a Sheets maxi"
[00:42:26] So there's something else I want to highlight, uh, Quadratic. Alright. Uh, full disclosure, it's one of my investments, but they've essentially implemented a spreadsheet in WebAssembly.
[00:42:35] Yes. And on a canvas, right? Okay. It speaks Python and SQL. Yeah. And, uh, eventually Scala. Well, I thought, yeah, those folks are onto something interesting
[00:42:46] there. What you can do is, for example, imagine you have a Google Sheets-style user interface where you can select a column or a range and run a query over all of those values.
[00:42:59] Yes. Say: I have a prompt template to fill; here's what I want, here's my question.
[00:43:04] In most other SaaS products, people tend to build user interfaces that prevent free experimentation. Yeah. I'm a Sheets, uh, maxi. If I could do this in a sheet, I would
[00:43:16] live in the sheet, you know? Yeah. Well, I mean, go ahead and dig into that vein: how does AI affect the workplace and human productivity?
[00:43:29] I think the analogy I really like compares AI technology to the advent of spreadsheets in the 1980s. The idea is that you had a lot of educated professionals, serious people working in serious accounting and finance jobs, doing manual calculations and considering that their core activity:
[00:43:53] I put value down on paper, and that's how I create value for the company. Then spreadsheets came along, and with them a lot of doubt: what am I supposed to do with my days now? It turns out, I sometimes think it's like standing in a hot shower and not noticing how good the water feels until you wiggle your toes a little.
[00:44:14] Somehow you get used to your surroundings, you stop noticing what stands out.
[00:44:19] AI and workplace productivity
[00:44:19] So in terms of how AI technology will affect productivity in the workplace, I think the advent of the spreadsheet in the 1980s is a good analogy. You had a lot of very serious people earnestly trying to be as productive and effective as possible, people who weren't wasting their time.
[00:44:42] Seeing the advent of spreadsheet technology, it was like: man, what do I do now? I'm someone who computes things. I write everything down; that's how I create value. And then you start using this new tool and, oh, it turns out that was the most tedious and thankless part of my job.
[00:44:58] And, you know, I still have that human urge to create; you just redirect it to more pressing and important problems. And I think, probably, even writing, which feels like a quintessentially human and creative act, involves a lot of boilerplate wording.
[00:45:22] Oh yeah. It feels like I shouldn't spend time on all those templates. And, you know, one question is: should we spend time reading boilerplate? If so, why does that boilerplate exist at all? But I, I think people are very resourceful and very sensible about becoming more efficient.
[00:45:43] And, you know, I think that will free us up to do more useful things with our time. I mean, right now
[00:45:50] there's still some stigma, you know, around using the, mm-hmm, model to generate text. But I built an open-source email drafter, so I get GPT-4 to pre-draft replies to all my emails.
[00:46:04] And a lot of them I just send, but for now I still pretend it's me.
[00:46:07] Okay. So that's why, when I'm talking to you,
[00:46:09] when you talk to me, you have to polish it yourself. Right.
[00:46:12] But in the future it might become acceptable to say: hey, we don't have to spend this time, your agent can talk to mine. Yeah. Like, let the agents negotiate and come back and tell us: this
[00:46:22] is what you're going to do next.
[00:46:23] It knows your preferences and acts on them. And then, whether reliability is achievable, whether hallucination is separable from the rest, is TBD and a really interesting question. Or do you need other capabilities, like grounded approaches, and so on? Maybe hallucination is just creativity. Like, we'll see.
[00:46:42] But, well, I think eventually we'll get to a point where we can trust these things to act on our behalf. Scheduling, that kind of scenario, or even working out the details of a contract: let me tell you exactly what I want, and you make sure you represent my interests faithfully.
[00:47:00] It will be very powerful.
[00:47:02] Open Assistant
[00:47:02] So, before we wrap up, uh, I think you have opinions on other recent projects out there. Three from me. First, you mentioned Open Assistant; second, CerebrasGPT, which came out in roughly the same time frame, I don't know if you want to compare and contrast, because they have a similar premise to yours; and third, RedPajama, which just came out this morning.
[00:47:24] Yes. We'd love to hear your thoughts. So yeah, if you want to pick one, the first one? Okay, Open Assistant.
[00:47:30] Yes. So I think Open Assistant is great. I like what they are doing. I would love to use their free open data set to improve the quality of Dolly 3.
[00:47:41] Yeah, but as we've seen, training is, well, Cerebras is a good example. You know, I don't know that team, and I haven't looked closely at the technology, but my understanding is that the model is a demonstration of their ability to scale models across this unique chip they've designed.
[00:48:04] But if you look at some of the benchmarks, it's comparable to some of the EleutherAI models, or maybe a little worse. And I think one thing you can see here is that the market for base models, the importance of having your own base model, is maybe not that big.
[00:48:27] Lots of people are training these, and I think of them as something like stem cells. You know, a stem cell is a cell that can come to resemble its environment: after exposure to eye tissue or kidney tissue, it can differentiate into anything. These base models are more or less prototypes, then fine-tuned into the specific agent you're looking for.
[00:48:53] Now, I think it's expensive to train them. They take a long time to train: even with thousands of GPUs, I think you'd still need a month to train some of these very large models properly, and that's assuming everything goes well. So what Open Assistant is doing, I think, represents the next phase: open datasets. That's what the Dolly release is about too. I think of it like power-ups in video games.
[00:49:21] I don't play a lot of video games, but I'm familiar with the concept of your character suddenly being able to double-jump. Uh huh. Right. Amazing. You know, it's like: here's a dataset that lets the model talk to you. Great. Here's a dataset that lets it answer questions over passages from a vector index.
[00:49:38] I think everybody's watching this, and I think there's a really good opportunity to create a lot of value for people through this unsexy work of just writing instruction data down, and figuring out how to measure it. Some of it can look like semi-synthetic approaches; I'd like to see that applied to the Dolly dataset.
[00:49:58] Paraphrase all the prompts. So now you basically have multiple different ways of saying the same thing, each paired with a correct answer for the different question variants. I'd expect this to work like data augmentation in images: you flip and rotate things.
[00:50:13] Yes. Yeah. I think the same will happen with language. That's one of the things you can do. Because we saw the Dolly dataset translated into Spanish and Japanese within 24 hours. Yeah, I mean, you know, right. That's really cool. And it's one of those things that's only possible with open data.
[00:50:31] Right, it's only possible with open data. But I was thinking last night, I wonder if you could do paraphrasing that way, because I don't know what the best, state-of-the-art paraphrasing models are. You could take Google Translate and a prompt, translate it into Spanish, then translate it back into English, and you'd have the same thing said in a slightly different way.
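The round-trip-translation idea can be sketched as below. `translate` here is a toy word-for-word map standing in for a real MT system such as Google Translate, purely so the example is self-contained; the point is the shape of the augmentation loop, not the translation quality.

```python
# Sketch of round-trip ("back-translation") paraphrase augmentation for an
# instruction dataset. The tiny word maps are a hypothetical stand-in for
# a real machine-translation system; the asymmetric inverse mapping is
# what produces a paraphrase on the way back.

EN_TO_ES = {"carry": "llevar", "a": "una", "tent": "tienda"}
ES_TO_EN = {"llevar": "bring", "una": "a", "tienda": "tent"}

def translate(text: str, table: dict) -> str:
    # Word-by-word toy "translation"; real code would call an MT model.
    return " ".join(table.get(w, w) for w in text.split())

def back_translate(prompt: str) -> str:
    # English -> Spanish -> English, yielding a slightly different phrasing.
    return translate(translate(prompt, EN_TO_ES), ES_TO_EN)

def augment(dataset):
    # Each (prompt, answer) pair contributes an extra (paraphrase, answer)
    # pair, keeping only genuinely new phrasings.
    out = list(dataset)
    for prompt, answer in dataset:
        para = back_translate(prompt)
        if para != prompt:
            out.append((para, answer))
    return out
```

With a real translator, each prompt variant keeps its original answer, which is exactly the image-augmentation analogy from above.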
[00:50:54] Ah, yes. And I think the Self-Instruct paper actually uses a handful of prompts to bootstrap more prompts from a large model, and then uses human annotators to evaluate them or to train a reward model. I think a bootstrapping loop on top of these open datasets will create multi-million-record training corpora.
[00:51:14] So what Open Assistant is doing is a great model of this. I don't know if you've tried their interactive chat, but it's a pretty impressive feat. And, you know, the open-data stance represented by the Dolly dataset and the Open Assistant dataset, I think that's probably going to define the next six to nine months
[00:51:35] RedPajama
[00:51:35] of work in this space. Okay, then there's RedPajama. RedPajama, I mean, yes, as I said, go read the LLaMA paper. There's a section on datasets; I think they use seven different datasets: arXiv, I think maybe Stack Exchange, and Common Crawl.
[00:51:50] Okay. So they have Common Crawl.
[00:51:52] Yes. And C4, which is Common Crawl, but a filtered subset. Yes. Uh, plus GitHub, arXiv, Books, Wikipedia, Stack Exchange.
[00:51:59] Yes. So, you know, when you read the LLaMA paper, take Common Crawl as an example. In the LLaMA work, I think their crawl is around 3 TB. It's not something you just download, because you have to generate this dataset, or at least implement the CC-Net pipeline they reference there.
[00:52:18] There's a paragraph in that research paper on how they processed Common Crawl: they pretty much deduplicated it, and they trained a model to predict whether something is likely to be a reference, i.e. a citation link on Wikipedia. And many other things. Just that one step: where do you get the model that predicts whether something would be cited on Wikipedia, how do you train it, and then where do you set your threshold?
[00:52:41] You know, now you have a precision-recall trade-off, and those decisions have a big impact on the quality and character of the model you learn. And just from a scaling perspective, to build Common Crawl locally you have to maintain a non-trivial distributed system.
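The Wikipedia-citation filter just described might be sketched like this. `looks_citable` is a deliberately crude, hypothetical heuristic standing in for the trained classifier in a CC-Net-style pipeline, and the threshold plays the role of the precision-recall knob mentioned above.

```python
# Toy sketch of LLaMA-style Common Crawl filtering: score each page on
# whether it "looks like" text Wikipedia would cite, then keep pages
# above a threshold. A real pipeline trains a classifier for this;
# looks_citable is a hypothetical heuristic stand-in.

def looks_citable(text: str) -> float:
    words = text.split()
    if not words:
        return 0.0
    avg_word_len = sum(len(w) for w in words) / len(words)
    has_sentences = text.count(".") >= 1
    # Crude proxy features; where you set the threshold below is the
    # precision/recall decision that shapes the resulting corpus.
    score = 0.0
    if 3.0 <= avg_word_len <= 8.0:
        score += 0.5
    if has_sentences:
        score += 0.5
    return score

def filter_corpus(pages, threshold=0.75):
    # Keep only pages the scorer judges citation-worthy.
    return [p for p in pages if looks_citable(p) >= threshold]
```

Raising the threshold trades recall (corpus size) for precision (corpus quality), which is exactly the decision the speaker says strongly shapes the trained model.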
[00:52:59] So for RedPajama, I believe it involves the Hazy Research lab, Chris Ré's group, and Mila, or at least they're related; I think Together is kind of leading it. There are great teams behind it, so I have no reason to believe they didn't do the hard, hard work correctly.
[00:53:21] Yes. Now, if you want to replicate LLaMA in the open, this is an important part of the climb. I think it's the natural next step. I'd be surprised if there isn't a training run going on right now. Everyone agrees LLaMA is very, very strong, and we also agree that it isn't openly licensed, so there's a strong incentive for someone to spend a few million dollars to reproduce it and be the team that opened up that architecture.
[00:53:50] Hmm. Well, if you're looking for a prediction: I think we're at most five months away from an open LLaMA clone that's as good as the original. If not, I'll be very disappointed.
[00:54:07] Why choose Dolly > OpenAI GPT
[00:54:07] I think there's a big difference between open, and open in a commercially usable way.
[00:54:13] Yes. Following up on the Dolly 2.0 post: you said you had a lot of inbound interest in Dolly 1.0, but many companies couldn't use it because of where the training data came from. Right. What use cases are people bringing? One is having a conversation with your own data.
[00:54:30] Are there other things that people might not think to do with it?
[00:54:34] Yeah, I think we have some customers who are designing very specific use cases, like customer support solutions. One thing many companies are discovering is that there's a range of powerful AI models, and Databricks wants to be the company where you can use the right tool for the job.
[00:54:55] For example, if you have information from the public web, say forum posts, yes, that you need to synthesize and process, it's not sensitive information. You really should be able to use either kind of model. That could be a fine-tuned model focused on your problem, or a general instruction-following model; and regardless, the intelligence of GPT-4 is very powerful.
[00:55:20] You should be able to use those tools. But there are definitely enterprise use cases where people say: I'm just not interested in sharing this IP. You know, these are essentially our state secrets. Or, from a regulatory and compliance standpoint, I simply can't send this data to a third-party subprocessor.
[00:55:38] Or, as mundane as it sounds, I really don't want to bet the business on it. You know, I have some reason to keep it in house. For a lot of these use cases, and I'm not a lawyer, so I won't speculate on the actual licensing considerations or the actual obligations, people like to be able to move with confidence, and what we ended up with in Dolly is clear:
[00:56:09] The model and dataset are licensed for commercial use. You can build your business on this. I think that's a big reason why the reaction has been so positive.
[00:56:19] Open source licenses for AI models
[00:56:19] Hugging Face has, uh, the RAIL license, uh huh, the Responsible AI License, which isn't yet recognized as open source. So that's the whole Stable Diffusion issue: it's just unclear, because it's a brand-new license that's unproven.
[00:56:32] But I just find it interesting that the current open source licensing regime revolves around code, and now, you know, the value has moved from the code to the weights.
[00:56:43] Yes. I think we could have a three-hour debate about the open source definition and who gets to decide what counts as an open source license.
[00:56:51] But I think there's something to: hey, we know what commercial use is. You're fine for that. Yes, good. You don't have to worry about us suing you. You know, clear semantics are always better. Exactly. Like, we don't need to have OSI approval. Yeah.
[00:57:07] You'll be fine. Now
[00:57:09] Why open source models?
[00:57:09] Keep going: why open source? Yeah. I think, like, with many eyes, all bugs are shallow. I think the reality is that we don't know what the challenges of AI systems will be. Uh huh. And we have a much better chance of getting representative and comprehensive solutions to the challenges they pose by publishing them and opening up the research, so that the people working on the hard problems of ethics, bias, AI safety, security, and so on, can closely observe how the real thing is actually built, how it works, and study it extensively, instead of: hey, we have a team for that.
[00:57:50] And you go: hmm, we just have to trust your work. I'd rather be judged well by the future; I hope people see it that way. I want people to download this.
[00:58:05] Model migration
[00:58:05] When people
[00:58:06] what do you think about model migration?
[00:58:10] You know, we've certainly talked about how the dataset determines how the model behaves. Right. And obviously some people may be on OpenAI now and want to try Dolly. Yeah. For example, what kind of infrastructure needs to be built to let people move prompts from one model to another?
[00:58:26] And to figure out how that works?
[00:58:28] It's really interesting. Because, you see, even switching between GPT-3.5 and GPT-4, the behavior changes. I mean, there are many, many things that are possible on 4 that aren't possible on 3.5, but beyond that, you often want a slightly different wording of the question, a slightly different prompt text.
[00:58:51] So if you want to regression-test a prompt, you can imagine an automated system that helps redesign the prompt so that the output of the new model matches the output of the previous model. Kind of like using a language model to rewrite the prompt itself, evolving it to fit the new model.
[00:59:13] I have two beautiful boys who are amazing people, and my friend Ben and I made a choose-your-own-adventure interactive book for them that uses ChatGPT to generate the story, with the scenes then illustrated using DALL-E, the OpenAI generative image model.
[00:59:36] The choices. Kids can then choose their own path through these stories. If you're really going to push these things beyond a quick single-turn response, where, you know, it's fine if it's just language, you really need it to behave like an API.
[00:59:52] Nineteen times out of twenty it's fine, and then the twentieth generation comes back in a completely different format. You have to ask very firmly in the system prompt: I just want you to give me three options, labeled A, B, C. You know, and from a regression-testing standpoint, how do you know that if I run this prompt a hundred times, I get a hundred out of a hundred with the formatting and labeling I need?
[01:00:21] That's not something a person can do effectively by hand, so I think you need some kind of meta-model over the model to evaluate the outputs and manage these migrations. Uh huh. Yes, it's an interesting product category. I hadn't thought much about it. Yeah.
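That kind of format regression test might look like the following sketch. `sample` is a hypothetical, deterministic stub for real (stochastic) model inference; in practice you would call the new model many times and require a pass rate at or near 100%.

```python
import re

# Sketch of a prompt regression test across a model migration: sample the
# target model repeatedly and check that every completion still matches
# the required "A) ... B) ... C) ..." format. sample() is a hypothetical
# stub standing in for real inference.

OPTION_FORMAT = re.compile(r"A\) .+\nB\) .+\nC\) .+", re.DOTALL)

def sample(model: str, prompt: str) -> str:
    # Placeholder for real (stochastic) inference against `model`.
    return "A) go left\nB) go right\nC) rest"

def format_pass_rate(model: str, prompt: str, n: int = 100) -> float:
    # Fraction of n samples that satisfy the required output format.
    hits = sum(
        bool(OPTION_FORMAT.search(sample(model, prompt))) for _ in range(n)
    )
    return hits / n
```

A migration gate could then require, say, `format_pass_rate(new_model, prompt) == 1.0` before a prompt is ported from the old model to the new one.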
[01:00:34] Learning in simulation
[01:00:34] When you mentioned that example before, you know, the backcountry trip, I thought: yeah, it would be great if you had a simulation. Okay, so that's your packing list.
[01:00:44] Now I have a game: I give a character that inventory and see if they can survive in the wilderness. Because, you know, the first time I went camping in Yellowstone, I forgot to pack the rain fly for my tent, and of course it rained. That's how, you know, you get punished
[01:00:59] Yes. It's an environment that gives you a gradient, right? You update your model on it. You'd be glad to have such an excellent signal, yes.
[01:01:06] That's what these models are missing; the missing evolutionary piece is that these models can't die. They can't break an arm. If they make a recommendation, there's no actual
[01:01:16] consequence for them. Um, so I'm really curious whether in the future, you know, well, you want to write a poem, I like poetry, so we send these structured proposals out, and if yours gets rejected, that makes your model
[01:01:28] Why model reflection and self-criticism work
[01:01:28] die. So I think one of the cool things about LangChain, for example, we all know they've done a good job and built useful tools, is that these models can check whether they're wrong.
[01:01:38] So, for example, you can ask the model to generate statements, and the features used to predict the next token may not capture everything: it can hallucinate, it can make things up. But you can show that same model its own generation and ask it to tell you whether it's correct, and it can, it can recognize that it's not. I think that's a direct function of the attention weights: you can attend over the whole
[01:02:03] generation, whereas for next-token prediction you only see the prefix, and you're just trying to pick the next token. Right. Often it's effectively a weighted sample from the distribution over that softmax output vector; there's no fact-checking. But if you feed the model its own output and say, here is the entire generated paragraph, rate it as a whole,
[01:02:25] well, it can process all the tokens at once; it's just an easier problem to solve. So I thought: oh, that's a great insight. Yeah. Yeah. That's what reflection is. You can only see what you've said once you've said it, and the model has enough information to judge it.
[01:02:41] It's like submitting your plan to the environment, hmm, and seeing how it behaves. I think you can ask the model, I think we could try this today: here's my itinerary; critique it. Uh huh. Exactly. Like, what could go wrong with this packing list? I think there's a case where these kinds of techniques have a trajectory, where reflective models are not superlinear:
[01:03:10] you don't get anything beyond what's already in the model, you just saturate, and then you need human feedback. The other scenario is the AlphaGo scenario, where models can play against themselves, observe their own behavior and interactions, and become stronger, better, more capable.
[01:03:31] That's the more exciting scenario. The idea is that if I consider the whole generated sample, I have more insight than when I'm just sampling the next token, which suggests it's possible. In terms of a non-saturating payoff, there's potential for takeoff there.
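The generate-then-critique loop discussed here can be sketched minimally. Everything below is a hypothetical stand-in — `llm(prompt)` represents any completion API, and the prompt wording is invented for illustration:

```python
def reflect(llm, task, max_rounds=3):
    """Generate an answer, show the whole thing back to the same model,
    and revise until the model judges its own output CORRECT."""
    draft = llm(f"Task: {task}\nAnswer:")
    for _ in range(max_rounds):
        verdict = llm(
            "Here is a complete answer. Judge it as a whole.\n"
            f"Task: {task}\nAnswer: {draft}\n"
            "Reply CORRECT, or give a one-line criticism:"
        )
        if verdict.strip().startswith("CORRECT"):
            break
        # Feed the criticism back in and ask for a revision.
        draft = llm(
            f"Task: {task}\nPrevious answer: {draft}\n"
            f"Criticism: {verdict}\nRevised answer:"
        )
    return draft
```

The key point from the conversation is that the critique call conditions on the entire generation at once, which is an easier judgment than next-token prediction over a prefix.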
[01:03:51] Yeah, that's awesome. Mike, we're about at time — maybe we can jump into the lightning round.
[01:03:55] I'll read the questions out again if you want a second to think about them. Alright. Favorite AI
[01:04:00] product? It's a boring answer, but it's true: Google Maps. Ah. And in terms of AI, they've been working with NeRFs recently, so from a few different photos you can explore the inside of a business.
[01:04:15] And no doubt they're also — I mean, I don't know that the team at Google is doing this, but presumably they're digesting the sum of human knowledge about every entity in their graph with language processing: what's the judgment on this business? Look, maybe it's not an artificial intelligence product per se, but it's definitely a machine learning product, and it's great.
[01:04:37] You forget how much you rely on it. I'm at the cafe around the corner and I use it to figure out where to go, even when it's literally a 150-meter walk — it's that reflexive. But it also comes out of a tradition of information visualization. I love maps. I started our conversation by saying I think a lot about maps: multi-scale adaptivity, and choosing and refining the information displayed.
[01:05:08] It picks up on your intent. Are you driving? Okay, then it had better show me parking garages. It fits in so subtly that we don't notice. I think great product design is like that: when it works, you don't notice it. So I think Google Maps is an amazing AI
[01:05:28] product. Google Maps. Yeah. That's a great pick. Amazing. Well, not that they need the help. Yeah.
[01:05:36] It's actually the best advertising real estate, right? Like, a lot of businesses should be buying ads, specifically on Google Maps. Yeah. That's how they'd grow it, and I don't know how big that business is, but it must be huge.
[01:05:45] Yes. And then my next point is that there should be Google Maps optimization, where you name your business "the best barbershop," so when someone searches, it shows up as the best barbershop.
[01:05:55] Of course. Right? Yeah. It's like the AAA Locksmiths of the world. Yeah. At the top of the Yellow Pages.
[01:06:01] Any AI people or communities you want to shout out?
[01:06:03] You know, I don't think I have anything super original to say about EleutherAI, but as far as I understand it's a purely volunteer effort, and what they've accomplished is incredible. They're on a streak of remarkable projects.
[01:06:20] And beyond that — I think you said, when you were answering this question, that Hugging Face is kind of like Google Maps: you forget how complicated the things it does for you actually are. And certain people — I mean, Stas Bekman, for example, does a lot of DeepSpeed work, really serious stuff, and is so engaged with the community. The whole team at Hugging Face is amazing; they've done so much for the entire industry.
[01:06:53] So, I mean, yeah, it's the power of open source: Transformers, the libraries, Diffusers, all of it. Miraculous. It's been a wonderful experience as a product.
[01:07:03] I think a lot of people, me included, treat Hugging Face as free Git LFS hosting. And I think they've really leveled up over
[01:07:11] the past few years.
[01:07:11] Yeah, a little bit. Yeah. It's really impactful work. Yeah.
[01:07:14] Yeah. A year from now, what will surprise people most about AI? Do you have
[01:07:19] a take? Well, yes — though I guess it won't be surprising, since we're on a ballistic trajectory toward some kind of open LLaMA replication. So I don't think that will shock anyone. But socially, there are things we don't have much precedent for dealing with — like this ghostwritten Drake track that just came out this weekend.
[01:07:40] Mm-hmm. The AI collaboration. The fake Drake? Yeah, the fake Drake. Thoughts? Not really — I like a different kind of hip-hop — but as an example of the genre, it sounds like something I might hear on the radio. So the flag this raises for me is the knowledge graph that gets built from communication in your workplace.
[01:08:02] Think about how many times you've expressed your opinions and intentions about a given topic in workplace communication, or on the internet generally. I think Character AI is headed in this direction: you'll be able to talk to high-fidelity avatars that represent the beliefs and intentions of the people around you, which will be both useful and compelling.
[01:08:27] I don't know that there's a good model for how society adapts to that — I think it will, I think it just depends on how people behave. For a start, it's moving pretty fast.
[01:08:41] Look, you can definitely tell it's very good. Mm-hmm. I'm really curious what the long-term outcome will be, because after you've heard it once or twice, you realize it's not a cohesive song.
[01:08:55] But the funniest part to me — so Drake and The Weeknd never made a song together again, and it has to do with a Weeknd line that goes something like "replace me if you force me." Because Drake had actually implied The Weeknd would never have been popular if Drake hadn't put him on his album.
[01:09:13] Okay. So it's interesting that there's now an AI-generated Weeknd number that takes that "replace me if you force me" line and puts it in a different context. But I think this is going to be really interesting for the labels, you know, because a lot of them own the masters for most of the music they put out, yeah.
[01:09:31] A lot of the rights. So at some point it's easier to make music this way than to do it yourself. But I still think you need the artist's touch.
[01:09:39] Like, what's unique, what's — you know. I think artists often — I know in my own writing, in any kind of creative process, sometimes you feel like you're just going through the motions.
[01:09:50] It's funny that we talk about a sentence "rolling off the tongue" — that's very like a causal language model. Mm-hmm. Where we're on a conversational track. I have a whole bit about this: you're talking to a startup founder and you think, oh my god, how many times have you said exactly that? It's nearly identical every time — the three-minute pitch with only tiny variations.
[01:10:10] Very good. Yeah. Right. So, sticking with that idea — for creative acts that maybe aren't actually so creative, things like that, is there market pressure to reward work that's genuinely creative rather than formulaic, rather than repeating the same bit?
[01:10:29] I love art that pushes boundaries — it's often the most fun, and it opens you up to things you never thought possible before. I hope people play in that space. But a lot of people are like, oh, I just need some lo-fi beats to study to. Give me an infinite stream.
[01:10:49] And I'm fine with that, because I like it too.
[01:10:52] Have you seen that chart of how pop songs have dramatically narrowed in
[01:10:58] octave range? Totally. Totally. Like, I think we used to have
[01:11:02] Bohemian Rhapsody and —
[01:11:03] Yeah, that's a great example of something that doesn't fit the mold.
[01:11:08] So I think the name Perplexity AI is really good, because we want more perplexity in our lives. Yes — by the way, shoutout to Replika. I don't know if you've come across them, but they're working on digital twins. Okay, hey, I'm asking for startups: what's an AI product you would pay for if someone
[01:11:21] built it?
[01:11:22] Well, LLM evaluation, for sure. Tooling that facilitates generating and evaluating models multimodally — and by multimodal I don't mean images and text, I mean human evaluation, quantitative benchmarks, and qualitative judgments, plus models I can use to evaluate other models. But for another AI startup: I have family who work in the park system.
[01:11:49] Mm-hmm. Because almost everybody has access to basically the same information about the outdoors, you get a handful of trails everyone goes to, with very, very tight parking, and it's hard to get to a lot of these beautiful spots. Muir Woods is a similar example, where you have to reserve a parking space in advance, which is a bummer.
[01:12:12] But I think America in particular is unique in having such an enormous amount of public land, and there are so many genuinely magnificent, beautiful places that are barely documented. So from a geospatial perspective, you can imagine representing each tile on the map as something like a word
[01:12:39] embedding: you look at the context in which a place occurs and what people say about it, distill the essence of the place, and then you can make a request like, "I want my hiking traffic spread more evenly across the surface of the planet," so we're not all competing for the same fixed supply of resources.
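The tile-as-embedding idea could be sketched roughly like this. Everything here is a hypothetical stand-in — `embed` represents any text-embedding model, and the tile IDs and visit counts are invented:

```python
import numpy as np

def tile_embedding(texts, embed):
    # Average the embeddings of what people have written about places in
    # the tile to get a single "essence of the place" vector.
    return np.stack([embed(t) for t in texts]).mean(axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def quieter_alternatives(loved_tile, tiles, visits, max_visits):
    # Rank uncrowded tiles by similarity to a tile the user already loves,
    # nudging traffic away from the few spots everyone competes for.
    ranked = [(tid, cosine(loved_tile, vec))
              for tid, vec in tiles.items()
              if visits.get(tid, 0) <= max_visits]
    return sorted(ranked, key=lambda kv: -kv[1])
```

The design choice is the one from the conversation: similarity comes from what people say about a place, while the crowding filter spreads demand across comparable but lesser-known tiles.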
[01:13:03] I don't know whether this actually makes money — like, whether it's the next ten-billion-dollar business. But there's so much public land and so many trails. Days spent scrambling up a dirt road with my brother were some of the best of my life.
[01:13:22] And, uh, I want more of that. I want systems that help us live as fully as possible as humans.
[01:13:29] Right — there's the most popular trail, and everyone wants to be on it. Yeah. Then there are the lesser-known ones, and I think a lot of people hold back because they don't know what they're going to find, you know?
[01:13:41] Mm-hmm. There are no reviews on all these trails like there are on YouTube. Absolutely. But you can see it coming. So I think a better way to understand all this would be great.
[01:13:49] Say a little more about that and then we'll wrap up — like, I think there's AI technology for crowd management:
[01:13:59] tools that can look at sensor and camera input from many different agents in the system. Ultra-low-power gliders are an example of something I'd love to see, and there are services now where, for something like $180, you can task a satellite today to photograph a five-by-five-kilometer area.
[01:14:21] I just want to be able to run a scout fleet over remote areas and get the latest trail conditions. I don't know if anyone will make money from it, but if someone builds it, I'll use it. So I might have to build it myself. Right — and open source it. It's part of Databricks' long-term commitment to open source: diversifying into new markets.
[01:14:44] Awesome. Mike, this was great. Thanks for coming on.
[01:14:45] Thank you for having me. Yeah, this was fun.
That's it for this episode. If you want to discuss it with other listeners or access the bonus episodes, please visit www.latent.space