Chrome's new LLM AI API OMG
2024-08-30
Chrome is experimenting with exposing an LLM to the web platform. Jake and Surma dig into how the API works, and whether something like this could work on the open web.
Resources:
Transcript
Surma: I am working towards getting British citizenship, which is an interesting process.
Jake: Ooh. Is this content? Because, like... well, no, it should be content, but we haven't done an intro.
Surma: Yes! Or should it not be?
Jake: Look, I really do think we need to make the intros more professional. Like, okay, let me give it a try. Okay. Welcome to another episode of OTMT. I'm Jake.
Surma: And I am Surma.
Jake: We're going to be talking about some bits about AI and Chrome in a bit, but first we should thank Shopify for letting us record this, and for hosting it, and for paying for it, and just, you know, it's just nice, isn't it? It's just nice.
Surma: Existing. Yeah, it's nice.
Jake: They didn't ask us to do that prompt. They didn't ask for any of this, did they?
Surma: They also didn't ask us to do the podcast.
Jake: Which is why we didn't say anything about it in the last episode. So, you know, this is just us being nice, really. They're being nice. We're being nice. We're all friends. What do you think about that, Summer? Was that good?
Surma: That was so good.
Jake: That'll get us into the top three in the next survey.
Surma: Once the survey's out, we're gonna be like, this podcast is sponsored by Shopify and the State of React survey, which you can find in the description. You should totally vote for us.
Jake: How about this for a smooth segue? But before we talk about tech, Summer, I heard that you were trying to get citizenship.
Surma: Oh, wow, Jake, that is a really astute observation. This is like one of our shared pet peeves: people giving tech talks and pretending to have a dialogue on stage.
Jake: Oh God. Yes. And it is. It happens very much in the big corporate spaces, your Google I/Os or internal conferences. You know, they've planned it out. They know what they're going to say. They know what the other person's going to say.
Surma: We all know they've rehearsed and scripted all of it.
Jake: "So, Terry, what would I do if I wanted enhanced image generation as part of my website?" "Well, I'm glad you asked that, Samantha. Look at this demo." Oh, I hate it so much. I cannot stand it. I'm in a bad mood already.
Surma: That reminded me a bit of that. But you're correct, Jake. I am working towards British citizenship.
Jake: Why would you ever do it?
Surma: That's mostly the reaction I get, which is like, yeah, why?
Jake: Have you received your bunting yet?
Surma: No, but I think that will probably be mandatory. I'm still in the application process. I'm early on, but you need to, you know, get proof of certain different things. Like, for example, that you are, in fact, capable of speaking English to a sufficient level.
Jake: And having a podcast that is in English wasn't enough for them, I suppose.
Surma: No, no, no. You have to get an official certificate. For citizenship, you need to be at level B1 or above, which I was actually surprised by. B1 is speaking and hearing. There's no reading or writing in this test. And it's, you know, it's intermediate. I guess it is very well enough English to, you know, live in England and get along.
Jake: "Very well enough English." Are you sure? Did you pass this? Are you about to tell us you failed? It's a whole metaphor about the empire, I believe.
Surma: That's the thing. I went to take that language test, not quite sure what to expect.
Surma: And so, you know, it was already a very dodgy location for some reason, where it felt like a building that was on the brink of collapse, and you're going down to the basement, kind of dimly lit. And they have these little booths and old, I want to say moldy, probably not moldy, but they look a bit moldy, headphones that you have to put on to interact with the computer. But before that, they have to check who you are and that you're the person you claim to be, because they obviously have to make sure you're not taking the test for someone else. And you literally have to put everything from your pockets in your locker and turn your pockets inside out. And there's a metal detector. It actually was a bit disproportionate.
Jake: Excuse me?
Surma: But then, once they have all this done and have confirmed your identity, they kind of sit you down for a quick questionnaire: when did you move to the UK, what are you here for? Because you'd also need this test for some other things, like getting a new passport sometimes, or if you just want to have a visa. And they asked me, so why are you here? And I was saying, well, I want to get neutralized.
Jake: Well, see, so that's why you can't have kids anymore, right?
Surma: I didn't want to get neutered. I want to get neutralized.
Jake: Oh, that's right. It's a different word.
Surma: Both were wrong. And then she looked at me, squinting, and I was like, taking a step back: naturalized. I'm here because I want to get naturalized. Which was a brilliant start to proving that I am, in fact, capable of speaking English. So that got my adrenaline up. Luckily, the entire test was on a computer, with another person on the phone.
Jake: Oh, so you had to go to a building, I guess, to verify identity. But then you're just doing the whole thing on Zoom or Hangouts or whatever, right?
Surma: They have really strict supervision of the test environment: you can't look over to another person, you have to have your headphones on (which, honestly, these headphones aren't very good), and you have like three other people around you having similar conversations with their test person over the phone. It was all right. I passed. One step closer. But I felt like if they had disqualified me in that very moment, I couldn't have held it against them.
Jake: So what's your next steps, then? You've got to go on Bake Off.
Surma: You know, you say that, but actually in the citizenship test, which I also have done and passed, some people have gotten questions about Bake Off. Yeah, it's fascinating.
Jake: Really? I've never seen a single episode. Or Doctor Who. Sorry. Oh, I do have the first six series of Red Dwarf entirely in my head, though, so I think I would try and use that.
Surma: Bake Off I have watched a couple of; I could have, you know, said Paul Hollywood, but that's probably where it ends. Red Dwarf I still haven't watched. You have recommended it to me multiple times, and I need to catch up on that. But instead of this, should we talk about the web?
Jake: Yes, let's do that, because I want to talk about Chrome's new AI prompt API and how that kind of fits into the web and web standards.
Surma: Oh, "AI API", that's a strong acronym.
Jake: And it means we can put AI in the episode title. It's so hot right now. It's what everyone's talking about.
Surma: So is it AI or is it LLMs?
Jake: It's LLMs, but, you know, come on, no one knows what that means.
Surma: I was trying to think if we should put both in the tweet, because then we'll get all the AI-driven Twitter accounts to reply to us.
Jake: Oh, yeah, we'll use LLM. We'll use AI. We'll put API in there. Do you know what? The episode title is going to look like a Roman numeral.
Surma: Yes, with some new Roman numerals that were invented.
Jake: Some new letters in there. Yeah, absolutely.
Surma: Yeah, tell me. This is basically riding the wave of giving people access to an LLM in, well, I guess that would be my first question. Is it a browser-defined LLM? Can I bring my own? Like, how does it work? What does it look like, Jake?
Jake: I'll say it first: it's behind a flag. It's very experimental. There are a lot of hoops you need to jump through to get it working.
Surma: Sounds like a great experience.
Jake: Yeah, I tried to do it just the normal way, but I had to go and knock on the door of some old friends over at Chrome to actually figure out how to even get it working. So it's not super simple. But they do say, like, this is even more experimental than their usual behind-a-flag stuff, which is very experimental.
Surma: Do you think that's good? I mean, it's basically getting developers in earlier than usual.
Jake: No, this is a good thing. And I support any browser to, like, just throw stuff in behind a flag, because if it's behind a flag, it's fine. You know my feelings about standardization. We talked about it in the last episode.
Surma: If that flag is disabled by default.
Jake: I think anything behind a flag could be added and removed at any time. If the flag is disabled by default, of course, of course. Right. So I'll go through the API briefly. This is the API that's in their explainer, which is a little bit different to the API that's in Chrome. We'll link to the explainer as well.
Surma: Good start.
Jake: Well, again, what tends to happen in these cases is some browser engineers will just kind of experiment with a thing, and then there'll be a sort of explainer standards effort spun up at the same time. And that's usually the point where someone actually starts thinking about the API properly. So the API will get redesigned, and then, as tends to happen in the standards process, the API will change in the spec, or the explainer in this case, and it will change in the browser later. So the explainer is ahead of the implementation, which is all fine. So there's a global: window.ai, or self.ai, or globalThis.ai, whatever.
Surma: Right. That's it. How do we feel about that?
Jake: Fine, whatever. Again, it's early days.
Surma: I thought we were trying to not do globals, because if it's something provided by the user agent, it should be in, like, navigator or something.
Jake: I think globals are fine. I just, again, I couldn't tell you why something would go in navigator rather than... it's navigator.serviceWorker, and I kind of wasn't part of that discussion so much. I don't know what the rules are. It's certainly not part of the document, so it shouldn't be document.ai. I think sticking stuff in navigator was maybe something that shouldn't have happened with service worker. But yeah, maybe nervousness around the global scope was a thing. And in that namespace, there is .assistant. So ai.assistant, which is another namespace.
Surma: Okay. That is nice, because they're not conflating AI with conversational LLMs. They're creating a whole namespace for AI stuff and creating a sub-namespace for the specific chat-driven LLM stuff, which means later on, if you want to have image synthesis or anything similar, there is a namespace where this can now go. So actually, that's kind of nice.
Jake: Yes. And it's "assistant" rather than "LLM", because maybe in future this same thing will be done with something more advanced than LLMs, or whatever. Or you could have an implementation just made with Markov chains, I suppose, right? It's not linking it to a particular technology. And then you've got the create method. So ai.assistant.create. You can give it options like temperature and topK, which, if you've worked with LLMs before, are the kind of familiar terms. A system prompt, again, a familiar thing. Or initial prompts, which is where you can give an example back-and-forth between the assistant and the user. And you can only do one of those two. And then you get a promise for an object, and you get a prompt method on that object, or promptStreaming. You give it some text and it gives you the reply. And that's pretty much it. You get clone and destroy methods, and there's some other stuff, but it's quite a small API.
Surma: Yeah, because I played around with LLMs a little bit as well, and at the start I was like, I'm going to keep it easy, I just want the full response. But the LLM might just go, yeah, I'm going to give you a really detailed answer for no reason whatsoever, and you suddenly wait 30 seconds until you actually get the full response. Which is why streaming here, I think, is also very important.
Jake: Yes. Although I don't know if it's defined that it has to be token by token, or if it's just, you know, some reasonable amount of streaming, like you would get with a network response. You get text back.
Surma: It's also because you probably want to be able to cancel the response if the user says, actually, this is going in the completely wrong direction, or, I've seen the info that I need. Is there an AbortController signal that you can pass in to promptStreaming?
Jake: Yes. And that's actually one of the recent changes in the explainer that isn't in the browser yet, but yeah, it's there. I guess with the stream, you can just cancel the stream as well. But for the more promise-based one, you've got the AbortController.
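(Note: for reference, this is roughly what the flow above looks like in code. It's a sketch based on the explainer as described in the episode; the API is experimental, behind a flag, and the exact names in Chrome differ and will change.)

    // Creating a session, per the explainer's shape (subject to change):
    const session = await ai.assistant.create({
      temperature: 0.8,
      topK: 3,
      // Either a system prompt...
      systemPrompt: 'You are a terse, helpful assistant.',
      // ...or example turns (one or the other, as mentioned above):
      // initialPrompts: [
      //   { role: 'user', content: 'Hello!' },
      //   { role: 'assistant', content: 'Hi, how can I help?' },
      // ],
    });

    // Promise-based: resolves with the full reply.
    const reply = await session.prompt('Summarise this email: ...');

    // Streaming, cancellable via an AbortSignal:
    const controller = new AbortController();
    const stream = session.promptStreaming('Write a limerick about caching.', {
      signal: controller.signal,
    });
    for await (const chunk of stream) {
      showPartialReply(chunk); // showPartialReply() stands in for your own UI code
    }

    session.destroy(); // and there's session.clone() for forking a conversation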
Surma: Ah. Well, that's, I guess, then the next question, right?
Jake: So yeah, this is something you could use to create autocomplete things, proofread a document, summarize a thing, code generation, I guess, chatbots, translation maybe (although there are more specific models for that), but, you know, there's all kinds of stuff you could do with this.
Surma: Like, what is the model? Is that up to the user agent? Can the user link to, like, a public model? Because, you know, different models are good at different things.
Jake: It is down to the user agent, and the user does not get to pick. So in Chrome right now, it is an on-device model. It's Gemini Nano.
Surma: I mean, that's really interesting, right? Like, the fact that if it's on the device, you can use this offline. That is actually very appealing.
Jake: Yes. But that does mean it has to download the model on first use, and the API is shaped to allow this. There are events and that kind of thing. It'll let you know if it's not going to be instantly ready. And the model is two gigabytes, and it unzips to 25 gigabytes.
Surma: Well, I mean, you know, that's a normal size for a React app nowadays.
Jake: Well, that's it, isn't it? When we were doing DevRel-y things, we were like, oh yeah, one megabyte is quite a lot for your homepage. And it's like, yeah, going up to two gigabytes is going to be fun to try and justify.
Surma: The question here would be, does it redownload it every time for every origin? Or is this where we're now breaking one of the longstanding pillars of the web, where we said that we can't cache across origins because that's a timing attack? But maybe, if it's a user-agent-provided LLM, is it okay to download it once and cache it for all sites to use?
Jake: You are absolutely correct. Yeah, it is that. It's one download into Chrome, and then it can be used on all sites. It's not a problem doing that, because it's an internal thing. The reason we don't bleed the cache across origins is because it can be used for cookie-like things, but here you don't get access to the bytes of the model. The cookies never influence the model anyway, I think. So it's totally fine to do that in this case. And maybe there will be cases as well where, you know, all phones are now like, "our phone's got AI built in" or something, so you might not need to download anyway, because it just came with your phone and it's updated via the OS. That's not part of this experiment, but you can imagine that world.
Surma: I mean, you say it's not part of the experiment, but I guess in the end it's really up to the user agent to decide: do I need to download something, or can I just forward this, or be an adapter somehow to whatever the operating system already provides? Edge on Windows could just be an adapter to Cortana or, you know, I don't know exactly how, but that could work.
Jake: Yeah, exactly that. And we've seen that happen with other specs before. The shape detection API, which is not really fully implemented, but things like that just call out to the inbuilt OS stuff.
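(Note: the "is it ready yet?" side of the API, again sketched from the explainer as described above. Treat the exact names, like capabilities() and the downloadprogress event, as assumptions, since they were still in flux at the time.)

    // Sketch: checking whether the on-device model is usable before prompting.
    const { available } = await ai.assistant.capabilities();

    if (available === 'no') {
      // This device can't run the model; fall back to something else.
    } else if (available === 'after-download') {
      // First use triggers the multi-gigabyte download, shared across origins;
      // the monitor callback lets you show progress in the meantime.
      const session = await ai.assistant.create({
        monitor(m) {
          m.addEventListener('downloadprogress', (e) => {
            console.log(`Downloaded ${e.loaded} of ${e.total} bytes`);
          });
        },
      });
    } else {
      // 'readily': the model is already on disk.
      const session = await ai.assistant.create();
    }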
Jake: But there's nothing stopping implementers using a cloud service for this, which, okay, wouldn't work offline, but otherwise it's not a crazy thing to do. Like, push messages are already using a cloud service, right?
Surma: No, and I guess, you know, I haven't used Gemini Nano directly myself, but I'm assuming that size of model can only be so capable when, you know, Llama 3.1 just came out with its, what was it, 280 gigs or something? It could be like a progressive enhancement, where when you're offline it uses what is available offline, but if you're online and you're not on, like, a metered connection, it might actually use the model in the cloud, just because it is more capable. That being said, I'm very much just playing around with LLMs and stuff, so I couldn't be further from an expert here, but I feel like the system prompts you write are very much tailored to the model you work with, and the same system prompt will give wildly different results when applied to a different model, especially when it's a different, you know, amount of parameters. It can also be smaller context windows, or it has just generally been trained on a different set. And LLMs by themselves are already hard to, quote-unquote, program against, because even with the same model and the same system prompt, the results are often varying and sometimes unreliable. Now switching out the model, it's like, oh yeah, the user agent will give you a model. How useful is that?
Jake: I couldn't agree more. I think that's the big problem with this. You have to sort of engineer your prompts, and not just the system prompt, but the way you ask particular questions, and that is per model, right? And I don't think we'd get to a state where... well, if you get to request a particular model, or sort of download your own, I think a lot of the benefit of this particular API is gone, right? Because then that two gigabytes is now per site, or maybe different parts of the same site are using different versions.
Surma: And then you could just use WebGPU. I know WebGPU actually doesn't expose some of the necessary types, like the 8-bit floats and 4-bit integers that lots of the LLM file formats use, but I think they're working on that.
Jake: Well, there's also the WebNN spec, and that is behind a flag in Chrome, which is kind of tailored towards that exact use case. So it would be able to use, like, neural CPU stuff rather than just relying on the GPU, but it can rely on the GPU. We might touch on that a little bit later as well.
Surma: Interesting.
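(Note: Surma's progressive-enhancement idea, as a rough sketch. Everything here is an assumption for illustration: the capabilities() check is the explainer shape from above, and '/api/llm' is a made-up endpoint standing in for whatever cloud model you'd run yourself.)

    // Prefer the on-device model when it's ready; otherwise use the cloud.
    async function ask(prompt) {
      if ('ai' in self) {
        const { available } = await ai.assistant.capabilities();
        if (available === 'readily') {
          const session = await ai.assistant.create();
          try {
            return await session.prompt(prompt);
          } finally {
            session.destroy();
          }
        }
      }
      // Cloud fallback: more capable, but needs a connection.
      const response = await fetch('/api/llm', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ prompt }),
      });
      return (await response.json()).text;
    }

(The caveat Surma raises still applies: the two paths run different models, so the same prompt may behave quite differently on each.)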
Jake: So, when we designed service worker, we put a bit in the spec to say, if there's too much work happening in the service worker and it's not linked to a particular response, you can terminate it, right? Because it's doing work it shouldn't be doing. And we just said, you leave it up to the browser to decide what "too long" is, because that can change over time, or different user agents can have different opinions. But when Safari implemented it, they were just like, right, just tell us which number you use and we'll do that number. Because if we don't, and we pick another number, someone's going to file a bug that it's different to Chrome. And I thought, okay, that's interesting, and true, because they were speaking from experience. But that's exactly what's going to happen here, right? Let's say Chrome ships this API, and then later Safari does, and they ship with a better model. Even though the model is better, it might not produce as good results, because all of the usage out there is tweaked towards the particular Chrome one. As you say, they've worked on all the system prompts for this Gemini Nano model. And people are going to say, well, the Safari one's rubbish, isn't it? Even though it might be objectively better if it's given the right kind of prompts. So then maybe Safari feels pressure to use the same model as Chrome, and blah, blah, blah. And it's not a great situation.
Surma: This might be a very uninformed opinion, but the thing where LLMs suddenly became really interesting and useful for me is when I played around, this is now maybe half a year ago, with ChatGPT 3.5 at the time, for the first time, and looked at the API and saw that they have functions. I think they're now called tools. You basically, in addition to your system prompt, or maybe previous messages, also provide JSON schema descriptions of functions that this agent, assistant, is allowed to invoke instead of providing a textual response. So you could provide a function like getTemperature, which takes a parameter of a location and returns the temperature. And now, if the user asks, what's the temperature in Tokyo right now, it has been trained to not say, "I'm an AI system that can't really tell you what's going on in the world"...
Jake: Or, even worse, just guess. "I have some historical data, and so I'll just pick a number."
Surma: Yeah, yeah, just hallucinate an answer. Rather, it would invoke that tool. You would then, on your end, see, oh, it wants to invoke the tool. You invoke the tool, you supply the answer, and then it would use the return value from that function and generate a textual response with the actual data woven in. And in that sense, for me, an LLM became an English-language-to-function-call adapter, and back.
Jake: Yeah. So you could try and do that with this, right? But you would be kind of trying to feed that all into the system prompt.
Surma: Exactly. I tried doing that with some other models which were not specifically trained to support this, kind of teaching them through the system prompt to do it. I wonder if that would narrow the scope of this API more. Like, this is an assistant that can turn English into function calls and then provide a textual response. Or you can decide to not let it generate a textual response, and just function calling is enough. That's how I built a to-do list manager, where basically I can say, hey, add milk to my shopping list, and because it has a function that says "add an item to a shopping list", it calls that function. It could then, if I let it, say, "I have just added milk to your shopping list; here is now your new full shopping list", but I don't care about that, so I don't even invoke it again afterwards.
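(Note: the loop Surma is describing, sketched against OpenAI's chat completions API, which is the one he mentions. The getTemperature implementation itself is assumed, and the model name is just an example.)

    import OpenAI from 'openai';
    const openai = new OpenAI();

    // Describe the tool with a JSON schema, as Surma outlines above.
    const tools = [{
      type: 'function',
      function: {
        name: 'getTemperature',
        description: 'Get the current temperature for a location.',
        parameters: {
          type: 'object',
          properties: { location: { type: 'string' } },
          required: ['location'],
        },
      },
    }];

    const messages = [
      { role: 'user', content: "What's the temperature in Tokyo right now?" },
    ];

    const first = await openai.chat.completions.create({
      model: 'gpt-3.5-turbo', messages, tools,
    });

    const call = first.choices[0].message.tool_calls?.[0];
    if (call) {
      // The model asked us to run the tool; our own code does the actual work.
      const args = JSON.parse(call.function.arguments);
      const result = await getTemperature(args.location); // assumed helper
      messages.push(first.choices[0].message);
      messages.push({ role: 'tool', tool_call_id: call.id, content: String(result) });

      // Optional second round trip weaves the data into an English reply.
      // As Surma notes, you can skip this if the function call was the point.
      const second = await openai.chat.completions.create({
        model: 'gpt-3.5-turbo', messages, tools,
      });
      console.log(second.choices[0].message.content);
    }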
Surma: Again, very naive opinion, but I wonder if this kind of pattern would be more stable across the different language models that different user agents could provide their own implementations for.
Jake: Yes.
Surma: And I would think it becomes more useful for building apps as well, because I think that is what most people are looking for. You build your to-do list app, and you want to be able to quickly say into your microphone, "remind me tomorrow to do X", or you want to say into your email app, "send an email to my mom", and that should really just take an action. You don't want it to respond in English so that you then have to programmatically analyze the English to figure out what it wants to do. So that's why I was wondering about this API, because if it's all just chat, how do you integrate that meaningfully into your app? I guess you put stuff in the system prompt so it can ask questions, but that seems very limited in the use cases that will reliably work.
Jake: Yeah. This could be added to the API, right? There's an options object there, which would be a very obvious place to put these kinds of functions. But it would limit the kind of models that could be used on the other side, and maybe there's a question of whether that's future-proof as a surface. But sure, it could be done. I think some of this difference between the models, they do acknowledge in the explainer. They've got this bit that says: "We do not intend to provide guarantees of language model quality, stability, or interoperability between browsers. In particular, we cannot guarantee that the models exposed by these APIs are particularly good at any given use case. These are left as quality of implementation issues, similar to the shape detection API." Which is why that API was in my head earlier. And I get it, but I don't think it's equivalent, because yes, one shape detector may be better than another, but there's a right answer, you know? Or there's a very narrow set of things that can be considered a right answer. It's like, scan that barcode; where are the faces in this photo? It can be measured objectively. Even the translation API, although there's a wider set of correct answers and it's a little more subjective, it's still a fairly small range of what's correct. Whereas in this case, if the task is to write a poem about a cat that lives on Mars, the range of correct and good is wide open. I will tell you that Gemini Nano is very bad at writing a poem about a cat on Mars, but that's beside the point.
Surma: I mean, there are benchmarks that all the different models use to compare against each other, and maybe that's one way to do it. But I tend to agree that it seems very hard here to provide a somewhat consistent user experience. Face detection is kind of, like, the goal is clear: there's a picture, it should find all the faces, and you can easily say, Safari struggled here, Chrome didn't; file a bug in Safari, why isn't it seeing this face? While with the other one, it's like, what is the process? Are all browsers going to start collecting failing system-prompt-and-response pairs, adding them to their training sets, and continually upgrading? It just doesn't seem quite the same.
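(Note: Jake's "obvious place to put these kinds of functions", sketched purely hypothetically. Nothing like this exists in the explainer; it just shows where tool definitions could slot into the create options.)

    // Hypothetical: a tools option on the prompt API, in the style of the
    // chat-API tools above. Invented for illustration only.
    const session = await ai.assistant.create({
      systemPrompt: 'You manage the shopping list for the user.',
      tools: [{
        name: 'addItem',
        description: 'Add an item to the shopping list.',
        parameters: {
          type: 'object',
          properties: { item: { type: 'string' } },
          required: ['item'],
        },
        async execute({ item }) {
          await shoppingList.add(item); // your app code, not the model
        },
      }],
    });

    // "Add milk to my shopping list" would then trigger addItem({ item: 'milk' })
    // rather than a purely textual reply.
    await session.prompt('Add milk to my shopping list.');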
Surma: In general, as I said, even just working with one well-known, well-defined LLM, I find it unreliable. I always compare it to a programming language where any statement only has a certain percentage chance of actually being executed, and nobody would choose that language to build critical infrastructure, right? So it feels a bit like I would not want to put anything critical in there. And I'm not saying that this should be a critical part of your web app, but I would not know how to write a system prompt that gives a reliable user experience if I don't even know which model I'm building against. And then it's like, are we going back to user agent sniffing?
Jake: Exactly. And I think you would want the model to identify itself, which now is quite a fingerprinting vector, especially if it might differ by location, which maybe it would. I don't know if it would or not. You know, you can imagine a case where in China you get a different model because of particular rules.
Surma: I mean, that's already the case, right? Like, Meta has released a different version of Llama 3.1 in the EU than anywhere else.
Jake: Well, okay, there you go. I think that is one of the biggest problems with this, but there's another problem, and that is terms and conditions.
Surma: Oh boy.
Jake: Gemini has a use policy which says what you're allowed and not allowed to do with it, and we'll link to it. There's a lot of sensible stuff in there, like: don't use it to generate hate speech. Which, fine, but I think it's a bit weird having a separate policy for that. When you look at the spec for the video element, it doesn't say, by the way, don't use this to display pirated content. Because we've got laws for that, right? In the same way we have laws for hate speech. But there is a rule in the Ts and Cs that says you cannot use it to create sexually explicit content.
Surma: Okay, mom.
Jake: Right. Well, I feel that's a step too far. Like, why not? Why can't I use it for that?
Surma: Someone made a really interesting parallel, and I feel bad that I don't remember who it was. I think it was Andrew Huang. I'll get a link to his video, because I think it was a really good one. But he kind of said, you know, when the camera was invented, we didn't say: you may not take photos of naked people; you may not take photos of another painting, because it's making an exact copy of another artist's work. And that's quite reminiscent of the whole ethical problem we have with generative AI in general right now. Should the law really control the tool, or the person using the tool? Like, should LLMs necessarily be this constrained, and even trained to the point where they can't perform certain tasks, rather than saying they can do these tasks, but you're just not allowed to use them that way, and putting the burden on the human?
Surma: I guess that's probably, like, a healthy middle ground, but I think it's quite an interesting parallel: with a camera, I can create a perfect duplicate of the Mona Lisa. That doesn't mean we are training cameras to not be able to do that, or making you sign a waiver that you may not take a photo of the Mona Lisa.
Jake: Well, it's interesting that photocopiers are trained not to photocopy money, right? So it does happen. And it's fascinating that there's a series of dots on money that is there to be picked up by things like photocopiers, to stop them. It's called EURion, as in EU, a play on Orion, because it's like a little constellation shape. It was, I think, started in the EU, which is why it's got EU at the start.
Surma: Oh.
Jake: So you could just put that pattern on your documents somewhere, and then they can't be photocopied, which is kind of fun.
Surma: Oh, I have to try that.
Jake: Yeah. So, you're right, I think. If you build a site around this to summarize the content of an email, and someone puts an email in there that was quite a raunchy email, who's that for? Who's breaking the rules? Is it me, because as a developer I fed it to the API? Is it the user? But just in general, why is this even a problem? It doesn't feel like an open web thing. And also, those are the Gemini rules. So if this all ships in multiple browsers, you're now having to deal with the rules of each model, depending on the browser, or even the device, if it's a built-in model.
Surma: Yeah. And, I'm guessing, as a user of Chrome you accept those terms when you start Chrome, but as the web app developer, did I have to accept these terms?
Jake: I think the idea is you do. Yeah. Which is why it's weird.
Surma: Yeah, but then I could just say, oh, I just piped through a prompt from a user that generated hate speech, and then the user's on the hook. Or the developer just hard-codes a prompt that will generate hate speech every single time, but then the user is being held responsible, because how do you get hold of the web developer? That's a really weird problem to have.
Jake: I guess it's not a new problem, because you've got the same with social media sites, where people can post hate speech or whatever. And I'm sure they've figured out exactly who's at fault in certain cases there, and what the onus is on the host to remove that content in a timely manner. But again, those tend to be things which are either laws, or things the site has imposed on itself, like a particular forum could say, we don't want sexually explicit material. Whereas here it's a single API, implemented in a particular way by a particular browser, coming with its own rules, which doesn't feel like the web platform to me at all. I'm kind of wondering, like, how serious is Chrome about any of this? When you look at their blog post about it, they say these built-in AI features will reimagine the web. It doesn't just create a new chapter of the web...
It creates a whole new era. And those are the words they use, which is quite strong, isn't it? But the blog post about this particular feature, which we'll link to, only has a vague, high-level description. It doesn't actually tell you how to use the API, like what even the flags are, what the API is. It doesn't link to the explainer. If you want that, you have to fill in a Google form, and they might get back to you with more details, maybe. It's odd, isn't it? And even with the explainer, and the secret doc you might get access to after you fill in the form, it's still, like I said, I had to go to someone on the DevRel team for help, because it was hard to use.
Surma: Wow. Open web.
Jake: So I've got two theories about what's going on here. The first is that Chrome has been burned in the past by folks suggesting that they are just going to unflag a feature and ship it as is. And I think some of this is genuine confusion around what a flagged feature is, but a lot of the time it's done by people who do know better, and they're just using it to rile up folks against Chrome. Like, there was an incident last year, while I was still at Google, where folks got really angry about the poor accessibility of a CSS toggles implementation. But it was just behind a flag. It was an experiment, and it was non-standard. It was buggy. Everything you would expect from something behind a flag at the experimental stage. But people, again, who I think knew better, spun a story that Chrome was just going to ship this with its accessibility problems, because they hate blind people or whatever. And people got angry, because that's what they were being told to be angry about: that Chrome was just going to ship it. And they weren't. So maybe Chrome's trying to change its strategy here, by making the experiment kind of super secret. But it's not secret, because they're telling you it's going to be a new era and a new chapter of the web, you know? So, I don't know. My other theory is that Google was rattled by how far ahead OpenAI and Microsoft were with the AI stuff. They had a number of embarrassing failures with their own models. Gemini kind of looked a bit rubbish compared to the OpenAI stuff. So people were asking questions about, could this topple the dominance of the search engine? And we've seen that recently with OpenAI's search thing, I can't remember what it's called, but you saw how that wobbled the Google stock price. So I think for Google I/O, they just threw everything AI at it, to be seen to be doing something. The thing didn't really matter, I don't think, as long as it was AI. You look at the talks for Google I/O this year, and everything is AI. So I think that's why you end up with these blog posts saying this API is heralding a new era of the web. And it's like, oh, can I see the API? No, you may not. And so it's kind of like being seen to be doing something without really doing it, and without showing that they've not done much around it.
Jake: But then, you know, the explainer is being actively developed, so it's really difficult to get a read on it. For the reasons we've talked about, I don't see this making it into browsers. They do say in the blog post that it might come to nothing, or it might make it into the extensions API, where you can rely on the model, and they can put the model ID in there. But I don't think that's going to happen.
Surma: I mean, in their defense, it's clearly sparking discussions, this idea of, can we put this in the browser? And I think that's a good thing. They're trying something, and they're seeking feedback. The way they went about it could probably be done a bit better. Like, I saw this article as well where they announced it, and there was nothing in there about what this API would look like, and I was like, well, that's weird. But the fact that they're saying, let's try this, what do people think? And if that leads to a "no, we're not going to do this", then at least this discussion has been had, and we have good reasons, or know the reasons, why this shouldn't be done, at least not in this specific way. So I think that's quite good. And, you know, I feel like people expect Google to be one of the top-tier players in the AI field, with DeepMind and Google Brain and all this stuff. And so their start was rocky. I think now, actually, Gemini is one of the really well-performing models. So they have, I think, caught up. I don't think they have necessarily repaired their reputation. I'm not sure how Gemini is perceived overall, but at least they're now in the head-to-head race.
Jake: Nano is pretty shaky, I would say. Gemini Nano is. But then, it's on-device, and you would expect it not to be as good as the cloud-level ones, which are orders of magnitude bigger.
Surma: Well, I guess that's where the new Llama 3.1 currently seems to be doing amazing: something that you can run on your own device, and it's doing really quite well. As I said, I haven't played with Gemini Nano. I think the whole running-LLMs-on-your-own-device thing is a younger sub-discipline in the LLM field, now that devices suddenly all have neural-network-specific cores and stuff like that. But I think this is something worth exploring, because clearly the amount of apps that make use of AI technology is going to keep rising, and I think the amount of apps that want to offer these features even while offline, or on flaky internet, is also going to rise. So starting to explore how this could work, I think that is good. I'm glad they're doing it.
Jake: I want to stress this is not an anti-AI rant, you know. I think seeing AI go into more specific APIs, like shape detection or translation, or image scaling (I would love to see AI image scaling in the browser), seeing AI being used for more specific APIs, I think, is a really, really easy win. The prompt API, I'm not so sure about. But there is WebNN, the Web Neural Network spec, as the kind of low-level equivalent to WebGPU.
Surma: Interesting.
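(Note: for the curious, this is very roughly what WebNN looks like: a graph-builder API, sketched from the spec at around the time of recording. It's behind a flag and the shape is still changing, so treat the details as an assumption; a tiny element-wise add stands in for the far bigger graphs an actual model would need.)

    // Build and run a tiny WebNN graph that adds two 4-element tensors.
    const context = await navigator.ml.createContext({ deviceType: 'gpu' });
    const builder = new MLGraphBuilder(context);

    const desc = { dataType: 'float32', dimensions: [4] };
    const a = builder.input('a', desc);
    const b = builder.input('b', desc);
    const graph = await builder.build({ c: builder.add(a, b) });

    const result = await context.compute(
      graph,
      { a: new Float32Array([1, 2, 3, 4]), b: new Float32Array([10, 20, 30, 40]) },
      { c: new Float32Array(4) },
    );
    console.log(result.outputs.c); // Float32Array [11, 22, 33, 44]

    // For an LLM you'd assemble a much larger graph from weights you host and
    // download yourself, which is the trade-off Jake describes next.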
Jake: Yeah, so it's behind a flag in Canary. I don't know how much active development it's getting. But in that case, you have to download the model yourself, and then you can rely on the model. There are obviously issues with the size of the model, but I'll link to a Microsoft doc, because I think they're the ones actively working on it, and they've got a load of demos, some of which don't work, but some of which do, which is fun. We'll link to all that. And that feels like a nice, low-level way of doing it. But I do see the point that if there's a model on your device already, you should be able to just call out to it. The whole thing where you have to manage your prompts per model just seems like the biggest blocker to me.
Surma: Yeah. Let us know. And then, should we stop talking?
Jake: Yeah, absolutely. I would say, we'll link to the docs, and there are some blog posts out there now which kind of tell you how to play around with it, and I assume that'll get easier over time. We'll link to the explainer, demos and stuff, and WebNN. But yeah, go have a play with it. See what you think. You've already listened to 40 minutes of what we think about it. Go and make your own mind up. That's all I need to say about that, I think. That's enough. Enough from us, I reckon. So yeah, all there is left to say is: happy next time.
Surma: Agreed. Happy next time!
Jake: Bye.
Surma: Thank you for watching. Take care. Bye!