Chrome's new LLM AI API OMG

Chrome is experimenting with exposing an LLM to the web platform. Jake and Surma dig into how the API works, and whether something like this could work on the open web.

Resources:

Transcript
  1. Surma:I am working towards getting British citizenship, which is an interesting process.
  2. Jake:Ooh. Is this content? Because, like, well, again, I'm... Well, no, it should be content,
  3. Surma:Yes! Or should it not be?
  4. Jake:but we haven't done an intro. Look, I really do think we need to make the intros more professional.
  5. Surma:Oh.
  6. Surma:And I am Surma.
  7. Jake:Like, okay, let me give it a try. Let me give it a try. Okay. Welcome to another episode
  8. Jake:of OTMT. I'm Jake. We're going to be talking about some bits about AI and Chrome in a bit,
  9. Surma:Existing. Yeah, it's nice.
  10. Jake:but first we should thank Shopify for letting us record this and for hosting it and for
  11. Jake:paying for it and just, you know, it's just nice, isn't it? It's just nice. Yeah. They
  12. Jake:don't ask us to do that prompt. They didn't ask for any of this, did they? So, which is
  13. Surma:They also didn't ask us to do the podcast.
  14. Jake:why we didn't say anything about it in the last episode. So, you know, this is just us
  15. Jake:being nice, really. They're being nice. We're being nice. We're all friends. What do you
  16. Surma:That was so good.
  17. Jake:think about that, Surma? Was that good? That'll get us into the top three in the next survey.
  18. Surma:Once the survey's out, we're gonna, like, this podcast is sponsored by Shopify and the
  19. Surma:State of React survey, which you can find in the description, should totally vote for us.
  20. Surma:Oh, wow, Jake, that is a really astute observation. This is like one of our
  21. Jake:How about this for a smooth segue? But before we talk about tech, Surma, I heard that you
  22. Jake:were trying to get citizenship. Oh God. Why would you ever do it? Oh no. It is. Yes. And
  23. Surma:shared pet peeves is people giving tech talks and pretending to have a dialogue on stage.
  24. Surma:Yes.
  25. Jake:it is. It happens very much in the big corporate spaces, your Google IOs or internal conferences.
  26. Jake:You know, they've planned it out. They know what they're going to say. They know what
  27. Jake:the other person's going to say. So, Terry, what would I do if I wanted enhanced image
  28. Surma:We all know they've rehearsed and scripted all of it.
  29. Jake:generation as part of my website? Well, I'm glad you asked that, Samantha. Well, look
  30. Jake:at this demo. Oh, I hate it so much. I cannot stand it. I'm in a bad mood already. Well,
  31. Surma:That reminded me a bit of that.
  32. Surma:But you're correct, Jake. I am working towards British citizenship.
  33. Jake:you can. Have you received your bunting yet? And having a podcast that is in English wasn't
  34. Surma:That's mostly the reaction I get, which is like, yeah, why?
  35. Surma:No, I think that will probably be mandatory. I'm still in the application process. I'm
  36. Surma:early on, but you need to, you know, get some proof of certain different things.
  37. Surma:Like, for example, that you are, in fact, capable of speaking English to a sufficient level.
  38. Jake:enough for them, I suppose. Very well enough English. Are you sure? Did you pass this?
  39. Surma:No, no, no. You have to get an official certificate for citizenship. You need to
  40. Surma:be at level B1 or above, which I was actually surprised by. B1 is speaking and hearing.
  41. Surma:There's no reading or writing in this test. And it's, you know, it's intermediate.
  42. Surma:I guess it is very well enough English to, you know, live in England and get along.
  43. Jake:Are you about to tell us you failed? It's a whole metaphor about the empire, I believe.
  44. Surma:That's the thing. I went to take that language test, not quite sure what to expect.
  45. Surma:And so, you know, it was already, it was a very dodgy location for some reason where,
  46. Surma:you know, like, it felt like a building that was on the brink of collapse and you're going
  47. Surma:down to the basement, you know, like kind of dimly lit. And then they have these.
  48. Surma:And they have these little, like, booths and old, I want to say moldy, probably not moldy,
  49. Surma:but they look a bit moldy, headphones that you have to put on to interact with the computer.
  50. Surma:But before that, they have to, you know, check who you are and that you're the person that you
  51. Surma:claim to be, because they obviously have to make sure you're not taking the test for someone else.
  52. Surma:And you literally have to, like, put everything from your pockets in your locker and turn your
  53. Surma:pockets inside out. And that metal detector, it actually was a bit disproportionate.
  54. Surma:But then once they have all this done and have confirmed your identity,
  55. Jake:Excuse me? Well, see, so that's why you can't have kids anymore, right?
  56. Surma:they kind of sit you down for a quick questionnaire. When you move to the UK,
  57. Surma:what are you here for? Because you'd also need this test for some other different things,
  58. Surma:like getting a new passport sometimes or if you just want to have a visa.
  59. Surma:And they asked me, so why are you here? And I was saying, well, I want to get neutralized.
  60. Surma:I didn't want to get neutered. I want to get neutralized.
  61. Jake:Oh, that's right. It's a different word.
  62. Surma:They both were wrong. And then she looked at me squinting and I was like,
  63. Jake:Oh, so you had to go to a building, I guess, to verify it.
  64. Surma:taking a step back, naturalized. I'm here because I want to get naturalized, which
  65. Surma:brilliant start to proving that I am, in fact, capable of speaking English.
  66. Surma:So that got my adrenaline up. Luckily, the entire test was on a computer
  67. Surma:with another person on the phone. But.
  68. Jake:Identity. But then you're just doing the whole thing on Zoom or Hangouts or whatever, right?
  69. Surma:They have really like strict supervision of the test environment, like you can't look over to
  70. Surma:another person, you have to have your headphones on, which honestly, these headphones aren't very
  71. Surma:good. And you have like three other people around you having similar conversations with their test
  72. Jake:So what's your next steps, then? You've got to go on Bake Off. Really? I've never seen
  73. Surma:person over the phone. It was all right. I passed one step closer. But I felt like
  74. Surma:if they had disqualified me in that very moment, I couldn't have held it against them.
  75. Surma:You know, you say that, but actually in the citizenship test, which I also have done and
  76. Surma:passed, some people have gotten questions about Bake Off. Yeah, it's fascinating.
  77. Jake:a single episode or Doctor Who. Sorry. Oh, I do have the first six series of Red Dwarf
  78. Surma:Bake Off, I have watched a couple. I could have, you know, said Paul Hollywood, but
  79. Surma:that's probably where it ends.
  80. Jake:entirely in my head, though, so I think I would try and I would use that.
  81. Surma:Which I still haven't watched, you have recommended it to me multiple times,
  82. Jake:Yes, let's do that, because I want to talk about Chrome's new AI prompt API
  83. Surma:and I need to catch up on that. But instead of this, should we talk about the web?
  84. Surma:Oh, AI API, that's that's a strong acronym.
  85. Jake:and how that kind of fits into the web and web standards.
  86. Jake:And it means we can put AI in the episode title. It's so hot right now.
  87. Surma:So is it AI or is it LLMs?
  88. Jake:It's it's what everyone's talking about.
  89. Jake:It's LLMs, but, you know, come on, no one knows what that means.
  90. Surma:I was trying to think if we should put both in the tweet, because then we'll get
  91. Surma:all the AI driven Twitter accounts to reply to us.
  92. Jake:Oh, yeah, we'll use LLM. We'll use AI. We'll put API in that. Do you know what?
  93. Surma:Yes, with some new Roman numerals that were invented.
  94. Jake:The episode title is going to look like a Roman numeral.
  95. Jake:So new letters in there. Yeah, absolutely.
  96. Surma:Yeah, tell me, like, this is basically riding the wave of giving people access to an LLM in,
  97. Surma:well, I guess that would be my first question. Is it a browser defined LLM? Can I bring my own?
  98. Surma:Like, how does it work? What does it look like, Jake? What does it look like?
  99. Jake:I'll say it first. It's behind a flag. It's very experimental. There are a lot of hoops
  100. Surma:Sounds like a great experience.
  101. Jake:you need to jump through to get it working.
  104. Jake:Yeah, I tried to do it just the normal way, but I had to go and knock on the door of,
  105. Jake:you know, some of some of the old friends over at Chrome to actually figure out how
  106. Jake:to even get it working. So it's not it's not super simple.
  107. Jake:But they do say, like, this is even more experimental than their usual behind a flag
  108. Surma:Do you think that's good? I mean, it's basically getting developers in earlier than usual.
  109. Jake:stuff, which is very experimental.
  110. Jake:No, this is a good thing. And I support any browser, like,
  111. Jake:just throwing stuff in behind a flag, because if it's behind a flag, it's fine.
  112. Jake:You know my feelings about standardization. We talked about it in the last episode.
  113. Surma:Yeah.
  114. Surma:If that flag is disabled by default.
  115. Jake:But I think anything behind a flag should be able to be added and removed at any time.
  116. Jake:If the flag is disabled by default, of course, of course.
  117. Jake:Right. So I'll go through the API briefly. This is the API that's in their explainer,
  118. Jake:which is a little bit different to the API that's in Chrome.
  119. Jake:We'll link to the explainer as well. Well, again, like, what tends to happen
  120. Surma:Good start.
  121. Jake:in these cases is some browser engineers will just kind of experiment with a thing.
  122. Jake:And then there'll be a sort of explainer standards effort spun up at the same time.
  123. Surma:All right.
  124. Jake:And that's usually the point where someone actually starts thinking about the API properly.
  125. Jake:So and then, you know, the API will get redesigned.
  126. Jake:And then there'll be a bit of a, you know, as tends to happen in the standards process,
  127. Jake:you know, the API will change in the spec or the explainer in this case,
  128. Jake:and it will change in the browser later. So the explainer is ahead of the spec,
  129. Jake:which is all fine. So there's a global window.ai or self.ai or globalThis.ai, whatever.
  130. Surma:Right. That's it.
  131. Surma:How do we feel about that?
  132. Jake:Fine, whatever. Again, it's early days.
  133. Surma:I thought I thought we're trying to not do globals, because if it's like, you know,
  134. Jake:I think globals are fine. I just, again, I couldn't tell you why something would go in
  135. Surma:if it's something provided by the user agent, it should be in like Navigator or something.
  136. Surma:Okay.
  137. Jake:Navigator rather than, it's navigator.serviceWorker. And I kind of wasn't part of that discussion so
  138. Jake:much. I don't know what the rules are. It's certainly not part of the document.
  139. Jake:So it shouldn't be document.ai. I think sticking stuff in Navigator was maybe something that
  140. Jake:shouldn't have happened to a service worker. But yeah, maybe nervousness around the global
  141. Jake:scope was a thing. Yeah. And in that namespace, there is .assistant. So you're,
  142. Surma:That is nice because they're not conflating AIs with conversational LLMs, they're creating
  143. Jake:which is another namespace. So ai.assistant.
  144. Jake:Yes.
  145. Surma:a whole namespace for AI stuff and creating a subnamespace for the specific chat-driven
  146. Surma:LLM stuff, which means later on, if you want to have image synthesis or anything similar,
  147. Surma:there is a namespace where this can now go.
  148. Surma:So actually, that's kind of nice.
  149. Jake:And it's assistant rather than LLM, because maybe in future, this same thing will be done
  150. Jake:with something more advanced than LLMs or whatever. Or you could have an implementation
  151. Surma:Yeah.
  152. Jake:just made with Markov chains, I suppose, right? It's not linking it to a particular technology.
  153. Jake:And then you've got the create method. So ai.assistant.create, you can give it options
  154. Jake:like temperature, topK, which, if you've worked with LLMs before, are the kind of familiar terms.
  155. Jake:A system prompt, again, familiar thing. Or initial prompts, where you can give an example
  156. Jake:back and forth between the assistant and the user. And you can only do one of those two.
  157. Jake:And then, yeah, you get a promise for an object and you get like a prompt method on that object
  158. Jake:or prompt streaming. And you just, you give it some text and it gives you the reply. And that's
  159. Jake:pretty much it. You get clone and destroy methods and there's some other stuff, but that's,
  160. Jake:it's quite a small API.
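Put together, the session flow Jake just described would look something like this — a hedged sketch based on the explainer, since the API is behind a flag and the names (ai.assistant.create, prompt, destroy, and the temperature/topK/systemPrompt options) may differ from what's actually in Chrome right now:

```javascript
// Sketch of the explainer's Prompt API — experimental, behind a flag,
// and the exact names may not match current Chrome builds.
async function askAssistant(question) {
  // The global ai namespace only exists in supporting browsers.
  if (!globalThis.ai?.assistant) {
    throw new Error('Prompt API not available');
  }
  // Options mirror familiar LLM knobs; note the explainer says you can
  // pass either systemPrompt or initialPrompts, not both.
  const session = await globalThis.ai.assistant.create({
    temperature: 0.8,
    topK: 3,
    systemPrompt: 'You are a terse assistant. Answer in one sentence.',
  });
  try {
    // prompt() resolves with the full text reply in one go.
    return await session.prompt(question);
  } finally {
    session.destroy(); // free the on-device model's resources
  }
}
```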
  161. Surma:Yeah, because I played around with LLMs a little bit as well, and at the start I was like,
  162. Jake:Yes. Not exactly that. I don't know if it's defined that it has to be token by token,
  163. Jake:or if it's just like, you know, some reasonable amount of streaming
  164. Jake:like you would with a network response. You get text back.
  165. Surma:I'm going to keep it easy, I just want the full response, but because you don't know
  166. Surma:that the LLM might just go, yeah, I'm going to give you like a really detailed answer
  167. Surma:for no reason whatsoever, and you suddenly wait 30 seconds until you actually get the
  168. Surma:full response, which is why streaming here I think is also very important, also because
  169. Jake:Yes. And that's actually one of the recent changes in the explainer that isn't in the
  170. Surma:you probably want to be able to cancel the response if the user says, actually this is
  171. Surma:going in the completely wrong direction, or I've seen the info that I need.
  172. Surma:Is there an AbortController signal that you can pass in to the promptStreaming?
  173. Surma:Ah.
  174. Jake:browser yet, but yeah, it's there. I guess with the stream, you can just cancel the stream as well.
  175. Surma:Well, that's, I guess, then the next question, right?
  176. Jake:But for the more kind of promise-based one, you've got the AbortController.
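The streaming-plus-cancellation pattern they're discussing might look like this — a sketch assuming, per the explainer, that promptStreaming() returns a ReadableStream of text; whether each chunk is a delta or the full text so far has varied between drafts, so the accumulation below is illustrative:

```javascript
// Consume a streamed reply and allow the user to cancel it mid-generation.
// Assumes promptStreaming() returns a ReadableStream of text deltas.
async function streamReply(session, question, { signal } = {}) {
  const stream = session.promptStreaming(question);
  const reader = stream.getReader();
  // Cancelling the stream stops generation part-way through —
  // useful when the answer is heading in the wrong direction.
  signal?.addEventListener('abort', () => reader.cancel());
  let reply = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    reply += value;
  }
  return reply;
}
```

An AbortController's signal can be passed in, so calling `controller.abort()` from, say, a cancel button stops the read loop.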
  177. Jake:So yeah, this is something you can use to create like autocomplete things,
  178. Jake:sort of proofread a document, summarize the thing, code generation, I guess, chatbots,
  179. Jake:translation maybe, although, you know, there are more specific models for that,
  180. Jake:but it's, you know, it's all kinds of stuff you could do with this.
  181. Surma:Like, what is the model?
  182. Surma:Is that up to the user agent?
  183. Surma:Can the user link to like a public model?
  184. Jake:It is down to the user agent and the user does not get to pick.
  185. Surma:Because, you know, different models are good at different things.
  186. Jake:So in Chrome right now, it is an on-device model. It's Gemini Nano.
  187. Surma:I mean, that's really interesting, right?
  188. Surma:Like the fact that this works offline, if it's on the device that you can use this
  189. Surma:offline.
  190. Surma:That is actually very appealing.
  191. Jake:Yes. But that does mean it has to download the model on first use and the API is shaped to allow
  192. Surma:Well, I mean, you know, that's a normal size for a React app nowadays.
  193. Jake:this. There's events and that kind of thing. It'll let you know if it's not going to be
  194. Jake:instantly ready. And the model is two gigabytes and it unzips to 25 gigabytes.
  195. Jake:Well, that's it, isn't it? It's like, it does, you know, when we were doing DevRel-y things,
  196. Jake:we were like, oh yeah, one megabyte is quite a lot for your homepage. And it's like, yeah,
  197. Surma:The question here would be, does it redownload it every time for every origin?
  198. Jake:going up to two gigabytes is, yeah, it's going to be fun to try and justify that.
  199. Jake:Yeah.
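The "events and that kind of thing" around the first-use download look roughly like this in the explainer — hedged: capabilities() and the monitor option are explainer names and may have changed since:

```javascript
// Check whether the on-device model is usable, and watch the one-off
// multi-gigabyte download if it hasn't been fetched yet.
// capabilities() and monitor are taken from the explainer and may differ.
async function createWithProgress(onProgress) {
  if (!globalThis.ai?.assistant) {
    throw new Error('Prompt API not available');
  }
  const caps = await globalThis.ai.assistant.capabilities();
  if (caps.available === 'no') {
    throw new Error('model not available on this device');
  }
  // 'after-download' means create() will fetch the model first;
  // 'readily' means it's already on disk.
  return globalThis.ai.assistant.create({
    monitor(m) {
      m.addEventListener('downloadprogress', (e) => {
        onProgress(e.loaded, e.total); // bytes downloaded so far / total
      });
    },
  });
}
```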
  200. Surma:Or is this where we're now breaking one of the longstanding pillars or something of
  201. Surma:the web where we said that we can't cache across origins because that's a timing attack,
  202. Surma:but maybe if it's a user agent provided LLM, is it okay to download it once and cache it
  203. Surma:for all sites to use?
  204. Jake:You are absolutely correct. Yeah, it is that. So it's one download into Chrome and then it can
  205. Jake:be used on all sites. It's not a problem doing that because it's an internal thing. The reason
  206. Jake:we don't bleed the cache across origins is because it can be used for like cookie-like things. But
  207. Jake:here you don't get access to the bytes of the model. The cookies never influence the model
  208. Surma:I mean, you say it's not part of the experiment, but I guess in the end, it's really up to
  209. Jake:anyway, I think. So it's totally fine to do that in this case. And maybe there will be cases as
  210. Jake:well, like, you know, all phones are now like, our phone's got AI built in or something. So you
  211. Jake:might not need to download anyway, because it just came with your phone and it's updated via the OS.
  212. Jake:Although that's not part of this experiment, but you can see that, you can imagine that world.
  213. Surma:the user agent to decide, do I need to download something or can I just forward this or be
  214. Surma:an adapter somehow to whatever the operating system already provides?
  215. Surma:Like, Edge on Windows could just be an adapter to Cortana or, you know, I don't know exactly
  216. Jake:Yeah, exactly that. And we've seen that happen with other specs before. The shape detection API,
  217. Surma:how, but, you know, that could work.
  218. Surma:Yeah.
  219. Jake:which is only, it's not really fully implemented, but things like that just call out to the inbuilt
  220. Surma:No, and I guess, you know, like, I haven't used Gemini Nano directly myself, but I'm
  221. Jake:OS stuff. But there's nothing stopping implementers using a cloud service for this, which, okay, it
  222. Jake:wouldn't work offline, but otherwise it's not a crazy thing to do. Like push messages are already
  223. Jake:using a cloud service, right? Right.
  224. Surma:assuming that size of a model can only be so capable when, you know, LLAMA 3.1 just
  225. Surma:came out with its, what was it, 280 gigs or something?
  226. Surma:It could be like a progressive enhancement where when you're offline, it uses what is
  227. Surma:available offline, but if you're online and you're not on like a metered connection, it
  228. Surma:might actually use the model in the cloud just because it is more capable.
  229. Surma:That being said, I'm very much playing around with LLMs and stuff, so I couldn't be further
  230. Surma:from an expert or having good experience here, but I feel like the system prompts you write
  231. Surma:are very much tailored to the model you work with, and the same system prompt will give
  232. Surma:wildly different results when applied to a different model, especially when it's a different,
  233. Surma:you know, amount of parameters.
  234. Surma:It might only have a smaller context window or just, like, generally has been trained on a
  235. Surma:different set, and it's like LLMs by themselves are already hard to quote-unquote program
  236. Jake:I couldn't agree more. I think that's the big problem with this. You have to sort of engineer
  237. Surma:against because even with the same model, the same system prompt, the results are often
  238. Surma:varying and sometimes unreliable.
  239. Surma:Now switching out the model, it's like, oh yeah, the user agent will give you a model.
  240. Surma:How useful is that?
  241. Jake:your prompts and sometimes engineer the, yeah, not just a system prompt, but the way you ask
  242. Jake:particular questions and that is per model, right? And I don't think we'd get to a state where,
  243. Jake:well, if you get to request a particular model or sort of download your own, I think a lot of
  244. Jake:the benefit of this particular API is gone, right? Because then that two gigabytes is now
  245. Surma:And then you could just use WebGPU.
  246. Jake:per site or maybe, you know, different parts of the same site are using different versions.
  247. Surma:I know WebGPU actually doesn't expose some of the necessary types like the 8-bit floats
  248. Surma:and 4-bit integers that lots of the LLM file formats use, but I think they're working
  249. Jake:Well, there's also a WebNN spec and that is behind a flag in Chrome, which is kind of like
  250. Surma:on that.
  251. Surma:Oh.
  252. Jake:tailored towards that exact use case. So it would be able to use like neural CPU stuff rather than
  253. Jake:just relying on the GPU, but it can rely on the GPU. Yeah. We might touch on that a little bit
  254. Surma:Interesting.
  255. Jake:later as well, but yeah. So when we designed Service Worker, like we put a bit in the spec
  256. Jake:to say, you know, if there's too much work happening in the Service Worker and it's not
  257. Jake:linked to a particular response, you can terminate it, right? Because it's doing work when it shouldn't
  258. Surma:Yeah.
  259. Jake:be doing. And we just said like, you leave it up to the browser to decide what too long is,
  260. Jake:because that can change over time or different user agents can have different opinions.
  261. Jake:But when Safari implemented it, they were just like, right, just tell us which number you use
  262. Jake:and we'll do that number. Because like, if we don't, and if we pick another number,
  263. Jake:someone's going to file a bug that it's different to Chrome. Right. And I thought, okay,
  264. Jake:that's interesting and true because they were speaking from experience, but that's all that's
  265. Jake:going to happen here, right? Let's say Chrome ships this API and then later Safari does,
  266. Surma:Yeah.
  267. Jake:and they ship with a better model. Even though the model is better, it might not produce as good
  268. Jake:results because all of the usage out there is tweaked towards the particular Chrome one.
  269. Jake:As you say, they've worked on all the system prompts for this Gemini Nano model. And people
  270. Jake:are going to say, well, the Safari one's rubbish, isn't it? Even though it might be
  271. Jake:objectively better if it's given the right kind of prompts. So then maybe Safari feel pressure
  272. Surma:This might be, like, a very uninformed opinion, but the thing where LLMs for me suddenly became
  273. Jake:to use the same model as Chrome and blah, blah, blah. And it's not a great situation.
  274. Surma:really interesting and useful is when I played around – this is now, you know, maybe half
  275. Surma:a year ago or something – played around with ChatGPT 3.5 at the time for the first
  276. Surma:time and looked at the API and saw that they have functions.
  277. Surma:I think they're now called tools, where you basically, in addition to, you know, your
  278. Surma:system prompt or maybe previous messages, you also provide a JSON schema descriptions
  279. Surma:of functions that this agent, assistant, is allowed to invoke instead of providing a textual
  280. Surma:response.
  281. Jake:Or even worse, just guess. I have some historical data and so I'll just pick a number.
  282. Surma:So, you know, you could provide a function like getTemperature in location and it takes
  283. Surma:a parameter of a location and returns the temperature and now if the user asks, what's
  284. Surma:the temperature in Tokyo right now, it has been trained to not say, I'm an AI system
  285. Surma:that can't really tell you what's going on in the world, but rather go, hey, I have
  286. Surma:a – yeah, yeah, just hallucinate an answer – yeah, and then it would invoke that tool.
  287. Jake:Right.
  288. Surma:You would then, on your end, see, oh, it wants to invoke the tool, you invoke the tool, you
  289. Surma:supply the answer, and then it would use the return value from that function and generate
  290. Surma:a textual response with the actual data woven in.
  291. Surma:And in that sense, for me then, an LLM became an English-language-to-function-call adapter
  292. Surma:and back.
  293. Jake:Yeah. So you could try and do that with this, right? But you would be kind of trying to feed
  294. Surma:Exactly.
  295. Surma:I tried doing that with some other models, like, which were not specifically trained
  296. Jake:that all into the system prompt.
  297. Surma:to support this and kind of, like, teach them through the system prompt to do that.
  298. Surma:I wonder if that would narrow the scope of this API more, like, this is an assistant
  299. Surma:that can, you know, turn English into function calls and then provide a textual response.
  300. Surma:Or you can decide to not let it generate a textual response and just function calling
  301. Surma:is enough.
  302. Surma:That's how I built a to-do list manager where basically I can say, hey, add milk to
  303. Surma:my shopping list and because it has a function that says add an item to a shopping list,
  304. Surma:it calls that function and I don't really invoke it anymore.
  305. Surma:It could then, you know, if I did, it would say, I have just added milk to your shopping
  306. Surma:list.
  307. Surma:Here is now your new full shopping list, but I don't care about that, so I don't even
  308. Surma:invoke it afterwards anymore.
  309. Surma:Again, very naive opinion, but I wonder if this kind of pattern would be more stable
  310. Surma:across different language models that different user agents could provide their own implementation
  311. Surma:for.
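Nothing stops you from hand-rolling the tool-calling pattern Surma describes on top of a plain text prompt: describe the functions in the system prompt, ask the model to reply with JSON, and dispatch yourself. Everything below (the tool names, the prompt wording) is made up for illustration, and, as discussed, how reliably a given model sticks to the JSON format will vary:

```javascript
// Hand-rolled tool calling on top of a plain text prompt API.
const shoppingList = [];

// The "tools" the model may call; their descriptions go into the system prompt.
const tools = {
  addItem: {
    description: 'Add an item to the shopping list. Args: {"item": string}',
    run: ({ item }) => { shoppingList.push(item); return shoppingList; },
  },
};

function buildSystemPrompt() {
  const lines = Object.entries(tools)
    .map(([name, t]) => `- ${name}: ${t.description}`);
  return 'To call a function, reply with JSON of the form ' +
    '{"tool": "<name>", "args": {...}} and nothing else. Functions:\n' +
    lines.join('\n');
}

// Inspect the model's reply: dispatch a tool call, or fall back to text.
function dispatch(modelReply) {
  try {
    const { tool, args } = JSON.parse(modelReply);
    if (tools[tool]) return { called: tool, result: tools[tool].run(args) };
  } catch {
    // Not JSON — treat it as an ordinary textual answer.
  }
  return { text: modelReply };
}
```

So a reply of `{"tool":"addItem","args":{"item":"milk"}}` adds milk to the list, while a plain reply falls through as text — which is exactly the "English to function call adapter" framing.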
  312. Jake:Yes.
  313. Surma:And I would think it becomes more useful to building apps as well, because I think what
  314. Surma:most people are looking for is that where, you know, you build your to-do list app and
  315. Surma:you will be able to quickly say into your microphone, remind me tomorrow to do X, or
  316. Surma:you want to say into your email app, send an email to my mom and that should really
  317. Surma:just take an action and you don't want to have it, you know, respond in English and
  318. Surma:now you have to programmatically analyze the English to figure out what you want to do.
  319. Surma:So I feel like that's why I was wondering about this API, because if it's all just,
  320. Jake:Yeah. This could be added to the API, right? There's an options object there, which would be
  321. Surma:you know, chat, how do you integrate that meaningfully into your app?
  322. Surma:I guess, you know, you put stuff in the system prompt so it can ask questions, but that seems
  323. Surma:only very limited in the use case that will reliably work.
  324. Jake:a very obvious place to put these kinds of functions. But yeah, it would limit the kind
  325. Jake:of models that could be used on the other side. And maybe there's a question of whether that's future-proof
  326. Jake:as a surface, but sure. It could be done. I think some of this difference between the models,
  327. Jake:they do acknowledge it in the explainer. They've got this bit that says, we do not intend to
  328. Jake:provide guarantees of language model quality, stability, or interoperability between browsers.
  329. Jake:In particular, we cannot guarantee that the models exposed by these APIs are particularly
  330. Jake:good at any given use case. These are left as quality of implementation issues,
  331. Jake:similar to the shape detection API, which is why that API was in my head earlier.
  332. Jake:And I get it, but I don't think that that is equivalent because yes, one shape detector may
  333. Jake:be better than another, but there's a right answer, you know? Oh, there's a very narrow
  334. Jake:set of things that can be considered a right answer. It's like, scan that barcode. You know,
  335. Jake:where are the faces in this photo? It can be measured objectively. Like even the translation
  336. Jake:API, although there's a sort of wider set of correct answers, and it's a little bit more
  337. Jake:subjective, it's still a fairly small range of what's correct. Whereas in this case, if the task
  338. Jake:is to write a poem about a cat that lives on Mars, right? The range of correct and good is wide open.
  339. Surma:I mean, there are benchmarks that all the different models use to compare each model
  340. Jake:I will tell you that Gemini Nano is very bad at writing a poem about a cat on Mars,
  341. Jake:but that's besides the point.
  342. Surma:against each other model.
  343. Surma:And maybe that's one way to do it.
  344. Surma:But I tend to agree that it seems very hard here to provide a somewhat consistent user
  345. Jake:Yeah.
  346. Surma:experience.
  347. Surma:I mean, face detection is kind of like, you know, the goal is kind of clear, you know,
  348. Surma:there's a picture, it should find all the faces, and you can easily say, like, Safari struggled
  349. Surma:here, Chrome didn't.
  350. Surma:File a bug in Safari: why isn't it seeing this face?
  351. Surma:While with the other one, it's like, what is the process?
  352. Surma:Are we all, are all browsers going to start collecting failing system prompt, prompt response
  353. Surma:tuples and add them to their training set and continually upgrade?
  354. Surma:It just doesn't seem quite the same.
  355. Surma:In general, like, as I said, like I already just like working with one well-known, well-defined
  356. Surma:LLM, even there, I find it unreliable.
  357. Surma:I always compare it to, like, it's like a programming language where any statement has only a certain
  358. Surma:percentage chance of actually being executed, and nobody would choose that language to build
  359. Surma:critical infrastructure, right?
  360. Surma:And so it feels a bit like I would not want to put anything critical in there.
  361. Surma:And I guess I'm not saying that this should be a critical part of your web app, but I
  362. Jake:Hmm. Absolutely.
  363. Surma:would not know how to write a system prompt that gives a reliable user experience if I
  364. Surma:don't even know which model I'm building against.
  365. Surma:And then it's like, are we going back to user agent sniffing?
  366. Jake:Exactly. And I think you would want the model to identify itself, which now is quite a fingerprinting
  367. Jake:vector, especially if it might differ by location, which maybe it would. I don't know if it would
  368. Jake:or not. You know, you can imagine a case where in China you get a different model because of
  369. Surma:I mean, that's already the case, right?
  370. Jake:particular rules. You know, it's...
  371. Surma:Like, Meta has released a different version for LLAMA 3.1 in the EU than anywhere else.
  372. Jake:Well, okay. There you go. I think that is one of the biggest problems with this,
  373. Surma:Oh boy.
  374. Jake:but there's another problem and that is terms and conditions. Gemini has a use policy,
  375. Jake:which says what you're allowed and not allowed to do with it. And we'll link to it,
  376. Jake:but there's a lot of sensible stuff in there. Like, don't use it to generate hate speech.
  377. Jake:Okay. Which, fine, but I think it's a bit weird having a separate policy for that.
  378. Jake:When you look at the spec for the video element, it doesn't say, by the way, don't use this to
  379. Jake:display pirated content. You know, because we've got laws for that, right? In the same way we have
  380. Jake:laws for hate speech. But there is a rule in the T's and C's that says you cannot use it to create
  381. Surma:Okay, mom.
  382. Jake:sexually explicit content. Right. Well, I feel that's a step too far. Like, why not?
  383. Jake:Why can't I use it for that?
  384. Surma:Someone made a really interesting parallel and I feel bad I don't remember who it was.
  385. Surma:I think it was Andrew Huang.
  386. Surma:I've got a link to his video because I think it was a really good one.
  387. Surma:But he kind of said, you know, when the camera was invented, we said you may not take photos
  388. Surma:of naked people.
  389. Surma:You may not take photos of another painting because, you know, it's making an exact copy
  390. Surma:of another artist's work.
  391. Surma:And that's quite similar to the whole ethical problem we have with generative
  392. Surma:AI in general right now.
  393. Surma:So should the law really, like, control the tool or the person using the tool, right?
  394. Surma:Like, should the LLMs necessarily be this constrained and even trained to the point
  395. Surma:where they can't perform certain tasks, rather than saying, you know, they can do these tasks,
  396. Surma:but you're just not allowed to use them that way and putting the burden on the human.
  397. Surma:I guess, you know, that's probably like a healthy middle ground, but I think it's quite
  398. Jake:Well, it's interesting that photocopiers are trained not to photocopy money, right? So it
  399. Surma:an interesting parallel to say: with a camera, I can create a perfect duplicate of
  400. Surma:the Mona Lisa.
  401. Surma:That doesn't mean we are training cameras to not be able to do that, or making you sign
  402. Surma:a waiver saying you may not take a photo of the Mona Lisa.
  403. Surma:Yeah.
  404. Jake:does happen. And it's fascinating that there's a series of dots on money that is there to be
  405. Jake:picked up by things like photocopiers to stop them. It's called EURion, as in EU plus rion, a play on
  406. Surma:Oh.
  407. Surma:Oh, I have to try that.
  408. Jake:Orion, because it's like a little constellation shape. But it was, I think, started in the EU,
  409. Jake:which is why it's got EU at the start. So, and it's there. Yeah. So you could just put that
  410. Jake:pattern on your documents somewhere and then they can't be photocopied, which is kind of fun.
  411. Jake:Yeah. So, yeah, you're right. I think. So if you build a site around this to summarize the content
  412. Jake:of an email, and someone puts in an email that's quite raunchy, like who's
  413. Jake:that on? Who's breaking the rules? Like, is it me? Because, as a developer, I fed it to
  414. Surma:Yeah.
  415. Jake:the API. Is it the user? But it's just in general, like, why is this even a problem? Like it should
  416. Surma:Yeah, and, you know, I'm guessing as a user of Chrome, you accept those terms
  417. Jake:just be, it doesn't feel like an open web thing. And also it's going to be, you know, that's the
  418. Jake:Gemini rules. So if this all ships in multiple browsers, you're now having to deal with the
  419. Jake:rules of each model depending on the browser or even the device, if it's a built-in model.
  420. Surma:when you start Chrome, but as the web app developer, did I have to accept these terms?
  421. Jake:All right. I think the idea is you do. Yeah. Which is why it's weird.
  422. Surma:Yeah, but then, you know, I could just say, oh, I just, you know, pipe through a prompt
  423. Surma:from a user that generated hate speech or, you know, and then the user's on the hook
  424. Jake:I guess it's not a new problem because you've got the same with social media sites where people can
  425. Surma:or if the developer just hard codes a prompt that will generate hate speech every single
  426. Surma:time, but then the user is being held responsible because how do you get a hold of the web developer?
  427. Surma:That's a really weird problem to have.
  428. Jake:post like hate speech or whatever. And it's kind of, you know, I'm sure they've sort of figured
  429. Jake:out exactly who's at fault in certain cases there and what the onus is on the host to remove that
  430. Jake:content in a timely manner. But again, it tends to be things which the
  431. Jake:site has imposed on itself, like a particular forum could say, you know, we don't want
  432. Jake:sexually explicit material. Whereas like, you know, here it's a single API that is implemented
  433. Surma:Yeah.
  434. Jake:in a particular way by a particular browser is coming with its own rules, which doesn't feel like
  435. Jake:the web platform to me at all. I'm kind of wondering, like, how serious is Chrome about any
  436. Surma:Yeah.
  437. Jake:of this? So when you look at their blog posts about it, they say these built-in AI features will
  438. Jake:reimagine the web. Like it doesn't just create a new chapter of the web. It creates a whole new era
  439. Jake:and those are the words they use, which is quite strong, isn't it? But the blog post, which
  440. Surma:Wow.
  441. Jake:we'll link to, about this particular feature only has a vague high-level description. It doesn't
  442. Surma:Wow.
  443. Jake:actually tell you how you can use this API, like what the flags even are, what the API is. It
  444. Surma:Open web.
  445. Jake:doesn't link to the explainer. If you want that, you have to fill in a Google form and they might
  446. Jake:get back to you with more details, maybe. It's odd, isn't it? And even with the explainer and the
  447. Jake:secret doc you might get access to after you fill in the form, it's still, like I said, I had to go
  448. Jake:to, you know, someone on the DevRel team for help because it was hard to use. So I've
  449. Jake:got two theories about what's going on here. The first is that Chrome has been burned in the past
  450. Jake:by folks suggesting that they are just going to unflag a feature and ship it as is. And I think
  451. Jake:some of this is genuine confusion around what a flagged feature is, but a lot of time it's done
  452. Jake:by people who do know better and they're just using it to rile up folks against Chrome. Like
  453. Jake:there was an incident last year while I was still at Google where folks got really angry about the
  454. Jake:poor accessibility of a CSS toggles implementation, but it was just behind a flag. It was an experiment
  455. Jake:and it was non-standard. It was buggy. Everything you would expect from something behind a flag
  456. Jake:at the experimental stage. But people, again, who I think knew better spun a story that Chrome was
  457. Jake:just going to ship this with its accessibility problems because they hate blind people and
  458. Jake:whatever. And people got angry about this because, you know, that's what they were being told
  459. Jake:to be angry about: that Chrome was just going to ship it, and they weren't. So maybe Chrome's
  460. Jake:trying to change its strategy here, like by making the experiment kind of super secret, but it's not
  461. Jake:secret because they're telling you it's going to be a new era and a chapter of the web, you know?
  462. Jake:So it's, I don't know. My other theory is that Google was rattled by how far ahead OpenAI and
  463. Jake:Microsoft were with the AI stuff. And they had a number of embarrassing failures with
  464. Jake:their own models. Gemini kind of looked a bit rubbish compared to the OpenAI stuff.
  465. Jake:So people were asking questions about, you know, could this topple the dominance of the search engine?
  466. Jake:And we've seen that recently with OpenAI's search thing. I can't remember what it's called,
  467. Surma:Yeah.
  468. Surma:Yeah.
  469. Surma:Yeah.
  470. Jake:but, you know, you saw how that wobbled the Google stock price. So I think for Google I-O,
  472. Jake:they just threw everything AI at it to be seen to be doing something. The thing didn't really matter,
  473. Jake:I don't think, as long as it was AI. You look at the talks for Google I-O this year and it's
  474. Jake:everything is AI. So I think that's why you end up with these blog posts saying, this API is
  475. Jake:heralding a new era of the web. And it's like, oh, can I see the API? No, you may not.
  476. Surma:Yeah.
  477. Surma:Yeah.
  478. Jake:And so it's kind of like being seen to be doing something without really doing it and without
  479. Surma:Yeah.
  480. Surma:I mean, in their defense, it's, you know, clearly,
  481. Jake:showing you that they've not done much around it. But then, you know, the explainer is being
  482. Jake:actively developed, so it's really difficult to get a read on it. For the reasons we've talked
  483. Jake:about, I don't see this making it into browsers. They do say that in the blog post it might come
  484. Jake:to nothing, or it might make it into the extensions API, where you can, you know,
  485. Jake:rely on the model and they can put the model ID in there.
  486. Surma:it's, you know, clearly sparking discussions.
  487. Surma:This idea of like, can we put this in the browser?
  488. Surma:And I think that's, that's a good thing.
  489. Surma:They're trying something and they're, you know, seeking feedback.
  490. Surma:The way they went about it, you know, could probably have been done a bit better.
  491. Surma:Like, I saw this article as well where they announced it, and there was, like,
  492. Surma:nothing in there about what this API would look like.
  493. Surma:And I was like, well, that's weird.
  494. Surma:But the fact that they're saying, let's try this.
  495. Surma:What do people think?
  496. Surma:And if that leads to a "no, we're not going to do this", then, you know, at least this
  497. Surma:discussion has been had, and we have good reasons, or know the reasons, why this
  498. Surma:shouldn't be done, at least not in this specific way.
  499. Surma:So I think that's quite good.
  500. Surma:And, you know, like I feel like people expect Google to be one of the top tier players in
  501. Surma:the AI field with DeepMind and Google Brain and all this stuff.
  502. Jake:But I don't think that's going to happen.
  503. Surma:And so their start was rocky.
  504. Surma:I think now actually Gemini is one of the really good performing models.
  505. Surma:So they have, I think, caught up.
  506. Surma:I don't think they have kind of repaired their reputation necessarily.
  507. Surma:I'm not sure how Gemini is perceived overall, but at least they're now, you know, they're
  508. Surma:in the head to head race.
  509. Jake:Nano is pretty shaky, I would say. Gemini Nano is. But then, you know, it's on device,
  510. Jake:which you would expect it not to be as good as the cloud level ones, which are
  511. Jake:orders of magnitude bigger.
  512. Surma:Well, I guess that's where currently the new Llama 3.1 seems to be doing amazingly, something
  513. Surma:that you can run on your own device, and it's doing really quite well.
  514. Surma:As I said, I haven't played with Gemini Nano.
  515. Surma:So I think the whole, like, running LLMs on your own device is a younger sub-discipline
  516. Surma:in the whole LLM field, now that devices suddenly all have, you know, neural-network-specific
  517. Surma:cores and stuff like that.
  518. Surma:But I think this is something worth exploring because clearly the use cases are going to
  519. Jake:I want to stress this is not an anti-AI rant, you know, and I think seeing AI go into more
  520. Surma:just, like, the amount of apps that make use of AI technology is going to continue rising.
  521. Surma:And I think the amount of apps that want to use or offer these features, even while offline
  522. Surma:or on flaky internet, is also going to start rising.
  523. Surma:And so, you know, starting to explore how this works, I think that is good.
  524. Surma:So I'm glad they're doing it.
  525. Surma:Yeah.
  526. Jake:specific APIs, like shape detection or translation or image scaling. I would love to see AI image
  527. Jake:scaling in the browser. Yeah, seeing AI being used for more specific APIs, I think is a really,
  528. Surma:Interesting.
  529. Jake:really easy win. The prompt API, I think, I'm not so sure about that. But there is WebNN,
  530. Surma:Alright.
  531. Jake:the Web Neural Network spec, as the kind of low-level option, the equivalent of WebGPU.
  532. Jake:Yeah, so it's behind a flag in Canary. I don't know how much active development it's getting.
  533. Jake:But yeah, in that case, you have to download the model yourself. But then you can rely on
  534. Jake:the model. And there's obviously issues with the size of the model. But I'll link to a Microsoft
  535. Jake:doc, because I think they're the ones actively working on it. And they've got a load of demos,
  536. Jake:some of which don't work, but some of which do work, which is fun. Yeah, we'll link to all that.
  537. Jake:And that feels like a nice low level way of doing it. But I do see the point is that if there's a
  538. Jake:model on your device already, you should be able to just call out to it. But the whole thing where
  539. Jake:you have to manage your prompt per model just seems like the biggest blocker to me.
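The prompt-per-model problem Jake describes would, in practice, look like a lookup table of hand-tuned prompt variants keyed by some model identifier, with a generic fallback for models you haven't tested, which is essentially the user-agent sniffing Surma warned about earlier. A minimal sketch of that pattern; the model IDs and the idea that the browser would expose one are hypothetical, since the experimental API doesn't currently identify its model:

```javascript
// Hypothetical per-model prompt table. Every entry would need separate
// testing and maintenance against that specific model -- the burden Jake
// calls the biggest blocker.
const SYSTEM_PROMPTS = {
  "gemini-nano": "Summarize the following email in two sentences. Reply with plain text only.",
  "some-other-on-device-model": "You are a summarizer. Output exactly two sentences.",
};

const FALLBACK_PROMPT = "Summarize the following email briefly.";

// Pick a prompt for whatever model the browser happens to ship.
// The modelId would have to come from the browser itself -- which is the
// fingerprinting vector discussed above.
function pickSystemPrompt(modelId) {
  return SYSTEM_PROMPTS[modelId] ?? FALLBACK_PROMPT;
}
```

Since any untested model silently falls through to the generic prompt, the developer can't actually promise a consistent user experience, which is Surma's point about not knowing which model you're building against.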
  540. Surma:Thank you
  541. Jake:Yeah, absolutely. I would say, you know, we'll link to the docs. And there are some blog posts
  542. Surma:for watching, take care, bye!
  543. Surma:Yeah, let us know, then should we stop talking?
  544. Jake:out there now, which are, they kind of tell you how to play around with it. And I assume that'll
  545. Jake:get easier over time. We'll link to the explainer, demos and stuff, and WebNN. But yeah, go have a
  546. Jake:play with it. See what you think. Yeah, you've already listened to 40 minutes of what we think
  547. Jake:about it. Go and make your own mind up. That's all there is to say about that, I think.
  548. Jake:That's enough. Enough from us, I reckon. So yeah, all there is left to say is happy next
  549. Surma:Agreed.
  550. Surma:Happy next time!
  551. Jake:time. Bye.