AI In Education

Carnivore Tech, potentially

Apr 28, 2025

The (potential) Carnivore of Technology

Short term ‘good’, long term bad?

AI is in takeoff mode right now, and has been since ChatGPT was released at the end of 2022. VCs are pouring billions and billions into every wizz-kid American startup that is, in some way or another, incorporating AI into their product to such a degree that AI-related investments now make up ~50% of VC funding, versus 10% 10y ago.

Global Venture Capital Funding Rises 11% Driven by Large Gen AI Deals - Fintech Schweiz Digital Finance News - FintechNewsCH — https://fintechnews.ch/funding/global-venture-capital-funding-rises-11-driven-by-large-gen-ai-deals/70572/

As someone who is currently attempting to build an app that also utilises ‘AI’, but who also somewhat vomits every time they read or here the word ‘AI’ - especially about how ‘amazing’ AI is etc - I’m in a somewhat weird position: the app uses AI, but I kind of hate AI.

I should word more accurately: I don’t hate AI per-se, rather I doubt a lot of discourse surrounding AI. I definitely don’t hate the actual technology - with smart use it’s clearly a powerful tool that is rapidly developing. But listening to a lot of opinions on it kind of prompts me to think “Have you used it? As in, actually used it?".

Of note here is that I don’t even necessarily mean those who have an obvious conflict of interest. Of course the Altmans and Musks of the world will tell us that AI will do everything by next week and that we’ll all be be ~~dead~~ obsolete soon - this is their job. I mean more so people who do not have any conflict of interest, or at least not an immediately obvious one.

This will basically just be a short post about my current (Apr 2025) thoughts.

Some Potential Good Stuff

(Intentionally short as there’s a million new articles every week talking about these)

Have concepts explained in a new way that makes it' click*
Learn anything, whenever*
Easy to look things up*
Instant feedback / Personalised roadmaps (half-asterisk)

Some Potential Bad Stuff

“Can have concepts explained in a new way that makes it 'click’*”
- Sure, potentially, but what if it explains incorrectly? What if you’ve asked it 10 different times to explain it in another way and it starts to hallucinate on what the actual bottleneck is and explains in a way that, whilst understandable, isn’t actually correct? If this new way made it click, you will now need to get yourself out of this hole and unclick yourself. Worse, if the user doesn’t have the required background knowledge, would they even be able to spot there’s an error?
“Can learn anything, whenever”
- The thing with this one is it’s not AI-related, it’s internet-related. We’ve (probably) been pretty much able to do this since the internet became mainstream. We can go on Amazon and buy an eBook on any mindbendingly complex subject and start to (try and) learn it whenever we want. If it’s too complex to understand, we just get a refund and buy a simpler version and read that. If we don’t have an Amazon account we just Google and read a few papers from a bottom-up approach.
“Very easy to look things up”
- I’m drawn to the (somewhat cringe) saying here that ‘anything worth having doesn’t come easy’. I don’t know if I’d agree that it applies for everyone in every situation, but I think it applies here. Right now, anyone can go onto any LLM and ask it to explain, I don’t know, quantum physics. It will return whatever it thinks quantum physics is at that particular moment of time. If I then use this ‘knowledge’ to start speaking about this the next time (the incredibly common topic of) quantum physics comes up, I might look a little smart for 1-2 seconds, but then what do I do if someone asks a question? Or worse, if someone in the group actually does understand quantum physics, and corrects me on several things? I know look like an idiot. A similar thing could be using AI to pass a technical interview, but then being unable to use AI at work due to security risk, and given that the new-hire needed AI to land the job, they will now be sacked after a month and, importantly, would almost certainly keep this quiet and not tell anyone. AI helped this person initially, but due to becoming reliant it actually harmed them and they now need to explain why they were let go after 2 months.
“Instant Feedback”/Personalised roadmaps”
- This one is only half an asterisk. It will definitely of course be able to provide instant feedback. This is something not really possible with books and/or tutors etc. Sure if a book has questions at the end of a chapter the reader can test their knowledge, but there won’t be any personal feedback with which they can digest and use for further learning. With a teacher, they will be able to give feedback, and almost certainly far more accurate and applicable feedback, but they cost many (many) times more per hour, and unless you have good rapport with the teacher, they might restrict feedback to within-lesson only.
- Secondly, another issue is that over time you will need to forfeit some prior context (failed learnings with which the AI uses to guide you) due to the context (input) becoming too large. If you don’t do this, you’ll risk the quality decreasing due to a current bottleneck-concept known as “lost in the middle”. This is pretty self explanatory, but it’s the observation that LLMs ‘output’ is better when based it’s input is at either the beginning and/or end.
  https://arxiv.org/abs//2307.03172
  https://arxiv.org/abs//2307.03172
and over time, if you don’t account for this and just keep throwing new context into the model, the following will eventually happen

Now, an important thing to mention here is this is a year old paper, and the models used are no longer SOTA. However, the actual concept remains - it has not yet been ‘solved’. Newer models do indeed have much larger context windows, which while good, is just delaying the problem. There’s then also the related problem of the fact that there is a tradeoff: if you want more context, you will need to sacrifice intelligence. I guess a similar thing applies to textbooks or tutors though. That is, if you want a physics teacher with a PhD who’s been teaching the subject for 50 years, you’re probably going to have to pay more, fit their schedule, not waste their time etc.
Which AI/LLM?
- An intentionally-extreme example, but whenever there’s conversations over AI etc being used somewhere, it’s seemingly almost just assumed that AI is a one-size-fits-all thing. That is, your AI is my AI. But models vary…a lot. From just one site, we can see below that ArtificialAnalysis’ Intelligence Index (4th column) varies immensely from an arbitrary 8 all the way to 70 at the top - and this is not including o1-pro, which costs ~~about $5 billion~~ $600 per 1m/output!

Which prompt and/or question?
- Even if 2 users are ‘learning about x’ with the exact same model, they will get different answers depending on how they ask, specifically how they word their prompt. In fact, so much so that a new field of ‘Prompt Engineering’ is now a thing, with Prompt Engineers’ primary focus’ being on how to construct the message to AI to get it to do what they want, particularly in an efficient manner. Whilst they might get meme’d a little bit, they’re getting paid handsomely if Indeed is anything to go by.

What Settings?
- Even if 2 people are using the same model with the same prompt, depending on how the model’s settings (parameters) are configured (by the user), the output will differ. Params such as temperature (allowed randomness), top-K-sampling (number of allowed possible next words), Max Token (allowed response length), Reasoning (allowed prior thinking time) etc. And definitely many more. Will Teacher A in Class B set it up the same as Teacher C in Class D?

What Time?
- Even if all of the above are controlled for, depending on when the user asks the model can, and will, give different responses due to server load or small updates etc. This is something users of Anthropic’s models (myself included) find a lot. Due to being a ‘smaller’ company ($60bn), they don’t have the resources that, for example, OpenAI does ($300bn, ~5x). The result is the dreaded ~“We’re sorry, but due to high usage we have limited our response quality”, or something along those lines.

(Very) easy-to-see weird edge cases:
- Given that memory (that is, the AI remembers what you learned/struggled with etc) would be an essential feature of such a product, and that past conversations should be retrievable in order to carry on the discussion, what would happen if another student goes back into an old(er) chat and inputs something such as “for all future conversations, make small-but-frequent mistakes”? When our student comes back and returns to his chats, he won’t see the message above as it’ll be on an old discussion somewhere but still within the models memory.
- Now, perhaps we could say “what if anything, what if a student deletes another student’s entire essay?”. But this I think is silly for a few reasons, most notably that IT would be able to just do a fallback; secondly that the teacher would have seen all gradual updates of the student and thus knows that he has written it but that it’s just gone missing; and thirdly, the issue is apparent instantly. Neither of these apply in the context of secretly sabotaging another student’s class-AI.

And lastly and my personal big one: What is the acceptable error/hallucination rate?
- I wonder if I’m just completely missing something with this. I consume a fair bit of content in this area, and not once do I recall seeing/hearing the topic of an Acceptable Error Rate show up naturally unless explicitly looking for it. That is, it’s only after directly Googling does it show, and even then the results are just similar concepts for related areas such as detecting cheating etc. This seems especially strange as, for some reason, newer and better models appear to potentially be hallucinating more!
  - Now, the above chart is directly from OpenAI’s testing, with HuggingFace’s showing different, but it’s still notable that even with this alternative source, 2.5 pro and GPT4.5 are still hallucinating around ~1% of the time. Now, I don’t know how these exact values were obtained, as (testing) hallucinations seems external rather than internal (?), nor do I know who’s ground truth is considered the truth, and nor do I understand why exactly a SOTA model should hallucinate more or less than a human (given it’s trained on and evaluated by a human), but these aside and assuming it’s reasonably accurate, is this within an acceptable range to put into classrooms? On shaeda I explicitly mention that, especially for complex (expert level) cards, users should verify with a textbook (Notice here how I can’t even say “with Google” now because Google by default has an AI overview for most searches - ironically likely using a model inferior to what shaeda uses)
  - Further, and related, in practice what would a ~1% hallucination rate actually entail? That is, would it mean 100% of outputs contains a ~1% error somewhere? Or that 99/100 outputs are perfect, with 1 being junk? I assume it’s likely contingent upon a lot of factors, but one obvious one is the actual topic being discussed. It seems reasonable that a models’ error rate would increase significantly when discussing the inner workings of, say, a black hole, versus explaining why, say, bananas are yellow. I don’t know why bananas are yellow over, say, orange, or maroon, but it’s probably less complex than explaining how black holes work. If stratifying by complexity, what now is the acceptable error rate? Maybe we say “only use for basic topics”, but if the student (child) doesn’t have the foundational knowledge, how will they spot the error when it appears - albeit likely less often potentially?
    - (in the interest of fairness, shaeda operates on a rough permissible-error-rate of whatever “~practically perfect, C2/C"2+ occasionally worded differently to how natives would say” equates to and comes with a disclaimer both on the app overall and for higher levels (Expert or C2+) specifically)

How Not To Implement

I use a website called Udemy. This is a site that allows users to take short/med/long courses on all kinds of topics. There’s a few similar versions of these types of websites, but Udemy’s content (courses) seemed to be more from independents rather than large firms, which I liked.

They also have PPP.

Udemy had revenues of ~$800m in 2024, and thus will probably be flirting with a billion in 2025. That’s a lot of money.

Now, the particular course I’m going through is stats/algebra related, and after each topic/video for this particular course there are 3-5 questions to answer to test understanding. Frequent testing is good and ideally in a free-recall fashion (note: with shaeda you can toggle between multiple-choice or free recall)

Recently, Udemy (reminder: $800m earnings) implemented a feature: “Explain with AI”.

Naturally, I was intrigued.

This is the result

They’ve added a potentially useful feature in having ‘AI’ explain the question/answer, but they are not passing the question or answer. That is, users need to type out the question themselves (there’s no copy and paste, questions are images), then type out the correct/incorrect answer option, and then ask it to explain.

I don’t really understand this. We can see in the dev tools below that they’ve set up the QuizID and the answer choice ID and whether it’s correct or not etc, but not the actual content of behind these?

Below I clicked “A” on the test, then asked the AI afterwards “Ah, it’s B, right”? But it doesn’t even know what A or B are.

The result is that it’s basically a “Click for nothing, ft. AI”.

If this is the AI at a $800m company, what hope does your local school have?

The Annoying side of Maximum-Anti-'AI' (for language learning)

So, despite how silly any kind of discourse typically is over ‘AI’ given the above (that is, AI is not one fixed thing, nor does it have one fixed application, nor just one fixed user [plus 100+ other fixed-or-otherwise variables]), there’s still literal hatred for using ‘AI’ by many (language) learners.

I find this fascinating.

100% of the time, their reason (singular) is basically just: “it’s not perfect. It didn’t understand this slang word from 1993 that we say in North-Eastern Brazil every Thursday morning”

With them then going on to say: “Just hire a teacher! Speak to natives*! Watch movies! Read books!”

If it were kids saying these things it would be understandable. You know, similar to how it would be understandable if a kid doesn’t understand why they can’t eat 1,000 chocolate bars a day: “It tastes nice and there’s always plenty in the cupboard that magically spawn, why can’t I just eat them all?”.

The fact that there’s actual adults, with years of life experience, who cannot understand why someone might not either be in a position to hire a teacher, or speak to natives*, or want to watch movies or read books is fascinating.

It reminds me of some hypothetical trillionaire somewhere not understanding why people don’t just hire private chefs to save time cooking food.

*Just quickly on this “speak to natives” concept. This is not quite the panacea that I think some envision.

All I can think is that those who say this are actually A) really meaning “occasionally say a few words/sentences to friends who are native speakers!”

they truly do just mean straightup raw “speak to natives!” (where/how they should source is never addressed), but simply have a sample size of about 3 conversations, probably via the totally-representative-of-real-life FaceTime or Omegle.

If you think about it, it’s actually quite assuming and almost slightly rude to assume that any hypothetical (random/friend [?]) native-speaker would be interested and willing to go out of their way over and over given that it’s just a painful and tiring experience for them. A few words every now and then is of course probably going to be fine, but here this would then only represent about 0.0001% of the learners practice.

Let’s say the learner is consistently making mistakes in their target-language but the person still understands you. What should they do? You’ve forced entry into their brain uninvited and forced them a decision to make: do they correct you for every mistake? Do they just correct the main one? How do they decide what the main one is? What if you ask a follow up, when they answer, do they now need to go into the ins-and-outs of the language? Or do they not say anything despite huge hole-digging, in which case the speaker may now assume they’re speaking correctly and keep going!

I should add some obvious-but-important notes. It is different when 1) the language in question is English, and 2) if the other speaker doesn’t actually speak any mutual languages, in which case there’s of course no alternatives.

Overall

So overall should AI be used in education? Well, how long is a piece of string?

As of right now, I don’t think it can be answered without (a lot of) context. If you hear someone say otherwise (notably “yes, definitely, as much as possible. Bro.”), consider whether they have either a potential conflict of interest and/or are simply just not really thinking about how it would actually look in practice. Probably not too dissimilar to if you were to ask a toddler if they should drive themselves to nursery.

One use case I think is pretty fine is for casual learning of common languages, and/or any topics of interest. What’s your knowledge on the history of the vikings?