• VonReposti@feddit.dk
    6 upvotes · 19 hours ago

    Completely agree, I forgot to mention that part. I am testing a few models ranging from 18b to 26b on my 7900xt. They are far from “make this complete system”, but they can handle some smaller tasks. I think that will be the end goal anyway, since cloud models fail a lot at maintainability, security, and the other higher levels of thought that go into coding. They can make a convincing prototype, but I wouldn’t hook it up to production.

    Local models are already functioning well as a force multiplier. They can help explain logic, do minor refactoring, debugging, etc., just with a bit of latency. I do think this is where we’re headed, since the frontier models needed to generate a full prototype can’t produce production-quality code, and running them is prohibitively expensive. As far as I’ve heard, the providers are generally spending ten times as much as they earn per token.

    • partofthevoice@lemmy.zip
      2 upvotes · 10 hours ago

      My guess is that the next big thing will be squeezing a lot more reliability out of smaller models. But their workflows, context, validations, etc. will need to be very tightly optimized.

      I can see harnesses coming with their own highly specialized lightweight models in the future. Some for very efficiently converting a basic prompt into chain-of-thought steps. Some for very efficiently determining relevant parts of a repository. Some for… a lot of highly specialized stuff. Then the harness would orchestrate these under the hood, reducing the cognitive load placed on any larger generalized LLMs. Those “larger generalized LLMs” could be something like 12b parameters.
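To make the orchestration idea above a bit more concrete, here is a rough Python sketch. Everything in it is hypothetical: `Harness`, `toy_planner`, `toy_retriever`, and `toy_generalist` are made-up names, and the "models" are plain functions standing in for real local inference. It only shows the shape: a small planner splits the prompt into steps, a small retriever picks the relevant files, and the larger general model only ever sees one step plus a trimmed context.

```python
from dataclasses import dataclass
from typing import Callable

# A "model" here is just a function from text to text (an assumption for
# illustration; a real harness would wrap local inference calls).
Model = Callable[[str], str]
Retriever = Callable[[str, dict[str, str]], list[str]]


@dataclass
class Harness:
    planner: Model        # small model: prompt -> newline-separated steps
    retriever: Retriever  # small model: (step, repo) -> relevant file paths
    generalist: Model     # the larger general-purpose model

    def run(self, prompt: str, repo: dict[str, str]) -> list[str]:
        # Break the prompt into steps, dropping blank lines.
        steps = [s for s in self.planner(prompt).splitlines() if s.strip()]
        outputs = []
        for step in steps:
            # Only the files the retriever selects reach the generalist.
            paths = self.retriever(step, repo)
            context = "\n".join(f"# {p}\n{repo[p]}" for p in paths)
            outputs.append(self.generalist(f"{step}\n{context}"))
        return outputs


# Toy stand-ins so the sketch runs without any model installed.
def toy_planner(prompt: str) -> str:
    return f"1. locate code for: {prompt}\n2. apply change for: {prompt}"


def toy_retriever(step: str, repo: dict[str, str]) -> list[str]:
    # Select a file when its name (without extension) appears in the step.
    return [p for p in repo if p.split(".")[0] in step.lower()]


def toy_generalist(text: str) -> str:
    return f"handled: {text.splitlines()[0]}"


repo = {"auth.py": "def login(): ...", "billing.py": "def charge(): ..."}
harness = Harness(toy_planner, toy_retriever, toy_generalist)
results = harness.run("fix auth login", repo)
```

The point of the split is that each piece can be swapped or tuned on its own, which is also what would make it measurable later.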

      Hopefully, soon after, we can start benchmarking how much different harnesses and augmentations improve baseline model performance. Ideally, in the long run, that gives us a deeper understanding of how to tailor a harness to a workload and get more procedural determinism. Then we can start configuring harnesses like data pipelines and run them through higher-level orchestration like Airflow too.
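A minimal sketch of what that kind of benchmark could look like, assuming each task is a prompt paired with a pass/fail check. The names `pass_rate` and `harness_uplift` are made up for illustration, and the toy solvers and tasks are placeholders rather than real measurements.

```python
from typing import Callable

Solver = Callable[[str], str]
Task = tuple[str, Callable[[str], bool]]  # (prompt, check on the output)


def pass_rate(solver: Solver, tasks: list[Task]) -> float:
    # Fraction of tasks whose output passes its check.
    passed = sum(1 for prompt, check in tasks if check(solver(prompt)))
    return passed / len(tasks)


def harness_uplift(baseline: Solver, harnessed: Solver, tasks: list[Task]) -> float:
    # Positive means the harness improved on the bare model for this task set.
    return pass_rate(harnessed, tasks) - pass_rate(baseline, tasks)


# Toy placeholders: the "bare model" always gives the same answer, while the
# "harnessed" one consults a lookup table first.
tasks: list[Task] = [
    ("2+2", lambda out: out == "4"),
    ("3*3", lambda out: out == "9"),
]


def bare_model(prompt: str) -> str:
    return "4"


def harnessed_model(prompt: str) -> str:
    known = {"2+2": "4", "3*3": "9"}
    return known.get(prompt, bare_model(prompt))


uplift = harness_uplift(bare_model, harnessed_model, tasks)
```

Keeping the task set and the checks fixed while swapping the harness is what makes the comparison meaningful; the same function could be one step in a scheduled pipeline.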