S&S #9: To Code or not to Code

A discussion on Pythagora and the AI Software Development space

Hey Everyone, 

Welcome back. Happy Thursday. Hope everyone had a great long weekend. 

As an update, we’ve grown our network of subscribers to nearly 100 on Substack and over 400 on LinkedIn. To each and every one of you, thank you so much. There’s a lot to learn from the developments in early-stage tech, and I’m happy I can give you all views into what’s going on that you may not find elsewhere without some digging. 

Please consider sharing with your friends if you know anyone else who would value these insights and deep dives. 

If you’re new here, do me a favor and click the button below to follow along with us. 

This week, we’ll be looking at the growing theme of AI-run software development. Let’s get into it. 

Weekly Recap

Cash Cows

Between May 13th and May 19th, 74 companies in North America or Europe pulled in some fresh Seed capital. Of those, a few of the larger rounds include: 

Another Robot Company...

Bot.co is setting a course towards supplying consumers with robots that “give you time back.” Their tagline? “Everyone is busy. Bots can help.”

Led by a team including the former founder and CEO of Cruise Automation and Twitch (autonomous car co. acquired by GM for $1B, live-streaming site that gave us Sketch), they must have a pretty solid pitch, as they pulled in $150M in seed funding from Quiet Capital and several angels on May 13th. 

A Biotech startup focused on developing advanced immunotherapies to treat cancer. Founded by experts in the field, including CEO Frédéric Caroff, the company aims to revolutionize cancer treatment by enhancing the efficacy of existing therapies through its ONCO-Boost platform. 

This platform utilizes a TLR4 agonist to stimulate the immune system, effectively turning "cold" tumors into "hot" tumors and making them more responsive to treatment. They raised EUR 10.3M on May 16, 2024, in a combination of equity and debt, with the equity funding led by Elaia Partners. 

Eto is developing a platform and tools for managing and analyzing computer vision data. Their primary product, Lance, is an open-source, blazing-fast data store designed for handling large-scale unstructured datasets such as images, videos, and sensor data. 

Eto enables users to ingest data from various sources, query and analyze it quickly using SQL and Python, and manage datasets with features like versioning and schema evolution. The platform is integrated with popular data science tools and aims to streamline the workflow for AI practitioners, allowing them to focus on model development and innovation rather than data wrangling. They raised $11M in a deal led by CRV, with participants including Y Combinator, Rogue Capital, Soma Capital, and Wayfinder Ventures. 

Leya 

An AI-powered legal assistant platform designed to transform legal work by leveraging well-cited public legal sources and proprietary firm data. It offers features like accurate legal queries, collaborative tools, and instant drafting capabilities, aiming to enhance productivity and legal outcomes. 

Leya integrates with existing data management systems and prioritizes security and privacy, being GDPR compliant and ISO 27001 certified. They raised $10.5M in a deal led by Benchmark, with participants including Y Combinator, SV Angel, and Hummingbird Ventures. 

Other Innovators

Based in Chicago, IL, Bedrock Materials is developing sodium-ion battery materials to reduce the cost of electric vehicles by replacing rare earth minerals as the primary critical materials for EV batteries. Sodium-ion batteries rely on abundant, less expensive materials, including manganese, iron, sodium, and aluminum.  Though these compounds result in a less energy-dense battery, innovations in cathode materials are beginning to make it look like a more competitive race. 

Active Surfaces has developed a flexible, ultra-light, and ultra-thin solar module. Their technology effectively allows any surface to become a source for energy generation while decreasing solar costs associated with intensive installation labor. 

7Analytics is a Norwegian tech company specializing in developing high-precision risk assessment tools that leverage hydrology, geology, and data science. Their platform offers solutions like FloodCube Realtime and FloodCube Planning, which provide real-time flood warnings and sustainable stormwater management planning, respectively. 

Seed of the Week

Alright, let’s get into it, shall we? 

Listen to enough talking heads within the AI space, and you’ll hear arguments that as AI advances, we’ll reach a point where AI becomes sophisticated enough to create better versions of itself, effectively coding its own replacement. This process is commonly known as "recursive self-improvement".

When discussing the feasibility and timeline of recursive AI, there are two main perspectives. One camp argues that narrow AI (AI designed for specific tasks) could achieve recursive self-improvement before the development of General AI. The other camp maintains that a system capable of true recursive self-improvement would inherently qualify as General AI due to the comprehensive understanding and modification capabilities required.

In simpler terms, to reach recursive AI, a few things are required. 

  1. Awareness: AI must be able to comprehend the context and intent behind the code and recognize issues and inefficiencies based on this knowledge

  2. Code Modification: AI must be able to modify, edit, and improve existing code perfectly without oversight.

  3. Code Generation: AI must write new code that is better than the original.

As with anything, these steps will likely come in phases rather than achieved all at once. We’re already seeing evidence of this. 

Naturally, the first step is to create AI systems capable of coding with little to no human oversight. Recently, the tech ecosystem has seen some developments within this space.

Welcome to today's discussion.  Let's take a quick look at where we're at.

First Steps: GPT 

The first iteration of a code-proficient AI came with the release of Open AI's GPT. If you’ve played around with it enough, you know you can ask it to write snippets of code to serve specific purposes. Lots of times, this code is pretty close to being good. Other times, it’s next to worthless. This can be attributed to a few things: 

  • Limited Context Understanding: While GPT models can generate code based on prompts, they often lack a deep understanding of the broader context, business logic, and requirements of the application being developed. This can lead to code that may work but fails to address the actual problem or integrate properly with the existing codebase.

  • Lack of Maintainability: GPT models struggle with modifying or maintaining existing code. They are better suited for generating new code snippets or examples, but cannot effectively update or refactor existing codebases.

  • Inconsistency and Randomness: Small changes in prompts can lead to significantly different code outputs from GPT models, introducing inconsistencies and randomness that can hinder development and testing efforts.

  • Coding Style and Preferences: The code generated by GPT models may not always align with the coding style, conventions, or preferences of the development team, leading to readability and maintenance challenges.

  • Limited Context Length: Some GPT models, like GPT-4 Turbo, have limitations in the context length they can effectively process, potentially leading to incomplete or inaccurate code generation for larger codebases or complex projects.

Better Integrations: GitHub CoPilot

GitHub Copilot is an AI-powered coding assistant developed by GitHub and OpenAI. It uses machine learning models trained on a vast corpus of publicly available code to provide intelligent code suggestions and completions as developers type. Key features include: 

  • Code Completion: Copilot suggests code completions as developers type, sometimes completing the current line or suggesting entire blocks of code.

  • Natural Language Understanding: Developers can describe their coding intent in natural language, and Copilot will generate code suggestions accordingly.

  • Chat Interface: Copilot offers a chat interface where developers can ask questions about their code, seek explanations, or request assistance with debugging or security remediation.

  • Pull Request Summaries: Copilot can analyze code changes in a pull request and generate concise summaries describing the modifications.

  • Knowledge Bases: Copilot Enterprise allows organizations to create custom knowledge bases from their documentation, which Copilot can use as context for providing more tailored suggestions.

While Copilot is more reliable than previous alternatives, it shares limitations with GPT, making it challenging to build full-scale applications without significant human oversight.

This brings us to our seed of the week.

Enter Pythagora 

One of YC’s most recent batch companies, Pythagora is a Czech AI company that has built out an AI coding partner enabling developers to write full applications through chat-based natural language. 

The early stages of what would become Pythagora were outlined in a Reddit post by CEO and Founder Zbonimir Sblijic nine months ago.

If you want, read the full post and the blog linked within it. Together, they provide a deep and comprehensive outline of why GPT Pilot was created and how it works. If you'd rather stay here, I’ve summarized below. 

GPT Pilot - Pillars 

GPT 4 is alright at writing code. It may even be able to write most of it. But, to create complete, running applications, we need a tool that can do most of the heavy lifting while allowing devs to iterate and fill in the gaps. 

Here are the areas in which developers can intervene in the development process:

GPT Pilot will make mistakes. To make it easier to debug and to allow the developer to see where mistakes have been made, the AI should produce code snippet by snippet, and task by task, rather than generating the entire codebase all at once (which is a status quo for GPT and other AI software developers atm). 

It should be able to create a small app the same way it should create a big, production-ready app. There should be mechanisms that enable AI to debug any issue and get requirements for new features so it can continue working on an already-developed app.

GPT Pilot should follow the same protocols to create a small application as it does to create a larger, production-ready application. This requires mechanisms that enable AI to debug already written code and adapt to new requirements for more features to be integrated into existing code. 

These mechanisms include context rewinding, recursive conversations, and test-driven development. For a deeper look into what each of these means, reference this blog post

GPT Pilot - How it Works 

GPTP is a chat-based interface that allows human developers to describe the applications they intend to build. The AI will then follow a series of tasks, outlined above, leveraging several programmed AI Agents to address different steps within the larger process. 

From Zbonomir’s blog post outlining GPTP: 

  1. First, you enter a description of an app you want to build. Then, GPT Pilot works with an LLM (currently GPT-4) to clarify the app requirements, and finally, it writes the code. It uses many AI Agents that mimic the workflow of a development agency.

  2. After you describe the app, the Product Owner Agent breaks down the business specifications and asks you questions to clear up any unclear areas.

  3. Then, the Software Architect Agent breaks down the technical requirements and lists the technologies that will be used to build the app.

  4. Then, the DevOps Agent sets up the environment on the machine based on the architecture.

  5. Then, the Tech Team Lead Agent breaks down the app development process into development tasks where each task needs to have:

    1. Description of the task (this is the main description upon which the Developer agent will later create code)

    2. Description of automated tests that will need to be written so that GPT Pilot can follow TDD principles

    3. Description for human verification, which is basically how you, the human developer, can check if the task was successfully implemented

  6. Finally, the Developer and the Code Monkey Agents take tasks one by one and start coding the app. The Developer breaks down each task into smaller steps, which are lower-level technical requirements that might not need to be reviewed by a human or tested with an automated test (eg. install some package).

Current State and Future Outlook 

GPT Pilot is an open-sourced project and has been released for use on GitHub. 

While Pythagora is in its earliest stages as a company, having only publicly released one case study from a major client, it seems that GPT Pilot has already been leveraged as a useful dev tool based on the 20,000+ starts it has on GitHub. You can find some of the apps built using GPTP here

Time will tell how successful Pythagora and GPTP are at building out more complex and sophisticated applications, but the team’s approach can be looked at as a significant step forward in AI-based code development. 

Even when considering Pythagora’s unique approach and early traction, it must be noted that they face some uphill battles when vying for heavy adoption. 

Competition

Even though this is a brand-new space, there’s an argument to be made that there’s already a clear frontrunner. Devin AI, a company founded a little over six months ago, raised $175M in its first round of funding and already has a $2B valuation. Oh, it’s also founded by a human calculator. Look at this sh*t. 

With that said, there are reports and suggestions that Devin AI may not be as powerful as it’s made out to be. A brilliant writer and fellow tech junkie Devansh covers this in depth here

Negative Ramifications, Capability Limits, and More 

Both Devin AI and Pythagora draw concerns from the market for several reasons that may hinder future success and adoption. Devansh has also written a couple of good articles covering those. You can find it here, but a couple of the main points can be found below (thank you, Devansh, for the structure). 

  1. Environmental Concerns 

The LLMs both Devin AI and Pythagora use take a ridiculous amount of energy to run, releasing hundreds of tons of CO2 in the process. This is absolutely not sustainable when considering long term, high-scale adoption and usage. 

  1. Data sources, plagiarism, and more 

CoPilot, GPT, and other LLMs leveraged for coding are trained on code created by humans. This code, oftentimes, requires an incredible amount of skill, time, and creativity to create. There are arguments to be made, similar to film, music, etc., that AI is just stealing this work and repackaging it. 

Devs are none-too-pleased. 

Here’s some pretty damning evidence of that relating to GPT (unconfirmed), and there’s much more to be found through a simple google search. 

A software developer in South America who completed a five-hour unpaid coding test for OpenAI told Semafor he was asked to tackle a series of two-part assignments. First, he was given a coding problem and asked to explain in written English how he would approach it. Then, the developer was asked to provide a solution. If he found a bug, OpenAI told him to detail what the problem was and how it should be corrected, instead of simply fixing it. (Semafor)

Wrapping Up

Time will tell how long it takes us to (or if we) get to a point where we no longer need to teach people to code, relying on AI to perform all of our software development. For now, though, I like Pythagora’s approach. It’s not making as grandiose claims as Devin, and it’s marketing itself as a productivity booster for devs, not an AI dev.

We’ll see how it shakes out, but personally, I think this model can be done better quicker with the right resources.

Thanks for reading everyone. If you’d like to show your support, please consider upgrading to paid and sharing with your friends. This is a free to access newsletter and I plan on keeping it that way, but as with everything, tips are appreciated, and paid subscribers can expect a couple extra articles per month free subscribers do not have access to.

Cheers.

Reply

or to participate.