S&S #7: Work, Productivity, and Communication with Robots

A discussion on Autotab, Adept AI, Brain.ai, and the seeds of the week

Hey Everybody, 

Happy Thursday, and welcome back. Apologies for the brief hiatus in posts — things have been busy. That said, I’ve been working in the background to make this a better product.

An update: Somehow, I’ve been able to convince 84 people to subscribe to this thing. To each of you: I appreciate you greatly. Thanks for giving me a space to spit my thoughts without them falling into an empty abyss. 

If you’re not yet part of the 84-strong tech nerd mafia, please consider tapping the button below. 

Seeds of the Week

Cash Cows

From February 19th to February 25th, 77 companies raised a Seed round of funding in North America or Europe. Of these, 68 disclosed their round size. Here are the cash cows of that batch: 

  • Bioptimus: Bioptimus is building what they call “the first universal AI foundation model for biology to fuel breakthrough discoveries and accelerate innovations in biomedicine and beyond”. The company raised $35M in a deal led by Sofinnova Partners on February 19th. 

  • Dub: Dub is a trading platform that allows you to copy and execute the exact trades of other investors. In their own words, they’re “bridging the gap between retail and professional investors”. For this, they raised $17M in a combination of equity and debt on February 22nd, with the $15M equity portion being led by Tusk Venture Partners. Word is still out on whether or not Nancy will be one of the investors you can copy…

  • Insamo Bio: Insamo has developed a drug discovery platform designed to automate the design, synthesis, and testing of new drugs at scale. This approach enables them to “iterate our design cycle using trillions of proprietary experimental data points across an astronomical drug-like chemical space which we believe sets an entirely new standard for the application of scalable machine learning to drug design”. They raised $12M from MRL Ventures Fund on February 21st. 

  • IsomAb: IsomAb is developing isoform-specific antibodies to treat Type II diabetes and peripheral vascular disease. They raised GBP 7.5M of funding in a deal led by Broadview Ventures on February 20th. 

  • OpenBorder: OpenBorder has developed a platform to make it easier for online businesses to sell their products internationally. Services include shipping, tax and product compliance, importer and merchant of record services, inventory management, local payment methods, global inventory visibility, and assistance with selling on global marketplaces. They raised $10M in a deal led by Peak XV Partners on February 22nd. 

Other Innovators

Xito is the operator of a vendor-neutral marketplace intended to offer automation through robotics. Their platform allows users to be taken step by step through selecting and implementing the right robotic system for their specific use case. 

Why it matters 

Xito is essentially democratizing access to robotics and automation for non-technical SMB leaders. The Xito team is partnered with various robotics manufacturers and offers a platform to simplify the complex processes of programming and deployment, which are often barriers to the implementation of automated solutions. 

Beyond addressing labor shortages and giving SMBs new pathways to cost-efficiency, what Xito is doing is a unique example of a business model that is capitalizing on the growing demand for robotics solutions without being a hardware manufacturer or software developer. 

NodeShift is developing a platform designed to offer affordable access to decentralized cloud computing services. Through NodeShift, users can rent cloud storage, GPUs, and computing power for over 70% less than large competitors like Google Cloud, AWS, or Azure. 

Why it Matters 

Similar to Xito, NodeShift’s business model relies on making it easier for users to access a rapidly growing and important industry. They’re empowering developers by simplifying the process of building business applications in the cloud while significantly reducing the cost of doing so. Leveraging advancements in applied cryptography and distributed computing, developers can build applications with enhanced security and cost-efficiency. This streamlined approach not only improves the development process but also makes cloud solutions more accessible and affordable for businesses of all sizes.

Digital Insight Games is a gaming studio led by former employees of Activision, EA, Take-Two, Tencent, and Ubisoft, with a mission to create games that are not only fun to play, but implement in-game economies that allow for ownership of in-game assets and monetization of gameplay. 

Why it Matters 

DIG’s games will focus on fun, graphically rich, immersive experiences that ultimately allow gamers to benefit from the time they spend in the virtual worlds they participate in. By leveraging Web3, blockchain technology, and NFTs, the company is pioneering the emerging Games-as-a-Service economy, and hopes to add new value and opportunities for gamers in the free-to-play market. 

Connected is a space tech startup developing the means to provide easily accessible internet connectivity to everyone on earth - regardless of location. Their primary focus is on increasing the ability for users to deploy IoT devices in remote areas. 

Why it Matters 

Roughly 80% of the Earth’s surface has no cellular coverage, and some 450M people worldwide have no access to mobile connectivity. 

It’s estimated that 21M IoT devices will be connected via satellite by 2026, and Connected is capitalizing on this by offering innovative connectivity solutions aiming to enhance communication and interaction between devices, paving the way for improved efficiency, automation, and data exchange in diverse settings.

CALL BACKKKKKKKKS

This batch has two more companies operating in a space where we’ve seen a lot of activity: decarbonization. 

Kvasir Technologies, who raised EUR 5.15M from VAR Ventures on February 22nd, should look familiar to those of you who read S&S #3: Clearing the Air on the Future of Travel. As a reminder: Aether Fuels and Metafuels are converting carbon-based feedstocks to jet fuel. Kvasir is taking a similar approach, and has developed a solution that converts plant biomass into a 1:1 substitute for fossil fuel. 

Cyclize, on the other hand, is related, but not exactly a competitor of Kvasir, Aether, or Metafuels. They’ve developed a plasma-based technology that allows waste plastics to be reformed and recycled into new intermediate chemicals, which ultimately can be used to produce new products. In turn, both carbon emissions as well as plastic waste pollution can be reduced. They raised $5.1M from UVC Partners on February 21st.

Time for the good stuff. 

On February 22nd, Autotab, a company that participated in Y Combinator’s Summer 2023 batch, raised $1M from a single undisclosed investor. 

Nondescript. Subtle. Quiet. 

But, I want to talk about them. And while doing so, I want us all to consider a few things about the future of AI. 

Many questions are being asked right now about what the swift progression of AI will mean for the human workforce. 

Rightfully so, a lot of these questions surround the prospect of AI replacing and displacing workers. McKinsey projects that nearly 800M jobs could be replaced by automation globally by 2030. As a poignant, recent, real-world example, Klarna, a Swedish consumer Buy Now, Pay Later (BNPL) company, has begun implementing an AI tool built with GPT that can do the work of 700 human employees. 

Scary. Kind of. 

Here’s the other side of the equation. 

AI is also projected to create 97M jobs. That, in and of itself, is great. But what’s more, in a best-case scenario, AI will become the human worker’s greatest tool and asset.

(We can save the discussion of what the worst-case scenario is for other writers)

The number of AI tools emerging from the digital primordial soup that are built to assist humans, or to ensure that nearly everything we interact with on a day-to-day basis is optimized to our preferences and behaviors, is absolutely staggering.

Again, we can talk about what too much personalization might mean, or lead to, but I want to keep this focused. 

Productivity. 

McKinsey (yes, the same ones that estimated 800M jobs would disappear) estimates that, for those who do not lose their job, AI could produce labor productivity growth of 0.1 to 0.6% annually, which translates to more than $3 trillion in value for the global economy.

Much of this productivity estimate rests on the promise that the everyday worker may soon be able to do away with the repetitive, easily automated parts of their job. The ramifications are many, but the most intriguing to me is that it will let us focus on the parts of our jobs that aren’t yet easily replicable: creativity, strategy, innovation. Finding new ways to do new things, and having the time to do so because I don’t have to spend the first three hours of my day crawling through emails trying to decide which are the most pressing to address. 

I can write a thousand real-world use cases I’ve seen being addressed by emerging AI startups, but I’m going to opt to give you a true visual. 

Head on over to Autotab and watch their demo. See you in a couple of minutes. 

Seriously, if you haven’t already, go watch it. It’s important. I’m not going to explain what happens in it. 

Let’s talk about it. 

Autotab

Alright, so, why does that matter? What about that is special? 

Most AI tools up until last year relied on pre-built logic. Models are taught to do specific things very, very well. Things they’ve been trained to do using millions of examples. 

This works great, but it’s limited. 

Autotab is one of the first examples of a truly customizable AI: a platform that, after watching you once, can replicate the in-browser actions you take. An AI tool that you can teach to do anything, as long as you understand the task well enough to do it once yourself. 

Still, though, there are intuitive limits to this. From what I can tell, Autotab won’t take any actions on your behalf that you don’t tell it to perform*. You show it what to do, and it will do it. Information on the exact mechanics is limited, and I’m not sure what would happen if there is an unforeseen variable in that flow that interferes with the steps laid out. 
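To make that concrete, here’s a minimal sketch of what record-and-replay browser automation can look like, written in Python with Playwright. To be clear, this is an assumption-laden illustration, not Autotab’s actual implementation: the step format, selectors, and URL are invented, and the final branch exists only to show why an unforeseen step is a problem for a pure replay system.

```python
# Hypothetical sketch of record-and-replay browser automation.
# NOT Autotab's actual implementation -- the step format, selectors,
# and URL below are invented for illustration only.
from playwright.sync_api import sync_playwright

# A "recording": the steps captured from a single human demonstration.
recorded_steps = [
    {"action": "goto",  "url": "https://example.com/login"},
    {"action": "fill",  "selector": "#email",    "value": "me@example.com"},
    {"action": "fill",  "selector": "#password", "value": "hunter2"},
    {"action": "click", "selector": "button[type=submit]"},
]

def replay(steps):
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        for step in steps:
            if step["action"] == "goto":
                page.goto(step["url"])
            elif step["action"] == "fill":
                page.fill(step["selector"], step["value"])
            elif step["action"] == "click":
                page.click(step["selector"])
            else:
                # The unforeseen variable: a pure replay system has no
                # recorded answer here, so it can only stop.
                raise ValueError(f"unknown action: {step['action']}")
        browser.close()

replay(recorded_steps)
```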

Yes, you get to choose the pre-built flows, but you’re still pre-building the flows, are you not? The only difference is you can control it to a tee. 

And what about optimization? What if the way I want to do things takes two steps more than a fully optimized version of the task?

*I don’t want to make any guesses, and the information is limited, so I want everyone to know this point is being made primarily to transition us into the next talking point. Inference, suggestions for optimization, and other higher-level advances are likely in Autotab’s development pipeline, or at least being thought about. (Not that this is an easy feat)

Autotab is going to automate away all the parts of our jobs that we know how to do really well, but don’t want to spend hours doing over and over and over again. 

With that in mind, let’s take a look at a company tackling the same issue, but differently. 

Adept AI 

Now, Adept is not one of our Seed companies this week. In fact, they’ve raised $414M, and have a post-money valuation of $1B. They’re what we here in the Nerd Nest call a Unicorn. 

But, as we look at Autotab, they’re worth examining as a comparison.

Adept is building a general intelligence that will enable humans and computers to work together in near symbiosis. Their universal AI assistant is currently being trained to automate any software process based on a user’s natural language prompt. 

Act-1

No, this isn’t another “PJ takes you through history” section. I understand that seems to be my favorite thing to do, but it’s not necessary here. 

Act-1 is Adept’s proprietary model, centered and built around two tenets: 

  1. The clearest framing of general intelligence is a system that can do anything a human can do in front of a computer. A foundation model for actions, trained to use every software tool, API, and webapp that exists, is a practical path to this ambitious goal, and ACT-1 is the first step in this direction.

  2. The next era of computing will be defined by natural language interfaces that allow us to tell our computers what we want directly, rather than doing it by hand.

Act-1’s initial development stage was as a Chrome extension, enabling the model to observe what a user was doing in a browser and take certain actions, like clicking, typing, and scrolling through webpages. 

This allowed users to, for example, type “log a call with X saying that he’s thinking about buying Y number of units this coming quarter”, and Act-1 would take what is normally a 10-step process in Salesforce and perform it instantaneously. 
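For a sense of what the model has to produce under the hood, here’s a hypothetical Python sketch of that kind of request being turned into an ordered plan of browser-level actions. None of this is Adept’s API; the request text, the schema, and the Salesforce steps are all invented for illustration.

```python
# Hypothetical illustration only -- not Adept's API or data.
# The core job of an action model: turn a natural-language request
# into an ordered plan of UI actions that a browser agent can execute.
request = ("log a call with Acme saying they're thinking about buying "
           "40 units this coming quarter")

# What a model like Act-1 effectively has to emit for the request above.
action_plan = [
    {"act": "click", "target": "nav item 'Accounts'"},
    {"act": "type",  "target": "search box", "text": "Acme"},
    {"act": "click", "target": "search result 'Acme'"},
    {"act": "click", "target": "button 'Log a Call'"},
    {"act": "type",  "target": "field 'Comments'",
     "text": "Considering buying 40 units next quarter"},
    {"act": "click", "target": "button 'Save'"},
]

def execute(plan):
    """Stand-in executor: print each step instead of driving a browser."""
    for i, step in enumerate(plan, start=1):
        detail = f" <- '{step['text']}'" if "text" in step else ""
        print(f"{i}. {step['act']}: {step['target']}{detail}")

execute(action_plan)
```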

Act-1 can also search the internet live, finding answers to things it may not know in real time and returning them in response to user prompts.

Similar to other AI tools, Act-1 can also take feedback. If it does something wrong, you can tell it, and it corrects it in both the present and in future related tasks. 

Act-1 is built on a proprietary architecture called Fuyu-8B.

Some characteristics of Fuyu-8B are as follows: 

  1. Has a much simpler architecture and training procedure than other multi-modal models (Multi-modal = able to process and understand multiple different data types, like text and images), which makes it easier to understand, scale, and deploy.

  2. Designed from the ground up for digital agents, it can support arbitrary image resolutions, answer questions about graphs and diagrams, answer UI-based questions, and do fine-grained localization on on-screen images.

  3. Is fast - can get responses for large images in less than 100 milliseconds.

  4. Performs well at standard image understanding benchmarks such as visual question-answering and natural-image-captioning.

This is great. But what actually makes that architecture different?

To truly be a generally intelligent copilot, an AI model must be able to both understand user context (by way of natural language processing) and take actions on behalf of users (which relies on image recognition and understanding). 

Think about it. Everything you interact with on a website, software, platform, etc. relies on you ingesting the pixels you see on a screen and assigning meaning to them. 

When you look at the nice rounded button below, you know that clicking it means you’re going to share this edition of Seeds and Speculations with everyone you know, on all the platforms you’re on.

In order to interact with software in the same way you or I do, an AI Copilot must be able to examine and assign the same meaning to images on the screen that we do. 

A few issues with classical multi-modal models exist here that make that very difficult:

  • Architecture: Most other multimodal models involve a separate image encoder, the output of which is then connected to an existing LLM. An image encoder is a component in the model that processes visual inputs (images) and translates them into a vector format that the algorithm can understand and manipulate. These encoders are then linked back to the LLM through either cross-attention or adapters (what these mean exactly is not important to the coming point, so don’t worry if you don’t know). In addition, these models typically can only process images of a certain fixed resolution, meaning images that don’t meet that criterion must be altered prior to processing. 

  • Training: Most other multimodal models have several separate, intensive training stages for different data types (image vs. text). This makes sense — image encoders need to learn from images, and their attached LLMs need to learn from text. After these initial stages, the image encoder and LLM are trained together to learn to integrate and process information across both image and text. This ultimately becomes difficult to scale: decisions must be made on which modality to budget computational power for, and how best to weight parameters at certain stages of training. 

The architecture Act-1 is built on is a decoder-only transformer, and it processes images completely differently from standard models. Images, essentially, are treated and processed the same as text: pieces of each image are broken down and processed in the same manner that pieces of text are in a sentence. This allows Act-1 to support arbitrary image resolutions, thereby allowing it to understand and interact with any image (button, webpage, banner, etc.) that it comes across. 
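For intuition, here’s a minimal PyTorch sketch of that idea: image patches are linearly projected into the same embedding space as text tokens, and the combined sequence runs through a single causally masked transformer stack, with no separate image encoder. The vocabulary size, dimensions, patch size, and layer count are placeholders I picked for the example, not Adept’s actual configuration.

```python
# Minimal sketch of the Fuyu-style idea: no separate image encoder.
# Image patches are linearly projected straight into the token stream
# and processed by one decoder-only (causally masked) transformer.
# All sizes below are illustrative placeholders, not Adept's.
import torch
import torch.nn as nn

d_model, patch_px = 512, 16  # embedding width, patch side length

text_embed  = nn.Embedding(32000, d_model)                 # text token -> vector
patch_embed = nn.Linear(patch_px * patch_px * 3, d_model)  # raw RGB patch -> vector
backbone = nn.TransformerEncoder(                          # causal mask makes it decoder-only
    nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
    num_layers=4,
)

# A screenshot of arbitrary resolution, already cut into flattened 16x16 patches.
image_patches = torch.rand(1, 40, patch_px * patch_px * 3)  # 40 patches
text_tokens   = torch.randint(0, 32000, (1, 12))            # 12 prompt tokens

# One sequence: image patches and text tokens, side by side.
tokens = torch.cat([patch_embed(image_patches), text_embed(text_tokens)], dim=1)
causal_mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
out = backbone(tokens, mask=causal_mask)
print(out.shape)  # torch.Size([1, 52, 512])
```

Because the patch projection accepts any number of patches, nothing in the stack cares what resolution the original screenshot was, which is the property that lets the model handle arbitrary buttons, banners, and webpages.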

I’m leaving out a significant amount of detail, examples, and color here, but this is the main idea of what Adept is building. 

Fuyu Heavy 

In January of this year, Adept released a presser announcing Fuyu-Heavy, their latest multimodal model. 

According to the Adept team, Fuyu Heavy is “a new multimodal model designed specifically for digital agents. Fuyu-Heavy is the world’s third-most-capable multimodal model, behind only GPT4-V and Gemini Ultra, which are 10-20 times bigger.”

Fuyu-Heavy is, essentially, a souped-up version of Fuyu-8B, and the result of a multi-step process: 

  • Act-1 connected agents to the digital world 

  • Adept built out robust procedures for training, evaluation, inference, and data collection through Act-1

  • Fuyu-8B was developed to be the foundational architecture of their future models

  • Fuyu-8B was scaled up and improved through further testing, additional computational power, and lots of iteration, resulting in Fuyu-Heavy. 

I implore you to go take a look at the announcement blog post. It contains a ton of really impressive results from testing Fuyu-Heavy’s reasoning and computational capacities. Here’s a taste: in one example from the presser, Fuyu-Heavy answers a question that requires running a Chi-Square test of independence. It’s pretty impressive stuff. 
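For reference, here’s the kind of computation that example calls for: a chi-square test of independence on a small contingency table, done here with SciPy. The numbers are made up for illustration; they are not the data from Adept’s example.

```python
# Chi-square test of independence -- the kind of computation referenced
# above. The contingency table is invented for illustration; it is not
# the data from Adept's presser.
from scipy.stats import chi2_contingency

# Rows: two groups; columns: counts of two observed outcomes.
observed = [[30, 10],
            [20, 40]]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.4f}")
# A small p-value is evidence that the row and column variables
# are not independent.
```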

It’s on this new architecture that Act-1 will attain the power to reach Adept’s ultimate goal of creating a generalized intelligence able to assist users across any software. 

There is no release date for an alpha version of the AI, but the way they’re building, and the transparency they’re providing about the steps they’re taking, suggest to me that they’re being thoughtful and careful about deployment, which should be respected. 

I’m looking at you, Google…

Stepping Back 

So, Autotab or Adept? What’s more useful? 

Not an easy answer. Both are significant steps beyond what we’re used to AI being capable of, and both absolutely have their benefits. 

I think it’s important to consider that, right now, Autotab is the best available tool for this kind of automation, and Adept, regardless of how much time they’ve spent on the model, has not been released or publicly tested. 

But, assuming both tools are out and publicly available, what would you prefer? 

  1. A tool you can show exactly what to do, with the mutual understanding that it can also predict likely steps in case it’s faced with an unforeseen variable.

  2. A tool that you can tell what to do, with the mutual understanding that the model can effectively interpret what you want the end result to be. 

As we think about this, I think it’s important to raise a question. 

Are we that good at communicating?

Is it a given that our language-based requests will always make sense? What happens if they don’t, or worse, get understood differently than we mean? Sentiment and meaning are complex, after all. 

For anyone who has used ChatGPT, it’s clear that sometimes your prompt leads GPT to a different result than you intended. That’s nobody’s fault; it’s just a communication and understanding error. 

With text-based responses, the risk is pretty low. With an AI that can control your computer, that risk grows exponentially. 

What guardrails are in place to mitigate that risk? If we take it a step further, what guardrails are in place to prevent malevolent actors from manipulating this all-powerful software controller to do harm? 

Just some things to think about. 

Brain.AI 

As a final real-world example of how this type of technology may permeate beyond our work, let’s quickly look at Brain.AI

At Mobile World Congress, Brain.AI introduced their Gen-AI-Based smartphone operating system in partnership with Deutsche Telekom. 

Created by the same team that pioneered n-shot learning, a subcategory of AI that allows models to make predictions from training sets of just 1-100 examples (similar to how the human mind learns), Brain.AI has developed Natural AI, an interface that promises to do away with app-based human-to-smartphone interaction. 

Instead, Natural AI promises to respond to and learn from your behavior, pulling up the software that you need at that moment based on your actions, requests, and common flows. 

Essentially, you tell your phone what you want, and it does it, interacting with the relevant apps and software in the background. 
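Conceptually, that inverts the usual flow: instead of you hunting for the right app, the request gets routed to whatever capability can fulfill it. Here’s a deliberately toy Python sketch of that routing idea; the intents, handlers, and keyword matching are invented for illustration and have nothing to do with how Brain.AI actually does it.

```python
# Toy sketch of the "apps come to you" idea. Everything here is invented
# for illustration; Brain.AI's Natural AI learns this routing rather than
# hard-coding it.
def book_ride(destination: str) -> str:
    return f"Requesting a ride to {destination}"

def order_food(dish: str) -> str:
    return f"Ordering {dish} from a nearby restaurant"

def send_message(to: str, body: str) -> str:
    return f"Messaging {to}: {body}"

def route(request: str) -> str:
    """Map a natural-language request to the capability that fulfills it."""
    text = request.lower()
    if "ride" in text or "take me" in text:
        return book_ride("the airport")
    if "order" in text or "hungry" in text:
        return order_food("pad thai")
    if "tell" in text or "message" in text:
        return send_message("Sam", "running 10 minutes late")
    return "No app formed itself around those words."

print(route("I'm hungry, order me something"))
print(route("take me to the airport"))
```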

You can check out their intro video here: https://brain.ai/#/

As an intro: “Natural AI, our first consumer product, is the world's first generative interface. You no longer go to Apps, Apps come to you. Simply say what you need and the right app forms itself around your words - now fulfilling millions of requests each month. Natural clears away the clutter on your screen. It allows you to focus on what you want, not how to get there.” 

You can go download it for yourself now from the App Store. 

I played around with it for a little bit earlier. It’s really cool, but it’s still pretty limited, and I was able to force some bugs rather quickly. 

Regardless of its current state, it’s a taste of what’s next in human-to-computer interaction, as are Autotab and Adept. 

Final Reflections 

These new tools are only the beginning. It’s hard to imagine what’s next, and even harder to predict what that will mean for the future of human navigation of the digital world. 

I presented some questions I believe to be relevant earlier, and I think that’s only a taste of what should be thought of and considered as these tools become more common. 

Putting all of this aside, though, there are clear and obvious improvements coming that will enable us to do way more with way less effort. For that, I’m pumped. 

Hopefully, these tools are released intelligently, with the relevant questions considered. 

Innovation is important, but I think maintaining safety and understanding the potential ramifications on human behavior, social interaction, and the ability to navigate this new world is an even more significant consideration. 

Hope you enjoyed reading. If you did, please share with your friends! 

See you next time. 
