S&S #2: A New Wave in the Ocean of Neural Networks

Coverage of Liquid AI and other seeds from the week of December 3rd

Hey Friends 👋🏼,

Welcome to the second edition of Seeds and Speculations! This one is a doozy. Buckle up. 

Before we get into it, I want to take a moment to say thank you to the forty-three people who have decided to subscribe, and to give a special thanks to those of you who decided to upgrade to a paid subscription to show your support. I really appreciate it. 

A couple of weeks ago, I set a soft goal for S&S to have 1,000 subscribers in a year’s time. 

We’re 4% of the way there, baby. If you find this edition as exciting as I do, feel free to spread the word. Every share brings us closer to our goal of reaching 1,000 like-minded enthusiasts by the end of 2024.

Alright, enough of that. Let’s get into the important stuff. 

Our Companies of the Week 

74 companies in NA or EU raised seed funding between December 3rd and December 10th.

69 disclosed the round size, and five did not. Of those that did disclose the size of their raise, the cash cows are: 

  • Liquid AI: Our featured company of the week - an MIT spinoff focusing on developing a new generation of foundational AI models surpassing GPTs, intended to improve interpretability, efficiency, and adaptability in AI systems. They raised $37.6M from OSS Capital and PagsGroup on December 6th. 

  • RunPod: Developer of a distributed GPU cloud intended to develop, train, and scale AI applications. RunPod gives users over 50 templates to choose from, has global interoperability, unlimited storage, and allows users to deploy models in seconds. They raised $18.5M from undisclosed investors on December 4th. 

  • Atomic Industries: Focused on automating tool and die making in manufacturing using a proprietary AI stack, aiming to bridge the gap between the physical and digital worlds in manufacturing processes. They raised $17M from Narya, 8090 Industries, and Acequia Capital.

  • Extropic AI: Developer of a full-stack paradigm of physics-based computing (not quantum, though), seeking to reimagine computation by integrating it more closely with the physical processes of the world. Specifics on what all of that actually means, or how they aim to do it, are sparse. Regardless, they raised $14M from Kindred Ventures on December 4th. Will be interesting to track this one over time for more clarity.

  • Ability Biologics: BioTech company creating innovative, targeted immune-modulating biotherapeutics for cancer and autoimmune diseases. Their proprietary discovery engine, AbiLeap, combines AI with a database of antigen-antibody interactions to create selective antibody therapeutics. They raised $12M from Amplitude on December 6th. 

As always, I’d also like to highlight some of the smaller raisers that I think could be interesting. 

  • Lutra AI is working to simplify the integration of AI into various task-based workflows, creating copilots that function like specialized agents for common tasks, working across frequently used apps like Google Suite, YouTube, Zoom, Slack, and more. 

  • ASFin is an AI-based platform built to provide SMEs with much-needed, data-based financial planning and advice. The platform provides an economic and financial analysis engine for SMEs powered by proprietary algorithms capable of understanding the company through the study of financial documents, banking flows, payment flows, and company goals. From my experience working with a few large Financial Services companies, I know that the SME FinTech market is of significant interest, so I’m always excited to see a new AI Fin Management tool geared towards SMEs. 

  • Fabbric is developing a full service clothing design and manufacturing studio. Powered by an online platform that lets users design clothes end to end from over 600 garment options, Fabbric takes care of all manufacturing and distribution through partnerships with major manufacturers like Privalia, ASOS, Scalpers, and Maje. Right now, they’re targeting influencer and boutique brands to expedite merchandise production and sales. Hopefully, now we’ll all be able to get the D’Amelio sisters’ merch drops even quicker…

You can find the complete list of companies here.

I can hear it now…

PJ, don’t you think picking the company that just raised an almost $40M seed round with a post-money valuation of $303M is kind of a cherry pick?

Yes – and I don’t care. With the rapid emergence of new AI models like ChatGPT, it seems natural to give you all a closer look at one of the only companies trying to build an entirely new neural network architecture, one that might perform better than transformer-based models (like GPT) in several high-value use cases.

This could be really, really important.

Neural networks created using transformer architectures are a massive advancement in Artificial Intelligence. They do supremely cool stuff, like giving me this sick logo for my newsletter.

But! There are limitations. 

  1. They take a ton of computing power.

  2. They’re often static once trained, meaning they can be challenging to use in applications that require adaptability to changes in input data.

  3. They’re referred to as “black boxes” due to the difficulty in understanding and interpreting how they arrive at certain outputs or decisions.

  4. They require a substantial amount of energy to train.

  5. They struggle to adapt to tasks that aren’t centered on content creation.

Liquid Neural Networks address many of these concerns. 

But, before we get into how they do that, or what Liquid Neural Networks even are, I think the only way to do this properly is to walk you all through the evolution of neural networks: from the first Artificial Neural Networks, to Recurrent Neural Networks, to Long Short-Term Memory Networks, to Transformer-Based Neural Networks (like GPT), to (FINALLY) Liquid Neural Networks. 

Bear with me. I’m going to try to do this as quickly as possible while making it easy (ish) to understand. 

Artificial Neural Networks (ANNs)… The first iteration

In the beginning, God created the Heavens and the Earth. Fast-forward billions of years (or thousands, depending on who you’re talking to), and humans thought their brains were so cool and good at tasks that they tried to make computational models based on them. 

Enter neural networks. 

At their core, neural networks are inspired by the way our neurons calculate and interpret incoming data. The inner workings are complex, but I’m going to keep it at the highest conceptual level. Neural networks are made up of three types of layers.

A network can have any number of hidden layers, and any layer can contain any number of nodes (or neurons). 

Input Layer

The input layer is made up of the first set of neurons, responsible for ingesting the original data. 

Think of these nodes as a student, primed and ready to learn, who has just been given a chapter of a book to read, knowing there will be a test at the end. 

Hidden Layer 

The hidden layer of a neural network is made up of neurons responsible for digesting the input data, performing small calculations to decide which parts of that new data are most important to understanding the big picture, and passing the result on to the next neuron in the sequence. Every neuron applies weights of varying importance to the pieces of information it receives, then passes a modified version of that data along for the next node to do the same.

If you care about the technical terms: each neuron’s weighted sum is passed through what’s called an activation function, which decides how strongly that neuron “fires” and what it passes along.

Think of this step as our student combining and recombining facts pulled from the book chapter to try to understand it. Every time we get to a new node in the system, our student is taking new notes and highlighting important information in our chapter in an effort to understand it. 

Output Layer

After our data is passed along the hidden layer, the output layer gives a final answer. 
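To make those three layer types a little more concrete, here’s a minimal sketch in Python (NumPy only, with made-up layer sizes and random, untrained weights) of data flowing from the input layer, through one hidden layer, to the output layer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up layer sizes: 4 input features, 5 hidden neurons, 1 output.
W_hidden = rng.normal(size=(4, 5))   # weights from input layer to hidden layer
b_hidden = np.zeros(5)
W_output = rng.normal(size=(5, 1))   # weights from hidden layer to output layer
b_output = np.zeros(1)

def relu(z):
    """A common activation function: pass positives through, zero out negatives."""
    return np.maximum(0.0, z)

def forward(x):
    # Input layer: the raw features, untouched.
    # Hidden layer: weighted sum of the inputs, squashed by the activation function.
    hidden = relu(x @ W_hidden + b_hidden)
    # Output layer: weighted sum of the hidden activations -> the network's answer.
    return hidden @ W_output + b_output

print(forward(np.array([0.2, -1.0, 0.5, 3.0])))
```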

Training Neural Networks 

Training a neural network well takes vast amounts of data. That data lets the network repeat this process, going from input to output, over and over again until we’re confident it understands. 

That said, this data needs to be diverse. If you’re training a Neural Network to identify animals based on a picture you give it, but only give it pictures of horses as training data, it’s going to think every picture you feed it is a horse. 

Generally speaking, deeper neural networks, those with more total neurons and layers, can handle more complex problems. That said, deeper does not always mean better. Training, and the data used to train on, matter just as much as the architecture itself, if not more.

But what does “training” even mean?

Loss Functions and Backpropagation 

During the training phase, a neural network “learns” from every new piece of data it’s fed.

For a neural network to “learn”, it uses a loss function and backpropagation. The loss function measures how far the output is from the actual value, and backpropagation carries that error back through the network, allowing the model to adjust its weights and biases so the final output lands closer to the true answer. 

Think of this as the student reviewing the answer key, cross-referencing it with their practice test, and going back to understand where they made mistakes. 

Once the model has gone through this process enough, and has ingested enough training data, we can feed it data it’s never seen before and test its ability to produce the correct answer.
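If you’d like to see that loop in miniature, here’s a toy sketch where everything is made up: one weight, a handful of (x, y) examples, a squared-error loss, and a hand-derived gradient standing in for full backpropagation:

```python
# Toy training loop: one weight w, trying to learn y = 3x from examples.
# Loss = (prediction - truth)^2; its gradient w.r.t. w is 2 * (w*x - y) * x.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]   # made-up (x, y) pairs
w = 0.0              # start knowing nothing
learning_rate = 0.01

for epoch in range(200):
    for x, y in data:
        prediction = w * x
        gradient = 2 * (prediction - y) * x    # "backpropagation" for one weight
        w -= learning_rate * gradient          # step downhill on the loss

print(round(w, 3))   # approaches 3.0 as the loss shrinks
```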

Limitations of ANNs 

  • Inability to Handle Sequential Data: ANNs process inputs independently and lack any built-in memory. They can’t hold onto information about previous inputs, which makes them unsuitable for tasks where the order and context of data points are important to finding the right answer. 

  • Limited Context: Similarly, because each input is processed independently, ANNs can’t capture dependencies or context across an input sequence.

  • Fixed Input/Output Sizes: ANNs require fixed-size input and output vectors, limiting their flexibility with data sequences of varying lengths. 

So, while ANNs are good for many things, another solution needed to be developed to handle any problem where the context of inputs and order of elements is crucial. 

Recurrent Neural Networks (RNNs) 

Recurrent neural networks are very similar to ANNs, but with an added function that results in “memory”. 

While ANNs technically remember things, they only remember what was learned from each individual input during training. 

RNNs, on the other hand, carry forward what they learned from previous inputs as they ingest new ones, modifying their output accordingly and forming a loop of sorts. 

In other words, a node’s output at one step becomes part of its input in the next step.

[Diagram: an RNN’s output looping back in as part of its next input. Thanks to Trist’n Joseph for the diagram.] 
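In code, that loop is surprisingly small. Here’s a minimal sketch (NumPy, made-up sizes, random untrained weights) of a recurrent step where the previous hidden state feeds back in alongside the new input:

```python
import numpy as np

rng = np.random.default_rng(0)
W_in = rng.normal(size=(3, 8))    # input -> hidden weights (made-up sizes)
W_rec = rng.normal(size=(8, 8))   # hidden -> hidden weights: the "memory" loop

def rnn_step(x_t, h_prev):
    """One step: the new hidden state mixes the current input with the previous state."""
    return np.tanh(x_t @ W_in + h_prev @ W_rec)

h = np.zeros(8)                       # start with an empty memory
sequence = rng.normal(size=(5, 3))    # a made-up sequence of 5 inputs
for x_t in sequence:
    h = rnn_step(x_t, h)              # the output of one step becomes input to the next
```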

This structure is decent for speech recognition, language translation, and time series forecasting, but there are still issues. They’re slow to train, and long input sequences lead to what’s called a vanishing gradient. 

The Vanishing Gradient Issue

Simply speaking, a neural network's gradient is the slope of its error function. 

What? 

Okay. Let’s say the gradient is a hill.

The top of the hill represents the greatest possible error, or the farthest away the neural network’s output can be from the actual answer.

The goal of training a neural network is to move down that slope towards sea level, where the error is the lowest. 

Still with me?

The vanishing gradient appears when that hill seems to flatten out. As the error signal is passed back through step after step of a long sequence, it shrinks until it’s practically zero. Now we’re standing on what feels like flat ground, with no idea which direction leads downhill. Training stalls, even though we might still be thousands of feet above sea level (the correct answer).

Back to the horse example from earlier: our model is fed a picture of a cat, can’t find any slope left to descend, and confidently tells you that cat is a horse. 

Obviously, the cat is not a horse.

I’m going to steal an example commonly used to bring context to this:

In the sentence “The clouds are in the ___”, the next word should obviously be “sky”, since it’s linked to “clouds”. The gap between “clouds” and the word being predicted is short, so an RNN can predict it fairly easily. Now consider “I grew up in Germany with my parents, and spent many years there. That’s why I can speak fluent ___.” Here, the predicted word is “German”, connected directly to “Germany”. But the gap between “Germany” and the word being predicted is much longer, and that makes it hard for an RNN to get right. 

The bigger the gap, the harder it is. Vanishing gradient. 
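You can see why with nothing more than repeated multiplication. As the error signal travels back through the sequence during training, it gets multiplied by a factor tied to the network’s weights at every step; if that factor is below 1 (a made-up 0.5 in this toy illustration), the signal shrinks exponentially:

```python
# Toy illustration: an error signal passed back through a sequence,
# multiplied by a made-up per-step factor of 0.5 each time.
error_signal = 1.0
per_step_factor = 0.5

for steps_back in range(1, 21):
    error_signal *= per_step_factor
    if steps_back in (5, 10, 20):
        print(f"{steps_back} steps back: gradient ~ {error_signal:.6f}")

# 5 steps back:  ~0.03125   -- still a usable learning signal
# 20 steps back: ~0.000001  -- effectively zero: "Germany" can no longer
#                              influence how the network predicts "German"
```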

Again, another model was developed to address this. 

Long Short Term Memory Networks (LSTMs)

Hilarious name, I know. Let’s recognize that and move on.

LSTMs are variations of RNNs, but with better capabilities for learning long term dependencies. This is done by leveraging what are called gates and cell states.

  • Cell State: The cell state carries information through the entire sequence being processed, letting the LSTM hold onto what it has learned; it’s designed to store and preserve long-term dependencies. At each step in the sequence, the cell state is updated based on the current input, the previous hidden state, and the operations of the gates. 

  • Input Gate: Controls how much of the new information from inputs should be added to the cell state, or the “memory” of the LSTM unit. 

  • Forget Gate: Runs calculations to decide which of that new info should be discarded as not important to long term dependencies. 

  • Output Gate: Controls how much of the filtered cell state is exposed and passed on to the next step. 

LSTMs are better equipped for long dependencies than traditional RNNs, but they can still struggle with extremely long sequences or complex data structures. 

LSTMs also, by nature, process data sequentially; inputs are handled one by one. If the training data is a sentence, each word is processed one after the other, which increases both training time and inference time.
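For the curious, here’s roughly what a single LSTM step looks like, sketched in NumPy with made-up sizes and random untrained weights (real implementations add bias terms and fuse the gate matrices for speed, but the gate logic is the same):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 3, 8                       # made-up sizes
def W(): return rng.normal(size=(n_in + n_hid, n_hid)) * 0.1
W_f, W_i, W_o, W_c = W(), W(), W(), W()  # forget, input, output gates + candidate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([x_t, h_prev])
    f = sigmoid(z @ W_f)         # forget gate: what to drop from the cell state
    i = sigmoid(z @ W_i)         # input gate: how much new info to let in
    o = sigmoid(z @ W_o)         # output gate: how much of the cell state to expose
    c_tilde = np.tanh(z @ W_c)   # candidate new information
    c = f * c_prev + i * c_tilde     # updated cell state (the long-term memory)
    h = o * np.tanh(c)               # new hidden state passed to the next step
    return h, c

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):   # a made-up sequence of 5 inputs
    h, c = lstm_step(x_t, h, c)
```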

Transformers… The New(est) Age of NNs

Before you ask, we’re not referring to Optimus Prime, and this passage won’t have any mention of Megan Fox (sorry, guys).

Transformer models are the newest development within neural networks. They introduce a new mechanism: attention.

Rather than taking one input after the other and weighting those inputs based on what came before it, transformer models process an entire sequence at once (called parallel processing), giving weight to each word based on the entirety of the sequence. 

Think of transformers like a skilled chef preparing a complex dish. Instead of cooking each ingredient one by one, the chef combines all ingredients at once, understanding how each flavor complements the others to create a harmonious dish. This 'all-at-once' approach is what makes transformers efficient, allowing them to quickly process and understand large chunks of data.

They do this by leveraging an “attention” mechanism, which calculates how much each input depends on all of the others in the sequence. The individual attention scores are combined to create a context-rich representation of the entire sequence, where each element is informed by all of the others. 

Because every element is understood in relation to the rest, vanishing gradient issues largely disappear, and training time drops massively because whole sequences can be processed in parallel rather than one step at a time. 

This is why ChatGPT can produce full papers, working code, and more. When ChatGPT gives you an answer, it predicts which word is most likely to come next based on a context-rich understanding of everything that came before it: the words it has already written, the words in your question, and so on.
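If you’re curious what that attention calculation actually looks like, here’s a minimal sketch of scaled dot-product attention in NumPy, with random made-up vectors standing in for word representations (real transformers add learned projections, multiple attention heads, and positional information on top of this):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: every position scores every other position,
    and the output for each position is a weighted blend of all the values."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # how much each word attends to each other word
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax -> attention weights
    return weights @ V                                 # context-rich representation

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 16))     # 6 "words", each a made-up 16-dim vector
out = attention(x, x, x)         # self-attention: the sequence attends to itself
print(out.shape)                 # (6, 16): one blended representation per word
```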

Limitations 

Transformer based neural networks are trained on massive, massive datasets that encompass a wide range of topics and scenarios.

GPT-3 was trained on roughly 45 TB of raw text data.

The training data size for GPT-4 was never released, but it’s widely assumed to be larger, and the model is rumored (though never confirmed by OpenAI) to have far more parameters than GPT-3’s 175 billion. Parameters are the variables whose values are weighted and adjusted within the hidden layers during training. 

This makes transformers great at generalizing and providing reasonable responses to problems that are similar in context or structure to what they’ve seen during training, even if the exact problem is new to them. That said, they can struggle with highly specialized or niche problems that differ significantly from their training data, and with data that changes rapidly in fluid environments. 

And thus concludes our lesson. So… what’s next?

Liquid Neural Networks 

In traditional neural networks, computations are performed through fixed weights and rigid connections between neurons. LNNs, by contrast, introduce dynamic connectivity patterns that let neurons interact in a fluid manner: changing which other neurons they interact with, what they’re responsible for, and which information gets passed along and weighted more heavily as the inputs change. 

So, unlike traditional neural networks, the parameters of liquid neural networks are not fixed after the training phase, allowing them to change as the input data changes.

This allows for continuous learning, meaning the AI system continues to learn and update its knowledge base even after being deployed.
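The research behind this (the liquid time-constant networks published by Hasani, Lechner, and colleagues) models each neuron as a small differential equation whose time constant shifts with the input. Here’s a heavily simplified sketch of that idea in NumPy, loosely following the published form with made-up sizes and weights, and in no way Liquid AI’s actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 2, 19                       # 19 neurons, a nod to the driving example below
W_in = rng.normal(size=(n_in, n_hid)) * 0.5
W_rec = rng.normal(size=(n_hid, n_hid)) * 0.1
tau = 1.0                                 # base time constant (made up)
A = rng.normal(size=n_hid)                # per-neuron bias toward a resting value

def ltc_step(x_state, inputs, dt=0.1):
    """One Euler step of a liquid-time-constant-style neuron (simplified).
    The input-dependent term f changes how quickly each neuron reacts,
    which is what lets the dynamics adapt as the input stream changes."""
    f = np.tanh(inputs @ W_in + x_state @ W_rec)
    dx = -(1.0 / tau + f) * x_state + f * A
    return x_state + dt * dx

x_state = np.zeros(n_hid)
for t in range(50):                        # a made-up stream of sensor readings
    x_state = ltc_step(x_state, rng.normal(size=n_in))
```

The key difference from the earlier RNN sketch is that the effective time constant, (1/tau + f), changes with the input, so the same neurons respond differently as the data stream shifts.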

Consider a traditional neural network like a well-trained musician playing from sheet music. They're excellent as long as the music doesn't change. But what if the music suddenly shifts to a different style? That's where they struggle, unlike Liquid Neural Networks, which can improvise and adapt to new melodies on the fly.

As counterintuitive as it may seem, this new, highly advanced model is actually inspired by a roundworm’s brain. Although the roundworm has only 302 neurons (as opposed to the roughly 100 billion a human has), it’s capable of performing advanced, complex tasks and behaviors. 

LNNs are designed the same way: to do more with less.

For a striking example: a classical neural network needs around 100,000 neurons to keep an autonomous car on the road. The team behind Liquid AI at MIT CSAIL built a liquid neural network that performed the same task with just 19 neurons. 

That’s INSANE. 

Beyond being more efficient and better equipped for dynamic tasks, this structure directly addresses a main concern with other neural networks: interpretability. 

The more compact, fluid structure of LNNs can make the network’s decision-making process clearer: changes in the network’s behavior can be traced more directly to changes in the input data, and with fewer nodes to inspect, it’s easier to see what each one is doing. This stands in direct contrast to GPTs, which are often called “black boxes” because it’s nearly impossible to understand what happens in the hidden layers between input and output, especially as the number of neurons in those layers grows. 

These networks are particularly suited for a range of real world applications that require adaptability, efficiency, and the ability to handle dynamic, changing, temporal data. This will be especially useful in areas such as autonomous navigation, robotics, time series analysis, personalized AI, medical diagnostics, language processing, energy management, real time surveillance, and industrial automation.

Liquid AI

Liquid AI is a spin-off from the MIT Computer Science and Artificial Intelligence Lab (CSAIL). Their team consists of Ramin Hasani (CEO), Mathias Lechner (CTO), Alexander Amini (CSO), and Daniela Rus (Director of CSAIL). 

Their mission is to commoditize Liquid Neural Network technology, democratizing it and making it widely accessible across industries. By focusing on dynamic and temporal data processing, Liquid AI is carving out a piece of the market that is underserved by current neural network frameworks. In addition, the reduced computational requirements of LNNs make them more environmentally sustainable, and the potential applications span incredibly high-value areas. These key differentiators are likely the reason they’re already valued at $300M, despite not yet having a product, revenue, or clients. 

I’m incredibly excited to see how they progress, and I think you should be too. Hopefully after reading this, you understand why! 

If you made it this far, and enjoyed what you read, please consider subscribing for weekly editions.

‘Til next week. 

Happy Holidays!
