As we began to construct the core elements of gAIa, we quickly realised that a vast amount of ecosystem knowledge cannot be sufficiently captured even by the largest LLMs. In one of our previous projects, Dataville, we had become fascinated with the concepts of the Semantic Web and knowledge graphs, and they seemed like a perfect fit for gAIa.
You may already be familiar with the concept, but for those unaware: a knowledge graph is a knowledge base that uses a graph structure. Entities (objects, events, situations or abstract concepts) are connected to other nodes in the graph by "relationships", and together these relationships reveal the rich, interlinked, free-form connections between them.
node –[relationship]→ node
e.g. Beech Trees –[pair with]→ Brambles
This powerful data structure encodes contextual knowledge in a way that is easily digestible to digital systems. Entities can also reference their original source material, the context of the article or document they came from, or even links to further reading on the web.
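The structure above can be sketched as a minimal triple store. This is an illustrative example only; the class, entity names and relationships are ours, not gAIa's actual schema.

```python
# A minimal knowledge-graph sketch: entities are keys, and each edge is a
# (relationship, object) pair. Purely illustrative, not gAIa's internals.
from collections import defaultdict

class KnowledgeGraph:
    def __init__(self):
        # subject -> list of (relationship, object) edges
        self.edges = defaultdict(list)

    def add(self, subject, relationship, obj):
        self.edges[subject].append((relationship, obj))

    def related(self, subject):
        """Return all (relationship, object) pairs for an entity."""
        return self.edges[subject]

kg = KnowledgeGraph()
kg.add("Beech Trees", "pair with", "Brambles")
kg.add("Beech Trees", "grow in", "Temperate Forests")
print(kg.related("Beech Trees"))
```

A real system would add the reverse edges and source references as well, but the core idea is just this: free-form relationships attached to named entities.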
LLM-assisted extraction
Our first, simple extraction process involved chunking the text and parsing semantic context from these short text chunks. In our first evaluation and gAIa's prototyping phase this gave acceptable results, but as we absorbed many of these books and articles ourselves, we began to notice that more nuanced or advanced concepts were often misrepresented in the graph.
This need for improvement led us to a nascent text-analysis process: LLM-assisted extraction. Essentially, we designed a reflection process that tracks a book or article's ideas through a chapter and between sections, and uses a supervisor model to extract more nuanced ideas, or distil the essence of long-winded explanations.
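The reflection loop might look something like the following sketch. Here `call_llm` is a stand-in for any chat-model API, and the prompts, chunk size and running-summary approach are our own assumptions about how such a pipeline could be wired together.

```python
# Hypothetical two-stage extraction loop: a first pass pulls candidate
# triples from each chunk, and a supervisor pass reviews them against the
# running chapter context. `call_llm` is a stub, not a real API.
def call_llm(prompt: str) -> str:
    # Stand-in for a real model call; swap in your LLM client here.
    return ""

def chunk_text(text: str, size: int = 500) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

def extract_with_reflection(text: str) -> list[str]:
    context = ""  # running summary carried between chunks
    triples = []
    for chunk in chunk_text(text):
        draft = call_llm(f"Context so far:\n{context}\n\n"
                         f"Extract entity-relationship triples from:\n{chunk}")
        # Supervisor pass: refine or discard triples that conflict with
        # the wider chapter context.
        reviewed = call_llm(f"Given the context:\n{context}\n"
                            f"Refine or discard these triples:\n{draft}")
        triples.append(reviewed)
        context = call_llm(f"Update this summary:\n{context}\nwith:\n{chunk}")
    return triples
```

The key difference from naive chunk-by-chunk extraction is the `context` variable: each chunk is read in light of everything that came before it, which is what lets nuanced, chapter-spanning ideas survive into the graph.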
Now our knowledge base continues to grow, and thanks to this process gAIa is able to store and recall rich, semantically coherent information, where most other systems expose only data.
ReActing to every question
A simple semantic lookup query was a reasonable starting point, but we quickly realised the potential for more curious questioning: questions that link together otherwise disparate information, or that reveal a gap in our knowledge.
To address this we introduced a ReAct (Reasoning and Acting) model to enhance the process. Essentially a digital librarian, it can dissect the user's question, use vector embeddings and semantic linking to expand its search to related concepts, and build a coherent logical model of the real world, so it can finally reply with the sage wisdom of a true ecosystems expert.
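The essence of a ReAct loop is an alternation between a reasoning step and an action step. The sketch below is our own simplification: `reason` stands in for the LLM, `lookup` for the knowledge-graph search, and the stub implementations exist only to make the loop runnable.

```python
# An illustrative ReAct-style loop (names are ours, not gAIa's internals):
# the model alternates between reasoning about the question and taking an
# action (here, a knowledge-graph lookup) until it can answer.
def react_answer(question, lookup, reason, max_steps=5):
    """`lookup` queries the graph; `reason` stands in for the LLM and
    returns either ("search", query) or ("answer", text)."""
    observations = []
    for _ in range(max_steps):
        action, payload = reason(question, observations)
        if action == "answer":
            return payload
        observations.append(lookup(payload))
    return None  # give up after max_steps

# Stub reasoner: search once, then answer from the observation.
def reason(question, observations):
    if not observations:
        return ("search", "Beech Trees")
    return ("answer", f"Found: {observations[0]}")

def lookup(entity):
    graph = {"Beech Trees": [("pair with", "Brambles")]}
    return graph.get(entity, [])

print(react_answer("What pairs with beech trees?", lookup, reason))
```

In practice each observation is fed back into the next reasoning prompt, which is what lets the librarian chain otherwise disparate facts into one coherent answer.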
We continue to add new articles and resources as we discover them, and our knowledge-graph continues to grow. If you know of any resources relating to trees or our environment, we’d love to hear from you, and we’re always curious to expand our understanding of the natural world.
Curious Researcher
Inspired by our own curiosity, we developed our latest innovation in the knowledge system: an agent with the capacity for detailed web research, and room to flex its own curiosity, designed to help fill the gaps in our existing knowledge.
Our curious web agent probes naturally whenever it searches through the semantic connections of the graph. Imitating a human's inquisitive mind, it then performs research across the internet, cross-checking multiple sources and comparing its discoveries with its existing semantic knowledge. Going back and forth in the same ReActive manner allows the agent to reflect on the data it discovers, avoid proposing redundant additions, and proactively enhance areas of the graph where knowledge is lacking.
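One simple way to surface research targets is to flag thinly connected entities. The sketch below is a guess at how such gap-finding could work; the edge-count threshold and query format are invented for illustration.

```python
# A hedged sketch of gap-finding: entities with few outgoing edges are
# treated as under-described, and become web-research prompts.
def find_gaps(edges: dict[str, list], min_edges: int = 2) -> list[str]:
    """Return research queries for thinly connected entities."""
    return [f"What is known about {entity}?"
            for entity, rels in edges.items()
            if len(rels) < min_edges]

graph = {
    "Beech Trees": [("pair with", "Brambles"), ("grow in", "Temperate Forests")],
    "Mycorrhizae": [("associate with", "Beech Trees")],
}
print(find_gaps(graph))  # only "Mycorrhizae" is thinly connected here
```

The resulting queries would then be handed to the web agent, whose findings are compared against the graph before any addition is proposed.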
Of course, this process is not fully autonomous. We follow human-in-the-loop design principles, with a human review and research step that lets us prevent dataset poisoning and expand our own working knowledge.