A tech & domain blog powered by Shtanglitza
How much of an RDF database can you push into a distributed execution model before the abstractions stop fitting?
rama-sail-graph is an attempt to find out. It is a Rama-backed RDF quad store where SPARQL queries compile into distributed execution plans that run across Rama partitions, and the whole thing sits behind RDF4J's SAIL API so that existing Java tooling can query it through familiar interfaces, within the bounds of what the engine currently supports.
For this project, the interesting problem is not storage. It is the mismatch between two execution models.
On one side is RDF4J's SAIL API: a connection-oriented, transaction-aware storage contract. A SailConnection exposes begin, commit, rollback. Clients expect to buffer writes, read their own uncommitted changes, and roll back cleanly. RDF4J's entire ecosystem (repositories, query preparation, SPARQL evaluation) layers on top of this contract. If the goal is compatibility with the existing RDF4J world, SAIL is the right boundary to implement.
On the other side is Rama: a distributed runtime where data flows through depots, gets indexed in partitioned PStates, and queries run as topologies across the cluster. There are no connections in the traditional sense. There is no local transaction buffer.
Bridging these two models is the core design problem. The adapter must present connection-scoped transaction semantics on top of a system that does not natively think in connections. Uncommitted writes go into a connection-local buffer. Rollback discards that buffer. Commit deduplicates add/delete pairs and appends the net operations to Rama's depot, optionally waiting for the microbatch to process before returning.
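The buffering behavior described above can be sketched in a few lines. This is a minimal, illustrative model only (plain Python sets and a list standing in for a Rama depot; `BufferedConnection` and `Quad` are hypothetical names, not the adapter's actual classes), showing how an add/delete pair cancels out so that only net operations reach the log on commit:

```python
from collections import namedtuple

# A hypothetical quad record; the real adapter works with RDF4J statements.
Quad = namedtuple("Quad", ["subject", "predicate", "obj", "graph"])

class BufferedConnection:
    """Connection-local write buffer presenting begin/commit/rollback
    on top of an append-only log (a stand-in for a Rama depot)."""

    def __init__(self, depot):
        self.depot = depot      # shared append-only list
        self.adds = set()
        self.deletes = set()

    def begin(self):
        self.adds.clear()
        self.deletes.clear()

    def add_statement(self, quad):
        # An add cancels a pending delete of the same quad, and vice versa,
        # so matched add/delete pairs never reach the depot.
        if quad in self.deletes:
            self.deletes.discard(quad)
        else:
            self.adds.add(quad)

    def remove_statement(self, quad):
        if quad in self.adds:
            self.adds.discard(quad)
        else:
            self.deletes.add(quad)

    def rollback(self):
        # Discard everything buffered since begin().
        self.adds.clear()
        self.deletes.clear()

    def commit(self):
        # Append only the net operations, then reset the buffer.
        for q in self.adds:
            self.depot.append(("add", q))
        for q in self.deletes:
            self.depot.append(("delete", q))
        self.rollback()
```

Reads against the connection would consult both the buffer and the committed state, which is what lets clients see their own uncommitted changes.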
That is not just a compatibility detail. It is what makes the adapter usable inside the rest of the RDF4J ecosystem, including future integrations like stackable inferencing layers that depend on NotifyingSailConnection and statement-level change notification.
Published: 2026-03-30
Approved by: Shtanglitza Team
Tags: sparql clojure rdf rama distributed systems
Vector search is a common requirement for AI applications, enabling features like recommendation engines and semantic search. However, building a scalable, real-time vector search system often involves integrating multiple distinct technologies: a message queue for ingestion, various databases for indexing and storage, and a compute layer for processing.
As part of evaluating Rama as a development platform for our product, which uses several data models to capture the stages of biotech lab experiment design and execution, we decided to try building a vector search system from scratch.
While modern vector search often relies on complex graph-based algorithms like HNSW, this post explores a different approach: implementing the classic Locality-Sensitive Hashing (LSH) algorithm. LSH is a great starting point for an experiment like this because its principles (hashing, bucketing, and re-ranking) map clearly to data processing primitives. The goal is to see how Rama's unified model handles the components of this traditionally complex task.
Locality-Sensitive Hashing is an algorithm for approximate nearest-neighbor search. Instead of comparing a query vector to every other vector in a dataset (which is slow), LSH uses hash functions deliberately chosen so that similar vectors are likely to collide into the same bucket; a query then only needs to be compared against the vectors in its own bucket.
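One common LSH family for cosine similarity is random hyperplane hashing: each random hyperplane contributes one bit of the hash, taken from the sign of the dot product. The sketch below (plain Python, illustrative names, not the post's Rama implementation) shows all three stages named above: hashing, bucketing, and re-ranking by exact cosine similarity within the candidate bucket:

```python
import random

def make_hyperplanes(dim, n_bits, seed=42):
    rng = random.Random(seed)
    # Each hyperplane is a random Gaussian vector; the sign of the
    # dot product with it yields one bit of the hash.
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

def lsh_hash(vec, hyperplanes):
    bits = 0
    for plane in hyperplanes:
        dot = sum(v * p for v, p in zip(vec, plane))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits

def build_index(vectors, hyperplanes):
    # Bucketing: vectors sharing a hash land in the same bucket.
    buckets = {}
    for i, v in enumerate(vectors):
        buckets.setdefault(lsh_hash(v, hyperplanes), []).append(i)
    return buckets

def query(qvec, vectors, buckets, hyperplanes, k=3):
    # Candidate set: only the vectors in the query's bucket.
    candidates = buckets.get(lsh_hash(qvec, hyperplanes), [])

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(x * x for x in b) ** 0.5
        return dot / (na * nb)

    # Re-ranking: exact similarity, but only over the candidates.
    return sorted(candidates, key=lambda i: -cosine(qvec, vectors[i]))[:k]
```

Note that scaling a vector by a positive constant never changes its hash, since the sign of each dot product is preserved; that is exactly the angle-only behavior you want for cosine similarity.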
Published: 2025-11-07
Approved by: Shtanglitza Team
Tags: clojure rama lsh vector search
The integration of Large Language Models (LLMs) with knowledge graphs is gaining significant traction, particularly in the context of Retrieval-Augmented Generation (RAG). In these scenarios, LLMs usually act as interfaces for querying and summarizing information retrieved from a knowledge graph. Other scenarios, however, remain largely unexplored. In this blog post, we explore the innovative application of LLMs for enriching structured data directly through SPARQL queries. Using the SPARQL.anything framework and the Groq API, we'll demonstrate how to interact with a remote LLM, unlocking new possibilities for knowledge enrichment.
For those who are interested in knowledge graphs and data integration using RDF, SPARQL.anything is a powerful framework that allows users to query various data sources using the SPARQL query language. It supports querying different types of data sources, including JSON, XML, relational databases, and even remote APIs.
SPARQL.anything functions as both a CLI and a server (utilizing Apache Fuseki). For a deeper dive, you can refer to the documentation. In this experiment, we will run the server using a simple command.
Published: 2024-12-25
Approved by: Shtanglitza Team