I recently published an entry about keeping aggregate caches automatically updated, with a minimized risk of wrong results even inside a transaction that modifies the table.
What happens if someone modifies old rows? We get wrong aggregates. That is the only drawback of the proposed solution, but in some circumstances it won’t be acceptable. Maybe your table isn’t modified only in its newer parts. Or maybe you always need a correct aggregate, no matter what. What can we do?
Let’s suppose a table can be divided into segments, contiguous on disk. If we aggregate every segment independently, we can later compute a grand total by aggregating the partial aggregates. Because the segment size is relatively big (16384 rows in my examples), we can calculate the grand total almost instantly. And if we know which segments have changed recently, we can invalidate them and go to the actual table for just those records.
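For instance, with 16384-row segments (2^14), the segment number is just the line id shifted right by 14 bits. A minimal illustration (these particular line ids are made up):

SELECT lineid, lineid >> 14 AS segment
  FROM (VALUES (0), (16383), (16384), (2015000)) t(lineid);
-- segments: 0, 0, 1, 122. Rows 0..16383 share segment 0, row 16384 opens segment 1.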
To speed up the per-segment calculation, I add one index:
CREATE INDEX ON invoicelines ((lineid >> 14), prod); -- one segment = 2^14 = 16384 rows
Then the table holding the cache could be defined as:
DROP TABLE IF EXISTS agg_invoicelines_prod2 CASCADE;
CREATE TABLE agg_invoicelines_prod2 AS
  SELECT lineid >> 14 AS segment, l.prod,
         sum(quantity) AS quantity,
         count(*) AS count1,
         sum(l.totalprice) AS totalprice
    FROM invoicelines l
   GROUP BY lineid >> 14, l.prod;
ALTER TABLE agg_invoicelines_prod2 ADD PRIMARY KEY (prod, segment);
In this case I’m not using materialized views, because now we’re taking care of how the cache is refreshed ourselves.
Now we can create a table for segment invalidation. A record landing in this table means that part of the cached table isn’t valid anymore.
CREATE TABLE agg_invoicelines_prod2_invalid (
    id serial NOT NULL,
    segment int NOT NULL,
    prod varchar NOT NULL,
    PRIMARY KEY (id)
);
CREATE INDEX ON agg_invoicelines_prod2_invalid (segment, prod, id);
Why another table and not a simple column on the previous one? Well, if two transactions tried to invalidate the same cache row, PostgreSQL would lock one until the other commits. This scheme allows parallel invalidation of segments.
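A minimal sketch of the difference, using a hypothetical dirty flag and made-up (segment, prod) values:

-- With a dirty-flag column on the cache table, concurrent sessions would
-- fight over the same row:
--   session A: UPDATE agg_invoicelines_prod2 SET dirty = true
--                WHERE segment = 42 AND prod = 'prod00007';
--   session B: same UPDATE -> blocks until A commits or rolls back.
--
-- With a separate invalidation table, each session appends its own row:
--   session A: INSERT INTO agg_invoicelines_prod2_invalid (segment, prod)
--                   VALUES (42, 'prod00007');
--   session B: same INSERT -> proceeds immediately; duplicates get
--              collapsed later, when the cache is refreshed.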
But we don’t yet have a view for the grand total. Let’s write one, even if it returns stale data:
CREATE OR REPLACE VIEW view_agg_invoicelines_prod2_stale AS
  SELECT prod,
         sum(quantity) AS quantity,
         sum(count1) AS count1,
         sum(totalprice) AS totalprice
    FROM agg_invoicelines_prod2
   GROUP BY prod;
How fast is this method compared with the previous one up to this point? Let’s execute an example query:
SELECT * FROM view_agg_invoicelines_prod2_stale; -- 0.039s
SELECT * FROM view_agg_invoicelines_prod;        -- 0.018s
SELECT * FROM agg_invoicelines_prod;             -- 0.0001s
As you can see this method is slower than reading the cached aggregates directly. That’s because we hold 123 segments, so on average the view has to process 123 rows to produce each aggregated row.
Even being roughly 400 times slower, it’s still very useful, because it returns in 40ms, which is fast in absolute terms. For larger tables you can use a bigger segment size to keep it from getting too slow.
Remember we did this because we want to invalidate segments. So let’s do it. We’ll need a trigger for every modification.
CREATE OR REPLACE FUNCTION func_trg_invoicelines_invalidate() RETURNS trigger AS
$BODY$
BEGIN
    -- Normalize OLD/NEW so the INSERT below can reference both safely.
    IF TG_OP = 'INSERT' THEN OLD = NEW; END IF;
    IF TG_OP = 'DELETE' THEN NEW = OLD; END IF;
    -- Record the affected (segment, prod) pairs, skipping pairs already invalidated.
    INSERT INTO agg_invoicelines_prod2_invalid (segment, prod)
        ( SELECT NEW.lineid >> 14, NEW.prod
          UNION
          SELECT OLD.lineid >> 14, OLD.prod )
        EXCEPT
        SELECT segment, prod FROM agg_invoicelines_prod2_invalid;
    RETURN NEW;
END;
$BODY$ LANGUAGE plpgsql;

CREATE TRIGGER trg_invoicelines_invalidate
    AFTER UPDATE ON invoicelines
    FOR EACH ROW
    WHEN (OLD.* IS DISTINCT FROM NEW.*)
    EXECUTE PROCEDURE func_trg_invoicelines_invalidate();

CREATE TRIGGER trg_invoicelines_invalidate2
    AFTER INSERT OR DELETE ON invoicelines
    FOR EACH ROW
    EXECUTE PROCEDURE func_trg_invoicelines_invalidate();
I use a separate trigger for UPDATE so I can use the “WHEN” trick and avoid invalidating when nothing actually changed. Let’s test by inserting new invoices:
INSERT INTO invoicelines (invoiceid, lineno, prod, quantity, totalprice)
SELECT invoiceid,
       generate_series(1, 100) AS lineno,
       'prod' || to_char((random()*250)::int, 'FM00000') AS prod,
       (random()*25)::numeric(12,0) AS quantity,
       (random()*random()*1000 + random()*500 + random()*50)::numeric(12,2) AS totalprice
  FROM generate_series(1, 2) invoiceid;
-- 200 rows.

SELECT * FROM agg_invoicelines_prod2_invalid;
-- segment: 122; 131 products. (131 rows)
As you can see, the new invoices all ended up in segment 122, and 200 rows collapsed into 131 invalidations. With more rows inserted we could expect the invalidations to collapse even better.
The problem is random updates in the middle of the table. If the changes aren’t close together on disk, we will end up with one invalidation per modified row. For example, we could update the totalprice of random rows (0.01%):
UPDATE invoicelines SET totalprice = totalprice * 1.10
 WHERE random() < 0.01 / 100.0;
202 rows were updated and the invalidation table now has 333 rows, exactly 202 more. Every randomly updated row caused one invalidation row. This was expected.
But don’t worry, the invalidation table isn’t going to grow indefinitely: once every (segment, prod) pair has been invalidated, it can’t grow any further. In this example 30867 rows would be the maximum (in theory; in practice two concurrent transactions can invalidate the same segment at the same time).
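If you want to double-check that ceiling, the cache table holds exactly one row per existing (segment, prod) pair, so a quick count should give the same figure (assuming no new pairs were inserted since the cache was built):

SELECT count(*) FROM agg_invoicelines_prod2;
-- or, straight from the source table:
SELECT count(*) FROM (SELECT DISTINCT lineid >> 14, prod FROM invoicelines) t;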
Now it’s time to create the magic view where we mix cached data with live data. But before that, I’ll create a small function to get the invalidated segments. This function exists mainly to hide its query from the planner: in my tests, wrapping it in a function forces the planner to query it first and plan correctly with its results. (Your mileage may vary.)
CREATE OR REPLACE FUNCTION fn_get_invoicelines_prod2_invalid(
        OUT segment integer, OUT prod varchar)
    RETURNS setof record AS $$
    SELECT DISTINCT segment, prod FROM agg_invoicelines_prod2_invalid
$$ LANGUAGE SQL IMMUTABLE;
-- Declared IMMUTABLE on purpose, as a planner hint; strictly speaking the
-- function reads a table, so STABLE would be the accurate volatility.
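Calling it directly is a quick sanity check of what the live part of the view will have to re-read:

SELECT * FROM fn_get_invoicelines_prod2_invalid() LIMIT 5;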
The query for the “online” data view goes like this: first get all still-valid rows from the cache, then get live data for the segments that were invalidated. Put the two together and aggregate them into a single result.
CREATE OR REPLACE VIEW view_agg_invoicelines_prod2_live AS
SELECT y.prod,
       sum(quantity) AS quantity,
       sum(count1) AS count1,
       sum(totalprice) AS totalprice
  FROM (
        -- Live data, re-aggregated per segment, for the invalidated pairs only.
        SELECT (lineid >> 14) AS segment, l.prod,
               sum(quantity) AS quantity,
               sum(count1) AS count1,
               sum(totalprice) AS totalprice
          FROM (
                SELECT l.lineid, l.prod, l.quantity, 1 AS count1, l.totalprice
                  FROM invoicelines l
                 INNER JOIN fn_get_invoicelines_prod2_invalid() x
                         ON x.segment = (l.lineid >> 14) AND x.prod = l.prod
               ) l
         GROUP BY (lineid >> 14), l.prod
        UNION ALL
        -- Cached data for every pair that is still valid.
        SELECT t.*
          FROM agg_invoicelines_prod2 t
          LEFT JOIN fn_get_invoicelines_prod2_invalid() x
                 ON x.segment = t.segment AND x.prod = t.prod
         WHERE x.segment IS NULL
       ) y
 GROUP BY y.prod;
Let’s try to query it. I still have 333 invalidated (segment, prod) pairs.
SELECT * FROM view_agg_invoicelines_prod2_live; -- 0.132s
Looking into the query plan, I see PostgreSQL is aggregating 30536 cached rows plus 14180 live rows. Of course it takes longer than the simpler approaches, because there are more rows to aggregate. But compared to the 2.4 seconds a regular aggregate query over invoicelines takes, we’re still getting roughly a 20x speedup.
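If you want to see those row counts yourself, they show up in the plan of the live view:

EXPLAIN (ANALYZE) SELECT * FROM view_agg_invoicelines_prod2_live;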
From time to time we have to recalculate the stale data to avoid performance degradation. This should be done outside of any transaction, in periods of low load, although it doesn’t take much CPU anyway.
First, we delete the stale data from the agg_invoicelines_prod2 table, so we can re-insert it later.
DELETE FROM agg_invoicelines_prod2 p
 USING agg_invoicelines_prod2_invalid i
 WHERE p.prod = i.prod AND p.segment = i.segment;
Then we delete the invalidation rows, and by putting the DELETE into a CTE we can use its output to drive the INSERT:
WITH d AS (
    DELETE FROM agg_invoicelines_prod2_invalid x RETURNING x.*
)
INSERT INTO agg_invoicelines_prod2 (segment, prod, quantity, count1, totalprice)
SELECT lineid >> 14 AS segment, l.prod,
       sum(quantity) AS quantity,
       count(*) AS count1,
       sum(l.totalprice) AS totalprice
  FROM invoicelines l
 INNER JOIN d ON l.prod = d.prod AND (l.lineid >> 14) = d.segment
 GROUP BY lineid >> 14, l.prod;
-- 0.063s for 333 rows.
For me, it took 63ms. Very fast indeed.
Let’s see how long the view takes to query now, with a fully valid cache:
SELECT * FROM view_agg_invoicelines_prod2_live; -- 0.086s
Slightly faster: from 132ms to 86ms, 53% faster. But the previous method gave us 18ms, which is almost 5 times faster still. Is this the price to pay for a live aggregate cache? We could probably use bigger segments, since we’ve seen that invalidations and cache updates happen much faster than expected while selects are much slower. We currently have 123 different segments. If we add 3 more bits to the shift, each segment becomes 8 times bigger and we could expect about 15-16 different segments. That gives us a 17-bit shift.
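Rebuilding with bigger segments is just a matter of repeating the setup with 17 in place of 14. A sketch (note that the CASCADE also drops the views built on top, and the trigger function needs the same change):

CREATE INDEX ON invoicelines ((lineid >> 17), prod);

DROP TABLE IF EXISTS agg_invoicelines_prod2 CASCADE;
CREATE TABLE agg_invoicelines_prod2 AS
  SELECT lineid >> 17 AS segment, l.prod,
         sum(quantity) AS quantity,
         count(*) AS count1,
         sum(l.totalprice) AS totalprice
    FROM invoicelines l
   GROUP BY lineid >> 17, l.prod;
ALTER TABLE agg_invoicelines_prod2 ADD PRIMARY KEY (prod, segment);
-- ...then recreate the views and replace every "lineid >> 14" with
-- "lineid >> 17" in func_trg_invoicelines_invalidate() as well.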
I tried this myself and got 16 segments, with good results. The invalidation table works more or less as before, with the same number of rows. With ~300 invalidated cache rows it takes 77ms to query, but with a fully valid cache it takes 12ms. Even faster than the previous method.
Well, the previous method had a fixed amount of 10,000 live rows to aggregate; now we’re aggregating fewer records, which explains how we got even faster. The previous method doesn’t need triggers but makes a guess; this one uses triggers, so data modification is a bit slower, but the information they gather lets us retrieve exactly what is needed.
In some cases the GROUP BY columns have such a high cardinality that segmenting the table no longer makes sense. But segmenting should help in most other cases, especially if you expect modifications: if aggregating the whole table takes 2 seconds and a single insertion touches all 250 products we have, you’ll end up with a fully invalidated cache unless you use segments.
The main point of segmenting the table is to query regions of disk that are near one another. This behavior may show up even better with more rows. Segments can also be useful in table partitioning schemes, as we could avoid reading other child tables entirely.
Hope you liked the post. Maybe I’ll come up with a new entry on this topic if I get new ideas.
Thanks for reading!