Accurate Timestamps for Delayed Server-Side Tracking via Message Queue

I’ve implemented server-side tracking using a message queue. Events may not be handled in real time with my setup; they’re sometimes processed in batches or with a delay.

To ensure accurate analytics (e.g. time-on-page or other time-sensitive metrics), is it possible to pass a custom timestamp to your API so that events are recorded based on when they actually occurred, rather than when they’re processed?

Without this, delayed processing could lead to incorrect ordering or time-on-page calculations.

Hi IServ,

Yes, this is possible using the batch API endpoints:

https://api.pirsch.io/api/v1/hit/batch and https://api.pirsch.io/api/v1/event/batch. They accept a "time" parameter instead of using the time when the request arrives at our service :)
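A minimal sketch of building such a batch payload, assuming the hits are sent as a JSON array. Only the "time" parameter is confirmed above; the "url" field and the overall payload shape are illustrative assumptions:

```python
import json
from datetime import datetime, timezone

def build_hit_batch(page_views):
    # Attach a "time" field to each entry so the event is recorded at
    # the moment it actually occurred, not when the request arrives.
    return [
        {
            "time": pv["occurred_at"].replace(tzinfo=timezone.utc).isoformat(),
            "url": pv["url"],
        }
        for pv in page_views
    ]

batch = build_hit_batch([
    {"occurred_at": datetime(2024, 5, 1, 12, 0, 5), "url": "https://example.com/a"},
    {"occurred_at": datetime(2024, 5, 1, 12, 0, 9), "url": "https://example.com/b"},
])
payload = json.dumps(batch)
# POST `payload` to https://api.pirsch.io/api/v1/hit/batch
```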


That’s great to hear. However, your documentation states: ‘If you use the batch endpoint, make sure the page views are in order, otherwise they throw off your statistics.’ Does this still apply if I include the correct timestamp with each hit?

Good point. Page views are stored using the timestamp, so they end up in timestamp order rather than processing order. This affects the session and might lead to inconsistencies, like the wrong exit page being recorded for a session.

I think we can sort the list when we receive it, making this less of an issue, but you would still need to ensure that page views are properly ordered across requests. This means that page view B should be part of the second batch request and page view A in the first, not the other way around, but I think this part is clear :slight_smile:
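The same sorting can be done client-side before sending. Assuming each hit carries an ISO 8601 "time" string, ordering a batch chronologically keeps page views consistent within the request:

```python
def sort_batch(hits):
    # Order page views chronologically so sessions are reconstructed
    # correctly (e.g. the exit page is really the last page viewed).
    return sorted(hits, key=lambda h: h["time"])

hits = [
    {"time": "2024-05-01T12:00:10+00:00", "url": "/b"},
    {"time": "2024-05-01T12:00:05+00:00", "url": "/a"},
]
ordered = sort_batch(hits)
```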

I’ll add sorting and update our docs!

The thing is, we’re running a multi-server infrastructure, so events and page views from one user aren’t necessarily handled by the same server (and will then be submitted in different batches). If it’s really a hard requirement, I’ll rethink the setup. But it would be great if there’s a way around it.

I see. In this situation, you might end up with page views that occurred earlier arriving later at our service. While we can sort the lists individually (we also process requests on multiple servers in parallel), we can’t do that across servers.

We have distributed mutexes for the sessions if you send page views one by one, so there is no problem with that.

I think you basically have two options:

  1. Make a request for each page view individually
  2. Create a central queue/service that collects a bunch of page views before sending them consolidated to our service

Option 2 shouldn’t be hard to implement. I can help you with that if you like. The downside is, of course, that it’s a single point of failure. Option 1 is just using the API as is, but it’s also quite expensive if you send a lot of egress data. The requests are tiny, though, so it might work out okay.
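Option 2 could be sketched as a small central collector (a hypothetical design under the assumptions above, not Pirsch code): all app servers push into one buffer, which flushes a consolidated, time-sorted batch once it’s full:

```python
import threading

class CentralCollector:
    """Buffers page views from all app servers and flushes them as one
    chronologically sorted batch, preserving ordering across servers."""

    def __init__(self, flush_size=100):
        self._lock = threading.Lock()
        self._buffer = []
        self.flush_size = flush_size

    def add(self, hit):
        # Thread-safe append; returns the flushed batch when full,
        # otherwise None.
        with self._lock:
            self._buffer.append(hit)
            if len(self._buffer) >= self.flush_size:
                return self._flush()
        return None

    def _flush(self):
        batch = sorted(self._buffer, key=lambda h: h["time"])
        self._buffer = []
        return batch  # in a real setup, POST this to /api/v1/hit/batch
```

This keeps the ordering guarantee in one place, at the cost of the single point of failure mentioned above.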

While Option 2 sounds like the better long-term solution, I’d still like to give Option 1 a try to see if we can get some quick results. Just to clarify: you’re still suggesting we use the batch endpoint, but with only one item per batch/array, correct?

And just one final question: do all your proposals apply to both events and page views?

Not quite, you should use the regular page view endpoint for single page views (https://api.pirsch.io/api/v1/hit).

Yes :slight_smile:

I’m still having trouble wrapping my head around this. From what I understand, the standard page view endpoint doesn’t support a time parameter. As I mentioned earlier, my challenges are:

A) Pageviews and events may be delivered with a delay
B) They might arrive out of chronological order

Based on that, you previously recommended using the batch endpoints along with the time field to address these issues.

Oh, right. In that case, you can use the batch endpoint. The only difference is the timestamp: the batch endpoint simply calls the regular page view/event endpoint internally, but with the timestamp you supplied. So that should do.
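Putting the answers together, Option 1 then amounts to sending one-element batches. A hedged sketch (the Bearer token header and field names other than "time" are assumptions, not confirmed above):

```python
import json

PIRSCH_BATCH_URL = "https://api.pirsch.io/api/v1/hit/batch"

def single_hit_request(hit, access_token):
    # Wrap one page view in a one-element batch: the batch endpoint
    # honors the "time" field, and single-item batches sidestep the
    # cross-server ordering problem entirely.
    headers = {
        "Authorization": f"Bearer {access_token}",  # assumed auth scheme
        "Content-Type": "application/json",
    }
    return PIRSCH_BATCH_URL, headers, json.dumps([hit])

url, headers, body = single_hit_request(
    {"time": "2024-05-01T12:00:05+00:00", "url": "https://example.com/a"},
    "YOUR_ACCESS_TOKEN",
)
# POST `body` with `headers` to `url`
```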
