FreeRadius accounting altering/updating session start times after a day, weeks and in some cases months - freeradius

This might be a very specific problem or just ignorance on my side, but I can't seem to figure it out.
Within our organization, we have a FreeRadius Accounting system logging sessions from Wi-Fi usage. Our team is responsible for the data analysis of this accounting data.
Recently, we had to dump the Radius Accounting Database and made a freeze frame of it. While doing so we found a weird behavior.
Running the same query before and after the dump (a query that retrieves the total number of sessions for a single day) gave different totals, differing by around 5-10%.
Looking a bit deeper we discovered that several updates were being issued that altered the start time of sessions after they had been first registered in the accounting database.
We then found that data we had collected previously showed the same kind of disparity even after weeks or months (with the discrepancy being around 2-10%).
TLDR:
Does FreeRadius adjust the start times of sessions based on some maintenance? Are WiFi controllers allowed to do this? Is it a bug?
Overall, we just want to understand the rationale so we can justify the data and adjust our processing correctly, as currently we cannot trust the values we collect daily or even weekly for these stats!
Any help or insight would be great!!!

FreeRADIUS only updates the database as a result of data in an incoming RADIUS packet, using the SQL queries in the local configuration. The only real way to understand this is to look at your SQL queries, and incoming requests (via radiusd -X) and see what is making changes to the data. It is possible that the NAS is broken and sending invalid or changing data, or possibly re-using session IDs which overwrite existing records.
It is also possible to configure FreeRADIUS to create a "fake" accounting start entry in the database in post-auth, which will then be updated when the real Start packet arrives. If you are doing this then you should check the values that are being written, and note that if the session never starts up (or the Start is lost), the placeholder entry can be left behind with misleading values.
But in all circumstances the only solution you really have is to look at the debug output and see what is happening and why data is being written in the way that it is. There is nothing in FreeRADIUS that randomly updates the database without being sent that data from the NAS.
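Before digging through debug output, it can help to pin down exactly which sessions changed between two dumps. The following is a minimal sketch, assuming each snapshot has already been loaded into a mapping from a session key to its start time (in a stock radacct table these would typically be acctuniqueid and acctstarttime, but adjust to your schema). The function name and the sample data are hypothetical.

```python
from datetime import datetime

def changed_start_times(before, after):
    """Return {session_key: (old_start, new_start)} for sessions whose start time differs."""
    changes = {}
    for session_key, old_start in before.items():
        new_start = after.get(session_key)
        if new_start is not None and new_start != old_start:
            changes[session_key] = (old_start, new_start)
    return changes

# Hypothetical snapshots taken on two different days.
before = {
    "a1": datetime(2023, 5, 1, 9, 0),
    "b2": datetime(2023, 5, 1, 9, 30),
}
after = {
    "a1": datetime(2023, 5, 1, 9, 0),
    "b2": datetime(2023, 5, 2, 14, 15),  # start time was overwritten later
}
print(changed_start_times(before, after))
```

The session keys this reports are the ones worth searching for in the radiusd -X output, for example to check whether the NAS re-used a session ID.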

Related

What to report in a time series database when a measurement fails?

I use a time series database to report some network metrics, such as the download time or DNS lookup time for some endpoints. However, sometimes the measurement fails, for example if the endpoint is down or if there is a network issue. In these cases, what should be done according to best practices? Should I report an impossible value, like -1, or just not write anything at all in the database?
The problem I see when not writing anything, is that I cannot know if my test is not running anymore, or if it is a problem with the endpoint/network.
The best practice is to capture the failures in their own time series for separate analysis.
Failures or bad readings will skew the series, so they should be filtered out or replaced with a projected value for 'normal' events. The beauty of a time series is that one measure (time) is globally common, so it is easy to project between two known points when one is missing.
The failure information is also important, as it is an early indicator of issues or outages on your target. You can record the network error and other diagnostic information to find trends and to confirm that it is the client and not your server having the issue. Further, several instances can be deployed to monitor the same target so that they cancel each other's noise.
You can also monitor a known endpoint like Google's 204 page to ensure network connectivity. If all the monitors report an error connecting to your site but not to the known endpoint, your server is indeed down.
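As a rough illustration of keeping failures in their own series and projecting a missing value between two known points, here is a minimal in-memory sketch. It assumes readings arrive in time order and uses plain lists in place of a real time series database. The function names, the timestamps (in seconds) and the "dns_timeout" label are all hypothetical.

```python
def record(measurements, failures, timestamp, value=None, error=None):
    """Store a successful reading or a failure, each in its own series."""
    if error is None:
        measurements.append((timestamp, value))
    else:
        failures.append((timestamp, error))

def interpolate(measurements, timestamp):
    """Linearly project a value at `timestamp` from the surrounding good readings."""
    earlier = [m for m in measurements if m[0] <= timestamp]
    later = [m for m in measurements if m[0] >= timestamp]
    if not earlier or not later:
        return None  # cannot project outside the known range
    (t0, v0), (t1, v1) = earlier[-1], later[0]
    if t1 == t0:
        return v0
    return v0 + (v1 - v0) * (timestamp - t0) / (t1 - t0)

measurements, failures = [], []
record(measurements, failures, 0, value=120.0)
record(measurements, failures, 60, error="dns_timeout")
record(measurements, failures, 120, value=140.0)

print(interpolate(measurements, 60))  # 130.0
print(failures)                       # [(60, 'dns_timeout')]
```

Because the failure is still written somewhere, a gap in both series at once would suggest that the test itself has stopped running rather than that the endpoint is down.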

Moving slow database calls from MVC to background application - advice please

I have an MVC web site, where users can search for large recordsets from SQL Server and Oracle databases. Some of these recordsets can be very large, with many thousands of records. Sadly, it is a user requirement that they do not make their searches more specific.
When a user posts their search request to the database, my web page hangs and often times out (due to the amount of time taken to query the database).
We are thinking about removing the expensive database calls from the MVC site, and sending the query to a separate process to run in the background. When the query is complete, we can notify the user.
My proposed solution is:
1) When the user completes the search form in the web page, to simply display a message that the results are being generated and will be sent when complete
2) Send the SQL query to a database which can contain a list of SQL queries that need to be processed
3) Create a Windows Service which checks this database every couple of minutes for new queries
4) This Windows Service then queries the database. When the query is completed, it will create a CSV of the results, and email this to the user
I am looking for advice and comments on the above approach. What do folks think of this as a way to process expensive database calls in the background?
Generally speaking the requests will be made infrequently, but as mentioned, will be for a great amount of data. There is a chance that two or more requests could be made at the same time, but this will be infrequent.
I will also look at optimising the databases.
Grateful for any tips.
Martin :)
Another option is to supplement the existing code to execute the query on a separate thread so that periodic keep-alive updates can be sent to the requesting page while you wait for the query results, similar to the way the insurance quote aggregator pages work.
A second option is to make the results available as a hyperlink when they are ready and then communicate that either through the website or by email to the user.
A third option: if these queries are not completely ad hoc, you could profile for the most frequent combinations and pre-compute them periodically, placing the results into new tables (sort of halfway to optimising the current database structure).
The caveat there is that the data won't be as up to date - but given the time the queries are currently taking it probably isn't that important to be up to the second?
Whichever solution you choose, I think it's going to depend on user expectations. Do they know what they want, send one big query, get it and be happy? Or do they try several queries to find the right combination of parameters? If the latter, then waiting for an email delivery of results might not be acceptable to them. But if what they want is a downloadable results document and they know what they want first time, then it may be. The only problem I see here is emails going astray or taking longer than the user thinks they should, causing the request to be resubmitted multiple times and increasing the server workload - caching queries and results is probably a very good idea.
I would suggest introducing a layer of abstraction such as a message broker. Requests go into a queue; a batch layer consumes them from the queue and, once the heavy work is done, notifies the web layer again via the broker (the Request-Reply pattern).
In addition, on the database side it is always good to optimize the queries.
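The queue-and-worker idea can be sketched in a few lines. This is only an illustration under stated assumptions: an in-process queue.Queue stands in for the message broker (or for the table of pending queries), an in-memory SQLite database stands in for SQL Server or Oracle, and the reply is put on a second queue instead of being emailed. The names make_demo_connection, the orders table and the address are hypothetical.

```python
import csv
import io
import queue
import sqlite3
import threading

def make_demo_connection():
    """Create a small in-memory database to query (stand-in for the real one)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])
    return conn

def run_query_to_csv(conn, sql):
    """Run a query and return its rows as CSV text, header row first."""
    cursor = conn.execute(sql)
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor)
    return out.getvalue()

def worker(jobs, results, connect):
    """Consume (user, sql) jobs until a None sentinel arrives."""
    conn = connect()  # the connection is created inside the worker thread
    while True:
        job = jobs.get()
        if job is None:
            break
        user, sql = job
        results.put((user, run_query_to_csv(conn, sql)))
    conn.close()

jobs, results = queue.Queue(), queue.Queue()
thread = threading.Thread(target=worker, args=(jobs, results, make_demo_connection))
thread.start()

# The web layer only enqueues the request and returns immediately.
jobs.put(("martin@example.com", "SELECT id, total FROM orders ORDER BY id"))
jobs.put(None)  # tell the worker to stop once the queue is drained
thread.join()

user, csv_text = results.get()
print(user)
print(csv_text)
```

A single worker processes requests one at a time, which also keeps two simultaneous large queries from competing for the database; whether that is desirable depends on how long users are prepared to wait.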

iOS chat app design

I'm building a simple chat app on iOS for fun (and to have projects to gain experience from), using socketsIO and a node backend. I am trying to figure out the best design for messages. I was planning to use a mongoDB database where each conversation would have its message data stored. Whenever the client sends a new message to the server, the server adds it to the appropriate conversation in the database.
I was also hoping to create a user Sign Up/Log In system which would add you to the database.
However, I've googled around quite a bit and I am really not sure if creating a database made up of conversations (that get updated whenever a sentMessage event is triggered) and user data is the right way to go.
Additionally, I've seen some people talk about saving the chats on the actual devices themselves, not in a database? What is the common design pattern for a chat app like this?
For the design I would use socket.io for emitting messages as well; it has a great community behind it. I would also use MongoDB, because everything uses JSON and it integrates well with Node since both use JavaScript.
Now the part you are interested in is Redis. Redis is a database that keeps its data in RAM on the server, and it should be used alongside MongoDB if you are going to have higher traffic or need quick responses with less hanging and waiting.
Redis would be your temporary store for the chat sessions, because disk reads, writes and queries put a lot of load on the machine (looking at you, MongoDB) if you plan on saving the chat with every message. Used that way, MongoDB would not scale all that well in the long run and is not as fast as Redis. Mind you, the Redis database will only hold a temporary chat log of, let's say, the last 1 million chat sessions or some other limit (it is all in RAM, so the size is limited; you can't have terabytes or hundreds of gigabytes of RAM on one server).
So the data flow would look something like this:
The user sends a message.
The server receives the message via HTTP(S) POST/PUT - Ajax/Observable.
The server uses socket.io to emit the message to the designated user while saving the message to Redis under a specific key/session/message.
The designated user gets the update on their screen via an io event.
-- In between, there should be a check on whether the Redis db is getting full. If it is full, remove the 10,000 oldest inactive messages (they could be from 1 year ago if the server hasn't gotten full yet) to make some space.
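The "keep recent messages in RAM and drop the oldest when full" step above can be sketched without any server at all. The class below is a plain in-memory stand-in for Redis, not Redis client code; the class name, capacity and conversation IDs are hypothetical. (Redis can also be configured to evict keys by itself once a memory limit is reached, which may make a manual check unnecessary.)

```python
from collections import OrderedDict

class RecentMessageCache:
    """Keeps at most `capacity` messages, dropping the oldest first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._messages = OrderedDict()  # message_id -> (conversation_id, text)

    def add(self, message_id, conversation_id, text):
        self._messages[message_id] = (conversation_id, text)
        while len(self._messages) > self.capacity:
            self._messages.popitem(last=False)  # evict the oldest entry

    def conversation(self, conversation_id):
        """Return the cached texts of one conversation, oldest first."""
        return [text for conv, text in self._messages.values() if conv == conversation_id]

cache = RecentMessageCache(capacity=2)
cache.add(1, "alice-bob", "hi")
cache.add(2, "alice-bob", "hello")
cache.add(3, "alice-bob", "how are you?")  # pushes out message 1
print(cache.conversation("alice-bob"))  # ['hello', 'how are you?']
```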
Saving the chat on the phone is an okay idea, as it would save the user's data/bandwidth and they could potentially look at their messages while offline.
One solution is SQLite, a lightweight library that sits inside your app and acts as a database you can run queries on. If you're familiar with RDBMSs you will have no problem implementing it. But then you have to find a good way to manage saving data across Redis, SQLite and MongoDB.
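To give an idea of what an on-device message store might look like, here is a minimal sketch using Python's built-in sqlite3 module. On iOS the same SQL would be issued through the SQLite C API or a wrapper library instead. The table layout, column names and function names are assumptions made for illustration only.

```python
import sqlite3

def open_store(path=":memory:"):
    """Open (or create) the local message store."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id INTEGER PRIMARY KEY,
               conversation_id TEXT NOT NULL,
               sender TEXT NOT NULL,
               body TEXT NOT NULL,
               sent_at INTEGER NOT NULL
           )"""
    )
    return conn

def save_message(conn, conversation_id, sender, body, sent_at):
    """Persist one message; `sent_at` is a Unix timestamp."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO messages (conversation_id, sender, body, sent_at) "
            "VALUES (?, ?, ?, ?)",
            (conversation_id, sender, body, sent_at),
        )

def load_conversation(conn, conversation_id):
    """Return (sender, body) pairs for one conversation, oldest first."""
    rows = conn.execute(
        "SELECT sender, body FROM messages WHERE conversation_id = ? "
        "ORDER BY sent_at, id",
        (conversation_id,),
    )
    return rows.fetchall()

store = open_store()
save_message(store, "alice-bob", "alice", "hi", 1000)
save_message(store, "alice-bob", "bob", "hello", 1005)
print(load_conversation(store, "alice-bob"))  # [('alice', 'hi'), ('bob', 'hello')]
```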

Cost of continuous replications vs one-shot replications (using TouchDB and Cloudant)

We have an app that uses Cloudant as a remote server. However, from previous experience, Cloudant is not completely compatible with TouchDB's continuous replications. So our alternative for now is to manually trigger one-shot replications at a fixed frequency. We would like to know whether that approach is going to cost us more money than continuous replications, since continuous replications use longpoll and don't need to query the server often. In other words, does each one-shot pull replication with Cloudant as the target cost us a GET request?
Thank you,
Paul
I think the issue you refer to is [1].
Cloudant's replication is 100% compatible with CouchDB. In this
instance, TouchDB's logs indicate the iOS network stack passed
on incomplete JSON to TouchDB. It's not clear who was to blame
in this case for the replication failure.
[1] https://github.com/couchbaselabs/TouchDB-iOS/issues/241
For the cost question, a one-shot pull replication will result in a GET to the _changes
feed each time it happens, plus the other requests required to
replicate. This _changes request will be counted as a light
HTTP request against your Cloudant account.
However, whether this works out as more or fewer requests overall
depends on the number of changes coming down from the remote server.
It's also important to remember that the number of _changes calls is very small
relative to the number of other calls involved (e.g., getting the
content of the changes themselves and particularly if there are many
attachments).
While this question is specific to TouchDB, and I mention specific
behaviours of that codebase, this answer deals with the requests involved
in replication between any two systems speaking the CouchDB replication
protocol[2].
[2] http://www.dataprotocols.org/en/latest/couchdb_replication.html
Let's take a contrived example: 1 update per 10 second window to
the source database for the replication, where a TouchDB database
is the target. Let's take a 5 minute poll vs. a continuous replication.
For simplicity of call-counting, let's also take attachments out of the
picture. We'll also assume the device has a constant network connection.
For the continuous case, every 10s TouchDB will receive an update in
the _changes feed. This causes the longpoll connection to close.
TouchDB then runs through the changes, requesting the updates from the
source database; one or more GET requests on the remote server. While
this is happening, TouchDB has to open up another longpoll request
to _changes. So in a five minute period, you'd end up with perhaps
30 calls to _changes, plus all the calls to get documents and record
checkpoints.
Compare this with a one-shot replication every five minutes. You'd
receive notification of the 30 updates in one _changes feed call.
TouchDB implements an optimisation[3] whereby it will call _all_docs
to get updated documents for 1- revs, so you might end up with a single
call to get all 30 documents (not possible in the continuous case as
you've received a single change). Then you've the checkpoint documents
to record. At best fewer than 5 HTTP calls, at most about a third of
the continuous case as you've avoided extra _changes requests.
[3] https://github.com/couchbaselabs/TouchDB-iOS/wiki/Replication-Algorithm#performance
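The arithmetic in the contrived example can be written down as a small model. This is a rough sketch, not a description of what TouchDB actually sends: it assumes one document GET and one checkpoint write per change in the continuous case, and one checkpoint write per batch in the one-shot case. The function and parameter names are hypothetical.

```python
def continuous_requests(updates, calls_per_fetch=1, checkpoint_calls=1):
    """Each update closes the longpoll, so every update costs its own _changes call."""
    return updates * (1 + calls_per_fetch + checkpoint_calls)

def one_shot_requests(updates, bulk_fetch=True, checkpoint_calls=1):
    """One _changes call per poll; documents are fetched in bulk or one by one."""
    fetch_calls = 1 if bulk_fetch else updates
    return 1 + fetch_calls + checkpoint_calls

updates = 30  # one update every 10 s over a 5 minute window

print(continuous_requests(updates))                  # 90
print(one_shot_requests(updates))                    # 3
print(one_shot_requests(updates, bulk_fetch=False))  # 32
```

Under these assumptions the one-shot case lands between 3 calls (bulk fetch) and 32 calls (one GET per document), the latter being roughly a third of the 90 calls in the continuous case.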
It comes down to the frequency of updates you expect to the source
database. One-shot replication is likely to provide a smoother price
curve as you're in better control of the number of requests you make.
A further question is how often connections will drop because of the
network disconnects which happen regularly with mobile devices.
TouchDB's continuous replications will fire back up each time the
user comes on line (if added via the _replicator database). This is a
further source of unpredictable costs.
However, the benefits from more immediate visibility of changes may
certainly be worth the uncertainty.

Is it possible to have a stateless timed function

I'm trying to set a reminder in a system to fire at a certain time.
This is a web based app, so it's not like it will be in memory all the time.
Ideally I'd like to avoid using a service or job on the server (mainly out of curiosity, to see if there is a more efficient way to do it).
For example, imagine how many eBay bids are constantly ending all the time, with emails being sent out seemingly perfectly on time.
Do people reckon there is just a big loop going over and over, moving items into a queue etc.? Or is there something lower level helping out (stored procedures, triggers etc.)?
Thanks everyone.
What you have to realize about eBay - and most large database-backed websites - is that the interactions between humans and the database that come through the web server are only a part (sometimes a very small part) of the functionality of the system.
To use eBay as an example, the email that goes out when auctions expire is not handled by a web server. They are far more likely to have that scripted. In other words, there is another program running on a number of their systems that looks at the database for ended auctions, does some processing on them, sends emails, etc.
If I were doing something similar (albeit on a much smaller scale), I'd have my web services built in the usual way, but have a job that is run automatically every few minutes to do the maintenance work. It would start up, look at the database for work, process anything that was required, then exit.
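The "start up, look at the database for work, process it, exit" job described above could look roughly like this. It is a minimal sketch under stated assumptions: an in-memory SQLite database stands in for the real one, times are plain integers, and the reminders table, its columns and the send callback are all hypothetical. In practice something like cron or a scheduled task would invoke it every few minutes.

```python
import sqlite3

def send_due_reminders(conn, now, send):
    """Send every unsent reminder whose due time has passed, then mark it sent."""
    due = conn.execute(
        "SELECT id, recipient, message FROM reminders "
        "WHERE sent = 0 AND due_at <= ? ORDER BY due_at",
        (now,),
    ).fetchall()
    for reminder_id, recipient, message in due:
        send(recipient, message)
        with conn:  # commit each reminder so a crash does not resend earlier ones
            conn.execute("UPDATE reminders SET sent = 1 WHERE id = ?", (reminder_id,))
    return len(due)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reminders (id INTEGER PRIMARY KEY, recipient TEXT, "
    "message TEXT, due_at INTEGER, sent INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO reminders (recipient, message, due_at) VALUES (?, ?, ?)",
    [("a@example.com", "Auction ended", 100), ("b@example.com", "Auction ends soon", 500)],
)
conn.commit()

outbox = []  # stand-in for an email gateway
print(send_due_reminders(conn, now=200, send=lambda to, msg: outbox.append((to, msg))))  # 1
```

The job itself holds no state between runs; everything it needs to know (what is due, what was already sent) lives in the database, which is why it can simply exit when done.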
