Time series caching in Rails website

I'm developing a Rails website and trying to implement a cache layer. The current site displays stock information for each stock using a D3 chart; every second the client sends a request to the server for new data and appends it to the currently rendered D3 chart.
My design approach:
I will implement a caching layer that internally queries the database, say every 10 seconds, and keeps the cache populated with the last hour of data. At any given point the cache will always hold the last hour, so any request whose timestamp falls within that window can be served from cache.
Issues:
How do I store the data in the cache? Currently I'm thinking of memcached for distributed caching, with the timestamp as the key, but how do I invalidate the earliest timestamp key when a new key with the latest 10 seconds of data comes in?
Some of the data doesn't arrive sequentially; for example, the data for 14:02:33 may arrive later than the data for 14:02:38. How do I handle such scenarios?
Let me know if you guys have a better approach to designing this.
Thanks

Memcached supports expiration natively. That means you can assign a 1-hour expiration to every new key, and old keys will be removed automatically by memcached.
That problem doesn't exist here, since memcached doesn't care about key sequences: it's a single-key, single-value system.
If you care about bucketing timestamps, you can quantize time by, say, 5 seconds, so 14:02:33 -> 14:02:30, 14:02:37 -> 14:02:35, and so on.
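A minimal sketch of that write path, assuming the Dalli memcached client (the key prefix and payload here are illustrative):

require 'dalli'
require 'json'

cache = Dalli::Client.new('localhost:11211')

# Quantize a timestamp into a 5-second bucket: 14:02:33 -> 14:02:30.
def quantize(time, interval = 5)
  Time.at((time.to_i / interval) * interval)
end

# Store each tick under its quantized-timestamp key with a 1-hour TTL;
# memcached expires old keys on its own, so no manual invalidation is needed.
payload = { symbol: 'AAPL', price: 101.5 }.to_json
cache.set("ticks:#{quantize(Time.now).to_i}", payload, 3600)

# Any read within the same 5-second bucket maps to the same key.
cache.get("ticks:#{quantize(Time.now).to_i}")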

Related

Why does my InfluxDB insert a new row of data only every 10 seconds?

I wrote a timed task in C# to insert one row of data per second, but I found that only one row is inserted every 10 seconds.
I also noticed that new insert requests within the same 10 seconds only update that row rather than inserting a new one.
What is the setting that causes this and how do I change it?
The InfluxDB version is 2.2; I downloaded it from the website and started it directly without changing any configuration.
You are probably using the query builder, which aggregates data (it prepares a query with aggregation); the InfluxDB v2 web GUI exposes this as the window period setting. Setting the period to 1s, or writing your own query without any aggregation, should solve your problem.
What is more: writing data to InfluxDB with the same tag keys, the same timestamp, and the same field name will overwrite the existing value. So the behaviour you describe is normal.
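For illustration, a raw read without aggregation might look like this via the influxdb-client Ruby gem (the URL, token, org, bucket, and measurement names are placeholders):

require 'influxdb-client'

client = InfluxDB2::Client.new('http://localhost:8086', 'my-token', org: 'my-org')
query_api = client.create_query_api

# No aggregateWindow() in the Flux query, so every raw point comes back
# instead of one aggregated row per window.
flux = 'from(bucket: "my-bucket")
          |> range(start: -1h)
          |> filter(fn: (r) => r._measurement == "my_measurement")'
tables = query_api.query(query: flux)
client.close!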

ActiveRecord query for all of last day's data and every 100th record prior

I have a process that generates a new record every 10 minutes. That was fine for some time; however, Datum.all now returns 30k+ records, which is unnecessary since the purpose is simply to display them on a chart.
So as a simple solution, I'd like to provide all data generated in the past 24 hours, but low-res data (every 100th record) prior to the last 24 hours (right back to the beginning of the dataset).
I suspect the solution is some combination of this answer, which selects every nth record (but was written in 2010), and this answer, which combines two ActiveRecord relations. But I cannot work out how to get a working implementation that collects all the required data into one instance variable.
You can use an OR query:
Datum.where("created_at > ?", 1.day.ago).or(Datum.where("id % 100 = 0"))
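Assuming the goal is a single instance variable for the chart, the relation can be assigned and ordered directly (the ordering clause is just an illustration):

@data = Datum.where("created_at > ?", 1.day.ago)
             .or(Datum.where("id % 100 = 0"))
             .order(:created_at)

Note that .or requires Rails 5+, and both relations must differ only in their where clauses.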

iOS and MySQL Events

I'm working on an app that connects to a MySQL backend. It's a little similar to Snapchat in that once the current user has fetched and viewed the pics from the users they follow, they can never see those pics again. However, I can't just delete the pics from the database, because the user who uploaded each pic still needs to see them. So I've come up with an interesting design and I want to know whether it's good or not.
When uploading a pic I would also create a MySQL event that would run exactly one day after the pic was uploaded, deleting it. If people are uploading pics all the time, events would be created all the time. How does this affect the MySQL database? Is this even scalable?
No, not scalable. Deleting single records is quick, but as your volume increases you will run into trouble. You do, however, have a classic case for partitioning:
CREATE TABLE your_images (
  insert_date DATE,
  some_image BLOB,
  some_owner INT
) ENGINE=InnoDB /* ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4 */
PARTITION BY RANGE COLUMNS (insert_date) (
  PARTITION p01 VALUES LESS THAN ('2015-07-02'),
  PARTITION p02 VALUES LESS THAN ('2015-07-03'),
  -- one partition per day, etc.
  PARTITION p0n VALUES LESS THAN (MAXVALUE)
);
You can then insert just as you are used to, drop old partitions once per day (using one event for all your data), and create new partitions, also once per day (from the same event that drops the old ones).
To make certain a photo lives for at least 24 hours, the partition cleanup has to run with a one-day delay (so clean up the day before yesterday, not yesterday itself).
You still need a date filter in the query that fetches images, to prevent images older than a day from being displayed.
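As a rough sketch, the daily event could look like the following (the partition names and dates are placeholders; in practice you would compute them dynamically, e.g. from a stored procedure):

DELIMITER //
CREATE EVENT rotate_image_partitions
ON SCHEDULE EVERY 1 DAY
DO
BEGIN
  -- Drop the partition holding the day-before-yesterday's photos...
  ALTER TABLE your_images DROP PARTITION p01;
  -- ...and split the MAXVALUE partition to make room for tomorrow.
  ALTER TABLE your_images REORGANIZE PARTITION p0n INTO (
    PARTITION p03 VALUES LESS THAN ('2015-07-04'),
    PARTITION p0n VALUES LESS THAN (MAXVALUE)
  );
END //
DELIMITER ;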

Storing large amount of boolean values in Rails

I need to store quite a large amount of boolean values in the database used by a Rails application: 60 boolean values in a single record per day. What is the best way to do this in Rails?
Queries that I will need to program or execute:
* CRUD
* summing up how many true values there are for each day
* possibly (but not necessarily) other reports, like how often true is recorded in each field
UPDATE: This is to store events that may or may not occur in 5-minute intervals between 9am and 1pm. If an event occurs, I need to set its value to true, otherwise false. Measurements are done manually, and users will report this information using checkboxes on the website. There might be small updates, but most of the time it's just a one-time entry followed by the queries listed above.
UPDATE 2: 60 values per day is per user, and there will be between 1000-2000 users. If there isn't a library that helps with this, I will go for the simplest approach and deal with performance issues later if they come up. Every day a user reports events by checking the desired checkboxes on the website, so there is normally a single data-entry moment per day (or a few, if not done on a daily basis).
This depends on a lot of different things. Do you need callbacks to run? Do you need AR objects instantiated? What is the frequency of these updates: frequent but few at a time, or rare but a bunch at once? Could you represent these booleans as a bitmask instead (see the sketch below)? We definitely need more context.
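If a mask does fit, here is a minimal sketch of packing the 60 daily slots into one integer column (the column and helper names are made up):

# One integer column, e.g. t.bigint :slots_mask, holds all 60 booleans.
def set_slot(mask, index, value)   # index 0..59
  value ? mask | (1 << index) : mask & ~(1 << index)
end

def slot?(mask, index)
  mask[index] == 1                 # Integer#[] reads the bit at that position
end

def true_count(mask)               # "how many true values for the day"
  mask.to_s(2).count("1")
end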
Why do these need to be in a single record? Can't you use a 'days' table to tie them all together, then use a day_id column in your 'events' table?
Specify in the Day model that it 'has_many :events' and specify in the Event model file that it 'belongs_to :day'. Then you can find all the events for a day with just the id for the day.
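A minimal sketch of those two models (ActiveRecord::Base instead of ApplicationRecord on pre-5 Rails):

class Day < ApplicationRecord
  has_many :events
end

class Event < ApplicationRecord
  belongs_to :day
end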
For the third day record, you'd do this:
this_day = Day.find(3)
Then you can use 'this_day.events' to get all the events for that day.
You'll need to decide what to use to identify each day, so you can query for a day's events using something meaningful. The id column I used above probably won't work for that.
You could use the timestamp of the first moment of each day, for example. Or you could rely on the 'created_at' column of the table being between the start and end of a day.
And you'll want to think about what time zone you are using and how it will be stored in the database.
And if your data is stored close to midnight, daylight saving time could also be an issue. I find it best to use GMT to avoid that.
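Putting those pieces together, one sketch of fetching a day's events via a UTC created_at window (assuming the Event model above):

# All events whose created_at falls within today's UTC day window.
day_start = Time.now.utc.beginning_of_day
todays_events = Event.where(created_at: day_start...(day_start + 1.day))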
Good luck.

Delta Extraction + Business Intelligence

What does Delta Extraction mean in regard to Data Warehousing?
Only picking up data that has changed since the last run. This saves you the effort of processing data you've already extracted. For example, if your last extract of customer data ran at April 1 00:00:00, your delta run would extract all customers who were added or had their details updated since April 1 00:00:00. To do this, you will need either an attribute that stores when a record was last updated, or a log scraper.
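A minimal sketch of the attribute-based approach in ActiveRecord terms (ExtractRun and Customer are hypothetical models; Customer has the usual updated_at column):

run_started_at = Time.current
last_run = ExtractRun.maximum(:ran_at) || Time.at(0)

# Delta: only customers added or changed since the previous extract.
Customer.where("updated_at >= ?", last_run).find_each do |customer|
  # ... load the changed row into the warehouse ...
end

# Record the new watermark (captured before the query, so nothing is missed).
ExtractRun.create!(ran_at: run_started_at)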
