Rails sessions table attributes: understanding activerecord-session_store

I am trying to understand the differences between a particular record stored in a sessions table in the database and session information stored in a session cookie. There is a part of the activerecord-session_store documentation that confuses me. The documentation is at: https://github.com/rails/activerecord-session_store
So, for whatever reason, I want to have a sessions table instead of just using the session cookie. I add the gem: gem "activerecord-session_store". I then run rails generate active_record:session_migration, which creates the migration that builds the sessions table in the database once I rake db:migrate.
That sessions table holds two main columns: session_id (which is of type string) and data (which is of type text).
First Question: what exactly does session_id refer to? Is the session_id equal to the primary key, id?
My second question revolves around the documentation notes for the data column. This column is of type text. According to https://msdn.microsoft.com/en-us/library/ms187993.aspx the text datatype's maximum size is 2,147,483,647 bytes, so I would assume that this is the maximum number of bytes that this column can hold. However, the activerecord-session_store documentation states:
data (text or longtext; careful if your session data exceeds 65KB).
It goes on to say this:
If the data you write is larger than the column's size limit, ActionController::SessionOverflowError will be raised.
Second Question: Why is the data column limited to 65KB when the data type text can hold 2,147,483,647 bytes? I thought that one of the main reasons why I might want a sessions table is because I want to store more stuff than a sessions cookie can store (which is 4093 bytes).
Third Question: How do I make it so that the data column can store more than 65KB of info?
Fourth Question: activerecord-session_store appears to only encode the data. Is it safe that the data is encoded rather than encrypted, because the sessions table is located on my server as opposed to in a user's cookie? Is it necessary to encrypt the session's data?

first question: No, session_id and id are not the same (although you can configure them to be the same, as described in the activerecord-session_store documentation).
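If I read the gem's README correctly, that mapping is configurable; a sketch based on my reading (placed, e.g., at the end of config/application.rb):

# Use the session_id column as the session model's primary key.
ActiveRecord::SessionStore::Session.primary_key = "session_id"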
second question: 65KB (65,535 bytes) is the conventional maximum size for a standard MySQL text column. The column would have to be longtext if more than 65KB is to be stored (at least I understand it this way, but haven't tried).
third question: see the second answer, although I'm not completely sure. I think the more important question is: why would you store more? ;)
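To make that concrete for the third question: on MySQL, Rails picks the *TEXT variant from the declared byte limit, so a migration along these lines should widen the column (a sketch; adjust the migration version tag to your Rails version):

class ChangeSessionsDataToLongtext < ActiveRecord::Migration[5.2]
  def up
    # A limit above 16,777,215 bytes makes the MySQL adapter use LONGTEXT.
    change_column :sessions, :data, :text, limit: 4_294_967_295
  end

  def down
    change_column :sessions, :data, :text
  end
end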
fourth: the encoding is not done for safety reasons. The data is encoded...
...to store the widest range of binary session data in a text column
(according to the activerecord-session_store documentation)
Hope this helps!

Related

How to avoid storing the original content in Solr, only the indexed version?

I have a lot of documents, about 30 TB. These docs have other attributes associated with them.
I don't want to store the actual documents after indexing them with Solr, since they are stored somewhere else and I can access them later if needed.
The other data attributes will also be indexed with Solr and won't be deleted.
I'm currently developing with Ruby on Rails and have MySQL, but would like to move to MongoDB. Is the scenario above possible?
Thanks
-Maged
You don't have to store the original content in Solr. That's the difference between stored and indexed fields. If you set stored to false, you will only keep the processed, tokenized version of the content as needed for search. Just make sure you keep your ID stored. This is set in your field definition in schema.xml.
This does mean Solr cannot return any of the non-stored fields back to the user, so you need to match them to the original records based on IDs (just as you seem to suggest).
This also breaks partial document updates, so you will need to make sure you reindex the whole document when things change.
As I understand it, you don't want to touch the content of the document: once you've indexed it, you keep it. The other data properties you want to index frequently. It's better to make your "content" field both stored and indexed, if you are not concerned about space. Choose the tokenizer and filters for content smartly, so that they create fewer tokens.
For partial update, follow http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/

What data type can I use for very large text fields that is database agnostic?

In my Rails app, I want to store the geographical bounds of places in column fields in a database. E.g., the boundary of New York is represented as a polygon: an array of arrays.
I have declared my model to serialize the polygons, but I am unsure whether I should even store them like this. The size of these serialized polygons easily exceeds 100,000 characters, and MySQL can only store about 65,000 characters in a standard TEXT field.
Now I know MySQL also has a LONGTEXT field. But I really want my app to be database-agnostic. How does Rails handle this by itself? Will it switch automatically to LONGTEXT fields? What about when I start using PostgreSQL?
At this point I suggest you ask yourself: does this data need to be stored in a database in this format at all?
I propose 2 possible solutions:
Store your polygons in the filesystem, and reference them from the database. Such large data items are of little use in a database - it's practically pointless to query against them as text. The filesystem is good at storing files - use it.
If you do need these polygons in the database, store them as normalised data. Have a table called polygon and another called point, deserialize the polygons, and store them in a way that reflects how databases are intended to be used (see the sketch after this answer).
Hope this is of help.
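For the second option, a rough ActiveRecord sketch (all table, column, and class names here are my own inventions):

class CreatePolygonsAndPoints < ActiveRecord::Migration[5.2]
  def change
    create_table :polygons do |t|
      t.string :name
    end
    create_table :points do |t|
      t.references :polygon, foreign_key: true
      t.integer :position   # order of the point within its polygon
      t.float :latitude
      t.float :longitude
    end
  end
end

class Polygon < ActiveRecord::Base
  has_many :points, -> { order(:position) }
end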
PostgreSQL has an extension called PostGIS that my company uses to handle geometric locations and calculations, and it may be very helpful in this situation. I believe PostgreSQL also has two data types that allow arrays and hashes. Arrays are declared, as an example, like text[], where text could be replaced with another data type. Hashes can be defined using the hstore module.
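As a sketch of the array option from Rails (PostgreSQL only; the table and column names are just examples):

class AddTagsToPlaces < ActiveRecord::Migration[5.2]
  def change
    # Becomes a native PostgreSQL text[] column.
    add_column :places, :tags, :text, array: true, default: []
  end
end

# Example query against the array column:
Place.where("'beach' = ANY (tags)")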
This question answers part of my question: Rails sets a default byte limit of 65535, and you can change it manually.
All in all, whether you will run into trouble after that depends on the database you're using. For MySQL, Rails will automatically switch to the appropriate *TEXT column type. MySQL's LONGTEXT can hold up to 4GB of text.
But like benzado and thomasfedb say, it is probably better to store the information in a file so that the database doesn't allocate a lot of memory that might not even be used.
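For completeness, a sketch of raising the limit manually (the places table and bounds column are assumptions drawn from the question):

class WidenBoundsOnPlaces < ActiveRecord::Migration[5.2]
  def change
    # On MySQL this limit maps to MEDIUMTEXT; anything larger maps to LONGTEXT.
    change_column :places, :bounds, :text, limit: 16.megabytes - 1
  end
end

class Place < ActiveRecord::Base
  serialize :bounds, Array
end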
Even though you can store this kind of stuff in the database, you should consider storing it externally, and just put a URL or some other identifier in the database.
If it's in the database, you may end up loading 64K of data into memory when you aren't going to use it, just because you access something in that table. And it's easier to scale a collection of read-only files (using something like Amazon S3) than a database table.

Ruby on Rails: Is generating obfuscated and unique identifiers for users on a website using SecureRandom safe and reasonable?

Internally, my website stores Users in a database indexed by an integer primary key.
However, I'd like to associate Users with a number of unique, difficult-to-guess identifiers that will be each used in various circumstance. Examples:
One for a user profile URL: So a User can be found and displayed by a URL that does not include their actual primary key, preventing the profiles from being scraped.
One for a no-login email unsubscribe form: So a user can change their email preferences by clicking through a link in the email without having to login, preventing other people from being able to easily guess the URL and tamper with their email preferences.
As I see it, the key characteristics I'll need for these identifiers are that they are not easily guessed, that they are unique, and that knowing one of them will not make it easy to find the others.
In light of that, I was thinking about using SecureRandom::urlsafe_base64 to generate multiple random identifiers whenever a new user is created, one for each purpose. As they are random, I would need to do database checks before insertion in order to guarantee uniqueness.
Could anyone provide a sanity check and confirm that this is a reasonable approach?
The method you are using relies on a secure random generator, so guessing the next URL even when knowing one of them will be hard. When generating random sequences, this is a key aspect to keep in mind: non-secure random generators can become predictable, and having one value can help predict what the next one will be. You are probably OK on this one.
Also, urlsafe_base64 says in its documentation that the default random length is 16 bytes. This gives you 8^16 different possible values (2.81474977 × 10^14). This is not a huge number. For example, it means that a scraper doing 10,000 requests a second will be able to try all possible identifiers in about 900 years. It seems acceptable for now, but computers are becoming faster and faster, and depending on the scale of your application this could be a problem in the future. Just making the first parameter bigger can solve this issue, though.
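For example, the byte count is just the method's first argument:

# 32 bytes of entropy instead of the 16-byte default; yields a ~43-character token.
SecureRandom.urlsafe_base64(32)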
Lastly, something that you should definitely consider: the possibility of your database being leaked. Even if your identifiers are bulletproof, your database might not be, and an attacker might be able to get a list of all identifiers. You should definitely hash the identifiers in the database with a secure hashing algorithm (with appropriate salts, the same as you would do for a password). Just to give you an idea of how important this is: with a recent GPU, SHA-1 can be brute-forced at a rate of 350,000,000 tries per second. A 16-byte key (the default for the method you are using) hashed using SHA-1 would be guessed in about 9 days.
In summary: the algorithm is good enough, but increase the length of keys and hash them in the database.
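A minimal sketch of the store-only-a-digest idea. Note it uses a single unsalted SHA-256 pass, a common simplification for high-entropy random tokens rather than the salted scheme recommended above, and the column name is an assumption:

require "securerandom"
require "digest"

token  = SecureRandom.urlsafe_base64(32)
digest = Digest::SHA256.hexdigest(token)
user.update!(unsubscribe_token_digest: digest)  # only the digest is persisted

# Later, to authenticate a request carrying the raw token:
User.find_by(unsubscribe_token_digest: Digest::SHA256.hexdigest(params[:token]))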
Because the generated ids will not be related to any other data, they are going to be very hard (impossible) to guess. To quickly validate their uniqueness and find users, you'll have to index them in the DB.
You'll also need to write a function that returns an id only after checking that it is unique, something like:
def generate_id(field_name)
  # Keep generating until we find a value that is not already taken.
  loop do
    rnd = SecureRandom.urlsafe_base64
    return rnd unless User.exists?(field_name => rnd)
  end
end
One last security check: verify the correspondence between an identifier and the user's information before making any changes, at least for the email.
That said, it seems a good approach to me.

How much data can a Rails parameter pass?

I'm trying to make a POST request to the server along with a massive string of data to be placed into the database. I noticed that it was cut off at a certain point (about 440K of data in that one variable). I'm just wondering how much data Rails can hold in a parameter passed to the server.
Thanks.
There is no limit imposed by Rails on the size of posted data (or data passed in the URL).
Other intermediaries may have limits, however; for example, nginx has a client_max_body_size directive. Also check your database settings: if your data is longer than the maximum length of the corresponding column, some databases will silently truncate it (others will raise an error). I'd start by checking in your controller that the parameters have the expected length.
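A quick sketch of that controller-side check (the controller, action, and parameter names are only examples):

class EntriesController < ApplicationController
  def create
    # Log the payload size before saving, to see where the truncation happens.
    Rails.logger.info "payload param is #{params[:payload].to_s.bytesize} bytes"
    # ... then compare this against what actually lands in the database column.
  end
end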

Trimming BOLD_CLOCKLOG table

I am doing some maintenance on a database for an application that uses the Bold for Delphi object persistence framework. This database has been in production for several years, and several of the tables have grown quite large. One of them is BOLD_CLOCKLOG, which has something to do with Bold's transaction management.
I want to trim this table (it is up to 1.2GB, with entries from Jan 2006).
Can anyone confirm the system does not need this old information?
From the Bold documentation:
BOLD_CLOCKLOG
To be able to map the transaction numbers used in the TimeStamp columns to the corresponding physical time (such as 2001-01-01 12:34) the persistence mapper will store a log with timestamps and times. Normally, this log is written for each database operation, but if the traffic to the database is very intensive, it is possible to restrict how often this log is written by setting the property ClockLogGranularity. The event OnGetCurrentTime should also be implemented to ensure that all clients have the same time. The usage of this table can be controlled with the tagged value: Model.UseClockLog
So I believe this is used for versioning Bold objects; see the Object Versioning Extension in Bold's documentation. If your application doesn't need this, you can drop the table from the database.
In our Bold application we don't use that feature. Why not simply turn off Bold_ClockLog in the model, drop that big table, and try your application? I'm pretty sure that if something is wrong, it will say so at once.
I can also mention that we have our own custom object history. It is simply a big string (as TStringList.DelimitedText) in an ObjectHistory class that has the time, the user, and a note about the action. This suits our needs better than Bold's built-in object history. The disadvantage is of course that we need to add calls in the code wherever logging to history is done.
Bold_ClockLog is an optional table; its purpose is to store the mapping between integer timestamps and the corresponding DateTime values.
This allows you to find out the datetime of the last modification to any object.
If you don't need this feature, feel free to empty the table; it won't cause any problems.
In addition to Bold_ClockLog, Bold_XFiles is another optional table that tends to grow large. But unlike Bold_ClockLog, Bold_XFiles cannot be emptied.
Both of these tables can be turned on/off in the model tag values.
