I'm trying to make a POST request to the server with a massive string of data to be placed into the database. I noticed that it was cut off at a certain point (about 440K of data in that one variable). I'm just wondering: how much data can Rails hold in a parameter passed to the server?
Thanks.
There is no limit imposed by Rails on the size of posted data (or data passed in the URL).
Other intermediaries may have limits, however; for example, nginx has a client_max_body_size setting. Also check your database settings: if your data is longer than the maximum length of the corresponding column, some databases will silently truncate it (others will raise an error). I'd start by checking in your controller that the parameters have the expected length.
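If it helps, a quick way to confirm what actually reaches Rails is to log the byte size of the parameter in the controller. This is only a minimal sketch; PostsController and :payload are placeholder names, substitute your own controller and parameter:

    class PostsController < ApplicationController
      def create
        # Log how many bytes of the suspect parameter actually arrived.
        Rails.logger.info "payload size: #{params[:payload].to_s.bytesize} bytes"

        # If the full ~440K shows up here, the truncation is happening
        # downstream (e.g. a TEXT column limit in the database), not in Rails.
        head :ok
      end
    end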
I just want your help with an issue: how can I find out whether there are missing values, especially in big data sets, i.e. which columns have missing values and which do not?
This depends entirely on how the dataset is stored (if it's at rest as a disk file), or what interface it is accessible through (SQL, graph query, etc.).
If it's a "plain file" like CSV, HDF, or an Octave/Matlab matrix, then use whatever scripting tool you're comfortable with to iterate over the rows and check for missing values. If it's an SQL dump, you can load it into SQLite or SQL Server and select for missing values. You could even use an SQL parser to report missing values directly from the dump, since there's really no need to persist it into a database.
If it's live data behind an API, you can use the API to query the data for missing values, if the API supports such queries. Otherwise, use the API to export (dump) the entire data set and query it at rest as in the preceding paragraph. If the dataset doesn't have indices that allow finding missing data, then expect the query to take a long time and possibly have a performance impact on the service that provides the data; act with care and make sure you understand the exact consequences of what you're about to do.
This gives the number of missing values in each column; use your own pandas DataFrame in place of train.
train.isnull().sum()
Otherwise you can use train.info() or train.describe() for complete information or a description of the data, which also shows the missing values in each column.
For the number of missing values in the entire dataset: df.isnull().sum().sum()
I am trying to understand the differences between a particular record stored in a sessions table in the database, vs session information stored in a sessions cookie. There is a part in the activerecord-session_store documentation that is confusing to me. Documentation is at: https://github.com/rails/activerecord-session_store
So for whatever reason I want to have a sessions table instead of just using the sessions cookie. I add the gem: gem "activerecord-session_store". I then do rails generate active_record:session_migration which creates the migration that builds the session table in the database once I rake db:migrate.
That sessions table holds two main columns: session_id (which is of type string) and data (which is of type text).
First Question: session_id? What exactly is this referring to? Is the session_id equal to the primary key, id?
My second question revolves around the documentation notes for the column: data. This column is of type text. According to https://msdn.microsoft.com/en-us/library/ms187993.aspx the text datatype's maximum size is 2,147,483,647 bytes, so I would assume that this is the maximum size of bytes that this column can hold. However, the activerecord-session_store documentation states:
data (text or longtext; careful if your session data exceeds 65KB).
It goes on to say this:
If the data you write is larger than the column's size limit, ActionController::SessionOverflowError will be raised.
Second Question: Why is the data column limited to 65KB when the data type text can hold 2,147,483,647 bytes? I thought that one of the main reasons why I might want a sessions table is because I want to store more stuff than a sessions cookie can store (which is 4093 bytes).
Third Question: How do I make it so that the data column can store more than 65KB of info?
Fourth Question: active_record-session_store appears to only encode the data. Is it safe that the data is encoded as opposed to encrypted because the sessions table is located on my server as opposed to in a user's cookie? Is it necessary to encrypt the session's data?
First question: No, session_id and id are not the same (although you can configure them to be the same, as described in the activerecord-session_store documentation).
Second question: 65KB is the maximum size of a MySQL text column - see here. It does not grow automatically; to store more, the column has to be changed to mediumtext or longtext (at least that's my understanding, I haven't tried it).
Third question: see the second answer; changing the column to longtext should do it (there's a migration sketch at the end of this answer). I think the more important question is: why would you store more? ;)
Fourth question: the encoding is not done for safety reasons. The data is encoded...
...to store the widest range of binary session data in a text column
(according to this)
Hope this helps!
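Regarding the third question: if you are on MySQL, a migration along these lines should widen the data column to longtext. This is only a sketch; it assumes the sessions table generated by the gem, and you may need to adjust the migration superclass version to match your Rails version:

    class WidenSessionData < ActiveRecord::Migration[5.2]
      def up
        # A limit above 16MB makes the MySQL adapter emit LONGTEXT (up to ~4GB).
        change_column :sessions, :data, :text, limit: 4_294_967_295
      end

      def down
        change_column :sessions, :data, :text
      end
    end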
I am working on an application which will generate unique random numbers and then store them in a database. I will check whether a number exists via an HTTP request. To get started, I would use around 10,000 numbers.
Is this the right approach?
Generate random numbers one by one, storing them in an array while checking the array for uniqueness, and, when the array is complete, sort it and store the whole array in the database.
Or use the database and check whether a number exists or not.
Which database should I use, given that the application can scale up to 1 million numbers?
It may be more efficient, particularly if you want to generate 1,000,000 numbers, to generate them one at a time and use validations in the model/database to prevent duplicates.
As regards choosing a database, it will depend a little on your intended application. There is some info here: Which is the Best database for Rails application?
I can't comment on using a database directly from Ruby without Rails because I have not done that. One of the big pluses of Rails for me is how easy it makes creating apps that use a database.
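As a rough sketch of the "one at a time with validations" idea (the Ticket model and number column are made-up names, and ApplicationRecord assumes Rails 5+):

    # Migration: the unique index is what actually guarantees no duplicates
    # at the database level.
    class CreateTickets < ActiveRecord::Migration[5.2]
      def change
        create_table :tickets do |t|
          t.integer :number, null: false
        end
        add_index :tickets, :number, unique: true
      end
    end

    # app/models/ticket.rb
    class Ticket < ApplicationRecord
      validates :number, presence: true, uniqueness: true

      # Keep trying until an unused number sticks; the index is the final
      # arbiter if two processes happen to race for the same value.
      def self.generate!
        create!(number: SecureRandom.random_number(1_000_000) + 1)
      rescue ActiveRecord::RecordInvalid, ActiveRecord::RecordNotUnique
        retry
      end
    end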
A couple thoughts:
If you are storing 10 or 10,000 "random" numbers, what difference does it make whether they are random going into the database, or whether the database randomly picks one number from a range of 10,000 sequential numbers? Do you need doubly-random number selections? MySQL, PostgreSQL and other DBMSs can generate random numbers, and you can use their random number generator to retrieve a row, so you could either have the database return a value directly from its generator, or grab a row. Either way, you don't need to worry about Ruby creating a random value -- unless you really want "triply"-random numbers. I'd just stick the values of a (1..10_000) range into the database, call that part done, and work on a query to grab records randomly.
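For example (a sketch only; Number is a hypothetical model with an integer value column, RANDOM() is the PostgreSQL/SQLite spelling and MySQL uses RAND(), and the Arel.sql wrapper is only needed on newer Rails):

    # Seed the pool once.
    (1..10_000).each { |n| Number.create!(value: n) }

    # Let the database pick a row at random.
    Number.order(Arel.sql("RANDOM()")).first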
If you want truly random numbers, you can't guarantee uniqueness. If you're happy with pseudo-random, you still have a problem because you could end up returning duplicates from inside the range unless you track which numbers you've used previously for a particular session. How you track uniqueness across a bunch of sessions is going to be an interesting problem if your site gets popular.
If I was doing this, I'd reverse some of the process. I wouldn't store the "random" values in the database, I'd use Ruby's built-in random number generator, and then probably check the database to see if I'd previously generated that number for that particular session. Overall, fewer values would be stored in the database so lookups to determine uniqueness would happen faster.
That would still be an awkward system to code and would grow inefficient over time as the "unique" records for sessions grew.
To do this without a database I'd create the random/unique range using something like: array = (1..10_000).to_a.shuffle, then each time I needed a value I'd use pop to pull the last value from the randomized array. I'd be tempted to pull from that pool of values for all sessions until it was exhausted, then regenerate it. There'd be a possibility of duplicate "unique" values at that point, but there should be a pretty small chance of the same number reappearing twice in a row.
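In code, the pool idea is just a few lines; where you keep the pool between requests (a class variable, a cache, etc.) is up to you, so treat this as a sketch:

    # Build a randomized pool of unique values once.
    pool = (1..10_000).to_a.shuffle

    def next_value(pool)
      # Rebuild the pool when it runs dry; duplicates of values handed out
      # in the previous cycle become possible at this point.
      pool.replace((1..10_000).to_a.shuffle) if pool.empty?
      pool.pop
    end

    next_value(pool)   # => a number between 1 and 10_000, never repeated within a cycle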
I'm attempting to submit a large database containing many tables to a web service by sending the data via JSON. Extracting the data and converting it to a JSON string is working fine but so far I have only implemented it to send one table at a time each with its own ASIHTTPRequest. My question is whether or not concatenating all the JSON strings generated from each table is a good idea or if I should first combine the tables in their abstract data form, before converting all of them together to JSON?
Alternatively, any other suggestion would be welcome too.
It entirely depends on your needs. If the tables are unrelated, multiple requests may be more appropriate, because if one request fails (timeout or loss of connection), it won't affect the other requests. However, if you have tables with associations to one another, it would be better to send it all in one go, so that either all the data is transmitted or none of it is, and you don't end up with broken associations.
You can't just "concatenate" JSON strings. The result will not be legal JSON. You need to somehow "splice" them.
And, of course, the server on the other end must be capable of parsing the resulting JSON -- e.g. it may only expect one table at a time.
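To illustrate (in Ruby here, but the point is about the JSON itself): two serialized tables glued together are not one valid document, while wrapping them under a single object is. The users/orders names are just examples:

    require 'json'

    users  = [{ id: 1, name: "Ann" }].to_json   # => '[{"id":1,"name":"Ann"}]'
    orders = [{ id: 7, total: 9.5 }].to_json    # => '[{"id":7,"total":9.5}]'

    # users + orders puts two top-level values back to back;
    # JSON.parse(users + orders) raises JSON::ParserError.

    # Combine the tables first, then serialize once:
    payload = { users: JSON.parse(users), orders: JSON.parse(orders) }.to_json
    JSON.parse(payload)   # => {"users"=>[...], "orders"=>[...]}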
I don't see any issue with either of the two choices you proposed.
But I would suggest combining the tables before converting, so that you don't have to deal with string concatenation and other extra processing.
I have ten master tables and one Transaction table. In my transaction table (it is a memory table just like ClientDataSet) there are ten lookup fields pointing to my ten master tables.
Now I am trying to dynamically assign key field values to all the lookup key fields of the transaction table from a different server (the data comes in as SOAP XML). Before assigning these values I need to check whether the corresponding result value is valid in the master tables. I am using a filter (e.g. status = 1) to check whether it is valid or not.
Currently, before assigning each key field value, we filter the master table using this filter and use the Locate function to check whether the value is there or not; if it is located, we assign its key field value.
This works fine if there are only a few records in my master tables. But consider my master tables having fifty thousand records each (yes, the customer has that much data); this will lead to a big performance issue.
Could you please help me handle this situation?
Thanks
Basil
The only way to know if it is slow, why, where, and what solution works best is to profile.
Don't make a priori assumptions.
That being said, minimizing round trips to the server and the amount of data transferred is often a good thing to try.
For instance, if your master tables are on the server (not 100% clear from your question), sending only one query (or stored procedure call), passing all the values to check at once as parameters, doing a bunch of "IF EXISTS..." checks, and returning all the answers at once (either as output params or as a one-record dataset) would be a good start.
And 50,000 records is not much, so, as I said initially, you may not even have a performance problem. Check it first!