Advantages and disadvantages of BLOBs (security) - ruby-on-rails

A year ago, when I did simple PHP sites for people with a simple MySQL database, I was brought up to think that storing an entire image in the database was possible but a terrible idea. Instead you should store the image in the filesystem and simply store an image path in the database. I did agree with that from the start, despite my inexperience. It must keep the database light when you're backing it up to an external service, and makes it faster during actual local use. This latter point, however, is complete speculation, and I'd like someone to clarify my theories:
When you store the image associated with an object in the database as a BLOB, and then request that object, is the whole object and its attributes (including this huge amount of image information) written to memory, even when the image isn't needed? E.g.
2.0.0p247 :001 > Object.column_names
 => ["id", "name", "blob"]
2.0.0p247 :002 > Object.first.blob
 => # not sure what this will return! I'm guessing a matrix-like wall of image information?
2.0.0p247 :003 > Object.first.name
  Object Load (0.8ms) SELECT "objects".* FROM "objects" ORDER BY "objects"."id" ASC LIMIT 1
 => "Kitty"
I understand that the call to Object.first.blob will take a relatively long time, because we're retrieving a large amount of image information. But will Object.first.name take the same amount of time, because Object.first writes everything (id, name and blob) to memory? If the answer to this question is yes, that's a pretty good reason never to use BLOBs. However, if the answer is no, and Rails is smart enough to write only the requested attributes to memory, then BLOBs suddenly become very attractive.
To be quite honest with you guys, I'm really crossing my fingers that you'll say storing images in a BLOB is fine and dandy. It'll make things so much easier. Backing up will be simple. It'll feel very nice to back up the dynamic content of the site in one 'modular' upload, instead of resorting to some elaborate whenever-augmented rake task to make sure the paths and their respective images are uploaded to an external location.
What's more, it seems absolutely impossible to make certain images private with Rails. I've searched high, I've searched low, I've asked here on SO. Got a few upvotes, but no solid response. No tutorials online. Nothing. Bags of tutorials on how to store images in the assets folder, but nothing on making images private.
Let's say I have three types of user: typeA, typeB and typeC. And let's say I have three types of image. The database schema would be as follows:
images
=> ["image_path","blob","type"]
users
=> ["name","type"]
What I want is that the users can request only the following:
typeA:
Can only view images with a type of A
Cannot view images with a type of B
Cannot view images with a type of C
typeB:
Can only view images with a type of B
Cannot view images with a type of A
Cannot view images with a type of C
typeC:
Can only view images with a type of C
Cannot view images with a type of A
Cannot view images with a type of B
And yes, I could have given you the example with two types of user and image, but I really want to make sure you understand the problem; the actual system I have in mind will have hundreds of types.
Like I say, it's a pretty simple idea, but I've found it impossible with Rails, because all images are stored in the public folder! So a typeB user can just type /assets/typeAImage.jpg and they've got it! Heck, even someone who isn't a user can do it.
The send_file method won't work for me, because I'm not sending them the image as a download per se; I'm showing them the image in the view.*
Now, using BLOBS would very neatly solve this problem if the images were stored in the database. I'm not sure of the actual syntax, but I could do something like this in a user view:
<% if current_user.type == image.type %>
  <%= image_tag image.blob %> <%# somehow rendering the blob as an <img> tag %>
<% end %>
And yeah, you could do exactly the same thing with a path:
<% if current_user.type == image.type %>
  <%= image_tag image.path %> <%# => <img src="/assets/typeAImage.jpg" alt="..." class="..."> %>
<% end %>
but like I say, someone who isn't even a user could simply request /assets/typeAImage.jpg. Impossible to do this if it's stored in a BLOB.
In conclusion:
What's the problem with BLOBs? I'm running a Postgres database on Heroku with dozens of users per second. So yeah, not Facebook, but not An Allegory on the Pointlessness of Life either, so performance matters. They've also got a strong mobile following, so speed is of the essence. Will using BLOBs clash with this?
How do I display an image stored in a BLOB in a view?
And just to confirm: BLOBs will allow me to securely show secret images to certain members over HTTPS?
What about database backup speed? That'll take a hit, but I want to back up the images anyway, and it's a nightly thing, so who cares if it's slow? Right?
The images will be secure so long as the backup is encrypted, right? And just as passwords are stored as hashes in the database, should I store my super-secret BLOBs in an encrypted format as well? I'm thinking yes... Do you reckon bcrypt will be up to the task? I don't see why not.
Are BLOBs considered amateurish and lazy?
...and finally a bonus point (possibly outside the scope of the question):
* As I wrote this, I was thinking 'yes, but showing the image in the view is downloading the image to them.' So can the send_file method be used to create private images in the way I describe, while using the filesystem to store the images?

To answer the first question: yes, it's possible in Rails to lazy-load some attributes, but Active Record does not support it by default; there is a gem for this, though. (DataMapper does support this by default, and there is a plugin for Sequel as well.)
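As a side note, even without lazy loading you can avoid reading the large column entirely by selecting only the columns you need; Active Record raises if you then touch an unselected attribute. A minimal sketch, using a hypothetical Image model with the question's columns:

# Hypothetical model mirroring the question's ["id", "name", "blob"] columns.
class Image < ActiveRecord::Base
end

# SELECT "images"."id", "images"."name" FROM "images" ... LIMIT 1
image = Image.select(:id, :name).first

image.name  # => "Kitty" -- the blob column was never read from the database
image.blob  # raises ActiveModel::MissingAttributeError, since it wasn't selected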
For the second part: the biggest drawback of your approach is performance. Assets are best served by a fast, static web server that reads the files straight from the filesystem, not through a dynamic application. Tying the database server up with queries for large blobs is not recommended, especially since it's much harder to scale a database than a filesystem.
And for the last part: there are various options you can use to hide files from the user and serve them only when appropriate. One of the options is X-SendFile (or X-Accel-Redirect for Nginx), where you specify a filename in the response headers in Rails, and the web server that is proxying the requests (and which also supports this header) picks it up and serves that file itself. This is not a redirect, so the URL stays the same, and the file can still be hidden from normal access. Of course, for this you have to proxy your requests to Rails through a web server, which is usually already happening, at least at the load-balancing level, or if you are using Passenger.
Also note that you can tell Rails to send X-SendFile headers when serving the ordinary asset files as well.
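For illustration, here is a minimal sketch of that pattern; the model, column names and private directory are assumptions carried over from the question, not a definitive implementation:

# config/environments/production.rb -- tell send_file to emit the header
# your front-end server understands:
# config.action_dispatch.x_sendfile_header = 'X-Sendfile'        # Apache (mod_xsendfile)
# config.action_dispatch.x_sendfile_header = 'X-Accel-Redirect'  # Nginx

class ImagesController < ApplicationController
  def show
    image = Image.find(params[:id])
    # Authorize: only serve images matching the current user's type.
    return head :forbidden unless current_user.type == image.type

    # The file lives outside public/, so it can never be fetched directly.
    # disposition: 'inline' renders it in the page rather than as a download,
    # which also addresses the footnote about send_file and views.
    send_file Rails.root.join('private', image.image_path),
              type: 'image/jpeg', disposition: 'inline'
  end
end

The image tag in the view then points at this action's URL instead of /assets. And if the image really were stored in a database blob, the same authorization pattern would work with send_data image.blob, type: 'image/jpeg', disposition: 'inline'.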
Also see this answer.

Related

How can I preserve storage space and load time with Active Storage?

I have a user submission form that includes images. Originally I was using Carrierwave, but with that the image is sent to my server for processing first before being saved to Google Cloud Storage, and if the images are too large, the request times out and the user just gets a server error.
So what I need is a way to upload directly to GCS. Active Storage seemed like the perfect solution, but I'm getting really confused about how hard compression seems to be.
An ideal solution would be to resize the image automatically upon upload, but there doesn't seem to be a way to do that.
A next-best solution would be to create a resized variant upon upload using something like @record.images.first.variant(resize_to_limit: [xxx, xxx]) (using the image_processing gem), but the docs seem to imply that a variant can only be created upon page load, which would obviously be extremely detrimental to load time, especially if there are many images. More evidence for this is that when I create a variant, it's not in my GCS bucket, so it clearly only exists in my server's memory. If I try
@record.images.first.variant(resize_to_limit: [xxx, xxx]).service_url
I get a url back, but it's invalid. I get a failed image when I try to display the image on my site, and when I visit the url, I get these errors from GCS:
The specified key does not exist.
No such object.
so apparently I can't create a permanent url.
A third best solution would be to write a Google Cloud Function that automatically resizes the images inside Google Cloud, but reading through the docs, it appears that I would have to create a new resized file with a new url, and I'm not sure how I could replace the original url with the new one in my database.
To summarize, what I'd like to accomplish is to allow direct upload to GCS, but control the size of the files before they are downloaded by the user. My problems with Active Storage are that (1) I can't control the size of the files on my GCS bucket, leading to arbitrary storage costs, and (2) I apparently have to choose between users having to download arbitrarily large files, or having to process images while their page loads, both of which will be very expensive in server costs and load time.
It seems extremely strange that Active Storage would be set up this way and I can't help but think I'm missing something. Does anyone know of a way to solve either problem?
Here's what I did to fix this:
1- I upload the attachment that the user added directly to my service provider (I use S3).
2- I add an after_commit callback that enqueues a Sidekiq worker to generate the thumbs.
3- My Sidekiq worker (AttachmentWorker) calls my model's generate_thumbs method.
4- generate_thumbs loops through the different sizes that I want to generate for this file.
Now, here's the tricky part:
def generate_thumbs
  [
    { resize: '300x300^', extent: '300x300', gravity: :center },
    { resize: '600>' }
  ].each do |size|
    file_url(size, true)
  end
end

def file_url(size, process = false)
  value = file # `file` is my has_one_attached
  if size.nil?
    url = value
  else
    url = value.variant(size)
    url = url.processed if process
  end
  url.service_url
end
In the file_url method, we only call .processed if we pass process = true. I've experimented a lot with this method to get the best possible performance out of it.
The .processed call checks with your bucket whether the file exists, and if not, it generates your new file and uploads it.
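For steps 2 and 3 above, a minimal sketch of what the callback and worker might look like (class names other than AttachmentWorker and generate_thumbs are assumptions, not from the original answer):

class Attachment < ApplicationRecord
  has_one_attached :file

  # Step 2: enqueue thumbnail generation once the record and upload are committed.
  after_commit :enqueue_thumbs, on: :create

  def enqueue_thumbs
    AttachmentWorker.perform_async(id)
  end
end

# Step 3: the worker simply delegates to the model.
class AttachmentWorker
  include Sidekiq::Worker

  def perform(attachment_id)
    Attachment.find(attachment_id).generate_thumbs
  end
end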
Also, here's another question that I have previously asked concerning ActiveStorage that can also help you: ActiveStorage & S3: Make files public
I absolutely don't know Active Storage. However, a good pattern for your use case is to resize the image when it comes in. To do this:
Let the user store the image in Bucket1
When the file is created in Bucket1, an event is triggered. Plug a function into this event
The Cloud Function resizes the image and stores it in Bucket2
You can delete the image in Bucket1 at the end of the Cloud Function, keep it a few days, or move it to cheaper storage (to keep the original image in case of issues). For the last two actions, you can use lifecycle rules to delete or change the storage class of files.
Note: you can use the same bucket (instead of Bucket1 and Bucket2), but then an event will be sent every time a file is created in the bucket, including the resized output. You can use Pub/Sub as middleware and add a filter on it to trigger your function only when the file is created in the correct folder. I wrote an article on this.
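To make the pattern concrete, here is a minimal sketch of such a function in Ruby, using the functions_framework, google-cloud-storage and mini_magick gems; the bucket names, function name and 600px limit are assumptions for illustration:

require "functions_framework"
require "google/cloud/storage"
require "mini_magick"
require "tmpdir"

# Triggered by a storage object "finalized" event on Bucket1.
FunctionsFramework.cloud_event "resize_image" do |event|
  bucket_name = event.data["bucket"]
  object_name = event.data["name"]

  storage = Google::Cloud::Storage.new
  source  = storage.bucket(bucket_name).file(object_name)

  # Download to a temp file, then shrink anything over 600px on its longest side.
  local = File.join(Dir.tmpdir, File.basename(object_name))
  source.download local
  image = MiniMagick::Image.open(local)
  image.resize "600x600>"
  image.write local

  # Store the resized copy in Bucket2 under the same object name.
  storage.bucket("bucket2-resized").create_file local, object_name
end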

Serilog event not saved

I'm trying to log to file/Seq an event that's an API response from a web service. I know it's not best practice, but under some circumstances I need to do so.
The JSON saved on disk is around 400 KB. To be honest, I could exclude two parts of it (images returned as base64). I think I should use a destructuring logger, is that right?
I've tried increasing the Seq limit to 1 MB, but the content is not saved even to the log file, so I think that's not the problem... I use Microsoft logging (the ILogger interface) with Serilog.AspNetCore.
Is there a way I can handle such a scenario?
Thanks in advance
You can log a serialized value by using the @ destructuring operator on the property name. For example,
Log.Information("Created {@User} on {Created}", exampleUser, DateTime.Now);
As you've noted it tends to be a bad idea unless you are certain that the value being serialized will always be small and simple.

Differences in Umbraco cache structure?

OK, so I have just spent the last 6-8 weeks in the weeds of Umbraco and have made some fixes/improvements to our site and environments. I have spent a lot of that time trying to correct lower-level Umbraco caching issues. Now, reflecting on my experience, I still don't have a clue what the conceptual differences are between the following:
Examine indexes
umbraco.config
cached xml file in memory (supposedly similar to umbraco.config)
CMSContentXML Table
Thanks Again,
Devin
Examine indexes are indexes of Umbraco content.
Whenever you create/update/delete content, the current content information is indexed.
These indexes are used for searching - under the hood, they are Lucene indexes.
The Umbraco backend uses these indexes for searching.
You can create your own index if you want.
For more info, check out Overview & Explanation - "Examining Examine" by Peter Gregory.
umbraco.config and the cached XML in memory are really the same thing.
The front-end UmbracoHelper API gets content from the cache, not the database - the cache is loaded from umbraco.config.
The CMSContentXML table contains each content item's information as XML,
so essentially this XML represents all the information of a content node.
So in a nutshell they represent three things:
Examine is used for searching
umbraco.config caches data - saving round trips to the DB
CMSContentXML stores the full information of a content item
Edit to include better clarification from Robert Foster's comment, and on UmbracoHelper vs ExamineManager.
For the umbraco.config and CMSContentXML table, @robert-foster commented:
"umbraco.config stores the most recent version of all published content only; the in-memory cache is a cached version of this file; and the cmscontentxml table stores a representation of all content and is used primarily for preview mode - it is updated every time a content item is saved. IIRC it also stores a representation of other content types."
Regarding UmbracoHelper vs ExamineManager:
The UmbracoHelper API mostly gets its content from the memory cache - IMO it works best when locating direct content, such as when you know the id of the content you want: you just call Umbraco.TypedContent(id).
But where do you get that id in the first place? Or, put another way, say you want to find all content whose Title property contains the word "Test"; then you would use Examine to search for it. Because Examine is really a Lucene wrapper, it is going to be fast and efficient.
Although you can traverse the tree with methods such as Umbraco.TypedContent(id).Children and then use LINQ to filter the result, I think this is done in memory using LINQ-to-Objects, so it is not as efficient and performant as Lucene.
So personally I think:
use Examine when you are searching for (locating) content - because you can use the capabilities of a proper search engine, Lucene;
once you have the ids from the search results, use UmbracoHelper to get the full published content representation of each content id as a strongly typed model, and work with the data.
One thing @robert-foster mentioned in the comments which I did not know is that UmbracoHelper provides a Search method which is a wrapper around Examine, so use that if you are more familiar with that API.
Lastly, if any of the above statements are wrong or not quite correct, help me clarify so that anyone looking at this later will not get it wrong. Thanks all.

Search in Content and skip html elements in mvc

I want to search in content without getting false results.
Assume users search for 'br'; I don't want the results to include content that merely contains HTML elements such as <br> or <p>.
Simply put, you must strip the tags before you search. However, that would mean not being able to query the database directly; rather, you'd have to pull all the objects first and then query the collection in memory.
If you're going to be doing a lot of this, or you have large collections of objects (where pulling all of them for the initial query would be a performance drag), then you should look into a true search solution. I've been working with Elasticsearch, which seems to be just about the best out there in my opinion. It's easy to set up, easy to use, and has third-party .NET integration through the NuGet package NEST.
With a true search solution, you can index your content fields, stripped of HTML, and then run your queries on the index instead of directly on your database. You'll also get powerful advanced features such as faceting, which would be difficult or impossible to do directly with Entity Framework.
Alternatively, if you can't go all-in on search and it's unacceptable to query everything up front (which it pretty much always is), then your only other option is to create a companion field for each HTML content property and always save an HTML-stripped copy of the text there. Then use that field for your search queries.

How to store a picture within Active Directory using Ruby in a Rails 3 app?

All I want to do is upload an image into Active Directory. So far I can update any AD information except the image. I have tried searching for ideas but have come up with nothing so far.
Do I have to encode the image in a certain way? Do I just ldap-replace the jpegPhoto attribute with a byte string of the photo?
Any hint towards a solution would be great.
Thanks in advance!
First of all, there is an attribute in Active Directory called thumbnailPhoto. According to this Microsoft article, the thumbnailPhoto attribute contains octet-string data, and AD interprets octet-string data as an array of bytes.
If you want sample code in C#, you can get something here.
From a theoretical point of view, you can also inject a photo with LDIF, using tools like "B64" to encode your image file in base64.
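For example, a quick Ruby sketch of producing such an LDIF entry (the DN is a placeholder, and strictly speaking RFC 2849 wants long base64 lines folded at 76 characters):

require "base64"

# LDIF marks base64-encoded attribute values with a double colon.
photo_b64 = Base64.strict_encode64(File.binread("photo.jpg"))

ldif = <<~LDIF
  dn: CN=Some User,OU=Users,DC=example,DC=com
  changetype: modify
  replace: thumbnailPhoto
  thumbnailPhoto:: #{photo_b64}
LDIF

File.write("photo.ldif", ldif)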
Secondly, in my view, a directory is not a database.
So, even if the attribute exists (created by Netscape under the OID 2.16.840.1.113730.3.1.35), and even if Microsoft explains how to put a picture into Active Directory, I think it's better to store a URL, or a path to a file on a file system, in the directory.
I have no idea of the impact on AD performance if every entry is loaded with 40 KB (the average size of a thumbnail photo). But I know that if there are badly written programs on the network, the kind that load all the attributes whenever they search for an entry in the directory, this will put a considerable load on the network.
I hope it helps.
JP
I had this issue and was able to get it working by reading the file as binary data and passing it through to @ldap.replace_attribute, i.e.
@ldap.replace_attribute USERS_DN, :thumbnailPhoto, File.binread("path_to_file")
Where @ldap is an instance of Net::LDAP, bound to AD, i.e.
@ldap = Net::LDAP.new
@ldap.host = ''
@ldap.port = ''
@ldap.auth USERNAME, PASSWORD
@ldap.bind
