Ranger tag-based policies appear not to take effect in Atlas

I'm having a problem where tag-based policies set in Ranger appear not to take effect in Atlas.
I'm roughly following the tutorial here (https://hortonworks.com/tutorial/tag-based-policies-with-apache-ranger-and-apache-atlas/section/2/#create-ranger-tag-based-policy), trying to create a tag-based policy for classifications created in Atlas.
I created a PHI classification in Atlas on an hdfs_path entity.
Then, in Service Manager > Tag Based Policies, I created a Ranger tag-based policy for that Atlas PHI classification that only allows certain Atlas actions for a particular user (not the Atlas admin user).
In Service Manager > Atlas Policies, I created an Atlas service that uses that tag,
and disabled the Ranger Atlas service policy that allows public access to Atlas.
Yet when I log into Atlas as admin (not the user specified in the Ranger tag policy), I can still search for and find Atlas entities that have the PHI tag assigned to them, as well as remove and (re)add the tag, as evidenced in the Ranger audit logs...
I would think this should not be possible. I would expect the tags column to have the custom tag in it and for access by "admin" to have been denied.
As an HDFS example...
Despite the fact that the Ranger tag policy only specifies the user hdfs, I can still access the HDFS location as user "admin". I notice several things about the corresponding Ranger audit entry:
The "Name/Type" column includes the Atlas classifications associated with the resource
The tags column is empty
I interpret this to mean that 1) Ranger recognizes that the location is associated with some Atlas tags, and 2) it does not see any tags for or against allowing the user "admin" to access that resource.
Can anyone with more Atlas+Ranger experience let me know what I am getting wrong here? Any debugging suggestions?

First, turn up the logging for TagSync in its log4j settings (log4j.properties), for example (the exact logger name may differ between Ranger versions):
log4j.logger.org.apache.ranger.tagsync=INFO
Then make sure that, after a tag is created in Atlas via the UI, it propagates to the Ranger database table called public.x_tag or public.x_tag_def. You will see the tag first end up on Kafka, and then TagSync adds it to Ranger's database. Once you turn up the logging, the TagSync logs will show this happening, and if it doesn't work they will show why not.
If you find that policy enforcement is not working, the most likely cause is that the service names in Ranger do not match; you may need to specify the correct qualified name in Atlas when creating the classification.
When creating the tag in Atlas, notice the qualified name; then compare it with the services back in Ranger.
What I noticed is that if the qualified name of the tag doesn't match that of the services created in Ranger, the policies are not enforced. Unfortunately, this isn't mentioned anywhere in the documentation, and I had to figure it out from the logs.

Related

Is it possible to set the region Google Cloud Dataflow uses at a project or organisation level?

My employers recently started using Google Cloud Platform for data storage/processing/analytics.
We're EU based so we want to restrict our Cloud Dataflow jobs to stay within that region.
I gather this can be done on a per-job / per-job-template basis with --region and --zone, but I wondered (given that all our work will use the same region) whether there's a way of setting this more permanently at a wider level (project or organisation)?
Thanks
Stephen
Update:
Having pursued this, it seems that Adla's answer is correct, though there is another workaround (which I will respond with). Further to this, there is now an open issue with Google about this, which can be found/followed at https://issuetracker.google.com/issues/113150550
I can provide a bit more information on things that don't work, in case that helps others:
Google support suggested changing where Dataprep-related folders were stored, as per "How to change the region/zone where dataflow job of google dataprep is running" - unfortunately this did not work for me, though some of those responding to that question suggest it has worked for them.
Someone at my workplace suggested restricting Dataflow's quotas for non-EU regions here: https://console.cloud.google.com/iam-admin/quotas to funnel it towards using the appropriate region, but when tested, Dataprep continued to favour the US.
Cloud Dataflow uses us-central1 as the default region for each job, and if the desired regional endpoint differs from the default, the region needs to be specified (with --region) for every Cloud Dataflow job launched in order for it to run there. Workers will automatically be assigned to the best zone within that region, but you can also specify the zone with --zone.
As of this moment it is not possible to force the region or zone used by Cloud Dataflow based on the project or organization settings.
I suggest you request a new Google Cloud Platform feature. Make sure to explain your use case and how this feature would be useful for you.
As a workaround, to restrict Dataflow job creation to a specific region and zone, you can write a script or application that only creates jobs with the specific region and zone you need (see the sketch below). If you also want to limit job creation to that script, you can remove your users' job-creation permissions and grant that permission only to a service account used by the script.
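As a rough illustration of that wrapper idea, here is a minimal C# sketch. The class, method and allow-list are hypothetical; the actual launch step would be whatever mechanism you already use (gcloud, a client library, or the templates REST API sketched further down).

using System;
using System.Collections.Generic;

public static class RegionPinnedLauncher
{
    // Hypothetical allow-list: only EU regional endpoints may be used.
    private static readonly HashSet<string> AllowedRegions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "europe-west1" };

    public static void LaunchJob(string jobName, string region, Action<string, string> launch)
    {
        if (!AllowedRegions.Contains(region))
            throw new InvalidOperationException(
                "Region '" + region + "' is not allowed; Dataflow jobs must stay in the EU.");

        // Delegate to whatever actually creates the job, now guaranteed to be in-region.
        launch(jobName, region);
    }
}

If only the service account used by this wrapper has Dataflow job-creation permissions, users cannot bypass the region restriction by launching jobs directly.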
A solution Google support supplied to me, which basically entails using Dataprep as a Dataflow job builder rather than a tool in and of itself:
Create the flow you want in Dataprep, but if there's data you can't send out of region, create a version of it (sample or full) where the sensitive data is obfuscated or blanked out, and use that. In my case, setting the fields containing a user ID to a single fake value was enough.
Run the flow
After the job has been executed once, in the Dataprep webUI under “Jobs”, using the three dots on the far right of the desired job, click on “Export results”.
The resulting pop up window will have a path to the GCS bucket containing the template. Copy the full path.
Find the metadata file at the above path in GCS
Change the inputs listed in the files to use your 'real' data instead of the obfuscated version
In the Dataflow console, in the menu for creating a job using a custom template, indicate the path you copied from the export pop-up as the "Template GCS Path".
From this menu, you can select a zone you would like to run your job in.
It's not straightforward, but it can be done. I am using a process like this, setting up a call to the REST API to trigger the job in the absence of Dataflow having a scheduler of its own; a sketch of such a call is below.
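For reference, here is a minimal, hedged C# sketch of triggering a templated Dataflow job through the REST API with the region pinned in the URL. The project ID, template path, job name and zone are placeholders, and obtaining an OAuth access token (e.g. for a service account) is assumed to happen elsewhere.

using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

public static class DataflowTemplateTrigger
{
    // Placeholder values - substitute your own project and template export path.
    private const string ProjectId = "my-project";
    private const string Region = "europe-west1"; // regional endpoint the job is pinned to
    private const string TemplateGcsPath = "gs://my-bucket/dataprep-export/template";

    public static async Task LaunchAsync(string accessToken)
    {
        var url = "https://dataflow.googleapis.com/v1b3/projects/" + ProjectId +
                  "/locations/" + Region + "/templates:launch" +
                  "?gcsPath=" + Uri.EscapeDataString(TemplateGcsPath);

        // Minimal launch body; "environment" can also carry a zone if you need one.
        var body = @"{
          ""jobName"": ""dataprep-template-job"",
          ""parameters"": {},
          ""environment"": { ""zone"": ""europe-west1-b"" }
        }";

        using (var http = new HttpClient())
        {
            http.DefaultRequestHeaders.Authorization =
                new AuthenticationHeaderValue("Bearer", accessToken);

            var response = await http.PostAsync(
                url, new StringContent(body, Encoding.UTF8, "application/json"));

            Console.WriteLine(await response.Content.ReadAsStringAsync());
        }
    }
}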

Method(s) to Associate Text File with neo4j Node

How would one associate a file with a node in Neo4j? For instance, I have 'Company' nodes, which refer to companies that have issued investment securities. I would like to link each node to the file that has a basic description of what that company offers for sale. In other programs, I would add a hyperlink to the file.
The neo4j instance is running on a single computer as part of my personal workflow, so I do not need to concern myself with network connectivity or sharing w/ colleagues, etc. at this point.
Also, I reviewed two seemingly-related questions, including the one entitled Neo4J: Binary File storage and Text Search "stack", but neither seems to fit the bill.
Thoughts?
You can store a reference to the file as a property on the node. This can be either a URL or an id referencing the file in another database system (such as MongoDB).
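For example, a small sketch of storing such a reference (here with the .NET neo4jclient driver; the Company class, property name, credentials and file path are just examples, and parameter syntax varies with the Neo4j/driver version) could look like:

using System;
using Neo4jClient;

public class Company
{
    public string Name { get; set; }
    public string DescriptionFile { get; set; } // URL or local path to the description file
}

public static class CompanyFileLink
{
    public static void AttachDescription()
    {
        // Connect to a local Neo4j instance (adjust the URI and credentials to your setup).
        var client = new GraphClient(new Uri("http://localhost:7474/db/data"), "neo4j", "password");
        client.Connect();

        // Store a reference to the file as a plain string property on the Company node.
        client.Cypher
            .Match("(c:Company)")
            .Where((Company c) => c.Name == "Acme Securities")
            .Set("c.DescriptionFile = {path}")
            .WithParam("path", @"file:///C:/docs/acme-description.txt")
            .ExecuteWithoutResults();
    }
}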

Attaching/uploading files to not-yet-saved Note - what is best strategy for this?

In my application, I have a textarea input where users can type a note.
When they click Save, there is an AJAX call to Web Api that saves the note to the database.
I would like users to be able to attach multiple files to this note (Gmail style) before saving the note. It would be nice if each upload could start as soon as the file is attached, before the note is saved.
What is the best strategy for this?
P.S. I can't use jQuery fineuploader plugin or anything like that because I need to give the files unique names on the server before uploading them to Azure.
Is what I'm trying to do possible, or do I have to make the whole 'Note' a normal form post instead of an API call?
Thanks!
This approach is file-based, but you can apply the same logic to Azure Blob Storage containers if you wish.
What I normally do is give the user a unique GUID when they GET the AddNote page. I create a folder called:
C:\TemporaryUploads\UNIQUE-USER-GUID\
Then any files the user uploads at this stage get assigned to this folder:
C:\TemporaryUploads\UNIQUE-USER-GUID\file1.txt
C:\TemporaryUploads\UNIQUE-USER-GUID\file2.txt
C:\TemporaryUploads\UNIQUE-USER-GUID\file3.txt
When the user does a POST and I have confirmed that all validation has passed, I simply copy the files to the completed folder, named with the newly generated note ID (see the sketch below):
C:\NodeUploads\Note-100001\file1.txt
Then delete the C:\TemporaryUploads\UNIQUE-USER-GUID folder
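A minimal sketch of that copy-and-clean-up step might look like this (the folder roots and note ID are placeholders):

using System.IO;

public static class NoteAttachmentStore
{
    // Example roots only - use whatever locations suit your deployment.
    private const string TempRoot = @"C:\TemporaryUploads";
    private const string FinalRoot = @"C:\NodeUploads";

    public static void PromoteAttachments(string userGuid, int noteId)
    {
        var tempFolder = Path.Combine(TempRoot, userGuid);
        if (!Directory.Exists(tempFolder))
            return; // nothing was uploaded for this note

        var finalFolder = Path.Combine(FinalRoot, "Note-" + noteId);
        Directory.CreateDirectory(finalFolder);

        // Copy every temporary upload across to the note's permanent folder.
        foreach (var file in Directory.GetFiles(tempFolder))
            File.Copy(file, Path.Combine(finalFolder, Path.GetFileName(file)), overwrite: true);

        // Then remove the temporary GUID folder.
        Directory.Delete(tempFolder, recursive: true);
    }
}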
Cleaning Up
Now, that's all well and good for users who actually go ahead and save a note, but what about the ones who upload a file and then close the browser? There are two options at this stage:
Have a background service clean up these files on a scheduled basis - daily, weekly, etc. This is a good job for Azure WebJobs.
Clean up the old files via the web app each time a new note is saved. Not a great approach, as you're doing file IO when there are potentially no files to delete.
Building on RGraham's answer, here's another approach you could take:
Create a blob container for storing note attachments. Let's call it note-attachments.
When the user comes to the screen of creating a note, assign a GUID to the note.
When the user uploads a file, you just prefix the file name with this note id. So if a user uploads a file, say file1.txt, it gets saved into blob storage as note-attachments/{note id}/file1.txt.
Depending on your requirements, once you save the note, you may move this blob to another blob container or keep it where it is (see the sketch below). Since the blob has the note id in its name, searching for the attachments of a note is easy.
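A minimal upload sketch using that naming scheme with the classic Azure Storage SDK (the connection string, note id and file path are placeholders) might look like:

using System.IO;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

public static class NoteAttachmentBlobs
{
    public static void Upload(string connectionString, string noteId, string localFilePath)
    {
        var account = CloudStorageAccount.Parse(connectionString);
        var container = account.CreateCloudBlobClient()
                               .GetContainerReference("note-attachments");
        container.CreateIfNotExists();

        // Blob name is prefixed with the note id: note-attachments/{note id}/file1.txt
        var blobName = noteId + "/" + Path.GetFileName(localFilePath);
        var blob = container.GetBlockBlobReference(blobName);

        using (var stream = File.OpenRead(localFilePath))
        {
            blob.UploadFromStream(stream);
        }
    }
}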
For uploading files, I would recommend doing it directly from the browser to blob storage, making use of AJAX, CORS and a Shared Access Signature; this way you avoid the data going through your servers (a SAS-generation sketch follows the links below). You may find these blog posts useful:
Revisiting Windows Azure Shared Access Signature
Windows Azure Storage and Cross-Origin Resource Sharing (CORS) – Lets Have Some Fun
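Server-side, generating a short-lived, write-only SAS for such a blob could look roughly like this (again using the classic SDK; the container and names are placeholders):

using System;
using Microsoft.WindowsAzure.Storage.Blob;

public static class NoteAttachmentSas
{
    // Returns a URL the browser can upload the file to directly via AJAX + CORS.
    public static string GetUploadUrl(CloudBlobContainer container, string noteId, string fileName)
    {
        var blob = container.GetBlockBlobReference(noteId + "/" + fileName);

        var policy = new SharedAccessBlobPolicy
        {
            Permissions = SharedAccessBlobPermissions.Write,
            SharedAccessExpiryTime = DateTimeOffset.UtcNow.AddMinutes(15)
        };

        // Append the SAS token to the blob URI and hand the result back to the client.
        return blob.Uri + blob.GetSharedAccessSignature(policy);
    }
}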

Prevent users accessing image directory contents

I am creating a site using ASP.NET MVC4, one of the functions on the site is for users to upload images. The images may be of a personal nature, almost definitely containing images of their children.
The images are stored in an MS Azure SQL database along with their metadata. To save bandwidth usage on Azure, once an image has been downloaded it is saved to a user directory:
~/UserImages/<Username>/<Image>
When the gallery page is loaded, the controller action checks the database against what is in the user's directory and only downloads any images that are not already there.
The <Username> part of the directory is created by the controller when required, so I am unable to set IIS permissions on it in advance. Even if I could, I am unsure what IIS could do, as the users are not known ahead of time (new registrations etc.).
Due to MVC routing, it won't be possible for users to access other users' directories by guessing usernames; however, if you can guess a username AND an image name, then the image does display. I am looking for ideas on preventing that, to minimise the chance of someone else's images becoming exposed to others.
I have tried an IgnoreRoute but this didn't work.
routes.IgnoreRoute("UserImages/{*pathInfo}");
Ideally I would have the UserImages directory cleared on logout, but not everyone will use the logout command. If the directories were cleared out, there would be a much smaller chance of someone finding the combination of username and image name before the files are removed.
How about, instead of storing your cached images within the actual site structure as static content served by IIS, you store the images in a path outside the site?
That would ensure no unauthorized user could access them directly.
Then you can provide access to those images through a Controller (UserImagesController maybe) Action that can validate that the image being requested is one to which the current user has access.
Your check might end up being as simple as checking that the requested UserName parameter of the action is the same as the current user's UserName.
With this approach you can also control the cache headers, expiration, etc, of those images.
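A minimal sketch of such an action (the folder location, route parameters and action name are just examples) might look like:

using System.IO;
using System.Net;
using System.Web;
using System.Web.Mvc;

[Authorize]
public class UserImagesController : Controller
{
    // Stored outside the web root, so IIS never serves these files directly.
    private const string ImageRoot = @"D:\AppData\UserImages";

    public ActionResult Show(string userName, string imageName)
    {
        // Only let users fetch images from their own folder.
        if (!string.Equals(userName, User.Identity.Name, System.StringComparison.OrdinalIgnoreCase))
            return new HttpStatusCodeResult(HttpStatusCode.Forbidden);

        // Strip any path segments to prevent directory traversal.
        var fileName = Path.GetFileName(imageName);
        var path = Path.Combine(ImageRoot, userName, fileName);

        if (!System.IO.File.Exists(path))
            return HttpNotFound();

        // FilePathResult lets you control the content type and caching headers as needed.
        return File(path, MimeMapping.GetMimeMapping(fileName));
    }
}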

Documents as nodes and Security Mechanism

I'm very new to both the Neo4j database and the neo4jclient driver. I'm trying to create a proof of concept to understand whether it makes sense to use this technology, and I have the following doubts (I tried to search the web but found no answers...).
I have some entities that have documents associated with them (PDFs, DOCX, ...). Is it possible to have a node property pointing to those documents? Or can documents be added as graph nodes with a Lucene index, so that a search could return the document node and its related relationships?
How does security work? Is it possible for users to have access to nodes based on their profile? Imagining that the nodes represent documents, how could a security mechanism be implemented so that users only access their own nodes (documents)?
Q1: You can simply add a node property with a URI referencing the document of choice. That could be pointing to blob storage, local disk, wherever you store your documents. You could add binary objects in a node's property (by using a byte array), but I wouldn't advise doing that, since it just adds bulk to the database footprint. For reference, the Neo4j documentation lists all the supported node property types.
Q2: Security is going to be on the database itself, not on nodes. Node-level (or document-level, in your case) security would need to be implemented in your application. To keep data secure, you should consider hiding your Neo4j server (and related endpoint) behind a firewall and not exposing it to the web. For example, in Windows Azure you'd deploy it to a Virtual Machine without any Input Endpoints, and just connect via an internal connection. For all the details around Neo4j security, take a look at the Neo4j security documentation.
1) What David said.
2) For resource-level security, you need to model this into your graph. There's an example at http://docs.neo4j.org/chunked/milestone/examples-acl-structures-in-graphs.html
