Distributed file source - storage

Looking for a software solution to to store large files (>50MB - 1.5GB), distributed across multiple servers. We have looked at MogileFS, however, given existing software demands, need to have an NFS interface. Would prefer open source, however, open to all options.

If you have only a small number of servers, you could try rsync. Couple it with SSH for some security.

did you look at Hadoop DFS?
it's much more then distributed file system but should be good for very big files.

Related

dockerfile create a image private [duplicate]

We all know situations when you cannot go open source and freely distribute software - and I am in one of these situations.
I have an app that consists of a number of binaries (compiled from C sources) and Python code that wraps it all into a system. This app used to work as a cloud solution so users had access to app functions via network but no chance to touch the actual server where binaries and code are stored.
Now we want to deliver the "local" version of our system. The app will be running on PCs that our users will physically own. We know that everything could be broken, but at least want to protect the app from possible copying and reverse-engineering as much as possible.
I know that Docker is a wonderful deployment tool so I wonder: is it possible to create encrypted Docker containers where no one can see any data stored in the container's filesystem? Is there a known solution to this problem?
Also, maybe there are well known solutions not based on Docker?
The root user on the host machine (where the docker daemon runs) has full access to all the processes running on the host. That means the person who controls the host machine can always get access to the RAM of the application as well as the file system. That makes it impossible to hide a key for decrypting the file system or protecting RAM from debugging.
Using obfuscation on a standard Linux box, you can make it harder to read the file system and RAM, but you can't make it impossible or the container cannot run.
If you can control the hardware running the operating system, then you might want to look at the Trusted Platform Module which starts system verification as soon as the system boots. You could then theoretically do things before the root user has access to the system to hide keys and strongly encrypt file systems. Even then, given physical access to the machine, a determined attacker can always get the decrypted data.
What you are asking about is called obfuscation. It has nothing to do with Docker and is a very language-specific problem; for data you can always do whatever mangling you want, but while you can hope to discourage the attacker it will never be secure. Even state-of-the-art encryption schemes can't help since the program (which you provide) has to contain the key.
C is usually hard enough to reverse engineer, for Python you can try pyobfuscate and similar.
For data, I found this question (keywords: encrypting files game).
If you want a completely secure solution, you're searching for the 'holy grail' of confidentiality: homomorphous encryption. In short, you want to encrypt your application and data, send them to a PC, and have this PC run them without its owner, OS, or anyone else being able to scoop at the data.
Doing so without a massive performance penalty is an active research project. There has been at least one project having managed this, but it still has limitations:
It's windows-only
The CPU has access to the key (ie, you have to trust Intel)
It's optimised for cloud scenarios. If you want to install this to multiple PCs, you need to provide the key in a secure way (ie just go there and type it yourself) to one of the PCs you're going to install your application, and this PC should be able to securely propagate the key to the other PCs.
Andy's suggestion on using the TPM has similar implications to points 2 and 3.
Sounds like Docker is not the right tool, because it was never intended to be used as a full-blown sandbox (at least based on what I've been reading). Why aren't you using a more full-blown VirtualBox approach? At least then you're able to lock up the virtual machine behind logins (as much as a physical installation on someone else's computer can be locked up) and run it isolated, encrypted filesystems and the whole nine yards.
You can either go lightweight and open, or fat and closed. I don't know that there's a "lightweight and closed" option.
I have exactly the same problem. Currently what I was able to discover is bellow.
A. Asylo(https://asylo.dev)
Asylo requires programs/algorithms to be written in C++.
Asylo library is integrated in docker and it seems to be feаsable to create custom dоcker image based on Asylo .
Asylo depends on many not so popular technologies like "proto buffers" and "bazel" etc. To me it seems that learning curve will be steep i.e. the person who is creating docker images/(programs) will need a lot of time to understand how to do it.
Asylo is free of charge
Asylo is bright new with all the advantages and disadvantages of being that.
Asylo is produced by Google but it is NOT an officially supported Google product according to the disclaimer on its page.
Asylo promises that data in trusted environment could be saved even from user with root privileges. However, there is lack of documentation and currently it is not clear how this could be implemented.
B. Scone(https://sconedocs.github.io)
It is binded to INTEL SGX technology but also there is Simulation mode(for development).
It is not free. It has just a small set of functionalities which are not paid.
Seems to support a lot of security functionalities.
Easy for use.
They seems to have more documentation and instructions how to build your own docker image with their technology.
For the Python part, you might consider using Pyinstaller, with appropriate options, it can pack your whole python app in a single executable file, which will not require python installation to be run by end users. It effectively runs a python interpreter on the packaged code, but it has a cipher option, which allows you to encrypt the bytecode.
Yes, the key will be somewhere around the executable, and a very savvy costumer might have the means to extract it, thus unraveling a not so readable code. It's up to you to know if your code contains some big secret you need to hide at all costs. I would probably not do it if I wanted to charge big money for any bug solving in the deployed product. I could use it if client has good compliance standards and is not a potential competitor, nor is expected to pay for more licenses.
While I've done this once, I honestly would avoid doing it again.
Regarding the C code, if you can compile it into executables and/or shared libraries can be included in the executable generated by Pyinstaller.

Is it feasible to have one docker image for an already existing application with multiple dependencies

I am new to Docker and want to learn the ropes with real-life challenges.
I have an application hosted on IIS and has dependencies over SQL Express and SOLR.
I want to understand the following:
Is it possible to have my whole set-up, including of enabling IIS,
SQL, SOLR and my application in one single container?
If point 1 is feasible, how should I start with it?
Sorry if my questions are basics.
It is feasible, just not a good practice. You want to isolate the software stack to improve the mantainability (easier to deploy updates), modularity (you can reuse a certain component in a different project and even have multiple projects reusing the same image) and security (a software vulnerability in a component of the stack will hardly be able to reach a different component).
So, instead of putting all together into the same image, I do recommend using Docker Compose to have multiple images for each component of the stack (you can even pull generic, up-to-date images from Docker Hub) and assemble them up from the Compose file, so with a single command you can fire up all the components needed for your application to work.
That being said, it is feasible to have all the stack together into the same Dockerfile, but it will be an important mess. You'll need a Dockerfile that installs all the software required, which will make it bulky and hard to mantain. If you're really up for this, you'll have to start from a basic OS image (maybe Windows Server Core IIS) and from there start installing all the other software manually. If there are Dockerfiles for the other components you need to install and they share the same base image or a compatible one, you can straight copy-paste the contents into your Dockerfile, at the cost of said mantainability.
Also, you should definitely use volumes to keep your data safe, especially if you take this monolithic approach, since you risk losing data from the database otherwise.
TL;DR: yes, you can, but you really don't want to since there are much better alternatives that are almost as hard.

Is there any available panel to store machine learning models and their config files in a structured manner?

Saving different models with their corresponding config files, tracking the results and parameters, searching among them using customized filters and maybe always having a pointer to the current SOTA can be quite time-saving.
I couldn't even find something similar to TensorFlow Hub on the local server. Right now, closest I could get is Git LFS.
Is there anything better out there?
I found the answer. A few open source projects are trying to do the job. The first one is named Data Science Version Control or DVC. Which according to the docs is:
simple command line Git-like experience. Does not require installing and maintaining any databases. Does not depend on any proprietary online services;
It manages and versions datasets and machine learning models. Data is saved in S3, Google cloud, Azure, Alibaba cloud, SSH server, HDFS or even local HDD RAID;
It makes projects reproducible and shareable, it helps to answer the question: "how the model was built";
It helps manage experiments with Git tags or branches and metrics tracking;
The other possible solution to think of is MinIO which is an object storage server
suited for storing unstructured data such as photos, videos, log files, backups and container / VM images.
Microsoft Azure has a service called Azure Machine Learnig service that does exactly this, but goes much further with governance/model explainability/DevOps etc. We also include free tiers to a lot of services and announced unlimited private repos on GitHub recently.

Where can I store a copy of the source code for my ASP.NET MVC application?

I have a MVC4 application that runs on the cloud. I keep multiple backups but I would like also to have a backup some place online.
Does anyone have any suggestions as to where I could put this backup. In total the 3 projects I have occupy about 250MB. Could I store these somehow in an area of the cloud that I am already paying for. What about other places online?
Do you not use version control? If not, this would be the time to start.
Check out GitHub or Bitbucket. Both are free to use for public repositories, and have very affordable plans for private projects.
I would recommend Microsoft Team Foundation Server. TFS is integrated into Visual Studio.
Some of the key features
Up to 5 users free of charge
Unlimited number of projects
Continuous delivery to Azure (that can be very handy for you)
Work item tracking
Agile planning tools
Feedback management
VS Integrated Build (however, that feature is still in preview)
I've been using that service for several months and I would definitely recommend it.
I use dropbox for some of my backups, but there are plenty of alternatives, like Amazon S3, or hosting providers like RackSpace etc.
I also have a NAS box that I back up to. I installed SVN on it too, which I access via my laptop and my main desktop PC, so I pretty much always have at least 3 copies of the code at any one time. The SVN repository is also backed up to an external hard disk.
I don't use Azure (yet), so I don't know how their backup services work. I did find this link though, that might help you figure out if you can store your files separately.
I imagine, though, with it being hosted in a cloud based system, your code will be backed up and spread across quite a few servers - so it might not be that much of a problem for you.

CMS, or pre-baked solutions for community file sharing

I want to create a community around a current iPhone app I've built. It will allow registered users to upload and download small configuration or settings files, which are used in my app to customize functionality. These files are serialized plists (binary files around 500 bytes), but can be converted to a JSON or XML format if necessary.
I do not need an HTML front-end; I plan for it to be accessed only via my app. Files do not need to be private or secure. I do not plan to store or ask for any user private data--just a login and password.
I'm looking for tips that might get me close to my goals with the least amount of effort - I want to focus on the core functionality of the app, and have this as a stable feature that I can add to in the future if it is useful. I would of course prefer FOSS, but a commercial solution is not out of the question. Things like file sharing sites with apis, login ideas, and so on.
So, what software solutions are out there that I may not be aware of? I know that Drupal has modules to allow user logins. Is there something that would work not as a web app, but as a service only? Dropbox has file sharing and an API, but I'm not sure I could use it the way I'm intending.
In short, I could code this, but would prefer a pre-baked solution that would deal with things I may not have thought of. I am sure there must be something out there which I can use.
More Details, and what I plan on the service offering:
Registration of users via the iPhone, and all that entails (will code the UI myself--I just want an API to connect to)
Viewing of these files quickly and efficiently (the files were built with performance in mind, and this is a free app, so I would like to keep server costs down)
Uploading their own files, with a few integrity checks
Rating the files
Gathering statistics on usage (which files were downloaded most often), etc., to provide a way for the files to be ranked by rating, popularity, etc.
Optional - submitting revised versions of the files (a tree).
Optional but preferred - statistics on users (no. files uploaded, perhaps rewards system for sharing)
I'm just not up to date with current technologies and open source solutions. I have experience in SQL, relational database design, and have built backends in Java, so a custom solution is not out of the question. However, it's been a while, I'm not a security expert, and would prefer to not reinvent the wheel for what is a fairly simple project, so an off-the-shelf solution would be preferred.
Check out www.parse.com!
It is absolutely brilliant for stuff like this.
You may want to look at source versioning systems like SVN or distributed systems like Mercurial or GIT. Both would be much better if the data were serialized to a text format, like JSON or XML as you mentioned.
Registration would need to be done by you of course
Viewing of files (including changes, of course) is quick and efficient. The interface can be done in a number of ways, even simulating command-line.
Uploading files will of course work, and changes made will be stored as diffs. Integrity checks can be done, for example, by Mercurial plugins
Rating the files probably can't be done directly unless you wanted an awkward hack involving parsing change entries or writing a plugin.
Submitting revised versions of files would work as that is the raison d'être of versioning systems.
Some statistics are made available in VCSs.
This is honestly a bit of a strange use for version control systems and not altogether elegant, but sometimes that's what innovation is about.
I suggest TikiWiki .
Pros:
Out-of-the-box all you need to build a community. (See reference below for list of features)
It's FOSS
It has 200 active developers - so it really has a lot of momentum.
Cons:
So many out-of-the-box features that it suffers from feature bloat. Configuration and initial set-up may be complicated.
Not really oriented to mobile platforms.

Resources