Apache Commons VFS thread safety and resource management

I'm looking into using Apache Commons VFS for a project that will need to transfer files between a local server and remote servers via FTP, SFTP and HTTPS.
The standard usage examples get the FileSystemManager from a static method:
FileSystemManager fsManager = VFS.getManager();
Is it safe to use the same FileSystemManager across multiple threads?
My second question is about properly releasing resources in a finally block. I found the following methods in the Javadoc API:
http://commons.apache.org/proper/commons-vfs/apidocs/org/apache/commons/vfs2/FileObject.html#close()
http://commons.apache.org/proper/commons-vfs/apidocs/org/apache/commons/vfs2/FileSystemManager.html#closeFileSystem(org.apache.commons.vfs2.FileSystem)
http://commons.apache.org/proper/commons-vfs/apidocs/org/apache/commons/vfs2/FilesCache.html#close()
http://commons.apache.org/proper/commons-vfs/apidocs/org/apache/commons/vfs2/impl/DefaultFileSystemManager.html#close()
But it's not clear to me which of these resources should typically be closed.

The FileSystemManager and FileSystem objects are supposed to be thread-safe; however, I would not bet my life on it. Some internal locking (especially around renames) depends on the FileObject instance, so you should not use a FilesCache that does not keep those instances (the default cache is fine).
FileContent objects and their streams should not be used concurrently (for example, FileContent.close() only acts on the streams of the current thread).
There are some resource leaks in this area (hopefully all fixed in 2.1-SNAPSHOT).

VFS.getManager() provides a single shared manager, i.e. a single point of access to the file systems, so I would not recommend using it from a multithreaded environment. You can create your own DefaultFileSystemManager and call its close() method when you are done.

Related

Logging directly to standard output

Where I work, we are migrating our entire infrastructure, which until now was based on monolithic services running directly on Windows/Linux VMs, to a Docker-based architecture orchestrated by Kubernetes.
One of the things that came to my mind is how we would handle logs in this new infrastructure.
Up until now, each app had its own way of handling logs: some used log4net/log4j to write to the file system, others wrote to GrayLog via a dedicated library.
The main problem I have with that is that one of the core ideas of developing microservices in a Docker environment is that every service should assume as little as possible about the other services or the platform.
So I was looking into how to abstract the logging process away from the application and make it independent of the rest of the infrastructure.
One interesting thing I found was that you can write the logs to standard output (stdout) and then configure Kubernetes to pull these logs and direct them to centralised storage or a centralised logging server (like GrayLog): https://kubernetes.io/docs/concepts/cluster-administration/logging/
I have several concerns with this approach. For one, I haven't seen many companies do it; the most popular solution is to use a dedicated library that logs to the file system.
I am also concerned about performance: in some languages, writing to stdout blocks, whereas a standard logging library queues the logs.
So what about services that output massive amounts of user-related logs?
I'm interested in what you think. I haven't seen this approach used widely; maybe there is a reason for that.
Logging to any stream (file, stdout, GrayLog...) can be either synchronous (blocking) or asynchronous (non-blocking); that choice has nothing to do with the medium you log to per se. It is true that calling System.out.println directly from many threads in Java can cause heavy thread contention, because PrintStream synchronizes each write.
All the major logging frameworks (like log4j) let you log asynchronously to whatever medium you like.
I think your perception that not many companies do this is wrong. Logging to stdout and configuring the underlying infrastructure to forward the logs somewhere else is the de facto standard for PaaS/containerized applications.
So my tip: log to stdout using a good logging framework that writes to the stream asynchronously. Beyond that, you'll probably be fine.
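To illustrate what writing to the stream asynchronously means, here is a minimal hand-rolled Java sketch (AsyncStdoutLogger is a hypothetical name, not a framework class): callers only append to an in-memory queue and return immediately, while a single background thread does the blocking writes to stdout. In practice you would configure your logging framework's own asynchronous appender instead of writing this yourself.

```java
import java.io.PrintStream;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical minimal async logger: callers enqueue, one thread writes. */
public final class AsyncStdoutLogger implements AutoCloseable {
    // Unique sentinel object (compared by identity) that stops the writer thread.
    private static final String STOP = new String("stop");

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final PrintStream out;
    private final Thread writer;

    public AsyncStdoutLogger(PrintStream out) {
        this.out = out;
        this.writer = new Thread(this::drain, "async-log-writer");
        this.writer.setDaemon(true);
        this.writer.start();
    }

    /** Cheap for the caller: only appends to the in-memory queue. */
    public void log(String message) {
        queue.add(message);
    }

    /** Runs on the background thread; the only place that touches the stream. */
    private void drain() {
        try {
            while (true) {
                String line = queue.take();
                if (line == STOP) { // identity check against the sentinel
                    break;
                }
                out.println(line);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            out.flush();
        }
    }

    /** Writes everything queued so far, then stops the writer thread. */
    @Override
    public void close() {
        queue.add(STOP);
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        try (AsyncStdoutLogger log = new AsyncStdoutLogger(System.out)) {
            log.log("service started");
            log.log("handled request id=42");
        }
    }
}
```

The queue in this sketch is unbounded, so a burst of log calls can grow memory without limit; the asynchronous appenders in frameworks such as Log4j 2 and Logback use a bounded queue plus a policy for what to do when it fills up.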

Load Balancing in ASP.NET MVC Web Application. What can/can't be done?

I'm in the middle of developing a web application and have been asked whether it will work behind a load balancer. My initial reaction is yes, since no state is tracked between requests anywhere in the system. However, some application-specific state is loaded on app start (mainly configuration settings from the database).
This data is all read-only. Is it sufficient to rely on the normal cache dependency mechanisms to manage this and invalidate these objects across all the applications in the cluster, or would I have to move to a shared cache system like AppFabric to ensure reliability/consistency?
With diagnostics enabled, I've got numerous logging calls using EventSource.Write and an out-of-process logger picking them up. I assume that in this case I'd need a logger installed on each server in the cluster to pick up the events that server triggers. I'm not too fussed about that, but what is a good way to identify which server in the cluster serviced a request?
If you initialize the data on each server separately and it is read-only, there's no problem: each application instance simply has its own copy.
Yes, you'd need a logger on each instance. To identify the server, you could include the server's IP address in each log entry; that way you can tell which server handled a request (provided you have static IPs, which I assume you do).
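A minimal Java sketch of that suggestion (the NodeTaggedLog helper is hypothetical): resolve the machine's address once at startup and prefix every log line with it. The same idea carries over to EventSource, where you would include the machine name or IP as a field in each event's payload. Depending on host configuration, InetAddress.getLocalHost() may resolve to a loopback address, so a configured node name can be a more reliable choice.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical helper that tags every log line with the node that wrote it. */
public final class NodeTaggedLog {
    private final String nodeId;

    public NodeTaggedLog(String nodeId) {
        this.nodeId = nodeId;
    }

    /** Resolves this machine's IP address once, at startup. */
    public static NodeTaggedLog forLocalHost() {
        try {
            return new NodeTaggedLog(InetAddress.getLocalHost().getHostAddress());
        } catch (UnknownHostException e) {
            return new NodeTaggedLog("unknown-node"); // fall back rather than fail startup
        }
    }

    /** Formats a line as "[node] message". */
    public String format(String message) {
        return "[" + nodeId + "] " + message;
    }

    public static void main(String[] args) {
        NodeTaggedLog log = NodeTaggedLog.forLocalHost();
        System.out.println(log.format("GET /orders served"));
    }
}
```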

How to run two Grails apps on the same machine and have them not share a RabbitMQ

I have a Grails app running with a single RabbitMQ node, and it works great. I want to start the same app a second time on the same machine on a different port. Currently, each app picks up jobs sent by both apps. I want their RabbitMQ setups to be independent. What is the easiest way to ensure that each app only responds to the messages it sends? Multiple RabbitMQ queues?
You can provide a virtual host entry in the Grails configuration:
rabbitmq.connectionfactory.virtualHost: the name of the virtual host to connect to
Define two different vhosts in RabbitMQ, and each Grails app will have its own configured area to use. Messages sent through one vhost are only available on that vhost, which effectively separates the two Grails apps without changing the queue setup or other internals of either app; only the connection configuration changes.
Remember that access control is performed on a per-vhost basis, so you'll have to give your user access to each vhost in RabbitMQ.
As @fiskfisk said, multiple vhosts are an option, and would work particularly well if you have a complex set of queues, exchanges, and bindings. There are some downsides to using a new vhost for the second application, including duplicated access-control management and some minor performance overhead.
If you have a fairly simple queue/exchange/binding setup, I would suggest pointing the second app at a queue with a different name, or making the app configurable at runtime so that it either uses a different queue or leverages RabbitMQ's topic-based routing, with each app flagging its messages with an app-specific prefix in the routing key (or something similar).
One advantage of using topic routing to differentiate apps is that you can easily dip into the full stream of messages and do other things with that stream that you didn't foresee initially, including things like archival logging or audit logging, as well as other metrics collection or analysis.
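To make the topic-routing idea concrete, the sketch below re-implements in Java the matching rule a RabbitMQ topic exchange applies server-side: routing keys and binding keys are dot-separated words, "*" matches exactly one word and "#" matches zero or more words. It is for illustration only; in practice each app instance would publish with a routing key such as "app1.jobs.resize", bind its queue with "app1.#", and let RabbitMQ do the matching. The app1/app2 names are hypothetical.

```java
/** Illustrative re-implementation of AMQP topic-exchange key matching. */
public final class TopicMatch {

    /** True if routingKey matches bindingKey under topic-exchange rules. */
    public static boolean matches(String bindingKey, String routingKey) {
        String[] pattern = bindingKey.split("\\.", -1);
        String[] words = routingKey.split("\\.", -1);
        return match(pattern, 0, words, 0);
    }

    private static boolean match(String[] p, int i, String[] w, int j) {
        if (i == p.length) {
            return j == w.length;           // pattern used up: key must be too
        }
        if (p[i].equals("#")) {
            // '#' may swallow zero or more words: try every split point.
            for (int k = j; k <= w.length; k++) {
                if (match(p, i + 1, w, k)) {
                    return true;
                }
            }
            return false;
        }
        if (j == w.length) {
            return false;                   // pattern needs a word, none left
        }
        boolean wordOk = p[i].equals("*") || p[i].equals(w[j]);
        return wordOk && match(p, i + 1, w, j + 1);
    }

    public static void main(String[] args) {
        // Each app instance binds its own queue with its own prefix.
        System.out.println(matches("app1.#", "app1.jobs.resize"));   // true
        System.out.println(matches("app1.#", "app2.jobs.resize"));   // false
        // An audit/archival consumer can still tap every app's jobs.
        System.out.println(matches("*.jobs.#", "app2.jobs.resize")); // true
    }
}
```

The last line shows the flexibility argument from the answer: a separate consumer bound with a wider pattern sees both apps' traffic without either app changing.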
tl;dr;
For long-term flexibility, have each instance of your application send messages to queues based on topic-routing.
For quick-and-dirty / get-it-working-yesterday, use a separate vhost for each instance of your application.

Sharing data system wide

Good evening.
I'm looking for a method to share data from my application system-wide, so that other applications can read that data and do whatever they want with it (e.g. format it for display, use it for logging, etc.). The data is updated dynamically, so the sharing method itself must support live updates.
WMI came to mind first, but then you've got the issue of applications pausing while reading from WMI. Additionally, I have no real idea how to set up my own namespace or classes, if that's even possible in Delphi.
Using files is another idea, but that could get disk-heavy, and it's a really awful method for real-time data.
Using a driver would probably be the best option, but that's a little too intrusive on the user's end for my liking, and I have no idea where to even start with it.
WM_COPYDATA would be great, but I'm not sure whether it's dynamic enough, or whether it will be heavy on resources.
Using TCP/IP would be the best choice over a network, but it is obviously of little use when running on a single system with no networking requirement.
As you can see, I'm struggling to figure out where to go with this. I don't want to commit to one method only to find that it's not going to work out in the end. Essentially, I want something like a service or background process that records data and allows other applications to read it; I'm just unsure which method to use. I'd prefer NOT to need elevation/UAC for this, but if need be, I'll settle for it.
I'm running in Delphi 2010 for this exercise.
Any ideas?
You want to create some kind of client-server architecture, based on what is called IPC (inter-process communication).
Using WM_COPYDATA is a very good idea. I found it to be very fast, lightweight, and efficient on a local machine. It can also be broadcast across the system to all applications at once (use this with care, since some applications may not handle it correctly).
You can also share memory using memory-mapped files. This may be the fastest IPC option for huge amounts of data, but synchronization is a bit complex (if you want to share more than one buffer at once).
Named pipes are a good candidate for local communication. They tend to be difficult to implement and configure over a network due to security restrictions on modern Windows versions (and over a network they rely on TCP/IP anyway, so you would be better off using TCP/IP directly).
My personal advice is to implement your data sharing behind abstract classes, so that several implementations can be provided. You may use WM_COPYDATA first, then switch to named pipes, TCP/IP or HTTP in order to spread your application over a network.
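As a rough illustration of the memory-mapped approach, here is a Java sketch using FileChannel.map (in Delphi you would use the Windows CreateFileMapping/MapViewOfFile API instead). The slot layout (a 4-byte length followed by UTF-8 bytes) and the SharedSlot name are assumptions. It deliberately leaves out cross-process synchronization, which is exactly the "bit complex" part: a reader in another process can observe a half-written value unless you add a lock or a versioning scheme.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

/** Hypothetical shared slot; layout: [int length][UTF-8 bytes]. */
public final class SharedSlot {
    private static final int CAPACITY = 4096;
    private final MappedByteBuffer buffer;

    /** Maps the file; every process mapping the same file sees the same bytes. */
    public SharedSlot(Path file) {
        MappedByteBuffer mapped;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // The mapping stays valid after the channel is closed.
            mapped = channel.map(FileChannel.MapMode.READ_WRITE, 0, CAPACITY);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        this.buffer = mapped;
    }

    /** Writes a value. 'synchronized' only guards threads using this instance. */
    public synchronized void write(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        if (bytes.length > CAPACITY - Integer.BYTES) {
            throw new IllegalArgumentException("value too large for slot");
        }
        ByteBuffer view = buffer.duplicate(); // independent position/limit
        view.putInt(0, bytes.length);
        view.position(Integer.BYTES);
        view.put(bytes);
    }

    /** Reads the current value (no protection against a concurrent writer). */
    public synchronized String read() {
        ByteBuffer view = buffer.duplicate();
        int length = view.getInt(0);
        if (length < 0 || length > CAPACITY - Integer.BYTES) {
            throw new IllegalStateException("corrupt or half-written slot");
        }
        byte[] bytes = new byte[length];
        view.position(Integer.BYTES);
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Path file = Paths.get(System.getProperty("java.io.tmpdir"), "shared-slot.bin");
        SharedSlot producer = new SharedSlot(file);
        SharedSlot consumer = new SharedSlot(file); // in real use: another process
        producer.write("cpu=42%");
        System.out.println(consumer.read()); // cpu=42%
    }
}
```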
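A Java sketch of that design, with an interface playing the role of the abstract class (DataChannel, InProcessChannel and ChannelDemo are hypothetical names). Only an in-process implementation is shown; a window-message, named-pipe or TCP/IP implementation would implement the same interface, so callers would not change when you switch transports.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Abstraction the rest of the application codes against (hypothetical). */
interface DataChannel {
    void publish(String key, String value);

    Optional<String> read(String key);
}

/** Simplest implementation: producer and consumers share one process. */
final class InProcessChannel implements DataChannel {
    private final Map<String, String> values = new ConcurrentHashMap<>();

    @Override
    public void publish(String key, String value) {
        values.put(key, value);
    }

    @Override
    public Optional<String> read(String key) {
        return Optional.ofNullable(values.get(key));
    }
}

// Window-message, named-pipe, TCP/IP or HTTP transports would be further
// DataChannel implementations; callers such as describe() stay unchanged.

public final class ChannelDemo {
    /** Example consumer: depends only on the interface, not the transport. */
    static String describe(DataChannel channel) {
        return channel.read("temperature").map(v -> "temp=" + v).orElse("no data");
    }

    public static void main(String[] args) {
        DataChannel channel = new InProcessChannel();
        System.out.println(describe(channel)); // no data
        channel.publish("temperature", "21.5");
        System.out.println(describe(channel)); // temp=21.5
    }
}
```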
For our Open Source Client-Server ORM, we implemented several protocols, including WM_COPYDATA, named pipes, HTTP, and direct in-process access. You can take a look at the source code for implementation patterns. Here are some benchmarks, to give you data from real implementations:
Client server access:
- Http client keep alive: 3001 assertions passed
first in 7.87ms, done in 153.37ms i.e. 6520/s, average 153us
- Http client multi connect: 3001 assertions passed
first in 151us, done in 305.98ms i.e. 3268/s, average 305us
- Named pipe access: 3003 assertions passed
first in 78.67ms, done in 187.15ms i.e. 5343/s, average 187us
- Local window messages: 3002 assertions passed
first in 148us, done in 112.90ms i.e. 8857/s, average 112us
- Direct in process access: 3001 assertions passed
first in 44us, done in 41.69ms i.e. 23981/s, average 41us
Total failed: 0 / 15014 - Client server access PASSED
As you can see, the fastest is direct in-process access, then WM_COPYDATA, then HTTP with keep-alive, then named pipes, then HTTP in multi-connect mode (HTTP here meaning TCP/IP). The message was around 5 KB of JSON data containing 113 rows, retrieved from the server, then parsed on the client 100 times (yes, our framework is fast :) ). For huge blocks of data (like 4 MB), WM_COPYDATA is slower than named pipes or HTTP over TCP/IP.
There are several IPC (inter-process communication) methods in Windows. Your question is rather general; I can suggest memory-mapped files to store your shared data, plus broadcasting a message via PostMessage to inform other applications that the shared data has changed.
If you don't mind running another process, you could use one of the NoSQL databases.
I'm pretty sure that a lot of them won't have Delphi drivers, but some of them have REST drivers and hence can be driven from pretty much anything.
Memcached is an easy way to share data between applications. Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects).
A Delphi 2010 client for Memcached can be found on google code:
http://code.google.com/p/delphimemcache/
related question:
Are there any Caching Frameworks for Delphi?
Googling for 'delphi interprocess communication' will give you lots of pointers.
I suggest you take a look at http://madshi.net/, especially MadCodeHook (http://help.madshi.net/madCodeHook.htm)
I have good experience with the product.

Getting real-time variables from a Java runtime environment's (virtual machine) memory?

Say I have a couple of Java runtime environments running on my system, used by several applications. I would like to programmatically interact with these applications by reading their memory.
A typical approach would be to look directly into the application's memory; however, for Java applications this seems practically impossible because of the Java runtime environment. Instead, one has to look into the memory of the Java runtime environment, or debug it.
[ the above is what I think I have learned from several searches on the web, if anything is false, please correct me ]
Note: keep in mind that I do not own the application I want to monitor, so I have neither the source code nor the ability to launch the application in "debug mode" or similar.
Now, as this is a non-production project, I would prefer an easy way out: an existing Windows GUI application that can already monitor the variables of a Java runtime environment and its applications, from which I could programmatically crawl the values for use in my own project. If any such program exists, I would really appreciate the help.
If that is not possible, how would I (programmatically) retrieve these variables otherwise?
It's difficult to answer this precisely without knowing much more about the application involved, its structure etc. Note that objects move around in the JVM's memory, and so you can't monitor the actual application memory directly.
So the first question is: how do you know what you want to monitor (e.g. which variables or objects) without the source code?
Given that you've worked this out, it strikes me that you have two options.
- Decompile and instrument the application (perhaps statically, perhaps using AOP), then recompile it. This assumes that the application is not obfuscated and that you're not in breach of its license.
- Wrap the application in a thin layer that uses reflection to identify the variables you're interested in and tracks their values as the process executes. I suspect you'll still have to decompile to identify these variables.
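A minimal sketch of the reflection part of the "thin layer" option. LegacyCounter and its private field hits are hypothetical stand-ins for whatever class and field you identify by decompiling. The wrapper has to run inside the same JVM as the application, for example by launching the application's main class from your own main method. On Java 9 and later, setAccessible can fail for classes in named modules unless their package is opened (for example with --add-opens); classes loaded from the plain classpath are not affected.

```java
import java.lang.reflect.Field;

public final class FieldPeek {

    /** Hypothetical stand-in for a class inside the monitored application. */
    public static final class LegacyCounter {
        private int hits; // private: not reachable without reflection

        public void hit() {
            hits++;
        }
    }

    /** Reads a (possibly private) instance field by name, searching superclasses too. */
    public static Object readField(Object target, String fieldName) {
        for (Class<?> type = target.getClass(); type != null; type = type.getSuperclass()) {
            try {
                Field field = type.getDeclaredField(fieldName);
                field.setAccessible(true); // may throw for classes in named modules
                return field.get(target);
            } catch (NoSuchFieldException e) {
                // Not declared on this class; keep walking up the hierarchy.
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + fieldName, e);
            }
        }
        throw new IllegalArgumentException("No field named " + fieldName);
    }

    public static void main(String[] args) {
        LegacyCounter counter = new LegacyCounter();
        counter.hit();
        counter.hit();
        System.out.println(readField(counter, "hits")); // 2
    }
}
```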
You can monitor these values remotely by creating an MBean, exposing it via JMX, and watching it in JConsole. That part is pretty trivial compared to the initial step of finding the variables you're interested in.
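A sketch of the JMX side using only the JDK (javax.management). It uses an MXBean, since JMX treats a public interface whose name ends in MXBean as a management interface; WatchedValues, its Hits attribute and the object name are hypothetical. Your own polling code would call update() with values read via reflection, and JConsole would show them under the registered object name.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicInteger;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public final class WatchExporter {

    /** Management interface; getHits() is exposed as read-only attribute "Hits". */
    public interface WatchedValuesMXBean {
        int getHits();
    }

    /** Holds the latest value scraped from the monitored application. */
    public static final class WatchedValues implements WatchedValuesMXBean {
        private final AtomicInteger hits = new AtomicInteger();

        @Override
        public int getHits() {
            return hits.get();
        }

        /** Called by your polling code whenever it reads a new value. */
        public void update(int newValue) {
            hits.set(newValue);
        }
    }

    private static final MBeanServer SERVER = ManagementFactory.getPlatformMBeanServer();

    /** Registers the bean on the platform MBean server, where JConsole can find it. */
    public static ObjectName register(WatchedValues values, String objectName) {
        try {
            ObjectName name = new ObjectName(objectName);
            if (SERVER.isRegistered(name)) {
                SERVER.unregisterMBean(name); // replace a stale registration
            }
            SERVER.registerMBean(values, name);
            return name;
        } catch (JMException e) {
            throw new IllegalStateException("JMX registration failed", e);
        }
    }

    /** Reads an attribute back through JMX, the way a monitoring client would. */
    public static Object readAttribute(ObjectName name, String attribute) {
        try {
            return SERVER.getAttribute(name, attribute);
        } catch (JMException e) {
            throw new IllegalStateException("JMX read failed", e);
        }
    }

    public static void main(String[] args) {
        WatchedValues values = new WatchedValues();
        ObjectName name = register(values, "example.monitor:type=WatchedValues");
        values.update(42);
        System.out.println(readAttribute(name, "Hits")); // 42
    }
}
```

In a real monitor the JVM stays alive while your polling loop runs, which gives JConsole time to attach and watch the attribute change.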
