Multiple simultaneous uploads to a website

I am building an ASP.NET website that accepts a PDF as input and processes it. I am generating an intermediate file with a particular name. But I want to know: if multiple users are using the same site at the same time, how will the server handle this?
How can I handle it? Will multi-threading do the job? What about the file names of the intermediate files I am generating: how can I make sure they won't overwrite each other? And how do I achieve good performance?
Sorry if the question is too basic for you.

I'm not into .NET, but it sounds like a generic problem anyway, so here are my two cents.
Like you said, multithreading (as different requests usually run in different threads) takes care of most problems of this kind, since every method invocation involves new objects run in a separate context.
There are exceptions, though:
- Singleton (global) objects whose operations have side effects
- Other resources (files, etc.) - this is exactly your case
So in the case of files, I'd ponder these (mutually exclusive) alternatives:
(1) Never write the uploaded file to disk; instead hold it in memory and process it there (e.g. in a byte array). In this case you're leveraging the thread-per-request protection. This cannot be applied if your files are really big.
(2) Write the files to a temporary location under highly randomized names (like a UUID) so the names won't clash if two users upload at the same time (see the sketch below).
I'd go with (1) whenever possible.
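If you do end up with (2), here's a minimal C# sketch of the idea; uploadedFile stands for the HttpPostedFile taken from the request, and all names are illustrative:

    // Build a collision-free name in the temp directory using a GUID.
    string tempPath = Path.Combine(Path.GetTempPath(),
                                   Guid.NewGuid().ToString("N") + ".pdf");
    using (var output = File.Create(tempPath))
    {
        // uploadedFile is assumed to be the HttpPostedFile from the request.
        uploadedFile.InputStream.CopyTo(output);
    }
    // ... process tempPath, then delete it when done.

Since every request generates its own GUID, two simultaneous uploads can never target the same file.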
Best


Should I use a YAML file or the DB to store my success/error messages in a Rails app?

In my Rails app, after executing some code I want to send Slack messages to users to notify them of the execution result. There are multiple processes for which I want to send these messages, and in short, I need somewhere to store message templates for successes/errors (they're just short strings, like "Hi, we've successfully done x!", but they differ for each process).
Right now, I have a SlackMessage model in the DB from which you can retrieve the message content. However, I have heard that it's better to manage custom messages like this in a YAML file, since it's easier to add/edit the messages later on (much like locale files, even though those are for translations).
What is the best practice for this kind of scenario? If it's not to use a DB, I'd appreciate pointers or a link on how to do it (regarding YAML files, the only material I could find was on internationalisation).
Why don't you use the already existing I18n module in Rails? It is perfect for storing messages, and gives you the ability to add translations should you ever need them in the future.
Getting a message is simple:
Slack.message(I18n.t(:slack_message, scope:'slack'))
In this case you need a translation file like this:
en:
  slack:
    slack_message: This is the message you are going to select.
Read more on I18n: https://guides.rubyonrails.org/i18n.html
YAML is generally much slower to load data from than a DB. Additionally, YAML parsers usually load all of the data, even if there are multiple documents in the YAML stream.
For programs that have a long run-time and use a large share of the messages, loading from YAML is usually not a problem. But in short-running programs the loading can be a significant part of the run-time, and techniques like delayed loading and caching might not help. As an example: some time ago I got a PR for my YAML library that delayed the instantiation of the library's regular expressions, because building them eagerly slowed the startup of some programs.
If you have many messages, they all stay in memory after loading from YAML, which might be a problem. With a DB it is much more common to retrieve only what is needed, and to rely on the DB to do that efficiently (caching, etc.).
If the advantages and criteria mentioned above don't help you decide, you can also have it both ways: the ease of reading/editing of YAML plus the speed, caching, etc. of a DB. "Just" convert the YAML stream to a DB, either explicitly after editing the YAML document or on first use by your program (by comparing the files' timestamps). That is the approach programs like Postfix take with postmap (although their inputs are plain text, not YAML).
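A rough sketch of that last approach in Rails, reusing the SlackMessage model from the question; the key/body columns and the file path are my assumptions:

    require 'yaml'

    # Re-import config/slack_messages.yml only when the file is newer than
    # the most recent row, so the DB acts as a cache of the YAML document.
    def sync_slack_messages!(path = Rails.root.join('config', 'slack_messages.yml'))
      last_import = SlackMessage.maximum(:updated_at)
      return if last_import && File.mtime(path) <= last_import

      YAML.load_file(path).each do |key, body|
        SlackMessage.find_or_initialize_by(key: key).update!(body: body)
      end
    end

Call it from an initializer or at first use; after that, reads are plain DB queries with all the usual DB caching.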

Sharing a large array with all users in a Rails app

I have inherited an app that generates a large array for every user who visits it. I recently discovered that the array is identical for nearly all users!
Now I want to somehow make one copy of it so it is not built over and over again. I have thought of a few options and wanted input to see which one is the best:
1) Create a model and shove the data into the database
2) Create a YAML file and have the app load it when it initializes.
I personally like the model idea, but a few engineers at work feel it does not deserve to be a full model. 97% of the time users will see the exact same thing, but 3% of the time they will get a slightly different array (a few elements will have changed).
Any other approaches I should consider? Thanks in advance.
Remember that if you store the data in the DB, each request which requires the data will have to execute a DB query to pull it out. If you are running multiple server threads, each thread could have its own copy in memory (if they are all handling requests which require the use of the array). In that case, you wouldn't be saving any memory (though you might save time from not having to regenerate the array).
If you are running multiple server processes (not threads), and if the array contents change as the application is running, and the changes have to be visible to all the processes, caching in memory won't work. You will have to use the DB in that case.
From the information in your comment, I suggest you try something like this:
Store the array in your DB, and make sure that the record(s) used have created/updated timestamps. Cache the contents in memory using a constant/global variable/class variable. Also store the last time the cache was updated.
Every time you need to use the array, retrieve the relevant "updated" timestamp from the DB. (You may need to use hand-coded SQL and ModelName.connection.execute to avoid pulling back all the data in the record, which ActiveRecord will probably do.) If the timestamp is later than the last time your cache was updated, pull the array from the DB and update your cache.
Use a Mutex (require 'thread') when retrieving/updating the cached data, in case your server setup uses multiple threads. (I don't think that Passenger does, but I have had problems resembling threading issues when using Passenger+RMagick, so I would still use a Mutex to be safe.)
Wrap all the code which deals with the cached array in a library class (or a class method on the model used to store the data), so the details of cache management don't spill over into the rest of the application.
Do a little bit of performance testing on the cache setup using Benchmark.measure {}. If a bug in the setup actually made performance worse rather than better, that would be sad...
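Putting the above together, a rough sketch; the model, columns, and class name are assumptions, not from the question:

    require 'thread'

    # Wraps the cached array plus the freshness check described above.
    class SharedArrayCache
      @mutex = Mutex.new
      @data = nil
      @cached_at = nil

      def self.fetch
        @mutex.synchronize do
          # Cheap query: pull only the latest timestamp, not the data.
          latest = SharedArray.maximum(:updated_at)
          if @cached_at.nil? || (latest && latest > @cached_at)
            @data = SharedArray.order(:position).pluck(:value) # full reload
            @cached_at = latest
          end
          @data
        end
      end
    end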
I'd go with option 2. You can add two constants (for the 97% and 3%) that load from a YAML file when the app initializes. That ought to shrink your memory footprint considerably.
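For example, a hypothetical config/initializers/shared_array.rb (the file and key names are made up):

    require 'yaml'

    # Loaded once at boot; both constants stay in memory for the app's lifetime.
    data = YAML.load_file(Rails.root.join('config', 'shared_array.yml'))
    COMMON_ARRAY  = data['common'].freeze   # what 97% of users see
    VARIANT_ARRAY = data['variant'].freeze  # the 3% case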
Having said that, yikes, this is just a band-aid on a hack, but you knew that already. I'd consider putting some time into a redesign, if you have that luxury.

ASP.NET MVC 3 in-memory data store

I have a project which provides users with a list of current tasks that need to be completed. Any user can complete any task, so to ensure that only one user is working on a task at a time I need to be able to 'lock' it. I'm using SignalR for this: a user requests a lock on a task, and if they are successful (i.e. if no one else has locked it) they will be able to access the further information that they need.
My problem is how to store the list of locked tasks. The original plan was simply to add an additional bit field 'IsLocked' to the Task table and update this when the user requested a lock and when the task was unlocked. We have about 300 concurrent users, however, and a task takes only about 3-4 minutes, meaning huge numbers of additional - and tiny - queries on the database. Therefore we were wondering about in-memory storage, simply storing a list of task ids in a 'lockedTasks' list.
I had considered using caching, but am unsure of the best way to do this, or whether better alternatives exist. If anyone has experience with this, some advice would be great, thanks.
I would avoid in-process memory completely, as IIS is not that great with it: if the application pool gets recycled for any reason, your list is simply gone!
Maybe a memcache-style system? That at least does not lose data the way described above, but...
My advice is something in between: file I/O is faster than requesting data from a database, especially one that is not on the same machine (which, for security reasons, it should never be). So, just to hold your list, why not use one of the currently popular NoSQL databases?
MongoDB is a document database that has a .NET library and is easy to use. It is not as fast as memory, but much quicker than a full relational database for what you want.
Normally the NoSQL database can be hosted in the App_Data folder, so it will be very fast to access, and you can hold just the task_id and user_id of all locked tasks there.
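A sketch of the lock itself with the current MongoDB .NET driver (collection and field names are illustrative, and this driver API is newer than the original answer):

    using MongoDB.Bson;
    using MongoDB.Driver;

    var locks = new MongoClient("mongodb://localhost")
        .GetDatabase("tasks")
        .GetCollection<BsonDocument>("taskLocks");

    // A unique index on taskId means at most one lock document per task.
    locks.Indexes.CreateOne(new CreateIndexModel<BsonDocument>(
        Builders<BsonDocument>.IndexKeys.Ascending("taskId"),
        new CreateIndexOptions { Unique = true }));

    bool TryLock(int taskId, int userId)
    {
        try
        {
            locks.InsertOne(new BsonDocument { { "taskId", taskId }, { "userId", userId } });
            return true;  // insert succeeded: we hold the lock
        }
        catch (MongoWriteException)  // duplicate key: someone else holds it
        {
            return false;
        }
    }

    // Releasing the lock is just deleting the document:
    // locks.DeleteOne(Builders<BsonDocument>.Filter.Eq("taskId", taskId));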
Have you considered stateful filters?
Check out these links for more info:
- ASP.NET MVC Filters and Statefulness
- Brad Wilson: Advanced MVC 3 (Video)
- Brad Wilson: Advanced MVC 3 (PDF)
I'm sorry, but if your app can't handle a single query every 3-4 minutes x 300 users, then you're doing something very wrong. Just browsing a site typically generates orders of magnitude more queries than that.

Erlang in-memory linked list -- Exhausting memory?

I tried to make an Erlang in-memory datastore that would receive messages and add them to a list; the current incarnation is below. The trouble is, I'm receiving about 200 messages per second, and this easily exhausts the available memory.
Once a minute, I send a {write, Pid} message that should clear out and clean up this list, but it doesn't look like it's being garbage collected.
What am I doing wrong? I think I'm approaching this from the completely wrong direction...
datastore(Db) ->
    receive
        {put, Data} ->
            datastore(lists:concat([Data, Db]));
        {write, Responder} ->
            ScratchName = "ScratchFile.dat",
            {ok, ScratchDevice} = file:open(ScratchName, [write]),
            file:write(ScratchDevice, Db),
            ok = file:close(ScratchDevice),
            Responder ! {load, ScratchName},
            datastore([])
    end.
First spontaneous comment: file:open with [write] will open the file, truncate it, and then write to it, so every pass through the loop overwrites any previous data. If the Responder is slow with its loading of the file, the file could contain data you did not expect.
Second reaction is that you don't have to do this buffering yourself. If you open the file with the option {delayed_write, Size, Delay}, and set Size and Delay to values that fit your purpose, you get precisely what you are trying to implement here by just writing all the time.
Third reaction is that you are probably doing the wrong thing if you use a file to communicate between different parts of your system. What are you attempting to do?
P.S. If you need a new random filename, you can easily generate one with erlang:now/0 and io_lib:format/2. As an added bonus, such names will sort in creation order.
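A sketch of both suggestions; the buffer size, flush delay, and name format are illustrative:

    %% Let the runtime buffer writes: up to 64 KB or 2 s between physical writes.
    {ok, Dev} = file:open("ScratchFile.dat",
                          [append, {delayed_write, 64 * 1024, 2000}]),

    %% A unique, creation-ordered filename built from the current timestamp.
    {Mega, Sec, Micro} = erlang:now(),
    Name = lists:flatten(
               io_lib:format("scratch-~B~6..0B~6..0B.dat", [Mega, Sec, Micro])).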
This is a very wrong way of buffering in Erlang. Data structures such as ETS (http://www.erlang.org/doc/man/ets.html) have been designed to handle thousands and millions of in-memory Erlang terms with ease. Please do not use lists or queues for this much data. If one part of your code produces data that other parts of the application are supposed to consume, and you know the consumers will consume it at a slower rate than it is produced, then you need a more robust way of buffering: ETS tables.
Another thing: usually, processes are a point of failure in a system. If a process is used to buffer or hold on to essential data, even if that data is transient but critical, what happens when the process exits or dies? ETS tables have been designed so that they can provide data access to all processes, even whole applications, within the same VM (if the table is of type public). That way, all processes can read as much of the data as they want (concurrently), and you ensure consistency by having a single writer/updater.
ETS tables rarely fail in an application, compared to the frequency at which processes fail. More recently, a method was introduced that helps to rescue the data in a failing ETS table (ets:give_away/3).
Another thing: in a comment above you mentioned that you are working for a large company. With large teams, it's better to evaluate a number of options and test them intensively, depending on the nature of the application you are developing. To avoid side effects, it's best to identify which data structures are best for which job. For in-memory storage capable of handling 200 messages per second, proper testing will show that lists and files lose out to ETS tables.
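A minimal ETS-backed buffer along those lines (module and table names are made up for illustration):

    -module(msg_buffer).
    -export([init/0, put/1, drain/0]).

    %% Public, named, ordered table: any process may read; keep a single writer.
    init() ->
        ets:new(?MODULE, [named_table, public, ordered_set]).

    %% Timestamp keys preserve insertion order.
    put(Data) ->
        ets:insert(?MODULE, {erlang:now(), Data}).

    %% The consumer takes everything at once and empties the table.
    drain() ->
        All = ets:tab2list(?MODULE),
        ets:delete_all_objects(?MODULE),
        All.

Unlike a list held in a process heap, the table is unaffected by a reader crashing, and drain/0 actually releases the memory.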

Session Management in TWebModule

I am using a TWebModule with Apache. If I understand correctly, Apache will spawn another instance of my TWebModule object if all previously created objects are busy processing requests. Is this correct?
I have created my own SessionObject and a TStringList to store the instances. The TStringList is created in the initialization section at the bottom of the source file holding the TWebModule object. I am finding that initialization can be called multiple times (presumably when Apache has to spawn another process).
Is there a way I could have a global "Sessions" TStringList to hold all of my session objects? Or is the safe, proper method to store session information in a database and retrieve it based on a cookie for each request?
The reason I want this is to cut down on database access and instead hold session information in memory.
Thanks.
As Stijn suggested, using separate storage to hold the session data really is the best way to go. Even better is to write your application so that the web browser inherently carries the state in the design. This will greatly increase your ability to scale the application into the thousands or tens of thousands of concurrent users with much less hardware.
Intraweb is a great option, but suffers from the scaling issue in the sense that more concurrent users, even idle ones, require more hardware to support them. It is far better to design from the outset a server that runs as internally stateless as possible. Of course, if you have a fixed number of users and don't expect any growth, then this is less of an issue.
That's odd. If initialization sections get called more than once, it might be because the DLL is loaded into separate process spaces. One option I can think of is to check whether the "Sessions" object already exists before you create it in initialization. If the DLL really is loaded into separate processes, this will not help, and then I suggest writing a central session-storage process and using inter-process communication from within your TWebModule (there are a few methods: messages, named pipes, COM...).
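The guard might look like this; it's only a sketch, and it only protects against double initialization within a single process:

    initialization
      if Sessions = nil then           // global vars start nil, so create once
        Sessions := TStringList.Create;

    finalization
      FreeAndNil(Sessions);            // requires SysUtils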
Intraweb in application mode really handles session management and database access very smoothly, and scales well. I've commented on it previously. While this doesn't directly answer the question you asked, when I faced the same issues Intraweb solved them for me.
