How can I control the DB4O cache - memory

I am having difficulty finding documentation about DB4O. How can I control the DB4O cache? I think its connection is consuming all of our server's memory. I want to set a minimal cache configuration.
Could anyone recommend some documentation or send me some examples?
I'm grateful to anyone who can help.

If you have already configured the cache as Gamlor said, the problem may be a corrupted object; you can delete that object and defragment your database to improve performance.

I would recommend looking at it with a profiler. Then you can see what kind of classes take up space.
A typical pitfall with db4o is that an 'ObjectContainer' is kept open for a long time with a high activation depth. Then a large part of your object graph is kept in memory.
Some knobs to try:
configuration.common().weakReferenceCollectionInterval(milliseconds);
This sets how often db4o cleans up its weak-reference cache. If you lower the interval, it cleans up more aggressively.
There is also a file-level cache. I think it is quite small by default. Anyway, here is how to set it:
Storage fileStorage = new FileStorage();
// A cache of 128 pages, each 1024 bytes in size, gives a 128 KB cache
Storage cachingStorage = new CachingStorage(fileStorage, 128, 1024);
configuration.file().storage(cachingStorage);
Maybe there are more caches. I don't remember all of them.
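To make the page arithmetic concrete, here is a minimal, self-contained sketch of a bounded page cache in plain Java. It is not db4o's implementation: the class name `PageCache` is hypothetical, and least-recently-used eviction is assumed only for illustration. The point is that memory held by cached pages is capped at pageCount × pageSize bytes, which is why `CachingStorage(fileStorage, 128, 1024)` corresponds to roughly 128 KB.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded page cache, NOT db4o's CachingStorage internals.
// Keeps at most pageCount pages of pageSize bytes each and evicts the
// least recently used page (an assumed policy) when the limit is exceeded.
class PageCache {
    private final int pageCount;
    private final int pageSize;
    private final LinkedHashMap<Long, byte[]> pages;

    PageCache(int pageCount, int pageSize) {
        this.pageCount = pageCount;
        this.pageSize = pageSize;
        // accessOrder = true: iteration order is least-recently-used first
        this.pages = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > PageCache.this.pageCount;
            }
        };
    }

    // Upper bound on the memory held by cached page buffers, in bytes.
    long capacityBytes() {
        return (long) pageCount * pageSize;
    }

    // Returns the cached page, or "reads" it (here: a zero-filled buffer)
    // and caches it, evicting the least recently used page if full.
    byte[] page(long pageNumber) {
        return pages.computeIfAbsent(pageNumber, n -> new byte[pageSize]);
    }

    int cachedPages() {
        return pages.size();
    }
}
```

With the numbers from the db4o snippet, `new PageCache(128, 1024).capacityBytes()` is 131072 bytes, i.e. 128 KB; raising either argument raises the memory ceiling proportionally.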


Use log4j 2 for writing to data files or database table

I used log4j (v. 1) in the past and was glad to learn that a major refactoring was done to the project, resulting in log4j 2, which solves the issues that plagued version 1.
I was wondering if I could use log4j 2 to write to data files, not only log files.
The application I will be soon developing will need to be able to receive many events from different sources and write them very fast either to a data file or to a database (I haven't decided which yet).
The thread that receives the events must not be blocked by I/O while attempting to write events, so log4j2's Asynchronous Loggers, based on the LMAX Disruptor library, will definitely fit this scenario.
Moreover, my application must be able to recover from either a 'not enough space on disk' or an 'unable to reach database' condition, when writing to a data file or to a database table, respectively. In other words, when the application runs out of disk space or the database is temporarily unavailable, my application needs to keep events in memory, wait for storage to become available, and then write all waiting events to disk or to the database.
Do you think I can do this with log4j?
Many thanks for your help.
Regards,
Nuno Guerreiro
Yes.
I'm aware of at least one production implementation in a similar scenario, in which gathered events are written to disk at high throughput.
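The "keep events in memory until storage comes back" requirement can be sketched without relying on any particular log4j 2 API. In the plain-Java sketch below, `EventSink` and `BufferingWriter` are hypothetical names, not log4j classes: the receiving thread only enqueues (no I/O), and a single background thread calls `drain()`, which writes events in order and, if the sink fails, leaves the unwritten events queued for the next attempt.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical destination: a data file, a database table, etc.
interface EventSink {
    // Expected to throw if storage is unavailable (disk full, database down).
    void write(String event) throws Exception;
}

class BufferingWriter {
    // Unbounded for simplicity; a real system needs a bound or spill policy.
    private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();
    private final EventSink sink;

    BufferingWriter(EventSink sink) {
        this.sink = sink;
    }

    // Called by the receiving thread: never blocks on I/O.
    void submit(String event) {
        pending.add(event);
    }

    // Called periodically by ONE background thread (peek/poll assumes a
    // single consumer). Writes in FIFO order; on failure the failed event
    // stays at the head of the queue, so nothing is lost or reordered.
    // Returns the number of events written in this call.
    int drain() {
        int written = 0;
        String next;
        while ((next = pending.peek()) != null) {
            try {
                sink.write(next);
            } catch (Exception storageUnavailable) {
                break; // storage is down: retry on the next drain() call
            }
            pending.poll();
            written++;
        }
        return written;
    }

    int pendingCount() {
        return pending.size();
    }
}
```

A scheduler (for example a `ScheduledExecutorService`) could call `drain()` every few hundred milliseconds; while storage is down, events simply accumulate, so the in-memory backlog must be watched as closely as disk space.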
Write to a volume other than your system volume to minimize the chances of system crashes due to disk space overrun.
Up-front capacity planning helps ensure a hardware configuration with adequate resources to handle the projected average load and bursts for a reasonable period of time.
Do not let the system run out of disk space :). Keep track of disk usage, and proactively drop older data in extreme circumstances.
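For keeping track of disk usage, the JDK already exposes free-space figures through `java.nio.file.FileStore`. A minimal sketch, where the class name `DiskHeadroom` and the fixed minimum-headroom threshold are assumptions for illustration:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;

class DiskHeadroom {
    // Returns true if the volume holding `dir` has at least minFreeBytes
    // available to this JVM. A writer could check this before each batch
    // and, below the threshold, pause writing, raise an alert, or purge
    // the oldest data files.
    static boolean hasHeadroom(Path dir, long minFreeBytes) {
        try {
            FileStore store = Files.getFileStore(dir);
            return store.getUsableSpace() >= minFreeBytes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The threshold should leave room for the largest expected burst, which ties back to the capacity-planning point above.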

Mapping and allocating

I am a little confused by the term 'mapping'. For example, when we say we are mapping memory for a database, does it mean that we are assigning a specific amount of memory at some memory location to that database?
Also, is 'allocating memory' a synonym for 'reserving memory'?
I encounter these two terms very often, and they aren't clear to me.
I would be very thankful if someone could clarify them.
This might be a question better asked of the software community on Stack Overflow. However, I am a CS.
I would say that these terms aren't always used accurately and precisely.
In general, allocating memory means making memory available to a program for an active purpose, such as allocating memory for buffers to hold a file or an in-memory structure right now.
Reserving memory is often used to mean the same thing. However, it is sometimes more passive: for example, reserving memory in case there is a future requirement, or to protect against too much memory being allocated for a different purpose.
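In Java terms, the distinction might be illustrated like this (an analogy chosen for illustration, not a formal definition): `ByteBuffer.allocate` hands the program usable memory immediately, while constructing an `ArrayList` with an initial capacity sets aside room for elements that may arrive later, without holding any yet.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;

class AllocateVsReserve {
    static ByteBuffer allocateBuffer() {
        // Allocating: 4096 bytes exist now and can be used immediately.
        return ByteBuffer.allocate(4096);
    }

    static ArrayList<String> reserveList() {
        // Reserving: backing space for 1000 elements is set aside up front
        // for future use, but the list holds nothing yet (size() is 0).
        return new ArrayList<>(1000);
    }
}
```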
Often when the term 'mapping' is used, it is for a file. It may mean exactly the same as allocating. Or it may mean more: mapping may use an underlying mechanism provided by virtual memory management systems, where part of virtual memory is 'mapped' to the file without actually reading the file into physical memory. The trick is that, as the memory-mapped file is accessed, the block/page being accessed is read in 'invisibly' to the process when necessary. This uses a mechanism called demand paging. Its benefit is that a program can access the file as if it were all read into memory, but only the parts actually accessed are retrieved from the persistent storage system (disk, flash, whatever), which can be a huge win if only small parts of the file are needed.
Further, it simplifies the program, which can be written as if the whole file is in memory. Instead of the application developer trying to keep track of which parts of the file have been loaded into memory, the operating system does that instead.
Even better, the operating system can be asked to track which blocks/pages have had their contents changed, and to periodically write them back out to persistent storage. This can simplify the application program even further.
This is popular with some databases.
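In Java, memory-mapped files are available through `FileChannel.map`. The sketch below maps a region of a file, changes it through the returned buffer as if it were an ordinary array, and calls `force()` to ask the OS to write the dirty pages back to storage. The class name, file size, and offset are arbitrary choices for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedFileDemo {
    // Maps `size` bytes of `file` read-write and stores `value` at `offset`.
    // The OS faults in only the pages that are actually touched (demand paging).
    static void writeByte(Path file, long size, int offset, byte value) {
        // READ_WRITE mapping requires a channel opened for reading and writing.
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // If the file is shorter than `size`, mapping grows it to `size`.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
            buf.put(offset, value); // a plain memory write, no explicit I/O call
            buf.force();            // ask the OS to flush dirty pages to storage
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that the program never calls `read` or `write` on the file: the OS moves pages between memory and disk, which is exactly the simplification described above.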
Mapping basically means assigning, except that in the case of functions we often want a one-to-one mapping. If you define the function of an object, physical or just logical, and define its relationships and how it changes under transformation, then you have mapped it.

Neo4j inserting large files - huge difference in time between

I am inserting a set of files (PDFs, each about 2 MB) into my database.
Inserting 100 files at once takes about 15 seconds, while inserting 250 files at once takes 80 seconds.
I am not quite sure why there is such a big difference, but I assume it is because free memory runs out somewhere between these two amounts. Could this be the problem?
If there is any more detail I can provide, please let me know.
I'm not exactly sure what is happening on your side, but it looks a lot like what is described here in the Neo4j performance guide.
It could be:
Memory issues
If you are experiencing poor write performance after writing some data (initially fast, then massive slowdown) it may be the operating system that is writing out dirty pages from the memory mapped regions of the store files. These regions do not need to be written out to maintain consistency so to achieve highest possible write speed that type of behavior should be avoided.
Transaction size
Are you using multiple transactions to upload your files?
Many small transactions result in a lot of I/O writes to disc and should be avoided. Too big transactions can result in OutOfMemory errors, since the uncommitted transaction data is held on the Java Heap in memory.
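The middle ground between those two failure modes is to commit in fixed-size batches. A library-neutral sketch: `BatchImporter`, the `commitBatch` callback, and any batch size you pass in are hypothetical; in a real importer the callback would open a transaction, insert that batch, and commit, using whatever API your Neo4j version provides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class BatchImporter {
    // Splits items into consecutive batches of at most batchSize and hands
    // each batch to commitBatch, which is assumed to wrap one transaction.
    // Returns the number of batches (i.e. transactions) used.
    static <T> int importInBatches(List<T> items, int batchSize, Consumer<List<T>> commitBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy so the callback may keep the batch after the loop moves on.
            commitBatch.accept(new ArrayList<>(items.subList(start, end)));
            batches++;
        }
        return batches;
    }
}
```

For example, 250 files with a batch size of 100 would be committed as three transactions of 100, 100 and 50 files. The right batch size depends on heap size and file size, and has to be found by measurement.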
If you are on Linux, they also suggest some tuning to improve performance. See here.
You can look up the details on that page.
Also, if you are on Linux, you can check memory usage yourself during the import with this command:
$ free -m
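`free -m` shows memory for the whole machine. Because uncommitted transaction data lives on the Java heap, it can also help to log heap usage from inside the importing JVM; `Runtime` provides the figures. A small sketch (the class name `HeapReport` is hypothetical):

```java
class HeapReport {
    // Formats currently used and maximum heap of this JVM in megabytes.
    static String describe() {
        Runtime rt = Runtime.getRuntime();
        long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
        long maxMb = rt.maxMemory() / (1024 * 1024);
        return "heap used=" + usedMb + " MB, max=" + maxMb + " MB";
    }
}
```

Logging this after each batch of files would show whether used heap approaches the maximum as the batch grows.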
I hope this helps!

Generate thumbnail images at run-time when requested, or pre-generate thumbnail in harddisk?

I was wondering which way of managing thumbnail images has less impact on web server performance.
This is the scenario:
1) each order can have a maximum of 10 images.
2) images do not need to be stored after the order has completed (max period is 2 weeks).
3) potentially, there may be a few thousand active orders at any time.
4) orders with images will be visited frequently by customers.
IMO, pre-generating thumbnails on the hard disk is the better solution, as hard disks are cheap, even with RAID.
But what about disk I/O speed and the resources needed to load images? Will that take more resources than generating thumbnails in real time?
I would much appreciate it if you could share your opinion.
I suggest a combination of both - dynamic generation with disk caching. This prevents wasted space from unused images, yet adds absolutely no overhead for repeatedly requested images. SQL and mem caching are not good choices, both require too much RAM. IIS can serve large images from disk while only using 100k of RAM.
While creating http://imageresizing.net, I discovered 29 image resizing pitfalls, and few of them are obvious. I strongly suggest reading the list, even if it's a bit boring. You'll need an HttpModule to be able to pass cached requests off to IIS.
Although - why re-invent the wheel? The ImageResizer library is widely used and well tested.
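The "generate on first request, then serve from disk" approach can be sketched in Java with `javax.imageio`. This is a generic illustration, not how ImageResizer works; the class name `ThumbnailCache`, the cache-file naming, and the size bound are assumptions. It deliberately ignores the quality and concurrency pitfalls mentioned above (for example, two simultaneous first requests for the same image would both generate it).

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

class ThumbnailCache {
    private final File cacheDir;
    private final int maxSide;

    ThumbnailCache(File cacheDir, int maxSide) {
        this.cacheDir = cacheDir;
        this.maxSide = maxSide;
    }

    // Returns the cached thumbnail for `source`, generating it on first request.
    File thumbnailFor(File source) {
        File cached = new File(cacheDir, source.getName() + ".thumb.png");
        if (cached.exists()) {
            return cached; // repeat requests: no decoding or resizing at all
        }
        try {
            BufferedImage src = ImageIO.read(source);
            if (src == null) {
                throw new IOException("Unsupported image: " + source);
            }
            // Scale so the longer side is at most maxSide, keeping aspect ratio.
            double scale = Math.min(1.0,
                    (double) maxSide / Math.max(src.getWidth(), src.getHeight()));
            int w = Math.max(1, (int) Math.round(src.getWidth() * scale));
            int h = Math.max(1, (int) Math.round(src.getHeight() * scale));
            // RGB output: any transparency in the source is lost.
            BufferedImage thumb = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
            Graphics2D g = thumb.createGraphics();
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(src, 0, 0, w, h, null);
            g.dispose();
            ImageIO.write(thumb, "png", cached);
            return cached;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The web server can then serve the returned file as a static file; since orders expire after two weeks, the cache directory could be purged along with the order.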
If the orders are visited frequently by customers, it is better to create the thumbnails once and store them on disk. This way the web server doesn't need to spend as long processing the page, which will speed up the loading time of your web pages.
It depends on your load. If the resource is being requested multiple times then it makes sense to cache it.
Will there always have to be an image? If not, you can create it on the first request and then cache it either in memory, or more likely a database, for subsequent requests.
However, if you always need the n images to exist per order, and/or new orders are created regularly, you will be better off passing the thumbnail creation off to a worker thread or some kind of asynchronous page. That way, multiple requests can be queued up, reducing load on the server.
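Handing thumbnail creation to worker threads can be sketched with an `ExecutorService`: the request thread submits the jobs and returns immediately, and a small fixed pool bounds how many thumbnails are generated at once while the rest wait in the pool's queue. The `generate` function is a hypothetical stand-in for the real resize-and-save step, and the pool size of 2 is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ThumbnailWorkers implements AutoCloseable {
    // A fixed pool caps concurrent resizing work; extra jobs wait in its queue.
    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    private final Function<String, String> generate;

    // `generate` maps an image name to its thumbnail (hypothetical stand-in).
    ThumbnailWorkers(Function<String, String> generate) {
        this.generate = generate;
    }

    // Queues one job per image and returns immediately with result handles.
    List<Future<String>> submitOrder(List<String> imageNames) {
        List<Future<String>> results = new ArrayList<>();
        for (String name : imageNames) {
            results.add(pool.submit(() -> generate.apply(name)));
        }
        return results;
    }

    @Override
    public void close() {
        pool.shutdown(); // finish queued jobs, accept no new ones
    }
}
```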

Determine whether memory location is in CPU cache

It is possible for an operating system to determine whether a page of memory is in DRAM or in swap; for example, simply try to access it and if a page fault occurs, it wasn't.
However, is the same thing possible with CPU cache?
Is there any efficient way to tell whether a given memory location has been loaded into a cache line, or to know when it does so?
In general, I don't think this is possible. It works for DRAM and the pagefile because those are OS-managed resources; the cache is managed by the CPU itself.
The OS could do a tight timing loop of a memory read and try to see whether it completes fast enough to have come from the cache or had to go out to main memory; this would be very error-prone.
On multi-core/multi-processor systems, there are cache coherency protocols that processors use to determine when they need to invalidate each other's caches. I suppose you could build a custom device that snoops this protocol and that the OS would query.
What are you trying to do? If you want to force something into the cache, current x86 processors support prefetching memory into the cache in a non-blocking way; for instance, with Visual C++ you could use _mm_prefetch to fetch a line into the cache.
EDIT:
I haven't done this myself, so use at your own risk. To count cache misses for profiling, you may be able to use some architecture-specific performance-monitoring registers. http://download.intel.com/design/processor/manuals/253669.pdf, Appendix A lists "Performance Tuning Events". These can't tell you whether an individual address is in the cache or when it was loaded, but they can give overall statistics. I believe this is what VTune (a phenomenal profiler at this level) uses.
If you try to determine this yourself then the very act of running your program could invalidate the relevant cache lines, hence rendering your measurements useless.
This is one of those cases that mirrors the scientific principle that you cannot measure something without affecting that which you are measuring.
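x86:
I don't know how to tell whether an address IS in the cache,
but here is how to tell whether an address WAS in the cache:
lfence               ; wait for earlier instructions before reading the counter
rdtsc                ; EDX:EAX = time-stamp counter
lfence               ; keep the load below from starting before rdtsc
mov ecx, eax         ; save low 32 bits of the start timestamp
mov eax, [address]   ; load from the address under test (this also caches it)
lfence               ; make sure the load has completed
rdtsc                ; read the counter again
sub eax, ecx         ; elapsed ticks (low 32 bits suffice for a short interval)
cmp eax, threshold
jb  was_in_cache     ; fewer ticks than threshold: the line was already cached
The threshold has to be determined from documentation or empirically.
Some machines have cache hit/miss counters, which would serve equally well.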
