Please explain the db4o PolePosition report - db4o

Can someone explain to me how to read the PolePosition benchmarking results in the link below?
http://polepos.sourceforge.net/results/PolePositionClientServer.pdf
So, let's take the ComplexConcurrency example on page 2 of the above PDF.
On the graph, it says time=60000, updates=2, selects=20, threads=1, writes=3, objects=3. Does it mean that it took 60000 ms to run this complete test? The test included 2 update queries and 20 select queries, and the application used only 1 thread. Am I right? What does objects=3 mean?
And on the right side of each bar graph there are numbers like 533, 430, 153, etc. What do these numbers signify?
I want to understand this report from db4o's perspective.

As far as I know (I could be totally wrong):
The numbers shown, like 533, 430, etc., show how many iterations of the operations ran within 60 seconds.
The other numbers tell what is done within a single iteration: it does something like 2 update operations and 20 select operations per iteration, and 1 thread performs those operations.
But for the exact info I would need to look at the code.
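To make that reading concrete, here is a rough sketch in Python of how I picture one "lap" being scored. The names are made up for illustration only; this is not PolePosition's actual code.

```python
import time

# Hypothetical illustration of how a PolePosition-style "lap" could be scored.
# These names are invented for explanation; they are NOT the benchmark's code.

TIME_BUDGET_MS = 60000        # time=60000 from the graph
UPDATES_PER_ITERATION = 2     # updates=2
SELECTS_PER_ITERATION = 20    # selects=20

def run_one_iteration(db):
    """One iteration of the workload: 2 updates and 20 selects."""
    for _ in range(UPDATES_PER_ITERATION):
        db.update_one_object()
    for _ in range(SELECTS_PER_ITERATION):
        db.select_one_object()

def run_lap(db):
    """Repeat iterations until the 60-second budget is spent; the returned
    count is what I believe the per-bar numbers (533, 430, ...) represent."""
    completed = 0
    deadline = time.monotonic() + TIME_BUDGET_MS / 1000.0
    while time.monotonic() < deadline:
        run_one_iteration(db)
        completed += 1
    return completed
```

Under that reading, a larger bar value simply means more iterations of the same workload completed inside the 60-second budget, i.e. higher throughput.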

Related

Questions related to the wrk2 benchmark tool about latencies and requests

I have some questions in my mind related to the wrk2 benchmark tool. I did a lot of searching on them and did not find answers. If you have some understanding of them, please help me.
What does the "count" column represent in the Detailed Percentile Spectrum (example)? Does it show the total number of requests whose latency is within the "value" (column name) range? Correct me if I am wrong.
What do "latency(i)" and "requests" represent in the done() function provided by wrk2 and wrk, and how can I get those values? (done_function)
How can I get the total number of requests generated per minute and their latencies? Do "latency(i)" and "requests" give me some information about them?
What does the "-B (batch latency)" option in wrk do? My output remains the same whether I use this option or not. (batch)
In the wrk2 README.md, I didn't understand these lines. Can you please explain them?
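On the first question, my understanding (which could be wrong) is that the count column is cumulative: the number of requests at or below each latency value. A table of that shape can be rebuilt from raw latencies with plain Python, purely for illustration; this is not wrk2's code.

```python
# Illustrative only: rebuild a "value / percentile / count" style table from a
# list of raw request latencies (in milliseconds), under the reading that the
# count is cumulative. This is NOT wrk2's implementation.

def percentile_spectrum(latencies_ms, percentiles=(50.0, 75.0, 90.0, 99.0, 99.9, 100.0)):
    data = sorted(latencies_ms)
    n = len(data)
    rows = []
    for p in percentiles:
        # index of the last request at or below this percentile
        idx = max(0, min(n - 1, int(round(p / 100.0 * n)) - 1))
        value = data[idx]                                   # latency at this percentile
        total_count = sum(1 for x in data if x <= value)    # cumulative request count
        rows.append((value, p, total_count))
    return rows

if __name__ == "__main__":
    sample = [1.2, 1.3, 1.5, 2.0, 2.2, 3.1, 4.8, 5.0, 9.9, 120.0]
    for value, pct, count in percentile_spectrum(sample):
        print(f"{value:10.3f}  {pct:8.3f}%  count={count}")
```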

SPSS: Multiple data lines for individual cases

My dataset looks like this:
ID  Time  Date        v1  v2    v3  v4
1   2300  21/01/2002  1   996   5   300
1   0200  22/01/2002  3   1000  6   100
1   0400  22/01/2002  5   930   3   100
1   0700  22/01/2002  1   945   4   200
I have 50+ cases and 15+ variables in both categorical and measurement form (although SPSS will not allow me to set them as Scale; I only have the options of Nominal and Ordinal?).
I am looking for trends and cannot find a way to get SPSS to recognise each case as a whole rather than as individual rows. I have used a pivot table in Excel, which gives me the means for each variable, but I am aware that this can skew the result as it removes extreme readings (I need these, ideally).
I have searched this query online multiple times but I have come up blank so far, any suggestions would be gratefully received!
I'm not sure I understand. If you are saying that each case has multiple records (that is multiple lines of data) - which is what it looks like in your example - then either
1) Your DATA LIST command needs to change to add RECORDS= (see the Help for the DATA LIST command); or
2) You will have to use CASESTOVARS (C2V) to put all the variables for a case in the same row of the Data Editor.
I may not be understanding, though.
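The exact SPSS syntax is in the Help topics mentioned above. As a rough analogue of what CASESTOVARS does (one row per ID, with the repeated records spread across numbered columns), here is a sketch in Python/pandas built on the example data; the column names are just the ones from the question.

```python
import pandas as pd

# Long format: several rows per ID, as in the question's example.
long_df = pd.DataFrame({
    "ID":   [1, 1, 1, 1],
    "Time": ["2300", "0200", "0400", "0700"],
    "Date": ["21/01/2002", "22/01/2002", "22/01/2002", "22/01/2002"],
    "v1":   [1, 3, 5, 1],
    "v2":   [996, 1000, 930, 945],
    "v3":   [5, 6, 3, 4],
    "v4":   [300, 100, 100, 200],
})

# Number the repeated records within each ID (1, 2, 3, ...), then pivot so each
# case becomes a single row: Time.1, ..., v4.4 -- roughly what CASESTOVARS (C2V)
# produces in SPSS.
long_df["rec"] = long_df.groupby("ID").cumcount() + 1
wide_df = long_df.pivot(index="ID", columns="rec",
                        values=["Time", "Date", "v1", "v2", "v3", "v4"])
wide_df.columns = [f"{var}.{rec}" for var, rec in wide_df.columns]
print(wide_df)
```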

PPO Update Schedule in OpenAI Baselines Implementations

I'm trying to read through the PPO1 code in OpenAI's Baselines implementation of RL algorithms (https://github.com/openai/baselines) to gain a better understanding of how PPO works, how one might go about implementing it, etc.
I'm confused as to the difference between the "optim_batchsize" and the "timesteps_per_actorbatch" arguments that are fed into the "learn()" function. What are these hyper-parameters?
In addition, I see that in the "run_atari.py" file, the "make_atari" and "wrap_deepmind" functions are used to wrap the environment. In the "make_atari" function, it uses the "EpisodicLifeEnv", which ends the episode once a life is lost. On average, I see that the episode length at the beginning of training is about 7-8 timesteps, but the batch size is 256, so I don't see how any updates can occur. Thanks in advance for your help.
I've been going through it on my own as well... their code is a nightmare!
optim_batchsize is the batch size used when optimizing the policy, while timesteps_per_actorbatch is the number of timesteps the agent runs before optimizing.
On the episodic thing, I am not sure. There are two ways it could happen: one is waiting until the 256 entries are filled (across episodes) before actually updating; the other is filling the batch with dummy data that does nothing, effectively only updating on the 7 or 8 steps that the episode lasted.
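To make the relationship concrete, here is a simplified sketch (not the Baselines code, and the numbers are only illustrative) of how I understand the schedule: the runner collects timesteps_per_actorbatch transitions, concatenating short episodes as needed, and the optimizer then sweeps over that buffer in minibatches of optim_batchsize.

```python
import random

# Simplified sketch of a PPO-style collection/optimization schedule as I
# understand it. This is NOT the Baselines implementation; env and agent are
# stand-ins with the obvious interfaces.

def learn_sketch(env, agent, timesteps_per_actorbatch=256, optim_batchsize=64,
                 optim_epochs=4, total_iterations=10):
    for _ in range(total_iterations):
        # 1) Collect exactly timesteps_per_actorbatch transitions. Short episodes
        #    (e.g. 7-8 steps with EpisodicLifeEnv) are simply concatenated until
        #    the buffer is full, so updates still happen.
        buffer = []
        obs = env.reset()
        while len(buffer) < timesteps_per_actorbatch:
            action = agent.act(obs)
            next_obs, reward, done, _ = env.step(action)
            buffer.append((obs, action, reward, done))
            obs = env.reset() if done else next_obs

        # 2) Optimize: several epochs of shuffled minibatches of size optim_batchsize.
        for _ in range(optim_epochs):
            random.shuffle(buffer)
            for start in range(0, len(buffer), optim_batchsize):
                minibatch = buffer[start:start + optim_batchsize]
                agent.update(minibatch)   # the PPO clipped-surrogate step would go here
```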

Talend - Memory issues. Working with big files

Before the admins start eating me alive, I would like to say in my defense that I cannot comment on the original posts because I do not have the reputation to do so; therefore, I have to ask about this again.
I have issues running a job in Talend (Open Studio for Big Data). I have a file of 3 GB. I do not consider that this is too much, since I have a computer with 32 GB of RAM.
While trying to run my job, first I got an error related to heap memory, then it changed to a garbage collector error, and now it doesn't even give me an error (it just does nothing and then stops).
I found these solutions:
a) Talend performance
#Kailash commented that parallel execution is only available on the condition that I am subscribed to one of the Talend Platform solutions. My comment/question: so is there no other similar option to parallelize a job with a 3 GB file?
b) Talend 10 GB input and lookup out of memory error
#54l3d mentioned that it is an option to split the lookup file into manageable chunks (maybe 500 MB), then perform the join in many stages, one for each chunk. My comment/cry for help/question: how can I do that? I do not know how to split the lookup; can someone explain this to me in a little more graphical way?
c) How to push a big file data in talend?
Just to mention that I also went through (c), but I don't have any comment about it.
The job I am performing (thanks to #iMezouar) looks like this:
1) I have an input, MySQLInput, coming from a MySQL DB (3 GB)
2) I used tFirstRows to make it easier for the process (not working)
3) I used tSplitRow to transform the data from many similar columns into only one column.
4) MySQLOutput
Thanks again for reading, and double thanks for answering.
From what I understand, your query returns a lot of data (3 GB), and that is causing an error in your job. I suggest the following:
1. Filter data on the database side: replace tSampleRow with a WHERE clause in your tMysqlInput component, in order to retrieve fewer rows in Talend.
2. The MySQL JDBC driver retrieves all data into memory by default, so you need to use the stream option in tMysqlInput's advanced settings in order to stream rows instead (the general idea is sketched below).
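Talend's stream option handles this for you, but to show the underlying idea, here is the same pattern sketched in Python with PyMySQL: a server-side cursor fetches rows incrementally instead of pulling all 3 GB into memory. Host, credentials, table and column names are placeholders.

```python
import pymysql
import pymysql.cursors

# Illustration of streaming rows instead of materializing the full result set
# in memory; connection details and table/column names are placeholders.
connection = pymysql.connect(
    host="localhost",
    user="user",
    password="password",
    database="mydb",
    cursorclass=pymysql.cursors.SSCursor,   # server-side (unbuffered) cursor
)

try:
    with connection.cursor() as cursor:
        # Filtering on the database side (point 1) also keeps the result small.
        cursor.execute(
            "SELECT col_a, col_b FROM big_table WHERE created_at >= %s",
            ("2018-01-01",),
        )
        for row in cursor:   # rows are fetched incrementally, not all at once
            pass             # transform / write each row here
finally:
    connection.close()
```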

What is the MAXIMUM number of variables that can be entered into an SPSS Frequencies command?

As it says on the tin. I know that it will be somewhere between 125 and 2013, but I am trying to streamline my code.
Any help greatly appreciated!
The DOCS say 1000 is the maximum (see the Limitations section). You could also hit the limits of displaying tables in the output in memory, making the effective number smaller (this depends on the style of tables you have it output, as well as the number of rows displayed in the tables).
This isn't to say that outputting 1,000 frequency tables is ever a really good idea. Why would you ever want to visually pore over that many tables?
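If the goal is just to streamline the syntax, one option (a sketch only; the variable names are placeholders) is to generate the FREQUENCIES commands programmatically, splitting the variable list into chunks of at most 1000.

```python
# Sketch: split a long variable list into chunks of <= 1000 and generate one
# FREQUENCIES command per chunk. Variable names here are placeholders.

MAX_VARS_PER_COMMAND = 1000   # documented limit mentioned in the answer

variables = [f"var{i:04d}" for i in range(1, 2501)]   # e.g. 2500 variables

def frequencies_commands(var_names, chunk_size=MAX_VARS_PER_COMMAND):
    for start in range(0, len(var_names), chunk_size):
        chunk = var_names[start:start + chunk_size]
        yield "FREQUENCIES VARIABLES=" + " ".join(chunk) + "."

for command in frequencies_commands(variables):
    print(command)   # paste into a syntax window, or run via SPSS's Python scripting
```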
