Adding timestamps in the journal file? - spss

I'm wondering if it's possible to add timestamps to the journal file?
It appears that a date & time are recorded when SPSS is started, but if you keep the program open for longer periods of time (e.g. days) no further timestamps are added as long as the program isn't closed.
Having timestamps would make it much easier to find what I'm looking for when I go back through the journal.
This is what I use to insert timestamps into my output:
HOST COMMAND=['echo %time%'].
However, the journal file only shows the syntax.

The journal file is kept flushed and closed by Statistics, so you can probably write to it from another process. I don't think the suggestion above will work, because it will write the code but not the output to the journal. However, using Python you could do something like this.
begin program.
import time
# append a timestamped comment line to the end of the journal
open(r"full path to your journal file", "a").write("* " + time.asctime() + "\n")
end program.

I can't see why it shouldn't work, unless you are not using a Windows operating system.
On Unix-like systems such as Linux or macOS, which run a shell like bash, you would instead use
HOST COMMAND =['date'].
If you have the Python extension installed, you could also use Python code to print the date and time (which would be a platform-independent solution).
BEGIN PROGRAM.
import time
print(time.ctime())
END PROGRAM.

Related

How to run a python script in the background on azure

I have a uni project in which I have to run a number of machine learning algorithms like SVM, ME, Naive Bayes, etc., and perform a grid search on them to find the optimal sets of hyper-parameters. Running all of these would take an exceedingly long time (48-168 hours total, but run in batches), and considering my computer becomes more or less unusable while I run them, I was trying to find a solution that allowed me to run my code externally. The scripts I have to run are in Python, and my plan was to run them on Azure to make use of its "Azure for students" $100 credit.
My original plan was to use Azure's ML notebook section and then run the Python scripts in the terminal they provide. My problem with this route is that, as far as I can tell, when the browser closes, the computation stops. I looked into it and found some articles mentioning a combination of 'ctrl-z', 'bg', and 'disown' to disconnect the process from the shell, but I thought there should definitely be a better way to do it. (I also wasn't sure how this would work in my case, where there are 8 processes running at once using gridsearchcv's n_jobs=-1 feature.)
I then realized a better way to do this would be to use pipelines. My intent was to create a number of pipelines of the form:
(Import data in xlsx file) -> (python script to run ML) -> (export data to working directory)
And then run them until all the work is completed. In the first stage I used the parameters,
And I got the error,
My intention was to have the Excel file pipe into the Python script as a data frame, but this implementation (and all the others I've tried) isn't working.
My first question is: how do I get the Excel data to pipe into the Python script properly?
My second question is: is there a better way to go about doing this? Would running it in the shell be an easier way to do it? If so, how do I ensure it keeps running while my browser is closed? Are there other services that would be better? My main metrics for this are price (cheap) and time limit (the ability to run for a long time), but any suggestions would be greatly appreciated.
I also tried using Google Colab; this worked, but it felt slower than running on my computer.
To run a grid search with AzureML, you would use a Sweep job. The simplest way to kick off a Sweep is via the CLI. See here for an example.
$schema: https://azuremlschemas.azureedge.net/latest/sweepJob.schema.json
type: sweep
trial:
  command: >-
    python hello-sweep.py
    --A ${{inputs.A}}
    --B ${{search_space.B}}
    --C ${{search_space.C}}
  code: src
  environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest
inputs:
  A: 0.5
compute: azureml:cpu-cluster
sampling_algorithm: random
search_space:
  B:
    type: choice
    values: ["hello", "world", "hello_world"]
  C:
    type: uniform
    min_value: 0.1
    max_value: 1.0
objective:
  goal: minimize
  primary_metric: random_metric
limits:
  max_total_trials: 4
  max_concurrent_trials: 2
  timeout: 3600
display_name: hello-sweep-example
experiment_name: hello-sweep-example
description: Hello sweep job example.
You can start that job using the AzureML v2 CLI with the following command:
az ml job create -f hello-sweep.yml
That will create max_total_trials number of jobs for different parameter combinations as defined in the search_space governed by the sampling_algorithm, which can be random, grid or bayesian.
The actual job that is started is defined under trial. You need a program or script of some sort that you can execute via a command line and that can take parameters via that command line. command is the command that is executed, code is a folder on the local machine that contains the script/program you want to run, and environment is a registered environment in your workspace. azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest is one that is predefined in AzureML, but you can also create your own.
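For illustration, a hypothetical hello-sweep.py matching the command above could look like the sketch below. The argument names and the metric name random_metric come from the YAML; logging the metric through MLflow is one common way for AzureML to pick up the sweep's primary_metric, assuming the MLflow packages are available in the environment.
import argparse
import random

import mlflow  # AzureML jobs can read metrics logged through MLflow

parser = argparse.ArgumentParser()
parser.add_argument("--A", type=float)
parser.add_argument("--B", type=str)
parser.add_argument("--C", type=float)
args = parser.parse_args()

# ... train and evaluate a model with these hyper-parameters ...
score = random.random()  # stand-in for a real evaluation metric

# The metric name must match primary_metric in the sweep definition.
mlflow.log_metric("random_metric", score)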
If you prefer Python over YAML for defining the job itself, the same thing can be done with the Azure ML Python SDK (v2).
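A rough sketch using the azure-ai-ml (SDK v2) package, assuming a workspace config.json is available locally; the class and parameter names are from memory and may differ slightly in the current SDK:
from azure.ai.ml import MLClient, command
from azure.ai.ml.sweep import Choice, Uniform
from azure.identity import DefaultAzureCredential

# Connect to the workspace described by a local config.json
ml_client = MLClient.from_config(credential=DefaultAzureCredential())

# The trial: same command, code folder and environment as in the YAML
command_job = command(
    code="src",
    command="python hello-sweep.py --A ${{inputs.A}} --B ${{inputs.B}} --C ${{inputs.C}}",
    environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",
    inputs={"A": 0.5, "B": "hello", "C": 0.1},
    compute="cpu-cluster",
)

# Override B and C with search distributions
command_job_for_sweep = command_job(
    B=Choice(values=["hello", "world", "hello_world"]),
    C=Uniform(min_value=0.1, max_value=1.0),
)

# Turn it into a sweep job and set the limits
sweep_job = command_job_for_sweep.sweep(
    compute="cpu-cluster",
    sampling_algorithm="random",
    primary_metric="random_metric",
    goal="Minimize",
)
sweep_job.set_limits(max_total_trials=4, max_concurrent_trials=2, timeout=3600)
sweep_job.experiment_name = "hello-sweep-example"

returned_job = ml_client.create_or_update(sweep_job)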
See here for a blog post on How to do hyperparameter tuning using Azure ML.

Netlogo Behaviorspace How to save data not per tick but based on reporter

I have a NetLogo model for which a run takes about 15 minutes but goes through a lot of ticks, because not much happens per tick. I want to do quite a few runs in an experiment in BehaviorSpace. The output (table output only) will be all the output and input variables per tick. However, not all of this data is relevant: it's only needed once per day (day is a variable; a run lasts 1095 days).
The result is that the model becomes very slow when running experiments via BehaviorSpace. Not only would it be nicer to have output data with just 1095 rows, the per-tick output probably also slows the experiment down tremendously.
How to fix this?
It is possible to write your own output file in a BehaviorSpace experiment. Program your code to create and open an output file that contains only the results you want.
The problem is to keep BehaviorSpace from trying to open the same output file from different model runs running on different processors, which causes a runtime error. I have tried two solutions.
Tell BehaviorSpace to only use one processor for the experiment. Then you can use the same output file for all model runs. If you want the output lines to include which model run it's on, use the primitive behaviorspace-run-number.
Have each model run create its own output file with a unique name. Open the file using something like:
file-open (word "Output-for-run-" behaviorspace-run-number ".csv")
so the output files will be named Output-for-run-1.csv etc.
(If you are not familiar with it, the CSV extension is very useful for writing output files. You can put everything you want to output on a big list, and then when the model finishes write the list into a CSV file with:
csv:to-file (word "Output-for-run-" behaviorspace-run-number ".csv") the-big-list
)

get current date and time in lua in redis

How can I get current date / time in Lua embedded in Redis?
I need to have it in following format - YYYY-MM-DD, HH:MM:SS
I tried os.date(), but it is not recognized.
Redis' Lua sandbox has only a handful of libraries, and os isn't one of these.
You can call the Redis TIME from Lua like so:
local t = redis.call('TIME')
However, you'll need to find a way to convert the epoch to the desired format, and also note that calling it will stop your script from performing any writes (as it is a non-deterministic command).
Update: as of Redis v3.2, there is a new replication mode for scripts that is effect-based (rather than code-based). When using this mode you can actually call all the random, non-deterministic commands. More information is on EVAL's documentation page.
This was already discussed in the comments, but the question should still have an answer:
The current time is non-deterministic, i.e. it returns different values on repeated calls. This hurts replication. For this reason, the current time should be passed into your Lua script as a parameter.
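For example, here is a minimal sketch of that approach using the redis-py client (the key name last_update is just a placeholder): format the time on the client, pass it in ARGV, and keep the script deterministic.
import datetime
import redis

r = redis.Redis(decode_responses=True)  # assumes a local Redis instance

# The script receives the already-formatted time and stays deterministic.
script = """
local now = ARGV[1]  -- e.g. "YYYY-MM-DD, HH:MM:SS"
redis.call('SET', KEYS[1], now)
return now
"""

now = datetime.datetime.now().strftime("%Y-%m-%d, %H:%M:%S")
print(r.eval(script, 1, "last_update", now))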

Can I profile Lua scripts running in Redis?

I have a cluster app that uses a distributed Redis back-end, with dynamically generated Lua scripts dispatched to the redis instances. The Lua component scripts can get fairly complex and have a significant runtime, and I'd like to be able to profile them to find the hot spots.
SLOWLOG is useful for telling me that my scripts are slow, and exactly how slow they are, but that's not my problem. I know how slow they are, I'd like to figure out which parts of them are slow.
The redis EVAL docs are clear that redis does not export any timekeeping functions to lua, which makes it seem like this might be a lost cause.
So, short of a custom fork of Redis, is there any way to tell which parts of my Lua script are slower than others?
EDIT
I took Doug's suggestion and used debug.sethook - here's the hook routine I inserted at the top of my script:
redis.call('del', 'line_sample_count')
local function profile()
local line = debug.getinfo(2)['currentline']
redis.call('zincrby', 'line_sample_count', 1, line)
end
debug.sethook(profile, '', 100) -- count hook: call profile() every 100 Lua instructions
Then, to see the hottest 10 lines of my script:
ZREVRANGE line_sample_count 0 9 WITHSCORES
If your scripts are CPU-bound (not I/O bound), then you may be able to use the debug.sethook function with a count hook:
The count hook: is called after the interpreter executes every
count instructions. (This event only happens while Lua is executing a
Lua function.)
You'll have to build a profiler based on the counts you receive in your callback.
The PepperfishProfiler would be a good place to start. It uses os.clock which you don't have, but you could just use hook counts for a very crude approximation.
This is also covered in PiL 23.3 – Profiles
In standard Lua, you can't - there is no built-in function for this, and the standard time function only returns seconds. So there are two options available: you either write your own Lua extension DLL to return the time in msec, or:
You can do a basic benchmark using a millisecond-resolution time. You can access the current millisecond time with LuaSocket. Though this adds a dependency to your project, it's an effective way to do trivial benchmarking.
require "socket"
t = socket.gettime();

How to combine two files and create a report with matched fields in COBOL

I have two files:
The first file contains the job name and start time, and looks like this:
ZPUDA13V STARTED - TIME=00.13.30
ZPUDM00V STARTED - TIME=03.26.54
ZPUDM01V STARTED - TIME=03.26.54
ZPUDM02V STARTED - TIME=03.26.54
ZPUDM03V STARTED - TIME=03.26.56
and the second file contains the job name and end time, and looks like this:
ZPUDA13V ENDED - TIME=00.13.37
ZPUDM00V ENDED - TIME=03.27.38
ZPUDM01V ENDED - TIME=03.27.34
ZPUDM02V ENDED - TIME=03.27.29
ZPUDM03V ENDED - TIME=03.27.27
Now I am trying to combine these two files to get a report like JOBNAME STARTTIME ENDTIME. I have used ICETOOL to get the report, but if I get JOBNAME and STARTTIME then ENDTIME is spaces, and if I get ENDTIME then JOBNAME and STARTTIME are spaces.
Please let me know how to code the OUTREC fields; I have tried almost all possibilities, but my output is still not what I need.
I have no idea what ICETOOL is (nor the inclination to even look it up in Google :-) but this is a classic COBOL data processing task.
Based on your simple data input, the algorithm would be:
for every record S in startfile:
    for every record E in endfile:
        if S.jobname = E.jobname:
            output S.jobname S.time E.time
            next S
        endif
    endfor
endfor
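For illustration only, here is the same matching logic sketched in Python rather than COBOL (the file names start.txt and end.txt are placeholders, and the parsing assumes the record layouts shown above). Like the pseudocode, it pairs each start record with the first end record that has the same job name.
def parse(line):
    job = line.split()[0]                      # e.g. "ZPUDA13V"
    time = line.rsplit("TIME=", 1)[1].strip()  # e.g. "00.13.30"
    return job, time

with open("start.txt") as f:
    starts = [parse(l) for l in f if l.strip()]
with open("end.txt") as f:
    ends = [parse(l) for l in f if l.strip()]

for s_job, s_time in starts:
    for e_job, e_time in ends:
        if s_job == e_job:
            print(s_job, s_time, e_time)  # JOBNAME STARTTIME ENDTIME
            break                         # "next S" in the pseudocode above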
However, you may need to take into account the fact that:
multiple jobs of the same name may run during the day (multiple entries in the file).
multiple jobs of the same name may run at the same time.
You could get around the first problem by ensuring the E record was the one immediately following the S record (based on time). The second problem is a doozy.
If you're running on z/OS (and you probably are, given the job names), have you considered using information from the SMF records to do this collection and analysis? I'm pretty certain SMF type 30 records hold everything you need.
And assuming this is a mainframe question, here's a shameless plug for a book one of my friends at work has written, check out What On Earth is a Mainframe? by David Stephens (ISBN-13 = 978-1409225355).
I know I'm too late with my resolution, but it may be helpful for newcomers to Stack Overflow.
You can make use of JOINKEYS of DFSORT using JCL.
JOINKEYS F1 FIELDS=(01,08,CH,A)
JOINKEYS F2 FIELDS=(01,08,CH,A)
REFORMAT FIELDS=(F1:01,33,F2:25,08)
SORT FIELDS=COPY
OUTREC FIELDS=(01,08,25,08,34,08)
The OUTREC will hold the data as you need it!
