Powershell: What is the max. Value for Throttlelimit of Foreach-Object -Parallel? - powershell-7.2

I have 1 Million Items to check with Powershell. To improve the performance, I want to use ForEach-Object -Parallel. For this reason, I have deployed a very powerful VM.
Docs:
https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/foreach-object?view=powershell-7.2#description
The runspace pool size is specified by the ThrottleLimit parameter.
The default runspace pool size is 5. You can still create a new
runspace for each iteration using the UseNewRunspace switch.
I notice that no matter how high I set the limit, the CPU usage does not increase. Is there a technical maximum limit for this parameter? I have the impression that above a certain value the limit is simply capped.
$oneMillionItems | ForEach-Object -Parallel {
Do something ...
} -Throttlelimit 300
PS > $PSVersionTable
Name Value
---- -----
PSVersion 7.2.1
PSEdition Core
GitCommitId 7.2.1
OS Microsoft Windows 10.0.20348
Platform Win32NT
PSCompatibleVersions {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion 2.3
SerializationVersion 1.1.0.1
WSManStackVersion 3.0

Related

Can the additional time it takes to execute the first instance of a query be eliminated in neo4j?

Whatever steps I take, the first example of any query I make in neo4j always takes longer than any subsequent execution of the same query. So I guess something other than the store is being cached.
I'm using the latest community container image for 3.5 (3.5.20 at the time of writing)
I have plenty of memory to cache absolutely everything if I want to
I'm using well documented warm-up strategies in order to (allegedly) prime the page cache
The database details...
I run CALL apoc.monitor.store(); and it tells me the size of each store: -
+------------------------------------------------------------------------------------------------------------+
| logSize | stringStoreSize | arrayStoreSize | relStoreSize | propStoreSize | totalStoreSize | nodeStoreSize |
+------------------------------------------------------------------------------------------------------------+
| 1224 | 148165607424 | 3016515584 | 26391839040 | 42701007672 | 241318778285 | 2430128610 |
+------------------------------------------------------------------------------------------------------------+
I run CALL apoc.warmup.run(true, true, true); (before running any queries). It takes about 15 minutes and displays a summary of what it's done. The text it outputs is not easily parsed in its raw form so I've summarised salient parts of it below. Basically it tells me the number of pages loaded for each store, and these are: -
nodePages 296,719
relPages 3,234,294
relGroupPages 4,580
propPages 5,233,608
stringPropPages 18,086,620
arrayPropPages 368,225
indexPages 2,235,227
---
Total 29,459,273
With a page size of 8,192 bytes per page that's approximately 225GiB of pages for the displayed stores
I have enough physical memory and I have already set NEO4J_dbms_memory_pagecache_size to 250G
I set NEO4J_dbms_memory_heap_initial__size and NEO4J_dbms_memory_heap_max__size to 8G
So (allegedly) the page cache is "warm" and I have enough physical memory.
Query timings...
I run my query, which returns 1,813 records, and I execute the same query several times in order to illustrate the issue. I see the following (typical) timings: -
1. 1,821 mS
2. 75 mS
3. 60 mS
4. 51 mS
5. 48 mS
6. 42 mS
7. 38 mS
8. 36 mS
9. 36 mS
The actual values are dependent on the query but the first execution of every query is always significantly longer than the second.
ADDENDUM (16/Jul).
Just to be clear, using apoc.warmup.run does help.
If I don't use it, the first query is much longer still.
Having just restarted the DB (without a warm-up) the first query
took 7,829mS. The 2nd was 116mS, the third 66mS
So, warm-up or not, the first query is always longer.
Question...
What's going on?
Can I do anything more to reduce the initial query time?
Oh, and using the query as the warm up is not the answer - I don't know what queries will be used
Not sure why apoc.warmup.run does not speed up your initial query, but you could just try using an initial query invocation as your "warmup" instead.

Google dataflow streaming pipeline is not distributing workload over several workers after windowing

I'm trying to set up a dataflow streaming pipeline in python. I have quite some experience with batch pipelines. Our basic architecture looks like this:
The first step is doing some basic processing and takes about 2 seconds per message to get to the windowing. We are using sliding windows of 3 seconds and 3 second interval (might change later so we have overlapping windows). As last step we have the SOG prediction that takes about 15ish seconds to process and which is clearly our bottleneck transform.
So, The issue we seem to face is that the workload is perfectly distributed over our workers before the windowing, but the most important transform is not distributed at all. All the windows are processed one at a time seemingly on 1 worker, while we have 50 available.
The logs show us that the sog prediction step has an output once every 15ish seconds which should not be the case if the windows would be processed over more workers, so this builds up huge latency over time which we don't want. With 1 minute of messages, we have a latency of 5 minutes for the last window. When distribution would work, this should only be around 15sec (the SOG prediction time). So at this point we are clueless..
Does anyone see if there is something wrong with our code or how to prevent/circumvent this?
It seems like this is something happening in the internals of google cloud dataflow. Does this also occur in java streaming pipelines?
In batch mode, Everything works fine. There, one could try to do a reshuffle to make sure no fusion etc occurs. But that is not possible after windowing in streaming.
args = parse_arguments(sys.argv if argv is None else argv)
pipeline_options = get_pipeline_options(project=args.project_id,
job_name='XX',
num_workers=args.workers,
max_num_workers=MAX_NUM_WORKERS,
disk_size_gb=DISK_SIZE_GB,
local=args.local,
streaming=args.streaming)
pipeline = beam.Pipeline(options=pipeline_options)
# Build pipeline
# pylint: disable=C0330
if args.streaming:
frames = (pipeline | 'ReadFromPubsub' >> beam.io.ReadFromPubSub(
subscription=SUBSCRIPTION_PATH,
with_attributes=True,
timestamp_attribute='timestamp'
))
frame_tpl = frames | 'CreateFrameTuples' >> beam.Map(
create_frame_tuples_fn)
crops = frame_tpl | 'MakeCrops' >> beam.Map(make_crops_fn, NR_CROPS)
bboxs = crops | 'bounding boxes tfserv' >> beam.Map(
pred_bbox_tfserv_fn, SERVER_URL)
sliding_windows = bboxs | 'Window' >> beam.WindowInto(
beam.window.SlidingWindows(
FEATURE_WINDOWS['goal']['window_size'],
FEATURE_WINDOWS['goal']['window_interval']),
trigger=AfterCount(30),
accumulation_mode=AccumulationMode.DISCARDING)
# GROUPBYKEY (per match)
group_per_match = sliding_windows | 'Group' >> beam.GroupByKey()
_ = group_per_match | 'LogPerMatch' >> beam.Map(lambda x: logging.info(
"window per match per timewindow: # %s, %s", str(len(x[1])), x[1][0][
'timestamp']))
sog = sliding_windows | 'Predict SOG' >> beam.Map(predict_sog_fn,
SERVER_URL_INCEPTION,
SERVER_URL_SOG )
pipeline.run().wait_until_finish()
In beam the unit of parallelism is the key--all the windows for a given key will be produced on the same machine. However, if you have 50+ keys they should get distributed among all workers.
You mentioned that you were unable to add a Reshuffle in streaming. This should be possible; if you're getting errors please file a bug at https://issues.apache.org/jira/projects/BEAM/issues . Does re-windowing into GlobalWindows make the issue with reshuffling go away?
It looks like you do not necessarily need GroupByKey because you are always grouping on the same key. Instead you could maybe use CombineGlobally to append all the elements inside the window in stead of the GroupByKey (with always the same key).
combined = values | beam.CombineGlobally(append_fn).without_defaults()
combined | beam.ParDo(PostProcessFn())
I am not sure how the load distribution works when using CombineGlobally but since it does not process key,value pairs I would expect another mechanism to do the load distribution.

Implicit vs explicit garbage collection

Background:
I have created an API and I did profiling for memory usage & process time for each web service using this guide and memory profiler gem.
I created a table to keep all profiling results like this:
Request | Memory Usage(b) | Retained Memory(b) | Runtime(seconds)
----------------------------------------------------------------------------
Post Login | 444318 | 35649 | 1.254
Post Register | 232071 | 32673 | 0.611
Get 10 Users | 11947286 | 2670333 | 3.456
Get User By ID | 834953 | 131300 | 0.834
Note: all numbers are the average number of calling the same service 3 consecutive times.
And I read many guides and answers(like this and this) says that Garbage Collector is responsible for releasing memory and we should not explicitly interfere in memory management.
Then I just forced the Garbage Collector to start after each action (for test purpose) by adding the following filter in APIController:
after_action :dispose
def dispose
GC.start
end
And I found that Memory usage is reduced too much (more than 70%), retained memory almost same before and runtime is reduced as well.
Request | Memory Usage(b) | Retained Memory(b) | Runtime(seconds)
----------------------------------------------------------------------------
Post Login | 38636 | 34628 | 1.023
Post Register | 37746 | 31522 | 0.583
Get 10 Users | 2673040 | 2669032 | 2.254
Get User By ID | 132281 | 128913 | 0.782
Questions:
Is it good practice to add such filter and what are the side effects?
I thought that runtime will be more than before, but it seems less, what can be the reason?
Update:
I'm using Ruby 2.3.0 and I've used gc_tracer gem to monitor heap status because I'm afraid to have old garbage collection issues highlighted in Ruby 2.0 & 2.1
The issue is that the Ruby GC is triggered on total number of objects,
and not total amount of used memory
Then to do a stress test, I run the following:
while true
"a" * (1024 ** 2)
end
and result is that memory usage does not exceed the following limits (it was exceeding before and GC wont be triggered):
RUBY_GC_MALLOC_LIMIT_MAX
RUBY_GC_OLDMALLOC_LIMIT_MAX
So now I'm pretty sure that same GC issues of 2.0 & 2.1 don't exist anymore in 2.3, but still getting the following positive results by adding above mentioned filter (after_action :dispose):
Heap memory enhanced by 5% to 10% (check this related question)
General execution time enhanced by 20% to 40% (test done using third party tool Postman which consumes my API)
I'm still looking for answers to my two questions above.
Any feedback would be greatly appreciated

Neo4j slow concurrent merges

I have been experiencing some extremely bad slowdowns in Neo4j, and having spent a few days on the issue now, I still can't figure out why. I'm really hoping someone here can help. I've also tried the neo slack support group already, but to no avail.
My setup is as follows: the back-end is a django app that connects through the official drivers (pip package neo4j-driver==1.5.0) to a dockerized Neo4j Enterprise 3.2.3 instance. The data we write is added in infrequent bursts of around 15 concurrent merges to the same portion of the graph, and is triggered when a user interacts with some part of our product (each interaction causing a separate merge).
Each merge operation is the following query:
MERGE (m :main:entity:person {user: $user, task: $task, type: $type,
text: $text})
ON CREATE SET m.source = $list, m.created = $timestamp, m.task_id=id(m)
ON MATCH SET m.source = CASE
WHEN $source IN m.source THEN m.source
ELSE m.source + $list
END SET m.modified = $timestamp
RETURN m.task_id AS task_id;
A PROFILE of this query looks like this. As you can see, the individual processing time is in the ms range. We have tested running this 100+ times in quick succession with no issues. We have a Node key configured as in this schema.
The running system however seems to seize up and we see execution times for these queries hit as high as 2 minutes! A snapshot of the running queries looks like this.
Does anyone have any clues as to what may be going on?
Further system info:
ls data/databases/graph.db/*store.db* | du -ch | tail -1
249.6M total
find data/databases/graph.db/schema/index -regex '.*/native.*' | du -hc | tail -1
249.6M total
ps
1 root 297:51 /usr/lib/jvm/java-1.8-openjdk/jre/bin/java -cp /var/lib/neo4j/plugins:/var/lib/neo4j/conf:/var/lib/neo4j/lib/*:/var/lib/neo4j/plugins/* -server -Xms8G -Xmx8G -XX:+UseG1GC -XX:-OmitStackTraceInFastThrow -XX:+AlwaysPr
printenv | grep NEO
NEO4J_dbms_memory_pagecache_size=4G
NEO4J_dbms_memory_heap_maxSize=8G
The machine is has 16GB total memory and there is nothing else running on it.

Neo4j - failed processing and refused new connection

I am running these in an AWS sever 16GB of memory.
Using Ruby on Rails gem.
d = Description.first
CYPHER 14118ms MATCH (n:`Description`) RETURN n ORDER BY n.uuid LIMIT {limit_1} | {:limit_1=>1}
Except the long time to return the result, so far, so good.
Then
l = d.language
Description#language 723ms MATCH description24138468, description24138468<-[rel1:`DESCRIBED_IN`]-(result_language:`Language`) WHERE (ID(description24138468) = {ID_description24138468}) RETURN result_language | {:ID_description24138468=>24138468}
OK also.
But the next one fails,
l.descriptions.count
Language#descriptions 517684ms MATCH (previous:`Language`), previous-[rel1:`DESCRIBED_IN`]->(next:`Description`) WHERE (ID(previous) = {ID_previous}) RETURN ID(previous), collect(next) | {:ID_previous=>137}
Neo4j::Session::CypherError: Java heap space
There are still 9.5 GB free memory.
During the long execution of the last one I cannot connect to the server using the browser.
Don't know how to fix this, why the error and why no other connections are allowed?
The failed count statement from Ruby on Rails must return about two millions records when executed from the Neo4j shell, like this:
neo4j-sh (?)$ match (l:Language{iso_639_2_code: 'eng'})-[r:DESCRIBED_IN]-(d:Description) return count (d);
+-----------+
| count (d) |
+-----------+
| 2107041 |
+-----------+
1 row
11592 ms
Here is my wrapper.conf file:
#********************************************************************
# Property file references
#********************************************************************
wrapper.java.additional=-Dorg.neo4j.server.properties=conf/neo4j-server.properties
wrapper.java.additional=-Djava.util.logging.config.file=conf/logging.properties
#********************************************************************
# JVM Parameters
#********************************************************************
wrapper.java.additional=-XX:+UseConcMarkSweepGC
wrapper.java.additional=-XX:+CMSClassUnloadingEnabled
wrapper.java.additional=-XX:-OmitStackTraceInFastThrow
wrapper.java.additional=-XX:hashCode=5
# Uncomment the following lines to enable garbage collection logging
#wrapper.java.additional=-Xloggc:data/log/neo4j-gc.log
#wrapper.java.additional=-XX:+PrintGCDetails
#wrapper.java.additional=-XX:+PrintGCDateStamps
#wrapper.java.additional=-XX:+PrintGCApplicationStoppedTime
#wrapper.java.additional=-XX:+PrintPromotionFailure
#wrapper.java.additional=-XX:+PrintTenuringDistribution
# Java Heap Size: by default the Java heap size is dynamically
# calculated based on available system resources.
# Uncomment these lines to set specific initial and maximum
# heap size in MB.
#wrapper.java.initmemory=512
#wrapper.java.maxmemory=512
#********************************************************************
# Wrapper settings
#********************************************************************
# path is relative to the bin dir
wrapper.pidfile=../data/neo4j-server.pid
#********************************************************************
# Wrapper Windows NT/2000/XP Service Properties
#********************************************************************
# WARNING - Do not modify any of these properties when an application
# using this configuration file has been installed as a service.
# Please uninstall the service before modifying this section. The
# service can then be reinstalled.
# Name of the service
wrapper.name=neo4j
# User account to be used for linux installs. Will default to current
# user if not set.
wrapper.user=
#********************************************************************
# Other Neo4j system properties
#********************************************************************
wrapper.java.additional=-Dneo4j.ext.udc.source=debian
The query that ran out of heap space was using the collect() function in the RETURN clause. That would have attempted to return a collection of over 2 million nodes, which is probably why it ran out of space.
If you wanted to return a count instead, you should have used the count() function instead of collect().
I restarted the server with more disk space since previously I had only 6% free disk space left.
I changed some parameters as described in the comments.
For the moment things are working better.

Resources