What are messages in actor programming? - message

This question describes what actors are in actor programming. What are messages? How can you avoid shared state if you send an object in a message (assuming objects exist in actor programming)?

If we think actors are People, then messages are like ... messages.
Say a Boss want to square root of a list of numbers a, and don't want to do all the calculating himeself. He can hire some Workers, and the Boss will know their Phone numbers.
So the Boss will SMS each of the Workers, and tell them to "Find the square root of a_i; reply me at 555-1234 after you've done.". This instruction is a message. The Boss will then wait for the Workers to finish.
+------+ sqrt(i=0, a_i=9) +------------+
| Boss | ------------------------> | Worker 0 |
+------+ +------------+
| sqrt(i=1, a_i=16) +------------+
‘--------------------------> | Worker 1 |
+------------+
....
After the Workers completed the computation, they will send an SMS back the Boss and report the results. This is also done in message passing.
+------+ set_result(i=0, val=3) +------------+
| Boss | <------------------------ | Worker 0 |
+------+ +------------+
^ set_result(i=1, val=4) +------------+
‘--------------------------- | Worker 1 |
+------------+
....
This sounds like Object-oriented programming, but there are no order to which when a message is sent or received — they are passed asynchronously. (However, within the actor themselves, messages are received and queued synchronously.)
When written in code, it may be like
actor Boss:
receive('run'):
worker_addrs = spawn_many(SqrtWorker, len(a)) # hire workers.
for i, addr in enumerate(worker_addrs):
send(addr, 'sqrt', reply_addr=self, i=i, a_i=a[i])
receive('set_value', i, val):
a[i] = val
actor SqrtWorker:
receive('sqrt', reply_addr, i, a_i):
send(reply_addr, 'set_value', i, sqrt(a_i))
quit()
There are no "shared states problem", because a state cannot be shared without copying. In my example above, the elements of list a are copied to each worker. In fact, only the Boss knows the existence of a — this is a local state.
Now what if we really want to make a shared? In actor model, we would convert them into a new actor, and the Phone number of this actor are sent to the Worker.
+------+ sqrt(i=0, a_phoneNum=555-1111) +----------+
| Boss | -------------------------------> | Worker 0 |
+------+ +----------+
+---+
| a |
+---+
The Worker then ask the list actor for the required info (it is possible because the Boss has given the Phone number of a to the Worker.)
+------+ +----------+
| Boss | | Worker 0 |
+------+ +----------+
|
+---+ |
| a | <---------------------------’
+---+ get(i=0)
Some time later the list replies...
+------+ +----------+
| Boss | | Worker 0 |
+------+ +----------+
^
+---+ list_val(i=0, val=9) |
| a | ----------------------------’
+---+
then the Worker can compute the square root after receiving the message list_val.
+------+ set_result(i=0, val=3) +----------+
| Boss | <------------------------------ | Worker 0 |
+------+ +----------+
+---+
| a |
+---+
The Boss finally updates the shared state
+------+ +----------+
| Boss | | Worker 0 |
+------+ +----------+
| set(i=0, val=3)
| +---+
‘------> | a |
+---+
Will there be any problems accessing shared state like this? No — since messages received by a must be run synchronously, all read/write actions will be interfere with each other. Thus there are no need to mess with mutexes.

Related

Time span accumulating fact tables design

I need to design a star schema to process order processing. The progress of an order look like this:
Customer C place an order on item I with quantity 100
Factory F1 take the order partially with quantity 30
Factory F2 take the order partially with quantity 20
Buy from market 50 items
F1 delivery 20 items
F1 delivery 7 items
F1 cancel the contract (we need to buy 3 more item from market)
F2 delivery 20 items
Buy from market 3 items
Complete the order
How can I design a fact table in this case, since the number of step is not fixed, the data types of event is not the same.
I'm sorry for my bad English.
The definition of an Accumulating Snapshot Fact table according to Kimball is:
summarizes the measurement events occurring at predictable steps between the beginning and the end of a process.
For this particular use case I would go with a Transaction Fact Table as the events (steps) are unpredictable, it is more like an event fact table, something similar to logs or audits.
| order_key | date_key | full_datetime | entity_key (customer, factory, etc. varchar) | entity_type | state | quantity |
|-----------|----------|---------------------|----------------------------------------------|-------------|----------|----------|
| 1 | 20190602 | 2019-06-02 04:30:00 | C1 | customer | request | 100 |
| 1 | 20190602 | 2019-06-02 05:30:00 | F1 | factory | receive | 30 |
| 1 | 20190602 | 2019-06-02 05:30:00 | F2 | factory | receive | 20 |
| 1 | 20190602 | 2019-06-02 05:40:00 | Company? | company | buy | 50 |
| 1 | 20190603 | 2019-06-03 06:40:00 | F1 | factory | deliver | 20 |
| 1 | 20190603 | 2019-06-03 02:40:00 | F1 | factory | deliver | 7 |
| 1 | 20190603 | 2019-06-03 04:40:00 | F1 | factory | deliver | 3 |
| 1 | 20190603 | 2019-06-03 06:40:00 | F1 | factory | cancel | |
| 1 | 20190604 | 2019-06-04 07:40:00 | F2 | factory | deliver | 20 |
| 1 | 20190604 | 2019-06-04 07:40:00 | Company? | company | buy | 3 |
| 1 | 20190604 | 2019-06-04 09:40:00 | Company? | company | complete | 100 |
I'm not sure about your reporting needs as they were not specified, but assuming you need to measure lag/durations of unpredictable steps, you could PIVOT and use dynamic SQL to create the required view
SQL Server dynamic PIVOT query?
Let me know if you came up with something different as I'm interested on this particular use case. Good luck

Finding which child is using up all my memory in Erlang

I am troubleshooting a crashing Erlang program. It runs out of memory. It has several children started by OTP (one_for_one in the supervisor), and some started with spawn.
I am starting the program and falling into the Erlang prompt (test#test)1>. I'd like to see how much memory each of these children is using from here. I've searched online and not found anything, but this seems like a common enough need to already have a solution.
How can I find the memory utilization of each child, in Erlang, from the system prompt?
Did you try observer?
when you get the prompt, type observer:start(), then in the Application tab, you can see all the applications for each of them the processes. For each process you can get the memory usage by opening the process_info sub window.
Try erlang:process_info/2 with memory in ItemList
process_info(Pid, ItemList) -> InfoTupleList | [] | undefined
Types
Pid = pid()
ItemList = [Item]
Item = process_info_item()
InfoTupleList = [InfoTuple]
InfoTuple = process_info_result_item()
process_info_item() =
backtrace |
binary |
catchlevel |
current_function |
current_location |
current_stacktrace |
dictionary |
error_handler |
garbage_collection |
garbage_collection_info |
group_leader |
heap_size |
initial_call |
links |
last_calls |
memory |
message_queue_len |
messages |
min_heap_size |
min_bin_vheap_size |
monitored_by |
monitors |
message_queue_data |
priority |
reductions |
registered_name |
sequential_trace_token |
stack_size |
status |
suspending |
total_heap_size |
trace |
trap_exit

Neo4j - count very slow

I am running this query (bisac_code is uniquely indexed).
Execution time is more than 2.5 minutes.
52 main codes are selected from almost 4000 in total.
The total number of wokas is very large, 19 million nodes.
Are there any possibilities to make it run faster?
neo4j-sh (?)$ MATCH (b:Bisac)-[r:INCLUDED_IN]-(w:Woka)
> WHERE (b.bisac_code =~ '.*000000')
> RETURN b.bisac_code as bisac_code, count(w) as wokas_count
> ORDER BY b.bisac_code
> ;
+---------------------------+
| bisac_code | wokas_count |
+---------------------------+
| "ANT000000" | 13865 |
| "ARC000000" | 32905 |
| "ART000000" | 79600 |
| "BIB000000" | 2043 |
| "BIO000000" | 256082 |
| "BUS000000" | 226173 |
| "CGN000000" | 16424 |
| "CKB000000" | 26410 |
| "COM000000" | 44922 |
| "CRA000000" | 18720 |
| "DES000000" | 2713 |
| "DRA000000" | 62610 |
| "EDU000000" | 228182 |
| "FAM000000" | 42951 |
| "FIC000000" | 474004 |
| "FOR000000" | 41999 |
| "GAM000000" | 8803 |
| "GAR000000" | 37844 |
| "HEA000000" | 36939 |
| "HIS000000" | 3908869 |
| "HOM000000" | 5123 |
| "HUM000000" | 29270 |
| "JNF000000" | 40396 |
| "JUV000000" | 200144 |
| "LAN000000" | 89059 |
| "LAW000000" | 153138 |
| "LCO000000" | 1528237 |
| "LIT000000" | 89611 |
| "MAT000000" | 58134 |
| "MED000000" | 80268 |
| "MUS000000" | 75997 |
| "NAT000000" | 35991 |
| "NON000000" | 107513 |
| "OCC000000" | 42134 |
| "PER000000" | 26989 |
| "PET000000" | 4980 |
| "PHI000000" | 72069 |
| "PHO000000" | 8546 |
| "POE000000" | 104609 |
| "POL000000" | 309153 |
| "PSY000000" | 55710 |
| "REF000000" | 96477 |
| "REL000000" | 133619 |
| "SCI000000" | 86017 |
| "SEL000000" | 40901 |
| "SOC000000" | 292713 |
| "SPO000000" | 172284 |
| "STU000000" | 10508 |
| "TEC000000" | 77459 |
| "TRA000000" | 9093 |
| "TRU000000" | 12041 |
| "TRV000000" | 27706 |
+---------------------------+
52 rows
198310 ms
And the response time is not consistent.
After a while drops to less than half of a minute.
52 rows
31207 ms
In Neo4j 2.3 there will be index support for prefix LIKE searches but probably not for postfix ones.
There are two ways of making #user2194039's solution faster:
Use path expression to count the Woka per Bisac:
MATCH (b:Bisac) WHERE (b.bisac_code =~ '.*000000')
WITH b, size((b)-[:INCLUDED_IN]->()) as wokas_count
RETURN b.bisac_code as bisac_code, wokas_count
ORDER BY b.bisac_code
Mark the Bisac's with that pattern with a label
MATCH (b:Bisac) WHERE (b.bisac_code =~ '.*000000') SET b:Main;
MATCH (b:Main:Bisac)
WITH b, size((b)-[:INCLUDED_IN]->()) as wokas_count
RETURN b.bisac_code as bisac_code, wokas_count
ORDER BY b.bisac_code;
The slow speed is caused by your regular expression pattern matching (=~ ). Although your bisac_code is indexed, the regex match causes the index to be ineffective. The index only works when you are matching full bisac_code values.
Cypher does include some string manipulation facilities that might let you get by without using a regex =~, but I doubt it would make any difference, because the index will still be useless.
I might suggest considering if you can further categorize your bisac_codes so that you do not need to do a pattern match. Maybe an extra indexed property that somehow denotes those codes that end in 000000?
If you do not want to add properties, you may try matching only the Bisacs first, and then including the Wokas. Something like this:
MATCH (b:Bisac) WHERE (b.bisac_code =~ '.*000000')
WITH b
MATCH (b)-[r:INCLUDED_IN]-(w:Woka)
RETURN b.bisac_code as bisac_code, count(w) as wokas_count
ORDER BY b.bisac_code
This may help Cypher stick to the 4000 Bisac nodes while doing the pattern match, before getting involved with all 19 million Woka nodes, but I am not sure if this will make a material difference. Even slogging through 4000 nodes (effectively without an index) is a slow process.
Hash Tables in Database Indexing
The reason that your index is ineffective for regex pattern matching is that Neo4j likely uses a hash table for indexing properties. This is common of many databases. Wikipedia has an article here.
The basics though are that the index is not storing all of the properties that you want to search through. It is storing values that represent the properties you want to search through, and the representation is only valid for the whole property. If you are searching for only a part of the property value, the hashes stored in the index are useless, and the database must search through the properties the old-fashioned way -- one by one.
Edit re: your edit
The improvement in response time after running this query multiple times is certainly due to caching. Neo4j is remembering that you access the Bisac nodes and bisac_code properties frequently, and is keeping them in memory. This makes future queries faster because the values do not need to be read off disk.
However, eventually, those nodes a properties will likely be dropped from the cache, as Neo4j finds you manipulating different nodes, which it will cache instead. There are only so many nodes Neo4j can cache before running out of memory, so it picks the most recent and/or frequently used data.

neo4j benchmark, multiple queries, measure time

Is there any way that I can perform my benchmarks for multiple queries in neo4j?
Assuming that I have loaded my graph, I want to initiate 10000 distinct shortest path queries in the database, without loading the data to a client. Is there a way that I can do this in a batch and get the execution times?
Try using the profile keyword inside of the neo4j-shell. This will give you some basic facts about how quickly, and how a query executes.
Here's a simple example:
neo4j-sh (?)$ CREATE (a {label:"foo"})-[:bar]->(b {label: "bar"})-[:bar]->(c {label: "baz"});
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 3
Relationships created: 2
Properties set: 3
1180 ms
neo4j-sh (?)$ profile match (a {label: "foo"}), (c {label: "baz"}), p=shortestPath(a-[*]-c) return p;
+--------------------------------------------------------------------------------------+
| p |
+--------------------------------------------------------------------------------------+
| [Node[0]{label:"foo"},:bar[0]{},Node[1]{label:"bar"},:bar[1]{},Node[2]{label:"baz"}] |
+--------------------------------------------------------------------------------------+
1 row
ColumnFilter
|
+ShortestPath
|
+Filter(0)
|
+AllNodes(0)
|
+Filter(1)
|
+AllNodes(1)
+--------------+------+--------+-------------+-----------------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+--------------+------+--------+-------------+-----------------------------------------+
| ColumnFilter | 1 | 0 | | keep columns p |
| ShortestPath | 1 | 0 | p | |
| Filter(0) | 1 | 6 | | Property(c,label(0)) == { AUTOSTRING1} |
| AllNodes(0) | 3 | 4 | c, c | |
| Filter(1) | 1 | 6 | | Property(a,label(0)) == { AUTOSTRING0} |
| AllNodes(1) | 3 | 4 | a, a | |
+--------------+------+--------+-------------+-----------------------------------------+
This other answer indicates that you're usually looking for lower DbHits values to be indicative of better performance, since those are expensive.
The WebAdmin tools (usually at http://localhost:7474/webadmin/ for a local neo4j installation), has Data browser and Console tabs that allow you to enter your query, see the results, and also see the actual time it took to perform the query.
Interestingly, from my limited testing of the Data browser and Console tabs, the latter seems to report faster query times for the same queries. So, the Console probably has less overhead, possibly making its timing results a bit more accurate.

Keeping JavaScript state in an ePub's spine

Is it possible to keep code state across pages in an ePub? More specifically do readers like iBooks allow this type of state?
spine.js
+---------+----------+
| | |
+--------+ +--------+ +--------+
| Page 1 | | Page 2 | | Page |
| Quiz1 | | Quiz2 | | (n) |
| | | | | Result |
| | | | | |
+--------+ +--------+ +--------+
In this example, the last page could contain a score but state is required. WebSQL is out of the question since it's not supported by webkit ereaders and websockets demand a connection. Any thoughts?
No. Each HTML file is independent. To share information, you'll need to use some kind of local storage such as window.localStorage, but it's very hard to find out what device supports what level of HTML5.
UPDATE: This thread says localStorage is in fact supported.

Resources