Apache Beam (GCP Dataflow) - how to get the pipeline execution result when using a with statement

We can get the pipeline result from the pipeline.run() method:
pipeline = beam.Pipeline(options=pipeline_options)
lines = pipeline | 'Read' >> ReadFromText(input_bucket_path)
counts = (
    lines
    | 'Split' >> (beam.ParDo(WordExtractingDoFn()).with_output_types(str))
    | 'PairWithOne' >> beam.Map(lambda x: (x, 1))
    | 'GroupAndSum' >> beam.CombinePerKey(sum))
def format_result(word, count):
    return '%s: %d' % (word, count)
output = counts | 'Format' >> beam.MapTuple(format_result)
# pylint: disable=expression-not-assigned
output | 'Write' >> WriteToText(output_bucket_path)
result = pipeline.run()
When using a with statement instead, how can I get the pipeline result so that I can use it to cancel the pipeline?
pipeline_result = None
with beam.Pipeline(options=pipeline_options) as pipeline:
    lines = pipeline | 'Read' >> ReadFromText(input_bucket_path)
    counts = (
        lines
        | 'Split' >> (beam.ParDo(WordExtractingDoFn()).with_output_types(str))
        | 'PairWithOne' >> beam.Map(lambda x: (x, 1))
        | 'GroupAndSum' >> beam.CombinePerKey(sum))
    def format_result(word, count):
        return '%s: %d' % (word, count)
    output = counts | 'Format' >> beam.MapTuple(format_result)
    # pylint: disable=expression-not-assigned
    output | 'Write' >> WriteToText(output_bucket_path)
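A possible way to reason about this, as a sketch only: as far as I understand the Beam Python SDK, leaving the with block calls run() and then waits for the pipeline to finish, and recent versions keep the result on the pipeline object as pipeline.result. If that holds, the result is only reachable once the job is already done, so cancelling needs the explicit pipeline.run() form from the first snippet. The classes below (SketchPipeline, SketchResult) are hypothetical stand-ins that model this assumed behaviour in plain Python; they are not Beam classes.

```python
class SketchResult:
    """Hypothetical stand-in for a pipeline result handle (not a Beam class)."""

    def __init__(self):
        self.state = "RUNNING"

    def wait_until_finish(self):
        # Block until the job ends; here the job simply completes.
        self.state = "DONE"
        return self.state

    def cancel(self):
        # Cancelling only has an effect while the job is still running.
        if self.state == "RUNNING":
            self.state = "CANCELLED"
        return self.state


class SketchPipeline:
    """Hypothetical stand-in modelling a pipeline used as a context manager."""

    def __init__(self):
        self.result = None

    def run(self):
        return SketchResult()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Assumed behaviour: run on exit, keep the result, wait for completion.
        if exc_type is None:
            self.result = self.run()
            self.result.wait_until_finish()
        return False


# with-statement form: the result exists only after the job has finished.
with SketchPipeline() as pipeline:
    pass
state_after_with = pipeline.result.state

# Explicit form: the result is available while the job is still running.
explicit_result = SketchPipeline().run()
state_after_cancel = explicit_result.cancel()

print(state_after_with, state_after_cancel)
```

Under these assumptions, the with form is convenient for blocking runs, while the explicit run() form is the one that leaves room to call cancel() on a live job.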


Need help about my Connect 4 Deep Reinforcement Learning Agent

Here is my code
My environment settings:
state: np.array of shape 1 * 6 * 7 (1 input channel for the CNN, 6 * 7 is the size of the board); 0 = empty, 1 = agent1's token, -1 = agent2's token
reward function: 0 for a draw or when nothing happens, 1 for a win, -1 for a loss
I know that Connect 4 is a solved game, but I want to try making two value-based agents learn the game from scratch. I tried an ANN that takes the flattened state as input and outputs 7 values (argmax to get the action), but with this method the agents converge to a stupid strategy.
Example
_ | _ | _ | _ | _ | _ | _
_ | _ | _ | _ | _ | _ | _
_ | X | _ | _ | _ | _ | _
_ | X | _ | O | _ | _ | _
_ | X | _ | O | _ | _ | _
_ | X | _ | O | _ | _ | _
The agent who goes first then always wins with this "strategy".
Then I tried a CNN and n-step learning, but it still converges to the same "strategy". Here is the CNN architecture:
class Network(nn.Module):
    def __init__(self, output_dim: int, learning_rate: float) -> None:
        super(Network, self).__init__()
        self.feature_layers = nn.Sequential(
            nn.Conv2d(1, 64, 4),
            nn.ReLU(),
            nn.Conv2d(64, 32, 2),
            nn.ReLU(),
            nn.Conv2d(32, 16, 2),
            nn.Flatten(),
        )
        self.advantage_layers = nn.Sequential(
            nn.Linear(32, 16),
            nn.ReLU(),
            nn.Linear(16, output_dim)
        )
        self.value_layers = nn.Sequential(
            nn.Linear(32, 8),
            nn.ReLU(),
            nn.Linear(8, 1)
        )
        self.optimizer = optim.RMSprop(self.parameters(), lr = learning_rate)
        self.loss = nn.MSELoss()
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feature = self.feature_layers(x)
        advantage = self.advantage_layers(feature)
        value = self.value_layers(feature)
        return value + advantage - advantage.mean(dim = -1, keepdim = True)
Is there anything I did wrong?
(I know that MCTS is very good for this case, but I am still learning value-based methods.)
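One thing that can be checked independently of training is whether the layer sizes line up. The sketch below does the shape arithmetic for the convolution stack in plain Python, assuming stride 1 and no padding (the nn.Conv2d defaults) and a 6 * 7 board; conv_out and feature_size are illustrative helper names, not part of the original code.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of one spatial dimension after a convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def feature_size(height=6, width=7, kernels=(4, 2, 2), last_channels=16):
    """Number of features left after the conv stack and the flatten."""
    for kernel in kernels:
        height = conv_out(height, kernel)
        width = conv_out(width, kernel)
    return last_channels * height * width


flat_features = feature_size()
print(flat_features)  # 32
```

The board shrinks 6 * 7 -> 3 * 4 -> 2 * 3 -> 1 * 2, so the flattened feature vector has 16 * 1 * 2 = 32 entries, which matches the nn.Linear(32, ...) inputs. The layer dimensions are therefore consistent, and the cause of the degenerate strategy is more likely to lie elsewhere.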

Apache Beam | Python | Dataflow - How to join BigQuery collections with different keys?

I've run into the following problem.
I'm trying to do an INNER JOIN of two Google BigQuery tables in Apache Beam (Python) for a specific situation. However, I haven't found a native way to do it easily.
I'm going to use the output of this query to fill a third table on Google BigQuery, so I really need to run the query on Google Dataflow. The key of the first table (client) is the "id" column, and the key of the second table (purchase) is the "client_id" column.
1. Tables example (consider 'client_table.id = purchase_table.client_id'):
client_table
| id | name | country |
|----|-------------|---------|
| 1 | first user | usa |
| 2 | second user | usa |
purchase_table
| id | client_id | value |
|----|-------------|---------|
| 1 | 1 | 15 |
| 2 | 1 | 120 |
| 3 | 2 | 190 |
2. Code I'm trying to develop (problem in the second line of 'output'):
options = {'project': PROJECT,
           'runner': RUNNER,
           'region': REGION,
           'staging_location': 'gs://bucket/temp',
           'temp_location': 'gs://bucket/temp',
           'template_location': 'gs://bucket/temp/test_join'}
pipeline_options = beam.pipeline.PipelineOptions(flags=[], **options)
pipeline = beam.Pipeline(options = pipeline_options)
query_results_1 = (
    pipeline
    | 'ReadFromBQ_1' >> beam.io.Read(beam.io.ReadFromBigQuery(query="select id as client_id, name from client_table", use_standard_sql=True)))
query_results_2 = (
    pipeline
    | 'ReadFromBQ_2' >> beam.io.Read(beam.io.ReadFromBigQuery(query="select * from purchase_table", use_standard_sql=True)))
output = ( {'query_results_1':query_results_1,'query_results_2':query_results_2}
    | 'join' >> beam.GroupBy('client_id')
    | 'writeToBQ' >> beam.io.WriteToBigQuery(
        table=TABLE,
        dataset=DATASET,
        project=PROJECT,
        schema=SCHEMA,
        create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
        write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE))
pipeline.run()
3. Equivalent desired output in SQL:
SELECT a.name, b.value FROM client_table AS a INNER JOIN purchase_table AS b ON a.id = b.client_id;
You could use either a CoGroupByKey or side inputs (as a broadcast join) depending on your key cardinality. If you have a few keys with many elements each, I suggest the broadcast join.
The first thing you'd need to do is to add a key to your PCollections after the BQ read:
kv_1 = query_results_1 | Map(lambda x: (x["client_id"], x))  # the first query aliases id as client_id
kv_2 = query_results_2 | Map(lambda x: (x["client_id"], x))
Then you can just do the CoGBK or broadcast join. As an example (since it would be easier to understand), I am going to use the code from this session of Beam College. Note that in your example the Value of the KV is a dictionary, so you'd need to make some modifications.
Data
jobs = [
("John", "Data Scientist"),
("Rebecca", "Full Stack Engineer"),
("John", "Data Engineer"),
("Alice", "CEO"),
("Charles", "Web Designer"),
("Ruben", "Tech Writer")
]
hobbies = [
("John", "Baseball"),
("Rebecca", "Football"),
("John", "Piano"),
("Alice", "Photoshop"),
("Charles", "Coding"),
("Rebecca", "Acting"),
("Rebecca", "Reading")
]
Join with CGBK
def inner_join(element):
    name = element[0]
    jobs = element[1]["jobs"]
    hobbies = element[1]["hobbies"]
    joined = [{"name": name,
               "job": job,
               "hobbie": hobbie}
              for job in jobs for hobbie in hobbies]
    return joined
jobs_create = p | "Create Jobs" >> Create(jobs)
hobbies_create = p | "Create Hobbies" >> Create(hobbies)
cogbk = {"jobs": jobs_create, "hobbies": hobbies_create} | CoGroupByKey()
join = cogbk | FlatMap(inner_join)
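To show the modification needed when the values are dictionaries (as with BigQuery rows), here is a plain-Python sketch of what the CoGroupByKey join computes for the client and purchase tables from the question. co_group_by_key is a hypothetical helper that only models the grouping locally; it is not a Beam function, and the real transform runs distributed.

```python
from collections import defaultdict

# Rows as the two BigQuery reads from the question would produce them.
clients = [
    {"client_id": 1, "name": "first user"},
    {"client_id": 2, "name": "second user"},
]
purchases = [
    {"id": 1, "client_id": 1, "value": 15},
    {"id": 2, "client_id": 1, "value": 120},
    {"id": 3, "client_id": 2, "value": 190},
]


def co_group_by_key(**tagged):
    """Local model of CoGroupByKey: returns (key, {tag: [values]}) pairs."""
    grouped = defaultdict(lambda: {tag: [] for tag in tagged})
    for tag, pairs in tagged.items():
        for key, value in pairs:
            grouped[key][tag].append(value)
    return list(grouped.items())


def inner_join(element):
    """Emit one output row per (client, purchase) pair sharing a key."""
    _, groups = element
    return [{"name": client["name"], "value": purchase["value"]}
            for client in groups["clients"]
            for purchase in groups["purchases"]]


# Key both collections by client_id, group them, then flatten the join.
kv_clients = [(row["client_id"], row) for row in clients]
kv_purchases = [(row["client_id"], row) for row in purchases]
joined = [row
          for element in co_group_by_key(clients=kv_clients, purchases=kv_purchases)
          for row in inner_join(element)]
print(joined)
```

A key that appears in only one of the two inputs produces an empty list on the other side, so the nested loop emits nothing for it, which is exactly the inner-join behaviour.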
Broadcast join with Side Inputs
def broadcast_inner_join(element, side_input):
    name = element[0]
    job = element[1]
    hobbies = side_input.get(name, [])
    joined = [{"name": name,
               "job": job,
               "hobbie": hobbie}
              for hobbie in hobbies]
    return joined
hobbies_create = (p | "Create Hobbies" >> Create(hobbies)
                    | beam.GroupByKey()
                  )
jobs_create = p | "Create Jobs" >> Create(jobs)
broadcast_join = jobs_create | FlatMap(broadcast_inner_join,
                                       side_input=pvalue.AsDict(hobbies_create))
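Applied to the tables from the question, the broadcast variant could look roughly like the sketch below. It is plain Python that only models the per-element logic: names_by_id stands for the dictionary that a side input built with AsDict would hand to every call, and the assumption is that the client table is the small side that fits in memory.

```python
# The small side, as a lookup dictionary keyed by client id.
names_by_id = {1: "first user", 2: "second user"}

# The large side, processed one element at a time.
purchases = [
    {"id": 1, "client_id": 1, "value": 15},
    {"id": 2, "client_id": 1, "value": 120},
    {"id": 3, "client_id": 2, "value": 190},
]


def broadcast_inner_join(purchase, side_input):
    """Look the purchase's client up in the side input; drop it if absent."""
    name = side_input.get(purchase["client_id"])
    if name is None:
        return []  # inner join: no matching client, no output row
    return [{"name": name, "value": purchase["value"]}]


joined = [row
          for purchase in purchases
          for row in broadcast_inner_join(purchase, names_by_id)]
print(joined)
```

Returning a list from the function mirrors how it would be used with FlatMap: an empty list drops the element, a one-element list emits a joined row.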

F# pattern-matching gone wrong

I am just starting out with F# so this might be a trivial question but I am not able to understand why the pattern matching in my code acts as it does.
Quick explanation of the code:
The function calcNextMatch should recurse through a list, and if two adjacent elements are equal they should be added together.
In the end the function should return the sum of all digits that match the next digit in the list.
For example, [1;3;2;2;5] should return 4.
Code:
let rec printList l =
    match l with
    | head :: tail -> printf "%d " head; printList tail
    | [] -> printfn ""
let rec calcNextMatch list =
    printList list
    match list with
    | [] -> 0
    | _ :: tail ->
        printList tail
        let h = Seq.head list
        let t = Seq.tryHead tail
        printfn "h: %i" h
        printfn "t: %O" t
        match t with
        | Some h ->
            printfn "TAIL t: %i is equal to HEAD h: %i" t.Value h
            printfn "Calculation is: %i" (t.Value + h)
            (t.Value + h) + calcNextMatch tail
        | _ -> calcNextMatch tail
let sequence = [ 1;3;2;2;5 ]
let run = calcNextMatch sequence
When I run this code, the pattern matching does not work as I expect.
For example, this is the printed output from running the script:
h: 1
t: Some(3)
TAIL t: 3 is equal to HEAD h: 3
this means that F# has matched
match t with
| Some h ->
in a case where t = Some(3) and h = 1
which translates to
match 3 with
| Some 1 ->
and that I do not understand.
The prints before the match show the values of t and h as 3 and 1, but inside the pattern match the value of h has changed to 3.
How is this possible?
A pattern can only compare against constant literals; any other identifier in a pattern gets bound to the matched value, as if it were a new let-binding. So Some h does not compare against the outer h, it shadows it.
In these cases what you normally do is add a when condition:
match t with
| Some x when x = h ->
Also notice that you can use pattern match further to simplify your code, for instance here:
| _ :: tail ->
    printList tail
    let h = Seq.head list
You can write:
| h :: tail ->
    printList tail
Also all this portion:
| _ :: tail ->
    printList tail
    let h = Seq.head list
    let t = Seq.tryHead tail
    printfn "h: %i" h
    printfn "t: %O" t
    match t with
    | Some h ->
        printfn "TAIL t: %i is equal to HEAD h: %i" t.Value h
        printfn "Calculation is: %i" (t.Value + h)
        (t.Value + h) + calcNextMatch tail
becomes:
| h :: tail ->
    printList tail
    //printfn "h: %i" h
    //printfn "t: %O" t
    match tail with
    | t::_ when t = h ->
        printfn "TAIL t: %i is equal to HEAD h: %i" t h
        printfn "Calculation is: %i" (t + h)
        (t + h) + calcNextMatch tail
And you can unify all matches in one, so your whole function becomes:
let rec calcNextMatch list =
    printList list
    match list with
    | [] -> 0
    | h::x::tail when x = h -> x + h + calcNextMatch (x::tail)
    | _::tail -> calcNextMatch tail
Finally, when you're done debugging, you can remove the prints. Since the last parameter of your function is the one you match against, you can use the function keyword, and an as pattern avoids reconstructing the list:
let rec calcNextMatch = function
    | [] -> 0
    | h::((x::_) as tail) when x = h -> x + h + calcNextMatch tail
    | _::tail -> calcNextMatch tail
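For readers who want to cross-check the result outside F#, the same computation can be sketched in Python by pairing each element with its successor. This is only an equivalent formulation for comparison, not a translation of the pattern-matching technique; calc_next_match is an illustrative name.

```python
def calc_next_match(digits):
    """Sum both members of every adjacent pair of equal elements."""
    return sum(a + b for a, b in zip(digits, digits[1:]) if a == b)


total = calc_next_match([1, 3, 2, 2, 5])
print(total)  # 4
```

zip(digits, digits[1:]) yields (1, 3), (3, 2), (2, 2), (2, 5); only (2, 2) matches, giving 2 + 2 = 4, the value expected in the question.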

Misinterpreted grammar

I have the following grammar piece:
SlotConstraint:
lExpr = [Slot] pred = ('in' | 'inn' | 'from' | 'fromm' | 'is') rExpr = SetSexpr |
lExpr = [Slot] pred = ('in' | 'inn' | 'from' | 'fromm' | 'is')? neg = ('not' | 'not in' | 'not from') rExpr = SetSexpr
;
When I write something like a in b or a is not in b, it works fine. However, I am not able to write a is not b. The question is: why does it understand not in or not from, but not plain not?
Thanks
Do not use whitespace inside keywords. Declare 'not', 'in' and 'from' as separate keywords and combine them in the rule instead.

Getting substring by index position

I need to extract a substring from a line using its initial and final positions. I think it should be easy to do with grep, but I still haven't figured out how.
For example, given a line of n characters, I want to extract the substring that starts at position k and ends at position l of that line,
with l, k < n and l > k.
Cut is a good choice for this. You can select ranges and individual fields:
$ echo "123456" | cut -c2-4
234
$ echo "123456" | cut -c1,3,6
136
$ echo "123456" | cut -c1-3,6
1236
Why not use awk?
echo "12345678" | awk '{print substr($0, 3, 2);}'
# prints '34'
If you're using bash you could just do:
LINE="strings"
K=3 ## 4th character starting from index 0
L=5 ## 6th character starting from index 0
echo "${LINE:K:L - K + 1}"
In a loop over a file:
while read -r LINE; do echo "${LINE:K:L - K + 1}"; done < file
With awk, treating l as an end position rather than a length and 0 as the starting index (awk's substr is 1-based, hence k + 1):
awk -v k="3" -v l="5" '{ print substr($0, k + 1, l - k + 1); }' < file
Using sed:
k=3
echo '123456789' | sed 's/^.\{'$k'\}//'
Output:
456789
Is this what you're looking for?
String test = "ateststring";
String testa = test.substring(0,4);
String test2 = testa.substring(0,3);
int testc = test.indexOf("test");
String test3 = test.substring(testc,5);
System.out.println(test + ":" + testa + ":" + test2 + ":" +test3);
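The answers above mix 0-based and 1-based positions, which is where off-by-one mistakes tend to creep in. As a compact reference, the sketch below spells out both conventions with Python slicing; the function names are illustrative, and both treat the end position l as inclusive.

```python
def substring_zero_based(line, k, l):
    """Characters k..l of line, counting from 0, both ends inclusive."""
    if not 0 <= k <= l < len(line):
        raise ValueError("need 0 <= k <= l < len(line)")
    return line[k:l + 1]


def substring_one_based(line, k, l):
    """Characters k..l of line, counting from 1, as cut -c k-l does."""
    return substring_zero_based(line, k - 1, l - 1)


zero_based = substring_zero_based("strings", 3, 5)
one_based = substring_one_based("123456", 2, 4)
print(zero_based, one_based)  # ing 234
```

The first call corresponds to the bash example with K=3 and L=5, the second to cut -c2-4.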
