I started using Apache Beam with Python and I get stuck every 30 minutes. I am trying to flatten the result of a transformation:
lines = messages | 'decode' >> beam.Map(lambda x: x.decode('utf-8'))
output = ( lines
| 'process' >> beam.Map(process_xmls) # returns list
| 'jsons' >> beam.Map(lambda x: [beam.Create(jsons.dump(model)) for model in x])
| 'flatten' >> beam.Flatten()
| beam.WindowInto(window.FixedWindows(1, 0)))
So after running this code I get this error:
ValueError: Input to Flatten must be an iterable. Got a value of type <class 'apache_beam.pvalue.PCollection'> instead.
What should I do?
The beam.Flatten() operation takes an iterable of PCollections and returns a new PCollection that contains the union of all elements in the input PCollections. It is not possible to have a PCollection of PCollections.
I think what you're looking for here is the beam.FlatMap operation. It differs from beam.Map in that it can emit multiple elements per input. For example, if you have a PCollection lines that contains the elements {'two', 'words'} then
lines | beam.Map(list)
would be the PCollection consisting of two lists
{['t', 'w', 'o'], ['w', 'o', 'r', 'd', 's']}
whereas
lines | beam.FlatMap(list)
would result in the PCollection consisting of several letters
{'t', 'w', 'o', 'w', 'o', 'r', 'd', 's'}.
Thus your final program would look something like
lines = messages | 'decode' >> beam.Map(lambda x: x.decode('utf-8'))
output = ( lines
| 'process' >> beam.FlatMap(process_xmls) # concatenates all lists returned by process_xmls into a single PCollection
| 'jsons' >> beam.Map(jsons.dumps) # apply json.dumps to each element
| beam.WindowInto(window.FixedWindows(1, 0)))
(note also json.dumps, returning strings, is probably what you want instead of json.dump which takes a second argument as the file/stream to write to).
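The difference between Map and FlatMap can be mimicked in plain Python, without Beam installed. This is only an analogy on ordinary lists: map_like and flat_map_like are hypothetical helper names, not Beam APIs, and real PCollections are unordered and distributed.

```python
from itertools import chain

def map_like(fn, elements):
    # One output per input, as with beam.Map.
    return [fn(e) for e in elements]

def flat_map_like(fn, elements):
    # fn returns an iterable per input; the iterables are concatenated,
    # as with beam.FlatMap.
    return list(chain.from_iterable(fn(e) for e in elements))

lines = ['two', 'words']
mapped = map_like(list, lines)
flattened = flat_map_like(list, lines)
print(mapped)
print(flattened)
```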
I have checked the Erlang documentation on operators, but I cannot find what || and | mean.
I read somewhere that || means "such that", but what does a single | mean?
| is the "cons" operator: It puts an element in front of a list:
1> [1 | [2,3]].
[1,2,3]
2> [[1, 2] | [3,4,5]].
[[1,2],3,4,5]
|| is used in list-comprehensions. In its simplest form, it can be used as a short-hand for map:
3> [2 * X || X <- [1,2,3]].
[2,4,6]
But it becomes much more handy when you want to write multiple generators, creating the cartesian product:
4> [{X, Y} || X <- [1,2,3], Y <- [4, 5, 6]].
[{1,4},{1,5},{1,6},{2,4},{2,5},{2,6},{3,4},{3,5},{3,6}]
You can also filter along the way. Compare:
5> [X+Y || X <- [1,2,3], Y <- [4,5,6]].
[5,6,7,6,7,8,7,8,9]
to:
6> [X+Y || X <- [1,2,3], Y <- [4,5,6], X+Y > 6].
[7,7,8,7,8,9]
The | operator is essential, in the sense that it is the canonical way to construct a new list out of a head element and an existing tail list. The same notation also works in pattern matching, i.e., it is also how you deconstruct a list.
List comprehensions, on the other hand, are mostly syntactic sugar: they can be written using regular function applications and hence are not fundamental to the language. But they can significantly improve readability by getting rid of syntactic noise, mimicking mathematical set-comprehension notation directly within the language.
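The claim that comprehensions are sugar over function application can be illustrated in Python, whose list comprehensions behave much like Erlang's. This is a sketch by analogy, not Erlang code: the second version uses only map, filter and flattening to reproduce the filtered example above.

```python
from itertools import chain

xs = [1, 2, 3]
ys = [4, 5, 6]

# Comprehension form, mirroring [X+Y || X <- [1,2,3], Y <- [4,5,6], X+Y > 6].
with_comprehension = [x + y for x in xs for y in ys if x + y > 6]

# The same result using only function application: a nested map that is
# flattened, followed by a filter.
all_sums = chain.from_iterable(map(lambda x: map(lambda y: x + y, ys), xs))
with_functions = list(filter(lambda s: s > 6, all_sums))

print(with_comprehension)
print(with_functions)
```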
How can I transform a term like 3 * y * w * t^3 into a list such as List = [3, *, y,...], without using the following predicate:
t2l(Term, List) :-
t2l_(Term, List-X),
X = [].
t2l_(Term, [F|X]-X) :-
Term =.. [F],
!.
t2l_(Term, L1-L4) :-
Term =.. [F, A1, A2],
t2l_(A1, L1-L2),
L2 = [F|L3],
t2l_(A2, L3-L4).
Is there a simple way?
In Prolog, everything that can be expressed by pattern matching should be expressed by pattern matching.
In your case, this is difficult, because you cannot collectively distinguish the integers from other arising terms by pattern matching with the defaulty representation you are using.
In the following, I am not solving the task completely for you, but I am showing how you can solve it once you have a clean representation.
As always when describing a list in Prolog, consider using DCG notation:
term_to_list(y) --> [y].
term_to_list(w) --> [w].
term_to_list(t) --> [t].
term_to_list(i(I)) --> [I].
term_to_list(A * B) -->
term_to_list(A),
[*],
term_to_list(B).
term_to_list(A^B) -->
term_to_list(A),
[^],
term_to_list(B).
In this example, I am using i(I) to symbolically represent the integer I.
Sample query and result:
?- phrase(term_to_list(i(3)*y*w*t^i(3)), Ls).
Ls = [3, *, y, *, w, *, t, ^, 3].
I leave converting the defaulty representation to a clean one as an easy exercise.
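For comparison, the same in-order flattening can be sketched in Python. The nested-tuple representation (operator, left, right) is an assumption made for this illustration; since Python can tell integers from other values at runtime, no i(I) wrapper is needed here.

```python
def term_to_list(term):
    """Flatten a binary expression tree into an in-order token list."""
    if isinstance(term, tuple):
        op, left, right = term
        return term_to_list(left) + [op] + term_to_list(right)
    # Leaves (integers or variable names) become single-element lists.
    return [term]

# 3 * y * w * t^3, with * left-associative and ^ binding tighter than *.
term = ('*', ('*', ('*', 3, 'y'), 'w'), ('^', 't', 3))
tokens = term_to_list(term)
print(tokens)
```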
Thanks mat for answering, I forgot to close the question. However, I have created a new predicate that solves the problem:
term_string(Term, X),
string_codes(X, AList),
ascii_to_list(AList, Y).
ascii_to_list([X | Xs], [Y | Out]) :-
X >= 48,
X =< 57,
!,
number_codes(Y, [X]),
ascii_to_list(Xs, Out).
ascii_to_list([X | Xs], [Y | Out]) :-
char_code(Y, X),
ascii_to_list(Xs, Out).
ascii_to_list([], []).
I would like to be able to input a sequence of integers on one line, such as:
97, 128, 125, 17, 2
and have the Haskell program convert the input into a list of integers, such as:
[97, 128, 125, 17, 2]
so that I can do some math operations like zipWith(ing) the list with another list of integers. Having trouble with this. I tried using the read and words functions but I wasn't able to achieve the expected result. Any ideas?
One possible (again, quick'n'dirty) solution is to use read with the instance defined for lists, which expects strings in the format [item1, item2, item3...]:
convert :: String -> [Int]
convert s = read $ "[" ++ s ++ "]"
A more robust solution would be parsing with filter or similar (as shown in the other answer) or using a parsing library to do the job properly.
The problem with only using words is that the comma (,) will still be included.
A quick-and-dirty hack is to first map every character that is not a digit to a space:
import Data.Char(isDigit)
cnv x | isDigit x = x
      | otherwise = ' '
and then use:
map read . words . map cnv :: Read b => [Char] -> [b]
demo
*Main> (map read . words . map cnv) "97, 128, 125, 17, 2" :: [Int]
[97,128,125,17,2]
A potential problem is of course that all non-digit characters (letters, minus signs, etc.) are silently discarded. Furthermore, this approach is not the most efficient.
An advantage is that, because read is polymorphic, any element type with a Read instance that can be read from a run of digits may be used for the resulting list.
Why not filtering?
One could also use filter to keep only digits and whitespace, for instance:
map read . words . filter (\x -> isDigit x || isSpace x)
A potential problem is that the numbers may not be separated by spaces at all, but only by commas (,), semicolons (;), etc. When spaces are present, the above expression generates the correct result:
(map read . words . filter (\x -> isDigit x || isSpace x)) "97, 128, 125, 17, 2" :: [Int]
[97,128,125,17,2]
but
(map read . words . filter (\x -> isDigit x || isSpace x)) "97,128,125,17,2" :: [Int]
[97128125172]
doesn't.
The task you're specifying falls under the category of textual parsing. When facing such a problem the safe bet is to approach it with either the "parsec" or the "attoparsec" library. Those libraries provide APIs which abstract over parsing in a safe and composable (hence scalable) way.
Here's how you'd write the "attoparsec" parser for your task:
import Data.Attoparsec.Text (Parser, char, decimal, sepBy, skipSpace)

listOfInts :: Parser [Int]
listOfInts =
  sepBy decimal separator
  where
    separator =
      skipSpace *> char ',' *> skipSpace
Note that this implementation already accepts loosely formatted input, where the separator may have any number of spaces (including none) before and after the comma. Also note how simple it is to express this fairly complicated condition using such a parser.
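To make the separator rule concrete, here is a rough Python analogue of the same grammar: integers separated by a comma with optional whitespace on either side. It is only a sketch; list_of_ints is a hypothetical name, and int() is more permissive than a decimal parser (it also accepts a leading sign) and raises ValueError on malformed or empty input rather than returning a parse result.

```python
import re

def list_of_ints(text):
    # Split on a comma with any amount of whitespace (or none) around it.
    parts = re.split(r"\s*,\s*", text.strip())
    return [int(p) for p in parts]

print(list_of_ints("97, 128, 125, 17, 2"))
print(list_of_ints("97,128 ,  125,17,2"))
```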
Thank you all for your help. For my application, this seems to work well:
myInput <- getLine
123 23 345 23
(map read . words) myInput::[Int]
I was having a little trouble understanding why the parentheses go where they do, but this seems to work also:
myInput <- getLine
234 34 235 465 34
map read $ words myInput::[Int]
Since I'm just using spaces to separate the numbers, I don't have to use the filter, but thanks for posting it because now I understand the syntax better.
Don
I am trying to learn some Erlang and I got stuck on several Erlang pattern matching problems.
Given the module here:
-module(p1).
-export([f2/1]).
f2([A1, A2 | A1]) -> {A2, A1};
f2([A, true | B]) -> {A, B};
f2([A1, A2 | _]) -> {A1,A2};
f2([_|B]) -> [B];
f2([A]) -> {A};
f2(_) -> nothing_matched.
When I execute p1:f2([x]), I receive an empty list, which is []. I thought it would match the 5th clause? Can a literal also be an atom?
When I execute p1:f2([[a],[b], a]), the result is {[b],[a]}, which means it matches the first clause. However, I think [a] and a are not the same thing? One is a list but the other is a literal?
Also, when I execute p1:f2([2, 7 div 3 > 2 | [5,3]]) it evaluates to {2,false}. Why does 7 div 3 > 2 evaluate to false? In other languages such as C or Java, I know that integer division gives 7 / 3 == 2, which makes this statement false. But is it the same in Erlang? I just tried the division in the shell and it gives me 2.3333333.., which is larger than 2, so it should make this statement true. Can someone give an explanation?
It is because [x] is equal to [x|[]], so it matches f2([_|B]) -> [B];. As you can see, B=[] in your case.
I think you did not write what you intended. In the expression [A|B], A is the first element of the list, while B is the rest of the list (so it is itself a list). That means that [1,2,1] will not match [A1, A2 | A1]; but [[1],2,1] or [[a,b],1,a,b] will.
First, 7 div 3 is 2. And 2 is not greater than 2, it's equal.
Secondly, [x, y] = [x | [y] ], because the right (or rest) part is always a list. That's why you end up in the first clause.
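The integer-versus-float division point has a direct parallel in Python, shown here as a small sketch. Python's // plays the role of Erlang's div and / the role of Erlang's /; the two languages round negative quotients differently, but for positive operands like these the results agree.

```python
integer_quotient = 7 // 3   # analogous to Erlang's 7 div 3
float_quotient = 7 / 3      # analogous to Erlang's 7 / 3

# 2 is not greater than 2, so the comparison on the integer quotient is False.
print(integer_quotient, integer_quotient > 2)
# The float quotient is about 2.33, so this comparison is True.
print(float_quotient > 2)
```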
I have started learning Erlang recently and came across the following error while trying to pattern match.
The following expression is working fine:
{A,_,[B|_],{B}}={abc,23,[22,x],{22}}.
Resulting in
A = abc
B = 22
The following expression is not working:
{A,_,[_|B],{B}}={abc,23,[22,x],{x}}.
Is resulting in
** exception error: no match of right hand side value {abc,23,[22,x],{x}}
However, if I replace the ',' in [22,x] with a | like the following, it works fine and binds x to B:
{A,_,[_|B],{B}}={abc,23,[22|x],{x}}.
{abc,23,[22|x],{x}}
B.
x
Any explanation of this would be highly appreciated.
Many thanks in advance
The operator | is used for a recursive definition of a list: [A|B] means that you add the element A to an existing list B. A is the first element of the resulting list, called the head, B is the rest of the list called tail. B can be also split into a head and a tail, and the process can continue until the tail is equal to the empty list [].
The operator , is a separator between list elements, so [A,B] is a list of 2 elements A and B.
The 2 operators can be combined: [A,B,C|D] is a list of at least 3 elements, which are A, B and C, and a tail D which can be empty.
In your test you used another syntax: [22|x]. 22 can be an element of a list (in fact any Erlang term can be an element of a list), but x is an atom and cannot be a list tail. By doing this you broke the recursive definition of the list; this structure is not often used and is called an improper list.
When you match [_|B] against [22,x], B is bound to [x], which does not match x later in the expression.
When you match [_|B] against [22|x], B is bound to x, which does match x later in the expression, but the right way would be
{A,_,[_|B],{B}}={abc,23,[22,x],{[x]}}.
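The proper versus improper list distinction can be modelled in Python with explicit cons cells. This is an illustrative sketch only: the pair representation, the cons helper and the NIL marker are assumptions of this example, not how Erlang is implemented.

```python
NIL = ()  # stands in for Erlang's empty list []

def cons(head, tail):
    # A cons cell is modelled as a (head, tail) pair, like [Head|Tail].
    return (head, tail)

proper = cons(22, cons('x', NIL))   # models [22,x], i.e. [22|[x|[]]]
improper = cons(22, 'x')            # models [22|x]

_, proper_tail = proper
_, improper_tail = improper
print(proper_tail)     # the tail is itself a list cell, like [x]
print(improper_tail)   # the tail is the bare atom, like x
```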
You need to look more closely at how the | operator works. It basically takes the head of the list, which is one element, and returns the tail of the list, which is all the rest. And as "all" suggests, the tail is also a list. It could be a one-element list, it could even be an empty list, but it is still going to be a list.
> [Head| Tail] = [23,x].
[23,x]
> Head.
23
> Tail.
[x]
So in your pattern match, the tail is bound to [x], and then you try to match it against plain x. That is what fails.
On a side note: you can create a new list with the | operator, but you should do this with caution, since you could create an improper list (and you do with [22 | x]). That's why your "fix" is working.
If you would like to match on two element list, you could do it explicitly with
[A, B] = [23, x].
but this will fail if the list has more or fewer elements.
If you would like to match on only the first two elements, you can still use the | operator.
> [A, B | Rest] = [23, x].
[23,x]
> A.
23
> B.
x
> Rest.
[]
This will fail only with a one-element or empty list.
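Python's starred unpacking behaves similarly and can serve as a rough analogy: the starred name always receives a list, possibly empty, just as the tail in [Head|Tail] is always a list for proper lists. The variable names are chosen for this sketch; Python has no counterpart to improper lists.

```python
# Analogous to [Head | Tail] = [23, x].
head, *tail = [23, 'x']

# Analogous to [A, B | Rest] = [23, x].
a, b, *rest = [23, 'x']

# With fewer than two elements the match fails (ValueError in Python).
try:
    a2, b2, *rest2 = [23]
    matched = True
except ValueError:
    matched = False

print(head, tail)
print(a, b, rest)
print(matched)
```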