I have one line of data like this:
a\tb1,b2,..,bn\tc1,c2,..,cn
where n varies. I want to transform it into multiple lines like this:
a\tb1\tc1
a\tb2\tc2
...
a\tbn\tcn
Is this possible in Pig Latin, or do I have to use a UDF?
If I use this script:
A = LOAD 'file' AS (a, b, c);
B = FOREACH A GENERATE a, FLATTEN(TOKENIZE(b)), FLATTEN(TOKENIZE(c));
dump B;
I get the following result:
a\tb1\tc1
a\tb1\tc2
..
a\tb1\tcn
a\tb2\tc1
a\tb2\tc2
..
a\tb2\tcn
..
It isn't the data I wanted. Does anyone have ideas?
IMO, too many Pig users are reluctant to write UDFs. In your case, the UDF you need is fairly simple. Here's some sample code (untested):
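import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;
// Pairs the i-th element of b with the i-th element of c, one (b, c) tuple per position
public class InSequenceJoin extends EvalFunc<DataBag>
{
    public DataBag exec(Tuple input) throws IOException {
        if (input == null || input.size() < 2 || input.get(0) == null || input.get(1) == null) {
            return null;
        }
        // toString() works for both chararray (String) and untyped bytearray fields
        String[] bArray = input.get(0).toString().split(",");
        String[] cArray = input.get(1).toString().split(",");
        DataBag bag = BagFactory.getInstance().newDefaultBag();
        // Stops at the shorter list if the two lengths differ
        for (int i = 0; i < bArray.length && i < cArray.length; i++) {
            Tuple tuple = TupleFactory.getInstance().newTuple(2);
            tuple.set(0, bArray[i]);
            tuple.set(1, cArray[i]);
            bag.add(tuple);
        }
        return bag;
    }
}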
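-- myudfs.jar is a placeholder for the jar that contains InSequenceJoin
REGISTER myudfs.jar;
define InSequenceJoin mysourcepath.InSequenceJoin();
A = LOAD 'file' AS (a:chararray, b:chararray, c:chararray);
B = FOREACH A GENERATE a, FLATTEN(InSequenceJoin(b,c));
dump B;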
If you need it, you could add validation in the UDF that checks whether the two arrays have the same size. You could also replace the String.split I used in the example with whatever tokenization you actually need.
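To check the pairing logic outside Pig, here is a plain-Java sketch of what the UDF does for one input line, including the optional size validation. The class name SequenceZip and the strict flag are hypothetical and not part of Pig; only String.split and standard collections are used.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: the UDF's pairing logic without any Pig dependency.
class SequenceZip {

    // Splits both comma-separated strings and pairs the elements by position.
    // strict == true: mismatched lengths raise an exception.
    // strict == false: extra elements of the longer list are ignored.
    static List<String[]> zip(String b, String c, boolean strict) {
        String[] bArray = b.split(",");
        String[] cArray = c.split(",");
        if (strict && bArray.length != cArray.length) {
            throw new IllegalArgumentException(
                "size mismatch: " + bArray.length + " vs " + cArray.length);
        }
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i < Math.min(bArray.length, cArray.length); i++) {
            pairs.add(new String[] {bArray[i], cArray[i]});
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Emulates one input line: a<TAB>b1,b2,b3<TAB>c1,c2,c3
        String a = "a";
        for (String[] p : zip("b1,b2,b3", "c1,c2,c3", true)) {
            System.out.println(a + "\t" + p[0] + "\t" + p[1]);
        }
    }
}
```

Whether a mismatch should fail or be silently truncated depends on your data; the UDF as written truncates.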
I'd try DataFu's bag UDFs.
Load the data as you've done, then use Enumerate to add an index to each element of both bags. Flattening both bags then gives you the cross join you've already seen, but each element now carries its index, so you can filter for the rows where the two indexes are equal.
See here: https://github.com/linkedin/datafu
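The Enumerate idea can be illustrated without Hadoop. This plain-Java sketch only mimics the data flow (index every element, form every b/c combination the way flattening two bags does, keep the combinations with equal indexes); it does not use DataFu, and the names EnumerateCrossFilter, enumerate and rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of enumerate -> cross join -> filter on index.
class EnumerateCrossFilter {

    // Step 1: attach a position index to every element.
    static List<Object[]> enumerate(String[] items) {
        List<Object[]> out = new ArrayList<>();
        for (int i = 0; i < items.length; i++) {
            out.add(new Object[] {items[i], i});
        }
        return out;
    }

    // Steps 2 and 3: build every (b, c) combination, as flattening both bags does,
    // then keep only the combinations whose indexes match.
    static List<String> rows(String a, String[] b, String[] c) {
        List<String> result = new ArrayList<>();
        for (Object[] bi : enumerate(b)) {
            for (Object[] ci : enumerate(c)) {
                if (bi[1].equals(ci[1])) {
                    result.add(a + "\t" + bi[0] + "\t" + ci[0]);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String row : rows("a", new String[] {"b1", "b2"}, new String[] {"c1", "c2"})) {
            System.out.println(row);
        }
    }
}
```

Note that this materializes n × n combinations before filtering, so for long lists a UDF that pairs the elements directly does less work.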
Related
I have a list of elements and I need to get a list containing the first element followed by every nth element afterwards. For example: given n = 3 and the list [banana, cherry, apple, pear, kiwi], I need to get the list [banana, pear]. I need this regardless of specific content, since the list depends on user input.
How do I do this using Dart?
You can access a list element in Dart by its index, for example:
List<String> fruits = ["banana","cherry","apple","pear","kiwi"];
print(fruits[0]); // Will print to the console "banana";
In your case, you want indices 0 and 3, which are "banana" and "pear".
You could create a function that accepts an index:
String getFruit(int index, List<String> fruits) => fruits[index];
print(getFruit(0, fruits)); // Will print "banana"
Or, if you need a specific range, you can use getRange:
List<String> fruits = ["banana","cherry","apple","pear","kiwi"].getRange(0, 4).toList();
// Will give you "banana", "cherry", "apple", "pear"
See https://api.dart.dev/be/180791/dart-core/List-class.html for more information.
Edited answer, based on the comment:
List<T> getElements<T>(List<T> userInput, int nIndex) {
  final List<T> elements = [];
  for (int x = 0; x < userInput.length; x++) {
    // Keep index 0 and every nIndex-th index after it
    if (x % nIndex == 0) {
      elements.add(userInput[x]);
    }
  }
  return elements;
}
List<String> fruits = ["banana","cherry","apple","pear","kiwi"];
print(getElements(fruits, 3)); // [banana, pear]
Alternatively, depending on your use case, you could look into List.retainWhere().
Dart has a great set of collection operators that make this type of problem pretty straightforward to solve. For example, we could do something like:
extension X<T> on List<T> {
List<T> everyNth(int n) => [for (var i = 0; i < this.length; i += n) this[i]];
}
main() {
final fruit = ["banana", "cherry", "apple", "pear", "kiwi"];
print(fruit.everyNth(3));
}
Output:
[banana, pear]
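For comparison, here is the same start-at-index-0, step-by-n idea as a Java sketch; EveryNth is a hypothetical helper name. It adds a guard for n <= 0, because a step of 0 would never advance the index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Java counterpart of the everyNth extension.
class EveryNth {

    // Returns the elements at indices 0, n, 2n, ...
    static <T> List<T> everyNth(List<T> list, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<T> result = new ArrayList<>();
        for (int i = 0; i < list.size(); i += n) {
            result.add(list.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> fruit = List.of("banana", "cherry", "apple", "pear", "kiwi");
        System.out.println(everyNth(fruit, 3)); // [banana, pear]
    }
}
```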
You can use this extension method, which will work on lists of any type:
extension GetEveryN<T> on List<T> {
List<T> elementsEveryN(int n) {
List<T> result = [];
for(int index = 0; index < length; index +=1) {
if(index % n == 0) {
result.add(this[index]);
}
}
return result;
}
}
Trying it in an example:
List<String> list = ["banana", "cherry", "apple", "pear", "kiwi"];
print(list.elementsEveryN(3)); // [banana, pear]
I have a situation where I have a list that can hold at most 4 elements.
However, if I have only 1-3 elements to put in that list, how can I fill the remainder with null values?
For example, a List<int?> of length 4 with 2 given elements, should result in:
[1,3] -> [1,3,null,null]
Here's what I'm doing, but maybe there is a better way:
List.generate(4, (index) {
try {
final id = given.elementAt(index);
return id;
} catch (error) {
return null;
}
});
The simplest version would probably be:
[for (var i = 0; i < 4; i++) i < given.length ? given[i] : null]
You can use List.generate, but in this case, the function is simple enough that a list literal can do the same thing more efficiently.
(Or, as @jamesdlin says, if you want a fixed-length list, use List.generate.)
A more indirect variant could be:
List<GivenType?>.filled(4, null)..setAll(0, given)
where you create a list of four nulls first, then write the given list into it. It's probably just more complicated and less efficient.
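The same padding can be sketched in Java, assuming a reference element type such as Integer so that null is allowed. Arrays.copyOf pads a longer copy of an object array with null, and the list version mirrors the "filled with nulls, then setAll" variant. PadWithNulls, padArray and padList are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helpers: pad to a fixed size with nulls.
class PadWithNulls {

    // Arrays.copyOf fills the extra slots of an object array with null.
    static Integer[] padArray(Integer[] given, int size) {
        return Arrays.copyOf(given, size);
    }

    // Start from `size` nulls, then overwrite the first positions with the given values.
    static List<Integer> padList(List<Integer> given, int size) {
        List<Integer> result = new ArrayList<>(Collections.nCopies(size, (Integer) null));
        for (int i = 0; i < given.size() && i < size; i++) {
            result.set(i, given.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(padArray(new Integer[] {1, 3}, 4))); // [1, 3, null, null]
        System.out.println(padList(List.of(1, 3), 4)); // [1, 3, null, null]
    }
}
```

Both versions also truncate input longer than size, which fits the at-most-4 requirement.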
For example, I have the following table, and I want to query all values whose first cell is "Computer". I tried the QUERY formula "QUERY(A1,B3;"select B where A = 'Computer'")", but it returns only the first B value, Keyboard.
Is it possible to return all values? Thanks.
I understand that you want to read a table made of separate blocks, like the one you posted, and that when a match is found in the left column, the script should return the contents of the right column. If my assumption is incorrect, please forgive me. An example initial table could look like this one:
In the example we will return the rows that contain PRAESENT in the first column. In that case, you can use the following Apps Script:
function so61913445() {
  var sheet = SpreadsheetApp.getActive().getActiveSheet();
  var data = sheet.getDataRange().getValues();
  var matchData = [];
  var indexColumn = 0; // Column A
  var targetIndex = "PRAESENT";
  // Scan the index column for the target word
  for (var r = 0; r < data.length; r++) {
    if (data[r][indexColumn] == targetIndex) {
      // Read 3 cells of the next column, starting at the matching row
      // (getRange is 1-based; assumes every block is exactly 3 rows tall)
      matchData.push(sheet.getRange(r + 1, indexColumn + 2, 3, 1).getValues());
    }
  }
  // Write each collected row below the existing data
  for (var m = 0; m < matchData.length; m++) {
    for (var k = 0; k < matchData[m].length; k++) {
      sheet.appendRow(matchData[m][k]);
    }
  }
}
The code first declares some variables: the sheet (using SpreadsheetApp.getActive() and Spreadsheet.getActiveSheet()), the data (using Sheet.getDataRange() and Range.getValues()) and settings such as the target word to match. It then iterates over the data and, whenever the target word is found in the index column, adds the matching row's cell in the next column, together with the two cells below it, to the result array. Finally, it iterates over the collected values and writes them just below the table using Sheet.appendRow(). The final result will look like this:
Please ask if you still need help.
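The matching logic of the script can be sketched in Java on an in-memory 2D array instead of a sheet. Like the script, it assumes each block is exactly blockHeight rows tall (the script hard-codes 3) and that the values sit one column to the right of the index column. BlockLookup, valuesFor and the sample data are hypothetical; unlike the script, this sketch stops at the end of the array.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the script's block lookup on a plain 2D array.
class BlockLookup {

    static List<Object> valuesFor(Object[][] data, int indexColumn, Object target, int blockHeight) {
        List<Object> matches = new ArrayList<>();
        for (int r = 0; r < data.length; r++) {
            if (target.equals(data[r][indexColumn])) {
                // Collect blockHeight cells of the column to the right, starting at the match
                for (int k = r; k < r + blockHeight && k < data.length; k++) {
                    matches.add(data[k][indexColumn + 1]);
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical sample table: the category appears only on a block's first row
        Object[][] data = {
            {"Computer", "Keyboard"},
            {"", "Mouse"},
            {"", "Monitor"},
            {"Phone", "Screen"},
            {"", "Battery"},
            {"", "Case"},
        };
        System.out.println(valuesFor(data, 0, "Computer", 3)); // [Keyboard, Mouse, Monitor]
    }
}
```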
If I have some text in a String like:
"abc=123,def=456,ghi=789"
how can I create a populated HashMap<String, Int> from it in Kotlin with the shortest, simplest code possible?
I can think of no solution easier than this:
val s = "abc=123,def=456,ghi=789"
val map = s.split(",").associate {
val (left, right) = it.split("=")
left to right.toInt()
}
Or, if you need exactly a HashMap, use .associateTo(HashMap()) { ... }.
Some details:
.associate { ... } receives a function that produces pairs which are then stored into a map as keys and values respectively.
val (left, right) = it.split("=") is the usage of destructuring declarations on the list returned from it.split("="), it takes the first two items from the list.
left to right.toInt() creates a Pair<String, Int> defining a single mapping.
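For comparison, a Java sketch of the same parsing with streams. Collectors.toMap accepts a merge function and a map supplier, so HashMap::new requests a HashMap explicitly, much like associateTo(HashMap()). The merge function keeps the last value for a duplicate key, which matches how Kotlin's associate handles duplicates. ParsePairs and parse are hypothetical names, and, like the Kotlin version, this assumes well-formed key=value entries.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical Java counterpart of the Kotlin associate/associateTo approach.
class ParsePairs {

    static HashMap<String, Integer> parse(String s) {
        return Arrays.stream(s.split(","))
            .map(entry -> entry.split("=", 2))      // "abc=123" -> ["abc", "123"]
            .collect(Collectors.toMap(
                kv -> kv[0],                        // key
                kv -> Integer.parseInt(kv[1]),      // value
                (first, second) -> second,          // duplicate key: keep the last value
                HashMap::new));                     // build a HashMap specifically
    }

    public static void main(String[] args) {
        Map<String, Integer> map = parse("abc=123,def=456,ghi=789");
        System.out.println(map.get("def")); // 456
    }
}
```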
You can map each key/value to a Pair with the infix function to. An iterable of Pairs can then be converted to a Map with the toMap() extension function.
val s = "abc=123,def=456,ghi=789"
val output = s.split(",")
.map { it.split("=") }
.map { it.first() to it.last().toInt() }
.toMap()
Basically, I have to add CrabCritters to the GridWorld at random locations, which I've done. Next, I need to use the getOccupiedLocations method to print the occupied locations as ordered pairs. Any advice? Here's what I have so far:
package projects.critters;
import info.gridworld.actor.ActorWorld;
import info.gridworld.grid.Location;
public class Lab
{
public static void main (String[] args)
{
ActorWorld world = new ActorWorld();
for (int i =0; i<10; i++)
{
world.add (new CrabCritter());
}
world.show();
}
}
OK, so the Grid method getOccupiedLocations() returns an ArrayList of Locations. I don't know exactly what you mean by ordered pairs, but if you mean (row, col), that is easy: printing a Location already gives you an ordered pair. So all you have to do is loop through the occupied locations and print each one. For example:
for(Location l : world.getGrid().getOccupiedLocations()){
System.out.println(l);
}
And that's it. As for the comment: you don't need a language tag, since a GridWorld question is inherently about Java.
To expand on John Smith's answer: if you only want to print the locations occupied by CrabCritters, you can do this:
for (Location l : world.getGrid().getOccupiedLocations()) {
    // ActorWorld has no get(Location); ask the grid for the actor instead
    if (world.getGrid().get(l) instanceof CrabCritter) {
        System.out.println(l);
    }
}