Parse string into map Golang

Parse string into map Golang - parsing

I have a string like A=B&C=D&E=F, how to parse it into map in golang?
Here is example on Java, but I don't understand this split part
String text = "A=B&C=D&E=F";
Map<String, String> map = new LinkedHashMap<String, String>();
for(String keyValue : text.split(" *& *")) {
String[] pairs = keyValue.split(" *= *", 2);
map.put(pairs[0], pairs.length == 1 ? "" : pairs[1]);
}

Maybe what you really want is to parse an HTTP query string, and url.ParseQuery does that. (What it returns is, more precisely, a url.Values storing a []string for every key, since URLs sometimes have more than one value per key.) It does things like parse HTML escapes (%0A, etc.) that just splitting doesn't. You can find its implementation if you search in the source of url.go.
However, if you do really want to just split on & and = like that Java code did, there are Go analogues for all of the concepts and tools there:
map[string]string is Go's analog of Map<String, String>
strings.Split can split on & for you. SplitN limits the number of pieces split into like the two-argument version of split() in Java does. Note that there might only be one piece so you should check len(pieces) before trying to access pieces[1] say.
for _, piece := range pieces will iterate the pieces you split.
The Java code seems to rely on regexes to trim spaces. Go's Split doesn't use them, but strings.TrimSpace does something like what you want (specifically, strips all sorts of Unicode whitespace from both sides).
I'm leaving the actual implementation to you, but perhaps these pointers can get you started.

import ( "strings" )
var m map[string]string
var ss []string
s := "A=B&C=D&E=F"
ss = strings.Split(s, "&")
m = make(map[string]string)
for _, pair := range ss {
z := strings.Split(pair, "=")
m[z[0]] = z[1]
}
This will do what you want.

There is a very simple way provided by golang net/url package itself.
Change your string to make it a url with query params text := "method://abc.xyz/A=B&C=D&E=F";
Now just pass this string to Parse function provided by net/url.
import (
netURL "net/url"
)
u, err := netURL.Parse(textURL)
if err != nil {
log.Fatal(err)
}
Now u.Query() will return you a map containing your query params. This will also work for complex types.

Here is a demonstration of a couple of methods:
package main
import (
"fmt"
"net/url"
)
func main() {
{
q, e := url.ParseQuery("west=left&east=right")
if e != nil {
panic(e)
}
fmt.Println(q) // map[east:[right] west:[left]]
}
{
u := url.URL{RawQuery: "west=left&east=right"}
q := u.Query()
fmt.Println(q) // map[east:[right] west:[left]]
}
}
https://golang.org/pkg/net/url#ParseQuery
https://golang.org/pkg/net/url#URL.Query

Related

Destructured iteration over variadic arguments like a tuple sequence in D

Let's say I want to process a variadic function which alternately gets passed start and end values of 1 or more intervals and it should return a range of random values in those intervals. You can imagine the input to be a flattened sequence of tuples, all tuple elements spread over one single range.
import std.meta; //variadic template predicates
import std.traits : isFloatingPoint;
import std.range;
auto randomIntervals(T = U[0], U...)(U intervals)
if (U.length/2 > 0 && isFloatingPoint!T && NoDuplicates!U.length == 1) {
import std.random : uniform01;
T[U.length/2] randomValues;
// split and iterate over subranges of size 2
foreach(i, T start, T end; intervals.chunks(2)) { //= intervals.slide(2,2)
randomValues[i] = uniform01 * (end - start) + start,
}
return randomValues.dup;
}
The example is not important, I only use it for explanation. The chunk size could be any finite positive size_t, not only 2 and changing the chunk size should only require changing the number of loop-variables in the foreach loop.
In this form above it will not compile since it would only expect one argument (a range) to the foreach loop. What I would like is something which rather automatically uses or infers a sliding-window as a tuple, derived from the number of given loop-variables, and fills the additional variables with next elements of the range/array + allows for an additional index, optionally. According to the documentation a range of tuples allows destructuring of the tuple elements in place into foreach-loop-variables so the first thing, I thought about, is turning a range into a sequence of tuples but didn't find a convenience function for this.
Is there a simple way to loop over destructured subranges (with such a simplicity as shown in my example code) together with the index? Or is there a (standard library) function which does this job of splitting a range into enumerated tuples of equal size? How to easily turn the range of subranges into a range of tuples?
Is it possible with std.algorithm.iteration.map in this case (EDIT: with a simple function argument to map and without accessing tuple elements)?
EDIT: I want to ignore the last chunk which doesn't fit into the entire tuple. It just is not iterated over.
EDIT: It's not, that I couldn't program this myself, I only hope for a simple notation because this use case of looping over multiple elements is quite useful. If there is something like a "spread" or "rest" operator in D like in JavaScript, please let me know!
Thank you.

(Added as a separate answer because it's significantly different from my previous answer, and wouldn't fit in a comment)
After reading your comments and the discussion on the answers thus far, it seems to me what you seek is something like the below staticChunks function:
unittest {
import std.range : enumerate;
size_t index = 0;
foreach (i, a, b, c; [1,2,3,1,2,3].staticChunks!3.enumerate) {
assert(a == 1);
assert(b == 2);
assert(c == 3);
assert(i == index);
++index;
}
}
import std.range : isInputRange;
auto staticChunks(size_t n, R)(R r) if (isInputRange!R) {
import std.range : chunks;
import std.algorithm : map, filter;
return r.chunks(n).filter!(a => a.length == n).map!(a => a.tuplify!n);
}
auto tuplify(size_t n, R)(R r) if (isInputRange!R) {
import std.meta : Repeat;
import std.range : ElementType;
import std.typecons : Tuple;
import std.array : front, popFront, empty;
Tuple!(Repeat!(n, ElementType!R)) result;
static foreach (i; 0..n) {
result[i] = r.front;
r.popFront();
}
assert(r.empty);
return result;
}
Note that this also deals with the last chunk being a different size, if only by silently throwing it away. If this behavior is undesirable, remove the filter, and deal with it inside tuplify (or don't, and watch the exceptions roll in).

chunks and slide return Ranges, not tuples. Their last element can contain less than the specified size, whereas tuples have a fixed compile time size.
If you need destructuring, you have to implement your own chunks/slide that return tuples. To explicitly add an index to the tuple, use enumerate. Here is an example:
import std.typecons, std.stdio, std.range;
Tuple!(int, int)[] pairs(){
return [
tuple(1, 3),
tuple(2, 4),
tuple(3, 5)
];
}
void main(){
foreach(size_t i, int start, int end; pairs.enumerate){
writeln(i, ' ', start, ' ', end);
}
}
Edit:
As BioTronic said using map is also possible:
foreach(i, start, end; intervals
.chunks(2)
.map!(a => tuple(a[0], a[1]))
.enumerate){

Your question has me a little confused, so I'm sorry if I've misunderstood. What you're basically asking is if foreach(a, b; [1,2,3,4].chunks(2)) could work, right?
The simple solution here is to, as you say, map from chunk to tuple:
import std.typecons : tuple;
import std.algorithm : map;
import std.range : chunks;
import std.stdio : writeln;
unittest {
pragma(msg, typeof([1,2].chunks(2).front));
foreach(a, b; [1,2,3,4].chunks(2).map!(a => tuple(a[0], a[1]))) {
writeln(a, ", ", b);
}
}

At the same time with BioTronic, I tried to code some own solution to this problem (tested on DMD). My solution works for slices (BUT NOT fixed-size arrays) and avoids a call to filter:
import std.range : chunks, isInputRange, enumerate;
import std.range : isRandomAccessRange; //changed from "hasSlicing" to "isRandomAccessRange" thanks to BioTronics
import std.traits : isIterable;
/** turns chunks into tuples */
template byTuples(size_t N, M)
if (isRandomAccessRange!M) { //EDITED
import std.meta : Repeat;
import std.typecons : Tuple;
import std.traits : ForeachType;
alias VariableGroup = Tuple!(Repeat!(N, ForeachType!M)); //Tuple of N repititions of M's Foreach-iterated Type
/** turns N consecutive array elements into a Variable Group */
auto toTuple (Chunk)(Chunk subArray) #nogc #safe pure nothrow
if (isInputRange!Chunk) { //Chunk must be indexable
VariableGroup nextLoopVariables; //fill the tuple with static foreach loop
static foreach(index; 0 .. N) {
static if ( isRandomAccessRange!Chunk ) { // add cases for other ranges here
nextLoopVariables[index] = subArray[index];
} else {
nextLoopVariables[index] = subArray.popFront();
}
}
return nextLoopVariables;
}
/** returns a range of VariableGroups */
auto byTuples(M array) #safe pure nothrow {
import std.algorithm.iteration : map;
static if(!isInputRange!M) {
static assert(0, "Cannot call map() on fixed-size array.");
// auto varGroups = array[].chunks(N); //fixed-size arrays aren't slices by default and cannot be treated like ranges
//WARNING! invoking "map" on a chunk range from fixed-size array will fail and access wrong memory with no warning or exception despite #safe!
} else {
auto varGroups = array.chunks(N);
}
//remove last group if incomplete
if (varGroups.back.length < N) varGroups.popBack();
//NOTE! I don't know why but `map!toTuple` DOES NOT COMPILE! And will cause a template compilation mess.
return varGroups.map!(chunk => toTuple(chunk)); //don't know if it uses GC
}
}
void main() {
testArrayToTuples([1, 3, 2, 4, 5, 7, 9]);
}
// Order of template parameters is relevant.
// You must define parameters implicitly at first to be associated with a template specialization
void testArrayToTuples(U : V[], V)(U arr) {
double[] randomNumbers = new double[arr.length / 2];
// generate random numbers
foreach(i, double x, double y; byTuples!2(arr).enumerate ) { //cannot use UFCS with "byTuples"
import std.random : uniform01;
randomNumbers[i] = (uniform01 * (y - x) + x);
}
foreach(n; randomNumbers) { //'n' apparently works despite shadowing a template parameter
import std.stdio : writeln;
writeln(n);
}
}
Using elementwise operations with the slice operator would not work here because uniform01 in uniform01 * (ends[] - starts[]) + starts[] would only be called once and not multiple times.
EDIT: I also tested some online compilers for D for this code and it's weird that they behave differently for the same code. For compilation of D I can recommend
https://run.dlang.io/ (I would be very surprised if this one wouldn't work)
https://www.mycompiler.io/new/d (but a bit slow)
https://ideone.com (it works but it makes your code public! Don't use with protected code.)
but those didn't work for me:
https://tio.run/#d2 (didn't finish compilation in one case, otherwise wrong results on execution even when using dynamic array for the test)
https://www.tutorialspoint.com/compile_d_online.php (doesn't compile the static foreach)

Easy way to split string into map in go

I have such string:
"k1=v1; k2=v2; k3=v3"
Is there any simple way to make a map[string]string from it?

You will need to use a couple of calls to strings.Split():
s := "k1=v1; k2=v2; k3=v3"
entries := strings.Split(s, "; ")
m := make(map[string]string)
for _, e := range entries {
parts := strings.Split(e, "=")
m[parts[0]] = parts[1]
}
fmt.Println(m)
The first call will separate the different entries in the supplied string while the second will split the key/values apart. A working example can be found here.

Why does this Rascal pattern matching code use so much memory and time?

I'm trying to write what I would think of as an extremely simple piece of code in Rascal: Testing if list A contains list B.
Starting out with some very basic code to create a list of strings
public list[str] makeStringList(int Start, int End)
{
return [ "some string with number <i>" | i <- [Start..End]];
}
public list[str] toTest = makeStringList(0, 200000);
My first try was 'inspired' by the sorting example in the tutor:
public void findClone(list[str] In, str S1, str S2, str S3, str S4, str S5, str S6)
{
switch(In)
{
case [*str head, str i1, str i2, str i3, str i4, str i5, str i6, *str tail]:
{
if(S1 == i1 && S2 == i2 && S3 == i3 && S4 == i4 && S5 == i5 && S6 == i6)
{
println("found duplicate\n\t<i1>\n\t<i2>\n\t<i3>\n\t<i4>\n\t<i5>\n\t<i6>");
}
fail;
}
default:
return;
}
}
Not very pretty, but I expected it to work. Unfortunately, the code runs for about 30 seconds before crashing with an "out of memory" error.
I then tried a better looking alternative:
public void findClone2(list[str] In, list[str] whatWeSearchFor)
{
for ([*str head, *str mid, *str end] := In)
if (mid == whatWeSearchFor)
println("gotcha");
}
with approximately the same result (seems to run a little longer before running out of memory)
Finally, I tried a 'good old' C-style approach with a for-loop
public void findClone3(list[str] In, list[str] whatWeSearchFor)
{
cloneLength = size(whatWeSearchFor);
inputLength = size(In);
if(inputLength < cloneLength) return [];
loopLength = inputLength - cloneLength + 1;
for(int i <- [0..loopLength])
{
isAClone = true;
for(int j <- [0..cloneLength])
{
if(In[i+j] != whatWeSearchFor[j])
isAClone = false;
}
if(isAClone) println("Found clone <whatWeSearchFor> on lines <i> through <i+cloneLength-1>");
}
}
To my surprise, this one works like a charm. No out of memory, and results in seconds.
I get that my first two attempts probably create a lot of temporary string objects that all have to be garbage collected, but I can't believe that the only solution that worked really is the best solution.
Any pointers would be greatly appreciated.
My relevant eclipse.ini settings are
-XX:MaxPermSize=512m
-Xms512m
-Xss64m
-Xmx1G

We'll need to look to see why this is happening. Note that, if you want to use pattern matching, this is maybe a better way to write it:
public void findClone(list[str] In, str S1, str S2, str S3, str S4, str S5, str S6) {
switch(In) {
case [*str head, S1, S2, S3, S4, S5, S6, *str tail]: {
println("found duplicate\n\t<S1>\n\t<S2>\n\t<S3>\n\t<S4>\n\t<S5>\n\t<S6>");
}
default:
return;
}
}
If you do this, you are taking advantage of Rascal's matcher to actually find the matching strings directly, versus your first example in which any string would match but then you needed to use a number of separate comparisons to see if the match represented the combination you were looking for. If I run this on 110145 through 110150 it takes a while but works and it doesn't seem to grow beyond the heap space you allocated to it.
Also, is there a reason you are using fail? Is this to continue searching?

It's an algorithmic issue like Mark Hills said. In Rascal some short code can still entail a lot of nested loops, almost implicitly. Basically every * splice operator on a fresh variable that you use on the pattern side in a list generates one level of loop nesting, except for the last one which is just the rest of the list.
In your code of findClone2 you are first generating all combinations of sublists and then filtering them using the if construct. So that's a correct algorithm, but probably slow. This is your code:
void findClone2(list[str] In, list[str] whatWeSearchFor)
{
for ([*str head, *str mid, *str end] := In)
if (mid == whatWeSearchFor)
println("gotcha");
}
You see how it has a nested loop over In, because it has two effective * operators in the pattern. The code runs therefore in O(n^2), where n is the length of In. I.e. it has quadratic runtime behaviour for the size of the In list. In is a big list so this matters.
In the following new code, we filter first while generating answers, using fewer lines of code:
public void findCloneLinear(list[str] In, list[str] whatWeSearchFor)
{
for ([*str head, *whatWeSearchFor, *str end] := In)
println("gotcha");
}
The second * operator does not generate a new loop because it is not fresh. It just "pastes" the given list values into the pattern. So now there is actually only one effective * which generates a loop which is the first on head. This one makes the algorithm loop over the list. The second * tests if the elements of whatWeSearchFor are all right there in the list after head (this is linear in the size of whatWeSearchFor and then the last *_ just completes the list allowing for more stuff to follow.
It's also nice to know where the clone is sometimes:
public void findCloneLinear(list[str] In, list[str] whatWeSearchFor)
{
for ([*head, *whatWeSearchFor, *_] := In)
println("gotcha at <size(head)>");
}
Rascal does not have an optimising compiler (yet) which might possibly internally transform your algorithms to equivalent optimised ones. So as a Rascal programmer you are still asked to know the effect of loops on your algorithms complexity and know that * is a very short notation for a loop.

How to parse result from ovs-vsctl get interface <interface> statistics

Result example:
{collisions=0, rx_bytes=258, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=3, tx_bytes=648, tx_dropped=0, tx_errors=0, tx_packets=8}
This format is like JSON, but not JSON.
Is there an easy way to parse this into map[string]int? Like json.Unmarshal(data, &value).

If that transport format is not recursively defined, i.e. a key cannot start a sub-structure, then its language is regular. As such, you can soundly parse it with Go's standard regexp package:
Playground link.
package main
import (
"fmt"
"regexp"
"strconv"
)
const data = `{collisions=0, rx_bytes=258, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_over_err=0, rx_packets=3, tx_bytes=648, tx_dropped=0, tx_errors=0, tx_packets=8}`
const regex = `([a-z_]+)=([0-9]+)`
func main() {
ms := regexp.MustCompile(regex).FindAllStringSubmatch(data, -1)
vs := make(map[string]int)
for _, m := range ms {
v, _ := strconv.Atoi(m[2])
vs[m[1]] = v
}
fmt.Printf("%#v\n", vs)
}

Regexp by #thwd is an elegant solution.
You can get a more efficient solution by using strings.Split() to split by comma-space (", ") to get the pairs, and then split again by the equal sign ("=") to get the key-value pairs. After that you just need to put these into a map:
func Parse(s string) (m map[string]int, err error) {
if len(s) < 2 || s[0] != '{' || s[len(s)-1] != '}' {
return nil, fmt.Errorf("Invalid input, no wrapping brackets!")
}
m = make(map[string]int)
for _, v := range strings.Split(s[1:len(s)-1], ", ") {
parts := strings.Split(v, "=")
if len(parts) != 2 {
return nil, fmt.Errorf("Equal sign not found in: %s", v)
}
if m[parts[0]], err = strconv.Atoi(parts[1]); err != nil {
return nil, err
}
}
return
}
Using it:
s := "{collisions=0, rx_bytes=258, ...}"
fmt.Println(Parse(s))
Try it on the Go Playground.
Note: If performance is important, this can be improved by not using strings.Split() in the outer loop, but instead searching for the comma "manually" and maintaining indices, and only "take out" substrings that represent actual keys and values (but this solution would be more complex).
Another solution...
...but this option is much slower so it is only viable if performance is not a key requirement: you can turn your input string into a valid JSON format and after that you can use json.Unmarshal(). Error checks omitted:
s := "{collisions=0, rx_bytes=258, ...}"
// Turn into valid JSON:
s = strings.Replace(s, `=`, `":`, -1)
s = strings.Replace(s, `, `, `, "`, -1)
s = strings.Replace(s, `{`, `{"`, -1)
// And now simply unmarshal:
m := make(map[string]int)
json.Unmarshal([]byte(s), &m)
fmt.Println(m)
Advantage of this solution is that this also works if the destination value you unmarhsal into is a struct:
// Unmarshal into a struct (you don't have to care about all fields)
st := struct {
Collisions int `json:"collisions"`
Rx_bytes int `json:"rx_bytes"`
}{}
json.Unmarshal([]byte(s), &st)
fmt.Printf("%+v\n", st)
Try these on the Go Playground.

What grammar is this?

I have to parse a document containing groups of variable-value-pairs which is serialized to a string e.g. like this:
4^26^VAR1^6^VALUE1^VAR2^4^VAL2^^1^14^VAR1^6^VALUE1^^
Here are the different elements:
Group IDs:
4^26^VAR1^6^VALUE1^VAR2^4^VAL2^^1^14^VAR1^6^VALUE1^^
Length of string representation of each group:
4^26^VAR1^6^VALUE1^VAR2^4^VAL2^^1^14^VAR1^6^VALUE1^^
One of the groups:
4^26^VAR1^6^VALUE1^VAR2^4^VAL2^^1^14 ^VAR1^6^VALUE1^^
Variables:
4^26^VAR1^6^VALUE1^VAR2^4^VAL2^^1^14^VAR1^6^VALUE1^^
Length of string representation of the values:
4^26^VAR1^6^VALUE1^VAR2^4^VAL2^^1^14^VAR1^6^VALUE1^^
The values themselves:
4^26^VAR1^6^VALUE1^VAR2^4^VAL2^^1^14^VAR1^6^VALUE1^^
Variables consist only of alphanumeric characters.
No assumption is made about the values, i.e. they may contain any character, including ^.
Is there a name for this kind of grammar? Is there a parsing library that can handle this mess?
So far I am using my own parser, but due to the fact that I need to detect and handle corrupt serializations the code looks rather messy, thus my question for a parser library that could lift the burden.

The simplest way to approach it is to note that there are two nested levels that work the same way. The pattern is extremely simple:
id^length^content^
At the outer level, this produces a set of groups. Within each group, the content follows exactly the same pattern, only here the id is the variable name, and the content is the variable value.
So you only need to write that logic once and you can use it to parse both levels. Just write a function that breaks a string up into a list of id/content pairs. Call it once to get the groups, and then loop through them calling it again for each content to get the variables in that group.
Breaking it down into these steps, first we need a way to get "tokens" from the string. This function returns an object with three methods, to find out if we're at "end of file", and to grab the next delimited or counted substring:
var tokens = function(str) {
var pos = 0;
return {
eof: function() {
return pos == str.length;
},
delimited: function(d) {
var end = str.indexOf(d, pos);
if (end == -1) {
throw new Error('Expected delimiter');
}
var result = str.substr(pos, end - pos);
pos = end + d.length;
return result;
},
counted: function(c) {
var result = str.substr(pos, c);
pos += c;
return result;
}
};
};
Now we can conveniently write the reusable parse function:
var parse = function(str) {
var parts = {};
var t = tokens(str);
while (!t.eof()) {
var id = t.delimited('^');
var len = t.delimited('^');
var content = t.counted(parseInt(len, 10));
var end = t.counted(1);
if (end !== '^') {
throw new Error('Expected ^ after counted string, instead found: ' + end);
}
parts[id] = content;
}
return parts;
};
It builds an object where the keys are the IDs (or variable names). I'm asuming as they have names that the order isn't significant.
Then we can use that at both levels to create the function to do the whole job:
var parseGroups = function(str) {
var groups = parse(str);
Object.keys(groups).forEach(function(id) {
groups[id] = parse(groups[id]);
});
return groups;
}
For your example, it produces this object:
{
'1': {
VAR1: 'VALUE1'
},
'4': {
VAR1: 'VALUE1',
VAR2: 'VAL2'
}
}

I don't think it's a trivial task to create a grammar for this. But on the other hand, a simple straight forward approach is not that hard. You know the corresponding string length for every critical string. So you just chop your string according to those lengths apart..
where do you see problems?

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

Parse string into map Golang - parsing

import ( "strings" ) var m map[string]string var ss []string s := "A=B&C=D&E=F" ss = strings.Split(s, "&") m = make(map[string]string) for _, pair := range ss { z := strings.Split(pair, "=") m[z[0]] = z[1] } This will do what you want.

Related

Destructured iteration over variadic arguments like a tuple sequence in D

Easy way to split string into map in go

Why does this Rascal pattern matching code use so much memory and time?

How to parse result from ovs-vsctl get interface <interface> statistics

What grammar is this?

Categories

Resources