Reading CSV by using Java 8 - stream

A String s represents a table in CSV (comma-separated) format: rows are separated by newline characters and each row consists of one or more fields separated by commas. How can I read this with Java 8 streams, and also get the maximum number in the third column?
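For reference, here is a minimal sketch of one way to do this with Java 8 streams; the sample data, class name, and the assumption that the third column holds integers are illustrative, not from the question:

import java.util.Arrays;

public class CsvMax {
    public static void main(String[] args) {
        // Hypothetical sample data standing in for the String s from the question
        String s = "a,b,10\nc,d,25\ne,f,7";

        // Split into rows, split each row into fields, and take the max of the third column
        int max = Arrays.stream(s.split("\n"))
                .map(row -> row.split(","))
                .mapToInt(fields -> Integer.parseInt(fields[2].trim()))
                .max()
                .getAsInt(); // throws NoSuchElementException if s is empty

        System.out.println(max); // prints 25
    }
}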

Related

How can I parse a BLOB into a table with NVARCHAR2 columns

I'm using APEX 21.2, and I want to parse a BLOB, which is the content of a .csv file (with separator ';') containing Japanese characters, into an import table with NVARCHAR2 columns, so that the Japanese characters are preserved.
Here is the content of my CSV file:
列 01;;;;列 05;列 06;;;;;;;;;;ぜんすう;
Column 01;Column 02;Column 03;Column 04;Column 05;Column 06;TOTAL;;;;;;;;;;
;123 A;T12345678;AZERT;QWERTY;;7000;;;;;;;;;;
NB: I already tried the APEX_DATA_PARSER package, but it does not support strings of type NVARCHAR2.
Thank you for your help.

Split a comma-separated list of number-like strings as text

I have a column which contains comma-separated numbers:
A1: 004,005,0005,00005
I want to split this and do more with it. When I split it, I end up with the following, losing the leading zeros because the values are parsed as numbers rather than text:
=split(A1, "," )
4 | 5 | 5 | 5
instead of
004 | 005 | 0005 | 00005
The number of zeros is important. I will pass the result on to get a vertical list:
=unique(transpose(arrayformula(trim(split(join(",",!A1:A),",")))))
Adding a single quote before a number keeps the format as is, thus preserving the leading and trailing zeroes. Please see the formula below:
=SPLIT(SUBSTITUTE(","&A1,",","#'"),"#")
Try this:
=ArrayFormula(REGEXREPLACE(SPLIT(REGEXREPLACE(A1,"(\d+)","~$1"),","),"~",""))
What this formula does is first replace every group of numbers with a tilde (~) and that same group of numbers. When SPLIT then acts on this new configuration, splitting at the commas, every group of numbers has the tilde in front of it and so retains all digits (because it is seen as a string and not a number). Finally, the outer REGEXREPLACE just gets rid of the tildes.
With just a single value in A1 you can use a little trick I described over at "Web-Applications" in this cross-website post:
=SPLIT(SUBSTITUTE("'"&A1,"#","#'"),"#")
The single quote will force Google Sheets to format the returned elements as text. We can use this principle inside an array formula if you have to concatenate multiple strings. You don't need ARRAYFORMULA() per se, but instead of JOIN() you need TEXTJOIN(), so you can use its 2nd parameter to exclude empty cells from being joined.
Formula in B1:
=UNIQUE(TRANSPOSE(SPLIT(SUBSTITUTE("'"&TEXTJOIN(",",TRUE,A1:A),",","#'"),"#")))

Joining two Pandas DataFrames does not work anymore?

I have two Pandas DataFrames.
The first one looks like this:
date rank id points
2010-01-04 1 100001 10550
2010-01-04 2 100002 9205
The second one like this:
id name
100001 A
100002 B
I want to join both dataframes via the id column. So the result should look like:
date rank id points name
2010-01-04 1 100001 10550 A
2010-01-04 2 100002 9205 B
Some weeks ago I wrote code for that, but for some reason it does not work anymore. I end up with an empty DataFrame after I execute this code for the join:
join = pd.merge(df1,df2, on='id')
Why is join empty?
Short story: as pointed out in the comment already, I was comparing strings with integers.
Long story: I didn't expect Python to parse the id columns of the two input CSV files to different datatypes. df1.id was of type object, while df2.id was of type int. I needed to find out why df1.id was parsed as object and not automatically as int, since it only contained numbers.
It turns out it had something to do with the encoding of my CSV file. In Notepad++ the file was encoded as plain UTF-8. It seems pandas did not like this, because when I tried to convert the id column to int, it raised an error like ValueError: invalid literal for int() with base 10: '\ufeff100001'. The number 100001 is the first ID in the first row, so there is an encoded character \ufeff (a byte order mark) at the very beginning of the file that prevented pandas from parsing the whole column as int. In Notepad++ I then changed the encoding of the file to UTF-8 without BOM, and then everything worked.

How to extract comma-separated values separately in Erlang

I have inserted comma-separated values into a Mnesia table column as {"11,1,15"}.
After retrieving the value from the table, I need to extract these comma-separated values. While extracting 11, 1 and 15 as separate values I run into a problem, because it returns the value in the format below.
49 | 1,1,12.
But I need them separately, as integers.
Can you point me in the correct direction? Where am I making a mistake?
If you want to convert a string of comma-separated integers into a list of integers, this can help:
1> String = "11,1,15".
"11,1,15"
2> [list_to_integer(I) || I <- string:tokens(String,",")].
[11,1,15]

Parsing currency into a number in Spoon CSV Text Input

This seems like it should be simple.
I have a CSV file with multiple currency values (so I'd like to avoid writing a bunch of string-manipulation steps), and I was excited to see that the CSV File Input step has fields like currency separator, decimal symbol, and grouping symbol (and mine are the defaults "$", ".", and ",", respectively).
The documentation describes these as for:
Currency: Used to interpret numbers like $10,000.00 or €5.000,00
Decimal: A decimal point can be a "." (10,000.00) or "," (5.000,00)
Grouping: A grouping symbol can be a "," (10,000.00) or "." (5.000,00)
(http://wiki.pentaho.com/display/EAI/Text+File+Input)
But as of the current production version (4.4)... these settings do not seem to have an effect.
Has anyone had success with number masks or similar, such that a string like "$10,000,238.48" can yield a number that can be pushed into a database? Anything I do results in either an "Unparsable" error in the text input or a "truncated field" error at the insert...
When I do a get fields on a text input step with your example number in it, it sets Currency, Decimal, and Group to '$', '.', ',' respectively, and it reads your number just fine. It also sets a Format string of '$#,##0.00;($#,##0.00)', which it seems is the key piece. The text file input step will examine as many rows as you specify from a CSV and guess the formats for each column.
Here is PDI's number formatting table:
Number Formatting Table
If you have different currency formats mixed in the same column, I would use a UDJE step and this answer:
Parsing a Currency string in Java
Or a JavaScript Step and this answer:
Convert Currency string with JavaScript
to strip out all non-digit and non-decimal-point characters, then pass it through a Select Values step. Note that this will be very tricky if you have mixed decimal separators in the input column.
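For reference, a minimal sketch in Java of the stripping approach those answers describe; the class and method names are illustrative, and it assumes "." is the decimal separator:

import java.math.BigDecimal;

public class CurrencyParse {
    // Strip everything except digits, the decimal point and a leading minus sign,
    // then parse what is left. Assumes "." is the decimal separator.
    static BigDecimal parseCurrency(String raw) {
        String cleaned = raw.replaceAll("[^0-9.\\-]", "");
        return new BigDecimal(cleaned);
    }

    public static void main(String[] args) {
        System.out.println(parseCurrency("$10,000,238.48")); // prints 10000238.48
    }
}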
