Regular expression for validating file input string - ruby-on-rails

I have a text file with the input below, and I need to validate in Ruby whether it is in the correct format.
I need to read each line of the text file and check that it matches the format Integer,s1-integer,s2-integer,s3-integer,s4-integer; otherwise I need to raise an error saying that the file input format does not match.
The input is not limited to 5 lines; it can be any number of lines.
Integer,s1-integer,s2-integer,s3-integer,s4-integer
Example inputs:
1,S1-88,S2-53,S3-69,S4-64
2,S1-92,S2-86,S3-93,S4-77
3,S1-53,S2-59,S3-72,S4-59
4,S1-60,S2-52,S3-85,S4-62
5,S1-85,S2-53,S3-74,S4-61

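If I understand you right, you need to validate the following input:
a number followed by four elements of the form ,S1-85
The following pattern matches input of that type:
\A\d+(,S\d-\d+){4}\z
\A and \z anchor the match to the start and end of the string, so nothing else may appear on the line (strip the trailing newline first, e.g. with chomp)
\d+ matches an integer of one or more digits (a bare \d matches only a single digit)
(,S\d-\d+) matches a group of the form ,S1-85
{4} requires that group exactly 4 times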
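The line-by-line check can be sketched as follows. This is a minimal illustration in Python rather than Ruby; the function name `validate_lines` is hypothetical, and the pattern assumes that the leading number and the scores may have more than one digit. The same pattern text is also valid Ruby regex syntax.

```python
import re

# Assumed pattern: an integer, then four ",S<digit>-<integer>" groups.
LINE_PATTERN = re.compile(r"\d+(,S\d-\d+){4}")


def validate_lines(lines):
    """Raise ValueError naming the first line that does not match the format."""
    for number, line in enumerate(lines, start=1):
        # fullmatch requires the whole line to match, not just a prefix.
        if not LINE_PATTERN.fullmatch(line.rstrip("\n")):
            raise ValueError(
                f"file input format mismatch on line {number}: {line!r}"
            )
    return True
```

Any iterable of lines works, so an open file object can be passed directly. The error is raised on the first bad line, which matches the "raise an error on mismatch" requirement in the question.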


How to format decimal places in Aspose.Word Template?

I want all numerical data to be formatted with 2 decimal places.
Is there a way to set this in the template Word file (where I output the variable value via <<[variableName] >>), or even globally?
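To format a numeric expression result, you can specify a format string as an element of the corresponding expression tag:
<<[variableName]:"0.##">>
Note that "0.##" shows at most two decimal places and drops trailing zeros; to always show exactly two decimal places, use "0.00" instead.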
See the following article for more information:
https://docs.aspose.com/words/net/outputting-expression-results/

How to make a variable non delimited file to be a delimited one

Hello, I want to convert my non-delimited file into a delimited file.
An example of the file is as follows:
Name. CIF Address line 1 State Phn Address line 2 Country Billing Address line 3
Alex. 44A. Biston NJ 25478163 4th,floor XY USA 55/2018 kenning
And so on; all the data is in this format.
The first three lines are metadata, followed by the data.
How can I convert it to a properly delimited format programmatically?
There are two parts to the problem:
how to find the column widths
how to split each line into fields and output a new line with delimiters
I cannot propose an automated solution for the first one because, not knowing anything about the metadata format, there is no clear way to find where one column ends and the next one begins. Some of the column headings contain multiple space-separated words, and a space is also used as the separator between headings. Apparently one cannot use the rule "more than one space means the end of a heading name" either, because there is only one space between "Address line 2" and "Country", and they are clearly separate columns. Finding the correct column widths requires understanding English, which is not something you can easily write a program for.
For the second problem, things are much easier once you have the column positions. If you work out the column positions manually (or programmatically, if you know something about the metadata that I don't and have a simple method for finding what is a column heading), then a program written in AWK can do this, for example:
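cols="8,15,32,40,53,66,83,105"
awk_prog='BEGIN {
    nt = split(cols, tabs, ",")   # tabs[1..nt]: start position of each following column
    delim = ","
    ORS = ""                      # print adds no newline; it is emitted explicitly below
}
{
    o = 1
    # Iterate by index: "for (i in tabs)" does not guarantee order
    for (i = 1; i <= nt; i++) {
        t = tabs[i]
        f = substr($0, o, t - o)
        sub(/ *$/, "", f)         # strip trailing padding
        print f delim
        o = t
    }
    print substr($0, o) "\n"
}'
awk -v cols="$cols" "$awk_prog" input_file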
NOTE that the above program does not deal correctly with the case when the separator character (e.g. ",") appears inside the data. If you decide to use this as-is, be sure to use a separator that is not present in the input data. It may be better to modify the code to escape any separator characters found in the input data (there are different ways to do this - depends on what you plan to feed the output file to).
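One way to handle the separator-in-data case is to let a CSV writer do the quoting. The following is a minimal Python sketch under the same assumption as the AWK program, namely that the column start positions are already known; the names `BOUNDARIES`, `split_fixed_width` and `to_csv` are hypothetical, and the positions simply mirror the `cols` list above.

```python
import csv
import io

# Hypothetical column boundaries (1-based start of each following column),
# mirroring the cols list used by the AWK program.
BOUNDARIES = [8, 15, 32, 40, 53, 66, 83, 105]


def split_fixed_width(line, boundaries=BOUNDARIES):
    """Cut line at the given 1-based positions and strip trailing padding."""
    fields = []
    start = 1
    for end in boundaries:
        fields.append(line[start - 1:end - 1].rstrip(" "))
        start = end
    fields.append(line[start - 1:].rstrip(" "))
    return fields


def to_csv(lines, boundaries=BOUNDARIES):
    """Return the lines as CSV text; csv.writer quotes fields containing commas."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        writer.writerow(split_fixed_width(line.rstrip("\n"), boundaries))
    return out.getvalue()
```

A field such as 4th,floor is then written as "4th,floor", which most CSV consumers read back as a single value. Whether that quoting style suits the program that will read the output is something to check first.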

Number notation "SK"

I use an ODBC table handler to read data from Excel and CSV files into an AMPL model. But the thing I encountered probably doesn't have much to do with the precise programs and programming language I use.
Among the data are two specific types of strings: three-character alphabetic and six-character alphanumeric.
When the three-character alphabetic type includes the string NAN, AMPL throws an error. As I found out, the reason is that it interprets NAN as "NaN" (not a number), which it cannot use as an index.
The six-character alphanumeric type sometimes includes strings like 3E1234. This is a problem because AMPL (or the handler) interprets this as a number in scientific notation. So it reads 3*10^1234, which is handled as infinity. So when there is one 3E1234 entry and one 3E1235 entry, it sees them both as infinity.
I understand these two. And although they are annoying, I can work with that. Now I encountered that a string SK1234 is parsed as the number 1234. I have learned a bit of programming in college, but I don't have any idea why this happens. Is the prefix SK anything special?
EDIT: Here is an example that reproduces the error:
The model file:
set INDEX;
param value;
The "run" file:
table Table1 IN "tableproxy" "odbc" "DSN=NDE" "Test.csv": INDEX <- [Index], value ~ Value;
read table Table1;
NDE is a user DSN that uses the Microsoft Text Driver in the appropriate folder.
And the CSV file:
Index,Value
SK1202,1
SK1445,2
SK0124,3
SK7896,4
SK1,5
AB1234,6
After running all this code, I type display INDEX and get
set INDEX := 1202 1445 124 7896 1 Missing;
So the field Index is treated as a numeric field with the first five entries converted to a number. The last entry cannot be converted so it is treated as Missing.
The DSN's setting is that it sets the type according to the first 25 lines. For some reason, it understands the SK... entries as numbers and therefore reads all as numbers.
For the Text ODBC driver to detect column type correctly, values should be quoted:
Index,Value
'SK1202',1
'SK1445',2
'SK0124',3
'SK7896',4
'SK1',5
'AB1234',6
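If the CSV file is generated or can be preprocessed, the quoting can be added with a small script. This is only a sketch in Python: the function name `quote_first_column` is hypothetical, it assumes the first row is a header and that the index values contain no commas or quote characters, and whether the driver then detects the column as text is as stated in the answer above, not something the script verifies.

```python
import csv
import io


def quote_first_column(csv_text, quote="'"):
    """Wrap each data value in the first column in quotes; keep the header row."""
    reader = csv.reader(io.StringIO(csv_text))
    out_lines = []
    for row_number, row in enumerate(reader):
        # Row 0 is assumed to be the header and is left unchanged.
        if row_number > 0 and row:
            row = [f"{quote}{row[0]}{quote}"] + row[1:]
        out_lines.append(",".join(row))
    return "\n".join(out_lines) + "\n"
```

For the example file, the header line stays as Index,Value and a data line such as SK1202,1 becomes 'SK1202',1.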

How to write a matches excluding a series of space character (' ') from the input?

I am having a problem with my Grails project. I want to write a matches constraint that best fits the allowable characters for my input fields. I have written one that produces an error message if the input contains a single space character, but it no longer works if the input consists of a series of spaces. This is my code:
newPassword nullable: false, minSize: 8, matches: /[0-9a-zA-Z_\[\]\\\^\$\.\|\?\*\+\(\)~!@#%&-=]*/, blank: false, notEqualToAnyProperty:['username', 'emailAddress'],validator: { value, obj ->
(obj.currentPassword != value && value != '')
}
These are the sample inputs:
1) 'rain drops' - my matches constraint works; it returns an error message that the input contains an invalid character.
2) ' ' - a series of spaces; my program returns the error message for the blank constraint instead of the one for my matches constraint ("input contains an invalid character"), even though the input doesn't match the allowable characters.
Any help from you guys? Thanks!
You shouldn't need to add any begin (^) or end ($) tags to your regular expression, as the matches constraint attempts to match the entire String input against the Pattern, thus your first test correctly fails against the constraint.
For your second test where the input is only a series of spaces ' ', your matches constraint will never run. Both blank and nullable are constraints which can block the running of other constraints if they fail. The matches constraint will not run in your case because the blank constraint returns a failure on an all-whitespace input.
Try matches: /[0-9a-zA-Z_\[\]\\\^\$\.\|\?\*\+\(\)~!@#%&-=]+$/
I just added a dollar at the end and changed the star to a plus. Dollar means end of line. Maybe the match is returning true because the first part of the string does indeed match.
The reason I changed the star to a plus is because * matches zero or more. That's the case in your empty string. The + requires one or more.
You can require a min sequence of 8 such chars in the regexp but that might make you lose your minsize validation error message.
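The difference between * and + under whole-string matching can be checked with a small experiment. This is a Python sketch, not Grails code: `re.fullmatch` stands in for the whole-input matching described above, and `ALLOWED` is a deliberately simplified, hypothetical character class rather than the exact one from the question.

```python
import re

# Simplified stand-in for the allowed-character class in the question.
ALLOWED = r"[0-9a-zA-Z_~!@#%&=-]"

star = re.compile(ALLOWED + "*")   # zero or more allowed characters
plus = re.compile(ALLOWED + "+")   # one or more allowed characters


def whole_match(pattern, text):
    """True when the entire text matches, mirroring whole-input matching."""
    return pattern.fullmatch(text) is not None


# Maps each sample input to (result with *, result with +).
results = {
    text: (whole_match(star, text), whole_match(plus, text))
    for text in ["rainDrops1", "rain drops", "   ", ""]
}
```

With whole-string matching, both variants reject 'rain drops' and the all-spaces input, and they differ only on the empty string, which * accepts and + rejects. That is consistent with the explanation that the all-spaces case is caught by the blank constraint before matches ever runs.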

Conditional Regular Expression testing of a CSV

I am doing some client-side validation in ASP.NET MVC and I found myself trying to do conditional validation on a set of items (i.e., if the checkbox is checked then validate, and vice versa). This was problematic, to say the least.
To get around this, I figured that I could "cheat" by having a hidden element that would contain all of the information for each set, thus the idea of a CSV string containing this information.
I already use a custom [HiddenRequired] attribute to validate if the hidden input contains a value, with success, but I thought as I will need to validate each piece of data in the csv, that a regular expression would solve this.
My regular expression work is extremely weak and after a good 2 hours I've almost given up.
This is an example of the csv string:
true,3,24,over,0.5
to explain:
true denotes if I should validate the rest. I need to conditionally switch in the regex using this
3 and 24 are integers and will only ever fall in the range 0-24.
over is a string and will either be over or under
0.5 is a decimal value, of unknown precision.
In the validation, all values should be present and at least of the correct type
Is there someone who can either provide such a regex or at least provide some hints? I'm really stuck!
Try this regex:
@"^(true,([01]?\d|2[0-4]),([01]?\d|2[0-4]),(over|under),\d+(\.\d+)?|false.*)$"
I'll try to explain it using comments (to use this commented form directly, pass RegexOptions.IgnorePatternWhitespace). Feel free to ask if anything is unclear. =)
@"
^ # start of line
(
true, # literal true
([01]?\d # an optional 0 or 1 followed by a digit (0 - 19)
| # or
2[0-4]), # 20 - 24
([01]?\d|2[0-4]), # again
(over|under), # over or under
\d+(\.\d+)? # one or more digits, optionally followed by a dot and more digits
| #... OR ...
false.* # false followed by anything
)
$ # end of line
");
I would probably use a Split(',') and validate elements of the resulting array instead of using a regex. Also you should watch out for the \, case (the comma is part of the value).
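The split-based approach might look like the sketch below. It is written in Python for illustration, not C#; the function names are hypothetical, and the rules (integers in 0-24, over/under, a non-negative decimal, and no checks when the first field is false) are taken from the question. It does not handle the escaped-comma case mentioned above.

```python
def is_digits(text):
    """True for a non-empty string of ASCII digits."""
    return text.isascii() and text.isdigit()


def is_decimal(text):
    """True for digits optionally followed by a dot and more digits."""
    whole, dot, fraction = text.partition(".")
    return is_digits(whole) and (not dot or is_digits(fraction))


def validate_row(csv_value):
    """Return True when the comma-separated value passes the rules."""
    parts = csv_value.split(",")
    if parts[0] == "false":
        return True  # validation is switched off for this set
    if len(parts) != 5 or parts[0] != "true":
        return False
    _, first, second, direction, amount = parts
    for number in (first, second):
        if not (is_digits(number) and 0 <= int(number) <= 24):
            return False
    return direction in ("over", "under") and is_decimal(amount)
```

Compared with a single regex, each rule sits on its own line, so a range change (say, 0-24 becoming 0-48) is a one-number edit rather than a rewrite of an alternation.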
