Extract UUID from a mixed string in mariadb - parsing

I have a MariaDB table. This is an example of what a row looks like:
| 4 | test/1ecb5e71-9105-4a0c-8fa1-7fc8d5e970bd/kuva.jpeg | {"Records":
The content of Records has been omitted to keep it short and simple. When I issue a SELECT like select key_name from minio_images where id=4; it returns normal output like this:
+-----------------------------------------------------+
| key_name |
+-----------------------------------------------------+
| test/1ecb5e71-9105-4a0c-8fa1-7fc8d5e970bd/kuva.jpeg |
+-----------------------------------------------------+
1 row in set (0.09 sec)
My question is: how can I write the SELECT so that it returns only the UUID in key_name instead of the whole string? For example, 1ecb5e71-9105-4a0c-8fa1-7fc8d5e970bd and not test/1ecb5e71-9105-4a0c-8fa1-7fc8d5e970bd/kuva.jpeg. I'd really appreciate any help with this.

Luckily, I found a similar post, "extract substring from mysql column using regex", and tried a solution suggested there: a SELECT statement that returns the part of a column value matching a regular expression. In my case, the regular expression to extract the UUID is '[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}' and the statement was:
SELECT REGEXP_SUBSTR(key_name, '[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}')
FROM minio_images;
This gave me exactly the output I needed:
+--------------------------------------------------------------------------------------------------------+
| REGEXP_SUBSTR(key_name, '[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}') |
+--------------------------------------------------------------------------------------------------------+
| 1ecb5e71-9105-4a0c-8fa1-7fc8d5e970bd |
| 1edd460e-b19a-4b16-a460-d433eac60833 |
| 281a890f-0b8b-4693-9227-fc8c57d6045e |
| 37a14ddb-eeda-41f2-a2b6-ec0bad34aaed |
| 37d4f3d2-2282-4b9f-8e1e-f8a26570c5b4 |
| 387da0c1-1caf-4394-a023-92e7eec19b66 |
| 49a29478-4799-4a8b-8757-42060020fc99 |
| 9214e1f0-77e3-435a-a329-d1829a973903 |
| ae67c69a-a2cf-4c21-88ca-bd17e254bc4c |
| b6491e64-34a6-4aa3-a54e-200b1cd946fe |
| c0f6864b-2ab8-41fa-a1c2-6b974a1895c1 |
| cfd61927-557e-47d2-aeb9-229ec1aba5b4 |
| df566110-c2a0-4d9c-8389-fcbaf6c8bb30 |
+--------------------------------------------------------------------------------------------------------+
16 rows in set (0.03 sec)
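If the extraction ever has to happen in application code rather than in SQL, the same pattern can be reused there. The following is only a minimal Python sketch of that idea; the function name extract_uuid is hypothetical and not part of MariaDB or of the original post.

```python
import re

# Same pattern as in the SQL above: 8-4-4-4-12 hexadecimal digits.
UUID_PATTERN = re.compile(
    r"[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}"
)


def extract_uuid(key_name):
    """Return the first UUID-shaped substring of key_name, or None if there is none."""
    match = UUID_PATTERN.search(key_name)
    return match.group(0) if match else None


# The key_name value is the one shown in the question.
print(extract_uuid("test/1ecb5e71-9105-4a0c-8fa1-7fc8d5e970bd/kuva.jpeg"))
```

Like REGEXP_SUBSTR, this picks the first match only, so a key containing two UUIDs would yield the leftmost one.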

Related

Can you use parameter/variables/placeholders for values for future use in a Specflow scenario?

In a previous job I used DBFit, which lets you store values in parameters (variables/placeholders) for later use.
example:
|Key? |
|>>Key|
!|Query|SELECT Status FROM Confirm WHERE Name='xyz' |
| Status | Key |
| Confirmed | <<Key |
I am now using SpecFlow and wondered whether it has similar functionality.
Example (I have used << and >> here just for explanation):
Given I get Initial for
And the 1st response should contain
| Name | string | "xyz" |
| Key | string | >>{Key} |
When I get Confirm for
Then the 1st response should contain
| Name | string | "xyz" |
| Key | string | <<{Key} |
I think you are looking for Scenario Outlines.
With them, you can specify a table with your parameters. In your case, it would look something like this:
Scenario Outline: Title for your Scenario Outline
Given I get Initial for
And the 1st response should contain
| field | type | assertion |
| Name | string | "xyz" |
| Key | string | <Key> |
When I get Confirm for
Then the 1st response should contain
| field | type | assertion |
| Name | string | "xyz" |
| Key | string | <Key> |
Examples:
| Key |
| example1 |
| example2 |
| example3 |
Be aware that there are two different kinds of tables here. The table attached to a step is an argument for that step.
The Examples table at the end holds the concrete examples, so the scenario is executed once for each row in this table.
You can use a value from the Examples table with a simple <COLUMN_NAME> placeholder.
Docs: https://docs.specflow.org/projects/specflow/en/latest/Gherkin/Gherkin-Reference.html#scenario-outline
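To make the substitution concrete, here is a small Python sketch of what "executed once per Examples row" means. It is only an illustration of the idea, not how SpecFlow is implemented; the function expand_outline and the data layout are assumptions made for this example.

```python
import re


def expand_outline(step_lines, examples):
    """Yield one concrete list of step lines per Examples row.

    Each <Name> placeholder is replaced by the value in the row's "Name" column.
    """
    for row in examples:
        yield [
            re.sub(r"<(\w+)>", lambda m: row[m.group(1)], line)
            for line in step_lines
        ]


steps = ["Then the 1st response should contain", "| Key | string | <Key> |"]
examples = [{"Key": "example1"}, {"Key": "example2"}, {"Key": "example3"}]

for concrete in expand_outline(steps, examples):
    print("\n".join(concrete))
```

Note that the values come from the Examples table, which is written up front; the outline does not capture a value produced by one step for use in a later one.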

How to get review cycle duration information from gerrit?

I'm trying to get some data on how long it takes for reviews to go through Gerrit on average.
Looking at some open source code, I see stuff like
reviewCreateTime = moment(mergedReviewsList[review].created);
reviewUpdateTime = moment(mergedReviewsList[review].updated);
interval = reviewUpdateTime.diff(reviewCreateTime, TIME_PERIOD_TYPE);
But after some experimentation I don't think this logic is correct, because adding a comment to a merged CR changes the updated timestamp.
I know the merge time must be recorded somewhere, because at the time of merge Gerrit prints "Change has been successfully merged by XXX" to the UI.
I've been digging around in the mysql database but haven't found anything useful. I notice that changes that have been submitted have a submission_id, but I haven't found a table that stores submission information.
After a bunch of digging around, I have come up with one rather ugly but workable solution.
There is a table change_messages
mysql> describe change_messages;
+-----------------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------+-------------+------+-----+---------+-------+
| author_id | int(11) | YES | | NULL | |
| written_on | timestamp | NO | | NULL | |
| message | text | YES | | NULL | |
| patchset_change_id | int(11) | YES | MUL | NULL | |
| patchset_patch_set_id | int(11) | YES | | NULL | |
| change_id | int(11) | NO | PRI | 0 | |
| uuid | varchar(40) | NO | PRI | | |
+-----------------------+-------------+------+-----+---------+-------+
7 rows in set (0.00 sec)
This basically stores stuff like XXXX has been successfully merged by YYYY and XXXX has been successfully cherry-picked as YYYY by ZZZZ.
You can then join this table with changes and datediff on change_messages.written_on and changes.created_on, e.g.
SELECT changes.change_id,
       created_on,
       written_on,
       DATEDIFF(written_on, created_on) AS diff
FROM change_messages
INNER JOIN changes
    ON change_messages.change_id = changes.change_id
WHERE message LIKE 'Change has been successfully merged by %'
ORDER BY written_on;
Now this includes any time the CR was in draft mode. I'll edit this question if I get around to excluding that time.
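The same calculation can be sketched outside the database, for example when the rows have already been exported. This is a minimal Python illustration under assumed data: the function review_days, the tuple layout and the sample timestamps are hypothetical, and only the message prefix comes from the query above.

```python
from datetime import datetime

MERGE_PREFIX = "Change has been successfully merged by "


def review_days(created_on, messages):
    """Whole days between creation and the first merge message, or None if never merged.

    Like DATEDIFF, only the calendar dates are compared, not the times of day.
    """
    merged = [written_on for written_on, text in messages if text.startswith(MERGE_PREFIX)]
    if not merged:
        return None
    return (min(merged).date() - created_on.date()).days


# Made-up sample data: (written_on, message) pairs for one change.
created = datetime(2015, 3, 1, 23, 0)
messages = [
    (datetime(2015, 3, 2, 9, 0), "Uploaded patch set 2."),
    (datetime(2015, 3, 4, 1, 0), "Change has been successfully merged by Alice"),
    (datetime(2015, 3, 9, 12, 0), "Patch Set 2: a comment added after the merge"),
]
print(review_days(created, messages))
```

Because only the merge message is used, the comment added after the merge does not affect the result, which is the problem the updated timestamp had.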

Neo4j - count very slow

I am running this query (bisac_code is uniquely indexed).
Execution time is more than 2.5 minutes.
52 main codes are selected from almost 4000 in total.
The total number of wokas is very large, 19 million nodes.
Are there any possibilities to make it run faster?
neo4j-sh (?)$ MATCH (b:Bisac)-[r:INCLUDED_IN]-(w:Woka)
> WHERE (b.bisac_code =~ '.*000000')
> RETURN b.bisac_code as bisac_code, count(w) as wokas_count
> ORDER BY b.bisac_code
> ;
+---------------------------+
| bisac_code | wokas_count |
+---------------------------+
| "ANT000000" | 13865 |
| "ARC000000" | 32905 |
| "ART000000" | 79600 |
| "BIB000000" | 2043 |
| "BIO000000" | 256082 |
| "BUS000000" | 226173 |
| "CGN000000" | 16424 |
| "CKB000000" | 26410 |
| "COM000000" | 44922 |
| "CRA000000" | 18720 |
| "DES000000" | 2713 |
| "DRA000000" | 62610 |
| "EDU000000" | 228182 |
| "FAM000000" | 42951 |
| "FIC000000" | 474004 |
| "FOR000000" | 41999 |
| "GAM000000" | 8803 |
| "GAR000000" | 37844 |
| "HEA000000" | 36939 |
| "HIS000000" | 3908869 |
| "HOM000000" | 5123 |
| "HUM000000" | 29270 |
| "JNF000000" | 40396 |
| "JUV000000" | 200144 |
| "LAN000000" | 89059 |
| "LAW000000" | 153138 |
| "LCO000000" | 1528237 |
| "LIT000000" | 89611 |
| "MAT000000" | 58134 |
| "MED000000" | 80268 |
| "MUS000000" | 75997 |
| "NAT000000" | 35991 |
| "NON000000" | 107513 |
| "OCC000000" | 42134 |
| "PER000000" | 26989 |
| "PET000000" | 4980 |
| "PHI000000" | 72069 |
| "PHO000000" | 8546 |
| "POE000000" | 104609 |
| "POL000000" | 309153 |
| "PSY000000" | 55710 |
| "REF000000" | 96477 |
| "REL000000" | 133619 |
| "SCI000000" | 86017 |
| "SEL000000" | 40901 |
| "SOC000000" | 292713 |
| "SPO000000" | 172284 |
| "STU000000" | 10508 |
| "TEC000000" | 77459 |
| "TRA000000" | 9093 |
| "TRU000000" | 12041 |
| "TRV000000" | 27706 |
+---------------------------+
52 rows
198310 ms
And the response time is not consistent.
After a while drops to less than half of a minute.
52 rows
31207 ms
In Neo4j 2.3 there will be index support for prefix LIKE searches, but probably not for postfix ones.
There are two ways of making @user2194039's solution faster:
Use path expression to count the Woka per Bisac:
MATCH (b:Bisac) WHERE (b.bisac_code =~ '.*000000')
WITH b, size((b)-[:INCLUDED_IN]->()) as wokas_count
RETURN b.bisac_code as bisac_code, wokas_count
ORDER BY b.bisac_code
Mark the Bisac's with that pattern with a label
MATCH (b:Bisac) WHERE (b.bisac_code =~ '.*000000') SET b:Main;
MATCH (b:Main:Bisac)
WITH b, size((b)-[:INCLUDED_IN]->()) as wokas_count
RETURN b.bisac_code as bisac_code, wokas_count
ORDER BY b.bisac_code;
The slow speed is caused by your regular expression pattern matching (=~). Although your bisac_code is indexed, the regex match makes the index ineffective. The index only works when you are matching full bisac_code values.
Cypher does include some string manipulation facilities that might let you get by without using a regex =~, but I doubt it would make any difference, because the index will still be useless.
I might suggest considering if you can further categorize your bisac_codes so that you do not need to do a pattern match. Maybe an extra indexed property that somehow denotes those codes that end in 000000?
If you do not want to add properties, you may try matching only the Bisacs first, and then including the Wokas. Something like this:
MATCH (b:Bisac) WHERE (b.bisac_code =~ '.*000000')
WITH b
MATCH (b)-[r:INCLUDED_IN]-(w:Woka)
RETURN b.bisac_code as bisac_code, count(w) as wokas_count
ORDER BY b.bisac_code
This may help Cypher stick to the 4000 Bisac nodes while doing the pattern match, before getting involved with all 19 million Woka nodes, but I am not sure if this will make a material difference. Even slogging through 4000 nodes (effectively without an index) is a slow process.
Hash Tables in Database Indexing
The reason that your index is ineffective for regex pattern matching is that Neo4j likely uses a hash table for indexing properties. This is common in many databases. Wikipedia has an article on hash tables.
The basics though are that the index is not storing all of the properties that you want to search through. It is storing values that represent the properties you want to search through, and the representation is only valid for the whole property. If you are searching for only a part of the property value, the hashes stored in the index are useless, and the database must search through the properties the old-fashioned way -- one by one.
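The difference between the two kinds of lookup can be illustrated with a Python dict, which is itself a hash table. This is only a toy model of the argument above, not a description of Neo4j internals; the node ids and the two non-main codes are made up for the example.

```python
# Toy "hash index": maps a full property value to a node id.
index = {
    "ANT000000": 1,
    "ANT001000": 2,  # hypothetical sub-code
    "ARC000000": 3,
    "ARC005000": 4,  # hypothetical sub-code
}


def exact_lookup(code):
    """Full-value match: a single hash lookup, independent of index size."""
    return index.get(code)


def suffix_scan(suffix):
    """Partial match: the hash is of no use, so every key has to be inspected."""
    return sorted(code for code in index if code.endswith(suffix))


print(exact_lookup("ARC000000"))
print(suffix_scan("000000"))
```

The scan visits every entry no matter how few of them match, which is why a precomputed marker for the main codes (a property or a label) avoids the cost.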
Edit re: your edit
The improvement in response time after running this query multiple times is certainly due to caching. Neo4j is remembering that you access the Bisac nodes and bisac_code properties frequently, and is keeping them in memory. This makes future queries faster because the values do not need to be read off disk.
However, eventually, those nodes and properties will likely be dropped from the cache, as Neo4j finds you manipulating different nodes, which it will cache instead. There are only so many nodes Neo4j can cache before running out of memory, so it picks the most recently and/or frequently used data.

Can I have a cucumber example with several values in a single column x row position

Hi, here is what I have:
Scenario Outline: Searching for stuff
Given that the following simple things exists:
| id | title | description | temp |
| 1 | First title | First description | low |
| 2 | Second title | Second description with öl | Medium |
| 3 | Third title | Third description | High |
| 11 | A title with number 2 | can searching numbers find this 2 | Extreme |
When I search for <criteria>
Then I should get <results>
And I should not get <excluded>
Examples:
| criteria | results | excluded |
| 1 | 1 | 2,3,11 |
| 11 | 11 | 1,2,3 |
| title | 1,2,3 | 11 |
| öl | 2 | 1,3,11 |
| Fir* | 1 | 2,3,11 |
| third | 3 | 1,2,11 |
| High | 3 | 1,2,11 |
As you can see I'm trying to test a search field for a web-application using cucumber and the scenario outline structure in order to test several search criteria.
I'm not sure how to handle the input I would get as result and excluded in my steps.
Maybe this doesn't work at all?
Is there a workaround?
There's nothing wrong with what you're doing. Cucumber will just take that as a single string. The fact that it's actually comma-separated values means nothing to Cucumber.
Your step definition would still look like this:
Then /^I should not get ([^"]*)$/ do |excluded|
  # excluded will be a string, "2,3,11"
  values = excluded.split(",")
  # Do whatever you want with the values
end
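Since the ids in the table are numeric, the split values usually need converting before they can be compared with search results. Here is a minimal Python sketch of that parsing step; the helper name parse_ids is hypothetical and the tolerance for stray spaces is an assumption about how the cells might be written.

```python
def parse_ids(cell):
    """Split a comma-separated table cell such as "2,3,11" into a list of ints.

    Surrounding whitespace is ignored and empty pieces are skipped,
    so an empty cell yields an empty list.
    """
    return [int(part) for part in cell.split(",") if part.strip()]


excluded = parse_ids("2,3,11")
found = [1]  # ids returned by a hypothetical search

# None of the excluded ids may appear in the search results.
print(all(item not in found for item in excluded))
```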

SpecFlow - Repeat test X times with list?

Scenario: Change a member to ABC 60 days before anniversary date
Given Repeat When+Then for each of the following IDs:
| ID |
| 0047619101 |
| 0080762602 |
| 0186741901 |
| 0311285102 |
| 0570130101 |
| 0725968201 |
| 0780265749 |
| 0780265750 |
| 0780951340 |
| 0780962551 |
#-----------------------------------------------------------------------
When these events occur:
| WorkflowEventType | WorkflowEntryPoint |
| ABC | Status Change |
Then these commands are executed:
| command name |
| TerminateWorkflow |
And For Member, the following documents were queued:
| Name |
| ABC Packet |
In the above scenario I would like to:
GIVEN - Lookup 10 members from the DB
WHEN + THEN - Do these steps 10 times, once for each record.
Is this possible with SpecFlow?
If so, how would you set it up?
TIA
This is actually quite easy to do, although the documentation takes a bit of searching.
What you want is a scenario outline, like so:
Scenario Outline: Change a member to ABC 60 days before anniversary date
Given I have <memberId>
When these events occur:
| WorkflowEventType | WorkflowEntryPoint |
| ABC | Status Change |
Then these commands are executed:
| command name |
| TerminateWorkflow |
And For <memberId>, the following documents were queued:
| Name |
| ABC Packet |
Examples:
| memberId |
| 0047619101 |
| 0080762602 |
| 0186741901 |
| ...etc... |
This will execute your scenario once for each id in the examples table. You can extend the table to have multiple columns, if needed.
Or, more simply (if you really only have one row in each of your example tables above)
Scenario Outline: Change a member to ABC 60 days before anniversary date
Given I have <memberId>
When A 'ABC' Event Occurs with EntryPoint 'Status Change'
Then a TerminateWorkflow command is executed
And For <memberId>, the 'ABC Packet' document was queued
Examples:
| memberId |
| ...etc... |
For more information, see the SpecFlow wiki on GitHub and the Cucumber language syntax for scenario outlines.
