Difference between detect-secrets and detect-secrets-hook results

I'm evaluating detect-secrets and I'm not sure why I get different results from detect-secrets and the hook.
Here is a log of a simplified reproduction:
$ cat docs/how-to-2.md
AZ_STORAGE_CS="DefaultEndpointsProtocol=https;AccountName=storageaccount1234;AccountKey=1OM7c6u5Ocp/zyUMWcRChowzd8czZmxPhzHZ8o45X7tAryr6JFF79+zerFFQS34KzVTK0yadoZGkvZh42A==;EndpointSuffix=core.windows.net"
$ detect-secrets scan --string $(cat docs/how-to-2.md)
AWSKeyDetector : False
ArtifactoryDetector : False
Base64HighEntropyString: True (5.367)
BasicAuthDetector : False
CloudantDetector : False
HexHighEntropyString : False
IbmCloudIamDetector : False
IbmCosHmacDetector : False
JwtTokenDetector : False
KeywordDetector : False
MailchimpDetector : False
PrivateKeyDetector : False
SlackDetector : False
SoftlayerDetector : False
StripeDetector : False
TwilioKeyDetector : False
$ detect-secrets-hook docs/how-to-2.md
$ detect-secrets-hook --baseline .secrets.baseline docs/how-to-2.md
I would have expected that detect-secrets-hook would tell me about that Azure storage account key that has high entropy.
More details about the baseline:
$ cat .secrets.baseline
{
  "custom_plugin_paths": [],
  "exclude": {
    "files": null,
    "lines": null
  },
  "generated_at": "2020-10-09T10:06:54Z",
  "plugins_used": [
    {
      "name": "AWSKeyDetector"
    },
    {
      "name": "ArtifactoryDetector"
    },
    {
      "base64_limit": 4.5,
      "name": "Base64HighEntropyString"
    },
    {
      "name": "BasicAuthDetector"
    },
    {
      "name": "CloudantDetector"
    },
    {
      "hex_limit": 3,
      "name": "HexHighEntropyString"
    },
    {
      "name": "IbmCloudIamDetector"
    },
    {
      "name": "IbmCosHmacDetector"
    },
    {
      "name": "JwtTokenDetector"
    },
    {
      "keyword_exclude": null,
      "name": "KeywordDetector"
    },
    {
      "name": "MailchimpDetector"
    },
    {
      "name": "PrivateKeyDetector"
    },
    {
      "name": "SlackDetector"
    },
    {
      "name": "SoftlayerDetector"
    },
    {
      "name": "StripeDetector"
    },
    {
      "name": "TwilioKeyDetector"
    }
  ],
  "results": {
    ".devcontainer/Dockerfile": [
      {
        ###obfuscated###
      }
    ],
    "deployment/export-sp.sh": [
      {
        ###obfuscated###
      }
    ],
    "docs/pip-install-from-artifacts-feeds.md": [
      {
        ###obfuscated###
      }
    ]
  },
  "version": "0.14.3",
  "word_list": {
    "file": null,
    "hash": null
  }
}

This is definitely peculiar behavior, but after some investigation, I realized that you've stumbled upon an edge case of the tool.
tl;dr
HighEntropyStringPlugin supports a limited set of characters (not including ;)
To reduce false positives, HighEntropyStringPlugin leverages the heuristic that strings are quoted in certain contexts.
To improve UI, inline string scanning does not require quoted strings.
Therefore, the functionality differs: when run through detect-secrets-hook, the quoted payload cannot be parsed because of the ; characters it contains. However, when run through detect-secrets scan --string, quotes are not required and the string is broken up into fragments.
Detailed Explanation
HighEntropyString tests are pretty noisy, if not aggressively pruned for false positives. One way it attempts to do this is via applying a rather strict regex (source), which requires it to be inside quotes. However, for certain contexts, this quoted requirement is removed (e.g. YAML files, and inline string scanning).
When this quoted requirement is removed, we get the following breakdown:
>>> line = 'AZ_STORAGE_CS="DefaultEndpointsProtocol=https;AccountName=storageaccount1234;AccountKey=1OM7c6u5Ocp/zyUMWcRChowzd8czZmxPhzHZ8o45X7tAryr6JFF79+zerFFQS34KzVTK0yadoZGkvZh42A==;EndpointSuffix=core.windows.net"'
>>> with self.non_quoted_string_regex(is_exact_match=False):
...     self.regex.findall(line)
['AZ_STORAGE_CS=', 'DefaultEndpointsProtocol=https', 'AccountName=storageaccount1234', 'AccountKey=1OM7c6u5Ocp/zyUMWcRChowzd8czZmxPhzHZ8o45X7tAryr6JFF79+zerFFQS34KzVTK0yadoZGkvZh42A==', 'EndpointSuffix=core', 'windows', 'net']
When we do so, we can see that the AccountKey=1OM7c6u5Ocp/zyUMWcRChowzd8czZmxPhzHZ8o45X7tAryr6JFF79+zerFFQS34KzVTK0yadoZGkvZh42A== would trigger the base64 plugin, as demonstrated below:
$ detect-secrets scan --string 'AccountKey=1OM7c6u5Ocp/zyUMWcRChowzd8czZmxPhzHZ8o45X7tAryr6JFF79+zerFFQS34KzVTK0yadoZGkvZh42A=='
AWSKeyDetector : False
ArtifactoryDetector : False
Base64HighEntropyString: True (5.367)
BasicAuthDetector : False
CloudantDetector : False
HexHighEntropyString : False
IbmCloudIamDetector : False
IbmCosHmacDetector : False
JwtTokenDetector : False
KeywordDetector : False
MailchimpDetector : False
PrivateKeyDetector : False
SlackDetector : False
SoftlayerDetector : False
StripeDetector : False
TwilioKeyDetector : False
However, when this quoted requirement is applied, then the entire string payload is scanned as one potential secret: DefaultEndpointsProtocol=https;AccountName=storageaccount1234;AccountKey=1OM7c6u5Ocp/zyUMWcRChowzd8czZmxPhzHZ8o45X7tAryr6JFF79+zerFFQS34KzVTK0yadoZGkvZh42A==;EndpointSuffix=core.windows.net
This doesn't get flagged because it fails the original base64 regex rules, which don't know how to handle ;.
>>> self.regex.findall(line)
[]
Therefore, the functionality differs, though this is not immediately apparent from the invocation patterns described.
How Do I Fix This?
This is a much more challenging question, since allowing other characters would change the entropy calculation, and the probability of flagging strings. There has been some discussion around creating a plugin for all characters, but the team has not yet been able to decide on a default entropy limit for this.
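In the meantime, one possible interim workaround (a minimal sketch, not part of detect-secrets itself; the helper name and loop below are only illustrative) is to pre-split connection-string lines on ; yourself and feed each fragment to the same detect-secrets scan --string invocation shown above, so that the Base64 plugin sees quote-free tokens its regex can parse:

import subprocess

def scan_connection_string(line: str) -> None:
    # Hypothetical helper, not part of detect-secrets: split the payload on ';'
    # and scan each key=value pair on its own, so the Base64 regex is not
    # tripped up by the ';' separators.
    for fragment in line.strip().strip('"').split(";"):
        if not fragment:
            continue
        result = subprocess.run(
            ["detect-secrets", "scan", "--string", fragment],
            capture_output=True,
            text=True,
        )
        # `scan --string` prints one "<Plugin> : True/False" line per plugin.
        if "True" in result.stdout:
            print("Potential secret in fragment:", fragment[:30], "...")

with open("docs/how-to-2.md") as handle:  # path taken from the question
    for line in handle:
        scan_connection_string(line)

This is purely a pre-processing step around the CLI; it does not change detect-secrets itself or its entropy limits.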

Related

Telegraf splits input data into different outputs

I want to write a Telegraf config file which will:
Use the openweathermap input or a custom HTTP request result:
{
"fields": {
...
"humidity": 97,
"temperature": -11.34,
...
},
"name": "weather",
"tags": {...},
"timestamp": 1675786146
}
Split the result into two similar JSONs:
{
"sensorID": "owm",
"timestamp": 1675786146,
"value": 97,
"type": "humidity"
}
and
{
"sensorID": "owm",
"timestamp": 1675786146,
"value": -11.34,
"type": "temperature"
}
Send these JSONs to an MQTT queue.
Is this possible, or must I create two different configs and make two API calls?
I found the following configuration, which solves my problem:
[[outputs.mqtt]]
  servers = ["${MQTT_URL}", ]
  topic_prefix = "owm/data"
  data_format = "json"
  json_transformation = '{"sensorID":"owm","type":"temperature","value":fields.main_temp,"timestamp":timestamp}'

[[outputs.mqtt]]
  servers = ["${MQTT_URL}", ]
  topic_prefix = "owm/data"
  data_format = "json"
  json_transformation = '{"sensorID":"owm","type":"humidity","value":fields.main_humidity,"timestamp":timestamp}'

[[inputs.http]]
  urls = [
    "https://api.openweathermap.org/data/2.5/weather?lat=${LAT}&lon=${LON}&appid=${API_KEY}&units=metric"
  ]
  data_format = "json"
Here we:
Retrieve data from OWM in the input plugin.
Transform the received data into the needed structure in two different output plugins, using the JSONata language (https://jsonata.org/) for this aim.
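For illustration only, the mapping each json_transformation above performs is roughly equivalent to the following Python sketch (the JSONata expression runs inside Telegraf; the function name and the metric literal below are made up to show the field mapping):

def to_reading(metric: dict, kind: str, field: str) -> dict:
    # Mirrors the JSONata expression: pick one OWM field, re-label it,
    # and keep the metric's timestamp.
    return {
        "sensorID": "owm",
        "type": kind,
        "value": metric["fields"][field],
        "timestamp": metric["timestamp"],
    }

metric = {
    "fields": {"main_temp": -11.34, "main_humidity": 97},
    "name": "weather",
    "timestamp": 1675786146,
}

print(to_reading(metric, "temperature", "main_temp"))
print(to_reading(metric, "humidity", "main_humidity"))

Each output plugin applies its own transformation to the same metric, which is why a single input and two outputs are enough.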

OPA/Rego execute function for each element of an array

I am new to OPA/Rego and I am trying to write a policy that checks whether an Azure Network Security Group contains all the rules that I define in an array:
package sample
default compliant = false
toSet(arr) = {x | x := arr[_]}

checkProperty(rule, index, propertySingular, propertyPlural) = true {
    object.get(input.properties.securityRules[index].properties, propertySingular, "") == object.get(rule, propertySingular, "")
    count(toSet(object.get(input.properties.securityRules[index].properties, propertyPlural, [])) - toSet(object.get(rule, propertyPlural, []))) == 0
}

existRule(rule) = true {
    input.properties.securityRules[i].name == rule.name
    input.properties.securityRules[i].properties.provisioningState == rule.provisioningState
    input.properties.securityRules[i].properties.description == rule.description
    input.properties.securityRules[i].properties.protocol == rule.protocol
    checkProperty(rule, i, "sourcePortRange", "sourcePortRanges")
    checkProperty(rule, i, "destinationPortRange", "destinationPortRanges")
    checkProperty(rule, i, "sourceAddressPrefix", "sourceAddressPrefixes")
    checkProperty(rule, i, "destinationAddressPrefix", "destinationAddressPrefixes")
    input.properties.securityRules[i].properties.access == rule.access
    input.properties.securityRules[i].properties.priority == rule.priority
    input.properties.securityRules[i].properties.direction == rule.direction
}

compliant {
    rules := [
        {
            "name": "name1",
            "provisioningState": "Succeeded",
            "description": "description1",
            "protocol": "*",
            "sourcePortRange": "*",
            "destinationPortRange": "53",
            "destinationAddressPrefix": "*",
            "access": "Allow",
            "priority": 1,
            "direction": "Inbound",
            "sourceAddressPrefixes": [
                "xx.xx.xx.xx",
                "xx.xx.xx.xx",
                "xx.xx.xx.xx"
            ],
        },
        {
            "name": "name2",
            "provisioningState": "Succeeded",
            "description": "description2",
            "protocol": "*",
            "sourcePortRange": "*",
            "destinationPortRange": "54",
            "sourceAddressPrefix": "*",
            "access": "Allow",
            "priority": 2,
            "direction": "Outbound",
            "destinationAddressPrefixes": [
                "xx.xx.xx.xx",
                "xx.xx.xx.xx",
                "xx.xx.xx.xx"
            ]
        }
    ]

    # checks
    existRule(rules[i])
}
The issue seems to be that when existRule(rules[i]) is evaluated, it returns true if any one of the rules matches, regardless of whether the other rules match.
If I replace existRule(rules[i]) with existRule(rules[0]) or existRule(rules[1]), it returns true or false depending on whether the rule at that position matches.
Is there any way to get the result of the execution of existRule(rules[i]) for all the elements of the array?
I already tried result := [existRule(rules[i])], but it only returns one element with true.
Sure! Use an array comprehension and call the function inside of it, then compare the size of the result to the size of the original array. Given your example, you would replace existRule(rules[i]) with something like this:
compliantRules := [rule |
    rule := rules[_]
    existRule(rule)
]
count(compliantRules) == count(rules)

Is there a way to use OCR to extract specific data from a CAD technical drawing?

I'm trying to use OCR to extract only the base dimensions of a CAD model, but there are other associative dimensions that I don't need (like angles, length from baseline to hole, etc). Here is an example of a technical drawing. (The numbers in red circles are the base dimensions, the rest in purple highlights are the ones to ignore.) How can I tell my program to extract only the base dimensions (the height, length, and width of a block before it goes through the CNC)?
The issue is that the drawings I get are not in a specific format, so I can't tell the OCR where the dimensions are. It has to figure this out on its own, from context.
Should I train the program through machine learning by running several iterations and correcting it? If so, what methods are there? The only thing I can think of is OpenCV cascade classifiers.
Or are there other methods for solving this problem?
Sorry for the long post. Thanks.
I feel you... it's a very tricky problem, and we spent the last 3 years finding a solution for it. Forgive me for mentioning our own solution, but it will certainly solve your problem: pip install werk24
from werk24 import Hook, W24AskVariantMeasures
from werk24.models.techread import W24TechreadMessage
from werk24.utils import w24_read_sync

from . import get_drawing_bytes  # define your own


def recv_measures(message: W24TechreadMessage) -> None:
    for cur_measure in message.payload_dict.get('measures'):
        print(cur_measure)


if __name__ == "__main__":
    # define what information you want to receive from the API
    # and what shall be done when the info is available.
    hooks = [Hook(ask=W24AskVariantMeasures(), function=recv_measures)]

    # submit the request to the Werk24 API
    w24_read_sync(get_drawing_bytes(), hooks)
In your example, it will return, for instance, the following measure:
{
  "position": <STRIPPED>,
  "label": {
    "blurb": "ø30 H7 +0.0210/0",
    "quantity": 1,
    "size": {
      "blurb": "30",
      "size_type": "DIAMETER",
      "nominal_size": "30.0"
    },
    "unit": "MILLIMETER",
    "size_tolerance": {
      "toleration_type": "FIT_SIZE_ISO",
      "blurb": "H7",
      "deviation_lower": "0.0",
      "deviation_upper": "0.0210",
      "fundamental_deviation": "H",
      "tolerance_grade": {
        "grade": 7,
        "warnings": []
      },
      "thread": null,
      "chamfer": null,
      "depth": null,
      "test_dimension": null
    },
    "warnings": [],
    "confidence": 0.98810
  }
}
or for a GD&T
{
  "position": <STRIPPED>,
  "frame": {
    "blurb": "[⟂|0.05|A]",
    "characteristic": "⟂",
    "zone_shape": null,
    "zone_value": {
      "blurb": "0.05",
      "width_min": 0.05,
      "width_max": null,
      "extend_quantity": null,
      "extend_shape": null,
      "extend": null,
      "extend_angle": null
    },
    "zone_combinations": [],
    "zone_offset": null,
    "zone_constraint": null,
    "feature_filter": null,
    "feature_associated": null,
    "feature_derived": null,
    "reference_association": null,
    "reference_parameter": null,
    "material_condition": null,
    "state": null,
    "data": [
      {
        "blurb": "A"
      }
    ]
  }
}
Check the documentation on Werk24 for details.
Although a managed offering, Mixpeek is one free option:
pip install mixpeek
from mixpeek import Mixpeek
mix = Mixpeek(
    api_key="my-api-key"
)
mix.upload(file_name="design_spec.dwg", file_path="s3://design_spec_1.dwg")
This /upload endpoint will extract the contents of your DWG file, then when you search for terms it will include the file_path so you can render it in your HTML.
Behind the scenes it uses the open source LibreDWG library to run a number of AutoCAD native commands such as DATAEXTRACTION.
Now you can search for a term and the relevant DWG file (in addition to the context in which it exists) will be returned:
mix.search(query="retainer", include_context=True)
[
  {
    "file_id": "6377c98b3c4f239f17663d79",
    "filename": "design_spec.dwg",
    "context": [
      {
        "texts": [
          {
            "type": "text",
            "value": "DV-34-"
          },
          {
            "type": "hit",
            "value": "RETAINER"
          },
          {
            "type": "text",
            "value": "."
          }
        ]
      }
    ],
    "importance": "100%",
    "static_file_url": "s3://design_spec_1.dwg"
  }
]
More documentation here: https://docs.mixpeek.com/

Parsing a JSON file using the Telegraf input plugin: unexpected output

I'm new to Telegraf and InfluxDB and currently looking forward to exploring Telegraf, but unfortunately I have some difficulty getting started. I will try to explain my problem below.
Objective: parse a JSON file using the Telegraf input plugin.
Input: https://wetransfer.com/downloads/0abf7c609d000a7c9300dc20ee0f565120200624164841/ab22bf (JSON file used)
The input JSON file is a repetition of the same structure, starting from params.
You will find below the main part of the input file:
{
  "events": [
    {
      "params": [
        {
          "name": "element_type",
          "value": "Home_Menu"
        },
        {
          "name": "element_id",
          "value": ""
        },
        {
          "name": "uuid",
          "value": "981CD435-E6BC-01E6-4FDC-B57B5CFA9824"
        },
        {
          "name": "section_label",
          "value": "HOME"
        },
        {
          "name": "element_label",
          "value": ""
        }
      ],
      "libVersion": "4.2.5",
      "context": {
        "locale": "ro-RO",
        "country": "RO",
        "application_name": "spresso",
        "application_version": "2.1.8",
        "model": "iPhone11,8",
        "os_version": "13.5",
        "platform": "iOS",
        "application_lang_market": "ro_RO",
        "platform_width": "320",
        "device_type": "mobile",
        "platform_height": "480"
      },
      "date": "2020-05-31T09:38:55.087+03:00",
      "ssid": "spresso",
      "type": "MOBILEPAGELOAD",
      "user": {
        "anonymousid": "6BC6DC89-EEDA-4EB6-B6AD-A213A65941AF",
        "userid": "2398839"
      },
      "reception_date": "2020-06-01T03:02:49.613Z",
      "event_version": "v1"
    }
Issue: Following the documentation, I tried to define a simple telegraf.conf file as below:
[[outputs.influxdb_v2]]
…
[[inputs.file]]
files = ["/home/mouhcine/json/file.json"]
json_name_key = "My_json"
#... Listing all the string fields in the JSON (I put only these for simplicity).
json_string_fields = ["ssid","type","userid","name","value","country","model"]
data_format = "json"
json_query = "events"
Basically, declaring string fields in the telegraf.conf file should do it, but I couldn't get all the fields that are nested deeper in the JSON file, for example what's inside params or context.
So in the end I manage to parse fields at the same level of hierarchy as ssid, type, and libVersion, but not the ones inside params, context, or user.
Output: Screen2 (attachment).
Out of curiosity, I tried the documentation's example in order to verify whether I get the same expected result, and the answer is no: I don't get the string field nested inside the file.
The doc’s example below:
Input :
{
"a": 5,
"b": {
"c": 6,
"my_field": "description"
},
"my_tag_1": "foo",
"name": "my_json"
}
telegraf.conf
[[outputs.influxdb_v2]]
…
[[inputs.file]]
files = ["/home/mouhcine/json/influx.json"]
json_name_key = "name"
tag_keys = ["my_tag_1"]
json_string_fields = ["my_field"]
data_format = "json"
Expected Output: my_json,my_tag_1=foo a=5,b_c=6,my_field="description"
The result I get: "my_field" is missing.
Output: Screen 1 (attachment).
By the way, I use InfluxDB Cloud 2. I apologize for the long description of this little problem; I would appreciate some help please :). Thank you so much in advance.

Lua and Alien structures

I'm trying to redefine this C structure in Lua with the alien 0.50 module; however, I have two arrays of char at the end. Both szLibraryPath and szLibraryName are originally defined as char szLibraryPath[MAX_PATH] in C. Can this be done with alien?
LIBRARY_ITEM_DATA = alien.defstruct{
  { "hFile", "long" },
  { "BaseOfDll", "long" },
  { "hFileMapping", "long" },
  { "hFileMappingView", "long" },
  { "szLibraryPath", "byte" },  -- fix to MAX_PATH
  { "szLibraryName", "byte" }   -- fix to MAX_PATH
}
Take a look at this answer by the author of Alien.
Your structure should look like this:
LIBRARY_ITEM_DATA = alien.defstruct{
  { "hFile", "long" },
  { "BaseOfDll", "long" },
  { "hFileMapping", "long" },
  { "hFileMappingView", "long" },
  { "additionalFields", "char" }
}
LIBRARY_ITEM_DATA.size = LIBRARY_ITEM_DATA.size + 2*MAX_PATH - 1
And you would get/set the arrays by manually reading/writing bytes at the end of the struct (using the code in the link). To access the second array, add MAX_PATH to all the offsets.

Resources