Parse a report with powershell

Parse a report with powershell - parsing

I'm trying to gather some statistics using powershell.
I have about 4,000 reports that look like this (all in their own Report.txt files - some have more fields. I've sniped it):
What I would like to do (thinking out loud) is take each [ ] and make that a csv header and then add each item under that header under it. Then I can run simple count statements against it and pull out some data. Maybe there is a better/easier way?
The idea is to have some like:
Requested Permissions: android.permission.INTERNET 3,562 of 4,000
Requested Permissions: android.permission.READ_SMS 1 of 4,000
etc...
So far i've stripped down the string so it no longer has white spaces and *.
[ Package name ]
* com.software.application
[ Requested Permissions ]
* android.permission.INTERNET
* android.permission.READ_PHONE_STATE
* android.permission.READ_SMS
* android.permission.RECEIVE_SMS
* android.permission.SEND_SMS
* android.permission.WAKE_LOCK
* android.permission.WRITE_EXTERNAL_STORAGE
* com.google.android.c2dm.permission.RECEIVE
* com.software.application.permission.C2D_MESSAGE
[ Responsible API calls for used Permissions ]
* android/app/NotificationManager;->notify
* android/content/Context;->sendBroadcast
* android/content/Context;->startService
* android/os/PowerManager$WakeLock;->acquire
* android/os/PowerManager$WakeLock;->release
* android/telephony/SmsManager;->sendTextMessage
[ Potentially dangerous Calls ]
* getSystemService
* HttpPost
* sendSMS
Current output example:
[PotentiallydangerousCalls]
getSystemService
HttpPost
sendSMS
So it's a bit cleaner now...

I will put some code if I have time, but here is a roadbook :
1) Write a function (CsvToObject) that take the path and name of a csv file and return a PSObject with a property with each [value]
This function can take advantage of switch ... case PowerShell structure with regex and file usage.
2) Take every *.csv file (Get-ChildItem) , call your function and pipe the result in Export-CSV

Your current output example is structured like an ini file. This Scripting Guy post on ini files might be useful. Some regex help here and here.

Related

How to export a csv from Google Sheet API?

I can't find any reference to an API that enables Rest API clients to export an existing Google Sheet to a csv file.
https://developers.google.com/sheets/
I believe there should be a way to export them.

The following URL gives you the CSV of a Google spreadsheet per sheet. The sheet must be accessible by the public, by anyone with the link (unlisted).
The parameters you need to provide are:
sheet ID (that is simply the ID in the URL of a Google Spreadsheet https://docs.google.com/spreadsheets/d/{{ID}}/edit)
sheet name (that is simply the name of the sheet as given by the user)
https://docs.google.com/spreadsheets/d/{{ID}}/gviz/tq?tqx=out:csv&sheet={{sheet_name}}
With that URL you can run a GET-request to fetch the CSV.
Or paste it in your browser address bar.

You can use the Drive API to do this today -- see https://developers.google.com/drive/v3/web/manage-downloads#downloading_google_documents, however that will limit you to the first sheet of the document. The Sheets API doesn't expose exporting as CSV today, but may offer it in the future.

Nobody's mentioned gspread yet, so here's how I did it:
#open sheet
sheet = gc.open_by_key(sheet_id)
#select worksheet
worksheet = sheet.get_worksheet(0)
#download values into a dataframe
df = pd.DataFrame(worksheet.get_all_records())
#save dataframe as a csv, using the spreadsheet name
filename = sheet.title + '.csv'
df.to_csv(filename, index=False)

Firstly you should make document accessible for anyone. Then you get url. From this url you should extract long id composed from big and small letters and numbers. Then use this script.
#!/bin/bash
long_id="id_assigned_to_your_document"
g_id="number_assigned_to_card_in_google_sheet"
wget --output-document=temp.csv "https://docs.google.com/spreadsheets/d/$long_id/export?gid=$g_id&format=csv&id=$long_id"
If you use only one card in document, their number is: g_id="0"
The problem you will probably have is connected with strange spaces in obtained file. I use this second script to process it
#!/bin/bash
#Delete all lines beginning with a # from a file
#http://stackoverflow.com/questions/8206280/delete-all-lines-beginning-with-a-from-a-file
sed '/^#/ d' temp.csv |
# reomve spaces
# http://stackoverflow.com/questions/9953448/how-to-remove-all-white-spaces-from-a-given-text-file
tr -d "[:blank:]" |
# regexp "1,2" into 1.2
# http://www.funtoo.org/Sed_by_Example,_Part_2
sed 's/\"\([−]\?[0-9]*\),\([0-9]*\)\"/\1.\2/g' > out.csv
Update
As Sam mentioned, api is better solution. There is now great documentation on address:
https://developers.google.com/sheets/quickstart/php
With example that generate output having CSV structure.

If you don't have easy access to or familiarity with PHP, here's a very barebones Google Apps Script Web App that once deployed and the caller permission accepted, should allow clients with an appropriately scoped access token or api key to export an existing Google Sheet to a csv file. It takes a Google Sheets spreadsheet id and sheet name (and optional download filename) as query parameters, and returns the corresponding theoretically RFC 4180 compliant CSV file.
Further instructions on deploying an Apps Script project as a web app are here: https://developers.google.com/apps-script/guides/web#deploying_a_script_as_a_web_app.
You can deploy it and test it out easily in the browser just by visiting the "Current web app URL" (as provided when you publish as web app from the script editor), and accepting the consent screen, or even just visit the one that I deployed (configured to execute as the accessing user, and unverified/scary consent) at the example URL.
The tricky part (as usual) is getting the OAuth token or API key set up, but if you're already calling the Google Sheets V4 API, you've probably already got that dialed in. I used CURL to make sure that it behaved as a REST api, but the technique I used to get an OAuth token there is both a distraction and frankly a little scary to include here since it's really easy to mess up. If you don't already have a way to get one, that's probably a good topic for a separate SO question in any case.
One related (and big!) caveat: I'm not 100% sure how the consent and verification interact with a pure Rest client (i.e. how that works if you DON'T visit this in the browser first...), and/or whether this script would need to be in the same GCP project as the other code that uses the Sheets API. If there's interest, and/or it doesn't work right out of the box, please let me know and I'll happily dig deeper and follow up.
// Example URL, assuming:
// "Current web app URL": https://script.google.com/a/tillerhq.com/macros/s/AKfycbyZlWAW6bpCpnFoPjbdjznDomFRbTNluG4siCBMgOy2qU2AGoA/exec
// spreadsheetId: 1xNDWJXOekpBBV2hPseQwCRR8Qs4LcLOcSLDadVqDA0E
// sheet name: Sheet1
// (optional) filename: mycsv.csv
//
// https://script.google.com/a/tillerhq.com/macros/s/AKfycbyZlWAW6bpCpnFoPjbdjznDomFRbTNluG4siCBMgOy2qU2AGoA/exec?spreadsheetid=1xNDWJXOekpBBV2hPseQwCRR8Qs4LcLOcSLDadVqDA0E&sheetname=Sheet1&filename=mycsv.csv?spreadsheetid=1xNDWJXOekpBBV2hPseQwCRR8Qs4LcLOcSLDadVqDA0E&sheetname=Sheet1&filename=mycsv.csv
//
var REQUIRED_PARAMS = [
'spreadsheetid', // example: "1xNDWJXOekpBBV2hPseQwCRR8Qs4LcLOcSLDadVqDA0E"
'sheetname' // Case-sensitive; example: "Sheet1"
];
// Returns an RFC 4180 compliant CSV for the specified sheet in the specified spreadsheet
function doGet(e) {
REQUIRED_PARAMS.forEach(function(requiredParam) {
if (!e.parameters[requiredParam]) throw new Error('Missing required parameter ' + requiredParam);
});
var spreadsheet = SpreadsheetApp.openById(e.parameters.spreadsheetid);
var sheet = spreadsheet.getSheetByName(e.parameters.sheetname);
if (!sheet) throw new Error("Could not find sheet " + e.parameters.sheetname + " in spreadsheet " + e.parameters.spreadsheetid);
var filename = e.parameters.filename || (spreadsheet.getName() + "_" + e.parameters.sheetname + ".csv");
var numRows = sheet.getLastRow();
var numColumns = sheet.getLastColumn();
var values = sheet.getSheetValues(1, 1, numRows, numColumns);
function quote(s) {
s = s.toString();
if ((s.indexOf("\r") == -1)
&& (s.indexOf("\n") == -1)
&& (s.indexOf(",") == -1)
&& (s.indexOf("\"") == -1)) return s;
// Fields containing line breaks (CRLF)*, double quotes, and commas should be enclosed in double-quotes;
// anything other than that we already returned, so if we get here -- escape it and quote it.
// *That's what the text of the RFC says, but the ABNF (...and Excel) treat EITHER CR or LF as requiring quotes.
// Replace any double quote with a double double quote, and wrap the whole thing in quotes
return "\"" + s.replace(/"/g, '""') + "\"";
};
var csv = values.map(function(row) {
return row.map(quote).join();
}).join("\r\n") + "\r\n";
return ContentService
.createTextOutput(csv)
.setMimeType(ContentService.MimeType.CSV)
.downloadAsFile(filename);
}

Response code Null (0) today, app was working yesterday (API Integration)

I have integrated Asana task lists into my company's website for our development team. All was going well until today - the page I created now errors out with a response code of NULL (or 0). In my very limited experience, this is an issue with the Asana connection, correct? Is this something I can fix on my side or is Asana API currently down (considering this worked up until today)?
We are using the API key, not OAuth, as everyone who has access to this task list is on the same workspace.
Edit:
I have 2 api keys that I am working with - 1 for production, 1 for development. I switch them out when I hand over a branch for my boss to merge in.
Testing API key works just fine.
Production API key does not work - always a response code null when trying to pull back tasks.
I am brand new to API development and I do not know how to use curl to make these calls. I am using a library found here:
https://github.com/ajimix/asana-api-php-class/blob/master/asana.php
More specifically, here is my code:
/*
* We are only interested in the JVZoo workspace
* Unless we are testing, in which we will use Faith In Motion workspace
*/
$JVZ_workspace_id = 18868754507673;
$FIM_workspace_id = 47486153348950;
/*
* Which one are we using right now?
*/
$workspace = $FIM_workspace_id;
/*
* Now lets do the same thing with Projects
* There should only be one project - JVZoo.com
* Unless we are testing - More JVZoo Testing
*
* We do need to dynamically show the project name
* This will help on confusion if we are accidently in the
* wrong project
*
* Then start building an array with these pieces
*/
$JVZ_project = array();
$JVZ_project['id'] = 53244927972665;
$JVZ_project['name'] = "JVZoo.com";
$FIM_project = array();
$FIM_project['id'] = 54787074465868;
$FIM_project['name'] = "More JVZoo Testing";
/*
* Which one are we using?
*/
$project = $FIM_project;
/*
* In order to help reduce load time even more,
* we are not going to return the project name
*
* This way we do not need to ask Asana for the project information
* This does change the layout of the view, however
*
* And finally grab all tasks for each project
* Connection check
*
* Return all tasks from this workspace and hand-filter
* to show both assigned and followed tasks
*/
$tasksJson = $asana->getProjectTasks($project['id']);
if ($asana->responseCode != '200' || is_null($tasksJson))
{
$this->session->set_flashdata('error', 'Error while trying to get tasks from Asana, response code: ' . $asana->responseCode);
redirect_and_continue_processing('/dashboard');
return;
}
FIM is my test environment.
JVZ is my production environment.
/**
* Returns all unarchived tasks of a given project
*
* #param string $projectId
* #param array $opt Array of options to pass
* (#see https://asana.com/developers/documentation/getting-started/input-output-options)
*
* #return string JSON or null
*/
public function getProjectTasks($projectId, array $opts = array())
{
$options = http_build_query($opts);
return $this->askAsana($this->taskUrl . '?project=' . $projectId . '&' . $options);
}
I did a PR on the parameter passed in to line that is returned above. In my FIM environment, I get this:
https://app.asana.com/api/1.0/tasks?project=54787074465868&
For my production environment:
https://app.asana.com/api/1.0/tasks?project=53244927972665&

How to programatically get holdings of an ETF

I am looking for a way to get the holding list of an ETF via a web service such as yahoo finance. So far, YQL has not yielded the desired results.
As an example ZUB.TO is an ETF that has holdings. here is a list of the holdings by querying the yahoo.finance.quotes we do not get the proper information.
The result.
Is there another table somewhere that would contain the holdings?

Perhaps downloading from Yahoo Finance is not working and/or may not work.
Instead how about using the various APIs the ETF providers already have for downloading the Excel or CSV files of the holdings?
Use the "append_df_to_excel" file as file to import, and then use the code below to make Excel file for all the 11 Sector SPDRs provided by SSgA (State Street global Advisors).
Personally I use this for doing breadth analysis.
import pandas as pd
import append_to_excel
# https://stackoverflow.com/questions/20219254/how-to-write-to-an-existing-excel-file-without-overwriting-data-using-pandas
##############################################################################
# Author: Salil Gangal
# Posted on: 08-JUL-2018
# Forum: Stack Overflow
##############################################################################
output_file = 'C:\my_python\SPDR_Holdings.xlsx'
base_url = "http://www.sectorspdr.com/sectorspdr/IDCO.Client.Spdrs.Holdings/Export/ExportExcel?symbol="
data = {
'Ticker' : [ 'XLC','XLY','XLP','XLE','XLF','XLV','XLI','XLB','XLRE','XLK','XLU' ]
, 'Name' : [ 'Communication Services','Consumer Discretionary','Consumer Staples','Energy','Financials','Health Care','Industrials','Materials','Real Estate','Technology','Utilities' ]
}
spdr_df = pd.DataFrame(data)
print(spdr_df)
for i, row in spdr_df.iterrows():
url = base_url + row['Ticker']
df_url = pd.read_excel(url)
header = df_url.iloc[0]
holdings_df = df_url[1:]
holdings_df.set_axis(header, axis='columns', inplace=True)
print("\n\n", row['Ticker'] , "\n")
print(holdings_df)
append_df_to_excel(output_file, holdings_df, sheet_name= row['Ticker'], index=False)
Image of Excel file generated for SPDRs

Declarative representation of a file path with embedded environment variables

I am looking at a situation where I'd like to bring some structure to what would be a string in an typical language. And wondering how to use Rebol's parts box to do it.
So let's say I've got a line that looks like this in the original language I'm trying to dialect:
something = ("/foo/mumble" "/foo/${BAR}/baz")
I want to use Rebol's primitives, so certainly a file path. Here is a random example of what I thought of off the top of my head:
something: [%/foo/mumble [%/foo/ BAR %/baz]]
If it were code you'd use REJOIN or COMBINE. But this is not designed to be executed, it's more like a configuration file. You're not supposed to be running arbitrary code, just getting a list of files.
I'm not sure about how feasible it is to stick with strings and yet still have these type as FILE!. Not all characters work in a FILE!, for instance:
>> load "%/foo/${BAR}/baz"
== [%/foo/$ "BAR" /baz]
It makes me wonder what my options are in Rebol data that's supposed to represent a configuration file. I can use plain old strings and do substitutions like other things do. Maybe REWORD with an OBJECT block to represent the environment?
What is the 'reword' function in Rebol and how do I use it?
In any case, I want to know how to represent a filename in a declarative context with environment variable substitutions like this.

I should use file! Your example need "" after %
f: load {%"/foo/${BAR}/baz"}
replace f "${BAR}" "MYVALUE" ;== %/foo/MYVALUE/baz

you could use path! with parens.
the only issue is the root, for which you can use another character to replace the "%" used for files... let's use '! (note this should be a word 'valid character).
when calling to-block on a path! type, it returns each part as its own token... useful.
to-block '!/path/(foo)/file.txt
== [! path (foo) file.txt]
here is a little script which loads three paths and uses parens as a constructed part of the path and uses tags to escape path-illegal characters (like a space!)
environments: make object! [
foo: "FU"
bar: "BR"
]
paths: [
!/path/(foo)/file.txt
!/root/<escape weird chars $>/(bar ".txt")
!/("__" foo)/path/(bar)
]
parse paths [
some [
(print "------" )
set data path! here: ( insert/only here to-block data to-block data )
(out-path: copy %"" )
into [
path-parts: (?? path-parts)
'!
some [
[ set data [word! | tag! | number!] (
append out-path rejoin ["/" to-string data]
)]
|
into [
( append out-path "/")
some [
set data word! ( append out-path rejoin [to-string get in environments data] )
| set data skip ( append out-path rejoin [ to-string data])
]
]
| here: set data skip (to-error rejoin ["invalid path token (" type? data ") here: " mold here])
]
]
(?? out-path)
]
]
Note this works both in Rebol3 and Rebol2
output is as follows:
------
path-parts: [! path (foo) file.txt]
out-path: %/path/FU/file.txt
------
path-parts: [! root <escape weird chars $> (bar ".txt")]
out-path: %/root/escape%20weird%20chars%20$/BR.txt
------
path-parts: [! ("__" foo) path (bar)]
out-path: %/__FU/path/BR
------

solr sanitizing query

I am using solr with ruby on rails.
It's all working well, I just need to know if there's any existing code to sanitize user input, like a query starting with ? or *

I don't know any code that does this, but theoretically it could be done by looking at the parsing code in Lucene and searching for throw new ParseException (only 16 matches!).
In practice, I think you're better off just catching any solr exceptions in your code and showing an "invalid query" message or something like that.
EDIT: Here are a couple of "sanitizers":
http://pivotallabs.com/users/zach/blog/articles/937-sanitizing-solr-requests
http://github.com/jvoorhis/lucene_query
http://e-mats.org/2010/01/escaping-characters-in-a-solr-query-solr-url/

If you are using Solarium with PHP then you can use the Solarium_Escape::term() method.
/**
* Escape a term
*
* A term is a single word.
* All characters that have a special meaning in a Solr query are escaped.
*
* If you want to use the input as a phrase please use the {#link phrase()}
* method, because a phrase requires much less escaping.\
*
* #link http://lucene.apache.org/java/docs/queryparsersyntax.html#Escaping%20Special%20Characters
*
* #param string $input
* #return string
*/
static public function term($input)
{
$pattern = '/(\+|-|&&|\|\||!|\(|\)|\{|}|\[|]|\^|"|~|\*|\?|:|\\\)/';
return preg_replace($pattern, '\\\$1', $input);
}

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

Parse a report with powershell - parsing

Your current output example is structured like an ini file. This Scripting Guy post on ini files might be useful. Some regex help here and here.

Related

How to export a csv from Google Sheet API?

Response code Null (0) today, app was working yesterday (API Integration)

How to programatically get holdings of an ETF

Declarative representation of a file path with embedded environment variables

solr sanitizing query

Categories

Resources