TYPO3: adjust the processed path for processed images programmatically

I'm looking for a way to generate the processed files for one database record in a folder explicitly defined via uid, e.g.:
fileadmin/_processed/<uid>/allProcessedFilesHere
At the moment the files are generated via the following code, and I am not able to figure out how to adjust the config array to pass a different storage.
$settings['additionalParameters'] = '-quality 80';
$settings['width'] = $imageSettings['width'];
$settings['height'] = $imageSettings['height'];
$processedImage = $file->process(\TYPO3\CMS\Core\Resource\ProcessedFile::CONTEXT_IMAGECROPSCALEMASK, $settings);
So I am looking for something similar to the following, where $uid is simply the id of the entry whose images shall be processed:
$storageRepository = \TYPO3\CMS\Core\Utility\GeneralUtility::makeInstance(\TYPO3\CMS\Core\Resource\StorageRepository::class);
$uidForStorageForDBEntry = getStorageUidForDBObject($uid);
$identifiedStorage = $storageRepository->findByUid($uidForStorageForDBEntry);
$settings['storage'] = $identifiedStorage->getUid();
Creating one storage per uid does not seem to be the right way to do it, but I can't figure out another approach at the moment. As there are hundreds of objects with images in many different formats, I don't want to end up with a single _processed folder containing 100k image entries.

The functionality to bind the processed folder to a storage is being integrated into the TYPO3 core. It should work in version 7 LTS.
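If that lands as described, a minimal sketch of the resulting usage, assuming the 7 LTS resource API (StorageRepository::findByUid and ResourceStorage::getProcessingFolder already exist; the per-storage processing-folder setting on the sys_file_storage record is what the answer refers to):
$storageRepository = \TYPO3\CMS\Core\Utility\GeneralUtility::makeInstance(\TYPO3\CMS\Core\Resource\StorageRepository::class);
$storage = $storageRepository->findByUid($uidForStorageForDBEntry);
// Processed files of this storage then land in its configured processing
// folder (e.g. "fileadmin/_processed_<uid>/") instead of the global _processed/.
$processedFolder = $storage->getProcessingFolder();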

Related

Creating similar HTML reports for several data files

I need to analyze a dozen similarly formatted data files. For each file I wish to generate a similar HTML report containing some statistics and graphs that describe the data. One HTML report per file, the same graphs in each, just different numbers. For a single file this is easy to do, for instance using the FsLab journal. Despite my best efforts, I haven't found any way to do this efficiently for many similar files (same format, different numbers).
If I have 10 files, I'd need to copy-paste the journal 10 times and, in each copy, change the line that defines which file to load. Then whenever I wished to add a new graph, I'd need to edit all 10 files. This clearly cannot be the best way to do it.
I am willing to use methods other than the journal and libraries other than FsLab if they suit the problem better, but I'd expect there to be an easy solution for a basic thing like this.
This is something that is not very nicely supported by the FsLab Journals system, but you can definitely find some way to do it. One simple option I can think of would be to modify the build.fsx script for the journals so that it processes the script repeatedly and uses, e.g., an environment variable to specify the input file.
If you are using the standard template, look at the generateJournals function:
let generateJournals ctx =
  let builtFiles = Journal.processJournals ctx
  traceImportant "All journals updated."
  Journal.getIndexJournal ctx builtFiles
I think you should be able to modify it along the following lines:
let generateJournals ctx =
  // Iterate over all inputs you want to process
  // ('inputFiles' is your list of data files)
  let mutable builtFiles = []
  for input in inputFiles do
    // Set an environment variable so the journal script can read 'input'
    System.Environment.SetEnvironmentVariable("JOURNAL_INPUT", input)
    builtFiles <- Journal.processJournals ctx
    // Move the resulting files here, so that they do not
    // get overwritten by the next run
  // Just return the journal you want to open first below
  traceImportant "All journals updated."
  Journal.getIndexJournal ctx builtFiles
Then in the journal, you should be able to use System.Environment to read the variable set in the build script.
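For example, if the journal loads a CSV with Deedle (part of FsLab), the loading line could become something like the following; the variable name JOURNAL_INPUT just matches the sketch above and is an assumption:
let inputFile = System.Environment.GetEnvironmentVariable("JOURNAL_INPUT")
let data = Frame.ReadCsv(inputFile)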

Bulk update links within the files on Google Drive?

We use Google Drive (GAFE) to prepare and present teaching/training materials. We'd like to maintain archived versions of past iterations, and then work on a new copy for each consecutive training session.
I've succeeded in making a copy of our training folder (using ericyd's gdrive-copy), and we're happily working away on that, BUT... the files are fairly heavily cross-linked. The Slides, for instance, will have links to the Docs handouts and PDF assignments associated with that lesson. When I made a copy of the whole folder structure, the files copied over, but the links still all point to the original files, when in fact what we want is for them to point to their respective copies.
This makes sense - obviously, when you make a copy of a file, you usually don't want to change its contents at the same time. However, when you're making an archive of a whole folder, ideally you'd like the links within the files to update as well.
I can compile a spreadsheet with the file IDs for each "original and copy" pair. Is there any way to iterate through all Google Docs/Sheets/Slides in a folder, and substitute the original URLs from the spreadsheet file with their respective copy URLs?
I'm practically a beginner when it comes to Google Apps Scripts, so while I have found Get All Links in a Document and am guessing it would be part of the answer, I have no clue where to go beyond that.
(Btw, if the three file types each need a different approach, automating the link fixes in Slides would be the most helpful, as that's where the bulk of them are.)
I know this is a rather old topic, but I recently ran into a similar situation that I needed to solve. In my searching, this is the only reference I could find to cross-linking as a result of duplication. Unfortunately, I was not able to come up with a fully automated solution, but with a bit of ingenuity I was able to reduce the number of steps required to update my hyperlinks to reference the duplicated files rather than the originals.
First, I borrowed some script code I found online to generate a list of the files within a Google Drive folder and their URLs; I'll post the code below. It generates a new Google Sheet named "URL LIST" (you can change the name in the script). Once it's generated, you'll need to find it in the Recent list in your Google Drive and move it to the folder containing the copied documents and sheets.
Next, in the Google Sheet that holds my hyperlinks to the documents, I created an additional tab, also called URL LIST, and in A1 added an IMPORTRANGE() to import the URL LIST contents. Once all of this is set up, you only have to update this one reference with each copy you make, which dramatically reduces the number of updates needed: IMPORTRANGE() points at a specific URL, so each newly generated URL LIST has a new URL that the copied document containing your hyperlinks and IMPORTRANGE() must point to. Hopefully that makes sense.
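For example (the spreadsheet ID is a placeholder for the ID of the generated URL LIST sheet, and the tab name assumes the default):
=IMPORTRANGE("https://docs.google.com/spreadsheets/d/<url-list-sheet-id>", "Sheet1!A:B")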
Next, your hyperlinks will need a formula along the lines of =HYPERLINK(VLOOKUP(A1,'URL LIST'!$A$1:$B$10,2,FALSE)) to grab the imported URLs. It's important to indicate that the lookup range is not sorted (the FALSE argument), because the order in which the script spits out the document list may change depending on how the folder is sorted at the time of running it; this also means you don't need the list sorted. You can then copy the formula to each cell that needs a hyperlink.
Of equal importance, your VLOOKUP() search key must match exactly how the file name is listed in your URL LIST.
This method reduced updating the hyperlinks from 9 steps down to a single step: updating the IMPORTRANGE() each time I make copies.
I hope this helps you or someone else!
Copy and paste the following script into your script editor:
// Replace 'your-folder' below with the name of the folder for which you want a listing.
function listFolderContents() {
  var foldername = 'your-folder';
  var folderlisting = 'URL LIST';
  var folders = DriveApp.getFoldersByName(foldername);
  var folder = folders.next();
  var contents = folder.getFiles();
  var ss = SpreadsheetApp.create(folderlisting);
  var sheet = ss.getActiveSheet();
  sheet.appendRow(['name', 'link']);
  var file;
  var name;
  var link;
  while (contents.hasNext()) {
    file = contents.next();
    name = file.getName();
    link = file.getUrl();
    sheet.appendRow([name, link]);
  }
}

HDFS Flume sink - Roll by File

Is it possible for the HDFS Flume sink to roll whenever a single file (from a Flume source, say Spooling Directory) ends, instead of rolling after a certain number of bytes (hdfs.rollSize), amount of time (hdfs.rollInterval), or number of events (hdfs.rollCount)?
Can Flume be configured so that a single file is a single event?
Thanks for your input.
Regarding your first question, it is not possible because the sink's logic is disconnected from the source's logic. I mean, a sink only sees events being put into the channel that it must process; the sink does not know whether an event is the first or the last one of a file.
Of course, you could try to create your own source (or extend an existing one) in order to add a header to the event with a value meaning "this is the last event". Then a custom sink could behave depending on such a header: for instance, as long as the header is not set, the events are not persisted but stored in memory; once the header is seen, all the information is persisted to the final backend as a batch. Another possibility is that the custom sink persists the data to a file until the header is seen; then the file is closed and another one is opened.
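A skeleton of what such a header-aware custom sink could look like (the class name and the "lastEvent" header are inventions for illustration; the sink API calls are the standard Flume ones). Note that buffering committed events in memory weakens Flume's delivery guarantees, as the answer implies:
import java.util.ArrayList;
import java.util.List;
import org.apache.flume.Channel;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.Transaction;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;

public class BufferingFileSink extends AbstractSink implements Configurable {
  private final List<Event> buffer = new ArrayList<Event>();

  public void configure(Context context) {
    // read sink-specific settings here
  }

  public Status process() throws EventDeliveryException {
    Channel channel = getChannel();
    Transaction txn = channel.getTransaction();
    txn.begin();
    try {
      Event event = channel.take();
      if (event != null) {
        buffer.add(event);
        // "lastEvent" is the hypothetical header set by the custom source
        if ("true".equals(event.getHeaders().get("lastEvent"))) {
          persistAsBatch(buffer); // write all buffered events as one file
          buffer.clear();
        }
      }
      txn.commit();
      return Status.READY;
    } catch (Exception e) {
      txn.rollback();
      return Status.BACKOFF;
    } finally {
      txn.close();
    }
  }

  private void persistAsBatch(List<Event> events) {
    // write the concatenated event bodies to the final backend
  }
}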
Regarding your second question, it depends on the source. The spooldir source behaves based on the deserializer parameter; by default its value is LINE, which means:
Specify the deserializer used to parse the file into events. Defaults to parsing each line as an event. The class specified must implement EventDeserializer.Builder.
But other custom Java classes can be configured, as said above; for instance, a deserializer for the whole file.
You can set rollSize to a small number combined with BlobDeserializer to load the data file by file instead of combining it into blocks. This is really helpful when you have unsplittable binary files such as PDF or gz files.
This is part of the configuration that is relevant:
#Set deserializer to BlobDeserializer and set the maximum blob size to be 1GB.
#Notice that the blobs have to fit in memory so this doesn't work for files that cannot fit in memory.
agent.sources.spool.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
agent.sources.spool.deserializer.maxBlobLength = 1000000000
#Set rollSize to 1024 to avoid combining multiple small files into one part.
agent.sinks.hdfsSink.hdfs.rollSize = 1024
agent.sinks.hdfsSink.hdfs.rollCount = 0
agent.sinks.hdfsSink.hdfs.rollInterval = 0
The answer to the question "Can Flume be configured so that a single file is a single event?" is yes.
You only have to configure the following property to be 1:
hdfs.rollCount = 1
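A minimal sketch combining this with the whole-file deserializer from the earlier answer, so that each spooled file becomes exactly one event and one HDFS file (agent and component names are placeholders):
agent.sources.spool.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
agent.sinks.hdfsSink.hdfs.rollCount = 1
# 0 disables size- and time-based rolling, so only rollCount applies
agent.sinks.hdfsSink.hdfs.rollSize = 0
agent.sinks.hdfsSink.hdfs.rollInterval = 0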
I'm still looking for a solution to your first question myself, because sometimes the file is too big and needs to be split into several chunks.
You can use any event headers in hdfs.path. ( https://flume.apache.org/FlumeUserGuide.html#hdfs-sink )
If you are using the Spooling Directory Source, you can enable putting the file name in the events using fileHeader/fileHeaderKey or basenameHeader/basenameHeaderKey ( https://flume.apache.org/FlumeUserGuide.html#spooling-directory-source ).
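For example, a sketch (agent and component names are placeholders): put each spooled file's basename into a header and use it in the sink path, so every source file ends up under its own directory:
agent.sources.spool.basenameHeader = true
agent.sources.spool.basenameHeaderKey = basename
agent.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/%{basename}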
Can Flume be configured so that a single file is a single event?
It could be, however it is not recommended. The underlying implementation (protobuf) limits file sizes to 64 MB. Flume events are meant to be small due to its architecture and design (fault tolerance, etc.).

How can I move multiple Pylons Applications into a single Composite Application?

We have several standalone Pylons websites running, but we would like to make these more easily reusable.
There is a concept of a "composite application" inside Pylons, but there seem to be limited instructions on how to achieve this.
Has anyone done this, or is anyone aware of a good tutorial on how to convert multiple Pylons apps into a composite app?
I've tried - perhaps too optimistically - to simply copy an existing app into another app and fiddle with the development.ini file, but this does not seem to work (I'm getting the error "pkg_resources.DistributionNotFound: wiki" in that case).
Thanks
This is done by modifying the WSGI pipeline to dispatch a request to different applications based on request properties (usually the URL). The simplest way to modify the pipeline is with PasteDeploy (the package that handles your INI files).
[composite:main]
use = egg:Paste#urlmap
/foo = foo
/bar = bar
/ = baz
[app:foo]
use = egg:myapp#main
[app:bar]
use = egg:yourapp#main
[app:baz]
use = egg:myapp#baz
This creates a composite application that dispatches to different endpoints based on the URL prefix.
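Note that every package referenced via egg: must be installed in the environment running the server; the pkg_resources.DistributionNotFound: wiki error from the question means exactly that the "wiki" distribution could not be found. A typical fix with Pylons-era tooling (assuming each app has a setup.py) is to install each app in development mode:
cd /path/to/wiki
python setup.py develop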

Use user-defined asset servers in Rails

I'm building an application in Rails where sets of images can be created, i.e. the user uploads a file of image names and specifies a path where those images can be found on the web.
So, for example, the image file contains:
image1.jpg
image2.jpg
and the path is specified as http://www.user1-server.com/.
Another user could load his own file of image names, but specify another server: http://www.user2-server.com/ or even http://my-fancy-server.com.
Is there any way to use the AssetTagHelper functionality of Rails to help me generate the image tags?
So, if I'm in the context of user 1, e.g. /users/1/images/1, and use:
image_tag("1.jpg")
it should deliver http://www.user1-server.com/images/1.jpg, but for /users/2/images/1 it should return http://www.user2-server.com/images/1.jpg or http://my-fancy-server.com/images/1.jpg.
I don't think you can change the asset host in Rails on the fly like that. Rails is smart enough not to prepend the asset host when you pass a full URL to an image tag, though. Maybe just write a helper method that passes the correct host in?
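A minimal sketch of such a helper, assuming each user record stores its server's base URL in an asset_server attribute (both the helper name and that attribute are assumptions):
# app/helpers/images_helper.rb
module ImagesHelper
  # Builds an image tag pointing at the user's own asset server,
  # e.g. user_image_tag(@user, '1.jpg')
  #   => <img src="http://www.user1-server.com/images/1.jpg" />
  def user_image_tag(user, filename, options = {})
    image_tag(File.join(user.asset_server, 'images', filename), options)
  end
end
Because the generated src is a full URL, image_tag passes it through unchanged instead of prefixing the default asset host.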
