What is the syntax of SESSION ID - html-parsing

I'm making a web crawler with Python and I sometimes find ";jsessionid=XXXX" in URLs. I have written a function to delete it. The function takes a URL and removes the pattern ";jsessionid=XXXX...", where "XXXX..." matches anything up to a question mark. I'm not sure the algorithm is correct, because I don't know the syntax of jsessionid="...".
Anyway, my function is below. Could you please tell me if it's correct, or where I can find the syntax of SESSION ID?
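import re

def deleteJSessionid(link):
    print("originalLink:", link)
    # Match ";jsessionid=" and everything after it up to the next "?"
    p = re.compile(r';jsessionid=[^?]*', re.IGNORECASE)
    m = p.search(link)
    if m is None:
        # No session id in this URL; return it unchanged
        return link
    print("\n\n" + m.group() + "\n\n")
    start, end = m.span()
    link = link[:start] + link[end:]
    return link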
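For comparison, here is a minimal sketch of the same idea written with re.sub. It assumes the session id is a path parameter that runs from ";jsessionid=" to the next "?" or "#", or to the end of the URL if neither is present. That is an assumption about typical servlet-style URLs, not something stated in the question. The name strip_jsessionid and the sample URLs are hypothetical.

```python
import re

# Assumption: the session id ends at the first "?" or "#",
# or at the end of the URL when neither character is present.
_JSESSIONID = re.compile(r';jsessionid=[^?#]*', re.IGNORECASE)

def strip_jsessionid(link):
    """Return link without its ;jsessionid=... part (unchanged if absent)."""
    return _JSESSIONID.sub('', link)

if __name__ == "__main__":
    print(strip_jsessionid("http://example.com/page;jsessionid=ABC123?x=1"))
    print(strip_jsessionid("http://example.com/page"))
```

Because sub() returns the input unchanged when nothing matches, no explicit check for a missing session id is needed.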

Related

How to concatenate API request URL safely

Let's imagine I have the following parts of a URL:
val url_start = "http://example.com"
val url_part_1 = "&fields[...]&" //This part of url can be in the middle of url or in the end
val url_part_2 = "&include..."
And then I try to concatenate the resulting URL like this:
val complete_url = url_start + url_part_2 + url_part_1
In this case I'd get http://example.com&include...&fields[...]& (ignore the URL syntax here), with a single & between the URL parts, so the concatenation works. But if I concatenate in a different order in another request, like this:
val complete_url = url_start + url_part_1 + url_part_2
I'd get http://example.com&fields[...]&&include..., with a double && between the parts. Is there a way to make the concatenation safer?
To keep your code clean, use an array or object to hold your params, and don't keep "?" or "&" as part of urlStart or the params. Add these at the end, e.g.
var urlStart = "http://example.com"
var params=[]
params.push ('a=1')
params.push ('b=2')
params.push ('c=3', 'd=4')
url = urlStart + '?' + params.join('&')
console.log (url) // http://example.com?a=1&b=2&c=3&d=4
First, note that it is invalid to put query parameters directly after the domain name; it should be something like http://example.com/?include...&fields[...] (note the /? part; you can replace it with / to make it a path parameter, but the website's router is unlikely to support parameters like that). See, for example, this article: https://www.talisman.org/~erlkonig/misc/lunatech%5Ewhat-every-webdev-must-know-about-url-encoding/ to learn more about which URLs are valid.
For the simple abstract approach, you can use Kotlin's joinToString():
val query_part = arrayOf(
"fields[...]",
"include..."
).joinToString("&")
val whole_url = "http://example.com/?" + query_part
print(whole_url) // http://example.com/?fields[...]&include...
This approach is abstract because you can use joinToString() not only for URLs but for any strings. This also means that if one of the input strings itself contains an & symbol, it will be split into two parameters in the output string. This is not a problem when you, as the programmer, know which strings will be joined, but it can become a problem if the strings are provided by the user.
For a URL-aware approach, you can use URIBuilder from the Apache HttpComponents library, but you'll need to import this library first.
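To illustrate what a URL-aware approach buys you, here is a minimal Python sketch that uses the standard library's urllib.parse.urlencode in place of the Kotlin and Java tools mentioned above. The helper name build_url and the parameter names and values are hypothetical. The point is that the separators are added in one place and that an & inside a value is escaped rather than treated as a parameter boundary.

```python
from urllib.parse import urlencode

def build_url(base, params):
    """Join base and params with one '?' and single '&' separators.

    params is a dict (or list of pairs); values are percent-encoded,
    so an '&' inside a value cannot split it into two parameters.
    """
    if not params:
        return base
    return base + "?" + urlencode(params)

if __name__ == "__main__":
    # Hypothetical parameter names and values, for illustration only.
    print(build_url("http://example.com/", {"fields": "a,b", "include": "x&y"}))
    # http://example.com/?fields=a%2Cb&include=x%26y
```

The order in which parameters are supplied no longer matters for correctness, because no individual part carries its own leading or trailing &.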

php str_replace produces strange results

I am trying to replace some characters in a text block. All of the replacements are working except the one at the beginning of the string variable.
The text block contains:
[FIRST_NAME] [LAST_NAME], This message is to inform you that...
The variables are defined as:
$fname = "John";
$lname = "Doe";
$messagebody = str_replace('[FIRST_NAME]',$fname,$messagebody);
$messagebody = str_replace('[LAST_NAME]',$lname,$messagebody);
The result I get is:
[FIRST_NAME] Doe, This message is to inform you that...
Regardless of which tag I put first, or which syntax I use ({TAG}, $$TAG or [TAG]), the first one never gets replaced.
Can anyone tell me why and how to fix this?
Thanks
Until someone can provide me with an explanation for why this is happening, the workaround is to put a string in front and then remove it afterward:
$messagebody = 'START:'.$messagebody;
// do what you need to do
$messagebody = substr($messagebody,6);
I believe it must have something to do with the fact that a string starts at position 0 and that maybe the str_replace function starts to look at position 1.

How to compile custom format ini file with redirects?

I'm working with an application that has 3 ini files in a somewhat irritating custom format. I'm trying to compile these into a 'standard' ini file.
I'm hoping for some inspiration in the form of pseudocode to help me code some sort of 'compiler'.
Here's an example of one of these ini files. The less-than/greater-than signs indicate a redirect to another section in the file. These redirects can be recursive, i.e. one redirect can point to another redirect. A redirect can also point to an external file (three values are present in that case). Comments start with a # symbol.
[PrimaryServer]
name = DEMO1
baseUrl = http://demo1.awesome.com
[SecondaryServer]
name = DEMO2
baseUrl = http://demo2.awesome.com
[LoginUrl]
# This is a standard redirect
baseLoginUrl = <PrimaryServer:baseUrl>
# This is a redirect appended with extra information
fullLoginUrl = <PrimaryServer:baseUrl>/login.php
# Here's a redirect that points to another redirect
enableSSL = <SSLConfiguration:enableSSL>
# This is a key that has multiple comma-separated values, some of which are redirects.
serverNames = <PrimaryServer:name>,<SecondaryServer:name>,AdditionalRandomServerName
# This one is particularly nasty. It's a redirect to another file...
authenticationMechanism = <Authenication.ini:Mechanisms:PrimaryMechanism>
[SSLConfiguration]
enableSSL = <SSLCertificates:isCertificateInstalled>
[SSLCertificates]
isCertificateInstalled = true
Here's an example of what I'm trying to achieve. I've removed the comments for readability.
[PrimaryServer]
name = DEMO1
baseUrl = http://demo1.awesome.com
[SecondaryServer]
name = DEMO2
baseUrl = http://demo2.awesome.com
[LoginUrl]
baseLoginUrl = http://demo1.awesome.com
fullLoginUrl = http://demo1.awesome.com/login.php
enableSSL = true
serverNames = DEMO1,DEMO2,AdditionalRandomServerName
authenticationMechanism = valueFromExternalFile
[SSLConfiguration]
enableSSL = <SSLCertificates:isCertificateInstalled>
[SSLCertificates]
isCertificateInstalled = true
I'm looking at using ini4j (Java) to achieve this, but am by no means fixed on using that language.
My main questions are:
1) How can I handle the recursive redirects?
2) How do I best handle the redirects that have an additional string or multiple values, e.g. fullLoginUrl and serverNames?
3) Bonus points for any suggestions about how to handle the external redirects. No big deal if that part isn't working just yet.
So far, I'm able to parse and tidy up the file, but I'm struggling with these redirects.
Once again, I'm only hoping for pseudocode. Perhaps I need more coffee, but I'm really puzzled by this one.
Thanks in advance for any suggestions.
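One possible way to handle the recursive and embedded redirects is sketched below in Python instead of Java, as runnable pseudocode. It assumes every same-file redirect has the exact form <Section:key>, treats a value as plain text with zero or more redirects embedded in it (which covers both the appended-string and the comma-separated cases), and leaves three-part external redirects untouched. The function names load, resolve and compile_ini are hypothetical, and the cycle check is an addition the question did not ask for.

```python
import configparser
import re

# Matches only two-part redirects such as <PrimaryServer:baseUrl>.
# Three-part external redirects (<File:Section:key>) do not match
# and are therefore left as-is.
REDIRECT = re.compile(r'<([^<>:]+):([^<>:]+)>')

def load(text):
    """Parse ini text into a dict of dicts, keeping key case and raw values."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # do not lowercase keys
    parser.read_string(text)
    return {s: dict(parser[s]) for s in parser.sections()}

def resolve(data, section, key, seen=()):
    """Return data[section][key] with every same-file redirect expanded."""
    if (section, key) in seen:
        raise ValueError(f"circular redirect at {section}:{key}")
    seen = seen + ((section, key),)

    def substitute(match):
        # Recurse: the target may itself contain redirects.
        return resolve(data, match.group(1), match.group(2), seen)

    return REDIRECT.sub(substitute, data[section][key])

def compile_ini(data):
    """Resolve every value in every section."""
    return {s: {k: resolve(data, s, k) for k in keys}
            for s, keys in data.items()}

if __name__ == "__main__":
    sample = """
[PrimaryServer]
name = DEMO1
baseUrl = http://demo1.awesome.com
[LoginUrl]
fullLoginUrl = <PrimaryServer:baseUrl>/login.php
enableSSL = <SSLConfiguration:enableSSL>
[SSLConfiguration]
enableSSL = <SSLCertificates:isCertificateInstalled>
[SSLCertificates]
isCertificateInstalled = true
"""
    for section, values in compile_ini(load(sample)).items():
        print(f"[{section}]")
        for key, value in values.items():
            print(f"{key} = {value}")
```

Substituting inside the string, instead of replacing the whole value, is what lets the same routine handle <PrimaryServer:baseUrl>/login.php and the comma-separated serverNames line. External redirects could be added by loading the named file into a second dict and resolving against that, but this sketch does not attempt it.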

Removing a part of a URL with Ruby

Removing the query string from a URL in Ruby could be done like this:
url.split('?')[0]
Where url is the complete URL including the query string (e.g. url = http://www.domain.extension/folder?schnoo=schnok&foo=bar).
Is there a faster way to do this, i.e. without using split, but rather using Rails?
edit: The goal is to redirect from http://www.domain.extension/folder?schnoo=schnok&foo=bar to http://www.domain.extension/folder.
EDIT: I used:
url = 'http://www.domain.extension/folder?schnoo=schnok&foo=bar'
parsed_url = URI.parse(url)
new_url = parsed_url.scheme+"://"+parsed_url.host+parsed_url.path
It is easier to read and harder to get wrong if you parse the URL and set fragment and query to nil instead of rebuilding it.
parsed = URI::parse("http://www.domain.extension/folder?schnoo=schnok&foo=bar#frag")
parsed.fragment = parsed.query = nil
parsed.to_s
# => "http://www.domain.extension/folder"
require 'uri'
require 'cgi'
url = 'http://www.domain.extension/folder?schnoo=schnok&foo=bar'
u = URI.parse(url)
p = CGI.parse(u.query)
# p is now {"schnoo"=>["schnok"], "foo"=>["bar"]}
Take a look at: How to get query string from passed URL in Ruby on Rails
You can gain performance using Regex
'http://www.domain.extension/folder?schnoo=schnok&foo=bar'[/[^\?]+/]
#=> "http://www.domain.extension/folder"
There is probably no need to split the URL. When you visit this link, you are passing two parameters to the back end:
http://www.domain.extension/folder?schnoo=schnok&foo=bar
params[:schnoo]=schnok
params[:foo]=bar
Watch your log and you will see them; you can then use them directly in the controller.

Nokogiri pull parser (Nokogiri::XML::Reader) issue with self closing tag

I have a huge XML file (>400MB) containing products. Using a DOM parser is therefore out of the question, so I tried to parse and process it using a pull parser. Below is a snippet from the each_product(&block) method, where I iterate over the product list.
Basically, using a stack, I transform each <product> ... </product> node into a hash and process it.
while (reader.read)
  case reader.node_type
  # start element
  when Nokogiri::XML::Node::ELEMENT_NODE
    elem_name = reader.name.to_s
    stack.push([elem_name, {}])
  # text element
  when Nokogiri::XML::Node::TEXT_NODE, Nokogiri::XML::Node::CDATA_SECTION_NODE
    stack.last[1] = reader.value
  # end element
  when Nokogiri::XML::Node::ELEMENT_DECL
    return if stack.empty?
    elem = stack.pop
    parent = stack.last
    if parent.nil?
      yield(elem[1])
      elem = nil
      next
    end
    key = elem[0]
    parent_childs = parent[1]
    # ...
    parent_childs[key] = elem[1]
  end
end
The issue is with self-closing tags (e.g. <country/>), as I cannot tell a 'normal' tag from a 'self-closing' one. Both are of type Nokogiri::XML::Node::ELEMENT_NODE and I cannot find any other discriminator in the documentation.
Any ideas on how to solve this issue?
There is a feature request on the project page regarding this issue (with the corresponding failing test).
Until it is fixed and released, we'll stick with the good ol'
input_text.gsub! /<([^<>]+)\/>/, '<\1></\1>'
Hey Vlad, I am not a Nokogiri expert, but I ran a test and saw that the self_closing?() method works fine for detecting self-closing tags. Give it a try.
P.S. : I know this is an old post :P / the documentation is HERE
