missing escape \t

5 months 1 week ago #3786 by brian clark
missing escape \t was created by brian clark
Did some testing with CouchDB, and it seems the tab escape is missing from your JSON Adapter. The following characters cannot appear raw inside a JSON string and must be properly escaped:
  • Backspace to be replaced with \b
  • Form feed to be replaced with \f
  • Newline to be replaced with \n
  • Carriage return to be replaced with \r
  • Tab to be replaced with \t
  • Double quote to be replaced with \"
  • Backslash to be replaced with \\


I would check this. I have HTML containing all of the above and tested it with and without escaping; when all of the above are escaped, it works fine.

Ho hum, I have to go back through 87 million HTML files and fix the JSON files.
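
The escape list above can be checked against a standard serializer. The following is a minimal Python sketch, not FlowHeater's implementation: `escape_json_string` and `sample` are hypothetical names introduced for illustration, and the `\uXXXX` branch for the remaining control characters comes from the JSON specification rather than from this thread.

```python
import json

# Mapping taken from the list above (raw character -> JSON escape sequence).
ESCAPES = {
    "\\": "\\\\",
    '"': '\\"',
    "\b": "\\b",
    "\f": "\\f",
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
}


def escape_json_string(text: str) -> str:
    """Return text as a quoted JSON string literal (hypothetical helper)."""
    out = []
    for ch in text:
        if ch in ESCAPES:
            out.append(ESCAPES[ch])
        elif ord(ch) < 0x20:
            # Any other control character needs a \uXXXX escape.
            out.append(f"\\u{ord(ch):04x}")
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'


# Made-up sample containing every character from the list.
sample = 'tab\there "quoted" back\\slash\nnew\rline\bbs\fff'
encoded = escape_json_string(sample)
print(encoded)

# A strict parser should accept the result and give back the original text.
assert json.loads(encoded) == sample
```

In practice, calling `json.dumps` directly is usually the safer choice; the hand-written version is only there to make the mapping visible.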

5 months 1 week ago #3787 by brian clark
Replied by brian clark on topic missing escape \t
Another issue is the use of null instead of a plain old empty string:
[
{
  "filename": "file 81\\original_https_knighz.cfd_.html",
  "HTML": "<p>Coming Soon !</p>",
  "Original Url": null
 }
]

when it should be:

{
  "filename": "file 81\\original_https_knighz.cfd_.html",
  "HTML": "<p>Coming Soon !</p>",
  "Original Url": ""
 }
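
Until the adapter offers such an option, the null values could be blanked in a post-processing step. This is a rough Python sketch under the assumption that the target importer wants empty strings; `null` is itself valid JSON, so this is a preference of the importing side rather than a validity fix. `blank_nulls` is a hypothetical helper name.

```python
import json


def blank_nulls(value):
    """Recursively replace None (JSON null) with an empty string."""
    if value is None:
        return ""
    if isinstance(value, dict):
        return {key: blank_nulls(item) for key, item in value.items()}
    if isinstance(value, list):
        return [blank_nulls(item) for item in value]
    return value


# The record from the post above, as raw JSON text.
raw = r'''[
{
  "filename": "file 81\\original_https_knighz.cfd_.html",
  "HTML": "<p>Coming Soon !</p>",
  "Original Url": null
}
]'''

cleaned = blank_nulls(json.loads(raw))
print(json.dumps(cleaned, indent=2))
```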

 

5 months 1 week ago #3788 by brian clark
Replied by brian clark on topic missing escape \t
https://jsonlint.com/

Use that to validate, or add your own JSON validator with a preview to save people's sanity!
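
For bulk checks, the same kind of validation can be done locally. A minimal sketch using Python's standard `json` module follows; `find_json_error` is a hypothetical helper name, and the two sample strings are made up for illustration.

```python
import json


def find_json_error(text: str):
    """Return a short description of the first syntax error, or None if valid."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


good = '{"HTML": "a\\tb"}'  # tab written as the two characters \ and t
bad = '{"HTML": "a\tb"}'    # a raw tab character inside the string

print(find_json_error(good))
print(find_json_error(bad))
```

Python's parser is strict by default, so a raw tab inside a string is reported as an error here, much like the CouchDB behaviour described in this thread.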

5 months 1 week ago #3789 by brian clark
Replied by brian clark on topic missing escape \t
Also, the removal or escaping of
[ ]
{ }
in any content: they really mess with importing across lots of databases that take JSON output. The HTML will have JavaScript content within the JSON string.
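
As a cross-check on the bracket and brace point: as far as the JSON grammar itself goes, `[ ]` and `{ }` inside a quoted string are ordinary characters and need no escaping, so if an importer trips over them the cause may lie elsewhere (for example an unescaped quote or backslash nearby). A small Python sketch, using a made-up HTML fragment:

```python
import json

# Hypothetical HTML fragment with inline JavaScript that uses [ ] and { }.
html = "<script>var a = [1, 2]; if (a) { run(a[0]); }</script>"

encoded = json.dumps({"HTML": html})
decoded = json.loads(encoded)

print(encoded)

# The brackets and braces survive a round trip without any escaping.
assert decoded["HTML"] == html
```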

5 months 1 week ago #3790 by brian clark
Replied by brian clark on topic missing escape \t
recap
original url: null
when it should be
original url: ""
[
]
{
}
\t tabs to be escaped
\ or any content containing it messes everything up for most import engines unless they correct it on their end, which a lot do not.

 "HTML": "\n<!DOCTYPE HTML>\n<html lang=\"en\">\n\n\n    <head>  \n    <!--\if IE\>

It won't pass validation because of the \> at the end, or a \ in front of any other character.
I have literally replaced all of the above just for it to be valid with CouchDB. MongoDB sucks with its 16 MB limit, but it does correct on import.
CouchDB is better, but wow, is it fussy on import.

https://jsoneditoronline.org/#left=local.nuxaci&right=local.mipino

Try this validator; at least it highlights the issues. Try out the zip file I sent. Your end needs to be far stricter for the output to be truly valid JSON (when dealing with HTML data).
Cheers
Brian
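
The `\>` failure from the recap above can be reproduced and repaired after the fact. The sketch below is a heuristic in Python, not the adapter's fix: it doubles every backslash that does not start a valid JSON escape. `double_stray_backslashes` is a hypothetical name, and the sample is a shortened form of the HTML line quoted above.

```python
import json
import re

# Characters that may legally follow a backslash inside a JSON string.
VALID_AFTER_BACKSLASH = set('"\\/bfnrtu')


def double_stray_backslashes(json_text: str) -> str:
    """Turn invalid escapes such as \\> into an escaped backslash plus '>'."""
    def fix(match):
        pair = match.group(0)  # the backslash plus the character after it
        return pair if pair[1] in VALID_AFTER_BACKSLASH else "\\" + pair

    # Consuming two characters at a time keeps valid \\ pairs intact.
    return re.sub(r"\\(.)", fix, json_text, flags=re.DOTALL)


raw = r'{"HTML": "<!--\if IE\>"}'

try:
    json.loads(raw)
    raw_is_valid = True
except json.JSONDecodeError:
    raw_is_valid = False

repaired = double_stray_backslashes(raw)
print(raw_is_valid)
print(repaired)
print(json.loads(repaired)["HTML"])
```

This does not catch everything: a `\u` that is not followed by four hex digits still fails, and an unescaped double quote inside the HTML cannot be repaired reliably once the file has been written.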

5 months 1 week ago #3791 by brian clark
Replied by brian clark on topic missing escape \t
Also, " in the HTML is not escaped for some odd reason. That messes with validation too.

5 months 1 week ago #3792 by brian clark
Replied by brian clark on topic missing escape \t
[\[|\]|\{|\}|\\|\t]*

As you have it, the structure is correct; only the content is not fully checked.
Tested 1 file with this regex, and now the JSON is valid and imports into CouchDB and anything else just fine.
So I would correct this issue, as the workflow grinds to a halt.
Will test further on the most bizarre files.
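
To see what the regex above actually matches, here is a small Python sketch. It assumes the matches were replaced with nothing, which the post does not state, and it is applied to the HTML content, not to the finished JSON file. `strip_matches`, `TIGHTER` and `sample` are hypothetical names.

```python
import re

# The pattern exactly as posted. Inside a character class, | is a literal
# pipe rather than "or", so pipe characters are matched as well.
PATTERN = re.compile(r"[\[|\]|\{|\}|\\|\t]*")

# A possible tighter variant (an assumption, not from the thread) that
# leaves pipe characters alone.
TIGHTER = re.compile(r"[\[\]{}\\\t]+")


def strip_matches(text: str, pattern=PATTERN) -> str:
    """Remove every match of the pattern from a content field."""
    return pattern.sub("", text)


sample = "a[b]{c}\\d\te|f"

print(strip_matches(sample))
print(strip_matches(sample, TIGHTER))
```

Stripping these characters changes the content itself (inline JavaScript loses its brackets and braces), so it is a workaround for getting data imported, not a lossless encoding.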

5 months 1 week ago #3793 by brian clark
Replied by brian clark on topic missing escape \t
Same for CSV or any other adapter.
Multiple regexes and now it all works: CouchDB = 0 errors.
I thought these adapters escaped by default according to the standard for each format.
 

5 months 1 week ago #3794 by FlowHeater-Team
Replied by FlowHeater-Team on topic missing escape \t
Hi Brian,

Thank you for your notification about escaping special characters. I've now fixed the JSON Adapter in the latest Beta version. You can download the fixed version here: Download Beta Version

For the TextFile Adapter (CSV files) and some other Adapters, this escaping doesn't make sense. If you have problems with other Adapters, please open a new topic for them with detailed information. Thanks.

Please note: The JSON Adapter is still under development and only available as a Beta version.
 

Best wishes
Robert Stark
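
On the point that JSON-style escaping does not apply to CSV: CSV conventionally protects special characters by quoting the whole field and doubling embedded quotes, not with backslash sequences. The sketch below uses Python's standard `csv` module purely as an illustration; it says nothing about how the TextFile Adapter works internally, and `to_csv_row` and the sample fields are hypothetical.

```python
import csv
import io


def to_csv_row(fields):
    """Serialize one row using the csv module's default (minimal) quoting."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(fields)
    return buffer.getvalue()


# Made-up row: the second field contains quotes, a tab and a newline.
fields = ["file 81", '<p class="x">a\tb\nc</p>']
row = to_csv_row(fields)
print(row)

# Reading the row back returns the original fields, raw tab included.
parsed = next(csv.reader(io.StringIO(row)))
assert parsed == fields
```

A raw tab only becomes a problem when the file is tab-delimited, in which case the field has to be quoted in the same way.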


5 months 1 week ago #3795 by brian clark
Replied by brian clark on topic missing escape \t
In any normal data sense, yes, as most data is text and very basic. But this is raw HTML, the whole page. So a tick box to allow complex JavaScript, as well as anything out of the ordinary, would help.
I had to use regex to make sure it is all escaped, or at least the minimal oddball things like tabs within the HTML itself (they mess up CSV if it is tab-delimited).
Since then \t has been escaped or, in my case, completely removed, as I have also compressed all the HTML to strip whitespace. Makes a massive difference over 500 million+ HTML pages. :0)

Anyway will report anything else about JSON I find. 
Cheers
Brian
 


Copyright © 2009-2024 by FlowHeater GmbH. All rights reserved.