Etext problems, potential solutions, and your weird Data

Hey guys. Sooooooo imma go out of my realm for a second to talk about a popular topic around here in the SIF world: Etexts.

One of the major problems with getting languages onto computers is that letters outside the basic English alphabet don’t translate into numbers very well. The English alphabet is handled pretty much everywhere because, historically, English was the main language of the people who helped advance computers back in the 40s and 50s when they really got rolling, and early character sets such as ASCII were built around it.

So that solves the problem for English somewhat, but every language has to be encoded in some way inside the computer. Sooner or later, every character has to become 1s and 0s.
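Here’s a tiny Python sketch of what “become 1s and 0s” means. The helper name to_bits is just something I made up for this post. It shows that the same letter can turn into different bits depending on which encoding you pick.

```python
def to_bits(text: str, encoding: str) -> str:
    """Encode text and show each resulting byte as 8 binary digits."""
    return " ".join(f"{byte:08b}" for byte in text.encode(encoding))


print(to_bits("A", "ascii"))    # 01000001
print(to_bits("ä", "latin-1"))  # 11100100
print(to_bits("ä", "utf-8"))    # 11000011 10100100
```

Plain English letters fit in one byte no matter what, but “ä” is one byte in Latin-1 and two bytes in UTF-8. That mismatch is where a lot of the trouble starts.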

Some days (when you’re really tired from doing that “paper” last night, when you were really out partying with your friends) you go to open a regular MS Word doc, but you right-click, open it in a plain text editor like Notepad, and get this crap:

ä_ûø||¤âkì&¥!z’åd³’êP¥ÃÖük¾§š’PKoð„¬  PK  }\E  docProps/core.xmlmÝJÃ@FŸÀwXö>™FA$$é…è•‚ØV¼]f§ébö‡Ý©IßÞmÐ(ØËá;s˜ùšõdñI1ïZY•+)È¡×Æõ­Üm‹;)+§ÕàµòDI®»«C>ÒKô”J”‹\ª1´òÀj€„²*•™p9Üûhç1ö~¨žàzµºK¬´bga£üVj\”á‡Y h KŽTe¿,S´éâÂœü!­áS ‹èO¸ÐS2 8Žc9ÞÌh¾¿‚÷ç§Íüjaܹ*$Ù5k6<Pw¿{}{>ê|¬ØFe\î¶Ô§ª…v_PK ïÙ]ò œ PK  }\E  word/numbering.xmlíÝŽ£6†¯ ÷!åp&˜B¢ÍìAW[µUU»½ œ-2ì\CzÖžöÚz%uþ˜MB dmÞ#ì¼ñcëó0ïÞ‰ÂÞšñ,Hâ™AM£Çb/ñƒx93~ÿôñÁ5zYNcŸ†IÌfÆ ËŒ÷Oß½ÛL㔚3.úõDŠ8›FÞÌXåy: 2oÅ”š=&)‹Eã”áÍÅG¾D”.Ò/‰Ršó ò—ešŽqH“ÌŒ‚ÇÓCŠ‡(ðx’%‹|2M‹Àc‡Ã1‚×¹î>äCâ‹óÝœ…â’8[iv̵Í&WÇ$k™ˆuûmÒ:Wó9݈ç…û mî§<ñX–‰³öeFbÖx€ÛeD[8½æñN”Äeš­;Ε×~×><´]ªW!¯Ï” ëÜȾé9˜sÊ_.xž_ǧA-ŸeQyÁKC¶Iá­(Ϗ Â6ÂÄûÌüïi¼¦¥™ýe-;Ÿeòºä4z5iÖè›%æ™]~[Ñ”½f[þ¿l?ð¤H’1Ñy–sêå?Qïäӏ¾Êv]Âu(šq˜æîŒÌx.έi¸í4xÚe£òä¼C–ï[Dà’ö¥lú÷ï?Ëó?ydz![º§¿ðí!ˆ}Ѷ==3Æ–R7ÓEÀ³ü9ØþH†Ž¹í=(»óý¡8æ‹ÅH{è

And you’re like: “OMG I BROKE WORD!!! MY PROJECT IS RUINED!!!!11!11!!!!!!11!111”

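Don’t worry about it. Nothing is broken, and it isn’t plain 7-bit ASCII text either: a .docx is a ZIP archive (that’s the “PK” you keep seeing) full of XML files like docProps/core.xml and word/numbering.xml, and Notepad is just showing the compressed bytes as if they were letters.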
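Here’s a small Python sketch if you want to poke at this yourself. It treats the file as what the “PK” in the dump hints at: a ZIP archive with XML files inside. To keep it self-contained, it builds a tiny fake document in memory instead of opening a real file, so the contents here are made up for illustration.

```python
import io
import zipfile

# Build a tiny stand-in for a .docx in memory (a real one has many more parts).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("word/document.xml", "<doc>My totally finished paper</doc>")
    archive.writestr("docProps/core.xml", "<core>metadata goes here</core>")

raw = buffer.getvalue()
print(raw[:2])                              # b'PK', the start of the ZIP signature
print(zipfile.is_zipfile(io.BytesIO(raw)))  # True

# Open it the way an unzip tool would and read one part back out.
with zipfile.ZipFile(io.BytesIO(raw)) as archive:
    names = archive.namelist()
    text = archive.read("word/document.xml").decode("utf-8")

print(names)
print(text)
```

With a real document you would pass the file’s path to zipfile.ZipFile instead of the in-memory buffer, and your paper would still be sitting safely inside.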

To cope with this kind of problem, which also matters a lot in natural language processing (think artificial intelligence working with text), computer scientists use many different character encodings, such as ASCII and UTF-8, so that text survives the trip between programs without getting distorted.
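To show what “distorted” looks like in practice, here’s a minimal Python sketch. The sample text is my own, and I’m assuming the classic mix-up where text saved as UTF-8 gets read back as Windows-1252.

```python
original = "naïve café"

# Save the text as UTF-8 bytes, then read those bytes with the wrong encoding.
raw = original.encode("utf-8")
garbled = raw.decode("cp1252")
print(garbled)  # naÃ¯ve cafÃ©

# Reading the same bytes with the right encoding gets the text back intact.
restored = raw.decode("utf-8")
print(restored == original)  # True
```

The bytes never changed. Only the guess about what they mean did.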

The first few characters above (ä_ûø||¤âkì), stored as UTF-8, look like this in binary:

11000011101001000101111111000011101110111100001110111000011111000111110011000010101001001100001110100010011010111100001110101100

which translates to:

&#195;&#164;_&#195;&#187;&#195;&#184;||&#194;&#164;&#195;&#162;k&#195;&#

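These are HTML character codes (numeric character references), one for each byte of that binary: &#195; is just the byte 11000011 written as the decimal number 195. Your browser can detect them and turn them into symbols, spacing, etc…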
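If you’d like to check the conversion yourself, here’s a short Python sketch that takes the binary string from above, chops it into bytes, and writes it back out both as text and as HTML codes. The variable names are my own.

```python
# The 128 bits shown above, split into 16-bit pieces for readability.
bits = (
    "1100001110100100" "0101111111000011"
    "1011101111000011" "1011100001111100"
    "0111110011000010" "1010010011000011"
    "1010001001101011" "1100001110101100"
)

# Cut the stream into 8-bit chunks and turn each chunk into a byte value.
raw = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

# Read as UTF-8, the bytes give back the characters from the Notepad dump.
print(raw.decode("utf-8"))  # ä_ûø||¤âkì

# Write bytes above 127 as numeric character references, keep plain ASCII as-is.
refs = "".join(chr(b) if b < 128 else f"&#{b};" for b in raw)
print(refs)
```

The second print starts with the same string of codes shown above, which is a nice sanity check that the binary and the HTML codes really are the same data.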

So there you have all your data, not alien symbols from another planet. Pretty cool, right?

Any type of media boils down to binary data like this, and we can see that through 3D binary visualizations.

A PDF file.

 

32 bit Windows executable

(All images come from a tool called binwalk; the GitHub repo containing the code to make this visualization is here.)
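I haven’t dug through the tool’s actual code, so treat this as a rough Python sketch of the general idea behind this kind of plot, not binwalk’s real implementation. Slide over the file, treat every run of three consecutive bytes as an (x, y, z) coordinate, and count how often each point shows up. The function name and sample data are made up for illustration.

```python
from collections import Counter


def trigram_points(data: bytes) -> Counter:
    """Count every run of three consecutive bytes as one (x, y, z) point."""
    return Counter(zip(data, data[1:], data[2:]))


sample = b"ABCABCABD"
points = trigram_points(sample)

# Each key is a point in a 256 x 256 x 256 cube; the count is how "bright" it is.
for (x, y, z), count in points.most_common():
    print((x, y, z), count)
```

Plain text ends up bunched in one corner of the cube because its bytes stay in a narrow range, while compressed data spreads out much more evenly, which is why different file types make such different shapes.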

There’s a nice TED Talk about binary visualizations here.

Currently, text encoding and display in etexts for tablets is not really standardized. Most of the laid-back app developers in the store just use zooming as a way of getting you around the document you’re trying to read, but it is very inefficient and makes for tired fingers. Others use an algorithm to lay the text out the right way for the screen, as discussed in this video:

Some academic uses for text encoding come from LaTeX, where you type equations as plain-text markup and the software typesets them. MS Word has its own equation editor now that tries to do math equations in a document, but LaTeX is still widely used for math-heavy papers.

 

Anyways, it opens a major problem for people who wish for broad acceptance of etexts. You either have to make an algorithm that displays a certain font properly and then make it work FOR ALL FONTS EVER MADE!!!!

Or you can low-brow it and use a zoom function.

There are plenty more problems in the world of text, typography, and computers, and if you want more posts on this subject, just leave a comment.

I suppose that is it for me today. I’ll be sure to check in next week. Last week was really busy. 😛

I leave you with these optional videos, which I found very compelling. Well… I guess they’re interesting if you like listening to old people like your grandparents.

 
The History of Typing and Setting of Text

Jailbreaking (not your phone) in the old days
 

 
