Migo
May 4, 2021


IO module (Python)

Well, some of my friends already know that I’ve drifted into the IT world, although I’ve only just taken my first baby steps.

The camp began in the middle of last month. Fast forward a couple of weeks, and after scratching the surface a little, what I have to say is this: I’ve come to really admire tech guys who turn their ideas into something tangible (or visible).

Earlier today, I learned about the IO module. From the perspective of the server (or company), clients send information that is converted to binary on its way through the inbound stream. The company processes it by queuing it and matching user information with messages, then converts the result to binary again to send it to other users through the outbound stream.
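To make that flow a bit more concrete, here is a minimal sketch, assuming a plain in-memory queue stands in for the server. The names `to_inbound` and `pending` and the user "alice" are made up for illustration; a real server would read from and write to network sockets.

```python
from collections import deque


def to_inbound(text):
    """Client side: turn text into bytes before it enters the stream."""
    return text.encode("utf-8")


# The server queues incoming (sender, payload) pairs in arrival order.
pending = deque()
pending.append(("alice", to_inbound("hello")))

# Process the oldest item: decode it, match it with its sender,
# then turn it back into bytes for the outbound stream.
sender, payload = pending.popleft()
message = payload.decode("utf-8")
outbound = f"{sender}: {message}".encode("utf-8")

print(outbound)
```

The point is only that text is bytes on the way in and bytes again on the way out; everything in between is ordinary processing.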

In the stream, all information must be in byte form. If you have a buffer, it stores that information ‘briefly’, and the stored information is pushed on to the next step once the buffer fills up or is flushed.
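A small sketch of that buffering behaviour, assuming an in-memory `io.BytesIO` object plays the role of "the next step" and the buffer size of 16 bytes is picked only for the demo:

```python
import io

raw = io.BytesIO()                                  # stands in for the next step
buffered = io.BufferedWriter(raw, buffer_size=16)   # small buffer for the demo

buffered.write(b"hello")     # 5 bytes, which fits inside the 16-byte buffer
before = raw.getvalue()      # what has reached `raw` so far

buffered.flush()             # push the stored bytes onward
after = raw.getvalue()       # what has reached `raw` after the flush

print(before, after)
```

Before the flush, `raw` should still be empty because the bytes are sitting in the buffer; after the flush they have moved on.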

In the process of binarization, as mentioned, everything becomes bytes, and 1 byte consists of 8 bits. So one byte can hold 2**8 = 256 different values, which is more than enough for the 128 characters of ASCII code.
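The arithmetic is easy to check in Python. This is just a quick illustration; the variable names are my own.

```python
values_per_byte = 2 ** 8            # number of distinct values in 8 bits
code = ord("A")                     # the numeric code of the letter A
one_byte = "A".encode("ascii")      # the same letter as a single byte

# Every character of a plain English word has a code below 128.
fits_in_ascii = all(ord(ch) < 128 for ch in "hello")

print(values_per_byte, code, one_byte, fits_in_ascii)
```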

Unfortunately, Korean and many other languages cannot be covered by ASCII code alone. So, in computer science, using those languages means you need more storage per character than you need for English: two or three bytes for a Korean syllable, depending on the encoding, versus one byte for an ASCII letter.

If it is not ASCII code, what enables you to write non-English words? That’s Unicode. Two common ways of encoding it are:

  • utf-8
  • utf-16
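To see how the two differ, here is a short comparison of byte counts. I am using the codec name `utf-16-le` on the assumption that we do not want the 2-byte byte-order mark that plain `utf-16` puts at the front.

```python
word = "오늘"

utf8 = word.encode("utf-8")         # 3 bytes per Korean syllable
utf16 = word.encode("utf-16-le")    # 2 bytes per Korean syllable

ascii_utf8 = "a".encode("utf-8")        # 1 byte for an ASCII letter
ascii_utf16 = "a".encode("utf-16-le")   # 2 bytes even for an ASCII letter

print(len(utf8), len(utf16))              # 6 4
print(len(ascii_utf8), len(ascii_utf16))  # 1 2
```

So utf-8 is smaller for English text, while utf-16 is smaller for Korean text.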

In Python, the default string encoding is ‘utf-8’. You can check it by entering the following.

>>> import sys
>>> sys.getdefaultencoding()
'utf-8'

That leaves the task of ‘binarizing’ text so we can put it in the stream. We have two options for that.

>>> a = b'asdd'
>>> type(a)
<class 'bytes'>
>>> bytes('adsd','utf-8')
b'adsd'

Either one is fine for plain English literals, but when the text is held in a variable rather than typed out as a literal, you have to go with the latter. That is particularly the case when you try to convert non-English text into bytes. The following is an example.

>>> a= b'adsd'
>>> a
b'adsd'
>>> b= b'오늘'
  File "<stdin>", line 1
    b= b'오늘'
    ^
SyntaxError: bytes can only contain ASCII literal characters.

Again, each non-English character takes more than 1 byte, which is why you need a function like ‘bytes()’ to convert it.

>>> bytes('아나', 'utf-8')
b'\xec\x95\x84\xeb\x82\x98'
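To see where those six bytes come from, you can encode one character at a time. This is a small sketch; `str.encode()` here does the same job as `bytes(text, 'utf-8')`.

```python
text = "아나"

# Encode each character separately to see how many bytes it takes.
per_char = [ch.encode("utf-8") for ch in text]

# str.encode() and bytes() give the same result.
same = text.encode("utf-8") == bytes(text, "utf-8")

print(per_char)   # [b'\xec\x95\x84', b'\xeb\x82\x98']
print(same)       # True
```

Each of the two characters accounts for three of the six bytes.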

It is worth noting that the resulting bytes are displayed as hexadecimal escapes (such as \xec), meaning they need ‘decoding’ to be read by a human.

>>> s=bytes('대한민국','utf-8')
>>> s
b'\xeb\x8c\x80\xed\x95\x9c\xeb\xaf\xbc\xea\xb5\xad'
>>> s.decode('utf-8')
'대한민국'
>>> str(s, 'utf-8')
'대한민국'
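One thing to watch out for is that decoding only works with the same encoding that was used to encode. A minimal sketch of what happens otherwise (the variable names are my own):

```python
s = "대한민국".encode("utf-8")

# Decoding with the wrong codec fails, because these bytes are not ASCII.
try:
    s.decode("ascii")
    failed = False
except UnicodeDecodeError:
    failed = True

# errors="replace" swaps undecodable bytes for a placeholder character
# instead of raising an exception.
lenient = s.decode("ascii", errors="replace")

# Decoding with the matching codec gets the original text back.
round_trip = s.decode("utf-8")

print(failed, round_trip)
```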

