Never “Unpickle” a File from an Unknown Source!! You Can Be Hacked!!

Simran
3 min read · Jul 21, 2020

Have you ever seen a file that looks like this?

€•/      Œpandas.core.frame”Œ DataFrame”“”)”}”(Œ_data”Œpandas.core.internals.managers”Œ BlockManager”“”)”(]”(Œpandas.core.indexes.base”Œ
_new_Index”“”h ŒIndex”“”}”(Œdata”Œnumpy.core.multiarray”Œ _reconstruct”“”Œnumpy”Œndarray”“”K …”Cb”‡”R”(KK…”hŒdtype”“”ŒO8”K K‡”R”(KŒ|”NNNJÿÿÿÿJÿÿÿÿK?t”b‰]”Œ! A B C”at”bŒname”Nu†”R”h
Œpandas.core.indexes.range”Œ
RangeIndex”“”}”(h'NŒstart”K Œstop”KŒstep”Ku†”R”e]”hhK …”h‡”R”(KKK†”h!‰]”(Œ!aaa -0.264438 -1.026059 -0.619500”Œ!bbb 0.927272 0.302904 -0.032399”Œ!ccc -0.264273 -0.386314 -0.217601”Œ!ddd -0.871858 -0.348382 1.100491”et”ba]”h
h}”(hhhK …”h‡”R”(KK…”h!‰]”h%at”bh'Nu†”R”a}”Œ0.14.1”}”(Œaxes”h
Œblocks”]”}”(Œvalues”h6Œmgr_locs”Œbuiltins”Œslice”“”K KK‡”R”uaust”bŒ_typ”Œ dataframe”Œ _metadata”]”Œattrs”}”ub.

You have probably come across one at some point. This one is a “pickle file”. Other binary formats such as HDF5 look just as unreadable in a text editor, but in this story we’ll talk about pickle files.

First, let’s understand what a pickle file is and what pickling means.

Imagine you trained a machine learning model or built a pandas DataFrame object. We all know how long training can take: 2 hours, sometimes 24 hours, and for complex models even 2 days. So what happens when you switch off your PC?

Oops!! All that effort is wasted! Don’t worry: you can save your model to a pickle file, to HDF5, or to another binary data format. Here, we will focus on pickle files only.

Pickling means converting an object to a byte stream (not Python bytecode). For instance, suppose you have loaded a CSV file into a DataFrame, as shown in the figure below.

Figure: CSV File displayed through Pandas

Now let’s convert this to pickle format

data.to_pickle("pickled_file")

Now “pickled_file” is saved, and if we open it in any text editor it looks like this:

Figure: Snapshot of “pickled_file”

Now we can unpickle it, that is, read it back, using the pandas read_pickle method.

Figure: Reading “pickled_file”
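The same round trip can be sketched with Python's standard pickle module alone. This is only an illustration: the `records` list is a hypothetical stand-in for the DataFrame in the figures, and the bytes stay in memory instead of going to “pickled_file”.

```python
import pickle

# Hypothetical stand-in for the DataFrame shown in the figures.
records = [
    {"A": "aaa", "B": -0.264438},
    {"A": "bbb", "B": 0.927272},
]

# Pickling: turn the in-memory object into a byte stream.
payload = pickle.dumps(records)
print(type(payload))   # <class 'bytes'>
print(payload[:20])    # unreadable binary, like the snapshot above

# Unpickling: rebuild an equal object from those bytes.
restored = pickle.loads(payload)
print(restored == records)
```

Writing `payload` to disk and reading it back later gives the same result, which is what makes pickle handy for saving work between sessions.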

Now, this story will explain how you can be hacked.

We are humans and hence curious beings. If someone sends us a pickle file, we will surely unpickle it to see what is in it. But hackers know about this curiosity: the file can contain malicious instructions that run on your machine the moment you unpickle it.
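To see why simply opening such a file is dangerous, here is a minimal and deliberately harmless sketch. The class name `Innocent` is made up for illustration, and `print` stands in for whatever an attacker would really call.

```python
import pickle


class Innocent:
    """Looks like ordinary data, but tells pickle to call a function on load."""

    def __reduce__(self):
        # pickle records "call this callable with these arguments", and the
        # loader obeys. print is a harmless stand-in here; an attacker could
        # name something like os.system instead.
        return (print, ("This line ran just because the file was unpickled!",))


malicious_bytes = pickle.dumps(Innocent())

# The victim only "opens the file", yet the embedded call executes.
result = pickle.loads(malicious_bytes)

# What comes back is print's return value, not an Innocent object.
print(result)
```

Notice that the victim never calls any method on the loaded object. The call happens inside `pickle.loads` itself, so by the time you look at the result it is already too late.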

The official Python documentation says:

Warning: The pickle module is not secure. Only unpickle data you trust. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling. Never unpickle data that could have come from an untrusted source, or that could have been tampered with. Consider signing data with hmac if you need to ensure that it has not been tampered with. Safer serialization formats such as json may be more appropriate if you are processing untrusted data. See Comparison with json.

Link: https://docs.python.org/3/library/pickle.html
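The hmac suggestion in that warning could look roughly like the sketch below. The key value, the helper names `sign` and `verify_and_load`, and the "tag first, payload after" layout are all assumptions made for illustration, not something the documentation prescribes.

```python
import hashlib
import hmac
import pickle

# Hypothetical shared secret; it must never travel with the file itself.
SECRET_KEY = b"replace-with-a-real-secret"


def sign(payload: bytes) -> bytes:
    """Prefix the payload with its 32-byte HMAC-SHA256 tag."""
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return tag + payload


def verify_and_load(blob: bytes):
    """Unpickle only if the tag matches; otherwise refuse."""
    tag, payload = blob[:32], blob[32:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch: refusing to unpickle")
    return pickle.loads(payload)


blob = sign(pickle.dumps({"A": "aaa"}))
print(verify_and_load(blob))   # the original dict comes back

tampered = blob[:-1] + b"X"    # flip the last byte of the pickle data
try:
    verify_and_load(tampered)
except ValueError as err:
    print(err)
```

A signature only proves that the file came from someone holding the key and was not altered afterwards. It does not make a pickle from a stranger safe, so for data from unknown sources a format like json remains the better choice.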

