Profiling users with .bash_history

Can we be identified by our bash behaviour?

Cyber attribution is hard, and most methods lead to the computer rather than the person. By analyzing patterns in an attacker's commands, we may be able to link different attacks to the same person, or at least to someone with the same training.

For my dissertation, I am collecting (anonymized) .bash_history files from many users and using some ML™ to link them together. For that I need a lot of data, and I would appreciate it if you shared your history with me. I'll take zsh history as well.

To see how I handle your data securely and anonymously, read on.

How it works

Bash logs can contain sensitive information such as filenames or passwords. To protect your privacy, all files are processed by an anonymizing script. This website is hosted on GitHub Pages, so you can inspect its source code.
In short, the script goes through the file word by word and keeps a word only if it is:

  • on a list of ~250 common, mostly built-in commands or operators, such as: ls, sudo, ifconfig, vim, |, >
  • recognized as a short option/flag – a hyphen followed by 1 to 4 alphanumeric characters, for example -p, -alt, -rf
  • matched as a long option/flag – a double hyphen followed by alphanumeric words separated by single hyphens. This keeps --follow-symlinks, --help and --version, but not --password=pass or --time=WORD
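The three rules above can be sketched in Python roughly as follows. This is a reading of the rules as described, not the actual anonymizing script: `COMMON_COMMANDS` is a hypothetical handful of entries standing in for the ~250-command list, and the two regular expressions are assumptions reconstructed from the examples given.

```python
import re

# Hypothetical subset of the ~250-entry whitelist; the real list is longer.
COMMON_COMMANDS = {"ls", "sudo", "ifconfig", "vim", "cd", "grep", "|", ">"}

# Short option: one hyphen, then 1 to 4 alphanumeric characters (-p, -alt, -rf).
SHORT_FLAG = re.compile(r"-[A-Za-z0-9]{1,4}")
# Long option: two hyphens, then alphanumeric words joined by single hyphens.
# "=" is not allowed, so "--password=pass" does not match.
LONG_FLAG = re.compile(r"--[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def is_kept(token: str) -> bool:
    """Return True if the token would be kept in clear text, False if hashed."""
    return (
        token in COMMON_COMMANDS
        or SHORT_FLAG.fullmatch(token) is not None
        or LONG_FLAG.fullmatch(token) is not None
    )


if __name__ == "__main__":
    for tok in ["ls", "-alt", "-rf", "--follow-symlinks", "--help",
                "--password=pass", "--time=WORD", "notes.txt"]:
        print(f"{tok!r}: {'keep' if is_kept(tok) else 'hash'}")
```

Using `fullmatch` rather than `match` forces the whole token to fit the pattern. That is what rejects `--password=pass`, whose prefix `--password` would otherwise look like a valid long option.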

Everything else gets hashed. This way I can still see some patterns while learning very little about you. Make sure you upload all your files in one go; otherwise I have no way of telling that they belong to the same user.
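Hashing still preserves patterns because a deterministic hash always maps the same filename or argument to the same placeholder. Repeated use stays visible without revealing the value. The sketch below assumes salted SHA-256 truncated to 8 hex characters. The hash choice, the `h_` prefix, the salt and the tiny `KEEP` set are all hypothetical, and splitting on whitespace ignores shell quoting.

```python
import hashlib

# Hypothetical stand-in for the whitelist and option checks.
KEEP = {"ls", "cat", "grep", "|", ">", "-la"}


def pseudonymize(token: str, salt: str) -> str:
    """Map a token to a short, stable placeholder: same token + salt -> same output."""
    digest = hashlib.sha256((salt + token).encode("utf-8")).hexdigest()
    return "h_" + digest[:8]


def anonymize_line(line: str, salt: str) -> str:
    """Keep whitelisted tokens and replace every other token with its hash."""
    return " ".join(
        tok if tok in KEEP else pseudonymize(tok, salt)
        for tok in line.split()  # naive whitespace split, no shell quoting
    )


if __name__ == "__main__":
    salt = "example-salt"  # hypothetical; a real salt should be random and secret
    print(anonymize_line("cat notes.txt | grep password", salt))
    print(anonymize_line("ls -la notes.txt", salt))
    # notes.txt gets the same placeholder in both lines.
```

Without a salt, anyone could hash common words such as `password` and look them up. The trade-off is that all files from one user must share a salt so their placeholders stay comparable, which fits the advice to upload everything in one go.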

As the default bash history size is a mere 500 lines, please try to upload multiple files (from different machines or timeframes, as long as they are all yours). If you are by any chance authorized to upload logs of other users (!), submit each of them separately, in a different session. The next section explains how to submit your files.

Submit your files

There are two ways to submit your history: the first is easier, the second is more robust and private. If you use a shell other than bash, adjust accordingly.

Method 1: upload .bash_history files

On most UNIX systems, these are located in the home directory (~/.bash_history). After uploading, they are anonymized in your browser using JavaScript; you can then inspect the result and decide whether to submit it.
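This is a minimal sketch for finding candidate history files and checking their length before uploading. It assumes the common default locations: `~/.bash_history`, and `~/.zsh_history`, which many zsh setups use but zsh does not mandate. It also checks `$HISTFILE` if that variable has been exported to the environment. Usually `HISTFILE` is only a shell variable, so Python will not see it unless you run `export HISTFILE` first. The function name `find_history_files` is hypothetical.

```python
import os
from pathlib import Path
from typing import List, Optional

# Common default locations; zsh only uses ~/.zsh_history if configured to.
DEFAULT_NAMES = [".bash_history", ".zsh_history"]


def find_history_files(home: Optional[Path] = None) -> List[Path]:
    """Return existing history files in `home` (default: your home directory)."""
    home = Path.home() if home is None else home
    candidates = [home / name for name in DEFAULT_NAMES]
    # HISTFILE is usually a shell-only variable; it is visible here only if exported.
    histfile = os.environ.get("HISTFILE")
    if histfile:
        candidates.append(Path(histfile).expanduser())
    found: List[Path] = []
    for path in candidates:
        if path.is_file() and path not in found:
            found.append(path)
    return found


if __name__ == "__main__":
    # Print each file with its line count (bash keeps 500 entries by default).
    for path in find_history_files():
        with path.open(encoding="utf-8", errors="replace") as fh:
            print(path, sum(1 for _ in fh), "lines")
```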

Method 2: anonymize on your computer

If you prefer to be more careful, download the anonymizing Python script, process your history on your own computer and upload the already anonymized files. Use this method if you use zsh or another shell whose history file has a different structure.
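Some history files need preprocessing because they store metadata alongside commands. With zsh's `EXTENDED_HISTORY` option, each entry is stored as `: <beginning time>:<elapsed seconds>;<command>`. Bash writes `#<epoch>` comment lines when `HISTTIMEFORMAT` is set. The sketch below strips that metadata before anonymizing. The function name `extract_commands` is hypothetical, and multi-line commands are not handled.

```python
import re
from typing import Iterable, List

# zsh with EXTENDED_HISTORY: ": <beginning time>:<elapsed seconds>;<command>"
ZSH_EXTENDED = re.compile(r": \d+:\d+;(.*)")
# bash with HISTTIMEFORMAT set: "#<epoch seconds>" on its own line
BASH_TIMESTAMP = re.compile(r"#\d+")


def extract_commands(lines: Iterable[str]) -> List[str]:
    """Strip timestamp metadata and return one plain command per entry."""
    commands = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or BASH_TIMESTAMP.fullmatch(line):
            continue  # skip blank lines and bash timestamp comments
        match = ZSH_EXTENDED.fullmatch(line)
        commands.append(match.group(1) if match else line)
    return commands


if __name__ == "__main__":
    sample = [": 1609459200:0;git status\n", "#1609459200\n", "ls -la\n"]
    print(extract_commands(sample))  # ['git status', 'ls -la']
```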

1. Choose the files

These will not be submitted yet.

2. Anonymize and inspect
3. Consent and submit

If you have any additional questions, email me at antonin.kanat.17@ucl.ac.uk. To raise a complaint, please contact Dr Ingolf Becker (i.becker@ucl.ac.uk). If you feel your complaint has not been handled to your satisfaction, you can contact the Chair of the Security and Crime Science Research Ethics Committee at scs.ethics@ucl.ac.uk.
You can withdraw from the study at any time by emailing antonin.kanat.17@ucl.ac.uk.

Contact

My name is Antonín Kanát, and this is part of my undergraduate dissertation at the UCL Department of Security and Crime Science. Big thanks to Ingolf for all his support!

If you have any questions or comments, please do not hesitate to contact me at antonin.kanat.17@ucl.ac.uk or my supervisor Ingolf Becker (i.becker@ucl.ac.uk). Thanks again for your help!

Finally, please share this webpage with your friends! As you can imagine, having plenty of data is vital.