Status of This Memo
This document proposes a method for using the public Internet as a global,
effectively unlimited store for personal information, and a way to group a
person's own published information and separate it from other information.
Copyright Notice
Copyright (C) The Internet Society (2005).
This document and translations of it may be copied and
furnished to others, and derivative works that comment on or
otherwise explain it or assist in its implementation may be
prepared, copied, published and distributed, in whole or in
part, without restriction of any kind, provided that the above
copyright notice and this paragraph are included on all such
copies and derivative works. However, this document itself may
not be modified in any way, such as by removing the copyright
notice or references to the Internet Society or other Internet
organizations, except as needed for the purpose of developing
Internet standards in which case the procedures for copyrights
defined in the Internet Standards process must be followed, or
as required to translate it into languages other than English.
The limited permissions granted above are perpetual and will
not be revoked by the Internet Society or its successors or
assigns.
This document and the information contained herein is provided
on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE
OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY
IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A
PARTICULAR PURPOSE.
Grouping and storing personal information in Internet as in a global 'Cloud'
V.Gavrilov June 2009
The Internet today (the so-called Web 2.0) is increasingly used for
decentralized storing and editing of articles, comments, blogs, bulletins
and wikis, which are hosted on different public sites. But there is no
standard way of extracting all the information published by one user, of
grouping and searching such information, or of easily separating one's own
information from the myriad articles published by other users.
This grouping exists only in the author's head, as personal associations
and memory. As time passes, some publications are lost, URLs and domains
change and bookmarks expire, so the only way to find one's published
information is to search the Internet again and try to pick it out by hand
from the millions of results returned by a search engine, narrowing the
queries step by step.
Proposed here is a simple method of grouping personal information by
attaching a signature to every published message. The signature is the short
output of a one-way hash function applied to a combination of the user's
name, date of birth and a personal message, the last of which is added to
avoid collisions.
To avoid confusion with the term "digital signature" from asymmetric
cryptography, this signature/hash is from now on called the NID (Network ID).
A scenario could look like the following: while browsers do not yet support
the NID, a user could copy and paste the NID into every personal post.
Later, the user can query a favourite search engine for the NID and get
back only his or her own postings, separated from the rest of the Internet,
because the NID is a word generated from unique personal information and is
highly unlikely to occur anywhere else.
In the future, browsers or search engines could offer a "Personal/Public"
checkbox. Selecting it would return only the information the user has ever
published using this NID.
This is the simplest scenario. More complex uses, such as forming multiple
groups inside one personal group by means of keywords attached to the NID,
can easily be shown.
Let's consider a simple implementation of such an NID generator.
A compact NID is clearly desirable, and for this purpose even the 128-bit
MD5 algorithm appears to be adequate. (The published collision attacks on MD5
should have little impact on this particular usage: first, the key is
generated from a short string of predefined form; second, even if a collision
did occur, its consequences would not be critical.) On the other hand, the
22-character hash string (the 128-bit binary MD5 output encoded in base64,
without padding) is short enough to be stored in every post or article on
the Internet.
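The length figures can be checked with a few lines of Python. This is only an
illustrative sketch: the input string is an arbitrary placeholder, and the
variable names are not part of the proposal.

```python
import base64
import hashlib

# Any input gives a digest of the same size; this string is a placeholder.
digest = hashlib.md5(b"example input").digest()       # 16 raw bytes = 128 bits

hex_form = digest.hex()                               # 2 characters per byte
b64_form = base64.b64encode(digest).decode("ascii")   # ends with "==" padding
b64_short = b64_form.rstrip("=")                      # padding dropped

print(len(digest), len(hex_form), len(b64_form), len(b64_short))
# prints: 16 32 24 22
```

So the 22-character figure holds only when the two trailing "=" padding
characters of the base64 encoding are dropped.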
Generating a personalized NID may be hypothetically demonstrated by the
following UNIX command * (the actual output will be longer than 22
characters, since the standard built-in md5 prints its result in hexadecimal,
which is less compact than base64):
% echo "Vasili Gavrilov 01011968 my cat's name - Kuzma" | md5
XQgYnkgYSBwZXJzZXZlcm
where "my cat's name - Kuzma" is a seed/salt. It is added to avoid
collisions between persons who have the same name and were born on the same
date, and also to prevent another person from generating the NID in order to
extract somebody else's information (if the name, date of birth and the NID
generation protocol are known).
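A minimal sketch of such a generator in Python follows. The function name
generate_nid, the single-space delimiter, the UTF-8 encoding and the dropping
of base64 padding are assumptions made for illustration; the memo does not
fix any of them.

```python
import base64
import hashlib


def generate_nid(name: str, dob: str, salt: str) -> str:
    """Return a 22-character NID: unpadded base64 of the MD5 digest."""
    # Assumption: fields are joined by single spaces and encoded as UTF-8,
    # with no trailing newline.
    source = f"{name} {dob} {salt}".encode("utf-8")
    digest = hashlib.md5(source).digest()              # 16 raw bytes
    # Assumption: the two "=" padding characters are dropped.
    return base64.b64encode(digest).decode("ascii").rstrip("=")


nid = generate_nid("Vasili Gavrilov", "01011968", "my cat's name - Kuzma")
print(nid, len(nid))
```

Note that the shell command shown earlier hashes a trailing newline appended
by echo, so its digest will differ from the one computed here unless both
sides agree on that detail.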
Note that there should be at least a minimal protocol defining which fields
are used to generate the NID and in which order, to avoid collisions caused
by feeding overly simple input into the md5+base64 combination.
This RFC is intended to begin the discussion of this convention.
For example, the protocol could require writing the first name, last name,
date of birth and "salt", in this order. Whether any letter case and any
delimiter are allowed or, on the contrary, the delimiter and casing are
predefined is TBD, as are the benefits of either choice.
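To make the "predefined delimiter and casing" option concrete, here is one
possible sketch of a canonical form. Everything in it is hypothetical: the
function name canonical_source, the "|" delimiter, lower-casing, whitespace
collapsing and Unicode NFC normalization are illustrative choices, not part
of any agreed protocol.

```python
import unicodedata


def canonical_source(first_name, last_name, dob, salt, delimiter="|"):
    """Build the string to be hashed from the fields, in a fixed order."""

    def norm(field):
        # Assumed rules: NFC normalization, collapsed whitespace, lower case.
        field = unicodedata.normalize("NFC", field)
        return " ".join(field.split()).lower()

    return delimiter.join(norm(f) for f in (first_name, last_name, dob, salt))


print(canonical_source("Vasili", "GAVRILOV", "01011968",
                       "  my cat's  name - Kuzma "))
# prints: vasili|gavrilov|01011968|my cat's name - kuzma
```

One possible benefit of fixing these rules is that the same person gets the
same NID even when the fields are typed with different spacing or letter
case. The cost is that differences in case or spacing no longer add anything
to the salt.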
An extension of the protocol could be attaching a personal keyword or an
association to the NID.
For example, when storing something connected with photos, the user could
append "photo" to the end of the NID:
XQgYnkgYSBwZXJzZXZlcmphoto
and later a search of the Internet for this string would return all the
entries the user ever stored with this key.
Storing several NIDs with different keywords allows arbitrary groups to be
created, and search engines will do their regular job of intersecting the
groups.
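Tagging a post with keyword-extended NIDs could be sketched as follows. The
helper names nid_tags and sign_post are hypothetical, as is the choice to
always include the bare NID next to the keyword forms and to put all tags on
one final line.

```python
# Illustrative NID taken from the example in this memo.
NID = "XQgYnkgYSBwZXJzZXZlcm"


def nid_tags(nid, keywords):
    """Return the bare NID followed by one NID+keyword string per keyword."""
    return [nid] + [nid + keyword for keyword in keywords]


def sign_post(text, nid, keywords=()):
    """Append the tags to a post as a final line (an assumed layout)."""
    return text.rstrip() + "\n" + " ".join(nid_tags(nid, keywords))


print(sign_post("Pictures from the beach.", NID, ["photo", "vacation"]))
```

A post signed this way can be found by the bare NID, by the "photo" string or
by the "vacation" string, and a search for both keyword strings together
yields the intersection of the two groups.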
Note that it will be hard for another person to retrieve somebody else's
information, because the hash function is irreversible and the 'salt' acts
as a 'password'.
Nobody is prevented from using more than one NID, which is very different
from assigning one NID to every user forever, so this also appears to be a
very privacy-friendly approach.
In the future, browsers (or search engines) may support transparent appending
of the NID to requests. Searching one's past personal postings connected with
"photo" and "vacation" would then be achieved by simply entering in a browser:
"photo vacation"
just as is done today against common data.
With the above-mentioned "Personal" checkbox checked, the browser will attach
the locally saved NID (or one saved in a cookie or a session) and send a more
restricted request that returns only the user's personal postings (from
multiple sites) containing both keywords "photo" and "vacation".
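The query rewriting a browser might perform can be sketched as below. The
function name rewrite_query and the rule of prefixing every keyword with the
NID are assumptions; the memo does not specify how the NID is attached to the
request.

```python
def rewrite_query(query, nid, personal):
    """Return the search string actually sent to the search engine."""
    if not personal:
        return query              # "Public": the query is sent unchanged
    keywords = query.split()
    if not keywords:
        return nid                # no keywords: ask for all personal posts
    # Assumed rule: each keyword becomes the NID+keyword string, so the
    # engine intersects the user's own keyword groups.
    return " ".join(nid + keyword for keyword in keywords)


stored_nid = "XQgYnkgYSBwZXJzZXZlcm"   # illustrative NID from this memo
print(rewrite_query("photo vacation", stored_nid, personal=True))
# prints: XQgYnkgYSBwZXJzZXZlcmphoto XQgYnkgYSBwZXJzZXZlcmvacation
```

With the checkbox cleared the user's input goes out untouched, so the same
search box serves both public and personal searches.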
Other extensions can be imagined, such as attaching a counter or another id
at the end, which would allow redundant (identical) data to be saved on
multiple sites and make duplicates easier to distinguish in the browser.
This can be elaborated further.
The above procedure allows the public Internet to be used as unlimited
storage for personal data, with easy extraction and grouping of such data
and separation of public data from data saved by one person. It also turns
saving data on the Internet into saving into one global 'Cloud' and abstracts
away the location (a URL may change, but the information remains searchable).
This will allow personal data to be distributed evenly over multiple public
storages and, in the future, possibly allow personal distributed services to
be organized that work with personal data in a truly parallel way.
*) A reference tool for generating such a signature is available at: http://sourceforge.net/projects/nid/
