
Thursday, October 20, 2016

Remember and learn something from the year 2000 change --> Be smarter with the EU GDPR

From the year 2000 IT change to the next big change - EU GDPR

How many of us remember the IT workload when we moved from the old 1900s into the new and magnificent millennium, with Windows 2000 Active Directory, changes in applications and so forth? It was a great time for trainers: running from customer to customer, delivering the Microsoft MOC 1560 NT 4 to Windows 2000 upgrade course and our own Windows Server 2000 courses.

That was the history part, and let's keep it there - but it brings to my mind, at least, the question of how we have prepared for May 25th 2018, when the EU GDPR comes into effect.

The answers vary: some organizations do not even know about this, others have started to prepare, and the rest are somewhere in between or doing nothing at all.

Another question is the possibility of using the law the wrong way, where a criminal organization recruits people to go from one organization to another, asking each to show what data they hold on them and then to "please forget me".

Sounds like a DDoS, a distributed denial-of-service attack - and all based on the law.

How many requests can one organization handle, and how much resource can it dedicate to this? How many FTEs' time can they spend going through each request to understand whether there is any risk that, yes, there might be some personal data from the user who made the request?

What if I pay $5 per user to make a query to any global service provider like eBay, Amazon, Microsoft or Netflix - you know what I mean. Let's assume someone invests $10,000 and gets 2,000 users to make the request to three companies - Amazon, eBay and Netflix - to find out what data they hold. Based on the law, those companies need to analyze what they have on each of those users, and it might be nothing - those users may never have even created an account with any of those services - but the labour cost of doing the analysis lands somewhere. Then invest another $10,000 for the same users to register and create accounts on those services, and to make a request to be forgotten right after registration. Can the companies say "we don't have any data on you, because we already did the check earlier"? Based on the law - wrong answer. They need to do the analysis again, which equals labour cost, time away from more productive work and so on. Then, after being attacked this way too many times, they answer the next user "no way, go away - we are not going to analyze you and forget you" - but this time the request is to forget a dead relative.

This is only an illustrative example of where organizations need to prepare, and the question still stands - will they be ready by the 25th of May 2018, unless the law changes... who knows.


Nevertheless, this brings some quite major topics to the table:
1. Do we know what data we have, and where?
2. Do our applications support this - both applications built and published earlier, and applications still in the development phase?

Let's take another illustrative example, using Facebook. Your husband or wife has been active on Facebook and suddenly he or she dies, and you have to fight to get the account and profile deleted (which opens another question about the groups he or she created and owned - what happens to those groups and the data inside them, which might mostly be personal data - but that is another story, so let's go back to our example). Finally you get confirmation that it is done. Then, suddenly, something happens at the Facebook service and they are forced to restore the data - what happens then?

I don't know - sorry.

But I can bet that for the relatives it's not fun at all to see the wife or husband back online and active.

This is just an example - I have no knowledge of how Facebook works to avoid this kind of situation - but again it brings the same question to my mind: are they ready?

So let's go back to the key questions, think those two through, and start from knowledge. Structured data is easier (or not) to understand: we know what we have, since it usually sits in a database, and we know where the application and database are used - correct? The opposite, unstructured data, is (or should be) a big headache for an organization's business, risk management, security and IT. And this is not related only to data: it's not enough to analyze what you have in your collaboration tools and file shares; it also extends to identity. Think of the privileged accounts running applications, which might use file shares as part of small legacy applications - do you know those accounts, when their passwords were last changed, and whether you have any detect-and-control process in place?
The words legacy and history carry huge weight here, where data has been migrated from one storage upgrade to the next over the years without - usually - deleting anything.

It's payback time, unfortunately. In the same way, organizations still running legacy Notes mail and applications have had to explain how they "saved themselves into bankruptcy" by staying on the same licenses and hardware too long - it was a good idea and the cheapest option in the short term, but there is no free cheese.
We cannot expect that if we stay on the same version, others will too, or that in today's social world there is no impact on the brand when people share that they are using old tools and techniques in their daily work. Today's digital natives will vote with their feet, and we can bet that their social friends will know the reason by yesterday.

Let's get back to the unknown unstructured data and what it is: data migrated year after year from old systems to new ones without deleting anything, combined with growing data volumes and changing user behavior. Traditional file shares do not - usually - have data classification, index and search capabilities, versioning, or availability from mobile devices - you know and can name those gaps - unless you have purchased a third-party product like the Veritas Enterprise Vault archiving tool, the very tool you tried to get rid of during your email migration to Exchange Online, as a good example.

So you find data of any kind, age, usage level and amount, of which you might actively use and know about 10-20%, while the rest is just storage cost - and now we are back in business: euros, dollars, pesetas, roubles, you name it. Money talks, so here is a small example.

Assumptions:
  • 24,000 users
  • 50 GB average disk quota per user
  • 20% active and valuable data
  • $3.50 per GB managed storage cost (can range from $2 to $5 per gigabyte)
Calculation: 24,000 users × 50 GB × $3.50 = $4,200,000 per year - not bad.

But let's calculate the dark data size and price - the data without any value (noting that even old data can be valuable, like old product drawings and contracts): 24,000 × 50 GB × 0.8 = 960,000 GB = $3,360,000 per year for nothing. To me that sounds like quite a good business case.
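To make the math repeatable, here is a tiny back-of-the-envelope sketch in Python. Purely illustrative: the figures are this post's example assumptions, nothing more.

```python
# Example figures from this post - illustrative assumptions, not measurements.
USERS = 24_000          # employees with a home share
QUOTA_GB = 50           # average disk quota per user
ACTIVE_SHARE = 0.20     # share of data that is active and valuable
COST_PER_GB = 3.50      # managed storage, $ per GB per year (roughly $2-5)

total_gb = USERS * QUOTA_GB                 # 1,200,000 GB in total
total_cost = total_gb * COST_PER_GB         # $4,200,000 per year
dark_gb = total_gb * (1 - ACTIVE_SHARE)     # 960,000 GB of "dark data"
dark_cost = dark_gb * COST_PER_GB           # $3,360,000 per year for nothing

print(f"Total: {total_gb:,} GB -> ${total_cost:,.0f} per year")
print(f"Dark : {dark_gb:,.0f} GB -> ${dark_cost:,.0f} per year")
```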

As said, it is theoretically easy to show the business case, but it really requires more analysis, since that 80% of total storage usually includes installation media, backups, virtual machine disks, .ISO images, zip files and so on - and today, and even more in the future, movie and audio files - plus, of course, an unknown amount of duplicates.
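And since duplicates were mentioned: as a purely hypothetical first pass, you could estimate duplication on a share by hashing file contents. The share path below is a placeholder, and a real analysis would do much more (group by size first, handle locked files, report per folder):

```python
# Hypothetical duplicate estimate for a file share - a sketch only.
import hashlib
import os
from collections import defaultdict

SHARE_ROOT = r"\\fileserver\data"  # placeholder: point at the share to scan

def sha256_of(path, chunk=1 << 20):
    """Hash a file's content in 1 MB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

by_hash = defaultdict(list)
for dirpath, _dirs, files in os.walk(SHARE_ROOT):
    for name in files:
        path = os.path.join(dirpath, name)
        try:
            by_hash[sha256_of(path)].append(path)
        except OSError:
            pass  # unreadable file: skip here, log it in real use

# Every copy beyond the first is wasted capacity.
wasted = sum(os.path.getsize(p[0]) * (len(p) - 1)
             for p in by_hash.values() if len(p) > 1)
print(f"Duplicate copies waste roughly {wasted / 1024**3:.1f} GB")
```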


So if we look back at the title and the year 2000, there are some things in common:
  • Yes, it impacts the whole organization
  • Yes, it requires change
  • Yes, it requires financing, or you take the risk of penalties. I recommend reading the regulation together with your risk organization, business, legal, security and IT.
  • Yes, it includes your directory services
  • Yes, you need end-user training and communication
If done correctly with the right partner, you might achieve benefits like:
  • Yes, you can sleep at night
  • Yes, this is the time and place to upgrade, adopt governance and start monitoring
  • Yes, you increase your security
  • Yes, you create or increase your data's value
  • Yes, the data on a static file server should be available from any device, at any time, anywhere (who remembers this slogan from Microsoft - who said it, and when?)
  • Yes, you might reduce your storage cost
  • Yes, this is the time to create and adopt workflows and retention policies, to start knowing the data and let automation take care of data without value
  • Yes, your outsourcing contracts will save you
  • Yes and no - you might need to run a project to change the partner or hosting provider in order to sleep at night
  • Your data is available
  • Better user experience and work performance, since data can be found
  • Yes, you stop the snowball effect where one situation creates an exception, which creates another exception, which makes everything more complex, increases the security risk and consumes time - which equals money, frankly
  • and much, much more.

But this was today's story.

Today's picture brings the summer, sun and hot roads... Feel it.

"All comments, thoughts and pictures are my own and I don't have legal background"

Monday, October 17, 2016

EU GDPR will be here, but how to start the journey - Chapter 2

How to understand what data our employees are sending and where it is used.

Last time we looked at understanding what is happening inside our network; let's now extend our mindset to how we protect information moving inside and outside our network, and how to understand and make visible where a file is opened. Cool, isn't it?

One additional thing bringing more complexity is of course the hybrid setup. Our planet is not so black and white; instead it has some shades of other colors between black and white :-).

So identity is not black and white, data location is not black and white, but IT still lives in a black-and-white world, managing the same resources with less money - you got it.

So if we start from the basics, we need to understand the data and classify it, which makes this a big change management and communication issue from the end-user point of view. Users have become used to saving data wherever they feel comfortable - even if there have been some guides and policies about storing data here and there - without being able to reuse what colleagues have created: "mine is always the best, and that's why I started from scratch, or use only copies of what I have created myself."

Back to classification - in a very pragmatic view, data classification can be defined in a couple of classes (see the small sketch after the list):
  • Secret
  • Confidential
  • Internal
  • Not restricted / Public
  • and Personal, which makes this even funnier under the EU GDPR - nice word again.
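Purely as an illustration, that pragmatic taxonomy could be modeled as a simple enumeration - a sketch of this post's example classes, not any product's standard:

```python
from enum import Enum

# This post's example taxonomy - not an official or product-defined scheme.
class Classification(Enum):
    SECRET = "Secret"
    CONFIDENTIAL = "Confidential"
    INTERNAL = "Internal"
    PUBLIC = "Not restricted / Public"
    PERSONAL = "Personal"   # personal data, the EU GDPR's special concern
```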
Sounds clear, and it should be easy: once we have configured the new classification for our organization, new documents will be classified as people create them. But what about the 345 billion old legacy files we have, like the Summerparty2001 pictures, invitation and food list? Youngsters usually say "OMG - still saving such old data? You are so old school." True, unfortunately: organizations have migrated data transition after transition after transition - from NT 3.51, or maybe from OS/2 or Warp, to Windows NT 4, to Windows 2000, to Windows 2003, to Windows 2008, to Windows 2012 R2 file servers - and are now thinking of migrating the data to Windows Server 2016, and so on. And with every transition we purchase more storage and build and configure a more sophisticated storage solution, maybe with deduplication to save space, but still without touching the root cause.

Let's avoid opening the backup discussion here - sorry, we can't. We back up the local branch office servers that offer a local network share to users, which in parallel is the first place for users' desktop backups - oops, the same file sits in X:\data\path\salespresentation.pptx as a file and in X:\Backups\GasMonkey\backup22022002.something, and so on. Since we don't have a backup solution and tapes in the branch office, we somehow copy the data to the central data center, where both the files and the backup files are copied to tape and archived. Simple, nice and easy - well, no.

Let's take one variable here and call it human - you know, the person who talks, walks and does all kinds of funny things. This human saves the file created on their PC to the local drive, copies it to the local network drive, and in parallel emails it to the 20 best friends who might need it - or maybe not. Each of those best friends saves the file to their local PC, and maybe also to the local network drive in their office, which is then backed up to the central data center in that region - not forgetting the automated backup scripts copying the file to the local network share, from where other scripts copy it to the data center, where it is backed up to a tape that maybe has never, ever really been tested from the bottom up.


And suddenly the file is stored 2, 3, 5, 10 or 45 times, consuming storage capacity with a value of 0 once we look at the name of the file - salesguide_2005draft.doc. Frankly, this does not sound fun at all...

And the short conclusion is that technology is not the limiting factor or the root cause here - it is the human, plus the lack of policies and governance: data classification with retention/archiving periods, detect and control, and proactive communication - owned by the business, leading by example, with commitment.

Sounds familiar - be honest.

You got the point: we need to classify the data, and we must have metadata that triggers and is used in retention - for example, starting a workflow to get approval either to delete, or to keep for another 6 months, all files classified Internal/security that carry a Draft metadata attribute. This comes back to the terms workflow, automate and process, which are not technical IT terminology only. And we might ask ourselves whether normal disk systems and file shares give us these features. If your answer is yes - are they in use? If your answer is no - the only question is why?
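As a minimal sketch of that rule - the field names (classification, the Draft attribute, the 6-month extension) are just this post's example, not any product's API:

```python
from datetime import date, timedelta

# Hypothetical metadata-driven retention rule from the example above:
# Internal files carrying a "Draft" attribute trigger an approval workflow
# to either delete them or keep them for roughly another 6 months.
def retention_action(doc: dict) -> str:
    if doc.get("classification") == "Internal" and "Draft" in doc.get("attributes", []):
        return "start-approval-workflow"
    if doc.get("expires", date.max) < date.today():
        return "delete"
    return "keep"

def approve_keep(doc: dict) -> None:
    # Approved outcome: extend retention by about 6 months.
    doc["expires"] = date.today() + timedelta(days=182)

# Example: a draft internal sales guide enters the approval workflow.
print(retention_action({"classification": "Internal", "attributes": ["Draft"]}))
```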

So classification is needed, and it must be possible to apply it automatically during document creation, based on the data content - social security numbers, bank accounts, credit cards and so on - while still allowing users to override the automatic rule.
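A minimal sketch of what such a rule could look like. The patterns below are simplified placeholders (a real product would use checksums such as the Luhn test, locale-specific formats and much more), and the override parameter shows the "user wins" principle:

```python
import re
from typing import Optional

# Simplified, hypothetical content patterns - real tools validate far more.
PATTERNS = {
    "Personal": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US-style SSN
    "Confidential": re.compile(r"\b(?:\d[ -]?){15,16}\b"),   # card-like number
}

def auto_classify(text: str, user_choice: Optional[str] = None) -> str:
    if user_choice:                  # users may override the automatic rule
        return user_choice
    for label, pattern in PATTERNS.items():
        if pattern.search(text):
            return label
    return "Internal"                # default for anything unmatched

print(auto_classify("Customer SSN 123-45-6789 attached"))    # -> Personal
print(auto_classify("Quarterly picnic menu", "Not restricted / Public"))
```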

Check out more on Wikipedia using the following link - if it just works:
metadata
or use the following link to Digital Guardian: Digital Guardian on data classification


I will continue with more about metadata in the next article, and as usual:

"All ideas and thoughts are my own like pictures unless told the source"

To be Continued ..


Bikers' meeting, Haltiala / Finland, August 2016 - approx. 200 bikers (mostly aged over 40)