I’m always looking for a way to save space in SQL Server. From archiving old data to just flat out deleting unused objects, I take great joy in removing superfluous stuff. The less junk in the system, the easier it is to focus on the things that matter.
..and fit it in a 10 kg bag
The biggest useless space eaters are tables that are (supposedly) no longer used. I could script them with data to a file, but what if they’re 100+GB? I could also back them up to another DB and then drop them from the database; that would certainly free up the space in the original DB. What if they’re needed for some process that I was unaware of and we can’t wait for the time to restore/move them back?
That was my conundrum. So I decided to implement a process that looks at a single DBA-controlled schema and compresses every table created before a certain date. I could TRANSFER the superfluous table to that schema and leave it there. At some point in the future, a job would come along and compress it.
If the data was needed within X days, then the table could easily be transferred back to the original schema: no harm, no foul. I would also save space, as tables would be automatically PAGE compressed and could be decompressed if needed. In my experience, compression and decompression are really fast in SQL Server.
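The round trip described above can be sketched in a few statements. This is a minimal illustration, not the code of the procedure itself: the table dbo.OldOrders is a hypothetical name, and the Cleanup schema is assumed to exist already.

```sql
-- Hypothetical names: dbo.OldOrders is illustrative; the Cleanup schema must already exist.

-- 1. Park the (supposedly) unused table in the DBA-controlled schema.
ALTER SCHEMA Cleanup TRANSFER dbo.OldOrders;

-- 2. What the scheduled job would do later: rebuild the heap or clustered index
--    with PAGE compression.
ALTER TABLE Cleanup.OldOrders REBUILD WITH (DATA_COMPRESSION = PAGE);

-- 3. If the table turns out to be needed after all: move it back,
--    and optionally decompress it.
ALTER SCHEMA dbo TRANSFER Cleanup.OldOrders;
ALTER TABLE dbo.OldOrders REBUILD WITH (DATA_COMPRESSION = NONE);
```

Note that ALTER TABLE ... REBUILD only touches the heap or clustered index; nonclustered indexes keep their own compression setting unless they are rebuilt separately.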
It’s Compression Time
So I created this super-simple stored procedure, prCompressCleanupTables (the code is on GitHub). It takes the following parameters:
@CompressBeforeDate – A DATETIME parameter. Tables created before this date are compressed (it looks at the table's created date).
@Schema – A sysname parameter that takes the name of the schema you want to compress. Keep in mind that the same schema name is used for every database, so make sure it’s unique (I use the ‘Cleanup’ schema personally, hence the name).
It skips the following databases by default: master, tempdb, model, msdb, distribution, ReportServer, SSISDB. It will skip any database that is in any state other than ONLINE, too.
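As a rough sketch of the selection logic described above (an assumption about how such a check could be written, not a copy of the procedure), the two filters might look like this. The variable defaults are illustrative, and the second query assumes ordinary rowstore tables.

```sql
-- Illustrative defaults; the real procedure receives these as parameters.
DECLARE @CompressBeforeDate DATETIME = DATEADD(DAY, -60, GETDATE());
DECLARE @Schema SYSNAME = N'Cleanup';

-- Databases to visit: ONLINE only, minus the default skip list.
SELECT name
FROM sys.databases
WHERE state_desc = N'ONLINE'
  AND name NOT IN (N'master', N'tempdb', N'model', N'msdb',
                   N'distribution', N'ReportServer', N'SSISDB');

-- Inside one database: tables in the schema that are old enough
-- and not yet PAGE compressed.
SELECT s.name AS SchemaName, t.name AS TableName, t.create_date
FROM sys.tables AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE s.name = @Schema
  AND t.create_date < @CompressBeforeDate
  AND EXISTS (SELECT 1
              FROM sys.partitions AS p
              WHERE p.object_id = t.object_id
                AND p.index_id IN (0, 1)        -- heap or clustered index
                AND p.data_compression <> 2);   -- 2 = PAGE
```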
Also remember that compression requires SQL Server 2008 or later and is locked to certain editions (before SQL Server 2016 SP1 it was an Enterprise-edition feature). If the 2008 requirement is your limiting factor, you really need to upgrade.
I’m Also A Client
I have this implemented as a job on several servers, which checks weekly for new tables to compress in the appropriate databases. It checks for any tables created prior to GETDATE() – 60. I have to say that it runs very quickly, even on large tables.
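A job step along these lines would do the weekly check. The parameter names come from the list above; the dbo schema of the procedure and the database it lives in are assumptions.

```sql
-- EXEC cannot take an expression as an argument, so compute the cutoff first.
DECLARE @Cutoff DATETIME = DATEADD(DAY, -60, GETDATE());

-- Assumes the procedure was created in the dbo schema of the current database.
EXEC dbo.prCompressCleanupTables
    @CompressBeforeDate = @Cutoff,
    @Schema = N'Cleanup';
```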
In the wake of the NSA/British Intelligence scandal, and the continuing surveillance of Internet Service Providers and websites such as Google, the interest in personal privacy has grown. While this article won’t be a long-form argument for personal privacy (mostly because I don’t think I need to do so), there are a few relatively easy things you can do to keep your online persona under your control and there’s good reason for it.
The oft-repeated adage is that “you shouldn’t put anything on the internet that you want to keep private.” While this sounds logical and simple, it usually isn’t. So much of what we do is tied up in the Internet. If you’ve ever bought anything online, done a web search or even paid a bill via a website, then that information is stored somewhere and is accessible to someone. And, while many make the argument that they have nothing to hide, the truth is that you probably do.
Not all of us necessarily have a murder or mob ties to cover up, but almost everyone has debit/credit card information that we don’t want out there, or a few less-than-flattering pictures. On a different note, just because what you’re doing isn’t illegal doesn’t mean that you want to broadcast it to the world. In fact, you might be breaking the law without even knowing it.
That said, what do we do about it? Is there any way to hide everything we do on the net from everyone? The answer is: not really, but there are things you can do to minimize the amount of data you drop into the internet, and at best make it anonymous (not directly tied to you).
I’m going to outline a few steps you can take if you’re concerned about your privacy that will give you the most return for time invested. Much like my recent post on internet security, this is a short list of simple-to-do things that give you the greatest “bang for your buck”.
Your Browsing and Searching
The browser is where most websites will get the information they collect on you. Most of it is pretty general: the OS/browser you’re using, how long you were on the site, and things like that. However, sites that are more clandestine or that you use frequently can collect a large amount of information about you.
Take Google for example. This is a website that we know collects data on its users, and whose data we know has been siphoned by the NSA (National Security Agency). When you log into any of their services, or do any searches from the site, all that information is stored and linked together. This data, over time, can build a pretty accurate picture of you based on your search and browsing habits. It’s not even necessary for you to give Google a name for them to find out who you are, as this can be mined from the data you give them. If you’re constantly going to a few sites and logging in, and any one of them has your name anywhere on it, then that can be linked back to your data.
The data doesn’t even necessarily have to come from you. The recent Facebook breach allowed people to access the contact lists of people they didn’t even know and download them. If you are in the contact lists of people who have Facebook and they’ve uploaded their contact lists to Facebook, then you’re on the site… even if you’re not on the site. Your information can be compromised if you’ve never had an account.
There’s not too much that you can do about the Facebook debacle, short of making sure that no one who has you as a contact uploads their data to the site. Though, there are a few things that you can do in general to reduce your footprint online.
As mentioned in my Internet security post, set your browser to hide you online. Most major browsers now have a “Do Not Track” option in them that will tell sites that you want to opt out of being watched. Most “good” sites will honour this and not track you. However, a few will still do so.
To combat this, we need to take the browser work a bit further. Having the browser automatically use incognito mode (Chromium/Chrome) will greatly reduce the amount of tracking data that the browser can pass on. However, incognito mode can cause problems with certain websites, so you can do as I do and have the browser clear everything every time you close it. Firefox has this option, and while it’s not as robust as the incognito/stealth mode, it does make browsing significantly easier. Every time I close the browser and reopen it, it’s like I’ve just installed the browser; websites have nothing to track because, as far as they can see, I’ve never been to any websites.
Now, if you don’t want to make any changes to your browser or you want another layer of security, you can change the search engine that you use. While the biggest ones such as Yahoo! and Bing also collect your data and share it, there are ones that are built specifically with privacy in mind. The main engine I use to do all of my searching is DuckDuckGo which keeps no logs on its users and sets up an encrypted connection (via SSL) between you and the search engine so nothing can be intercepted.*
Using the above techniques you can keep your search history private, or at the very least separate you from your searches.
Your Connection and Software
This is all well and good, but it doesn’t protect you against someone snooping on your connection to the internet. Even though you’re anonymous to the search engine, you’re not so anonymous to someone who’s watching you browse, such as your ISP or someone sniffing packets in a cafe. To secure that, we’re going to need to hide your internet connection.
The easiest (cheapest) way to do this is to always try the https:// version of a website before the http:// (note the “s” for secure). This little change will create a secure connection between you and the website, making your traffic unintelligible to a malicious viewer. Not all sites support this, but some of the big ones do. The site you’re going to will still be visible, but the contents will not. Keep in mind that this is the “free” option and is very hit-or-miss.
Another option, which is the route I would recommend, is to push all your data through an encrypted VPN (Virtual Private Network). There are a lot of them out there, depending on how much privacy you want and what price you’re willing to pay for it. Some, like VyprVPN (about $20/month), offer a full range of services including news access as well as other benefits. Others, like PrivateInternetAccess** (about $4/month), offer simple unlogged access. In both cases, the system creates an SSL (Secure Sockets Layer) VPN between you and their servers and then pipes you out to the internet with an anonymous IP.
Someone spying on you would only see a mass of garbled data being sent to some server somewhere where it disappears. Any website or person on the internet would see your data coming from a block of IPs owned by a VPN company. There’s virtually no way to connect the two (no pun intended).
If you’re only concerned about eavesdropping when you’re out and about, you can also use something like LogMeIn Hamachi to create an SSL VPN between a mobile device/another computer and a home machine. Keep in mind that with this system, anyone watching your home machine will be able to see the data unencrypted. The secure connection is only between your remote device and the home computer.
Lastly, the software you use on the internet that isn’t your browser, such as Skype or Yahoo! Messenger, is also targetable. While there’s only a little you can do to secure these, you can do a few things. First of all, check your privacy settings and make sure you have everything locked down. Most of these services have a small but useful section in the options called “Privacy”. Also, make sure your chat history isn’t being saved. You can turn this off in every messenger. While it doesn’t guarantee that the data isn’t being stored elsewhere, it does reduce the lifetime of the data and the chance that it will be recovered.
Am I Private Yet?
So the question remains: what effect will all this have? The truth is that we don’t know entirely. Depending on who’s targeting you and why, the things listed above, if implemented properly, can range from a significant annoyance to a complete blackout. However, if you implement no privacy measures, you can rest assured that some, if not all, of your data is being collected and catalogued.
Not all of these may be for you. But a smattering of them in some form or another will help, especially the VPN services, and I recommend you at the very least lock down your browser as mentioned above and in my previous internet security post. Even if you think you have nothing to hide, you may find out in the worst way possible, that yes, you did.
* You can get the add-on/plugin for your browser of choice as well so it’s automatically in the upper right search box on your browser.
** If you’re just looking for privacy and nothing else, this is the way to go.
I’m of two minds about the creation of digital textbooks. On the one hand, they are cheaper to produce and therefore cheaper for the student. Also, they have a lot of extra capabilities, such as the ability to copy/paste or reference easily for papers and assignments.
But on the darker side, they do give the publisher quite a bit of power. DRM can force textbooks to “expire” (I’ve dealt with this myself), so you can’t reference it years later if you want, and they’re usually completely unportable, in the sense that they don’t work in any reader but the publisher’s reader.
Now there’s a third thing: classroom control. I’ll admit that I didn’t always do every bit of required reading for courses, but I was pretty good about it, maybe even better than most. However, the idea that a professor can see which pages I’ve looked at and what I’ve highlighted, without my sharing it explicitly, is a little disconcerting.
There exists a textbook that will report back to your professors about whether you’ve been reading it, according to a report Tuesday from the New York Times. A startup named CourseSmart now offers an education package to schools that allows professors to, among other things, monitor what their students read in course textbooks as well as passages they highlight.
CourseSmart acts as a provider of digital textbooks working with publishers like McGraw-Hill, Pearson, and John Wiley and Sons. The NY Times describes books in use at Texas A&M University, which present an “engagement index” to professors that can be used to evaluate students’ performance in class.