The problem of storing and delivering static content is quite relevant nowadays. Lots of people need big, reliable storage for static images and many other static files, and a way to deliver them to end users. The most popular solution is still NFS-mounted storage that is accessible from all front-ends, but this solution has big bottlenecks:
- Hard to backup.
- Everything relies on NAS.
- Statically mounted external storage is needed.
Now let's dig deeper:
Hard to backup: Some of you will say that this is not so! But let's imagine that you have 10 TB of small images which your application regularly uses, and these images are very critical. Standard rsync and/or tar could take lots of time and system resources, which is definitely not what we want.
Everything relies on NAS: So what? We can buy a reliable NAS/SAN with cool RAID (1-10) storage and use it. But if we take a closer look, we will see that to get 10 TB of space with, for example, 1 TB 15k RPM SAS drives, we will need at least 11 drives plus a RAID controller. Everything is good so far, but wait, what is the price for that? After digging through internet shops and price lists you will see that this is quite expensive, especially if your data is very critical and you need a hot backup, aka a second NAS/SAN. Another bottleneck is that in this solution you will have to do . This is expensive and hard to achieve. And last but not least, you will have to share the same I/O device among all front-ends. This is truly a problem for large-scale deployments.
Statically mounted external storage is needed: This means that your whole system will rely on an externally mounted device and, regardless of how reliable it is, it is some kind of SPOF.
Combining all this shows that the classical shared storage architecture is hard to implement, expensive, and performs slowly in large deployments. This may not be a big deal if you are the IT department of a bank, and your management has lots of money and very little “imagination”. In that case this article is not for you.
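The drive count in the NAS example can be checked with a little arithmetic. The sketch below is only an illustration: the function name is made up, and the overhead rules (one parity drive for RAID 5, two for RAID 6, full mirroring for RAID 10) are the usual textbook figures, with hot spares ignored.

```python
import math

def drives_needed(usable_tb, drive_tb, raid_level):
    """Minimum number of drives for a given usable capacity (no hot spares)."""
    data_drives = math.ceil(usable_tb / drive_tb)
    if raid_level == 5:
        return data_drives + 1      # one drive's worth of parity
    if raid_level == 6:
        return data_drives + 2      # two drives' worth of parity
    if raid_level == 10:
        return data_drives * 2      # every data drive is mirrored
    raise ValueError("unsupported RAID level: %r" % raid_level)

for level in (5, 6, 10):
    print("RAID %d: %d drives" % (level, drives_needed(10, 1, level)))
```

With 1 TB drives and 10 TB of usable space, RAID 5 gives the 11 drives mentioned above. Every other level needs more, and a hot-backup second NAS/SAN doubles the total.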
So for everyone else, let's summarize what we need:
- Reliable storage.
- Low latency to access file.
- Easy management and backup.
- Reliability and fault tolerance.
- Easy access and less programming overhead.
After spending lots of time looking for a solution to the problems mentioned above, we found what seems to be an ideal one:
- Riak (will act as storage and deliver files). For our needs the free community edition is much more than enough.
- Nginx (will act as reverse proxy and URL filter).
Before starting let’s summarize what these two tools will give us:
Riak: A wonderful, fully clustered NoSQL server written in Erlang. It works asynchronously, has great performance, and offers easy access via interfaces such as REST and protobuf. It also has a built-in real-time search index and a MapReduce implementation. But for now we will use only a small part of Riak, namely storage for static files. In this scenario it has several benefits over a shared storage solution:
- Low latency to access files. (Riak needs a single seek to retrieve any value.)
- Horizontally scalable. (Just add more and more cheap servers to the cluster.)
- Much more throughput. (For example, 10 servers with one 1 Gbit port each will have 10 Gbit in total, minus about 10% for internal traffic.)
- No need for expensive RAID arrays, SANs, etc.
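The throughput figure in the list above is simple arithmetic, sketched here so it can be replayed for other cluster sizes. The function name and parameters are illustrative, and the 10% internal overhead is the rough estimate from the text, not a measured value.

```python
def cluster_throughput_gbit(servers, port_gbit=1.0, internal_overhead=0.10):
    """Approximate client-facing bandwidth of the whole cluster, in Gbit/s."""
    raw = servers * port_gbit                  # every node adds its own port
    return raw * (1.0 - internal_overhead)     # subtract intra-cluster traffic

print(cluster_throughput_gbit(10))
```

For the 10-server example this comes to roughly 9 Gbit/s, while a single NAS stays limited to whatever its own ports can deliver.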
So let's start my favorite part: installation and configuration of the tools mentioned above. As I’m a Debian fan, I will do this on the current stable release.
First you need to download and install Riak. At the time of writing, 1.2.1 was the latest version of Riak, but before just copy-pasting, check for a newer version.
Download and Install Riak:
# cd /usr/local/src
# wget http://s3.amazonaws.com/downloads.basho.com/riak/CURRENT/debian/6/riak_1.2.1-1_amd64.deb
# dpkg -i riak_1.2.1-1_amd64.deb
Done! Riak is installed. Do not start it for now. Just in case it is already running, stop it:
# /etc/init.d/riak stop
Now we need to clusterize it and make some configuration changes. By default Riak binds to 127.0.0.1, which is not a good idea for clusters, so change it to the internal IP address of the server. Do not bind Riak to the server's public IP, if one exists.
Edit /etc/riak/app.config and change:
{pb_ip, "192.168.235.111" },
and
{http, [ {"192.168.235.111", 8098 } ]},
127.0.0.1 localhost
192.168.235.111 riak1.your-domain.com riak1
192.168.235.112 riak2.your-domain.com riak2
192.168.235.113 riak3.your-domain.com riak3
192.168.235.11N riakN.your-domain.com riakN
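For more than a handful of nodes, host entries like the ones above can be generated rather than typed. This is a minimal sketch that assumes the nodes sit on consecutive addresses starting at 192.168.235.111 and follow the riakN naming used here. The function name and defaults are illustrative only.

```python
def hosts_entries(node_count, domain="your-domain.com",
                  network="192.168.235", first_host=111):
    """Return hosts-file lines for nodes riak1..riakN on consecutive addresses."""
    lines = []
    for n in range(1, node_count + 1):
        last_octet = first_host + n - 1
        if last_octet > 254:
            raise ValueError("too many nodes for one /24 network")
        lines.append("%s.%d riak%d.%s riak%d" % (network, last_octet, n, domain, n))
    return lines

print("\n".join(hosts_entries(3)))
```

The same list should end up on every node, so that all cluster members resolve each other by name.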
# mkfs.xfs /dev/sdb1
# mount /dev/sdb1 /mnt
# mv /var/lib/riak/* /mnt/
# umount /mnt
# mount /dev/sdb1 /var/lib/riak
# chown -R riak.riak /var/lib/riak
# mount /dev/sdb1 /opt
# mkdir /opt/riak
# chown riak.riak /opt/riak
# mv /var/lib/riak/* /opt/riak
{riak_core, [
    {ring_state_dir, "/opt/riak/riak/ring"},
...
--------
...
{bitcask, [
    {data_root, "/opt/riak/bitcask"}
]},
{eleveldb, [
    {data_root, "/opt/riak/leveldb"}
]},
{https, [ {"192.168.235.111", 8069 } ]},
{ssl, [
    {certfile, "/etc/riak/ssl/riak.crt"},
    {keyfile, "/etc/riak/ssl/riak.pem"}
]},
-name riak@127.0.0.1
to
-name riak@riak1.your-domain.com
/etc/init.d/riak restart
riak-admin cluster join riak@riak1.your-domain.com
riak-admin cluster plan
riak-admin cluster commit
curl -v -X PUT http://192.168.235.111:8098/riak/images/foo.jpg -H "Content-type: image/jpeg" --data-binary @./foo.jpg
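An application can do the same upload without shelling out to curl. The following is a minimal sketch using only the Python standard library. It assumes the node address and the /riak/bucket/key URL layout from the curl command above, and the helper names build_put and store_file are hypothetical.

```python
import urllib.request

RIAK_URL = "http://192.168.235.111:8098"   # node address taken from the curl example

def build_put(bucket, key, data, content_type):
    """Build (but do not send) a PUT request that stores data under bucket/key."""
    url = "%s/riak/%s/%s" % (RIAK_URL, bucket, key)
    return urllib.request.Request(
        url, data=data, method="PUT",
        headers={"Content-Type": content_type},
    )

def store_file(path, bucket, key, content_type):
    """Read a local file, upload it, and return the HTTP status code."""
    with open(path, "rb") as fh:
        request = build_put(bucket, key, fh.read(), content_type)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status

# Example call (needs a reachable Riak node and a local foo.jpg):
# store_file("./foo.jpg", "images", "foo.jpg", "image/jpeg")
```

Building the request separately from sending it makes it easy to inspect the URL and headers before anything touches the cluster.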
upstream riak {
    server 192.168.235.111:8098 fail_timeout=30s;
    server 192.168.235.112:8098 fail_timeout=30s;
    server 192.168.235.113:8098 fail_timeout=30s;
}

server {
    listen 80;
    server_name your.public.domain;

    if ( $uri !~ \. ) { return 403; }           # Require URI with file extension
    if ( $uri !~ ^/.*/.* ) { return 403; }      # Disable access to Riak /
    if ( $uri ~ ^/.*/.*/.* ) { return 403; }    # Disable Link walk MR etc

    location / {
        if ($request_method = GET) {
            proxy_pass http://riak;
            rewrite ^/(.*) /riak/$1 break;      # Remove /riak from external URL (Hide Riak)
        }
        proxy_redirect off;
        proxy_next_upstream error timeout invalid_header http_500;
        proxy_connect_timeout 2;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Referer "";            # Zero up referer or Riak will 403 all requests
        proxy_hide_header X-Riak-Vclock;        # Remove Riak specific headers
        proxy_hide_header Link;                 # Remove Riak specific headers
        proxy_hide_header ETag;                 # Remove Another Riak header
        proxy_hide_header Server;
    }
}
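To see which external URLs survive the three filters in this configuration, the same regular expressions can be replayed outside Nginx. The sketch below is an approximation for reasoning about the rules, not a replacement for testing the real server. The function name is made up, and it assumes the URI has already been normalized the way Nginx's $uri is.

```python
import re

def filter_request(method, uri):
    """Return (status, upstream_path) as the rules above would decide.

    status is 403 for rejected URIs, otherwise None. upstream_path is the
    path forwarded to Riak, or None when the request is not proxied.
    """
    if not re.search(r"\.", uri):
        return 403, None        # no file extension in the URI
    if not re.search(r"^/.*/.*", uri):
        return 403, None        # fewer than two slashes, e.g. the Riak root
    if re.search(r"^/.*/.*/.*", uri):
        return 403, None        # three or more slashes: link walking etc.
    if method != "GET":
        return None, None       # only GET requests are proxied to Riak
    return None, re.sub(r"^/(.*)", r"/riak/\1", uri, count=1)

for uri in ("/images/foo.jpg", "/images/foo", "/foo.jpg", "/riak/images/foo.jpg"):
    print(uri, filter_request("GET", uri))
```

Only URIs of the form /bucket/key.ext get through. Anything that already carries the /riak prefix has a third slash and is rejected, which is what keeps Riak's own URL layout hidden from the outside.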