A friend of mine planned to start an Ubuntu mirror (software source) service with the support of his college, so I helped him set it up. It was my first time too, but it turned out not to be very complex.

1.Deciding what you really want to serve

There are two kinds of mirrors: archive mirrors, which carry the packages, and release mirrors, which carry the CD images. The instructions at http://www.ubuntu.com/getubuntu/mirror tell us how much disk space each kind needs.

Of course my friend wants to run an archive mirror, because that is more useful to users. For the first stage, as a test run, he doesn't want to host the whole archive, only the latest version, Ubuntu 9.04. That should take less than 200GB of disk space.
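
Before the first sync it may be worth checking that the target volume really has that much free space. Here is a minimal sketch, assuming the mirror will live under /var/repo (the path used later in this post) and a budget of 200GB. The function names enough_space and avail_kb_for are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the available space (in KB)
# covers the required size (in GB).
enough_space() {
    avail_kb=$1
    need_gb=$2
    need_kb=$((need_gb * 1024 * 1024))
    [ "$avail_kb" -ge "$need_kb" ]
}

# Available KB on the filesystem that holds a path
# (POSIX df output format, 4th column of the 2nd line).
avail_kb_for() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}
```

It could be used like this: enough_space "$(avail_kb_for /var/repo)" 200 || echo "not enough space for the mirror"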

2.Considerations about hardware

A mirror serves a lot of file transfers, so it needs good hard disks as well as a good Internet link. The more disk space the better, and of course the faster the link the better :)

But at this first stage my friend cannot choose the hardware, because the mirror is still on trial with his college. If it goes well, he can soon turn it into a bigger project.

3.Decisions on software

First, we need to choose an operating system.
I recommended Debian, Ubuntu Server and CentOS to my friend, and he finally chose Debian lenny with its ext3 file system. The server will mostly deal with files, so we want a platform that makes that work easy. ext4 is also a good file system and worth considering.

Second, we need to choose one program to handle HTTP requests and another for FTP. As I have said several times, we only serve files, so we don't need PHP or any other web application. For this reason Apache may not be the best choice. We chose Nginx for HTTP and vsftpd for FTP, both installed from Debian packages; there was no need to compile them because we didn't need any custom changes. Lighttpd is also a good candidate for HTTP. The configuration was somewhat troublesome and took us more time than expected, but in the end everything works well.
Here are the configuration files for both programs:
1)nginx:/etc/nginx/nginx.conf & /etc/nginx/sites-enabled/default

a) /etc/nginx/nginx.conf:

user www-data www-data;
worker_processes  3;
error_log  /var/log/nginx/error.log;
pid        /var/run/nginx.pid;
worker_rlimit_nofile 51200;
events {
    use epoll;
    worker_connections  51200;
}
http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;
    access_log  /var/log/nginx/access.log;
    server_names_hash_bucket_size 128;
    client_header_buffer_size 32k;
    large_client_header_buffers 4 32k;
    client_max_body_size 16m;
    sendfile        on;
    tcp_nopush     on;
    #keepalive_timeout  0;
    keepalive_timeout  65;
    tcp_nodelay        on;
    gzip  on;
    gzip_min_length  1k;
    gzip_buffers     4 16k;
    gzip_http_version 1.0;
    gzip_comp_level 2;
    gzip_types       text/plain application/x-javascript text/css application/xml;
    gzip_vary on;
    limit_zone   limit  $binary_remote_addr  10m;
    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}

b)/etc/nginx/sites-enabled/default:

server {
    listen   80;
    # Replace server_name with your real server name.
    server_name  repo.domain;
    access_log  /var/log/nginx/repo.access.log;

    location  / {
        root   /var/repo;
        index  index.html index.htm;
        limit_conn   limit  50;
        autoindex on;
    }
    error_page   500 502 503 504  /50x.html;
    location = /50x.html {
        root   /var/www/nginx-default;
    }
}

2)vsftpd: /etc/vsftpd.conf

anonymous_enable=YES
anon_root=/var/repo/
local_enable=NO
write_enable=NO
local_umask=022
dirmessage_enable=YES
xferlog_enable=YES
connect_from_port_20=YES
xferlog_std_format=YES
idle_session_timeout=60
data_connection_timeout=120
ftpd_banner=Welcome to NJNU Ubuntu FTP mirror service.
listen=YES
listen_ipv6=NO
pam_service_name=vsftpd
tcp_wrappers=YES
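
Once both services are up and the mirror has some data, a quick way to try them is to point a test machine at the new mirror. The sketch below only prints the apt source lines; it assumes the archive is served from the root of http://repo.domain/ (the placeholder name from the nginx config above) and that the release is jaunty, the code name of Ubuntu 9.04. The function name mirror_sources is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print apt source lines for a mirror URL and a release.
mirror_sources() {
    base_url=$1
    release=$2
    components="main restricted universe multiverse"
    for suite in "${release}" "${release}-updates" "${release}-security"; do
        printf 'deb %s %s %s\n' "$base_url" "$suite" "$components"
    done
}
```

For example, mirror_sources http://repo.domain/ jaunty prints three lines that could be pasted into /etc/apt/sources.list on the test machine; an ftp:// URL should work the same way for the vsftpd side.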

Third, get rsync working. Here is a script to do it:

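#!/bin/sh
# Upstream rsync source and local mirror directory
RSYNCSOURCE=rsync://archive.ubuntu.com/ubuntu
BASEDIR=/var/repo/

# Stage 1: fetch new packages, but leave the index files untouched
rsync --recursive --times --links --hard-links \
  --stats \
  --exclude "Packages*" --exclude "Sources*" \
  --exclude "Release*" \
  ${RSYNCSOURCE} ${BASEDIR} || exit 1

# Stage 2: update the index files and delete files that are gone upstream
# ($(hostname) is used because not every /bin/sh sets HOSTNAME)
rsync --recursive --times --links --hard-links \
  --stats --delete --delete-after \
  --exclude "project/trace/$(hostname)" \
  ${RSYNCSOURCE} ${BASEDIR}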

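This script does a two-stage rsync: the first pass downloads the new packages without touching the index files, and the second pass updates the indexes and removes old files. This way users won't get errors while rsync is running.
Then add this script to the crontab of a dedicated user, so that it runs every 24 hours.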
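
One thing to watch with a daily cron job is that the first full sync can easily take longer than 24 hours, so two runs could overlap. Below is a minimal sketch of a lock wrapper that skips a run if the previous one is still going. The function names, the lock path and the script paths are all hypothetical; adjust them to your setup.

```shell
#!/bin/sh
# Hypothetical lock helper: mkdir is atomic, so only one caller
# can create the lock directory at a time.
acquire_lock() {
    mkdir "$1" 2>/dev/null
}

release_lock() {
    rmdir "$1"
}

# Run a command only if no other run holds the lock.
# Returns the command's exit status, or 75 if the lock is taken.
run_locked() {
    lockdir=$1
    shift
    if acquire_lock "$lockdir"; then
        status=0
        "$@" || status=$?
        release_lock "$lockdir"
        return "$status"
    else
        echo "previous run still in progress, skipping" >&2
        return 75
    fi
}

# Example use at the end of a wrapper script (placeholder paths):
#   run_locked /var/tmp/ubuntu-mirror.lock /home/mirror/bin/sync-mirror.sh
# Example crontab entry for the wrapper, every day at 03:00:
#   0 3 * * * /home/mirror/bin/mirror-cron.sh
```

If the machine crashes in the middle of a run, the lock directory stays behind and has to be removed by hand before the next sync can start.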

4.Long-term maintenance

It's simply a server, so you need to take care of its stability and security, and also think about how your mirror will develop. You may need to decide whether you want it to become an official mirror or stay a private one.
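
As one small example of the stability side, a mirror that silently stops syncing is worse than no mirror, so it may help to check how old the last successful sync is. This is only a sketch: it assumes the sync script is extended to touch a marker file after a successful run (the script above does not do this), and the function names and the marker path are hypothetical.

```shell
#!/bin/sh
# Hypothetical check: succeed if the last sync is older than max_age seconds.
is_stale() {
    last_sync=$1   # epoch seconds of the last successful sync
    now=$2         # current time in epoch seconds
    max_age=$3     # allowed age in seconds
    [ $((now - last_sync)) -gt "$max_age" ]
}

# Modification time of a file in epoch seconds (GNU stat, as found on Debian).
mtime_of() {
    stat -c %Y "$1"
}
```

With a marker file such as /var/repo/.last-sync, a daily check could look like: is_stale "$(mtime_of /var/repo/.last-sync)" "$(date +%s)" 172800 && echo "mirror has not synced for two days"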