#!/usr/bin/perl -w eval 'exec /usr/bin/perl -w -S $0 ${1+"$@"}' if 0; # not running under some shell =pod =head1 NAME tv_grab_my_astro - Grab TV listings for Malaysia using Astro JSON feeds. =head1 SYNOPSIS tv_grab_my_astro --help tv_grab_my_astro --configure [--config-file FILE] [--gui OPTION] tv_grab_my_astro [--config-file FILE] [--days N] [--offset N] [--output FILE] [--quiet] [--debug] tv_grab_my_astro --list-channels [--config-file FILE] [--output FILE] [--quiet] [--debug] =head1 DESCRIPTION Output TV and listings in XMLTV format for channels available on Malaysia's Astro subscription satellite service. First you must run B<tv_grab_my_astro --configure> to choose which stations you want to receive. Then running B<tv_grab_my_astro> with no arguments will get a listings for the stations you chose for five days including today. =head1 OPTIONS B<--configure> Prompt for which stations to download and write the configuration file. B<--config-file FILE> Set the name of the configuration file, the default is B<~/.xmltv/tv_grab_my_astro.conf>. This is the file written by B<--configure> and read when grabbing. B<--gui OPTION> Use this option to enable a graphical interface to be used. OPTION may be 'Tk', or left blank for the best available choice. Additional allowed values of OPTION are 'Term' for normal terminal output (default) and 'TermNoProgressBar' to disable the use of Term::ProgressBar. B<--output FILE> When grabbing, write output to FILE rather than standard output. B<--days N> When grabbing, grab N days rather than 5. B<--offset N> Start grabbing at today + N days. N may be negative. B<--quiet> Suppress the progress-bar normally shown on standard error. B<--debug> Provide more information on progress to stderr to help in debugging. B<--list-channels> Output a list of all channels that data is available for. The list is in xmltv-format. B<--version> Show the version of the grabber. B<--help> Print a help message and exit. =head1 ERROR HANDLING If the grabber fails to download data for some channel on a specific day, it will print an errormessage to STDERR and then continue with the other channels and days. The grabber will exit with a status code of 1 to indicate that the data is incomplete. =head1 ENVIRONMENT VARIABLES The environment variable HOME can be set to change where configuration files are stored. All configuration is stored in $HOME/.xmltv/. On Windows, it might be necessary to set HOME to a path without spaces in it. =head1 SUPPORTED CHANNELS For information on supported channels, see http://www.astro.com.my/ =head1 AUTHOR Jason Rumney, jasonr -at- gnu -dot- org. This documentation and some code copied from tv_grab_se_swedb by Mattias Holmlund, mattias -at- holmlund -dot- se, which was in turn partially copied from tv_grab_uk by Ed Avis, ed -at- membled -dot- com. Overall code based on documentation at http://wiki.xmltv.org/index.php/HowtoWriteAGrabber =head1 BUGS =cut use strict; use XMLTV; use XMLTV::ProgressBar; use XMLTV::Options qw/ParseOptions/; use XMLTV::Configure::Writer; use JSON; use DateTime; use Date::Parse; use Date::Format; use File::Path; use File::Basename; use LWP::Simple qw($ua get); sub t; $ua->agent("xmltv/$XMLTV::VERSION"); use HTTP::Cache::Transparent; # Although we use HTTP::Cache::Transparent, this undocumented --cache # option for debugging is still useful, since it will always use a # cached copy of a page, without contacting the server at all. # use XMLTV::Memoize; XMLTV::Memoize::check_argv('get'); my $default_root_url = 'http://api-epg.astro.com.my/api'; my $default_cachedir = get_default_cachedir(); my ($opt, $conf) = ParseOptions( { grabber_name => "tv_grab_my_astro", capabilities => [qw/baseline manualconfig apiconfig/], stage_sub => \&config_stage, listchannels_sub => \&list_channels, version =*> "0.1", description => "Malaysia (www.astro.com.my)", } ); #check config if (not defined( $conf->{cachedir} )) { print STDERR "No cachedir defined in config file " . $opt->{'config-file'} . "\n" . "Please run the grabber with --configure.\n"; exit 1; } if (not defined ( $conf->{'root-url'} )) { print STDERR "No root-url defined in config file " . $opt->{'config-file'} . "\n" . "Please run the grabber with --configure.\n"; } if (not defined ( $conf->{'channel'} )) { print STDERR "No channels selected in config file " . $opt->{'config-file'} . "\n" . "Please run the grabber with --configure.\n"; exit 1; } # Astro webserver is slow to generate responses, with lots of little # JSON requests, so initialise the caching to use a 12hr timeout. init_cachedir( $conf->{cachedir}->[0] ); HTTP::Cache::Transparent::init( { BasePath => $conf->{cachedir}->[0], NoUpdate => 12*3600, Verbose => $opt->{debug}, ApproveContent => sub { return $_[0]->is_success }, } ); # Get the actual data and print it to stdout. my ($xmldecl, $channels) = load_channels( $conf->{'root-url'}->[0] ); my ($odoc, $root ); my $warnings = 0; binmode STDOUT, ":utf8"; write_header( $xmldecl ); write_channel_list( $conf->{channel} ); my $now = DateTime->today(); t $now; my $date = $now; $date = $now->add_duration($DateTime::Duration->new( days => $opt->{offset})) if ( $opt->{offset} ); my $bar = undef; $bar = new XMLTV::ProgressBar( { name => 'downloading listings', count => $opt->{days} * @{$conf->{channel}}, } ) if (not $opt->{quiet}) && (not $opt->{debug}); for ( my $i=0; $i < $opt->{days}; $i++ ) { t "Date: $date"; foreach my $channel_id (@{$conf->{channel}}) { # We already warned the user if the channel doesn't exist. if ( exists $channels->{$channel_id} ) { t " $channel_id"; my ( $ch_name, $ch_code, $url, $ch_num, $ch_cat, $ch_astroid ) = @{$channels->{$channel_id}}; print_data( $url, $channel_id, $date, $ch_astroid, $ch_cat ) or warning( "Failed to download data for $channel_id on " . UnixDate( $date, "%Y-%m-%d" ) . "." ); } $bar->update() if defined( $bar ); } $date = $date->add_duration(DateTime::Duration->new(days => 1)); } $bar->finish() if defined( $bar ); write_footer(); # Signal that something went wrong if there are warnings. exit(1) if $warnings; t "Exiting without warnings."; exit(0); sub t { my( $message ) = @_; print STDERR $message . "\n" if $opt->{debug}; } sub warning { my( $message ) = @_; print STDERR $message . "\n"; $warnings++; } sub list_channels { my ( $conf, $opt ) = @_; ( $xmldecl, $channels ) = load_channels( $conf->{'root-url'}->[0] ); my $result=""; my $fh = new IO::Scalar \$result; my $oldfh = select( $fh ); write_header( $xmldecl ); write_channel_list ( [sort keys %{$channels}] ); write_footer(); select( $oldfh ); $fh->close(); return $result; } sub config_stage { my( $stage, $conf ) = @_; die "Unknown stage $stage" if $stage ne "start"; my $result; my $writer = new XMLTV::Configure::Writer( OUTPUT => \$result, encoding => 'utf-8' ); $writer->start( { grabber => 'tv_grab_my_astro' } ); $writer->write_string( { id => 'root-url', title => [ [ 'Root URL for grabbing data', 'en' ] ], description => [ [ 'The file at this URL describes which channels are available ' . 'and where data can be found for them. ', 'en' ] ], default => $default_root_url, } ); $writer->write_string( { id => 'cachedir', title => [ [ 'Directory to store the cache in', 'en' ] ], description => [ [ 'tv_grab_my_astro uses a cache with files that it has already ' . 'downloaded. Please specify where the cache shall be stored. ', 'en' ] ], default => $default_cachedir, } ); $writer->end( 'select-channels' ); return $result; } sub get_default_cachedir { my $winhome = $ENV{HOMEDRIVE} . $ENV{HOMEPATH} if defined( $ENV{HOMEDRIVE} ) and defined( $ENV{HOMEPATH} ); my $home = $ENV{HOME} || $winhome || "."; return "$home/.xmltv/cache"; } sub init_cachedir { my ( $path ) = @_; if ( not -d $path ) { mkpath( $path ) or die "Failed to create cache-directory $path: $@"; } } sub load_channels { my ($baseurl) = @_; my %channels; my $xmldecl = "<\?xml version='1.0' encoding='utf-8'\?>\n"; my $full_url = $baseurl . "/pack?format=jsonp"; my $download = get( $full_url ); my $json = new JSON; my $chan_data = $json->allow_nonref->utf8->relaxed->decode($download); $chan_data = $chan_data->{'services'}; foreach my $ch (@{$chan_data}) { my $ch_id = $ch->{'service_id'}; my $ch_name = $ch->{'title'}; my $ch_code = $ch->{'service_key'}; my $ch_num = $ch->{'channel_number'}; my $ch_cat = $ch->{'channel_category'} || 'unknown'; my $ch_uri = "$ch_num.$ch_cat.astro.com.my"; $ch_uri =~ s/ /_/g; $ch_name = xml_escape($ch_name); $ch_uri = lc($ch_uri); $channels{$ch_uri} = [ $ch_name, $ch_code, $baseurl, $ch_num, $ch_cat, $ch_id ]; } # TODO: Parse from astro website return ($xmldecl, \%channels); } sub print_data { my( $url, $channel_id, $date, $astroid, $category ) = @_; my $start_date = $date->ymd('-') . 'T00:00'; my $end_date = $date->ymd('-') . 'T23:59'; my $full_url = "$url/guide/start/$start_date/end/$end_date/channels/$astroid?format=jsonp"; my $download = get( $full_url ); my $json = new JSON; my $schedule = $json->allow_nonref->utf8->relaxed->decode($download); if ($schedule && ref($schedule) eq "HASH") { $schedule = $schedule->{'guides'}; foreach my $programme (@{$schedule}) { my $event_id = $programme->{'event_id'}; if ($event_id != 0) { my $prog_id = $programme->{'program_id'}; my $title = $programme->{'name'}; my $episode_title = $programme->{''}; my $episode_id = $programme->{'group_order_num'}; my $desc = $programme->{'description'}; my $startTime = $programme->{'display_datetime_utc'}; my $stopTime = $programme->{'display_datetime_end_utc'}; my $start = str2time($startTime); my $stop = str2time($stopTime); $start = time2str('%Y%m%d%H%M%S %z', $start); $stop = time2str('%Y%m%d%H%M%S %z', $stop); # XML escaping of fields that are likely to contain problem chars. $title = xml_escape($title); print " <programme channel=\"$channel_id\" start=\"$start\" stop=\"$stop\">\n" . " <title>$title</title>\n"; if ($episode_title) { $episode_title = xml_escape($episode_title); print " <sub-title>$episode_title</sub-title>\n"; } if ($desc) { $desc = xml_escape($desc); print " <desc>$desc</desc>\n"; } print " <category>$category</category>\n"; print " <url>http://api-epg.astro.com.my/api/event/$event_id?format=html</url>\n"; if ($episode_id) { print " <episode-num system=\"onscreen\">$episode_id</episode-num>\n"; } print " </programme>\n"; } } } if ($@) { # Exception occurred above. Add a comment to the xmltv file # about the error, and allow the details for other channels to # keep being processed. print "<!-- Error parsing channel listings for " . $channel_id . "\n from URL: " . $full_url . "\n Message: " . xml_escape($@) . "\n-->"; } return 1; } sub write_header { my ( $xmldecl ) = @_; print $xmldecl; print '<!DOCTYPE tv SYSTEM "xmltv.dtd">' . "\n"; print '<tv>' . "\n"; } sub write_channel_list { my ( $channel_list ) = @_; # Write list of channels. t 'Writing list of channels.'; foreach my $channel_id (@{$channel_list}) { if ( not exists $channels->{$channel_id} ) { print STDERR "Unknown channel $channel_id." . " See http://www.astro.com.my/" . " for a list of available channels or run" . " tv_grab_my_astro --configure to reconfigure."; next; } my ( $ch_name, $ch_code, $namespace, $ch_num, $ch_cat, $ch_astroid ) = @{$channels->{$channel_id}}; print " <channel id=\"$channel_id\">\n" . " <display-name>$ch_name</display-name>\n" . " <display-name>$ch_num</display-name>\n" . " <icon src=\"http://www.astro.com.my/channels/images/channellogo/$ch_num.png\"/>\n" . " <url>http://api-epg.astro.com.my/api/service/$ch_astroid?format=html</url>\n" . " </channel>\n"; } } sub write_footer { print "</tv>\n"; } sub xml_escape() { my ($string) = @_; if ($string) { $string =~ s/&/&/g; $string =~ s/</</g; $string =~ s/>/>/g; $string =~ s/"/"/g; # Some of the feeds come in as ISO8859-1 and I can't figure out how # to get them encoded properly as UTF-8 in perl, so convert all non-ASCII # to character references. $string =~ s/([^\x{20}-\x{7E}])/'&#' . ord($1) . ';'/gse; } return $string; } ### Setup indentation in Emacs ## Local Variables: ## perl-indent-level: 4 ## perl-continued-statement-offset: 4 ## perl-continued-brace-offset: 0 ## perl-brace-offset: -4 ## perl-brace-imaginary-offset: 0 ## perl-label-offset: -2 ## cperl-indent-level: 4 ## cperl-brace-offset: 0 ## cperl-continued-brace-offset: 0 ## cperl-label-offset: -2 ## cperl-extra-newline-before-brace: t ## cperl-merge-trailing-else: nil ## cperl-continued-statement-offset: 2 ## End:
Friday, 12 April 2013
Astro xmltv listings
For quite a while now, I've been pulling TV listings from the Astro website. At first this involved scraping some HTML and the scripts I had were pretty ugly and worked only for the channels I wanted. Then they added some RSS feeds, which made the scraping a bit easier and less prone to breaking every time they updated the look of the website (which was often). For a couple of years now, the RSS feed has been replaced with a JSON API, and I had a pretty good grabber written for that, but never got around to sharing it. Recently they switched to a dedicated server for the JSON API, and gave it an overhaul, so my script stopped working. Now that I've updated it to the new API, its a good time to share.
Subscribe to:
Post Comments (Atom)
11 comments:
Thanks for this.
I actually use a VB app that does this, but notice that the api call doesn't get shows that finish on the next day (i.e. moves that start at 10.30 and finish at 1.30am)
Do you have the same issues?
Hmm. ok. I tried getting more than 1 day, but it seems to only return a single day. I will revert back to that and see if I can get it to work again.
thanks. :)
[deleted by accident because Blogger showed me two copies] Yes I get warnings about that all the time too, but I think it only causes a problem for the last day in the schedule. I tend to fetch 7 or 14 days at a time and update daily, so I never have a problem in practice.
To clarify; my tv_grab script fetches one day at a time, then use tv_sort to combine the next 7 days into a single schedule.
Thanks Jason.
when I tried to do day by day, i cant get any shows that span midnight.
i.e. if it starts at 11pm, and the endtime is next morning, the shows doesnt appear in either day.
Do you see this happening?
If i do a multi-day select I do get those shows, but the response to the multi day call seems to be hit and miss?
When I check now, you're right, the last program on each channel every day is missing. In my previous version, it used to work, so I guess they must be missing from the JSON feeds.
Astro seem to have the same problem, so it looks like it is a problem with the API. I've tried getting the data spanning multiple days, specifying shorter times, but it looks like the API just gives you one day of programs (excluding any that start before, or end on or after midnight) whatever parameters you give it.
Yeah, thanks Jason.
Glad its not my code that's the problem :)
Guess, I will try first for a full week, and If i only get a day, then loop through each day instead.
Hello Jason,
I would like to know more about the Malaysian EPG data. Please let me know how to get in touch with you.
Thanks,
Gokul
I haven't had time to update the script lately and it is failing to get listings for about half the channels. You can email me specific questions at jasonrumney AT gmail.com
Post a Comment