i have access_logs
around 500MB
,i am giving sample as
10.223.157.186 - - [15/Jul/2009:14:58:59 -0700] "GET / HTTP/1.1" 403 15779
10.223.157.186 - - [15/Jul/2009:14:58:59 -0700] "GET /favicon.ico HTTP/1.1" 404 5397
10.216.113.172 - - [29/Apr/2010:07:19:48 -0700] "GET / HTTP/1.1" 200 68831
how can i extract month from timestamp?
Expected output :
year month day event occurrence
2009 jul 15 GET /favicon.ico HTTP/1.1
2009 apr 29 GET / HTTP/1.1
i tried this
add jar /usr/lib/hive/lib/hive-contrib-0.7.1-cdh3u2.jar;
create table log(ip string, gt string, gt1 string, timestamp string, id1 string, s1 string, s2 string) row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
with serdeproperties('input.regex'= '^(\\S+) (\\S+) (\\S+) \\[([[\\w/]+:(\\d{2}:\\d{2}):\\d{2}\\s[+\\-]\\d{4}:/]+\\s[+\\-]\\d{4})\\] "(.+?)" (\\S+) (\\S+)')location '/path';
If i understand correctly string functions will not work in this situation.i am new to regex
& hive
.
help me..thanks in advance
I'm not familiar with hadoop/hive, but as far as regexes go, if I were using ruby:
log_file = %Q[
10.223.157.186 - - [15/Jul/2009:14:58:59 -0700] "GET / HTTP/1.1" 403 15779
10.223.157.186 - - [15/Jul/2009:14:58:59 -0700] "GET /favicon.ico HTTP/1.1" 404 5397
10.216.113.172 - - [29/Apr/2010:07:19:48 -0700] "GET / HTTP/1.1" 200 68831
]
converted_lines = log_file.split("\n").map do |line|
regex = /^.*? - - \[(\d+)\/(\w+)\/(\d{4}).*?\] (.*)/
matches = regex.match(line)
output = [
[:year, matches[3]],
[:month, matches[2]],
[:day, matches[1]],
[:event_occurrence, matches[4]],
]
end
Hope that helps.
i have access_logs
around 500MB
,i am giving sample as
10.223.157.186 - - [15/Jul/2009:14:58:59 -0700] "GET / HTTP/1.1" 403 15779
10.223.157.186 - - [15/Jul/2009:14:58:59 -0700] "GET /favicon.ico HTTP/1.1" 404 5397
10.216.113.172 - - [29/Apr/2010:07:19:48 -0700] "GET / HTTP/1.1" 200 68831
how can i extract month from timestamp?
Expected output :
year month day event occurrence
2009 jul 15 GET /favicon.ico HTTP/1.1
2009 apr 29 GET / HTTP/1.1
i tried this
add jar /usr/lib/hive/lib/hive-contrib-0.7.1-cdh3u2.jar;
create table log(ip string, gt string, gt1 string, timestamp string, id1 string, s1 string, s2 string) row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
with serdeproperties('input.regex'= '^(\\S+) (\\S+) (\\S+) \\[([[\\w/]+:(\\d{2}:\\d{2}):\\d{2}\\s[+\\-]\\d{4}:/]+\\s[+\\-]\\d{4})\\] "(.+?)" (\\S+) (\\S+)')location '/path';
If i understand correctly string functions will not work in this situation.i am new to regex
& hive
.
help me..thanks in advance
I'm not familiar with hadoop/hive, but as far as regexes go, if I were using ruby:
log_file = %Q[
10.223.157.186 - - [15/Jul/2009:14:58:59 -0700] "GET / HTTP/1.1" 403 15779
10.223.157.186 - - [15/Jul/2009:14:58:59 -0700] "GET /favicon.ico HTTP/1.1" 404 5397
10.216.113.172 - - [29/Apr/2010:07:19:48 -0700] "GET / HTTP/1.1" 200 68831
]
converted_lines = log_file.split("\n").map do |line|
regex = /^.*? - - \[(\d+)\/(\w+)\/(\d{4}).*?\] (.*)/
matches = regex.match(line)
output = [
[:year, matches[3]],
[:month, matches[2]],
[:day, matches[1]],
[:event_occurrence, matches[4]],
]
end
Hope that helps.
0 commentaires:
Enregistrer un commentaire